Files

Your Name 0f1bf5f1af Docs: Final polish - 6-col language table and refined manifesto

2026-01-24 19:12:08 +02:00

7.9 KiB

Raw Blame History

🎙️ W H I S P E R V O I C E

SOVEREIGN SPEECH RECOGNITION

"The master's tools will never dismantle the master's house."
Build your own tools. Run them locally. Free your mind.

View Source • Report Issue

📡 The Transmission

We live in an era of enclosure. Our words, our thoughts, and our digital footprints are strip-mined by centralized giants, turned into capital, and sold back to us as "convenience."

Whisper Voice is a rejection of that contract.

It is built on the axiom that your voice belongs to you. By bringing state-of-the-art inference down from the server farms and running it on your own metal, we reclaim a small piece of the digital commons. This software answers to no one but you. It has no telemetry, no subscription, and no masters.

⚡ The Engine

This operates on the silicon. It is not a wrapper. It is a machine.

Component	Technology	Benefit
Inference Core	Faster-Whisper	Hyper-optimized implementation via CTranslate2. Delivers 4x velocity over standard PyTorch execution.
Compression	INT8 quantization	Enables Pro-grade models (`Large-v3`) to run on consumer hardware, democratizing access to high-fidelity AI.
Sensory Gate	Silero VAD	Enterprise-grade Voice Activity Detection filters out the noise, ensuring only intent is captured.
Interface	Qt 6 / QML	Hardware-accelerated, glassmorphic UI that is fluid, responsive, and sovereign.

🌎 Universal Translator

Whisper Voice v1.0.1 introduces a Neural Translation Engine built directly into the core. It bypasses the need for corporate translation APIs entirely.

Input: Speak in French, Japanese, Russian, or 96 other languages.
Output: The engine instantly reconstructs the semantic meaning in fluent English.
Local Execution: No API keys. No data leaks. The translation happens on your GPU.

Dual-Channel Workflow

The application listens on two separate channels simultaneously, allowing for seamless fluid switching between local and international communication.

F9 (Default) -> Transcribe: Types exactly what you say, in the language you speak.
F10 (Default) -> Translate: Translates whatever you say in any language into English.

🕹️ Command & Control

Global Hotkeys

The agent runs silently in the background, waiting for your signal.

Transcribe (F9): Opens the channel for standard speech-to-text.
Translate (F10): Opens the channel for neural translation.
Customization: Remap these keys in Settings. The recorder supports complex chords (e.g. Ctrl + Alt + Space) to fit your workflow.

Injection Protocols

Clipboard Paste: Standard text injection. Instant, reliable.
Simulate Typing: Mimics physical keystrokes at superhuman speed (6000 CPM). Bypasses anti-paste restrictions and "protected" windows.

📊 Intelligence Matrix

Select the model that aligns with your available resources.

Model	VRAM (GPU)	RAM (CPU)	Designation	Capability
`Tiny`	~500 MB	~1 GB	⚡ Supersonic	Command & Control, older hardware.
`Base`	~600 MB	~1 GB	🚀 Very Fast	Daily driver for low-power laptops.
`Small`	~1 GB	~2 GB	⏩ Fast	High accuracy English dictation.
`Medium`	~2 GB	~4 GB	⚖️ Balanced	Complex vocabulary, foreign accents.
`Large-v3 Turbo`	~4 GB	~6 GB	✨ Optimal	The Sweet Spot. Near-Large intelligence, Medium speed.
`Large-v3`	~5 GB	~8 GB	🧠 Maximum	Professional grade. Uncompromised.

Note: Acceleration requires you to manually select your Compute Device (CUDA GPU or CPU) in Settings.

🛠️ Deployment

📥 Installation

Acquire: Download WhisperVoice.exe from Releases.
Deploy: Place it anywhere. It is portable.
Bootstrap: Run it. The agent will self-provision an isolated Python runtime (~2GB) on first launch.
Sync: Future updates are handled by the Smart Bootstrapper, which surgically updates only changed files, respecting your bandwidth and your settings.

🔧 Troubleshooting

App crashes on start: Ensure you have Microsoft Visual C++ Redistributable 2015-2022 installed.
"Simulate Typing" is slow: Some applications (remote desktops, legacy games) cannot handle the data stream. Lower the typing speed in Settings to ~1200 CPM.
No Audio: The agent listens to the Default Communication Device. Verify your Windows Sound Control Panel.

<EFBFBD> Supported Languages

The engine understands the following 99 languages. You can lock the focus to a specific language in Settings to improve accuracy, or rely on Auto-Detect for fluid multilingual usage.


Afrikaans 🇿🇦	Albanian 🇦🇱	Amharic 🇪🇹	Arabic 🇸🇦	Armenian 🇦🇲	Assamese 🇮🇳
Azerbaijani 🇦🇿	Bashkir 🇷🇺	Basque 🇪🇸	Belarusian 🇧🇾	Bengali 🇧🇩	Bosnian 🇧🇦
Breton 🇫🇷	Bulgarian 🇧🇬	Burmese 🇲🇲	Castilian 🇪🇸	Catalan 🇪🇸	Chinese 🇨🇳
Croatian 🇭🇷	Czech 🇨🇿	Danish 🇩🇰	Dutch 🇳🇱	English 🇺🇸	Estonian 🇪🇪
Faroese 🇫🇴	Finnish 🇫🇮	Flemish 🇧🇪	French 🇫🇷	Galician 🇪🇸	Georgian 🇬🇪
German 🇩🇪	Greek 🇬🇷	Gujarati 🇮🇳	Haitian 🇭🇹	Hausa 🇳🇬	Hawaiian 🇺🇸
Hebrew 🇮🇱	Hindi 🇮🇳	Hungarian 🇭🇺	Icelandic 🇮🇸	Indonesian 🇮🇩	Italian 🇮🇹
Japanese 🇯🇵	Javanese 🇮🇩	Kannada 🇮🇳	Kazakh 🇰🇿	Khmer 🇰🇭	Korean 🇰🇷
Lao 🇱🇦	Latin 🇻🇦	Latvian 🇱🇻	Lingala 🇨🇩	Lithuanian 🇱🇹	Luxembourgish 🇱🇺
Macedonian 🇲🇰	Malagasy 🇲🇬	Malay 🇲🇾	Malayalam 🇮🇳	Maltese 🇲🇹	Maori 🇳🇿
Marathi 🇮🇳	Moldavian 🇲🇩	Mongolian 🇲🇳	Myanmar 🇲🇲	Nepali 🇳🇵	Norwegian 🇳🇴
Occitan 🇫🇷	Panjabi 🇮🇳	Pashto 🇦🇫	Persian 🇮🇷	Polish 🇵🇱	Portuguese 🇵🇹
Punjabi 🇮🇳	Romanian 🇷🇴	Russian 🇷🇺	Sanskrit 🇮🇳	Serbian 🇷🇸	Shona 🇿🇼
Sindhi 🇵🇰	Sinhala 🇱🇰	Slovak 🇸🇰	Slovenian 🇸🇮	Somali 🇸🇴	Spanish 🇪🇸
Sundanese 🇮🇩	Swahili 🇰🇪	Swedish 🇸🇪	Tagalog 🇵🇭	Tajik 🇹🇯	Tamil 🇮🇳
Tatar 🇷🇺	Telugu 🇮🇳	Thai 🇹🇭	Tibetan 🇨🇳	Turkish 🇹🇷	Turkmen 🇹🇲
Ukrainian 🇺🇦	Urdu 🇵🇰	Uzbek 🇺🇿	Vietnamese 🇻e	Welsh 🏴󠁧󠁢󠁷󠁬󠁳󠁿	Yiddish 🇮🇱
Yoruba 🇳🇬

⚖️ PUBLIC DOMAIN (CC0 1.0)

No Rights Reserved. No Gods. No Masters. No Managers.

Credit to OpenAI (Whisper), Systran (Faster-Whisper), and Silero (VAD).

7.9 KiB Raw Blame History Unescape Escape