# đī¸ W H I S P E R V O I C E
### SOVEREIGN SPEECH RECOGNITION

[](https://git.lashman.live/lashman/whisper_voice/releases/latest)
[](https://creativecommons.org/publicdomain/zero/1.0/)
> *"The master's tools will never dismantle the master's house."*
>
> **Build your own tools. Run them locally. Free your mind.**
[View Source](https://git.lashman.live/lashman/whisper_voice) âĸ [Report Issue](https://git.lashman.live/lashman/whisper_voice/issues)
## đĄ The Transmission
We are witnessing the **enshittification** of the digital world. What were once vibrant social commons are being walled off, strip-mined for data, and degraded into rent-seeking silos. Your voice is no longer your own; it is a training set for a corporate oracle that charges you for the privilege of listening.
**Whisper Voice** is a small act of sabotage against this trend.
It is built on the axiom of **Technological Sovereignty**. By moving state-of-the-art inference from the server farms to your own silicon, you reclaim the means of digital production. No telemetry. No subscriptions. No "cloud processing" that eavesdrops on your intent.
---
## ⥠The Engine
Whisper Voice operates directly on the metal. It is not an API wrapper; it is an autonomous machine.
| Component | Technology | Benefit |
| :--- | :--- | :--- |
| **Inference Core** | **Faster-Whisper** | Hyper-optimized C++ implementation via **CTranslate2**. Delivers **4x velocity** over standard PyTorch. |
| **Compression** | **INT8 quantization** | Enables Pro-grade models (`Large-v3`) to run on consumer-grade GPUs, democratizing elite AI. |
| **Sensory Gate** | **Silero VAD** | Enterprise-grade Voice Activity Detection filters out the noise, ensuring only pure intent is processed. |
| **Interface** | **Qt 6 / QML** | Hardware-accelerated, glassmorphic UI that is fluid, responsive, and sovereign. |
### đ Compatibility Matrix (Windows)
The core engine (`CTranslate2`) is heavily optimized for Nvidia tensor cores.
| Manufacturer | Hardware | Status | Notes |
| :--- | :--- | :--- | :--- |
| **Nvidia** | GTX 900+ / RTX | â
**Supported** | Full heavy-metal acceleration. |
| **AMD** | Radeon RX | â ī¸ **CPU Fallback** | Runs on CPU. Valid for `Small/Medium`, slow for `Large`. |
| **Intel** | Arc / Iris | â ī¸ **CPU Fallback** | Runs on CPU. Valid for `Small/Medium`, slow for `Large`. |
| **Apple** | M1 / M2 / M3 | â **Unsupported** | Release is strictly Windows x64. |
> **AMD Users**: v1.0.3 auto-detects GPU failures and silently falls back to CPU.
## đī¸ Universal Transcription
At its core, Whisper Voice is the ultimate bridge between thought and text. It listens with superhuman precision, converting spoken word into written form across **99 languages**.
* **Punctuation Mastery**: Automatically handles capitalization and complex punctuation formatting.
* **Contextual Intelligence**: Smarter than standard dictation; it understands the flow of sentences to resolve homophones and technical jargon ($1.5k vs "fifteen hundred dollars").
* **Total Privacy**: Your private dictation, legal notes, or creative writing never leave your RAM.
### Workflow: `F9 (Default)`
The primary channel for native-language transcription. It transcribes precisely what it hears in the language you speak (or the one you've locked in Settings).
### đ§ Intelligent Correction (New in v1.1.0)
Whisper Voice now integrates a local **Llama 3.2 1B** LLM to act as a "Silent Consultant". It post-processes transcripts to fix grammar or polish style without effectively "chatting" back.
It is strictly trained on a **Forensic Protocol**: it will never lecture you, never refuse to process explicit language, and never sanitize your words. Your profanity is yours to keep.
#### Correction Modes:
* **Standard (Default)**: Fixes grammar, punctuation, and capitalization while keeping every word you said.
* **Grammar Only**: Strictly fixes objective errors (spelling/agreement). Touches nothing else.
* **Rewrite**: Polishes the flow and clarity of your sentences while explicitly preserving your original tone (Casual stays casual, Formal stays formal).
#### Supported Languages:
The correction engine is optimized for **English, German, French, Italian, Portuguese, Spanish, Hindi, and Thai**. It also performs well on **Russian, Chinese, Japanese, and Romanian**.
This approach incurs a ~2s latency penalty but uses **zero extra VRAM** when in Low VRAM mode.
## đ Universal Translation
Whisper Voice v1.0.1 includes a **Neural Translation Engine** that allows you to bridge any linguistic gap instantly.
* **Input**: Speak in French, Japanese, Russian, or **96 other languages**.
* **Output**: The engine instantly reconstructs the semantic meaning into fluent **English**.
* **Task Protocol**: Handled via the dedicated `F10` channel.
### đ Why only English translation?
A common question arises: *Why can't I translate from French to Japanese?*
The architecture of the underlying Whisper model is a **Many-to-English** design. During its massive training phase (680,000 hours of audio), the translation task was specifically optimized to map the global linguistic commons onto a single bridge language: **English**. This allowed the model to reach incredible levels of semantic understanding without the exponential complexity of a "Many-to-Many" mapping.
By focusing its translation decoder solely on English, Whisper achieves "Zero-Shot" quality that rivals specialized translation engines while remaining lightweight enough to run on your local GPU.
---
## đšī¸ Command & Control
### Global Hotkeys
The agent runs silently in the background, waiting for your signal.
* **Transcribe (F9)**: Opens the channel for standard speech-to-text.
* **Translate (F10)**: Opens the channel for neural translation.
* **Customization**: Remap these keys in Settings. The recorder supports complex chords (e.g. `Ctrl + Alt + Space`) to fit your workflow.
### Injection Protocols
* **Clipboard Paste**: Standard text injection. Instant, reliable.
* **Simulate Typing**: Mimics physical keystrokes at superhuman speed (6000 CPM). Bypasses anti-paste restrictions and "protected" windows.
## đ Intelligence Matrix
Select the model that aligns with your available resources.
| Model | VRAM (GPU) | RAM (CPU) | Designation | Capability |
| :--- | :--- | :--- | :--- | :--- |
| `Tiny` | **~500 MB** | ~1 GB | ⥠**Supersonic** | Command & Control, older hardware. |
| `Base` | **~600 MB** | ~1 GB | đ **Very Fast** | Daily driver for low-power laptops. |
| `Small` | **~1 GB** | ~2 GB | ⊠**Fast** | High accuracy English dictation. |
| `Medium` | **~2 GB** | ~4 GB | âī¸ **Balanced** | Complex vocabulary, foreign accents. |
| `Large-v3 Turbo` | **~4 GB** | ~6 GB | ⨠**Optimal** | **The Sweet Spot.** Near-Large intelligence, Medium speed. |
| `Large-v3` | **~5 GB** | ~8 GB | đ§ **Maximum** | Professional grade. Uncompromised. |
> *Note: Acceleration requires you to manually select your Compute Device (CUDA GPU or CPU) in Settings.*
### đ Low VRAM Mode
For users with limited GPU memory (e.g., 4GB cards) or those running heavy games simultaneously, Whisper Voice offers a specialized **Low VRAM Mode**.
* **Behavior**: The AI model is aggressively unloaded from the GPU immediately after every transcription.
* **Benefit**: When idle, the app consumes near-zero VRAM (~0MB), leaving your GPU completely free for gaming or rendering.
* **Trade-off**: There is a "cold start" latency of 1-2 seconds for every voice command as the model reloads from the disk cache.
---
## đ ī¸ Deployment
### đĨ Installation
1. **Acquire**: Download `WhisperVoice.exe` from [Releases](https://git.lashman.live/lashman/whisper_voice/releases).
2. **Deploy**: Place it anywhere. It is portable.
3. **Bootstrap**: Run it. The agent will self-provision an isolated Python runtime (~2GB) on first launch.
4. **Sync**: Future updates are handled by the **Smart Bootstrapper**, which surgically updates only changed files, respecting your bandwidth and your settings.
### đ§ Troubleshooting
* **App crashes on start**: Ensure you have [Microsoft Visual C++ Redistributable 2015-2022](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist) installed.
* **"Simulate Typing" is slow**: Some applications (remote desktops, legacy games) cannot handle the data stream. Lower the typing speed in Settings to ~1200 CPM.
* **No Audio**: The agent listens to the **Default Communication Device**. Verify your Windows Sound Control Panel.
---
## đ Supported Languages
The engine understands the following 99 languages. You can lock the focus to a specific language in Settings to improve accuracy, or rely on **Auto-Detect** for fluid multilingual usage.
| | | | | | |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Afrikaans đŋđĻ | Albanian đĻđą | Amharic đĒđš | Arabic đ¸đĻ | Armenian đĻđ˛ | Assamese đŽđŗ |
| Azerbaijani đĻđŋ | Bashkir đˇđē | Basque đĒđ¸ | Belarusian đ§đž | Bengali đ§đŠ | Bosnian đ§đĻ |
| Breton đĢđˇ | Bulgarian đ§đŦ | Burmese đ˛đ˛ | Castilian đĒđ¸ | Catalan đĒđ¸ | Chinese đ¨đŗ |
| Croatian đđˇ | Czech đ¨đŋ | Danish đŠđ° | Dutch đŗđą | English đēđ¸ | Estonian đĒđĒ |
| Faroese đĢđ´ | Finnish đĢđŽ | Flemish đ§đĒ | French đĢđˇ | Galician đĒđ¸ | Georgian đŦđĒ |
| German đŠđĒ | Greek đŦđˇ | Gujarati đŽđŗ | Haitian đđš | Hausa đŗđŦ | Hawaiian đēđ¸ |
| Hebrew đŽđą | Hindi đŽđŗ | Hungarian đđē | Icelandic đŽđ¸ | Indonesian đŽđŠ | Italian đŽđš |
| Japanese đ¯đĩ | Javanese đŽ Indonesa | Kannada đŽđŗ | Kazakh đ°đŋ | Khmer đ°đ | Korean đ°đˇ |
| Lao đąđĻ | Latin đģđĻ | Latvian đąđģ | Lingala đ¨đŠ | Lithuanian đąđš | Luxembourgish đąđē |
| Macedonian đ˛đ° | Malagasy đ˛đŦ | Malay đ˛đž | Malayalam đŽđŗ | Maltese đ˛đš | Maori đŗđŋ |
| Marathi đŽđŗ | Moldavian đ˛đŠ | Mongolian đ˛đŗ | Myanmar đ˛đ˛ | Nepali đŗđĩ | Norwegian đŗđ´ |
| Occitan đĢđˇ | Panjabi đŽđŗ | Pashto đĻđĢ | Persian đŽđˇ | Polish đĩđą | Portuguese đĩđš |
| Punjabi đŽđŗ | Romanian đˇđ´ | Russian đˇđē | Sanskrit đŽđŗ | Serbian đˇđ¸ | Shona đŋđŧ |
| Sindhi đĩđ° | Sinhala đąđ° | Slovak đ¸đ° | Slovenian đ¸đŽ | Somali đ¸đ´ | Spanish đĒđ¸ |
| Sundanese đŽđŠ | Swahili đ°đĒ | Swedish đ¸đĒ | Tagalog đĩđ | Tajik đšđ¯ | Tamil đŽđŗ |
| Tatar đˇđē | Telugu đŽđŗ | Thai đšđ | Tibetan đ¨đŗ | Turkish đšđˇ | Turkmen đšđ˛ |
| Ukrainian đēđĻ | Urdu đĩđ° | Uzbek đēđŋ | Vietnamese đģe | Welsh đ´ķ §ķ ĸķ ˇķ Ŧķ ŗķ ŋ | Yiddish đŽđą |
| Yoruba đŗđŦ | | | | | |
### âī¸ PUBLIC DOMAIN (CC0 1.0)
*No Rights Reserved. No Gods. No Masters. No Managers.*
Credit to **OpenAI** (Whisper), **Systran** (Faster-Whisper), and **Silero** (VAD).