WHISPER VOICE
SOVEREIGN SPEECH RECOGNITION
Your Voice. Your Machine. Your Data.
A high-performance, locally-run dictation agent for the liberated desktop.
✊ The Manifesto
We hold these truths to be self-evident: That user data is an extension of the self, and its exploitation by centralized clouds is a violation of digital autonomy.
Whisper Voice is built on the principle of technological sovereignty. It provides state-of-the-art speech recognition without renting your cognitive output to corporate oligarchies. By running entirely on your own hardware, it reclaims the means of digital production, ensuring that your words remain exclusively yours.
"The master's tools will never dismantle the master's house." — Audre Lorde
Build your own tools. Run them locally.
⚡ Technical Core
Whisper Voice is not a wrapper for an API. It is a fully contained neural inference engine running on your metal.
The Engine: Faster-Whisper
We use the CTranslate2 backend, a high-performance inference engine for Transformer models. This lets us run OpenAI's Whisper architectures with:
- Up to 4x speedup over the reference PyTorch implementation at the same accuracy.
- Up to 4x smaller model weights via 8-bit quantization (`int8` instead of 32-bit floats), enabling the larger models on consumer GPUs.
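For readers curious how this looks in code, here is a minimal sketch (not Whisper Voice's actual loader) of driving faster-whisper with int8 quantization. `WhisperModel(model_size, device=..., compute_type=...)` and `model.transcribe()` are faster-whisper's public API; the helper names `pick_compute_type` and `transcribe_file`, and the device-to-compute-type mapping, are hypothetical choices made for illustration.

```python
"""Sketch: loading Whisper through faster-whisper with int8 quantization.

Assumption: helper names and the device -> compute_type mapping are
illustrative, not taken from Whisper Voice's source.
"""


def pick_compute_type(device: str) -> str:
    """Choose a CTranslate2 compute type for the given device.

    On CUDA, "int8_float16" keeps weights in int8 and computes in float16;
    on CPU, plain "int8" is a common choice.
    """
    if device == "cuda":
        return "int8_float16"
    if device == "cpu":
        return "int8"
    raise ValueError(f"unsupported device: {device!r}")


def transcribe_file(path: str, model_size: str = "small", device: str = "cpu") -> str:
    """Transcribe an audio file and return the joined text.

    Requires `pip install faster-whisper`; the import is deferred so the
    helper above works without the dependency installed.
    """
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device, compute_type=pick_compute_type(device))
    # transcribe() returns a lazy generator of segments plus metadata;
    # iterating over the segments is what actually runs inference.
    segments, _info = model.transcribe(path, beam_size=5)
    return " ".join(segment.text.strip() for segment in segments)
```

Deferring the import keeps the quantization choice testable on machines without a GPU or the model weights.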
The Sense: Silero VAD
To distinguish human speech from background noise, we employ Silero VAD (Voice Activity Detection). This ensures that only audio containing speech reaches the transcription model, conserving compute resources and preventing Whisper from hallucinating text during silence.
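faster-whisper can apply Silero VAD itself via `model.transcribe(audio, vad_filter=True)`. To show the gating idea in isolation, here is a simplified, hypothetical sketch (not Silero's own code) that turns per-frame speech probabilities, such as a VAD model outputs, into speech segments. A threshold decides which frames count as speech, and a minimum-silence window keeps short pauses from splitting one utterance into several.

```python
from typing import List, Optional, Tuple


def speech_segments(
    probs: List[float],
    threshold: float = 0.5,
    min_silence_frames: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, judged to contain speech.

    Hypothetical helper: `probs` holds one speech probability per audio frame.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # frame where the current speech run began
    silence = 0                  # consecutive sub-threshold frames inside a run
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                # Close the run at the last speech frame, dropping trailing silence.
                segments.append((start, i - silence + 1))
                start = None
                silence = 0
    if start is not None:
        # Audio ended mid-run: close it, again excluding trailing silence.
        segments.append((start, len(probs) - silence))
    return segments
```

Frame indices convert to seconds by multiplying by the VAD's frame duration. In a pipeline like this, only the returned ranges would be passed to the transcriber, so pure silence never reaches Whisper.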
The Interface: Qt 6 (PySide6)
The UI is built with Qt Quick/QML, rendering a hardware-accelerated, glassmorphic overlay that feels native to modern desktop environments while remaining completely decoupled from OS spyware.
📊 Model Intelligence
Select the intelligence level that matches your hardware reality.
| Model | GPU VRAM | CPU RAM | Speed | Best For |
|---|---|---|---|---|
| Tiny | ~500 MB | ~1 GB | ⚡ Supersonic | Quick commands, older machinery. |
| Base | ~600 MB | ~1 GB | 🚀 Very Fast | Daily driving on low-power laptops. |
| Small | ~1 GB | ~2 GB | ⏩ Fast | High accuracy for English dictation. |
| Medium | ~2 GB | ~4 GB | ⚖️ Balanced | Complex vocabulary and accents. |
| Large-v3 Turbo | ~4 GB | ~6 GB | ✨ Optimal | The sweet spot. Large-level smarts, Medium-level speed. |
| Large-v3 | ~5 GB | ~8 GB | 🧠 Maximum | Professional transcription. Uncompromised quality. |
Note: Select your Compute Device (CUDA GPU or CPU) in Settings; choose CUDA GPU to enable GPU acceleration.
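As a rough guide to reading the table, the sketch below picks the largest model whose approximate footprint fits the memory you have on the chosen device (VRAM for CUDA, RAM for CPU). The figures are copied from the table and converted to MB (1 GB taken as 1024 MB); they are approximations, and the helper `largest_model_that_fits` is hypothetical, not a Whisper Voice setting. It does not reserve headroom for other applications.

```python
from typing import Optional

# (label, approx. GPU VRAM in MB, approx. CPU RAM in MB), copied from the
# model table and ordered from smallest to largest model.
MODELS = [
    ("Tiny", 500, 1024),
    ("Base", 600, 1024),
    ("Small", 1024, 2048),
    ("Medium", 2048, 4096),
    ("Large-v3 Turbo", 4096, 6144),
    ("Large-v3", 5120, 8192),
]


def largest_model_that_fits(device: str, available_mb: int) -> Optional[str]:
    """Return the largest model whose approximate footprint fits, else None."""
    if device not in ("cuda", "cpu"):
        raise ValueError(f"unsupported device: {device!r}")
    column = 1 if device == "cuda" else 2  # VRAM column for CUDA, RAM for CPU
    for entry in reversed(MODELS):
        if entry[column] <= available_mb:
            return entry[0]
    return None
```

For example, a GPU with 4 GB of free VRAM lands on Large-v3 Turbo, while a CPU-only machine with about 3 GB of free RAM lands on Small.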
🛠️ Operational Guide
Deployment
- Download: Grab the latest `WhisperVoice.exe` from Releases.
- Install: There is no installation. Place the executable in a directory you control (e.g., `C:\Tools\WhisperVoice`).
- Bootstrap: Run it. The agent will self-provision its own isolated Python environment (~2 GB). This ensures your system PATH remains clean and unpolluted.
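The bootstrap step is not documented in detail here, so the following is only a minimal sketch of the general technique: creating a private virtual environment with Python's standard `venv` module in a folder next to the app, without touching the system PATH or global site-packages. The `runtime` folder name and the helper `ensure_private_env` are hypothetical. The sketch also assumes it runs under a regular CPython interpreter; a frozen `.exe` would first need to ship or fetch an interpreter.

```python
import os
import venv
from pathlib import Path


def ensure_private_env(app_dir: Path, with_pip: bool = True) -> Path:
    """Create (once) an isolated virtual environment in app_dir/runtime.

    Returns the path to the environment's Python interpreter. Nothing is
    written to the system PATH or to the global site-packages.
    """
    env_dir = app_dir / "runtime"  # hypothetical location next to the executable
    if not (env_dir / "pyvenv.cfg").exists():
        # Skip creation when a previous run already built the environment.
        venv.create(env_dir, with_pip=with_pip)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

Because everything lives under the app's own folder in this design, removing that folder removes the environment too.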
Usage
- Hotkeys: The default trigger is `F9`. You can rebind it in Settings to any combination (e.g., `Ctrl+Space`, `Alt+V`).
- Injection Modes:
  - Clipboard Paste: Standard, reliable text insertion.
  - Simulate Typing: A stealth mode that mimics physical keystrokes (up to 6000 CPM) to get text into applications that block pasting (e.g., games, remote terminals).
- Tray Agent: The app lives in your system tray. Right-click the icon to access Settings or terminate the process.
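The typing speed in CPM (characters per minute) translates directly into a pause between keystrokes: 60 / CPM seconds. The sketch below shows that arithmetic and a paced typing loop. `send_char` is a hypothetical stand-in for whatever actually injects a keystroke, and `sleep` is injectable so the pacing can be checked without waiting; neither reflects Whisper Voice's internals.

```python
import time
from typing import Callable


def delay_for_cpm(cpm: int) -> float:
    """Seconds to wait between characters for a characters-per-minute rate."""
    if cpm <= 0:
        raise ValueError("cpm must be positive")
    return 60.0 / cpm


def simulate_typing(
    text: str,
    send_char: Callable[[str], None],
    cpm: int = 1200,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Emit text one character at a time at roughly `cpm` characters per minute."""
    delay = delay_for_cpm(cpm)
    for ch in text:
        send_char(ch)  # hypothetical keystroke injector
        sleep(delay)   # pause so the target app can keep up
```

At 6000 CPM that is a 10 ms gap per character; at 1200 CPM it is 50 ms, five times more time for a slow application to process each keystroke.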
Removal
- Portable: To uninstall, simply delete the folder. No registry keys, no hidden services, no trace left behind.
🔧 Troubleshooting
The app crashes immediately on start
Ensure you have the Microsoft Visual C++ Redistributable (2015-2022) installed, as the underlying CTranslate2 engine requires these standard libraries.
"Simulate Typing" is slow or misses characters
Adjust the Typing Speed slider in Settings. Some older applications cannot handle supersonic 6000 CPM input; try lowering it to 1200 CPM.
Microphone not picking up audio
The agent uses your system default input device. Ensure your microphone is set as the Default device in Windows Sound Settings.
⚖️ License & Rights
Public Domain (CC0 1.0)
To the extent possible under law, the creators of this interface have waived all copyright and related or neighboring rights to this work. This tool belongs to the commons. It is a gift to the digital proletariat.
- Fork it.
- Mod it.
- Distribute it.
Credits
- OpenAI: For the Whisper weights (MIT).
- Systran: For Faster-Whisper (MIT).
- Qt Company: For the UI framework (LGPL).
No gods, no cloud managers.
