We hold these truths to be self-evident: That user data is an extension of the self, and its exploitation by centralized clouds is a violation of digital autonomy.

Whisper Voice is built on the principle of technological sovereignty. It provides state-of-the-art speech recognition without renting your cognitive output to corporate oligarchies. By running entirely on your own hardware, it reclaims the means of digital production, ensuring that your words remain exclusively yours.

"The master's tools will never dismantle the master's house." — Audre Lorde
Build your own tools. Run them locally.

⚡ Technical Core

Whisper Voice is not a wrapper for an API. It is a fully contained neural inference engine running on your metal.

The Engine: Faster-Whisper

We use the CTranslate2 backend, a high-performance inference engine for Transformer models. This lets us run OpenAI's Whisper models with:

Up to 4x faster inference than the reference openai/whisper (PyTorch) implementation at the same accuracy.
Roughly 4x smaller weights than float32 via 8-bit (int8) quantization, which lets the larger models fit on consumer GPUs.
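To make the memory figure concrete, here is a minimal sketch of symmetric int8 quantization of one weight row in plain Python. Each weight is stored as a signed byte plus one shared float scale for the row, so storage drops from 4 bytes to about 1 byte per weight. This shows the general technique only. It is not CTranslate2's implementation (in faster-whisper the precision is chosen with the `compute_type` argument of `WhisperModel`, e.g. `"int8"`), and the weight values are made up.

```python
from array import array

def quantize_row_int8(row):
    """Symmetric int8 quantization of one weight row.

    Each weight w is stored as q = round(w / scale), clamped to [-127, 127],
    with a single float scale chosen so the largest |w| maps to 127.
    """
    max_abs = max(abs(w) for w in row)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = array("b", (max(-127, min(127, round(w / scale))) for w in row))
    return q, scale

def dequantize_row(q, scale):
    """Approximate the original weights as q * scale."""
    return [v * scale for v in q]

# Made-up float32 weights for one row of a layer.
weights = array("f", [0.12, -0.5, 0.33, 0.0, 0.47, -0.21, 0.01, 0.5])
q, scale = quantize_row_int8(weights)
restored = dequantize_row(q, scale)

fp32_bytes = weights.itemsize * len(weights)  # 4 bytes per weight
int8_bytes = q.itemsize * len(q)              # 1 byte per weight (plus one scale)
print(fp32_bytes, int8_bytes)                 # 32 8
print(max(abs(a - b) for a, b in zip(weights, restored)))  # rounding error, about scale / 2 at most
```

The per-row scale is the design choice that keeps accuracy acceptable: one outlier weight only coarsens the grid for its own row, not for the whole matrix.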

The Sense: Silero VAD

To distinguish human speech from background noise, we employ Silero VAD (Voice Activity Detection). It filters out non-speech audio before transcription, so the agent only processes what you actually say. This conserves compute and reduces the hallucinated text Whisper tends to produce on silence.
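The gating idea can be sketched without a neural network. The function below is a stand-in, not Silero VAD: it swaps Silero's learned speech classifier for a simple RMS-energy threshold on 30 ms frames, and it bridges pauses shorter than a minimum silence so one utterance is not split in two. Only the returned spans would be handed to the recognizer. The threshold and timing defaults are illustrative assumptions (faster-whisper exposes the real Silero-based filter through `vad_filter=True` on `WhisperModel.transcribe`).

```python
import math

def detect_speech(samples, sample_rate=16000, frame_ms=30,
                  threshold=0.02, min_silence_ms=300):
    """Return (start_s, end_s) spans whose frame RMS reaches `threshold`.

    Energy-threshold stand-in for a neural VAD. Gaps of at most
    `min_silence_ms` between loud frames are bridged into one span.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    max_gap = sample_rate * min_silence_ms / 1000
    spans = []  # [start_sample, end_sample] pairs
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            end = start + len(frame)
            if spans and start - spans[-1][1] <= max_gap:
                spans[-1][1] = end  # extend the current span across a short pause
            else:
                spans.append([start, end])  # start a new speech span
    return [(s / sample_rate, e / sample_rate) for s, e in spans]
```

Energy thresholds break down in noisy rooms, where a fan or keyboard can cross the threshold; that is the case a learned model like Silero handles better.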

The Interface: Qt 6 (PySide6)

The UI is built with Qt Quick/QML, rendering a hardware-accelerated, glassmorphic overlay that feels native to modern desktop environments while remaining completely decoupled from OS spyware.

📊 Model Intelligence

Select the intelligence level that matches your hardware reality.

Model	GPU VRAM	CPU RAM	Speed	Best For
Tiny	~500 MB	~1 GB	⚡ Supersonic	Quick commands, older machinery.
Base	~600 MB	~1 GB	🚀 Very Fast	Daily driving on low-power laptops.
Small	~1 GB	~2 GB	⏩ Fast	High accuracy for English dictation.
Medium	~2 GB	~4 GB	⚖️ Balanced	Complex vocabulary and accents.
Large-v3 Turbo	~4 GB	~6 GB	✨ Optimal	The sweet spot. Large-level smarts, Medium-level speed.
Large-v3	~5 GB	~8 GB	🧠 Maximum	Professional transcription. Uncompromised quality.

Note: Acceleration is not automatic. Select your Compute Device (CUDA GPU or CPU) in Settings to enable it.
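If you are unsure which row fits your machine, the selection rule is simple: take the most capable model whose approximate requirement, plus some headroom, fits in the memory you have free. The sketch below encodes the table's figures. The `pick_model` helper and the 0.5 GB headroom are hypothetical illustrations, not part of the app.

```python
# Approximate requirements from the table above, in GB:
# name -> (GPU VRAM, CPU RAM), ordered from lightest to most capable.
MODEL_REQUIREMENTS = {
    "Tiny": (0.5, 1),
    "Base": (0.6, 1),
    "Small": (1, 2),
    "Medium": (2, 4),
    "Large-v3 Turbo": (4, 6),
    "Large-v3": (5, 8),
}

def pick_model(device, available_gb, headroom_gb=0.5):
    """Return the most capable model that fits in `available_gb` on `device`.

    `device` is "cuda" or "cpu". `headroom_gb` is a hypothetical safety
    margin for the OS and other processes. Returns None if nothing fits.
    """
    if device not in ("cuda", "cpu"):
        raise ValueError(f"unknown device: {device!r}")
    column = 0 if device == "cuda" else 1
    best = None
    for name, needs in MODEL_REQUIREMENTS.items():  # dicts keep insertion order
        if needs[column] + headroom_gb <= available_gb:
            best = name  # requirements grow down the table, so keep the last fit
    return best
```

For example, a GPU with 4 GB free gets Medium rather than Large-v3 Turbo, because Turbo's ~4 GB leaves no headroom.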

🛠️ Operational Guide

Deployment

Download: Grab the latest WhisperVoice.exe from Releases.
Install: There is no installation. Place the executable in a directory you control (e.g., C:\Tools\WhisperVoice).
Bootstrap: Run it. The agent will self-provision its own isolated Python environment (~2 GB), keeping your system PATH clean.
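This README does not document how the bootstrap works internally. Assuming it resembles a standard Python virtual environment, the pattern looks like the sketch below: create the environment once under the app's own folder, then launch its interpreter directly so nothing is added to the system PATH. The `runtime` folder name and the `ensure_private_env` helper are hypothetical.

```python
import sys
import venv
from pathlib import Path

def ensure_private_env(base_dir, with_pip=True):
    """Create an isolated Python environment under `base_dir` if missing.

    Returns the path of the environment's interpreter, which the app would
    launch directly instead of relying on PATH.
    """
    env_dir = Path(base_dir) / "runtime"  # hypothetical folder name
    if sys.platform == "win32":
        python = env_dir / "Scripts" / "python.exe"
    else:
        python = env_dir / "bin" / "python"
    if not python.exists():
        # with_pip=True lets dependencies be installed into this env only.
        venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    return python
```

Because the check runs on every launch, a second start reuses the existing environment instead of downloading everything again.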

Usage

Hotkeys: The default trigger is F9. You can rebind this in Settings to any combination (e.g., Ctrl+Space, Alt+V).
Injection Modes:
- Clipboard Paste: Standard, reliable text insertion.
- Simulate Typing: A stealth mode that enters text as individual simulated keystrokes (up to 6000 characters per minute, CPM) to reach applications that block pasting (e.g., games, remote terminals).
Tray Agent: The app lives in your system tray. Right-click the icon to access Settings or terminate the process.
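Simulated typing is mostly a pacing problem: at a rate of N characters per minute, each keystroke is followed by a pause of 60/N seconds, so 6000 CPM means 10 ms per character and 1200 CPM means 50 ms. The sketch below shows that loop. `send_char` is a hypothetical callback standing in for whatever OS input API the app actually uses.

```python
import time

def type_text(text, send_char, cpm=6000, sleep=time.sleep):
    """Send `text` one character at a time at roughly `cpm` characters per minute.

    `send_char` injects a single keystroke (hypothetical callback). `sleep`
    is injectable so the pacing can be tested without real waiting.
    """
    if cpm <= 0:
        raise ValueError("cpm must be positive")
    delay = 60.0 / cpm  # 6000 CPM -> 0.01 s, 1200 CPM -> 0.05 s
    for ch in text:
        send_char(ch)
        sleep(delay)  # give the target application time to process the key
```

At 1200 CPM a 300-character paragraph takes about 15 seconds to type, which is the speed-versus-reliability trade-off the Typing Speed slider controls.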

Advanced Features

File Transcription: Need to transcribe a pre-recorded audio file? Right-click the System Tray Icon and select Transcribe File. Supports .wav, .mp3, .m4a, and most common formats.
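Whatever the input format, Whisper models consume 16 kHz mono audio, so a file has to be decoded, downmixed, and resampled first. faster-whisper does this internally; the standard-library sketch below shows the same steps for the simplest case, a 16-bit PCM WAV file. Compressed formats such as .mp3 and .m4a need a real decoder, and the linear-interpolation resampler is a simplification.

```python
import array
import sys
import wave

TARGET_RATE = 16000  # Whisper models operate on 16 kHz mono audio

def load_wav_16k_mono(source):
    """Read a 16-bit PCM WAV (filename or binary file object).

    Returns mono samples in [-1, 1) resampled to TARGET_RATE.
    """
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("sketch only supports 16-bit PCM")
        channels, rate = wf.getnchannels(), wf.getframerate()
        pcm = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Downmix: average the interleaved channels of each frame, scale to floats.
    mono = [sum(pcm[i:i + channels]) / (channels * 32768.0)
            for i in range(0, len(pcm), channels)]
    if rate == TARGET_RATE or not mono:
        return mono
    # Resample by linear interpolation between neighbouring input samples.
    out_len = int(len(mono) * TARGET_RATE / rate)
    out = []
    for n in range(out_len):
        pos = n * rate / TARGET_RATE
        i = int(pos)
        frac = pos - i
        nxt = mono[i + 1] if i + 1 < len(mono) else mono[i]
        out.append(mono[i] * (1 - frac) + nxt * frac)
    return out
```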

🌐 Supported Languages

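The original Whisper models were trained on 680,000 hours of multilingual audio, and Large-v3 on considerably more. The following languages are supported, although accuracy varies widely between them and is highest for languages well represented in the training data: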

Full list (99 languages; some appear under more than one name, e.g. Castilian/Spanish, Flemish/Dutch, Valencian/Catalan):


Afrikaans	Albanian	Amharic	Arabic
Armenian	Assamese	Azerbaijani	Bashkir
Basque	Belarusian	Bengali	Bosnian
Breton	Bulgarian	Burmese	Castilian
Catalan	Chinese	Croatian	Czech
Danish	Dutch	English	Estonian
Faroese	Finnish	Flemish	French
Galician	Georgian	German	Greek
Gujarati	Haitian	Haitian Creole	Hausa
Hawaiian	Hebrew	Hindi	Hungarian
Icelandic	Indonesian	Italian	Japanese
Javanese	Kannada	Kazakh	Khmer
Korean	Lao	Latin	Latvian
Letzeburgesch	Lingala	Lithuanian	Luxembourgish
Macedonian	Malagasy	Malay	Malayalam
Maltese	Maori	Marathi	Moldavian
Mongolian	Myanmar	Nepali	Norwegian
Nynorsk	Occitan	Panjabi	Pashto
Persian	Polish	Portuguese	Punjabi
Pushto	Romanian	Russian	Sanskrit
Serbian	Shona	Sindhi	Sinhala
Sinhalese	Slovak	Slovenian	Somali
Spanish	Sundanese	Swahili	Swedish
Tagalog	Tajik	Tamil	Tatar
Telugu	Thai	Tibetan	Turkish
Turkmen	Ukrainian	Urdu	Uzbek
Valencian	Vietnamese	Welsh	Yiddish
Yoruba

Note: The model will automatically detect the language being spoken.
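Two details sit behind the list and the note above. Several entries are alternative names for the same language (they mirror aliases in Whisper's tokenizer, which is why 109 names collapse to 99 languages), and auto-detection yields a probability per language, from which the most likely one is chosen. The sketch below folds the aliases and picks the top language; the `min_confidence` cutoff is a hypothetical fallback rule, not an app setting. faster-whisper reports the real result as `info.language` and `info.language_probability`.

```python
# Alternative names from the list above, mapped to one canonical name.
ALIASES = {
    "burmese": "myanmar",
    "castilian": "spanish",
    "flemish": "dutch",
    "haitian": "haitian creole",
    "letzeburgesch": "luxembourgish",
    "moldavian": "romanian",
    "panjabi": "punjabi",
    "pushto": "pashto",
    "sinhalese": "sinhala",
    "valencian": "catalan",
}

def canonical_language(name):
    """Normalize a language name, folding aliases into one canonical name."""
    key = name.strip().lower()
    return ALIASES.get(key, key)

def pick_language(probabilities, min_confidence=0.5):
    """Return (language, probability) for the most likely language.

    `probabilities` maps language names to detection probabilities.
    Returns (None, p) when the best guess is below `min_confidence`,
    a hypothetical cue to fall back to a user-chosen language.
    """
    lang, p = max(probabilities.items(), key=lambda kv: kv[1])
    return (canonical_language(lang), p) if p >= min_confidence else (None, p)
```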

🔧 Troubleshooting

The app crashes immediately on start

Ensure you have the Microsoft Visual C++ Redistributable (2015-2022) installed, as the underlying CTranslate2 engine requires these standard libraries.

"Simulate Typing" is slow or misses characters

Adjust the Typing Speed slider in Settings. Some older applications cannot handle supersonic 6000 CPM input; try lowering it to 1200 CPM.

Microphone not picking up audio

The agent uses your System Default Input Device. Ensure your microphone is set as Default in Windows Sound Settings.

⚖️ License & Rights

Public Domain (CC0 1.0)

To the extent possible under law, the creators of this interface have waived all copyright and related or neighboring rights to this work. This tool belongs to the commons. It is a gift to the digital proletariat.

Fork it.
Mod it.
Distribute it.

Credits

OpenAI: For the Whisper weights (MIT).
Systran: For Faster-Whisper (MIT).
Qt Company: For the UI framework (LGPL).

No gods, no cloud managers.

Latest release: v1.2.0 (2026-02-18), one of 6 releases.
