2026-01-24 17:27:54 +02:00
2026-01-24 17:03:52 +02:00

WHISPER VOICE

SOVEREIGN SPEECH RECOGNITION



Your Voice. Your Machine. Your Data.
A high-performance, locally-run dictation agent for the liberated desktop.





The Manifesto

We hold these truths to be self-evident: That user data is an extension of the self, and its exploitation by centralized clouds is a violation of digital autonomy.

Whisper Voice is built on the principle of technological sovereignty. It provides state-of-the-art speech recognition without renting your cognitive output to corporate oligarchies. By running entirely on your own hardware, it reclaims the means of digital production, ensuring that your words remain exclusively yours.

"The master's tools will never dismantle the master's house." — Audre Lorde
Build your own tools. Run them locally.


Technical Core

Whisper Voice is not a wrapper for an API. It is a fully contained neural inference engine running on your metal.

The Engine: Faster-Whisper

Whisper Voice runs Faster-Whisper on the CTranslate2 backend, a high-performance inference engine for Transformer models. This lets us run OpenAI's Whisper architectures with:

  • Up to 4x speedup over the reference PyTorch implementation.
  • Up to 4x smaller weights via 8-bit quantization (int8, relative to float32), enabling Pro-grade models on consumer GPUs.
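
The memory claim follows from simple arithmetic: an int8 weight takes 1 byte where a float32 weight takes 4. The sketch below estimates raw weight storage from the approximate parameter counts OpenAI publishes for Whisper, then shows how a quantized model might be loaded through Faster-Whisper's `WhisperModel`. The helper names (`weight_size_gb`, `load_model`) are illustrative and not taken from the Whisper Voice source, and the figures cover weights only; real VRAM usage is higher because of activations and runtime overhead.

```python
# Approximate parameter counts (in millions) published with OpenAI's Whisper models.
PARAMS_MILLIONS = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large-v3": 1550,
}

# Storage cost of one weight for each numeric type.
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def weight_size_gb(model: str, dtype: str) -> float:
    """Estimate raw weight storage in GB (10^9 bytes), ignoring runtime overhead."""
    return PARAMS_MILLIONS[model] * 1e6 * BYTES_PER_WEIGHT[dtype] / 1e9


def load_model(model: str = "small", device: str = "cuda", compute_type: str = "int8"):
    """Hypothetical loader: requires the faster-whisper package to be installed."""
    from faster_whisper import WhisperModel  # imported lazily so the estimator works without it

    return WhisperModel(model, device=device, compute_type=compute_type)


if __name__ == "__main__":
    for name in PARAMS_MILLIONS:
        fp32 = weight_size_gb(name, "float32")
        int8 = weight_size_gb(name, "int8")
        print(f"{name:>9}: float32 {fp32:.2f} GB, int8 {int8:.2f} GB")
```

Under these assumptions the weight ratio between float32 and int8 is exactly 4, independent of model size.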

The Sense: Silero VAD

To distinguish human speech from background noise, we employ Silero VAD (Voice Activity Detection). This ensures that the agent only transcribes while you are speaking, conserving compute resources and preventing hallucinated text from silence.
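
A VAD model such as Silero emits a speech probability for each short audio frame; turning those probabilities into "speak" and "silence" regions is a small post-processing step. The sketch below illustrates that step only. It is not the Silero implementation or the Whisper Voice code, and the function name, threshold, and silence window are assumed values chosen for illustration.

```python
from typing import List, Sequence, Tuple


def speech_segments(
    probs: Sequence[float],
    threshold: float = 0.5,
    min_silence_frames: int = 3,
) -> List[Tuple[int, int]]:
    """Merge per-frame speech probabilities into (start, end) frame ranges.

    `end` is exclusive. A segment closes only after `min_silence_frames`
    consecutive frames fall below `threshold`, so short pauses do not split it.
    """
    segments: List[Tuple[int, int]] = []
    start = None   # index of the first frame of the open segment, if any
    silence = 0    # consecutive below-threshold frames inside the open segment
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                # Close the segment just after its last speech frame.
                segments.append((start, i - silence + 1))
                start = None
                silence = 0
    if start is not None:
        # Audio ended while a segment was open; trim any trailing silence.
        segments.append((start, len(probs) - silence))
    return segments


if __name__ == "__main__":
    frames = [0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.1, 0.7]
    print(speech_segments(frames))
```

Only the frames inside the returned ranges would be handed to the recognizer, which is how silence is kept away from the model.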

The Interface: Qt 6 (PySide6)

The UI is built with Qt Quick/QML, rendering a hardware-accelerated, glassmorphic overlay that feels native to modern desktop environments while remaining completely decoupled from OS spyware.


📊 Model Intelligence

Select the intelligence level that matches your hardware reality.

Model            GPU VRAM   CPU RAM   Speed          Best For
Tiny             ~500 MB    ~1 GB     Supersonic     Quick commands, older machinery.
Base             ~600 MB    ~1 GB     🚀 Very Fast   Daily driving on low-power laptops.
Small            ~1 GB      ~2 GB     Fast           High accuracy for English dictation.
Medium           ~2 GB      ~4 GB     ⚖️ Balanced    Complex vocabulary and accents.
Large-v3 Turbo   ~4 GB      ~6 GB     Optimal        The sweet spot. Large-level smarts, Medium-level speed.
Large-v3         ~5 GB      ~8 GB     🧠 Maximum     Professional transcription. Uncompromised quality.

Note: You must select your available Compute Device (CUDA GPU or CPU) in the Settings to enable acceleration.
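
As a rough illustration of matching a model to your hardware, the sketch below picks the largest model whose approximate GPU VRAM figure from the table fits the memory you have free. The function name, the model labels, and the 0.5 GB headroom are assumptions made for this example; Whisper Voice itself leaves the choice to you in Settings.

```python
from typing import Optional

# Approximate GPU VRAM needs in GB, copied from the table above, largest first.
MODEL_VRAM_GB = [
    ("large-v3", 5.0),
    ("large-v3-turbo", 4.0),
    ("medium", 2.0),
    ("small", 1.0),
    ("base", 0.6),
    ("tiny", 0.5),
]


def pick_model(free_vram_gb: float, headroom_gb: float = 0.5) -> Optional[str]:
    """Return the largest model that fits, or None if even Tiny does not fit.

    `headroom_gb` is an assumed safety margin for the desktop and other apps.
    """
    budget = free_vram_gb - headroom_gb
    for name, needed in MODEL_VRAM_GB:
        if needed <= budget:
            return name
    return None  # no GPU model fits; the CPU device is the fallback


if __name__ == "__main__":
    for vram in (0.8, 2.0, 4.5, 8.0):
        print(f"{vram} GB free -> {pick_model(vram)}")
```

A `None` result here simply means the CPU column of the table is the one to consult.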


🛠️ Operational Guide

Deployment

  1. Download: Grab the latest WhisperVoice.exe from Releases.
  2. Install: There is no installation. Place the executable in a directory you control (e.g., C:\Tools\WhisperVoice).
  3. Bootstrap: Run it. The agent will self-provision its own isolated Python environment (~2GB). This ensures your system PATH remains clean and unpolluted.

Usage

  • Hotkeys: The default trigger is F9. You can rebind this in Settings to any combination (e.g., Ctrl+Space, Alt+V).
  • Injection Modes:
    • Clipboard Paste: Standard, reliable text insertion.
    • Simulate Typing: A stealth mode that emulates individual keystrokes (up to 6000 characters per minute, CPM) to work in applications that block pasting (e.g., games, remote terminals).
  • Tray Agent: The app lives in your system tray. Right-click the icon to access Settings or terminate the process.
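
The typing speed in Simulate Typing maps directly to a pause between keystrokes: 60 seconds divided by the CPM value. The sketch below shows that conversion and a minimal typing loop. `send_key` is a hypothetical callback standing in for whatever keystroke API the host platform provides; it is not a function from Whisper Voice.

```python
import time
from typing import Callable


def keystroke_delay_seconds(cpm: int) -> float:
    """Convert characters per minute into the pause between two keystrokes."""
    if cpm <= 0:
        raise ValueError("cpm must be positive")
    return 60.0 / cpm


def type_text(
    text: str,
    cpm: int,
    send_key: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Emit `text` one character at a time, pausing after each keystroke.

    `send_key` is a placeholder for a real keystroke injector; `sleep` is
    injectable so the loop can be exercised without actually waiting.
    """
    delay = keystroke_delay_seconds(cpm)
    for ch in text:
        send_key(ch)
        sleep(delay)


if __name__ == "__main__":
    print(keystroke_delay_seconds(6000))  # 10 ms per character at full speed
    print(keystroke_delay_seconds(1200))  # 50 ms per character at the slower setting
    type_text("hello", 6000, send_key=lambda ch: print(ch, end="", flush=True))
    print()
```

At 6000 CPM an application has 10 ms to process each character, which explains why some older programs drop input until the speed is lowered.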

Advanced Features

  • File Transcription: Need to transcribe a pre-recorded audio file? Right-click the System Tray Icon and select Transcribe File. Supports .wav, .mp3, .m4a, and most common formats.
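
For readers who want to script the same task, here is a minimal sketch of file transcription using Faster-Whisper directly. It assumes the faster-whisper package is installed and uses its `WhisperModel.transcribe` call with the built-in `vad_filter` option; the function names, the default model size, and the timestamp layout are illustrative choices, not the behaviour of the tray menu command.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def transcribe_file(path: str, model_size: str = "small", device: str = "cpu"):
    """Hypothetical helper: transcribe one audio file and return timestamped lines."""
    from faster_whisper import WhisperModel  # lazy import; needs faster-whisper installed

    model = WhisperModel(model_size, device=device, compute_type="int8")
    # vad_filter=True asks Faster-Whisper to skip non-speech audio before decoding.
    segments, info = model.transcribe(path, vad_filter=True)
    lines = [
        f"[{format_timestamp(seg.start)} -> {format_timestamp(seg.end)}] {seg.text.strip()}"
        for seg in segments
    ]
    return info.language, lines


if __name__ == "__main__":
    import sys

    language, lines = transcribe_file(sys.argv[1])
    print(f"Detected language: {language}")
    print("\n".join(lines))
```

Saved as a script, this would be run with the path to an audio file as its only argument.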

🌐 Supported Languages

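The original Whisper models were trained on 680,000 hours of multilingual audio and support the languages below. Accuracy varies by language and model size.

Full list (99 languages; some appear under more than one name, e.g. Castilian and Spanish, Flemish and Dutch, Panjabi and Punjabi):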
Afrikaans Albanian Amharic Arabic
Armenian Assamese Azerbaijani Bashkir
Basque Belarusian Bengali Bosnian
Breton Bulgarian Burmese Castilian
Catalan Chinese Croatian Czech
Danish Dutch English Estonian
Faroese Finnish Flemish French
Galician Georgian German Greek
Gujarati Haitian Haitian Creole Hausa
Hawaiian Hebrew Hindi Hungarian
Icelandic Indonesian Italian Japanese
Javanese Kannada Kazakh Khmer
Korean Lao Latin Latvian
Letzeburgesch Lingala Lithuanian Luxembourgish
Macedonian Malagasy Malay Malayalam
Maltese Maori Marathi Moldavian
Mongolian Myanmar Nepali Norwegian
Nynorsk Occitan Panjabi Pashto
Persian Polish Portuguese Punjabi
Pushto Romanian Russian Sanskrit
Serbian Shona Sindhi Sinhala
Sinhalese Slovak Slovenian Somali
Spanish Sundanese Swahili Swedish
Tagalog Tajik Tamil Tatar
Telugu Thai Tibetan Turkish
Turkmen Ukrainian Urdu Uzbek
Valencian Vietnamese Welsh Yiddish
Yoruba

Note: The model will automatically detect the language being spoken.
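
Several names in the list are alternate spellings of the same language, and Whisper identifies each language by a short code. When scripting Faster-Whisper directly, a code can be passed as `transcribe(language=...)` to skip auto-detection. The sketch below resolves a few names from the list to their codes; the table is a partial, illustrative subset and the function name is hypothetical.

```python
from typing import Optional

# Partial name -> language code table (illustrative subset of the list above).
LANGUAGE_CODES = {
    "english": "en",
    "spanish": "es",
    "catalan": "ca",
    "dutch": "nl",
    "haitian creole": "ht",
    "luxembourgish": "lb",
    "pashto": "ps",
    "punjabi": "pa",
    "romanian": "ro",
    "sinhala": "si",
    "myanmar": "my",
}

# Alternate names that refer to a language already listed under another name.
ALIASES = {
    "castilian": "spanish",
    "valencian": "catalan",
    "flemish": "dutch",
    "haitian": "haitian creole",
    "letzeburgesch": "luxembourgish",
    "pushto": "pashto",
    "panjabi": "punjabi",
    "moldavian": "romanian",
    "sinhalese": "sinhala",
    "burmese": "myanmar",
}


def language_code(name: str) -> Optional[str]:
    """Resolve a language name or alias to its code, or None if not in this subset."""
    key = name.strip().lower()
    key = ALIASES.get(key, key)
    return LANGUAGE_CODES.get(key)


if __name__ == "__main__":
    for name in ("Castilian", "Spanish", "Flemish", "Klingon"):
        print(name, "->", language_code(name))
```

Returning `None` for an unknown name is a natural cue to fall back to automatic detection.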


🔧 Troubleshooting

The app crashes immediately on start
  Ensure you have the Microsoft Visual C++ Redistributable (2015-2022) installed, as the underlying CTranslate2 engine requires these standard libraries.

"Simulate Typing" is slow or misses characters
  Adjust the Typing Speed slider in Settings. Some older applications cannot handle supersonic 6000 CPM input; try lowering it to 1200 CPM.

Microphone not picking up audio
  The agent uses your System Default Input Device. Ensure your microphone is set as Default in Windows Sound Settings.

⚖️ License & Rights

Public Domain (CC0 1.0)

To the extent possible under law, the creators of this interface have waived all copyright and related or neighboring rights to this work. This tool belongs to the commons. It is a gift to the digital proletariat.

  • Fork it.
  • Mod it.
  • Distribute it.

Credits

  • OpenAI: For the Whisper weights (MIT).
  • Systran: For Faster-Whisper (MIT).
  • Qt Company: For the UI framework (LGPL).

No gods, no cloud managers.

v1.2.0 Latest
2026-02-18 22:30:48 +02:00