whisper_voice/README.md
2026-01-24 17:29:59 +02:00

🎙️ W H I S P E R   V O I C E

SOVEREIGN SPEECH RECOGNITION




"The master's tools will never dismantle the master's house." — Audre Lorde
Build your own tools. Run them locally.

Report Issue · View Source · Releases


The Manifesto

We hold these truths to be self-evident: That user data is an extension of the self, and its exploitation by centralized clouds is a violation of digital autonomy.

Whisper Voice is built on the principle of technological sovereignty. It provides state-of-the-art speech recognition without renting your cognitive output to corporate oligarchies. By running entirely on your own hardware, it reclaims the means of digital production, ensuring that your words remain exclusively yours.


Technical Architecture

This operates on the metal. It is not a wrapper. It is an engine.

| Component | Technology | Benefit |
|---|---|---|
| Inference Core | Faster-Whisper | Hyper-optimized implementation of OpenAI's Whisper built on CTranslate2. Delivers up to 4x speedups over the reference PyTorch implementation. |
| Quantization | INT8 | 8-bit quantization enables Pro-grade models (Large-v3) to run on consumer GPUs with a reduced VRAM footprint. |
| Sensory Gate | Silero VAD | Enterprise-grade Voice Activity Detection filters out silence and background noise, conserving compute. |
| Interface | Qt 6 / QML | Hardware-accelerated, glassmorphic UI that feels native yet remains OS-independent. |
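
As a rough illustration of how the first three rows fit together, the sketch below loads a Faster-Whisper model with an 8-bit compute type and turns on its built-in Silero VAD filter. It assumes the `faster-whisper` Python package is installed. The helper names `pick_compute_type` and `transcribe_file` are hypothetical and are not taken from Whisper Voice's own source.

```python
def pick_compute_type(device: str) -> str:
    """Choose an 8-bit CTranslate2 compute type for the given device."""
    # "int8_float16" stores weights in INT8 and computes in FP16 on a CUDA GPU;
    # plain "int8" is the usual 8-bit choice on CPU.
    return "int8_float16" if device == "cuda" else "int8"


def transcribe_file(path: str, model_size: str = "large-v3", device: str = "cuda") -> str:
    """Transcribe one audio file and return the joined segment text."""
    # Imported lazily so pick_compute_type works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device, compute_type=pick_compute_type(device))
    # vad_filter=True runs Silero VAD first so non-speech audio is skipped.
    segments, _info = model.transcribe(path, vad_filter=True)
    return " ".join(segment.text.strip() for segment in segments)


if __name__ == "__main__":
    print(pick_compute_type("cuda"))
    print(pick_compute_type("cpu"))
```

Calling `transcribe_file("meeting.wav", device="cpu")` (the file name is a placeholder) would download the model on first use and then run fully offline.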

📊 Intelligence Matrix

Select the model that aligns with your hardware capabilities.

| Model | VRAM (GPU) | RAM (CPU) | Velocity | Designation |
|---|---|---|---|---|
| Tiny | ~500 MB | ~1 GB | Supersonic | Command & Control, older hardware. |
| Base | ~600 MB | ~1 GB | 🚀 Very Fast | Daily driver for low-power laptops. |
| Small | ~1 GB | ~2 GB | Fast | High accuracy English dictation. |
| Medium | ~2 GB | ~4 GB | ⚖️ Balanced | Complex vocabulary, foreign accents. |
| Large-v3 Turbo | ~4 GB | ~6 GB | Optimal | Sweet Spot. Near-Large smarts, Medium speed. |
| Large-v3 | ~5 GB | ~8 GB | 🧠 Maximum | Professional transcription. Uncompromised. |

Note: Acceleration requires you to manually select your Compute Device (CUDA GPU or CPU) in Settings.
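
To make the selection concrete, here is a minimal sketch that picks the largest model whose approximate VRAM figure fits a given budget. The numbers are copied from the matrix and rounded to megabytes. The 10% headroom factor, the function name, and the model labels are assumptions for illustration, not settings the app is known to use.

```python
from typing import Optional

# Approximate GPU memory needs in MB, taken from the matrix (smallest first).
MODEL_VRAM_MB = [
    ("tiny", 500),
    ("base", 600),
    ("small", 1000),
    ("medium", 2000),
    ("large-v3-turbo", 4000),
    ("large-v3", 5000),
]


def largest_model_that_fits(free_vram_mb: float, headroom: float = 0.9) -> Optional[str]:
    """Return the largest model that fits in free_vram_mb, or None if none does.

    headroom is the fraction of free memory we allow the model to use,
    leaving the rest for audio buffers and other GPU work (an assumption).
    """
    budget = free_vram_mb * headroom
    best = None
    for name, required_mb in MODEL_VRAM_MB:
        if required_mb <= budget:
            best = name  # list is sorted ascending, so the last fit is the largest
    return best
```

With 4096 MB free and the default headroom, the sketch settles on `medium`, because Large-v3 Turbo's ~4 GB no longer fits once the margin is reserved.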


🛠️ Operations

📥 Deployment

  1. Download: Grab WhisperVoice.exe from Releases.
  2. Deploy: Place it anywhere. It is portable.
  3. Bootstrap: Run it. The agent will self-provision an isolated Python environment (~2 GB) on first launch.

🕹️ Controls

  • Global Hook: F9 (Default). Press to open the channel. Release to inject text.
  • Tray Agent: Retracts to the system tray. Right-click for Settings or File Transcription.
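
The press-to-record, release-to-inject flow can be sketched as a small state machine. This is an illustration only: the class name, the callback parameters, and the way audio chunks arrive are all hypothetical, and the app's real global-hook mechanism is not shown here.

```python
from typing import Callable, List


class PushToTalk:
    """Buffer audio while the hotkey is held, then transcribe and inject on release."""

    def __init__(self, transcribe: Callable[[bytes], str], inject: Callable[[str], None]):
        self._transcribe = transcribe  # e.g. a wrapper around the speech model
        self._inject = inject          # e.g. clipboard paste or simulated typing
        self._chunks: List[bytes] = []
        self.recording = False

    def on_press(self) -> None:
        # Ignore key auto-repeat events while the key is already held.
        if not self.recording:
            self.recording = True
            self._chunks = []

    def on_audio(self, chunk: bytes) -> None:
        # Audio that arrives while the key is up is discarded.
        if self.recording:
            self._chunks.append(chunk)

    def on_release(self) -> None:
        if not self.recording:
            return
        self.recording = False
        audio = b"".join(self._chunks)
        self._chunks = []
        if audio:  # nothing captured means nothing to inject
            self._inject(self._transcribe(audio))
```

Keeping the key handling separate from the transcription and injection callbacks makes the flow easy to test without a microphone or a keyboard hook.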

📡 Input Modes

| Mode | Description | Speed |
|---|---|---|
| Clipboard Paste | Standard text injection via OS clipboard. | Instant |
| Simulate Typing | Mimics physical keystrokes. Bypasses anti-paste blocks. | Up to 6000 CPM |
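
For a sense of scale, the arithmetic below converts a typing speed into a per-keystroke delay, assuming CPM means characters per minute. The function names are illustrative and not part of the app.

```python
def delay_per_char_ms(cpm: int) -> float:
    """Milliseconds between simulated keystrokes at the given characters per minute."""
    if cpm <= 0:
        raise ValueError("cpm must be positive")
    return 60_000 / cpm  # 60,000 ms in a minute


def typing_time_s(text: str, cpm: int) -> float:
    """Seconds needed to type the whole text at the given speed."""
    return len(text) * delay_per_char_ms(cpm) / 1000


if __name__ == "__main__":
    print(delay_per_char_ms(6000))  # 10.0 ms per character
    print(delay_per_char_ms(1200))  # 50.0 ms per character
```

At 6000 CPM each keystroke gets 10 ms, which some applications cannot keep up with; 1200 CPM stretches that to 50 ms per keystroke.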

🌐 Universal Translation

The model listens in 99 languages and translates them to English or transcribes them natively.

Supported languages:
Afrikaans 🇿🇦 Albanian 🇦🇱 Amharic 🇪🇹 Arabic 🇸🇦
Armenian 🇦🇲 Assamese 🇮🇳 Azerbaijani 🇦🇿 Bashkir 🇷🇺
Basque 🇪🇸 Belarusian 🇧🇾 Bengali 🇧🇩 Bosnian 🇧🇦
Breton 🇫🇷 Bulgarian 🇧🇬 Burmese 🇲🇲 Castilian 🇪🇸
Catalan 🇪🇸 Chinese 🇨🇳 Croatian 🇭🇷 Czech 🇨🇿
Danish 🇩🇰 Dutch 🇳🇱 English 🇺🇸 Estonian 🇪🇪
Faroese 🇫🇴 Finnish 🇫🇮 Flemish 🇧🇪 French 🇫🇷
Galician 🇪🇸 Georgian 🇬🇪 German 🇩🇪 Greek 🇬🇷
Gujarati 🇮🇳 Haitian 🇭🇹 Hausa 🇳🇬 Hawaiian 🇺🇸
Hebrew 🇮🇱 Hindi 🇮🇳 Hungarian 🇭🇺 Icelandic 🇮🇸
Indonesian 🇮🇩 Italian 🇮🇹 Japanese 🇯🇵 Javanese 🇮🇩
Kannada 🇮🇳 Kazakh 🇰🇿 Khmer 🇰🇭 Korean 🇰🇷
Lao 🇱🇦 Latin 🇻🇦 Latvian 🇱🇻 Lingala 🇨🇩
Lithuanian 🇱🇹 Luxembourgish 🇱🇺 Macedonian 🇲🇰 Malagasy 🇲🇬
Malay 🇲🇾 Malayalam 🇮🇳 Maltese 🇲🇹 Maori 🇳🇿
Marathi 🇮🇳 Moldavian 🇲🇩 Mongolian 🇲🇳 Myanmar 🇲🇲
Nepali 🇳🇵 Norwegian 🇳🇴 Occitan 🇫🇷 Panjabi 🇮🇳
Pashto 🇦🇫 Persian 🇮🇷 Polish 🇵🇱 Portuguese 🇵🇹
Punjabi 🇮🇳 Romanian 🇷🇴 Russian 🇷🇺 Sanskrit 🇮🇳
Serbian 🇷🇸 Shona 🇿🇼 Sindhi 🇵🇰 Sinhala 🇱🇰
Slovak 🇸🇰 Slovenian 🇸🇮 Somali 🇸🇴 Spanish 🇪🇸
Sundanese 🇮🇩 Swahili 🇰🇪 Swedish 🇸🇪 Tagalog 🇵🇭
Tajik 🇹🇯 Tamil 🇮🇳 Tatar 🇷🇺 Telugu 🇮🇳
Thai 🇹🇭 Tibetan 🇨🇳 Turkish 🇹🇷 Turkmen 🇹🇲
Ukrainian 🇺🇦 Urdu 🇵🇰 Uzbek 🇺🇿 Vietnamese 🇻🇳
Welsh 🏴󠁧󠁢󠁷󠁬󠁳󠁿 Yiddish 🇮🇱 Yoruba 🇳🇬
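
In Faster-Whisper, the choice between native transcription and translation to English is a single `task` argument on `transcribe`. The helper below is a hedged sketch of how such options might be assembled; `build_transcribe_options` is a hypothetical name, and how Whisper Voice exposes this choice in its settings is not shown here.

```python
from typing import Dict, Optional


def build_transcribe_options(translate_to_english: bool,
                             language: Optional[str] = None) -> Dict[str, str]:
    """Build keyword arguments for WhisperModel.transcribe.

    task="transcribe" keeps the spoken language; task="translate" outputs English.
    Leaving language unset lets the model detect the spoken language itself.
    """
    options = {"task": "translate" if translate_to_english else "transcribe"}
    if language is not None:
        options["language"] = language  # a code such as "de" or "ja"
    return options
```

The result would be passed along as `model.transcribe(path, **build_transcribe_options(True))`, where `model` is a loaded `WhisperModel` and `path` is an audio file.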

🔧 Troubleshooting

🔥 App crashes on start
The underlying engine requires standard C++ libraries. Install the Microsoft Visual C++ Redistributable (2015-2022).
🐌 "Simulate Typing" is slow
Some apps (games, RDP) can't handle supersonic input. Go to Settings and lower the Typing Speed to ~1200 CPM.
🎤 No Audio / Silence
The agent listens to the Default Communication Device. Ensure your microphone is set correctly in Windows Sound Settings.

⚖️ PUBLIC DOMAIN (CC0 1.0)

No Rights Reserved. No Gods. No Managers.

Credit to OpenAI (Whisper), Systran (Faster-Whisper), and Silero (VAD).