Files
whisper_voice/README.md

7.9 KiB
Raw Blame History

🎙️ W H I S P E R   V O I C E

SOVEREIGN SPEECH RECOGNITION


Status Download License


"The master's tools will never dismantle the master's house."
Build your own tools. Run them locally. Free your mind.

View SourceReport Issue



📡 The Transmission

We live in an era of enclosure. Our words, our thoughts, and our digital footprints are strip-mined by centralized giants, turned into capital, and sold back to us as "convenience."

Whisper Voice is a rejection of that contract.

It is built on the axiom that your voice belongs to you. By bringing state-of-the-art inference down from the server farms and running it on your own metal, we reclaim a small piece of the digital commons. This software answers to no one but you. It has no telemetry, no subscription, and no masters.


The Engine

This operates on the silicon. It is not a wrapper. It is a machine.

Component Technology Benefit
Inference Core Faster-Whisper Hyper-optimized implementation via CTranslate2. Delivers 4x velocity over standard PyTorch execution.
Compression INT8 quantization Enables Pro-grade models (Large-v3) to run on consumer hardware, democratizing access to high-fidelity AI.
Sensory Gate Silero VAD Enterprise-grade Voice Activity Detection filters out the noise, ensuring only intent is captured.
Interface Qt 6 / QML Hardware-accelerated, glassmorphic UI that is fluid, responsive, and sovereign.

🌎 Universal Translator

Whisper Voice v1.0.1 introduces a Neural Translation Engine built directly into the core. It bypasses the need for corporate translation APIs entirely.

  • Input: Speak in French, Japanese, Russian, or 96 other languages.
  • Output: The engine instantly reconstructs the semantic meaning in fluent English.
  • Local Execution: No API keys. No data leaks. The translation happens on your GPU.

Dual-Channel Workflow

The application listens on two separate channels simultaneously, allowing for seamless fluid switching between local and international communication.

  • F9 (Default) -> Transcribe: Types exactly what you say, in the language you speak.
  • F10 (Default) -> Translate: Translates whatever you say in any language into English.

🕹️ Command & Control

Global Hotkeys

The agent runs silently in the background, waiting for your signal.

  • Transcribe (F9): Opens the channel for standard speech-to-text.
  • Translate (F10): Opens the channel for neural translation.
  • Customization: Remap these keys in Settings. The recorder supports complex chords (e.g. Ctrl + Alt + Space) to fit your workflow.

Injection Protocols

  • Clipboard Paste: Standard text injection. Instant, reliable.
  • Simulate Typing: Mimics physical keystrokes at superhuman speed (6000 CPM). Bypasses anti-paste restrictions and "protected" windows.

📊 Intelligence Matrix

Select the model that aligns with your available resources.

Model VRAM (GPU) RAM (CPU) Designation Capability
Tiny ~500 MB ~1 GB Supersonic Command & Control, older hardware.
Base ~600 MB ~1 GB 🚀 Very Fast Daily driver for low-power laptops.
Small ~1 GB ~2 GB Fast High accuracy English dictation.
Medium ~2 GB ~4 GB ⚖️ Balanced Complex vocabulary, foreign accents.
Large-v3 Turbo ~4 GB ~6 GB Optimal The Sweet Spot. Near-Large intelligence, Medium speed.
Large-v3 ~5 GB ~8 GB 🧠 Maximum Professional grade. Uncompromised.

Note: Acceleration requires you to manually select your Compute Device (CUDA GPU or CPU) in Settings.


🛠️ Deployment

📥 Installation

  1. Acquire: Download WhisperVoice.exe from Releases.
  2. Deploy: Place it anywhere. It is portable.
  3. Bootstrap: Run it. The agent will self-provision an isolated Python runtime (~2GB) on first launch.
  4. Sync: Future updates are handled by the Smart Bootstrapper, which surgically updates only changed files, respecting your bandwidth and your settings.

🔧 Troubleshooting

  • App crashes on start: Ensure you have Microsoft Visual C++ Redistributable 2015-2022 installed.
  • "Simulate Typing" is slow: Some applications (remote desktops, legacy games) cannot handle the data stream. Lower the typing speed in Settings to ~1200 CPM.
  • No Audio: The agent listens to the Default Communication Device. Verify your Windows Sound Control Panel.


<EFBFBD> Supported Languages

The engine understands the following 99 languages. You can lock the focus to a specific language in Settings to improve accuracy, or rely on Auto-Detect for fluid multilingual usage.

Afrikaans 🇿🇦 Albanian 🇦🇱 Amharic 🇪🇹 Arabic 🇸🇦 Armenian 🇦🇲 Assamese 🇮🇳
Azerbaijani 🇦🇿 Bashkir 🇷🇺 Basque 🇪🇸 Belarusian 🇧🇾 Bengali 🇧🇩 Bosnian 🇧🇦
Breton 🇫🇷 Bulgarian 🇧🇬 Burmese 🇲🇲 Castilian 🇪🇸 Catalan 🇪🇸 Chinese 🇨🇳
Croatian 🇭🇷 Czech 🇨🇿 Danish 🇩🇰 Dutch 🇳🇱 English 🇺🇸 Estonian 🇪🇪
Faroese 🇫🇴 Finnish 🇫🇮 Flemish 🇧🇪 French 🇫🇷 Galician 🇪🇸 Georgian 🇬🇪
German 🇩🇪 Greek 🇬🇷 Gujarati 🇮🇳 Haitian 🇭🇹 Hausa 🇳🇬 Hawaiian 🇺🇸
Hebrew 🇮🇱 Hindi 🇮🇳 Hungarian 🇭🇺 Icelandic 🇮🇸 Indonesian 🇮🇩 Italian 🇮🇹
Japanese 🇯🇵 Javanese 🇮🇩 Kannada 🇮🇳 Kazakh 🇰🇿 Khmer 🇰🇭 Korean 🇰🇷
Lao 🇱🇦 Latin 🇻🇦 Latvian 🇱🇻 Lingala 🇨🇩 Lithuanian 🇱🇹 Luxembourgish 🇱🇺
Macedonian 🇲🇰 Malagasy 🇲🇬 Malay 🇲🇾 Malayalam 🇮🇳 Maltese 🇲🇹 Maori 🇳🇿
Marathi 🇮🇳 Moldavian 🇲🇩 Mongolian 🇲🇳 Myanmar 🇲🇲 Nepali 🇳🇵 Norwegian 🇳🇴
Occitan 🇫🇷 Panjabi 🇮🇳 Pashto 🇦🇫 Persian 🇮🇷 Polish 🇵🇱 Portuguese 🇵🇹
Punjabi 🇮🇳 Romanian 🇷🇴 Russian 🇷🇺 Sanskrit 🇮🇳 Serbian 🇷🇸 Shona 🇿🇼
Sindhi 🇵🇰 Sinhala 🇱🇰 Slovak 🇸🇰 Slovenian 🇸🇮 Somali 🇸🇴 Spanish 🇪🇸
Sundanese 🇮🇩 Swahili 🇰🇪 Swedish 🇸🇪 Tagalog 🇵🇭 Tajik 🇹🇯 Tamil 🇮🇳
Tatar 🇷🇺 Telugu 🇮🇳 Thai 🇹🇭 Tibetan 🇨🇳 Turkish 🇹🇷 Turkmen 🇹🇲
Ukrainian 🇺🇦 Urdu 🇵🇰 Uzbek 🇺🇿 Vietnamese 🇻e Welsh 🏴󠁧󠁢󠁷󠁬󠁳󠁿 Yiddish 🇮🇱
Yoruba 🇳🇬


⚖️ PUBLIC DOMAIN (CC0 1.0)

No Rights Reserved. No Gods. No Masters. No Managers.

Credit to OpenAI (Whisper), Systran (Faster-Whisper), and Silero (VAD).