Files
whisper_voice/README.md
Your Name 0b2b5848e2 Fix: Translation Reliability, Click-Through, and Docs Sync
- Transcriber: Enforced 'beam_size=5' and prompt injection for robust translation.
- Transcriber: Removed conditioning on previous text to prevent language stickiness.
- Transcriber: Refactored kwargs to sanitize inputs.
- Overlay: Fixed click-through by toggling WS_EX_TRANSPARENT.
- UI: Added real download progress reporting.
- Docs: Refactored language list to table.
2026-01-24 19:05:43 +02:00

153 lines
8.4 KiB
Markdown

<div align="center">
# 🎙️ W H I S P E R &nbsp; V O I C E
### SOVEREIGN SPEECH RECOGNITION
<br>
![Status](https://img.shields.io/badge/STATUS-OPERATIONAL-success?style=for-the-badge&logo=server)
[![Download](https://img.shields.io/gitea/v/release/lashman/whisper_voice?gitea_url=https%3A%2F%2Fgit.lashman.live&label=Download&style=for-the-badge&logo=windows&logoColor=white&color=2563eb)](https://git.lashman.live/lashman/whisper_voice/releases/latest)
[![License](https://img.shields.io/badge/LICENSE-CC0_PUBLIC_DOMAIN-lightgrey?style=for-the-badge&logo=creative-commons&logoColor=black)](https://creativecommons.org/publicdomain/zero/1.0/)
<br>
> *"The master's tools will never dismantle the master's house."* — Audre Lorde
> <br>
> **Build your own tools. Run them locally.**
[Report Issue](https://git.lashman.live/lashman/whisper_voice/issues) • [View Source](https://git.lashman.live/lashman/whisper_voice) • [Releases](https://git.lashman.live/lashman/whisper_voice/releases)
</div>
<br>
## ✊ The Manifesto
**We hold these truths to be self-evident:** That user data is an extension of the self, and its exploitation by centralized clouds is a violation of digital autonomy.
**Whisper Voice** is built on the principle of **technological sovereignty**. It provides state-of-the-art speech recognition without renting your cognitive output to corporate oligarchies. By running entirely on your own hardware, it reclaims the means of digital production, ensuring that your words remain exclusively yours.
---
## ⚡ Technical Architecture
This operates on the metal. It is not a wrapper. It is an engine.
| Component | Technology | Benefit |
| :--- | :--- | :--- |
| **Inference Core** | **Faster-Whisper** | Hyper-optimized implementation of OpenAI's Whisper using **CTranslate2**. Delivers **4x speedups** over PyTorch. |
| **Quantization** | **INT8** | 8-bit quantization enables Pro-grade models (`Large-v3`) to run on consumer GPUs with minimal VRAM. |
| **Sensory Gate** | **Silero VAD** | Enterprise-grade Voice Activity Detection filters out silence and background noise, conserving compute. |
| **Interface** | **Qt 6 / QML** | Hardware-accelerated, glassmorphic UI that feels native yet remains OS-independent. |
---
## 🌎 Native Translation Engine
Whisper Voice v1.0.1 introduces a powerful **Universal Translator** built directly into the core. This is not a web-request to Google Translate. This is a neural network running on your GPU that understands the semantic meaning of speech and reconstructs it in fluent English.
* **Any Language Source**: Speak in French, Japanese, Russian, or 96 other languages.
* **English Output**: The engine instantly transcribes the audio into English text.
* **Zero Latency**: Translation happens in real-time as you speak (sentence-by-sentence).
### Dual-Channel Operation
You do not need to switch modes manually. The application listens on two separate channels simultaneously.
* **F9 (Default)** -> **Transcribe**: Types exactly what you say, in the language you speak.
* **F10 (Default)** -> **Translate**: Translates whatever you say in *any* language into English.
This allows for seamless bilingual workflows. Dictate a message to a local friend on `F9`, then instantly reply to an international colleague on `F10` without touching a single setting.
---
## 🕹️ Controls & Configuration
### Global Hotkeys
The system runs silently in the background. Control it via global shortcuts:
* **Transcribe (Default: F9)**: Use this for normal speech-to-text. It respects the language set in Settings (or Auto-Detect).
* **Translate (Default: F10)**: Use this to force translation to English.
* **Customization**: Both keys can be remapped in the Settings menu. The recorder supports complex combinations (e.g., `Ctrl + Alt + Space`).
### Input Modes
* **Clipboard Paste**: Injects text via OS clipboard. Instant, but some games disable paste.
* **Simulate Typing**: Mimics physical keystrokes. Bypasses anti-cheat and anti-paste blocks. Configurable speed (default 6000 CPM) to prevent game kicks.
---
## 📊 Intelligence Matrix (Models)
Select the model that aligns with your hardware capabilities.
| Model | VRAM (GPU) | RAM (CPU) | Velocity | Designation |
| :--- | :--- | :--- | :--- | :--- |
| `Tiny` | **~500 MB** | ~1 GB | ⚡ **Supersonic** | Command & Control, older hardware. |
| `Base` | **~600 MB** | ~1 GB | 🚀 **Very Fast** | Daily driver for low-power laptops. |
| `Small` | **~1 GB** | ~2 GB | ⏩ **Fast** | High accuracy English dictation. |
| `Medium` | **~2 GB** | ~4 GB | ⚖️ **Balanced** | Complex vocabulary, foreign accents. |
| `Large-v3 Turbo` | **~4 GB** | ~6 GB | ✨ **Optimal** | **Sweet Spot.** Near-Large smarts, Medium speed. |
| `Large-v3` | **~5 GB** | ~8 GB | 🧠 **Maximum** | Professional transcription. Uncompromised. |
> *Note: Acceleration requires you to manually select your Compute Device (CUDA GPU or CPU) in Settings.*
---
## 🛠️ Operations
### 📥 Deployment
1. **Download**: Grab `WhisperVoice.exe` from [Releases](https://git.lashman.live/lashman/whisper_voice/releases).
2. **Deploy**: Place it anywhere. It is portable.
3. **Bootstrap**: Run it. The agent will self-provision an isolated Python environment (~2GB) on first launch.
4. **Updates**: Simply replace the `.exe`. The **Smart Bootstrapper** will detect the update and sync only the changed files, preserving your settings and skipping unnecessary downloads.
### 🔧 Troubleshooting
* **App crashes on start**: Ensure you have [Microsoft Visual C++ Redistributable 2015-2022](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist) installed.
* **"Simulate Typing" is slow**: Some applications (remote desktops, older games) choke on super-fast input. Lower the typing speed in Settings to ~1200 CPM.
* **No Audio**: The agent listens to the **Default Communication Device**. Check your Windows Sound Control Panel.
---
## 🌐 Supported Languages
The engine supports 99 languages. You can lock the engine to a specific language in Settings to improve accuracy, or leave it on **Auto-Detect** for multilingual usage.
([See full language list below](#full-language-list))
---
## 🌐 Full Language List
| | | | | |
| :--- | :--- | :--- | :--- | :--- |
| Afrikaans 🇿🇦 | Albanian 🇦🇱 | Amharic 🇪🇹 | Arabic 🇸🇦 | Armenian 🇦🇲 |
| Assamese 🇮🇳 | Azerbaijani 🇦🇿 | Bashkir 🇷🇺 | Basque 🇪🇸 | Belarusian 🇧🇾 |
| Bengali 🇧🇩 | Bosnian 🇧🇦 | Breton 🇫🇷 | Bulgarian 🇧🇬 | Burmese 🇲🇲 |
| Castilian 🇪🇸 | Catalan 🇪🇸 | Chinese 🇨🇳 | Croatian 🇭🇷 | Czech 🇨🇿 |
| Danish 🇩🇰 | Dutch 🇳🇱 | English 🇺🇸 | Estonian 🇪🇪 | Faroese 🇫🇴 |
| Finnish 🇫🇮 | Flemish 🇧🇪 | French 🇫🇷 | Galician 🇪🇸 | Georgian 🇬🇪 |
| German 🇩🇪 | Greek 🇬🇷 | Gujarati 🇮🇳 | Haitian 🇭🇹 | Hausa 🇳🇬 |
| Hawaiian 🇺🇸 | Hebrew 🇮🇱 | Hindi 🇮🇳 | Hungarian 🇭🇺 | Icelandic 🇮🇸 |
| Indonesian 🇮🇩 | Italian 🇮🇹 | Japanese 🇯🇵 | Javanese 🇮🇩 | Kannada 🇮🇳 |
| Kazakh 🇰🇿 | Khmer 🇰🇭 | Korean 🇰🇷 | Lao 🇱🇦 | Latin 🇻🇦 |
| Latvian 🇱🇻 | Lingala 🇨🇩 | Lithuanian 🇱🇹 | Luxembourgish 🇱🇺 | Macedonian 🇲🇰 |
| Malagasy 🇲🇬 | Malay 🇲🇾 | Malayalam 🇮🇳 | Maltese 🇲🇹 | Maori 🇳🇿 |
| Marathi 🇮🇳 | Moldavian 🇲🇩 | Mongolian 🇲🇳 | Myanmar 🇲🇲 | Nepali 🇳🇵 |
| Norwegian 🇳🇴 | Occitan 🇫🇷 | Panjabi 🇮🇳 | Pashto 🇦🇫 | Persian 🇮🇷 |
| Polish 🇵🇱 | Portuguese 🇵🇹 | Punjabi 🇮🇳 | Romanian 🇷🇴 | Russian 🇷🇺 |
| Sanskrit 🇮🇳 | Serbian 🇷🇸 | Shona 🇿🇼 | Sindhi 🇵🇰 | Sinhala 🇱🇰 |
| Slovak 🇸🇰 | Slovenian 🇸🇮 | Somali 🇸🇴 | Spanish 🇪🇸 | Sundanese 🇮🇩 |
| Swahili 🇰🇪 | Swedish 🇸🇪 | Tagalog 🇵🇭 | Tajik 🇹🇯 | Tamil 🇮🇳 |
| Tatar 🇷🇺 | Telugu 🇮🇳 | Thai 🇹🇭 | Tibetan 🇨🇳 | Turkish 🇹🇷 |
| Turkmen 🇹🇲 | Ukrainian 🇺🇦 | Urdu 🇵🇰 | Uzbek 🇺🇿 | Vietnamese 🇻e |
| Welsh 🏴󠁧󠁢󠁷󠁬󠁳󠁿 | Yiddish 🇮🇱 | Yoruba 🇳🇬 | | |
<div align="center">
### ⚖️ PUBLIC DOMAIN (CC0 1.0)
*No Rights Reserved. No Gods. No Managers.*
Credit to **OpenAI** (Whisper), **Systran** (Faster-Whisper), and **Silero** (VAD).
</div>