Docs: Final polish - 6-col language table and refined manifesto

This commit is contained in:
Your Name
2026-01-24 19:12:08 +02:00
parent 0b2b5848e2
commit 0f1bf5f1af

152
README.md
View File

@@ -5,147 +5,143 @@
<br>
![Status](https://img.shields.io/badge/STATUS-OPERATIONAL-success?style=for-the-badge&logo=server)
[![Download](https://img.shields.io/gitea/v/release/lashman/whisper_voice?gitea_url=https%3A%2F%2Fgit.lashman.live&label=Download&style=for-the-badge&logo=windows&logoColor=white&color=2563eb)](https://git.lashman.live/lashman/whisper_voice/releases/latest)
[![License](https://img.shields.io/badge/LICENSE-CC0_PUBLIC_DOMAIN-lightgrey?style=for-the-badge&logo=creative-commons&logoColor=black)](https://creativecommons.org/publicdomain/zero/1.0/)
![Status](https://img.shields.io/badge/STATUS-OPERATIONAL-success?style=for-the-badge&logo=server&color=2ecc71)
[![Download](https://img.shields.io/gitea/v/release/lashman/whisper_voice?gitea_url=https%3A%2F%2Fgit.lashman.live&label=Install&style=for-the-badge&logo=windows&logoColor=white&color=3b82f6)](https://git.lashman.live/lashman/whisper_voice/releases/latest)
[![License](https://img.shields.io/badge/LICENSE-PUBLIC_DOMAIN-lightgrey?style=for-the-badge&logo=creative-commons&logoColor=black)](https://creativecommons.org/publicdomain/zero/1.0/)
<br>
> *"The master's tools will never dismantle the master's house."* — Audre Lorde
> *"The master's tools will never dismantle the master's house."*
> <br>
> **Build your own tools. Run them locally.**
> **Build your own tools. Run them locally. Free your mind.**
[Report Issue](https://git.lashman.live/lashman/whisper_voice/issues) • [View Source](https://git.lashman.live/lashman/whisper_voice) • [Releases](https://git.lashman.live/lashman/whisper_voice/releases)
[View Source](https://git.lashman.live/lashman/whisper_voice) • [Report Issue](https://git.lashman.live/lashman/whisper_voice/issues)
</div>
<br>
<br>
## The Manifesto
## 📡 The Transmission
**We hold these truths to be self-evident:** That user data is an extension of the self, and its exploitation by centralized clouds is a violation of digital autonomy.
We live in an era of enclosure. Our words, our thoughts, and our digital footprints are strip-mined by centralized giants, turned into capital, and sold back to us as "convenience."
**Whisper Voice** is built on the principle of **technological sovereignty**. It provides state-of-the-art speech recognition without renting your cognitive output to corporate oligarchies. By running entirely on your own hardware, it reclaims the means of digital production, ensuring that your words remain exclusively yours.
**Whisper Voice** is a rejection of that contract.
It is built on the axiom that **your voice belongs to you**. By bringing state-of-the-art inference down from the server farms and running it on your own metal, we reclaim a small piece of the digital commons. This software answers to no one but you. It has no telemetry, no subscription, and no masters.
---
## ⚡ Technical Architecture
## ⚡ The Engine
This operates on the metal. It is not a wrapper. It is an engine.
This operates on the silicon. It is not a wrapper. It is a machine.
| Component | Technology | Benefit |
| :--- | :--- | :--- |
| **Inference Core** | **Faster-Whisper** | Hyper-optimized implementation of OpenAI's Whisper using **CTranslate2**. Delivers **4x speedups** over PyTorch. |
| **Quantization** | **INT8** | 8-bit quantization enables Pro-grade models (`Large-v3`) to run on consumer GPUs with minimal VRAM. |
| **Sensory Gate** | **Silero VAD** | Enterprise-grade Voice Activity Detection filters out silence and background noise, conserving compute. |
| **Interface** | **Qt 6 / QML** | Hardware-accelerated, glassmorphic UI that feels native yet remains OS-independent. |
| **Inference Core** | **Faster-Whisper** | Hyper-optimized implementation via **CTranslate2**. Delivers **4x velocity** over standard PyTorch execution. |
| **Compression** | **INT8 quantization** | Enables Pro-grade models (`Large-v3`) to run on consumer hardware, democratizing access to high-fidelity AI. |
| **Sensory Gate** | **Silero VAD** | Enterprise-grade Voice Activity Detection filters out the noise, ensuring only intent is captured. |
| **Interface** | **Qt 6 / QML** | Hardware-accelerated, glassmorphic UI that is fluid, responsive, and sovereign. |
---
<br>
## 🌎 Native Translation Engine
## 🌎 Universal Translator
Whisper Voice v1.0.1 introduces a powerful **Universal Translator** built directly into the core. This is not a web-request to Google Translate. This is a neural network running on your GPU that understands the semantic meaning of speech and reconstructs it in fluent English.
Whisper Voice v1.0.1 introduces a **Neural Translation Engine** built directly into the core. It bypasses the need for corporate translation APIs entirely.
* **Any Language Source**: Speak in French, Japanese, Russian, or 96 other languages.
* **English Output**: The engine instantly transcribes the audio into English text.
* **Zero Latency**: Translation happens in real-time as you speak (sentence-by-sentence).
* **Input**: Speak in French, Japanese, Russian, or **96 other languages**.
* **Output**: The engine instantly reconstructs the semantic meaning in fluent **English**.
* **Local Execution**: No API keys. No data leaks. The translation happens on your GPU.
### Dual-Channel Operation
You do not need to switch modes manually. The application listens on two separate channels simultaneously.
### Dual-Channel Workflow
The application listens on two separate channels simultaneously, allowing for seamless fluid switching between local and international communication.
* **F9 (Default)** -> **Transcribe**: Types exactly what you say, in the language you speak.
* **F10 (Default)** -> **Translate**: Translates whatever you say in *any* language into English.
This allows for seamless bilingual workflows. Dictate a message to a local friend on `F9`, then instantly reply to an international colleague on `F10` without touching a single setting.
---
## 🕹️ Controls & Configuration
## 🕹️ Command & Control
### Global Hotkeys
The system runs silently in the background. Control it via global shortcuts:
The agent runs silently in the background, waiting for your signal.
* **Transcribe (Default: F9)**: Use this for normal speech-to-text. It respects the language set in Settings (or Auto-Detect).
* **Translate (Default: F10)**: Use this to force translation to English.
* **Customization**: Both keys can be remapped in the Settings menu. The recorder supports complex combinations (e.g., `Ctrl + Alt + Space`).
* **Transcribe (F9)**: Opens the channel for standard speech-to-text.
* **Translate (F10)**: Opens the channel for neural translation.
* **Customization**: Remap these keys in Settings. The recorder supports complex chords (e.g. `Ctrl + Alt + Space`) to fit your workflow.
### Input Modes
* **Clipboard Paste**: Injects text via OS clipboard. Instant, but some games disable paste.
* **Simulate Typing**: Mimics physical keystrokes. Bypasses anti-cheat and anti-paste blocks. Configurable speed (default 6000 CPM) to prevent game kicks.
### Injection Protocols
* **Clipboard Paste**: Standard text injection. Instant, reliable.
* **Simulate Typing**: Mimics physical keystrokes at superhuman speed (6000 CPM). Bypasses anti-paste restrictions and "protected" windows.
---
<br>
## 📊 Intelligence Matrix (Models)
## 📊 Intelligence Matrix
Select the model that aligns with your hardware capabilities.
Select the model that aligns with your available resources.
| Model | VRAM (GPU) | RAM (CPU) | Velocity | Designation |
| Model | VRAM (GPU) | RAM (CPU) | Designation | Capability |
| :--- | :--- | :--- | :--- | :--- |
| `Tiny` | **~500 MB** | ~1 GB | ⚡ **Supersonic** | Command & Control, older hardware. |
| `Base` | **~600 MB** | ~1 GB | 🚀 **Very Fast** | Daily driver for low-power laptops. |
| `Small` | **~1 GB** | ~2 GB | ⏩ **Fast** | High accuracy English dictation. |
| `Medium` | **~2 GB** | ~4 GB | ⚖️ **Balanced** | Complex vocabulary, foreign accents. |
| `Large-v3 Turbo` | **~4 GB** | ~6 GB | ✨ **Optimal** | **Sweet Spot.** Near-Large smarts, Medium speed. |
| `Large-v3` | **~5 GB** | ~8 GB | 🧠 **Maximum** | Professional transcription. Uncompromised. |
| `Large-v3 Turbo` | **~4 GB** | ~6 GB | ✨ **Optimal** | **The Sweet Spot.** Near-Large intelligence, Medium speed. |
| `Large-v3` | **~5 GB** | ~8 GB | 🧠 **Maximum** | Professional grade. Uncompromised. |
> *Note: Acceleration requires you to manually select your Compute Device (CUDA GPU or CPU) in Settings.*
---
## 🛠️ Operations
## 🛠️ Deployment
### 📥 Deployment
1. **Download**: Grab `WhisperVoice.exe` from [Releases](https://git.lashman.live/lashman/whisper_voice/releases).
### 📥 Installation
1. **Acquire**: Download `WhisperVoice.exe` from [Releases](https://git.lashman.live/lashman/whisper_voice/releases).
2. **Deploy**: Place it anywhere. It is portable.
3. **Bootstrap**: Run it. The agent will self-provision an isolated Python environment (~2GB) on first launch.
4. **Updates**: Simply replace the `.exe`. The **Smart Bootstrapper** will detect the update and sync only the changed files, preserving your settings and skipping unnecessary downloads.
3. **Bootstrap**: Run it. The agent will self-provision an isolated Python runtime (~2GB) on first launch.
4. **Sync**: Future updates are handled by the **Smart Bootstrapper**, which surgically updates only changed files, respecting your bandwidth and your settings.
### 🔧 Troubleshooting
* **App crashes on start**: Ensure you have [Microsoft Visual C++ Redistributable 2015-2022](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist) installed.
* **"Simulate Typing" is slow**: Some applications (remote desktops, older games) choke on super-fast input. Lower the typing speed in Settings to ~1200 CPM.
* **No Audio**: The agent listens to the **Default Communication Device**. Check your Windows Sound Control Panel.
* **"Simulate Typing" is slow**: Some applications (remote desktops, legacy games) cannot handle the data stream. Lower the typing speed in Settings to ~1200 CPM.
* **No Audio**: The agent listens to the **Default Communication Device**. Verify your Windows Sound Control Panel.
<br>
---
## 🌐 Supported Languages
## <EFBFBD> Supported Languages
The engine supports 99 languages. You can lock the engine to a specific language in Settings to improve accuracy, or leave it on **Auto-Detect** for multilingual usage.
The engine understands the following 99 languages. You can lock the focus to a specific language in Settings to improve accuracy, or rely on **Auto-Detect** for fluid multilingual usage.
([See full language list below](#full-language-list))
| | | | | | |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Afrikaans 🇿🇦 | Albanian 🇦🇱 | Amharic 🇪🇹 | Arabic 🇸🇦 | Armenian 🇦🇲 | Assamese 🇮🇳 |
| Azerbaijani 🇦🇿 | Bashkir 🇷🇺 | Basque 🇪🇸 | Belarusian 🇧🇾 | Bengali 🇧🇩 | Bosnian 🇧🇦 |
| Breton 🇫🇷 | Bulgarian 🇧🇬 | Burmese 🇲🇲 | Castilian 🇪🇸 | Catalan 🇪🇸 | Chinese 🇨🇳 |
| Croatian 🇭🇷 | Czech 🇨🇿 | Danish 🇩🇰 | Dutch 🇳🇱 | English 🇺🇸 | Estonian 🇪🇪 |
| Faroese 🇫🇴 | Finnish 🇫🇮 | Flemish 🇧🇪 | French 🇫🇷 | Galician 🇪🇸 | Georgian 🇬🇪 |
| German 🇩🇪 | Greek 🇬🇷 | Gujarati 🇮🇳 | Haitian 🇭🇹 | Hausa 🇳🇬 | Hawaiian 🇺🇸 |
| Hebrew 🇮🇱 | Hindi 🇮🇳 | Hungarian 🇭🇺 | Icelandic 🇮🇸 | Indonesian 🇮🇩 | Italian 🇮🇹 |
| Japanese 🇯🇵 | Javanese 🇮🇩 | Kannada 🇮🇳 | Kazakh 🇰🇿 | Khmer 🇰🇭 | Korean 🇰🇷 |
| Lao 🇱🇦 | Latin 🇻🇦 | Latvian 🇱🇻 | Lingala 🇨🇩 | Lithuanian 🇱🇹 | Luxembourgish 🇱🇺 |
| Macedonian 🇲🇰 | Malagasy 🇲🇬 | Malay 🇲🇾 | Malayalam 🇮🇳 | Maltese 🇲🇹 | Maori 🇳🇿 |
| Marathi 🇮🇳 | Moldavian 🇲🇩 | Mongolian 🇲🇳 | Myanmar 🇲🇲 | Nepali 🇳🇵 | Norwegian 🇳🇴 |
| Occitan 🇫🇷 | Panjabi 🇮🇳 | Pashto 🇦🇫 | Persian 🇮🇷 | Polish 🇵🇱 | Portuguese 🇵🇹 |
| Punjabi 🇮🇳 | Romanian 🇷🇴 | Russian 🇷🇺 | Sanskrit 🇮🇳 | Serbian 🇷🇸 | Shona 🇿🇼 |
| Sindhi 🇵🇰 | Sinhala 🇱🇰 | Slovak 🇸🇰 | Slovenian 🇸🇮 | Somali 🇸🇴 | Spanish 🇪🇸 |
| Sundanese 🇮🇩 | Swahili 🇰🇪 | Swedish 🇸🇪 | Tagalog 🇵🇭 | Tajik 🇹🇯 | Tamil 🇮🇳 |
| Tatar 🇷🇺 | Telugu 🇮🇳 | Thai 🇹🇭 | Tibetan 🇨🇳 | Turkish 🇹🇷 | Turkmen 🇹🇲 |
| Ukrainian 🇺🇦 | Urdu 🇵🇰 | Uzbek 🇺🇿 | Vietnamese 🇻e | Welsh 🏴󠁧󠁢󠁷󠁬󠁳󠁿 | Yiddish 🇮🇱 |
| Yoruba 🇳🇬 | | | | | |
---
## 🌐 Full Language List
| | | | | |
| :--- | :--- | :--- | :--- | :--- |
| Afrikaans 🇿🇦 | Albanian 🇦🇱 | Amharic 🇪🇹 | Arabic 🇸🇦 | Armenian 🇦🇲 |
| Assamese 🇮🇳 | Azerbaijani 🇦🇿 | Bashkir 🇷🇺 | Basque 🇪🇸 | Belarusian 🇧🇾 |
| Bengali 🇧🇩 | Bosnian 🇧🇦 | Breton 🇫🇷 | Bulgarian 🇧🇬 | Burmese 🇲🇲 |
| Castilian 🇪🇸 | Catalan 🇪🇸 | Chinese 🇨🇳 | Croatian 🇭🇷 | Czech 🇨🇿 |
| Danish 🇩🇰 | Dutch 🇳🇱 | English 🇺🇸 | Estonian 🇪🇪 | Faroese 🇫🇴 |
| Finnish 🇫🇮 | Flemish 🇧🇪 | French 🇫🇷 | Galician 🇪🇸 | Georgian 🇬🇪 |
| German 🇩🇪 | Greek 🇬🇷 | Gujarati 🇮🇳 | Haitian 🇭🇹 | Hausa 🇳🇬 |
| Hawaiian 🇺🇸 | Hebrew 🇮🇱 | Hindi 🇮🇳 | Hungarian 🇭🇺 | Icelandic 🇮🇸 |
| Indonesian 🇮🇩 | Italian 🇮🇹 | Japanese 🇯🇵 | Javanese 🇮🇩 | Kannada 🇮🇳 |
| Kazakh 🇰🇿 | Khmer 🇰🇭 | Korean 🇰🇷 | Lao 🇱🇦 | Latin 🇻🇦 |
| Latvian 🇱🇻 | Lingala 🇨🇩 | Lithuanian 🇱🇹 | Luxembourgish 🇱🇺 | Macedonian 🇲🇰 |
| Malagasy 🇲🇬 | Malay 🇲🇾 | Malayalam 🇮🇳 | Maltese 🇲🇹 | Maori 🇳🇿 |
| Marathi 🇮🇳 | Moldavian 🇲🇩 | Mongolian 🇲🇳 | Myanmar 🇲🇲 | Nepali 🇳🇵 |
| Norwegian 🇳🇴 | Occitan 🇫🇷 | Panjabi 🇮🇳 | Pashto 🇦🇫 | Persian 🇮🇷 |
| Polish 🇵🇱 | Portuguese 🇵🇹 | Punjabi 🇮🇳 | Romanian 🇷🇴 | Russian 🇷🇺 |
| Sanskrit 🇮🇳 | Serbian 🇷🇸 | Shona 🇿🇼 | Sindhi 🇵🇰 | Sinhala 🇱🇰 |
| Slovak 🇸🇰 | Slovenian 🇸🇮 | Somali 🇸🇴 | Spanish 🇪🇸 | Sundanese 🇮🇩 |
| Swahili 🇰🇪 | Swedish 🇸🇪 | Tagalog 🇵🇭 | Tajik 🇹🇯 | Tamil 🇮🇳 |
| Tatar 🇷🇺 | Telugu 🇮🇳 | Thai 🇹🇭 | Tibetan 🇨🇳 | Turkish 🇹🇷 |
| Turkmen 🇹🇲 | Ukrainian 🇺🇦 | Urdu 🇵🇰 | Uzbek 🇺🇿 | Vietnamese 🇻e |
| Welsh 🏴󠁧󠁢󠁷󠁬󠁳󠁿 | Yiddish 🇮🇱 | Yoruba 🇳🇬 | | |
<br>
<br>
<div align="center">
### ⚖️ PUBLIC DOMAIN (CC0 1.0)
*No Rights Reserved. No Gods. No Managers.*
*No Rights Reserved. No Gods. No Masters. No Managers.*
Credit to **OpenAI** (Whisper), **Systran** (Faster-Whisper), and **Silero** (VAD).