From 306bd075ed7173544286046f0bd5166786c771f8 Mon Sep 17 00:00:00 2001
From: Your Name
Date: Sat, 24 Jan 2026 17:29:59 +0200
Subject: [PATCH] Aesthetic overhaul of documentation

---
 README.md | 190 ++++++++++++++++++++++++++----------------------------
 1 file changed, 90 insertions(+), 100 deletions(-)

diff --git a/README.md b/README.md
index dd2a2b3..e86e374 100644
--- a/README.md
+++ b/README.md
@@ -1,129 +1,120 @@
-# WHISPER VOICE
+# 🎙️ W H I S P E R   V O I C E
 ### SOVEREIGN SPEECH RECOGNITION
-![Banner](https://img.shields.io/badge/STATUS-OPERATIONAL-success?style=for-the-badge&logo=server)
-
-**Your Voice. Your Machine. Your Data.**
-
-*A high-performance, locally-run dictation agent for the liberated desktop.*
-
+![Status](https://img.shields.io/badge/STATUS-OPERATIONAL-success?style=for-the-badge&logo=server) [![Download](https://img.shields.io/gitea/v/release/lashman/whisper_voice?gitea_url=https%3A%2F%2Fgit.lashman.live&label=Download&style=for-the-badge&logo=windows&logoColor=white&color=2563eb)](https://git.lashman.live/lashman/whisper_voice/releases/latest) [![License](https://img.shields.io/badge/LICENSE-CC0_PUBLIC_DOMAIN-lightgrey?style=for-the-badge&logo=creative-commons&logoColor=black)](https://creativecommons.org/publicdomain/zero/1.0/)
-

- Microphone -

+> *"The master's tools will never dismantle the master's house."* — Audre Lorde
+>
+> **Build your own tools. Run them locally.**
+
+[Report Issue](https://git.lashman.live/lashman/whisper_voice/issues) • [View Source](https://git.lashman.live/lashman/whisper_voice) • [Releases](https://git.lashman.live/lashman/whisper_voice/releases)
---- +
 ## ✊ The Manifesto
 **We hold these truths to be self-evident:** That user data is an extension of the self, and its exploitation by centralized clouds is a violation of digital autonomy.
-Whisper Voice is built on the principle of **technological sovereignty**. It provides state-of-the-art speech recognition without renting your cognitive output to corporate oligarchies. By running entirely on your own hardware, it reclaims the means of digital production, ensuring that your words remain exclusively yours.
-
-> *"The master's tools will never dismantle the master's house."* — Audre Lorde
-> **Build your own tools. Run them locally.**
+**Whisper Voice** is built on the principle of **technological sovereignty**. It provides state-of-the-art speech recognition without renting your cognitive output to corporate oligarchies. By running entirely on your own hardware, it reclaims the means of digital production, ensuring that your words remain exclusively yours.
 ---
-## ⚡ Technical Core
+## ⚡ Technical Architecture
-Whisper Voice is not a wrapper for an API. It is a fully contained neural inference engine running on your metal.
+This operates on the metal. It is not a wrapper. It is an engine.
-### The Engine: Faster-Whisper
-We utilize the **CTranslate2** backend—a high-performance inference engine for Transformer models. This allows us to run OpenAI's Whisper architectures with:
-* **4x Speedup** over standard PyTorch implementations.
-* **4x Memory Reduction** via 8-bit quantization (`int8`), enabling Pro-grade models on consumer GPUs.
-
-### The Sense: Silero VAD
-To distinguish human speech from background noise, we employ **Silero VAD** (Voice Activity Detection). This ensures that the agent only listens when you speak, conserving compute resources and preventing hallucinated text from silence.
-
-### The Interface: Qt 6 (PySide6)
-The UI is built with **Qt Quick/QML**, rendering a hardware-accelerated, glassmorphic overlay that feels native to modern desktop environments while remaining completely decoupled from OS spyware.
+| Component | Technology | Benefit |
+| :--- | :--- | :--- |
+| **Inference Core** | **Faster-Whisper** | Hyper-optimized implementation of OpenAI's Whisper using **CTranslate2**. Delivers **4x speedups** over PyTorch. |
+| **Quantization** | **INT8** | 8-bit quantization enables Pro-grade models (`Large-v3`) to run on consumer GPUs with minimal VRAM. |
+| **Sensory Gate** | **Silero VAD** | Enterprise-grade Voice Activity Detection filters out silence and background noise, conserving compute. |
+| **Interface** | **Qt 6 / QML** | Hardware-accelerated, glassmorphic UI that feels native yet remains OS-independent. |
 ---
-## 📊 Model Intelligence
+## 📊 Intelligence Matrix
-Select the intelligence level that matches your hardware reality.
+Select the model that aligns with your hardware capabilities.
-| Model | GPU VRAM | CPU RAM | Speed | Best For |
+| Model | VRAM (GPU) | RAM (CPU) | Velocity | Designation |
 | :--- | :--- | :--- | :--- | :--- |
-| **Tiny** | ~500 MB | ~1 GB | ⚡ Supersonic | Quick commands, older machinery. |
-| **Base** | ~600 MB | ~1 GB | 🚀 Very Fast | Daily driving on low-power laptops. |
-| **Small** | ~1 GB | ~2 GB | ⏩ Fast | High accuracy for English dictation. |
-| **Medium** | ~2 GB | ~4 GB | ⚖️ Balanced | Complex vocabulary and accents. |
-| **Large-v3 Turbo** | ~4 GB | ~6 GB | ✨ **Optimal** | The sweet spot. Large-level smarts, Medium-level speed. |
-| **Large-v3** | ~5 GB | ~8 GB | 🧠 Maximum | Professional transcription. Uncompromised quality. |
+| `Tiny` | **~500 MB** | ~1 GB | ⚡ **Supersonic** | Command & Control, older hardware. |
+| `Base` | **~600 MB** | ~1 GB | 🚀 **Very Fast** | Daily driver for low-power laptops. |
+| `Small` | **~1 GB** | ~2 GB | ⏩ **Fast** | High accuracy English dictation. |
+| `Medium` | **~2 GB** | ~4 GB | ⚖️ **Balanced** | Complex vocabulary, foreign accents. |
+| `Large-v3 Turbo` | **~4 GB** | ~6 GB | ✨ **Optimal** | **Sweet Spot.** Near-Large smarts, Medium speed. |
+| `Large-v3` | **~5 GB** | ~8 GB | 🧠 **Maximum** | Professional transcription. Uncompromised. |
-*Note: You must select your available Compute Device (CUDA GPU or CPU) in the Settings to enable acceleration.*
+> *Note: Acceleration requires you to manually select your Compute Device (CUDA GPU or CPU) in Settings.*
 ---
-## 🛠️ Operational Guide
+## 🛠️ Operations
-### Deployment
-1. **Download**: Grab the latest `WhisperVoice.exe` from [Releases](https://git.lashman.live/lashman/whisper_voice/releases).
-2. **Install**: There is no installation. Place the executable in a directory you control (e.g., `C:\Tools\WhisperVoice`).
-3. **Bootstrap**: Run it. The agent will self-provision its own isolated Python environment (~2GB). This ensures your system PATH remains clean and unpolluted.
+### 📥 Deployment
+1. **Download**: Grab `WhisperVoice.exe` from [Releases](https://git.lashman.live/lashman/whisper_voice/releases).
+2. **Deploy**: Place it anywhere. It is portable.
+3. **Bootstrap**: Run it. The agent will self-provision an isolated Python environment (~2GB) on first launch.
-### Usage
-* **Hotkeys**: The default trigger is `F9`. You can rebind this in Settings to any combination (e.g., `Ctrl+Space`, `Alt+V`).
-* **Injection Modes**:
-    * *Clipboard Paste*: Standard, reliable text insertion.
-    * *Simulate Typing*: A stealth mode that physically mimics keystrokes (up to 6000 CPM) to bypass applications that block pasting (e.g., games, remote terminals).
-* **Tray Agent**: The app lives in your system tray. Right-click the icon to access Settings or terminate the process.
+### 🕹️ Controls
+* **Global Hook**: `F9` (Default). Press to open the channel. Release to inject text.
+* **Tray Agent**: Retracts to the system tray. Right-click for **Settings** or **File Transcription**.
-### Advanced Features
-* **File Transcription**: Need to transcribe a pre-recorded audio file? Right-click the **System Tray Icon** and select **Transcribe File**. Supports `.wav`, `.mp3`, `.m4a`, and most common formats.
+### 📡 Input Modes
+| Mode | Description | Speed |
+| :--- | :--- | :--- |
+| **Clipboard Paste** | Standard text injection via OS clipboard. | Instant |
+| **Simulate Typing** | Mimics physical keystrokes. Bypasses anti-paste blocks. | Up to **6000** CPM |
-### 🌍 Supported Languages
-The model is trained on 680,000 hours of multilingual data and supports the following languages with high accuracy:
+---
+
+## 🌍 Universal Translation
+
+The model listens in **99 languages** and translates them to English or transcribes them natively.
-Click to expand full list (99 Languages)
+Click to view supported languages
+
 | | | | |
 | :--- | :--- | :--- | :--- |
-| Afrikaans | Albanian | Amharic | Arabic |
-| Armenian | Assamese | Azerbaijani | Bashkir |
-| Basque | Belarusian | Bengali | Bosnian |
-| Breton | Bulgarian | Burmese | Castilian |
-| Catalan | Chinese | Croatian | Czech |
-| Danish | Dutch | English | Estonian |
-| Faroese | Finnish | Flemish | French |
-| Galician | Georgian | German | Greek |
-| Gujarati | Haitian | Haitian Creole | Hausa |
-| Hawaiian | Hebrew | Hindi | Hungarian |
-| Icelandic | Indonesian | Italian | Japanese |
-| Javanese | Kannada | Kazakh | Khmer |
-| Korean | Lao | Latin | Latvian |
-| Letzeburgesch | Lingala | Lithuanian | Luxembourgish |
-| Macedonian | Malagasy | Malay | Malayalam |
-| Maltese | Maori | Marathi | Moldavian |
-| Mongolian | Myanmar | Nepali | Norwegian |
-| Nynorsk | Occitan | Panjabi | Pashto |
-| Persian | Polish | Portuguese | Punjabi |
-| Pushto | Romanian | Russian | Sanskrit |
-| Serbian | Shona | Sindhi | Sinhala |
-| Sinhalese | Slovak | Slovenian | Somali |
-| Spanish | Sundanese | Swahili | Swedish |
-| Tagalog | Tajik | Tamil | Tatar |
-| Telugu | Thai | Tibetan | Turkish |
-| Turkmen | Ukrainian | Urdu | Uzbek |
-| Valencian | Vietnamese | Welsh | Yiddish |
-| Yoruba | | | |
+| Afrikaans 🇿🇦 | Albanian 🇦🇱 | Amharic 🇪🇹 | Arabic 🇸🇦 |
+| Armenian 🇦🇲 | Assamese 🇮🇳 | Azerbaijani 🇦🇿 | Bashkir 🇷🇺 |
+| Basque 🇪🇸 | Belarusian 🇧🇾 | Bengali 🇧🇩 | Bosnian 🇧🇦 |
+| Breton 🇫🇷 | Bulgarian 🇧🇬 | Burmese 🇲🇲 | Castilian 🇪🇸 |
+| Catalan 🇪🇸 | Chinese 🇨🇳 | Croatian 🇭🇷 | Czech 🇨🇿 |
+| Danish 🇩🇰 | Dutch 🇳🇱 | English 🇺🇸 | Estonian 🇪🇪 |
+| Faroese 🇫🇴 | Finnish 🇫🇮 | Flemish 🇧🇪 | French 🇫🇷 |
+| Galician 🇪🇸 | Georgian 🇬🇪 | German 🇩🇪 | Greek 🇬🇷 |
+| Gujarati 🇮🇳 | Haitian 🇭🇹 | Hausa 🇳🇬 | Hawaiian 🇺🇸 |
+| Hebrew 🇮🇱 | Hindi 🇮🇳 | Hungarian 🇭🇺 | Icelandic 🇮🇸 |
+| Indonesian 🇮🇩 | Italian 🇮🇹 | Japanese 🇯🇵 | Javanese 🇮🇩 |
+| Kannada 🇮🇳 | Kazakh 🇰🇿 | Khmer 🇰🇭 | Korean 🇰🇷 |
+| Lao 🇱🇦 | Latin 🇻🇦 | Latvian 🇱🇻 | Lingala 🇨🇩 |
+| Lithuanian 🇱🇹 | Luxembourgish 🇱🇺 | Macedonian 🇲🇰 | Malagasy 🇲🇬 |
+| Malay 🇲🇾 | Malayalam 🇮🇳 | Maltese 🇲🇹 | Maori 🇳🇿 |
+| Marathi 🇮🇳 | Moldavian 🇲🇩 | Mongolian 🇲🇳 | Myanmar 🇲🇲 |
+| Nepali 🇳🇵 | Norwegian 🇳🇴 | Occitan 🇫🇷 | Panjabi 🇮🇳 |
+| Pashto 🇦🇫 | Persian 🇮🇷 | Polish 🇵🇱 | Portuguese 🇵🇹 |
+| Punjabi 🇮🇳 | Romanian 🇷🇴 | Russian 🇷🇺 | Sanskrit 🇮🇳 |
+| Serbian 🇷🇸 | Shona 🇿🇼 | Sindhi 🇵🇰 | Sinhala 🇱🇰 |
+| Slovak 🇸🇰 | Slovenian 🇸🇮 | Somali 🇸🇴 | Spanish 🇪🇸 |
+| Sundanese 🇮🇩 | Swahili 🇰🇪 | Swedish 🇸🇪 | Tagalog 🇵🇭 |
+| Tajik 🇹🇯 | Tamil 🇮🇳 | Tatar 🇷🇺 | Telugu 🇮🇳 |
+| Thai 🇹🇭 | Tibetan 🇨🇳 | Turkish 🇹🇷 | Turkmen 🇹🇲 |
+| Ukrainian 🇺🇦 | Urdu 🇵🇰 | Uzbek 🇺🇿 | Vietnamese 🇻🇳 |
+| Welsh 🏴󠁧󠁢󠁷󠁬󠁳󠁿 | Yiddish 🇮🇱 | Yoruba 🇳🇬 | |
-*Note: The model will automatically detect the language being spoken.*
 ---
@@ -131,35 +122,34 @@ The model is trained on 680,000 hours of multilingual data and supports the foll
 ## 🔧 Troubleshooting
-The app crashes immediately on start
-Ensure you have the Microsoft Visual C++ Redistributable (2015-2022) installed, as the underlying CTranslate2 engine requires these standard libraries.
+🔥 App crashes on start
+
+The underlying engine requires standard C++ libraries. Install the Microsoft Visual C++ Redistributable (2015-2022).
+
-"Simulate Typing" is slow or misses characters
-Adjust the Typing Speed slider in Settings. Some older applications cannot handle supersonic 6000 CPM input; try lowering it to 1200 CPM.
+🐌 "Simulate Typing" is slow
+
+Some apps (games, RDP) can't handle supersonic input. Go to Settings and lower the Typing Speed to ~1200 CPM.
+
-Microphone not picking up audio
-The agent uses your System Default Input Device. Ensure your microphone is set as Default in Windows Sound Settings.
+🎤 No Audio / Silence
+
+The agent listens to the Default Communication Device. Ensure your microphone is set correctly in Windows Sound Settings.
+
--- -## โš–๏ธ License & Rights +
-**Public Domain (CC0 1.0)** +### โš–๏ธ PUBLIC DOMAIN (CC0 1.0) -To the extent possible under law, the creators of this interface have waived all copyright and related or neighboring rights to this work. This tool belongs to the commons. It is a gift to the digital proletariat. +*No Rights Reserved. No Gods. No Managers.* -* **Fork it.** -* **Mod it.** -* **Distribute it.** +Credit to **OpenAI** (Whisper), **Systran** (Faster-Whisper), and **Silero** (VAD). -### Credits -* **OpenAI**: For the Whisper weights (MIT). -* **Systran**: For Faster-Whisper (MIT). -* **Qt Company**: For the UI framework (LGPL). - -*No gods, no cloud managers.* +