<div align="center">

# 🎙️ W H I S P E R V O I C E
### SOVEREIGN SPEECH RECOGNITION

<br>

[Latest Release](https://git.lashman.live/lashman/whisper_voice/releases/latest)
[License: CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/)

<br>

> *"The master's tools will never dismantle the master's house."* — Audre Lorde
> <br>
> **Build your own tools. Run them locally.**

[Report Issue](https://git.lashman.live/lashman/whisper_voice/issues) • [View Source](https://git.lashman.live/lashman/whisper_voice) • [Releases](https://git.lashman.live/lashman/whisper_voice/releases)

</div>

<br>

## ✊ The Manifesto

**We hold these truths to be self-evident:** That user data is an extension of the self, and its exploitation by centralized clouds is a violation of digital autonomy.

**Whisper Voice** is built on the principle of **technological sovereignty**. It provides state-of-the-art speech recognition without renting your cognitive output to corporate oligarchies. By running entirely on your own hardware, it reclaims the means of digital production, ensuring that your words remain exclusively yours.

---

## ⚡ Technical Architecture

This operates on the metal. It is not a wrapper. It is an engine.

| Component | Technology | Benefit |
| :--- | :--- | :--- |
| **Inference Core** | **Faster-Whisper** | Hyper-optimized reimplementation of OpenAI's Whisper on **CTranslate2**. Delivers up to **4x speedups** over the reference PyTorch implementation at the same accuracy. |
| **Quantization** | **INT8** | 8-bit quantization enables Pro-grade models (`Large-v3`) to run on consumer GPUs with roughly 5 GB of VRAM. |
| **Sensory Gate** | **Silero VAD** | Enterprise-grade Voice Activity Detection filters out silence and background noise before inference, conserving compute. |
| **Interface** | **Qt 6 / QML** | Hardware-accelerated, glassmorphic UI that feels native yet remains OS-independent. |
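
The first three rows of the table can be approximated in a few lines of Python. This is a minimal sketch, not the application's actual code: it assumes the `faster-whisper` package is installed, and the names `pick_compute_type` and `transcribe_file` are hypothetical helpers introduced only for illustration.

```python
def pick_compute_type(device: str) -> str:
    """Choose an 8-bit CTranslate2 compute type for the given device."""
    # On CUDA, "int8_float16" keeps INT8 weights with FP16 computation;
    # plain "int8" is the CPU variant.
    return "int8_float16" if device == "cuda" else "int8"


def transcribe_file(path: str, model_size: str = "large-v3", device: str = "cpu") -> str:
    """Transcribe one audio file with INT8 weights and Silero VAD filtering."""
    # Imported lazily so pick_compute_type works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(
        model_size,
        device=device,
        compute_type=pick_compute_type(device),
    )
    # vad_filter=True runs Silero VAD first and skips non-speech audio.
    segments, _info = model.transcribe(path, vad_filter=True)
    # segments is a lazy generator; joining it drives the actual inference.
    return " ".join(segment.text.strip() for segment in segments)
```
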

---

## 📊 Intelligence Matrix

Select the model that aligns with your hardware capabilities.

| Model | VRAM (GPU) | RAM (CPU) | Velocity | Designation |
| :--- | :--- | :--- | :--- | :--- |
| `Tiny` | **~500 MB** | ~1 GB | ⚡ **Supersonic** | Command & Control, older hardware. |
| `Base` | **~600 MB** | ~1 GB | 🚀 **Very Fast** | Daily driver for low-power laptops. |
| `Small` | **~1 GB** | ~2 GB | ⏩ **Fast** | High accuracy English dictation. |
| `Medium` | **~2 GB** | ~4 GB | ⚖️ **Balanced** | Complex vocabulary, foreign accents. |
| `Large-v3 Turbo` | **~4 GB** | ~6 GB | ✨ **Optimal** | **Sweet Spot.** Near-Large smarts, Medium speed. |
| `Large-v3` | **~5 GB** | ~8 GB | 🧠 **Maximum** | Professional transcription. Uncompromised. |

> *Note: GPU acceleration is not enabled automatically. Select your Compute Device (CUDA GPU or CPU) manually in Settings.*
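
The selection rule in the table can be written down as a small lookup. The sketch below is illustrative only: the VRAM figures are the approximate values from the table, while `largest_model_that_fits` and its `headroom_mb` parameter are hypothetical names, not part of the application.

```python
from typing import Optional

# (model name, approximate VRAM in MB) taken from the table above,
# ordered from smallest to largest.
MODELS = [
    ("Tiny", 500),
    ("Base", 600),
    ("Small", 1000),
    ("Medium", 2000),
    ("Large-v3 Turbo", 4000),
    ("Large-v3", 5000),
]


def largest_model_that_fits(free_vram_mb: int, headroom_mb: int = 0) -> Optional[str]:
    """Return the largest model whose VRAM estimate fits the budget, or None."""
    budget = free_vram_mb - headroom_mb
    fitting = [name for name, needed in MODELS if needed <= budget]
    return fitting[-1] if fitting else None
```
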

---

## 🛠️ Operations

### 📥 Deployment
1. **Download**: Grab `WhisperVoice.exe` from [Releases](https://git.lashman.live/lashman/whisper_voice/releases).
2. **Deploy**: Place it anywhere. It is portable.
3. **Bootstrap**: Run it. The agent will self-provision an isolated Python environment (~2 GB) on first launch.
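
How the bootstrap step works internally is not documented here. As a rough idea of what "self-provisioning an isolated Python environment" can look like, the sketch below uses the standard-library `venv` module. The function name `ensure_environment` is hypothetical, and a real bootstrap would also install the inference dependencies, which this sketch does not attempt.

```python
import venv
from pathlib import Path


def ensure_environment(env_dir: Path, with_pip: bool = False) -> bool:
    """Create a virtual environment at env_dir unless one already exists.

    Returns True if a new environment was created, False if it was reused.
    """
    # Every venv carries a pyvenv.cfg marker at its root.
    if (env_dir / "pyvenv.cfg").is_file():
        return False
    # with_pip=True would additionally bootstrap pip into the environment.
    venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    return True
```
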

### 🕹️ Controls
* **Global Hook**: `F9` (Default). Press and hold to open the channel. Release to inject text.
* **Tray Agent**: Retracts to the system tray. Right-click for **Settings** or **File Transcription**.
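
The hold-to-talk behaviour of the global hook amounts to a two-state machine. The sketch below is an assumption about the general pattern, not the application's code: `PushToTalk` and its methods are hypothetical, and the real key events and audio chunks would come from whatever hotkey and audio libraries the application uses.

```python
class PushToTalk:
    """Collect audio chunks while the hotkey is held, then hand them off."""

    def __init__(self, on_utterance):
        self._on_utterance = on_utterance  # called with the recorded bytes
        self._chunks = []
        self._recording = False

    def key_down(self):
        # Held keys fire repeated key-down events; only the first one counts.
        if not self._recording:
            self._recording = True
            self._chunks = []

    def feed(self, chunk: bytes):
        # Audio arriving while the key is up is discarded.
        if self._recording:
            self._chunks.append(chunk)

    def key_up(self):
        # Releasing the key closes the channel and triggers transcription.
        if self._recording:
            self._recording = False
            self._on_utterance(b"".join(self._chunks))
```
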

### 📡 Input Modes
| Mode | Description | Speed |
| :--- | :--- | :--- |
| **Clipboard Paste** | Standard text injection via OS clipboard. | Instant |
| **Simulate Typing** | Mimics physical keystrokes. Bypasses anti-paste blocks. | Up to **6000** CPM (characters per minute) |
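
A typing speed in CPM translates directly into a pause between keystrokes: 60 seconds divided by the rate. At 6000 CPM that is 10 ms per character; at 1200 CPM it is 50 ms. The sketch below illustrates the arithmetic. `keystroke_delay` and `simulate_typing` are hypothetical names, and `send_key` stands in for whatever keystroke-injection call is actually used.

```python
import time


def keystroke_delay(cpm: int) -> float:
    """Seconds to wait between keystrokes at the given characters per minute."""
    if cpm <= 0:
        raise ValueError("cpm must be positive")
    return 60.0 / cpm


def simulate_typing(text: str, cpm: int, send_key, sleep=time.sleep) -> None:
    """Emit text one character at a time, paced to the requested CPM."""
    delay = keystroke_delay(cpm)
    for char in text:
        send_key(char)  # placeholder for the real keystroke injection
        sleep(delay)
```
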

---

## 🌐 Universal Translation

The model understands **99 languages**. It can transcribe speech in its original language or translate it into English.

<details>
<summary><b>Click to view supported languages</b></summary>
<br>

| | | | |
| :--- | :--- | :--- | :--- |
| Afrikaans 🇿🇦 | Albanian 🇦🇱 | Amharic 🇪🇹 | Arabic 🇸🇦 |
| Armenian 🇦🇲 | Assamese 🇮🇳 | Azerbaijani 🇦🇿 | Bashkir 🇷🇺 |
| Basque 🇪🇸 | Belarusian 🇧🇾 | Bengali 🇧🇩 | Bosnian 🇧🇦 |
| Breton 🇫🇷 | Bulgarian 🇧🇬 | Burmese 🇲🇲 | Castilian 🇪🇸 |
| Catalan 🇪🇸 | Chinese 🇨🇳 | Croatian 🇭🇷 | Czech 🇨🇿 |
| Danish 🇩🇰 | Dutch 🇳🇱 | English 🇺🇸 | Estonian 🇪🇪 |
| Faroese 🇫🇴 | Finnish 🇫🇮 | Flemish 🇧🇪 | French 🇫🇷 |
| Galician 🇪🇸 | Georgian 🇬🇪 | German 🇩🇪 | Greek 🇬🇷 |
| Gujarati 🇮🇳 | Haitian 🇭🇹 | Hausa 🇳🇬 | Hawaiian 🇺🇸 |
| Hebrew 🇮🇱 | Hindi 🇮🇳 | Hungarian 🇭🇺 | Icelandic 🇮🇸 |
| Indonesian 🇮🇩 | Italian 🇮🇹 | Japanese 🇯🇵 | Javanese 🇮🇩 |
| Kannada 🇮🇳 | Kazakh 🇰🇿 | Khmer 🇰🇭 | Korean 🇰🇷 |
| Lao 🇱🇦 | Latin 🇻🇦 | Latvian 🇱🇻 | Lingala 🇨🇩 |
| Lithuanian 🇱🇹 | Luxembourgish 🇱🇺 | Macedonian 🇲🇰 | Malagasy 🇲🇬 |
| Malay 🇲🇾 | Malayalam 🇮🇳 | Maltese 🇲🇹 | Maori 🇳🇿 |
| Marathi 🇮🇳 | Moldavian 🇲🇩 | Mongolian 🇲🇳 | Myanmar 🇲🇲 |
| Nepali 🇳🇵 | Norwegian 🇳🇴 | Occitan 🇫🇷 | Panjabi 🇮🇳 |
| Pashto 🇦🇫 | Persian 🇮🇷 | Polish 🇵🇱 | Portuguese 🇵🇹 |
| Punjabi 🇮🇳 | Romanian 🇷🇴 | Russian 🇷🇺 | Sanskrit 🇮🇳 |
| Serbian 🇷🇸 | Shona 🇿🇼 | Sindhi 🇵🇰 | Sinhala 🇱🇰 |
| Slovak 🇸🇰 | Slovenian 🇸🇮 | Somali 🇸🇴 | Spanish 🇪🇸 |
| Sundanese 🇮🇩 | Swahili 🇰🇪 | Swedish 🇸🇪 | Tagalog 🇵🇭 |
| Tajik 🇹🇯 | Tamil 🇮🇳 | Tatar 🇷🇺 | Telugu 🇮🇳 |
| Thai 🇹🇭 | Tibetan 🇨🇳 | Turkish 🇹🇷 | Turkmen 🇹🇲 |
| Ukrainian 🇺🇦 | Urdu 🇵🇰 | Uzbek 🇺🇿 | Vietnamese 🇻🇳 |
| Welsh 🏴 | Yiddish 🇮🇱 | Yoruba 🇳🇬 | |

</details>
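
In `faster-whisper`, the choice between native transcription and translation into English is a single `task` argument. The sketch below shows that switch. It assumes the `faster-whisper` package is installed; `transcription_options` and `run` are hypothetical helpers, and nothing here claims to mirror the application's own settings code.

```python
from typing import Optional


def transcription_options(translate_to_english: bool, language: Optional[str] = None) -> dict:
    """Build keyword arguments for WhisperModel.transcribe."""
    options = {"task": "translate" if translate_to_english else "transcribe"}
    if language is not None:
        # A language code such as "de"; leave unset to auto-detect.
        options["language"] = language
    return options


def run(path: str, translate_to_english: bool = False, language: Optional[str] = None):
    """Return (detected language, text) for one audio file."""
    # Imported lazily so the helper above works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel("small", device="cpu", compute_type="int8")
    segments, info = model.transcribe(
        path, **transcription_options(translate_to_english, language)
    )
    text = " ".join(segment.text.strip() for segment in segments)
    return info.language, text
```
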

---

## 🔧 Troubleshooting

<details>
<summary><b>🔥 App crashes on start</b></summary>
<blockquote>
The underlying engine requires standard C++ libraries. Install the <b>Microsoft Visual C++ Redistributable (2015-2022)</b>.
</blockquote>
</details>

<details>
<summary><b>🐌 "Simulate Typing" is slow</b></summary>
<blockquote>
Some apps (games, RDP) can't handle supersonic input. Go to <b>Settings</b> and lower the <b>Typing Speed</b> to ~1200 CPM.
</blockquote>
</details>

<details>
<summary><b>🎤 No Audio / Silence</b></summary>
<blockquote>
The agent listens to the <b>Default Communication Device</b>. Ensure your microphone is set correctly in Windows Sound Settings.
</blockquote>
</details>

---

<div align="center">

### ⚖️ PUBLIC DOMAIN (CC0 1.0)

*No Rights Reserved. No Gods. No Managers.*

Credit to **OpenAI** (Whisper), **Systran** (Faster-Whisper), and **Silero** (VAD).

</div>