<div align="center">

# 🎙️ W H I S P E R V O I C E
### SOVEREIGN SPEECH RECOGNITION

<br>

[Latest Release](https://git.lashman.live/lashman/whisper_voice/releases/latest)
[License: CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/)

<br>

> *"The master's tools will never dismantle the master's house."*
> <br>
> **Build your own tools. Run them locally. Free your mind.**

[View Source](https://git.lashman.live/lashman/whisper_voice) • [Report Issue](https://git.lashman.live/lashman/whisper_voice/issues)

</div>

<br>
<br>

## 📡 The Transmission

We live in an era of enclosure. Our words, our thoughts, and our digital footprints are strip-mined by centralized giants, turned into capital, and sold back to us as "convenience."

**Whisper Voice** is a rejection of that contract.

It is built on the axiom that **your voice belongs to you**. By bringing state-of-the-art inference down from the server farms and running it on your own metal, we reclaim a small piece of the digital commons. This software answers to no one but you. It has no telemetry, no subscription, and no masters.

---

## ⚡ The Engine

Whisper Voice operates on the silicon. It is not a wrapper. It is a machine.

| Component | Technology | Benefit |
| :--- | :--- | :--- |
| **Inference Core** | **Faster-Whisper** | Hyper-optimized implementation via **CTranslate2**. Delivers up to **4x velocity** over the reference PyTorch implementation. |
| **Compression** | **INT8 quantization** | Enables Pro-grade models (`Large-v3`) to run on consumer hardware, democratizing access to high-fidelity AI. |
| **Sensory Gate** | **Silero VAD** | Enterprise-grade Voice Activity Detection filters out silence and background noise, ensuring only intent is captured. |
| **Interface** | **Qt 6 / QML** | Hardware-accelerated, glassmorphic UI that is fluid, responsive, and sovereign. |

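As a rough illustration of how the components in the table fit together, the sketch below loads a Faster-Whisper model with INT8 weights and switches on the library's built-in Silero VAD filter. It assumes the `faster-whisper` Python package is installed. The names `EngineConfig`, `engine_config`, and `transcribe_file` are hypothetical and are not taken from the Whisper Voice source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineConfig:
    """Hypothetical settings bundle; not Whisper Voice's actual config."""
    model_size: str = "large-v3"
    device: str = "cuda"         # or "cpu"
    compute_type: str = "int8"   # INT8 quantization handled by CTranslate2
    vad_filter: bool = True      # Silero VAD gate shipped with faster-whisper


def engine_config(use_gpu: bool, model_size: str = "large-v3") -> EngineConfig:
    """Build a config for the chosen compute device."""
    return EngineConfig(model_size=model_size, device="cuda" if use_gpu else "cpu")


def transcribe_file(path: str, cfg: EngineConfig) -> str:
    """Transcribe one audio file and return the joined segment text."""
    # Imported lazily so the config helpers work without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(cfg.model_size, device=cfg.device, compute_type=cfg.compute_type)
    segments, _info = model.transcribe(path, vad_filter=cfg.vad_filter)
    return " ".join(segment.text.strip() for segment in segments)
```

Calling `transcribe_file("sample.wav", engine_config(use_gpu=False, model_size="tiny"))` would run the smallest model on the CPU; `sample.wav` is a placeholder path.
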
<br>

## 🌎 Universal Translator

Whisper Voice v1.0.1 introduces a **Neural Translation Engine** built directly into the core. It bypasses the need for corporate translation APIs entirely.

* **Input**: Speak in French, Japanese, Russian, or **96 other languages**.
* **Output**: The engine instantly reconstructs the semantic meaning in fluent **English**.
* **Local Execution**: No API keys. No data leaks. The translation happens on your own hardware (CUDA GPU or CPU).

### Dual-Channel Workflow
The application listens for two separate hotkeys at once, allowing seamless switching between local and international communication.

* **F9 (Default)** -> **Transcribe**: Types exactly what you say, in the language you speak.
* **F10 (Default)** -> **Translate**: Translates whatever you say in *any* supported language into English.

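One way to picture the two channels: each hotkey selects a Whisper task, and the rest of the pipeline stays the same. The sketch below is a minimal illustration under that assumption. `HOTKEY_TASKS` and `transcribe_options` are hypothetical names; the `task` and `language` keywords match the arguments that Whisper-style `transcribe` calls (including Faster-Whisper's) accept.

```python
# Hypothetical hotkey table mirroring the defaults described above.
HOTKEY_TASKS = {"F9": "transcribe", "F10": "translate"}


def transcribe_options(hotkey: str) -> dict:
    """Return keyword arguments for a Whisper-style `transcribe` call."""
    try:
        task = HOTKEY_TASKS[hotkey.upper()]
    except KeyError:
        raise ValueError(f"no action bound to {hotkey!r}") from None
    # language=None asks the model to auto-detect the spoken language.
    return {"task": task, "language": None}
```

With `task="translate"`, Whisper models emit English text regardless of the input language, which is why the translate channel needs no external API.
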
---

## 🕹️ Command & Control

### Global Hotkeys
The agent runs silently in the background, waiting for your signal.

* **Transcribe (F9)**: Opens the channel for standard speech-to-text.
* **Translate (F10)**: Opens the channel for neural translation.
* **Customization**: Remap these keys in Settings. The recorder supports complex chords (e.g. `Ctrl + Alt + Space`) to fit your workflow.

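To make the chord format concrete, here is a small sketch that splits a string such as `Ctrl + Alt + Space` into its modifiers and its single trigger key. The accepted modifier names and the `parse_chord` helper are assumptions for illustration; the recorder in Settings may store chords differently.

```python
MODIFIERS = {"ctrl", "alt", "shift", "win"}  # assumed modifier names


def parse_chord(text: str) -> tuple[frozenset[str], str]:
    """Split 'Ctrl + Alt + Space' into (modifiers, key)."""
    parts = [p.strip().lower() for p in text.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty hotkey")
    mods = frozenset(p for p in parts if p in MODIFIERS)
    keys = [p for p in parts if p not in MODIFIERS]
    # A usable chord has any number of modifiers but exactly one trigger key.
    if len(keys) != 1:
        raise ValueError(f"expected exactly one non-modifier key in {text!r}")
    return mods, keys[0]
```

Returning the modifiers as a set means `Ctrl + Alt + Space` and `Alt + Ctrl + Space` compare equal, so the order in which keys were pressed during recording does not matter.
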
### Injection Protocols
* **Clipboard Paste**: Standard text injection. Instant, reliable.
* **Simulate Typing**: Mimics physical keystrokes at superhuman speed (6000 CPM, i.e. characters per minute). Bypasses anti-paste restrictions and "protected" windows.

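The typing speed translates directly into a per-keystroke delay. The arithmetic can be sketched as follows; `delay_per_char` and `typing_duration` are hypothetical helper names, and the actual injector may pace keystrokes differently.

```python
def delay_per_char(cpm: int) -> float:
    """Seconds to wait between keystrokes at `cpm` characters per minute."""
    if cpm <= 0:
        raise ValueError("cpm must be positive")
    return 60.0 / cpm


def typing_duration(text: str, cpm: int) -> float:
    """Total seconds needed to type `text` at the given speed."""
    return len(text) * delay_per_char(cpm)
```

At the default 6000 CPM each keystroke gets 10 ms, so a 500-character paragraph takes about 5 seconds. At 1200 CPM the delay grows to 50 ms per keystroke and the same paragraph takes about 25 seconds.
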
<br>

## 📊 Intelligence Matrix

Select the model that aligns with your available resources.

| Model | VRAM (GPU) | RAM (CPU) | Designation | Capability |
| :--- | :--- | :--- | :--- | :--- |
| `Tiny` | **~500 MB** | ~1 GB | ⚡ **Supersonic** | Command & Control, older hardware. |
| `Base` | **~600 MB** | ~1 GB | 🚀 **Very Fast** | Daily driver for low-power laptops. |
| `Small` | **~1 GB** | ~2 GB | ⏩ **Fast** | High-accuracy English dictation. |
| `Medium` | **~2 GB** | ~4 GB | ⚖️ **Balanced** | Complex vocabulary, foreign accents. |
| `Large-v3 Turbo` | **~4 GB** | ~6 GB | ✨ **Optimal** | **The Sweet Spot.** Near-Large intelligence, Medium speed. |
| `Large-v3` | **~5 GB** | ~8 GB | 🧠 **Maximum** | Professional grade. Uncompromised. |

> *Note: GPU acceleration is not selected automatically. Choose your Compute Device (CUDA GPU or CPU) manually in Settings.*

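If you prefer to reason about the table programmatically, the sketch below picks the largest model whose approximate footprint fits in the memory you have free. The figures are copied from the table; the lowercase model identifiers and the `pick_model` helper are illustrative assumptions, and "largest that fits" is a simplification that ignores speed.

```python
from typing import Optional

# (model, approx. VRAM in MB, approx. RAM in MB), largest first.
# Figures are the rounded values from the table above.
MODELS = [
    ("large-v3", 5000, 8000),
    ("large-v3-turbo", 4000, 6000),
    ("medium", 2000, 4000),
    ("small", 1000, 2000),
    ("base", 600, 1000),
    ("tiny", 500, 1000),
]


def pick_model(free_mb: int, device: str = "cuda") -> Optional[str]:
    """Return the largest model that fits in `free_mb`, or None if none fits."""
    column = 1 if device == "cuda" else 2  # VRAM for GPU, RAM for CPU
    for row in MODELS:
        if row[column] <= free_mb:
            return row[0]
    return None
```

For example, `pick_model(4500)` selects `large-v3-turbo` on a GPU with about 4.5 GB free, while `pick_model(300)` returns `None` because even `Tiny` would not fit.
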
---

## 🛠️ Deployment

### 📥 Installation
1. **Acquire**: Download `WhisperVoice.exe` from [Releases](https://git.lashman.live/lashman/whisper_voice/releases).
2. **Deploy**: Place it anywhere. It is portable.
3. **Bootstrap**: Run it. The agent will self-provision an isolated Python runtime (~2 GB) on first launch.
4. **Sync**: Future updates are handled by the **Smart Bootstrapper**, which surgically updates only changed files, respecting your bandwidth and your settings.

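The README does not describe how the Smart Bootstrapper decides which files have changed. One common approach, shown here purely as an assumption, is to compare local files against a manifest of SHA-256 hashes and fetch only the mismatches. The manifest format and the `files_to_update` helper are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_update(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing locally or differ from the manifest.

    `manifest` maps a relative path to its expected SHA-256 hex digest
    (an assumed format, not the bootstrapper's real one).
    """
    stale = []
    for rel_path, expected in sorted(manifest.items()):
        local = root / rel_path
        if not local.is_file() or sha256_of(local) != expected:
            stale.append(rel_path)
    return stale
```

Because only paths named in the manifest are examined, user-created files such as a settings file are never touched, which is consistent with an updater that respects your settings.
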
### 🔧 Troubleshooting
* **App crashes on start**: Ensure you have [Microsoft Visual C++ Redistributable 2015-2022](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist) installed.
* **"Simulate Typing" is slow**: Some applications (remote desktops, legacy games) cannot keep up with the keystroke stream. Lower the typing speed in Settings to ~1200 CPM.
* **No Audio**: The agent listens to the **Default Communication Device**. Verify that the correct microphone is set as that device in the Windows Sound Control Panel.

<br>

---

## 🌐 Supported Languages

The engine understands 99 languages. The table below lists them by name, and a few entries are alternate names for the same language (for example Castilian and Spanish, or Burmese and Myanmar). You can lock the focus to a specific language in Settings to improve accuracy, or rely on **Auto-Detect** for fluid multilingual usage.

| | | | | | |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Afrikaans 🇿🇦 | Albanian 🇦🇱 | Amharic 🇪🇹 | Arabic 🇸🇦 | Armenian 🇦🇲 | Assamese 🇮🇳 |
| Azerbaijani 🇦🇿 | Bashkir 🇷🇺 | Basque 🇪🇸 | Belarusian 🇧🇾 | Bengali 🇧🇩 | Bosnian 🇧🇦 |
| Breton 🇫🇷 | Bulgarian 🇧🇬 | Burmese 🇲🇲 | Castilian 🇪🇸 | Catalan 🇪🇸 | Chinese 🇨🇳 |
| Croatian 🇭🇷 | Czech 🇨🇿 | Danish 🇩🇰 | Dutch 🇳🇱 | English 🇺🇸 | Estonian 🇪🇪 |
| Faroese 🇫🇴 | Finnish 🇫🇮 | Flemish 🇧🇪 | French 🇫🇷 | Galician 🇪🇸 | Georgian 🇬🇪 |
| German 🇩🇪 | Greek 🇬🇷 | Gujarati 🇮🇳 | Haitian 🇭🇹 | Hausa 🇳🇬 | Hawaiian 🇺🇸 |
| Hebrew 🇮🇱 | Hindi 🇮🇳 | Hungarian 🇭🇺 | Icelandic 🇮🇸 | Indonesian 🇮🇩 | Italian 🇮🇹 |
| Japanese 🇯🇵 | Javanese 🇮🇩 | Kannada 🇮🇳 | Kazakh 🇰🇿 | Khmer 🇰🇭 | Korean 🇰🇷 |
| Lao 🇱🇦 | Latin 🇻🇦 | Latvian 🇱🇻 | Lingala 🇨🇩 | Lithuanian 🇱🇹 | Luxembourgish 🇱🇺 |
| Macedonian 🇲🇰 | Malagasy 🇲🇬 | Malay 🇲🇾 | Malayalam 🇮🇳 | Maltese 🇲🇹 | Maori 🇳🇿 |
| Marathi 🇮🇳 | Moldavian 🇲🇩 | Mongolian 🇲🇳 | Myanmar 🇲🇲 | Nepali 🇳🇵 | Norwegian 🇳🇴 |
| Occitan 🇫🇷 | Panjabi 🇮🇳 | Pashto 🇦🇫 | Persian 🇮🇷 | Polish 🇵🇱 | Portuguese 🇵🇹 |
| Punjabi 🇮🇳 | Romanian 🇷🇴 | Russian 🇷🇺 | Sanskrit 🇮🇳 | Serbian 🇷🇸 | Shona 🇿🇼 |
| Sindhi 🇵🇰 | Sinhala 🇱🇰 | Slovak 🇸🇰 | Slovenian 🇸🇮 | Somali 🇸🇴 | Spanish 🇪🇸 |
| Sundanese 🇮🇩 | Swahili 🇰🇪 | Swedish 🇸🇪 | Tagalog 🇵🇭 | Tajik 🇹🇯 | Tamil 🇮🇳 |
| Tatar 🇷🇺 | Telugu 🇮🇳 | Thai 🇹🇭 | Tibetan 🇨🇳 | Turkish 🇹🇷 | Turkmen 🇹🇲 |
| Ukrainian 🇺🇦 | Urdu 🇵🇰 | Uzbek 🇺🇿 | Vietnamese 🇻🇳 | Welsh 🏴󠁧󠁢󠁷󠁬󠁳󠁿 | Yiddish 🇮🇱 |
| Yoruba 🇳🇬 | | | | | |

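Locking the language amounts to passing an explicit language code to the model instead of letting it guess. The sketch below shows one way a Settings value could be turned into that argument. `LANGUAGE_CODES` is a small illustrative subset using ISO 639-1 codes, and `language_argument` is a hypothetical helper, not part of Whisper Voice.

```python
from typing import Optional

# Small illustrative subset; alternate names map to the same ISO 639-1 code.
LANGUAGE_CODES = {
    "english": "en",
    "french": "fr",
    "japanese": "ja",
    "russian": "ru",
    "spanish": "es",
    "castilian": "es",
    "dutch": "nl",
    "flemish": "nl",
}


def language_argument(setting: str) -> Optional[str]:
    """Map a Settings value to a `language` argument (None means auto-detect)."""
    name = setting.strip().lower()
    if name in ("", "auto", "auto-detect"):
        return None  # let the model detect the language itself
    try:
        return LANGUAGE_CODES[name]
    except KeyError:
        raise ValueError(f"unknown language: {setting!r}") from None
```

A fixed code skips the detection step, which can help with short utterances where a model has little audio on which to base its guess.
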
<br>
<br>

<div align="center">

### ⚖️ PUBLIC DOMAIN (CC0 1.0)
*No Rights Reserved. No Gods. No Masters. No Managers.*

Credit to **OpenAI** (Whisper), **Systran** (Faster-Whisper), and **Silero** (VAD).

</div>