Final documentation polish

Your Name
2026-01-24 17:20:22 +02:00
parent 0d426aea4b
commit e900201214

111
README.md

@@ -5,12 +5,20 @@
 <br>
+![Banner](https://img.shields.io/badge/STATUS-OPERATIONAL-success?style=for-the-badge&logo=server)
 **Your Voice. Your Machine. Your Data.**
 <br>
 *A high-performance, locally-run dictation agent for the liberated desktop.*
-[![Download](https://img.shields.io/github/v/release/lashman/whisper_voice?label=Download&style=for-the-badge&color=2563eb)](https://git.lashman.live/lashman/whisper_voice/releases/latest)
-[![License](https://img.shields.io/badge/License-CC0_1.0-lightgrey?style=for-the-badge)](https://creativecommons.org/publicdomain/zero/1.0/)
+[![Download](https://img.shields.io/badge/DOWNLOAD-v1.0.0-2563eb?style=for-the-badge&logo=windows&logoColor=white)](https://git.lashman.live/lashman/whisper_voice/releases/latest)
+[![License](https://img.shields.io/badge/LICENSE-CC0_PUBLIC_DOMAIN-lightgrey?style=for-the-badge&logo=creative-commons&logoColor=black)](https://creativecommons.org/publicdomain/zero/1.0/)
+<br>
+<p align="center">
+<img src="https://raw.githubusercontent.com/Tarikul-Islam-Anik/Animated-Fluent-Emojis/master/Emojis/Objects/Microphone.png" alt="Microphone" width="100" />
+</p>
 </div>
@@ -18,49 +26,84 @@
 ## ✊ The Manifesto
-**We hold these truths to be self-evident: That user data is an extension of the self, and its exploitation by centralized clouds is a violation of digital autonomy.**
+**We hold these truths to be self-evident:** That user data is an extension of the self, and its exploitation by centralized clouds is a violation of digital autonomy.
 Whisper Voice is built on the principle of **technological sovereignty**. It provides state-of-the-art speech recognition without renting your cognitive output to corporate oligarchies. By running entirely on your own hardware, it reclaims the means of digital production, ensuring that your words remain exclusively yours.
+> *"The master's tools will never dismantle the master's house."* — Audre Lorde
+> <br>**Build your own tools. Run them locally.**
+---
 ## ⚡ Technical Core
-Under the hood, Whisper Voice exploits the raw power of **[Faster-Whisper](https://github.com/SYSTRAN/faster-whisper)**, a hyper-optimized implementation of OpenAI's Whisper model using CTranslate2.
-* **Zero Latency Loop**: By eliminating network round-trips, transcription happens as fast as your hardware can think.
-* **Privacy by Physics**: Data physically cannot leave your machine because the engine has no cloud uplink. The cable is cut.
-* **Precision Engineering**: Leveraging 8-bit quantization (`int8`) to run professional-grade models on consumer hardware with minimal memory footprint.
+Whisper Voice is not a wrapper for an API. It is a fully contained neural inference engine running on your metal.
+### The Engine: Faster-Whisper
+We utilize the **CTranslate2** backend—a high-performance inference engine for Transformer models. This allows us to run OpenAI's Whisper architectures with:
+* **4x Speedup** over standard PyTorch implementations.
+* **4x Memory Reduction** via 8-bit quantization (`int8`), enabling Pro-grade models on consumer GPUs.
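The 4x memory figure is consistent with simple weight-storage arithmetic, assuming the baseline is `float32` (4 bytes per weight) against `int8` (1 byte per weight). The sketch below is illustrative only: the parameter counts are approximate values assumed from OpenAI's published Whisper model sizes, the function names are hypothetical, and activations and runtime overhead are ignored, so real memory use will be higher.

```python
# Approximate parameter counts in millions (assumed, not taken from this README).
PARAMS_MILLIONS = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large-v3": 1550,
}

# Bytes needed to store one weight at each precision.
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def weight_megabytes(model: str, dtype: str) -> int:
    """Estimate weight storage in MB (10^6 bytes); overhead is ignored."""
    # millions of parameters * bytes per parameter = millions of bytes = MB
    return PARAMS_MILLIONS[model] * BYTES_PER_WEIGHT[dtype]


def reduction_factor(model: str, baseline: str, target: str) -> float:
    """Ratio of weight storage between two precisions for the same model."""
    return weight_megabytes(model, baseline) / weight_megabytes(model, target)


if __name__ == "__main__":
    for name in PARAMS_MILLIONS:
        fp32 = weight_megabytes(name, "float32")
        int8 = weight_megabytes(name, "int8")
        print(f"{name}: {fp32} MB (float32) vs {int8} MB (int8)")
```

Against a `float16` baseline the same arithmetic gives a 2x reduction, so the headline figure depends on which precision is taken as the starting point.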
+### The Sense: Silero VAD
+To distinguish human speech from background noise, we employ **Silero VAD** (Voice Activity Detection). This ensures that the agent only listens when you speak, conserving compute resources and preventing hallucinated text from silence.
+### The Interface: Qt 6 (PySide6)
+The UI is built with **Qt Quick/QML**, rendering a hardware-accelerated, glassmorphic overlay that feels native to modern desktop environments while remaining completely decoupled from OS spyware.
 ---
-## 📊 Model Performance
-Choose the engine that matches your hardware capabilities.
-| Model | GPU VRAM (rec.) | CPU RAM (rec.) | Relative Speed | Capability |
+## 📊 Model Intelligence
+Select the intelligence level that matches your hardware reality.
+| Model | GPU VRAM | CPU RAM | Speed | Best For |
 | :--- | :--- | :--- | :--- | :--- |
-| **Tiny** | ~500 MB | ~1 GB | Supersonic | Quick commands, simple dictation. |
-| **Base** | ~600 MB | ~1 GB | Very Fast | Good balance for older hardware. |
-| **Small** | ~1 GB | ~2 GB | Fast | Standard driver. High accuracy for English. |
-| **Medium** | ~2 GB | ~4 GB | Moderate | High precision. Great for accents. |
-| **Large-v3 Turbo** | ~4 GB | ~6 GB | Fast/Mod | **Best Balance.** Near Large accuracy at much higher speeds. |
-| **Large-v3** | ~5 GB | ~8 GB | Heavy | Professional grade. Near-perfect understanding. |
-*Note: CPU inference is significantly slower than GPU but fully supported via highly optimized vector instructions (AVX2).*
+| **Tiny** | ~500 MB | ~1 GB | Supersonic | Quick commands, older machinery. |
+| **Base** | ~600 MB | ~1 GB | 🚀 Very Fast | Daily driving on low-power laptops. |
+| **Small** | ~1 GB | ~2 GB | Fast | High accuracy for English dictation. |
+| **Medium** | ~2 GB | ~4 GB | ⚖️ Balanced | Complex vocabulary and accents. |
+| **Large-v3 Turbo** | ~4 GB | ~6 GB | **Optimal** | The sweet spot. Large-level smarts, Medium-level speed. |
+| **Large-v3** | ~5 GB | ~8 GB | 🧠 Maximum | Professional transcription. Uncompromised quality. |
+*Note: The agent automatically detects your hardware (CUDA GPU or CPU) and optimizes the runtime accordingly.*
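The GPU VRAM column of the table can be read as a simple lookup: pick the heaviest model whose recommended VRAM fits what is free. The sketch below is one hypothetical way to encode that rule, not the agent's actual detection logic. It assumes the approximate table values can be treated as hard thresholds, reads "~1 GB" as 1000 MB, and uses the table's own labels as model names.

```python
from typing import Optional

# (label, recommended GPU VRAM in MB), in the table's order from lightest to heaviest.
MODEL_VRAM_MB = [
    ("Tiny", 500),
    ("Base", 600),
    ("Small", 1000),
    ("Medium", 2000),
    ("Large-v3 Turbo", 4000),
    ("Large-v3", 5000),
]


def pick_model(free_vram_mb: int) -> Optional[str]:
    """Return the heaviest model that fits in the given VRAM, or None if none fits."""
    choice = None
    for label, needed_mb in MODEL_VRAM_MB:
        if needed_mb <= free_vram_mb:
            choice = label  # later entries are heavier, so keep overwriting
    return choice


if __name__ == "__main__":
    for vram in (400, 1024, 4096, 8192):
        print(vram, "MB ->", pick_model(vram))
```

A `None` result would be the cue to fall back to CPU inference and consult the CPU RAM column instead.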
 ---
-## 🛠️ Usage Guide
-### Installation
-1. **Acquire**: Download the latest portable executable from the [Releases](https://git.lashman.live/lashman/whisper_voice/releases) page.
-2. **Deploy**: Place `WhisperVoice.exe` in a directory of your choosing.
-3. **Initialize**: Run the executable. It will autonomously hydrate its runtime environment (approx. 2GB) on the first launch.
-### Operation
-1. **Configure**: Right-click the **System Tray Icon** to open Settings. Select your **Model Size** and **Compute Device**.
-2. **Engage**: Press `F9` (or your custom hotkey) to open the channel.
-3. **Dictate**: Speak clearly. The noise gate will isolate your voice.
-4. **Execute**: Release the key. The machine interprets the signal and injects the text into your active window immediately.
+## 🛠️ Operational Guide
+### Deployment
+1. **Download**: Grab the latest `WhisperVoice.exe` from [Releases](https://git.lashman.live/lashman/whisper_voice/releases).
+2. **Install**: There is no installation. Place the executable in a directory you control (e.g., `C:\Tools\WhisperVoice`).
+3. **Bootstrap**: Run it. The agent will self-provision its own isolated Python environment (~2GB). This ensures your system PATH remains clean and unpolluted.
+### Usage
+* **Hotkeys**: The default trigger is `F9`. You can rebind this in Settings to any combination (e.g., `Ctrl+Space`, `Alt+V`).
+* **Injection Modes**:
+* *Clipboard Paste*: Standard, reliable text insertion.
+* *Simulate Typing*: A stealth mode that physically mimics keystrokes (up to 6000 CPM) to bypass applications that block pasting (e.g., games, remote terminals).
+* **Tray Agent**: The app lives in your system tray. Right-click the icon to access Settings or terminate the process.
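To put the "Simulate Typing" rate in concrete terms, the sketch below converts a CPM setting into a per-keystroke delay. It assumes CPM means characters per minute and that keystrokes are evenly spaced. The function names are hypothetical and not part of the application.

```python
def keystroke_delay_ms(cpm: int) -> float:
    """Milliseconds between keystrokes at a given characters-per-minute rate."""
    if cpm <= 0:
        raise ValueError("cpm must be positive")
    return 60_000 / cpm  # 60,000 ms in a minute


def typing_time_s(text: str, cpm: int) -> float:
    """Seconds needed to emit `text` at the given rate, one delay per character."""
    return len(text) * keystroke_delay_ms(cpm) / 1000


if __name__ == "__main__":
    for rate in (6000, 1200):
        print(rate, "CPM ->", keystroke_delay_ms(rate), "ms per keystroke")
```

Under these assumptions, 6000 CPM leaves 10 ms between keystrokes and 1200 CPM leaves 50 ms, which may be why slower applications cope better with the lower setting.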
+### Removal
+* **Portable**: To uninstall, simply delete the folder. No registry keys, no hidden services, no trace left behind.
+---
+## 🔧 Troubleshooting
+<details>
+<summary><b>The app crashes immediately on start</b></summary>
+Ensure you have the <b>Microsoft Visual C++ Redistributable (2015-2022)</b> installed, as the underlying CTranslate2 engine requires these standard libraries.
+</details>
+<details>
+<summary><b>"Simulate Typing" is slow or misses characters</b></summary>
+Adjust the <b>Typing Speed</b> slider in Settings. Some older applications cannot handle supersonic 6000 CPM input; try lowering it to 1200 CPM.
+</details>
+<details>
+<summary><b>Microphone not picking up audio</b></summary>
+The agent uses your <b>System Default Input Device</b>. Ensure your microphone is set as Default in Windows Sound Settings.
+</details>
 ---
@@ -68,15 +111,15 @@ Choose the engine that matches your hardware capabilities.
 **Public Domain (CC0 1.0)**
-To the extent possible under law, the creators of this interface have waived all copyright and related or neighboring rights to this work. This tool belongs to the commons.
+To the extent possible under law, the creators of this interface have waived all copyright and related or neighboring rights to this work. This tool belongs to the commons. It is a gift to the digital proletariat.
 * **Fork it.**
 * **Mod it.**
-* **Sell it.**
+* **Distribute it.**
+* **Liberate it.**
-### Acknowledgments
-While this interface is CC0, it relies on the shoulders of giants:
-* **OpenAI Whisper Models**: Released under the MIT License.
-* **Faster-Whisper & CTranslate2**: Released under the MIT License.
+### Credits
+* **OpenAI**: For the Whisper weights (MIT).
+* **Systran**: For Faster-Whisper (MIT).
+* **Qt Company**: For the UI framework (LGPL).
 *No gods, no cloud managers.*