Update docs with license and model stats

This commit is contained in:
Your Name
2026-01-24 17:16:53 +02:00
parent b15ce8076f
commit 0d426aea4b


<br>
*A high-performance, locally-run dictation agent for the liberated desktop.*
[![Download](https://img.shields.io/github/v/release/lashman/whisper_voice?label=Download&style=for-the-badge&color=2563eb)](https://git.lashman.live/lashman/whisper_voice/releases/latest)
[![License](https://img.shields.io/badge/License-CC0_1.0-lightgrey?style=for-the-badge)](https://creativecommons.org/publicdomain/zero/1.0/)
</div>
## ⚡ Technical Core
Under the hood, Whisper Voice exploits the raw power of **[Faster-Whisper](https://github.com/SYSTRAN/faster-whisper)**, a highly optimized implementation of OpenAI's Whisper model using CTranslate2. This delivers:
* **Zero Latency Loop**: By eliminating network round-trips, transcription happens as fast as your hardware can think.
* **Privacy by Physics**: Data physically cannot leave your machine because the engine has no cloud uplink. The cable is cut.
* **Precision Engineering**: Leveraging 8-bit quantization to run even the `Large-v3` models on consumer GPUs with minimal memory footprint.
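The whole engine boils down to a few lines of Faster-Whisper. A minimal sketch of an offline transcription call with `int8` quantization — the function name and defaults are illustrative, not Whisper Voice's actual pipeline:

```python
def transcribe(path: str, model_size: str = "small") -> str:
    """Fully offline transcription via faster-whisper (illustrative sketch)."""
    from faster_whisper import WhisperModel  # deferred import: heavy dependency
    # int8 quantization keeps the memory footprint small on consumer hardware
    model = WhisperModel(model_size, device="auto", compute_type="int8")
    segments, _info = model.transcribe(path)
    return " ".join(seg.text.strip() for seg in segments)
```

No network call appears anywhere in that path: the model weights live on disk and inference runs in-process.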
## ✨ Capabilities
### 🧠 Adaptive Intelligence
Choose the model that fits your rig. From `Tiny` (low resource, high speed) to `Large` (human-level accuracy). The agent automatically configures itself for your available Compute Device (CUDA GPU or CPU).
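The auto-configuration step can be sketched as a simple probe. `ctranslate2.get_cuda_device_count()` is a real call in the runtime that ships with faster-whisper, though whether Whisper Voice uses exactly this check is an assumption:

```python
def pick_compute(prefer_gpu: bool = True) -> tuple[str, str]:
    """Choose (device, compute_type) for the transcription engine."""
    try:
        import ctranslate2  # ships with faster-whisper
        has_cuda = prefer_gpu and ctranslate2.get_cuda_device_count() > 0
    except Exception:  # library missing or no driver: fall back to CPU
        has_cuda = False
    # int8_float16 on GPU, plain int8 on CPU keeps memory low either way
    return ("cuda", "int8_float16") if has_cuda else ("cpu", "int8")
```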
### 🚀 Inputs & Injection
* **Global Hotkey**: A rigorous system-wide hook (default `F9`) puts the ear of the machine at your fingertips.
* **Simulated Typing**: Some applications block pasting? No problem. Our engine simulates keystrokes at supersonic speeds (up to **6000 CPM**), bypassing restrictions like water flowing around a rock.
* **Clipboard Mode**: Standard, lightning-fast text injection for permissive environments.
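The pacing behind simulated typing is simple arithmetic: 6000 CPM is one character every 10 ms. A sketch with a pluggable `send` callback standing in for the real key-injection backend (which backend Whisper Voice uses is not specified here):

```python
import time

def type_text(text: str, cpm: int = 6000, send=print) -> float:
    """Emit characters at a fixed rate; returns the per-character delay."""
    delay = 60.0 / cpm      # 6000 CPM -> 0.01 s between keystrokes
    for ch in text:
        send(ch)            # a real backend would inject a keystroke here
        time.sleep(delay)
    return delay
```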
### 🛡️ System Integration
* **Glassmorphic UI**: A modern, non-intrusive QML interface that respects your screen real estate.
* **Tray Agent**: Retracts to the system tray, maintaining a low profile until summoned.
* **Bootstrapper**: A self-assembling runtime that provisions its own dependencies using an isolated embedded Python environment. No pollution of your system PATH.
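First-launch provisioning follows the same pattern as the standard library's `venv`: create an isolated environment once, reuse it on every later start. A rough sketch — the real bootstrapper ships an embedded Python rather than a venv, and the paths here are hypothetical:

```python
from pathlib import Path
import venv

def ensure_runtime(root: Path) -> Path:
    """Create an isolated environment on first launch, reuse it afterwards."""
    env = root / "runtime"
    if not (env / "pyvenv.cfg").exists():  # marker file written by venv.create
        venv.create(env)  # the real agent provisions its own dependencies here
    return env
```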
---
## 📊 Model Performance
Choose the engine that matches your hardware capabilities.
| Model | GPU VRAM (rec.) | CPU RAM (rec.) | Relative Speed | Capability |
| :--- | :--- | :--- | :--- | :--- |
| **Tiny** | ~500 MB | ~1 GB | Supersonic | Quick commands, simple dictation. |
| **Base** | ~600 MB | ~1 GB | Very Fast | Good balance for older hardware. |
| **Small** | ~1 GB | ~2 GB | Fast | Standard daily driver. High accuracy for English. |
| **Medium** | ~2 GB | ~4 GB | Moderate | High precision. Great for accents. |
| **Large-v3 Turbo** | ~4 GB | ~6 GB | Fast/Moderate | **Best Balance.** Near Large accuracy at much higher speeds. |
| **Large-v3** | ~5 GB | ~8 GB | Heavy | Professional grade. Near-perfect understanding. |
*Note: CPU inference is significantly slower than GPU but fully supported via highly optimized vector instructions (AVX2).*
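The table doubles as a selection rule: pick the largest model whose recommended VRAM fits your card. A sketch using the approximate figures above (the dictionary values are the table's recommendations; the helper itself is hypothetical):

```python
# Approximate recommended GPU VRAM in MB, smallest model first
VRAM_MB = {
    "tiny": 500,
    "base": 600,
    "small": 1024,
    "medium": 2048,
    "large-v3-turbo": 4096,
    "large-v3": 5120,
}

def best_model(free_vram_mb: int) -> str:
    """Largest model that fits in the given free VRAM (falls back to tiny)."""
    fitting = [m for m, need in VRAM_MB.items() if need <= free_vram_mb]
    return fitting[-1] if fitting else "tiny"
```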
---
## 🛠️ Usage Guide
### Installation
1. **Acquire**: Download the latest portable executable from the [Releases](https://git.lashman.live/lashman/whisper_voice/releases) page.
3. **Initialize**: Run the executable. It will autonomously hydrate its runtime environment (approx. 2GB) on the first launch.
### Operation
1. **Configure**: Right-click the **System Tray Icon** to open Settings. Select your **Model Size** and **Compute Device**.
2. **Engage**: Press `F9` (or your custom hotkey) to open the channel.
3. **Dictate**: Speak clearly. The noise gate will isolate your voice.
4. **Execute**: Release the key. The machine interprets the signal and injects the text into your active window immediately.
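The press, record, release, inject cycle above is a two-state machine. A sketch of that loop — the state and action names are illustrative, not Whisper Voice internals:

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    RECORDING = auto()

def step(state: State, event: str):
    """Advance the push-to-talk loop; returns (next_state, action)."""
    if state is State.IDLE and event == "press":
        return State.RECORDING, "start_capture"
    if state is State.RECORDING and event == "release":
        return State.IDLE, "transcribe_and_inject"
    return state, None  # ignore key repeats and stray events
```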
---
## ⚖️ License & Rights
**Public Domain (CC0 1.0)**
To the extent possible under law, the creators of this interface have waived all copyright and related or neighboring rights to this work. This tool belongs to the commons.
* **Fork it.**
* **Mod it.**
* **Sell it.**
* **Liberate it.**
---
### Acknowledgments
While this interface is CC0, it relies on the shoulders of giants:
* **OpenAI Whisper Models**: Released under the MIT License.
* **Faster-Whisper & CTranslate2**: Released under the MIT License.
## 🤝 Mutual Aid
This software is free as in freedom. It is a commons, not a commodity.
Contributions are welcome from all who share the vision of decentralized, local-first computing. Whether it is code, documentation, or design, labor given freely enriches the whole community.
**[Fork the Repository](https://git.lashman.live/lashman/whisper_voice)**
---
<div align="center">
<i>"The master's tools will never dismantle the master's house."</i>
<br>
<b>Build your own tools. Run them locally.</b>
</div>
*No gods, no cloud managers.*