Update docs with license and model stats
This commit is contained in:
80
README.md
80
README.md
@@ -9,7 +9,8 @@
|
||||
<br>
|
||||
*A high-performance, locally-run dictation agent for the liberated desktop.*
|
||||
|
||||
[Download Release](https://git.lashman.live/lashman/whisper_voice/releases) • [View Source](https://git.lashman.live/lashman/whisper_voice) • [Report Issue](https://git.lashman.live/lashman/whisper_voice/issues)
|
||||
[](https://git.lashman.live/lashman/whisper_voice/releases/latest)
|
||||
[](https://creativecommons.org/publicdomain/zero/1.0/)
|
||||
|
||||
</div>
|
||||
|
||||
@@ -23,30 +24,32 @@ Whisper Voice is built on the principle of **technological sovereignty**. It pro
|
||||
|
||||
## ⚡ Technical Core
|
||||
|
||||
Under the hood, Whisper Voice exploits the raw power of **Faster-Whisper**, a highly optimized implementation of OpenAI's Whisper model using CTranslate2. This delivers:
|
||||
Under the hood, Whisper Voice exploits the raw power of **[Faster-Whisper](https://github.com/SYSTRAN/faster-whisper)**, a hyper-optimized implementation of OpenAI's Whisper model using CTranslate2.
|
||||
|
||||
* **Zero Latency Loop**: By eliminating network round-trips, transcription happens as fast as your hardware can think.
|
||||
* **Privacy by Physics**: Data physically cannot leave your machine because the engine has no cloud uplink. The cable is cut.
|
||||
* **Precision Engineering**: Leveraging 8-bit quantization to run even the `Large-v3` models on consumer GPUs with minimal memory footprint.
|
||||
|
||||
## ✨ Capabilities
|
||||
|
||||
### 🧠 Adaptive Intelligence
|
||||
Choose the model that fits your rig. From `Tiny` (low resource, high speed) to `Large` (human-level accuracy). The agent automatically configures itself for your available Compute Device (CUDA GPU or CPU).
|
||||
|
||||
### 🚀 Inputs & Injection
|
||||
* **Global Hotkey**: A rigorous system-wide hook (default `F9`) puts the ear of the machine at your fingertips.
|
||||
* **Simulated Typing**: Bylaws of some applications block pasting? No problem. Our engine simulates keystrokes at supersonic speeds (up to **6000 CPM**), bypassing restrictions like water flowing around a rock.
|
||||
* **Clipboard Mode**: Standard, lightning-fast text injection for permissive environments.
|
||||
|
||||
### 🛡️ System Integration
|
||||
* **Glassmorphic UI**: A modern, non-intrusive QML interface that respects your screen real estate.
|
||||
* **Tray Agent**: Retracts to the system tray, maintaining a low profile until summoned.
|
||||
* **Bootstrapper**: A self-assembling runtime that provisions its own dependencies using an isolated embedded Python environment. No pollution of your system PATH.
|
||||
* **Precision Engineering**: Leveraging 8-bit quantization (`int8`) to run professional-grade models on consumer hardware with minimal memory footprint.
|
||||
|
||||
---
|
||||
|
||||
## <EFBFBD>️ Usage Guide
|
||||
## 📊 Model Performance
|
||||
|
||||
Choose the engine that matches your hardware capabilities.
|
||||
|
||||
| Model | GPU VRAM (rec.) | CPU RAM (rec.) | Relative Speed | Capability |
|
||||
| :--- | :--- | :--- | :--- | :--- |
|
||||
| **Tiny** | ~500 MB | ~1 GB | Supersonic | Quick commands, simple dictation. |
|
||||
| **Base** | ~600 MB | ~1 GB | Very Fast | Good balance for older hardware. |
|
||||
| **Small** | ~1 GB | ~2 GB | Fast | Standard driver. High accuracy for English. |
|
||||
| **Medium** | ~2 GB | ~4 GB | Moderate | High precision. Great for accents. |
|
||||
| **Large-v3 Turbo** | ~4 GB | ~6 GB | Fast/Mod | **Best Balance.** Near Large accuracy at much higher speeds. |
|
||||
| **Large-v3** | ~5 GB | ~8 GB | Heavy | Professional grade. Near-perfect understanding. |
|
||||
|
||||
*Note: CPU inference is significantly slower than GPU but fully supported via highly optimized vector instructions (AVX2).*
|
||||
|
||||
---
|
||||
|
||||
## 🛠️ Usage Guide
|
||||
|
||||
### Installation
|
||||
1. **Acquire**: Download the latest portable executable from the [Releases](https://git.lashman.live/lashman/whisper_voice/releases) page.
|
||||
@@ -54,39 +57,26 @@ Choose the model that fits your rig. From `Tiny` (low resource, high speed) to `
|
||||
3. **Initialize**: Run the executable. It will autonomously hydrate its runtime environment (approx. 2GB) on the first launch.
|
||||
|
||||
### Operation
|
||||
1. **Configure**: Open Settings via the tray icon. Select your **Model Size** and **Compute Device**.
|
||||
1. **Configure**: Right-click the **System Tray Icon** to open Settings. Select your **Model Size** and **Compute Device**.
|
||||
2. **Engage**: Press `F9` (or your custom hotkey) to open the channel.
|
||||
3. **Dictate**: Speak clearly. The noise gate will isolate your voice.
|
||||
4. **Execute**: Release the key. The machine interprets the signal and injects the text into your active window immediately.
|
||||
|
||||
---
|
||||
|
||||
## 🧪 Model Performance
|
||||
## ⚖️ License & Rights
|
||||
|
||||
| Model | VRAM (Approx) | Speed | Capabilities |
|
||||
| :--- | :--- | :--- | :--- |
|
||||
| **Tiny** | < 1 GB | Supersonic | Quick commands, simple dictation. |
|
||||
| **Base** | 1 GB | Very Fast | Good balance for older hardware. |
|
||||
| **Small** | 2 GB | Fast | Standard daily driver. High English accuracy. |
|
||||
| **Medium** | 5 GB | Moderate | High precision, handles accents well. |
|
||||
| **Large-v3** | 8 GB+ | Heavy | Professional grade. Near-perfect understanding. |
|
||||
**Public Domain (CC0 1.0)**
|
||||
|
||||
*Note: Performance scales with your GPU capabilities.*
|
||||
To the extent possible under law, the creators of this interface have waived all copyright and related or neighboring rights to this work. This tool belongs to the commons.
|
||||
* **Fork it.**
|
||||
* **Mod it.**
|
||||
* **Sell it.**
|
||||
* **Liberate it.**
|
||||
|
||||
---
|
||||
### Acknowledgments
|
||||
While this interface is CC0, it relies on the shoulders of giants:
|
||||
* **OpenAI Whisper Models**: Released under the MIT License.
|
||||
* **Faster-Whisper & CTranslate2**: Released under the MIT License.
|
||||
|
||||
## 🤝 Mutual Aid
|
||||
|
||||
This software is free as in freedom. It is a commons, not a commodity.
|
||||
|
||||
Contributions are welcome from all who share the vision of decentralized, local-first computing. Whether it is code, documentation, or design—labor given freely enriches the community whole.
|
||||
|
||||
**[Fork the Repository](https://git.lashman.live/lashman/whisper_voice)**
|
||||
|
||||
---
|
||||
|
||||
<div align="center">
|
||||
<i>"The master's tools will never dismantle the master's house."</i>
|
||||
<br>
|
||||
<b>Build your own tools. Run them locally.</b>
|
||||
</div>
|
||||
*No gods, no cloud managers.*
|
||||
|
||||
Reference in New Issue
Block a user