diff --git a/README.md b/README.md
index 61ce846..78c4439 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,8 @@
 *A high-performance, locally-run dictation agent for the liberated desktop.*
-[Download Release](https://git.lashman.live/lashman/whisper_voice/releases) • [View Source](https://git.lashman.live/lashman/whisper_voice) • [Report Issue](https://git.lashman.live/lashman/whisper_voice/issues)
+[![Download](https://img.shields.io/github/v/release/lashman/whisper_voice?label=Download&style=for-the-badge&color=2563eb)](https://git.lashman.live/lashman/whisper_voice/releases/latest)
+[![License](https://img.shields.io/badge/License-CC0_1.0-lightgrey?style=for-the-badge)](https://creativecommons.org/publicdomain/zero/1.0/)
@@ -23,30 +24,32 @@ Whisper Voice is built on the principle of **technological sovereignty**. It pro
 ## ⚡ Technical Core
-Under the hood, Whisper Voice exploits the raw power of **Faster-Whisper**, a highly optimized implementation of OpenAI's Whisper model using CTranslate2. This delivers:
+Under the hood, Whisper Voice exploits the raw power of **[Faster-Whisper](https://github.com/SYSTRAN/faster-whisper)**, a hyper-optimized implementation of OpenAI's Whisper model using CTranslate2. This delivers:
 * **Zero Latency Loop**: By eliminating network round-trips, transcription happens as fast as your hardware can think.
 * **Privacy by Physics**: Data physically cannot leave your machine because the engine has no cloud uplink. The cable is cut.
-* **Precision Engineering**: Leveraging 8-bit quantization to run even the `Large-v3` models on consumer GPUs with minimal memory footprint.
-
-## ✨ Capabilities
-
-### 🧠 Adaptive Intelligence
-Choose the model that fits your rig. From `Tiny` (low resource, high speed) to `Large` (human-level accuracy). The agent automatically configures itself for your available Compute Device (CUDA GPU or CPU).
-
-### 🚀 Inputs & Injection
-* **Global Hotkey**: A rigorous system-wide hook (default `F9`) puts the ear of the machine at your fingertips.
-* **Simulated Typing**: Bylaws of some applications block pasting? No problem. Our engine simulates keystrokes at supersonic speeds (up to **6000 CPM**), bypassing restrictions like water flowing around a rock.
-* **Clipboard Mode**: Standard, lightning-fast text injection for permissive environments.
-
-### 🛡️ System Integration
-* **Glassmorphic UI**: A modern, non-intrusive QML interface that respects your screen real estate.
-* **Tray Agent**: Retracts to the system tray, maintaining a low profile until summoned.
-* **Bootstrapper**: A self-assembling runtime that provisions its own dependencies using an isolated embedded Python environment. No pollution of your system PATH.
+* **Precision Engineering**: Leveraging 8-bit quantization (`int8`) to run professional-grade models on consumer hardware with minimal memory footprint.
 ---
-## �️ Usage Guide
+## 📊 Model Performance
+
+Choose the engine that matches your hardware capabilities.
+
+| Model | GPU VRAM (rec.) | CPU RAM (rec.) | Relative Speed | Capability |
+| :--- | :--- | :--- | :--- | :--- |
+| **Tiny** | ~500 MB | ~1 GB | Supersonic | Quick commands, simple dictation. |
+| **Base** | ~600 MB | ~1 GB | Very Fast | Good balance for older hardware. |
+| **Small** | ~1 GB | ~2 GB | Fast | Standard daily driver. High accuracy for English. |
+| **Medium** | ~2 GB | ~4 GB | Moderate | High precision. Great for accents. |
+| **Large-v3 Turbo** | ~4 GB | ~6 GB | Fast/Moderate | **Best balance.** Near-Large accuracy at much higher speed. |
+| **Large-v3** | ~5 GB | ~8 GB | Heavy | Professional grade. Near-perfect understanding. |
+
+*Note: CPU inference is significantly slower than GPU but fully supported via highly optimized vector instructions (AVX2).*
+
+---
+
+## 🛠️ Usage Guide
 ### Installation
 1. **Acquire**: Download the latest portable executable from the [Releases](https://git.lashman.live/lashman/whisper_voice/releases) page.
@@ -54,39 +57,26 @@ Choose the model that fits your rig. From `Tiny` (low resource, high speed) to `
 3. **Initialize**: Run the executable. It will autonomously hydrate its runtime environment (approx. 2GB) on the first launch.
 ### Operation
-1. **Configure**: Open Settings via the tray icon. Select your **Model Size** and **Compute Device**.
+1. **Configure**: Right-click the **System Tray Icon** to open Settings. Select your **Model Size** and **Compute Device**.
 2. **Engage**: Press `F9` (or your custom hotkey) to open the channel.
 3. **Dictate**: Speak clearly. The noise gate will isolate your voice.
 4. **Execute**: Release the key. The machine interprets the signal and injects the text into your active window immediately.
 ---
-## 🧪 Model Performance
+## ⚖️ License & Rights
-| Model | VRAM (Approx) | Speed | Capabilities |
-| :--- | :--- | :--- | :--- |
-| **Tiny** | < 1 GB | Supersonic | Quick commands, simple dictation. |
-| **Base** | 1 GB | Very Fast | Good balance for older hardware. |
-| **Small** | 2 GB | Fast | Standard daily driver. High English accuracy. |
-| **Medium** | 5 GB | Moderate | High precision, handles accents well. |
-| **Large-v3** | 8 GB+ | Heavy | Professional grade. Near-perfect understanding. |
+**Public Domain (CC0 1.0)**
-*Note: Performance scales with your GPU capabilities.*
+To the extent possible under law, the creators of this interface have waived all copyright and related or neighboring rights to this work. This tool belongs to the commons.
+* **Fork it.**
+* **Mod it.**
+* **Sell it.**
+* **Liberate it.**
----
+### Acknowledgments
+While this interface is CC0, it relies on the shoulders of giants:
+* **OpenAI Whisper Models**: Released under the MIT License.
+* **Faster-Whisper & CTranslate2**: Released under the MIT License.
-## 🤝 Mutual Aid
-
-This software is free as in freedom. It is a commons, not a commodity.
-
-Contributions are welcome from all who share the vision of decentralized, local-first computing. Whether it is code, documentation, or design—labor given freely enriches the community whole.
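The Model Performance table reads as a simple selection rule, and the `int8` engine it feeds is one constructor call in Faster-Whisper. A minimal sketch in Python: `recommended_model` and `load_engine` are hypothetical helper names (not the agent's actual code), the VRAM thresholds are transcribed from the table, and the model strings must match identifiers your installed faster-whisper version accepts:

```python
def recommended_model(vram_gb: float) -> str:
    """Pick the largest model whose recommended GPU VRAM fits the budget."""
    # (model, recommended VRAM in GB), best first -- values from the table.
    table = [
        ("large-v3", 5.0),
        ("large-v3-turbo", 4.0),
        ("medium", 2.0),
        ("small", 1.0),
        ("base", 0.6),
        ("tiny", 0.5),
    ]
    for name, needed_gb in table:
        if vram_gb >= needed_gb:
            return name
    return "tiny"  # fallback: the smallest engine runs almost anywhere


def load_engine(size: str, device: str = "cuda"):
    """Load a quantized engine; model weights are fetched on first use."""
    from faster_whisper import WhisperModel  # the inference backend named above

    # int8 quantization is what keeps memory inside the table's numbers.
    return WhisperModel(size, device=device, compute_type="int8")
```

For example, `recommended_model(4.0)` picks `"large-v3-turbo"`, the table's best-balance row, while `recommended_model(0.4)` falls back to `"tiny"`.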
-
-**[Fork the Repository](https://git.lashman.live/lashman/whisper_voice)**
-
 ---
-
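The Operation steps (press `F9`, dictate, release, inject) amount to a small press-to-talk state machine. A sketch of that control flow, with hotkey and audio capture stubbed out as callbacks; every name here is illustrative, not the agent's real API:

```python
class DictationAgent:
    """Press-to-talk cycle: hotkey down -> record; hotkey up -> transcribe + inject."""

    def __init__(self, transcribe, inject):
        self.transcribe = transcribe  # audio bytes -> text (e.g. a Whisper engine)
        self.inject = inject          # text -> active window (typing or clipboard)
        self.recording = False
        self.buffer = bytearray()

    def on_hotkey_press(self):
        self.recording = True
        self.buffer.clear()

    def on_audio_chunk(self, chunk: bytes):
        if self.recording:  # a noise gate would filter chunks here
            self.buffer.extend(chunk)

    def on_hotkey_release(self) -> str:
        self.recording = False
        text = self.transcribe(bytes(self.buffer))
        self.inject(text)
        return text


# Wiring with stand-in callbacks:
typed = []
agent = DictationAgent(transcribe=lambda audio: f"<{len(audio)} bytes>",
                       inject=typed.append)
agent.on_hotkey_press()
agent.on_audio_chunk(b"\x00" * 4)
print(agent.on_hotkey_release())  # <4 bytes>
```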
-"The master's tools will never dismantle the master's house."
-Build your own tools. Run them locally.
+*No gods, no cloud managers.*
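The Simulated Typing bullet quotes a 6000 CPM ceiling, which works out to a 10 ms gap between injected keystrokes. A sketch of the pacing arithmetic (function names are illustrative, not the agent's API):

```python
def keystroke_delay(cpm: int = 6000) -> float:
    """Seconds between simulated keystrokes at a given characters-per-minute rate."""
    return 60.0 / cpm


def injection_seconds(text: str, cpm: int = 6000) -> float:
    """Rough wall-clock time to type `text` into the focused window."""
    return len(text) * keystroke_delay(cpm)


# 6000 CPM -> 100 characters per second -> 0.01 s per keystroke,
# so a 300-character paragraph is injected in about 3 seconds.
print(keystroke_delay())             # 0.01
print(injection_seconds("a" * 300))  # 3.0
```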