diff --git a/README.md b/README.md
index b85fbc1..61ce846 100644
--- a/README.md
+++ b/README.md
@@ -1,71 +1,92 @@
-# Whisper Voice
+

-**Reclaim Your Voice from the Cloud.**
+# WHISPER VOICE
+### SOVEREIGN SPEECH RECOGNITION

-Whisper Voice is a high-performance, strictly local speech-to-text tool designed for the desktop. It provides instant, high-accuracy dictation anywhere on your system—no internet connection required, no corporate servers, and absolutely no data harvesting.
+

-We believe that the tools of production—and communication—should belong to the individual, not rented from centralized tech giants.
+**Your Voice. Your Machine. Your Data.**
+
+*A high-performance, locally run dictation agent for the liberated desktop.*
+
+[Download Release](https://git.lashman.live/lashman/whisper_voice/releases) • [View Source](https://git.lashman.live/lashman/whisper_voice) • [Report Issue](https://git.lashman.live/lashman/whisper_voice/issues)
+
+

 ---

-## ✊ Core Principles
+## ✊ The Manifesto

-### 1. Total Autonomy (Local-First)
-Your voice data is yours alone. Unlike commercial alternatives that siphon your words to remote data centers for processing and profiling, Whisper Voice runs entirely on your hardware. **No masters, no servers.** You retain full sovereignty over your digital footprint.
+**We hold these truths to be self-evident: That user data is an extension of the self, and its exploitation by centralized clouds is a violation of digital autonomy.**

-### 2. Decentralized Power
-By leveraging optimized local processing, we strip away the need for reliance on massive, energy-hungry corporate infrastructure. This is technology scaled to the human level—powerful, efficient, and completely under your control.
+Whisper Voice is built on the principle of **technological sovereignty**. It provides state-of-the-art speech recognition without renting your cognitive output to corporate oligarchies. By running entirely on your own hardware, it reclaims the means of digital production, ensuring that your words remain exclusively yours.

-### 3. Accessible to All
-High-quality speech recognition shouldn't be gated behind subscriptions or paywalls. This tool is free, open, and built to empower users to interact with their machines on their own terms.
+## ⚡ Technical Core
+
+Under the hood, Whisper Voice runs on **Faster-Whisper**, a reimplementation of OpenAI's Whisper model built on CTranslate2, a fast inference engine for Transformer models. This delivers:
+
+* **Zero Network Latency**: With no round-trips to a remote server, transcription runs as fast as your hardware allows.
+* **Privacy by Physics**: Data physically cannot leave your machine because the engine has no cloud uplink. The cable is cut.
+* **Precision Engineering**: 8-bit quantization lets even the `Large-v3` model run on consumer GPUs with a reduced memory footprint.
+
+## ✨ Capabilities
+
+### 🧠 Adaptive Intelligence
+Choose the model that fits your rig, from `Tiny` (low resource, high speed) to `Large` (human-level accuracy). The agent automatically configures itself for your available **Compute Device** (CUDA GPU or CPU).
+
+### 🚀 Inputs & Injection
+* **Global Hotkey**: A system-wide hook (default `F9`) puts the ear of the machine at your fingertips.
+* **Simulated Typing**: Does an application block pasting? No problem. The engine simulates keystrokes at supersonic speeds (up to **6000 CPM**, characters per minute), flowing around restrictions like water around a rock.
+* **Clipboard Mode**: Standard, lightning-fast paste injection for applications that allow it.
+
+### 🛡️ System Integration
+* **Glassmorphic UI**: A modern, non-intrusive QML interface that respects your screen real estate.
+* **Tray Agent**: Retracts to the system tray, maintaining a low profile until summoned.
+* **Bootstrapper**: A self-assembling runtime that provisions its own dependencies into an isolated, embedded Python environment, leaving your system PATH untouched.

 ---

-## ✨ Features
-
-* **100% Offline Processing**: Once the recognition engine is downloaded, the cable can be cut. Nothing leaves your machine.
-* **Universal Compatibility**: Works in any text field—editors, chat apps, terminals, or browsers. If you can type there, you can speak there.
-* **Adaptive Input**:
-    * *Clipboard Mode*: Standard paste injection.
-    * *High-Speed Simulation*: Simulates keystrokes at supersonic speeds (up to 6000 CPM) for apps that block pasting.
-* **System Integration**: Minimalist overlay and system tray presence. It exists when you need it and vanishes when you don't.
-* **Resource Efficiency**: Optimized to run smoothly on consumer hardware without monopolizing your system resources.
-
----
-
-## 🚀 Getting Started
+## Usage Guide

 ### Installation
-1. Download the latest release.
-2. Run `WhisperVoice.exe`.
-3. On the first run, the bootstrapper will autonomously provision the necessary runtime environment. This ensures your system remains clean and dependencies are self-contained.
+1. **Acquire**: Download the latest portable executable from the [Releases](https://git.lashman.live/lashman/whisper_voice/releases) page.
+2. **Deploy**: Place `WhisperVoice.exe` in a directory of your choosing.
+3. **Initialize**: Run the executable. On first launch, it autonomously provisions its runtime environment (approx. 2 GB).

-### Usage
-1. **Set Your Trigger**: Configure a global hotkey (default: `F9`) in the settings.
-2. **Speak Freely**: Hold the hotkey (or toggle it) and speak.
-3. **Direct Action**: Your words are instantly transcribed and injected into your active window.
+### Operation
+1. **Configure**: Open Settings via the tray icon. Select your **Model Size** and **Compute Device**.
+2. **Engage**: Hold `F9` (or your custom hotkey) to open the channel.
+3. **Dictate**: Speak clearly. The noise gate filters out low-level background noise.
+4. **Execute**: Release the key. The engine transcribes your speech and immediately injects the text into your active window.

 ---

-## ⚙️ Configuration
+## 🧪 Model Performance

-The **Settings** panel puts the means of configuration in your hands:
+| Model | VRAM (approx.) | Speed | Best For |
+| :--- | :--- | :--- | :--- |
+| **Tiny** | < 1 GB | Supersonic | Quick commands, simple dictation. |
+| **Base** | 1 GB | Very Fast | Good balance for older hardware. |
+| **Small** | 2 GB | Fast | Standard daily driver. High English accuracy. |
+| **Medium** | 5 GB | Moderate | High precision, handles accents well. |
+| **Large-v3** | 8 GB+ | Heavy | Professional grade. Near-perfect understanding. |

-* **Recognition Engine**: Choose the size of the model that fits your hardware capabilities (Tiny to Large). Larger models offer greater precision but require more computing power.
-* **Input Method**: Switch between "Clipboard Paste" and "Simulate Typing" depending on target application restrictions.
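+For those who want to look under the hood: a minimal sketch of loading one of these models with the `faster-whisper` Python package directly, outside Whisper Voice. It assumes `faster-whisper` is installed and a CUDA GPU is available; `dictation.wav` is a placeholder file name, and the settings shown are illustrative, not necessarily what Whisper Voice itself uses.
+
+```python
+from faster_whisper import WhisperModel
+
+# "large-v3" matches the last row of the table. "int8_float16" quantizes the
+# weights to 8-bit integers and runs the remaining layers in float16.
+# On a machine without a CUDA GPU, use device="cpu", compute_type="int8".
+model = WhisperModel("large-v3", device="cuda", compute_type="int8_float16")
+
+# transcribe() returns a lazy generator of segments plus metadata about the audio;
+# decoding happens as the generator is consumed.
+segments, info = model.transcribe("dictation.wav", beam_size=5)
+print(f"Detected language: {info.language}")
+print(" ".join(segment.text.strip() for segment in segments))
+```
+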
-* **Typing Speed**: Adjust the keystroke injection rate. Crank it up to 6000 CPM for instant text delivery.
-* **Run on Startup**: Configure the agent to be ready the moment your session begins.
+*Note: Performance scales with your GPU capabilities.*

 ---

 ## 🤝 Mutual Aid

-This project thrives on community collaboration. If you have improvements, fixes, or ideas, you are encouraged to contribute. We build better systems when we build them together, horizontally and transparently.
+This software is free as in freedom. It is a commons, not a commodity.

-* **Report Issues**: If something breaks, let us know.
-* **Contribute Code**: The source is open. Fork it, improve it, share it.
+Contributions are welcome from all who share the vision of decentralized, local-first computing. Whether it is code, documentation, or design, labor given freely enriches the whole community.
+
+**[Fork the Repository](https://git.lashman.live/lashman/whisper_voice)**

 ---

-*Built with local processing libraries and Qt.*
-*No gods, no cloud managers.*
+
+"The master's tools will never dismantle the master's house."
+
+Build your own tools. Run them locally.
+