Update documentation with new manifesto

Your Name
2026-01-24 17:06:31 +02:00
parent 9ff0e8d108
commit 26f1d8a3e7

186
README.md

@@ -1,165 +1,71 @@
# Whisper Voice - Native Windows AI Transcriber
# Whisper Voice
**Whisper Voice** is a high-performance, native Windows application that brings the power of OpenAI's **Whisper** model to your desktop in a seamless, interactive way.
**Reclaim Your Voice from the Cloud.**
Designed for productivity "power users", it allows you to invoke a global hotkey, dictate your thoughts, and have the transcribed text instantly typed into *any* active application (Notepad, Word, Slack, VS Code, etc.).
Whisper Voice is a high-performance, strictly local speech-to-text tool designed for the desktop. It provides instant, high-accuracy dictation anywhere on your system—no internet connection required, no corporate servers, and absolutely no data harvesting.
It features a modern, floating "Pill" UI with real-time audio visualization, built on top of the robust PySide6 (Qt) framework.
We believe that the tools of production—and communication—should belong to the individual, not rented from centralized tech giants.
---
## ✊ Core Principles
### 1. Total Autonomy (Local-First)
Your voice data is yours alone. Unlike commercial alternatives that siphon your words to remote data centers for processing and profiling, Whisper Voice runs entirely on your hardware. **No masters, no servers.** You retain full sovereignty over your digital footprint.
### 2. Decentralized Power
By leveraging optimized local processing, we strip away the need for reliance on massive, energy-hungry corporate infrastructure. This is technology scaled to the human level—powerful, efficient, and completely under your control.
### 3. Accessible to All
High-quality speech recognition shouldn't be gated behind subscriptions or paywalls. This tool is free, open, and built to empower users to interact with their machines on their own terms.
---
## ✨ Features
- **🎙️ Global Hotkey**: Press **F8** anywhere in Windows to start recording. Press again to stop.
- **🤖 Local AI Intelligence**: Powered by `faster-whisper`. Runs entirely on your machine. No cloud API keys, no data leaving your PC.
- **⚡ High Performance**: Uses the 'Small' Whisper model by default (~500MB), optimized for a balance of speed and accuracy.
- **🎨 Modern UI**: A frameless, draggable, floating "Pill" window with a Neon **Audio Visualizer** that reacts to your voice.
- **🔌 Smart Bootstrapper**: The app is portable and self-healing. On the first run, it checks for the AI model and downloads it automatically if missing.
- **✍️ Auto-Type**: Automatically simulates keyboard input to paste the transcribed text where your cursor is.
- **🔋 Portable**: Can be compiled into a single `.exe` file that you can carry on a USB drive.
* **100% Offline Processing**: Once the recognition engine is downloaded, the cable can be cut. Nothing leaves your machine.
* **Universal Compatibility**: Works in any text field—editors, chat apps, terminals, or browsers. If you can type there, you can speak there.
* **Adaptive Input**:
* *Clipboard Mode*: Standard paste injection.
* *High-Speed Simulation*: Simulates keystrokes at up to 6000 characters per minute (CPM) for apps that block pasting.
* **System Integration**: Minimalist overlay and system tray presence. It exists when you need it and vanishes when you don't.
* **Resource Efficiency**: Optimized to run smoothly on consumer hardware without monopolizing your system resources.
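
The 6000 CPM ceiling mentioned for High-Speed Simulation can be turned into a per-keystroke delay. The following is a minimal sketch of that arithmetic, assuming CPM means characters per minute; `keystroke_delay` and `typing_time` are hypothetical helpers, not functions from this project.

```python
def keystroke_delay(cpm: int) -> float:
    """Return the pause in seconds between keystrokes for a given CPM rate."""
    if cpm <= 0:
        raise ValueError("cpm must be positive")
    return 60.0 / cpm


def typing_time(text: str, cpm: int) -> float:
    """Estimate how long it takes to inject `text` at the given CPM rate."""
    return len(text) * keystroke_delay(cpm)


if __name__ == "__main__":
    # At 6000 CPM each keystroke gets a 10 ms slot.
    print(f"{keystroke_delay(6000) * 1000:.1f} ms per keystroke")
    print(f"{typing_time('x' * 300, 6000):.1f} s for 300 characters")
```

At the maximum rate this works out to roughly 100 characters per second, so a short dictated sentence would be injected in about a second.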
---
## 🛠️ Requirements
## 🚀 Getting Started
- **OS**: Windows 10 or 11 (64-bit).
- **Python**: 3.10 or newer (if running from source).
- **Hardware**: A modern CPU (for example, Intel Core i5 or AMD Ryzen). An NVIDIA GPU is recommended for the fastest transcription (requires CUDA setup), but the app runs fine on CPU alone.
- **Dependencies**:
- **FFmpeg**: Essential for audio processing. (See Setup Guide).
### Installation
1. Download the latest release.
2. Run `WhisperVoice.exe`.
3. On the first run, the bootstrapper will autonomously provision the necessary runtime environment. This ensures your system remains clean and dependencies are self-contained.
### Usage
1. **Set Your Trigger**: Configure a global hotkey (default: `F9`) in the settings.
2. **Speak Freely**: Hold the hotkey (or toggle it) and speak.
3. **Direct Action**: Your words are instantly transcribed and injected into your active window.
---
## 🚀 Installation & Setup
## ⚙️ Configuration
### Option A: Running from Source (Developers)
The **Settings** panel puts the means of configuration in your hands:
1. **Clone the Repository**:
```bash
git clone https://github.com/your/repo.git
cd whisper_voice
```
2. **Environment Setup**:
It is highly recommended to use a virtual environment.
```cmd
python -m venv venv
venv\Scripts\activate
```
3. **Install Python Dependencies**:
```cmd
pip install -r requirements.txt
```
4. **FFmpeg Setup**:
- **Method 1 (System-wide)**: Download FFmpeg and add the `bin` folder to your Windows PATH environment variable.
- **Method 2 (Portable)**: Download `ffmpeg.exe` and place it in a `libs` folder inside the project root:
```text
whisper_voice/
├── main.py
├── libs/
│ └── ffmpeg.exe <-- Place here
```
5. **Run the App**:
```cmd
python main.py
```
*Or use the provided `run_source.bat` script.*
### Option B: Building a Portable EXE
You can compile the application into a single executable file for easy distribution.
1. Follow the **Running from Source** steps above to set up your environment.
2. Install `pyinstaller`:
```cmd
pip install pyinstaller
```
3. Run the Build Script:
```cmd
build_exe.bat
```
*(Or run `pyinstaller build.spec` manually).*
4. **Locate the EXE**:
The result will be in the `dist` folder: `dist/WhisperVoice.exe`.
5. **Distribution**:
- You can send just the `.exe` to anyone.
- **Note**: The end-user will still need FFmpeg. You can zip the `libs` folder alongside the EXE to make it truly "unzip and run".
* **Recognition Engine**: Choose the size of the model that fits your hardware capabilities (Tiny to Large). Larger models offer greater precision but require more computing power.
* **Input Method**: Switch between "Clipboard Paste" and "Simulate Typing" depending on target application restrictions.
* **Typing Speed**: Adjust the keystroke injection rate. Crank it up to 6000 CPM for instant text delivery.
* **Run on Startup**: Configure the agent to be ready the moment your session begins.
---
## 🎮 Usage Guide
## 🤝 Mutual Aid
1. **First Run Initialization**:
- When you launch the app, you will see an **"Initializing..."** window.
- If the AI Model (`models/` folder) is missing, the app will automatically download it (~500MB).
- Once complete, the app minimizes to the System Tray.
This project thrives on community collaboration. If you have improvements, fixes, or ideas, you are encouraged to contribute. We build better systems when we build them together, horizontally and transparently.
2. **Dictation**:
- Focus the text field where you want to type (e.g., click into a Notepad document).
- Press **F8**.
- The **Floating Pill** appears on screen. Use the visualizer to confirm it hears you.
- Speak your sentence.
- Press **F8** again to stop.
- The Pill turns **Blue** ("Thinking...").
- Wait a moment... the text will appear!
3. **System Tray**:
- Look for the application icon in your taskbar tray (near the clock).
- Right-click -> **Quit Whisper Voice** to exit the application completely.
* **Report Issues**: If something breaks, let us know.
* **Contribute Code**: The source is open. Fork it, improve it, share it.
---
## 📁 Project Structure
```text
whisper_voice/
├── main.py # Application Entry Point & Orchestrator
├── task.md # Development Task Tracking
├── requirements.txt # Python Dependencies
├── build.spec # PyInstaller Configuration
├── run_source.bat # Helper script
├── build_exe.bat # Helper script
├── src/
│ ├── core/
│ │ ├── audio_engine.py # Microphone recording logic
│ │ ├── transcriber.py # AI Model wrapper (Faster-Whisper)
│ │ ├── hotkey_manager.py # Global keyboard hooks
│ │ └── paths.py # Path resolution (EXE vs Script)
│ ├── ui/
│ │ ├── overlay.py # Main Pill Window
│ │ ├── visualizer.py # Audio Spectrum Widget
│ │ ├── loader.py # Bootstrapper/Downloader UI
│ │ └── tray.py # System Tray Icon
│ └── utils/
│ ├── injector.py # Clipboard/Paste logic
│ └── downloader.py # File download utilities
```
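
The `paths.py` entry in the tree above is described as handling path resolution for EXE versus script runs. One common way to do this under PyInstaller is to check the `sys.frozen` attribute. The sketch below is an assumption about the general approach, not the project's actual code, and `app_base_dir` is a hypothetical name.

```python
import sys
from pathlib import Path
from typing import Optional


def app_base_dir(script_file: Optional[str] = None) -> Path:
    """Return the folder that holds the app, whether frozen or run from source."""
    if getattr(sys, "frozen", False):
        # PyInstaller sets sys.frozen; sys.executable is then the bundled EXE.
        return Path(sys.executable).resolve().parent
    # Running from source: use the given script, or the one that was launched.
    return Path(script_file or sys.argv[0]).resolve().parent


if __name__ == "__main__":
    base = app_base_dir(__file__)
    print("models folder would be:", base / "models")
    print("libs folder would be:", base / "libs")
```

Resolving relative to the executable rather than the current working directory keeps folders such as `models` and `libs` next to the app even when it is started from a shortcut or another directory.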
---
## ❓ Troubleshooting
**Q: Nothing happens when I press F8.**
- Check the System Tray to ensure the app is running.
- Ensure you have given the app "Input Monitoring" permissions if prompted (rare on standard Windows).
- Some Antivirus software might block the "Global Hotkey" feature. Whitelist the app.
**Q: The app crashes with an error about FFmpeg.**
- `faster-whisper` requires FFmpeg. Make sure `ffmpeg.exe` is either on your system PATH or in a `libs` folder next to `main.py` (or next to the EXE).
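
To check which of the two locations is actually being picked up, a lookup along these lines can help. This is a sketch only: `find_ffmpeg` is a hypothetical helper, and the search order (bundled `libs` folder first, then PATH) is an assumption rather than the app's documented behavior.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_ffmpeg(app_dir: Path) -> Optional[Path]:
    """Look for FFmpeg in app_dir/libs first, then on the system PATH."""
    for name in ("ffmpeg.exe", "ffmpeg"):
        bundled = app_dir / "libs" / name
        if bundled.is_file():
            return bundled
    # shutil.which returns None when the executable is not on PATH.
    on_path = shutil.which("ffmpeg")
    return Path(on_path) if on_path else None


if __name__ == "__main__":
    location = find_ffmpeg(Path.cwd())
    print(location if location else "FFmpeg not found in libs/ or on PATH")
```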
**Q: Transcription is slow.**
- The "Small" model is generally fast, but on older CPUs, it might take 2-5 seconds for a long sentence.
- To use a GPU, you must install the NVIDIA cuDNN libraries and the `torch` version with CUDA support. This prototype setup defaults to CPU/Auto for compatibility.
**Q: "Failed to load model" error.**
- Delete the `models` folder and restart the app to force a re-download.
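
The manual reset described above can also be scripted. The sketch below assumes the app treats a missing or empty `models` folder as "model not downloaded"; the exact check the bootstrapper performs is not documented here, and both function names are hypothetical.

```python
import shutil
from pathlib import Path


def model_is_present(models_dir: Path) -> bool:
    """Treat the model as present only if the folder exists and is not empty."""
    return models_dir.is_dir() and any(models_dir.iterdir())


def reset_models(models_dir: Path) -> None:
    """Delete the models folder so the next launch can download a fresh copy."""
    shutil.rmtree(models_dir, ignore_errors=True)


if __name__ == "__main__":
    models = Path.cwd() / "models"
    print("model present:", model_is_present(models))
```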
---
**License**: MIT
**Author**: Antigravity
*Built with local processing libraries and Qt.*
*No gods, no cloud managers.*