diff --git a/README.md b/README.md
index c652241..b85fbc1 100644
--- a/README.md
+++ b/README.md
@@ -1,165 +1,71 @@
-# Whisper Voice - Native Windows AI Transcriber
+# Whisper Voice
 
-**Whisper Voice** is a high-performance, native Windows application that brings the power of OpenAI's **Whisper** model to your desktop in a seamless, interactive way.
+**Reclaim Your Voice from the Cloud.**
 
-Designed for productivity "power users", it allows you to invoke a global hotkey, dictate your thoughts, and have the transcribed text instantly typed into *any* active application (Notepad, Word, Slack, VS Code, etc.).
+Whisper Voice is a high-performance, strictly local speech-to-text tool designed for the desktop. It provides instant, high-accuracy dictation anywhere on your system—no internet connection required, no corporate servers, and absolutely no data harvesting.
 
-It features a modern, floating "Pill" UI with real-time audio visualization, built on top of the robust PySide6 (Qt) framework.
+We believe that the tools of production—and communication—should belong to the individual, not rented from centralized tech giants.
+
+---
+
+## ✊ Core Principles
+
+### 1. Total Autonomy (Local-First)
+Your voice data is yours alone. Unlike commercial alternatives that siphon your words to remote data centers for processing and profiling, Whisper Voice runs entirely on your hardware. **No masters, no servers.** You retain full sovereignty over your digital footprint.
+
+### 2. Decentralized Power
+By leveraging optimized local processing, we strip away the need for reliance on massive, energy-hungry corporate infrastructure. This is technology scaled to the human level—powerful, efficient, and completely under your control.
+
+### 3. Accessible to All
+High-quality speech recognition shouldn't be gated behind subscriptions or paywalls. This tool is free, open, and built to empower users to interact with their machines on their own terms.
 
 ---
 
 ## ✨ Features
 
-- **🎙️ Global Hotkey**: Press **F8** anywhere in Windows to start recording. Press again to stop.
-- **🤖 Local AI Intelligence**: Powered by `faster-whisper`. Runs entirely on your machine. No cloud API keys, no data leaving your PC.
-- **⚡ High Performance**: Uses the 'Small' Whisper model by default (~500MB), optimized for a balance of speed and accuracy.
-- **🎨 Modern UI**: A frameless, draggable, floating "Pill" window with a Neon **Audio Visualizer** that reacts to your voice.
-- **🔌 Smart Bootstrapper**: The app is portable and self-healing. On the first run, it checks for the AI model and downloads it automatically if missing.
-- **✍️ Auto-Type**: Automatically simulates keyboard input to paste the transcribed text where your cursor is.
-- **🔋 Portable**: Can be compiled into a single `.exe` file that you can carry on a USB drive.
+* **100% Offline Processing**: Once the recognition engine is downloaded, the cable can be cut. Nothing leaves your machine.
+* **Universal Compatibility**: Works in any text field—editors, chat apps, terminals, or browsers. If you can type there, you can speak there.
+* **Adaptive Input**:
+    * *Clipboard Mode*: Standard paste injection.
+    * *High-Speed Simulation*: Simulates keystrokes at supersonic speeds (up to 6000 CPM) for apps that block pasting.
+* **System Integration**: Minimalist overlay and system tray presence. It exists when you need it and vanishes when you don't.
+* **Resource Efficiency**: Optimized to run smoothly on consumer hardware without monopolizing your system resources.
 
 ---
 
-## 🛠️ Requirements
+## 🚀 Getting Started
 
-- **OS**: Windows 10 or 11 (64-bit).
-- **Python**: 3.10 or newer (if running from source).
-- **Hardware**: A reasonable CPU (Modern Intel i5/AMD Ryzen). NVIDIA GPU recommended for instant speed (requires CUDA setup), but runs fine on CPU.
-- **Dependencies**:
-  - **FFmpeg**: Essential for audio processing. (See Setup Guide).
+### Installation
+1. Download the latest release.
+2. Run `WhisperVoice.exe`.
+3. On the first run, the bootstrapper will autonomously provision the necessary runtime environment. This ensures your system remains clean and dependencies are self-contained.
+
+### Usage
+1. **Set Your Trigger**: Configure a global hotkey (default: `F9`) in the settings.
+2. **Speak Freely**: Hold the hotkey (or toggle it) and speak.
+3. **Direct Action**: Your words are instantly transcribed and injected into your active window.
 
 ---
 
-## 🚀 Installation & Setup
+## ⚙️ Configuration
 
-### Option A: Running from Source (Developers)
+The **Settings** panel puts the means of configuration in your hands:
 
-1. **Clone the Repository**:
-   ```bash
-   git clone https://github.com/your/repo.git
-   cd whisper_voice
-   ```
-
-2. **Environment Setup**:
-   It is highly recommended to use a virtual environment.
-   ```cmd
-   python -m venv venv
-   venv\Scripts\activate
-   ```
-
-3. **Install Python Dependencies**:
-   ```cmd
-   pip install -r requirements.txt
-   ```
-
-4. **FFmpeg Setup**:
-   - **Method 1 (System-wide)**: Download FFmpeg and add the `bin` folder to your Windows PATH environment variable.
-   - **Method 2 (Portable)**: Download `ffmpeg.exe` and place it in a `libs` folder inside the project root:
-     ```text
-     whisper_voice/
-     ├── main.py
-     ├── libs/
-     │   └── ffmpeg.exe  <-- Place here
-     ```
-
-5. **Run the App**:
-   ```cmd
-   python main.py
-   ```
-   *Or use the provided `run_source.bat` script.*
-
-### Option B: Building a Portable EXE
-
-You can compile the application into a single executable file for easy distribution.
-
-1. Follow the **Running from Source** steps above to set up your environment.
-2. Install `pyinstaller`:
-   ```cmd
-   pip install pyinstaller
-   ```
-3. Run the Build Script:
-   ```cmd
-   build_exe.bat
-   ```
-   *(Or run `pyinstaller build.spec` manually).*
-
-4. **Locate the EXE**:
-   The result will be in the `dist` folder: `dist/WhisperVoice.exe`.
-
-5. **Distribution**:
-   - You can send just the `.exe` to anyone.
-   - **Note**: The end-user will still need FFmpeg. You can zip the `libs` folder alongside the EXE to make it truly "unzip and run".
+* **Recognition Engine**: Choose the size of the model that fits your hardware capabilities (Tiny to Large). Larger models offer greater precision but require more computing power.
+* **Input Method**: Switch between "Clipboard Paste" and "Simulate Typing" depending on target application restrictions.
+* **Typing Speed**: Adjust the keystroke injection rate. Crank it up to 6000 CPM for instant text delivery.
+* **Run on Startup**: Configure the agent to be ready the moment your session begins.
 
 ---
 
-## 🎮 Usage Guide
+## 🤝 Mutual Aid
 
-1. **First Run Initialization**:
-   - When you launch the app, you will see a **"Initializing..."** window.
-   - If the AI Model (`models/` folder) is missing, the app will automatically download it (~500MB).
-   - Once complete, the app minimizes to the System Tray.
+This project thrives on community collaboration. If you have improvements, fixes, or ideas, you are encouraged to contribute. We build better systems when we build them together, horizontally and transparently.
 
-2. **Dictation**:
-   - Focus the text field where you want to type (e.g., click into a Notepad document).
-   - Press **F8**.
-   - The **Floating Pill** appears on screen. Use the visualizer to confirm it hears you.
-   - Speak your sentence.
-   - Press **F8** again to stop.
-   - The Pill turns **Blue** ("Thinking...").
-   - Wait a moment... the text will appear!
-
-3. **System Tray**:
-   - Look for the application icon in your taskbar tray (near the clock).
-   - Right-click -> **Quit Whisper Voice** to exit the application completely.
+* **Report Issues**: If something breaks, let us know.
+* **Contribute Code**: The source is open. Fork it, improve it, share it.
 
 ---
 
-## 📁 Project Structure
-
-```text
-whisper_voice/
-├── main.py                     # Application Entry Point & Orchestrator
-├── task.md                     # Development Task Tracking
-├── requirements.txt            # Python Dependencies
-├── build.spec                  # PyInstaller Configuration
-├── run_source.bat              # Helper script
-├── build_exe.bat               # Helper script
-├── src/
-│   ├── core/
-│   │   ├── audio_engine.py     # Microphone recording logic
-│   │   ├── transcriber.py      # AI Model wrapper (Faster-Whisper)
-│   │   ├── hotkey_manager.py   # Global keyboard hooks
-│   │   └── paths.py            # Path resolution (EXE vs Script)
-│   ├── ui/
-│   │   ├── overlay.py          # Main Pill Window
-│   │   ├── visualizer.py       # Audio Spectrum Widget
-│   │   ├── loader.py           # Bootstrapper/Downloader UI
-│   │   └── tray.py             # System Tray Icon
-│   └── utils/
-│       ├── injector.py         # Clipboard/Paste logic
-│       └── downloader.py       # File download utilities
-```
-
----
-
-## ❓ Troubleshooting
-
-**Q: Nothing happens when I press F8.**
-- Check the System Tray to ensure the app is running.
-- Ensure you have given the app "Input Monitoring" permissions if prompted (rare on standard Windows).
-- Some Antivirus software might block the "Global Hotkey" feature. Whitelist the app.
-
-**Q: The app crashes with an error about FFmpeg.**
-- `faster-whisper` requires FFmpeg. Make sure `ffmpeg.exe` is either in your system PATH or in a `libs` folder next to the `main.py` (or EXE).
-
-**Q: Transcription is slow.**
-- The "Small" model is generally fast, but on older CPUs, it might take 2-5 seconds for a long sentence.
-- To use a GPU, you must install the NVIDIA cuDNN libraries and the `torch` version with CUDA support. This prototype setup defaults to CPU/Auto for compatibility.
-
-**Q: "Failed to load model" error.**
-- Delete the `models` folder and restart the app to force a re-download.
-
----
-
-**License**: MIT
-**Author**: Antigravity
+*Built with local processing libraries and Qt.*
+*No gods, no cloud managers.*