# Whisper Voice - Native Windows AI Transcriber

**Whisper Voice** is a high-performance, native Windows application that brings OpenAI's **Whisper** model to your desktop. Designed for productivity "power users", it lets you press a global hotkey, dictate your thoughts, and have the transcribed text typed into *any* active application (Notepad, Word, Slack, VS Code, etc.).

It features a modern, floating "Pill" UI with real-time audio visualization, built on the PySide6 (Qt) framework.

---

## ✨ Features

- **🎙️ Global Hotkey**: Press **F8** anywhere in Windows to start recording. Press it again to stop.
- **🤖 Local AI Intelligence**: Powered by `faster-whisper`. Runs entirely on your machine. No cloud API keys, no data leaving your PC.
- **⚡ High Performance**: Uses the 'Small' Whisper model by default (~500MB), chosen as a balance of speed and accuracy.
- **🎨 Modern UI**: A frameless, draggable, floating "Pill" window with a Neon **Audio Visualizer** that reacts to your voice.
- **🔌 Smart Bootstrapper**: The app is portable and self-healing. On the first run, it checks for the AI model and downloads it automatically if missing.
- **✍️ Auto-Type**: Automatically simulates keyboard input to paste the transcribed text where your cursor is.
- **🔋 Portable**: Can be compiled into a single `.exe` file that you can carry on a USB drive.

---

## 🛠️ Requirements

- **OS**: Windows 10 or 11 (64-bit).
- **Python**: 3.10 or newer (if running from source).
- **Hardware**: A reasonably modern CPU (e.g., Intel i5 or AMD Ryzen). An NVIDIA GPU is recommended for the fastest transcription (requires CUDA setup), but the app runs fine on CPU.
- **Dependencies**:
  - **FFmpeg**: Essential for audio processing. (See Installation & Setup.)

---

## 🚀 Installation & Setup

### Option A: Running from Source (Developers)

1. **Clone the Repository**:
   ```bash
   git clone https://github.com/your/repo.git
   cd whisper_voice
   ```

2. **Environment Setup**: It is highly recommended to use a virtual environment.
   ```cmd
   python -m venv venv
   venv\Scripts\activate
   ```

3. **Install Python Dependencies**:
   ```cmd
   pip install -r requirements.txt
   ```

4. **FFmpeg Setup**:
   - **Method 1 (System-wide)**: Download FFmpeg and add its `bin` folder to your Windows PATH environment variable.
   - **Method 2 (Portable)**: Download `ffmpeg.exe` and place it in a `libs` folder inside the project root:
     ```text
     whisper_voice/
     ├── main.py
     ├── libs/
     │   └── ffmpeg.exe  <-- Place here
     ```

5. **Run the App**:
   ```cmd
   python main.py
   ```
   *Or use the provided `run_source.bat` script.*

### Option B: Building a Portable EXE

You can compile the application into a single executable file for easy distribution.

1. Follow the **Running from Source** steps above to set up your environment.
2. Install `pyinstaller`:
   ```cmd
   pip install pyinstaller
   ```
3. Run the Build Script:
   ```cmd
   build_exe.bat
   ```
   *(Or run `pyinstaller build.spec` manually).*
4. **Locate the EXE**: The result will be in the `dist` folder: `dist/WhisperVoice.exe`.
5. **Distribution**:
   - You can send the `.exe` on its own to anyone.
   - **Note**: The end user will still need FFmpeg. You can zip the `libs` folder alongside the EXE to make it truly "unzip and run".

---

## 🎮 Usage Guide

1. **First Run Initialization**:
   - When you launch the app, you will see an **"Initializing..."** window.
   - If the AI Model (`models/` folder) is missing, the app will automatically download it (~500MB).
   - Once complete, the app minimizes to the System Tray.

2. **Dictation**:
   - Focus the text field where you want to type (e.g., click into a Notepad document).
   - Press **F8**.
   - The **Floating Pill** appears on screen. Use the visualizer to confirm it hears you.
   - Speak your sentence.
   - Press **F8** again to stop.
   - The Pill turns **Blue** ("Thinking...").
   - Wait a moment... the text will appear!

3. **System Tray**:
   - Look for the application icon in your taskbar tray (near the clock).
   - Right-click -> **Quit Whisper Voice** to exit the application completely.

---

## 📁 Project Structure

```text
whisper_voice/
├── main.py                 # Application Entry Point & Orchestrator
├── task.md                 # Development Task Tracking
├── requirements.txt        # Python Dependencies
├── build.spec              # PyInstaller Configuration
├── run_source.bat          # Helper script
├── build_exe.bat           # Helper script
├── src/
│   ├── core/
│   │   ├── audio_engine.py     # Microphone recording logic
│   │   ├── transcriber.py      # AI Model wrapper (Faster-Whisper)
│   │   ├── hotkey_manager.py   # Global keyboard hooks
│   │   └── paths.py            # Path resolution (EXE vs Script)
│   ├── ui/
│   │   ├── overlay.py          # Main Pill Window
│   │   ├── visualizer.py       # Audio Spectrum Widget
│   │   ├── loader.py           # Bootstrapper/Downloader UI
│   │   └── tray.py             # System Tray Icon
│   └── utils/
│       ├── injector.py         # Clipboard/Paste logic
│       └── downloader.py       # File download utilities
```

---

## ❓ Troubleshooting

**Q: Nothing happens when I press F8.**
- Check the System Tray to ensure the app is running.
- Ensure you have given the app "Input Monitoring" permissions if prompted (rare on standard Windows).
- Some antivirus software might block the "Global Hotkey" feature. Whitelist the app.

**Q: The app crashes with an error about FFmpeg.**
- `faster-whisper` requires FFmpeg. Make sure `ffmpeg.exe` is either in your system PATH or in a `libs` folder next to `main.py` (or the EXE).

**Q: Transcription is slow.**
- The "Small" model is generally fast, but on older CPUs it might take 2-5 seconds for a long sentence.
- To use a GPU, you must install the NVIDIA cuDNN libraries and the `torch` version with CUDA support. This prototype setup defaults to CPU/Auto for compatibility.

**Q: "Failed to load model" error.**
- Delete the `models` folder and restart the app to force a re-download.

---

**License**: MIT

**Author**: Antigravity
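
---

## 🧪 Appendix: FFmpeg Lookup Sketch

The setup and troubleshooting notes above say that `ffmpeg.exe` must be either in a `libs` folder next to `main.py` (or the EXE) or on the system PATH. The snippet below is a minimal sketch of that lookup order, not the project's actual `paths.py`. The function names `app_root` and `find_ffmpeg` are hypothetical and chosen only for illustration. It assumes a PyInstaller build, which sets `sys.frozen` on the bundled executable.

```python
import shutil
import sys
from pathlib import Path
from typing import Optional


def app_root() -> Path:
    """Return the folder holding the EXE (frozen build) or this script (source run)."""
    if getattr(sys, "frozen", False):  # set by PyInstaller in the bundled EXE
        return Path(sys.executable).resolve().parent
    return Path(__file__).resolve().parent


def find_ffmpeg(root: Optional[Path] = None) -> Optional[Path]:
    """Prefer a portable libs/ffmpeg.exe under root, then fall back to PATH.

    Hypothetical helper: the real application may resolve FFmpeg differently.
    """
    base = root if root is not None else app_root()
    portable = base / "libs" / "ffmpeg.exe"
    if portable.is_file():
        return portable
    on_path = shutil.which("ffmpeg")  # returns None when ffmpeg is not on PATH
    return Path(on_path) if on_path else None
```

Calling `find_ffmpeg()` with no argument searches relative to the running script or EXE and returns `None` when FFmpeg is missing in both places, which is the situation the FFmpeg troubleshooting entry describes. Passing an explicit folder makes the lookup easy to check in isolation.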