Initial commit of WhisperVoice

2026-01-24 17:03:52 +02:00
commit 9ff0e8d108
118 changed files with 6102 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,165 @@
+# Whisper Voice - Native Windows AI Transcriber
+
+**Whisper Voice** is a high-performance, native Windows application that brings the power of OpenAI's **Whisper** model to your desktop in a seamless, interactive way. 
+
+Designed for productivity "power users", it allows you to invoke a global hotkey, dictate your thoughts, and have the transcribed text instantly typed into *any* active application (Notepad, Word, Slack, VS Code, etc.).
+
+It features a modern, floating "Pill" UI with real-time audio visualization, built on top of the robust PySide6 (Qt) framework.
+
+---
+
+## ✨ Features
+
+-   **🎙️ Global Hotkey**: Press **F8** anywhere in Windows to start recording. Press again to stop.
+-   **🤖 Local AI Intelligence**: Powered by `faster-whisper`. Runs entirely on your machine. No cloud API keys, no data leaving your PC. 
+-   **⚡ High Performance**: Uses the 'Small' Whisper model by default (~500MB), optimized for a balance of speed and accuracy.
+-   **🎨 Modern UI**: A frameless, draggable, floating "Pill" window with a Neon **Audio Visualizer** that reacts to your voice.
+-   **🔌 Smart Bootstrapper**: The app is portable and self-healing. On the first run, it checks for the AI model and downloads it automatically if missing.
+-   **✍️ Auto-Type**: Automatically simulates keyboard input to paste the transcribed text where your cursor is.
+-   **🔋 Portable**: Can be compiled into a single `.exe` file that you can carry on a USB drive.
+
+---
+
+## 🛠️ Requirements
+
+-   **OS**: Windows 10 or 11 (64-bit).
+-   **Python**: 3.10 or newer (if running from source).
+-   **Hardware**: A reasonable CPU (Modern Intel i5/AMD Ryzen). NVIDIA GPU recommended for instant speed (requires CUDA setup), but runs fine on CPU.
+-   **Dependencies**: 
+    -   **FFmpeg**: Essential for audio processing. (See **FFmpeg Setup** under Installation & Setup.)
+
+---
+
+## 🚀 Installation & Setup
+
+### Option A: Running from Source (Developers)
+
+1.  **Clone the Repository**:
+    ```bash
+    git clone https://github.com/your/repo.git whisper_voice
+    cd whisper_voice
+    ```
+
+2.  **Environment Setup**:
+    It is highly recommended to use a virtual environment.
+    ```cmd
+    python -m venv venv
+    venv\Scripts\activate
+    ```
+
+3.  **Install Python Dependencies**:
+    ```cmd
+    pip install -r requirements.txt
+    ```
+
+4.  **FFmpeg Setup**:
+    -   **Method 1 (System-wide)**: Download FFmpeg and add the `bin` folder to your Windows PATH environment variable.
+    -   **Method 2 (Portable)**: Download `ffmpeg.exe` and place it in a `libs` folder inside the project root:
+        ```text
+        whisper_voice/
+        ├── main.py
+        ├── libs/
+        │   └── ffmpeg.exe  <-- Place here
+        ```
+
+5.  **Run the App**:
+    ```cmd
+    python main.py
+    ```
+    *Or use the provided `run_source.bat` script.*
+
+### Option B: Building a Portable EXE
+
+You can compile the application into a single executable file for easy distribution.
+
+1.  Follow the **Running from Source** steps above to set up your environment.
+2.  Install `pyinstaller`:
+    ```cmd
+    pip install pyinstaller
+    ```
+3.  Run the Build Script:
+    ```cmd
+    build_exe.bat
+    ```
+    *(Or run `pyinstaller build.spec` manually.)*
+
+4.  **Locate the EXE**:
+    The result will be in the `dist` folder: `dist/WhisperVoice.exe`.
+
+5.  **Distribution**:
+    -   The `.exe` is the only file you need to send, but the end user will still need FFmpeg.
+    -   **Note**: Zip the `libs` folder (containing `ffmpeg.exe`) alongside the EXE to make it truly "unzip and run".
+
+---
+
+## 🎮 Usage Guide
+
+1.  **First Run Initialization**:
+    -   When you launch the app, you will see an **"Initializing..."** window.
+    -   If the AI Model (`models/` folder) is missing, the app will automatically download it (~500MB).
+    -   Once complete, the app minimizes to the System Tray.
+
+2.  **Dictation**:
+    -   Focus the text field where you want to type (e.g., click into a Notepad document).
+    -   Press **F8**.
+    -   The **Floating Pill** appears on screen. Use the visualizer to confirm it hears you.
+    -   Speak your sentence.
+    -   Press **F8** again to stop.
+    -   The Pill turns **Blue** ("Thinking...").
+    -   Wait a moment... the text will appear!
+
+3.  **System Tray**:
+    -   Look for the application icon in your taskbar tray (near the clock).
+    -   Right-click -> **Quit Whisper Voice** to exit the application completely.
+
+---
+
+## 📁 Project Structure
+
+```text
+whisper_voice/
+├── main.py                 # Application Entry Point & Orchestrator
+├── task.md                 # Development Task Tracking
+├── requirements.txt        # Python Dependencies
+├── build.spec              # PyInstaller Configuration
+├── run_source.bat          # Helper script
+├── build_exe.bat           # Helper script
+├── src/
+│   ├── core/
+│   │   ├── audio_engine.py    # Microphone recording logic
+│   │   ├── transcriber.py     # AI Model wrapper (Faster-Whisper)
+│   │   ├── hotkey_manager.py  # Global keyboard hooks
+│   │   └── paths.py           # Path resolution (EXE vs Script)
+│   ├── ui/
+│   │   ├── overlay.py         # Main Pill Window
+│   │   ├── visualizer.py      # Audio Spectrum Widget
+│   │   ├── loader.py          # Bootstrapper/Downloader UI
+│   │   └── tray.py            # System Tray Icon
+│   └── utils/
+│       ├── injector.py        # Clipboard/Paste logic
+│       └── downloader.py      # File download utilities
+```
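+
+The "EXE vs Script" note on `paths.py` refers to finding the app's folder in both modes. The following is a minimal sketch of that idea, not the actual contents of `src/core/paths.py`. It relies on PyInstaller setting `sys.frozen` in a built executable; the function name `app_base_dir` is hypothetical.
+
+```python
+import sys
+from pathlib import Path
+
+
+def app_base_dir(script_file: str) -> Path:
+    """Return the folder that holds the app's files (hypothetical helper)."""
+    if getattr(sys, "frozen", False):
+        # Built with PyInstaller: sys.executable is the EXE itself,
+        # so folders such as libs/ and models/ sit next to it.
+        return Path(sys.executable).resolve().parent
+    # Running from source: use the folder of the given script.
+    return Path(script_file).resolve().parent
+```
+
+When running from source, `app_base_dir(__file__)` called in `main.py` would return the project root.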
+
+---
+
+## ❓ Troubleshooting
+
+**Q: Nothing happens when I press F8.**
+-   Check the System Tray to ensure the app is running.
+-   Ensure you have given the app "Input Monitoring" permissions if prompted (rare on standard Windows).
+-   Some Antivirus software might block the "Global Hotkey" feature. Whitelist the app.
+
+**Q: The app crashes with an error about FFmpeg.**
+-   The app needs FFmpeg for audio processing. Make sure `ffmpeg.exe` is either in your system PATH or in a `libs` folder next to `main.py` (or the EXE).
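+
+To check both locations yourself, a lookup along these lines may help. It is a sketch under assumptions: the function name `find_ffmpeg` is hypothetical, and checking `libs` before PATH is an assumed order, not necessarily what the app does.
+
+```python
+import shutil
+from pathlib import Path
+from typing import Optional
+
+
+def find_ffmpeg(base_dir: Path) -> Optional[Path]:
+    """Look for FFmpeg in libs/ under base_dir, then on the system PATH."""
+    bundled = base_dir / "libs" / "ffmpeg.exe"
+    if bundled.is_file():
+        return bundled
+    # shutil.which returns None when no matching executable is on PATH.
+    on_path = shutil.which("ffmpeg")
+    return Path(on_path) if on_path else None
+```
+
+A result of `None` means neither location has FFmpeg, which matches the situation this error describes.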
+
+**Q: Transcription is slow.**
+-   The "Small" model is generally fast, but on older CPUs, it might take 2-5 seconds for a long sentence.
+-   To use a GPU, you must install the NVIDIA cuBLAS and cuDNN libraries that `faster-whisper` needs for GPU execution (it runs on CTranslate2, not `torch`). This prototype setup defaults to CPU/Auto for compatibility.
+
+**Q: "Failed to load model" error.**
+-   Delete the `models` folder and restart the app to force a re-download.
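+
+Deleting the folder works because the bootstrapper only downloads when the model is missing. A rough sketch of such a check follows, assuming a folder counts as "present" when it is non-empty; `ensure_model` and the `download` callback are hypothetical names, not the app's actual API.
+
+```python
+from pathlib import Path
+from typing import Callable
+
+
+def ensure_model(models_dir: Path, download: Callable[[Path], None]) -> bool:
+    """Download the model only if models_dir is missing or empty.
+
+    Returns True if a download was triggered.
+    """
+    if models_dir.is_dir() and any(models_dir.iterdir()):
+        return False  # Something is already there; assume the model is present.
+    models_dir.mkdir(parents=True, exist_ok=True)
+    download(models_dir)
+    return True
+```
+
+Under this assumption, a partial or corrupted download still looks "present", so removing the folder is what triggers a clean re-download.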
+
+---
+
+**License**: MIT  
+**Author**: Antigravity