Whisper Voice - Native Windows AI Transcriber

Whisper Voice is a high-performance, native Windows application that brings the power of OpenAI's Whisper model to your desktop in a seamless, interactive way.

Designed for productivity "power users", it allows you to invoke a global hotkey, dictate your thoughts, and have the transcribed text instantly typed into any active application (Notepad, Word, Slack, VS Code, etc.).

It features a modern, floating "Pill" UI with real-time audio visualization, built on top of the robust PySide6 (Qt) framework.

✨ Features

🎙️ Global Hotkey: Press F8 anywhere in Windows to start recording. Press again to stop.
🤖 Local AI Intelligence: Powered by faster-whisper. Runs entirely on your machine. No cloud API keys, no data leaving your PC.
⚡ High Performance: Uses the 'Small' Whisper model by default (~500MB), optimized for a balance of speed and accuracy.
🎨 Modern UI: A frameless, draggable, floating "Pill" window with a Neon Audio Visualizer that reacts to your voice.
🔌 Smart Bootstrapper: The app is portable and self-healing. On the first run, it checks for the AI model and downloads it automatically if missing.
✍️ Auto-Type: Automatically simulates keyboard input to paste the transcribed text where your cursor is.
🔋 Portable: Can be compiled into a single .exe file that you can carry on a USB drive.
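
The Auto-Type feature can be pictured as a clipboard round trip: save what the user had copied, put the transcript on the clipboard, send a paste keystroke, then restore the old contents. The sketch below is illustrative only and is not the project's `injector.py`. The function name `paste_text` and its callable parameters are hypothetical, and the clipboard and key-sending backends are passed in so that no particular library is assumed.

```python
import time
from typing import Callable


def paste_text(
    text: str,
    get_clipboard: Callable[[], str],
    set_clipboard: Callable[[str], None],
    send_paste: Callable[[], None],
    settle_seconds: float = 0.05,
) -> None:
    """Paste `text` at the cursor via the clipboard, then restore the clipboard.

    All three callables are hypothetical hooks: in a real app they would wrap
    whatever clipboard and key-simulation library the project uses.
    """
    if not text:
        return  # nothing was transcribed, so leave the clipboard untouched

    previous = get_clipboard()
    set_clipboard(text)
    try:
        send_paste()  # e.g. simulate Ctrl+V in the focused window
        # Give the target application a moment to read the clipboard
        # before the previous contents are put back.
        time.sleep(settle_seconds)
    finally:
        set_clipboard(previous)
```

Passing the backends as arguments keeps the logic testable without a real clipboard. The settle delay is a guess; the right value depends on the target application.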

🛠️ Requirements

OS: Windows 10 or 11 (64-bit).
Python: 3.10 or newer (if running from source).
Hardware: A modern CPU (e.g. Intel Core i5 or AMD Ryzen). An NVIDIA GPU is recommended for faster transcription (requires CUDA setup), but the app runs fine on CPU.
Dependencies:
- FFmpeg: Essential for audio processing. (See "FFmpeg Setup" under Installation & Setup.)

🚀 Installation & Setup

Option A: Running from Source (Developers)

Clone the Repository:

```
git clone https://github.com/your/repo.git
cd whisper_voice
```

Environment Setup: It is highly recommended to use a virtual environment.
```
python -m venv venv
venv\Scripts\activate
```
Install Python Dependencies:
```
pip install -r requirements.txt
```
FFmpeg Setup:
- Method 1 (System-wide): Download FFmpeg and add the bin folder to your Windows PATH environment variable.
- Method 2 (Portable): Download ffmpeg.exe and place it in a libs folder inside the project root:
```
whisper_voice/
├── main.py
├── libs/
│   └── ffmpeg.exe  <-- Place here
```
Run the App:
```
python main.py
```
Or use the provided run_source.bat script.

Option B: Building a Portable EXE

You can compile the application into a single executable file for easy distribution.

Follow the Running from Source steps above to set up your environment.
Install PyInstaller:
```
pip install pyinstaller
```
Run the Build Script:
```
build_exe.bat
```
(Or run pyinstaller build.spec manually).
Locate the EXE: The result will be in the dist folder: dist/WhisperVoice.exe.
Distribution:
- You can send just the .exe to anyone.
- Note: The end-user will still need FFmpeg. You can zip the libs folder alongside the EXE to make it truly "unzip and run".

🎮 Usage Guide

First Run Initialization:
- When you launch the app, you will see an "Initializing..." window.
- If the AI Model (models/ folder) is missing, the app will automatically download it (~500MB).
- Once complete, the app minimizes to the System Tray.
Dictation:
- Focus the text field where you want to type (e.g., click into a Notepad document).
- Press F8.
- The Floating Pill appears on screen. Use the visualizer to confirm it hears you.
- Speak your sentence.
- Press F8 again to stop.
- The Pill turns Blue ("Thinking...").
- Wait a moment... the text will appear!
System Tray:
- Look for the application icon in your taskbar tray (near the clock).
- Right-click -> Quit Whisper Voice to exit the application completely.
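
The dictation flow above amounts to a small state machine: F8 moves the app from idle to recording, a second F8 moves it to the "Thinking..." state, and the app returns to idle once the text has been typed. The sketch below shows that toggle logic only. The class and method names are hypothetical and are not taken from `hotkey_manager.py` or `overlay.py`.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"            # Pill hidden, waiting for F8
    RECORDING = "recording"  # Pill visible, microphone open
    THINKING = "thinking"    # Pill blue, transcription in progress


class DictationController:
    """Hypothetical controller that tracks what each F8 press should do."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def on_hotkey(self) -> State:
        """Handle one F8 press and return the new state."""
        if self.state is State.IDLE:
            self.state = State.RECORDING
        elif self.state is State.RECORDING:
            self.state = State.THINKING
        # Presses while THINKING are ignored so a transcription
        # in progress is not interrupted.
        return self.state

    def on_transcription_done(self) -> State:
        """Call after the text has been typed; returns to IDLE."""
        if self.state is State.THINKING:
            self.state = State.IDLE
        return self.state
```

Ignoring presses during the thinking state is a design choice made for this sketch; the actual app may queue or cancel instead.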

📁 Project Structure

```
whisper_voice/
├── main.py                 # Application Entry Point & Orchestrator
├── task.md                 # Development Task Tracking
├── requirements.txt        # Python Dependencies
├── build.spec              # PyInstaller Configuration
├── run_source.bat          # Helper script
├── build_exe.bat           # Helper script
├── src/
│   ├── core/
│   │   ├── audio_engine.py    # Microphone recording logic
│   │   ├── transcriber.py     # AI Model wrapper (Faster-Whisper)
│   │   ├── hotkey_manager.py  # Global keyboard hooks
│   │   └── paths.py           # Path resolution (EXE vs Script)
│   ├── ui/
│   │   ├── overlay.py         # Main Pill Window
│   │   ├── visualizer.py      # Audio Spectrum Widget
│   │   ├── loader.py          # Bootstrapper/Downloader UI
│   │   └── tray.py            # System Tray Icon
│   └── utils/
│       ├── injector.py        # Clipboard/Paste logic
│       └── downloader.py      # File download utilities
```
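
The "EXE vs Script" path resolution noted for `paths.py` usually depends on the `sys.frozen` attribute that PyInstaller sets in a bundled app. The sketch below shows one common way to find the folder that holds `libs/` and `models/` in both modes. The function names `app_root` and `models_dir` are hypothetical, and the real module may work differently.

```python
import sys
from pathlib import Path


def app_root(script_file: str) -> Path:
    """Return the folder that sits next to the EXE or the entry script.

    `script_file` is the entry script's path, e.g. `__file__` from main.py.
    """
    if getattr(sys, "frozen", False):
        # PyInstaller sets sys.frozen; sys.executable is then the EXE itself.
        return Path(sys.executable).resolve().parent
    return Path(script_file).resolve().parent


def models_dir(script_file: str) -> Path:
    """Hypothetical helper: where the downloaded model is assumed to live."""
    return app_root(script_file) / "models"
```

From `main.py` this would be called as `app_root(__file__)`.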

❓ Troubleshooting

Q: Nothing happens when I press F8.

Check the System Tray to ensure the app is running.
Ensure you have given the app "Input Monitoring" permissions if prompted (rare on standard Windows).
Some antivirus software may block the global keyboard hook that the hotkey relies on. If so, add the app to its allowlist.

Q: The app crashes with an error about FFmpeg.

The app needs FFmpeg for audio processing. Make sure ffmpeg.exe is either on your system PATH or in a libs folder next to main.py (or the EXE).
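
To check which copy of FFmpeg would be picked up, the two locations can be probed in order. This is a minimal sketch assuming the portable `libs` folder is preferred over the system PATH; the function name `find_ffmpeg` is hypothetical and the app's own lookup may differ.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_ffmpeg(app_dir: Path) -> Optional[Path]:
    """Return the path to FFmpeg, or None if it cannot be found."""
    bundled = app_dir / "libs" / "ffmpeg.exe"
    if bundled.is_file():
        # Put libs first on PATH so anything that launches "ffmpeg"
        # by name finds the bundled copy.
        current = os.environ.get("PATH", "")
        os.environ["PATH"] = str(bundled.parent) + os.pathsep + current
        return bundled

    # Fall back to whatever is installed system-wide.
    system = shutil.which("ffmpeg")
    return Path(system) if system else None
```

A `None` result means neither setup method from the installation steps has been completed.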

Q: Transcription is slow.

The "Small" model is generally fast, but on older CPUs, it might take 2-5 seconds for a long sentence.
To use a GPU, you must install the NVIDIA CUDA libraries that faster-whisper depends on (cuBLAS and cuDNN). This prototype setup defaults to CPU/Auto for compatibility.
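
For reference, faster-whisper selects the device and precision when the model is constructed. The sketch below shows a CPU-friendly and a GPU setting using the library's `WhisperModel` class. The helper names and the specific `compute_type` choices are assumptions made for illustration, not the values this app ships with.

```python
def whisper_settings(use_gpu: bool) -> dict:
    """Hypothetical helper: pick device and precision for WhisperModel."""
    if use_gpu:
        # Requires a working CUDA setup.
        return {"device": "cuda", "compute_type": "float16"}
    # 8-bit weights keep CPU inference reasonably quick.
    return {"device": "cpu", "compute_type": "int8"}


def transcribe_file(audio_path: str, use_gpu: bool = False) -> str:
    """Transcribe one audio file with the 'small' model and return the text."""
    # Imported here so the settings helper works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel("small", **whisper_settings(use_gpu))
    # transcribe() returns a lazy generator of segments plus metadata.
    segments, _info = model.transcribe(audio_path)
    return " ".join(segment.text.strip() for segment in segments)
```

Loading the model is the slow part, so a real app would construct it once and reuse it for every dictation.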

Q: "Failed to load model" error.

Delete the models folder and restart the app to force a re-download.
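
A partial download is a common cause of this error, so checking for the weights file before loading can help. The sketch below assumes the model is stored in faster-whisper's CTranslate2 format, whose weights file is named `model.bin`; the exact layout of this app's models folder is not documented here, so the recursive search is a guess.

```python
from pathlib import Path


def model_is_present(models_dir: Path) -> bool:
    """Return True if a non-empty model.bin exists anywhere under models_dir.

    Assumption: the model weights are stored as `model.bin`.
    """
    if not models_dir.is_dir():
        return False
    return any(
        candidate.is_file() and candidate.stat().st_size > 0
        for candidate in models_dir.rglob("model.bin")
    )
```

This catches a missing or empty file but not a truncated one; comparing the file size or a checksum against a known value would be stricter.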

License: MIT
Author: Antigravity
