Whisper Voice - Native Windows AI Transcriber
Whisper Voice is a high-performance, native Windows application that brings the power of OpenAI's Whisper model to your desktop in a seamless, interactive way.
Designed for productivity "power users", it allows you to invoke a global hotkey, dictate your thoughts, and have the transcribed text instantly typed into any active application (Notepad, Word, Slack, VS Code, etc.).
It features a modern, floating "Pill" UI with real-time audio visualization, built on top of the robust PySide6 (Qt) framework.
✨ Features
- 🎙️ Global Hotkey: Press F8 anywhere in Windows to start recording. Press again to stop.
- 🤖 Local AI Intelligence: Powered by faster-whisper. Runs entirely on your machine. No cloud API keys, no data leaving your PC.
- ⚡ High Performance: Uses the 'Small' Whisper model by default (~500MB), optimized for a balance of speed and accuracy.
- 🎨 Modern UI: A frameless, draggable, floating "Pill" window with a Neon Audio Visualizer that reacts to your voice.
- 🔌 Smart Bootstrapper: The app is portable and self-healing. On the first run, it checks for the AI model and downloads it automatically if missing.
- ✍️ Auto-Type: Automatically simulates keyboard input to paste the transcribed text where your cursor is.
- 🔋 Portable: Can be compiled into a single .exe file that you can carry on a USB drive.
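A visualizer that reacts to your voice typically needs a loudness value for each chunk of microphone audio. The following is a minimal sketch of one common approach (RMS level), not the app's actual visualizer.py code. It assumes 16-bit PCM samples in native byte order (little-endian on Windows), and the names `rms_level` and `bar_height` are hypothetical.

```python
import math
from array import array


def rms_level(pcm_bytes: bytes) -> float:
    """Return the RMS loudness of 16-bit PCM audio, scaled to 0.0-1.0."""
    samples = array("h")          # "h" = signed 16-bit integers
    samples.frombytes(pcm_bytes)  # raises ValueError if the length is odd
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return min(1.0, math.sqrt(mean_square) / 32768.0)


def bar_height(level: float, max_height: int = 40) -> int:
    """Map a 0.0-1.0 level to a bar height in pixels (hypothetical scale)."""
    clamped = max(0.0, min(1.0, level))
    return round(clamped * max_height)
```

A UI widget could call `bar_height(rms_level(chunk))` for every audio chunk it receives and repaint with the result.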
🛠️ Requirements
- OS: Windows 10 or 11 (64-bit).
- Python: 3.10 or newer (if running from source).
- Hardware: A reasonably modern CPU (e.g. Intel i5 or AMD Ryzen). An NVIDIA GPU is recommended for the fastest transcription (requires CUDA setup), but the app runs fine on CPU.
- Dependencies:
  - FFmpeg: Essential for audio processing. (See Setup Guide).
🚀 Installation & Setup
Option A: Running from Source (Developers)
- Clone the Repository:
    git clone https://github.com/your/repo.git
    cd whisper_voice
- Environment Setup: It is highly recommended to use a virtual environment.
    python -m venv venv
    venv\Scripts\activate
- Install Python Dependencies:
    pip install -r requirements.txt
- FFmpeg Setup:
  - Method 1 (System-wide): Download FFmpeg and add the bin folder to your Windows PATH environment variable.
  - Method 2 (Portable): Download ffmpeg.exe and place it in a libs folder inside the project root:
      whisper_voice/
      ├── main.py
      ├── libs/
      │   └── ffmpeg.exe   <-- Place here
- Run the App:
    python main.py
  Or use the provided run_source.bat script.
Option B: Building a Portable EXE
You can compile the application into a single executable file for easy distribution.
- Follow the Running from Source steps above to set up your environment.
- Install pyinstaller:
    pip install pyinstaller
- Run the Build Script:
    build_exe.bat
  (Or run pyinstaller build.spec manually.)
- Locate the EXE: The result will be in the dist folder: dist/WhisperVoice.exe.
- Distribution:
  - You can send just the .exe to anyone.
  - Note: The end-user will still need FFmpeg. You can zip the libs folder alongside the EXE to make it truly "unzip and run".
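The "unzip and run" bundle described in the distribution note can be produced with Python's standard zipfile module. This is a sketch only: the function name `make_bundle` and the output file name are hypothetical, and the repository's build scripts may package things differently.

```python
import zipfile
from pathlib import Path


def make_bundle(exe_path: Path, libs_dir: Path, out_zip: Path) -> list[str]:
    """Zip the EXE plus the libs folder; return the archive member names."""
    names = []
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        # The EXE goes at the top level of the archive.
        bundle.write(exe_path, arcname=exe_path.name)
        names.append(exe_path.name)
        # Everything under libs/ keeps its relative layout, so ffmpeg.exe
        # ends up in a libs folder next to the EXE after unzipping.
        for file in sorted(libs_dir.rglob("*")):
            if file.is_file():
                arcname = (Path(libs_dir.name) / file.relative_to(libs_dir)).as_posix()
                bundle.write(file, arcname=arcname)
                names.append(arcname)
    return names
```

For example, `make_bundle(Path("dist/WhisperVoice.exe"), Path("libs"), Path("WhisperVoice.zip"))` would write an archive containing `WhisperVoice.exe` and `libs/ffmpeg.exe`.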
🎮 Usage Guide
- First Run Initialization:
  - When you launch the app, you will see an "Initializing..." window.
  - If the AI Model (models/ folder) is missing, the app will automatically download it (~500MB).
  - Once complete, the app minimizes to the System Tray.
- Dictation:
  - Focus the text field where you want to type (e.g., click into a Notepad document).
  - Press F8.
  - The Floating Pill appears on screen. Use the visualizer to confirm it hears you.
  - Speak your sentence.
  - Press F8 again to stop.
  - The Pill turns Blue ("Thinking...").
  - Wait a moment... the text will appear!
- System Tray:
  - Look for the application icon in your taskbar tray (near the clock).
  - Right-click -> Quit Whisper Voice to exit the application completely.
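The dictation flow above behaves like a small state machine toggled by the hotkey: idle, recording, then "thinking" while the audio is transcribed. The sketch below illustrates that idea under stated assumptions. `DictationController` and the four injected callables are hypothetical stand-ins for the audio engine, transcriber, and injector, and the real app may structure this differently (for instance, by running transcription on a worker thread so the UI stays responsive).

```python
from enum import Enum
from typing import Callable


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    THINKING = "thinking"


class DictationController:
    """Toggle recording on each hotkey press, then transcribe and type."""

    def __init__(
        self,
        start_recording: Callable[[], None],
        stop_recording: Callable[[], bytes],
        transcribe: Callable[[bytes], str],
        type_text: Callable[[str], None],
    ) -> None:
        self._start_recording = start_recording
        self._stop_recording = stop_recording
        self._transcribe = transcribe
        self._type_text = type_text
        self.state = State.IDLE

    def on_hotkey(self) -> None:
        if self.state is State.IDLE:
            # First press: begin capturing audio.
            self._start_recording()
            self.state = State.RECORDING
        elif self.state is State.RECORDING:
            # Second press: stop, transcribe, and type the result.
            audio = self._stop_recording()
            self.state = State.THINKING
            try:
                text = self._transcribe(audio)
                if text:
                    self._type_text(text)
            finally:
                # Return to idle even if transcription raised an error.
                self.state = State.IDLE
        # Presses that arrive while THINKING are ignored.
```

In this synchronous version the THINKING state only lasts for the duration of the `transcribe` call, which is when a UI would show the blue "Thinking..." pill.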
📁 Project Structure
whisper_voice/
├── main.py # Application Entry Point & Orchestrator
├── task.md # Development Task Tracking
├── requirements.txt # Python Dependencies
├── build.spec # PyInstaller Configuration
├── run_source.bat # Helper script
├── build_exe.bat # Helper script
├── src/
│ ├── core/
│ │ ├── audio_engine.py # Microphone recording logic
│ │ ├── transcriber.py # AI Model wrapper (Faster-Whisper)
│ │ ├── hotkey_manager.py # Global keyboard hooks
│ │ └── paths.py # Path resolution (EXE vs Script)
│ ├── ui/
│ │ ├── overlay.py # Main Pill Window
│ │ ├── visualizer.py # Audio Spectrum Widget
│ │ ├── loader.py # Bootstrapper/Downloader UI
│ │ └── tray.py # System Tray Icon
│ └── utils/
│ ├── injector.py # Clipboard/Paste logic
│ └── downloader.py # File download utilities
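The tree describes paths.py as handling "Path resolution (EXE vs Script)". A common way to do this with PyInstaller, which sets `sys.frozen` in bundled apps, is sketched below. This is an assumption about the technique, not a copy of the repository's paths.py, and `resolve_base_dir` is a hypothetical name.

```python
import sys
from pathlib import Path


def resolve_base_dir(frozen: bool, executable: str, script_path: str) -> Path:
    """Return the folder that should hold libs/ and models/.

    frozen:      True when running from a PyInstaller-built EXE.
    executable:  path of the running binary (sys.executable).
    script_path: path of the entry script, used when running from source.
    """
    if frozen:
        # Bundled: resolve relative to the EXE, not the temp unpack folder.
        return Path(executable).resolve().parent
    # From source: resolve relative to the entry script (e.g. main.py).
    return Path(script_path).resolve().parent


# Typical call from the entry script (kept as a comment because __file__
# is only defined when the code runs as a file):
# base = resolve_base_dir(getattr(sys, "frozen", False), sys.executable, __file__)
```

Taking the three inputs as parameters rather than reading `sys` directly makes both branches easy to check without building an EXE.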
❓ Troubleshooting
Q: Nothing happens when I press F8.
- Check the System Tray to ensure the app is running.
- Ensure you have given the app "Input Monitoring" permissions if prompted (rare on standard Windows).
- Some antivirus software may block the global hotkey hook. If so, add the app to its allowlist.
Q: The app crashes with an error about FFmpeg.
- faster-whisper requires FFmpeg. Make sure ffmpeg.exe is either in your system PATH or in a libs folder next to main.py (or the EXE).
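To check which of the two locations is actually being picked up, a lookup along these lines can help. It is a sketch that assumes the portable copy takes priority over the system PATH. The app's own search order is not documented here, and `find_ffmpeg` is a hypothetical helper.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_ffmpeg(app_dir: Path) -> Optional[Path]:
    """Return the path to ffmpeg, or None if it cannot be found."""
    # 1. Portable copy: libs/ffmpeg.exe next to main.py (or the EXE).
    local = app_dir / "libs" / "ffmpeg.exe"
    if local.is_file():
        return local
    # 2. System-wide install: search the PATH environment variable.
    on_path = shutil.which("ffmpeg")
    return Path(on_path) if on_path else None
```

Running `print(find_ffmpeg(Path(".")))` from the project root prints the resolved location, or `None` when neither method has been set up.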
Q: Transcription is slow.
- The "Small" model is generally fast, but on older CPUs, it might take 2-5 seconds for a long sentence.
- To use a GPU, you must install the NVIDIA cuDNN libraries and the torch version with CUDA support. This prototype setup defaults to CPU/Auto for compatibility.
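For reference, loading the model with automatic device selection looks roughly like this with the faster-whisper library. This is a minimal sketch, not the project's transcriber.py: `transcribe_file` and `join_segments` are hypothetical names, and the parameter choices (`"small"`, `device="auto"`, `compute_type="default"`) are assumptions based on the defaults described in this README.

```python
from typing import Any, Iterable


def join_segments(segments: Iterable[Any]) -> str:
    """Join the text of transcription segments into one string."""
    parts = (segment.text.strip() for segment in segments)
    return " ".join(part for part in parts if part)


def transcribe_file(audio_path: str, model_size: str = "small", device: str = "auto") -> str:
    """Transcribe an audio file locally and return the recognized text."""
    # Imported lazily so the helper above works without the library installed.
    from faster_whisper import WhisperModel

    # device="auto" lets the library pick a GPU if one is usable, else the CPU.
    model = WhisperModel(model_size, device=device, compute_type="default")
    # transcribe() returns a lazy generator of segments plus an info object;
    # the actual work happens while the segments are iterated.
    segments, _info = model.transcribe(audio_path)
    return join_segments(segments)
```

Passing `device="cpu"` or `device="cuda"` forces one backend, which is a quick way to compare timings when diagnosing slow transcription.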
Q: "Failed to load model" error.
- Delete the models folder and restart the app to force a re-download.
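Deleting the folder works because the bootstrapper only downloads when the model is missing. The sketch below shows that check-then-download pattern under assumptions: `ensure_model` and `model_is_present` are hypothetical names, the downloader is passed in as a callable, and the real app may validate the model files more thoroughly than "folder exists and is not empty".

```python
from pathlib import Path
from typing import Callable


def model_is_present(models_dir: Path) -> bool:
    """Treat a non-empty models folder as an installed model."""
    return models_dir.is_dir() and any(models_dir.iterdir())


def ensure_model(models_dir: Path, download: Callable[[Path], None]) -> bool:
    """Download the model if it is missing; return True if a download ran."""
    if model_is_present(models_dir):
        return False
    # Recreate the folder, then let the caller-supplied downloader fill it.
    models_dir.mkdir(parents=True, exist_ok=True)
    download(models_dir)
    return True
```

Note that under this simple check a partially downloaded model would still count as present, which is one way a "Failed to load model" error can arise and why removing the whole folder resets the state.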
License: MIT
Author: Antigravity