2026-01-24 17:03:52 +02:00

Whisper Voice - Native Windows AI Transcriber

Whisper Voice is a high-performance, native Windows application that brings the power of OpenAI's Whisper model to your desktop in a seamless, interactive way.

Designed for productivity power users, it lets you press a global hotkey, dictate your thoughts, and have the transcribed text typed into whichever application is active (Notepad, Word, Slack, VS Code, etc.).

It features a modern, floating "Pill" UI with real-time audio visualization, built on top of the robust PySide6 (Qt) framework.


Features

  • 🎙️ Global Hotkey: Press F8 anywhere in Windows to start recording. Press again to stop.
  • 🤖 Local AI Intelligence: Powered by faster-whisper. Runs entirely on your machine. No cloud API keys, no data leaving your PC.
  • High Performance: Uses the 'Small' Whisper model by default (~500MB), optimized for a balance of speed and accuracy.
  • 🎨 Modern UI: A frameless, draggable, floating "Pill" window with a Neon Audio Visualizer that reacts to your voice.
  • 🔌 Smart Bootstrapper: The app is portable and self-healing. On the first run, it checks for the AI model and downloads it automatically if missing.
  • ✍️ Auto-Type: Automatically simulates keyboard input to paste the transcribed text where your cursor is.
  • 🔋 Portable: Can be compiled into a single .exe file that you can carry on a USB drive.
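
As a rough illustration of the Local AI Intelligence feature, the sketch below joins the segments returned by a faster-whisper style model into a single string. The helper name transcribe_file and the file name recording.wav are hypothetical and not taken from the project; the only library behavior assumed is that transcribe() returns a (segments, info) pair whose segments expose a .text attribute.

```python
from typing import Any


def transcribe_file(model: Any, audio_path: str) -> str:
    """Transcribe audio_path and return the recognized text as one string.

    `model` is expected to behave like faster_whisper.WhisperModel:
    transcribe() returns (segments, info), where segments is a lazy
    iterable of objects with a .text attribute.
    """
    segments, _info = model.transcribe(audio_path)
    # Segments are produced lazily; the work happens while we iterate.
    return " ".join(segment.text.strip() for segment in segments)


# Usage (requires `pip install faster-whisper`; names are illustrative):
#   from faster_whisper import WhisperModel
#   model = WhisperModel("small")
#   print(transcribe_file(model, "recording.wav"))
```

Passing the model in as a parameter keeps the helper easy to exercise with a stand-in object when faster-whisper is not installed.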

🛠️ Requirements

  • OS: Windows 10 or 11 (64-bit).
  • Python: 3.10 or newer (if running from source).
  • Hardware: A reasonably modern CPU (e.g., Intel Core i5 or AMD Ryzen). An NVIDIA GPU is recommended for faster transcription (requires CUDA setup), but the app also runs on CPU.
  • Dependencies:
    • FFmpeg: Essential for audio processing. (See FFmpeg Setup under Installation & Setup.)

🚀 Installation & Setup

Option A: Running from Source (Developers)

  1. Clone the Repository:

    git clone https://github.com/your/repo.git
    cd whisper_voice
    
  2. Environment Setup: It is highly recommended to use a virtual environment.

    python -m venv venv
    venv\Scripts\activate
    
  3. Install Python Dependencies:

    pip install -r requirements.txt
    
  4. FFmpeg Setup:

    • Method 1 (System-wide): Download FFmpeg and add the bin folder to your Windows PATH environment variable.
    • Method 2 (Portable): Download ffmpeg.exe and place it in a libs folder inside the project root:
      whisper_voice/
      ├── main.py
      ├── libs/
      │   └── ffmpeg.exe  <-- Place here
      
  5. Run the App:

    python main.py
    

    Or use the provided run_source.bat script.
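
The two FFmpeg methods in step 4 suggest a lookup along the following lines. This is a minimal sketch, not the project's actual code: the function name find_ffmpeg is hypothetical, and checking the portable libs copy before the system PATH is an assumption about the order.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_ffmpeg(project_root: Path) -> Optional[Path]:
    """Locate ffmpeg: portable copy in libs/ first, then the system PATH.

    The lookup order is an assumption; the README only says either
    location works.
    """
    portable = project_root / "libs" / "ffmpeg.exe"
    if portable.is_file():
        return portable
    # shutil.which searches PATH (and honors PATHEXT on Windows).
    on_path = shutil.which("ffmpeg")
    return Path(on_path) if on_path else None
```

A caller could report a clear "FFmpeg not found" message when the function returns None instead of failing later during audio processing.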

Option B: Building a Portable EXE

You can compile the application into a single executable file for easy distribution.

  1. Follow the Running from Source steps above to set up your environment.

  2. Install PyInstaller:

    pip install pyinstaller
    
  3. Run the Build Script:

    build_exe.bat
    

    (Or run pyinstaller build.spec manually).

  4. Locate the EXE: The result will be in the dist folder: dist/WhisperVoice.exe.

  5. Distribution:

    • You can send just the .exe to anyone.
    • Note: The end-user will still need FFmpeg. You can zip the libs folder alongside the EXE to make it truly "unzip and run".

🎮 Usage Guide

  1. First Run Initialization:

    • When you launch the app, you will see an "Initializing..." window.
    • If the AI Model (models/ folder) is missing, the app will automatically download it (~500MB).
    • Once complete, the app minimizes to the System Tray.
  2. Dictation:

    • Focus the text field where you want to type (e.g., click into a Notepad document).
    • Press F8.
    • The Floating Pill appears on screen. Use the visualizer to confirm it hears you.
    • Speak your sentence.
    • Press F8 again to stop.
    • The Pill turns Blue ("Thinking...").
    • Wait a moment... the text will appear!
  3. System Tray:

    • Look for the application icon in your taskbar tray (near the clock).
    • Right-click -> Quit Whisper Voice to exit the application completely.
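
The dictation flow above (F8 to start, F8 to stop, "Thinking...", then text) can be modeled as a small state machine. The class and state names below are hypothetical, and ignoring F8 while a transcription is in progress is an assumption rather than documented behavior.

```python
from enum import Enum


class PillState(Enum):
    IDLE = "idle"            # Pill hidden, waiting for the hotkey
    RECORDING = "recording"  # Pill visible, microphone active
    THINKING = "thinking"    # Pill blue, transcription running


class DictationSession:
    """Tracks which phase of the dictation cycle the app is in."""

    def __init__(self) -> None:
        self.state = PillState.IDLE

    def on_hotkey(self) -> PillState:
        """F8 pressed: start when idle, stop when recording.

        Presses while THINKING are ignored (an assumption).
        """
        if self.state is PillState.IDLE:
            self.state = PillState.RECORDING
        elif self.state is PillState.RECORDING:
            self.state = PillState.THINKING
        return self.state

    def on_transcription_done(self) -> PillState:
        """Transcription finished: return to idle so F8 can start again."""
        if self.state is PillState.THINKING:
            self.state = PillState.IDLE
        return self.state
```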

📁 Project Structure

whisper_voice/
├── main.py                 # Application Entry Point & Orchestrator
├── task.md                 # Development Task Tracking
├── requirements.txt        # Python Dependencies
├── build.spec              # PyInstaller Configuration
├── run_source.bat          # Helper script
├── build_exe.bat           # Helper script
├── src/
│   ├── core/
│   │   ├── audio_engine.py    # Microphone recording logic
│   │   ├── transcriber.py     # AI Model wrapper (Faster-Whisper)
│   │   ├── hotkey_manager.py  # Global keyboard hooks
│   │   └── paths.py           # Path resolution (EXE vs Script)
│   ├── ui/
│   │   ├── overlay.py         # Main Pill Window
│   │   ├── visualizer.py      # Audio Spectrum Widget
│   │   ├── loader.py          # Bootstrapper/Downloader UI
│   │   └── tray.py            # System Tray Icon
│   └── utils/
│       ├── injector.py        # Clipboard/Paste logic
│       └── downloader.py      # File download utilities
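
The "EXE vs Script" path resolution noted for paths.py usually comes down to checking whether the process is a PyInstaller build. The sketch below shows one common way to do that; the function names are hypothetical and the real paths.py may differ.

```python
import sys
from pathlib import Path


def is_frozen() -> bool:
    """True when running from a PyInstaller-built executable.

    PyInstaller sets sys.frozen on the bundled interpreter.
    """
    return bool(getattr(sys, "frozen", False))


def app_root(script_file: str) -> Path:
    """Folder holding the EXE when frozen, else the folder holding script_file."""
    if is_frozen():
        return Path(sys.executable).resolve().parent
    return Path(script_file).resolve().parent


# Usage from main.py (illustrative):
#   ROOT = app_root(__file__)
#   MODELS_DIR = ROOT / "models"
#   LIBS_DIR = ROOT / "libs"
```

Resolving against the EXE's folder rather than the temporary unpack directory is what lets folders such as models and libs sit next to a portable build.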

Troubleshooting

Q: Nothing happens when I press F8.

  • Check the System Tray to ensure the app is running.
  • Ensure you have given the app "Input Monitoring" permissions if prompted (rare on standard Windows).
  • Some Antivirus software might block the "Global Hotkey" feature. Whitelist the app.

Q: The app crashes with an error about FFmpeg.

  • The app needs FFmpeg for audio processing. Make sure ffmpeg.exe is either in your system PATH or in a libs folder next to main.py (or the EXE).

Q: Transcription is slow.

  • The "Small" model is generally fast, but on older CPUs, it might take 2-5 seconds for a long sentence.
  • To use a GPU, you must install the NVIDIA cuDNN libraries and the torch version with CUDA support. This prototype setup defaults to CPU/Auto for compatibility.
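
For reference, switching between CPU and GPU in faster-whisper is done through the device and compute_type arguments of WhisperModel. The sketch below is a hedged example: the helper names are hypothetical, the values mirror the ones shown in faster-whisper's own README, and whether this app exposes such a switch is an assumption.

```python
from typing import Any, Dict


def model_options(use_gpu: bool) -> Dict[str, Any]:
    """Return WhisperModel keyword arguments for GPU or CPU inference."""
    if use_gpu:
        # Needs a working CUDA setup on the machine.
        return {"device": "cuda", "compute_type": "float16"}
    # int8 quantization trades a little accuracy for speed on CPU.
    return {"device": "cpu", "compute_type": "int8"}


def load_model(use_gpu: bool = False, model_size: str = "small"):
    """Load a faster-whisper model with the options above."""
    # Imported lazily so model_options() works without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(model_size, **model_options(use_gpu))
```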

Q: "Failed to load model" error.

  • Delete the models folder and restart the app to force a re-download.
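
The delete-and-re-download advice follows from the first-run check: if the models folder is missing, the download runs again. A minimal sketch of that logic is shown below. All function names are hypothetical, the download step is passed in as a callable, and treating "any file in the folder" as "model present" is an assumption.

```python
import shutil
from pathlib import Path
from typing import Callable


def model_is_present(models_dir: Path) -> bool:
    """True if models_dir exists and contains at least one file.

    Assumption: the real app may check for specific model files instead.
    """
    return models_dir.is_dir() and any(p.is_file() for p in models_dir.rglob("*"))


def ensure_model(models_dir: Path, download: Callable[[Path], None]) -> bool:
    """Call download(models_dir) only when the model is missing.

    Returns True if a download was triggered, False otherwise.
    """
    if model_is_present(models_dir):
        return False
    models_dir.mkdir(parents=True, exist_ok=True)
    download(models_dir)
    return True


def reset_model(models_dir: Path) -> None:
    """Delete the models folder so the next ensure_model() downloads again."""
    shutil.rmtree(models_dir, ignore_errors=True)
```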

License: MIT
Author: Antigravity

Latest release: v1.2.0 (2026-02-18 22:30:48 +02:00)