# Whisper Voice - Native Windows AI Transcriber

**Whisper Voice** is a high-performance, native Windows application that brings the power of OpenAI's **Whisper** speech-recognition model to your desktop in a seamless, interactive way.

Designed for productivity "power users", it lets you press a global hotkey, dictate your thoughts, and have the transcribed text typed instantly into *any* active application (Notepad, Word, Slack, VS Code, etc.).

It features a modern, floating "Pill" UI with real-time audio visualization, built on the PySide6 (Qt) framework.

---
## ✨ Features

- **🎙️ Global Hotkey**: Press **F8** anywhere in Windows to start recording. Press it again to stop.
- **🤖 Local AI Intelligence**: Powered by `faster-whisper`. Runs entirely on your machine: no cloud API keys, and no data leaves your PC.
- **⚡ High Performance**: Uses Whisper's `small` model by default (~500 MB), which balances speed and accuracy.
- **🎨 Modern UI**: A frameless, draggable, floating "Pill" window with a neon **Audio Visualizer** that reacts to your voice.
- **🔌 Smart Bootstrapper**: The app is portable and self-healing. On first run, it checks for the AI model and downloads it automatically if it is missing.
- **✍️ Auto-Type**: Pastes the transcribed text wherever your cursor is by simulating keyboard input.
- **🔋 Portable**: Can be compiled into a single `.exe` file that you can carry on a USB drive.
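
The **Smart Bootstrapper** check can be sketched as follows. This is an illustration under stated assumptions, not the app's actual code: `REQUIRED_FILES`, `model_is_ready`, and `ensure_model` are hypothetical names, and the file list assumes a CTranslate2-format faster-whisper model folder (which typically holds `model.bin`, `config.json`, `tokenizer.json`, and `vocabulary.txt`). The download itself is passed in as a callable standing in for `src/utils/downloader.py`. Checking that each file is non-empty also catches a half-finished download, which is why deleting `models/` is enough to force a clean re-download.

```python
from pathlib import Path
from typing import Callable

# Assumption: the files a CTranslate2-format faster-whisper model folder typically holds.
REQUIRED_FILES = ("model.bin", "config.json", "tokenizer.json", "vocabulary.txt")


def model_is_ready(model_dir: Path) -> bool:
    """Return True when every required file exists and is non-empty."""
    for name in REQUIRED_FILES:
        path = model_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            return False
    return True


def ensure_model(model_dir: Path, download: Callable[[Path], None]) -> bool:
    """Download the model only if it is missing or incomplete.

    `download` is a hypothetical stand-in for the app's downloader; it must
    fill `model_dir` with the model files. Returns True if a download ran.
    """
    if model_is_ready(model_dir):
        return False
    model_dir.mkdir(parents=True, exist_ok=True)
    download(model_dir)
    # Re-check so a broken download surfaces immediately instead of at load time.
    if not model_is_ready(model_dir):
        raise RuntimeError(f"Model download incomplete: {model_dir}")
    return True
```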
---
## 🛠️ Requirements

- **OS**: Windows 10 or 11 (64-bit).
- **Python**: 3.10 or newer (only if running from source).
- **Hardware**: A reasonably modern CPU (e.g., Intel Core i5 or AMD Ryzen). The app runs fine on CPU; an NVIDIA GPU (with CUDA set up) is recommended for near-instant transcription.
- **Dependencies**:
  - **FFmpeg**: Required for audio processing (see step 4, **FFmpeg Setup**, under Installation & Setup).
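
A startup check against the OS and Python requirements could look like the sketch below. `environment_problems` is a hypothetical helper, not part of the project. Note two limitations of this approach: `struct.calcsize("P")` reports the bitness of the Python interpreter (so it also flags a 32-bit Python on 64-bit Windows), and `platform.system()` does not distinguish Windows 10 from 11.

```python
import platform
import struct
import sys
from typing import List, Optional, Tuple

MIN_PYTHON = (3, 10)


def environment_problems(version: Optional[Tuple[int, int]] = None,
                         system: Optional[str] = None,
                         pointer_bits: Optional[int] = None) -> List[str]:
    """Return a list of unmet requirements (an empty list means all good).

    Arguments default to the running interpreter; they are parameters only
    so the check can be exercised on any machine.
    """
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    system = platform.system() if system is None else system
    # Pointer size of this Python build: 64 bits on a 64-bit interpreter.
    pointer_bits = struct.calcsize("P") * 8 if pointer_bits is None else pointer_bits

    problems = []
    if system != "Windows":
        problems.append(f"Windows 10 or 11 is required (found {system or 'unknown'}).")
    if pointer_bits != 64:
        problems.append("A 64-bit Python build is required.")
    if version < MIN_PYTHON:
        problems.append("Python %d.%d or newer is required." % MIN_PYTHON)
    return problems
```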
---
## 🚀 Installation & Setup

### Option A: Running from Source (Developers)

1. **Clone the Repository**:

   ```bash
   git clone https://github.com/your/repo.git
   cd whisper_voice
   ```

2. **Environment Setup**: Using a virtual environment is highly recommended.

   ```cmd
   python -m venv venv
   venv\Scripts\activate
   ```

3. **Install Python Dependencies**:

   ```cmd
   pip install -r requirements.txt
   ```

4. **FFmpeg Setup** (choose one method):
   - **Method 1 (System-wide)**: Download FFmpeg and add its `bin` folder to your Windows `PATH` environment variable.
   - **Method 2 (Portable)**: Download `ffmpeg.exe` and place it in a `libs` folder inside the project root:

     ```text
     whisper_voice/
     ├── main.py
     ├── libs/
     │   └── ffmpeg.exe   <-- Place here
     ```

5. **Run the App**:

   ```cmd
   python main.py
   ```

   *Or use the provided `run_source.bat` script.*
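
The two FFmpeg setup methods imply a lookup like the one sketched here. The names `find_ffmpeg` and `expose_libs_on_path` are hypothetical, and the order (portable `libs/` copy first, then `PATH`) is an assumption; the README does not say which location wins when both exist. Prepending `libs/` to `PATH` lets any child process that calls `ffmpeg` by name find the portable copy as well.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_ffmpeg(app_dir: Path) -> Optional[str]:
    """Return the path to FFmpeg, or None if it cannot be found.

    Assumed lookup order: the portable copy in <app_dir>/libs first,
    then anything on the system PATH.
    """
    bundled = app_dir / "libs" / "ffmpeg.exe"
    if bundled.is_file():
        return str(bundled)
    # shutil.which searches PATH (and honours PATHEXT on Windows, so "ffmpeg" matches ffmpeg.exe).
    return shutil.which("ffmpeg")


def expose_libs_on_path(app_dir: Path) -> None:
    """Prepend <app_dir>/libs to PATH so child processes can also find ffmpeg.exe."""
    libs = app_dir / "libs"
    if libs.is_dir():
        os.environ["PATH"] = str(libs) + os.pathsep + os.environ.get("PATH", "")
```
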
### Option B: Building a Portable EXE

You can compile the application into a single executable file for easy distribution.

1. Follow the **Running from Source** steps above to set up your environment.

2. Install `pyinstaller`:

   ```cmd
   pip install pyinstaller
   ```

3. Run the build script:

   ```cmd
   build_exe.bat
   ```

   *(Or run `pyinstaller build.spec` manually.)*

4. **Locate the EXE**: The build output is written to the `dist` folder as `dist/WhisperVoice.exe`.

5. **Distribution**:
   - You can share the `.exe` on its own, but the end user will still need FFmpeg.
   - To make the package truly "unzip and run", zip the `libs` folder (containing `ffmpeg.exe`) together with the EXE.
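
Shipping `libs/` next to the EXE only works if the app resolves paths relative to the executable rather than the current working directory, which is presumably what `src/core/paths.py` ("EXE vs Script") handles. A common PyInstaller-based approach is sketched below; `app_base_dir` is a hypothetical name and the real module may differ. PyInstaller sets `sys.frozen` inside a bundle, and in one-file mode `sys._MEIPASS` points at a temporary extraction folder, so user-visible folders such as `libs/` and `models/` are located from `sys.executable` instead.

```python
import sys
from pathlib import Path
from typing import Optional


def app_base_dir(frozen: Optional[bool] = None,
                 executable: Optional[str] = None,
                 entry_script: Optional[str] = None) -> Path:
    """Folder where user-facing files such as libs/ and models/ live.

    The parameters default to the live interpreter state; they exist only
    so the logic can be exercised without building an EXE.
    """
    if frozen is None:
        # PyInstaller sets sys.frozen = True inside a bundled executable.
        frozen = getattr(sys, "frozen", False)
    if frozen:
        # sys.executable is WhisperVoice.exe itself, so its folder is the one
        # the user unzipped (not PyInstaller's temporary sys._MEIPASS folder).
        return Path(executable or sys.executable).resolve().parent
    # From source: the folder containing the entry script (main.py).
    return Path(entry_script or sys.argv[0]).resolve().parent
```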
---
## 🎮 Usage Guide

1. **First Run Initialization**:
   - When you launch the app, you will see an **"Initializing..."** window.
   - If the AI model (`models/` folder) is missing, the app downloads it automatically (~500 MB).
   - Once initialization completes, the app minimizes to the system tray.

2. **Dictation**:
   - Focus the text field where you want to type (e.g., click into a Notepad document).
   - Press **F8**.
   - The **Floating Pill** appears on screen. Watch the visualizer to confirm that it is picking up your voice.
   - Speak your sentence.
   - Press **F8** again to stop.
   - The Pill turns **Blue** ("Thinking...") while the model transcribes.
   - After a moment, the text appears at your cursor.

3. **System Tray**:
   - Look for the application icon in the taskbar's notification area (near the clock).
   - Right-click it and choose **Quit Whisper Voice** to exit the application completely.
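
The visualizer is how you confirm the app hears you. `visualizer.py` is listed as an audio *spectrum* widget; the simpler sketch below only computes an overall loudness level, assuming the audio engine delivers blocks of 16-bit mono PCM bytes. The function name `rms_level` and the -60 dBFS floor are illustrative choices, not values taken from the app. It converts the block's RMS amplitude to dBFS and maps the range from the floor up to 0 dBFS linearly onto a 0.0-1.0 bar height.

```python
import math
from array import array


def rms_level(pcm16: bytes, floor_db: float = -60.0) -> float:
    """Map a block of 16-bit mono PCM (native byte order) to a 0.0-1.0 bar height.

    The byte length must be a multiple of 2 (one 16-bit sample per 2 bytes).
    """
    samples = array("h")
    samples.frombytes(pcm16)
    if not samples:
        return 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return 0.0  # digital silence
    db = 20 * math.log10(rms / 32768)  # dBFS: 0 at full scale, negative below
    # Linear map: floor_db -> 0.0, 0 dBFS -> 1.0, clamped to [0, 1].
    return max(0.0, min(1.0, 1 - db / floor_db))
```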
---
## 📁 Project Structure

```text
whisper_voice/
├── main.py                   # Application entry point & orchestrator
├── task.md                   # Development task tracking
├── requirements.txt          # Python dependencies
├── build.spec                # PyInstaller configuration
├── run_source.bat            # Helper script: run from source
├── build_exe.bat             # Helper script: build the portable EXE
└── src/
    ├── core/
    │   ├── audio_engine.py   # Microphone recording logic
    │   ├── transcriber.py    # AI model wrapper (faster-whisper)
    │   ├── hotkey_manager.py # Global keyboard hooks
    │   └── paths.py          # Path resolution (EXE vs. script)
    ├── ui/
    │   ├── overlay.py        # Main Pill window
    │   ├── visualizer.py     # Audio spectrum widget
    │   ├── loader.py         # Bootstrapper/downloader UI
    │   └── tray.py           # System tray icon
    └── utils/
        ├── injector.py       # Clipboard/paste logic
        └── downloader.py     # File download utilities
```
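
Read together with the Usage Guide, these modules suggest a simple flow: the first F8 starts recording, the second stops it and hands the audio to the transcriber, and the result is injected at the cursor. The sketch below models that flow as a small state machine. The class, method, and callback names are hypothetical, and ignoring F8 while a transcription is running is an assumption; in a Qt app the transcription would typically run off the UI thread, with `on_transcript` invoked when the result comes back.

```python
from enum import Enum, auto
from typing import Callable


class State(Enum):
    IDLE = auto()       # pill hidden, waiting for F8
    RECORDING = auto()  # pill visible, visualizer live
    THINKING = auto()   # pill blue, transcription in progress


class Orchestrator:
    """Hypothetical sketch of how main.py could sequence the F8 workflow."""

    def __init__(self,
                 start_recording: Callable[[], None],
                 stop_recording: Callable[[], bytes],
                 submit_transcription: Callable[[bytes], None],
                 inject_text: Callable[[str], None]) -> None:
        self.state = State.IDLE
        self._start_recording = start_recording            # audio_engine.py
        self._stop_recording = stop_recording              # audio_engine.py
        self._submit_transcription = submit_transcription  # transcriber.py (worker)
        self._inject_text = inject_text                    # injector.py

    def on_hotkey(self) -> None:
        """Called by the hotkey manager each time F8 is pressed."""
        if self.state is State.IDLE:
            self._start_recording()
            self.state = State.RECORDING
        elif self.state is State.RECORDING:
            audio = self._stop_recording()
            self.state = State.THINKING
            self._submit_transcription(audio)
        # Assumption: presses while THINKING are ignored.

    def on_transcript(self, text: str) -> None:
        """Called when transcription finishes; types the text and resets."""
        if self.state is not State.THINKING:
            return
        if text.strip():
            self._inject_text(text.strip())
        self.state = State.IDLE
```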
---
## ❓ Troubleshooting

**Q: Nothing happens when I press F8.**
- Check the system tray to make sure the app is running.
- If you are prompted to grant input-monitoring permissions, allow them (this is rare on standard Windows).
- Some antivirus software may block the global hotkey hook. Add the app to your antivirus allow list.

**Q: The app crashes with an error about FFmpeg.**
- The app needs FFmpeg for audio processing. Make sure `ffmpeg.exe` is either on your system `PATH` or in a `libs` folder next to `main.py` (or next to the EXE).

**Q: Transcription is slow.**
- The `small` model is generally fast, but on older CPUs a long sentence can take 2 to 5 seconds.
- To use a GPU, install NVIDIA's CUDA libraries (cuBLAS and cuDNN). `faster-whisper` runs on CTranslate2, so a CUDA-enabled PyTorch build is not required. This prototype defaults to CPU/Auto for compatibility.

**Q: "Failed to load model" error.**
- Delete the `models` folder and restart the app to force a re-download.
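
For the *Transcription is slow* case, the CPU/GPU choice comes down to the `device` and `compute_type` arguments of `faster_whisper.WhisperModel`. The helper below is a hypothetical sketch, not the app's configuration: it picks `float16` on CUDA and `int8` on CPU, the combinations shown in the faster-whisper documentation. Passing `device="auto"` instead lets faster-whisper pick the device itself. The usage lines are comments because they require `faster-whisper` (and its `ctranslate2` dependency) to be installed.

```python
def whisper_model_kwargs(cuda_available: bool, model: str = "small") -> dict:
    """Keyword arguments for faster_whisper.WhisperModel.

    `model` may be a size name such as "small" or a path to a local
    model folder (for example one inside models/).
    """
    if cuda_available:
        # Half precision is the usual choice on NVIDIA GPUs.
        return {"model_size_or_path": model, "device": "cuda", "compute_type": "float16"}
    # 8-bit quantization keeps CPU inference fast and memory use low.
    return {"model_size_or_path": model, "device": "cpu", "compute_type": "int8"}


# Usage (requires faster-whisper, which depends on ctranslate2):
#   import ctranslate2
#   from faster_whisper import WhisperModel
#   kwargs = whisper_model_kwargs(ctranslate2.get_cuda_device_count() > 0)
#   model = WhisperModel(**kwargs)
```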
---
**License**: MIT

**Author**: Antigravity