Update documentation with new manifesto

2026-01-24 17:06:31 +02:00
parent 9ff0e8d108
commit 26f1d8a3e7
1 changed file with 46 additions and 140 deletions
@@ -1,165 +1,71 @@
-# Whisper Voice - Native Windows AI Transcriber
+# Whisper Voice

-**Whisper Voice** is a high-performance, native Windows application that brings the power of OpenAI's **Whisper** model to your desktop in a seamless, interactive way. 
+**Reclaim Your Voice from the Cloud.**

-Designed for productivity "power users", it allows you to invoke a global hotkey, dictate your thoughts, and have the transcribed text instantly typed into *any* active application (Notepad, Word, Slack, VS Code, etc.).
+Whisper Voice is a high-performance, strictly local speech-to-text tool designed for the desktop. It provides instant, high-accuracy dictation anywhere on your system—no internet connection required, no corporate servers, and absolutely no data harvesting.

-It features a modern, floating "Pill" UI with real-time audio visualization, built on top of the robust PySide6 (Qt) framework.
+We believe that the tools of production—and communication—should belong to the individual, not rented from centralized tech giants.
+
+---
+
+## ✊ Core Principles
+
+### 1. Total Autonomy (Local-First)
+Your voice data is yours alone. Unlike commercial alternatives that siphon your words to remote data centers for processing and profiling, Whisper Voice runs entirely on your hardware. **No masters, no servers.** You retain full sovereignty over your digital footprint.
+
+### 2. Decentralized Power
+By leveraging optimized local processing, we remove the reliance on massive, energy-hungry corporate infrastructure. This is technology scaled to the human level: powerful, efficient, and completely under your control.
+
+### 3. Accessible to All
+High-quality speech recognition shouldn't be gated behind subscriptions or paywalls. This tool is free, open, and built to empower users to interact with their machines on their own terms.

 ---

 ## ✨ Features

-   **🎙️ Global Hotkey**: Press **F8** anywhere in Windows to start recording. Press again to stop.
-   **🤖 Local AI Intelligence**: Powered by `faster-whisper`. Runs entirely on your machine. No cloud API keys, no data leaving your PC. 
-   **⚡ High Performance**: Uses the 'Small' Whisper model by default (~500MB), optimized for a balance of speed and accuracy.
-   **🎨 Modern UI**: A frameless, draggable, floating "Pill" window with a Neon **Audio Visualizer** that reacts to your voice.
-   **🔌 Smart Bootstrapper**: The app is portable and self-healing. On the first run, it checks for the AI model and downloads it automatically if missing.
-   **✍️ Auto-Type**: Automatically simulates keyboard input to paste the transcribed text where your cursor is.
-   **🔋 Portable**: Can be compiled into a single `.exe` file that you can carry on a USB drive.
+*   **100% Offline Processing**: Once the recognition engine is downloaded, the cable can be cut. Nothing leaves your machine.
+*   **Universal Compatibility**: Works in any text field—editors, chat apps, terminals, or browsers. If you can type there, you can speak there.
+*   **Adaptive Input**:
+    *   *Clipboard Mode*: Standard paste injection.
+    *   *High-Speed Simulation*: Simulates keystrokes at up to 6000 characters per minute (CPM), about 100 characters per second, for apps that block pasting.
+*   **System Integration**: Minimalist overlay and system tray presence. It exists when you need it and vanishes when you don't.
+*   **Resource Efficiency**: Optimized to run smoothly on consumer hardware without monopolizing your system resources.

 ---

-## 🛠️ Requirements
+## 🚀 Getting Started

-   **OS**: Windows 10 or 11 (64-bit).
-   **Python**: 3.10 or newer (if running from source).
-   **Hardware**: A reasonable CPU (Modern Intel i5/AMD Ryzen). NVIDIA GPU recommended for instant speed (requires CUDA setup), but runs fine on CPU.
-   **Dependencies**: 
-    -   **FFmpeg**: Essential for audio processing. (See Setup Guide).
+### Installation
+1.  Download the latest release.
+2.  Run `WhisperVoice.exe`.
+3.  On the first run, the bootstrapper will autonomously provision the necessary runtime environment. This ensures your system remains clean and dependencies are self-contained.
+
+### Usage
+1.  **Set Your Trigger**: Configure a global hotkey (default: `F9`) in the settings.
+2.  **Speak Freely**: Hold the hotkey (or toggle it) and speak.
+3.  **Direct Action**: Your words are instantly transcribed and injected into your active window.

 ---

-## 🚀 Installation & Setup
+## ⚙️ Configuration

-### Option A: Running from Source (Developers)
+The **Settings** panel puts the means of configuration in your hands:

-1.  **Clone the Repository**:
-    ```bash
-    git clone https://github.com/your/repo.git
-    cd whisper_voice
-    ```
-
-2.  **Environment Setup**:
-    It is highly recommended to use a virtual environment.
-    ```cmd
-    python -m venv venv
-    venv\Scripts\activate
-    ```
-
-3.  **Install Python Dependencies**:
-    ```cmd
-    pip install -r requirements.txt
-    ```
-
-4.  **FFmpeg Setup**:
-    -   **Method 1 (System-wide)**: Download FFmpeg and add the `bin` folder to your Windows PATH environment variable.
-    -   **Method 2 (Portable)**: Download `ffmpeg.exe` and place it in a `libs` folder inside the project root:
-        ```text
-        whisper_voice/
-        ├── main.py
-        ├── libs/
-        │   └── ffmpeg.exe  <-- Place here
-        ```
-
-5.  **Run the App**:
-    ```cmd
-    python main.py
-    ```
-    *Or use the provided `run_source.bat` script.*
-
-### Option B: Building a Portable EXE
-
-You can compile the application into a single executable file for easy distribution.
-
-1.  Follow the **Running from Source** steps above to set up your environment.
-2.  Install `pyinstaller`:
-    ```cmd
-    pip install pyinstaller
-    ```
-3.  Run the Build Script:
-    ```cmd
-    build_exe.bat
-    ```
-    *(Or run `pyinstaller build.spec` manually).*
-
-4.  **Locate the EXE**:
-    The result will be in the `dist` folder: `dist/WhisperVoice.exe`.
-
-5.  **Distribution**:
-    -   You can send just the `.exe` to anyone.
-    -   **Note**: The end-user will still need FFmpeg. You can zip the `libs` folder alongside the EXE to make it truly "unzip and run".
+*   **Recognition Engine**: Choose the model size (Tiny to Large) that fits your hardware. Larger models offer greater precision but require more computing power.
+*   **Input Method**: Switch between "Clipboard Paste" and "Simulate Typing" depending on target application restrictions.
+*   **Typing Speed**: Adjust the keystroke injection rate. At the 6000 CPM maximum (100 characters per second), text delivery is near-instant.
+*   **Run on Startup**: Configure the agent to be ready the moment your session begins.

 ---

-## 🎮 Usage Guide
+## 🤝 Mutual Aid

-1.  **First Run Initialization**:
-    -   When you launch the app, you will see an **"Initializing..."** window.
-    -   If the AI Model (`models/` folder) is missing, the app will automatically download it (~500MB).
-    -   Once complete, the app minimizes to the System Tray.
+This project thrives on community collaboration. If you have improvements, fixes, or ideas, you are encouraged to contribute. We build better systems when we build them together, horizontally and transparently.

-2.  **Dictation**:
-    -   Focus the text field where you want to type (e.g., click into a Notepad document).
-    -   Press **F8**.
-    -   The **Floating Pill** appears on screen. Use the visualizer to confirm it hears you.
-    -   Speak your sentence.
-    -   Press **F8** again to stop.
-    -   The Pill turns **Blue** ("Thinking...").
-    -   Wait a moment... the text will appear!
-
-3.  **System Tray**:
-    -   Look for the application icon in your taskbar tray (near the clock).
-    -   Right-click -> **Quit Whisper Voice** to exit the application completely.
+*   **Report Issues**: If something breaks, let us know.
+*   **Contribute Code**: The source is open. Fork it, improve it, share it.

 ---

-## 📁 Project Structure
-
-```text
-whisper_voice/
-├── main.py                 # Application Entry Point & Orchestrator
-├── task.md                 # Development Task Tracking
-├── requirements.txt        # Python Dependencies
-├── build.spec              # PyInstaller Configuration
-├── run_source.bat          # Helper script
-├── build_exe.bat           # Helper script
-├── src/
-│   ├── core/
-│   │   ├── audio_engine.py    # Microphone recording logic
-│   │   ├── transcriber.py     # AI Model wrapper (Faster-Whisper)
-│   │   ├── hotkey_manager.py  # Global keyboard hooks
-│   │   └── paths.py           # Path resolution (EXE vs Script)
-│   ├── ui/
-│   │   ├── overlay.py         # Main Pill Window
-│   │   ├── visualizer.py      # Audio Spectrum Widget
-│   │   ├── loader.py          # Bootstrapper/Downloader UI
-│   │   └── tray.py            # System Tray Icon
-│   └── utils/
-│       ├── injector.py        # Clipboard/Paste logic
-│       └── downloader.py      # File download utilities
-```
-
---
-
-## ❓ Troubleshooting
-
-**Q: Nothing happens when I press F8.**
-   Check the System Tray to ensure the app is running.
-   Ensure you have given the app "Input Monitoring" permissions if prompted (rare on standard Windows).
-   Some Antivirus software might block the "Global Hotkey" feature. Whitelist the app.
-
-**Q: The app crashes with an error about FFmpeg.**
-   `faster-whisper` requires FFmpeg. Make sure `ffmpeg.exe` is either in your system PATH or in a `libs` folder next to the `main.py` (or EXE).
-
-**Q: Transcription is slow.**
-   The "Small" model is generally fast, but on older CPUs, it might take 2-5 seconds for a long sentence.
-   To use a GPU, you must install the NVIDIA cuDNN libraries and the `torch` version with CUDA support. This prototype setup defaults to CPU/Auto for compatibility.
-
-**Q: "Failed to load model" error.**
-   Delete the `models` folder and restart the app to force a re-download.
-
---
-
-**License**: MIT  
-**Author**: Antigravity
+*Built with local processing libraries and Qt.*
+*No gods, no cloud managers.*
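
The new README quotes typing speed in characters per minute (CPM). To make that unit concrete, here is a minimal sketch of the arithmetic that converts a CPM setting into a pause between simulated keystrokes. The function names are hypothetical and are not taken from the project's source; this only illustrates the conversion, not how Whisper Voice actually schedules keystrokes.

```python
def keystroke_delay_seconds(cpm: float) -> float:
    """Return the pause between keystrokes for a rate given in characters per minute.

    Hypothetical helper for illustration; not part of the Whisper Voice codebase.
    """
    if cpm <= 0:
        raise ValueError("cpm must be positive")
    # 60 seconds per minute divided by characters per minute gives seconds per character.
    return 60.0 / cpm


def typing_duration_seconds(text: str, cpm: float) -> float:
    """Estimate how long it takes to type `text` at the given rate."""
    return len(text) * keystroke_delay_seconds(cpm)


# At the README's stated maximum of 6000 CPM, each keystroke is spaced
# about 10 ms apart, so a 100-character sentence takes roughly one second.
delay_at_max = keystroke_delay_seconds(6000)
duration_100_chars = typing_duration_seconds("x" * 100, 6000)
```

Under these assumptions, lowering the rate to 600 CPM would stretch the same 100-character sentence to about ten seconds, which is why the setting matters for applications that only accept simulated typing.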