lashman/whisper_voice


Your Name e627e1b8aa Correct hardware detection statement in docs

2026-01-24 17:24:56 +02:00

5.7 KiB


WHISPER VOICE

SOVEREIGN SPEECH RECOGNITION

Your Voice. Your Machine. Your Data.
A high-performance, locally-run dictation agent for the liberated desktop.

✊ The Manifesto

We hold these truths to be self-evident: That user data is an extension of the self, and its exploitation by centralized clouds is a violation of digital autonomy.

Whisper Voice is built on the principle of technological sovereignty. It provides state-of-the-art speech recognition without renting your cognitive output to corporate oligarchies. By running entirely on your own hardware, it reclaims the means of digital production, ensuring that your words remain exclusively yours.

"The master's tools will never dismantle the master's house." — Audre Lorde
Build your own tools. Run them locally.

⚡ Technical Core

Whisper Voice is not a wrapper for an API. It is a fully contained neural inference engine running on your metal.

The Engine: Faster-Whisper

We use Faster-Whisper, which runs Whisper on CTranslate2, a high-performance inference engine for Transformer models. This lets us run OpenAI's Whisper models with:

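Up to 4x faster inference than the reference openai/whisper (PyTorch) implementation at the same accuracy, per the Faster-Whisper benchmarks.
Lower memory use, plus optional 8-bit (int8) quantization that stores each weight in a quarter of the space float32 needs, bringing larger models within reach of consumer GPUs.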
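The int8 saving is simple arithmetic. A float32 weight takes 4 bytes and an int8 weight takes 1 byte.

The sketch below estimates raw weight storage from the parameter counts OpenAI published for the original Whisper checkpoints. It is a rough lower bound. Activations, decoding buffers and CTranslate2's own runtime overhead are ignored, so real memory use is higher. `weight_megabytes` is an illustrative helper, not part of Whisper Voice or Faster-Whisper.

```python
# Approximate parameter counts OpenAI lists for the original Whisper checkpoints.
WHISPER_PARAMS = {
    "tiny": 39_000_000,
    "base": 74_000_000,
    "small": 244_000_000,
    "medium": 769_000_000,
    "large": 1_550_000_000,
}

# Bytes needed to store a single weight at each precision.
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def weight_megabytes(model: str, precision: str) -> float:
    """Raw weight storage in MB (10**6 bytes); ignores activations and runtime overhead."""
    return WHISPER_PARAMS[model] * BYTES_PER_WEIGHT[precision] / 1e6


if __name__ == "__main__":
    for name in WHISPER_PARAMS:
        fp32 = weight_megabytes(name, "float32")
        q8 = weight_megabytes(name, "int8")
        print(f"{name:>7}: float32 {fp32:7.0f} MB -> int8 {q8:6.0f} MB ({fp32 / q8:.0f}x smaller)")
```

Compared with float16, the usual GPU precision, int8 halves weight storage rather than quartering it.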

The Sense: Silero VAD

To distinguish human speech from background noise, we employ Silero VAD (Voice Activity Detection). Only audio classified as speech is passed to Whisper. This conserves compute and reduces the hallucinated text Whisper tends to produce from silence.
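Silero VAD scores each short audio frame with a speech probability. A gating step then keeps only the speech regions before they reach Whisper.

The sketch below shows that gating step, working on a list of precomputed probabilities. It is not Silero's or Whisper Voice's actual code. The frame length (32 ms, i.e. 512 samples at 16 kHz), the 0.5 threshold and the minimum-silence duration are illustrative assumptions.

```python
from typing import List, Optional, Tuple


def speech_segments(
    probs: List[float],
    frame_ms: int = 32,         # assumed frame length: 512 samples at 16 kHz
    threshold: float = 0.5,     # assumed speech-probability cutoff
    min_silence_ms: int = 300,  # assumed: shorter pauses do not split a segment
) -> List[Tuple[int, int]]:
    """Turn per-frame speech probabilities into (start_ms, end_ms) speech regions."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None        # start time of the currently open segment
    last_speech: Optional[int] = None  # end time of the most recent speech frame
    for i, p in enumerate(probs):
        t = i * frame_ms
        if p >= threshold:
            if start is None:
                start = t
            last_speech = t + frame_ms
        elif start is not None and t + frame_ms - last_speech >= min_silence_ms:
            # The pause has lasted long enough: close the open segment.
            segments.append((start, last_speech))
            start = None
    if start is not None:
        segments.append((start, last_speech))
    return segments
```

Faster-Whisper has equivalent Silero-based filtering built in, enabled with `transcribe(..., vad_filter=True)`. This document does not say whether Whisper Voice uses that option or its own gating.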

The Interface: Qt 6 (PySide6)

The UI is built with Qt Quick/QML, rendering a hardware-accelerated, glassmorphic overlay that feels native to modern desktop environments while remaining completely decoupled from OS spyware.

📊 Model Intelligence

Select the intelligence level that matches your hardware reality.

Model	GPU VRAM	CPU RAM	Speed	Best For
Tiny	~500 MB	~1 GB	⚡ Supersonic	Quick commands, older machinery.
Base	~600 MB	~1 GB	🚀 Very Fast	Daily driving on low-power laptops.
Small	~1 GB	~2 GB	⏩ Fast	High accuracy for English dictation.
Medium	~2 GB	~4 GB	⚖️ Balanced	Complex vocabulary and accents.
Large-v3 Turbo	~4 GB	~6 GB	✨ Optimal	The sweet spot. Large-level smarts, Medium-level speed.
Large-v3	~5 GB	~8 GB	🧠 Maximum	Professional transcription. Uncompromised quality.

Note: Select your Compute Device (CUDA GPU or CPU) in Settings to enable GPU acceleration.
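To turn the table into a decision, a hypothetical helper can pick the largest model whose approximate footprint fits the memory you have. `largest_model_that_fits` is illustrative only and is not part of Whisper Voice's Settings. The figures are the rough values from the table, so leave headroom for other applications sharing the GPU and system memory.

```python
from typing import Optional

# (model, approx. GPU VRAM in GB, approx. CPU RAM in GB), taken from the table,
# ordered from smallest to largest footprint.
MODEL_REQUIREMENTS = [
    ("Tiny", 0.5, 1),
    ("Base", 0.6, 1),
    ("Small", 1, 2),
    ("Medium", 2, 4),
    ("Large-v3 Turbo", 4, 6),
    ("Large-v3", 5, 8),
]


def largest_model_that_fits(available_gb: float, device: str) -> Optional[str]:
    """Return the biggest model whose approximate footprint fits, or None if none do."""
    if device not in ("cuda", "cpu"):
        raise ValueError("device must be 'cuda' or 'cpu'")
    column = 1 if device == "cuda" else 2  # VRAM column for CUDA, RAM column for CPU
    best = None
    for row in MODEL_REQUIREMENTS:
        if row[column] <= available_gb:
            best = row[0]  # rows are ascending, so the last fit is the largest
    return best
```

For example, `largest_model_that_fits(4, "cuda")` gives "Large-v3 Turbo". The helper always prefers the larger model when both fit. If speed matters more than the last bit of accuracy, choose Large-v3 Turbo by hand.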

🛠️ Operational Guide

Deployment

Download: Grab the latest WhisperVoice.exe from Releases.
Install: There is no installation. Place the executable in a directory you control (e.g., C:\Tools\WhisperVoice).
Bootstrap: Run it. The agent then provisions its own isolated Python environment (~2 GB), so your system Python and PATH stay untouched.
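This document does not describe how the isolated environment is built. The executable might, for instance, unpack an embedded interpreter instead.

As one plausible sketch, assuming a standard-library `venv`, the environment lives in a folder next to the app and nothing is written to the system PATH. The `runtime` folder name and `ensure_private_env` are hypothetical.

```python
import os
import venv
from pathlib import Path


def ensure_private_env(app_dir: Path, with_pip: bool = True) -> Path:
    """Create app_dir/runtime as a virtual environment if missing; return its interpreter."""
    env_dir = app_dir / "runtime"  # hypothetical location inside the app's own folder
    if os.name == "nt":
        python = env_dir / "Scripts" / "python.exe"
    else:
        python = env_dir / "bin" / "python"
    if not python.exists():
        # system_site_packages defaults to False, isolating the env from global packages.
        # Symlinks mirror what `python -m venv` does by default on non-Windows systems.
        builder = venv.EnvBuilder(with_pip=with_pip, symlinks=(os.name != "nt"))
        builder.create(env_dir)
    return python
```

Dependencies could then be installed with `subprocess.run([str(python), "-m", "pip", "install", "faster-whisper"], check=True)`. Because the environment sits inside the app's folder, deleting that folder removes it too.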

Usage

Hotkeys: The default trigger is F9. You can rebind this in Settings to any combination (e.g., Ctrl+Space, Alt+V).
Injection Modes:
- Clipboard Paste: Standard, reliable text insertion.
- Simulate Typing: A stealth mode that sends simulated keystrokes one character at a time (up to 6000 CPM), for applications that block pasting (e.g., games, remote terminals).
Tray Agent: The app lives in your system tray. Right-click the icon to access Settings or terminate the process.
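Simulate Typing paces keystrokes from a characters-per-minute (CPM) rate. The gap between characters is 60 / CPM seconds:

- 6000 CPM gives 10 ms per character.
- 1200 CPM gives 50 ms per character.

The sketch below covers only that pacing logic. `send_char` is a hypothetical hook standing in for whatever OS-level keystroke injection the app actually uses. It is passed in as a parameter so the loop can be tested without touching the keyboard.

```python
import time
from typing import Callable


def char_delay_seconds(cpm: int) -> float:
    """Seconds to wait between characters for a given characters-per-minute rate."""
    if cpm <= 0:
        raise ValueError("cpm must be positive")
    return 60.0 / cpm


def simulate_typing(
    text: str,
    cpm: int,
    send_char: Callable[[str], None],           # hypothetical keystroke-injection hook
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Emit text one character at a time at roughly `cpm` characters per minute."""
    delay = char_delay_seconds(cpm)
    for ch in text:
        send_char(ch)
        sleep(delay)  # pause so slow applications can keep up
```

Lowering the Typing Speed slider corresponds to a smaller `cpm`, which gives each keystroke a longer gap.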

Removal

Portable: To uninstall, simply delete the folder. No registry keys, no hidden services, no trace left behind.

🔧 Troubleshooting

The app crashes immediately on start

Ensure you have the Microsoft Visual C++ Redistributable (2015-2022) installed, as the underlying CTranslate2 engine requires these standard libraries.

"Simulate Typing" is slow or misses characters

Adjust the Typing Speed slider in Settings. Some older applications cannot handle supersonic 6000 CPM input; try lowering it to 1200 CPM.

Microphone not picking up audio

The agent uses your System Default Input Device. Ensure your microphone is set as Default in Windows Sound Settings.

⚖️ License & Rights

Public Domain (CC0 1.0)

To the extent possible under law, the creators of this interface have waived all copyright and related or neighboring rights to this work. This tool belongs to the commons. It is a gift to the digital proletariat.

Fork it.
Mod it.
Distribute it.

Credits

OpenAI: For the Whisper weights (MIT).
Systran: For Faster-Whisper (MIT).
Qt Company: For the UI framework (LGPL).

No gods, no cloud managers.
