T

Your Name 4615f3084f Docs: Add WCAG 2.2 AAA step-by-step implementation plan

15 tasks covering: design tokens, 6 component fixes, Settings/Overlay/Loader
hardcoded colors, accessibility properties, keyboard nav, reduced motion.

2026-02-18 20:56:51 +02:00

assets

Initial commit of WhisperVoice

2026-01-24 17:03:52 +02:00

dist

Feat: Integrated Local LLM (Llama 3.2 1B) for Intelligent Correction -- New Core: Added LLMEngine utilizing llama-cpp-python for local private text post-processing. -- Forensic Protocol: Engineered strict system prompts to prevent LLM refusals, censorship, or assistant chatter. -- Three Modes: Grammar, Standard, Rewrite. -- Start/Stop Logic: Consolidated conflicting recording methods. -- Hotkeys: Added dedicated F9 (Correct) vs F8 (Transcribe). -- UI: Updated Settings. -- Build: Updated portable_build.py. -- Docs: Updated README.

2026-01-31 01:02:24 +02:00

docs/plans

Docs: Add WCAG 2.2 AAA step-by-step implementation plan

2026-02-18 20:56:51 +02:00

src

2026-01-31 01:02:24 +02:00

.gitignore

Initial commit of WhisperVoice

2026-01-24 17:03:52 +02:00

app_icon.ico

Initial commit of WhisperVoice

2026-01-24 17:03:52 +02:00

bootstrapper.py

2026-01-31 01:02:24 +02:00

build_bootstrapper.py

Initial commit of WhisperVoice

2026-01-24 17:03:52 +02:00

build_exe.bat

Initial commit of WhisperVoice

2026-01-24 17:03:52 +02:00

convert_icon.py

Initial commit of WhisperVoice

2026-01-24 17:03:52 +02:00

download_icons.py

Initial commit of WhisperVoice

2026-01-24 17:03:52 +02:00

main.py

2026-01-31 01:02:24 +02:00

portable_build.py

2026-01-31 01:02:24 +02:00

README.md

2026-01-31 01:02:24 +02:00

RELEASE_NOTES.md

Release v1.0.4: The Compatibility Update

2026-01-25 20:28:01 +02:00

requirements.txt

2026-01-31 01:02:24 +02:00

run_source.bat

Initial commit of WhisperVoice

2026-01-24 17:03:52 +02:00

run.bat

Initial commit of WhisperVoice

2026-01-24 17:03:52 +02:00

test_m2m.py

Release v1.0.2: Implemented Style Prompting & Removed Grammar Correction

2026-01-25 13:42:06 +02:00

test_mt0.py

Release v1.0.2: Implemented Style Prompting & Removed Grammar Correction

2026-01-25 13:42:06 +02:00

test_punctuation.py

Release v1.0.2: Implemented Style Prompting & Removed Grammar Correction

2026-01-25 13:42:06 +02:00

README.md

🎙️ W H I S P E R V O I C E

SOVEREIGN SPEECH RECOGNITION

"The master's tools will never dismantle the master's house."
Build your own tools. Run them locally. Free your mind.

View Source • Report Issue

📡 The Transmission

We are witnessing the enshittification of the digital world. What were once vibrant social commons are being walled off, strip-mined for data, and degraded into rent-seeking silos. Your voice is no longer your own; it is a training set for a corporate oracle that charges you for the privilege of listening.

Whisper Voice is a small act of sabotage against this trend.

It is built on the axiom of Technological Sovereignty. By moving state-of-the-art inference from the server farms to your own silicon, you reclaim the means of digital production. No telemetry. No subscriptions. No "cloud processing" that eavesdrops on your intent.

⚡ The Engine

Whisper Voice operates directly on the metal. It is not an API wrapper; it is an autonomous machine.

Component	Technology	Benefit
Inference Core	Faster-Whisper	Hyper-optimized C++ implementation via CTranslate2. Delivers 4x velocity over standard PyTorch.
Compression	INT8 quantization	Enables Pro-grade models (`Large-v3`) to run on consumer-grade GPUs, democratizing elite AI.
Sensory Gate	Silero VAD	Enterprise-grade Voice Activity Detection filters out the noise, ensuring only pure intent is processed.
Interface	Qt 6 / QML	Hardware-accelerated, glassmorphic UI that is fluid, responsive, and sovereign.

🛑 Compatibility Matrix (Windows)

The core engine (CTranslate2) is heavily optimized for Nvidia tensor cores.

Manufacturer	Hardware	Status	Notes
Nvidia	GTX 900+ / RTX	✅ Supported	Full heavy-metal acceleration.
AMD	Radeon RX	⚠️ CPU Fallback	Runs on CPU. Valid for `Small/Medium`, slow for `Large`.
Intel	Arc / Iris	⚠️ CPU Fallback	Runs on CPU. Valid for `Small/Medium`, slow for `Large`.
Apple	M1 / M2 / M3	❌ Unsupported	Release is strictly Windows x64.

AMD Users: v1.0.3 auto-detects GPU failures and silently falls back to CPU.

🖋️ Universal Transcription

At its core, Whisper Voice is the ultimate bridge between thought and text. It listens with superhuman precision, converting spoken word into written form across 99 languages.

Punctuation Mastery: Automatically handles capitalization and complex punctuation formatting.
Contextual Intelligence: Smarter than standard dictation; it understands the flow of sentences to resolve homophones and technical jargon ($1.5k vs "fifteen hundred dollars").
Total Privacy: Your private dictation, legal notes, or creative writing never leave your RAM.

Workflow: `F9 (Default)`

The primary channel for native-language transcription. It transcribes precisely what it hears in the language you speak (or the one you've locked in Settings).

🧠 Intelligent Correction (New in v1.1.0)

Whisper Voice now integrates a local Llama 3.2 1B LLM to act as a "Silent Consultant". It post-processes transcripts to fix grammar or polish style without effectively "chatting" back.

It is strictly trained on a Forensic Protocol: it will never lecture you, never refuse to process explicit language, and never sanitize your words. Your profanity is yours to keep.

Correction Modes:

Standard (Default): Fixes grammar, punctuation, and capitalization while keeping every word you said.
Grammar Only: Strictly fixes objective errors (spelling/agreement). Touches nothing else.
Rewrite: Polishes the flow and clarity of your sentences while explicitly preserving your original tone (Casual stays casual, Formal stays formal).

Supported Languages:

The correction engine is optimized for English, German, French, Italian, Portuguese, Spanish, Hindi, and Thai. It also performs well on Russian, Chinese, Japanese, and Romanian.

This approach incurs a ~2s latency penalty but uses zero extra VRAM when in Low VRAM mode.

🌎 Universal Translation

Whisper Voice v1.0.1 includes a Neural Translation Engine that allows you to bridge any linguistic gap instantly.

Input: Speak in French, Japanese, Russian, or 96 other languages.
Output: The engine instantly reconstructs the semantic meaning into fluent English.
Task Protocol: Handled via the dedicated F10 channel.

🔍 Why only English translation?

A common question arises: Why can't I translate from French to Japanese?

The architecture of the underlying Whisper model is a Many-to-English design. During its massive training phase (680,000 hours of audio), the translation task was specifically optimized to map the global linguistic commons onto a single bridge language: English. This allowed the model to reach incredible levels of semantic understanding without the exponential complexity of a "Many-to-Many" mapping.

By focusing its translation decoder solely on English, Whisper achieves "Zero-Shot" quality that rivals specialized translation engines while remaining lightweight enough to run on your local GPU.

🕹️ Command & Control

Global Hotkeys

The agent runs silently in the background, waiting for your signal.

Transcribe (F9): Opens the channel for standard speech-to-text.
Translate (F10): Opens the channel for neural translation.
Customization: Remap these keys in Settings. The recorder supports complex chords (e.g. Ctrl + Alt + Space) to fit your workflow.

Injection Protocols

Clipboard Paste: Standard text injection. Instant, reliable.
Simulate Typing: Mimics physical keystrokes at superhuman speed (6000 CPM). Bypasses anti-paste restrictions and "protected" windows.

📊 Intelligence Matrix

Select the model that aligns with your available resources.

Model	VRAM (GPU)	RAM (CPU)	Designation	Capability
`Tiny`	~500 MB	~1 GB	⚡ Supersonic	Command & Control, older hardware.
`Base`	~600 MB	~1 GB	🚀 Very Fast	Daily driver for low-power laptops.
`Small`	~1 GB	~2 GB	⏩ Fast	High accuracy English dictation.
`Medium`	~2 GB	~4 GB	⚖️ Balanced	Complex vocabulary, foreign accents.
`Large-v3 Turbo`	~4 GB	~6 GB	✨ Optimal	The Sweet Spot. Near-Large intelligence, Medium speed.
`Large-v3`	~5 GB	~8 GB	🧠 Maximum	Professional grade. Uncompromised.

Note: Acceleration requires you to manually select your Compute Device (CUDA GPU or CPU) in Settings.

📉 Low VRAM Mode

For users with limited GPU memory (e.g., 4GB cards) or those running heavy games simultaneously, Whisper Voice offers a specialized Low VRAM Mode.

Behavior: The AI model is aggressively unloaded from the GPU immediately after every transcription.
Benefit: When idle, the app consumes near-zero VRAM (~0MB), leaving your GPU completely free for gaming or rendering.
Trade-off: There is a "cold start" latency of 1-2 seconds for every voice command as the model reloads from the disk cache.

🛠️ Deployment

📥 Installation

Acquire: Download WhisperVoice.exe from Releases.
Deploy: Place it anywhere. It is portable.
Bootstrap: Run it. The agent will self-provision an isolated Python runtime (~2GB) on first launch.
Sync: Future updates are handled by the Smart Bootstrapper, which surgically updates only changed files, respecting your bandwidth and your settings.

🔧 Troubleshooting

App crashes on start: Ensure you have Microsoft Visual C++ Redistributable 2015-2022 installed.
"Simulate Typing" is slow: Some applications (remote desktops, legacy games) cannot handle the data stream. Lower the typing speed in Settings to ~1200 CPM.
No Audio: The agent listens to the Default Communication Device. Verify your Windows Sound Control Panel.

🌐 Supported Languages

The engine understands the following 99 languages. You can lock the focus to a specific language in Settings to improve accuracy, or rely on Auto-Detect for fluid multilingual usage.


Afrikaans 🇿🇦	Albanian 🇦🇱	Amharic 🇪🇹	Arabic 🇸🇦	Armenian 🇦🇲	Assamese 🇮🇳
Azerbaijani 🇦🇿	Bashkir 🇷🇺	Basque 🇪🇸	Belarusian 🇧🇾	Bengali 🇧🇩	Bosnian 🇧🇦
Breton 🇫🇷	Bulgarian 🇧🇬	Burmese 🇲🇲	Castilian 🇪🇸	Catalan 🇪🇸	Chinese 🇨🇳
Croatian 🇭🇷	Czech 🇨🇿	Danish 🇩🇰	Dutch 🇳🇱	English 🇺🇸	Estonian 🇪🇪
Faroese 🇫🇴	Finnish 🇫🇮	Flemish 🇧🇪	French 🇫🇷	Galician 🇪🇸	Georgian 🇬🇪
German 🇩🇪	Greek 🇬🇷	Gujarati 🇮🇳	Haitian 🇭🇹	Hausa 🇳🇬	Hawaiian 🇺🇸
Hebrew 🇮🇱	Hindi 🇮🇳	Hungarian 🇭🇺	Icelandic 🇮🇸	Indonesian 🇮🇩	Italian 🇮🇹
Japanese 🇯🇵	Javanese 🇮 Indonesa	Kannada 🇮🇳	Kazakh 🇰🇿	Khmer 🇰🇭	Korean 🇰🇷
Lao 🇱🇦	Latin 🇻🇦	Latvian 🇱🇻	Lingala 🇨🇩	Lithuanian 🇱🇹	Luxembourgish 🇱🇺
Macedonian 🇲🇰	Malagasy 🇲🇬	Malay 🇲🇾	Malayalam 🇮🇳	Maltese 🇲🇹	Maori 🇳🇿
Marathi 🇮🇳	Moldavian 🇲🇩	Mongolian 🇲🇳	Myanmar 🇲🇲	Nepali 🇳🇵	Norwegian 🇳🇴
Occitan 🇫🇷	Panjabi 🇮🇳	Pashto 🇦🇫	Persian 🇮🇷	Polish 🇵🇱	Portuguese 🇵🇹
Punjabi 🇮🇳	Romanian 🇷🇴	Russian 🇷🇺	Sanskrit 🇮🇳	Serbian 🇷🇸	Shona 🇿🇼
Sindhi 🇵🇰	Sinhala 🇱🇰	Slovak 🇸🇰	Slovenian 🇸🇮	Somali 🇸🇴	Spanish 🇪🇸
Sundanese 🇮🇩	Swahili 🇰🇪	Swedish 🇸🇪	Tagalog 🇵🇭	Tajik 🇹🇯	Tamil 🇮🇳
Tatar 🇷🇺	Telugu 🇮🇳	Thai 🇹🇭	Tibetan 🇨🇳	Turkish 🇹🇷	Turkmen 🇹🇲
Ukrainian 🇺🇦	Urdu 🇵🇰	Uzbek 🇺🇿	Vietnamese 🇻e	Welsh 🏴󠁧󠁢󠁷󠁬󠁳󠁿	Yiddish 🇮🇱
Yoruba 🇳🇬

⚖️ PUBLIC DOMAIN (CC0 1.0)

No Rights Reserved. No Gods. No Masters. No Managers.

Credit to OpenAI (Whisper), Systran (Faster-Whisper), and Silero (VAD).

Releases 6

v1.2.0 Latest

2026-02-18 22:30:48 +02:00

Languages

Python 52.4%

QML 44.1%

GLSL 3.1%

Batchfile 0.4%

README.md Unescape Escape

🎙️ W H I S P E R V O I C E

SOVEREIGN SPEECH RECOGNITION

📡 The Transmission

⚡ The Engine

🛑 Compatibility Matrix (Windows)

🖋️ Universal Transcription

Workflow: F9 (Default)

🧠 Intelligent Correction (New in v1.1.0)

Correction Modes:

Supported Languages:

🌎 Universal Translation

🔍 Why only English translation?

🕹️ Command & Control

Global Hotkeys

Injection Protocols

📊 Intelligence Matrix

📉 Low VRAM Mode

🛠️ Deployment

📥 Installation

🔧 Troubleshooting

🌐 Supported Languages

⚖️ PUBLIC DOMAIN (CC0 1.0)

README.md

Workflow: `F9 (Default)`