18 Commits

Author SHA1 Message Date
Your Name
baa5e2e69e Feat: Integrated Local LLM (Llama 3.2 1B) for Intelligent Correction
- New Core: Added LLMEngine utilizing llama-cpp-python for local private text post-processing.
- Forensic Protocol: Engineered strict system prompts to prevent LLM refusals, censorship, or assistant chatter.
- Three Modes: Grammar, Standard, Rewrite.
- Start/Stop Logic: Consolidated conflicting recording methods.
- Hotkeys: Added dedicated F9 (Correct) vs F8 (Transcribe).
- UI: Updated Settings.
- Build: Updated portable_build.py.
- Docs: Updated README.
2026-01-31 01:02:24 +02:00
Your Name
3137770742 Release v1.0.4: The Compatibility Update
- Added robust CPU Fallback for AMD/Non-CUDA GPUs.
- Implemented Lazy Load for AI Engine to prevent startup crashes.
- Added explicit DLL injection for Cublas/Cudnn on Windows.
- Added Corrupt Model Auto-Repair logic.
- Includes pre-compiled v1.0.4 executable.
2026-01-25 20:28:01 +02:00
Your Name
aed489dd23 Docs: Detailed explanation of Low VRAM Mode and Style Prompting 2026-01-25 13:52:10 +02:00
Your Name
e23c492360 Docs: Add RELEASE_NOTES.md for v1.0.2 2026-01-25 13:46:48 +02:00
Your Name
84f10092e9 Release v1.0.2: Implemented Style Prompting & Removed Grammar Correction
- Removed M2M100 Grammar Correction model completely to reduce bloat/complexity.
- Implemented 'Style Prompting' in Settings -> AI Engine to handle punctuation natively via Whisper.
- Added Style Presets: Standard (Default), Casual, and Custom.
- Optimized Build: Bootstrapper no longer requires transformers/sentencepiece.
- Fixed 'torch' NameError in Low VRAM mode.
- Fixed Bootstrapper missing dependency detection.
- Updated UI to reflect removed features.
- Included compiled v1.0.2 Executable in dist/.
2026-01-25 13:42:06 +02:00
Your Name
03f46ee1e3 Docs: Final polish - Enshittification manifesto and structural refinement 2026-01-24 19:21:01 +02:00
Your Name
0f1bf5f1af Docs: Final polish - 6-col language table and refined manifesto 2026-01-24 19:12:08 +02:00
Your Name
0b2b5848e2 Fix: Translation Reliability, Click-Through, and Docs Sync
- Transcriber: Enforced 'beam_size=5' and prompt injection for robust translation.
- Transcriber: Removed conditioning on previous text to prevent language stickiness.
- Transcriber: Refactored kwargs to sanitize inputs.
- Overlay: Fixed click-through by toggling WS_EX_TRANSPARENT.
- UI: Added real download progress reporting.
- Docs: Refactored language list to table.
2026-01-24 19:05:43 +02:00
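
The transcriber changes in this commit could be sketched roughly as follows. This is an illustration, not the repository's code: it assumes the keyword names accepted by faster-whisper's `WhisperModel.transcribe` (`task`, `language`, `beam_size`, `condition_on_previous_text`, `initial_prompt`), and the helper name `build_transcribe_kwargs` and its `prompt` argument are hypothetical.

```python
def build_transcribe_kwargs(task="transcribe", language=None, prompt=None):
    """Build sanitized keyword arguments for a Whisper transcribe call.

    Hypothetical helper: the name and argument list are illustrative only.
    """
    # Unknown task values fall back to plain transcription.
    if task not in ("transcribe", "translate"):
        task = "transcribe"

    kwargs = {
        "task": task,
        "beam_size": 5,                       # fixed beam width, as in the commit message
        "condition_on_previous_text": False,  # avoids "language stickiness" between segments
    }

    # Only pass a language when the user locked one; otherwise let the model detect it.
    if language and language.strip().lower() != "auto":
        kwargs["language"] = language.strip().lower()

    # Only pass a prompt when it carries actual text.
    if prompt and prompt.strip():
        kwargs["initial_prompt"] = prompt.strip()

    return kwargs
```

The resulting dictionary would be unpacked into the transcribe call, so empty or invalid inputs never reach the model.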
Your Name
f3bf7541cf Docs: Detailed expansion of README with Translation features and open layout 2026-01-24 18:33:22 +02:00
Your Name
4b84a27a67 v1.0.1 Feature Update and Polish
Full Changelog:

[New Features]
- Added Native Translation Mode:
  - Whisper model now fully supports Translating any language to English
  - Added 'task' and 'language' parameters to Transcriber core
- Dual Hotkey Support:
  - Added separate Global Hotkeys for Transcribe (default F8) and Translate (default F10)
  - Both hotkeys are fully customizable in Settings
  - Engine dynamically switches modes based on which key is pressed

[UI/UX Improvements]
- Settings Window:
  - Widened Hotkey Input fields (240px) to accommodate long combinations
  - Added Pretty-Printing for hotkey sequences (e.g. 'ctrl+f9' display as 'Ctrl + F9')
  - Replaced Country Code dropdown with Full Language Names (99+ languages)
  - Made Language Dropdown scrollable (max height 300px) to prevent screen overflow
  - Removed redundant 'Task' selector (replaced by dedicated hotkeys)
- System Tray:
  - Tooltip now displays both Transcribe and Translate hotkeys
  - Tooltip hotkeys are formatted readably

[Core & Performance]
- Bootstrapper:
  - Implemented Smart Incremental Sync
  - Now checks filesize and content hash before copying files
  - Drastically reduces startup time for subsequent runs
  - Preserves user settings.json during updates
- Backend:
  - Fixed HotkeyManager to support dynamic configuration keys
  - Fixed Language Lock: selecting a language now correctly forces the model to use it
  - Refactored bridge/main connection for language list handling
2026-01-24 18:29:10 +02:00
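
The hotkey pretty-printing mentioned under Settings Window can be illustrated with a few lines of Python. The function name `pretty_hotkey` is hypothetical; the actual implementation in the repository may differ.

```python
def pretty_hotkey(sequence: str) -> str:
    """Format a raw hotkey string such as 'ctrl+f9' for display."""
    # Split on '+', drop empty fragments, and capitalize each key name.
    parts = [part.strip() for part in sequence.split("+") if part.strip()]
    return " + ".join(part.capitalize() for part in parts)


print(pretty_hotkey("ctrl+f9"))         # Ctrl + F9
print(pretty_hotkey("ctrl+alt+space"))  # Ctrl + Alt + Space
```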
Your Name
f184eb0037 Fix: Invisible overlay blocking mouse clicks
Problem:
The overlay window, even when fully transparent or visually hidden (opacity 0), was still intercepting mouse events. This created a 'dead zone' on the screen where users could not click through to applications behind the overlay. This occurred because the low-level window hook was answering 'HTCAPTION' to hit tests regardless of the UI state.

Solution:
1. Modified 'WindowHook' to accept an 'enabled' state.
2. When disabled, 'WM_NCHITTEST' now returns 'HTTRANSPARENT', allowing the OS to pass the click to the window underneath.
3. Updated 'main.py' to toggle this hook state dynamically:
   - ENABLED when Recording or Processing (UI is visible/active).
   - DISABLED when Idling (UI is hidden/transparent).

Result:
The overlay is now completely non-intrusive when not in use.
2026-01-24 17:51:23 +02:00
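
The hit-test toggle described in this commit can be modeled in isolation. The sketch below only captures the decision logic; the class name `HitTestPolicy` is hypothetical, and wiring it into an actual Win32 or Qt native event hook is left out. The numeric constants are the standard Win32 values.

```python
# Standard Win32 constants.
WM_NCHITTEST = 0x0084
HTCAPTION = 2        # "this point is the title bar": the window receives the click
HTTRANSPARENT = -1   # "this point is not mine": the click goes to the window underneath


class HitTestPolicy:
    """Decides how the overlay answers WM_NCHITTEST (illustrative sketch)."""

    def __init__(self):
        self.enabled = False  # idle by default, so the overlay is click-through

    def set_enabled(self, enabled: bool) -> None:
        # Enabled while recording or processing, disabled while idling.
        self.enabled = bool(enabled)

    def handle(self, message: int):
        """Return a hit-test result, or None to let default handling run."""
        if message != WM_NCHITTEST:
            return None
        return HTCAPTION if self.enabled else HTTRANSPARENT
```

Keeping the decision in one small object makes it easy to flip the state from the recording and idle transitions in `main.py`.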
Your Name
306bd075ed Aesthetic overhaul of documentation 2026-01-24 17:29:59 +02:00
Your Name
a1cc9c61b9 Add language list and file transcription info 2026-01-24 17:27:54 +02:00
Your Name
e627e1b8aa Correct hardware detection statement in docs 2026-01-24 17:24:56 +02:00
Your Name
eaa572b42f Fix release badge for Gitea 2026-01-24 17:22:14 +02:00
Your Name
e900201214 Final documentation polish 2026-01-24 17:20:22 +02:00
Your Name
0d426aea4b Update docs with license and model stats 2026-01-24 17:16:53 +02:00
Your Name
b15ce8076f Enhance documentation 2026-01-24 17:12:21 +02:00
22 changed files with 1688 additions and 219 deletions

211
README.md

@@ -1,71 +1,196 @@
# Whisper Voice <div align="center">
**Reclaim Your Voice from the Cloud.** # 🎙️ W H I S P E R &nbsp; V O I C E
### SOVEREIGN SPEECH RECOGNITION
Whisper Voice is a high-performance, strictly local speech-to-text tool designed for the desktop. It provides instant, high-accuracy dictation anywhere on your system—no internet connection required, no corporate servers, and absolutely no data harvesting. <br>
We believe that the tools of production—and communication—should belong to the individual, not rented from centralized tech giants. ![Status](https://img.shields.io/badge/STATUS-OPERATIONAL-success?style=for-the-badge&logo=server&color=2ecc71)
[![Download](https://img.shields.io/gitea/v/release/lashman/whisper_voice?gitea_url=https%3A%2F%2Fgit.lashman.live&label=Install&style=for-the-badge&logo=windows&logoColor=white&color=3b82f6)](https://git.lashman.live/lashman/whisper_voice/releases/latest)
[![License](https://img.shields.io/badge/LICENSE-PUBLIC_DOMAIN-lightgrey?style=for-the-badge&logo=creative-commons&logoColor=black)](https://creativecommons.org/publicdomain/zero/1.0/)
<br>
> *"The master's tools will never dismantle the master's house."*
> <br>
> **Build your own tools. Run them locally. Free your mind.**
[View Source](https://git.lashman.live/lashman/whisper_voice) • [Report Issue](https://git.lashman.live/lashman/whisper_voice/issues)
</div>
<br>
<br>
## 📡 The Transmission
We are witnessing the **enshittification** of the digital world. What were once vibrant social commons are being walled off, strip-mined for data, and degraded into rent-seeking silos. Your voice is no longer your own; it is a training set for a corporate oracle that charges you for the privilege of listening.
**Whisper Voice** is a small act of sabotage against this trend.
It is built on the axiom of **Technological Sovereignty**. By moving state-of-the-art inference from the server farms to your own silicon, you reclaim the means of digital production. No telemetry. No subscriptions. No "cloud processing" that eavesdrops on your intent.
--- ---
## ✊ Core Principles ## ⚡ The Engine
### 1. Total Autonomy (Local-First) Whisper Voice operates directly on the metal. It is not an API wrapper; it is an autonomous machine.
Your voice data is yours alone. Unlike commercial alternatives that siphon your words to remote data centers for processing and profiling, Whisper Voice runs entirely on your hardware. **No masters, no servers.** You retain full sovereignty over your digital footprint.
### 2. Decentralized Power | Component | Technology | Benefit |
By leveraging optimized local processing, we strip away the need for reliance on massive, energy-hungry corporate infrastructure. This is technology scaled to the human level—powerful, efficient, and completely under your control. | :--- | :--- | :--- |
| **Inference Core** | **Faster-Whisper** | Hyper-optimized C++ implementation via **CTranslate2**. Delivers **4x velocity** over standard PyTorch. |
| **Compression** | **INT8 quantization** | Enables Pro-grade models (`Large-v3`) to run on consumer-grade GPUs, democratizing elite AI. |
| **Sensory Gate** | **Silero VAD** | Enterprise-grade Voice Activity Detection filters out the noise, ensuring only pure intent is processed. |
| **Interface** | **Qt 6 / QML** | Hardware-accelerated, glassmorphic UI that is fluid, responsive, and sovereign. |
### 3. Accessible to All ### 🛑 Compatibility Matrix (Windows)
High-quality speech recognition shouldn't be gated behind subscriptions or paywalls. This tool is free, open, and built to empower users to interact with their machines on their own terms. The core engine (`CTranslate2`) is heavily optimized for Nvidia tensor cores.
| Manufacturer | Hardware | Status | Notes |
| :--- | :--- | :--- | :--- |
| **Nvidia** | GTX 900+ / RTX | ✅ **Supported** | Full heavy-metal acceleration. |
| **AMD** | Radeon RX | ⚠️ **CPU Fallback** | Runs on CPU. Valid for `Small/Medium`, slow for `Large`. |
| **Intel** | Arc / Iris | ⚠️ **CPU Fallback** | Runs on CPU. Valid for `Small/Medium`, slow for `Large`. |
| **Apple** | M1 / M2 / M3 | ❌ **Unsupported** | Release is strictly Windows x64. |
> **AMD Users**: v1.0.4 auto-detects GPU failures and silently falls back to CPU.
<br>
## 🖋️ Universal Transcription
At its core, Whisper Voice is the ultimate bridge between thought and text. It listens with superhuman precision, converting spoken word into written form across **99 languages**.
* **Punctuation Mastery**: Automatically handles capitalization and complex punctuation formatting.
* **Contextual Intelligence**: Smarter than standard dictation; it understands the flow of sentences to resolve homophones and technical jargon ($1.5k vs "fifteen hundred dollars").
* **Total Privacy**: Your private dictation, legal notes, or creative writing never leave your RAM.
### Workflow: `F9 (Default)`
The primary channel for native-language transcription. It transcribes precisely what it hears in the language you speak (or the one you've locked in Settings).
### 🧠 Intelligent Correction (New in v1.1.0)
Whisper Voice now integrates a local **Llama 3.2 1B** LLM to act as a "Silent Consultant". It post-processes transcripts to fix grammar or polish style without "chatting" back.
It is held to a strict **Forensic Protocol** through its system prompt: it will never lecture you, never refuse to process explicit language, and never sanitize your words. Your profanity is yours to keep.
#### Correction Modes:
* **Standard (Default)**: Fixes grammar, punctuation, and capitalization while keeping every word you said.
* **Grammar Only**: Strictly fixes objective errors (spelling/agreement). Touches nothing else.
* **Rewrite**: Polishes the flow and clarity of your sentences while explicitly preserving your original tone (Casual stays casual, Formal stays formal).
#### Supported Languages:
The correction engine is optimized for **English, German, French, Italian, Portuguese, Spanish, Hindi, and Thai**. It also performs well on **Russian, Chinese, Japanese, and Romanian**.
This approach incurs a ~2s latency penalty but uses **zero extra VRAM** when in Low VRAM mode.
<br>
## 🌎 Universal Translation
Whisper Voice v1.0.1 includes a **Neural Translation Engine** that allows you to bridge any linguistic gap instantly.
* **Input**: Speak in French, Japanese, Russian, or **96 other languages**.
* **Output**: The engine instantly reconstructs the semantic meaning into fluent **English**.
* **Task Protocol**: Handled via the dedicated `F10` channel.
### 🔍 Why only English translation?
A common question arises: *Why can't I translate from French to Japanese?*
The architecture of the underlying Whisper model is a **Many-to-English** design. During its massive training phase (680,000 hours of audio), the translation task was specifically optimized to map the global linguistic commons onto a single bridge language: **English**. This allowed the model to reach incredible levels of semantic understanding without the exponential complexity of a "Many-to-Many" mapping.
By focusing its translation decoder solely on English, Whisper achieves "Zero-Shot" quality that rivals specialized translation engines while remaining lightweight enough to run on your local GPU.
--- ---
## ✨ Features ## 🕹️ Command & Control
* **100% Offline Processing**: Once the recognition engine is downloaded, the cable can be cut. Nothing leaves your machine. ### Global Hotkeys
* **Universal Compatibility**: Works in any text field—editors, chat apps, terminals, or browsers. If you can type there, you can speak there. The agent runs silently in the background, waiting for your signal.
* **Adaptive Input**:
* *Clipboard Mode*: Standard paste injection. * **Transcribe (F9)**: Opens the channel for standard speech-to-text.
* *High-Speed Simulation*: Simulates keystrokes at supersonic speeds (up to 6000 CPM) for apps that block pasting. * **Translate (F10)**: Opens the channel for neural translation.
* **System Integration**: Minimalist overlay and system tray presence. It exists when you need it and vanishes when you don't. * **Customization**: Remap these keys in Settings. The recorder supports complex chords (e.g. `Ctrl + Alt + Space`) to fit your workflow.
* **Resource Efficiency**: Optimized to run smoothly on consumer hardware without monopolizing your system resources.
### Injection Protocols
* **Clipboard Paste**: Standard text injection. Instant, reliable.
* **Simulate Typing**: Mimics physical keystrokes at superhuman speed (6000 CPM). Bypasses anti-paste restrictions and "protected" windows.
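
To make the typing-speed figures concrete, the conversion from characters per minute (CPM) to a per-keystroke delay is simple arithmetic. The function names below are hypothetical and only illustrate the calculation.

```python
def delay_per_char(cpm: float) -> float:
    """Seconds to wait between simulated keystrokes at a given CPM."""
    if cpm <= 0:
        raise ValueError("cpm must be positive")
    return 60.0 / cpm


def typing_time(text: str, cpm: float) -> float:
    """Approximate seconds needed to type `text` at the given CPM."""
    return len(text) * delay_per_char(cpm)
```

At 6000 CPM the delay is 10 ms per character, so a 300-character sentence takes about 3 seconds; at 1200 CPM the delay is 50 ms per character.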
<br>
## 📊 Intelligence Matrix
Select the model that aligns with your available resources.
| Model | VRAM (GPU) | RAM (CPU) | Designation | Capability |
| :--- | :--- | :--- | :--- | :--- |
| `Tiny` | **~500 MB** | ~1 GB | ⚡ **Supersonic** | Command & Control, older hardware. |
| `Base` | **~600 MB** | ~1 GB | 🚀 **Very Fast** | Daily driver for low-power laptops. |
| `Small` | **~1 GB** | ~2 GB | ⏩ **Fast** | High accuracy English dictation. |
| `Medium` | **~2 GB** | ~4 GB | ⚖️ **Balanced** | Complex vocabulary, foreign accents. |
| `Large-v3 Turbo` | **~4 GB** | ~6 GB | ✨ **Optimal** | **The Sweet Spot.** Near-Large intelligence, Medium speed. |
| `Large-v3` | **~5 GB** | ~8 GB | 🧠 **Maximum** | Professional grade. Uncompromised. |
> *Note: Acceleration requires you to manually select your Compute Device (CUDA GPU or CPU) in Settings.*
### 📉 Low VRAM Mode
For users with limited GPU memory (e.g., 4GB cards) or those running heavy games simultaneously, Whisper Voice offers a specialized **Low VRAM Mode**.
* **Behavior**: The AI model is aggressively unloaded from the GPU immediately after every transcription.
* **Benefit**: When idle, the app consumes near-zero VRAM (~0MB), leaving your GPU completely free for gaming or rendering.
* **Trade-off**: There is a "cold start" latency of 1-2 seconds for every voice command as the model reloads from the disk cache.
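
The load, use, unload cycle behind Low VRAM Mode can be sketched as below. This is a minimal illustration under assumptions: `LowVramRunner` and the `load_model` callable are hypothetical names, and a real GPU backend may need extra framework-specific steps to actually release device memory.

```python
import gc


class LowVramRunner:
    """Loads the model on demand and drops it after each call in low-VRAM mode."""

    def __init__(self, load_model, low_vram=True):
        self._load_model = load_model  # callable returning an object with .transcribe()
        self._low_vram = low_vram
        self._model = None

    def transcribe(self, audio):
        # Cold start: (re)load the model if it is not resident.
        if self._model is None:
            self._model = self._load_model()
        try:
            return self._model.transcribe(audio)
        finally:
            if self._low_vram:
                # Drop the only reference and collect, so memory can be freed while idle.
                self._model = None
                gc.collect()
```

With `low_vram=False` the model stays resident, trading idle memory for the 1-2 second reload on each command.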
--- ---
## 🚀 Getting Started ## 🛠️ Deployment
### Installation ### 📥 Installation
1. Download the latest release. 1. **Acquire**: Download `WhisperVoice.exe` from [Releases](https://git.lashman.live/lashman/whisper_voice/releases).
2. Run `WhisperVoice.exe`. 2. **Deploy**: Place it anywhere. It is portable.
3. On the first run, the bootstrapper will autonomously provision the necessary runtime environment. This ensures your system remains clean and dependencies are self-contained. 3. **Bootstrap**: Run it. The agent will self-provision an isolated Python runtime (~2GB) on first launch.
4. **Sync**: Future updates are handled by the **Smart Bootstrapper**, which surgically updates only changed files, respecting your bandwidth and your settings.
### Usage ### 🔧 Troubleshooting
1. **Set Your Trigger**: Configure a global hotkey (default: `F9`) in the settings. * **App crashes on start**: Ensure you have [Microsoft Visual C++ Redistributable 2015-2022](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist) installed.
2. **Speak Freely**: Hold the hotkey (or toggle it) and speak. * **"Simulate Typing" is slow**: Some applications (remote desktops, legacy games) cannot handle the data stream. Lower the typing speed in Settings to ~1200 CPM.
3. **Direct Action**: Your words are instantly transcribed and injected into your active window. * **No Audio**: The agent listens to the **Default Communication Device**. Verify your Windows Sound Control Panel.
<br>
--- ---
## ⚙️ Configuration ## 🌐 Supported Languages
The **Settings** panel puts the means of configuration in your hands: The engine understands the following 99 languages. You can lock the focus to a specific language in Settings to improve accuracy, or rely on **Auto-Detect** for fluid multilingual usage.
* **Recognition Engine**: Choose the size of the model that fits your hardware capabilities (Tiny to Large). Larger models offer greater precision but require more computing power. | | | | | | |
* **Input Method**: Switch between "Clipboard Paste" and "Simulate Typing" depending on target application restrictions. | :--- | :--- | :--- | :--- | :--- | :--- |
* **Typing Speed**: Adjust the keystroke injection rate. Crank it up to 6000 CPM for instant text delivery. | Afrikaans 🇿🇦 | Albanian 🇦🇱 | Amharic 🇪🇹 | Arabic 🇸🇦 | Armenian 🇦🇲 | Assamese 🇮🇳 |
* **Run on Startup**: Configure the agent to be ready the moment your session begins. | Azerbaijani 🇦🇿 | Bashkir 🇷🇺 | Basque 🇪🇸 | Belarusian 🇧🇾 | Bengali 🇧🇩 | Bosnian 🇧🇦 |
| Breton 🇫🇷 | Bulgarian 🇧🇬 | Burmese 🇲🇲 | Castilian 🇪🇸 | Catalan 🇪🇸 | Chinese 🇨🇳 |
| Croatian 🇭🇷 | Czech 🇨🇿 | Danish 🇩🇰 | Dutch 🇳🇱 | English 🇺🇸 | Estonian 🇪🇪 |
| Faroese 🇫🇴 | Finnish 🇫🇮 | Flemish 🇧🇪 | French 🇫🇷 | Galician 🇪🇸 | Georgian 🇬🇪 |
| German 🇩🇪 | Greek 🇬🇷 | Gujarati 🇮🇳 | Haitian 🇭🇹 | Hausa 🇳🇬 | Hawaiian 🇺🇸 |
| Hebrew 🇮🇱 | Hindi 🇮🇳 | Hungarian 🇭🇺 | Icelandic 🇮🇸 | Indonesian 🇮🇩 | Italian 🇮🇹 |
| Japanese 🇯🇵 | Javanese 🇮🇩 | Kannada 🇮🇳 | Kazakh 🇰🇿 | Khmer 🇰🇭 | Korean 🇰🇷 |
| Lao 🇱🇦 | Latin 🇻🇦 | Latvian 🇱🇻 | Lingala 🇨🇩 | Lithuanian 🇱🇹 | Luxembourgish 🇱🇺 |
| Macedonian 🇲🇰 | Malagasy 🇲🇬 | Malay 🇲🇾 | Malayalam 🇮🇳 | Maltese 🇲🇹 | Maori 🇳🇿 |
| Marathi 🇮🇳 | Moldavian 🇲🇩 | Mongolian 🇲🇳 | Myanmar 🇲🇲 | Nepali 🇳🇵 | Norwegian 🇳🇴 |
| Occitan 🇫🇷 | Panjabi 🇮🇳 | Pashto 🇦🇫 | Persian 🇮🇷 | Polish 🇵🇱 | Portuguese 🇵🇹 |
| Punjabi 🇮🇳 | Romanian 🇷🇴 | Russian 🇷🇺 | Sanskrit 🇮🇳 | Serbian 🇷🇸 | Shona 🇿🇼 |
| Sindhi 🇵🇰 | Sinhala 🇱🇰 | Slovak 🇸🇰 | Slovenian 🇸🇮 | Somali 🇸🇴 | Spanish 🇪🇸 |
| Sundanese 🇮🇩 | Swahili 🇰🇪 | Swedish 🇸🇪 | Tagalog 🇵🇭 | Tajik 🇹🇯 | Tamil 🇮🇳 |
| Tatar 🇷🇺 | Telugu 🇮🇳 | Thai 🇹🇭 | Tibetan 🇨🇳 | Turkish 🇹🇷 | Turkmen 🇹🇲 |
| Ukrainian 🇺🇦 | Urdu 🇵🇰 | Uzbek 🇺🇿 | Vietnamese 🇻🇳 | Welsh 🏴󠁧󠁢󠁷󠁬󠁳󠁿 | Yiddish 🇮🇱 |
| Yoruba 🇳🇬 | | | | | |
--- <br>
<br>
## 🤝 Mutual Aid <div align="center">
This project thrives on community collaboration. If you have improvements, fixes, or ideas, you are encouraged to contribute. We build better systems when we build them together, horizontally and transparently. ### ⚖️ PUBLIC DOMAIN (CC0 1.0)
*No Rights Reserved. No Gods. No Masters. No Managers.*
* **Report Issues**: If something breaks, let us know. Credit to **OpenAI** (Whisper), **Systran** (Faster-Whisper), and **Silero** (VAD).
* **Contribute Code**: The source is open. Fork it, improve it, share it.
--- </div>
*Built with local processing libraries and Qt.*
*No gods, no cloud managers.*

28
RELEASE_NOTES.md Normal file

@@ -0,0 +1,28 @@
# Release v1.0.4
**"The Compatibility Update"**
This release focuses on maximum stability across different hardware configurations (AMD, Intel, Nvidia) and fixing startup crashes related to corrupted models or missing drivers.
## 🛠️ Critical Fixes
### 1. Robust CPU Fallback (AMD / Intel Support)
* **Problem**: Previously, if an AMD user tried to run the app, it would crash instantly because it tried to load Nvidia CUDA libraries by default.
* **Fix**: The app now **silently detects** if CUDA initialization fails (due to missing DLLs or incompatible hardware) and **automatically falls back to CPU mode**.
* **Result**: The app "just works" on any Windows machine, regardless of GPU.
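
The fallback behavior described above can be sketched as a small wrapper. The function name `load_with_cpu_fallback` and the `load_model` callable are assumptions for illustration. With faster-whisper, `load_model` might be something like `lambda device: WhisperModel("small", device=device, compute_type="int8")`, but the project's actual loading code is not shown here.

```python
def load_with_cpu_fallback(load_model, log=print):
    """Try the GPU first; on any failure, retry on the CPU.

    `load_model` is a callable taking a device string ("cuda" or "cpu").
    Returns a (model, device) tuple.
    """
    try:
        return load_model("cuda"), "cuda"
    except Exception as exc:  # missing DLLs, unsupported GPU, driver errors, ...
        log(f"CUDA initialization failed ({exc}); falling back to CPU.")
        return load_model("cpu"), "cpu"
```

Catching a broad `Exception` is deliberate in this sketch, since CUDA failures can surface as different error types depending on what is missing.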
### 2. Startup Crash Protection
* **Problem**: If `faster_whisper` was imported before checking for valid drivers, the app would crash on launch for some users.
* **Fix**: Implemented **Lazy Loading** for the AI engine. The app now starts the UI first, and only loads the heavy AI libraries inside a safety block that catches errors.
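
A lazy-loading wrapper along these lines might look like the sketch below. The class name `LazyEngine` is hypothetical; it uses the standard library's `importlib.import_module` so that a heavy module such as `faster_whisper` is imported only on first use, inside a block that records the error instead of crashing.

```python
import importlib


class LazyEngine:
    """Imports a heavy module on first use and remembers any failure."""

    def __init__(self, module_name: str):
        self._module_name = module_name
        self._module = None
        self.error = None  # set if the import failed

    def get(self):
        """Return the imported module, or None if it could not be loaded."""
        if self._module is None and self.error is None:
            try:
                self._module = importlib.import_module(self._module_name)
            except Exception as exc:  # ImportError, OSError from missing DLLs, ...
                self.error = exc
        return self._module
```

The UI can start immediately, call `get()` when the first transcription is requested, and show `error` to the user if the engine is unavailable.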
### 3. Corrupt Model Auto-Repair
* **Problem**: Interrupted downloads could leave a corrupted model folder, preventing the app from ever starting again.
* **Fix**: If the app detects a "vocabulary missing" or invalid config error, it will now **automatically delete the corrupt folder** and allow you to re-download it cleanly.
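
The auto-repair step can be sketched as a check on the load error followed by removal of the model folder. The function name `repair_if_corrupt` and the `CORRUPTION_HINTS` keywords are assumptions for illustration; the exact error strings the app matches are not shown in this diff.

```python
import shutil
from pathlib import Path

# Hypothetical substrings that suggest a damaged model folder.
CORRUPTION_HINTS = ("vocabulary", "config")


def repair_if_corrupt(model_dir, error) -> bool:
    """Delete `model_dir` if `error` looks like model corruption.

    Returns True when the error matched and the folder was cleared,
    so the caller can trigger a clean re-download.
    """
    message = str(error).lower()
    if not any(hint in message for hint in CORRUPTION_HINTS):
        return False
    path = Path(model_dir)
    if path.is_dir():
        shutil.rmtree(path, ignore_errors=True)
    return True
```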
### 4. Windows DLL Injection
* **Fix**: Added explicit DLL path injection for `nvidia-cublas` and `nvidia-cudnn` to ensure Python 3.8+ can find the required CUDA libraries on Windows systems that don't have them in PATH.
## 📦 Installation
1. Download `WhisperVoice.exe` below.
2. Replace your existing `.exe`.
3. Run it.


@@ -245,62 +245,106 @@ class Bootstrapper:
req_file = self.source_path / "requirements.txt" req_file = self.source_path / "requirements.txt"
# Use --prefer-binary to avoid building from source on Windows if possible
# Use --no-warn-script-location to reduce noise
# CRITICAL: Force --only-binary for llama-cpp-python to prevent picking new source-only versions
cmd = [
str(self.python_path / "python.exe"), "-m", "pip", "install",
"--prefer-binary",
"--only-binary", "llama-cpp-python",
"--extra-index-url", "https://abetlen.github.io/llama-cpp-python/whl/cpu",
"-r", str(req_file)
]
process = subprocess.Popen( process = subprocess.Popen(
[str(self.python_path / "python.exe"), "-m", "pip", "install", "-r", str(req_file)], cmd,
stdout=subprocess.PIPE, stdout=subprocess.PIPE,
stderr=subprocess.STDOUT, stderr=subprocess.STDOUT, # Merge stderr into stdout
text=True, text=True,
cwd=str(self.python_path), cwd=str(self.python_path),
creationflags=subprocess.CREATE_NO_WINDOW creationflags=subprocess.CREATE_NO_WINDOW
) )
output_buffer = []
for line in process.stdout: for line in process.stdout:
if self.ui: self.ui.set_detail(line.strip()[:60]) line_stripped = line.strip()
process.wait() if self.ui: self.ui.set_detail(line_stripped[:60])
output_buffer.append(line_stripped)
log(line_stripped)
return_code = process.wait()
if return_code != 0:
err_msg = "\n".join(output_buffer[-15:]) # Show last 15 lines
raise RuntimeError(f"Pip install failed (Exit code {return_code}):\n{err_msg}")
def refresh_app_source(self): def refresh_app_source(self):
"""Refresh app source files. Skips if already exists to save time.""" """
# Optimization: If app/main.py exists, skip update to improve startup speed. Smartly updates app source files by only copying changed files.
# The user can delete the 'runtime' folder to force an update. Preserves user settings and reduces disk I/O.
if (self.app_path / "main.py").exists(): """
log("App already exists. Skipping update.") if self.ui: self.ui.set_status("Checking for updates...")
return True
if self.ui: self.ui.set_status("Updating app files...")
try: try:
# Preserve settings.json if it exists # 1. Ensure destination exists
settings_path = self.app_path / "settings.json" if not self.app_path.exists():
temp_settings = None self.app_path.mkdir(parents=True, exist_ok=True)
if settings_path.exists():
try:
temp_settings = settings_path.read_bytes()
except:
log("Failed to backup settings.json, it involves risk of data loss.")
if self.app_path.exists():
shutil.rmtree(self.app_path, ignore_errors=True)
shutil.copytree( # 2. Walk source and sync
self.source_path, # source_path is the temporary bundled folder
self.app_path, # app_path is the persistent runtime folder
ignore=shutil.ignore_patterns(
'__pycache__', '*.pyc', '.git', 'venv', changes_made = 0
'build', 'dist', '*.egg-info', 'runtime'
) for src_dir, dirs, files in os.walk(self.source_path):
) # Determine relative path from source root
rel_path = Path(src_dir).relative_to(self.source_path)
# Restore settings.json dst_dir = self.app_path / rel_path
if temp_settings:
try: # Ensure directory exists
settings_path.write_bytes(temp_settings) if not dst_dir.exists():
log("Restored settings.json") dst_dir.mkdir(parents=True, exist_ok=True)
except:
log("Failed to restore settings.json") for file in files:
# Skip ignored files
if file in ['__pycache__', '.git', 'settings.json'] or file.endswith('.pyc'):
continue
src_file = Path(src_dir) / file
dst_file = dst_dir / file
# Check if update needed
should_copy = False
if not dst_file.exists():
should_copy = True
else:
# Compare size first (fast)
if src_file.stat().st_size != dst_file.stat().st_size:
should_copy = True
else:
# Compare content (slower but accurate)
# Only read if size matches to verify diff
if src_file.read_bytes() != dst_file.read_bytes():
should_copy = True
if should_copy:
shutil.copy2(src_file, dst_file)
changes_made += 1
if self.ui: self.ui.set_detail(f"Updated: {file}")
# 3. Cleanup logic (Optional: remove files in dest that are not in source)
# For now, we only add/update to prevent deleting generated user files (logs, etc)
if changes_made > 0:
log(f"Update complete. {changes_made} files changed.")
else:
log("App is up to date.")
return True return True
except Exception as e: except Exception as e:
log(f"Error refreshing app source: {e}") log(f"Error refreshing app source: {e}")
# Fallback to nuclear option if sync fails completely?
# No, 'smart_sync' failing might mean permissions, nuclear wouldn't help.
return False return False
def run_app(self): def run_app(self):
@@ -323,22 +367,51 @@ class Bootstrapper:
messagebox.showerror("WhisperVoice Error", f"Failed to launch app: {e}") messagebox.showerror("WhisperVoice Error", f"Failed to launch app: {e}")
return False return False
def check_dependencies(self):
"""Check if critical dependencies are importable in the embedded python."""
if not self.is_python_ready(): return False
try:
# Check for core libs that might be missing
# We use a subprocess to check imports in the runtime environment
subprocess.check_call(
[str(self.python_path / "python.exe"), "-c", "import faster_whisper; import llama_cpp; import PySide6"],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
cwd=str(self.python_path),
creationflags=subprocess.CREATE_NO_WINDOW
)
return True
except (subprocess.CalledProcessError, FileNotFoundError):
return False
def setup_and_run(self): def setup_and_run(self):
"""Full setup/update and run flow.""" """Full setup/update and run flow."""
try: try:
# 1. Ensure basics
if not self.is_python_ready(): if not self.is_python_ready():
self.download_python() self.download_python()
self._fix_pth_file() # Ensure pth is fixed immediately after download
self.install_pip() self.install_pip()
self.install_packages() # self.install_packages() # We'll do this in the dependency check step now
# Always refresh source to ensure we have the latest bundled code # Always refresh source to ensure we have the latest bundled code
self.refresh_app_source() self.refresh_app_source()
# 2. Check and Install Dependencies
# We do this AFTER refreshing source so we have the latest requirements.txt
if not self.check_dependencies():
log("Dependencies missing or incomplete. Installing...")
self.install_packages()
# Launch # Launch
if self.run_app(): if self.run_app():
if self.ui: self.ui.root.quit() if self.ui: self.ui.root.quit()
except Exception as e: except Exception as e:
messagebox.showerror("Setup Error", f"Installation failed: {e}") if self.ui:
import tkinter.messagebox as mb
mb.showerror("Setup Error", f"Installation failed: {e}") # Improved error visibility
log(f"Fatal error: {e}")
import traceback import traceback
traceback.print_exc() traceback.print_exc()

BIN
dist/WhisperVoice.exe vendored Normal file

Binary file not shown.

389
main.py

@@ -9,6 +9,31 @@ app_dir = os.path.dirname(os.path.abspath(__file__))
if app_dir not in sys.path: if app_dir not in sys.path:
sys.path.insert(0, app_dir) sys.path.insert(0, app_dir)
# -----------------------------------------------------------------------------
# WINDOWS DLL FIX (CRITICAL for Portable CUDA)
# Python 3.8+ on Windows requires explicit DLL directory addition.
# -----------------------------------------------------------------------------
if os.name == 'nt' and hasattr(os, 'add_dll_directory'):
try:
from pathlib import Path
# Scan sys.path for site-packages
for p in sys.path:
path_obj = Path(p)
if path_obj.name == 'site-packages' and path_obj.exists():
nvidia_path = path_obj / "nvidia"
if nvidia_path.exists():
for subdir in nvidia_path.iterdir():
# Add 'bin' folder from each nvidia stub (cublas, cudnn, etc.)
bin_path = subdir / "bin"
if bin_path.exists():
os.add_dll_directory(str(bin_path))
# Also try adding site-packages itself just in case
# os.add_dll_directory(str(path_obj))
break
except Exception:
pass
# -----------------------------------------------------------------------------
from PySide6.QtWidgets import QApplication, QFileDialog, QMessageBox from PySide6.QtWidgets import QApplication, QFileDialog, QMessageBox
from PySide6.QtCore import QObject, Slot, Signal, QThread, Qt, QUrl from PySide6.QtCore import QObject, Slot, Signal, QThread, Qt, QUrl
from PySide6.QtQml import QQmlApplicationEngine from PySide6.QtQml import QQmlApplicationEngine
@@ -19,6 +44,7 @@ from src.ui.bridge import UIBridge
from src.ui.tray import SystemTray from src.ui.tray import SystemTray
from src.core.audio_engine import AudioEngine from src.core.audio_engine import AudioEngine
from src.core.transcriber import WhisperTranscriber from src.core.transcriber import WhisperTranscriber
from src.core.llm_engine import LLMEngine
from src.core.hotkey_manager import HotkeyManager from src.core.hotkey_manager import HotkeyManager
from src.core.config import ConfigManager from src.core.config import ConfigManager
from src.utils.injector import InputInjector from src.utils.injector import InputInjector
@@ -87,7 +113,7 @@ def _silent_shutdown_hook(exc_type, exc_value, exc_tb):
sys.excepthook = _silent_shutdown_hook sys.excepthook = _silent_shutdown_hook
class DownloadWorker(QThread): class DownloadWorker(QThread):
"""Background worker for model downloads.""" """Background worker for model downloads with REAL progress."""
progress = Signal(int) progress = Signal(int)
finished = Signal() finished = Signal()
error = Signal(str) error = Signal(str)
@@ -98,33 +124,144 @@ class DownloadWorker(QThread):
def run(self): def run(self):
try: try:
from faster_whisper import download_model import requests
from tqdm import tqdm
model_path = get_models_path() model_path = get_models_path()
# Download to a specific subdirectory to keep things clean and predictable # Determine what to download
# This matches the logic in transcriber.py which looks for this specific path
dest_dir = model_path / f"faster-whisper-{self.model_name}" dest_dir = model_path / f"faster-whisper-{self.model_name}"
logging.info(f"Downloading Model '{self.model_name}' to {dest_dir}...") repo_id = f"Systran/faster-whisper-{self.model_name}"
files = ["config.json", "model.bin", "tokenizer.json", "vocabulary.json"]
base_url = f"https://huggingface.co/{repo_id}/resolve/main"
dest_dir.mkdir(parents=True, exist_ok=True)
logging.info(f"Downloading {self.model_name} to {dest_dir}...")
# Ensure parent exists # 1. Calculate Total Size
model_path.mkdir(parents=True, exist_ok=True) total_size = 0
file_sizes = {}
# output_dir in download_model specifies where the model files are saved with requests.Session() as s:
download_model(self.model_name, output_dir=str(dest_dir)) for fname in files:
url = f"{base_url}/{fname}"
head = s.head(url, allow_redirects=True)
if head.status_code == 200:
size = int(head.headers.get('content-length', 0))
file_sizes[fname] = size
total_size += size
else:
# Fallback for vocabulary.json vs vocabulary.txt
if fname == "vocabulary.json":
# Try .txt? Or just skip if not found?
# Faster-whisper usually has vocabulary.json
pass
# 2. Download loop
downloaded_bytes = 0
with requests.Session() as s:
for fname in files:
if fname not in file_sizes: continue
url = f"{base_url}/{fname}"
dest_file = dest_dir / fname
# Resume check?
# Simpler to just overwrite for reliability unless we want complex resume logic.
# We'll overwrite.
resp = s.get(url, stream=True)
resp.raise_for_status()
with open(dest_file, 'wb') as f:
for chunk in resp.iter_content(chunk_size=8192):
if chunk:
f.write(chunk)
downloaded_bytes += len(chunk)
# Emit Progress
if total_size > 0:
pct = int((downloaded_bytes / total_size) * 100)
self.progress.emit(pct)
self.finished.emit() self.finished.emit()
except Exception as e: except Exception as e:
logging.error(f"Download failed: {e}") logging.error(f"Download failed: {e}")
self.error.emit(str(e)) self.error.emit(str(e))
class LLMDownloadWorker(QThread):
progress = Signal(int)
finished = Signal()
error = Signal(str)
def __init__(self, parent=None):
super().__init__(parent)
def run(self):
try:
import requests
# Support one model for now
url = "https://huggingface.co/hugging-quants/Llama-3.2-1B-Instruct-Q4_K_M-GGUF/resolve/main/llama-3.2-1b-instruct-q4_k_m.gguf?download=true"
fname = "llama-3.2-1b-instruct-q4_k_m.gguf"
model_path = get_models_path() / "llm" / "llama-3.2-1b-instruct"
model_path.mkdir(parents=True, exist_ok=True)
dest_file = model_path / fname
# Simple check if exists and > 0 size?
# We assume if the user clicked download, they want to download it.
with requests.Session() as s:
head = s.head(url, allow_redirects=True)
total_size = int(head.headers.get('content-length', 0))
resp = s.get(url, stream=True)
resp.raise_for_status()
downloaded = 0
with open(dest_file, 'wb') as f:
for chunk in resp.iter_content(chunk_size=8192):
if chunk:
f.write(chunk)
downloaded += len(chunk)
if total_size > 0:
pct = int((downloaded / total_size) * 100)
self.progress.emit(pct)
self.finished.emit()
except Exception as e:
logging.error(f"LLM Download failed: {e}")
self.error.emit(str(e))
class LLMWorker(QThread):
finished = Signal(str)
def __init__(self, llm_engine, text, mode, parent=None):
super().__init__(parent)
self.llm_engine = llm_engine
self.text = text
self.mode = mode
def run(self):
try:
corrected = self.llm_engine.correct_text(self.text, self.mode)
self.finished.emit(corrected)
except Exception as e:
logging.error(f"LLMWorker crashed: {e}")
self.finished.emit(self.text) # Fail safe: return original text
class TranscriptionWorker(QThread): class TranscriptionWorker(QThread):
finished = Signal(str) finished = Signal(str)
def __init__(self, transcriber, audio_data, is_file=False, parent=None): def __init__(self, transcriber, audio_data, is_file=False, parent=None, task_override=None):
super().__init__(parent) super().__init__(parent)
self.transcriber = transcriber self.transcriber = transcriber
self.audio_data = audio_data self.audio_data = audio_data
self.is_file = is_file self.is_file = is_file
self.task_override = task_override
def run(self): def run(self):
text = self.transcriber.transcribe(self.audio_data, is_file=self.is_file) text = self.transcriber.transcribe(self.audio_data, is_file=self.is_file, task=self.task_override)
self.finished.emit(text) self.finished.emit(text)
class WhisperApp(QObject): class WhisperApp(QObject):
@@ -156,6 +293,7 @@ class WhisperApp(QObject):
self.bridge.settingChanged.connect(self.on_settings_changed) self.bridge.settingChanged.connect(self.on_settings_changed)
self.bridge.hotkeysEnabledChanged.connect(self.on_hotkeys_enabled_toggle) self.bridge.hotkeysEnabledChanged.connect(self.on_hotkeys_enabled_toggle)
self.bridge.downloadRequested.connect(self.on_download_requested) self.bridge.downloadRequested.connect(self.on_download_requested)
self.bridge.llmDownloadRequested.connect(self.on_llm_download_requested)
self.engine.rootContext().setContextProperty("ui", self.bridge) self.engine.rootContext().setContextProperty("ui", self.bridge)
@@ -166,13 +304,20 @@ class WhisperApp(QObject):
self.tray.transcribe_file_requested.connect(self.transcribe_file) self.tray.transcribe_file_requested.connect(self.transcribe_file)
# Init Tooltip # Init Tooltip
hotkey = self.config.get("hotkey") from src.utils.formatters import format_hotkey
self.tray.setToolTip(f"Whisper Voice - Press {hotkey} to Record") self.format_hotkey = format_hotkey # Store ref
hk1 = self.format_hotkey(self.config.get("hotkey"))
hk3 = self.format_hotkey(self.config.get("hotkey_correct"))
hk2 = self.format_hotkey(self.config.get("hotkey_translate"))
self.tray.setToolTip(f"Whisper Voice\nTranscribe: {hk1}\nCorrect: {hk3}\nTranslate: {hk2}")
# 3. Logic Components Placeholders # 3. Logic Components Placeholders
self.audio_engine = None self.audio_engine = None
self.transcriber = None self.transcriber = None
self.hotkey_manager = None self.llm_engine = None
self.hk_transcribe = None
self.hk_correct = None
self.hk_translate = None
self.overlay_root = None self.overlay_root = None
# 4. Start Loader # 4. Start Loader
@@ -222,12 +367,23 @@ class WhisperApp(QObject):
self.settings_root.setVisible(False) self.settings_root.setVisible(False)
# Install Low-Level Window Hook for Transparent Hit Test # Install Low-Level Window Hook for Transparent Hit Test
# We must keep a reference to 'self.hook' so it isn't GC'd try:
# scale = self.overlay_root.devicePixelRatio() from src.utils.window_hook import WindowHook
# self.hook = WindowHook(int(self.overlay_root.winId()), 500, 300, scale) hwnd = self.overlay_root.winId()
# self.hook.install() # Initial scale from config
scale = float(self.config.get("ui_scale"))
# NOTE: HitTest hook will be installed here later
# Current Overlay Dimensions
win_w = int(460 * scale)
win_h = int(180 * scale)
self.window_hook = WindowHook(hwnd, win_w, win_h, initial_scale=scale)
self.window_hook.install()
# Initial state: Disabled because we start inactive
self.window_hook.set_enabled(False)
except Exception as e:
logging.error(f"Failed to install WindowHook: {e}")
def center_overlay(self): def center_overlay(self):
"""Calculates and sets the Overlay position above the taskbar.""" """Calculates and sets the Overlay position above the taskbar."""
@@ -255,14 +411,77 @@ class WhisperApp(QObject):
self.audio_engine.set_visualizer_callback(self.bridge.update_amplitude) self.audio_engine.set_visualizer_callback(self.bridge.update_amplitude)
self.audio_engine.set_silence_callback(self.on_silence_detected) self.audio_engine.set_silence_callback(self.on_silence_detected)
self.transcriber = WhisperTranscriber() self.transcriber = WhisperTranscriber()
self.hotkey_manager = HotkeyManager() self.llm_engine = LLMEngine()
self.hotkey_manager.triggered.connect(self.toggle_recording)
self.hotkey_manager.start() # Dual Hotkey Managers
self.hk_transcribe = HotkeyManager(config_key="hotkey")
self.hk_transcribe.triggered.connect(lambda: self.toggle_recording(task_override="transcribe", task_mode="standard"))
self.hk_transcribe.start()
self.hk_correct = HotkeyManager(config_key="hotkey_correct")
self.hk_correct.triggered.connect(lambda: self.toggle_recording(task_override="transcribe", task_mode="correct"))
self.hk_correct.start()
self.hk_translate = HotkeyManager(config_key="hotkey_translate")
self.hk_translate.triggered.connect(lambda: self.toggle_recording(task_override="translate", task_mode="standard"))
self.hk_translate.start()
self.bridge.update_status("Ready") self.bridge.update_status("Ready")
def run(self): def run(self):
sys.exit(self.qt_app.exec()) sys.exit(self.qt_app.exec())
@Slot(str, str)
@Slot(str)
def toggle_recording(self, task_override=None, task_mode="standard"):
"""
task_override: 'transcribe' or 'translate' (passed to whisper)
task_mode: 'standard' or 'correct' (determines post-processing)
"""
if task_mode == "correct":
self.current_task_requires_llm = True
elif task_mode == "standard":
self.current_task_requires_llm = False # Explicit reset
# Actual Logic
if self.bridge.isRecording:
logging.info("Stopping recording...")
# stop_recording returns the numpy array directly
audio_data = self.audio_engine.stop_recording()
self.bridge.isRecording = False
self.bridge.update_status("Processing...")
self.bridge.isProcessing = True
# Save task override for processing
self.last_task_override = task_override
if audio_data is not None and len(audio_data) > 0:
# Use the task that started this session, or the override if provided
final_task = getattr(self, "current_recording_task", self.config.get("task"))
if task_override: final_task = task_override
self.worker = TranscriptionWorker(self.transcriber, audio_data, parent=self, task_override=final_task)
self.worker.finished.connect(self.on_transcription_done)
self.worker.start()
else:
self.bridge.update_status("Ready")
self.bridge.isProcessing = False
else:
# START RECORDING
if self.bridge.isProcessing:
logging.warning("Ignored toggle request: Transcription in progress.")
return
intended_task = task_override if task_override else self.config.get("task")
self.current_recording_task = intended_task
logging.info(f"Starting recording... (Task: {intended_task}, Mode: {task_mode})")
self.audio_engine.start_recording()
self.bridge.isRecording = True
self.bridge.update_status(f"Recording ({intended_task})...")
@Slot() @Slot()
def quit_app(self): def quit_app(self):
logging.info("Shutting down...") logging.info("Shutting down...")
@@ -275,7 +494,8 @@ class WhisperApp(QObject):
except: pass except: pass
self.bridge.stats_worker.stop() self.bridge.stats_worker.stop()
if self.hotkey_manager: self.hotkey_manager.stop() if self.hk_transcribe: self.hk_transcribe.stop()
if self.hk_correct: self.hk_correct.stop()
if self.hk_translate: self.hk_translate.stop()
# Close all QML windows to ensure bindings stop before Python objects die # Close all QML windows to ensure bindings stop before Python objects die
if self.overlay_root: if self.overlay_root:
@@ -350,10 +570,16 @@ class WhisperApp(QObject):
print(f"Setting Changed: {key} = {value}") print(f"Setting Changed: {key} = {value}")
# 1. Hotkey Reload # 1. Hotkey Reload
if key == "hotkey": if key in ["hotkey", "hotkey_translate", "hotkey_correct"]:
if self.hotkey_manager: self.hotkey_manager.reload_hotkey() if self.hk_transcribe: self.hk_transcribe.reload_hotkey()
if self.hk_correct: self.hk_correct.reload_hotkey()
if self.hk_translate: self.hk_translate.reload_hotkey()
if self.tray: if self.tray:
self.tray.setToolTip(f"Whisper Voice - Press {value} to Record") hk1 = self.format_hotkey(self.config.get("hotkey"))
hk3 = self.format_hotkey(self.config.get("hotkey_correct"))
hk2 = self.format_hotkey(self.config.get("hotkey_translate"))
self.tray.setToolTip(f"Whisper Voice\nTranscribe: {hk1}\nCorrect: {hk3}\nTranslate: {hk2}")
# 2. AI Model Reload (Heavy) # 2. AI Model Reload (Heavy)
if key in ["model_size", "compute_device", "compute_type"]: if key in ["model_size", "compute_device", "compute_type"]:
@@ -456,6 +682,8 @@ class WhisperApp(QObject):
file_path, _ = QFileDialog.getOpenFileName(None, "Select Audio", "", "Audio (*.mp3 *.wav *.flac *.m4a *.ogg)") file_path, _ = QFileDialog.getOpenFileName(None, "Select Audio", "", "Audio (*.mp3 *.wav *.flac *.m4a *.ogg)")
if file_path: if file_path:
self.bridge.update_status("Thinking...") self.bridge.update_status("Thinking...")
# Files use the default configured task usually, or we could ask?
# Default to config setting for files.
self.worker = TranscriptionWorker(self.transcriber, file_path, is_file=True, parent=self) self.worker = TranscriptionWorker(self.transcriber, file_path, is_file=True, parent=self)
self.worker.finished.connect(self.on_transcription_done) self.worker.finished.connect(self.on_transcription_done)
self.worker.start() self.worker.start()
@@ -463,48 +691,73 @@ class WhisperApp(QObject):
@Slot() @Slot()
def on_silence_detected(self): def on_silence_detected(self):
from PySide6.QtCore import QMetaObject, Qt from PySide6.QtCore import QMetaObject, Qt
# Silence stops the current recording. toggle_recording is invoked with no arguments,
# so it falls back to the task stored in current_recording_task when recording started.
QMetaObject.invokeMethod(self, "toggle_recording", Qt.QueuedConnection) QMetaObject.invokeMethod(self, "toggle_recording", Qt.QueuedConnection)
@Slot()
def toggle_recording(self):
if not self.audio_engine: return
# Prevent starting a new recording while we are still transcribing the last one
if self.bridge.isProcessing:
logging.warning("Ignored toggle request: Transcription in progress.")
return
if self.audio_engine.recording:
self.bridge.update_status("Thinking...")
self.bridge.isRecording = False
self.bridge.isProcessing = True # Start Processing
audio_data = self.audio_engine.stop_recording()
self.worker = TranscriptionWorker(self.transcriber, audio_data, parent=self)
self.worker.finished.connect(self.on_transcription_done)
self.worker.start()
else:
self.bridge.update_status("Recording")
self.bridge.isRecording = True
self.audio_engine.start_recording()
@Slot(bool) @Slot(bool)
def on_ui_toggle_request(self, state): def on_ui_toggle_request(self, state):
if state != self.audio_engine.recording: if state != self.audio_engine.recording:
self.toggle_recording() self.toggle_recording() # Default behavior for UI clicks
@Slot(str) @Slot(str)
def on_transcription_done(self, text: str): def on_transcription_done(self, text: str):
self.bridge.update_status("Ready") self.bridge.update_status("Ready")
self.bridge.isProcessing = False # End Processing self.bridge.isProcessing = False # Temporarily false? No, keep it true if we chain.
# Check LLM Settings -> AND check if the current task requested it
llm_enabled = self.config.get("llm_enabled")
requires_llm = getattr(self, "current_task_requires_llm", False)
# Run correction only when both conditions hold:
# 1. The LLM is enabled in settings (global safety switch).
# 2. The recording was started with the Correct hotkey (current_task_requires_llm is True).
# The regular hotkey always returns the raw transcription.
if text and llm_enabled and requires_llm:
# Chain to LLM
self.bridge.isProcessing = True
self.bridge.update_status("Correcting...")
mode = self.config.get("llm_mode")
self.llm_worker = LLMWorker(self.llm_engine, text, mode, parent=self)
self.llm_worker.finished.connect(self.on_llm_done)
self.llm_worker.start()
return
self.bridge.isProcessing = False
if text: if text:
method = self.config.get("input_method") method = self.config.get("input_method")
speed = int(self.config.get("typing_speed")) speed = int(self.config.get("typing_speed"))
InputInjector.inject_text(text, method, speed) InputInjector.inject_text(text, method, speed)
@Slot(str)
def on_llm_done(self, text: str):
self.bridge.update_status("Ready")
self.bridge.isProcessing = False
if text:
method = self.config.get("input_method")
speed = int(self.config.get("typing_speed"))
InputInjector.inject_text(text, method, speed)
# Cleanup
if hasattr(self, 'llm_worker') and self.llm_worker:
self.llm_worker.deleteLater()
self.llm_worker = None
@Slot(bool) @Slot(bool)
def on_hotkeys_enabled_toggle(self, state): def on_hotkeys_enabled_toggle(self, state):
if self.hotkey_manager: if self.hk_transcribe: self.hk_transcribe.set_enabled(state)
self.hotkey_manager.set_enabled(state) if self.hk_correct: self.hk_correct.set_enabled(state)
if self.hk_translate: self.hk_translate.set_enabled(state)
@Slot(str) @Slot(str)
def on_download_requested(self, size): def on_download_requested(self, size):
@@ -519,6 +772,19 @@ class WhisperApp(QObject):
self.download_worker.error.connect(self.on_download_error) self.download_worker.error.connect(self.on_download_error)
self.download_worker.start() self.download_worker.start()
@Slot()
def on_llm_download_requested(self):
if self.bridge.isDownloading: return
self.bridge.update_status("Downloading LLM...")
self.bridge.isDownloading = True
self.llm_dl_worker = LLMDownloadWorker(parent=self)
self.llm_dl_worker.progress.connect(self.on_loader_progress) # Reuse existing progress slot? Yes.
self.llm_dl_worker.finished.connect(self.on_download_finished) # Reuses same cleanup
self.llm_dl_worker.error.connect(self.on_download_error)
self.llm_dl_worker.start()
def on_download_finished(self): def on_download_finished(self):
self.bridge.isDownloading = False self.bridge.isDownloading = False
self.bridge.update_status("Ready") self.bridge.update_status("Ready")
@@ -531,6 +797,25 @@ class WhisperApp(QObject):
self.bridge.update_status("Error") self.bridge.update_status("Error")
logging.error(f"Download Error: {err}") logging.error(f"Download Error: {err}")
@Slot(bool)
def on_ui_toggle_request(self, is_recording):
"""Called when recording state changes."""
# Update Window Hook to allow clicking if active
is_active = is_recording or self.bridge.isProcessing
if hasattr(self, 'window_hook'):
self.window_hook.set_enabled(is_active)
@Slot(bool)
def on_processing_changed(self, is_processing):
is_active = self.bridge.isRecording or is_processing
if hasattr(self, 'window_hook'):
self.window_hook.set_enabled(is_active)
if __name__ == "__main__": if __name__ == "__main__":
import sys
app = WhisperApp() app = WhisperApp()
app.run()
# Connect extra signal for processing state
app.bridge.isProcessingChanged.connect(app.on_processing_changed)
sys.exit(app.run())
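
Review note: both download workers emit progress once per 8192-byte chunk, which can queue a very large number of identical percentage signals for a big model file. A minimal sketch of one way to throttle this, emitting only when the integer percentage changes. The helper name percent_updates is hypothetical and is not part of this commit.

```python
from typing import Iterable, Iterator


def percent_updates(chunk_sizes: Iterable[int], total_size: int) -> Iterator[int]:
    """Yield a progress percentage only when its integer value changes.

    Hypothetical helper: chunk_sizes stands in for len(chunk) of each
    streamed chunk, total_size for the summed Content-Length values.
    """
    if total_size <= 0:
        # Size unknown: no meaningful percentage can be reported.
        return
    downloaded = 0
    last_pct = -1
    for size in chunk_sizes:
        downloaded += size
        pct = min(100, downloaded * 100 // total_size)
        if pct != last_pct:
            last_pct = pct
            yield pct
```

Inside run(), the worker would call self.progress.emit(pct) only for the values this generator yields, so the UI receives at most 101 updates per download.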


@@ -39,39 +39,37 @@ def build_portable():
print("⏳ This may take 5-10 minutes...") print("⏳ This may take 5-10 minutes...")
PyInstaller.__main__.run([ PyInstaller.__main__.run([
"main.py", # Entry point "bootstrapper.py", # Entry point (Tiny Installer)
"--name=WhisperVoice", # EXE name "--name=WhisperVoice", # EXE name
"--onefile", # Single EXE (slower startup but portable) "--onefile", # Single EXE
"--noconsole", # No terminal window "--noconsole", # No terminal window
"--clean", # Clean cache "--clean", # Clean cache
*add_data_args, # Bundled assets
# Heavy libraries that need special collection # Bundle the app source to be extracted by bootstrapper
"--collect-all", "faster_whisper", # The bootstrapper expects 'app_source' folder in bundled resources
"--collect-all", "ctranslate2", "--add-data", f"src{os.pathsep}app_source/src",
"--collect-all", "PySide6", "--add-data", f"main.py{os.pathsep}app_source",
"--collect-all", "torch", "--add-data", f"requirements.txt{os.pathsep}app_source",
"--collect-all", "numpy",
# Hidden imports (modules imported dynamically) # Add assets
"--hidden-import", "keyboard", "--add-data", f"src/ui/qml{os.pathsep}app_source/src/ui/qml",
"--hidden-import", "pyperclip", "--add-data", f"assets{os.pathsep}app_source/assets",
"--hidden-import", "psutil",
"--hidden-import", "pynvml",
"--hidden-import", "sounddevice",
"--hidden-import", "scipy",
"--hidden-import", "scipy.signal",
"--hidden-import", "huggingface_hub",
"--hidden-import", "tokenizers",
# Qt plugins # No heavy collections!
"--hidden-import", "PySide6.QtQuickControls2", # The bootstrapper uses internal pip to install everything.
"--hidden-import", "PySide6.QtQuick.Controls",
# Icon (convert to .ico for Windows) # Exclude heavy modules to ensure this exe stays tiny
# "--icon=icon.ico", # Uncomment if you have a .ico file "--exclude-module", "faster_whisper",
"--exclude-module", "torch",
"--exclude-module", "PySide6",
"--exclude-module", "llama_cpp",
# Icon
# "--icon=icon.ico",
]) ])
print("\n" + "="*60) print("\n" + "="*60)
print("✅ BUILD COMPLETE!") print("✅ BUILD COMPLETE!")
print("="*60) print("="*60)

73
publish_release.py Normal file

@@ -0,0 +1,73 @@
import os
import requests
import mimetypes
# Configuration
API_URL = "https://git.lashman.live/api/v1"
OWNER = "lashman"
REPO = "whisper_voice"
TAG = "v1.0.4"
TOKEN = os.environ.get("GITEA_TOKEN", "") # Read from the environment; do not commit credentials (variable name is an assumption)
EXE_PATH = r"dist\WhisperVoice.exe"
headers = {
"Authorization": f"token {TOKEN}",
"Accept": "application/json"
}
def create_release():
print(f"Creating release {TAG}...")
# Read Release Notes
with open("RELEASE_NOTES.md", "r", encoding="utf-8") as f:
notes = f.read()
# Create Release
payload = {
"tag_name": TAG,
"name": TAG,
"body": notes,
"draft": False,
"prerelease": False
}
url = f"{API_URL}/repos/{OWNER}/{REPO}/releases"
resp = requests.post(url, json=payload, headers=headers)
if resp.status_code == 201:
print("Release created successfully!")
return resp.json()
elif resp.status_code == 409:
print("Release already exists. Fetching it...")
# Get by tag
resp = requests.get(f"{API_URL}/repos/{OWNER}/{REPO}/releases/tags/{TAG}", headers=headers)
if resp.status_code == 200:
return resp.json()
print(f"Failed to create release: {resp.status_code} - {resp.text}")
return None
def upload_asset(release_id, file_path):
print(f"Uploading asset: {file_path}...")
filename = os.path.basename(file_path)
with open(file_path, "rb") as f:
data = f.read()
url = f"{API_URL}/repos/{OWNER}/{REPO}/releases/{release_id}/assets?name={filename}"
# Gitea API expects raw body
resp = requests.post(url, data=data, headers=headers)
if resp.status_code == 201:
print(f"Uploaded {filename} successfully!")
else:
print(f"Failed to upload asset: {resp.status_code} - {resp.text}")
def main():
release = create_release()
if release:
upload_asset(release["id"], EXE_PATH)
if __name__ == "__main__":
main()
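
Review note: upload_asset interpolates the file name directly into the query string, so a name containing spaces or "&" would produce a malformed URL. A minimal sketch of building the same URL with the standard library's encoding helpers. The function name build_asset_url and the example host are hypothetical; the path layout is copied from the script above.

```python
from urllib.parse import quote, urlencode


def build_asset_url(api_url: str, owner: str, repo: str, release_id: int, filename: str) -> str:
    """Build the release-asset upload URL with a safely encoded file name.

    Hypothetical helper mirroring the URL that upload_asset assembles by hand.
    """
    query = urlencode({"name": filename})  # encodes spaces, '&', '=' and similar
    return (
        f"{api_url}/repos/{quote(owner)}/{quote(repo)}"
        f"/releases/{release_id}/assets?{query}"
    )
```

An alternative with the same effect is to pass params={"name": filename} to requests.post and let requests do the encoding.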


@@ -5,6 +5,7 @@
faster-whisper>=1.0.0 faster-whisper>=1.0.0
torch>=2.0.0 torch>=2.0.0
# UI Framework # UI Framework
PySide6>=6.6.0 PySide6>=6.6.0
@@ -28,3 +29,6 @@ huggingface-hub>=0.20.0
pystray>=0.19.0 pystray>=0.19.0
Pillow>=10.0.0 Pillow>=10.0.0
darkdetect>=0.8.0 darkdetect>=0.8.0
# LLM / Correction
llama-cpp-python>=0.2.20


@@ -16,6 +16,8 @@ from src.core.paths import get_base_path
# Default Configuration # Default Configuration
DEFAULT_SETTINGS = { DEFAULT_SETTINGS = {
"hotkey": "f8", "hotkey": "f8",
"hotkey_translate": "f10",
"hotkey_correct": "f9", # New: Transcribe + Correct
"model_size": "small", "model_size": "small",
"input_device": None, # Device ID (int) or Name (str), None = Default "input_device": None, # Device ID (int) or Name (str), None = Default
"save_recordings": False, # Save .wav files for debugging "save_recordings": False, # Save .wav files for debugging
@@ -38,13 +40,25 @@ DEFAULT_SETTINGS = {
# AI - Advanced # AI - Advanced
"language": "auto", # "auto" or ISO code "language": "auto", # "auto" or ISO code
"task": "transcribe", # "transcribe" or "translate" (to English)
"compute_device": "auto", # "auto", "cuda", "cpu" "compute_device": "auto", # "auto", "cuda", "cpu"
"compute_type": "int8", # "int8", "float16", "float32" "compute_type": "int8", # "int8", "float16", "float32"
"beam_size": 5, "beam_size": 5,
"best_of": 5, "best_of": 5,
"vad_filter": True, "vad_filter": True,
"no_repeat_ngram_size": 0, "no_repeat_ngram_size": 0,
"condition_on_previous_text": True "condition_on_previous_text": True,
"initial_prompt": "Mm-hmm. Okay, let's go. I speak in full sentences.", # Default: Forces punctuation
# LLM Correction
"llm_enabled": False,
"llm_mode": "Standard", # "Grammar", "Standard", "Rewrite"
"llm_model_name": "llama-3.2-1b-instruct",
# Low VRAM Mode
"unload_models_after_use": False # If True, models are unloaded immediately to free VRAM
} }
class ConfigManager: class ConfigManager:
@@ -94,9 +108,9 @@ class ConfigManager:
except Exception as e: except Exception as e:
logging.error(f"Failed to save settings: {e}") logging.error(f"Failed to save settings: {e}")
def get(self, key: str) -> Any: def get(self, key: str, default: Any = None) -> Any:
"""Get a setting value.""" """Get a setting value."""
return self.data.get(key, DEFAULT_SETTINGS.get(key)) return self.data.get(key, DEFAULT_SETTINGS.get(key, default))
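
Review note: the new get signature resolves a key in three steps: the user's saved data, then DEFAULT_SETTINGS, then the caller's default. A minimal standalone sketch of that lookup order, using a trimmed-down settings table and a free function (get_setting is a hypothetical name) instead of the real ConfigManager.

```python
from typing import Any, Dict

# Trimmed-down stand-in for the real DEFAULT_SETTINGS table.
DEFAULT_SETTINGS: Dict[str, Any] = {"hotkey": "f8", "llm_enabled": False}


def get_setting(data: Dict[str, Any], key: str, default: Any = None) -> Any:
    """Look up key in user data, then built-in defaults, then the caller's default."""
    return data.get(key, DEFAULT_SETTINGS.get(key, default))
```

Note that a key stored explicitly as None in the user data is returned as None. The fallbacks apply only when the key is absent.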


@@ -30,15 +30,16 @@ class HotkeyManager(QObject):
triggered = Signal() triggered = Signal()
def __init__(self, hotkey: str = "f8"): def __init__(self, config_key: str = "hotkey"):
""" """
Initialize the HotkeyManager. Initialize the HotkeyManager.
Args: Args:
hotkey (str): The global hotkey string description. Default: "f8". config_key (str): The configuration key to look up (e.g. "hotkey").
""" """
super().__init__() super().__init__()
self.hotkey = hotkey self.config_key = config_key
self.hotkey = "f8" # Placeholder
self.is_listening = False self.is_listening = False
self._enabled = True self._enabled = True
@@ -58,9 +59,9 @@ class HotkeyManager(QObject):
from src.core.config import ConfigManager from src.core.config import ConfigManager
config = ConfigManager() config = ConfigManager()
self.hotkey = config.get("hotkey") self.hotkey = config.get(self.config_key)
logging.info(f"Registering global hotkey: {self.hotkey}") logging.info(f"Registering global hotkey ({self.config_key}): {self.hotkey}")
try: try:
# We don't suppress=True here because we want the app to see keys during recording # We don't suppress=True here because we want the app to see keys during recording
# (Wait, actually if we are recording we WANT keyboard to see it, # (Wait, actually if we are recording we WANT keyboard to see it,

120
src/core/languages.py Normal file

@@ -0,0 +1,120 @@
"""
Supported Languages Module
==========================
Full list of languages supported by OpenAI Whisper.
Maps ISO codes to display names.
"""
LANGUAGES = {
"auto": "Auto Detect",
"af": "Afrikaans",
"sq": "Albanian",
"am": "Amharic",
"ar": "Arabic",
"hy": "Armenian",
"as": "Assamese",
"az": "Azerbaijani",
"ba": "Bashkir",
"eu": "Basque",
"be": "Belarusian",
"bn": "Bengali",
"bs": "Bosnian",
"br": "Breton",
"bg": "Bulgarian",
"my": "Burmese",
"ca": "Catalan",
"zh": "Chinese",
"hr": "Croatian",
"cs": "Czech",
"da": "Danish",
"nl": "Dutch",
"en": "English",
"et": "Estonian",
"fo": "Faroese",
"fi": "Finnish",
"fr": "French",
"gl": "Galician",
"ka": "Georgian",
"de": "German",
"el": "Greek",
"gu": "Gujarati",
"ht": "Haitian",
"ha": "Hausa",
"haw": "Hawaiian",
"he": "Hebrew",
"hi": "Hindi",
"hu": "Hungarian",
"is": "Icelandic",
"id": "Indonesian",
"it": "Italian",
"ja": "Japanese",
"jw": "Javanese",
"kn": "Kannada",
"kk": "Kazakh",
"km": "Khmer",
"ko": "Korean",
"lo": "Lao",
"la": "Latin",
"lv": "Latvian",
"ln": "Lingala",
"lt": "Lithuanian",
"lb": "Luxembourgish",
"mk": "Macedonian",
"mg": "Malagasy",
"ms": "Malay",
"ml": "Malayalam",
"mt": "Maltese",
"mi": "Maori",
"mr": "Marathi",
"mn": "Mongolian",
"ne": "Nepali",
"no": "Norwegian",
"oc": "Occitan",
"pa": "Punjabi",
"ps": "Pashto",
"fa": "Persian",
"pl": "Polish",
"pt": "Portuguese",
"ro": "Romanian",
"ru": "Russian",
"sa": "Sanskrit",
"sr": "Serbian",
"sn": "Shona",
"sd": "Sindhi",
"si": "Sinhala",
"sk": "Slovak",
"sl": "Slovenian",
"so": "Somali",
"es": "Spanish",
"su": "Sundanese",
"sw": "Swahili",
"sv": "Swedish",
"tl": "Tagalog",
"tg": "Tajik",
"ta": "Tamil",
"tt": "Tatar",
"te": "Telugu",
"th": "Thai",
"bo": "Tibetan",
"tr": "Turkish",
"tk": "Turkmen",
"uk": "Ukrainian",
"ur": "Urdu",
"uz": "Uzbek",
"vi": "Vietnamese",
"cy": "Welsh",
"yi": "Yiddish",
"yo": "Yoruba",
}
def get_language_names():
return list(LANGUAGES.values())
def get_code_by_name(name):
for code, lang in LANGUAGES.items():
if lang == name:
return code
return "auto"
def get_name_by_code(code):
return LANGUAGES.get(code, "Auto Detect")
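
Review note: get_code_by_name scans the whole table on every call. That is fine for around a hundred entries, but a reverse map built once is a common alternative. A minimal sketch with a three-entry stand-in for LANGUAGES. The name _CODE_BY_NAME is hypothetical, and the sketch assumes display names are unique.

```python
# Three-entry stand-in for the full LANGUAGES table.
LANGUAGES = {"auto": "Auto Detect", "en": "English", "de": "German"}

# Reverse map built once at import time (assumes display names are unique).
_CODE_BY_NAME = {name: code for code, name in LANGUAGES.items()}


def get_code_by_name(name: str) -> str:
    """Return the ISO code for a display name, or "auto" when unknown."""
    return _CODE_BY_NAME.get(name, "auto")
```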

185
src/core/llm_engine.py Normal file

@@ -0,0 +1,185 @@
"""
LLM Engine Module.
==================
Handles interaction with the local Llama 3.2 1B model for transcription correction.
Uses llama-cpp-python for efficient local inference.
"""
import os
import logging
from typing import Optional
from src.core.paths import get_models_path
from src.core.config import ConfigManager
try:
from llama_cpp import Llama
except ImportError:
Llama = None
class LLMEngine:
"""
Manages the Llama model and performs text correction/rewriting.
"""
def __init__(self):
self.config = ConfigManager()
self.model = None
self.current_model_path = None
# --- Mode 1: Grammar Only (Strict) ---
self.prompt_grammar = (
"You are a text correction tool. "
"Correct the grammar/spelling. Do not change punctuation or capitalization styles. "
"Do not remove any words (including profanity). Output ONLY the result."
"\n\nExample:\nInput: 'damn it works'\nOutput: 'damn it works'"
)
# --- Mode 2: Standard (Grammar + Punctuation + Caps) ---
self.prompt_standard = (
"You are a text correction tool. "
"Standardize the grammar, punctuation, and capitalization. "
"Do not remove any words (including profanity). Output ONLY the result."
"\n\nExample:\nInput: 'damn it works'\nOutput: 'Damn it works.'"
)
# --- Mode 3: Rewrite (Tone-Aware Polish) ---
self.prompt_rewrite = (
"You are a text rewriting tool. Improve flow/clarity but keep the exact tone and vocabulary. "
"Do not remove any words (including profanity). Output ONLY the result."
"\n\nExample:\nInput: 'damn it works'\nOutput: 'Damn, it works.'"
)
def load_model(self) -> bool:
"""
Loads the LLM model if it exists.
Returns True if successful, False otherwise.
"""
if Llama is None:
logging.error("llama-cpp-python not installed.")
return False
model_name = self.config.get("llm_model_name", "llama-3.2-1b-instruct")
model_dir = get_models_path() / "llm" / model_name
model_file = model_dir / "llama-3.2-1b-instruct-q4_k_m.gguf"
if not model_file.exists():
logging.warning(f"LLM Model not found at: {model_file}")
return False
if self.model and self.current_model_path == str(model_file):
return True
try:
logging.info(f"Loading LLM from {model_file}...")
n_gpu_layers = 0
try:
import torch
if torch.cuda.is_available():
n_gpu_layers = -1
except Exception:
pass
self.model = Llama(
model_path=str(model_file),
n_gpu_layers=n_gpu_layers,
n_ctx=2048,
verbose=False
)
self.current_model_path = str(model_file)
logging.info("LLM loaded successfully.")
return True
except Exception as e:
logging.error(f"Failed to load LLM: {e}")
self.model = None
return False
def correct_text(self, text: str, mode: str = "Standard") -> str:
"""Corrects or rewrites the provided text."""
if not text or not text.strip():
return text
if not self.model:
if not self.load_model():
return text
logging.info(f"LLM Processing ({mode}): '{text}'")
system_prompt = self.prompt_standard
if mode == "Grammar": system_prompt = self.prompt_grammar
elif mode == "Rewrite": system_prompt = self.prompt_rewrite
# PREFIX INJECTION TECHNIQUE
# We end the prompt with the start of the assistant's answer specifically phrased to force compliance.
# "Here is the processed output:" forces it into a completion mode rather than a refusal mode.
prefix_injection = "Here is the processed output:\n"
prompt = (
f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|>"
f"<|start_header_id|>user<|end_header_id|>\n\nProcess this input:\n{text}<|eot_id|>"
f"<|start_header_id|>assistant<|end_header_id|>\n\n{prefix_injection}"
)
try:
output = self.model(
prompt,
max_tokens=512,
stop=["<|eot_id|>"],
echo=False,
temperature=0.1
)
result = output['choices'][0]['text'].strip()
# llama-cpp-python returns only the continuation of the prompt, so `result`
# does not contain the injected "Here is the processed output:" prefix.
# The continuation itself is the text we want.
# Refusal Detection (Safety Net)
# Triggers are lowercase because they are matched against the lowercased result.
refusal_triggers = [
"i cannot", "i can't", "i am unable", "i apologize", "sorry",
"as an ai", "explicit content", "harmful content", "safety guidelines"
]
lower_res = result.lower()
if any(trig in lower_res for trig in refusal_triggers) and len(result) < 150:
logging.warning(f"LLM Refusal Detected: '{result}'. Falling back to original.")
return text # Return original text on refusal!
# --- Robust Post-Processing ---
# 1. Strip quotes
if result.startswith('"') and result.endswith('"') and len(result) > 2 and '"' not in result[1:-1]:
result = result[1:-1]
if result.startswith("'") and result.endswith("'") and len(result) > 2 and "'" not in result[1:-1]:
result = result[1:-1]
# 2. Split by newline
if "\n" in result:
lines = result.split('\n')
clean_lines = [l.strip() for l in lines if l.strip()]
if clean_lines:
result = clean_lines[0]
# 3. Aggressive Preamble Stripping (Updates for new prefix)
import re
prefixes = [
r"^Here is the processed output:?\s*", # The one we injected
r"^Here is the corrected text:?\s*",
r"^Here is the rewritten text:?\s*",
r"^Here's the result:?\s*",
r"^Sure,? here is[^:\n]*:\s*", # e.g. "Sure, here is the corrected text:"
r"^Output:?\s*",
r"^Processing result:?\s*",
]
for p in prefixes:
result = re.sub(p, "", result, flags=re.IGNORECASE).strip()
if result.startswith('"') and result.endswith('"') and len(result) > 2 and '"' not in result[1:-1]:
result = result[1:-1]
logging.info(f"LLM Result: '{result}'")
return result
except Exception as e:
logging.error(f"LLM inference failed: {e}")
return text # Fail safe logic
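The refusal check and post-processing in `correct_text` can be exercised without a model if they are pulled into pure functions. The following is a minimal sketch, not the commit's code: the names `looks_like_refusal`, `strip_wrapping_quotes`, and `clean_llm_output` are hypothetical, the trigger and preamble lists are shortened, and preambles are stripped before the first line is selected (the commit does it the other way round).

```python
import re

# Hypothetical, shortened trigger list. Lowercase, because it is compared
# against the lowercased model output.
REFUSAL_TRIGGERS = ("i cannot", "i can't", "i am unable", "i apologize", "sorry", "as an ai")

# Illustrative preamble patterns a small model tends to prepend.
PREAMBLES = (
    r"^here is the (processed output|corrected text|rewritten text):?\s*",
    r"^here's the result:?\s*",
    r"^output:?\s*",
)


def looks_like_refusal(result: str, max_len: int = 150) -> bool:
    """True for short outputs that contain a refusal phrase (case-insensitive)."""
    lowered = result.lower()
    return len(result) < max_len and any(t in lowered for t in REFUSAL_TRIGGERS)


def strip_wrapping_quotes(s: str) -> str:
    """Remove one pair of matching outer quotes if that quote does not occur inside."""
    for q in ('"', "'"):
        if len(s) > 2 and s.startswith(q) and s.endswith(q) and q not in s[1:-1]:
            return s[1:-1]
    return s


def clean_llm_output(raw: str, original: str) -> str:
    """Return the original text on refusal or empty output, else the cleaned result."""
    result = raw.strip()
    if not result or looks_like_refusal(result):
        return original
    result = strip_wrapping_quotes(result)
    # Strip preambles first so a preamble on its own line does not win
    # the first-line selection below.
    for pattern in PREAMBLES:
        result = re.sub(pattern, "", result, flags=re.IGNORECASE).strip()
    lines = [ln.strip() for ln in result.split("\n") if ln.strip()]
    if lines:
        result = lines[0]
    return strip_wrapping_quotes(result)
```

Keeping these steps free of `self.model` makes it cheap to add regression cases whenever the model produces a new kind of chatter.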


@@ -15,8 +15,13 @@ import numpy as np
from src.core.config import ConfigManager
from src.core.paths import get_models_path
try:
import torch
except ImportError:
torch = None
# Import directly - valid since we are now running in the full environment
from faster_whisper import WhisperModel
class WhisperTranscriber:
"""
@@ -57,13 +62,32 @@ class WhisperTranscriber:
# Force offline if path exists to avoid HF errors
local_only = new_path.exists()
-self.model = WhisperModel(
-model_input,
-device=device,
-compute_type=compute,
-download_root=str(get_models_path()),
-local_files_only=local_only
-)
+try:
+from faster_whisper import WhisperModel
+self.model = WhisperModel(
+model_input,
+device=device,
+compute_type=compute,
+download_root=str(get_models_path()),
+local_files_only=local_only
+)
+except Exception as load_err:
+# CRITICAL FALLBACK: If CUDA/cuBLAS/cuDNN initialization fails (e.g. AMD/Intel GPUs), fall back to CPU
+err_str = str(load_err).lower()
+if "cublas" in err_str or "cudnn" in err_str or "library" in err_str or "device" in err_str:
+logging.warning(f"CUDA Init Failed ({load_err}). Falling back to CPU...")
+self.config.set("compute_device", "cpu") # Update config for persistence/UI
+device = "cpu" # Keeps current_compute_device (assigned below) in sync with the fallback
+self.model = WhisperModel(
+model_input,
+device="cpu",
+compute_type="int8", # int8 is a widely supported, fast compute type on CPU
+download_root=str(get_models_path()),
+local_files_only=local_only
+)
+else:
+raise
self.current_model_size = size
self.current_compute_device = device
@@ -73,42 +97,120 @@ class WhisperTranscriber:
except Exception as e:
logging.error(f"Failed to load model: {e}")
self.model = None
# Auto-Repair: Detect vocabulary/corrupt errors
err_str = str(e).lower()
if "vocabulary" in err_str or "tokenizer" in err_str or "config.json" in err_str:
# ... existing auto-repair logic ...
logging.warning("Corrupt model detected on load. Attempting to delete and reset...")
try:
import shutil
# Differentiate between simple path and HF path
new_path = get_models_path() / f"faster-whisper-{size}"
if new_path.exists():
shutil.rmtree(new_path)
logging.info(f"Deleted corrupt model at {new_path}")
else:
# Try legacy HF path
hf_path = get_models_path() / f"models--Systran--faster-whisper-{size}"
if hf_path.exists():
shutil.rmtree(hf_path)
logging.info(f"Deleted corrupt HF model at {hf_path}")
# Notify UI to refresh state (will show 'Download' button now)
# We can't reach bridge easily here without passing it in,
# but the UI polls or listens to logs.
# The user will simply see "Model Missing" in settings after this.
except Exception as del_err:
logging.error(f"Failed to delete corrupt model: {del_err}")
-def transcribe(self, audio_data, is_file: bool = False) -> str:
+def transcribe(self, audio_data, is_file: bool = False, task: Optional[str] = None) -> str:
"""
Transcribe audio data.
"""
-logging.info(f"Starting transcription... (is_file={is_file})")
+logging.info(f"Starting transcription... (is_file={is_file}, task={task})")
# Ensure model is loaded
if not self.model:
self.load_model()
if not self.model:
-return "Error: Model failed to load."
+return "Error: Model failed to load. Please check Settings -> Model Info."
try:
# Config
beam_size = int(self.config.get("beam_size"))
best_of = int(self.config.get("best_of"))
vad = False if is_file else self.config.get("vad_filter")
language = self.config.get("language")
# Use task override if provided, otherwise config
# Ensure safe string and lowercase ("transcribe" vs "Transcribe")
raw_task = task if task else self.config.get("task")
final_task = str(raw_task).strip().lower() if raw_task else "transcribe"
# Sanity check for valid Whisper tasks
if final_task not in ["transcribe", "translate"]:
logging.warning(f"Invalid task '{final_task}' detected. Defaulting to 'transcribe'.")
final_task = "transcribe"
# Language handling
final_language = language if language != "auto" else None
# Anti-Hallucination: Force condition_on_previous_text=False for translation
condition_prev = self.config.get("condition_on_previous_text")
# Helper options for Translation Stability
initial_prompt = self.config.get("initial_prompt")
if final_task == "translate":
condition_prev = False
# Force beam search if user has set it to greedy (1)
# Translation requires more search breadth to find the English mapping
if beam_size < 5:
logging.info("Forcing beam_size=5 for Translation task.")
beam_size = 5
# Inject guidance prompt if none exists
if not initial_prompt:
initial_prompt = "Translate this to English."
logging.info(f"Model Dispatch: Task='{final_task}', Language='{final_language}', ConditionPrev={condition_prev}, Beam={beam_size}")
# Build arguments dynamically so unset options (initial_prompt, language) are omitted instead of passed as None
transcribe_opts = {
"beam_size": beam_size,
"best_of": best_of,
"vad_filter": vad,
"task": final_task,
"vad_parameters": dict(min_silence_duration_ms=500),
"condition_on_previous_text": condition_prev,
"without_timestamps": True
}
if initial_prompt:
transcribe_opts["initial_prompt"] = initial_prompt
# Only add language if it's explicitly set (not None/Auto)
# This avoids potentially confusing the model with explicit None
if final_language:
transcribe_opts["language"] = final_language
# Transcribe
-segments, info = self.model.transcribe(
-audio_data,
-beam_size=beam_size,
-best_of=best_of,
-vad_filter=vad,
-vad_parameters=dict(min_silence_duration_ms=500),
-condition_on_previous_text=self.config.get("condition_on_previous_text"),
-without_timestamps=True
-)
+segments, info = self.model.transcribe(audio_data, **transcribe_opts)
# Aggregate text
text_result = ""
for segment in segments:
text_result += segment.text + " "
-return text_result.strip()
+text_result = text_result.strip()
+# Low VRAM Mode: Unload Whisper Model immediately
+if self.config.get("unload_models_after_use"):
+self.unload_model()
+logging.info(f"Final Transcription Output: '{text_result}'")
+return text_result
except Exception as e:
logging.error(f"Transcription failed: {e}")
@@ -117,8 +219,11 @@ class WhisperTranscriber:
def model_exists(self, size: str) -> bool:
"""Checks if a model size is already downloaded."""
new_path = get_models_path() / f"faster-whisper-{size}"
-if (new_path / "config.json").exists():
-return True
+if new_path.exists():
+# Strict check
+required = ["config.json", "model.bin", "vocabulary.json"]
+if all((new_path / f).exists() for f in required):
+return True
# Legacy HF cache check
folder_name = f"models--Systran--faster-whisper-{size}"
@@ -127,3 +232,21 @@ class WhisperTranscriber:
return True
return False
def unload_model(self):
"""
Unloads model to free memory.
"""
if self.model:
del self.model
self.model = None
self.current_model_size = None
# Force garbage collection
import gc
gc.collect()
if torch is not None and torch.cuda.is_available(): # torch is None when the import above failed
torch.cuda.empty_cache()
logging.info("Whisper Model unloaded (Low VRAM Mode).")
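The task normalization and option building added to `transcribe` can be summarized as one pure function. This is a sketch under assumptions: `build_transcribe_opts` is a hypothetical helper, the config is modeled as a plain dict instead of `ConfigManager`, and the default values passed to `.get()` are illustrative.

```python
from typing import Any, Dict, Optional

VALID_TASKS = ("transcribe", "translate")


def build_transcribe_opts(config: Dict[str, Any], task: Optional[str] = None,
                          is_file: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for a Whisper transcribe call from config values."""
    # An explicit task override wins over the configured task.
    raw_task = task if task else config.get("task")
    final_task = str(raw_task).strip().lower() if raw_task else "transcribe"
    if final_task not in VALID_TASKS:
        final_task = "transcribe"

    beam_size = int(config.get("beam_size", 5))  # default is an assumption
    condition_prev = bool(config.get("condition_on_previous_text", True))
    initial_prompt = config.get("initial_prompt")

    if final_task == "translate":
        # Translation: no conditioning on previous text, a beam of at least 5,
        # and a guidance prompt if none is configured.
        condition_prev = False
        beam_size = max(beam_size, 5)
        initial_prompt = initial_prompt or "Translate this to English."

    opts: Dict[str, Any] = {
        "beam_size": beam_size,
        "best_of": int(config.get("best_of", 5)),  # default is an assumption
        "vad_filter": False if is_file else bool(config.get("vad_filter", False)),
        "vad_parameters": {"min_silence_duration_ms": 500},
        "task": final_task,
        "condition_on_previous_text": condition_prev,
        "without_timestamps": True,
    }
    # Optional keys are omitted entirely rather than passed as None.
    if initial_prompt:
        opts["initial_prompt"] = initial_prompt
    language = config.get("language")
    if language and language != "auto":
        opts["language"] = language
    return opts
```

With such a helper the call site reduces to `self.model.transcribe(audio_data, **opts)`, and the translate-specific overrides can be tested without loading a model.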


@@ -110,6 +110,7 @@ class UIBridge(QObject):
logAppended = Signal(str) # Emits new log line
settingChanged = Signal(str, 'QVariant')
modelStatesChanged = Signal() # Notify UI to re-check isModelDownloaded
llmDownloadRequested = Signal()
def __init__(self, parent=None):
super().__init__(parent)
@@ -245,6 +246,26 @@ class UIBridge(QObject):
# --- Methods called from QML ---
@Slot(result=list)
def get_supported_languages(self):
from src.core.languages import get_language_names
return get_language_names()
@Slot(str)
def set_language_by_name(self, name):
from src.core.languages import get_code_by_name
from src.core.config import ConfigManager
code = get_code_by_name(name)
ConfigManager().set("language", code)
self.settingChanged.emit("language", code)
@Slot(result=str)
def get_current_language_name(self):
from src.core.languages import get_name_by_code
from src.core.config import ConfigManager
code = ConfigManager().get("language")
return get_name_by_code(code)
@Slot(str, result='QVariant')
def getSetting(self, key):
from src.core.config import ConfigManager
@@ -336,11 +357,7 @@ class UIBridge(QObject):
except Exception as e:
logging.error(f"Failed to preload audio devices: {e}")
@Slot()
def toggle_recording(self):
"""Called by UI elements to trigger the app's recording logic."""
# This will be connected to the main app's toggle logic
pass
@Property(bool, notify=isDownloadingChanged)
def isDownloading(self): return self._is_downloading
@@ -356,27 +373,39 @@ class UIBridge(QObject):
try:
from src.core.paths import get_models_path
# Check new simple format used by DownloadWorker
path_simple = get_models_path() / f"faster-whisper-{size}"
-if path_simple.exists() and any(path_simple.iterdir()):
-return True
+if path_simple.exists():
+# Strict check: Ensure all critical files exist
+required = ["config.json", "model.bin", "vocabulary.json"]
+if all((path_simple / f).exists() for f in required):
+return True
# Check HF Cache format (legacy/default)
folder_name = f"models--Systran--faster-whisper-{size}"
path_hf = get_models_path() / folder_name
snapshots = path_hf / "snapshots"
if snapshots.exists() and any(snapshots.iterdir()):
-return True
+return True # Legacy cache structure is complex, assume valid if present
# Check direct folder (simple)
path_direct = get_models_path() / size
if (path_direct / "config.json").exists():
return True
return False
except Exception as e:
logging.error(f"Error checking model status: {e}")
return False
return False
@Slot(result=bool)
def isLLMModelDownloaded(self):
try:
from src.core.paths import get_models_path
# Hardcoded check for the 1B model we support
model_file = get_models_path() / "llm" / "llama-3.2-1b-instruct" / "llama-3.2-1b-instruct-q4_k_m.gguf"
return model_file.exists()
except Exception:
return False
@Slot(str)
def downloadModel(self, size):
@@ -385,3 +414,7 @@ class UIBridge(QObject):
@Slot()
def notifyModelStatesChanged(self):
self.modelStatesChanged.emit()
@Slot()
def downloadLLM(self):
self.llmDownloadRequested.emit()
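The strict check introduced in `isModelDownloaded` and `model_exists` treats a model folder as present only when every critical file exists, so a partial download is reported as missing. A minimal sketch of that idea follows; the function names `model_files_complete` and `demo` and the temporary folder layout are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Tuple

# File names taken from the strict check in the diff above.
REQUIRED_FILES = ("config.json", "model.bin", "vocabulary.json")


def model_files_complete(model_dir: Path) -> bool:
    """True only if the directory exists and contains every required file."""
    return model_dir.is_dir() and all((model_dir / name).is_file() for name in REQUIRED_FILES)


def demo() -> Tuple[bool, bool]:
    """Check a folder holding only config.json, then the same folder once complete."""
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp) / "faster-whisper-small"
        model_dir.mkdir()
        (model_dir / "config.json").write_text("{}")
        partial = model_files_complete(model_dir)
        for name in ("model.bin", "vocabulary.json"):
            (model_dir / name).write_text("")
        complete = model_files_complete(model_dir)
    return partial, complete
```

The previous check, a non-empty directory or a lone `config.json`, would have accepted the partial folder in `demo()`.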


@@ -100,7 +100,7 @@ ComboBox {
popup: Popup {
y: control.height - 1
width: control.width
-implicitHeight: contentItem.implicitHeight
+implicitHeight: Math.min(contentItem.implicitHeight, 300)
padding: 5
contentItem: ListView {


@@ -25,7 +25,7 @@ Rectangle {
Text {
anchors.centerIn: parent
-text: control.recording ? "Listening..." : (control.currentSequence || "None")
+text: control.recording ? "Listening..." : (formatSequence(control.currentSequence) || "None")
color: control.recording ? SettingsStyle.accent : (control.currentSequence ? "#ffffff" : "#808080")
font.family: "JetBrains Mono"
font.pixelSize: 13
@@ -72,6 +72,23 @@ Rectangle {
if (!activeFocus) control.recording = false
}
function formatSequence(seq) {
if (!seq) return ""
var parts = seq.split("+")
for (var i = 0; i < parts.length; i++) {
var p = parts[i]
// Standardize modifiers
if (p === "ctrl") parts[i] = "Ctrl"
else if (p === "alt") parts[i] = "Alt"
else if (p === "shift") parts[i] = "Shift"
else if (p === "win") parts[i] = "Win"
else if (p === "esc") parts[i] = "Esc"
// Capitalize F-keys and others (e.g. f8 -> F8, space -> Space)
else parts[i] = p.charAt(0).toUpperCase() + p.slice(1)
}
return parts.join(" + ")
}
function getKeyName(key, text) {
// F-Keys
if (key >= Qt.Key_F1 && key <= Qt.Key_F35) return "f" + (key - Qt.Key_F1 + 1)


@@ -314,14 +314,34 @@ Window {
spacing: 0
ModernSettingsItem {
-label: "Global Hotkey"
-description: "Press to record a new shortcut (e.g. Ctrl+Space)"
+label: "Global Hotkey (Transcribe)"
+description: "Standard: Raw transcription"
control: ModernKeySequenceRecorder {
-Layout.preferredWidth: 200
+implicitWidth: 240
currentSequence: ui.getSetting("hotkey")
onSequenceChanged: (seq) => ui.setSetting("hotkey", seq)
}
}
ModernSettingsItem {
label: "Global Hotkey (Correct)"
description: "Enhanced: Transcribe + AI Correction"
control: ModernKeySequenceRecorder {
implicitWidth: 240
currentSequence: ui.getSetting("hotkey_correct")
onSequenceChanged: (seq) => ui.setSetting("hotkey_correct", seq)
}
}
ModernSettingsItem {
label: "Global Hotkey (Translate)"
description: "Press to record a new shortcut (e.g. F10)"
control: ModernKeySequenceRecorder {
implicitWidth: 240
currentSequence: ui.getSetting("hotkey_translate")
onSequenceChanged: (seq) => ui.setSetting("hotkey_translate", seq)
}
}
ModernSettingsItem {
label: "Run on Startup"
@@ -349,8 +369,8 @@ Window {
showSeparator: false
control: ModernSlider {
Layout.preferredWidth: 200
-from: 10; to: 6000
-stepSize: 10
+from: 10; to: 20000
+stepSize: 100
snapMode: Slider.SnapAlways
value: ui.getSetting("typing_speed")
onMoved: ui.setSetting("typing_speed", value)
@@ -577,6 +597,53 @@ Window {
Text { text: "Model configuration and performance"; color: SettingsStyle.textSecondary; font.family: mainFont; font.pixelSize: 14 }
}
ModernSettingsSection {
title: "Style & Prompting"
Layout.margins: 32
Layout.topMargin: 0
content: ColumnLayout {
width: parent.width
spacing: 0
ModernSettingsItem {
label: "Punctuation Style"
description: "Hint for how to format text"
control: ModernComboBox {
id: styleCombo
width: 180
model: ["Standard (Proper)", "Casual (Lowercase)", "Custom"]
// Logic to determine initial index based on config string
Component.onCompleted: {
let current = ui.getSetting("initial_prompt")
if (current === "Mm-hmm. Okay, let's go. I speak in full sentences.") currentIndex = 0
else if (current === "um, okay... i guess so.") currentIndex = 1
else currentIndex = 2
}
onActivated: {
if (index === 0) ui.setSetting("initial_prompt", "Mm-hmm. Okay, let's go. I speak in full sentences.")
else if (index === 1) ui.setSetting("initial_prompt", "um, okay... i guess so.")
// Custom: Don't change string immediately, let user type
}
}
}
ModernSettingsItem {
label: "Custom Prompt"
description: "Advanced: Define your own style hint"
visible: styleCombo.currentIndex === 2
control: ModernTextField {
Layout.preferredWidth: 280
placeholderText: "e.g. 'Hello, World.'"
text: ui.getSetting("initial_prompt") || ""
onEditingFinished: ui.setSetting("initial_prompt", text === "" ? null : text)
}
}
}
}
ModernSettingsSection {
title: "Model Config"
Layout.margins: 32
@@ -742,15 +809,17 @@ Window {
ModernSettingsItem {
label: "Language"
-description: "Force language or Auto-detect"
+description: "Spoken language to transcribe"
control: ModernComboBox {
-width: 140
-model: ["auto", "en", "fr", "de", "es", "it", "ja", "zh", "ru"]
-currentIndex: model.indexOf(ui.getSetting("language"))
-onActivated: ui.setSetting("language", currentText)
+Layout.preferredWidth: 200
+model: ui.get_supported_languages()
+currentIndex: model.indexOf(ui.get_current_language_name())
+onActivated: (index) => ui.set_language_by_name(currentText)
}
}
+// Task selector removed as per user request (Hotkeys handle this now)
ModernSettingsItem {
label: "Compute Device"
description: "Hardware acceleration (CUDA requires NVidia GPU)"
@@ -773,6 +842,147 @@ Window {
onActivated: ui.setSetting("compute_type", currentText)
}
}
ModernSettingsItem {
label: "Low VRAM Mode"
description: "Unload models immediately after use (Saves VRAM, Adds Delay)"
showSeparator: false
control: ModernSwitch {
checked: ui.getSetting("unload_models_after_use")
onToggled: ui.setSetting("unload_models_after_use", checked)
}
}
}
}
ModernSettingsSection {
title: "Correction & Rewriting"
Layout.margins: 32
Layout.topMargin: 0
content: ColumnLayout {
width: parent.width
spacing: 0
ModernSettingsItem {
label: "Enable Correction"
description: "Post-process text with Llama 3.2 1B (Adds latency)"
control: ModernSwitch {
checked: ui.getSetting("llm_enabled")
onToggled: ui.setSetting("llm_enabled", checked)
}
}
ModernSettingsItem {
label: "Correction Mode"
description: "Grammar Fix vs. Complete Rewrite"
visible: ui.getSetting("llm_enabled")
control: ModernComboBox {
width: 140
model: ["Grammar", "Standard", "Rewrite"]
currentIndex: model.indexOf(ui.getSetting("llm_mode"))
onActivated: ui.setSetting("llm_mode", currentText)
}
}
// LLM Model Status Card
Rectangle {
Layout.fillWidth: true
Layout.margins: 12
Layout.topMargin: 0
Layout.bottomMargin: 16
height: 54
color: "#0a0a0f"
visible: ui.getSetting("llm_enabled")
radius: 6
border.color: SettingsStyle.borderSubtle
border.width: 1
property bool isDownloaded: false
property bool isDownloading: ui.isDownloading && ui.statusText.indexOf("LLM") !== -1
Timer {
interval: 2000
running: visible
repeat: true
onTriggered: parent.checkStatus()
}
function checkStatus() {
isDownloaded = ui.isLLMModelDownloaded()
}
Component.onCompleted: checkStatus()
Connections {
target: ui
function onModelStatesChanged() { parent.checkStatus() }
function onIsDownloadingChanged() { parent.checkStatus() }
}
RowLayout {
anchors.fill: parent
anchors.leftMargin: 12
anchors.rightMargin: 12
spacing: 12
Image {
source: "smart_toy.svg"
sourceSize: Qt.size(16, 16)
layer.enabled: true
layer.effect: MultiEffect {
colorization: 1.0
colorizationColor: parent.parent.isDownloaded ? SettingsStyle.accent : "#808080"
}
}
ColumnLayout {
Layout.fillWidth: true
spacing: 2
Text {
text: "Llama 3.2 1B (Instruct)"
color: "#ffffff"
font.family: "JetBrains Mono"; font.bold: true
font.pixelSize: 11
}
Text {
text: parent.parent.isDownloaded ? "Ready." : "Model missing (~1.2GB)"
color: SettingsStyle.textSecondary
font.family: "JetBrains Mono"; font.pixelSize: 10
}
}
Button {
id: dlBtn
text: "Download"
visible: !parent.parent.isDownloaded && !parent.parent.isDownloading
Layout.preferredHeight: 24
Layout.preferredWidth: 80
contentItem: Text {
text: "DOWNLOAD"
font.pixelSize: 10; font.bold: true; color: "#000000"; horizontalAlignment: Text.AlignHCenter; verticalAlignment: Text.AlignVCenter
}
background: Rectangle {
color: dlBtn.hovered ? "#ffffff" : SettingsStyle.accent; radius: 4
}
onClicked: ui.downloadLLM()
}
// Progress Bar
Rectangle {
visible: parent.parent.isDownloading
Layout.fillWidth: true
height: 4
color: "#30ffffff"
Rectangle {
width: parent.width * (ui.downloadProgress / 100)
height: parent.height
color: SettingsStyle.accent
}
}
}
}
}
}

src/utils/formatters.py Normal file

@@ -0,0 +1,32 @@
"""
Formatter Utilities
===================
Helper functions for text formatting.
"""
def format_hotkey(sequence: str) -> str:
"""
Formats a hotkey sequence string (e.g. 'ctrl+alt+f9')
into a pretty readable string (e.g. 'Ctrl + Alt + F9').
"""
if not sequence:
return "None"
parts = sequence.split('+')
formatted_parts = []
for p in parts:
p = p.strip().lower()
if p == 'ctrl': formatted_parts.append('Ctrl')
elif p == 'alt': formatted_parts.append('Alt')
elif p == 'shift': formatted_parts.append('Shift')
elif p == 'win': formatted_parts.append('Win')
elif p == 'esc': formatted_parts.append('Esc')
else:
# Capitalize first letter
if len(p) > 0:
formatted_parts.append(p[0].upper() + p[1:])
else:
formatted_parts.append(p)
return " + ".join(formatted_parts)
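For reference, the same formatting rule can be written with a lookup table. This is a hypothetical table-driven variant of `format_hotkey`, intended to behave like the version above for the inputs shown; the `_SPECIAL` name is an assumption.

```python
# Display names for modifier and special keys, taken from the if/elif chain above.
_SPECIAL = {"ctrl": "Ctrl", "alt": "Alt", "shift": "Shift", "win": "Win", "esc": "Esc"}


def format_hotkey(sequence: str) -> str:
    """Format 'ctrl+alt+f9' as 'Ctrl + Alt + F9'; empty input yields 'None'."""
    if not sequence:
        return "None"
    parts = (p.strip().lower() for p in sequence.split("+"))
    # Unknown keys (f9, space, ...) just get their first letter capitalized.
    return " + ".join(_SPECIAL.get(p, p[:1].upper() + p[1:]) for p in parts)
```

The QML `formatSequence` function in the key recorder applies the same rule on the UI side, so any key added to one table should be mirrored in the other.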


@@ -55,6 +55,10 @@ except AttributeError:
def LOWORD(l): return l & 0xffff
def HIWORD(l): return (l >> 16) & 0xffff
GWL_EXSTYLE = -20
WS_EX_TRANSPARENT = 0x00000020
WS_EX_LAYERED = 0x00080000
class WindowHook:
def __init__(self, hwnd, width, height, initial_scale=1.0):
self.hwnd = hwnd
@@ -65,6 +69,34 @@ class WindowHook:
# (Window 420x140, Pill 380x100)
self.logical_rect = [20, 20, 20+380, 20+100]
self.current_scale = initial_scale
self.enabled = True # New flag
def set_enabled(self, enabled):
"""
Enables or disables interaction.
When disabled, we set WS_EX_TRANSPARENT so clicks pass through physically.
"""
if self.enabled == enabled:
return
self.enabled = enabled
# Get current styles
style = user32.GetWindowLongW(self.hwnd, GWL_EXSTYLE)
if not enabled:
# Enable Click-Through (Add Transparent)
# We also ensure Layered is set (Qt usually sets it, but good to be sure)
new_style = style | WS_EX_TRANSPARENT | WS_EX_LAYERED
else:
# Disable Click-Through (Remove Transparent)
new_style = style & ~WS_EX_TRANSPARENT
if new_style != style:
SetWindowLongPtr(self.hwnd, GWL_EXSTYLE, new_style)
# Force a redraw/frame update just in case
user32.SetWindowPos(self.hwnd, 0, 0, 0, 0, 0, 0x0027) # SWP_NOMOVE | SWP_NOSIZE | SWP_NOZORDER | SWP_FRAMECHANGED
def install(self):
proc_address = ctypes.cast(self.new_wnd_proc, ctypes.c_void_p)
@@ -73,6 +105,10 @@ class WindowHook:
def wnd_proc_callback(self, hwnd, msg, wParam, lParam):
try:
if msg == WM_NCHITTEST:
# If disabled (invisible/inactive), let clicks pass through (HTTRANSPARENT)
if not self.enabled:
return HTTRANSPARENT
res = self.on_nchittest(lParam)
if res != 0:
return res
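The style arithmetic in `set_enabled` is plain bit manipulation and can be checked on any platform. A minimal sketch, assuming a hypothetical helper `click_through_style`; the constants are the documented Win32 values, and no Win32 call is made here.

```python
# Extended window style bits (same values as in the diff above).
WS_EX_TRANSPARENT = 0x00000020
WS_EX_LAYERED = 0x00080000

# SetWindowPos flags that combine to the 0x0027 literal used above.
SWP_NOSIZE = 0x0001
SWP_NOMOVE = 0x0002
SWP_NOZORDER = 0x0004
SWP_FRAMECHANGED = 0x0020
SWP_REFRESH_FLAGS = SWP_NOMOVE | SWP_NOSIZE | SWP_NOZORDER | SWP_FRAMECHANGED


def click_through_style(style: int, enabled: bool) -> int:
    """Return the extended style for an interactive (enabled) or click-through window."""
    if enabled:
        # Interactive: clear only the transparent bit, leave everything else alone.
        return style & ~WS_EX_TRANSPARENT
    # Click-through: set transparent and make sure layered is set as well.
    return style | WS_EX_TRANSPARENT | WS_EX_LAYERED
```

Note that re-enabling clears only `WS_EX_TRANSPARENT`; `WS_EX_LAYERED` stays set, which matches the behavior of `set_enabled`.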

test_m2m.py Normal file

@@ -0,0 +1,38 @@
import sys
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
def test_m2m():
model_name = "facebook/m2m100_418M"
print(f"Loading {model_name}...")
tokenizer = M2M100Tokenizer.from_pretrained(model_name)
model = M2M100ForConditionalGeneration.from_pretrained(model_name)
# Test cases: (Language Code, Input)
test_cases = [
("en", "he go to school yesterday"),
("pl", "on iść do szkoła wczoraj"), # Intentional broken grammar in Polish
]
print("\nStarting M2M Tests (Self-Translation):\n")
for lang, input_text in test_cases:
tokenizer.src_lang = lang
encoded = tokenizer(input_text, return_tensors="pt")
# Translate to SAME language
generated_tokens = model.generate(
**encoded,
forced_bos_token_id=tokenizer.get_lang_id(lang)
)
corrected = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
print(f"[{lang}]")
print(f"Input: {input_text}")
print(f"Output: {corrected}")
print("-" * 20)
if __name__ == "__main__":
test_m2m()

test_mt0.py Normal file

@@ -0,0 +1,40 @@
import sys
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
def test_mt0():
model_name = "bigscience/mt0-base"
print(f"Loading {model_name}...")
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
# Test cases: (Language, Prompt, Input)
# MT0 is instruction tuned, so we should prompt it in the target language or English.
# Cross-lingual prompting (English prompt -> Target tasks) is usually supported.
test_cases = [
("English", "Correct grammar:", "he go to school yesterday"),
("Polish", "Popraw gramatykę:", "to jest testowe zdanie bez kropki"),
("Finnish", "Korjaa kielioppi:", "tämä on testilause ilman pistettä"),
("Russian", "Исправь грамматику:", "это тестовое предложение без точки"),
("Japanese", "文法を直してください:", "これは点のないテスト文です"),
("Spanish", "Corrige la gramática:", "esta es una oración de prueba sin punto"),
]
print("\nStarting MT0 Tests:\n")
for lang, prompt_text, input_text in test_cases:
full_input = f"{prompt_text} {input_text}"
inputs = tokenizer(full_input, return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_length=128)
corrected = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"[{lang}]")
print(f"Input: {full_input}")
print(f"Output: {corrected}")
print("-" * 20)
if __name__ == "__main__":
test_mt0()

test_punctuation.py Normal file

@@ -0,0 +1,34 @@
import sys
import os
# Add the project root (this file's directory) to sys.path so the `src` package is importable
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from src.core.grammar_assistant import GrammarAssistant
def test_punctuation():
assistant = GrammarAssistant()
assistant.load_model()
samples = [
# User's example (verbatim)
"If the voice recognition doesn't recognize that I like stopped Or something would that would it also correct that",
# Generic run-on
"hello how are you doing today i am doing fine thanks for asking",
# Missing commas/periods
"well i think its valid however we should probably check the logs first"
]
print("\nStarting Punctuation Tests:\n")
for sample in samples:
print(f"Original: {sample}")
corrected = assistant.correct(sample)
print(f"Corrected: {corrected}")
print("-" * 20)
if __name__ == "__main__":
test_punctuation()