14 Commits

Author SHA1 Message Date
Your Name
84f10092e9 Release v1.0.2: Implemented Style Prompting & Removed Grammar Correction
- Removed M2M100 Grammar Correction model completely to reduce bloat/complexity.
- Implemented 'Style Prompting' in Settings -> AI Engine to handle punctuation natively via Whisper.
- Added Style Presets: Standard (Default), Casual, and Custom.
- Optimized Build: Bootstrapper no longer requires transformers/sentencepiece.
- Fixed 'torch' NameError in Low VRAM mode.
- Fixed Bootstrapper missing dependency detection.
- Updated UI to reflect removed features.
- Included compiled v1.0.2 Executable in dist/.
2026-01-25 13:42:06 +02:00
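The "Style Prompting" mechanism above leans on a known Whisper behavior: the decoder tends to mirror the punctuation and casing style of whatever text is supplied as its initial prompt. A minimal sketch of how presets might map to such a prompt — the preset names and exemplar strings here are illustrative, not the app's actual values:

```python
# Illustrative preset table: each style maps to a short exemplar whose
# punctuation/casing the Whisper decoder will tend to imitate.
STYLE_PRESETS = {
    "standard": "Hello, welcome. This is a complete, well-punctuated sentence.",
    "casual": "hey so this is pretty casual, no big deal",
}

def build_initial_prompt(preset: str, custom_text: str = "") -> str:
    """Resolve the settings value into the prompt string fed to the model."""
    if preset == "custom":
        return custom_text.strip()
    # Unknown presets fall back to the Standard (Default) style.
    return STYLE_PRESETS.get(preset, STYLE_PRESETS["standard"])
```

The resulting string would then be passed as the `initial_prompt` argument to faster-whisper's `transcribe()`, which is how punctuation can be handled natively without a separate grammar-correction model.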
Your Name
03f46ee1e3 Docs: Final polish - Enshittification manifesto and structural refinement 2026-01-24 19:21:01 +02:00
Your Name
0f1bf5f1af Docs: Final polish - 6-col language table and refined manifesto 2026-01-24 19:12:08 +02:00
Your Name
0b2b5848e2 Fix: Translation Reliability, Click-Through, and Docs Sync
- Transcriber: Enforced 'beam_size=5' and prompt injection for robust translation.
- Transcriber: Removed conditioning on previous text to prevent language stickiness.
- Transcriber: Refactored kwargs to sanitize inputs.
- Overlay: Fixed click-through by toggling WS_EX_TRANSPARENT.
- UI: Added real download progress reporting.
- Docs: Refactored language list to table.
2026-01-24 19:05:43 +02:00
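The decoding changes in this commit can be sketched as a small kwargs sanitizer (a hypothetical helper, not the project's actual code): unknown or `None` options are dropped, and the two reliability settings are always enforced before the dict reaches faster-whisper's `transcribe()`.

```python
# Keys accepted by the transcribe call; anything else is stripped.
ALLOWED_KEYS = {"language", "task", "beam_size", "initial_prompt",
                "condition_on_previous_text", "vad_filter"}

def sanitize_transcribe_kwargs(**kwargs) -> dict:
    """Drop None/unknown options, then enforce the reliability settings."""
    opts = {k: v for k, v in kwargs.items() if k in ALLOWED_KEYS and v is not None}
    opts["beam_size"] = 5                       # enforced for robust translation
    opts["condition_on_previous_text"] = False  # prevents language "stickiness"
    return opts
```

Disabling conditioning on previous text is what stops one segment's detected language from biasing the next, at a small cost in cross-segment context.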
Your Name
f3bf7541cf Docs: Detailed expansion of README with Translation features and open layout 2026-01-24 18:33:22 +02:00
Your Name
4b84a27a67 v1.0.1 Feature Update and Polish
Full Changelog:

[New Features]
- Added Native Translation Mode:
  - Whisper model now fully supports Translating any language to English
  - Added 'task' and 'language' parameters to Transcriber core
- Dual Hotkey Support:
  - Added separate Global Hotkeys for Transcribe (default F8) and Translate (default F10)
  - Both hotkeys are fully customizable in Settings
  - Engine dynamically switches modes based on which key is pressed

[UI/UX Improvements]
- Settings Window:
  - Widened Hotkey Input fields (240px) to accommodate long combinations
  - Added Pretty-Printing for hotkey sequences (e.g. 'ctrl+f9' displays as 'Ctrl + F9')
  - Replaced Country Code dropdown with Full Language Names (99+ languages)
  - Made Language Dropdown scrollable (max height 300px) to prevent screen overflow
  - Removed redundant 'Task' selector (replaced by dedicated hotkeys)
- System Tray:
  - Tooltip now displays both Transcribe and Translate hotkeys
  - Tooltip hotkeys are formatted readably

[Core & Performance]
- Bootstrapper:
  - Implemented Smart Incremental Sync
  - Now checks filesize and content hash before copying files
  - Drastically reduces startup time for subsequent runs
  - Preserves user settings.json during updates
- Backend:
  - Fixed HotkeyManager to support dynamic configuration keys
  - Fixed Language Lock: selecting a language now correctly forces the model to use it
  - Refactored bridge/main connection for language list handling
2026-01-24 18:29:10 +02:00
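The dual-hotkey dispatch described above boils down to mapping whichever key fired onto a Whisper task. A toy sketch — the `resolve_task` helper and binding table are illustrative, with defaults following this changelog's F8/F10:

```python
# Hypothetical binding table: hotkey -> Whisper task. (The diff below stores
# these under the config keys 'hotkey' and 'hotkey_translate'.)
DEFAULT_BINDINGS = {"f8": "transcribe", "f10": "translate"}

def resolve_task(pressed_key: str, bindings=DEFAULT_BINDINGS) -> str:
    """Map a global hotkey press to the task the engine should run."""
    return bindings.get(pressed_key.lower(), "transcribe")
```

Because the task is decided per key press, no mode toggle needs to be stored in settings; the engine simply switches based on which hotkey fired.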
Your Name
f184eb0037 Fix: Invisible overlay blocking mouse clicks
Problem:
The overlay window, even when fully transparent or visually hidden (opacity 0), was still intercepting mouse events. This created a 'dead zone' on the screen where users could not click through to applications behind the overlay. This occurred because the low-level window hook was answering 'HTCAPTION' to hit tests regardless of the UI state.

Solution:
1. Modified 'WindowHook' to accept an 'enabled' state.
2. When disabled, 'WM_NCHITTEST' now returns 'HTTRANSPARENT', allowing the OS to pass the click to the window underneath.
3. Updated 'main.py' to toggle this hook state dynamically:
   - ENABLED when Recording or Processing (UI is visible/active).
   - DISABLED when Idling (UI is hidden/transparent).

Result:
The overlay is now completely non-intrusive when not in use.
2026-01-24 17:51:23 +02:00
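The fix above hinges on what the hook answers to `WM_NCHITTEST`. A sketch of just that decision, using the standard Win32 constants; the real code lives in a low-level window procedure, this merely isolates the logic so it is testable anywhere:

```python
# Standard Win32 hit-test values involved in the fix.
WM_NCHITTEST = 0x0084
HTCAPTION = 2        # "treat as title bar": the window swallows/drags the click
HTTRANSPARENT = -1   # the OS forwards the click to the window underneath

def hit_test_response(hook_enabled: bool) -> int:
    """What the hook answers to WM_NCHITTEST in each UI state."""
    # Enabled while Recording/Processing; disabled while Idling (hidden UI).
    return HTCAPTION if hook_enabled else HTTRANSPARENT
```

Returning `HTTRANSPARENT` while idle is what removes the invisible "dead zone": the hit test never reaches the overlay's default handling, so clicks land on whatever sits behind it.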
Your Name
306bd075ed Aesthetic overhaul of documentation 2026-01-24 17:29:59 +02:00
Your Name
a1cc9c61b9 Add language list and file transcription info 2026-01-24 17:27:54 +02:00
Your Name
e627e1b8aa Correct hardware detection statement in docs 2026-01-24 17:24:56 +02:00
Your Name
eaa572b42f Fix release badge for Gitea 2026-01-24 17:22:14 +02:00
Your Name
e900201214 Final documentation polish 2026-01-24 17:20:22 +02:00
Your Name
0d426aea4b Update docs with license and model stats 2026-01-24 17:16:53 +02:00
Your Name
b15ce8076f Enhance documentation 2026-01-24 17:12:21 +02:00
19 changed files with 895 additions and 167 deletions

README.md
@@ -1,71 +1,162 @@
-# Whisper Voice
-**Reclaim Your Voice from the Cloud.**
-
-Whisper Voice is a high-performance, strictly local speech-to-text tool designed for the desktop. It provides instant, high-accuracy dictation anywhere on your system—no internet connection required, no corporate servers, and absolutely no data harvesting.
-We believe that the tools of production—and communication—should belong to the individual, not rented from centralized tech giants.
----
-## ✊ Core Principles
-### 1. Total Autonomy (Local-First)
-Your voice data is yours alone. Unlike commercial alternatives that siphon your words to remote data centers for processing and profiling, Whisper Voice runs entirely on your hardware. **No masters, no servers.** You retain full sovereignty over your digital footprint.
-### 2. Decentralized Power
-By leveraging optimized local processing, we strip away the need for reliance on massive, energy-hungry corporate infrastructure. This is technology scaled to the human level—powerful, efficient, and completely under your control.
-### 3. Accessible to All
-High-quality speech recognition shouldn't be gated behind subscriptions or paywalls. This tool is free, open, and built to empower users to interact with their machines on their own terms.
----
-## ✨ Features
-* **100% Offline Processing**: Once the recognition engine is downloaded, the cable can be cut. Nothing leaves your machine.
-* **Universal Compatibility**: Works in any text field—editors, chat apps, terminals, or browsers. If you can type there, you can speak there.
-* **Adaptive Input**:
-  * *Clipboard Mode*: Standard paste injection.
-  * *High-Speed Simulation*: Simulates keystrokes at supersonic speeds (up to 6000 CPM) for apps that block pasting.
-* **System Integration**: Minimalist overlay and system tray presence. It exists when you need it and vanishes when you don't.
-* **Resource Efficiency**: Optimized to run smoothly on consumer hardware without monopolizing your system resources.
----
-## 🚀 Getting Started
-### Installation
-1. Download the latest release.
-2. Run `WhisperVoice.exe`.
-3. On the first run, the bootstrapper will autonomously provision the necessary runtime environment. This ensures your system remains clean and dependencies are self-contained.
-### Usage
-1. **Set Your Trigger**: Configure a global hotkey (default: `F9`) in the settings.
-2. **Speak Freely**: Hold the hotkey (or toggle it) and speak.
-3. **Direct Action**: Your words are instantly transcribed and injected into your active window.
----
-## ⚙️ Configuration
-The **Settings** panel puts the means of configuration in your hands:
-* **Recognition Engine**: Choose the size of the model that fits your hardware capabilities (Tiny to Large). Larger models offer greater precision but require more computing power.
-* **Input Method**: Switch between "Clipboard Paste" and "Simulate Typing" depending on target application restrictions.
-* **Typing Speed**: Adjust the keystroke injection rate. Crank it up to 6000 CPM for instant text delivery.
-* **Run on Startup**: Configure the agent to be ready the moment your session begins.
----
-## 🤝 Mutual Aid
-This project thrives on community collaboration. If you have improvements, fixes, or ideas, you are encouraged to contribute. We build better systems when we build them together, horizontally and transparently.
-* **Report Issues**: If something breaks, let us know.
-* **Contribute Code**: The source is open. Fork it, improve it, share it.
----
-*Built with local processing libraries and Qt.*
-*No gods, no cloud managers.*
+<div align="center">
+
+# 🎙️ W H I S P E R &nbsp; V O I C E
+### SOVEREIGN SPEECH RECOGNITION
+<br>
+
+![Status](https://img.shields.io/badge/STATUS-OPERATIONAL-success?style=for-the-badge&logo=server&color=2ecc71)
+[![Download](https://img.shields.io/gitea/v/release/lashman/whisper_voice?gitea_url=https%3A%2F%2Fgit.lashman.live&label=Install&style=for-the-badge&logo=windows&logoColor=white&color=3b82f6)](https://git.lashman.live/lashman/whisper_voice/releases/latest)
+[![License](https://img.shields.io/badge/LICENSE-PUBLIC_DOMAIN-lightgrey?style=for-the-badge&logo=creative-commons&logoColor=black)](https://creativecommons.org/publicdomain/zero/1.0/)
+<br>
+
+> *"The master's tools will never dismantle the master's house."*
+> <br>
+> **Build your own tools. Run them locally. Free your mind.**
+
+[View Source](https://git.lashman.live/lashman/whisper_voice) • [Report Issue](https://git.lashman.live/lashman/whisper_voice/issues)
+</div>
+<br>
+<br>
+
+## 📡 The Transmission
+We are witnessing the **enshittification** of the digital world. What were once vibrant social commons are being walled off, strip-mined for data, and degraded into rent-seeking silos. Your voice is no longer your own; it is a training set for a corporate oracle that charges you for the privilege of listening.
+
+**Whisper Voice** is a small act of sabotage against this trend.
+
+It is built on the axiom of **Technological Sovereignty**. By moving state-of-the-art inference from the server farms to your own silicon, you reclaim the means of digital production. No telemetry. No subscriptions. No "cloud processing" that eavesdrops on your intent.
+---
+## ⚡ The Engine
+Whisper Voice operates directly on the metal. It is not an API wrapper; it is an autonomous machine.
+
+| Component | Technology | Benefit |
+| :--- | :--- | :--- |
+| **Inference Core** | **Faster-Whisper** | Hyper-optimized C++ implementation via **CTranslate2**. Delivers **4x velocity** over standard PyTorch. |
+| **Compression** | **INT8 quantization** | Enables Pro-grade models (`Large-v3`) to run on consumer-grade GPUs, democratizing elite AI. |
+| **Sensory Gate** | **Silero VAD** | Enterprise-grade Voice Activity Detection filters out the noise, ensuring only pure intent is processed. |
+| **Interface** | **Qt 6 / QML** | Hardware-accelerated, glassmorphic UI that is fluid, responsive, and sovereign. |
+<br>
+
+## 🖋️ Universal Transcription
+At its core, Whisper Voice is the ultimate bridge between thought and text. It listens with superhuman precision, converting spoken word into written form across **99 languages**.
+* **Punctuation Mastery**: Automatically handles capitalization and complex punctuation formatting.
+* **Contextual Intelligence**: Smarter than standard dictation; it understands the flow of sentences to resolve homophones and technical jargon ($1.5k vs "fifteen hundred dollars").
+* **Total Privacy**: Your private dictation, legal notes, or creative writing never leave your RAM.
+
+### Workflow: `F9 (Default)`
+The primary channel for native-language transcription. It transcribes precisely what it hears in the language you speak (or the one you've locked in Settings).
+<br>
+
+## 🌎 Universal Translation
+Whisper Voice v1.0.1 includes a **Neural Translation Engine** that allows you to bridge any linguistic gap instantly.
+* **Input**: Speak in French, Japanese, Russian, or **96 other languages**.
+* **Output**: The engine instantly reconstructs the semantic meaning into fluent **English**.
+* **Task Protocol**: Handled via the dedicated `F10` channel.
+
+### 🔍 Why only English translation?
+A common question arises: *Why can't I translate from French to Japanese?*
+The architecture of the underlying Whisper model is a **Many-to-English** design. During its massive training phase (680,000 hours of audio), the translation task was specifically optimized to map the global linguistic commons onto a single bridge language: **English**. This allowed the model to reach incredible levels of semantic understanding without the exponential complexity of a "Many-to-Many" mapping.
+By focusing its translation decoder solely on English, Whisper achieves "Zero-Shot" quality that rivals specialized translation engines while remaining lightweight enough to run on your local GPU.
+---
+## 🕹️ Command & Control
+### Global Hotkeys
+The agent runs silently in the background, waiting for your signal.
+* **Transcribe (F9)**: Opens the channel for standard speech-to-text.
+* **Translate (F10)**: Opens the channel for neural translation.
+* **Customization**: Remap these keys in Settings. The recorder supports complex chords (e.g. `Ctrl + Alt + Space`) to fit your workflow.
+
+### Injection Protocols
+* **Clipboard Paste**: Standard text injection. Instant, reliable.
+* **Simulate Typing**: Mimics physical keystrokes at superhuman speed (6000 CPM). Bypasses anti-paste restrictions and "protected" windows.
+<br>
+
+## 📊 Intelligence Matrix
+Select the model that aligns with your available resources.
+
+| Model | VRAM (GPU) | RAM (CPU) | Designation | Capability |
+| :--- | :--- | :--- | :--- | :--- |
+| `Tiny` | **~500 MB** | ~1 GB | ⚡ **Supersonic** | Command & Control, older hardware. |
+| `Base` | **~600 MB** | ~1 GB | 🚀 **Very Fast** | Daily driver for low-power laptops. |
+| `Small` | **~1 GB** | ~2 GB | ⏩ **Fast** | High accuracy English dictation. |
+| `Medium` | **~2 GB** | ~4 GB | ⚖️ **Balanced** | Complex vocabulary, foreign accents. |
+| `Large-v3 Turbo` | **~4 GB** | ~6 GB | ✨ **Optimal** | **The Sweet Spot.** Near-Large intelligence, Medium speed. |
+| `Large-v3` | **~5 GB** | ~8 GB | 🧠 **Maximum** | Professional grade. Uncompromised. |
+
+> *Note: Acceleration requires you to manually select your Compute Device (CUDA GPU or CPU) in Settings.*
+---
+## 🛠️ Deployment
+### 📥 Installation
+1. **Acquire**: Download `WhisperVoice.exe` from [Releases](https://git.lashman.live/lashman/whisper_voice/releases).
+2. **Deploy**: Place it anywhere. It is portable.
+3. **Bootstrap**: Run it. The agent will self-provision an isolated Python runtime (~2GB) on first launch.
+4. **Sync**: Future updates are handled by the **Smart Bootstrapper**, which surgically updates only changed files, respecting your bandwidth and your settings.
+
+### 🔧 Troubleshooting
+* **App crashes on start**: Ensure you have [Microsoft Visual C++ Redistributable 2015-2022](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist) installed.
+* **"Simulate Typing" is slow**: Some applications (remote desktops, legacy games) cannot handle the data stream. Lower the typing speed in Settings to ~1200 CPM.
+* **No Audio**: The agent listens to the **Default Communication Device**. Verify your Windows Sound Control Panel.
+<br>
+---
+## 🌐 Supported Languages
+The engine understands the following 99 languages. You can lock the focus to a specific language in Settings to improve accuracy, or rely on **Auto-Detect** for fluid multilingual usage.
+
+| | | | | | |
+| :--- | :--- | :--- | :--- | :--- | :--- |
+| Afrikaans 🇿🇦 | Albanian 🇦🇱 | Amharic 🇪🇹 | Arabic 🇸🇦 | Armenian 🇦🇲 | Assamese 🇮🇳 |
+| Azerbaijani 🇦🇿 | Bashkir 🇷🇺 | Basque 🇪🇸 | Belarusian 🇧🇾 | Bengali 🇧🇩 | Bosnian 🇧🇦 |
+| Breton 🇫🇷 | Bulgarian 🇧🇬 | Burmese 🇲🇲 | Castilian 🇪🇸 | Catalan 🇪🇸 | Chinese 🇨🇳 |
+| Croatian 🇭🇷 | Czech 🇨🇿 | Danish 🇩🇰 | Dutch 🇳🇱 | English 🇺🇸 | Estonian 🇪🇪 |
+| Faroese 🇫🇴 | Finnish 🇫🇮 | Flemish 🇧🇪 | French 🇫🇷 | Galician 🇪🇸 | Georgian 🇬🇪 |
+| German 🇩🇪 | Greek 🇬🇷 | Gujarati 🇮🇳 | Haitian 🇭🇹 | Hausa 🇳🇬 | Hawaiian 🇺🇸 |
+| Hebrew 🇮🇱 | Hindi 🇮🇳 | Hungarian 🇭🇺 | Icelandic 🇮🇸 | Indonesian 🇮🇩 | Italian 🇮🇹 |
+| Japanese 🇯🇵 | Javanese 🇮🇩 | Kannada 🇮🇳 | Kazakh 🇰🇿 | Khmer 🇰🇭 | Korean 🇰🇷 |
+| Lao 🇱🇦 | Latin 🇻🇦 | Latvian 🇱🇻 | Lingala 🇨🇩 | Lithuanian 🇱🇹 | Luxembourgish 🇱🇺 |
+| Macedonian 🇲🇰 | Malagasy 🇲🇬 | Malay 🇲🇾 | Malayalam 🇮🇳 | Maltese 🇲🇹 | Maori 🇳🇿 |
+| Marathi 🇮🇳 | Moldavian 🇲🇩 | Mongolian 🇲🇳 | Myanmar 🇲🇲 | Nepali 🇳🇵 | Norwegian 🇳🇴 |
+| Occitan 🇫🇷 | Panjabi 🇮🇳 | Pashto 🇦🇫 | Persian 🇮🇷 | Polish 🇵🇱 | Portuguese 🇵🇹 |
+| Punjabi 🇮🇳 | Romanian 🇷🇴 | Russian 🇷🇺 | Sanskrit 🇮🇳 | Serbian 🇷🇸 | Shona 🇿🇼 |
+| Sindhi 🇵🇰 | Sinhala 🇱🇰 | Slovak 🇸🇰 | Slovenian 🇸🇮 | Somali 🇸🇴 | Spanish 🇪🇸 |
+| Sundanese 🇮🇩 | Swahili 🇰🇪 | Swedish 🇸🇪 | Tagalog 🇵🇭 | Tajik 🇹🇯 | Tamil 🇮🇳 |
+| Tatar 🇷🇺 | Telugu 🇮🇳 | Thai 🇹🇭 | Tibetan 🇨🇳 | Turkish 🇹🇷 | Turkmen 🇹🇲 |
+| Ukrainian 🇺🇦 | Urdu 🇵🇰 | Uzbek 🇺🇿 | Vietnamese 🇻🇳 | Welsh 🏴󠁧󠁢󠁷󠁬󠁳󠁿 | Yiddish 🇮🇱 |
+| Yoruba 🇳🇬 | | | | | |
+---
+<br>
+<br>
+<div align="center">
+
+### ⚖️ PUBLIC DOMAIN (CC0 1.0)
+*No Rights Reserved. No Gods. No Masters. No Managers.*
+
+Credit to **OpenAI** (Whisper), **Systran** (Faster-Whisper), and **Silero** (VAD).
+</div>

@@ -259,48 +259,72 @@ class Bootstrapper:
         process.wait()

     def refresh_app_source(self):
-        """Refresh app source files. Skips if already exists to save time."""
-        # Optimization: If app/main.py exists, skip update to improve startup speed.
-        # The user can delete the 'runtime' folder to force an update.
-        if (self.app_path / "main.py").exists():
-            log("App already exists. Skipping update.")
-            return True
-        if self.ui: self.ui.set_status("Updating app files...")
+        """
+        Smartly updates app source files by only copying changed files.
+        Preserves user settings and reduces disk I/O.
+        """
+        if self.ui: self.ui.set_status("Checking for updates...")
         try:
-            # Preserve settings.json if it exists
-            settings_path = self.app_path / "settings.json"
-            temp_settings = None
-            if settings_path.exists():
-                try:
-                    temp_settings = settings_path.read_bytes()
-                except:
-                    log("Failed to backup settings.json, it involves risk of data loss.")
-            if self.app_path.exists():
-                shutil.rmtree(self.app_path, ignore_errors=True)
-            shutil.copytree(
-                self.source_path,
-                self.app_path,
-                ignore=shutil.ignore_patterns(
-                    '__pycache__', '*.pyc', '.git', 'venv',
-                    'build', 'dist', '*.egg-info', 'runtime'
-                )
-            )
-            # Restore settings.json
-            if temp_settings:
-                try:
-                    settings_path.write_bytes(temp_settings)
-                    log("Restored settings.json")
-                except:
-                    log("Failed to restore settings.json")
-            return True
+            # 1. Ensure destination exists
+            if not self.app_path.exists():
+                self.app_path.mkdir(parents=True, exist_ok=True)
+
+            # 2. Walk source and sync
+            # source_path is the temporary bundled folder
+            # app_path is the persistent runtime folder
+            changes_made = 0
+            for src_dir, dirs, files in os.walk(self.source_path):
+                # Determine relative path from source root
+                rel_path = Path(src_dir).relative_to(self.source_path)
+                dst_dir = self.app_path / rel_path
+                # Ensure directory exists
+                if not dst_dir.exists():
+                    dst_dir.mkdir(parents=True, exist_ok=True)
+                for file in files:
+                    # Skip ignored files
+                    if file in ['__pycache__', '.git', 'settings.json'] or file.endswith('.pyc'):
+                        continue
+                    src_file = Path(src_dir) / file
+                    dst_file = dst_dir / file
+                    # Check if update needed
+                    should_copy = False
+                    if not dst_file.exists():
+                        should_copy = True
+                    else:
+                        # Compare size first (fast)
+                        if src_file.stat().st_size != dst_file.stat().st_size:
+                            should_copy = True
+                        else:
+                            # Compare content (slower but accurate)
+                            # Only read if size matches to verify diff
+                            if src_file.read_bytes() != dst_file.read_bytes():
+                                should_copy = True
+                    if should_copy:
+                        shutil.copy2(src_file, dst_file)
+                        changes_made += 1
+                        if self.ui: self.ui.set_detail(f"Updated: {file}")
+
+            # 3. Cleanup logic (Optional: remove files in dest that are not in source)
+            # For now, we only add/update to prevent deleting generated user files (logs, etc)
+            if changes_made > 0:
+                log(f"Update complete. {changes_made} files changed.")
+            else:
+                log("App is up to date.")
+            return True
         except Exception as e:
             log(f"Error refreshing app source: {e}")
+            # Fallback to nuclear option if sync fails completely?
+            # No, 'smart_sync' failing might mean permissions, nuclear wouldn't help.
             return False
     def run_app(self):
@@ -323,11 +347,17 @@ class Bootstrapper:
             messagebox.showerror("WhisperVoice Error", f"Failed to launch app: {e}")
             return False

+    def check_dependencies(self):
+        """Quick check if critical dependencies are installed."""
+        return True  # Deprecated logic placeholder
+
     def setup_and_run(self):
         """Full setup/update and run flow."""
         try:
+            # 1. Ensure basics
             if not self.is_python_ready():
                 self.download_python()
+                self._fix_pth_file()  # Ensure pth is fixed immediately after download
                 self.install_pip()
                 self.install_packages()
@@ -338,7 +368,10 @@ class Bootstrapper:
             if self.run_app():
                 if self.ui: self.ui.root.quit()
         except Exception as e:
-            messagebox.showerror("Setup Error", f"Installation failed: {e}")
+            if self.ui:
+                import tkinter.messagebox as mb
+                mb.showerror("Setup Error", f"Installation failed: {e}")  # Improved error visibility
+            log(f"Fatal error: {e}")
             import traceback
             traceback.print_exc()

dist/WhisperVoice.exe (binary, vendored, new file; content not shown)

main.py

@@ -87,7 +87,7 @@ def _silent_shutdown_hook(exc_type, exc_value, exc_tb):
 sys.excepthook = _silent_shutdown_hook

 class DownloadWorker(QThread):
-    """Background worker for model downloads."""
+    """Background worker for model downloads with REAL progress."""
     progress = Signal(int)
     finished = Signal()
     error = Signal(str)
@@ -98,33 +98,81 @@ class DownloadWorker(QThread):
     def run(self):
         try:
-            from faster_whisper import download_model
+            import requests
+            from tqdm import tqdm
             model_path = get_models_path()
-            # Download to a specific subdirectory to keep things clean and predictable
-            # This matches the logic in transcriber.py which looks for this specific path
+            # Determine what to download
             dest_dir = model_path / f"faster-whisper-{self.model_name}"
-            logging.info(f"Downloading Model '{self.model_name}' to {dest_dir}...")
+            repo_id = f"Systran/faster-whisper-{self.model_name}"
+            files = ["config.json", "model.bin", "tokenizer.json", "vocabulary.json"]
+            base_url = f"https://huggingface.co/{repo_id}/resolve/main"
+            dest_dir.mkdir(parents=True, exist_ok=True)
+            logging.info(f"Downloading {self.model_name} to {dest_dir}...")

-            # Ensure parent exists
-            model_path.mkdir(parents=True, exist_ok=True)
-
-            # output_dir in download_model specifies where the model files are saved
-            download_model(self.model_name, output_dir=str(dest_dir))
+            # 1. Calculate Total Size
+            total_size = 0
+            file_sizes = {}
+            with requests.Session() as s:
+                for fname in files:
+                    url = f"{base_url}/{fname}"
+                    head = s.head(url, allow_redirects=True)
+                    if head.status_code == 200:
+                        size = int(head.headers.get('content-length', 0))
+                        file_sizes[fname] = size
+                        total_size += size
+                    else:
+                        # Fallback for vocabulary.json vs vocabulary.txt
+                        if fname == "vocabulary.json":
+                            # Try .txt? Or just skip if not found?
+                            # Faster-whisper usually has vocabulary.json
+                            pass
+
+            # 2. Download loop
+            downloaded_bytes = 0
+            with requests.Session() as s:
+                for fname in files:
+                    if fname not in file_sizes: continue
+                    url = f"{base_url}/{fname}"
+                    dest_file = dest_dir / fname
+                    # Resume check?
+                    # Simpler to just overwrite for reliability unless we want complex resume logic.
+                    # We'll overwrite.
+                    resp = s.get(url, stream=True)
+                    resp.raise_for_status()
+                    with open(dest_file, 'wb') as f:
+                        for chunk in resp.iter_content(chunk_size=8192):
+                            if chunk:
+                                f.write(chunk)
+                                downloaded_bytes += len(chunk)
+                                # Emit Progress
+                                if total_size > 0:
+                                    pct = int((downloaded_bytes / total_size) * 100)
+                                    self.progress.emit(pct)
             self.finished.emit()
         except Exception as e:
             logging.error(f"Download failed: {e}")
             self.error.emit(str(e))
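The progress arithmetic in the download loop above, factored into one testable function (a sketch; in the worker this value is emitted via the Qt `progress` signal):

```python
def download_progress_pct(downloaded_bytes: int, total_bytes: int) -> int:
    """Percent complete, clamped to 0..100; 0 when content-length is unknown."""
    if total_bytes <= 0:
        return 0
    return min(100, int(downloaded_bytes / total_bytes * 100))
```

Clamping matters because the HEAD-based size estimate can undercount (e.g. a file skipped in the size pass but downloaded anyway), which would otherwise push the bar past 100%.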
 class TranscriptionWorker(QThread):
     finished = Signal(str)

-    def __init__(self, transcriber, audio_data, is_file=False, parent=None):
+    def __init__(self, transcriber, audio_data, is_file=False, parent=None, task_override=None):
         super().__init__(parent)
         self.transcriber = transcriber
         self.audio_data = audio_data
         self.is_file = is_file
+        self.task_override = task_override

     def run(self):
-        text = self.transcriber.transcribe(self.audio_data, is_file=self.is_file)
+        text = self.transcriber.transcribe(self.audio_data, is_file=self.is_file, task=self.task_override)
         self.finished.emit(text)

 class WhisperApp(QObject):
@@ -166,13 +214,18 @@ class WhisperApp(QObject):
self.tray.transcribe_file_requested.connect(self.transcribe_file) self.tray.transcribe_file_requested.connect(self.transcribe_file)
# Init Tooltip # Init Tooltip
hotkey = self.config.get("hotkey") from src.utils.formatters import format_hotkey
self.tray.setToolTip(f"Whisper Voice - Press {hotkey} to Record") self.format_hotkey = format_hotkey # Store ref
hk1 = self.format_hotkey(self.config.get("hotkey"))
hk2 = self.format_hotkey(self.config.get("hotkey_translate"))
self.tray.setToolTip(f"Whisper Voice\nTranscribe: {hk1}\nTranslate: {hk2}")
# 3. Logic Components Placeholders # 3. Logic Components Placeholders
self.audio_engine = None self.audio_engine = None
self.transcriber = None self.transcriber = None
self.hotkey_manager = None self.hk_transcribe = None
self.hk_translate = None
self.overlay_root = None self.overlay_root = None
# 4. Start Loader # 4. Start Loader
@@ -222,12 +275,23 @@ class WhisperApp(QObject):
self.settings_root.setVisible(False) self.settings_root.setVisible(False)
# Install Low-Level Window Hook for Transparent Hit Test # Install Low-Level Window Hook for Transparent Hit Test
# We must keep a reference to 'self.hook' so it isn't GC'd try:
# scale = self.overlay_root.devicePixelRatio() from src.utils.window_hook import WindowHook
# self.hook = WindowHook(int(self.overlay_root.winId()), 500, 300, scale) hwnd = self.overlay_root.winId()
# self.hook.install() # Initial scale from config
scale = float(self.config.get("ui_scale"))
# NOTE: HitTest hook will be installed here later
# Current Overlay Dimensions
win_w = int(460 * scale)
win_h = int(180 * scale)
self.window_hook = WindowHook(hwnd, win_w, win_h, initial_scale=scale)
self.window_hook.install()
# Initial state: Disabled because we start inactive
self.window_hook.set_enabled(False)
except Exception as e:
logging.error(f"Failed to install WindowHook: {e}")
def center_overlay(self): def center_overlay(self):
"""Calculates and sets the Overlay position above the taskbar.""" """Calculates and sets the Overlay position above the taskbar."""
@@ -255,9 +319,16 @@ class WhisperApp(QObject):
self.audio_engine.set_visualizer_callback(self.bridge.update_amplitude) self.audio_engine.set_visualizer_callback(self.bridge.update_amplitude)
self.audio_engine.set_silence_callback(self.on_silence_detected) self.audio_engine.set_silence_callback(self.on_silence_detected)
self.transcriber = WhisperTranscriber() self.transcriber = WhisperTranscriber()
self.hotkey_manager = HotkeyManager()
self.hotkey_manager.triggered.connect(self.toggle_recording) # Dual Hotkey Managers
self.hotkey_manager.start() self.hk_transcribe = HotkeyManager(config_key="hotkey")
self.hk_transcribe.triggered.connect(lambda: self.toggle_recording(task_override="transcribe"))
self.hk_transcribe.start()
self.hk_translate = HotkeyManager(config_key="hotkey_translate")
self.hk_translate.triggered.connect(lambda: self.toggle_recording(task_override="translate"))
self.hk_translate.start()
self.bridge.update_status("Ready") self.bridge.update_status("Ready")
def run(self): def run(self):
@@ -275,7 +346,8 @@ class WhisperApp(QObject):
except: pass except: pass
self.bridge.stats_worker.stop() self.bridge.stats_worker.stop()
if self.hotkey_manager: self.hotkey_manager.stop() if self.hk_transcribe: self.hk_transcribe.stop()
if self.hk_translate: self.hk_translate.stop()
# Close all QML windows to ensure bindings stop before Python objects die # Close all QML windows to ensure bindings stop before Python objects die
if self.overlay_root: if self.overlay_root:
@@ -350,10 +422,14 @@ class WhisperApp(QObject):
print(f"Setting Changed: {key} = {value}") print(f"Setting Changed: {key} = {value}")
# 1. Hotkey Reload # 1. Hotkey Reload
if key == "hotkey": if key in ["hotkey", "hotkey_translate"]:
if self.hotkey_manager: self.hotkey_manager.reload_hotkey() if self.hk_transcribe: self.hk_transcribe.reload_hotkey()
if self.hk_translate: self.hk_translate.reload_hotkey()
if self.tray: if self.tray:
self.tray.setToolTip(f"Whisper Voice - Press {value} to Record") hk1 = self.format_hotkey(self.config.get("hotkey"))
hk2 = self.format_hotkey(self.config.get("hotkey_translate"))
self.tray.setToolTip(f"Whisper Voice\nTranscribe: {hk1}\nTranslate: {hk2}")
# 2. AI Model Reload (Heavy) # 2. AI Model Reload (Heavy)
if key in ["model_size", "compute_device", "compute_type"]: if key in ["model_size", "compute_device", "compute_type"]:
@@ -456,6 +532,8 @@ class WhisperApp(QObject):
file_path, _ = QFileDialog.getOpenFileName(None, "Select Audio", "", "Audio (*.mp3 *.wav *.flac *.m4a *.ogg)") file_path, _ = QFileDialog.getOpenFileName(None, "Select Audio", "", "Audio (*.mp3 *.wav *.flac *.m4a *.ogg)")
if file_path: if file_path:
self.bridge.update_status("Thinking...") self.bridge.update_status("Thinking...")
# Files use the default configured task usually, or we could ask?
# Default to config setting for files.
self.worker = TranscriptionWorker(self.transcriber, file_path, is_file=True, parent=self) self.worker = TranscriptionWorker(self.transcriber, file_path, is_file=True, parent=self)
self.worker.finished.connect(self.on_transcription_done) self.worker.finished.connect(self.on_transcription_done)
self.worker.start() self.worker.start()
@@ -463,10 +541,13 @@ class WhisperApp(QObject):
     @Slot()
     def on_silence_detected(self):
         from PySide6.QtCore import QMetaObject, Qt
+        # Silence detection always triggers the task that was active?
+        # Since silence stops recording, it just calls toggle_recording with no arg, using the stored current_task?
+        # Let's ensure toggle_recording handles no arg calls by stopping the CURRENT task.
         QMetaObject.invokeMethod(self, "toggle_recording", Qt.QueuedConnection)

-    @Slot()
-    def toggle_recording(self):
+    @Slot()  # Modified to allow lambda override
+    def toggle_recording(self, task_override=None):
         if not self.audio_engine: return

         # Prevent starting a new recording while we are still transcribing the last one
@@ -474,23 +555,36 @@ class WhisperApp(QObject):
             logging.warning("Ignored toggle request: Transcription in progress.")
             return

+        # Determine which task we are entering
+        if task_override:
+            intended_task = task_override
+        else:
+            intended_task = self.config.get("task")
+
         if self.audio_engine.recording:
+            # STOP RECORDING
             self.bridge.update_status("Thinking...")
             self.bridge.isRecording = False
             self.bridge.isProcessing = True  # Start Processing
             audio_data = self.audio_engine.stop_recording()
-            self.worker = TranscriptionWorker(self.transcriber, audio_data, parent=self)
+            # Use the task that started this session, or the override if provided (though usually override is for starting)
+            final_task = getattr(self, "current_recording_task", self.config.get("task"))
+            self.worker = TranscriptionWorker(self.transcriber, audio_data, parent=self, task_override=final_task)
             self.worker.finished.connect(self.on_transcription_done)
             self.worker.start()
         else:
-            self.bridge.update_status("Recording")
+            # START RECORDING
+            self.current_recording_task = intended_task
+            self.bridge.update_status(f"Recording ({intended_task})...")
             self.bridge.isRecording = True
             self.audio_engine.start_recording()
@Slot(bool) @Slot(bool)
def on_ui_toggle_request(self, state): def on_ui_toggle_request(self, state):
if state != self.audio_engine.recording: if state != self.audio_engine.recording:
self.toggle_recording() self.toggle_recording() # Default behavior for UI clicks
@Slot(str) @Slot(str)
def on_transcription_done(self, text: str): def on_transcription_done(self, text: str):
@@ -503,8 +597,8 @@ class WhisperApp(QObject):
@Slot(bool) @Slot(bool)
def on_hotkeys_enabled_toggle(self, state): def on_hotkeys_enabled_toggle(self, state):
if self.hotkey_manager: if self.hk_transcribe: self.hk_transcribe.set_enabled(state)
self.hotkey_manager.set_enabled(state) if self.hk_translate: self.hk_translate.set_enabled(state)
@Slot(str) @Slot(str)
def on_download_requested(self, size): def on_download_requested(self, size):
@@ -531,6 +625,25 @@ class WhisperApp(QObject):
self.bridge.update_status("Error") self.bridge.update_status("Error")
logging.error(f"Download Error: {err}") logging.error(f"Download Error: {err}")
@Slot(bool)
def on_ui_toggle_request(self, is_recording):
"""Called when recording state changes."""
# Update Window Hook to allow clicking if active
is_active = is_recording or self.bridge.isProcessing
if hasattr(self, 'window_hook'):
self.window_hook.set_enabled(is_active)
@Slot(bool)
def on_processing_changed(self, is_processing):
is_active = self.bridge.isRecording or is_processing
if hasattr(self, 'window_hook'):
self.window_hook.set_enabled(is_active)
if __name__ == "__main__": if __name__ == "__main__":
import sys
app = WhisperApp() app = WhisperApp()
app.run()
# Connect extra signal for processing state
app.bridge.isProcessingChanged.connect(app.on_processing_changed)
sys.exit(app.run())
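The dual-hotkey flow in this diff is easy to lose track of across hunks: each hotkey carries a fixed task override when recording starts, and stopping (by key press or silence) reuses whichever task opened the session. A minimal sketch of that dispatch pattern, with a plain class standing in for `WhisperApp` (names and structure here are illustrative, not the app's actual API):

```python
# Sketch of the task-override dispatch implied by the diff. A stub class
# replaces the Qt machinery; only the state transitions are modeled.

class RecorderStub:
    """Minimal stand-in for WhisperApp's recording state machine."""
    def __init__(self, default_task="transcribe"):
        self.default_task = default_task
        self.recording = False
        self.current_recording_task = None
        self.dispatched = []  # tasks handed to the transcription worker

    def toggle_recording(self, task_override=None):
        intended_task = task_override or self.default_task
        if self.recording:
            # STOP: reuse the task that started this session
            self.dispatched.append(self.current_recording_task)
            self.recording = False
        else:
            # START: remember which task this session is for
            self.current_recording_task = intended_task
            self.recording = True

app = RecorderStub()
app.toggle_recording("translate")  # F10 pressed: start a translate session
app.toggle_recording()             # silence detected: stop, task stays "translate"
print(app.dispatched)              # ['translate']
```

The key detail is that the stop path ignores the config's current task and trusts `current_recording_task`, so changing settings mid-recording cannot flip a session's mode.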


@@ -39,39 +39,36 @@ def build_portable():
     print("⏳ This may take 5-10 minutes...")
     PyInstaller.__main__.run([
-        "main.py",              # Entry point
+        "bootstrapper.py",      # Entry point (Tiny Installer)
         "--name=WhisperVoice",  # EXE name
-        "--onefile",            # Single EXE (slower startup but portable)
+        "--onefile",            # Single EXE
         "--noconsole",          # No terminal window
         "--clean",              # Clean cache
-        *add_data_args,         # Bundled assets
-        # Heavy libraries that need special collection
-        "--collect-all", "faster_whisper",
-        "--collect-all", "ctranslate2",
-        "--collect-all", "PySide6",
-        "--collect-all", "torch",
-        "--collect-all", "numpy",
-        # Hidden imports (modules imported dynamically)
-        "--hidden-import", "keyboard",
-        "--hidden-import", "pyperclip",
-        "--hidden-import", "psutil",
-        "--hidden-import", "pynvml",
-        "--hidden-import", "sounddevice",
-        "--hidden-import", "scipy",
-        "--hidden-import", "scipy.signal",
-        "--hidden-import", "huggingface_hub",
-        "--hidden-import", "tokenizers",
-        # Qt plugins
-        "--hidden-import", "PySide6.QtQuickControls2",
-        "--hidden-import", "PySide6.QtQuick.Controls",
-        # Icon (convert to .ico for Windows)
-        # "--icon=icon.ico",  # Uncomment if you have a .ico file
+        # Bundle the app source to be extracted by the bootstrapper.
+        # The bootstrapper expects an 'app_source' folder in the bundled resources.
+        "--add-data", f"src{os.pathsep}app_source/src",
+        "--add-data", f"main.py{os.pathsep}app_source",
+        "--add-data", f"requirements.txt{os.pathsep}app_source",
+        # Add assets
+        "--add-data", f"src/ui/qml{os.pathsep}app_source/src/ui/qml",
+        "--add-data", f"assets{os.pathsep}app_source/assets",
+        # No heavy collections!
+        # The bootstrapper uses internal pip to install everything.
+        # Exclude heavy modules to ensure this exe stays tiny
+        "--exclude-module", "faster_whisper",
+        "--exclude-module", "torch",
+        "--exclude-module", "PySide6",
+        # Icon
+        # "--icon=icon.ico",
     ])
     print("\n" + "="*60)
     print("✅ BUILD COMPLETE!")
     print("="*60)
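The actual `bootstrapper.py` is not shown in this diff, so the following is only a sketch of the pattern the build flags imply: the tiny EXE ships the project as `app_source` data and installs the heavy dependencies with pip on first run instead of bundling them. Every name below is an assumption, not the app's real bootstrapper code.

```python
# Hypothetical bootstrapper sketch (assumed design, not the real file):
# locate the bundled 'app_source' folder and build the pip command that
# would install its requirements via the running interpreter.
import os
import sys

def app_source_dir():
    # PyInstaller --onefile extracts bundled data into sys._MEIPASS at runtime;
    # fall back to the working directory when running unfrozen.
    base = getattr(sys, "_MEIPASS", os.getcwd())
    return os.path.join(base, "app_source")

def pip_install_command(requirements_path):
    # Invoke pip through the current interpreter rather than a bare 'pip',
    # which is the reliable form inside a frozen app's environment.
    return [sys.executable, "-m", "pip", "install", "-r", requirements_path]

cmd = pip_install_command(os.path.join(app_source_dir(), "requirements.txt"))
print(cmd[1:4])  # ['-m', 'pip', 'install']
```

This matches why the build excludes `torch`, `PySide6`, and `faster_whisper`: they arrive via pip at first launch, keeping the EXE tiny.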


@@ -5,6 +5,7 @@
 faster-whisper>=1.0.0
 torch>=2.0.0
 # UI Framework
 PySide6>=6.6.0


@@ -16,6 +16,7 @@ from src.core.paths import get_base_path
 # Default Configuration
 DEFAULT_SETTINGS = {
     "hotkey": "f8",
+    "hotkey_translate": "f10",
     "model_size": "small",
     "input_device": None,      # Device ID (int) or Name (str), None = Default
     "save_recordings": False,  # Save .wav files for debugging
@@ -38,13 +39,20 @@ DEFAULT_SETTINGS = {
     # AI - Advanced
     "language": "auto",        # "auto" or ISO code
+    "task": "transcribe",      # "transcribe" or "translate" (to English)
     "compute_device": "auto",  # "auto", "cuda", "cpu"
     "compute_type": "int8",    # "int8", "float16", "float32"
     "beam_size": 5,
     "best_of": 5,
     "vad_filter": True,
     "no_repeat_ngram_size": 0,
-    "condition_on_previous_text": True
+    "condition_on_previous_text": True,
+    "initial_prompt": "Mm-hmm. Okay, let's go. I speak in full sentences.",  # Default: forces punctuation
+    # Low VRAM Mode
+    "unload_models_after_use": False  # If True, models are unloaded immediately to free VRAM
 }

 class ConfigManager:


@@ -30,15 +30,16 @@ class HotkeyManager(QObject):
     triggered = Signal()

-    def __init__(self, hotkey: str = "f8"):
+    def __init__(self, config_key: str = "hotkey"):
         """
         Initialize the HotkeyManager.

         Args:
-            hotkey (str): The global hotkey string description. Default: "f8".
+            config_key (str): The configuration key to look up (e.g. "hotkey").
         """
         super().__init__()
-        self.hotkey = hotkey
+        self.config_key = config_key
+        self.hotkey = "f8"  # Placeholder
         self.is_listening = False
         self._enabled = True
@@ -58,9 +59,9 @@ class HotkeyManager(QObject):
         from src.core.config import ConfigManager
         config = ConfigManager()
-        self.hotkey = config.get("hotkey")
-        logging.info(f"Registering global hotkey: {self.hotkey}")
+        self.hotkey = config.get(self.config_key)
+        logging.info(f"Registering global hotkey ({self.config_key}): {self.hotkey}")
         try:
             # We don't suppress=True here because we want the app to see keys during recording
             # (Wait, actually if we are recording we WANT keyboard to see it,
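With the constructor taking a config key instead of a literal key name, the app can run two independent managers (the `hk_transcribe`/`hk_translate` pair seen in `main.py`). A sketch of that lookup pattern, with a plain dict standing in for `ConfigManager` and no real keyboard hooks (names mirror the diff but the stub itself is illustrative):

```python
# Two hotkey managers resolving their keys from different config entries.
# A dict replaces ConfigManager; no global keyboard hook is installed here.
config = {"hotkey": "f8", "hotkey_translate": "f10"}

class HotkeyStub:
    def __init__(self, config_key="hotkey"):
        self.config_key = config_key
        self.hotkey = "f8"  # placeholder until start_listening() reads the config

    def start_listening(self):
        # Resolve the actual key at listen time, as the real manager does
        self.hotkey = config.get(self.config_key)

hk_transcribe = HotkeyStub("hotkey")
hk_translate = HotkeyStub("hotkey_translate")
for hk in (hk_transcribe, hk_translate):
    hk.start_listening()
print(hk_transcribe.hotkey, hk_translate.hotkey)  # f8 f10
```

Resolving the key at listen time (rather than in `__init__`) means a settings change takes effect the next time listening restarts, without rebuilding the manager.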

src/core/languages.py (new file)

@@ -0,0 +1,120 @@
"""
Supported Languages Module
==========================
Full list of languages supported by OpenAI Whisper.
Maps ISO codes to display names.
"""
LANGUAGES = {
"auto": "Auto Detect",
"af": "Afrikaans",
"sq": "Albanian",
"am": "Amharic",
"ar": "Arabic",
"hy": "Armenian",
"as": "Assamese",
"az": "Azerbaijani",
"ba": "Bashkir",
"eu": "Basque",
"be": "Belarusian",
"bn": "Bengali",
"bs": "Bosnian",
"br": "Breton",
"bg": "Bulgarian",
"my": "Burmese",
"ca": "Catalan",
"zh": "Chinese",
"hr": "Croatian",
"cs": "Czech",
"da": "Danish",
"nl": "Dutch",
"en": "English",
"et": "Estonian",
"fo": "Faroese",
"fi": "Finnish",
"fr": "French",
"gl": "Galician",
"ka": "Georgian",
"de": "German",
"el": "Greek",
"gu": "Gujarati",
"ht": "Haitian",
"ha": "Hausa",
"haw": "Hawaiian",
"he": "Hebrew",
"hi": "Hindi",
"hu": "Hungarian",
"is": "Icelandic",
"id": "Indonesian",
"it": "Italian",
"ja": "Japanese",
"jw": "Javanese",
"kn": "Kannada",
"kk": "Kazakh",
"km": "Khmer",
"ko": "Korean",
"lo": "Lao",
"la": "Latin",
"lv": "Latvian",
"ln": "Lingala",
"lt": "Lithuanian",
"lb": "Luxembourgish",
"mk": "Macedonian",
"mg": "Malagasy",
"ms": "Malay",
"ml": "Malayalam",
"mt": "Maltese",
"mi": "Maori",
"mr": "Marathi",
"mn": "Mongolian",
"ne": "Nepali",
"no": "Norwegian",
"oc": "Occitan",
"pa": "Punjabi",
"ps": "Pashto",
"fa": "Persian",
"pl": "Polish",
"pt": "Portuguese",
"ro": "Romanian",
"ru": "Russian",
"sa": "Sanskrit",
"sr": "Serbian",
"sn": "Shona",
"sd": "Sindhi",
"si": "Sinhala",
"sk": "Slovak",
"sl": "Slovenian",
"so": "Somali",
"es": "Spanish",
"su": "Sundanese",
"sw": "Swahili",
"sv": "Swedish",
"tl": "Tagalog",
"tg": "Tajik",
"ta": "Tamil",
"tt": "Tatar",
"te": "Telugu",
"th": "Thai",
"bo": "Tibetan",
"tr": "Turkish",
"tk": "Turkmen",
"uk": "Ukrainian",
"ur": "Urdu",
"uz": "Uzbek",
"vi": "Vietnamese",
"cy": "Welsh",
"yi": "Yiddish",
"yo": "Yoruba",
}
def get_language_names():
    return list(LANGUAGES.values())

def get_code_by_name(name):
    for code, lang in LANGUAGES.items():
        if lang == name:
            return code
    return "auto"

def get_name_by_code(code):
    return LANGUAGES.get(code, "Auto Detect")
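A quick round-trip of the three helpers, shown here against a trimmed copy of the table so the snippet runs standalone; note that both lookup functions fall back to auto-detect on unknown input:

```python
# Round-trip of the languages.py helpers (trimmed copy of LANGUAGES)
LANGUAGES = {"auto": "Auto Detect", "en": "English", "pl": "Polish"}

def get_code_by_name(name):
    for code, lang in LANGUAGES.items():
        if lang == name:
            return code
    return "auto"  # unknown display names fall back to auto-detect

def get_name_by_code(code):
    return LANGUAGES.get(code, "Auto Detect")  # unknown codes likewise

print(get_code_by_name("Polish"))  # pl
print(get_name_by_code("zz"))      # Auto Detect
```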


@@ -15,6 +15,11 @@ import numpy as np
 from src.core.config import ConfigManager
 from src.core.paths import get_models_path

+try:
+    import torch
+except ImportError:
+    torch = None
+
 # Import directly - valid since we are now running in the full environment
 from faster_whisper import WhisperModel
@@ -74,11 +79,11 @@ class WhisperTranscriber:
             logging.error(f"Failed to load model: {e}")
             self.model = None

-    def transcribe(self, audio_data, is_file: bool = False) -> str:
+    def transcribe(self, audio_data, is_file: bool = False, task: Optional[str] = None) -> str:
         """
         Transcribe audio data.
         """
-        logging.info(f"Starting transcription... (is_file={is_file})")
+        logging.info(f"Starting transcription... (is_file={is_file}, task={task})")

         # Ensure model is loaded
         if not self.model:
@@ -91,24 +96,76 @@ class WhisperTranscriber:
             beam_size = int(self.config.get("beam_size"))
             best_of = int(self.config.get("best_of"))
             vad = False if is_file else self.config.get("vad_filter")
+            language = self.config.get("language")
+
+            # Use the task override if provided, otherwise the config value.
+            # Normalize to a safe lowercase string ("Transcribe" -> "transcribe").
+            raw_task = task if task else self.config.get("task")
+            final_task = str(raw_task).strip().lower() if raw_task else "transcribe"
+
+            # Sanity check for valid Whisper tasks
+            if final_task not in ["transcribe", "translate"]:
+                logging.warning(f"Invalid task '{final_task}' detected. Defaulting to 'transcribe'.")
+                final_task = "transcribe"
+
+            # Language handling
+            final_language = language if language != "auto" else None
+
+            # Anti-hallucination: force condition_on_previous_text=False for translation
+            condition_prev = self.config.get("condition_on_previous_text")
+
+            # Helper options for translation stability
+            initial_prompt = self.config.get("initial_prompt")
+
+            if final_task == "translate":
+                condition_prev = False
+                # Force beam search if the user has set it to greedy (1).
+                # Translation needs more search breadth to find the English mapping.
+                if beam_size < 5:
+                    logging.info("Forcing beam_size=5 for Translation task.")
+                    beam_size = 5
+                # Inject a guidance prompt if none exists
+                if not initial_prompt:
+                    initial_prompt = "Translate this to English."
+
+            logging.info(f"Model Dispatch: Task='{final_task}', Language='{final_language}', ConditionPrev={condition_prev}, Beam={beam_size}")
+
+            # Build arguments dynamically to avoid passing explicit None values
+            transcribe_opts = {
+                "beam_size": beam_size,
+                "best_of": best_of,
+                "vad_filter": vad,
+                "task": final_task,
+                "vad_parameters": dict(min_silence_duration_ms=500),
+                "condition_on_previous_text": condition_prev,
+                "without_timestamps": True
+            }
+            if initial_prompt:
+                transcribe_opts["initial_prompt"] = initial_prompt
+            # Only add language if it is explicitly set (not None/Auto);
+            # an explicit None could confuse the model.
+            if final_language:
+                transcribe_opts["language"] = final_language

             # Transcribe
-            segments, info = self.model.transcribe(
-                audio_data,
-                beam_size=beam_size,
-                best_of=best_of,
-                vad_filter=vad,
-                vad_parameters=dict(min_silence_duration_ms=500),
-                condition_on_previous_text=self.config.get("condition_on_previous_text"),
-                without_timestamps=True
-            )
+            segments, info = self.model.transcribe(audio_data, **transcribe_opts)

             # Aggregate text
             text_result = ""
             for segment in segments:
                 text_result += segment.text + " "
-            return text_result.strip()
+            text_result = text_result.strip()
+
+            # Low VRAM Mode: unload the Whisper model immediately
+            if self.config.get("unload_models_after_use"):
+                self.unload_model()
+
+            logging.info(f"Final Transcription Output: '{text_result}'")
+            return text_result
         except Exception as e:
             logging.error(f"Transcription failed: {e}")
@@ -127,3 +184,21 @@ class WhisperTranscriber:
             return True
         return False
+
+    def unload_model(self):
+        """
+        Unloads the model to free memory.
+        """
+        if self.model:
+            del self.model
+            self.model = None
+            self.current_model_size = None
+            # Force garbage collection
+            import gc
+            gc.collect()
+            # Guard: torch is None if the import at module top failed
+            if torch is not None and torch.cuda.is_available():
+                torch.cuda.empty_cache()
+            logging.info("Whisper Model unloaded (Low VRAM Mode).")
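The task-sanitizing step in `transcribe()` is worth isolating, since it is what keeps a stray config value or a capitalized override from reaching faster-whisper. The same logic extracted as a pure function (the function name is ours, not the app's):

```python
# The transcriber's task normalization as a standalone function, so the
# edge cases (None, whitespace, wrong case, invalid value) are visible.
def sanitize_task(raw_task, default="transcribe"):
    task = str(raw_task).strip().lower() if raw_task else default
    if task not in ("transcribe", "translate"):
        # Unknown values fall back to plain transcription
        return default
    return task

print(sanitize_task(" Translate "))  # translate
print(sanitize_task(None))           # transcribe
print(sanitize_task("summarize"))    # transcribe
```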


@@ -245,6 +245,26 @@ class UIBridge(QObject):
     # --- Methods called from QML ---

+    @Slot(result=list)
+    def get_supported_languages(self):
+        from src.core.languages import get_language_names
+        return get_language_names()
+
+    @Slot(str)
+    def set_language_by_name(self, name):
+        from src.core.languages import get_code_by_name
+        from src.core.config import ConfigManager
+        code = get_code_by_name(name)
+        ConfigManager().set("language", code)
+        self.settingChanged.emit("language", code)
+
+    @Slot(result=str)
+    def get_current_language_name(self):
+        from src.core.languages import get_name_by_code
+        from src.core.config import ConfigManager
+        code = ConfigManager().get("language")
+        return get_name_by_code(code)
+
     @Slot(str, result='QVariant')
     def getSetting(self, key):
         from src.core.config import ConfigManager
@@ -356,6 +376,9 @@ class UIBridge(QObject):
         try:
             from src.core.paths import get_models_path
             # Check new simple format used by DownloadWorker
             path_simple = get_models_path() / f"faster-whisper-{size}"
             if path_simple.exists() and any(path_simple.iterdir()):


@@ -100,7 +100,7 @@ ComboBox {
     popup: Popup {
         y: control.height - 1
         width: control.width
-        implicitHeight: contentItem.implicitHeight
+        implicitHeight: Math.min(contentItem.implicitHeight, 300)
         padding: 5
         contentItem: ListView {


@@ -25,7 +25,7 @@ Rectangle {
     Text {
         anchors.centerIn: parent
-        text: control.recording ? "Listening..." : (control.currentSequence || "None")
+        text: control.recording ? "Listening..." : (formatSequence(control.currentSequence) || "None")
         color: control.recording ? SettingsStyle.accent : (control.currentSequence ? "#ffffff" : "#808080")
         font.family: "JetBrains Mono"
         font.pixelSize: 13
@@ -72,6 +72,23 @@ Rectangle {
         if (!activeFocus) control.recording = false
     }

+    function formatSequence(seq) {
+        if (!seq) return ""
+        var parts = seq.split("+")
+        for (var i = 0; i < parts.length; i++) {
+            var p = parts[i]
+            // Standardize modifiers
+            if (p === "ctrl") parts[i] = "Ctrl"
+            else if (p === "alt") parts[i] = "Alt"
+            else if (p === "shift") parts[i] = "Shift"
+            else if (p === "win") parts[i] = "Win"
+            else if (p === "esc") parts[i] = "Esc"
+            // Capitalize F-keys and others (e.g. f8 -> F8, space -> Space)
+            else parts[i] = p.charAt(0).toUpperCase() + p.slice(1)
+        }
+        return parts.join(" + ")
+    }
+
     function getKeyName(key, text) {
         // F-Keys
         if (key >= Qt.Key_F1 && key <= Qt.Key_F35) return "f" + (key - Qt.Key_F1 + 1)


@@ -314,14 +314,24 @@ Window {
     spacing: 0

     ModernSettingsItem {
-        label: "Global Hotkey"
-        description: "Press to record a new shortcut (e.g. Ctrl+Space)"
+        label: "Global Hotkey (Transcribe)"
+        description: "Press to record a new shortcut (e.g. F9)"
         control: ModernKeySequenceRecorder {
-            Layout.preferredWidth: 200
+            implicitWidth: 240
             currentSequence: ui.getSetting("hotkey")
             onSequenceChanged: (seq) => ui.setSetting("hotkey", seq)
         }
     }

+    ModernSettingsItem {
+        label: "Global Hotkey (Translate)"
+        description: "Press to record a new shortcut (e.g. F10)"
+        control: ModernKeySequenceRecorder {
+            implicitWidth: 240
+            currentSequence: ui.getSetting("hotkey_translate")
+            onSequenceChanged: (seq) => ui.setSetting("hotkey_translate", seq)
+        }
+    }

     ModernSettingsItem {
         label: "Run on Startup"
@@ -577,6 +587,53 @@ Window {
     Text { text: "Model configuration and performance"; color: SettingsStyle.textSecondary; font.family: mainFont; font.pixelSize: 14 }
 }

+ModernSettingsSection {
+    title: "Style & Prompting"
+    Layout.margins: 32
+    Layout.topMargin: 0
+    content: ColumnLayout {
+        width: parent.width
+        spacing: 0
+
+        ModernSettingsItem {
+            label: "Punctuation Style"
+            description: "Hint for how to format text"
+            control: ModernComboBox {
+                id: styleCombo
+                width: 180
+                model: ["Standard (Proper)", "Casual (Lowercase)", "Custom"]
+                // Determine the initial index from the saved prompt string
+                Component.onCompleted: {
+                    let current = ui.getSetting("initial_prompt")
+                    if (current === "Mm-hmm. Okay, let's go. I speak in full sentences.") currentIndex = 0
+                    else if (current === "um, okay... i guess so.") currentIndex = 1
+                    else currentIndex = 2
+                }
+                onActivated: {
+                    if (index === 0) ui.setSetting("initial_prompt", "Mm-hmm. Okay, let's go. I speak in full sentences.")
+                    else if (index === 1) ui.setSetting("initial_prompt", "um, okay... i guess so.")
+                    // Custom: don't change the string immediately, let the user type
+                }
+            }
+        }
+
+        ModernSettingsItem {
+            label: "Custom Prompt"
+            description: "Advanced: Define your own style hint"
+            visible: styleCombo.currentIndex === 2
+            control: ModernTextField {
+                Layout.preferredWidth: 280
+                placeholderText: "e.g. 'Hello, World.'"
+                text: ui.getSetting("initial_prompt") || ""
+                onEditingFinished: ui.setSetting("initial_prompt", text === "" ? null : text)
+            }
+        }
+    }
+}

 ModernSettingsSection {
     title: "Model Config"
     Layout.margins: 32
@@ -742,15 +799,17 @@ Window {
     ModernSettingsItem {
         label: "Language"
-        description: "Force language or Auto-detect"
+        description: "Spoken language to transcribe"
         control: ModernComboBox {
-            width: 140
-            model: ["auto", "en", "fr", "de", "es", "it", "ja", "zh", "ru"]
-            currentIndex: model.indexOf(ui.getSetting("language"))
-            onActivated: ui.setSetting("language", currentText)
+            Layout.preferredWidth: 200
+            model: ui.get_supported_languages()
+            currentIndex: model.indexOf(ui.get_current_language_name())
+            onActivated: (index) => ui.set_language_by_name(currentText)
         }
     }

+    // Task selector removed as per user request (Hotkeys handle this now)

     ModernSettingsItem {
         label: "Compute Device"
         description: "Hardware acceleration (CUDA requires NVidia GPU)"
@@ -773,6 +832,16 @@ Window {
             onActivated: ui.setSetting("compute_type", currentText)
         }
     }

+    ModernSettingsItem {
+        label: "Low VRAM Mode"
+        description: "Unload models immediately after use (Saves VRAM, Adds Delay)"
+        showSeparator: false
+        control: ModernSwitch {
+            checked: ui.getSetting("unload_models_after_use")
+            onToggled: ui.setSetting("unload_models_after_use", checked)
+        }
+    }
 }
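The Style & Prompting QML matches the saved `initial_prompt` against two hard-coded preset strings to pick the combo index. The same mapping expressed as data on the Python side (a possible refactor sketch, not code that exists in the app):

```python
# Preset-string matching from the QML, restated as a data-driven lookup.
# STYLE_PRESETS and preset_for_prompt are illustrative names.
STYLE_PRESETS = {
    "Standard (Proper)": "Mm-hmm. Okay, let's go. I speak in full sentences.",
    "Casual (Lowercase)": "um, okay... i guess so.",
}

def preset_for_prompt(prompt):
    for name, text in STYLE_PRESETS.items():
        if text == prompt:
            return name
    return "Custom"  # anything unrecognized is a user-defined prompt

print(preset_for_prompt("um, okay... i guess so."))  # Casual (Lowercase)
print(preset_for_prompt("my own hint"))              # Custom
```

Centralizing the strings in one table would also keep the QML and the `DEFAULT_SETTINGS` default from drifting apart.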

src/utils/formatters.py (new file)

@@ -0,0 +1,32 @@
"""
Formatter Utilities
===================
Helper functions for text formatting.
"""
def format_hotkey(sequence: str) -> str:
    """
    Formats a hotkey sequence string (e.g. 'ctrl+alt+f9')
    into a pretty, readable string (e.g. 'Ctrl + Alt + F9').
    """
    if not sequence:
        return "None"

    parts = sequence.split('+')
    formatted_parts = []
    for p in parts:
        p = p.strip().lower()
        if p == 'ctrl': formatted_parts.append('Ctrl')
        elif p == 'alt': formatted_parts.append('Alt')
        elif p == 'shift': formatted_parts.append('Shift')
        elif p == 'win': formatted_parts.append('Win')
        elif p == 'esc': formatted_parts.append('Esc')
        else:
            # Capitalize the first letter (f8 -> F8, space -> Space)
            if len(p) > 0:
                formatted_parts.append(p[0].upper() + p[1:])
            else:
                formatted_parts.append(p)
    return " + ".join(formatted_parts)
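A quick behavioral check of `format_hotkey`; the function is condensed into a dict lookup here so the snippet runs standalone, but it follows the same strip/lowercase/capitalize rules as the file above:

```python
# Condensed copy of format_hotkey for demonstration (same behavior:
# known modifiers get fixed names, everything else is capitalized).
def format_hotkey(sequence: str) -> str:
    if not sequence:
        return "None"
    special = {'ctrl': 'Ctrl', 'alt': 'Alt', 'shift': 'Shift',
               'win': 'Win', 'esc': 'Esc'}
    parts = [p.strip().lower() for p in sequence.split('+')]
    return " + ".join(special.get(p, p[:1].upper() + p[1:]) for p in parts)

print(format_hotkey("ctrl+alt+f9"))  # Ctrl + Alt + F9
print(format_hotkey(""))             # None
print(format_hotkey("shift+space"))  # Shift + Space
```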


@@ -55,6 +55,10 @@ except AttributeError:
     def LOWORD(l): return l & 0xffff
     def HIWORD(l): return (l >> 16) & 0xffff

+GWL_EXSTYLE = -20
+WS_EX_TRANSPARENT = 0x00000020
+WS_EX_LAYERED = 0x00080000
+
 class WindowHook:
     def __init__(self, hwnd, width, height, initial_scale=1.0):
         self.hwnd = hwnd
@@ -65,6 +69,34 @@ class WindowHook:
         # (Window 420x140, Pill 380x100)
         self.logical_rect = [20, 20, 20+380, 20+100]
         self.current_scale = initial_scale
+        self.enabled = True  # New flag
+
+    def set_enabled(self, enabled):
+        """
+        Enables or disables interaction.
+        When disabled, we set WS_EX_TRANSPARENT so clicks pass through physically.
+        """
+        if self.enabled == enabled:
+            return
+        self.enabled = enabled
+
+        # Get current styles
+        style = user32.GetWindowLongW(self.hwnd, GWL_EXSTYLE)
+
+        if not enabled:
+            # Enable click-through (add Transparent).
+            # Also ensure Layered is set (Qt usually sets it, but good to be sure).
+            new_style = style | WS_EX_TRANSPARENT | WS_EX_LAYERED
+        else:
+            # Disable click-through (remove Transparent)
+            new_style = style & ~WS_EX_TRANSPARENT
+
+        if new_style != style:
+            SetWindowLongPtr(self.hwnd, GWL_EXSTYLE, new_style)
+            # Force a redraw/frame update just in case
+            user32.SetWindowPos(self.hwnd, 0, 0, 0, 0, 0, 0x0027)  # SWP_NOMOVE | SWP_NOSIZE | SWP_NOZORDER | SWP_FRAMECHANGED

     def install(self):
         proc_address = ctypes.cast(self.new_wnd_proc, ctypes.c_void_p)
@@ -73,6 +105,10 @@ class WindowHook:
     def wnd_proc_callback(self, hwnd, msg, wParam, lParam):
         try:
             if msg == WM_NCHITTEST:
+                # If disabled (invisible/inactive), let clicks pass through (HTTRANSPARENT)
+                if not self.enabled:
+                    return HTTRANSPARENT
                 res = self.on_nchittest(lParam)
                 if res != 0:
                     return res
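The click-through fix is pure bit arithmetic on the extended window style: `WS_EX_TRANSPARENT` is OR'd in (together with `WS_EX_LAYERED`) to let clicks pass through, and only the transparent bit is cleared to restore interaction, so the layered bit deliberately survives. A Windows-free check of that arithmetic:

```python
# Style arithmetic from set_enabled(), without any Win32 calls.
# Starting from style 0 for clarity; a real window has other bits set.
WS_EX_TRANSPARENT = 0x00000020
WS_EX_LAYERED = 0x00080000

style = 0x00000000
click_through = style | WS_EX_TRANSPARENT | WS_EX_LAYERED  # disable interaction
restored = click_through & ~WS_EX_TRANSPARENT              # re-enable, keep LAYERED

print(hex(click_through))  # 0x80020
print(hex(restored))       # 0x80000
```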

test_m2m.py (new file)

@@ -0,0 +1,38 @@
import sys
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

def test_m2m():
    model_name = "facebook/m2m100_418M"
    print(f"Loading {model_name}...")
    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)

    # Test cases: (Language Code, Input)
    test_cases = [
        ("en", "he go to school yesterday"),
        ("pl", "on iść do szkoła wczoraj"),  # Intentionally broken Polish grammar
    ]

    print("\nStarting M2M Tests (Self-Translation):\n")
    for lang, input_text in test_cases:
        tokenizer.src_lang = lang
        encoded = tokenizer(input_text, return_tensors="pt")
        # Translate to the SAME language
        generated_tokens = model.generate(
            **encoded,
            forced_bos_token_id=tokenizer.get_lang_id(lang)
        )
        corrected = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
        print(f"[{lang}]")
        print(f"Input:  {input_text}")
        print(f"Output: {corrected}")
        print("-" * 20)

if __name__ == "__main__":
    test_m2m()

test_mt0.py (new file)

@@ -0,0 +1,40 @@
import sys
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def test_mt0():
    model_name = "bigscience/mt0-base"
    print(f"Loading {model_name}...")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Test cases: (Language, Prompt, Input)
    # MT0 is instruction-tuned, so we prompt it in the target language or English.
    # Cross-lingual prompting (English prompt -> target-language task) is usually supported.
    test_cases = [
        ("English", "Correct grammar:", "he go to school yesterday"),
        ("Polish", "Popraw gramatykę:", "to jest testowe zdanie bez kropki"),
        ("Finnish", "Korjaa kielioppi:", "tämä on testilause ilman pistettä"),
        ("Russian", "Исправь грамматику:", "это тестовое предложение без точки"),
        ("Japanese", "文法を直してください:", "これは点のないテスト文です"),
        ("Spanish", "Corrige la gramática:", "esta es una oración de prueba sin punto"),
    ]

    print("\nStarting MT0 Tests:\n")
    for lang, prompt_text, input_text in test_cases:
        full_input = f"{prompt_text} {input_text}"
        inputs = tokenizer(full_input, return_tensors="pt")
        outputs = model.generate(inputs.input_ids, max_length=128)
        corrected = tokenizer.decode(outputs[0], skip_special_tokens=True)
        print(f"[{lang}]")
        print(f"Input:  {full_input}")
        print(f"Output: {corrected}")
        print("-" * 20)

if __name__ == "__main__":
    test_mt0()

test_punctuation.py (new file)

@@ -0,0 +1,34 @@
import sys
import os

# Add the project root to the path so src imports resolve
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from src.core.grammar_assistant import GrammarAssistant

def test_punctuation():
    assistant = GrammarAssistant()
    assistant.load_model()

    samples = [
        # User's example (verbatim)
        "If the voice recognition doesn't recognize that I like stopped Or something would that would it also correct that",
        # Generic run-on
        "hello how are you doing today i am doing fine thanks for asking",
        # Missing commas/periods
        "well i think its valid however we should probably check the logs first"
    ]

    print("\nStarting Punctuation Tests:\n")
    for sample in samples:
        print(f"Original:  {sample}")
        corrected = assistant.correct(sample)
        print(f"Corrected: {corrected}")
        print("-" * 20)

if __name__ == "__main__":
    test_punctuation()