9 Commits

Author SHA1 Message Date
Your Name
4b84a27a67 v1.0.1 Feature Update and Polish
Full Changelog:

[New Features]
- Added Native Translation Mode:
  - Whisper model now fully supports Translating any language to English
  - Added 'task' and 'language' parameters to Transcriber core
- Dual Hotkey Support:
  - Added separate Global Hotkeys for Transcribe (default F8) and Translate (default F10)
  - Both hotkeys are fully customizable in Settings
  - Engine dynamically switches modes based on which key is pressed
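The mode switch described above can be pictured with a small framework-free sketch (class and method names here are illustrative; the real app wires this through Qt signals on its HotkeyManager):

```python
class HotkeyDispatcher:
    """Toy model of the dual-hotkey flow: each hotkey carries its own task."""

    def __init__(self):
        self.current_task = None
        self.recording = False

    def toggle(self, task_override=None):
        if self.recording:
            # Stop: finish using the task that started this session
            self.recording = False
            return self.current_task
        # Start: remember which hotkey (and therefore which task) opened the session
        self.current_task = task_override or "transcribe"
        self.recording = True
        return None

d = HotkeyDispatcher()
d.toggle(task_override="translate")   # F10 pressed: start recording in translate mode
result = d.toggle()                   # stop (hotkey again, or silence): uses stored task
print(result)                         # translate
```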

[UI/UX Improvements]
- Settings Window:
  - Widened Hotkey Input fields (240px) to accommodate long combinations
  - Added Pretty-Printing for hotkey sequences (e.g. 'ctrl+f9' displays as 'Ctrl + F9')
  - Replaced Country Code dropdown with Full Language Names (99+ languages)
  - Made Language Dropdown scrollable (max height 300px) to prevent screen overflow
  - Removed redundant 'Task' selector (replaced by dedicated hotkeys)
- System Tray:
  - Tooltip now displays both Transcribe and Translate hotkeys
  - Tooltip hotkeys are formatted readably
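A minimal sketch of such a hotkey formatter (the shipped helper lives in src/utils/formatters.py; this condensed version is for illustration only):

```python
# Map lowercase modifier tokens to their display names
MODIFIER_NAMES = {'ctrl': 'Ctrl', 'alt': 'Alt', 'shift': 'Shift', 'win': 'Win', 'esc': 'Esc'}

def format_hotkey(sequence: str) -> str:
    """Turn a raw sequence like 'ctrl+alt+f9' into 'Ctrl + Alt + F9'."""
    if not sequence:
        return "None"
    parts = [p.strip().lower() for p in sequence.split('+')]
    # Unknown tokens (f-keys, letters, 'space') just get their first letter capitalized
    return " + ".join(MODIFIER_NAMES.get(p, p[:1].upper() + p[1:]) for p in parts)

print(format_hotkey("ctrl+f9"))  # Ctrl + F9
```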

[Core & Performance]
- Bootstrapper:
  - Implemented Smart Incremental Sync
  - Now checks filesize and content hash before copying files
  - Drastically reduces startup time for subsequent runs
  - Preserves user settings.json during updates
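The sync strategy (cheap size check first, content comparison only on a size match, user settings left untouched) can be sketched as a standalone function. This is illustrative, not the shipped bootstrapper; it uses a SHA-256 digest where the real code compares raw bytes:

```python
import hashlib
import shutil
from pathlib import Path

def _digest(p: Path) -> str:
    return hashlib.sha256(p.read_bytes()).hexdigest()

def needs_copy(src: Path, dst: Path) -> bool:
    """Cheap size check first; fall back to a content hash only when sizes match."""
    if not dst.exists():
        return True
    if src.stat().st_size != dst.stat().st_size:
        return True
    return _digest(src) != _digest(dst)

def sync_tree(source: Path, dest: Path, preserve=("settings.json",)) -> int:
    """Copy only changed files from source to dest; never touch preserved user files."""
    changed = 0
    for src in source.rglob("*"):
        if src.is_dir() or src.name in preserve or src.suffix == ".pyc":
            continue
        dst = dest / src.relative_to(source)
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            changed += 1
    return changed
```

On a second run with nothing changed, `sync_tree` copies zero files, which is where the startup-time win comes from.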
- Backend:
  - Fixed HotkeyManager to support dynamic configuration keys
  - Fixed Language Lock: selecting a language now correctly forces the model to use it
  - Refactored bridge/main connection for language list handling
2026-01-24 18:29:10 +02:00
Your Name
f184eb0037 Fix: Invisible overlay blocking mouse clicks
Problem:
The overlay window, even when fully transparent or visually hidden (opacity 0), was still intercepting mouse events. This created a 'dead zone' on the screen where users could not click through to applications behind the overlay. It occurred because the low-level window hook returned 'HTCAPTION' for hit tests regardless of the UI state.

Solution:
1. Modified 'WindowHook' to accept an 'enabled' state.
2. When disabled, 'WM_NCHITTEST' now returns 'HTTRANSPARENT', allowing the OS to pass the click to the window underneath.
3. Updated 'main.py' to toggle this hook state dynamically:
   - ENABLED when Recording or Processing (UI is visible/active).
   - DISABLED when Idling (UI is hidden/transparent).

Result:
The overlay is now completely non-intrusive when not in use.
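The hit-test decision reduces to a few constants and one branch. A platform-independent sketch of the logic (the constants mirror the standard Win32 values; `inside_pill` stands in for the real geometry check):

```python
# Standard Win32 hit-test result codes
HTTRANSPARENT = -1   # click falls through to the window underneath
HTCLIENT = 1         # ordinary client-area click
HTCAPTION = 2        # treated as title bar (enables window dragging)

def hit_test(enabled: bool, inside_pill: bool) -> int:
    """An idle (hidden) overlay must never swallow clicks, so when the hook
    is disabled it answers HTTRANSPARENT unconditionally."""
    if not enabled:
        return HTTRANSPARENT
    return HTCAPTION if inside_pill else HTCLIENT

print(hit_test(enabled=False, inside_pill=True))  # -1 (click passes through)
```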
2026-01-24 17:51:23 +02:00
Your Name
306bd075ed Aesthetic overhaul of documentation 2026-01-24 17:29:59 +02:00
Your Name
a1cc9c61b9 Add language list and file transcription info 2026-01-24 17:27:54 +02:00
Your Name
e627e1b8aa Correct hardware detection statement in docs 2026-01-24 17:24:56 +02:00
Your Name
eaa572b42f Fix release badge for Gitea 2026-01-24 17:22:14 +02:00
Your Name
e900201214 Final documentation polish 2026-01-24 17:20:22 +02:00
Your Name
0d426aea4b Update docs with license and model stats 2026-01-24 17:16:53 +02:00
Your Name
b15ce8076f Enhance documentation 2026-01-24 17:12:21 +02:00
13 changed files with 512 additions and 120 deletions

README.md

@@ -1,71 +1,155 @@
# Whisper Voice
<div align="center">
**Reclaim Your Voice from the Cloud.**
# 🎙️ W H I S P E R &nbsp; V O I C E
### SOVEREIGN SPEECH RECOGNITION
Whisper Voice is a high-performance, strictly local speech-to-text tool designed for the desktop. It provides instant, high-accuracy dictation anywhere on your system—no internet connection required, no corporate servers, and absolutely no data harvesting.
<br>
We believe that the tools of production—and communication—should belong to the individual, not be rented from centralized tech giants.
![Status](https://img.shields.io/badge/STATUS-OPERATIONAL-success?style=for-the-badge&logo=server)
[![Download](https://img.shields.io/gitea/v/release/lashman/whisper_voice?gitea_url=https%3A%2F%2Fgit.lashman.live&label=Download&style=for-the-badge&logo=windows&logoColor=white&color=2563eb)](https://git.lashman.live/lashman/whisper_voice/releases/latest)
[![License](https://img.shields.io/badge/LICENSE-CC0_PUBLIC_DOMAIN-lightgrey?style=for-the-badge&logo=creative-commons&logoColor=black)](https://creativecommons.org/publicdomain/zero/1.0/)
<br>
> *"The master's tools will never dismantle the master's house."* — Audre Lorde
> <br>
> **Build your own tools. Run them locally.**
[Report Issue](https://git.lashman.live/lashman/whisper_voice/issues) • [View Source](https://git.lashman.live/lashman/whisper_voice) • [Releases](https://git.lashman.live/lashman/whisper_voice/releases)
</div>
<br>
## ✊ The Manifesto
**We hold these truths to be self-evident:** That user data is an extension of the self, and its exploitation by centralized clouds is a violation of digital autonomy.
**Whisper Voice** is built on the principle of **technological sovereignty**. It provides state-of-the-art speech recognition without renting your cognitive output to corporate oligarchies. By running entirely on your own hardware, it reclaims the means of digital production, ensuring that your words remain exclusively yours.
---
## ✊ Core Principles
## ⚡ Technical Architecture
### 1. Total Autonomy (Local-First)
Your voice data is yours alone. Unlike commercial alternatives that siphon your words to remote data centers for processing and profiling, Whisper Voice runs entirely on your hardware. **No masters, no servers.** You retain full sovereignty over your digital footprint.
This operates on the metal. It is not a wrapper. It is an engine.
### 2. Decentralized Power
By leveraging optimized local processing, we strip away the need for reliance on massive, energy-hungry corporate infrastructure. This is technology scaled to the human level—powerful, efficient, and completely under your control.
### 3. Accessible to All
High-quality speech recognition shouldn't be gated behind subscriptions or paywalls. This tool is free, open, and built to empower users to interact with their machines on their own terms.
| Component | Technology | Benefit |
| :--- | :--- | :--- |
| **Inference Core** | **Faster-Whisper** | Hyper-optimized implementation of OpenAI's Whisper using **CTranslate2**. Delivers **4x speedups** over PyTorch. |
| **Quantization** | **INT8** | 8-bit quantization enables Pro-grade models (`Large-v3`) to run on consumer GPUs with minimal VRAM. |
| **Sensory Gate** | **Silero VAD** | Enterprise-grade Voice Activity Detection filters out silence and background noise, conserving compute. |
| **Interface** | **Qt 6 / QML** | Hardware-accelerated, glassmorphic UI that feels native yet remains OS-independent. |
---
## ✨ Features
## 📊 Intelligence Matrix
* **100% Offline Processing**: Once the recognition engine is downloaded, the cable can be cut. Nothing leaves your machine.
* **Universal Compatibility**: Works in any text field—editors, chat apps, terminals, or browsers. If you can type there, you can speak there.
* **Adaptive Input**:
* *Clipboard Mode*: Standard paste injection.
* *High-Speed Simulation*: Simulates keystrokes at supersonic speeds (up to 6000 CPM) for apps that block pasting.
* **System Integration**: Minimalist overlay and system tray presence. It exists when you need it and vanishes when you don't.
* **Resource Efficiency**: Optimized to run smoothly on consumer hardware without monopolizing your system resources.
Select the model that aligns with your hardware capabilities.
| Model | VRAM (GPU) | RAM (CPU) | Velocity | Designation |
| :--- | :--- | :--- | :--- | :--- |
| `Tiny` | **~500 MB** | ~1 GB | ⚡ **Supersonic** | Command & Control, older hardware. |
| `Base` | **~600 MB** | ~1 GB | 🚀 **Very Fast** | Daily driver for low-power laptops. |
| `Small` | **~1 GB** | ~2 GB | ⏩ **Fast** | High accuracy English dictation. |
| `Medium` | **~2 GB** | ~4 GB | ⚖️ **Balanced** | Complex vocabulary, foreign accents. |
| `Large-v3 Turbo` | **~4 GB** | ~6 GB | ✨ **Optimal** | **Sweet Spot.** Near-Large smarts, Medium speed. |
| `Large-v3` | **~5 GB** | ~8 GB | 🧠 **Maximum** | Professional transcription. Uncompromised. |
> *Note: Acceleration requires you to manually select your Compute Device (CUDA GPU or CPU) in Settings.*
---
## 🚀 Getting Started
## 🛠️ Operations
### Installation
1. Download the latest release.
2. Run `WhisperVoice.exe`.
3. On the first run, the bootstrapper will autonomously provision the necessary runtime environment. This ensures your system remains clean and dependencies are self-contained.
### 📥 Deployment
1. **Download**: Grab `WhisperVoice.exe` from [Releases](https://git.lashman.live/lashman/whisper_voice/releases).
2. **Deploy**: Place it anywhere. It is portable.
3. **Bootstrap**: Run it. The agent will self-provision an isolated Python environment (~2GB) on first launch.
### Usage
1. **Set Your Trigger**: Configure a global hotkey (default: `F9`) in the settings.
2. **Speak Freely**: Hold the hotkey (or toggle it) and speak.
3. **Direct Action**: Your words are instantly transcribed and injected into your active window.
### 🕹️ Controls
* **Global Hook**: `F9` (Default). Press to open the channel. Release to inject text.
* **Tray Agent**: Retracts to the system tray. Right-click for **Settings** or **File Transcription**.
### 📡 Input Modes
| Mode | Description | Speed |
| :--- | :--- | :--- |
| **Clipboard Paste** | Standard text injection via OS clipboard. | Instant |
| **Simulate Typing** | Mimics physical keystrokes. Bypasses anti-paste blocks. | Up to **6000** CPM |
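The CPM figure maps directly to a per-keystroke delay. A sketch of the arithmetic (the actual injector is not shown here):

```python
def keystroke_delay(cpm: int) -> float:
    """Seconds to wait between simulated keystrokes for a given
    characters-per-minute rate."""
    return 60.0 / cpm

print(keystroke_delay(6000))  # 0.01 -> 10 ms per character
print(keystroke_delay(1200))  # 0.05 -> 50 ms (gentler pace for RDP/games)
```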
---
## ⚙️ Configuration
## 🌐 Universal Translation
The **Settings** panel puts the means of configuration in your hands:
The model listens in **99 languages** and translates them to English or transcribes them natively.
* **Recognition Engine**: Choose the size of the model that fits your hardware capabilities (Tiny to Large). Larger models offer greater precision but require more computing power.
* **Input Method**: Switch between "Clipboard Paste" and "Simulate Typing" depending on target application restrictions.
* **Typing Speed**: Adjust the keystroke injection rate. Crank it up to 6000 CPM for instant text delivery.
* **Run on Startup**: Configure the agent to be ready the moment your session begins.
<details>
<summary><b>Click to view supported languages</b></summary>
<br>
| | | | |
| :--- | :--- | :--- | :--- |
| Afrikaans 🇿🇦 | Albanian 🇦🇱 | Amharic 🇪🇹 | Arabic 🇸🇦 |
| Armenian 🇦🇲 | Assamese 🇮🇳 | Azerbaijani 🇦🇿 | Bashkir 🇷🇺 |
| Basque 🇪🇸 | Belarusian 🇧🇾 | Bengali 🇧🇩 | Bosnian 🇧🇦 |
| Breton 🇫🇷 | Bulgarian 🇧🇬 | Burmese 🇲🇲 | Castilian 🇪🇸 |
| Catalan 🇪🇸 | Chinese 🇨🇳 | Croatian 🇭🇷 | Czech 🇨🇿 |
| Danish 🇩🇰 | Dutch 🇳🇱 | English 🇺🇸 | Estonian 🇪🇪 |
| Faroese 🇫🇴 | Finnish 🇫🇮 | Flemish 🇧🇪 | French 🇫🇷 |
| Galician 🇪🇸 | Georgian 🇬🇪 | German 🇩🇪 | Greek 🇬🇷 |
| Gujarati 🇮🇳 | Haitian 🇭🇹 | Hausa 🇳🇬 | Hawaiian 🇺🇸 |
| Hebrew 🇮🇱 | Hindi 🇮🇳 | Hungarian 🇭🇺 | Icelandic 🇮🇸 |
| Indonesian 🇮🇩 | Italian 🇮🇹 | Japanese 🇯🇵 | Javanese 🇮🇩 |
| Kannada 🇮🇳 | Kazakh 🇰🇿 | Khmer 🇰🇭 | Korean 🇰🇷 |
| Lao 🇱🇦 | Latin 🇻🇦 | Latvian 🇱🇻 | Lingala 🇨🇩 |
| Lithuanian 🇱🇹 | Luxembourgish 🇱🇺 | Macedonian 🇲🇰 | Malagasy 🇲🇬 |
| Malay 🇲🇾 | Malayalam 🇮🇳 | Maltese 🇲🇹 | Maori 🇳🇿 |
| Marathi 🇮🇳 | Moldavian 🇲🇩 | Mongolian 🇲🇳 | Myanmar 🇲🇲 |
| Nepali 🇳🇵 | Norwegian 🇳🇴 | Occitan 🇫🇷 | Panjabi 🇮🇳 |
| Pashto 🇦🇫 | Persian 🇮🇷 | Polish 🇵🇱 | Portuguese 🇵🇹 |
| Punjabi 🇮🇳 | Romanian 🇷🇴 | Russian 🇷🇺 | Sanskrit 🇮🇳 |
| Serbian 🇷🇸 | Shona 🇿🇼 | Sindhi 🇵🇰 | Sinhala 🇱🇰 |
| Slovak 🇸🇰 | Slovenian 🇸🇮 | Somali 🇸🇴 | Spanish 🇪🇸 |
| Sundanese 🇮🇩 | Swahili 🇰🇪 | Swedish 🇸🇪 | Tagalog 🇵🇭 |
| Tajik 🇹🇯 | Tamil 🇮🇳 | Tatar 🇷🇺 | Telugu 🇮🇳 |
| Thai 🇹🇭 | Tibetan 🇨🇳 | Turkish 🇹🇷 | Turkmen 🇹🇲 |
| Ukrainian 🇺🇦 | Urdu 🇵🇰 | Uzbek 🇺🇿 | Vietnamese 🇻🇳 |
| Welsh 🏴󠁧󠁢󠁷󠁬󠁳󠁿 | Yiddish 🇮🇱 | Yoruba 🇳🇬 | |
</details>
---
## 🤝 Mutual Aid
## 🔧 Troubleshooting
This project thrives on community collaboration. If you have improvements, fixes, or ideas, you are encouraged to contribute. We build better systems when we build them together, horizontally and transparently.
<details>
<summary><b>🔥 App crashes on start</b></summary>
<blockquote>
The underlying engine requires standard C++ libraries. Install the <b>Microsoft Visual C++ Redistributable (2015-2022)</b>.
</blockquote>
</details>
* **Report Issues**: If something breaks, let us know.
* **Contribute Code**: The source is open. Fork it, improve it, share it.
<details>
<summary><b>🐌 "Simulate Typing" is slow</b></summary>
<blockquote>
Some apps (games, RDP) can't handle supersonic input. Go to <b>Settings</b> and lower the <b>Typing Speed</b> to ~1200 CPM.
</blockquote>
</details>
<details>
<summary><b>🎤 No Audio / Silence</b></summary>
<blockquote>
The agent listens to the <b>Default Communication Device</b>. Ensure your microphone is set correctly in Windows Sound Settings.
</blockquote>
</details>
---
*Built with local processing libraries and Qt.*
*No gods, no cloud managers.*
<div align="center">
### ⚖️ PUBLIC DOMAIN (CC0 1.0)
*No Rights Reserved. No Gods. No Managers.*
Credit to **OpenAI** (Whisper), **Systran** (Faster-Whisper), and **Silero** (VAD).
</div>


@@ -259,48 +259,72 @@ class Bootstrapper:
process.wait()
def refresh_app_source(self):
"""Refresh app source files. Skips if already exists to save time."""
# Optimization: If app/main.py exists, skip update to improve startup speed.
# The user can delete the 'runtime' folder to force an update.
if (self.app_path / "main.py").exists():
log("App already exists. Skipping update.")
return True
if self.ui: self.ui.set_status("Updating app files...")
"""
Smartly updates app source files by only copying changed files.
Preserves user settings and reduces disk I/O.
"""
if self.ui: self.ui.set_status("Checking for updates...")
try:
# Preserve settings.json if it exists
settings_path = self.app_path / "settings.json"
temp_settings = None
if settings_path.exists():
try:
temp_settings = settings_path.read_bytes()
except:
log("Failed to back up settings.json; user settings may be lost.")
if self.app_path.exists():
shutil.rmtree(self.app_path, ignore_errors=True)
# 1. Ensure destination exists
if not self.app_path.exists():
self.app_path.mkdir(parents=True, exist_ok=True)
shutil.copytree(
self.source_path,
self.app_path,
ignore=shutil.ignore_patterns(
'__pycache__', '*.pyc', '.git', 'venv',
'build', 'dist', '*.egg-info', 'runtime'
)
)
# Restore settings.json
if temp_settings:
try:
settings_path.write_bytes(temp_settings)
log("Restored settings.json")
except:
log("Failed to restore settings.json")
# 2. Walk source and sync
# source_path is the temporary bundled folder
# app_path is the persistent runtime folder
changes_made = 0
for src_dir, dirs, files in os.walk(self.source_path):
# Determine relative path from source root
rel_path = Path(src_dir).relative_to(self.source_path)
dst_dir = self.app_path / rel_path
# Ensure directory exists
if not dst_dir.exists():
dst_dir.mkdir(parents=True, exist_ok=True)
for file in files:
# Skip ignored files
if file in ['__pycache__', '.git', 'settings.json'] or file.endswith('.pyc'):
continue
src_file = Path(src_dir) / file
dst_file = dst_dir / file
# Check if update needed
should_copy = False
if not dst_file.exists():
should_copy = True
else:
# Compare size first (fast)
if src_file.stat().st_size != dst_file.stat().st_size:
should_copy = True
else:
# Compare content (slower but accurate)
# Only read if size matches to verify diff
if src_file.read_bytes() != dst_file.read_bytes():
should_copy = True
if should_copy:
shutil.copy2(src_file, dst_file)
changes_made += 1
if self.ui: self.ui.set_detail(f"Updated: {file}")
# 3. Cleanup logic (Optional: remove files in dest that are not in source)
# For now, we only add/update to prevent deleting generated user files (logs, etc)
if changes_made > 0:
log(f"Update complete. {changes_made} files changed.")
else:
log("App is up to date.")
return True
except Exception as e:
log(f"Error refreshing app source: {e}")
# Note: no fallback to a full wipe-and-copy here; if the sync fails,
# the likely cause is permissions, and a full copy would fail too.
return False
def run_app(self):

main.py

@@ -118,13 +118,14 @@ class DownloadWorker(QThread):
class TranscriptionWorker(QThread):
finished = Signal(str)
def __init__(self, transcriber, audio_data, is_file=False, parent=None):
def __init__(self, transcriber, audio_data, is_file=False, parent=None, task_override=None):
super().__init__(parent)
self.transcriber = transcriber
self.audio_data = audio_data
self.is_file = is_file
self.task_override = task_override
def run(self):
text = self.transcriber.transcribe(self.audio_data, is_file=self.is_file)
text = self.transcriber.transcribe(self.audio_data, is_file=self.is_file, task=self.task_override)
self.finished.emit(text)
class WhisperApp(QObject):
@@ -166,13 +167,18 @@ class WhisperApp(QObject):
self.tray.transcribe_file_requested.connect(self.transcribe_file)
# Init Tooltip
hotkey = self.config.get("hotkey")
self.tray.setToolTip(f"Whisper Voice - Press {hotkey} to Record")
from src.utils.formatters import format_hotkey
self.format_hotkey = format_hotkey # Store ref
hk1 = self.format_hotkey(self.config.get("hotkey"))
hk2 = self.format_hotkey(self.config.get("hotkey_translate"))
self.tray.setToolTip(f"Whisper Voice\nTranscribe: {hk1}\nTranslate: {hk2}")
# 3. Logic Components Placeholders
self.audio_engine = None
self.transcriber = None
self.hotkey_manager = None
self.hk_transcribe = None
self.hk_translate = None
self.overlay_root = None
# 4. Start Loader
@@ -222,12 +228,23 @@ class WhisperApp(QObject):
self.settings_root.setVisible(False)
# Install Low-Level Window Hook for Transparent Hit Test
# We must keep a reference to 'self.hook' so it isn't GC'd
# scale = self.overlay_root.devicePixelRatio()
# self.hook = WindowHook(int(self.overlay_root.winId()), 500, 300, scale)
# self.hook.install()
# NOTE: HitTest hook will be installed here later
try:
from src.utils.window_hook import WindowHook
hwnd = self.overlay_root.winId()
# Initial scale from config
scale = float(self.config.get("ui_scale"))
# Current Overlay Dimensions
win_w = int(460 * scale)
win_h = int(180 * scale)
self.window_hook = WindowHook(hwnd, win_w, win_h, initial_scale=scale)
self.window_hook.install()
# Initial state: Disabled because we start inactive
self.window_hook.set_enabled(False)
except Exception as e:
logging.error(f"Failed to install WindowHook: {e}")
def center_overlay(self):
"""Calculates and sets the Overlay position above the taskbar."""
@@ -255,9 +272,16 @@ class WhisperApp(QObject):
self.audio_engine.set_visualizer_callback(self.bridge.update_amplitude)
self.audio_engine.set_silence_callback(self.on_silence_detected)
self.transcriber = WhisperTranscriber()
self.hotkey_manager = HotkeyManager()
self.hotkey_manager.triggered.connect(self.toggle_recording)
self.hotkey_manager.start()
# Dual Hotkey Managers
self.hk_transcribe = HotkeyManager(config_key="hotkey")
self.hk_transcribe.triggered.connect(lambda: self.toggle_recording(task_override="transcribe"))
self.hk_transcribe.start()
self.hk_translate = HotkeyManager(config_key="hotkey_translate")
self.hk_translate.triggered.connect(lambda: self.toggle_recording(task_override="translate"))
self.hk_translate.start()
self.bridge.update_status("Ready")
def run(self):
@@ -275,7 +299,8 @@ class WhisperApp(QObject):
except: pass
self.bridge.stats_worker.stop()
if self.hotkey_manager: self.hotkey_manager.stop()
if self.hk_transcribe: self.hk_transcribe.stop()
if self.hk_translate: self.hk_translate.stop()
# Close all QML windows to ensure bindings stop before Python objects die
if self.overlay_root:
@@ -350,10 +375,14 @@ class WhisperApp(QObject):
print(f"Setting Changed: {key} = {value}")
# 1. Hotkey Reload
if key == "hotkey":
if self.hotkey_manager: self.hotkey_manager.reload_hotkey()
if key in ["hotkey", "hotkey_translate"]:
if self.hk_transcribe: self.hk_transcribe.reload_hotkey()
if self.hk_translate: self.hk_translate.reload_hotkey()
if self.tray:
self.tray.setToolTip(f"Whisper Voice - Press {value} to Record")
hk1 = self.format_hotkey(self.config.get("hotkey"))
hk2 = self.format_hotkey(self.config.get("hotkey_translate"))
self.tray.setToolTip(f"Whisper Voice\nTranscribe: {hk1}\nTranslate: {hk2}")
# 2. AI Model Reload (Heavy)
if key in ["model_size", "compute_device", "compute_type"]:
@@ -456,6 +485,8 @@ class WhisperApp(QObject):
file_path, _ = QFileDialog.getOpenFileName(None, "Select Audio", "", "Audio (*.mp3 *.wav *.flac *.m4a *.ogg)")
if file_path:
self.bridge.update_status("Thinking...")
# File transcription uses the task configured in Settings (no hotkey override).
self.worker = TranscriptionWorker(self.transcriber, file_path, is_file=True, parent=self)
self.worker.finished.connect(self.on_transcription_done)
self.worker.start()
@@ -463,10 +494,13 @@ class WhisperApp(QObject):
@Slot()
def on_silence_detected(self):
from PySide6.QtCore import QMetaObject, Qt
# Silence detection ends the active session: invoking toggle_recording
# with no override makes it stop using the stored current task.
QMetaObject.invokeMethod(self, "toggle_recording", Qt.QueuedConnection)
@Slot()
def toggle_recording(self):
@Slot() # Modified to allow lambda override
def toggle_recording(self, task_override=None):
if not self.audio_engine: return
# Prevent starting a new recording while we are still transcribing the last one
@@ -474,23 +508,36 @@ class WhisperApp(QObject):
logging.warning("Ignored toggle request: Transcription in progress.")
return
# Determine which task we are entering
if task_override:
intended_task = task_override
else:
intended_task = self.config.get("task")
if self.audio_engine.recording:
# STOP RECORDING
self.bridge.update_status("Thinking...")
self.bridge.isRecording = False
self.bridge.isProcessing = True # Start Processing
audio_data = self.audio_engine.stop_recording()
self.worker = TranscriptionWorker(self.transcriber, audio_data, parent=self)
# Finish with the task that started this recording session (stored at start).
final_task = getattr(self, "current_recording_task", self.config.get("task"))
self.worker = TranscriptionWorker(self.transcriber, audio_data, parent=self, task_override=final_task)
self.worker.finished.connect(self.on_transcription_done)
self.worker.start()
else:
self.bridge.update_status("Recording")
# START RECORDING
self.current_recording_task = intended_task
self.bridge.update_status(f"Recording ({intended_task})...")
self.bridge.isRecording = True
self.audio_engine.start_recording()
@Slot(bool)
def on_ui_toggle_request(self, state):
if state != self.audio_engine.recording:
self.toggle_recording()
self.toggle_recording() # Default behavior for UI clicks
@Slot(str)
def on_transcription_done(self, text: str):
@@ -503,8 +550,8 @@ class WhisperApp(QObject):
@Slot(bool)
def on_hotkeys_enabled_toggle(self, state):
if self.hotkey_manager:
self.hotkey_manager.set_enabled(state)
if self.hk_transcribe: self.hk_transcribe.set_enabled(state)
if self.hk_translate: self.hk_translate.set_enabled(state)
@Slot(str)
def on_download_requested(self, size):
@@ -531,6 +578,25 @@ class WhisperApp(QObject):
self.bridge.update_status("Error")
logging.error(f"Download Error: {err}")
@Slot(bool)
def on_ui_toggle_request(self, is_recording):
"""Called when recording state changes."""
# Update Window Hook to allow clicking if active
is_active = is_recording or self.bridge.isProcessing
if hasattr(self, 'window_hook'):
self.window_hook.set_enabled(is_active)
@Slot(bool)
def on_processing_changed(self, is_processing):
is_active = self.bridge.isRecording or is_processing
if hasattr(self, 'window_hook'):
self.window_hook.set_enabled(is_active)
if __name__ == "__main__":
import sys
app = WhisperApp()
app.run()
# Connect extra signal for processing state
app.bridge.isProcessingChanged.connect(app.on_processing_changed)
sys.exit(app.run())


@@ -16,6 +16,7 @@ from src.core.paths import get_base_path
# Default Configuration
DEFAULT_SETTINGS = {
"hotkey": "f8",
"hotkey_translate": "f10",
"model_size": "small",
"input_device": None, # Device ID (int) or Name (str), None = Default
"save_recordings": False, # Save .wav files for debugging
@@ -38,6 +39,7 @@ DEFAULT_SETTINGS = {
# AI - Advanced
"language": "auto", # "auto" or ISO code
"task": "transcribe", # "transcribe" or "translate" (to English)
"compute_device": "auto", # "auto", "cuda", "cpu"
"compute_type": "int8", # "int8", "float16", "float32"
"beam_size": 5,


@@ -30,15 +30,16 @@ class HotkeyManager(QObject):
triggered = Signal()
def __init__(self, hotkey: str = "f8"):
def __init__(self, config_key: str = "hotkey"):
"""
Initialize the HotkeyManager.
Args:
hotkey (str): The global hotkey string description. Default: "f8".
config_key (str): The configuration key to look up (e.g. "hotkey").
"""
super().__init__()
self.hotkey = hotkey
self.config_key = config_key
self.hotkey = "f8" # Placeholder
self.is_listening = False
self._enabled = True
@@ -58,9 +59,9 @@ class HotkeyManager(QObject):
from src.core.config import ConfigManager
config = ConfigManager()
self.hotkey = config.get("hotkey")
self.hotkey = config.get(self.config_key)
logging.info(f"Registering global hotkey: {self.hotkey}")
logging.info(f"Registering global hotkey ({self.config_key}): {self.hotkey}")
try:
# We don't suppress=True here because we want the app to see keys during recording
# (Wait, actually if we are recording we WANT keyboard to see it,

src/core/languages.py

@@ -0,0 +1,120 @@
"""
Supported Languages Module
==========================
Full list of languages supported by OpenAI Whisper.
Maps ISO codes to display names.
"""
LANGUAGES = {
"auto": "Auto Detect",
"af": "Afrikaans",
"sq": "Albanian",
"am": "Amharic",
"ar": "Arabic",
"hy": "Armenian",
"as": "Assamese",
"az": "Azerbaijani",
"ba": "Bashkir",
"eu": "Basque",
"be": "Belarusian",
"bn": "Bengali",
"bs": "Bosnian",
"br": "Breton",
"bg": "Bulgarian",
"my": "Burmese",
"ca": "Catalan",
"zh": "Chinese",
"hr": "Croatian",
"cs": "Czech",
"da": "Danish",
"nl": "Dutch",
"en": "English",
"et": "Estonian",
"fo": "Faroese",
"fi": "Finnish",
"fr": "French",
"gl": "Galician",
"ka": "Georgian",
"de": "German",
"el": "Greek",
"gu": "Gujarati",
"ht": "Haitian",
"ha": "Hausa",
"haw": "Hawaiian",
"he": "Hebrew",
"hi": "Hindi",
"hu": "Hungarian",
"is": "Icelandic",
"id": "Indonesian",
"it": "Italian",
"ja": "Japanese",
"jw": "Javanese",
"kn": "Kannada",
"kk": "Kazakh",
"km": "Khmer",
"ko": "Korean",
"lo": "Lao",
"la": "Latin",
"lv": "Latvian",
"ln": "Lingala",
"lt": "Lithuanian",
"lb": "Luxembourgish",
"mk": "Macedonian",
"mg": "Malagasy",
"ms": "Malay",
"ml": "Malayalam",
"mt": "Maltese",
"mi": "Maori",
"mr": "Marathi",
"mn": "Mongolian",
"ne": "Nepali",
"no": "Norwegian",
"oc": "Occitan",
"pa": "Punjabi",
"ps": "Pashto",
"fa": "Persian",
"pl": "Polish",
"pt": "Portuguese",
"ro": "Romanian",
"ru": "Russian",
"sa": "Sanskrit",
"sr": "Serbian",
"sn": "Shona",
"sd": "Sindhi",
"si": "Sinhala",
"sk": "Slovak",
"sl": "Slovenian",
"so": "Somali",
"es": "Spanish",
"su": "Sundanese",
"sw": "Swahili",
"sv": "Swedish",
"tl": "Tagalog",
"tg": "Tajik",
"ta": "Tamil",
"tt": "Tatar",
"te": "Telugu",
"th": "Thai",
"bo": "Tibetan",
"tr": "Turkish",
"tk": "Turkmen",
"uk": "Ukrainian",
"ur": "Urdu",
"uz": "Uzbek",
"vi": "Vietnamese",
"cy": "Welsh",
"yi": "Yiddish",
"yo": "Yoruba",
}
def get_language_names():
return list(LANGUAGES.values())
def get_code_by_name(name):
for code, lang in LANGUAGES.items():
if lang == name:
return code
return "auto"
def get_name_by_code(code):
return LANGUAGES.get(code, "Auto Detect")
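An example round-trip through these helpers (the mapping is abridged here so the snippet stands alone; the real module lists all 99+ languages):

```python
# Abridged copy of the mapping for illustration
LANGUAGES = {"auto": "Auto Detect", "de": "German", "ja": "Japanese"}

def get_code_by_name(name):
    for code, lang in LANGUAGES.items():
        if lang == name:
            return code
    return "auto"  # unknown display names fall back to auto-detect

def get_name_by_code(code):
    return LANGUAGES.get(code, "Auto Detect")

print(get_code_by_name("German"))  # de
print(get_name_by_code("xx"))      # Auto Detect
```

The "auto" fallback on both lookups means a stale or mistyped settings value can never crash the language dropdown; it silently degrades to auto-detection.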


@@ -74,11 +74,11 @@ class WhisperTranscriber:
logging.error(f"Failed to load model: {e}")
self.model = None
def transcribe(self, audio_data, is_file: bool = False) -> str:
def transcribe(self, audio_data, is_file: bool = False, task: Optional[str] = None) -> str:
"""
Transcribe audio data.
"""
logging.info(f"Starting transcription... (is_file={is_file})")
logging.info(f"Starting transcription... (is_file={is_file}, task={task})")
# Ensure model is loaded
if not self.model:
@@ -91,6 +91,10 @@ class WhisperTranscriber:
beam_size = int(self.config.get("beam_size"))
best_of = int(self.config.get("best_of"))
vad = False if is_file else self.config.get("vad_filter")
language = self.config.get("language")
# Use task override if provided, otherwise config
final_task = task if task else self.config.get("task")
# Transcribe
segments, info = self.model.transcribe(
@@ -98,6 +102,8 @@ class WhisperTranscriber:
beam_size=beam_size,
best_of=best_of,
vad_filter=vad,
task=final_task,
language=language if language != "auto" else None,
vad_parameters=dict(min_silence_duration_ms=500),
condition_on_previous_text=self.config.get("condition_on_previous_text"),
without_timestamps=True


@@ -245,6 +245,26 @@ class UIBridge(QObject):
# --- Methods called from QML ---
@Slot(result=list)
def get_supported_languages(self):
from src.core.languages import get_language_names
return get_language_names()
@Slot(str)
def set_language_by_name(self, name):
from src.core.languages import get_code_by_name
from src.core.config import ConfigManager
code = get_code_by_name(name)
ConfigManager().set("language", code)
self.settingChanged.emit("language", code)
@Slot(result=str)
def get_current_language_name(self):
from src.core.languages import get_name_by_code
from src.core.config import ConfigManager
code = ConfigManager().get("language")
return get_name_by_code(code)
@Slot(str, result='QVariant')
def getSetting(self, key):
from src.core.config import ConfigManager


@@ -100,7 +100,7 @@ ComboBox {
popup: Popup {
y: control.height - 1
width: control.width
implicitHeight: contentItem.implicitHeight
implicitHeight: Math.min(contentItem.implicitHeight, 300)
padding: 5
contentItem: ListView {


@@ -25,7 +25,7 @@ Rectangle {
Text {
anchors.centerIn: parent
text: control.recording ? "Listening..." : (control.currentSequence || "None")
text: control.recording ? "Listening..." : (formatSequence(control.currentSequence) || "None")
color: control.recording ? SettingsStyle.accent : (control.currentSequence ? "#ffffff" : "#808080")
font.family: "JetBrains Mono"
font.pixelSize: 13
@@ -72,6 +72,23 @@ Rectangle {
if (!activeFocus) control.recording = false
}
function formatSequence(seq) {
if (!seq) return ""
var parts = seq.split("+")
for (var i = 0; i < parts.length; i++) {
var p = parts[i]
// Standardize modifiers
if (p === "ctrl") parts[i] = "Ctrl"
else if (p === "alt") parts[i] = "Alt"
else if (p === "shift") parts[i] = "Shift"
else if (p === "win") parts[i] = "Win"
else if (p === "esc") parts[i] = "Esc"
// Capitalize F-keys and others (e.g. f8 -> F8, space -> Space)
else parts[i] = p.charAt(0).toUpperCase() + p.slice(1)
}
return parts.join(" + ")
}
function getKeyName(key, text) {
// F-Keys
if (key >= Qt.Key_F1 && key <= Qt.Key_F35) return "f" + (key - Qt.Key_F1 + 1)


@@ -314,14 +314,24 @@ Window {
spacing: 0
ModernSettingsItem {
label: "Global Hotkey"
description: "Press to record a new shortcut (e.g. Ctrl+Space)"
label: "Global Hotkey (Transcribe)"
description: "Press to record a new shortcut (e.g. F9)"
control: ModernKeySequenceRecorder {
Layout.preferredWidth: 200
implicitWidth: 240
currentSequence: ui.getSetting("hotkey")
onSequenceChanged: (seq) => ui.setSetting("hotkey", seq)
}
}
ModernSettingsItem {
label: "Global Hotkey (Translate)"
description: "Press to record a new shortcut (e.g. F10)"
control: ModernKeySequenceRecorder {
implicitWidth: 240
currentSequence: ui.getSetting("hotkey_translate")
onSequenceChanged: (seq) => ui.setSetting("hotkey_translate", seq)
}
}
ModernSettingsItem {
label: "Run on Startup"
@@ -742,15 +752,17 @@ Window {
ModernSettingsItem {
label: "Language"
description: "Force language or Auto-detect"
description: "Spoken language to transcribe"
control: ModernComboBox {
width: 140
model: ["auto", "en", "fr", "de", "es", "it", "ja", "zh", "ru"]
currentIndex: model.indexOf(ui.getSetting("language"))
onActivated: ui.setSetting("language", currentText)
Layout.preferredWidth: 200
model: ui.get_supported_languages()
currentIndex: model.indexOf(ui.get_current_language_name())
onActivated: (index) => ui.set_language_by_name(currentText)
}
}
// Task selector removed as per user request (Hotkeys handle this now)
ModernSettingsItem {
label: "Compute Device"
description: "Hardware acceleration (CUDA requires NVidia GPU)"

src/utils/formatters.py

@@ -0,0 +1,32 @@
"""
Formatter Utilities
===================
Helper functions for text formatting.
"""
def format_hotkey(sequence: str) -> str:
"""
Formats a hotkey sequence string (e.g. 'ctrl+alt+f9')
into a pretty readable string (e.g. 'Ctrl + Alt + F9').
"""
if not sequence:
return "None"
parts = sequence.split('+')
formatted_parts = []
for p in parts:
p = p.strip().lower()
if p == 'ctrl': formatted_parts.append('Ctrl')
elif p == 'alt': formatted_parts.append('Alt')
elif p == 'shift': formatted_parts.append('Shift')
elif p == 'win': formatted_parts.append('Win')
elif p == 'esc': formatted_parts.append('Esc')
else:
# Capitalize first letter
if len(p) > 0:
formatted_parts.append(p[0].upper() + p[1:])
else:
formatted_parts.append(p)
return " + ".join(formatted_parts)


@@ -65,6 +65,10 @@ class WindowHook:
# (Window 420x140, Pill 380x100)
self.logical_rect = [20, 20, 20+380, 20+100]
self.current_scale = initial_scale
self.enabled = True # New flag
def set_enabled(self, enabled):
self.enabled = enabled
def install(self):
proc_address = ctypes.cast(self.new_wnd_proc, ctypes.c_void_p)
@@ -73,6 +77,10 @@ class WindowHook:
def wnd_proc_callback(self, hwnd, msg, wParam, lParam):
try:
if msg == WM_NCHITTEST:
# If disabled (invisible/inactive), let clicks pass through (HTTRANSPARENT)
if not self.enabled:
return HTTRANSPARENT
res = self.on_nchittest(lParam)
if res != 0:
return res