2 Commits
main ... v1.1.0

13 changed files with 799 additions and 102 deletions

View File

@@ -43,6 +43,18 @@ Whisper Voice operates directly on the metal. It is not an API wrapper; it is an
| **Sensory Gate** | **Silero VAD** | Enterprise-grade Voice Activity Detection filters out the noise, ensuring only pure intent is processed. | | **Sensory Gate** | **Silero VAD** | Enterprise-grade Voice Activity Detection filters out the noise, ensuring only pure intent is processed. |
| **Interface** | **Qt 6 / QML** | Hardware-accelerated, glassmorphic UI that is fluid, responsive, and sovereign. | | **Interface** | **Qt 6 / QML** | Hardware-accelerated, glassmorphic UI that is fluid, responsive, and sovereign. |
### 🛑 Compatibility Matrix (Windows)
The core engine (`CTranslate2`) is heavily optimized for Nvidia tensor cores.
| Manufacturer | Hardware | Status | Notes |
| :--- | :--- | :--- | :--- |
| **Nvidia** | GTX 900+ / RTX | ✅ **Supported** | Full heavy-metal acceleration. |
| **AMD** | Radeon RX | ⚠️ **CPU Fallback** | Runs on CPU. Valid for `Small/Medium`, slow for `Large`. |
| **Intel** | Arc / Iris | ⚠️ **CPU Fallback** | Runs on CPU. Valid for `Small/Medium`, slow for `Large`. |
| **Apple** | M1 / M2 / M3 | ❌ **Unsupported** | Release is strictly Windows x64. |
> **AMD Users**: v1.0.3 auto-detects GPU failures and silently falls back to CPU.
<br> <br>
## 🖋️ Universal Transcription ## 🖋️ Universal Transcription
@@ -56,14 +68,20 @@ At its core, Whisper Voice is the ultimate bridge between thought and text. It l
### Workflow: `F9 (Default)` ### Workflow: `F9 (Default)`
The primary channel for native-language transcription. It transcribes precisely what it hears in the language you speak (or the one you've locked in Settings). The primary channel for native-language transcription. It transcribes precisely what it hears in the language you speak (or the one you've locked in Settings).
### ✨ Style Prompting (New in v1.0.2) ### 🧠 Intelligent Correction (New in v1.1.0)
Whisper Voice replaces traditional "grammar correction models" with a native **Style Prompting** engine. By injecting a specific "pre-prompt" into the model's context window, we can guide its internal style without external post-processing. Whisper Voice now integrates a local **Llama 3.2 1B** LLM to act as a "Silent Consultant". It post-processes transcripts to fix grammar or polish style without effectively "chatting" back.
* **Standard (Default)**: Forces the model to use full sentences, proper capitalization, and periods. Ideal for dictation. It is strictly trained on a **Forensic Protocol**: it will never lecture you, never refuse to process explicit language, and never sanitize your words. Your profanity is yours to keep.
* **Casual**: Encourages a relaxed, lowercase style (e.g., "no way that's crazy lol").
* **Custom**: Allows you to seed the model with your own context (e.g., "Here is a list of medical terms:").
This approach incurs **zero latency penalty** and **zero extra VRAM** usage. #### Correction Modes:
* **Standard (Default)**: Fixes grammar, punctuation, and capitalization while keeping every word you said.
* **Grammar Only**: Strictly fixes objective errors (spelling/agreement). Touches nothing else.
* **Rewrite**: Polishes the flow and clarity of your sentences while explicitly preserving your original tone (Casual stays casual, Formal stays formal).
#### Supported Languages:
The correction engine is optimized for **English, German, French, Italian, Portuguese, Spanish, Hindi, and Thai**. It also performs well on **Russian, Chinese, Japanese, and Romanian**.
This approach incurs a ~2s latency penalty but uses **zero extra VRAM** when in Low VRAM mode.
<br> <br>

View File

@@ -1,28 +1,28 @@
# Release v1.0.2 # Release v1.0.4
**"The Lightweight Release"** **"The Compatibility Update"**
This release focuses on removing bloat and switching to a native approach for punctuation, resulting in a significantly faster and smaller application. This release focuses on maximum stability across different hardware configurations (AMD, Intel, Nvidia) and fixing startup crashes related to corrupted models or missing drivers.
## 🚀 Key Changes ## 🛠️ Critical Fixes
### 1. Style Prompting (Replaces Grammar Model) ### 1. Robust CPU Fallback (AMD / Intel Support)
We have removed the heavy "Grammar Correction" model (M2M100) and replaced it with **Style Prompting**. * **Problem**: Previously, if an AMD user tried to run the app, it would crash instantly because it tried to load Nvidia CUDA libraries by default.
* **How it works**: Uses Whisper's internal context awareness to force proper punctuation. * **Fix**: The app now **silently detects** if CUDA initialization fails (due to missing DLLs or incompatible hardware) and **automatically falls back to CPU mode**.
* **New Settings**: Go to `Settings -> AI Engine` to choose a style: * **Result**: The app "just works" on any Windows machine, regardless of GPU.
* **Standard**: (Default) Forces full sentences and proper punctuation.
* **Casual**: Relaxed, lowercase style.
* **Custom**: Enter your own prompt context.
### 2. Bloat Removal ### 2. Startup Crash Protection
* **Removed**: `transformers`, `sentencepiece`, `accelerate` libraries. * **Problem**: If `faster_whisper` was imported before checking for valid drivers, the app would crash on launch for some users.
* **Removed**: `grammar-m2m100` model downloader and logic. * **Fix**: Implemented **Lazy Loading** for the AI engine. The app now starts the UI first, and only loads the heavy AI libraries inside a safety block that catches errors.
* **Impact**: The application is lighter, installs faster, and uses less RAM.
### 3. Stability Fixes ### 3. Corrupt Model Auto-Repair
* **Fixed**: `NameError: 'torch' is not defined` when using Low VRAM Mode. * **Problem**: Interrupted downloads could leave a corrupted model folder, preventing the app from ever starting again.
* **Fixed**: Bootstrapper now self-repairs missing dependencies if the environment gets corrupted. * **Fix**: If the app detects a "vocabulary missing" or invalid config error, it will now **automatically delete the corrupt folder** and allow you to re-download it cleanly.
### 4. Windows DLL Injection
* **Fix**: Added explicit DLL path injection for `nvidia-cublas` and `nvidia-cudnn` to ensure Python 3.8+ can find the required CUDA libraries on Windows systems that don't have them in PATH.
## 📦 Installation ## 📦 Installation
1. Download `WhisperVoice.exe` (attached below or in `dist/`). 1. Download `WhisperVoice.exe` below.
2. Run it. It will automatically update your environment if needed. 2. Replace your existing `.exe`.
3. Run it.

View File

@@ -245,18 +245,38 @@ class Bootstrapper:
req_file = self.source_path / "requirements.txt" req_file = self.source_path / "requirements.txt"
# Use --prefer-binary to avoid building from source on Windows if possible
# Use --no-warn-script-location to reduce noise
# CRITICAL: Force --only-binary for llama-cpp-python to prevent picking new source-only versions
cmd = [
str(self.python_path / "python.exe"), "-m", "pip", "install",
"--prefer-binary",
"--only-binary", "llama-cpp-python",
"--extra-index-url", "https://abetlen.github.io/llama-cpp-python/whl/cpu",
"-r", str(req_file)
]
process = subprocess.Popen( process = subprocess.Popen(
[str(self.python_path / "python.exe"), "-m", "pip", "install", "-r", str(req_file)], cmd,
stdout=subprocess.PIPE, stdout=subprocess.PIPE,
stderr=subprocess.STDOUT, stderr=subprocess.STDOUT, # Merge stderr into stdout
text=True, text=True,
cwd=str(self.python_path), cwd=str(self.python_path),
creationflags=subprocess.CREATE_NO_WINDOW creationflags=subprocess.CREATE_NO_WINDOW
) )
output_buffer = []
for line in process.stdout: for line in process.stdout:
if self.ui: self.ui.set_detail(line.strip()[:60]) line_stripped = line.strip()
process.wait() if self.ui: self.ui.set_detail(line_stripped[:60])
output_buffer.append(line_stripped)
log(line_stripped)
return_code = process.wait()
if return_code != 0:
err_msg = "\n".join(output_buffer[-15:]) # Show last 15 lines
raise RuntimeError(f"Pip install failed (Exit code {return_code}):\n{err_msg}")
def refresh_app_source(self): def refresh_app_source(self):
""" """
@@ -348,8 +368,22 @@ class Bootstrapper:
return False return False
def check_dependencies(self): def check_dependencies(self):
"""Quick check if critical dependencies are installed.""" """Check if critical dependencies are importable in the embedded python."""
return True # Deprecated logic placeholder if not self.is_python_ready(): return False
try:
# Check for core libs that might be missing
# We use a subprocess to check imports in the runtime environment
subprocess.check_call(
[str(self.python_path / "python.exe"), "-c", "import faster_whisper; import llama_cpp; import PySide6"],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
cwd=str(self.python_path),
creationflags=subprocess.CREATE_NO_WINDOW
)
return True
except (subprocess.CalledProcessError, FileNotFoundError):
return False
def setup_and_run(self): def setup_and_run(self):
"""Full setup/update and run flow.""" """Full setup/update and run flow."""
@@ -359,10 +393,16 @@ class Bootstrapper:
self.download_python() self.download_python()
self._fix_pth_file() # Ensure pth is fixed immediately after download self._fix_pth_file() # Ensure pth is fixed immediately after download
self.install_pip() self.install_pip()
self.install_packages() # self.install_packages() # We'll do this in the dependency check step now
# Always refresh source to ensure we have the latest bundled code # Always refresh source to ensure we have the latest bundled code
self.refresh_app_source() self.refresh_app_source()
# 2. Check and Install Dependencies
# We do this AFTER refreshing source so we have the latest requirements.txt
if not self.check_dependencies():
log("Dependencies missing or incomplete. Installing...")
self.install_packages()
# Launch # Launch
if self.run_app(): if self.run_app():

BIN
dist/WhisperVoice.exe vendored

Binary file not shown.

248
main.py
View File

@@ -9,6 +9,31 @@ app_dir = os.path.dirname(os.path.abspath(__file__))
if app_dir not in sys.path: if app_dir not in sys.path:
sys.path.insert(0, app_dir) sys.path.insert(0, app_dir)
# -----------------------------------------------------------------------------
# WINDOWS DLL FIX (CRITICAL for Portable CUDA)
# Python 3.8+ on Windows requires explicit DLL directory addition.
# -----------------------------------------------------------------------------
if os.name == 'nt' and hasattr(os, 'add_dll_directory'):
try:
from pathlib import Path
# Scan sys.path for site-packages
for p in sys.path:
path_obj = Path(p)
if path_obj.name == 'site-packages' and path_obj.exists():
nvidia_path = path_obj / "nvidia"
if nvidia_path.exists():
for subdir in nvidia_path.iterdir():
# Add 'bin' folder from each nvidia stub (cublas, cudnn, etc.)
bin_path = subdir / "bin"
if bin_path.exists():
os.add_dll_directory(str(bin_path))
# Also try adding site-packages itself just in case
# os.add_dll_directory(str(path_obj))
break
except Exception:
pass
# -----------------------------------------------------------------------------
from PySide6.QtWidgets import QApplication, QFileDialog, QMessageBox from PySide6.QtWidgets import QApplication, QFileDialog, QMessageBox
from PySide6.QtCore import QObject, Slot, Signal, QThread, Qt, QUrl from PySide6.QtCore import QObject, Slot, Signal, QThread, Qt, QUrl
from PySide6.QtQml import QQmlApplicationEngine from PySide6.QtQml import QQmlApplicationEngine
@@ -19,6 +44,7 @@ from src.ui.bridge import UIBridge
from src.ui.tray import SystemTray from src.ui.tray import SystemTray
from src.core.audio_engine import AudioEngine from src.core.audio_engine import AudioEngine
from src.core.transcriber import WhisperTranscriber from src.core.transcriber import WhisperTranscriber
from src.core.llm_engine import LLMEngine
from src.core.hotkey_manager import HotkeyManager from src.core.hotkey_manager import HotkeyManager
from src.core.config import ConfigManager from src.core.config import ConfigManager
from src.utils.injector import InputInjector from src.utils.injector import InputInjector
@@ -163,6 +189,69 @@ class DownloadWorker(QThread):
logging.error(f"Download failed: {e}") logging.error(f"Download failed: {e}")
self.error.emit(str(e)) self.error.emit(str(e))
class LLMDownloadWorker(QThread):
progress = Signal(int)
finished = Signal()
error = Signal(str)
def __init__(self, parent=None):
super().__init__(parent)
def run(self):
try:
import requests
# Support one model for now
url = "https://huggingface.co/hugging-quants/Llama-3.2-1B-Instruct-Q4_K_M-GGUF/resolve/main/llama-3.2-1b-instruct-q4_k_m.gguf?download=true"
fname = "llama-3.2-1b-instruct-q4_k_m.gguf"
model_path = get_models_path() / "llm" / "llama-3.2-1b-instruct"
model_path.mkdir(parents=True, exist_ok=True)
dest_file = model_path / fname
# Simple check if exists and > 0 size?
# We assume if the user clicked download, they want to download it.
with requests.Session() as s:
head = s.head(url, allow_redirects=True)
total_size = int(head.headers.get('content-length', 0))
resp = s.get(url, stream=True)
resp.raise_for_status()
downloaded = 0
with open(dest_file, 'wb') as f:
for chunk in resp.iter_content(chunk_size=8192):
if chunk:
f.write(chunk)
downloaded += len(chunk)
if total_size > 0:
pct = int((downloaded / total_size) * 100)
self.progress.emit(pct)
self.finished.emit()
except Exception as e:
logging.error(f"LLM Download failed: {e}")
self.error.emit(str(e))
class LLMWorker(QThread):
finished = Signal(str)
def __init__(self, llm_engine, text, mode, parent=None):
super().__init__(parent)
self.llm_engine = llm_engine
self.text = text
self.mode = mode
def run(self):
try:
corrected = self.llm_engine.correct_text(self.text, self.mode)
self.finished.emit(corrected)
except Exception as e:
logging.error(f"LLMWorker crashed: {e}")
self.finished.emit(self.text) # Fail safe: return original text
class TranscriptionWorker(QThread): class TranscriptionWorker(QThread):
finished = Signal(str) finished = Signal(str)
def __init__(self, transcriber, audio_data, is_file=False, parent=None, task_override=None): def __init__(self, transcriber, audio_data, is_file=False, parent=None, task_override=None):
@@ -204,6 +293,7 @@ class WhisperApp(QObject):
self.bridge.settingChanged.connect(self.on_settings_changed) self.bridge.settingChanged.connect(self.on_settings_changed)
self.bridge.hotkeysEnabledChanged.connect(self.on_hotkeys_enabled_toggle) self.bridge.hotkeysEnabledChanged.connect(self.on_hotkeys_enabled_toggle)
self.bridge.downloadRequested.connect(self.on_download_requested) self.bridge.downloadRequested.connect(self.on_download_requested)
self.bridge.llmDownloadRequested.connect(self.on_llm_download_requested)
self.engine.rootContext().setContextProperty("ui", self.bridge) self.engine.rootContext().setContextProperty("ui", self.bridge)
@@ -224,7 +314,9 @@ class WhisperApp(QObject):
# 3. Logic Components Placeholders # 3. Logic Components Placeholders
self.audio_engine = None self.audio_engine = None
self.transcriber = None self.transcriber = None
self.llm_engine = None
self.hk_transcribe = None self.hk_transcribe = None
self.hk_correct = None
self.hk_translate = None self.hk_translate = None
self.overlay_root = None self.overlay_root = None
@@ -319,14 +411,19 @@ class WhisperApp(QObject):
self.audio_engine.set_visualizer_callback(self.bridge.update_amplitude) self.audio_engine.set_visualizer_callback(self.bridge.update_amplitude)
self.audio_engine.set_silence_callback(self.on_silence_detected) self.audio_engine.set_silence_callback(self.on_silence_detected)
self.transcriber = WhisperTranscriber() self.transcriber = WhisperTranscriber()
self.llm_engine = LLMEngine()
# Dual Hotkey Managers # Dual Hotkey Managers
self.hk_transcribe = HotkeyManager(config_key="hotkey") self.hk_transcribe = HotkeyManager(config_key="hotkey")
self.hk_transcribe.triggered.connect(lambda: self.toggle_recording(task_override="transcribe")) self.hk_transcribe.triggered.connect(lambda: self.toggle_recording(task_override="transcribe", task_mode="standard"))
self.hk_transcribe.start() self.hk_transcribe.start()
self.hk_correct = HotkeyManager(config_key="hotkey_correct")
self.hk_correct.triggered.connect(lambda: self.toggle_recording(task_override="transcribe", task_mode="correct"))
self.hk_correct.start()
self.hk_translate = HotkeyManager(config_key="hotkey_translate") self.hk_translate = HotkeyManager(config_key="hotkey_translate")
self.hk_translate.triggered.connect(lambda: self.toggle_recording(task_override="translate")) self.hk_translate.triggered.connect(lambda: self.toggle_recording(task_override="translate", task_mode="standard"))
self.hk_translate.start() self.hk_translate.start()
self.bridge.update_status("Ready") self.bridge.update_status("Ready")
@@ -334,6 +431,57 @@ class WhisperApp(QObject):
def run(self): def run(self):
sys.exit(self.qt_app.exec()) sys.exit(self.qt_app.exec())
@Slot(str, str)
@Slot(str)
def toggle_recording(self, task_override=None, task_mode="standard"):
"""
task_override: 'transcribe' or 'translate' (passed to whisper)
task_mode: 'standard' or 'correct' (determines post-processing)
"""
if task_mode == "correct":
self.current_task_requires_llm = True
elif task_mode == "standard":
self.current_task_requires_llm = False # Explicit reset
# Actual Logic
if self.bridge.isRecording:
logging.info("Stopping recording...")
# stop_recording returns the numpy array directly
audio_data = self.audio_engine.stop_recording()
self.bridge.isRecording = False
self.bridge.update_status("Processing...")
self.bridge.isProcessing = True
# Save task override for processing
self.last_task_override = task_override
if audio_data is not None and len(audio_data) > 0:
# Use the task that started this session, or the override if provided
final_task = getattr(self, "current_recording_task", self.config.get("task"))
if task_override: final_task = task_override
self.worker = TranscriptionWorker(self.transcriber, audio_data, parent=self, task_override=final_task)
self.worker.finished.connect(self.on_transcription_done)
self.worker.start()
else:
self.bridge.update_status("Ready")
self.bridge.isProcessing = False
else:
# START RECORDING
if self.bridge.isProcessing:
logging.warning("Ignored toggle request: Transcription in progress.")
return
intended_task = task_override if task_override else self.config.get("task")
self.current_recording_task = intended_task
logging.info(f"Starting recording... (Task: {intended_task}, Mode: {task_mode})")
self.audio_engine.start_recording()
self.bridge.isRecording = True
self.bridge.update_status(f"Recording ({intended_task})...")
@Slot() @Slot()
def quit_app(self): def quit_app(self):
logging.info("Shutting down...") logging.info("Shutting down...")
@@ -422,14 +570,16 @@ class WhisperApp(QObject):
print(f"Setting Changed: {key} = {value}") print(f"Setting Changed: {key} = {value}")
# 1. Hotkey Reload # 1. Hotkey Reload
if key in ["hotkey", "hotkey_translate"]: if key in ["hotkey", "hotkey_translate", "hotkey_correct"]:
if self.hk_transcribe: self.hk_transcribe.reload_hotkey() if self.hk_transcribe: self.hk_transcribe.reload_hotkey()
if self.hk_correct: self.hk_correct.reload_hotkey()
if self.hk_translate: self.hk_translate.reload_hotkey() if self.hk_translate: self.hk_translate.reload_hotkey()
if self.tray: if self.tray:
hk1 = self.format_hotkey(self.config.get("hotkey")) hk1 = self.format_hotkey(self.config.get("hotkey"))
hk3 = self.format_hotkey(self.config.get("hotkey_correct"))
hk2 = self.format_hotkey(self.config.get("hotkey_translate")) hk2 = self.format_hotkey(self.config.get("hotkey_translate"))
self.tray.setToolTip(f"Whisper Voice\nTranscribe: {hk1}\nTranslate: {hk2}") self.tray.setToolTip(f"Whisper Voice\nTranscribe: {hk1}\nCorrect: {hk3}\nTranslate: {hk2}")
# 2. AI Model Reload (Heavy) # 2. AI Model Reload (Heavy)
if key in ["model_size", "compute_device", "compute_type"]: if key in ["model_size", "compute_device", "compute_type"]:
@@ -546,40 +696,7 @@ class WhisperApp(QObject):
# Let's ensure toggle_recording handles no arg calls by stopping the CURRENT task. # Let's ensure toggle_recording handles no arg calls by stopping the CURRENT task.
QMetaObject.invokeMethod(self, "toggle_recording", Qt.QueuedConnection) QMetaObject.invokeMethod(self, "toggle_recording", Qt.QueuedConnection)
@Slot() # Modified to allow lambda override
def toggle_recording(self, task_override=None):
if not self.audio_engine: return
# Prevent starting a new recording while we are still transcribing the last one
if self.bridge.isProcessing:
logging.warning("Ignored toggle request: Transcription in progress.")
return
# Determine which task we are entering
if task_override:
intended_task = task_override
else:
intended_task = self.config.get("task")
if self.audio_engine.recording:
# STOP RECORDING
self.bridge.update_status("Thinking...")
self.bridge.isRecording = False
self.bridge.isProcessing = True # Start Processing
audio_data = self.audio_engine.stop_recording()
# Use the task that started this session, or the override if provided (though usually override is for starting)
final_task = getattr(self, "current_recording_task", self.config.get("task"))
self.worker = TranscriptionWorker(self.transcriber, audio_data, parent=self, task_override=final_task)
self.worker.finished.connect(self.on_transcription_done)
self.worker.start()
else:
# START RECORDING
self.current_recording_task = intended_task
self.bridge.update_status(f"Recording ({intended_task})...")
self.bridge.isRecording = True
self.audio_engine.start_recording()
@Slot(bool) @Slot(bool)
def on_ui_toggle_request(self, state): def on_ui_toggle_request(self, state):
@@ -589,11 +706,53 @@ class WhisperApp(QObject):
@Slot(str) @Slot(str)
def on_transcription_done(self, text: str): def on_transcription_done(self, text: str):
self.bridge.update_status("Ready") self.bridge.update_status("Ready")
self.bridge.isProcessing = False # End Processing self.bridge.isProcessing = False # Temporarily false? No, keep it true if we chain.
# Check LLM Settings -> AND check if the current task requested it
llm_enabled = self.config.get("llm_enabled")
requires_llm = getattr(self, "current_task_requires_llm", False)
# We only correct if:
# 1. LLM is globally enabled (safety switch)
# 2. current_task_requires_llm is True (triggered by Correct hotkey)
# OR 3. Maybe user WANTS global correction? Ideally user uses separate hotkey.
# Let's say: If "Correction" is enabled in settings, does it apply to ALL?
# The user's feedback suggests they DON'T want it on regular hotkey.
# So we enforce: Correct Hotkey -> Corrects. Regular Hotkey -> Raw.
# BUT we must handle the case where user expects the old behavior?
# Let's make it strict: Only correct if triggered by correct hotkey OR if we add a "Correct All" toggle later.
# For now, let's respect the flag. But wait, if llm_enabled is OFF, we shouldn't run it even if hotkey pressed?
# Yes, safety switch.
if text and llm_enabled and requires_llm:
# Chain to LLM
self.bridge.isProcessing = True
self.bridge.update_status("Correcting...")
mode = self.config.get("llm_mode")
self.llm_worker = LLMWorker(self.llm_engine, text, mode, parent=self)
self.llm_worker.finished.connect(self.on_llm_done)
self.llm_worker.start()
return
self.bridge.isProcessing = False
if text: if text:
method = self.config.get("input_method") method = self.config.get("input_method")
speed = int(self.config.get("typing_speed")) speed = int(self.config.get("typing_speed"))
InputInjector.inject_text(text, method, speed) InputInjector.inject_text(text, method, speed)
@Slot(str)
def on_llm_done(self, text: str):
self.bridge.update_status("Ready")
self.bridge.isProcessing = False
if text:
method = self.config.get("input_method")
speed = int(self.config.get("typing_speed"))
InputInjector.inject_text(text, method, speed)
# Cleanup
if hasattr(self, 'llm_worker') and self.llm_worker:
self.llm_worker.deleteLater()
self.llm_worker = None
@Slot(bool) @Slot(bool)
def on_hotkeys_enabled_toggle(self, state): def on_hotkeys_enabled_toggle(self, state):
@@ -613,6 +772,19 @@ class WhisperApp(QObject):
self.download_worker.error.connect(self.on_download_error) self.download_worker.error.connect(self.on_download_error)
self.download_worker.start() self.download_worker.start()
@Slot()
def on_llm_download_requested(self):
if self.bridge.isDownloading: return
self.bridge.update_status("Downloading LLM...")
self.bridge.isDownloading = True
self.llm_dl_worker = LLMDownloadWorker(parent=self)
self.llm_dl_worker.progress.connect(self.on_loader_progress) # Reuse existing progress slot? Yes.
self.llm_dl_worker.finished.connect(self.on_download_finished) # Reuses same cleanup
self.llm_dl_worker.error.connect(self.on_download_error)
self.llm_dl_worker.start()
def on_download_finished(self): def on_download_finished(self):
self.bridge.isDownloading = False self.bridge.isDownloading = False
self.bridge.update_status("Ready") self.bridge.update_status("Ready")

View File

@@ -62,6 +62,7 @@ def build_portable():
"--exclude-module", "faster_whisper", "--exclude-module", "faster_whisper",
"--exclude-module", "torch", "--exclude-module", "torch",
"--exclude-module", "PySide6", "--exclude-module", "PySide6",
"--exclude-module", "llama_cpp",
# Icon # Icon

73
publish_release.py Normal file
View File

@@ -0,0 +1,73 @@
import os
import requests
import mimetypes
# Configuration
API_URL = "https://git.lashman.live/api/v1"
OWNER = "lashman"
REPO = "whisper_voice"
TAG = "v1.0.4"
TOKEN = "6153890332afff2d725aaf4729bc54b5030d5700" # Extracted from git config
EXE_PATH = r"dist\WhisperVoice.exe"
headers = {
"Authorization": f"token {TOKEN}",
"Accept": "application/json"
}
def create_release():
print(f"Creating release {TAG}...")
# Read Release Notes
with open("RELEASE_NOTES.md", "r", encoding="utf-8") as f:
notes = f.read()
# Create Release
payload = {
"tag_name": TAG,
"name": TAG,
"body": notes,
"draft": False,
"prerelease": False
}
url = f"{API_URL}/repos/{OWNER}/{REPO}/releases"
resp = requests.post(url, json=payload, headers=headers)
if resp.status_code == 201:
print("Release created successfully!")
return resp.json()
elif resp.status_code == 409:
print("Release already exists. Fetching it...")
# Get by tag
resp = requests.get(f"{API_URL}/repos/{OWNER}/{REPO}/releases/tags/{TAG}", headers=headers)
if resp.status_code == 200:
return resp.json()
print(f"Failed to create release: {resp.status_code} - {resp.text}")
return None
def upload_asset(release_id, file_path):
print(f"Uploading asset: {file_path}...")
filename = os.path.basename(file_path)
with open(file_path, "rb") as f:
data = f.read()
url = f"{API_URL}/repos/{OWNER}/{REPO}/releases/{release_id}/assets?name={filename}"
# Gitea API expects raw body
resp = requests.post(url, data=data, headers=headers)
if resp.status_code == 201:
print(f"Uploaded {filename} successfully!")
else:
print(f"Failed to upload asset: {resp.status_code} - {resp.text}")
def main():
release = create_release()
if release:
upload_asset(release["id"], EXE_PATH)
if __name__ == "__main__":
main()

View File

@@ -29,3 +29,6 @@ huggingface-hub>=0.20.0
pystray>=0.19.0 pystray>=0.19.0
Pillow>=10.0.0 Pillow>=10.0.0
darkdetect>=0.8.0 darkdetect>=0.8.0
# LLM / Correction
llama-cpp-python>=0.2.20

View File

@@ -17,6 +17,7 @@ from src.core.paths import get_base_path
DEFAULT_SETTINGS = { DEFAULT_SETTINGS = {
"hotkey": "f8", "hotkey": "f8",
"hotkey_translate": "f10", "hotkey_translate": "f10",
"hotkey_correct": "f9", # New: Transcribe + Correct
"model_size": "small", "model_size": "small",
"input_device": None, # Device ID (int) or Name (str), None = Default "input_device": None, # Device ID (int) or Name (str), None = Default
"save_recordings": False, # Save .wav files for debugging "save_recordings": False, # Save .wav files for debugging
@@ -49,6 +50,11 @@ DEFAULT_SETTINGS = {
"condition_on_previous_text": True, "condition_on_previous_text": True,
"initial_prompt": "Mm-hmm. Okay, let's go. I speak in full sentences.", # Default: Forces punctuation "initial_prompt": "Mm-hmm. Okay, let's go. I speak in full sentences.", # Default: Forces punctuation
# LLM Correction
"llm_enabled": False,
"llm_mode": "Standard", # "Grammar", "Standard", "Rewrite"
"llm_model_name": "llama-3.2-1b-instruct",
# Low VRAM Mode # Low VRAM Mode
@@ -102,9 +108,9 @@ class ConfigManager:
except Exception as e: except Exception as e:
logging.error(f"Failed to save settings: {e}") logging.error(f"Failed to save settings: {e}")
def get(self, key: str) -> Any: def get(self, key: str, default: Any = None) -> Any:
"""Get a setting value.""" """Get a setting value."""
return self.data.get(key, DEFAULT_SETTINGS.get(key)) return self.data.get(key, DEFAULT_SETTINGS.get(key, default))

185
src/core/llm_engine.py Normal file
View File

@@ -0,0 +1,185 @@
"""
LLM Engine Module.
==================
Handles interaction with the local Llama 3.2 1B model for transcription correction.
Uses llama-cpp-python for efficient local inference.
"""
import os
import logging
from typing import Optional
from src.core.paths import get_models_path
from src.core.config import ConfigManager
try:
from llama_cpp import Llama
except ImportError:
Llama = None
class LLMEngine:
"""
Manages the Llama model and performs text correction/rewriting.
"""
def __init__(self):
self.config = ConfigManager()
self.model = None
self.current_model_path = None
# --- Mode 1: Grammar Only (Strict) ---
self.prompt_grammar = (
"You are a text correction tool. "
"Correct the grammar/spelling. Do not change punctuation or capitalization styles. "
"Do not remove any words (including profanity). Output ONLY the result."
"\n\nExample:\nInput: 'damn it works'\nOutput: 'damn it works'"
)
# --- Mode 2: Standard (Grammar + Punctuation + Caps) ---
self.prompt_standard = (
"You are a text correction tool. "
"Standardize the grammar, punctuation, and capitalization. "
"Do not remove any words (including profanity). Output ONLY the result."
"\n\nExample:\nInput: 'damn it works'\nOutput: 'Damn it works.'"
)
# --- Mode 3: Rewrite (Tone-Aware Polish) ---
self.prompt_rewrite = (
"You are a text rewriting tool. Improve flow/clarity but keep the exact tone and vocabulary. "
"Do not remove any words (including profanity). Output ONLY the result."
"\n\nExample:\nInput: 'damn it works'\nOutput: 'Damn, it works.'"
)
def load_model(self) -> bool:
"""
Loads the LLM model if it exists.
Returns True if successful, False otherwise.
"""
if Llama is None:
logging.error("llama-cpp-python not installed.")
return False
model_name = self.config.get("llm_model_name", "llama-3.2-1b-instruct")
model_dir = get_models_path() / "llm" / model_name
model_file = model_dir / "llama-3.2-1b-instruct-q4_k_m.gguf"
if not model_file.exists():
logging.warning(f"LLM Model not found at: {model_file}")
return False
if self.model and self.current_model_path == str(model_file):
return True
try:
logging.info(f"Loading LLM from {model_file}...")
n_gpu_layers = 0
try:
import torch
if torch.cuda.is_available():
n_gpu_layers = -1
except:
pass
self.model = Llama(
model_path=str(model_file),
n_gpu_layers=n_gpu_layers,
n_ctx=2048,
verbose=False
)
self.current_model_path = str(model_file)
logging.info("LLM loaded successfully.")
return True
except Exception as e:
logging.error(f"Failed to load LLM: {e}")
self.model = None
return False
def correct_text(self, text: str, mode: str = "Standard") -> str:
"""Corrects or rewrites the provided text."""
if not text or not text.strip():
return text
if not self.model:
if not self.load_model():
return text
logging.info(f"LLM Processing ({mode}): '{text}'")
system_prompt = self.prompt_standard
if mode == "Grammar": system_prompt = self.prompt_grammar
elif mode == "Rewrite": system_prompt = self.prompt_rewrite
# PREFIX INJECTION TECHNIQUE
# We end the prompt with the start of the assistant's answer specifically phrased to force compliance.
# "Here is the processed output:" forces it into a completion mode rather than a refusal mode.
prefix_injection = "Here is the processed output:\n"
prompt = (
f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|>"
f"<|start_header_id|>user<|end_header_id|>\n\nProcess this input:\n{text}<|eot_id|>"
f"<|start_header_id|>assistant<|end_header_id|>\n\n{prefix_injection}"
)
try:
output = self.model(
prompt,
max_tokens=512,
stop=["<|eot_id|>"],
echo=False,
temperature=0.1
)
result = output['choices'][0]['text'].strip()
# 1. Fallback: If result is empty, it might have just outputted nothing because we prefilled?
# Actually llama-cpp-python usually returns the *continuation*.
# So if it outputted "My corrected text.", the full logical response is "Here is...: My corrected text."
# We just want the result.
# Refusal Detection (Safety Net)
refusal_triggers = [
"I cannot", "I can't", "I am unable", "I apologize", "sorry",
"As an AI", "explicit content", "harmful content", "safety guidelines"
]
lower_res = result.lower()
if any(trig in lower_res for trig in refusal_triggers) and len(result) < 150:
logging.warning(f"LLM Refusal Detected: '{result}'. Falling back to original.")
return text # Return original text on refusal!
# --- Robust Post-Processing ---
# 1. Strip quotes
if result.startswith('"') and result.endswith('"') and len(result) > 2 and '"' not in result[1:-1]:
result = result[1:-1]
if result.startswith("'") and result.endswith("'") and len(result) > 2 and "'" not in result[1:-1]:
result = result[1:-1]
# 2. Split by newline
if "\n" in result:
lines = result.split('\n')
clean_lines = [l.strip() for l in lines if l.strip()]
if clean_lines:
result = clean_lines[0]
# 3. Aggressive Preamble Stripping (Updates for new prefix)
import re
prefixes = [
r"^Here is the processed output:?\s*", # The one we injected
r"^Here is the corrected text:?\s*",
r"^Here is the rewritten text:?\s*",
r"^Here's the result:?\s*",
r"^Sure,? here is regex.*:?\s*",
r"^Output:?\s*",
r"^Processing result:?\s*",
]
for p in prefixes:
result = re.sub(p, "", result, flags=re.IGNORECASE).strip()
if result.startswith('"') and result.endswith('"') and len(result) > 2 and '"' not in result[1:-1]:
result = result[1:-1]
logging.info(f"LLM Result: '{result}'")
return result
except Exception as e:
logging.error(f"LLM inference failed: {e}")
return text # Fail safe logic

View File

@@ -21,7 +21,7 @@ except ImportError:
torch = None torch = None
# Import directly - valid since we are now running in the full environment # Import directly - valid since we are now running in the full environment
from faster_whisper import WhisperModel
class WhisperTranscriber: class WhisperTranscriber:
""" """
@@ -62,13 +62,32 @@ class WhisperTranscriber:
# Force offline if path exists to avoid HF errors # Force offline if path exists to avoid HF errors
local_only = new_path.exists() local_only = new_path.exists()
self.model = WhisperModel( try:
model_input, from faster_whisper import WhisperModel
device=device, self.model = WhisperModel(
compute_type=compute, model_input,
download_root=str(get_models_path()), device=device,
local_files_only=local_only compute_type=compute,
) download_root=str(get_models_path()),
local_files_only=local_only
)
except Exception as load_err:
# CRITICAL FALLBACK: If CUDA/cublas fails (AMD/Intel users), fallback to CPU
err_str = str(load_err).lower()
if "cublas" in err_str or "cudnn" in err_str or "library" in err_str or "device" in err_str:
logging.warning(f"CUDA Init Failed ({load_err}). Falling back to CPU...")
self.config.set("compute_device", "cpu") # Update config for persistence/UI
self.current_compute_device = "cpu"
self.model = WhisperModel(
model_input,
device="cpu",
compute_type="int8", # CPU usually handles int8 well with newer extensions, or standard
download_root=str(get_models_path()),
local_files_only=local_only
)
else:
raise load_err
self.current_model_size = size self.current_model_size = size
self.current_compute_device = device self.current_compute_device = device
@@ -78,6 +97,32 @@ class WhisperTranscriber:
except Exception as e: except Exception as e:
logging.error(f"Failed to load model: {e}") logging.error(f"Failed to load model: {e}")
self.model = None self.model = None
# Auto-Repair: Detect vocabulary/corrupt errors
err_str = str(e).lower()
if "vocabulary" in err_str or "tokenizer" in err_str or "config.json" in err_str:
# ... existing auto-repair logic ...
logging.warning("Corrupt model detected on load. Attempting to delete and reset...")
try:
import shutil
# Differentiate between simple path and HF path
new_path = get_models_path() / f"faster-whisper-{size}"
if new_path.exists():
shutil.rmtree(new_path)
logging.info(f"Deleted corrupt model at {new_path}")
else:
# Try legacy HF path
hf_path = get_models_path() / f"models--Systran--faster-whisper-{size}"
if hf_path.exists():
shutil.rmtree(hf_path)
logging.info(f"Deleted corrupt HF model at {hf_path}")
# Notify UI to refresh state (will show 'Download' button now)
# We can't reach bridge easily here without passing it in,
# but the UI polls or listens to logs.
# The user will simply see "Model Missing" in settings after this.
except Exception as del_err:
logging.error(f"Failed to delete corrupt model: {del_err}")
def transcribe(self, audio_data, is_file: bool = False, task: Optional[str] = None) -> str: def transcribe(self, audio_data, is_file: bool = False, task: Optional[str] = None) -> str:
""" """
@@ -89,7 +134,7 @@ class WhisperTranscriber:
if not self.model: if not self.model:
self.load_model() self.load_model()
if not self.model: if not self.model:
return "Error: Model failed to load." return "Error: Model failed to load. Please check Settings -> Model Info."
try: try:
# Config # Config
@@ -174,8 +219,11 @@ class WhisperTranscriber:
def model_exists(self, size: str) -> bool: def model_exists(self, size: str) -> bool:
"""Checks if a model size is already downloaded.""" """Checks if a model size is already downloaded."""
new_path = get_models_path() / f"faster-whisper-{size}" new_path = get_models_path() / f"faster-whisper-{size}"
if (new_path / "config.json").exists(): if new_path.exists():
return True # Strict check
required = ["config.json", "model.bin", "vocabulary.json"]
if all((new_path / f).exists() for f in required):
return True
# Legacy HF cache check # Legacy HF cache check
folder_name = f"models--Systran--faster-whisper-{size}" folder_name = f"models--Systran--faster-whisper-{size}"

View File

@@ -110,6 +110,7 @@ class UIBridge(QObject):
logAppended = Signal(str) # Emits new log line logAppended = Signal(str) # Emits new log line
settingChanged = Signal(str, 'QVariant') settingChanged = Signal(str, 'QVariant')
modelStatesChanged = Signal() # Notify UI to re-check isModelDownloaded modelStatesChanged = Signal() # Notify UI to re-check isModelDownloaded
llmDownloadRequested = Signal()
def __init__(self, parent=None): def __init__(self, parent=None):
super().__init__(parent) super().__init__(parent)
@@ -356,11 +357,7 @@ class UIBridge(QObject):
except Exception as e: except Exception as e:
logging.error(f"Failed to preload audio devices: {e}") logging.error(f"Failed to preload audio devices: {e}")
@Slot()
def toggle_recording(self):
"""Called by UI elements to trigger the app's recording logic."""
# This will be connected to the main app's toggle logic
pass
@Property(bool, notify=isDownloadingChanged) @Property(bool, notify=isDownloadingChanged)
def isDownloading(self): return self._is_downloading def isDownloading(self): return self._is_downloading
@@ -381,25 +378,34 @@ class UIBridge(QObject):
# Check new simple format used by DownloadWorker # Check new simple format used by DownloadWorker
path_simple = get_models_path() / f"faster-whisper-{size}" path_simple = get_models_path() / f"faster-whisper-{size}"
if path_simple.exists() and any(path_simple.iterdir()): if path_simple.exists():
return True # Strict check: Ensure all critical files exist
required = ["config.json", "model.bin", "vocabulary.json"]
if all((path_simple / f).exists() for f in required):
return True
# Check HF Cache format (legacy/default) # Check HF Cache format (legacy/default)
folder_name = f"models--Systran--faster-whisper-{size}" folder_name = f"models--Systran--faster-whisper-{size}"
path_hf = get_models_path() / folder_name path_hf = get_models_path() / folder_name
snapshots = path_hf / "snapshots" snapshots = path_hf / "snapshots"
if snapshots.exists() and any(snapshots.iterdir()): if snapshots.exists() and any(snapshots.iterdir()):
return True return True # Legacy cache structure is complex, assume valid if present
# Check direct folder (simple)
path_direct = get_models_path() / size
if (path_direct / "config.json").exists():
return True
return False
except Exception as e: except Exception as e:
logging.error(f"Error checking model status: {e}") logging.error(f"Error checking model status: {e}")
return False
return False
@Slot(result=bool)
def isLLMModelDownloaded(self):
try:
from src.core.paths import get_models_path
# Hardcoded check for the 1B model we support
model_file = get_models_path() / "llm" / "llama-3.2-1b-instruct" / "llama-3.2-1b-instruct-q4_k_m.gguf"
return model_file.exists()
except:
return False
@Slot(str) @Slot(str)
def downloadModel(self, size): def downloadModel(self, size):
@@ -408,3 +414,7 @@ class UIBridge(QObject):
@Slot() @Slot()
def notifyModelStatesChanged(self): def notifyModelStatesChanged(self):
self.modelStatesChanged.emit() self.modelStatesChanged.emit()
@Slot()
def downloadLLM(self):
self.llmDownloadRequested.emit()

View File

@@ -315,7 +315,7 @@ Window {
ModernSettingsItem { ModernSettingsItem {
label: "Global Hotkey (Transcribe)" label: "Global Hotkey (Transcribe)"
description: "Press to record a new shortcut (e.g. F9)" description: "Standard: Raw transcription"
control: ModernKeySequenceRecorder { control: ModernKeySequenceRecorder {
implicitWidth: 240 implicitWidth: 240
currentSequence: ui.getSetting("hotkey") currentSequence: ui.getSetting("hotkey")
@@ -323,6 +323,16 @@ Window {
} }
} }
ModernSettingsItem {
label: "Global Hotkey (Correct)"
description: "Enhanced: Transcribe + AI Correction"
control: ModernKeySequenceRecorder {
implicitWidth: 240
currentSequence: ui.getSetting("hotkey_correct")
onSequenceChanged: (seq) => ui.setSetting("hotkey_correct", seq)
}
}
ModernSettingsItem { ModernSettingsItem {
label: "Global Hotkey (Translate)" label: "Global Hotkey (Translate)"
description: "Press to record a new shortcut (e.g. F10)" description: "Press to record a new shortcut (e.g. F10)"
@@ -359,8 +369,8 @@ Window {
showSeparator: false showSeparator: false
control: ModernSlider { control: ModernSlider {
Layout.preferredWidth: 200 Layout.preferredWidth: 200
from: 10; to: 6000 from: 10; to: 20000
stepSize: 10 stepSize: 100
snapMode: Slider.SnapAlways snapMode: Slider.SnapAlways
value: ui.getSetting("typing_speed") value: ui.getSetting("typing_speed")
onMoved: ui.setSetting("typing_speed", value) onMoved: ui.setSetting("typing_speed", value)
@@ -845,6 +855,137 @@ Window {
} }
} }
ModernSettingsSection {
title: "Correction & Rewriting"
Layout.margins: 32
Layout.topMargin: 0
content: ColumnLayout {
width: parent.width
spacing: 0
ModernSettingsItem {
label: "Enable Correction"
description: "Post-process text with Llama 3.2 1B (Adds latency)"
control: ModernSwitch {
checked: ui.getSetting("llm_enabled")
onToggled: ui.setSetting("llm_enabled", checked)
}
}
ModernSettingsItem {
label: "Correction Mode"
description: "Grammar Fix vs. Complete Rewrite"
visible: ui.getSetting("llm_enabled")
control: ModernComboBox {
width: 140
model: ["Grammar", "Standard", "Rewrite"]
currentIndex: model.indexOf(ui.getSetting("llm_mode"))
onActivated: ui.setSetting("llm_mode", currentText)
}
}
// LLM Model Status Card
Rectangle {
Layout.fillWidth: true
Layout.margins: 12
Layout.topMargin: 0
Layout.bottomMargin: 16
height: 54
color: "#0a0a0f"
visible: ui.getSetting("llm_enabled")
radius: 6
border.color: SettingsStyle.borderSubtle
border.width: 1
property bool isDownloaded: false
property bool isDownloading: ui.isDownloading && ui.statusText.indexOf("LLM") !== -1
Timer {
interval: 2000
running: visible
repeat: true
onTriggered: parent.checkStatus()
}
function checkStatus() {
isDownloaded = ui.isLLMModelDownloaded()
}
Component.onCompleted: checkStatus()
Connections {
target: ui
function onModelStatesChanged() { parent.checkStatus() }
function onIsDownloadingChanged() { parent.checkStatus() }
}
RowLayout {
anchors.fill: parent
anchors.leftMargin: 12
anchors.rightMargin: 12
spacing: 12
Image {
source: "smart_toy.svg"
sourceSize: Qt.size(16, 16)
layer.enabled: true
layer.effect: MultiEffect {
colorization: 1.0
colorizationColor: parent.parent.isDownloaded ? SettingsStyle.accent : "#808080"
}
}
ColumnLayout {
Layout.fillWidth: true
spacing: 2
Text {
text: "Llama 3.2 1B (Instruct)"
color: "#ffffff"
font.family: "JetBrains Mono"; font.bold: true
font.pixelSize: 11
}
Text {
text: parent.parent.isDownloaded ? "Ready." : "Model missing (~1.2GB)"
color: SettingsStyle.textSecondary
font.family: "JetBrains Mono"; font.pixelSize: 10
}
}
Button {
id: dlBtn
text: "Download"
visible: !parent.parent.isDownloaded && !parent.parent.isDownloading
Layout.preferredHeight: 24
Layout.preferredWidth: 80
contentItem: Text {
text: "DOWNLOAD"
font.pixelSize: 10; font.bold: true; color: "#000000"; horizontalAlignment: Text.AlignHCenter; verticalAlignment: Text.AlignVCenter
}
background: Rectangle {
color: dlBtn.hovered ? "#ffffff" : SettingsStyle.accent; radius: 4
}
onClicked: ui.downloadLLM()
}
// Progress Bar
Rectangle {
visible: parent.parent.isDownloading
Layout.fillWidth: true
height: 4
color: "#30ffffff"
Rectangle {
width: parent.width * (ui.downloadProgress / 100)
height: parent.height
color: SettingsStyle.accent
}
}
}
}
}
}
ModernSettingsSection { ModernSettingsSection {
title: "Advanced Decoding" title: "Advanced Decoding"
Layout.margins: 32 Layout.margins: 32