Compare commits
18 Commits
| SHA1 |
|---|
| baa5e2e69e |
| 3137770742 |
| aed489dd23 |
| e23c492360 |
| 84f10092e9 |
| 03f46ee1e3 |
| 0f1bf5f1af |
| 0b2b5848e2 |
| f3bf7541cf |
| 4b84a27a67 |
| f184eb0037 |
| 306bd075ed |
| a1cc9c61b9 |
| e627e1b8aa |
| eaa572b42f |
| e900201214 |
| 0d426aea4b |
| b15ce8076f |
211
README.md
@@ -1,71 +1,196 @@
|
|||||||
# Whisper Voice
|
<div align="center">
|
||||||
|
|
||||||
**Reclaim Your Voice from the Cloud.**
|
# 🎙️ W H I S P E R V O I C E
|
||||||
|
### SOVEREIGN SPEECH RECOGNITION
|
||||||
|
|
||||||
Whisper Voice is a high-performance, strictly local speech-to-text tool designed for the desktop. It provides instant, high-accuracy dictation anywhere on your system—no internet connection required, no corporate servers, and absolutely no data harvesting.
|
<br>
|
||||||
|
|
||||||
We believe that the tools of production—and communication—should belong to the individual, not rented from centralized tech giants.
|

|
||||||
|
[](https://git.lashman.live/lashman/whisper_voice/releases/latest)
|
||||||
|
[](https://creativecommons.org/publicdomain/zero/1.0/)
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
> *"The master's tools will never dismantle the master's house."*
|
||||||
|
> <br>
|
||||||
|
> **Build your own tools. Run them locally. Free your mind.**
|
||||||
|
|
||||||
|
[View Source](https://git.lashman.live/lashman/whisper_voice) • [Report Issue](https://git.lashman.live/lashman/whisper_voice/issues)
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
|
||||||
|
## 📡 The Transmission
|
||||||
|
|
||||||
|
We are witnessing the **enshittification** of the digital world. What were once vibrant social commons are being walled off, strip-mined for data, and degraded into rent-seeking silos. Your voice is no longer your own; it is a training set for a corporate oracle that charges you for the privilege of listening.
|
||||||
|
|
||||||
|
**Whisper Voice** is a small act of sabotage against this trend.
|
||||||
|
|
||||||
|
It is built on the axiom of **Technological Sovereignty**. By moving state-of-the-art inference from the server farms to your own silicon, you reclaim the means of digital production. No telemetry. No subscriptions. No "cloud processing" that eavesdrops on your intent.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## ✊ Core Principles
|
## ⚡ The Engine
|
||||||
|
|
||||||
### 1. Total Autonomy (Local-First)
|
Whisper Voice operates directly on the metal. It is not an API wrapper; it is an autonomous machine.
|
||||||
Your voice data is yours alone. Unlike commercial alternatives that siphon your words to remote data centers for processing and profiling, Whisper Voice runs entirely on your hardware. **No masters, no servers.** You retain full sovereignty over your digital footprint.
|
|
||||||
|
|
||||||
### 2. Decentralized Power
|
| Component | Technology | Benefit |
|
||||||
By leveraging optimized local processing, we strip away the need for reliance on massive, energy-hungry corporate infrastructure. This is technology scaled to the human level—powerful, efficient, and completely under your control.
|
| :--- | :--- | :--- |
|
||||||
|
| **Inference Core** | **Faster-Whisper** | Optimized C++ inference via **CTranslate2**. Up to **4x faster** than the reference PyTorch implementation. |
|
||||||
|
| **Compression** | **INT8 quantization** | Enables Pro-grade models (`Large-v3`) to run on consumer-grade GPUs, democratizing elite AI. |
|
||||||
|
| **Sensory Gate** | **Silero VAD** | Enterprise-grade Voice Activity Detection filters out the noise, ensuring only pure intent is processed. |
|
||||||
|
| **Interface** | **Qt 6 / QML** | Hardware-accelerated, glassmorphic UI that is fluid, responsive, and sovereign. |
|
||||||
|
|
||||||
### 3. Accessible to All
|
### 🛑 Compatibility Matrix (Windows)
|
||||||
High-quality speech recognition shouldn't be gated behind subscriptions or paywalls. This tool is free, open, and built to empower users to interact with their machines on their own terms.
|
The core engine (`CTranslate2`) is heavily optimized for Nvidia tensor cores.
|
||||||
|
|
||||||
|
| Manufacturer | Hardware | Status | Notes |
|
||||||
|
| :--- | :--- | :--- | :--- |
|
||||||
|
| **Nvidia** | GTX 900+ / RTX | ✅ **Supported** | Full heavy-metal acceleration. |
|
||||||
|
| **AMD** | Radeon RX | ⚠️ **CPU Fallback** | Runs on CPU. Valid for `Small/Medium`, slow for `Large`. |
|
||||||
|
| **Intel** | Arc / Iris | ⚠️ **CPU Fallback** | Runs on CPU. Valid for `Small/Medium`, slow for `Large`. |
|
||||||
|
| **Apple** | M1 / M2 / M3 | ❌ **Unsupported** | Release is strictly Windows x64. |
|
||||||
|
|
||||||
|
> **AMD Users**: v1.0.3 auto-detects GPU failures and silently falls back to CPU.
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
## 🖋️ Universal Transcription
|
||||||
|
|
||||||
|
At its core, Whisper Voice is the ultimate bridge between thought and text. It listens with superhuman precision, converting spoken word into written form across **99 languages**.
|
||||||
|
|
||||||
|
* **Punctuation Mastery**: Automatically handles capitalization and complex punctuation formatting.
|
||||||
|
* **Contextual Intelligence**: Smarter than standard dictation; it understands the flow of sentences to resolve homophones and technical jargon ($1.5k vs "fifteen hundred dollars").
|
||||||
|
* **Total Privacy**: Your private dictation, legal notes, or creative writing never leave your RAM.
|
||||||
|
|
||||||
|
### Workflow: `F9 (Default)`
|
||||||
|
The primary channel for native-language transcription. It transcribes precisely what it hears in the language you speak (or the one you've locked in Settings).
|
||||||
|
|
||||||
|
### 🧠 Intelligent Correction (New in v1.1.0)
|
||||||
|
Whisper Voice now integrates a local **Llama 3.2 1B** LLM to act as a "Silent Consultant". It post-processes transcripts to fix grammar or polish style without ever "chatting" back.
|
||||||
|
|
||||||
|
It is strictly trained on a **Forensic Protocol**: it will never lecture you, never refuse to process explicit language, and never sanitize your words. Your profanity is yours to keep.
|
||||||
|
|
||||||
|
#### Correction Modes:
|
||||||
|
* **Standard (Default)**: Fixes grammar, punctuation, and capitalization while keeping every word you said.
|
||||||
|
* **Grammar Only**: Strictly fixes objective errors (spelling/agreement). Touches nothing else.
|
||||||
|
* **Rewrite**: Polishes the flow and clarity of your sentences while explicitly preserving your original tone (Casual stays casual, Formal stays formal).
|
||||||
|
|
||||||
|
#### Supported Languages:
|
||||||
|
The correction engine is optimized for **English, German, French, Italian, Portuguese, Spanish, Hindi, and Thai**. It also performs well on **Russian, Chinese, Japanese, and Romanian**.
|
||||||
|
|
||||||
|
This approach incurs a ~2s latency penalty but uses **zero extra VRAM** when in Low VRAM mode.
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
## 🌎 Universal Translation
|
||||||
|
|
||||||
|
Whisper Voice v1.0.1 includes a **Neural Translation Engine** that allows you to bridge any linguistic gap instantly.
|
||||||
|
|
||||||
|
* **Input**: Speak in French, Japanese, Russian, or **96 other languages**.
|
||||||
|
* **Output**: The engine instantly reconstructs the semantic meaning into fluent **English**.
|
||||||
|
* **Task Protocol**: Handled via the dedicated `F10` channel.
|
||||||
|
|
||||||
|
### 🔍 Why only English translation?
|
||||||
|
A common question arises: *Why can't I translate from French to Japanese?*
|
||||||
|
|
||||||
|
The architecture of the underlying Whisper model is a **Many-to-English** design. During its massive training phase (680,000 hours of audio), the translation task was specifically optimized to map the global linguistic commons onto a single bridge language: **English**. This allowed the model to reach incredible levels of semantic understanding without the exponential complexity of a "Many-to-Many" mapping.
|
||||||
|
|
||||||
|
By focusing its translation decoder solely on English, Whisper achieves "Zero-Shot" quality that rivals specialized translation engines while remaining lightweight enough to run on your local GPU.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## ✨ Features
|
## 🕹️ Command & Control
|
||||||
|
|
||||||
* **100% Offline Processing**: Once the recognition engine is downloaded, the cable can be cut. Nothing leaves your machine.
|
### Global Hotkeys
|
||||||
* **Universal Compatibility**: Works in any text field—editors, chat apps, terminals, or browsers. If you can type there, you can speak there.
|
The agent runs silently in the background, waiting for your signal.
|
||||||
* **Adaptive Input**:
|
|
||||||
* *Clipboard Mode*: Standard paste injection.
|
* **Transcribe (F9)**: Opens the channel for standard speech-to-text.
|
||||||
* *High-Speed Simulation*: Simulates keystrokes at supersonic speeds (up to 6000 CPM) for apps that block pasting.
|
* **Translate (F10)**: Opens the channel for neural translation.
|
||||||
* **System Integration**: Minimalist overlay and system tray presence. It exists when you need it and vanishes when you don't.
|
* **Customization**: Remap these keys in Settings. The recorder supports complex chords (e.g. `Ctrl + Alt + Space`) to fit your workflow.
|
||||||
* **Resource Efficiency**: Optimized to run smoothly on consumer hardware without monopolizing your system resources.
|
|
||||||
|
### Injection Protocols
|
||||||
|
* **Clipboard Paste**: Standard text injection. Instant, reliable.
|
||||||
|
* **Simulate Typing**: Mimics physical keystrokes at superhuman speed (6000 CPM). Bypasses anti-paste restrictions and "protected" windows.
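
For a sense of scale, a characters-per-minute rate converts to a per-keystroke pause as sketched below. The function name `cpm_to_delay` is illustrative only and is not taken from the project's source; the two rates are the ones quoted in this README.

```python
def cpm_to_delay(cpm: float) -> float:
    """Return the pause between keystrokes, in seconds, for a characters-per-minute rate."""
    if cpm <= 0:
        raise ValueError("cpm must be positive")
    return 60.0 / cpm


# 6000 CPM is 100 characters per second: one keystroke every 10 ms.
fast_delay = cpm_to_delay(6000)
# 1200 CPM is 20 characters per second: one keystroke every 50 ms.
slow_delay = cpm_to_delay(1200)
```

A slower rate simply gives the target application more time to consume each keystroke.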
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
## 📊 Intelligence Matrix
|
||||||
|
|
||||||
|
Select the model that aligns with your available resources.
|
||||||
|
|
||||||
|
| Model | VRAM (GPU) | RAM (CPU) | Designation | Capability |
|
||||||
|
| :--- | :--- | :--- | :--- | :--- |
|
||||||
|
| `Tiny` | **~500 MB** | ~1 GB | ⚡ **Supersonic** | Command & Control, older hardware. |
|
||||||
|
| `Base` | **~600 MB** | ~1 GB | 🚀 **Very Fast** | Daily driver for low-power laptops. |
|
||||||
|
| `Small` | **~1 GB** | ~2 GB | ⏩ **Fast** | High accuracy English dictation. |
|
||||||
|
| `Medium` | **~2 GB** | ~4 GB | ⚖️ **Balanced** | Complex vocabulary, foreign accents. |
|
||||||
|
| `Large-v3 Turbo` | **~4 GB** | ~6 GB | ✨ **Optimal** | **The Sweet Spot.** Near-Large intelligence, Medium speed. |
|
||||||
|
| `Large-v3` | **~5 GB** | ~8 GB | 🧠 **Maximum** | Professional grade. Uncompromised. |
|
||||||
|
|
||||||
|
> *Note: Acceleration requires you to manually select your Compute Device (CUDA GPU or CPU) in Settings.*
|
||||||
|
|
||||||
|
### 📉 Low VRAM Mode
|
||||||
|
For users with limited GPU memory (e.g., 4GB cards) or those running heavy games simultaneously, Whisper Voice offers a specialized **Low VRAM Mode**.
|
||||||
|
|
||||||
|
* **Behavior**: The AI model is aggressively unloaded from the GPU immediately after every transcription.
|
||||||
|
* **Benefit**: When idle, the app consumes near-zero VRAM (~0MB), leaving your GPU completely free for gaming or rendering.
|
||||||
|
* **Trade-off**: There is a "cold start" latency of 1-2 seconds for every voice command as the model reloads from the disk cache.
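
The load-per-call pattern behind this mode might look roughly like the sketch below. `OnDemandModel`, its `loader` argument, and `load_count` are hypothetical names used for illustration; the app's actual unloading logic is not shown in this diff and may differ.

```python
from typing import Callable, Optional


class OnDemandModel:
    """Keeps a model loaded only while a call is running (hypothetical helper)."""

    def __init__(self, loader: Callable[[], object], low_vram: bool = True):
        self._loader = loader          # e.g. a function that builds the speech model
        self._low_vram = low_vram
        self._model: Optional[object] = None
        self.load_count = 0            # exposed only to make the trade-off visible

    def run(self, task: Callable[[object], str]) -> str:
        if self._model is None:
            self._model = self._loader()   # the "cold start" cost is paid here
            self.load_count += 1
        try:
            return task(self._model)
        finally:
            if self._low_vram:
                # Drop the reference so the backend is free to release GPU memory.
                self._model = None


low = OnDemandModel(loader=lambda: "fake-model", low_vram=True)
low.run(lambda m: f"transcribed with {m}")
low.run(lambda m: f"transcribed with {m}")   # reloads: second cold start

resident = OnDemandModel(loader=lambda: "fake-model", low_vram=False)
resident.run(lambda m: f"transcribed with {m}")
resident.run(lambda m: f"transcribed with {m}")   # reuses the loaded model
```

Dropping the reference is only a request; whether memory is actually returned depends on the backend, and real code may also need an explicit unload or garbage-collection step.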
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 🚀 Getting Started
|
## 🛠️ Deployment
|
||||||
|
|
||||||
### Installation
|
### 📥 Installation
|
||||||
1. Download the latest release.
|
1. **Acquire**: Download `WhisperVoice.exe` from [Releases](https://git.lashman.live/lashman/whisper_voice/releases).
|
||||||
2. Run `WhisperVoice.exe`.
|
2. **Deploy**: Place it anywhere. It is portable.
|
||||||
3. On the first run, the bootstrapper will autonomously provision the necessary runtime environment. This ensures your system remains clean and dependencies are self-contained.
|
3. **Bootstrap**: Run it. The agent will self-provision an isolated Python runtime (~2GB) on first launch.
|
||||||
|
4. **Sync**: Future updates are handled by the **Smart Bootstrapper**, which surgically updates only changed files, respecting your bandwidth and your settings.
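
The "only changed files" behaviour can be sketched as a size check followed by a byte comparison, mirroring the approach in the `bootstrapper.py` change in this diff. This is a minimal sketch, not the project's code: `sync_tree`, `SKIP_DIRS`, and `SKIP_FILES` are illustrative names. One detail worth noting is that `os.walk` reports directory names in `dirs`, so directories such as `__pycache__` are skipped by pruning that list rather than by filtering `files`.

```python
import os
import shutil
import tempfile
from pathlib import Path

SKIP_DIRS = {"__pycache__", ".git"}     # assumed ignore list
SKIP_FILES = {"settings.json"}          # user settings are never overwritten


def _differs(src_file: Path, dst_file: Path) -> bool:
    """Return True if dst_file is missing or its content differs from src_file."""
    if not dst_file.exists():
        return True
    if src_file.stat().st_size != dst_file.stat().st_size:
        return True                     # cheap check first
    return src_file.read_bytes() != dst_file.read_bytes()


def sync_tree(src_root: Path, dst_root: Path) -> int:
    """Copy new or changed files from src_root to dst_root; return how many were copied."""
    copied = 0
    for cur_dir, dirs, files in os.walk(src_root):
        # Prune in place so os.walk never descends into skipped directories.
        dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
        dst_dir = dst_root / Path(cur_dir).relative_to(src_root)
        dst_dir.mkdir(parents=True, exist_ok=True)
        for name in files:
            if name in SKIP_FILES or name.endswith(".pyc"):
                continue
            src_file, dst_file = Path(cur_dir) / name, dst_dir / name
            if _differs(src_file, dst_file):
                shutil.copy2(src_file, dst_file)
                copied += 1
    return copied


with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "src"), Path(tmp, "dst")
    (src / "__pycache__").mkdir(parents=True)
    (src / "main.py").write_text("print('v1')")
    (src / "settings.json").write_text("{}")
    (src / "__pycache__" / "main.pyc").write_bytes(b"\x00")
    first_run = sync_tree(src, dst)               # copies main.py only
    second_run = sync_tree(src, dst)              # nothing changed
    (src / "main.py").write_text("print('v2')")   # same size, new content
    third_run = sync_tree(src, dst)               # caught by the byte comparison
    cache_copied = (dst / "__pycache__").exists()
```

Files that exist only in the destination are left alone, so generated user files such as logs survive an update.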
|
||||||
|
|
||||||
### Usage
|
### 🔧 Troubleshooting
|
||||||
1. **Set Your Trigger**: Configure a global hotkey (default: `F9`) in the settings.
|
* **App crashes on start**: Ensure you have [Microsoft Visual C++ Redistributable 2015-2022](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist) installed.
|
||||||
2. **Speak Freely**: Hold the hotkey (or toggle it) and speak.
|
* **"Simulate Typing" is slow**: Some applications (remote desktops, legacy games) cannot handle the data stream. Lower the typing speed in Settings to ~1200 CPM.
|
||||||
3. **Direct Action**: Your words are instantly transcribed and injected into your active window.
|
* **No Audio**: The agent listens to the **Default Communication Device**. Verify your Windows Sound Control Panel.
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## ⚙️ Configuration
|
## 🌐 Supported Languages
|
||||||
|
|
||||||
The **Settings** panel puts the means of configuration in your hands:
|
The engine understands the following 99 languages. You can lock the focus to a specific language in Settings to improve accuracy, or rely on **Auto-Detect** for fluid multilingual usage.
|
||||||
|
|
||||||
* **Recognition Engine**: Choose the size of the model that fits your hardware capabilities (Tiny to Large). Larger models offer greater precision but require more computing power.
|
| | | | | | |
|
||||||
* **Input Method**: Switch between "Clipboard Paste" and "Simulate Typing" depending on target application restrictions.
|
| :--- | :--- | :--- | :--- | :--- | :--- |
|
||||||
* **Typing Speed**: Adjust the keystroke injection rate. Crank it up to 6000 CPM for instant text delivery.
|
| Afrikaans 🇿🇦 | Albanian 🇦🇱 | Amharic 🇪🇹 | Arabic 🇸🇦 | Armenian 🇦🇲 | Assamese 🇮🇳 |
|
||||||
* **Run on Startup**: Configure the agent to be ready the moment your session begins.
|
| Azerbaijani 🇦🇿 | Bashkir 🇷🇺 | Basque 🇪🇸 | Belarusian 🇧🇾 | Bengali 🇧🇩 | Bosnian 🇧🇦 |
|
||||||
|
| Breton 🇫🇷 | Bulgarian 🇧🇬 | Burmese 🇲🇲 | Castilian 🇪🇸 | Catalan 🇪🇸 | Chinese 🇨🇳 |
|
||||||
|
| Croatian 🇭🇷 | Czech 🇨🇿 | Danish 🇩🇰 | Dutch 🇳🇱 | English 🇺🇸 | Estonian 🇪🇪 |
|
||||||
|
| Faroese 🇫🇴 | Finnish 🇫🇮 | Flemish 🇧🇪 | French 🇫🇷 | Galician 🇪🇸 | Georgian 🇬🇪 |
|
||||||
|
| German 🇩🇪 | Greek 🇬🇷 | Gujarati 🇮🇳 | Haitian 🇭🇹 | Hausa 🇳🇬 | Hawaiian 🇺🇸 |
|
||||||
|
| Hebrew 🇮🇱 | Hindi 🇮🇳 | Hungarian 🇭🇺 | Icelandic 🇮🇸 | Indonesian 🇮🇩 | Italian 🇮🇹 |
|
||||||
|
| Japanese 🇯🇵 | Javanese 🇮🇩 | Kannada 🇮🇳 | Kazakh 🇰🇿 | Khmer 🇰🇭 | Korean 🇰🇷 |
|
||||||
|
| Lao 🇱🇦 | Latin 🇻🇦 | Latvian 🇱🇻 | Lingala 🇨🇩 | Lithuanian 🇱🇹 | Luxembourgish 🇱🇺 |
|
||||||
|
| Macedonian 🇲🇰 | Malagasy 🇲🇬 | Malay 🇲🇾 | Malayalam 🇮🇳 | Maltese 🇲🇹 | Maori 🇳🇿 |
|
||||||
|
| Marathi 🇮🇳 | Moldavian 🇲🇩 | Mongolian 🇲🇳 | Myanmar 🇲🇲 | Nepali 🇳🇵 | Norwegian 🇳🇴 |
|
||||||
|
| Occitan 🇫🇷 | Panjabi 🇮🇳 | Pashto 🇦🇫 | Persian 🇮🇷 | Polish 🇵🇱 | Portuguese 🇵🇹 |
|
||||||
|
| Punjabi 🇮🇳 | Romanian 🇷🇴 | Russian 🇷🇺 | Sanskrit 🇮🇳 | Serbian 🇷🇸 | Shona 🇿🇼 |
|
||||||
|
| Sindhi 🇵🇰 | Sinhala 🇱🇰 | Slovak 🇸🇰 | Slovenian 🇸🇮 | Somali 🇸🇴 | Spanish 🇪🇸 |
|
||||||
|
| Sundanese 🇮🇩 | Swahili 🇰🇪 | Swedish 🇸🇪 | Tagalog 🇵🇭 | Tajik 🇹🇯 | Tamil 🇮🇳 |
|
||||||
|
| Tatar 🇷🇺 | Telugu 🇮🇳 | Thai 🇹🇭 | Tibetan 🇨🇳 | Turkish 🇹🇷 | Turkmen 🇹🇲 |
|
||||||
|
| Ukrainian 🇺🇦 | Urdu 🇵🇰 | Uzbek 🇺🇿 | Vietnamese 🇻🇳 | Welsh 🏴 | Yiddish 🇮🇱 |
|
||||||
|
| Yoruba 🇳🇬 | | | | | |
|
||||||
|
|
||||||
---
|
<br>
|
||||||
|
<br>
|
||||||
|
|
||||||
## 🤝 Mutual Aid
|
<div align="center">
|
||||||
|
|
||||||
This project thrives on community collaboration. If you have improvements, fixes, or ideas, you are encouraged to contribute. We build better systems when we build them together, horizontally and transparently.
|
### ⚖️ PUBLIC DOMAIN (CC0 1.0)
|
||||||
|
*No Rights Reserved. No Gods. No Masters. No Managers.*
|
||||||
|
|
||||||
* **Report Issues**: If something breaks, let us know.
|
Credit to **OpenAI** (Whisper), **Systran** (Faster-Whisper), and **Silero** (VAD).
|
||||||
* **Contribute Code**: The source is open. Fork it, improve it, share it.
|
|
||||||
|
|
||||||
---
|
</div>
|
||||||
|
|
||||||
*Built with local processing libraries and Qt.*
|
|
||||||
*No gods, no cloud managers.*
|
|
||||||
|
|||||||
28
RELEASE_NOTES.md
Normal file
@@ -0,0 +1,28 @@
|
|||||||
|
# Release v1.0.4
|
||||||
|
|
||||||
|
**"The Compatibility Update"**
|
||||||
|
|
||||||
|
This release focuses on maximum stability across different hardware configurations (AMD, Intel, Nvidia) and fixing startup crashes related to corrupted models or missing drivers.
|
||||||
|
|
||||||
|
## 🛠️ Critical Fixes
|
||||||
|
|
||||||
|
### 1. Robust CPU Fallback (AMD / Intel Support)
|
||||||
|
* **Problem**: Previously, if an AMD user tried to run the app, it would crash instantly because it tried to load Nvidia CUDA libraries by default.
|
||||||
|
* **Fix**: The app now **silently detects** if CUDA initialization fails (due to missing DLLs or incompatible hardware) and **automatically falls back to CPU mode**.
|
||||||
|
* **Result**: The app "just works" on any Windows machine, regardless of GPU.
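
The fallback described above amounts to trying devices in order and keeping the first one that initializes. The sketch below illustrates that idea under stated assumptions: `load_with_fallback` and `fake_factory` are hypothetical names, and in the real app the factory would presumably wrap the speech model's constructor rather than return a string.

```python
from typing import Callable, Sequence, Tuple


def load_with_fallback(
    factory: Callable[[str], object],
    devices: Sequence[str] = ("cuda", "cpu"),
) -> Tuple[object, str]:
    """Try each device in order and return (model, device) for the first that works."""
    last_error = None
    for device in devices:
        try:
            return factory(device), device
        except Exception as exc:   # missing DLLs, unsupported GPU, and so on
            last_error = exc
    raise RuntimeError(f"No usable compute device: {last_error}")


def fake_factory(device: str) -> str:
    """Stand-in for a real model constructor; pretends CUDA is unavailable."""
    if device == "cuda":
        raise OSError("CUDA runtime not found")
    return f"model-on-{device}"


model, device = load_with_fallback(fake_factory)
```

Returning the device that actually loaded lets the caller show or log which mode is in use, even though the switch itself is silent.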
|
||||||
|
|
||||||
|
### 2. Startup Crash Protection
|
||||||
|
* **Problem**: If `faster_whisper` was imported before checking for valid drivers, the app would crash on launch for some users.
|
||||||
|
* **Fix**: Implemented **Lazy Loading** for the AI engine. The app now starts the UI first, and only loads the heavy AI libraries inside a safety block that catches errors.
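
A guarded, deferred import is one way to get this behaviour. In the sketch below, `try_import` is a hypothetical helper, and the standard-library `json` module stands in for a heavy dependency so the example runs anywhere.

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(name: str) -> Optional[ModuleType]:
    """Import a module on first use; return None instead of crashing if it fails."""
    try:
        return importlib.import_module(name)
    except Exception as exc:   # ImportError, or OSError from a missing native DLL
        print(f"Could not load {name}: {exc}")
        return None


# The UI would start first; the heavy engine is only imported when needed.
json_mod = try_import("json")                        # stands in for a heavy library
missing = try_import("module_that_does_not_exist")   # simulates a broken install
```

The caller can then show an error dialog when the result is `None` instead of the process dying at import time.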
|
||||||
|
|
||||||
|
### 3. Corrupt Model Auto-Repair
|
||||||
|
* **Problem**: Interrupted downloads could leave a corrupted model folder, preventing the app from ever starting again.
|
||||||
|
* **Fix**: If the app detects a "vocabulary missing" or invalid config error, it will now **automatically delete the corrupt folder** and allow you to re-download it cleanly.
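
One way to approximate this repair step is to check a model folder for the files the downloader is expected to produce and remove the folder if any are missing. Note the swap: the release note describes reacting to a load error, whereas this sketch checks files up front. `repair_if_corrupt` is a hypothetical name, and the file list is taken from the download worker in this change set, so the app's real validity check may differ.

```python
import shutil
import tempfile
from pathlib import Path

# Assumed file list, copied from the download worker in this change set.
REQUIRED_FILES = ("config.json", "model.bin", "tokenizer.json", "vocabulary.json")


def repair_if_corrupt(model_dir: Path) -> bool:
    """Delete model_dir if it exists but is incomplete; return True if it was removed."""
    if not model_dir.exists():
        return False
    missing = [f for f in REQUIRED_FILES if not (model_dir / f).is_file()]
    if missing:
        shutil.rmtree(model_dir, ignore_errors=True)
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    broken = Path(tmp) / "faster-whisper-small"
    broken.mkdir()
    (broken / "config.json").write_text("{}")   # interrupted download: one file only
    removed = repair_if_corrupt(broken)
    still_there = broken.exists()
```

After removal, the normal download path can run again against a clean directory.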
|
||||||
|
|
||||||
|
### 4. Windows DLL Injection
|
||||||
|
* **Fix**: Added explicit DLL path injection for `nvidia-cublas` and `nvidia-cudnn` to ensure Python 3.8+ can find the required CUDA libraries on Windows systems that don't have them in PATH.
|
||||||
|
|
||||||
|
## 📦 Installation
|
||||||
|
1. Download `WhisperVoice.exe` below.
|
||||||
|
2. Replace your existing `.exe`.
|
||||||
|
3. Run it.
|
||||||
151
bootstrapper.py
@@ -245,62 +245,106 @@ class Bootstrapper:
|
|||||||
|
|
||||||
req_file = self.source_path / "requirements.txt"
|
req_file = self.source_path / "requirements.txt"
|
||||||
|
|
||||||
|
# Use --prefer-binary to avoid building from source on Windows if possible
|
||||||
|
# Use --no-warn-script-location to reduce noise
|
||||||
|
# CRITICAL: Force --only-binary for llama-cpp-python to prevent picking new source-only versions
|
||||||
|
cmd = [
|
||||||
|
str(self.python_path / "python.exe"), "-m", "pip", "install",
|
||||||
|
"--prefer-binary",
|
||||||
|
"--only-binary", "llama-cpp-python",
|
||||||
|
"--extra-index-url", "https://abetlen.github.io/llama-cpp-python/whl/cpu",
|
||||||
|
"-r", str(req_file)
|
||||||
|
]
|
||||||
|
|
||||||
process = subprocess.Popen(
|
process = subprocess.Popen(
|
||||||
[str(self.python_path / "python.exe"), "-m", "pip", "install", "-r", str(req_file)],
|
cmd,
|
||||||
stdout=subprocess.PIPE,
|
stdout=subprocess.PIPE,
|
||||||
stderr=subprocess.STDOUT,
|
stderr=subprocess.STDOUT, # Merge stderr into stdout
|
||||||
text=True,
|
text=True,
|
||||||
cwd=str(self.python_path),
|
cwd=str(self.python_path),
|
||||||
creationflags=subprocess.CREATE_NO_WINDOW
|
creationflags=subprocess.CREATE_NO_WINDOW
|
||||||
)
|
)
|
||||||
|
|
||||||
|
output_buffer = []
|
||||||
for line in process.stdout:
|
for line in process.stdout:
|
||||||
if self.ui: self.ui.set_detail(line.strip()[:60])
|
line_stripped = line.strip()
|
||||||
process.wait()
|
if self.ui: self.ui.set_detail(line_stripped[:60])
|
||||||
|
output_buffer.append(line_stripped)
|
||||||
|
log(line_stripped)
|
||||||
|
|
||||||
|
return_code = process.wait()
|
||||||
|
|
||||||
|
if return_code != 0:
|
||||||
|
err_msg = "\n".join(output_buffer[-15:]) # Show last 15 lines
|
||||||
|
raise RuntimeError(f"Pip install failed (Exit code {return_code}):\n{err_msg}")
|
||||||
|
|
||||||
def refresh_app_source(self):
|
def refresh_app_source(self):
|
||||||
"""Refresh app source files. Skips if already exists to save time."""
|
"""
|
||||||
# Optimization: If app/main.py exists, skip update to improve startup speed.
|
Smartly updates app source files by only copying changed files.
|
||||||
# The user can delete the 'runtime' folder to force an update.
|
Preserves user settings and reduces disk I/O.
|
||||||
if (self.app_path / "main.py").exists():
|
"""
|
||||||
log("App already exists. Skipping update.")
|
if self.ui: self.ui.set_status("Checking for updates...")
|
||||||
return True
|
|
||||||
|
|
||||||
if self.ui: self.ui.set_status("Updating app files...")
|
|
||||||
|
|
||||||
try:
|
try:
|
||||||
# Preserve settings.json if it exists
|
# 1. Ensure destination exists
|
||||||
settings_path = self.app_path / "settings.json"
|
if not self.app_path.exists():
|
||||||
temp_settings = None
|
self.app_path.mkdir(parents=True, exist_ok=True)
|
||||||
if settings_path.exists():
|
|
||||||
try:
|
|
||||||
temp_settings = settings_path.read_bytes()
|
|
||||||
except:
|
|
||||||
log("Failed to backup settings.json, it involves risk of data loss.")
|
|
||||||
|
|
||||||
if self.app_path.exists():
|
# 2. Walk source and sync
|
||||||
shutil.rmtree(self.app_path, ignore_errors=True)
|
# source_path is the temporary bundled folder
|
||||||
|
# app_path is the persistent runtime folder
|
||||||
|
|
||||||
shutil.copytree(
|
changes_made = 0
|
||||||
self.source_path,
|
|
||||||
self.app_path,
|
|
||||||
ignore=shutil.ignore_patterns(
|
|
||||||
'__pycache__', '*.pyc', '.git', 'venv',
|
|
||||||
'build', 'dist', '*.egg-info', 'runtime'
|
|
||||||
)
|
|
||||||
)
|
|
||||||
|
|
||||||
# Restore settings.json
|
for src_dir, dirs, files in os.walk(self.source_path):
|
||||||
if temp_settings:
|
# Determine relative path from source root
|
||||||
try:
|
rel_path = Path(src_dir).relative_to(self.source_path)
|
||||||
settings_path.write_bytes(temp_settings)
|
dst_dir = self.app_path / rel_path
|
||||||
log("Restored settings.json")
|
|
||||||
except:
|
# Ensure directory exists
|
||||||
log("Failed to restore settings.json")
|
if not dst_dir.exists():
|
||||||
|
dst_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
for file in files:
|
||||||
|
# Skip ignored files
|
||||||
|
if file in ['__pycache__', '.git', 'settings.json'] or file.endswith('.pyc'):
|
||||||
|
continue
|
||||||
|
|
||||||
|
src_file = Path(src_dir) / file
|
||||||
|
dst_file = dst_dir / file
|
||||||
|
|
||||||
|
# Check if update needed
|
||||||
|
should_copy = False
|
||||||
|
if not dst_file.exists():
|
||||||
|
should_copy = True
|
||||||
|
else:
|
||||||
|
# Compare size first (fast)
|
||||||
|
if src_file.stat().st_size != dst_file.stat().st_size:
|
||||||
|
should_copy = True
|
||||||
|
else:
|
||||||
|
# Compare content (slower but accurate)
|
||||||
|
# Only read if size matches to verify diff
|
||||||
|
if src_file.read_bytes() != dst_file.read_bytes():
|
||||||
|
should_copy = True
|
||||||
|
|
||||||
|
if should_copy:
|
||||||
|
shutil.copy2(src_file, dst_file)
|
||||||
|
changes_made += 1
|
||||||
|
if self.ui: self.ui.set_detail(f"Updated: {file}")
|
||||||
|
|
||||||
|
# 3. Cleanup logic (Optional: remove files in dest that are not in source)
|
||||||
|
# For now, we only add/update to prevent deleting generated user files (logs, etc)
|
||||||
|
|
||||||
|
if changes_made > 0:
|
||||||
|
log(f"Update complete. {changes_made} files changed.")
|
||||||
|
else:
|
||||||
|
log("App is up to date.")
|
||||||
|
|
||||||
return True
|
return True
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
log(f"Error refreshing app source: {e}")
|
log(f"Error refreshing app source: {e}")
|
||||||
|
# Fallback to nuclear option if sync fails completely?
|
||||||
|
# No, 'smart_sync' failing might mean permissions, nuclear wouldn't help.
|
||||||
return False
|
return False
|
||||||
|
|
||||||
def run_app(self):
|
def run_app(self):
|
||||||
@@ -323,22 +367,51 @@ class Bootstrapper:
|
|||||||
messagebox.showerror("WhisperVoice Error", f"Failed to launch app: {e}")
|
messagebox.showerror("WhisperVoice Error", f"Failed to launch app: {e}")
|
||||||
return False
|
return False
|
||||||
|
|
||||||
|
def check_dependencies(self):
|
||||||
|
"""Check if critical dependencies are importable in the embedded python."""
|
||||||
|
if not self.is_python_ready(): return False
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Check for core libs that might be missing
|
||||||
|
# We use a subprocess to check imports in the runtime environment
|
||||||
|
subprocess.check_call(
|
||||||
|
[str(self.python_path / "python.exe"), "-c", "import faster_whisper; import llama_cpp; import PySide6"],
|
||||||
|
stdout=subprocess.DEVNULL,
|
||||||
|
stderr=subprocess.DEVNULL,
|
||||||
|
cwd=str(self.python_path),
|
||||||
|
creationflags=subprocess.CREATE_NO_WINDOW
|
||||||
|
)
|
||||||
|
return True
|
||||||
|
except (subprocess.CalledProcessError, FileNotFoundError):
|
||||||
|
return False
|
||||||
|
|
||||||
def setup_and_run(self):
|
def setup_and_run(self):
|
||||||
"""Full setup/update and run flow."""
|
"""Full setup/update and run flow."""
|
||||||
try:
|
try:
|
||||||
|
# 1. Ensure basics
|
||||||
if not self.is_python_ready():
|
if not self.is_python_ready():
|
||||||
self.download_python()
|
self.download_python()
|
||||||
|
self._fix_pth_file() # Ensure pth is fixed immediately after download
|
||||||
self.install_pip()
|
self.install_pip()
|
||||||
self.install_packages()
|
# self.install_packages() # We'll do this in the dependency check step now
|
||||||
|
|
||||||
# Always refresh source to ensure we have the latest bundled code
|
# Always refresh source to ensure we have the latest bundled code
|
||||||
self.refresh_app_source()
|
self.refresh_app_source()
|
||||||
|
|
||||||
|
# 2. Check and Install Dependencies
|
||||||
|
# We do this AFTER refreshing source so we have the latest requirements.txt
|
||||||
|
if not self.check_dependencies():
|
||||||
|
log("Dependencies missing or incomplete. Installing...")
|
||||||
|
self.install_packages()
|
||||||
|
|
||||||
# Launch
|
# Launch
|
||||||
if self.run_app():
|
if self.run_app():
|
||||||
if self.ui: self.ui.root.quit()
|
if self.ui: self.ui.root.quit()
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
messagebox.showerror("Setup Error", f"Installation failed: {e}")
|
if self.ui:
|
||||||
|
import tkinter.messagebox as mb
|
||||||
|
mb.showerror("Setup Error", f"Installation failed: {e}") # Improved error visibility
|
||||||
|
log(f"Fatal error: {e}")
|
||||||
import traceback
|
import traceback
|
||||||
traceback.print_exc()
|
traceback.print_exc()
|
||||||
|
|
||||||
|
|||||||
BIN
dist/WhisperVoice.exe
vendored
Normal file
Binary file not shown.
387
main.py
@@ -9,6 +9,31 @@ app_dir = os.path.dirname(os.path.abspath(__file__))
|
|||||||
if app_dir not in sys.path:
|
if app_dir not in sys.path:
|
||||||
sys.path.insert(0, app_dir)
|
sys.path.insert(0, app_dir)
|
||||||
|
|
||||||
|
# -----------------------------------------------------------------------------
|
||||||
|
# WINDOWS DLL FIX (CRITICAL for Portable CUDA)
|
||||||
|
# Python 3.8+ on Windows requires explicit DLL directory addition.
|
||||||
|
# -----------------------------------------------------------------------------
|
||||||
|
if os.name == 'nt' and hasattr(os, 'add_dll_directory'):
|
||||||
|
try:
|
||||||
|
from pathlib import Path
|
||||||
|
# Scan sys.path for site-packages
|
||||||
|
for p in sys.path:
|
||||||
|
path_obj = Path(p)
|
||||||
|
if path_obj.name == 'site-packages' and path_obj.exists():
|
||||||
|
nvidia_path = path_obj / "nvidia"
|
||||||
|
if nvidia_path.exists():
|
||||||
|
for subdir in nvidia_path.iterdir():
|
||||||
|
# Add 'bin' folder from each nvidia stub (cublas, cudnn, etc.)
|
||||||
|
bin_path = subdir / "bin"
|
||||||
|
if bin_path.exists():
|
||||||
|
os.add_dll_directory(str(bin_path))
|
||||||
|
# Also try adding site-packages itself just in case
|
||||||
|
# os.add_dll_directory(str(path_obj))
|
||||||
|
break
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
# -----------------------------------------------------------------------------
|
||||||
|
|
||||||
from PySide6.QtWidgets import QApplication, QFileDialog, QMessageBox
|
from PySide6.QtWidgets import QApplication, QFileDialog, QMessageBox
|
||||||
from PySide6.QtCore import QObject, Slot, Signal, QThread, Qt, QUrl
|
from PySide6.QtCore import QObject, Slot, Signal, QThread, Qt, QUrl
|
||||||
from PySide6.QtQml import QQmlApplicationEngine
|
from PySide6.QtQml import QQmlApplicationEngine
|
||||||
@@ -19,6 +44,7 @@ from src.ui.bridge import UIBridge
|
|||||||
from src.ui.tray import SystemTray
|
from src.ui.tray import SystemTray
|
||||||
from src.core.audio_engine import AudioEngine
|
from src.core.audio_engine import AudioEngine
|
||||||
from src.core.transcriber import WhisperTranscriber
|
from src.core.transcriber import WhisperTranscriber
|
||||||
|
from src.core.llm_engine import LLMEngine
|
||||||
from src.core.hotkey_manager import HotkeyManager
|
from src.core.hotkey_manager import HotkeyManager
|
||||||
from src.core.config import ConfigManager
|
from src.core.config import ConfigManager
|
||||||
from src.utils.injector import InputInjector
|
from src.utils.injector import InputInjector
|
||||||
@@ -87,7 +113,7 @@ def _silent_shutdown_hook(exc_type, exc_value, exc_tb):
|
|||||||
sys.excepthook = _silent_shutdown_hook
|
sys.excepthook = _silent_shutdown_hook
|
||||||
|
|
||||||
class DownloadWorker(QThread):
|
class DownloadWorker(QThread):
|
||||||
"""Background worker for model downloads."""
|
"""Background worker for model downloads with REAL progress."""
|
||||||
progress = Signal(int)
|
progress = Signal(int)
|
||||||
finished = Signal()
|
finished = Signal()
|
||||||
error = Signal(str)
|
error = Signal(str)
|
||||||
@@ -98,33 +124,144 @@ class DownloadWorker(QThread):
|
|||||||
|
|
||||||
def run(self):
|
def run(self):
|
||||||
try:
|
try:
|
||||||
from faster_whisper import download_model
|
import requests
|
||||||
|
from tqdm import tqdm
|
||||||
model_path = get_models_path()
|
model_path = get_models_path()
|
||||||
# Download to a specific subdirectory to keep things clean and predictable
|
# Determine what to download
|
||||||
# This matches the logic in transcriber.py which looks for this specific path
|
|
||||||
dest_dir = model_path / f"faster-whisper-{self.model_name}"
|
dest_dir = model_path / f"faster-whisper-{self.model_name}"
|
||||||
logging.info(f"Downloading Model '{self.model_name}' to {dest_dir}...")
|
repo_id = f"Systran/faster-whisper-{self.model_name}"
|
||||||
|
files = ["config.json", "model.bin", "tokenizer.json", "vocabulary.json"]
|
||||||
|
base_url = f"https://huggingface.co/{repo_id}/resolve/main"
|
||||||
|
|
||||||
# Ensure parent exists
|
dest_dir.mkdir(parents=True, exist_ok=True)
|
||||||
model_path.mkdir(parents=True, exist_ok=True)
|
logging.info(f"Downloading {self.model_name} to {dest_dir}...")
|
||||||
|
|
||||||
# output_dir in download_model specifies where the model files are saved
|
# 1. Calculate Total Size
|
||||||
download_model(self.model_name, output_dir=str(dest_dir))
|
total_size = 0
|
||||||
|
file_sizes = {}
|
||||||
|
|
||||||
|
with requests.Session() as s:
|
||||||
|
for fname in files:
|
||||||
|
url = f"{base_url}/{fname}"
|
||||||
|
head = s.head(url, allow_redirects=True)
|
||||||
|
if head.status_code == 200:
|
||||||
|
size = int(head.headers.get('content-length', 0))
|
||||||
|
file_sizes[fname] = size
|
||||||
|
total_size += size
|
||||||
|
else:
|
||||||
|
# Fallback for vocabulary.json vs vocabulary.txt
|
||||||
|
if fname == "vocabulary.json":
|
||||||
|
# No vocabulary.txt fallback is attempted; a missing file is skipped.
|
||||||
|
# faster-whisper repositories usually ship vocabulary.json.
|
||||||
|
pass
|
||||||
|
|
||||||
|
# 2. Download loop
|
||||||
|
downloaded_bytes = 0
|
||||||
|
|
||||||
|
with requests.Session() as s:
|
||||||
|
for fname in files:
|
||||||
|
if fname not in file_sizes: continue
|
||||||
|
|
||||||
|
url = f"{base_url}/{fname}"
|
||||||
|
dest_file = dest_dir / fname
|
||||||
|
|
||||||
|
# No resume logic: an existing file is always overwritten, for reliability.
|
||||||
|
|
||||||
|
resp = s.get(url, stream=True)
|
||||||
|
resp.raise_for_status()
|
||||||
|
|
||||||
|
with open(dest_file, 'wb') as f:
|
||||||
|
for chunk in resp.iter_content(chunk_size=8192):
|
||||||
|
if chunk:
|
||||||
|
f.write(chunk)
|
||||||
|
downloaded_bytes += len(chunk)
|
||||||
|
|
||||||
|
# Emit Progress
|
||||||
|
if total_size > 0:
|
||||||
|
pct = int((downloaded_bytes / total_size) * 100)
|
||||||
|
self.progress.emit(pct)
|
||||||
|
|
||||||
self.finished.emit()
|
self.finished.emit()
|
||||||
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logging.error(f"Download failed: {e}")
|
logging.error(f"Download failed: {e}")
|
||||||
self.error.emit(str(e))
|
self.error.emit(str(e))
|
||||||
|
|
||||||
|
class LLMDownloadWorker(QThread):
|
||||||
|
progress = Signal(int)
|
||||||
|
finished = Signal()
|
||||||
|
error = Signal(str)
|
||||||
|
|
||||||
|
def __init__(self, parent=None):
|
||||||
|
super().__init__(parent)
|
||||||
|
|
||||||
|
def run(self):
|
||||||
|
try:
|
||||||
|
import requests
|
||||||
|
# Support one model for now
|
||||||
|
url = "https://huggingface.co/hugging-quants/Llama-3.2-1B-Instruct-Q4_K_M-GGUF/resolve/main/llama-3.2-1b-instruct-q4_k_m.gguf?download=true"
|
||||||
|
fname = "llama-3.2-1b-instruct-q4_k_m.gguf"
|
||||||
|
|
||||||
|
model_path = get_models_path() / "llm" / "llama-3.2-1b-instruct"
|
||||||
|
model_path.mkdir(parents=True, exist_ok=True)
|
||||||
|
dest_file = model_path / fname
|
||||||
|
|
||||||
|
# No existing-file check: a download request always fetches the file again.
|
||||||
|
|
||||||
|
with requests.Session() as s:
|
||||||
|
head = s.head(url, allow_redirects=True)
|
||||||
|
total_size = int(head.headers.get('content-length', 0))
|
||||||
|
|
||||||
|
resp = s.get(url, stream=True)
|
||||||
|
resp.raise_for_status()
|
||||||
|
|
||||||
|
downloaded = 0
|
||||||
|
with open(dest_file, 'wb') as f:
|
||||||
|
for chunk in resp.iter_content(chunk_size=8192):
|
||||||
|
if chunk:
|
||||||
|
f.write(chunk)
|
||||||
|
downloaded += len(chunk)
|
||||||
|
if total_size > 0:
|
||||||
|
pct = int((downloaded / total_size) * 100)
|
||||||
|
self.progress.emit(pct)
|
||||||
|
|
||||||
|
self.finished.emit()
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logging.error(f"LLM Download failed: {e}")
|
||||||
|
self.error.emit(str(e))
|
||||||
|
|
||||||
|
class LLMWorker(QThread):
|
||||||
|
finished = Signal(str)
|
||||||
|
|
||||||
|
def __init__(self, llm_engine, text, mode, parent=None):
|
||||||
|
super().__init__(parent)
|
||||||
|
self.llm_engine = llm_engine
|
||||||
|
self.text = text
|
||||||
|
self.mode = mode
|
||||||
|
|
||||||
|
def run(self):
|
||||||
|
try:
|
||||||
|
corrected = self.llm_engine.correct_text(self.text, self.mode)
|
||||||
|
self.finished.emit(corrected)
|
||||||
|
except Exception as e:
|
||||||
|
logging.error(f"LLMWorker crashed: {e}")
|
||||||
|
self.finished.emit(self.text) # Fail safe: return original text
|
||||||
|
|
||||||
|
|
||||||
class TranscriptionWorker(QThread):
|
class TranscriptionWorker(QThread):
|
||||||
finished = Signal(str)
|
finished = Signal(str)
|
||||||
def __init__(self, transcriber, audio_data, is_file=False, parent=None):
|
def __init__(self, transcriber, audio_data, is_file=False, parent=None, task_override=None):
|
||||||
super().__init__(parent)
|
super().__init__(parent)
|
||||||
self.transcriber = transcriber
|
self.transcriber = transcriber
|
||||||
self.audio_data = audio_data
|
self.audio_data = audio_data
|
||||||
self.is_file = is_file
|
self.is_file = is_file
|
||||||
|
self.task_override = task_override
|
||||||
def run(self):
|
def run(self):
|
||||||
text = self.transcriber.transcribe(self.audio_data, is_file=self.is_file)
|
text = self.transcriber.transcribe(self.audio_data, is_file=self.is_file, task=self.task_override)
|
||||||
self.finished.emit(text)
|
self.finished.emit(text)
|
||||||
|
|
||||||
class WhisperApp(QObject):
|
class WhisperApp(QObject):
|
||||||
@@ -156,6 +293,7 @@ class WhisperApp(QObject):
|
|||||||
self.bridge.settingChanged.connect(self.on_settings_changed)
|
self.bridge.settingChanged.connect(self.on_settings_changed)
|
||||||
self.bridge.hotkeysEnabledChanged.connect(self.on_hotkeys_enabled_toggle)
|
self.bridge.hotkeysEnabledChanged.connect(self.on_hotkeys_enabled_toggle)
|
||||||
self.bridge.downloadRequested.connect(self.on_download_requested)
|
self.bridge.downloadRequested.connect(self.on_download_requested)
|
||||||
|
self.bridge.llmDownloadRequested.connect(self.on_llm_download_requested)
|
||||||
|
|
||||||
self.engine.rootContext().setContextProperty("ui", self.bridge)
|
self.engine.rootContext().setContextProperty("ui", self.bridge)
|
||||||
|
|
||||||
@@ -166,13 +304,20 @@ class WhisperApp(QObject):
|
|||||||
self.tray.transcribe_file_requested.connect(self.transcribe_file)
|
self.tray.transcribe_file_requested.connect(self.transcribe_file)
|
||||||
|
|
||||||
# Init Tooltip
|
# Init Tooltip
|
||||||
hotkey = self.config.get("hotkey")
|
from src.utils.formatters import format_hotkey
|
||||||
self.tray.setToolTip(f"Whisper Voice - Press {hotkey} to Record")
|
self.format_hotkey = format_hotkey # Store ref
|
||||||
|
|
||||||
|
hk1 = self.format_hotkey(self.config.get("hotkey"))
|
||||||
|
hk2 = self.format_hotkey(self.config.get("hotkey_translate"))
|
||||||
|
self.tray.setToolTip(f"Whisper Voice\nTranscribe: {hk1}\nTranslate: {hk2}")
|
||||||
|
|
||||||
# 3. Logic Components Placeholders
|
# 3. Logic Components Placeholders
|
||||||
self.audio_engine = None
|
self.audio_engine = None
|
||||||
self.transcriber = None
|
self.transcriber = None
|
||||||
self.hotkey_manager = None
|
self.llm_engine = None
|
||||||
|
self.hk_transcribe = None
|
||||||
|
self.hk_correct = None
|
||||||
|
self.hk_translate = None
|
||||||
self.overlay_root = None
|
self.overlay_root = None
|
||||||
|
|
||||||
# 4. Start Loader
|
# 4. Start Loader
|
||||||
@@ -222,12 +367,23 @@ class WhisperApp(QObject):
|
|||||||
self.settings_root.setVisible(False)
|
self.settings_root.setVisible(False)
|
||||||
|
|
||||||
# Install Low-Level Window Hook for Transparent Hit Test
|
# Install Low-Level Window Hook for Transparent Hit Test
|
||||||
# We must keep a reference to 'self.hook' so it isn't GC'd
|
try:
|
||||||
# scale = self.overlay_root.devicePixelRatio()
|
from src.utils.window_hook import WindowHook
|
||||||
# self.hook = WindowHook(int(self.overlay_root.winId()), 500, 300, scale)
|
hwnd = self.overlay_root.winId()
|
||||||
# self.hook.install()
|
# Initial scale from config
|
||||||
|
scale = float(self.config.get("ui_scale"))
|
||||||
|
|
||||||
# NOTE: HitTest hook will be installed here later
|
# Current Overlay Dimensions
|
||||||
|
win_w = int(460 * scale)
|
||||||
|
win_h = int(180 * scale)
|
||||||
|
|
||||||
|
self.window_hook = WindowHook(hwnd, win_w, win_h, initial_scale=scale)
|
||||||
|
self.window_hook.install()
|
||||||
|
|
||||||
|
# Initial state: Disabled because we start inactive
|
||||||
|
self.window_hook.set_enabled(False)
|
||||||
|
except Exception as e:
|
||||||
|
logging.error(f"Failed to install WindowHook: {e}")
|
||||||
|
|
||||||
def center_overlay(self):
|
def center_overlay(self):
|
||||||
"""Calculates and sets the Overlay position above the taskbar."""
|
"""Calculates and sets the Overlay position above the taskbar."""
|
||||||
@@ -255,14 +411,77 @@ class WhisperApp(QObject):
|
|||||||
self.audio_engine.set_visualizer_callback(self.bridge.update_amplitude)
|
self.audio_engine.set_visualizer_callback(self.bridge.update_amplitude)
|
||||||
self.audio_engine.set_silence_callback(self.on_silence_detected)
|
self.audio_engine.set_silence_callback(self.on_silence_detected)
|
||||||
self.transcriber = WhisperTranscriber()
|
self.transcriber = WhisperTranscriber()
|
||||||
self.hotkey_manager = HotkeyManager()
|
self.llm_engine = LLMEngine()
|
||||||
self.hotkey_manager.triggered.connect(self.toggle_recording)
|
|
||||||
self.hotkey_manager.start()
|
# Dual Hotkey Managers
|
||||||
|
self.hk_transcribe = HotkeyManager(config_key="hotkey")
|
||||||
|
self.hk_transcribe.triggered.connect(lambda: self.toggle_recording(task_override="transcribe", task_mode="standard"))
|
||||||
|
self.hk_transcribe.start()
|
||||||
|
|
||||||
|
self.hk_correct = HotkeyManager(config_key="hotkey_correct")
|
||||||
|
self.hk_correct.triggered.connect(lambda: self.toggle_recording(task_override="transcribe", task_mode="correct"))
|
||||||
|
self.hk_correct.start()
|
||||||
|
|
||||||
|
self.hk_translate = HotkeyManager(config_key="hotkey_translate")
|
||||||
|
self.hk_translate.triggered.connect(lambda: self.toggle_recording(task_override="translate", task_mode="standard"))
|
||||||
|
self.hk_translate.start()
|
||||||
|
|
||||||
self.bridge.update_status("Ready")
|
self.bridge.update_status("Ready")
|
||||||
|
|
||||||
def run(self):
|
def run(self):
|
||||||
sys.exit(self.qt_app.exec())
|
sys.exit(self.qt_app.exec())
|
||||||
|
|
||||||
|
@Slot(str, str)
|
||||||
|
@Slot(str)
|
||||||
|
def toggle_recording(self, task_override=None, task_mode="standard"):
|
||||||
|
"""
|
||||||
|
task_override: 'transcribe' or 'translate' (passed to whisper)
|
||||||
|
task_mode: 'standard' or 'correct' (determines post-processing)
|
||||||
|
"""
|
||||||
|
if task_mode == "correct":
|
||||||
|
self.current_task_requires_llm = True
|
||||||
|
elif task_mode == "standard":
|
||||||
|
self.current_task_requires_llm = False # Explicit reset
|
||||||
|
|
||||||
|
# Actual Logic
|
||||||
|
if self.bridge.isRecording:
|
||||||
|
logging.info("Stopping recording...")
|
||||||
|
# stop_recording returns the numpy array directly
|
||||||
|
audio_data = self.audio_engine.stop_recording()
|
||||||
|
|
||||||
|
self.bridge.isRecording = False
|
||||||
|
self.bridge.update_status("Processing...")
|
||||||
|
self.bridge.isProcessing = True
|
||||||
|
|
||||||
|
# Save task override for processing
|
||||||
|
self.last_task_override = task_override
|
||||||
|
|
||||||
|
if audio_data is not None and len(audio_data) > 0:
|
||||||
|
# Use the task that started this session, or the override if provided
|
||||||
|
final_task = getattr(self, "current_recording_task", self.config.get("task"))
|
||||||
|
if task_override: final_task = task_override
|
||||||
|
|
||||||
|
self.worker = TranscriptionWorker(self.transcriber, audio_data, parent=self, task_override=final_task)
|
||||||
|
self.worker.finished.connect(self.on_transcription_done)
|
||||||
|
self.worker.start()
|
||||||
|
else:
|
||||||
|
self.bridge.update_status("Ready")
|
||||||
|
self.bridge.isProcessing = False
|
||||||
|
|
||||||
|
else:
|
||||||
|
# START RECORDING
|
||||||
|
if self.bridge.isProcessing:
|
||||||
|
logging.warning("Ignored toggle request: Transcription in progress.")
|
||||||
|
return
|
||||||
|
|
||||||
|
intended_task = task_override if task_override else self.config.get("task")
|
||||||
|
self.current_recording_task = intended_task
|
||||||
|
|
||||||
|
logging.info(f"Starting recording... (Task: {intended_task}, Mode: {task_mode})")
|
||||||
|
self.audio_engine.start_recording()
|
||||||
|
self.bridge.isRecording = True
|
||||||
|
self.bridge.update_status(f"Recording ({intended_task})...")
|
||||||
|
|
||||||
@Slot()
|
@Slot()
|
||||||
def quit_app(self):
|
def quit_app(self):
|
||||||
logging.info("Shutting down...")
|
logging.info("Shutting down...")
|
||||||
@@ -275,7 +494,8 @@ class WhisperApp(QObject):
|
|||||||
except: pass
|
except: pass
|
||||||
self.bridge.stats_worker.stop()
|
self.bridge.stats_worker.stop()
|
||||||
|
|
||||||
if self.hotkey_manager: self.hotkey_manager.stop()
|
if self.hk_transcribe: self.hk_transcribe.stop()
|
||||||
|
if self.hk_translate: self.hk_translate.stop()
|
||||||
|
if self.hk_correct: self.hk_correct.stop()
|
||||||
|
|
||||||
# Close all QML windows to ensure bindings stop before Python objects die
|
# Close all QML windows to ensure bindings stop before Python objects die
|
||||||
if self.overlay_root:
|
if self.overlay_root:
|
||||||
@@ -350,10 +570,16 @@ class WhisperApp(QObject):
|
|||||||
print(f"Setting Changed: {key} = {value}")
|
print(f"Setting Changed: {key} = {value}")
|
||||||
|
|
||||||
# 1. Hotkey Reload
|
# 1. Hotkey Reload
|
||||||
if key == "hotkey":
|
if key in ["hotkey", "hotkey_translate", "hotkey_correct"]:
|
||||||
if self.hotkey_manager: self.hotkey_manager.reload_hotkey()
|
if self.hk_transcribe: self.hk_transcribe.reload_hotkey()
|
||||||
|
if self.hk_correct: self.hk_correct.reload_hotkey()
|
||||||
|
if self.hk_translate: self.hk_translate.reload_hotkey()
|
||||||
|
|
||||||
if self.tray:
|
if self.tray:
|
||||||
self.tray.setToolTip(f"Whisper Voice - Press {value} to Record")
|
hk1 = self.format_hotkey(self.config.get("hotkey"))
|
||||||
|
hk3 = self.format_hotkey(self.config.get("hotkey_correct"))
|
||||||
|
hk2 = self.format_hotkey(self.config.get("hotkey_translate"))
|
||||||
|
self.tray.setToolTip(f"Whisper Voice\nTranscribe: {hk1}\nCorrect: {hk3}\nTranslate: {hk2}")
|
||||||
|
|
||||||
# 2. AI Model Reload (Heavy)
|
# 2. AI Model Reload (Heavy)
|
||||||
if key in ["model_size", "compute_device", "compute_type"]:
|
if key in ["model_size", "compute_device", "compute_type"]:
|
||||||
@@ -456,6 +682,8 @@ class WhisperApp(QObject):
|
|||||||
file_path, _ = QFileDialog.getOpenFileName(None, "Select Audio", "", "Audio (*.mp3 *.wav *.flac *.m4a *.ogg)")
|
file_path, _ = QFileDialog.getOpenFileName(None, "Select Audio", "", "Audio (*.mp3 *.wav *.flac *.m4a *.ogg)")
|
||||||
if file_path:
|
if file_path:
|
||||||
self.bridge.update_status("Thinking...")
|
self.bridge.update_status("Thinking...")
|
||||||
|
# File transcription uses the task configured in settings (no task override is passed).
|
||||||
self.worker = TranscriptionWorker(self.transcriber, file_path, is_file=True, parent=self)
|
self.worker = TranscriptionWorker(self.transcriber, file_path, is_file=True, parent=self)
|
||||||
self.worker.finished.connect(self.on_transcription_done)
|
self.worker.finished.connect(self.on_transcription_done)
|
||||||
self.worker.start()
|
self.worker.start()
|
||||||
@@ -463,48 +691,73 @@ class WhisperApp(QObject):
|
|||||||
@Slot()
|
@Slot()
|
||||||
def on_silence_detected(self):
|
def on_silence_detected(self):
|
||||||
from PySide6.QtCore import QMetaObject, Qt
|
from PySide6.QtCore import QMetaObject, Qt
|
||||||
|
# Silence stops the current recording: toggle_recording is invoked with no arguments,
|
||||||
|
# so the stop branch uses the task stored in current_recording_task.
|
||||||
QMetaObject.invokeMethod(self, "toggle_recording", Qt.QueuedConnection)
|
QMetaObject.invokeMethod(self, "toggle_recording", Qt.QueuedConnection)
|
||||||
|
|
||||||
@Slot()
|
|
||||||
def toggle_recording(self):
|
|
||||||
if not self.audio_engine: return
|
|
||||||
|
|
||||||
# Prevent starting a new recording while we are still transcribing the last one
|
|
||||||
if self.bridge.isProcessing:
|
|
||||||
logging.warning("Ignored toggle request: Transcription in progress.")
|
|
||||||
return
|
|
||||||
|
|
||||||
if self.audio_engine.recording:
|
|
||||||
self.bridge.update_status("Thinking...")
|
|
||||||
self.bridge.isRecording = False
|
|
||||||
self.bridge.isProcessing = True # Start Processing
|
|
||||||
audio_data = self.audio_engine.stop_recording()
|
|
||||||
self.worker = TranscriptionWorker(self.transcriber, audio_data, parent=self)
|
|
||||||
self.worker.finished.connect(self.on_transcription_done)
|
|
||||||
self.worker.start()
|
|
||||||
else:
|
|
||||||
self.bridge.update_status("Recording")
|
|
||||||
self.bridge.isRecording = True
|
|
||||||
self.audio_engine.start_recording()
|
|
||||||
|
|
||||||
@Slot(bool)
|
@Slot(bool)
|
||||||
def on_ui_toggle_request(self, state):
|
def on_ui_toggle_request(self, state):
|
||||||
if state != self.audio_engine.recording:
|
if state != self.audio_engine.recording:
|
||||||
self.toggle_recording()
|
self.toggle_recording() # Default behavior for UI clicks
|
||||||
|
|
||||||
@Slot(str)
|
@Slot(str)
|
||||||
def on_transcription_done(self, text: str):
|
def on_transcription_done(self, text: str):
|
||||||
self.bridge.update_status("Ready")
|
self.bridge.update_status("Ready")
|
||||||
self.bridge.isProcessing = False # End Processing
|
self.bridge.isProcessing = False # Reset here; set back to True below if we chain to the LLM
|
||||||
|
|
||||||
|
# Check LLM Settings -> AND check if the current task requested it
|
||||||
|
llm_enabled = self.config.get("llm_enabled")
|
||||||
|
requires_llm = getattr(self, "current_task_requires_llm", False)
|
||||||
|
|
||||||
|
# Run LLM correction only when both conditions hold:
|
||||||
|
# 1. llm_enabled is on in settings (global safety switch).
|
||||||
|
# 2. current_task_requires_llm is True (set by the Correct hotkey).
|
||||||
|
# The regular hotkey always injects the raw transcription.
|
||||||
|
|
||||||
|
if text and llm_enabled and requires_llm:
|
||||||
|
# Chain to LLM
|
||||||
|
self.bridge.isProcessing = True
|
||||||
|
self.bridge.update_status("Correcting...")
|
||||||
|
mode = self.config.get("llm_mode")
|
||||||
|
self.llm_worker = LLMWorker(self.llm_engine, text, mode, parent=self)
|
||||||
|
self.llm_worker.finished.connect(self.on_llm_done)
|
||||||
|
self.llm_worker.start()
|
||||||
|
return
|
||||||
|
|
||||||
|
self.bridge.isProcessing = False
|
||||||
if text:
|
if text:
|
||||||
method = self.config.get("input_method")
|
method = self.config.get("input_method")
|
||||||
speed = int(self.config.get("typing_speed"))
|
speed = int(self.config.get("typing_speed"))
|
||||||
InputInjector.inject_text(text, method, speed)
|
InputInjector.inject_text(text, method, speed)
|
||||||
|
|
||||||
|
@Slot(str)
|
||||||
|
def on_llm_done(self, text: str):
|
||||||
|
self.bridge.update_status("Ready")
|
||||||
|
self.bridge.isProcessing = False
|
||||||
|
if text:
|
||||||
|
method = self.config.get("input_method")
|
||||||
|
speed = int(self.config.get("typing_speed"))
|
||||||
|
InputInjector.inject_text(text, method, speed)
|
||||||
|
|
||||||
|
# Cleanup
|
||||||
|
if hasattr(self, 'llm_worker') and self.llm_worker:
|
||||||
|
self.llm_worker.deleteLater()
|
||||||
|
self.llm_worker = None
|
||||||
|
|
||||||
@Slot(bool)
|
@Slot(bool)
|
||||||
def on_hotkeys_enabled_toggle(self, state):
|
def on_hotkeys_enabled_toggle(self, state):
|
||||||
if self.hotkey_manager:
|
if self.hk_transcribe: self.hk_transcribe.set_enabled(state)
|
||||||
self.hotkey_manager.set_enabled(state)
|
if self.hk_translate: self.hk_translate.set_enabled(state)
|
||||||
|
if self.hk_correct: self.hk_correct.set_enabled(state)
|
||||||
|
|
||||||
@Slot(str)
|
@Slot(str)
|
||||||
def on_download_requested(self, size):
|
def on_download_requested(self, size):
|
||||||
@@ -519,6 +772,19 @@ class WhisperApp(QObject):
|
|||||||
self.download_worker.error.connect(self.on_download_error)
|
self.download_worker.error.connect(self.on_download_error)
|
||||||
self.download_worker.start()
|
self.download_worker.start()
|
||||||
|
|
||||||
|
@Slot()
|
||||||
|
def on_llm_download_requested(self):
|
||||||
|
if self.bridge.isDownloading: return
|
||||||
|
|
||||||
|
self.bridge.update_status("Downloading LLM...")
|
||||||
|
self.bridge.isDownloading = True
|
||||||
|
|
||||||
|
self.llm_dl_worker = LLMDownloadWorker(parent=self)
|
||||||
|
self.llm_dl_worker.progress.connect(self.on_loader_progress) # Reuses the existing progress slot
|
||||||
|
self.llm_dl_worker.finished.connect(self.on_download_finished) # Reuses same cleanup
|
||||||
|
self.llm_dl_worker.error.connect(self.on_download_error)
|
||||||
|
self.llm_dl_worker.start()
|
||||||
|
|
||||||
def on_download_finished(self):
|
def on_download_finished(self):
|
||||||
self.bridge.isDownloading = False
|
self.bridge.isDownloading = False
|
||||||
self.bridge.update_status("Ready")
|
self.bridge.update_status("Ready")
|
||||||
@@ -531,6 +797,25 @@ class WhisperApp(QObject):
|
|||||||
self.bridge.update_status("Error")
|
self.bridge.update_status("Error")
|
||||||
logging.error(f"Download Error: {err}")
|
logging.error(f"Download Error: {err}")
|
||||||
|
|
||||||
|
@Slot(bool)
|
||||||
|
def on_ui_toggle_request(self, is_recording):
|
||||||
|
"""Called when recording state changes."""
|
||||||
|
# Update Window Hook to allow clicking if active
|
||||||
|
is_active = is_recording or self.bridge.isProcessing
|
||||||
|
if hasattr(self, 'window_hook'):
|
||||||
|
self.window_hook.set_enabled(is_active)
|
||||||
|
|
||||||
|
@Slot(bool)
|
||||||
|
def on_processing_changed(self, is_processing):
|
||||||
|
is_active = self.bridge.isRecording or is_processing
|
||||||
|
if hasattr(self, 'window_hook'):
|
||||||
|
self.window_hook.set_enabled(is_active)
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
import sys
|
||||||
app = WhisperApp()
|
app = WhisperApp()
|
||||||
app.run()
|
|
||||||
|
# Connect extra signal for processing state
|
||||||
|
app.bridge.isProcessingChanged.connect(app.on_processing_changed)
|
||||||
|
|
||||||
|
sys.exit(app.run())
|
||||||
|
|||||||
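The download loops in `DownloadWorker.run` and `LLMDownloadWorker.run` above emit `progress` once per 8192-byte chunk, so a large model file produces many signals carrying the same integer percentage. The following is a minimal sketch, not the project's code, of emitting only when the percentage changes. `percent_updates` is a hypothetical helper name, and the chunk sizes stand in for `len(chunk)` from `resp.iter_content`.

```python
from typing import Iterable, Iterator


def percent_updates(chunk_sizes: Iterable[int], total_size: int) -> Iterator[int]:
    """Yield an integer percentage each time it changes, capped at 100."""
    if total_size <= 0:
        return  # Unknown total size: there is no meaningful percentage to report
    downloaded = 0
    last_pct = -1
    for size in chunk_sizes:
        downloaded += size
        pct = min(100, downloaded * 100 // total_size)
        if pct != last_pct:
            last_pct = pct
            yield pct  # In a worker thread this would be self.progress.emit(pct)


# Four 8192-byte chunks against a 32768-byte total
print(list(percent_updates([8192] * 4, 32768)))  # [25, 50, 75, 100]
```

Capping at 100 guards against a `content-length` header that understates the real size, which would otherwise push the progress bar past its maximum.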
@@ -39,39 +39,37 @@ def build_portable():
|
|||||||
print("⏳ This may take 5-10 minutes...")
|
print("⏳ This may take 5-10 minutes...")
|
||||||
|
|
||||||
PyInstaller.__main__.run([
|
PyInstaller.__main__.run([
|
||||||
"main.py", # Entry point
|
"bootstrapper.py", # Entry point (Tiny Installer)
|
||||||
"--name=WhisperVoice", # EXE name
|
"--name=WhisperVoice", # EXE name
|
||||||
"--onefile", # Single EXE (slower startup but portable)
|
"--onefile", # Single EXE
|
||||||
"--noconsole", # No terminal window
|
"--noconsole", # No terminal window
|
||||||
"--clean", # Clean cache
|
"--clean", # Clean cache
|
||||||
*add_data_args, # Bundled assets
|
|
||||||
|
|
||||||
# Heavy libraries that need special collection
|
# Bundle the app source to be extracted by bootstrapper
|
||||||
"--collect-all", "faster_whisper",
|
# The bootstrapper expects 'app_source' folder in bundled resources
|
||||||
"--collect-all", "ctranslate2",
|
"--add-data", f"src{os.pathsep}app_source/src",
|
||||||
"--collect-all", "PySide6",
|
"--add-data", f"main.py{os.pathsep}app_source",
|
||||||
"--collect-all", "torch",
|
"--add-data", f"requirements.txt{os.pathsep}app_source",
|
||||||
"--collect-all", "numpy",
|
|
||||||
|
|
||||||
# Hidden imports (modules imported dynamically)
|
# Add assets
|
||||||
"--hidden-import", "keyboard",
|
"--add-data", f"src/ui/qml{os.pathsep}app_source/src/ui/qml",
|
||||||
"--hidden-import", "pyperclip",
|
"--add-data", f"assets{os.pathsep}app_source/assets",
|
||||||
"--hidden-import", "psutil",
|
|
||||||
"--hidden-import", "pynvml",
|
|
||||||
"--hidden-import", "sounddevice",
|
|
||||||
"--hidden-import", "scipy",
|
|
||||||
"--hidden-import", "scipy.signal",
|
|
||||||
"--hidden-import", "huggingface_hub",
|
|
||||||
"--hidden-import", "tokenizers",
|
|
||||||
|
|
||||||
# Qt plugins
|
# No heavy collections!
|
||||||
"--hidden-import", "PySide6.QtQuickControls2",
|
# The bootstrapper uses internal pip to install everything.
|
||||||
"--hidden-import", "PySide6.QtQuick.Controls",
|
|
||||||
|
|
||||||
# Icon (convert to .ico for Windows)
|
# Exclude heavy modules to ensure this exe stays tiny
|
||||||
# "--icon=icon.ico", # Uncomment if you have a .ico file
|
"--exclude-module", "faster_whisper",
|
||||||
|
"--exclude-module", "torch",
|
||||||
|
"--exclude-module", "PySide6",
|
||||||
|
"--exclude-module", "llama_cpp",
|
||||||
|
|
||||||
|
|
||||||
|
# Icon
|
||||||
|
# "--icon=icon.ico",
|
||||||
])
|
])
|
||||||
|
|
||||||
|
|
||||||
print("\n" + "="*60)
|
print("\n" + "="*60)
|
||||||
print("✅ BUILD COMPLETE!")
|
print("✅ BUILD COMPLETE!")
|
||||||
print("="*60)
|
print("="*60)
|
||||||
|
|||||||
73
publish_release.py
Normal file
@@ -0,0 +1,73 @@
|
|||||||
|
import os
|
||||||
|
import requests
|
||||||
|
import mimetypes
|
||||||
|
|
||||||
|
# Configuration
|
||||||
|
API_URL = "https://git.lashman.live/api/v1"
|
||||||
|
OWNER = "lashman"
|
||||||
|
REPO = "whisper_voice"
|
||||||
|
TAG = "v1.0.4"
|
||||||
|
TOKEN = os.environ.get("GITEA_TOKEN", "") # Read from the environment (GITEA_TOKEN is an assumed variable name); never commit an access token
|
||||||
|
EXE_PATH = r"dist\WhisperVoice.exe"
|
||||||
|
|
||||||
|
headers = {
|
||||||
|
"Authorization": f"token {TOKEN}",
|
||||||
|
"Accept": "application/json"
|
||||||
|
}
|
||||||
|
|
||||||
|
def create_release():
|
||||||
|
print(f"Creating release {TAG}...")
|
||||||
|
|
||||||
|
# Read Release Notes
|
||||||
|
with open("RELEASE_NOTES.md", "r", encoding="utf-8") as f:
|
||||||
|
notes = f.read()
|
||||||
|
|
||||||
|
# Create Release
|
||||||
|
payload = {
|
||||||
|
"tag_name": TAG,
|
||||||
|
"name": TAG,
|
||||||
|
"body": notes,
|
||||||
|
"draft": False,
|
||||||
|
"prerelease": False
|
||||||
|
}
|
||||||
|
|
||||||
|
url = f"{API_URL}/repos/{OWNER}/{REPO}/releases"
|
||||||
|
resp = requests.post(url, json=payload, headers=headers)
|
||||||
|
|
||||||
|
if resp.status_code == 201:
|
||||||
|
print("Release created successfully!")
|
||||||
|
return resp.json()
|
||||||
|
elif resp.status_code == 409:
|
||||||
|
print("Release already exists. Fetching it...")
|
||||||
|
# Get by tag
|
||||||
|
resp = requests.get(f"{API_URL}/repos/{OWNER}/{REPO}/releases/tags/{TAG}", headers=headers)
|
||||||
|
if resp.status_code == 200:
|
||||||
|
return resp.json()
|
||||||
|
|
||||||
|
print(f"Failed to create release: {resp.status_code} - {resp.text}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
def upload_asset(release_id, file_path):
|
||||||
|
print(f"Uploading asset: {file_path}...")
|
||||||
|
filename = os.path.basename(file_path)
|
||||||
|
|
||||||
|
with open(file_path, "rb") as f:
|
||||||
|
data = f.read()
|
||||||
|
|
||||||
|
url = f"{API_URL}/repos/{OWNER}/{REPO}/releases/{release_id}/assets?name={filename}"
|
||||||
|
|
||||||
|
# Gitea API expects raw body
|
||||||
|
resp = requests.post(url, data=data, headers=headers)
|
||||||
|
|
||||||
|
if resp.status_code == 201:
|
||||||
|
print(f"Uploaded {filename} successfully!")
|
||||||
|
else:
|
||||||
|
print(f"Failed to upload asset: {resp.status_code} - {resp.text}")
|
||||||
|
|
||||||
|
def main():
|
||||||
|
release = create_release()
|
||||||
|
if release:
|
||||||
|
upload_asset(release["id"], EXE_PATH)
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
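`publish_release.py` above authenticates against the Gitea API with a token. A minimal sketch, assuming a hypothetical `GITEA_TOKEN` environment variable, of loading that token at runtime and failing early when it is missing. `load_token` and `auth_headers` are illustrative names, not functions from the script.

```python
import os


def load_token(env_var: str = "GITEA_TOKEN") -> str:
    """Return the API token from the environment, or raise if it is unset or empty."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable before publishing")
    return token


def auth_headers(token: str) -> dict:
    """Build the same header shape the script sends with each request."""
    return {"Authorization": f"token {token}", "Accept": "application/json"}
```

Raising before any request is made keeps a missing token from surfacing later as an opaque authentication failure from the server.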
||||||
@@ -5,6 +5,7 @@
|
|||||||
faster-whisper>=1.0.0
|
faster-whisper>=1.0.0
|
||||||
torch>=2.0.0
|
torch>=2.0.0
|
||||||
|
|
||||||
|
|
||||||
# UI Framework
|
# UI Framework
|
||||||
PySide6>=6.6.0
|
PySide6>=6.6.0
|
||||||
|
|
||||||
@@ -28,3 +29,6 @@ huggingface-hub>=0.20.0
|
|||||||
pystray>=0.19.0
|
pystray>=0.19.0
|
||||||
Pillow>=10.0.0
|
Pillow>=10.0.0
|
||||||
darkdetect>=0.8.0
|
darkdetect>=0.8.0
|
||||||
|
|
||||||
|
# LLM / Correction
|
||||||
|
llama-cpp-python>=0.2.20
|
||||||
|
|||||||
@@ -16,6 +16,8 @@ from src.core.paths import get_base_path
|
|||||||
# Default Configuration
|
# Default Configuration
|
||||||
DEFAULT_SETTINGS = {
|
DEFAULT_SETTINGS = {
|
||||||
"hotkey": "f8",
|
"hotkey": "f8",
|
||||||
|
"hotkey_translate": "f10",
|
||||||
|
"hotkey_correct": "f9", # New: Transcribe + Correct
|
||||||
"model_size": "small",
|
"model_size": "small",
|
||||||
"input_device": None, # Device ID (int) or Name (str), None = Default
|
"input_device": None, # Device ID (int) or Name (str), None = Default
|
||||||
"save_recordings": False, # Save .wav files for debugging
|
"save_recordings": False, # Save .wav files for debugging
|
||||||
@@ -38,13 +40,25 @@ DEFAULT_SETTINGS = {
|
|||||||
|
|
||||||
# AI - Advanced
|
# AI - Advanced
|
||||||
"language": "auto", # "auto" or ISO code
|
"language": "auto", # "auto" or ISO code
|
||||||
|
"task": "transcribe", # "transcribe" or "translate" (to English)
|
||||||
"compute_device": "auto", # "auto", "cuda", "cpu"
|
"compute_device": "auto", # "auto", "cuda", "cpu"
|
||||||
"compute_type": "int8", # "int8", "float16", "float32"
|
"compute_type": "int8", # "int8", "float16", "float32"
|
||||||
"beam_size": 5,
|
"beam_size": 5,
|
||||||
"best_of": 5,
|
"best_of": 5,
|
||||||
"vad_filter": True,
|
"vad_filter": True,
|
||||||
"no_repeat_ngram_size": 0,
|
"no_repeat_ngram_size": 0,
|
||||||
"condition_on_previous_text": True
|
"condition_on_previous_text": True,
|
||||||
|
"initial_prompt": "Mm-hmm. Okay, let's go. I speak in full sentences.", # Default: Forces punctuation
|
||||||
|
|
||||||
|
# LLM Correction
|
||||||
|
"llm_enabled": False,
|
||||||
|
"llm_mode": "Standard", # "Grammar", "Standard", "Rewrite"
|
||||||
|
"llm_model_name": "llama-3.2-1b-instruct",
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
# Low VRAM Mode
|
||||||
|
"unload_models_after_use": False # If True, models are unloaded immediately to free VRAM
|
||||||
}
|
}
|
||||||
|
|
||||||
class ConfigManager:
|
class ConfigManager:
|
||||||
@@ -94,9 +108,9 @@ class ConfigManager:
|
|||||||
except Exception as e:
|
except Exception as e:
|
||||||
logging.error(f"Failed to save settings: {e}")
|
logging.error(f"Failed to save settings: {e}")
|
||||||
|
|
||||||
def get(self, key: str) -> Any:
|
def get(self, key: str, default: Any = None) -> Any:
|
||||||
"""Get a setting value."""
|
"""Get a setting value."""
|
||||||
return self.data.get(key, DEFAULT_SETTINGS.get(key))
|
return self.data.get(key, DEFAULT_SETTINGS.get(key, default))
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
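The updated `ConfigManager.get` above resolves a key in three steps: the saved settings, then `DEFAULT_SETTINGS`, then the caller's `default`. A minimal standalone sketch of that lookup order follows. `get_setting` is a hypothetical free function, and the dictionary is a trimmed stand-in for the real defaults.

```python
from typing import Any

# Trimmed stand-in for the DEFAULT_SETTINGS dict shown in the diff
DEFAULT_SETTINGS = {"hotkey": "f8", "hotkey_translate": "f10", "hotkey_correct": "f9"}


def get_setting(data: dict, key: str, default: Any = None) -> Any:
    """Look up key in saved data, then built-in defaults, then the caller's default."""
    return data.get(key, DEFAULT_SETTINGS.get(key, default))


saved = {"hotkey": "f7"}
print(get_setting(saved, "hotkey"))          # f7 (saved value wins)
print(get_setting(saved, "hotkey_correct"))  # f9 (built-in default)
print(get_setting(saved, "missing", "n/a"))  # n/a (caller's default)
```

Note that a key saved with the value `None` is returned as `None`, because `dict.get` only falls back when the key is absent.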
@@ -30,15 +30,16 @@ class HotkeyManager(QObject):
|
|||||||
|
|
||||||
triggered = Signal()
|
triggered = Signal()
|
||||||
|
|
||||||
def __init__(self, hotkey: str = "f8"):
|
def __init__(self, config_key: str = "hotkey"):
|
||||||
"""
|
"""
|
||||||
Initialize the HotkeyManager.
|
Initialize the HotkeyManager.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
hotkey (str): The global hotkey string description. Default: "f8".
|
config_key (str): The configuration key to look up (e.g. "hotkey").
|
||||||
"""
|
"""
|
||||||
super().__init__()
|
super().__init__()
|
||||||
self.hotkey = hotkey
|
self.config_key = config_key
|
||||||
|
self.hotkey = "f8" # Placeholder
|
||||||
self.is_listening = False
|
self.is_listening = False
|
||||||
self._enabled = True
|
self._enabled = True
|
||||||
|
|
||||||
@@ -58,9 +59,9 @@ class HotkeyManager(QObject):
|
|||||||
|
|
||||||
from src.core.config import ConfigManager
|
from src.core.config import ConfigManager
|
||||||
config = ConfigManager()
|
config = ConfigManager()
|
||||||
self.hotkey = config.get("hotkey")
|
self.hotkey = config.get(self.config_key)
|
||||||
|
|
||||||
logging.info(f"Registering global hotkey: {self.hotkey}")
|
logging.info(f"Registering global hotkey ({self.config_key}): {self.hotkey}")
|
||||||
try:
|
try:
|
||||||
# We don't suppress=True here because we want the app to see keys during recording
|
# We don't suppress=True here because we want the app to see keys during recording
|
||||||
# (Wait, actually if we are recording we WANT keyboard to see it,
|
# (Wait, actually if we are recording we WANT keyboard to see it,
|
||||||
|
|||||||
120
src/core/languages.py
Normal file
@@ -0,0 +1,120 @@
|
|||||||
|
"""
|
||||||
|
Supported Languages Module
|
||||||
|
==========================
|
||||||
|
Full list of languages supported by OpenAI Whisper.
|
||||||
|
Maps ISO codes to display names.
|
||||||
|
"""
|
||||||
|
|
||||||
|
LANGUAGES = {
|
||||||
|
"auto": "Auto Detect",
|
||||||
|
"af": "Afrikaans",
|
||||||
|
"sq": "Albanian",
|
||||||
|
"am": "Amharic",
|
||||||
|
"ar": "Arabic",
|
||||||
|
"hy": "Armenian",
|
||||||
|
"as": "Assamese",
|
||||||
|
"az": "Azerbaijani",
|
||||||
|
"ba": "Bashkir",
|
||||||
|
"eu": "Basque",
|
||||||
|
"be": "Belarusian",
|
||||||
|
"bn": "Bengali",
|
||||||
|
"bs": "Bosnian",
|
||||||
|
"br": "Breton",
|
||||||
|
"bg": "Bulgarian",
|
||||||
|
"my": "Burmese",
|
||||||
|
"ca": "Catalan",
|
||||||
|
"zh": "Chinese",
|
||||||
|
"hr": "Croatian",
|
||||||
|
"cs": "Czech",
|
||||||
|
"da": "Danish",
|
||||||
|
"nl": "Dutch",
|
||||||
|
"en": "English",
|
||||||
|
"et": "Estonian",
|
||||||
|
"fo": "Faroese",
|
||||||
|
"fi": "Finnish",
|
||||||
|
"fr": "French",
|
||||||
|
"gl": "Galician",
|
||||||
|
"ka": "Georgian",
|
||||||
|
"de": "German",
|
||||||
|
"el": "Greek",
|
||||||
|
"gu": "Gujarati",
|
||||||
|
"ht": "Haitian",
|
||||||
|
"ha": "Hausa",
|
||||||
|
"haw": "Hawaiian",
|
||||||
|
"he": "Hebrew",
|
||||||
|
"hi": "Hindi",
|
||||||
|
"hu": "Hungarian",
|
||||||
|
"is": "Icelandic",
|
||||||
|
"id": "Indonesian",
|
||||||
|
"it": "Italian",
|
||||||
|
"ja": "Japanese",
|
||||||
|
"jw": "Javanese",
|
||||||
|
"kn": "Kannada",
|
||||||
|
"kk": "Kazakh",
|
||||||
|
"km": "Khmer",
|
||||||
|
"ko": "Korean",
|
||||||
|
"lo": "Lao",
|
||||||
|
"la": "Latin",
|
||||||
|
"lv": "Latvian",
|
||||||
|
"ln": "Lingala",
|
||||||
|
"lt": "Lithuanian",
|
||||||
|
"lb": "Luxembourgish",
|
||||||
|
"mk": "Macedonian",
|
||||||
|
"mg": "Malagasy",
|
||||||
|
"ms": "Malay",
|
||||||
|
"ml": "Malayalam",
|
||||||
|
"mt": "Maltese",
|
||||||
|
"mi": "Maori",
|
||||||
|
"mr": "Marathi",
|
||||||
|
"mn": "Mongolian",
|
||||||
|
"ne": "Nepali",
|
||||||
|
"no": "Norwegian",
|
||||||
|
"oc": "Occitan",
|
||||||
|
"pa": "Punjabi",
|
||||||
|
"ps": "Pashto",
|
||||||
|
"fa": "Persian",
|
||||||
|
"pl": "Polish",
|
||||||
|
"pt": "Portuguese",
|
||||||
|
"ro": "Romanian",
|
||||||
|
"ru": "Russian",
|
||||||
|
"sa": "Sanskrit",
|
||||||
|
"sr": "Serbian",
|
||||||
|
"sn": "Shona",
|
||||||
|
"sd": "Sindhi",
|
||||||
|
"si": "Sinhala",
|
||||||
|
"sk": "Slovak",
|
||||||
|
"sl": "Slovenian",
|
||||||
|
"so": "Somali",
|
||||||
|
"es": "Spanish",
|
||||||
|
"su": "Sundanese",
|
||||||
|
"sw": "Swahili",
|
||||||
|
"sv": "Swedish",
|
||||||
|
"tl": "Tagalog",
|
||||||
|
"tg": "Tajik",
|
||||||
|
"ta": "Tamil",
|
||||||
|
"tt": "Tatar",
|
||||||
|
"te": "Telugu",
|
||||||
|
"th": "Thai",
|
||||||
|
"bo": "Tibetan",
|
||||||
|
"tr": "Turkish",
|
||||||
|
"tk": "Turkmen",
|
||||||
|
"uk": "Ukrainian",
|
||||||
|
"ur": "Urdu",
|
||||||
|
"uz": "Uzbek",
|
||||||
|
"vi": "Vietnamese",
|
||||||
|
"cy": "Welsh",
|
||||||
|
"yi": "Yiddish",
|
||||||
|
"yo": "Yoruba",
|
||||||
|
}
|
||||||
|
|
||||||
|
def get_language_names():
|
||||||
|
return list(LANGUAGES.values())
|
||||||
|
|
||||||
|
def get_code_by_name(name):
|
||||||
|
for code, lang in LANGUAGES.items():
|
||||||
|
if lang == name:
|
||||||
|
return code
|
||||||
|
return "auto"
|
||||||
|
|
||||||
|
def get_name_by_code(code):
|
||||||
|
return LANGUAGES.get(code, "Auto Detect")
|
||||||
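`get_code_by_name` scans the whole table on every call. With roughly a hundred entries that is cheap, but a reverse dictionary built once gives the same result with a single lookup. The sketch below is one possible variant, not the module's code. The three-entry `LANGUAGES` subset and the `NAME_TO_CODE` name are assumptions for illustration.

```python
# Hypothetical, abbreviated subset of the LANGUAGES table.
LANGUAGES = {
    "auto": "Auto Detect",
    "en": "English",
    "de": "German",
}

# Reverse mapping built once: display name -> ISO code.
NAME_TO_CODE = {name: code for code, name in LANGUAGES.items()}


def get_code_by_name(name: str) -> str:
    """Return the code for a display name, or "auto" if the name is unknown."""
    return NAME_TO_CODE.get(name, "auto")


def get_name_by_code(code: str) -> str:
    """Return the display name for a code, or "Auto Detect" if unknown."""
    return LANGUAGES.get(code, "Auto Detect")
```

Both helpers keep the fallback behaviour of the originals: unknown input maps to auto-detection instead of raising.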
185
src/core/llm_engine.py
Normal file
@@ -0,0 +1,185 @@
|
|||||||
|
"""
|
||||||
|
LLM Engine Module.
|
||||||
|
==================
|
||||||
|
|
||||||
|
Handles interaction with the local Llama 3.2 1B model for transcription correction.
|
||||||
|
Uses llama-cpp-python for efficient local inference.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import logging
|
||||||
|
from typing import Optional
|
||||||
|
from src.core.paths import get_models_path
|
||||||
|
from src.core.config import ConfigManager
|
||||||
|
|
||||||
|
try:
|
||||||
|
from llama_cpp import Llama
|
||||||
|
except ImportError:
|
||||||
|
Llama = None
|
||||||
|
|
||||||
|
class LLMEngine:
|
||||||
|
"""
|
||||||
|
Manages the Llama model and performs text correction/rewriting.
|
||||||
|
"""
|
||||||
|
def __init__(self):
|
||||||
|
self.config = ConfigManager()
|
||||||
|
self.model = None
|
||||||
|
self.current_model_path = None
|
||||||
|
|
||||||
|
# --- Mode 1: Grammar Only (Strict) ---
|
||||||
|
self.prompt_grammar = (
|
||||||
|
"You are a text correction tool. "
|
||||||
|
"Correct the grammar/spelling. Do not change punctuation or capitalization styles. "
|
||||||
|
"Do not remove any words (including profanity). Output ONLY the result."
|
||||||
|
"\n\nExample:\nInput: 'damn it works'\nOutput: 'damn it works'"
|
||||||
|
)
|
||||||
|
|
||||||
|
# --- Mode 2: Standard (Grammar + Punctuation + Caps) ---
|
||||||
|
self.prompt_standard = (
|
||||||
|
"You are a text correction tool. "
|
||||||
|
"Standardize the grammar, punctuation, and capitalization. "
|
||||||
|
"Do not remove any words (including profanity). Output ONLY the result."
|
||||||
|
"\n\nExample:\nInput: 'damn it works'\nOutput: 'Damn it works.'"
|
||||||
|
)
|
||||||
|
|
||||||
|
# --- Mode 3: Rewrite (Tone-Aware Polish) ---
|
||||||
|
self.prompt_rewrite = (
|
||||||
|
"You are a text rewriting tool. Improve flow/clarity but keep the exact tone and vocabulary. "
|
||||||
|
"Do not remove any words (including profanity). Output ONLY the result."
|
||||||
|
"\n\nExample:\nInput: 'damn it works'\nOutput: 'Damn, it works.'"
|
||||||
|
)
|
||||||
|
|
||||||
|
def load_model(self) -> bool:
|
||||||
|
"""
|
||||||
|
Loads the LLM model if it exists.
|
||||||
|
Returns True if successful, False otherwise.
|
||||||
|
"""
|
||||||
|
if Llama is None:
|
||||||
|
logging.error("llama-cpp-python not installed.")
|
||||||
|
return False
|
||||||
|
|
||||||
|
model_name = self.config.get("llm_model_name", "llama-3.2-1b-instruct")
|
||||||
|
model_dir = get_models_path() / "llm" / model_name
|
||||||
|
model_file = model_dir / "llama-3.2-1b-instruct-q4_k_m.gguf"
|
||||||
|
|
||||||
|
if not model_file.exists():
|
||||||
|
logging.warning(f"LLM Model not found at: {model_file}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
if self.model and self.current_model_path == str(model_file):
|
||||||
|
return True
|
||||||
|
|
||||||
|
try:
|
||||||
|
logging.info(f"Loading LLM from {model_file}...")
|
||||||
|
n_gpu_layers = 0
|
||||||
|
try:
|
||||||
|
import torch
|
||||||
|
if torch.cuda.is_available():
|
||||||
|
n_gpu_layers = -1
|
||||||
|
except Exception:  # torch missing or CUDA probe failed; stay on CPU
|
||||||
|
pass
|
||||||
|
|
||||||
|
self.model = Llama(
|
||||||
|
model_path=str(model_file),
|
||||||
|
n_gpu_layers=n_gpu_layers,
|
||||||
|
n_ctx=2048,
|
||||||
|
verbose=False
|
||||||
|
)
|
||||||
|
self.current_model_path = str(model_file)
|
||||||
|
logging.info("LLM loaded successfully.")
|
||||||
|
return True
|
||||||
|
except Exception as e:
|
||||||
|
logging.error(f"Failed to load LLM: {e}")
|
||||||
|
self.model = None
|
||||||
|
return False
|
||||||
|
|
||||||
|
def correct_text(self, text: str, mode: str = "Standard") -> str:
|
||||||
|
"""Corrects or rewrites the provided text."""
|
||||||
|
if not text or not text.strip():
|
||||||
|
return text
|
||||||
|
|
||||||
|
if not self.model:
|
||||||
|
if not self.load_model():
|
||||||
|
return text
|
||||||
|
|
||||||
|
logging.info(f"LLM Processing ({mode}): '{text}'")
|
||||||
|
|
||||||
|
system_prompt = self.prompt_standard
|
||||||
|
if mode == "Grammar": system_prompt = self.prompt_grammar
|
||||||
|
elif mode == "Rewrite": system_prompt = self.prompt_rewrite
|
||||||
|
|
||||||
|
# PREFIX INJECTION TECHNIQUE
|
||||||
|
# We end the prompt with the start of the assistant's answer specifically phrased to force compliance.
|
||||||
|
# "Here is the processed output:" forces it into a completion mode rather than a refusal mode.
|
||||||
|
prefix_injection = "Here is the processed output:\n"
|
||||||
|
|
||||||
|
prompt = (
|
||||||
|
f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|>"
|
||||||
|
f"<|start_header_id|>user<|end_header_id|>\n\nProcess this input:\n{text}<|eot_id|>"
|
||||||
|
f"<|start_header_id|>assistant<|end_header_id|>\n\n{prefix_injection}"
|
||||||
|
)
|
||||||
|
|
||||||
|
try:
|
||||||
|
output = self.model(
|
||||||
|
prompt,
|
||||||
|
max_tokens=512,
|
||||||
|
stop=["<|eot_id|>"],
|
||||||
|
echo=False,
|
||||||
|
temperature=0.1
|
||||||
|
)
|
||||||
|
|
||||||
|
result = output['choices'][0]['text'].strip()
|
||||||
|
|
||||||
|
# llama-cpp-python returns only the continuation after the prefilled prefix,
|
||||||
|
# so `result` should already hold just the processed text.
|
||||||
|
|
||||||
|
# Refusal Detection (Safety Net)
|
||||||
|
refusal_triggers = [
|
||||||
|
"I cannot", "I can't", "I am unable", "I apologize", "sorry",
|
||||||
|
"As an AI", "explicit content", "harmful content", "safety guidelines"
|
||||||
|
]
|
||||||
|
lower_res = result.lower()
|
||||||
|
if any(trig.lower() in lower_res for trig in refusal_triggers) and len(result) < 150:
|
||||||
|
logging.warning(f"LLM Refusal Detected: '{result}'. Falling back to original.")
|
||||||
|
return text # Return original text on refusal!
|
||||||
|
|
||||||
|
# --- Robust Post-Processing ---
|
||||||
|
|
||||||
|
# 1. Strip quotes
|
||||||
|
if result.startswith('"') and result.endswith('"') and len(result) > 2 and '"' not in result[1:-1]:
|
||||||
|
result = result[1:-1]
|
||||||
|
if result.startswith("'") and result.endswith("'") and len(result) > 2 and "'" not in result[1:-1]:
|
||||||
|
result = result[1:-1]
|
||||||
|
|
||||||
|
# 2. Split by newline
|
||||||
|
if "\n" in result:
|
||||||
|
lines = result.split('\n')
|
||||||
|
clean_lines = [l.strip() for l in lines if l.strip()]
|
||||||
|
if clean_lines:
|
||||||
|
result = clean_lines[0]
|
||||||
|
|
||||||
|
# 3. Aggressive Preamble Stripping (Updates for new prefix)
|
||||||
|
import re
|
||||||
|
prefixes = [
|
||||||
|
r"^Here is the processed output:?\s*", # The one we injected
|
||||||
|
r"^Here is the corrected text:?\s*",
|
||||||
|
r"^Here is the rewritten text:?\s*",
|
||||||
|
r"^Here's the result:?\s*",
|
||||||
|
r"^Sure,? here is[^:]*:\s*",
|
||||||
|
r"^Output:?\s*",
|
||||||
|
r"^Processing result:?\s*",
|
||||||
|
]
|
||||||
|
|
||||||
|
for p in prefixes:
|
||||||
|
result = re.sub(p, "", result, flags=re.IGNORECASE).strip()
|
||||||
|
|
||||||
|
if result.startswith('"') and result.endswith('"') and len(result) > 2 and '"' not in result[1:-1]:
|
||||||
|
result = result[1:-1]
|
||||||
|
|
||||||
|
logging.info(f"LLM Result: '{result}'")
|
||||||
|
return result
|
||||||
|
except Exception as e:
|
||||||
|
logging.error(f"LLM inference failed: {e}")
|
||||||
|
return text # Fail safe logic
|
||||||
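The post-processing in `correct_text` can be read as a small pure function: detect a refusal, strip wrapping quotes, keep the first non-empty line, then remove known preambles. The sketch below shows that pipeline under some assumptions. The function names, the shortened trigger and preamble lists, and the fallback to the original text on an empty result are illustrative and not taken from the diff.

```python
import re

# Triggers are stored lowercase because they are matched against lowercased output.
REFUSAL_TRIGGERS = ["i cannot", "i can't", "i am unable", "i apologize", "sorry", "as an ai"]

# Shortened, illustrative preamble list.
PREAMBLES = [r"^Here is the processed output:?\s*", r"^Output:?\s*"]


def looks_like_refusal(result: str, max_len: int = 150) -> bool:
    """Short outputs containing a refusal phrase are treated as refusals."""
    lowered = result.lower()
    return len(result) < max_len and any(t in lowered for t in REFUSAL_TRIGGERS)


def strip_wrapping_quotes(text: str) -> str:
    """Remove one pair of matching outer quotes if no such quote appears inside."""
    for quote in ('"', "'"):
        if (len(text) > 2 and text.startswith(quote)
                and text.endswith(quote) and quote not in text[1:-1]):
            return text[1:-1]
    return text


def clean_output(raw: str, original: str) -> str:
    result = raw.strip()
    # Assumption: an empty result also falls back to the original text.
    if not result or looks_like_refusal(result):
        return original
    result = strip_wrapping_quotes(result)
    lines = [line.strip() for line in result.split("\n") if line.strip()]
    if lines:
        result = lines[0]  # keep only the first non-empty line
    for pattern in PREAMBLES:
        result = re.sub(pattern, "", result, flags=re.IGNORECASE).strip()
    return strip_wrapping_quotes(result)
```

One consequence of substring matching is that a short dictation which itself contains a word such as "sorry" is also treated as a refusal, so the uncorrected text is returned for it.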
@@ -15,8 +15,13 @@ import numpy as np
|
|||||||
from src.core.config import ConfigManager
|
from src.core.config import ConfigManager
|
||||||
from src.core.paths import get_models_path
|
from src.core.paths import get_models_path
|
||||||
|
|
||||||
|
try:
|
||||||
|
import torch
|
||||||
|
except ImportError:
|
||||||
|
torch = None
|
||||||
|
|
||||||
# Import directly - valid since we are now running in the full environment
|
# Import directly - valid since we are now running in the full environment
|
||||||
from faster_whisper import WhisperModel
|
|
||||||
|
|
||||||
class WhisperTranscriber:
|
class WhisperTranscriber:
|
||||||
"""
|
"""
|
||||||
@@ -57,6 +62,8 @@ class WhisperTranscriber:
|
|||||||
# Force offline if path exists to avoid HF errors
|
# Force offline if path exists to avoid HF errors
|
||||||
local_only = new_path.exists()
|
local_only = new_path.exists()
|
||||||
|
|
||||||
|
try:
|
||||||
|
from faster_whisper import WhisperModel
|
||||||
self.model = WhisperModel(
|
self.model = WhisperModel(
|
||||||
model_input,
|
model_input,
|
||||||
device=device,
|
device=device,
|
||||||
@@ -64,6 +71,23 @@ class WhisperTranscriber:
|
|||||||
download_root=str(get_models_path()),
|
download_root=str(get_models_path()),
|
||||||
local_files_only=local_only
|
local_files_only=local_only
|
||||||
)
|
)
|
||||||
|
except Exception as load_err:
|
||||||
|
# CRITICAL FALLBACK: If CUDA/cublas fails (AMD/Intel users), fallback to CPU
|
||||||
|
err_str = str(load_err).lower()
|
||||||
|
if "cublas" in err_str or "cudnn" in err_str or "library" in err_str or "device" in err_str:
|
||||||
|
logging.warning(f"CUDA Init Failed ({load_err}). Falling back to CPU...")
|
||||||
|
self.config.set("compute_device", "cpu") # Update config for persistence/UI
|
||||||
|
self.current_compute_device = "cpu"
|
||||||
|
|
||||||
|
self.model = WhisperModel(
|
||||||
|
model_input,
|
||||||
|
device="cpu",
|
||||||
|
compute_type="int8", # CPU usually handles int8 well with newer extensions, or standard
|
||||||
|
download_root=str(get_models_path()),
|
||||||
|
local_files_only=local_only
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
raise load_err
|
||||||
|
|
||||||
self.current_model_size = size
|
self.current_model_size = size
|
||||||
self.current_compute_device = device
|
self.current_compute_device = device
|
||||||
@@ -74,41 +98,119 @@ class WhisperTranscriber:
|
|||||||
logging.error(f"Failed to load model: {e}")
|
logging.error(f"Failed to load model: {e}")
|
||||||
self.model = None
|
self.model = None
|
||||||
|
|
||||||
def transcribe(self, audio_data, is_file: bool = False) -> str:
|
# Auto-Repair: Detect vocabulary/corrupt errors
|
||||||
|
err_str = str(e).lower()
|
||||||
|
if "vocabulary" in err_str or "tokenizer" in err_str or "config.json" in err_str:
|
||||||
|
# ... existing auto-repair logic ...
|
||||||
|
logging.warning("Corrupt model detected on load. Attempting to delete and reset...")
|
||||||
|
try:
|
||||||
|
import shutil
|
||||||
|
# Differentiate between simple path and HF path
|
||||||
|
new_path = get_models_path() / f"faster-whisper-{size}"
|
||||||
|
if new_path.exists():
|
||||||
|
shutil.rmtree(new_path)
|
||||||
|
logging.info(f"Deleted corrupt model at {new_path}")
|
||||||
|
else:
|
||||||
|
# Try legacy HF path
|
||||||
|
hf_path = get_models_path() / f"models--Systran--faster-whisper-{size}"
|
||||||
|
if hf_path.exists():
|
||||||
|
shutil.rmtree(hf_path)
|
||||||
|
logging.info(f"Deleted corrupt HF model at {hf_path}")
|
||||||
|
|
||||||
|
# Notify UI to refresh state (will show 'Download' button now)
|
||||||
|
# We can't reach bridge easily here without passing it in,
|
||||||
|
# but the UI polls or listens to logs.
|
||||||
|
# The user will simply see "Model Missing" in settings after this.
|
||||||
|
except Exception as del_err:
|
||||||
|
logging.error(f"Failed to delete corrupt model: {del_err}")
|
||||||
|
|
||||||
|
def transcribe(self, audio_data, is_file: bool = False, task: Optional[str] = None) -> str:
|
||||||
"""
|
"""
|
||||||
Transcribe audio data.
|
Transcribe audio data.
|
||||||
"""
|
"""
|
||||||
logging.info(f"Starting transcription... (is_file={is_file})")
|
logging.info(f"Starting transcription... (is_file={is_file}, task={task})")
|
||||||
|
|
||||||
# Ensure model is loaded
|
# Ensure model is loaded
|
||||||
if not self.model:
|
if not self.model:
|
||||||
self.load_model()
|
self.load_model()
|
||||||
if not self.model:
|
if not self.model:
|
||||||
return "Error: Model failed to load."
|
return "Error: Model failed to load. Please check Settings -> Model Info."
|
||||||
|
|
||||||
try:
|
try:
|
||||||
# Config
|
# Config
|
||||||
beam_size = int(self.config.get("beam_size"))
|
beam_size = int(self.config.get("beam_size"))
|
||||||
best_of = int(self.config.get("best_of"))
|
best_of = int(self.config.get("best_of"))
|
||||||
vad = False if is_file else self.config.get("vad_filter")
|
vad = False if is_file else self.config.get("vad_filter")
|
||||||
|
language = self.config.get("language")
|
||||||
|
|
||||||
|
# Use task override if provided, otherwise config
|
||||||
|
# Ensure safe string and lowercase ("transcribe" vs "Transcribe")
|
||||||
|
raw_task = task if task else self.config.get("task")
|
||||||
|
final_task = str(raw_task).strip().lower() if raw_task else "transcribe"
|
||||||
|
|
||||||
|
# Sanity check for valid Whisper tasks
|
||||||
|
if final_task not in ["transcribe", "translate"]:
|
||||||
|
logging.warning(f"Invalid task '{final_task}' detected. Defaulting to 'transcribe'.")
|
||||||
|
final_task = "transcribe"
|
||||||
|
|
||||||
|
# Language handling
|
||||||
|
final_language = language if language != "auto" else None
|
||||||
|
|
||||||
|
# Anti-Hallucination: Force condition_on_previous_text=False for translation
|
||||||
|
condition_prev = self.config.get("condition_on_previous_text")
|
||||||
|
|
||||||
|
# Helper options for Translation Stability
|
||||||
|
initial_prompt = self.config.get("initial_prompt")
|
||||||
|
|
||||||
|
if final_task == "translate":
|
||||||
|
condition_prev = False
|
||||||
|
# Force beam search if user has set it to greedy (1)
|
||||||
|
# Translation requires more search breadth to find the English mapping
|
||||||
|
if beam_size < 5:
|
||||||
|
logging.info("Forcing beam_size=5 for Translation task.")
|
||||||
|
beam_size = 5
|
||||||
|
|
||||||
|
# Inject guidance prompt if none exists
|
||||||
|
if not initial_prompt:
|
||||||
|
initial_prompt = "Translate this to English."
|
||||||
|
|
||||||
|
logging.info(f"Model Dispatch: Task='{final_task}', Language='{final_language}', ConditionPrev={condition_prev}, Beam={beam_size}")
|
||||||
|
|
||||||
|
# Build arguments dynamically to avoid passing None if that's the issue
|
||||||
|
transcribe_opts = {
|
||||||
|
"beam_size": beam_size,
|
||||||
|
"best_of": best_of,
|
||||||
|
"vad_filter": vad,
|
||||||
|
"task": final_task,
|
||||||
|
"vad_parameters": dict(min_silence_duration_ms=500),
|
||||||
|
"condition_on_previous_text": condition_prev,
|
||||||
|
"without_timestamps": True
|
||||||
|
}
|
||||||
|
|
||||||
|
if initial_prompt:
|
||||||
|
transcribe_opts["initial_prompt"] = initial_prompt
|
||||||
|
|
||||||
|
# Only add language if it's explicitly set (not None/Auto)
|
||||||
|
# This avoids potentially confusing the model with explicit None
|
||||||
|
if final_language:
|
||||||
|
transcribe_opts["language"] = final_language
|
||||||
|
|
||||||
# Transcribe
|
# Transcribe
|
||||||
segments, info = self.model.transcribe(
|
segments, info = self.model.transcribe(audio_data, **transcribe_opts)
|
||||||
audio_data,
|
|
||||||
beam_size=beam_size,
|
|
||||||
best_of=best_of,
|
|
||||||
vad_filter=vad,
|
|
||||||
vad_parameters=dict(min_silence_duration_ms=500),
|
|
||||||
condition_on_previous_text=self.config.get("condition_on_previous_text"),
|
|
||||||
without_timestamps=True
|
|
||||||
)
|
|
||||||
|
|
||||||
# Aggregate text
|
# Aggregate text
|
||||||
text_result = ""
|
text_result = ""
|
||||||
for segment in segments:
|
for segment in segments:
|
||||||
text_result += segment.text + " "
|
text_result += segment.text + " "
|
||||||
|
|
||||||
return text_result.strip()
|
text_result = text_result.strip()
|
||||||
|
|
||||||
|
# Low VRAM Mode: Unload Whisper Model immediately
|
||||||
|
if self.config.get("unload_models_after_use"):
|
||||||
|
self.unload_model()
|
||||||
|
|
||||||
|
logging.info(f"Final Transcription Output: '{text_result}'")
|
||||||
|
return text_result
|
||||||
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logging.error(f"Transcription failed: {e}")
|
logging.error(f"Transcription failed: {e}")
|
||||||
@@ -117,7 +219,10 @@ class WhisperTranscriber:
|
|||||||
def model_exists(self, size: str) -> bool:
|
def model_exists(self, size: str) -> bool:
|
||||||
"""Checks if a model size is already downloaded."""
|
"""Checks if a model size is already downloaded."""
|
||||||
new_path = get_models_path() / f"faster-whisper-{size}"
|
new_path = get_models_path() / f"faster-whisper-{size}"
|
||||||
if (new_path / "config.json").exists():
|
if new_path.exists():
|
||||||
|
# Strict check
|
||||||
|
required = ["config.json", "model.bin", "vocabulary.json"]
|
||||||
|
if all((new_path / f).exists() for f in required):
|
||||||
return True
|
return True
|
||||||
|
|
||||||
# Legacy HF cache check
|
# Legacy HF cache check
|
||||||
@@ -127,3 +232,21 @@ class WhisperTranscriber:
|
|||||||
return True
|
return True
|
||||||
|
|
||||||
return False
|
return False
|
||||||
|
|
||||||
|
def unload_model(self):
|
||||||
|
"""
|
||||||
|
Unloads model to free memory.
|
||||||
|
"""
|
||||||
|
if self.model:
|
||||||
|
del self.model
|
||||||
|
|
||||||
|
self.model = None
|
||||||
|
self.current_model_size = None
|
||||||
|
|
||||||
|
# Force garbage collection
|
||||||
|
import gc
|
||||||
|
gc.collect()
|
||||||
|
if torch is not None and torch.cuda.is_available():
|
||||||
|
torch.cuda.empty_cache()
|
||||||
|
|
||||||
|
logging.info("Whisper Model unloaded (Low VRAM Mode).")
|
||||||
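The option building inside `transcribe` can be isolated from the model call, which makes the task normalization and the translation overrides easy to check. The sketch below is a standalone approximation. `build_transcribe_opts`, the plain `settings` dict, and the default values passed to `settings.get` are assumptions; the keys of the returned dict mirror those the diff passes to `self.model.transcribe`.

```python
from typing import Optional

VALID_TASKS = ("transcribe", "translate")


def build_transcribe_opts(settings: dict, task: Optional[str] = None,
                          is_file: bool = False) -> dict:
    # An explicit task overrides the configured one; normalize case and whitespace.
    raw_task = task if task else settings.get("task")
    final_task = str(raw_task).strip().lower() if raw_task else "transcribe"
    if final_task not in VALID_TASKS:
        final_task = "transcribe"

    # Default values here are assumptions made for this sketch.
    beam_size = int(settings.get("beam_size", 5))
    condition_prev = settings.get("condition_on_previous_text", True)
    initial_prompt = settings.get("initial_prompt")

    if final_task == "translate":
        condition_prev = False          # avoid carrying over earlier output
        beam_size = max(beam_size, 5)   # widen the search for translation
        initial_prompt = initial_prompt or "Translate this to English."

    opts = {
        "beam_size": beam_size,
        "best_of": int(settings.get("best_of", 5)),
        "vad_filter": False if is_file else bool(settings.get("vad_filter", True)),
        "vad_parameters": {"min_silence_duration_ms": 500},
        "task": final_task,
        "condition_on_previous_text": condition_prev,
        "without_timestamps": True,
    }
    # Optional keys are added only when they carry a value.
    if initial_prompt:
        opts["initial_prompt"] = initial_prompt
    language = settings.get("language")
    if language and language != "auto":
        opts["language"] = language
    return opts


translate_opts = build_transcribe_opts({"beam_size": 1, "language": "auto"}, task=" Translate ")
plain_opts = build_transcribe_opts({"beam_size": 1, "language": "de", "task": "bogus"})
```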
|
|||||||
@@ -110,6 +110,7 @@ class UIBridge(QObject):
|
|||||||
logAppended = Signal(str) # Emits new log line
|
logAppended = Signal(str) # Emits new log line
|
||||||
settingChanged = Signal(str, 'QVariant')
|
settingChanged = Signal(str, 'QVariant')
|
||||||
modelStatesChanged = Signal() # Notify UI to re-check isModelDownloaded
|
modelStatesChanged = Signal() # Notify UI to re-check isModelDownloaded
|
||||||
|
llmDownloadRequested = Signal()
|
||||||
|
|
||||||
def __init__(self, parent=None):
|
def __init__(self, parent=None):
|
||||||
super().__init__(parent)
|
super().__init__(parent)
|
||||||
@@ -245,6 +246,26 @@ class UIBridge(QObject):
|
|||||||
|
|
||||||
# --- Methods called from QML ---
|
# --- Methods called from QML ---
|
||||||
|
|
||||||
|
@Slot(result=list)
|
||||||
|
def get_supported_languages(self):
|
||||||
|
from src.core.languages import get_language_names
|
||||||
|
return get_language_names()
|
||||||
|
|
||||||
|
@Slot(str)
|
||||||
|
def set_language_by_name(self, name):
|
||||||
|
from src.core.languages import get_code_by_name
|
||||||
|
from src.core.config import ConfigManager
|
||||||
|
code = get_code_by_name(name)
|
||||||
|
ConfigManager().set("language", code)
|
||||||
|
self.settingChanged.emit("language", code)
|
||||||
|
|
||||||
|
@Slot(result=str)
|
||||||
|
def get_current_language_name(self):
|
||||||
|
from src.core.languages import get_name_by_code
|
||||||
|
from src.core.config import ConfigManager
|
||||||
|
code = ConfigManager().get("language")
|
||||||
|
return get_name_by_code(code)
|
||||||
|
|
||||||
@Slot(str, result='QVariant')
|
@Slot(str, result='QVariant')
|
||||||
def getSetting(self, key):
|
def getSetting(self, key):
|
||||||
from src.core.config import ConfigManager
|
from src.core.config import ConfigManager
|
||||||
@@ -336,11 +357,7 @@ class UIBridge(QObject):
|
|||||||
except Exception as e:
|
except Exception as e:
|
||||||
logging.error(f"Failed to preload audio devices: {e}")
|
logging.error(f"Failed to preload audio devices: {e}")
|
||||||
|
|
||||||
@Slot()
|
|
||||||
def toggle_recording(self):
|
|
||||||
"""Called by UI elements to trigger the app's recording logic."""
|
|
||||||
# This will be connected to the main app's toggle logic
|
|
||||||
pass
|
|
||||||
@Property(bool, notify=isDownloadingChanged)
|
@Property(bool, notify=isDownloadingChanged)
|
||||||
def isDownloading(self): return self._is_downloading
|
def isDownloading(self): return self._is_downloading
|
||||||
|
|
||||||
@@ -356,9 +373,15 @@ class UIBridge(QObject):
|
|||||||
|
|
||||||
try:
|
try:
|
||||||
from src.core.paths import get_models_path
|
from src.core.paths import get_models_path
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
# Check new simple format used by DownloadWorker
|
# Check new simple format used by DownloadWorker
|
||||||
path_simple = get_models_path() / f"faster-whisper-{size}"
|
path_simple = get_models_path() / f"faster-whisper-{size}"
|
||||||
if path_simple.exists() and any(path_simple.iterdir()):
|
if path_simple.exists():
|
||||||
|
# Strict check: Ensure all critical files exist
|
||||||
|
required = ["config.json", "model.bin", "vocabulary.json"]
|
||||||
|
if all((path_simple / f).exists() for f in required):
|
||||||
return True
|
return True
|
||||||
|
|
||||||
# Check HF Cache format (legacy/default)
|
# Check HF Cache format (legacy/default)
|
||||||
@@ -366,16 +389,22 @@ class UIBridge(QObject):
|
|||||||
path_hf = get_models_path() / folder_name
|
path_hf = get_models_path() / folder_name
|
||||||
snapshots = path_hf / "snapshots"
|
snapshots = path_hf / "snapshots"
|
||||||
if snapshots.exists() and any(snapshots.iterdir()):
|
if snapshots.exists() and any(snapshots.iterdir()):
|
||||||
return True
|
return True # Legacy cache structure is complex, assume valid if present
|
||||||
|
|
||||||
# Check direct folder (simple)
|
return False
|
||||||
path_direct = get_models_path() / size
|
|
||||||
if (path_direct / "config.json").exists():
|
|
||||||
return True
|
|
||||||
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logging.error(f"Error checking model status: {e}")
|
logging.error(f"Error checking model status: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
@Slot(result=bool)
|
||||||
|
def isLLMModelDownloaded(self):
|
||||||
|
try:
|
||||||
|
from src.core.paths import get_models_path
|
||||||
|
# Hardcoded check for the 1B model we support
|
||||||
|
model_file = get_models_path() / "llm" / "llama-3.2-1b-instruct" / "llama-3.2-1b-instruct-q4_k_m.gguf"
|
||||||
|
return model_file.exists()
|
||||||
|
except Exception:
|
||||||
return False
|
return False
|
||||||
|
|
||||||
@Slot(str)
|
@Slot(str)
|
||||||
@@ -385,3 +414,7 @@ class UIBridge(QObject):
|
|||||||
@Slot()
|
@Slot()
|
||||||
def notifyModelStatesChanged(self):
|
def notifyModelStatesChanged(self):
|
||||||
self.modelStatesChanged.emit()
|
self.modelStatesChanged.emit()
|
||||||
|
|
||||||
|
@Slot()
|
||||||
|
def downloadLLM(self):
|
||||||
|
self.llmDownloadRequested.emit()
|
||||||
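Both `model_exists` and the bridge's download check now require the same three files instead of accepting any non-empty folder. A shared helper would keep the two checks in step. The sketch below is one way to write it. `model_dir_is_complete` and the temporary directory layout are hypothetical.

```python
import tempfile
from pathlib import Path

# Files the diff treats as mandatory for a usable model folder.
REQUIRED_FILES = ("config.json", "model.bin", "vocabulary.json")


def model_dir_is_complete(model_dir: Path) -> bool:
    """True only if the folder exists and every required file is present."""
    return model_dir.is_dir() and all((model_dir / name).is_file() for name in REQUIRED_FILES)


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "faster-whisper-small"
    model_dir.mkdir()

    # A partial download: only one of the three files exists.
    (model_dir / "config.json").write_text("{}")
    partial = model_dir_is_complete(model_dir)

    # Add the remaining files to simulate a finished download.
    for name in ("model.bin", "vocabulary.json"):
        (model_dir / name).write_text("")
    complete = model_dir_is_complete(model_dir)
```

The check only confirms that the files exist. A truncated `model.bin` would still pass, which is the case the auto-repair path in `load_model` handles.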
|
|||||||
@@ -100,7 +100,7 @@ ComboBox {
|
|||||||
popup: Popup {
|
popup: Popup {
|
||||||
y: control.height - 1
|
y: control.height - 1
|
||||||
width: control.width
|
width: control.width
|
||||||
implicitHeight: contentItem.implicitHeight
|
implicitHeight: Math.min(contentItem.implicitHeight, 300)
|
||||||
padding: 5
|
padding: 5
|
||||||
|
|
||||||
contentItem: ListView {
|
contentItem: ListView {
|
||||||
|
|||||||
@@ -25,7 +25,7 @@ Rectangle {
|
|||||||
|
|
||||||
Text {
|
Text {
|
||||||
anchors.centerIn: parent
|
anchors.centerIn: parent
|
||||||
text: control.recording ? "Listening..." : (control.currentSequence || "None")
|
text: control.recording ? "Listening..." : (formatSequence(control.currentSequence) || "None")
|
||||||
color: control.recording ? SettingsStyle.accent : (control.currentSequence ? "#ffffff" : "#808080")
|
color: control.recording ? SettingsStyle.accent : (control.currentSequence ? "#ffffff" : "#808080")
|
||||||
font.family: "JetBrains Mono"
|
font.family: "JetBrains Mono"
|
||||||
font.pixelSize: 13
|
font.pixelSize: 13
|
||||||
@@ -72,6 +72,23 @@ Rectangle {
|
|||||||
if (!activeFocus) control.recording = false
|
if (!activeFocus) control.recording = false
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function formatSequence(seq) {
|
||||||
|
if (!seq) return ""
|
||||||
|
var parts = seq.split("+")
|
||||||
|
for (var i = 0; i < parts.length; i++) {
|
||||||
|
var p = parts[i]
|
||||||
|
// Standardize modifiers
|
||||||
|
if (p === "ctrl") parts[i] = "Ctrl"
|
||||||
|
else if (p === "alt") parts[i] = "Alt"
|
||||||
|
else if (p === "shift") parts[i] = "Shift"
|
||||||
|
else if (p === "win") parts[i] = "Win"
|
||||||
|
else if (p === "esc") parts[i] = "Esc"
|
||||||
|
// Capitalize F-keys and others (e.g. f8 -> F8, space -> Space)
|
||||||
|
else parts[i] = p.charAt(0).toUpperCase() + p.slice(1)
|
||||||
|
}
|
||||||
|
return parts.join(" + ")
|
||||||
|
}
|
||||||
|
|
||||||
function getKeyName(key, text) {
|
function getKeyName(key, text) {
|
||||||
// F-Keys
|
// F-Keys
|
||||||
if (key >= Qt.Key_F1 && key <= Qt.Key_F35) return "f" + (key - Qt.Key_F1 + 1)
|
if (key >= Qt.Key_F1 && key <= Qt.Key_F35) return "f" + (key - Qt.Key_F1 + 1)
|
||||||
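For reference, the display logic of the QML `formatSequence` function can be expressed in a few lines of Python. This is an illustrative port, not code from the repository. `format_sequence` and `MODIFIER_LABELS` are assumed names.

```python
# Explicit labels for the modifiers handled in the QML function.
MODIFIER_LABELS = {"ctrl": "Ctrl", "alt": "Alt", "shift": "Shift", "win": "Win", "esc": "Esc"}


def format_sequence(seq: str) -> str:
    """Turn a stored hotkey such as "ctrl+shift+f8" into "Ctrl + Shift + F8"."""
    if not seq:
        return ""
    parts = []
    for part in seq.split("+"):
        # Known modifiers use their label; anything else gets its first letter capitalized.
        parts.append(MODIFIER_LABELS.get(part, part[:1].upper() + part[1:]))
    return " + ".join(parts)
```

Because the sequence is split on `+`, a hotkey that uses the `+` key itself cannot be represented in this format.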
|
|||||||
@@ -314,15 +314,35 @@ Window {
|
|||||||
spacing: 0
|
spacing: 0
|
||||||
|
|
||||||
ModernSettingsItem {
|
ModernSettingsItem {
|
||||||
label: "Global Hotkey"
|
label: "Global Hotkey (Transcribe)"
|
||||||
description: "Press to record a new shortcut (e.g. Ctrl+Space)"
|
description: "Standard: Raw transcription"
|
||||||
control: ModernKeySequenceRecorder {
|
control: ModernKeySequenceRecorder {
|
||||||
Layout.preferredWidth: 200
|
implicitWidth: 240
|
||||||
currentSequence: ui.getSetting("hotkey")
|
currentSequence: ui.getSetting("hotkey")
|
||||||
onSequenceChanged: (seq) => ui.setSetting("hotkey", seq)
|
onSequenceChanged: (seq) => ui.setSetting("hotkey", seq)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
ModernSettingsItem {
|
||||||
|
label: "Global Hotkey (Correct)"
|
||||||
|
description: "Enhanced: Transcribe + AI Correction"
|
||||||
|
control: ModernKeySequenceRecorder {
|
||||||
|
implicitWidth: 240
|
||||||
|
currentSequence: ui.getSetting("hotkey_correct")
|
||||||
|
onSequenceChanged: (seq) => ui.setSetting("hotkey_correct", seq)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
ModernSettingsItem {
|
||||||
|
label: "Global Hotkey (Translate)"
|
||||||
|
description: "Press to record a new shortcut (e.g. F10)"
|
||||||
|
control: ModernKeySequenceRecorder {
|
||||||
|
implicitWidth: 240
|
||||||
|
currentSequence: ui.getSetting("hotkey_translate")
|
||||||
|
onSequenceChanged: (seq) => ui.setSetting("hotkey_translate", seq)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
ModernSettingsItem {
|
ModernSettingsItem {
|
||||||
label: "Run on Startup"
|
label: "Run on Startup"
|
||||||
description: "Automatically launch when you log in"
|
description: "Automatically launch when you log in"
|
||||||
@@ -349,8 +369,8 @@ Window {
|
|||||||
showSeparator: false
|
showSeparator: false
|
||||||
control: ModernSlider {
|
control: ModernSlider {
|
||||||
Layout.preferredWidth: 200
|
Layout.preferredWidth: 200
|
||||||
from: 10; to: 6000
|
from: 10; to: 20000
|
||||||
stepSize: 10
|
stepSize: 100
|
||||||
snapMode: Slider.SnapAlways
|
snapMode: Slider.SnapAlways
|
||||||
value: ui.getSetting("typing_speed")
|
value: ui.getSetting("typing_speed")
|
||||||
onMoved: ui.setSetting("typing_speed", value)
|
onMoved: ui.setSetting("typing_speed", value)
|
||||||
@@ -577,6 +597,53 @@ Window {
|
|||||||
Text { text: "Model configuration and performance"; color: SettingsStyle.textSecondary; font.family: mainFont; font.pixelSize: 14 }
|
Text { text: "Model configuration and performance"; color: SettingsStyle.textSecondary; font.family: mainFont; font.pixelSize: 14 }
|
||||||
}
|
}
|
||||||
|
|
||||||
|
ModernSettingsSection {
|
||||||
|
title: "Style & Prompting"
|
||||||
|
Layout.margins: 32
|
||||||
|
Layout.topMargin: 0
|
||||||
|
|
||||||
|
content: ColumnLayout {
|
||||||
|
width: parent.width
|
||||||
|
spacing: 0
|
||||||
|
|
||||||
|
ModernSettingsItem {
|
||||||
|
label: "Punctuation Style"
|
||||||
|
description: "Hint for how to format text"
|
||||||
|
control: ModernComboBox {
|
||||||
|
id: styleCombo
|
||||||
|
width: 180
|
||||||
|
model: ["Standard (Proper)", "Casual (Lowercase)", "Custom"]
|
||||||
|
|
||||||
|
// Logic to determine initial index based on config string
|
||||||
|
Component.onCompleted: {
|
||||||
|
let current = ui.getSetting("initial_prompt")
|
||||||
|
if (current === "Mm-hmm. Okay, let's go. I speak in full sentences.") currentIndex = 0
|
||||||
|
else if (current === "um, okay... i guess so.") currentIndex = 1
|
||||||
|
else currentIndex = 2
|
||||||
|
}
|
||||||
|
|
||||||
|
onActivated: {
|
||||||
|
if (index === 0) ui.setSetting("initial_prompt", "Mm-hmm. Okay, let's go. I speak in full sentences.")
|
||||||
|
else if (index === 1) ui.setSetting("initial_prompt", "um, okay... i guess so.")
|
||||||
|
// Custom: Don't change string immediately, let user type
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
ModernSettingsItem {
|
||||||
|
label: "Custom Prompt"
|
||||||
|
description: "Advanced: Define your own style hint"
|
||||||
|
visible: styleCombo.currentIndex === 2
|
||||||
|
control: ModernTextField {
|
||||||
|
Layout.preferredWidth: 280
|
||||||
|
placeholderText: "e.g. 'Hello, World.'"
|
||||||
|
text: ui.getSetting("initial_prompt") || ""
|
||||||
|
onEditingFinished: ui.setSetting("initial_prompt", text === "" ? null : text)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
ModernSettingsSection {
|
ModernSettingsSection {
|
||||||
title: "Model Config"
|
title: "Model Config"
|
||||||
Layout.margins: 32
|
Layout.margins: 32
|
||||||
@@ -742,15 +809,17 @@ Window {
|
|||||||
|
|
||||||
ModernSettingsItem {
|
ModernSettingsItem {
|
||||||
label: "Language"
|
label: "Language"
|
||||||
description: "Force language or Auto-detect"
|
description: "Spoken language to transcribe"
|
||||||
control: ModernComboBox {
|
control: ModernComboBox {
|
||||||
width: 140
|
Layout.preferredWidth: 200
|
||||||
model: ["auto", "en", "fr", "de", "es", "it", "ja", "zh", "ru"]
|
model: ui.get_supported_languages()
|
||||||
currentIndex: model.indexOf(ui.getSetting("language"))
|
currentIndex: model.indexOf(ui.get_current_language_name())
|
||||||
onActivated: ui.setSetting("language", currentText)
|
onActivated: (index) => ui.set_language_by_name(currentText)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Task selector removed as per user request (Hotkeys handle this now)
|
||||||
|
|
||||||
ModernSettingsItem {
|
ModernSettingsItem {
|
||||||
label: "Compute Device"
|
label: "Compute Device"
|
||||||
description: "Hardware acceleration (CUDA requires NVidia GPU)"
|
description: "Hardware acceleration (CUDA requires NVidia GPU)"
|
||||||
@@ -773,6 +842,147 @@ Window {
|
|||||||
onActivated: ui.setSetting("compute_type", currentText)
|
onActivated: ui.setSetting("compute_type", currentText)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
ModernSettingsItem {
|
||||||
|
label: "Low VRAM Mode"
|
||||||
|
description: "Unload models immediately after use (Saves VRAM, Adds Delay)"
|
||||||
|
showSeparator: false
|
||||||
|
control: ModernSwitch {
|
||||||
|
checked: ui.getSetting("unload_models_after_use")
|
||||||
|
onToggled: ui.setSetting("unload_models_after_use", checked)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
ModernSettingsSection {
|
||||||
|
title: "Correction & Rewriting"
|
||||||
|
Layout.margins: 32
|
||||||
|
Layout.topMargin: 0
|
||||||
|
|
||||||
|
content: ColumnLayout {
|
||||||
|
width: parent.width
|
||||||
|
spacing: 0
|
||||||
|
|
||||||
|
ModernSettingsItem {
|
||||||
|
label: "Enable Correction"
|
||||||
|
description: "Post-process text with Llama 3.2 1B (Adds latency)"
|
||||||
|
control: ModernSwitch {
|
||||||
|
checked: ui.getSetting("llm_enabled")
|
||||||
|
onToggled: ui.setSetting("llm_enabled", checked)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
ModernSettingsItem {
|
||||||
|
label: "Correction Mode"
|
||||||
|
description: "Grammar Fix vs. Complete Rewrite"
|
||||||
|
visible: ui.getSetting("llm_enabled")
|
||||||
|
control: ModernComboBox {
|
||||||
|
width: 140
|
||||||
|
model: ["Grammar", "Standard", "Rewrite"]
|
||||||
|
currentIndex: model.indexOf(ui.getSetting("llm_mode"))
|
||||||
|
onActivated: ui.setSetting("llm_mode", currentText)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// LLM Model Status Card
|
||||||
|
Rectangle {
|
||||||
|
Layout.fillWidth: true
|
||||||
|
Layout.margins: 12
|
||||||
|
Layout.topMargin: 0
|
||||||
|
Layout.bottomMargin: 16
|
||||||
|
height: 54
|
||||||
|
color: "#0a0a0f"
|
||||||
|
visible: ui.getSetting("llm_enabled")
|
||||||
|
radius: 6
|
||||||
|
border.color: SettingsStyle.borderSubtle
|
||||||
|
border.width: 1
|
||||||
|
|
||||||
|
property bool isDownloaded: false
|
||||||
|
property bool isDownloading: ui.isDownloading && ui.statusText.indexOf("LLM") !== -1
|
||||||
|
|
||||||
|
Timer {
|
||||||
|
interval: 2000
|
||||||
|
running: visible
|
||||||
|
repeat: true
|
||||||
|
onTriggered: parent.checkStatus()
|
||||||
|
}
|
||||||
|
|
||||||
|
function checkStatus() {
|
||||||
|
isDownloaded = ui.isLLMModelDownloaded()
|
||||||
|
}
|
||||||
|
|
||||||
|
Component.onCompleted: checkStatus()
|
||||||
|
|
||||||
|
Connections {
|
||||||
|
target: ui
|
||||||
|
function onModelStatesChanged() { parent.checkStatus() }
|
||||||
|
function onIsDownloadingChanged() { parent.checkStatus() }
|
||||||
|
}
|
||||||
|
|
||||||
|
RowLayout {
|
||||||
|
anchors.fill: parent
|
||||||
|
anchors.leftMargin: 12
|
||||||
|
anchors.rightMargin: 12
|
||||||
|
spacing: 12
|
||||||
|
|
||||||
|
Image {
|
||||||
|
source: "smart_toy.svg"
|
||||||
|
sourceSize: Qt.size(16, 16)
|
||||||
|
layer.enabled: true
|
||||||
|
layer.effect: MultiEffect {
|
||||||
|
colorization: 1.0
|
||||||
|
colorizationColor: parent.parent.isDownloaded ? SettingsStyle.accent : "#808080"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
ColumnLayout {
|
||||||
|
Layout.fillWidth: true
|
||||||
|
spacing: 2
|
||||||
|
Text {
|
||||||
|
text: "Llama 3.2 1B (Instruct)"
|
||||||
|
color: "#ffffff"
|
||||||
|
font.family: "JetBrains Mono"; font.bold: true
|
||||||
|
font.pixelSize: 11
|
||||||
|
}
|
||||||
|
Text {
|
||||||
|
text: parent.parent.isDownloaded ? "Ready." : "Model missing (~1.2GB)"
|
||||||
|
color: SettingsStyle.textSecondary
|
||||||
|
font.family: "JetBrains Mono"; font.pixelSize: 10
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
Button {
|
||||||
|
id: dlBtn
|
||||||
|
text: "Download"
|
||||||
|
visible: !parent.parent.isDownloaded && !parent.parent.isDownloading
|
||||||
|
Layout.preferredHeight: 24
|
||||||
|
Layout.preferredWidth: 80
|
||||||
|
|
||||||
|
contentItem: Text {
|
||||||
|
text: "DOWNLOAD"
|
||||||
|
font.pixelSize: 10; font.bold: true; color: "#000000"; horizontalAlignment: Text.AlignHCenter; verticalAlignment: Text.AlignVCenter
|
||||||
|
}
|
||||||
|
background: Rectangle {
|
||||||
|
color: dlBtn.hovered ? "#ffffff" : SettingsStyle.accent; radius: 4
|
||||||
|
}
|
||||||
|
onClicked: ui.downloadLLM()
|
||||||
|
}
|
||||||
|
|
||||||
|
// Progress Bar
|
||||||
|
Rectangle {
|
||||||
|
visible: parent.parent.isDownloading
|
||||||
|
Layout.fillWidth: true
|
||||||
|
height: 4
|
||||||
|
color: "#30ffffff"
|
||||||
|
Rectangle {
|
||||||
|
width: parent.width * (ui.downloadProgress / 100)
|
||||||
|
height: parent.height
|
||||||
|
color: SettingsStyle.accent
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
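The "Punctuation Style" combo box added in the Style & Prompting section of this file maps two preset `initial_prompt` strings to indices 0 and 1 and treats everything else as "Custom". A minimal Python sketch of that two-way mapping follows. The names `PRESETS`, `CUSTOM_INDEX`, `index_for_prompt`, and `prompt_for_index` are hypothetical and do not appear in the diff; only the two prompt strings are taken from it.

```python
from typing import Optional

# Preset strings copied from the QML hunk; list position matches the combo index.
PRESETS = [
    "Mm-hmm. Okay, let's go. I speak in full sentences.",  # 0: Standard (Proper)
    "um, okay... i guess so.",                             # 1: Casual (Lowercase)
]
CUSTOM_INDEX = 2  # hypothetical name for the "Custom" entry


def index_for_prompt(current: Optional[str]) -> int:
    """Return the combo index for a stored prompt (mirrors Component.onCompleted)."""
    try:
        return PRESETS.index(current)
    except ValueError:
        # Anything that is not an exact preset match, including None, is "Custom".
        return CUSTOM_INDEX


def prompt_for_index(index: int, current: Optional[str]) -> Optional[str]:
    """Return the prompt to store after a selection (mirrors onActivated)."""
    if 0 <= index < len(PRESETS):
        return PRESETS[index]
    # Custom: keep the stored string unchanged so the user can edit it.
    return current
```

Because the match is an exact string comparison, editing either preset string in one place but not the other would silently move existing users to "Custom"; keeping the strings in a single table avoids that.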
|
|||||||
32
src/utils/formatters.py
Normal file
@@ -0,0 +1,32 @@
"""
Formatter Utilities
===================
Helper functions for text formatting.
"""

def format_hotkey(sequence: str) -> str:
    """
    Formats a hotkey sequence string (e.g. 'ctrl+alt+f9')
    into a pretty readable string (e.g. 'Ctrl + Alt + F9').
    """
    if not sequence:
        return "None"

    parts = sequence.split('+')
    formatted_parts = []

    for p in parts:
        p = p.strip().lower()
        if p == 'ctrl': formatted_parts.append('Ctrl')
        elif p == 'alt': formatted_parts.append('Alt')
        elif p == 'shift': formatted_parts.append('Shift')
        elif p == 'win': formatted_parts.append('Win')
        elif p == 'esc': formatted_parts.append('Esc')
        else:
            # Capitalize first letter
            if len(p) > 0:
                formatted_parts.append(p[0].upper() + p[1:])
            else:
                formatted_parts.append(p)

    return " + ".join(formatted_parts)
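For reviewers: every explicit branch in `format_hotkey` yields the same result as the generic "capitalize first letter" fallback (`'ctrl'` becomes `'Ctrl'` either way). The sketch below is a compact equivalent under that observation, not the code in this commit; the name `format_hotkey_compact` is hypothetical.

```python
def format_hotkey_compact(sequence: str) -> str:
    """Compact sketch with the same observable behavior as format_hotkey."""
    if not sequence:
        return "None"
    # Normalize each key name, then upper-case only its first character.
    # Slicing with p[:1] also handles an empty part without raising IndexError.
    parts = (p.strip().lower() for p in sequence.split("+"))
    return " + ".join(p[:1].upper() + p[1:] for p in parts)


if __name__ == "__main__":
    print(format_hotkey_compact("ctrl+alt+f9"))
```

The explicit branches in the committed version still make sense if some keys later need display names that are not a simple capitalization.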
@@ -55,6 +55,10 @@ except AttributeError:
|
|||||||
def LOWORD(l): return l & 0xffff
|
def LOWORD(l): return l & 0xffff
|
||||||
def HIWORD(l): return (l >> 16) & 0xffff
|
def HIWORD(l): return (l >> 16) & 0xffff
|
||||||
|
|
||||||
|
GWL_EXSTYLE = -20
|
||||||
|
WS_EX_TRANSPARENT = 0x00000020
|
||||||
|
WS_EX_LAYERED = 0x00080000
|
||||||
|
|
||||||
class WindowHook:
|
class WindowHook:
|
||||||
def __init__(self, hwnd, width, height, initial_scale=1.0):
|
def __init__(self, hwnd, width, height, initial_scale=1.0):
|
||||||
self.hwnd = hwnd
|
self.hwnd = hwnd
|
||||||
@@ -65,6 +69,34 @@ class WindowHook:
|
|||||||
# (Window 420x140, Pill 380x100)
|
# (Window 420x140, Pill 380x100)
|
||||||
self.logical_rect = [20, 20, 20+380, 20+100]
|
self.logical_rect = [20, 20, 20+380, 20+100]
|
||||||
self.current_scale = initial_scale
|
self.current_scale = initial_scale
|
||||||
|
self.enabled = True # New flag
|
||||||
|
|
||||||
|
def set_enabled(self, enabled):
|
||||||
|
"""
|
||||||
|
Enables or disables interaction.
|
||||||
|
When disabled, we set WS_EX_TRANSPARENT so clicks pass through physically.
|
||||||
|
"""
|
||||||
|
if self.enabled == enabled:
|
||||||
|
return
|
||||||
|
|
||||||
|
self.enabled = enabled
|
||||||
|
|
||||||
|
# Get current styles
|
||||||
|
style = user32.GetWindowLongW(self.hwnd, GWL_EXSTYLE)
|
||||||
|
|
||||||
|
if not enabled:
|
||||||
|
# Enable Click-Through (Add Transparent)
|
||||||
|
# We also ensure Layered is set (Qt usually sets it, but good to be sure)
|
||||||
|
new_style = style | WS_EX_TRANSPARENT | WS_EX_LAYERED
|
||||||
|
else:
|
||||||
|
# Disable Click-Through (Remove Transparent)
|
||||||
|
new_style = style & ~WS_EX_TRANSPARENT
|
||||||
|
|
||||||
|
if new_style != style:
|
||||||
|
SetWindowLongPtr(self.hwnd, GWL_EXSTYLE, new_style)
|
||||||
|
|
||||||
|
# Force a redraw/frame update just in case
|
||||||
|
user32.SetWindowPos(self.hwnd, 0, 0, 0, 0, 0, 0x0027) # SWP_NOMOVE | SWP_NOSIZE | SWP_NOZORDER | SWP_FRAMECHANGED
|
||||||
|
|
||||||
def install(self):
|
def install(self):
|
||||||
proc_address = ctypes.cast(self.new_wnd_proc, ctypes.c_void_p)
|
proc_address = ctypes.cast(self.new_wnd_proc, ctypes.c_void_p)
|
||||||
@@ -73,6 +105,10 @@ class WindowHook:
|
|||||||
def wnd_proc_callback(self, hwnd, msg, wParam, lParam):
|
def wnd_proc_callback(self, hwnd, msg, wParam, lParam):
|
||||||
try:
|
try:
|
||||||
if msg == WM_NCHITTEST:
|
if msg == WM_NCHITTEST:
|
||||||
|
# If disabled (invisible/inactive), let clicks pass through (HTTRANSPARENT)
|
||||||
|
if not self.enabled:
|
||||||
|
return HTTRANSPARENT
|
||||||
|
|
||||||
res = self.on_nchittest(lParam)
|
res = self.on_nchittest(lParam)
|
||||||
if res != 0:
|
if res != 0:
|
||||||
return res
|
return res
|
||||||
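The click-through toggle in `set_enabled` is plain bit arithmetic on the extended window style. The platform-independent sketch below isolates that arithmetic so it can be checked without a window handle. `next_ex_style` is a hypothetical helper name; the `WS_EX_*` values are the ones defined in this diff, and the `SWP_*` values are the standard Win32 constants named in the hunk's comment.

```python
# Extended style bits, as defined in the diff.
WS_EX_TRANSPARENT = 0x00000020
WS_EX_LAYERED = 0x00080000

# SetWindowPos flags named in the diff's comment.
SWP_NOSIZE = 0x0001
SWP_NOMOVE = 0x0002
SWP_NOZORDER = 0x0004
SWP_FRAMECHANGED = 0x0020
SWP_FLAGS = SWP_NOMOVE | SWP_NOSIZE | SWP_NOZORDER | SWP_FRAMECHANGED


def next_ex_style(style: int, enabled: bool) -> int:
    """Return the extended style the window should have for the given state."""
    if not enabled:
        # Click-through: add TRANSPARENT and make sure LAYERED is present.
        return style | WS_EX_TRANSPARENT | WS_EX_LAYERED
    # Interactive again: clear only TRANSPARENT; LAYERED is left as it was.
    return style & ~WS_EX_TRANSPARENT


if __name__ == "__main__":
    print(hex(next_ex_style(0, enabled=False)))
    print(hex(SWP_FLAGS))
```

Note that re-enabling does not remove `WS_EX_LAYERED`, which matches the hunk: only the transparent bit is toggled.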
|
|||||||
38
test_m2m.py
Normal file
@@ -0,0 +1,38 @@

import sys
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

def test_m2m():
    model_name = "facebook/m2m100_418M"
    print(f"Loading {model_name}...")

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)

    # Test cases: (Language Code, Input)
    test_cases = [
        ("en", "he go to school yesterday"),
        ("pl", "on iść do szkoła wczoraj"), # Intentional broken grammar in Polish
    ]

    print("\nStarting M2M Tests (Self-Translation):\n")

    for lang, input_text in test_cases:
        tokenizer.src_lang = lang
        encoded = tokenizer(input_text, return_tensors="pt")

        # Translate to SAME language
        generated_tokens = model.generate(
            **encoded,
            forced_bos_token_id=tokenizer.get_lang_id(lang)
        )

        corrected = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]

        print(f"[{lang}]")
        print(f"Input: {input_text}")
        print(f"Output: {corrected}")
        print("-" * 20)

if __name__ == "__main__":
    test_m2m()
40
test_mt0.py
Normal file
@@ -0,0 +1,40 @@

import sys
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def test_mt0():
    model_name = "bigscience/mt0-base"
    print(f"Loading {model_name}...")

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Test cases: (Language, Prompt, Input)
    # MT0 is instruction tuned, so we should prompt it in the target language or English.
    # Cross-lingual prompting (English prompt -> Target tasks) is usually supported.

    test_cases = [
        ("English", "Correct grammar:", "he go to school yesterday"),
        ("Polish", "Popraw gramatykę:", "to jest testowe zdanie bez kropki"),
        ("Finnish", "Korjaa kielioppi:", "tämä on testilause ilman pistettä"),
        ("Russian", "Исправь грамматику:", "это тестовое предложение без точки"),
        ("Japanese", "文法を直してください:", "これは点のないテスト文です"),
        ("Spanish", "Corrige la gramática:", "esta es una oración de prueba sin punto"),
    ]

    print("\nStarting MT0 Tests:\n")

    for lang, prompt_text, input_text in test_cases:
        full_input = f"{prompt_text} {input_text}"
        inputs = tokenizer(full_input, return_tensors="pt")

        outputs = model.generate(inputs.input_ids, max_length=128)
        corrected = tokenizer.decode(outputs[0], skip_special_tokens=True)

        print(f"[{lang}]")
        print(f"Input: {full_input}")
        print(f"Output: {corrected}")
        print("-" * 20)

if __name__ == "__main__":
    test_mt0()
34
test_punctuation.py
Normal file
@@ -0,0 +1,34 @@

import sys
import os

# Add the project root to sys.path so the `src` package can be imported
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from src.core.grammar_assistant import GrammarAssistant

def test_punctuation():
    assistant = GrammarAssistant()
    assistant.load_model()

    samples = [
        # User's example (verbatim)
        "If the voice recognition doesn't recognize that I like stopped Or something would that would it also correct that",

        # Generic run-on
        "hello how are you doing today i am doing fine thanks for asking",

        # Missing commas/periods
        "well i think its valid however we should probably check the logs first"
    ]

    print("\nStarting Punctuation Tests:\n")

    for sample in samples:
        print(f"Original: {sample}")
        corrected = assistant.correct(sample)
        print(f"Corrected: {corrected}")
        print("-" * 20)

if __name__ == "__main__":
    test_punctuation()