Compare commits
8 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | 3137770742 | | |
| | aed489dd23 | | |
| | e23c492360 | | |
| | 84f10092e9 | | |
| | 03f46ee1e3 | | |
| | 0f1bf5f1af | | |
| | 0b2b5848e2 | | |
| | f3bf7541cf | | |
223
README.md
@@ -5,150 +5,185 @@
|
||||
|
||||
<br>
|
||||
|
||||

|
||||
[](https://git.lashman.live/lashman/whisper_voice/releases/latest)
|
||||
[](https://creativecommons.org/publicdomain/zero/1.0/)
|
||||

|
||||
[](https://git.lashman.live/lashman/whisper_voice/releases/latest)
|
||||
[](https://creativecommons.org/publicdomain/zero/1.0/)
|
||||
|
||||
<br>
|
||||
|
||||
> *"The master's tools will never dismantle the master's house."* — Audre Lorde
|
||||
> *"The master's tools will never dismantle the master's house."*
|
||||
> <br>
|
||||
> **Build your own tools. Run them locally.**
|
||||
> **Build your own tools. Run them locally. Free your mind.**
|
||||
|
||||
[Report Issue](https://git.lashman.live/lashman/whisper_voice/issues) • [View Source](https://git.lashman.live/lashman/whisper_voice) • [Releases](https://git.lashman.live/lashman/whisper_voice/releases)
|
||||
[View Source](https://git.lashman.live/lashman/whisper_voice) • [Report Issue](https://git.lashman.live/lashman/whisper_voice/issues)
|
||||
|
||||
</div>
|
||||
|
||||
<br>
|
||||
<br>
|
||||
|
||||
## ✊ The Manifesto
|
||||
## 📡 The Transmission
|
||||
|
||||
**We hold these truths to be self-evident:** That user data is an extension of the self, and its exploitation by centralized clouds is a violation of digital autonomy.
|
||||
We are witnessing the **enshittification** of the digital world. What were once vibrant social commons are being walled off, strip-mined for data, and degraded into rent-seeking silos. Your voice is no longer your own; it is a training set for a corporate oracle that charges you for the privilege of listening.
|
||||
|
||||
**Whisper Voice** is built on the principle of **technological sovereignty**. It provides state-of-the-art speech recognition without renting your cognitive output to corporate oligarchies. By running entirely on your own hardware, it reclaims the means of digital production, ensuring that your words remain exclusively yours.
|
||||
**Whisper Voice** is a small act of sabotage against this trend.
|
||||
|
||||
It is built on the axiom of **Technological Sovereignty**. By moving state-of-the-art inference from the server farms to your own silicon, you reclaim the means of digital production. No telemetry. No subscriptions. No "cloud processing" that eavesdrops on your intent.
|
||||
|
||||
---
|
||||
|
||||
## ⚡ Technical Architecture
|
||||
## ⚡ The Engine
|
||||
|
||||
This operates on the metal. It is not a wrapper. It is an engine.
|
||||
Whisper Voice operates directly on the metal. It is not an API wrapper; it is an autonomous machine.
|
||||
|
||||
| Component | Technology | Benefit |
|
||||
| :--- | :--- | :--- |
|
||||
| **Inference Core** | **Faster-Whisper** | Hyper-optimized implementation of OpenAI's Whisper using **CTranslate2**. Delivers **4x speedups** over PyTorch. |
|
||||
| **Quantization** | **INT8** | 8-bit quantization enables Pro-grade models (`Large-v3`) to run on consumer GPUs with minimal VRAM. |
|
||||
| **Sensory Gate** | **Silero VAD** | Enterprise-grade Voice Activity Detection filters out silence and background noise, conserving compute. |
|
||||
| **Interface** | **Qt 6 / QML** | Hardware-accelerated, glassmorphic UI that feels native yet remains OS-independent. |
|
||||
| **Inference Core** | **Faster-Whisper** | Hyper-optimized C++ implementation via **CTranslate2**. Delivers **4x velocity** over standard PyTorch. |
|
||||
| **Compression** | **INT8 quantization** | Enables Pro-grade models (`Large-v3`) to run on consumer-grade GPUs, democratizing elite AI. |
|
||||
| **Sensory Gate** | **Silero VAD** | Enterprise-grade Voice Activity Detection filters out the noise, ensuring only pure intent is processed. |
|
||||
| **Interface** | **Qt 6 / QML** | Hardware-accelerated, glassmorphic UI that is fluid, responsive, and sovereign. |
|
||||
|
||||
### 🛑 Compatibility Matrix (Windows)
|
||||
The core engine (`CTranslate2`) is heavily optimized for Nvidia tensor cores.
|
||||
|
||||
| Manufacturer | Hardware | Status | Notes |
| :--- | :--- | :--- | :--- |
| **Nvidia** | GTX 900+ / RTX | ✅ **Supported** | Full heavy-metal acceleration. |
| **AMD** | Radeon RX | ⚠️ **CPU Fallback** | Runs on CPU. Valid for `Small/Medium`, slow for `Large`. |
| **Intel** | Arc / Iris | ⚠️ **CPU Fallback** | Runs on CPU. Valid for `Small/Medium`, slow for `Large`. |
| **Apple** | M1 / M2 / M3 | ❌ **Unsupported** | Release is strictly Windows x64. |
|
||||
|
||||
> **AMD Users**: v1.0.3 auto-detects GPU failures and silently falls back to CPU.
|
||||
|
||||
<br>
|
||||
|
||||
## 🖋️ Universal Transcription
|
||||
|
||||
At its core, Whisper Voice is the ultimate bridge between thought and text. It listens with superhuman precision, converting spoken word into written form across **99 languages**.
|
||||
|
||||
* **Punctuation Mastery**: Automatically handles capitalization and complex punctuation formatting.
|
||||
* **Contextual Intelligence**: Smarter than standard dictation; it understands the flow of sentences to resolve homophones and technical jargon ($1.5k vs "fifteen hundred dollars").
|
||||
* **Total Privacy**: Your private dictation, legal notes, and creative writing never leave your machine's RAM.
|
||||
|
||||
### Workflow: `F9 (Default)`
|
||||
The primary channel for native-language transcription. It transcribes precisely what it hears in the language you speak (or the one you've locked in Settings).
|
||||
|
||||
### ✨ Style Prompting (New in v1.0.2)
|
||||
Whisper Voice replaces traditional "grammar correction models" with a native **Style Prompting** engine. By injecting a specific "pre-prompt" into the model's context window, we can guide its internal style without external post-processing.
|
||||
|
||||
* **Standard (Default)**: Forces the model to use full sentences, proper capitalization, and periods. Ideal for dictation.
|
||||
* **Casual**: Encourages a relaxed, lowercase style (e.g., "no way that's crazy lol").
|
||||
* **Custom**: Allows you to seed the model with your own context (e.g., "Here is a list of medical terms:").
|
||||
|
||||
This approach incurs **zero latency penalty** and **zero extra VRAM** usage.
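
A minimal sketch of how the three styles could map onto a prompt, offered as an illustration rather than the project's actual code. The `STYLE_PROMPTS` table, the `casual` prompt text, and the helper name `build_transcribe_kwargs` are assumptions. Only the `standard` string is taken from the default `initial_prompt` setting in this change set, and `initial_prompt` is a keyword argument accepted by faster-whisper's `WhisperModel.transcribe`.

```python
from typing import Optional

# Hypothetical style table. Only the "standard" text comes from DEFAULT_SETTINGS
# in this diff; the "casual" text is an invented example.
STYLE_PROMPTS = {
    "standard": "Mm-hmm. Okay, let's go. I speak in full sentences.",
    "casual": "no way that's crazy lol",
}


def build_transcribe_kwargs(style: str, custom_prompt: Optional[str] = None) -> dict:
    """Return keyword arguments for WhisperModel.transcribe() for a given style."""
    key = style.strip().lower()
    if key == "custom":
        # An empty custom prompt means "no prompt at all".
        prompt = (custom_prompt or "").strip() or None
    else:
        # Unknown style names fall back to the standard prompt.
        prompt = STYLE_PROMPTS.get(key, STYLE_PROMPTS["standard"])
    return {"initial_prompt": prompt}


# Usage (assumes an already loaded faster_whisper.WhisperModel named `model`):
# segments, info = model.transcribe(audio, **build_transcribe_kwargs("casual"))
```

Because the prompt is only a short piece of text placed in the decoder context, no second model has to be loaded or run.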
|
||||
|
||||
<br>
|
||||
|
||||
## 🌎 Universal Translation
|
||||
|
||||
Whisper Voice v1.0.1 includes a **Neural Translation Engine** that allows you to bridge any linguistic gap instantly.
|
||||
|
||||
* **Input**: Speak in French, Japanese, Russian, or **96 other languages**.
|
||||
* **Output**: The engine instantly reconstructs the semantic meaning into fluent **English**.
|
||||
* **Task Protocol**: Handled via the dedicated `F10` channel.
|
||||
|
||||
### 🔍 Why only English translation?
|
||||
A common question arises: *Why can't I translate from French to Japanese?*
|
||||
|
||||
The architecture of the underlying Whisper model is a **Many-to-English** design. During its massive training phase (680,000 hours of audio), the translation task was specifically optimized to map the global linguistic commons onto a single bridge language: **English**. This allowed the model to reach incredible levels of semantic understanding without the exponential complexity of a "Many-to-Many" mapping.
|
||||
|
||||
By focusing its translation decoder solely on English, Whisper achieves "Zero-Shot" quality that rivals specialized translation engines while remaining lightweight enough to run on your local GPU.
|
||||
|
||||
---
|
||||
|
||||
## 🕹️ Command & Control
|
||||
|
||||
### Global Hotkeys
|
||||
The agent runs silently in the background, waiting for your signal.
|
||||
|
||||
* **Transcribe (F9)**: Opens the channel for standard speech-to-text.
|
||||
* **Translate (F10)**: Opens the channel for neural translation.
|
||||
* **Customization**: Remap these keys in Settings. The recorder supports complex chords (e.g. `Ctrl + Alt + Space`) to fit your workflow.
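
The hotkey-to-task routing described above can be sketched as follows. This is a hedged illustration: `DEFAULT_HOTKEYS`, `task_for_hotkey`, and the chord normalization are hypothetical names and choices, not taken from the application. The two task strings, `transcribe` and `translate`, are the values Whisper itself accepts.

```python
from typing import Dict, Optional

# Default bindings from the README: F9 transcribes, F10 translates to English.
DEFAULT_HOTKEYS: Dict[str, str] = {"f9": "transcribe", "f10": "translate"}
VALID_TASKS = {"transcribe", "translate"}


def _normalize(chord: str) -> str:
    """Make 'Ctrl + Alt + Space' and 'ctrl+alt+space' compare equal."""
    return chord.replace(" ", "").lower()


def task_for_hotkey(chord: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Resolve a pressed chord to a Whisper task, honoring user remaps."""
    mapping = dict(DEFAULT_HOTKEYS)
    for key, task in (overrides or {}).items():
        mapping[_normalize(key)] = task
    task = mapping[_normalize(chord)].strip().lower()  # KeyError if the chord is unbound
    if task not in VALID_TASKS:
        raise ValueError(f"Unsupported task: {task!r}")
    return task
```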
|
||||
|
||||
### Injection Protocols
|
||||
* **Clipboard Paste**: Standard text injection. Instant, reliable.
|
||||
* **Simulate Typing**: Mimics physical keystrokes at superhuman speed (6000 CPM). Bypasses anti-paste restrictions and "protected" windows.
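
For a sense of scale, 6000 characters per minute is 100 characters per second, or one keystroke every 10 ms. The 1200 CPM suggested under Troubleshooting works out to 50 ms per keystroke. The helper names below are illustrative only and are not the application's API.

```python
def delay_per_char(cpm: int) -> float:
    """Seconds to wait between simulated keystrokes at a given characters-per-minute rate."""
    if cpm <= 0:
        raise ValueError("cpm must be positive")
    return 60.0 / cpm


def typing_duration(text: str, cpm: int) -> float:
    """Approximate time in seconds needed to type `text` at `cpm`."""
    return len(text) * delay_per_char(cpm)
```

At the default rate, a 300-character paragraph takes about three seconds to type out. At 1200 CPM the same paragraph takes about fifteen.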
|
||||
|
||||
<br>
|
||||
|
||||
## 📊 Intelligence Matrix
|
||||
|
||||
Select the model that aligns with your hardware capabilities.
|
||||
Select the model that aligns with your available resources.
|
||||
|
||||
| Model | VRAM (GPU) | RAM (CPU) | Velocity | Designation |
|
||||
| Model | VRAM (GPU) | RAM (CPU) | Designation | Capability |
|
||||
| :--- | :--- | :--- | :--- | :--- |
|
||||
| `Tiny` | **~500 MB** | ~1 GB | ⚡ **Supersonic** | Command & Control, older hardware. |
|
||||
| `Base` | **~600 MB** | ~1 GB | 🚀 **Very Fast** | Daily driver for low-power laptops. |
|
||||
| `Small` | **~1 GB** | ~2 GB | ⏩ **Fast** | High accuracy English dictation. |
|
||||
| `Medium` | **~2 GB** | ~4 GB | ⚖️ **Balanced** | Complex vocabulary, foreign accents. |
|
||||
| `Large-v3 Turbo` | **~4 GB** | ~6 GB | ✨ **Optimal** | **Sweet Spot.** Near-Large smarts, Medium speed. |
|
||||
| `Large-v3` | **~5 GB** | ~8 GB | 🧠 **Maximum** | Professional transcription. Uncompromised. |
|
||||
| `Large-v3 Turbo` | **~4 GB** | ~6 GB | ✨ **Optimal** | **The Sweet Spot.** Near-Large intelligence, Medium speed. |
|
||||
| `Large-v3` | **~5 GB** | ~8 GB | 🧠 **Maximum** | Professional grade. Uncompromised. |
|
||||
|
||||
> *Note: Acceleration requires you to manually select your Compute Device (CUDA GPU or CPU) in Settings.*
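
One way to read the table is "pick the largest model whose VRAM figure fits your card". The sketch below encodes the approximate VRAM column from the table, treating "~1 GB" as 1000 MB, which is an assumption. `pick_model` and `headroom_mb` are hypothetical names, and the application does not necessarily choose models this way.

```python
from typing import Optional

# (model name, approximate VRAM in MB), smallest first, from the table above.
MODEL_VRAM_MB = [
    ("Tiny", 500),
    ("Base", 600),
    ("Small", 1000),
    ("Medium", 2000),
    ("Large-v3 Turbo", 4000),
    ("Large-v3", 5000),
]


def pick_model(free_vram_mb: int, headroom_mb: int = 0) -> Optional[str]:
    """Return the largest model that fits in the VRAM budget, or None if none fit."""
    budget = free_vram_mb - headroom_mb
    best = None
    for name, needed in MODEL_VRAM_MB:
        if needed <= budget:
            best = name  # the list is sorted, so later matches are larger models
    return best
```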
|
||||
|
||||
### 📉 Low VRAM Mode
|
||||
For users with limited GPU memory (e.g., 4GB cards) or those running heavy games simultaneously, Whisper Voice offers a specialized **Low VRAM Mode**.
|
||||
|
||||
* **Behavior**: The AI model is aggressively unloaded from the GPU immediately after every transcription.
|
||||
* **Benefit**: When idle, the app consumes near-zero VRAM (~0MB), leaving your GPU completely free for gaming or rendering.
|
||||
* **Trade-off**: There is a "cold start" latency of 1-2 seconds for every voice command as the model reloads from the disk cache.
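
The load, use, unload cycle can be sketched with a small holder class. This is an illustration under stated assumptions: `LazyModelHolder` and its `loader` callable are hypothetical, and the code only drops the Python reference and runs the garbage collector. How the application actually releases GPU memory is not shown in this change set. The flag corresponds to the `unload_models_after_use` setting added to `DEFAULT_SETTINGS`.

```python
import gc
from typing import Any, Callable


class LazyModelHolder:
    """Loads a model on first use and optionally drops it after every call."""

    def __init__(self, loader: Callable[[], Any], unload_after_use: bool = False):
        self._loader = loader
        self._unload_after_use = unload_after_use
        self._model = None
        self.load_count = 0  # how many times the (expensive) loader has run

    @property
    def is_loaded(self) -> bool:
        return self._model is not None

    def run(self, fn: Callable[[Any], Any]) -> Any:
        """Call fn(model), loading the model first if it is not resident."""
        if self._model is None:
            self._model = self._loader()  # the "cold start" cost is paid here
            self.load_count += 1
        try:
            return fn(self._model)
        finally:
            if self._unload_after_use:
                self._model = None
                gc.collect()  # encourage prompt release of the model object
```

With `unload_after_use=True` every call pays the reload cost, which is the trade-off listed above.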
|
||||
|
||||
---
|
||||
|
||||
## 🛠️ Operations
|
||||
## 🛠️ Deployment
|
||||
|
||||
### 📥 Deployment
|
||||
1. **Download**: Grab `WhisperVoice.exe` from [Releases](https://git.lashman.live/lashman/whisper_voice/releases).
|
||||
### 📥 Installation
|
||||
1. **Acquire**: Download `WhisperVoice.exe` from [Releases](https://git.lashman.live/lashman/whisper_voice/releases).
|
||||
2. **Deploy**: Place it anywhere. It is portable.
|
||||
3. **Bootstrap**: Run it. The agent will self-provision an isolated Python environment (~2GB) on first launch.
|
||||
3. **Bootstrap**: Run it. The agent will self-provision an isolated Python runtime (~2GB) on first launch.
|
||||
4. **Sync**: Future updates are handled by the **Smart Bootstrapper**, which surgically updates only changed files, respecting your bandwidth and your settings.
|
||||
|
||||
### 🕹️ Controls
|
||||
* **Global Hook**: `F9` (Default). Press to open the channel. Release to inject text.
|
||||
* **Tray Agent**: Retracts to the system tray. Right-click for **Settings** or **File Transcription**.
|
||||
### 🔧 Troubleshooting
|
||||
* **App crashes on start**: Ensure you have [Microsoft Visual C++ Redistributable 2015-2022](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist) installed.
|
||||
* **"Simulate Typing" is slow**: Some applications (remote desktops, legacy games) cannot handle the data stream. Lower the typing speed in Settings to ~1200 CPM.
|
||||
* **No Audio**: The agent listens to the **Default Communication Device**. Verify your Windows Sound Control Panel.
|
||||
|
||||
### 📡 Input Modes
|
||||
| Mode | Description | Speed |
| :--- | :--- | :--- |
| **Clipboard Paste** | Standard text injection via OS clipboard. | Instant |
| **Simulate Typing** | Mimics physical keystrokes. Bypasses anti-paste blocks. | Up to **6000** CPM |
|
||||
|
||||
---
|
||||
|
||||
## 🌐 Universal Translation
|
||||
|
||||
The model listens in **99 languages** and translates them to English or transcribes them natively.
|
||||
|
||||
<details>
|
||||
<summary><b>Click to view supported languages</b></summary>
|
||||
<br>
|
||||
|
||||
| | | | |
|
||||
| :--- | :--- | :--- | :--- |
|
||||
| Afrikaans 🇿🇦 | Albanian 🇦🇱 | Amharic 🇪🇹 | Arabic 🇸🇦 |
|
||||
| Armenian 🇦🇲 | Assamese 🇮🇳 | Azerbaijani 🇦🇿 | Bashkir 🇷🇺 |
|
||||
| Basque 🇪🇸 | Belarusian 🇧🇾 | Bengali 🇧🇩 | Bosnian 🇧🇦 |
|
||||
| Breton 🇫🇷 | Bulgarian 🇧🇬 | Burmese 🇲🇲 | Castilian 🇪🇸 |
|
||||
| Catalan 🇪🇸 | Chinese 🇨🇳 | Croatian 🇭🇷 | Czech 🇨🇿 |
|
||||
| Danish 🇩🇰 | Dutch 🇳🇱 | English 🇺🇸 | Estonian 🇪🇪 |
|
||||
| Faroese 🇫🇴 | Finnish 🇫🇮 | Flemish 🇧🇪 | French 🇫🇷 |
|
||||
| Galician 🇪🇸 | Georgian 🇬🇪 | German 🇩🇪 | Greek 🇬🇷 |
|
||||
| Gujarati 🇮🇳 | Haitian 🇭🇹 | Hausa 🇳🇬 | Hawaiian 🇺🇸 |
|
||||
| Hebrew 🇮🇱 | Hindi 🇮🇳 | Hungarian 🇭🇺 | Icelandic 🇮🇸 |
|
||||
| Indonesian 🇮🇩 | Italian 🇮🇹 | Japanese 🇯🇵 | Javanese 🇮🇩 |
|
||||
| Kannada 🇮🇳 | Kazakh 🇰🇿 | Khmer 🇰🇭 | Korean 🇰🇷 |
|
||||
| Lao 🇱🇦 | Latin 🇻🇦 | Latvian 🇱🇻 | Lingala 🇨🇩 |
|
||||
| Lithuanian 🇱🇹 | Luxembourgish 🇱🇺 | Macedonian 🇲🇰 | Malagasy 🇲🇬 |
|
||||
| Malay 🇲🇾 | Malayalam 🇮🇳 | Maltese 🇲🇹 | Maori 🇳🇿 |
|
||||
| Marathi 🇮🇳 | Moldavian 🇲🇩 | Mongolian 🇲🇳 | Myanmar 🇲🇲 |
|
||||
| Nepali 🇳🇵 | Norwegian 🇳🇴 | Occitan 🇫🇷 | Panjabi 🇮🇳 |
|
||||
| Pashto 🇦🇫 | Persian 🇮🇷 | Polish 🇵🇱 | Portuguese 🇵🇹 |
|
||||
| Punjabi 🇮🇳 | Romanian 🇷🇴 | Russian 🇷🇺 | Sanskrit 🇮🇳 |
|
||||
| Serbian 🇷🇸 | Shona 🇿🇼 | Sindhi 🇵🇰 | Sinhala 🇱🇰 |
|
||||
| Slovak 🇸🇰 | Slovenian 🇸🇮 | Somali 🇸🇴 | Spanish 🇪🇸 |
|
||||
| Sundanese 🇮🇩 | Swahili 🇰🇪 | Swedish 🇸🇪 | Tagalog 🇵🇭 |
|
||||
| Tajik 🇹🇯 | Tamil 🇮🇳 | Tatar 🇷🇺 | Telugu 🇮🇳 |
|
||||
| Thai 🇹🇭 | Tibetan 🇨🇳 | Turkish 🇹🇷 | Turkmen 🇹🇲 |
|
||||
| Ukrainian 🇺🇦 | Urdu 🇵🇰 | Uzbek 🇺🇿 | Vietnamese 🇻🇳 |
|
||||
| Welsh 🏴 | Yiddish 🇮🇱 | Yoruba 🇳🇬 | |
|
||||
|
||||
</details>
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Troubleshooting
|
||||
## 🌐 Supported Languages
|
||||
|
||||
<details>
|
||||
<summary><b>🔥 App crashes on start</b></summary>
|
||||
<blockquote>
|
||||
The underlying engine requires standard C++ libraries. Install the <b>Microsoft Visual C++ Redistributable (2015-2022)</b>.
|
||||
</blockquote>
|
||||
</details>
|
||||
The engine understands the following 99 languages. You can lock the focus to a specific language in Settings to improve accuracy, or rely on **Auto-Detect** for fluid multilingual usage.
|
||||
|
||||
<details>
|
||||
<summary><b>🐌 "Simulate Typing" is slow</b></summary>
|
||||
<blockquote>
|
||||
Some apps (games, RDP) can't handle supersonic input. Go to <b>Settings</b> and lower the <b>Typing Speed</b> to ~1200 CPM.
|
||||
</blockquote>
|
||||
</details>
|
||||
| | | | | | |
|
||||
| :--- | :--- | :--- | :--- | :--- | :--- |
|
||||
| Afrikaans 🇿🇦 | Albanian 🇦🇱 | Amharic 🇪🇹 | Arabic 🇸🇦 | Armenian 🇦🇲 | Assamese 🇮🇳 |
|
||||
| Azerbaijani 🇦🇿 | Bashkir 🇷🇺 | Basque 🇪🇸 | Belarusian 🇧🇾 | Bengali 🇧🇩 | Bosnian 🇧🇦 |
|
||||
| Breton 🇫🇷 | Bulgarian 🇧🇬 | Burmese 🇲🇲 | Castilian 🇪🇸 | Catalan 🇪🇸 | Chinese 🇨🇳 |
|
||||
| Croatian 🇭🇷 | Czech 🇨🇿 | Danish 🇩🇰 | Dutch 🇳🇱 | English 🇺🇸 | Estonian 🇪🇪 |
|
||||
| Faroese 🇫🇴 | Finnish 🇫🇮 | Flemish 🇧🇪 | French 🇫🇷 | Galician 🇪🇸 | Georgian 🇬🇪 |
|
||||
| German 🇩🇪 | Greek 🇬🇷 | Gujarati 🇮🇳 | Haitian 🇭🇹 | Hausa 🇳🇬 | Hawaiian 🇺🇸 |
|
||||
| Hebrew 🇮🇱 | Hindi 🇮🇳 | Hungarian 🇭🇺 | Icelandic 🇮🇸 | Indonesian 🇮🇩 | Italian 🇮🇹 |
|
||||
| Japanese 🇯🇵 | Javanese 🇮🇩 | Kannada 🇮🇳 | Kazakh 🇰🇿 | Khmer 🇰🇭 | Korean 🇰🇷 |
|
||||
| Lao 🇱🇦 | Latin 🇻🇦 | Latvian 🇱🇻 | Lingala 🇨🇩 | Lithuanian 🇱🇹 | Luxembourgish 🇱🇺 |
|
||||
| Macedonian 🇲🇰 | Malagasy 🇲🇬 | Malay 🇲🇾 | Malayalam 🇮🇳 | Maltese 🇲🇹 | Maori 🇳🇿 |
|
||||
| Marathi 🇮🇳 | Moldavian 🇲🇩 | Mongolian 🇲🇳 | Myanmar 🇲🇲 | Nepali 🇳🇵 | Norwegian 🇳🇴 |
|
||||
| Occitan 🇫🇷 | Panjabi 🇮🇳 | Pashto 🇦🇫 | Persian 🇮🇷 | Polish 🇵🇱 | Portuguese 🇵🇹 |
|
||||
| Punjabi 🇮🇳 | Romanian 🇷🇴 | Russian 🇷🇺 | Sanskrit 🇮🇳 | Serbian 🇷🇸 | Shona 🇿🇼 |
|
||||
| Sindhi 🇵🇰 | Sinhala 🇱🇰 | Slovak 🇸🇰 | Slovenian 🇸🇮 | Somali 🇸🇴 | Spanish 🇪🇸 |
|
||||
| Sundanese 🇮🇩 | Swahili 🇰🇪 | Swedish 🇸🇪 | Tagalog 🇵🇭 | Tajik 🇹🇯 | Tamil 🇮🇳 |
|
||||
| Tatar 🇷🇺 | Telugu 🇮🇳 | Thai 🇹🇭 | Tibetan 🇨🇳 | Turkish 🇹🇷 | Turkmen 🇹🇲 |
|
||||
| Ukrainian 🇺🇦 | Urdu 🇵🇰 | Uzbek 🇺🇿 | Vietnamese 🇻🇳 | Welsh 🏴 | Yiddish 🇮🇱 |
|
||||
| Yoruba 🇳🇬 | | | | | |
|
||||
|
||||
<details>
|
||||
<summary><b>🎤 No Audio / Silence</b></summary>
|
||||
<blockquote>
|
||||
The agent listens to the <b>Default Communication Device</b>. Ensure your microphone is set correctly in Windows Sound Settings.
|
||||
</blockquote>
|
||||
</details>
|
||||
|
||||
---
|
||||
<br>
|
||||
<br>
|
||||
|
||||
<div align="center">
|
||||
|
||||
### ⚖️ PUBLIC DOMAIN (CC0 1.0)
|
||||
|
||||
*No Rights Reserved. No Gods. No Managers.*
|
||||
*No Rights Reserved. No Gods. No Masters. No Managers.*
|
||||
|
||||
Credit to **OpenAI** (Whisper), **Systran** (Faster-Whisper), and **Silero** (VAD).
|
||||
|
||||
|
||||
28
RELEASE_NOTES.md
Normal file
@@ -0,0 +1,28 @@
|
||||
# Release v1.0.4
|
||||
|
||||
**"The Compatibility Update"**
|
||||
|
||||
This release focuses on maximum stability across different hardware configurations (AMD, Intel, Nvidia) and fixing startup crashes related to corrupted models or missing drivers.
|
||||
|
||||
## 🛠️ Critical Fixes
|
||||
|
||||
### 1. Robust CPU Fallback (AMD / Intel Support)
|
||||
* **Problem**: Previously, if an AMD user tried to run the app, it would crash instantly because it tried to load Nvidia CUDA libraries by default.
|
||||
* **Fix**: The app now **silently detects** if CUDA initialization fails (due to missing DLLs or incompatible hardware) and **automatically falls back to CPU mode**.
|
||||
* **Result**: The app "just works" on any Windows machine, regardless of GPU.
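
The fallback pattern can be sketched in a few lines. Note the deliberate simplification: the code in this release matches keywords such as `cublas` or `cudnn` in the error message before retrying, whereas the sketch below retries on any exception raised for a non-CPU device. `load_with_cpu_fallback` and the `loader` callable are hypothetical stand-ins for constructing a `WhisperModel`.

```python
from typing import Any, Callable, Tuple


def load_with_cpu_fallback(
    loader: Callable[[str, str], Any], device: str, compute_type: str
) -> Tuple[Any, str]:
    """Try the requested device first; on failure, retry once on CPU with int8.

    Returns the loaded model together with the device that was actually used,
    so the caller can persist the effective device in its settings.
    """
    try:
        return loader(device, compute_type), device
    except Exception:
        if device == "cpu":
            raise  # nothing left to fall back to
        return loader("cpu", "int8"), "cpu"
```

Returning the effective device alongside the model makes it harder for the caller to record the device that failed.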
|
||||
|
||||
### 2. Startup Crash Protection
|
||||
* **Problem**: If `faster_whisper` was imported before checking for valid drivers, the app would crash on launch for some users.
|
||||
* **Fix**: Implemented **Lazy Loading** for the AI engine. The app now starts the UI first, and only loads the heavy AI libraries inside a safety block that catches errors.
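
A minimal sketch of the lazy-loading idea, assuming a hypothetical helper named `lazy_import`. The heavy module is imported only when first needed, inside a guard that turns any import-time failure, including a missing native library, into a logged error instead of a crash.

```python
import importlib
import logging
from types import ModuleType
from typing import Optional


def lazy_import(module_name: str) -> Optional[ModuleType]:
    """Import a module on demand; return None instead of raising if it cannot load."""
    try:
        return importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or OSError from a missing DLL
        logging.error("Could not load %s: %s", module_name, exc)
        return None


# Usage: the UI is already on screen by the time this runs.
# fw = lazy_import("faster_whisper")
# if fw is None:
#     ...show an error dialog instead of exiting...
```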
|
||||
|
||||
### 3. Corrupt Model Auto-Repair
|
||||
* **Problem**: Interrupted downloads could leave a corrupted model folder, preventing the app from ever starting again.
|
||||
* **Fix**: If the app detects a "vocabulary missing" or invalid config error, it will now **automatically delete the corrupt folder** and allow you to re-download it cleanly.
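
A complementary check, shown here only as a sketch, is to verify that the expected files exist before loading. The file names come from the download list in this change set. Accepting `vocabulary.txt` as an alternative, and treating an empty file as missing, are assumptions. The application itself detects corruption from load errors rather than with a check like this.

```python
from pathlib import Path
from typing import List

REQUIRED_FILES = ["config.json", "model.bin", "tokenizer.json"]
VOCAB_ALTERNATIVES = ["vocabulary.json", "vocabulary.txt"]  # assumption: either is acceptable


def missing_model_files(model_dir: Path) -> List[str]:
    """Return the names of required model files that are absent or empty."""

    def present(name: str) -> bool:
        path = model_dir / name
        return path.is_file() and path.stat().st_size > 0

    missing = [name for name in REQUIRED_FILES if not present(name)]
    if not any(present(name) for name in VOCAB_ALTERNATIVES):
        missing.append("vocabulary.json")
    return missing
```

An empty result means the folder at least looks complete. It says nothing about whether the contents are valid.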
|
||||
|
||||
### 4. Windows DLL Injection
|
||||
* **Fix**: Added explicit DLL path injection for `nvidia-cublas` and `nvidia-cudnn` to ensure Python 3.8+ can find the required CUDA libraries on Windows systems that don't have them in PATH.
|
||||
|
||||
## 📦 Installation
|
||||
1. Download `WhisperVoice.exe` below.
|
||||
2. Replace your existing `.exe`.
|
||||
3. Run it.
|
||||
@@ -347,11 +347,17 @@ class Bootstrapper:
|
||||
messagebox.showerror("WhisperVoice Error", f"Failed to launch app: {e}")
|
||||
return False
|
||||
|
||||
def check_dependencies(self):
|
||||
"""Quick check if critical dependencies are installed."""
|
||||
return True # Deprecated logic placeholder
|
||||
|
||||
def setup_and_run(self):
|
||||
"""Full setup/update and run flow."""
|
||||
try:
|
||||
# 1. Ensure basics
|
||||
if not self.is_python_ready():
|
||||
self.download_python()
|
||||
self._fix_pth_file() # Ensure pth is fixed immediately after download
|
||||
self.install_pip()
|
||||
self.install_packages()
|
||||
|
||||
@@ -362,7 +368,10 @@ class Bootstrapper:
|
||||
if self.run_app():
|
||||
if self.ui: self.ui.root.quit()
|
||||
except Exception as e:
|
||||
messagebox.showerror("Setup Error", f"Installation failed: {e}")
|
||||
if self.ui:
|
||||
import tkinter.messagebox as mb
|
||||
mb.showerror("Setup Error", f"Installation failed: {e}") # Improved error visibility
|
||||
log(f"Fatal error: {e}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
|
||||
|
||||
BIN
dist/WhisperVoice.exe
vendored
Normal file
Binary file not shown.
90
main.py
@@ -9,6 +9,31 @@ app_dir = os.path.dirname(os.path.abspath(__file__))
|
||||
if app_dir not in sys.path:
|
||||
sys.path.insert(0, app_dir)
|
||||
|
||||
# -----------------------------------------------------------------------------
|
||||
# WINDOWS DLL FIX (CRITICAL for Portable CUDA)
|
||||
# Python 3.8+ on Windows requires explicit DLL directory addition.
|
||||
# -----------------------------------------------------------------------------
|
||||
if os.name == 'nt' and hasattr(os, 'add_dll_directory'):
|
||||
try:
|
||||
from pathlib import Path
|
||||
# Scan sys.path for site-packages
|
||||
for p in sys.path:
|
||||
path_obj = Path(p)
|
||||
if path_obj.name == 'site-packages' and path_obj.exists():
|
||||
nvidia_path = path_obj / "nvidia"
|
||||
if nvidia_path.exists():
|
||||
for subdir in nvidia_path.iterdir():
|
||||
# Add 'bin' folder from each nvidia stub (cublas, cudnn, etc.)
|
||||
bin_path = subdir / "bin"
|
||||
if bin_path.exists():
|
||||
os.add_dll_directory(str(bin_path))
|
||||
# Also try adding site-packages itself just in case
|
||||
# os.add_dll_directory(str(path_obj))
|
||||
break
|
||||
except Exception:
|
||||
pass
|
||||
# -----------------------------------------------------------------------------
|
||||
|
||||
from PySide6.QtWidgets import QApplication, QFileDialog, QMessageBox
|
||||
from PySide6.QtCore import QObject, Slot, Signal, QThread, Qt, QUrl
|
||||
from PySide6.QtQml import QQmlApplicationEngine
|
||||
@@ -87,7 +112,7 @@ def _silent_shutdown_hook(exc_type, exc_value, exc_tb):
|
||||
sys.excepthook = _silent_shutdown_hook
|
||||
|
||||
class DownloadWorker(QThread):
|
||||
"""Background worker for model downloads."""
|
||||
"""Background worker for model downloads with REAL progress."""
|
||||
progress = Signal(int)
|
||||
finished = Signal()
|
||||
error = Signal(str)
|
||||
@@ -98,20 +123,67 @@ class DownloadWorker(QThread):
|
||||
|
||||
def run(self):
|
||||
try:
|
||||
from faster_whisper import download_model
|
||||
import requests
|
||||
from tqdm import tqdm
|
||||
model_path = get_models_path()
|
||||
# Download to a specific subdirectory to keep things clean and predictable
|
||||
# This matches the logic in transcriber.py which looks for this specific path
|
||||
# Determine what to download
|
||||
dest_dir = model_path / f"faster-whisper-{self.model_name}"
|
||||
logging.info(f"Downloading Model '{self.model_name}' to {dest_dir}...")
|
||||
repo_id = f"Systran/faster-whisper-{self.model_name}"
|
||||
files = ["config.json", "model.bin", "tokenizer.json", "vocabulary.json"]
|
||||
base_url = f"https://huggingface.co/{repo_id}/resolve/main"
|
||||
|
||||
# Ensure parent exists
|
||||
model_path.mkdir(parents=True, exist_ok=True)
|
||||
dest_dir.mkdir(parents=True, exist_ok=True)
|
||||
logging.info(f"Downloading {self.model_name} to {dest_dir}...")
|
||||
|
||||
# output_dir in download_model specifies where the model files are saved
|
||||
download_model(self.model_name, output_dir=str(dest_dir))
|
||||
# 1. Calculate Total Size
|
||||
total_size = 0
|
||||
file_sizes = {}
|
||||
|
||||
with requests.Session() as s:
|
||||
for fname in files:
|
||||
url = f"{base_url}/{fname}"
|
||||
head = s.head(url, allow_redirects=True)
|
||||
if head.status_code == 200:
|
||||
size = int(head.headers.get('content-length', 0))
|
||||
file_sizes[fname] = size
|
||||
total_size += size
|
||||
else:
|
||||
# Fallback for vocabulary.json vs vocabulary.txt
|
||||
if fname == "vocabulary.json":
|
||||
# Try .txt? Or just skip if not found?
|
||||
# Faster-whisper usually has vocabulary.json
|
||||
pass
|
||||
|
||||
# 2. Download loop
|
||||
downloaded_bytes = 0
|
||||
|
||||
with requests.Session() as s:
|
||||
for fname in files:
|
||||
if fname not in file_sizes: continue
|
||||
|
||||
url = f"{base_url}/{fname}"
|
||||
dest_file = dest_dir / fname
|
||||
|
||||
# Resume check?
|
||||
# Simpler to just overwrite for reliability unless we want complex resume logic.
|
||||
# We'll overwrite.
|
||||
|
||||
resp = s.get(url, stream=True)
|
||||
resp.raise_for_status()
|
||||
|
||||
with open(dest_file, 'wb') as f:
|
||||
for chunk in resp.iter_content(chunk_size=8192):
|
||||
if chunk:
|
||||
f.write(chunk)
|
||||
downloaded_bytes += len(chunk)
|
||||
|
||||
# Emit Progress
|
||||
if total_size > 0:
|
||||
pct = int((downloaded_bytes / total_size) * 100)
|
||||
self.progress.emit(pct)
|
||||
|
||||
self.finished.emit()
|
||||
|
||||
except Exception as e:
|
||||
logging.error(f"Download failed: {e}")
|
||||
self.error.emit(str(e))
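
The download loop above emits a progress signal for every 8 KB chunk, so a multi-gigabyte model produces a very large number of emissions. A possible refinement, sketched here with a hypothetical generator named `percent_updates`, is to emit only when the whole-number percentage changes. This is a suggestion, not part of the change set.

```python
from typing import Iterable, Iterator


def percent_updates(chunk_sizes: Iterable[int], total_size: int) -> Iterator[int]:
    """Yield a percentage only when it differs from the previously yielded one."""
    if total_size <= 0:
        return  # unknown total size: nothing meaningful to report
    done = 0
    last = -1
    for size in chunk_sizes:
        done += size
        pct = min(100, done * 100 // total_size)
        if pct != last:
            last = pct
            yield pct


# Inside the worker, each yielded value would be passed to self.progress.emit(pct).
```

With this change the UI receives at most 101 updates per download, regardless of chunk size.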
|
||||
|
||||
@@ -39,39 +39,36 @@ def build_portable():
|
||||
print("⏳ This may take 5-10 minutes...")
|
||||
|
||||
PyInstaller.__main__.run([
|
||||
"main.py", # Entry point
|
||||
"bootstrapper.py", # Entry point (Tiny Installer)
|
||||
"--name=WhisperVoice", # EXE name
|
||||
"--onefile", # Single EXE (slower startup but portable)
|
||||
"--onefile", # Single EXE
|
||||
"--noconsole", # No terminal window
|
||||
"--clean", # Clean cache
|
||||
*add_data_args, # Bundled assets
|
||||
|
||||
# Heavy libraries that need special collection
|
||||
"--collect-all", "faster_whisper",
|
||||
"--collect-all", "ctranslate2",
|
||||
"--collect-all", "PySide6",
|
||||
"--collect-all", "torch",
|
||||
"--collect-all", "numpy",
|
||||
# Bundle the app source to be extracted by bootstrapper
|
||||
# The bootstrapper expects 'app_source' folder in bundled resources
|
||||
"--add-data", f"src{os.pathsep}app_source/src",
|
||||
"--add-data", f"main.py{os.pathsep}app_source",
|
||||
"--add-data", f"requirements.txt{os.pathsep}app_source",
|
||||
|
||||
# Hidden imports (modules imported dynamically)
|
||||
"--hidden-import", "keyboard",
|
||||
"--hidden-import", "pyperclip",
|
||||
"--hidden-import", "psutil",
|
||||
"--hidden-import", "pynvml",
|
||||
"--hidden-import", "sounddevice",
|
||||
"--hidden-import", "scipy",
|
||||
"--hidden-import", "scipy.signal",
|
||||
"--hidden-import", "huggingface_hub",
|
||||
"--hidden-import", "tokenizers",
|
||||
# Add assets
|
||||
"--add-data", f"src/ui/qml{os.pathsep}app_source/src/ui/qml",
|
||||
"--add-data", f"assets{os.pathsep}app_source/assets",
|
||||
|
||||
# Qt plugins
|
||||
"--hidden-import", "PySide6.QtQuickControls2",
|
||||
"--hidden-import", "PySide6.QtQuick.Controls",
|
||||
# No heavy collections!
|
||||
# The bootstrapper uses internal pip to install everything.
|
||||
|
||||
# Icon (convert to .ico for Windows)
|
||||
# "--icon=icon.ico", # Uncomment if you have a .ico file
|
||||
# Exclude heavy modules to ensure this exe stays tiny
|
||||
"--exclude-module", "faster_whisper",
|
||||
"--exclude-module", "torch",
|
||||
"--exclude-module", "PySide6",
|
||||
|
||||
|
||||
# Icon
|
||||
# "--icon=icon.ico",
|
||||
])
|
||||
|
||||
|
||||
print("\n" + "="*60)
|
||||
print("✅ BUILD COMPLETE!")
|
||||
print("="*60)
|
||||
|
||||
73
publish_release.py
Normal file
@@ -0,0 +1,73 @@
|
||||
import os
|
||||
import requests
|
||||
import mimetypes
|
||||
|
||||
# Configuration
|
||||
API_URL = "https://git.lashman.live/api/v1"
|
||||
OWNER = "lashman"
|
||||
REPO = "whisper_voice"
|
||||
TAG = "v1.0.4"
|
||||
TOKEN = os.environ.get("GITEA_TOKEN", "")  # Read from the environment (variable name is an assumption); never commit API tokens
|
||||
EXE_PATH = r"dist\WhisperVoice.exe"
|
||||
|
||||
headers = {
|
||||
"Authorization": f"token {TOKEN}",
|
||||
"Accept": "application/json"
|
||||
}
|
||||
|
||||
def create_release():
|
||||
print(f"Creating release {TAG}...")
|
||||
|
||||
# Read Release Notes
|
||||
with open("RELEASE_NOTES.md", "r", encoding="utf-8") as f:
|
||||
notes = f.read()
|
||||
|
||||
# Create Release
|
||||
payload = {
|
||||
"tag_name": TAG,
|
||||
"name": TAG,
|
||||
"body": notes,
|
||||
"draft": False,
|
||||
"prerelease": False
|
||||
}
|
||||
|
||||
url = f"{API_URL}/repos/{OWNER}/{REPO}/releases"
|
||||
resp = requests.post(url, json=payload, headers=headers)
|
||||
|
||||
if resp.status_code == 201:
|
||||
print("Release created successfully!")
|
||||
return resp.json()
|
||||
elif resp.status_code == 409:
|
||||
print("Release already exists. Fetching it...")
|
||||
# Get by tag
|
||||
resp = requests.get(f"{API_URL}/repos/{OWNER}/{REPO}/releases/tags/{TAG}", headers=headers)
|
||||
if resp.status_code == 200:
|
||||
return resp.json()
|
||||
|
||||
print(f"Failed to create release: {resp.status_code} - {resp.text}")
|
||||
return None
|
||||
|
||||
def upload_asset(release_id, file_path):
|
||||
print(f"Uploading asset: {file_path}...")
|
||||
filename = os.path.basename(file_path)
|
||||
|
||||
with open(file_path, "rb") as f:
|
||||
data = f.read()
|
||||
|
||||
url = f"{API_URL}/repos/{OWNER}/{REPO}/releases/{release_id}/assets?name={filename}"
|
||||
|
||||
# Gitea API expects raw body
|
||||
resp = requests.post(url, data=data, headers=headers)
|
||||
|
||||
if resp.status_code == 201:
|
||||
print(f"Uploaded {filename} successfully!")
|
||||
else:
|
||||
print(f"Failed to upload asset: {resp.status_code} - {resp.text}")
|
||||
|
||||
def main():
|
||||
release = create_release()
|
||||
if release:
|
||||
upload_asset(release["id"], EXE_PATH)
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@@ -5,6 +5,7 @@
|
||||
faster-whisper>=1.0.0
|
||||
torch>=2.0.0
|
||||
|
||||
|
||||
# UI Framework
|
||||
PySide6>=6.6.0
|
||||
|
||||
|
||||
@@ -46,7 +46,13 @@ DEFAULT_SETTINGS = {
|
||||
"best_of": 5,
|
||||
"vad_filter": True,
|
||||
"no_repeat_ngram_size": 0,
|
||||
"condition_on_previous_text": True
|
||||
"condition_on_previous_text": True,
|
||||
"initial_prompt": "Mm-hmm. Okay, let's go. I speak in full sentences.", # Default: Forces punctuation
|
||||
|
||||
|
||||
|
||||
# Low VRAM Mode
|
||||
"unload_models_after_use": False # If True, models are unloaded immediately to free VRAM
|
||||
}
|
||||
|
||||
class ConfigManager:
|
||||
|
||||
@@ -15,8 +15,13 @@ import numpy as np
|
||||
from src.core.config import ConfigManager
|
||||
from src.core.paths import get_models_path
|
||||
|
||||
try:
|
||||
import torch
|
||||
except ImportError:
|
||||
torch = None
|
||||
|
||||
# Import directly - valid since we are now running in the full environment
|
||||
from faster_whisper import WhisperModel
|
||||
|
||||
|
||||
class WhisperTranscriber:
|
||||
"""
|
||||
@@ -57,6 +62,8 @@ class WhisperTranscriber:
|
||||
# Force offline if path exists to avoid HF errors
|
||||
local_only = new_path.exists()
|
||||
|
||||
try:
|
||||
from faster_whisper import WhisperModel
|
||||
self.model = WhisperModel(
|
||||
model_input,
|
||||
device=device,
|
||||
@@ -64,6 +71,23 @@ class WhisperTranscriber:
|
||||
download_root=str(get_models_path()),
|
||||
local_files_only=local_only
|
||||
)
|
||||
except Exception as load_err:
|
||||
# CRITICAL FALLBACK: If CUDA/cublas fails (AMD/Intel users), fallback to CPU
|
||||
err_str = str(load_err).lower()
|
||||
if "cublas" in err_str or "cudnn" in err_str or "library" in err_str or "device" in err_str:
|
||||
logging.warning(f"CUDA Init Failed ({load_err}). Falling back to CPU...")
|
||||
self.config.set("compute_device", "cpu") # Update config for persistence/UI
|
||||
self.current_compute_device = device = "cpu"  # Rebind device too, so the assignment after this block keeps the fallback
|
||||
|
||||
self.model = WhisperModel(
|
||||
model_input,
|
||||
device="cpu",
|
||||
compute_type="int8", # CPU usually handles int8 well with newer extensions, or standard
|
||||
download_root=str(get_models_path()),
|
||||
local_files_only=local_only
|
||||
)
|
||||
else:
|
||||
raise load_err
|
||||
|
||||
self.current_model_size = size
|
||||
self.current_compute_device = device
|
||||
@@ -74,6 +98,32 @@ class WhisperTranscriber:
|
||||
logging.error(f"Failed to load model: {e}")
|
||||
self.model = None
|
||||
|
||||
# Auto-Repair: Detect vocabulary/corrupt errors
|
||||
err_str = str(e).lower()
|
||||
if "vocabulary" in err_str or "tokenizer" in err_str or "config.json" in err_str:
|
||||
# ... existing auto-repair logic ...
|
||||
logging.warning("Corrupt model detected on load. Attempting to delete and reset...")
|
||||
try:
|
||||
import shutil
|
||||
# Differentiate between simple path and HF path
|
||||
new_path = get_models_path() / f"faster-whisper-{size}"
|
||||
if new_path.exists():
|
||||
shutil.rmtree(new_path)
|
||||
logging.info(f"Deleted corrupt model at {new_path}")
|
||||
else:
|
||||
# Try legacy HF path
|
||||
hf_path = get_models_path() / f"models--Systran--faster-whisper-{size}"
|
||||
if hf_path.exists():
|
||||
shutil.rmtree(hf_path)
|
||||
logging.info(f"Deleted corrupt HF model at {hf_path}")
|
||||
|
||||
# Notify UI to refresh state (will show 'Download' button now)
|
||||
# We can't reach bridge easily here without passing it in,
|
||||
# but the UI polls or listens to logs.
|
||||
# The user will simply see "Model Missing" in settings after this.
|
||||
except Exception as del_err:
|
||||
logging.error(f"Failed to delete corrupt model: {del_err}")
|
||||
|
||||
def transcribe(self, audio_data, is_file: bool = False, task: Optional[str] = None) -> str:
|
||||
"""
|
||||
Transcribe audio data.
|
||||
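Review note: the auto-repair branch in this hunk boils down to "match the error text, then delete whichever model folder exists". A minimal sketch of that logic follows, assuming a plain `models_root` path in place of `get_models_path()`. The helper names `looks_corrupt` and `delete_model_dirs` are hypothetical and do not appear in the diff.

```python
import shutil
import tempfile
from pathlib import Path

# Error substrings the diff treats as signs of a corrupt download.
CORRUPTION_MARKERS = ("vocabulary", "tokenizer", "config.json")


def looks_corrupt(error):
    """True if the error text matches the markers used in the diff."""
    text = str(error).lower()
    return any(marker in text for marker in CORRUPTION_MARKERS)


def delete_model_dirs(models_root, size):
    """Delete the simple-layout folder, or else the legacy HF cache folder.

    Returns the path that was removed, or None if neither existed.
    """
    candidates = [
        models_root / f"faster-whisper-{size}",
        models_root / f"models--Systran--faster-whisper-{size}",
    ]
    for path in candidates:
        if path.exists():
            shutil.rmtree(path)
            return path
    return None


# Demo against a throwaway directory rather than a real model store.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "faster-whisper-small").mkdir()
    (root / "faster-whisper-small" / "config.json").write_text("{}")
    removed = delete_model_dirs(root, "small")
    removed_name = removed.name
    still_there = (root / "faster-whisper-small").exists()
```

Keeping the deletion in its own function makes it possible to exercise the cleanup without loading a model.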
@@ -84,7 +134,7 @@ class WhisperTranscriber:
|
||||
if not self.model:
|
||||
self.load_model()
|
||||
if not self.model:
|
||||
return "Error: Model failed to load."
|
||||
return "Error: Model failed to load. Please check Settings -> Model Info."
|
||||
|
||||
try:
|
||||
# Config
|
||||
@@ -94,27 +144,73 @@ class WhisperTranscriber:
|
||||
language = self.config.get("language")
|
||||
|
||||
# Use task override if provided, otherwise config
|
||||
final_task = task if task else self.config.get("task")
|
||||
# Ensure safe string and lowercase ("transcribe" vs "Transcribe")
|
||||
raw_task = task if task else self.config.get("task")
|
||||
final_task = str(raw_task).strip().lower() if raw_task else "transcribe"
|
||||
|
||||
# Sanity check for valid Whisper tasks
|
||||
if final_task not in ["transcribe", "translate"]:
|
||||
logging.warning(f"Invalid task '{final_task}' detected. Defaulting to 'transcribe'.")
|
||||
final_task = "transcribe"
|
||||
|
||||
# Language handling
|
||||
final_language = language if language != "auto" else None
|
||||
|
||||
# Anti-Hallucination: Force condition_on_previous_text=False for translation
|
||||
condition_prev = self.config.get("condition_on_previous_text")
|
||||
|
||||
# Helper options for Translation Stability
|
||||
initial_prompt = self.config.get("initial_prompt")
|
||||
|
||||
if final_task == "translate":
|
||||
condition_prev = False
|
||||
# Force beam search if user has set it to greedy (1)
|
||||
# Translation requires more search breadth to find the English mapping
|
||||
if beam_size < 5:
|
||||
logging.info("Forcing beam_size=5 for Translation task.")
|
||||
beam_size = 5
|
||||
|
||||
# Inject guidance prompt if none exists
|
||||
if not initial_prompt:
|
||||
initial_prompt = "Translate this to English."
|
||||
|
||||
logging.info(f"Model Dispatch: Task='{final_task}', Language='{final_language}', ConditionPrev={condition_prev}, Beam={beam_size}")
|
||||
|
||||
# Build arguments dynamically so optional values are never passed as an explicit None
|
||||
transcribe_opts = {
|
||||
"beam_size": beam_size,
|
||||
"best_of": best_of,
|
||||
"vad_filter": vad,
|
||||
"task": final_task,
|
||||
"vad_parameters": dict(min_silence_duration_ms=500),
|
||||
"condition_on_previous_text": condition_prev,
|
||||
"without_timestamps": True
|
||||
}
|
||||
|
||||
if initial_prompt:
|
||||
transcribe_opts["initial_prompt"] = initial_prompt
|
||||
|
||||
# Only add language if it's explicitly set (not None/Auto)
|
||||
# This avoids potentially confusing the model with explicit None
|
||||
if final_language:
|
||||
transcribe_opts["language"] = final_language
|
||||
|
||||
# Transcribe
|
||||
segments, info = self.model.transcribe(
|
||||
audio_data,
|
||||
beam_size=beam_size,
|
||||
best_of=best_of,
|
||||
vad_filter=vad,
|
||||
task=final_task,
|
||||
language=language if language != "auto" else None,
|
||||
vad_parameters=dict(min_silence_duration_ms=500),
|
||||
condition_on_previous_text=self.config.get("condition_on_previous_text"),
|
||||
without_timestamps=True
|
||||
)
|
||||
segments, info = self.model.transcribe(audio_data, **transcribe_opts)
|
||||
|
||||
# Aggregate text
|
||||
text_result = ""
|
||||
for segment in segments:
|
||||
text_result += segment.text + " "
|
||||
|
||||
return text_result.strip()
|
||||
text_result = text_result.strip()
|
||||
|
||||
# Low VRAM Mode: Unload Whisper Model immediately
|
||||
if self.config.get("unload_models_after_use"):
|
||||
self.unload_model()
|
||||
|
||||
logging.info(f"Final Transcription Output: '{text_result}'")
|
||||
return text_result
|
||||
|
||||
except Exception as e:
|
||||
logging.error(f"Transcription failed: {e}")
|
||||
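Review note: the task normalisation and option building in this hunk can be isolated from the model call, which makes the translation overrides easy to check. The sketch below assumes a plain dict in place of `ConfigManager`. The setting keys `beam_size`, `best_of` and `vad_filter`, and all default values, are assumptions, since the diff only shows the local variable names.

```python
VALID_TASKS = ("transcribe", "translate")


def normalize_task(raw_task):
    """Lowercase and validate a task name, defaulting to 'transcribe'."""
    task = str(raw_task).strip().lower() if raw_task else "transcribe"
    return task if task in VALID_TASKS else "transcribe"


def build_transcribe_opts(settings, task=None):
    """Build keyword arguments from a plain dict of settings.

    `settings` is a hypothetical stand-in for the application's config.
    """
    final_task = normalize_task(task or settings.get("task"))
    beam_size = settings.get("beam_size", 5)
    condition_prev = settings.get("condition_on_previous_text", True)
    initial_prompt = settings.get("initial_prompt")
    language = settings.get("language", "auto")

    if final_task == "translate":
        # Same overrides as the diff: no conditioning, at least 5 beams,
        # and a guidance prompt when the user has not set one.
        condition_prev = False
        beam_size = max(beam_size, 5)
        initial_prompt = initial_prompt or "Translate this to English."

    opts = {
        "beam_size": beam_size,
        "best_of": settings.get("best_of", 5),
        "vad_filter": settings.get("vad_filter", True),
        "task": final_task,
        "vad_parameters": {"min_silence_duration_ms": 500},
        "condition_on_previous_text": condition_prev,
        "without_timestamps": True,
    }
    # Optional keys are added only when they carry a real value.
    if initial_prompt:
        opts["initial_prompt"] = initial_prompt
    if language and language != "auto":
        opts["language"] = language
    return opts


opts = build_transcribe_opts({"task": " Translate ", "beam_size": 1, "language": "pl"})
print(opts)
```

The resulting dict would be passed along as `self.model.transcribe(audio_data, **opts)`, as the new code in the diff does with `transcribe_opts`.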
@@ -123,7 +219,10 @@ class WhisperTranscriber:
|
||||
def model_exists(self, size: str) -> bool:
|
||||
"""Checks if a model size is already downloaded."""
|
||||
new_path = get_models_path() / f"faster-whisper-{size}"
|
||||
if (new_path / "config.json").exists():
|
||||
if new_path.exists():
|
||||
# Strict check
|
||||
required = ["config.json", "model.bin", "vocabulary.json"]
|
||||
if all((new_path / f).exists() for f in required):
|
||||
return True
|
||||
|
||||
# Legacy HF cache check
|
||||
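Review note: the strict check added here is repeated almost verbatim in the `UIBridge` hunk further down, so a shared helper might be worth considering. A minimal sketch, assuming the three file names from the diff. `missing_model_files` and `is_model_complete` are hypothetical names.

```python
import tempfile
from pathlib import Path

# File names the diff requires before a model counts as downloaded.
REQUIRED_MODEL_FILES = ("config.json", "model.bin", "vocabulary.json")


def missing_model_files(model_dir):
    """Return the required files that are absent from `model_dir`."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_MODEL_FILES if not (model_dir / name).is_file()]


def is_model_complete(model_dir):
    """True when every required file is present."""
    return not missing_model_files(model_dir)


# Demo: a partially downloaded model, then a complete one.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "faster-whisper-small"
    model_dir.mkdir()
    (model_dir / "config.json").write_text("{}")
    (model_dir / "model.bin").write_bytes(b"\x00")
    partial = missing_model_files(model_dir)
    (model_dir / "vocabulary.json").write_text("[]")
    complete = is_model_complete(model_dir)
```

Returning the list of missing files, rather than only a boolean, would also give the settings page something concrete to log. It may be worth confirming that every supported model size ships all three file names.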
@@ -133,3 +232,21 @@ class WhisperTranscriber:
|
||||
return True
|
||||
|
||||
return False
|
||||
|
||||
def unload_model(self):
|
||||
"""
|
||||
Unloads model to free memory.
|
||||
"""
|
||||
if self.model:
|
||||
del self.model
|
||||
|
||||
self.model = None
|
||||
self.current_model_size = None
|
||||
|
||||
# Force garbage collection
|
||||
import gc
|
||||
gc.collect()
|
||||
if torch is not None and torch.cuda.is_available():
|
||||
torch.cuda.empty_cache()
|
||||
|
||||
logging.info("Whisper Model unloaded (Low VRAM Mode).")
|
||||
|
||||
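Review note: because the module-level import leaves `torch` as `None` when PyTorch is not installed, the unload path needs to tolerate that case. The sketch below shows one guarded form. `ModelHolder` is a hypothetical stand-in for the transcriber's bookkeeping, not the class from the diff.

```python
import gc

try:
    import torch  # optional dependency, imported the same way as in the diff
except ImportError:
    torch = None


class ModelHolder:
    """Hypothetical stand-in for the transcriber's model bookkeeping."""

    def __init__(self, model):
        self.model = model
        self.current_model_size = "small"

    def unload_model(self):
        # Drop our reference so the model can be garbage collected.
        self.model = None
        self.current_model_size = None
        gc.collect()
        # Guard against torch being None before touching torch.cuda.
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()


holder = ModelHolder(model=object())
holder.unload_model()
print(holder.model, holder.current_model_size)
```

Since faster-whisper runs on CTranslate2 rather than PyTorch, `torch.cuda.empty_cache()` likely only releases memory that PyTorch itself has cached. Dropping the last reference to the model is probably what frees the Whisper weights.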
@@ -376,9 +376,15 @@ class UIBridge(QObject):
|
||||
|
||||
try:
|
||||
from src.core.paths import get_models_path
|
||||
|
||||
|
||||
|
||||
# Check new simple format used by DownloadWorker
|
||||
path_simple = get_models_path() / f"faster-whisper-{size}"
|
||||
if path_simple.exists() and any(path_simple.iterdir()):
|
||||
if path_simple.exists():
|
||||
# Strict check: Ensure all critical files exist
|
||||
required = ["config.json", "model.bin", "vocabulary.json"]
|
||||
if all((path_simple / f).exists() for f in required):
|
||||
return True
|
||||
|
||||
# Check HF Cache format (legacy/default)
|
||||
@@ -386,16 +392,12 @@ class UIBridge(QObject):
|
||||
path_hf = get_models_path() / folder_name
|
||||
snapshots = path_hf / "snapshots"
|
||||
if snapshots.exists() and any(snapshots.iterdir()):
|
||||
return True
|
||||
return True # Legacy cache structure is complex, assume valid if present
|
||||
|
||||
# Check direct folder (simple)
|
||||
path_direct = get_models_path() / size
|
||||
if (path_direct / "config.json").exists():
|
||||
return True
|
||||
return False
|
||||
|
||||
except Exception as e:
|
||||
logging.error(f"Error checking model status: {e}")
|
||||
|
||||
return False
|
||||
|
||||
@Slot(str)
|
||||
|
||||
@@ -587,6 +587,53 @@ Window {
|
||||
Text { text: "Model configuration and performance"; color: SettingsStyle.textSecondary; font.family: mainFont; font.pixelSize: 14 }
|
||||
}
|
||||
|
||||
ModernSettingsSection {
|
||||
title: "Style & Prompting"
|
||||
Layout.margins: 32
|
||||
Layout.topMargin: 0
|
||||
|
||||
content: ColumnLayout {
|
||||
width: parent.width
|
||||
spacing: 0
|
||||
|
||||
ModernSettingsItem {
|
||||
label: "Punctuation Style"
|
||||
description: "Hint for how to format text"
|
||||
control: ModernComboBox {
|
||||
id: styleCombo
|
||||
width: 180
|
||||
model: ["Standard (Proper)", "Casual (Lowercase)", "Custom"]
|
||||
|
||||
// Logic to determine initial index based on config string
|
||||
Component.onCompleted: {
|
||||
let current = ui.getSetting("initial_prompt")
|
||||
if (current === "Mm-hmm. Okay, let's go. I speak in full sentences.") currentIndex = 0
|
||||
else if (current === "um, okay... i guess so.") currentIndex = 1
|
||||
else currentIndex = 2
|
||||
}
|
||||
|
||||
onActivated: {
|
||||
if (index === 0) ui.setSetting("initial_prompt", "Mm-hmm. Okay, let's go. I speak in full sentences.")
|
||||
else if (index === 1) ui.setSetting("initial_prompt", "um, okay... i guess so.")
|
||||
// Custom: Don't change string immediately, let user type
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
ModernSettingsItem {
|
||||
label: "Custom Prompt"
|
||||
description: "Advanced: Define your own style hint"
|
||||
visible: styleCombo.currentIndex === 2
|
||||
control: ModernTextField {
|
||||
Layout.preferredWidth: 280
|
||||
placeholderText: "e.g. 'Hello, World.'"
|
||||
text: ui.getSetting("initial_prompt") || ""
|
||||
onEditingFinished: ui.setSetting("initial_prompt", text === "" ? null : text)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
ModernSettingsSection {
|
||||
title: "Model Config"
|
||||
Layout.margins: 32
|
||||
@@ -785,6 +832,16 @@ Window {
|
||||
onActivated: ui.setSetting("compute_type", currentText)
|
||||
}
|
||||
}
|
||||
|
||||
ModernSettingsItem {
|
||||
label: "Low VRAM Mode"
|
||||
description: "Unload models immediately after use (Saves VRAM, Adds Delay)"
|
||||
showSeparator: false
|
||||
control: ModernSwitch {
|
||||
checked: ui.getSetting("unload_models_after_use")
|
||||
onToggled: ui.setSetting("unload_models_after_use", checked)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -55,6 +55,10 @@ except AttributeError:
|
||||
def LOWORD(l): return l & 0xffff
|
||||
def HIWORD(l): return (l >> 16) & 0xffff
|
||||
|
||||
GWL_EXSTYLE = -20
|
||||
WS_EX_TRANSPARENT = 0x00000020
|
||||
WS_EX_LAYERED = 0x00080000
|
||||
|
||||
class WindowHook:
|
||||
def __init__(self, hwnd, width, height, initial_scale=1.0):
|
||||
self.hwnd = hwnd
|
||||
@@ -68,8 +72,32 @@ class WindowHook:
|
||||
self.enabled = True # New flag
|
||||
|
||||
def set_enabled(self, enabled):
|
||||
"""
|
||||
Enables or disables interaction.
|
||||
When disabled, we set WS_EX_TRANSPARENT so clicks pass through physically.
|
||||
"""
|
||||
if self.enabled == enabled:
|
||||
return
|
||||
|
||||
self.enabled = enabled
|
||||
|
||||
# Get current styles
|
||||
style = user32.GetWindowLongW(self.hwnd, GWL_EXSTYLE)
|
||||
|
||||
if not enabled:
|
||||
# Enable Click-Through (Add Transparent)
|
||||
# We also ensure Layered is set (Qt usually sets it, but good to be sure)
|
||||
new_style = style | WS_EX_TRANSPARENT | WS_EX_LAYERED
|
||||
else:
|
||||
# Disable Click-Through (Remove Transparent)
|
||||
new_style = style & ~WS_EX_TRANSPARENT
|
||||
|
||||
if new_style != style:
|
||||
SetWindowLongPtr(self.hwnd, GWL_EXSTYLE, new_style)
|
||||
|
||||
# Force a redraw/frame update just in case
|
||||
user32.SetWindowPos(self.hwnd, 0, 0, 0, 0, 0, 0x0027) # SWP_NOMOVE | SWP_NOSIZE | SWP_NOZORDER | SWP_FRAMECHANGED
|
||||
|
||||
def install(self):
|
||||
proc_address = ctypes.cast(self.new_wnd_proc, ctypes.c_void_p)
|
||||
self.old_wnd_proc = SetWindowLongPtr(self.hwnd, GWLP_WNDPROC, proc_address)
|
||||
|
||||
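Review note: the style arithmetic in `set_enabled` and the `0x0027` literal passed to `SetWindowPos` can be checked without any Win32 calls. The sketch below repeats the constants from the diff, adds the standard `SWP_*` values, and uses a hypothetical pure function `next_ex_style` to show the transition.

```python
# Constants from the diff.
GWL_EXSTYLE = -20
WS_EX_TRANSPARENT = 0x00000020
WS_EX_LAYERED = 0x00080000

# Standard SetWindowPos flags named in the diff's inline comment.
SWP_NOSIZE = 0x0001
SWP_NOMOVE = 0x0002
SWP_NOZORDER = 0x0004
SWP_FRAMECHANGED = 0x0020
SWP_REFRESH_FLAGS = SWP_NOMOVE | SWP_NOSIZE | SWP_NOZORDER | SWP_FRAMECHANGED


def next_ex_style(style, enabled):
    """Return the extended style for the requested interaction state."""
    if enabled:
        # Interactive again: clear the click-through bit only.
        return style & ~WS_EX_TRANSPARENT
    # Click-through: set transparent and make sure layered is present.
    return style | WS_EX_TRANSPARENT | WS_EX_LAYERED


start = 0x00000100  # arbitrary example style with neither flag set
click_through = next_ex_style(start, enabled=False)
interactive = next_ex_style(click_through, enabled=True)
print(hex(SWP_REFRESH_FLAGS), hex(click_through), hex(interactive))
```

Using a named constant such as `SWP_REFRESH_FLAGS` in place of the bare `0x0027` would make the call self-documenting. Note that re-enabling interaction leaves `WS_EX_LAYERED` set, which matches the diff.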
38
test_m2m.py
Normal file
@@ -0,0 +1,38 @@
import sys
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

def test_m2m():
    model_name = "facebook/m2m100_418M"
    print(f"Loading {model_name}...")

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)

    # Test cases: (Language Code, Input)
    test_cases = [
        ("en", "he go to school yesterday"),
        ("pl", "on iść do szkoła wczoraj"), # Intentional broken grammar in Polish
    ]

    print("\nStarting M2M Tests (Self-Translation):\n")

    for lang, input_text in test_cases:
        tokenizer.src_lang = lang
        encoded = tokenizer(input_text, return_tensors="pt")

        # Translate to SAME language
        generated_tokens = model.generate(
            **encoded,
            forced_bos_token_id=tokenizer.get_lang_id(lang)
        )

        corrected = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]

        print(f"[{lang}]")
        print(f"Input: {input_text}")
        print(f"Output: {corrected}")
        print("-" * 20)

if __name__ == "__main__":
    test_m2m()
40
test_mt0.py
Normal file
@@ -0,0 +1,40 @@
import sys
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def test_mt0():
    model_name = "bigscience/mt0-base"
    print(f"Loading {model_name}...")

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Test cases: (Language, Prompt, Input)
    # MT0 is instruction tuned, so we should prompt it in the target language or English.
    # Cross-lingual prompting (English prompt -> Target tasks) is usually supported.

    test_cases = [
        ("English", "Correct grammar:", "he go to school yesterday"),
        ("Polish", "Popraw gramatykę:", "to jest testowe zdanie bez kropki"),
        ("Finnish", "Korjaa kielioppi:", "tämä on testilause ilman pistettä"),
        ("Russian", "Исправь грамматику:", "это тестовое предложение без точки"),
        ("Japanese", "文法を直してください:", "これは点のないテスト文です"),
        ("Spanish", "Corrige la gramática:", "esta es una oración de prueba sin punto"),
    ]

    print("\nStarting MT0 Tests:\n")

    for lang, prompt_text, input_text in test_cases:
        full_input = f"{prompt_text} {input_text}"
        inputs = tokenizer(full_input, return_tensors="pt")

        outputs = model.generate(inputs.input_ids, max_length=128)
        corrected = tokenizer.decode(outputs[0], skip_special_tokens=True)

        print(f"[{lang}]")
        print(f"Input: {full_input}")
        print(f"Output: {corrected}")
        print("-" * 20)

if __name__ == "__main__":
    test_mt0()
34
test_punctuation.py
Normal file
@@ -0,0 +1,34 @@
import sys
import os

# Add src to path
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from src.core.grammar_assistant import GrammarAssistant

def test_punctuation():
    assistant = GrammarAssistant()
    assistant.load_model()

    samples = [
        # User's example (verbatim)
        "If the voice recognition doesn't recognize that I like stopped Or something would that would it also correct that",

        # Generic run-on
        "hello how are you doing today i am doing fine thanks for asking",

        # Missing commas/periods
        "well i think its valid however we should probably check the logs first"
    ]

    print("\nStarting Punctuation Tests:\n")

    for sample in samples:
        print(f"Original: {sample}")
        corrected = assistant.correct(sample)
        print(f"Corrected: {corrected}")
        print("-" * 20)

if __name__ == "__main__":
    test_punctuation()