Fix: Translation Reliability, Click-Through, and Docs Sync
- Transcriber: Enforced 'beam_size=5' and prompt injection for robust translation.
- Transcriber: Removed conditioning on previous text to prevent language stickiness.
- Transcriber: Refactored kwargs to sanitize inputs.
- Overlay: Fixed click-through by toggling WS_EX_TRANSPARENT.
- UI: Added real download progress reporting.
- Docs: Refactored language list to table.
30
README.md
@@ -100,7 +100,7 @@ Select the model that aligns with your hardware capabilities.
|
|||||||
3. **Bootstrap**: Run it. The agent will self-provision an isolated Python environment (~2GB) on first launch.
|
3. **Bootstrap**: Run it. The agent will self-provision an isolated Python environment (~2GB) on first launch.
|
||||||
4. **Updates**: Simply replace the `.exe`. The **Smart Bootstrapper** will detect the update and sync only the changed files, preserving your settings and skipping unnecessary downloads.
|
4. **Updates**: Simply replace the `.exe`. The **Smart Bootstrapper** will detect the update and sync only the changed files, preserving your settings and skipping unnecessary downloads.
|
||||||
|
|
||||||
### <EFBFBD> Troubleshooting
|
### 🔧 Troubleshooting
|
||||||
* **App crashes on start**: Ensure you have [Microsoft Visual C++ Redistributable 2015-2022](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist) installed.
|
* **App crashes on start**: Ensure you have [Microsoft Visual C++ Redistributable 2015-2022](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist) installed.
|
||||||
* **"Simulate Typing" is slow**: Some applications (remote desktops, older games) choke on super-fast input. Lower the typing speed in Settings to ~1200 CPM.
|
* **"Simulate Typing" is slow**: Some applications (remote desktops, older games) choke on super-fast input. Lower the typing speed in Settings to ~1200 CPM.
|
||||||
* **No Audio**: The agent listens to the **Default Communication Device**. Check your Windows Sound Control Panel.
|
* **No Audio**: The agent listens to the **Default Communication Device**. Check your Windows Sound Control Panel.
|
||||||
@@ -111,10 +111,36 @@ Select the model that aligns with your hardware capabilities.
|
|||||||
|
|
||||||
The engine supports 99 languages. You can lock the engine to a specific language in Settings to improve accuracy, or leave it on **Auto-Detect** for multilingual usage.
|
The engine supports 99 languages. You can lock the engine to a specific language in Settings to improve accuracy, or leave it on **Auto-Detect** for multilingual usage.
|
||||||
|
|
||||||
Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Castilian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, Flemish, French, Galician, Georgian, German, Greek, Gujarati, Haitian, Hausa, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Moldavian, Mongolian, Myanmar, Nepali, Norwegian, Occitan, Panjabi, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Yiddish, Yoruba.
|
([See full language list below](#full-language-list))
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
## 🌐 Full Language List
|
||||||
|
|
||||||
|
| | | | | |
|
||||||
|
| :--- | :--- | :--- | :--- | :--- |
|
||||||
|
| Afrikaans 🇿🇦 | Albanian 🇦🇱 | Amharic 🇪🇹 | Arabic 🇸🇦 | Armenian 🇦🇲 |
|
||||||
|
| Assamese 🇮🇳 | Azerbaijani 🇦🇿 | Bashkir 🇷🇺 | Basque 🇪🇸 | Belarusian 🇧🇾 |
|
||||||
|
| Bengali 🇧🇩 | Bosnian 🇧🇦 | Breton 🇫🇷 | Bulgarian 🇧🇬 | Burmese 🇲🇲 |
|
||||||
|
| Castilian 🇪🇸 | Catalan 🇪🇸 | Chinese 🇨🇳 | Croatian 🇭🇷 | Czech 🇨🇿 |
|
||||||
|
| Danish 🇩🇰 | Dutch 🇳🇱 | English 🇺🇸 | Estonian 🇪🇪 | Faroese 🇫🇴 |
|
||||||
|
| Finnish 🇫🇮 | Flemish 🇧🇪 | French 🇫🇷 | Galician 🇪🇸 | Georgian 🇬🇪 |
|
||||||
|
| German 🇩🇪 | Greek 🇬🇷 | Gujarati 🇮🇳 | Haitian 🇭🇹 | Hausa 🇳🇬 |
|
||||||
|
| Hawaiian 🇺🇸 | Hebrew 🇮🇱 | Hindi 🇮🇳 | Hungarian 🇭🇺 | Icelandic 🇮🇸 |
|
||||||
|
| Indonesian 🇮🇩 | Italian 🇮🇹 | Japanese 🇯🇵 | Javanese 🇮🇩 | Kannada 🇮🇳 |
|
||||||
|
| Kazakh 🇰🇿 | Khmer 🇰🇭 | Korean 🇰🇷 | Lao 🇱🇦 | Latin 🇻🇦 |
|
||||||
|
| Latvian 🇱🇻 | Lingala 🇨🇩 | Lithuanian 🇱🇹 | Luxembourgish 🇱🇺 | Macedonian 🇲🇰 |
|
||||||
|
| Malagasy 🇲🇬 | Malay 🇲🇾 | Malayalam 🇮🇳 | Maltese 🇲🇹 | Maori 🇳🇿 |
|
||||||
|
| Marathi 🇮🇳 | Moldavian 🇲🇩 | Mongolian 🇲🇳 | Myanmar 🇲🇲 | Nepali 🇳🇵 |
|
||||||
|
| Norwegian 🇳🇴 | Occitan 🇫🇷 | Panjabi 🇮🇳 | Pashto 🇦🇫 | Persian 🇮🇷 |
|
||||||
|
| Polish 🇵🇱 | Portuguese 🇵🇹 | Punjabi 🇮🇳 | Romanian 🇷🇴 | Russian 🇷🇺 |
|
||||||
|
| Sanskrit 🇮🇳 | Serbian 🇷🇸 | Shona 🇿🇼 | Sindhi 🇵🇰 | Sinhala 🇱🇰 |
|
||||||
|
| Slovak 🇸🇰 | Slovenian 🇸🇮 | Somali 🇸🇴 | Spanish 🇪🇸 | Sundanese 🇮🇩 |
|
||||||
|
| Swahili 🇰🇪 | Swedish 🇸🇪 | Tagalog 🇵🇭 | Tajik 🇹🇯 | Tamil 🇮🇳 |
|
||||||
|
| Tatar 🇷🇺 | Telugu 🇮🇳 | Thai 🇹🇭 | Tibetan 🇨🇳 | Turkish 🇹🇷 |
|
||||||
|
| Turkmen 🇹🇲 | Ukrainian 🇺🇦 | Urdu 🇵🇰 | Uzbek 🇺🇿 | Vietnamese 🇻🇳 |
|
||||||
|
| Welsh 🏴 | Yiddish 🇮🇱 | Yoruba 🇳🇬 | | |
|
||||||
|
|
||||||
<div align="center">
|
<div align="center">
|
||||||
|
|
||||||
### ⚖️ PUBLIC DOMAIN (CC0 1.0)
|
### ⚖️ PUBLIC DOMAIN (CC0 1.0)
|
||||||
|
|||||||
71
main.py
@@ -87,7 +87,7 @@ def _silent_shutdown_hook(exc_type, exc_value, exc_tb):
|
|||||||
sys.excepthook = _silent_shutdown_hook
|
sys.excepthook = _silent_shutdown_hook
|
||||||
|
|
||||||
class DownloadWorker(QThread):
|
class DownloadWorker(QThread):
|
||||||
"""Background worker for model downloads."""
|
"""Background worker for model downloads with REAL progress."""
|
||||||
progress = Signal(int)
|
progress = Signal(int)
|
||||||
finished = Signal()
|
finished = Signal()
|
||||||
error = Signal(str)
|
error = Signal(str)
|
||||||
@@ -98,20 +98,73 @@ class DownloadWorker(QThread):
|
|||||||
|
|
||||||
def run(self):
|
def run(self):
|
||||||
try:
|
try:
|
||||||
from faster_whisper import download_model
|
import requests
|
||||||
|
from tqdm import tqdm
|
||||||
model_path = get_models_path()
|
model_path = get_models_path()
|
||||||
# Download to a specific subdirectory to keep things clean and predictable
|
|
||||||
# This matches the logic in transcriber.py which looks for this specific path
|
|
||||||
dest_dir = model_path / f"faster-whisper-{self.model_name}"
|
dest_dir = model_path / f"faster-whisper-{self.model_name}"
|
||||||
logging.info(f"Downloading Model '{self.model_name}' to {dest_dir}...")
|
dest_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
# Ensure parent exists
|
# Files to download for a standard faster-whisper model
|
||||||
model_path.mkdir(parents=True, exist_ok=True)
|
# We map local filenames to HF repo filenames
|
||||||
|
repo_id = f"Systran/faster-whisper-{self.model_name}"
|
||||||
|
files = ["config.json", "model.bin", "tokenizer.json", "vocabulary.json"]
|
||||||
|
|
||||||
# output_dir in download_model specifies where the model files are saved
|
# Check if Preprocessor config exists (sometimes it does, usually optional for whisper?)
|
||||||
download_model(self.model_name, output_dir=str(dest_dir))
|
# We'll stick to the core 4.
|
||||||
|
|
||||||
|
base_url = f"https://huggingface.co/{repo_id}/resolve/main"
|
||||||
|
|
||||||
|
logging.info(f"Downloading {self.model_name} from {base_url}...")
|
||||||
|
|
||||||
|
# 1. Calculate Total Size
|
||||||
|
total_size = 0
|
||||||
|
file_sizes = {}
|
||||||
|
|
||||||
|
with requests.Session() as s:
|
||||||
|
for fname in files:
|
||||||
|
url = f"{base_url}/{fname}"
|
||||||
|
head = s.head(url, allow_redirects=True)
|
||||||
|
if head.status_code == 200:
|
||||||
|
size = int(head.headers.get('content-length', 0))
|
||||||
|
file_sizes[fname] = size
|
||||||
|
total_size += size
|
||||||
|
else:
|
||||||
|
# Fallback for vocabulary.json vs vocabulary.txt
|
||||||
|
if fname == "vocabulary.json":
|
||||||
|
# Try .txt? Or just skip if not found?
|
||||||
|
# Faster-whisper usually has vocabulary.json
|
||||||
|
pass
|
||||||
|
|
||||||
|
# 2. Download loop
|
||||||
|
downloaded_bytes = 0
|
||||||
|
|
||||||
|
with requests.Session() as s:
|
||||||
|
for fname in files:
|
||||||
|
if fname not in file_sizes: continue
|
||||||
|
|
||||||
|
url = f"{base_url}/{fname}"
|
||||||
|
dest_file = dest_dir / fname
|
||||||
|
|
||||||
|
# Resume check?
|
||||||
|
# Simpler to just overwrite for reliability unless we want complex resume logic.
|
||||||
|
# We'll overwrite.
|
||||||
|
|
||||||
|
resp = s.get(url, stream=True)
|
||||||
|
resp.raise_for_status()
|
||||||
|
|
||||||
|
with open(dest_file, 'wb') as f:
|
||||||
|
for chunk in resp.iter_content(chunk_size=8192):
|
||||||
|
if chunk:
|
||||||
|
f.write(chunk)
|
||||||
|
downloaded_bytes += len(chunk)
|
||||||
|
|
||||||
|
# Emit Progress
|
||||||
|
if total_size > 0:
|
||||||
|
pct = int((downloaded_bytes / total_size) * 100)
|
||||||
|
self.progress.emit(pct)
|
||||||
|
|
||||||
self.finished.emit()
|
self.finished.emit()
|
||||||
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logging.error(f"Download failed: {e}")
|
logging.error(f"Download failed: {e}")
|
||||||
self.error.emit(str(e))
|
self.error.emit(str(e))
|
||||||
|
|||||||
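The progress arithmetic in the download loop above can be isolated into a small pure function, which makes the edge cases easier to see. This is only a sketch: `overall_percent` and the sample file sizes are hypothetical names and values for illustration, not part of the commit. It shows how a zero total (for example, when every HEAD request fails or omits `content-length`) and an overshoot past the reported total could be handled.

```python
def overall_percent(downloaded_bytes: int, total_size: int) -> int:
    """Return aggregate download progress as an integer in [0, 100]."""
    if total_size <= 0:
        # No sizes were reported, so no meaningful percentage exists.
        return 0
    pct = downloaded_bytes * 100 // total_size
    # Clamp in case the server sends more bytes than content-length claimed.
    return max(0, min(100, pct))


# Hypothetical per-file sizes, standing in for the HEAD results.
file_sizes = {"config.json": 2_000, "model.bin": 98_000}
total_size = sum(file_sizes.values())

print(overall_percent(50_000, total_size))
```

Emitting the signal only when the integer percentage changes, rather than once per 8 KiB chunk, would also reduce cross-thread signal traffic, though the commit as shown emits on every chunk.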
@@ -94,20 +94,59 @@ class WhisperTranscriber:
|
|||||||
language = self.config.get("language")
|
language = self.config.get("language")
|
||||||
|
|
||||||
# Use task override if provided, otherwise config
|
# Use task override if provided, otherwise config
|
||||||
final_task = task if task else self.config.get("task")
|
# Ensure safe string and lowercase ("transcribe" vs "Transcribe")
|
||||||
|
raw_task = task if task else self.config.get("task")
|
||||||
|
final_task = str(raw_task).strip().lower() if raw_task else "transcribe"
|
||||||
|
|
||||||
|
# Sanity check for valid Whisper tasks
|
||||||
|
if final_task not in ["transcribe", "translate"]:
|
||||||
|
logging.warning(f"Invalid task '{final_task}' detected. Defaulting to 'transcribe'.")
|
||||||
|
final_task = "transcribe"
|
||||||
|
|
||||||
|
# Language handling
|
||||||
|
final_language = language if language != "auto" else None
|
||||||
|
|
||||||
|
# Anti-Hallucination: Force condition_on_previous_text=False for translation
|
||||||
|
condition_prev = self.config.get("condition_on_previous_text")
|
||||||
|
|
||||||
|
# Helper options for Translation Stability
|
||||||
|
initial_prompt = self.config.get("initial_prompt")
|
||||||
|
|
||||||
|
if final_task == "translate":
|
||||||
|
condition_prev = False
|
||||||
|
# Force beam search if user has set it to greedy (1)
|
||||||
|
# Translation requires more search breadth to find the English mapping
|
||||||
|
if beam_size < 5:
|
||||||
|
logging.info("Forcing beam_size=5 for Translation task.")
|
||||||
|
beam_size = 5
|
||||||
|
|
||||||
|
# Inject guidance prompt if none exists
|
||||||
|
if not initial_prompt:
|
||||||
|
initial_prompt = "Translate this to English."
|
||||||
|
|
||||||
|
logging.info(f"Model Dispatch: Task='{final_task}', Language='{final_language}', ConditionPrev={condition_prev}, Beam={beam_size}")
|
||||||
|
|
||||||
|
# Build arguments dynamically to avoid passing None if that's the issue
|
||||||
|
transcribe_opts = {
|
||||||
|
"beam_size": beam_size,
|
||||||
|
"best_of": best_of,
|
||||||
|
"vad_filter": vad,
|
||||||
|
"task": final_task,
|
||||||
|
"vad_parameters": dict(min_silence_duration_ms=500),
|
||||||
|
"condition_on_previous_text": condition_prev,
|
||||||
|
"without_timestamps": True
|
||||||
|
}
|
||||||
|
|
||||||
|
if initial_prompt:
|
||||||
|
transcribe_opts["initial_prompt"] = initial_prompt
|
||||||
|
|
||||||
|
# Only add language if it's explicitly set (not None/Auto)
|
||||||
|
# This avoids potentially confusing the model with explicit None
|
||||||
|
if final_language:
|
||||||
|
transcribe_opts["language"] = final_language
|
||||||
|
|
||||||
# Transcribe
|
# Transcribe
|
||||||
segments, info = self.model.transcribe(
|
segments, info = self.model.transcribe(audio_data, **transcribe_opts)
|
||||||
audio_data,
|
|
||||||
beam_size=beam_size,
|
|
||||||
best_of=best_of,
|
|
||||||
vad_filter=vad,
|
|
||||||
task=final_task,
|
|
||||||
language=language if language != "auto" else None,
|
|
||||||
vad_parameters=dict(min_silence_duration_ms=500),
|
|
||||||
condition_on_previous_text=self.config.get("condition_on_previous_text"),
|
|
||||||
without_timestamps=True
|
|
||||||
)
|
|
||||||
|
|
||||||
# Aggregate text
|
# Aggregate text
|
||||||
text_result = ""
|
text_result = ""
|
||||||
|
|||||||
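The option-building logic in the hunk above can be sketched as a standalone function so the translation overrides can be checked without loading a model. The helper names `normalize_task` and `build_transcribe_opts` are hypothetical, and reading `beam_size` from the config under that key is an assumption; the sketch also leaves out `best_of` and the VAD settings for brevity.

```python
from typing import Optional

VALID_TASKS = ("transcribe", "translate")


def normalize_task(raw_task: Optional[str]) -> str:
    """Lowercase and validate a task name, falling back to 'transcribe'."""
    task = str(raw_task).strip().lower() if raw_task else "transcribe"
    return task if task in VALID_TASKS else "transcribe"


def build_transcribe_opts(config: dict, task: Optional[str] = None) -> dict:
    """Build keyword arguments for a transcribe call from a config dict."""
    final_task = normalize_task(task if task else config.get("task"))
    language = config.get("language")
    final_language = language if language != "auto" else None

    # Assumption: these settings live in the config under these keys.
    beam_size = int(config.get("beam_size", 1))
    condition_prev = bool(config.get("condition_on_previous_text", False))
    initial_prompt = config.get("initial_prompt")

    if final_task == "translate":
        condition_prev = False          # avoid carrying over source-language text
        beam_size = max(beam_size, 5)   # widen the search for translation
        if not initial_prompt:
            initial_prompt = "Translate this to English."

    opts = {
        "beam_size": beam_size,
        "task": final_task,
        "condition_on_previous_text": condition_prev,
        "without_timestamps": True,
    }
    # Optional keys are added only when set, so None is never passed through.
    if initial_prompt:
        opts["initial_prompt"] = initial_prompt
    if final_language:
        opts["language"] = final_language
    return opts


opts = build_transcribe_opts({"task": " Translate ", "language": "auto", "beam_size": 1})
print(opts)
```

The resulting dict would then be unpacked into the call, as the commit does with `self.model.transcribe(audio_data, **transcribe_opts)`.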
@@ -55,6 +55,10 @@ except AttributeError:
|
|||||||
def LOWORD(l): return l & 0xffff
|
def LOWORD(l): return l & 0xffff
|
||||||
def HIWORD(l): return (l >> 16) & 0xffff
|
def HIWORD(l): return (l >> 16) & 0xffff
|
||||||
|
|
||||||
|
GWL_EXSTYLE = -20
|
||||||
|
WS_EX_TRANSPARENT = 0x00000020
|
||||||
|
WS_EX_LAYERED = 0x00080000
|
||||||
|
|
||||||
class WindowHook:
|
class WindowHook:
|
||||||
def __init__(self, hwnd, width, height, initial_scale=1.0):
|
def __init__(self, hwnd, width, height, initial_scale=1.0):
|
||||||
self.hwnd = hwnd
|
self.hwnd = hwnd
|
||||||
@@ -68,7 +72,31 @@ class WindowHook:
|
|||||||
self.enabled = True # New flag
|
self.enabled = True # New flag
|
||||||
|
|
||||||
def set_enabled(self, enabled):
|
def set_enabled(self, enabled):
|
||||||
|
"""
|
||||||
|
Enables or disables interaction.
|
||||||
|
When disabled, we set WS_EX_TRANSPARENT so clicks pass through physically.
|
||||||
|
"""
|
||||||
|
if self.enabled == enabled:
|
||||||
|
return
|
||||||
|
|
||||||
self.enabled = enabled
|
self.enabled = enabled
|
||||||
|
|
||||||
|
# Get current styles
|
||||||
|
style = user32.GetWindowLongW(self.hwnd, GWL_EXSTYLE)
|
||||||
|
|
||||||
|
if not enabled:
|
||||||
|
# Enable Click-Through (Add Transparent)
|
||||||
|
# We also ensure Layered is set (Qt usually sets it, but good to be sure)
|
||||||
|
new_style = style | WS_EX_TRANSPARENT | WS_EX_LAYERED
|
||||||
|
else:
|
||||||
|
# Disable Click-Through (Remove Transparent)
|
||||||
|
new_style = style & ~WS_EX_TRANSPARENT
|
||||||
|
|
||||||
|
if new_style != style:
|
||||||
|
SetWindowLongPtr(self.hwnd, GWL_EXSTYLE, new_style)
|
||||||
|
|
||||||
|
# Force a redraw/frame update just in case
|
||||||
|
user32.SetWindowPos(self.hwnd, 0, 0, 0, 0, 0, 0x0027) # SWP_NOMOVE | SWP_NOSIZE | SWP_NOZORDER | SWP_FRAMECHANGED
|
||||||
|
|
||||||
def install(self):
|
def install(self):
|
||||||
proc_address = ctypes.cast(self.new_wnd_proc, ctypes.c_void_p)
|
proc_address = ctypes.cast(self.new_wnd_proc, ctypes.c_void_p)
|
||||||
|
|||||||
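The bit manipulation inside `set_enabled` can be illustrated without any Win32 calls. The function name `next_ex_style` is hypothetical; the constants are the same ones the hunk defines. The sketch only computes the new extended-style value and does not touch a real window.

```python
WS_EX_TRANSPARENT = 0x00000020
WS_EX_LAYERED = 0x00080000


def next_ex_style(style: int, enabled: bool) -> int:
    """Compute the extended window style for the requested interaction state."""
    if not enabled:
        # Click-through: add TRANSPARENT and make sure LAYERED is present.
        return style | WS_EX_TRANSPARENT | WS_EX_LAYERED
    # Interactive: clear TRANSPARENT only, leaving every other bit untouched.
    return style & ~WS_EX_TRANSPARENT


passthrough = next_ex_style(0, enabled=False)
interactive = next_ex_style(passthrough, enabled=True)
print(hex(passthrough), hex(interactive))
```

Note that re-enabling interaction clears only `WS_EX_TRANSPARENT`; `WS_EX_LAYERED` stays set, which matches the commit's behavior.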