Docs: Final polish - Enshittification manifesto and structural refinement
48
README.md
@@ -24,40 +24,54 @@
|
|||||||
|
|
||||||
## 📡 The Transmission
|
## 📡 The Transmission
|
||||||
|
|
||||||
We live in an era of enclosure. Our words, our thoughts, and our digital footprints are strip-mined by centralized giants, turned into capital, and sold back to us as "convenience."
|
We are witnessing the **enshittification** of the digital world. What were once vibrant social commons are being walled off, strip-mined for data, and degraded into rent-seeking silos. Your voice is no longer your own; it is a training set for a corporate oracle that charges you for the privilege of listening.
|
||||||
|
|
||||||
**Whisper Voice** is a rejection of that contract.
|
**Whisper Voice** is a small act of sabotage against this trend.
|
||||||
|
|
||||||
It is built on the axiom that **your voice belongs to you**. By bringing state-of-the-art inference down from the server farms and running it on your own metal, we reclaim a small piece of the digital commons. This software answers to no one but you. It has no telemetry, no subscription, and no masters.
|
It is built on the axiom of **Technological Sovereignty**. By moving state-of-the-art inference from the server farms to your own silicon, you reclaim the means of digital production. No telemetry. No subscriptions. No "cloud processing" that eavesdrops on your intent.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## ⚡ The Engine
|
## ⚡ The Engine
|
||||||
|
|
||||||
This operates on the silicon. It is not a wrapper. It is a machine.
|
Whisper Voice operates directly on the metal. It is not an API wrapper; it is an autonomous machine.
|
||||||
|
|
||||||
| Component | Technology | Benefit |
|
| Component | Technology | Benefit |
|
||||||
| :--- | :--- | :--- |
|
| :--- | :--- | :--- |
|
||||||
| **Inference Core** | **Faster-Whisper** | Hyper-optimized implementation via **CTranslate2**. Delivers **4x velocity** over standard PyTorch execution. |
|
| **Inference Core** | **Faster-Whisper** | Hyper-optimized C++ implementation via **CTranslate2**. Delivers up to **4x the speed** of the reference PyTorch implementation. |
|
||||||
| **Compression** | **INT8 quantization** | Enables Pro-grade models (`Large-v3`) to run on consumer hardware, democratizing access to high-fidelity AI. |
|
| **Compression** | **INT8 quantization** | Enables Pro-grade models (`Large-v3`) to run on consumer-grade GPUs, democratizing elite AI. |
|
||||||
| **Sensory Gate** | **Silero VAD** | Enterprise-grade Voice Activity Detection filters out the noise, ensuring only intent is captured. |
|
| **Sensory Gate** | **Silero VAD** | Enterprise-grade Voice Activity Detection filters out the noise, ensuring only pure intent is processed. |
|
||||||
| **Interface** | **Qt 6 / QML** | Hardware-accelerated, glassmorphic UI that is fluid, responsive, and sovereign. |
|
| **Interface** | **Qt 6 / QML** | Hardware-accelerated, glassmorphic UI that is fluid, responsive, and sovereign. |
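
The first three rows of the table map onto a handful of `faster-whisper` arguments. The snippet below is a minimal sketch of that wiring, not the project's actual code: `EngineConfig` and `transcribe_file` are hypothetical names, and the defaults assume a CUDA-capable GPU with the `faster-whisper` package installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineConfig:
    """Hypothetical settings bundle; the field names are illustrative only."""

    model_size: str = "large-v3"
    device: str = "cuda"          # assumption: a CUDA-capable GPU is present
    compute_type: str = "int8"    # INT8 quantization handled by CTranslate2
    vad_filter: bool = True       # run the bundled Silero VAD before decoding


def transcribe_file(audio_path: str, config: EngineConfig = EngineConfig()) -> str:
    """Transcribe one audio file and return the joined segment text."""
    # Imported lazily so the config can be inspected without the package.
    from faster_whisper import WhisperModel

    model = WhisperModel(
        config.model_size,
        device=config.device,
        compute_type=config.compute_type,
    )
    segments, _info = model.transcribe(audio_path, vad_filter=config.vad_filter)
    # `segments` is a generator, so decoding happens as it is consumed.
    return " ".join(segment.text.strip() for segment in segments)
```

On a machine without a GPU, passing `EngineConfig(device="cpu")` should keep the same INT8 path while trading away speed.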
|
||||||
|
|
||||||
<br>
|
<br>
|
||||||
|
|
||||||
## 🌎 Universal Translator
|
## 🖋️ Universal Transcription
|
||||||
|
|
||||||
Whisper Voice v1.0.1 introduces a **Neural Translation Engine** built directly into the core. It bypasses the need for corporate translation APIs entirely.
|
At its core, Whisper Voice is the ultimate bridge between thought and text. It listens with superhuman precision, converting spoken word into written form across **99 languages**.
|
||||||
|
|
||||||
|
* **Punctuation Mastery**: Automatically handles capitalization and complex punctuation formatting.
|
||||||
|
* **Contextual Intelligence**: Smarter than standard dictation; it understands the flow of sentences to resolve homophones and technical jargon ($1.5k vs "fifteen hundred dollars").
|
||||||
|
* **Total Privacy**: Your private dictation, legal notes, or creative writing never leave your RAM.
|
||||||
|
|
||||||
|
### Workflow: `F9 (Default)`
|
||||||
|
The primary channel for native-language transcription. It transcribes precisely what it hears in the language you speak (or the one you've locked in Settings).
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
## 🌎 Universal Translation
|
||||||
|
|
||||||
|
Whisper Voice v1.0.1 includes a **Neural Translation Engine** that allows you to bridge any linguistic gap instantly.
|
||||||
|
|
||||||
* **Input**: Speak in French, Japanese, Russian, or **96 other languages**.
|
* **Input**: Speak in French, Japanese, Russian, or **96 other languages**.
|
||||||
* **Output**: The engine instantly reconstructs the semantic meaning in fluent **English**.
|
* **Output**: The engine instantly reconstructs the semantic meaning into fluent **English**.
|
||||||
* **Local Execution**: No API keys. No data leaks. The translation happens on your GPU.
|
* **Task Protocol**: Handled via the dedicated `F10` channel.
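
One way to picture the two channels is as a lookup from hotkey to Whisper task. This is a sketch under assumptions: `HOTKEY_TASKS` and `transcribe_kwargs` are hypothetical names, and only the `task` and `language` keywords are taken from the `faster-whisper` `transcribe` call.

```python
from typing import Optional

# Default bindings as described in this README; the dict itself is hypothetical.
HOTKEY_TASKS = {"F9": "transcribe", "F10": "translate"}


def transcribe_kwargs(hotkey: str, locked_language: Optional[str] = None) -> dict:
    """Map a hotkey press to keyword arguments for faster-whisper's `transcribe`."""
    try:
        task = HOTKEY_TASKS[hotkey]
    except KeyError:
        raise ValueError(f"no task bound to {hotkey!r}") from None
    # language=None asks the model to auto-detect the spoken language.
    return {"task": task, "language": locked_language}
```

With a loaded model, a call would then look like `model.transcribe("clip.wav", **transcribe_kwargs("F10"))`, where `"clip.wav"` is a placeholder path.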
|
||||||
|
|
||||||
### Dual-Channel Workflow
|
### 🔍 Why only English translation?
|
||||||
The application listens on two separate channels simultaneously, allowing for seamless fluid switching between local and international communication.
|
A common question arises: *Why can't I translate from French to Japanese?*
|
||||||
|
|
||||||
* **F9 (Default)** -> **Transcribe**: Types exactly what you say, in the language you speak.
|
The architecture of the underlying Whisper model is a **Many-to-English** design. During its massive training phase (680,000 hours of audio), the translation task was specifically optimized to map the global linguistic commons onto a single bridge language: **English**. This allowed the model to reach incredible levels of semantic understanding without the quadratic growth in language pairs that a "Many-to-Many" mapping demands.
|
||||||
* **F10 (Default)** -> **Translate**: Translates whatever you say in *any* language into English.
|
|
||||||
|
By focusing its translation decoder solely on English, Whisper achieves "Zero-Shot" quality that rivals specialized translation engines while remaining lightweight enough to run on your local GPU.
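
The size of that trade-off can be made concrete by counting translation directions. The helper below is an illustrative sketch, not part of the application; it assumes English is one of the 99 supported languages and that each source-to-target direction counts separately.

```python
def translation_directions(num_languages: int, many_to_many: bool) -> int:
    """Count the directed language pairs a translation model has to cover."""
    if num_languages < 1:
        raise ValueError("num_languages must be at least 1")
    others = num_languages - 1
    # Many-to-many: every language into every other language.
    # Many-to-English: every non-English language into English only.
    return num_languages * others if many_to_many else others


print(translation_directions(99, many_to_many=False))  # 98
print(translation_directions(99, many_to_many=True))   # 9702
```

Under these assumptions, a single bridge language leaves 98 directions to learn instead of 9,702.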
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -110,7 +124,7 @@ Select the model that aligns with your available resources.
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## <EFBFBD> Supported Languages
|
## 🌐 Supported Languages
|
||||||
|
|
||||||
The engine understands the following 99 languages. You can lock the focus to a specific language in Settings to improve accuracy, or rely on **Auto-Detect** for fluid multilingual usage.
|
The engine understands the following 99 languages. You can lock the focus to a specific language in Settings to improve accuracy, or rely on **Auto-Detect** for fluid multilingual usage.
|
||||||
|
|
||||||
@@ -123,7 +137,7 @@ The engine understands the following 99 languages. You can lock the focus to a s
|
|||||||
| Faroese 🇫🇴 | Finnish 🇫🇮 | Flemish 🇧🇪 | French 🇫🇷 | Galician 🇪🇸 | Georgian 🇬🇪 |
|
| Faroese 🇫🇴 | Finnish 🇫🇮 | Flemish 🇧🇪 | French 🇫🇷 | Galician 🇪🇸 | Georgian 🇬🇪 |
|
||||||
| German 🇩🇪 | Greek 🇬🇷 | Gujarati 🇮🇳 | Haitian 🇭🇹 | Hausa 🇳🇬 | Hawaiian 🇺🇸 |
|
| German 🇩🇪 | Greek 🇬🇷 | Gujarati 🇮🇳 | Haitian 🇭🇹 | Hausa 🇳🇬 | Hawaiian 🇺🇸 |
|
||||||
| Hebrew 🇮🇱 | Hindi 🇮🇳 | Hungarian 🇭🇺 | Icelandic 🇮🇸 | Indonesian 🇮🇩 | Italian 🇮🇹 |
|
| Hebrew 🇮🇱 | Hindi 🇮🇳 | Hungarian 🇭🇺 | Icelandic 🇮🇸 | Indonesian 🇮🇩 | Italian 🇮🇹 |
|
||||||
| Japanese 🇯🇵 | Javanese 🇮🇩 | Kannada 🇮🇳 | Kazakh 🇰🇿 | Khmer 🇰🇭 | Korean 🇰🇷 |
|
| Japanese 🇯🇵 | Javanese 🇮🇩 | Kannada 🇮🇳 | Kazakh 🇰🇿 | Khmer 🇰🇭 | Korean 🇰🇷 |
|
||||||
| Lao 🇱🇦 | Latin 🇻🇦 | Latvian 🇱🇻 | Lingala 🇨🇩 | Lithuanian 🇱🇹 | Luxembourgish 🇱🇺 |
|
| Lao 🇱🇦 | Latin 🇻🇦 | Latvian 🇱🇻 | Lingala 🇨🇩 | Lithuanian 🇱🇹 | Luxembourgish 🇱🇺 |
|
||||||
| Macedonian 🇲🇰 | Malagasy 🇲🇬 | Malay 🇲🇾 | Malayalam 🇮🇳 | Maltese 🇲🇹 | Maori 🇳🇿 |
|
| Macedonian 🇲🇰 | Malagasy 🇲🇬 | Malay 🇲🇾 | Malayalam 🇮🇳 | Maltese 🇲🇹 | Maori 🇳🇿 |
|
||||||
| Marathi 🇮🇳 | Moldavian 🇲🇩 | Mongolian 🇲🇳 | Myanmar 🇲🇲 | Nepali 🇳🇵 | Norwegian 🇳🇴 |
|
| Marathi 🇮🇳 | Moldavian 🇲🇩 | Mongolian 🇲🇳 | Myanmar 🇲🇲 | Nepali 🇳🇵 | Norwegian 🇳🇴 |
|
||||||
|
|||||||