Your voice stays
yours.
Local speech recognition that never leaves your machine.
99 languages. LLM correction. Neural translation. Zero cloud.
Your words belong
to you.
Every word you speak is intimate data. Cloud transcription services process your voice on distant servers you don't control, building profiles you never consented to. Whisper Voice runs entirely on your hardware. Your thoughts never leave your machine.
Fully local
Every model runs on your GPU or CPU. No audio ever leaves your machine, not even for "quality improvement."
Zero telemetry
No analytics, no crash reports, no usage tracking. We don't even know you exist.
Free forever
Not "free tier." Free unconditionally. No subscription, no token limits, no API keys.
Yours to own
CC0 public domain. Fork it, modify it, sell it. The code belongs to everyone.
99 languages.
One hotkey.
Speech recognition that
respects your privacy.
Press F9. Speak. Done.
Powered by Faster-Whisper with CTranslate2 acceleration. Automatic punctuation, 99 language recognition, and Silero VAD for precise voice activity detection. Just press a hotkey and talk.
LLM-powered text refinement.
A bundled Llama 3.2 1B model fixes grammar, punctuation, and phrasing after transcription. Choose Standard, Grammar, or Rewrite modes depending on how much you want corrected.
Speak any language, type in English.
Whisper's built-in translation pipeline converts speech from 99 languages directly into English text. Press F10 instead of F9 and your words arrive translated.
From potato laptops to RTX rigs.
Low VRAM mode runs alongside games and heavy apps. Choose from Tiny to Large-v3 models based on your hardware. INT8 quantization keeps things fast on modest GPUs. CPU-only mode when no GPU is available.
System-wide, always ready.
F9 for transcription, F10 for translation. Works in any app, any window. Customizable hotkeys and hold-to-speak or toggle modes. No need to switch to the app first.
One folder. Zero footprint.
Portable executable with all models stored locally. No installer, no registry, no internet connection required. Delete the folder to uninstall. Your voice data never touches a network.
LLM correction in 8+ languages.
The bundled language model corrects transcription in English, German, French, Spanish, Italian, Portuguese, Dutch, and more. Each language gets grammar-aware refinement.
Voice chat without the cloud.
Low VRAM mode lets Whisper Voice run alongside GPU-heavy games. Use system hotkeys to dictate in any game chat, Discord, or voice channel without alt-tabbing.
WCAG 2.2 AAA compliant.
7:1+ contrast ratios, complete keyboard navigation with 2px focus rings, semantic screen reader support with descriptive labels, 24px minimum touch targets, I/O toggle indicators beyond color, and automatic reduced motion detection.
See it in action.
Three steps.
Nothing else.
Download
Grab the 24 MB portable executable. Models download automatically on first run (~2 GB total).
Run
Launch it. No installer, no setup wizard. It sits in your system tray, ready to go.
Speak
Press F9 in any app. Your speech becomes text, locally, instantly.
Built on proven
open-source.
Faster-Whisper for speech recognition, CTranslate2 for model inference, Silero VAD for voice detection, and a Qt 6 / QML interface. Everything runs locally through Python with GPU acceleration.
Run it. Speak.
Stay private.
Voice recognition should work for you, not harvest your data. Every word stays on your machine.