Whisper Voice
v1.2.0 · Windows · Portable · ~24 MB

Your voice stays
yours.

Local speech recognition that never leaves your machine.
99 languages. LLM correction. Neural translation. Zero cloud.

Zero telemetry · No internet needed · CC0 Public Domain
[Screenshot: Whisper Voice main UI, hotkey popup]
99 Languages
4x Faster than realtime
~24 MB Download size
CC0 Public domain

Your words belong
to you.

Every word you speak is intimate data. Cloud transcription services process your voice on distant servers you don't control, building profiles you never consented to. Whisper Voice runs entirely on your hardware. Your thoughts never leave your machine.

Fully local

Every model runs on your GPU or CPU. No audio ever leaves your machine, not even for "quality improvement."

Zero telemetry

No analytics, no crash reports, no usage tracking. We don't even know you exist.

Free forever

Not "free tier." Free unconditionally. No subscription, no token limits, no API keys.

Yours to own

CC0 public domain. Fork it, modify it, sell it. The code belongs to everyone.

99 languages.
One hotkey.

Afrikaans Albanian Amharic Arabic Armenian Assamese Azerbaijani Bashkir Basque Belarusian Bengali Bosnian Breton Bulgarian Burmese Castilian Catalan Chinese Croatian Czech Danish Dutch English Estonian Faroese Finnish Flemish French Galician Georgian German Greek Gujarati Haitian Hausa Hawaiian Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Javanese Kannada Kazakh Khmer Korean Lao Latin Latvian Lingala Lithuanian Luxembourgish Macedonian Malagasy Malay Malayalam Maltese Maori Marathi Moldavian Mongolian Myanmar Nepali Norwegian Occitan Panjabi Pashto Persian Polish Portuguese Punjabi Romanian Russian Sanskrit Serbian Shona Sindhi Sinhala Slovak Slovenian Somali Spanish Sundanese Swahili Swedish Tagalog Tajik Tamil Tatar Telugu Thai Tibetan Turkish Turkmen Ukrainian Urdu Uzbek Vietnamese Welsh Yiddish Yoruba

Speech recognition that
respects your privacy.

Speech-to-Text

Press F9. Speak. Done.

Powered by Faster-Whisper with CTranslate2 acceleration. Automatic punctuation, recognition in 99 languages, and Silero VAD for precise voice activity detection. Just press a hotkey and talk.

F9 system hotkey
Auto-punctuation
99 languages
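
For the curious, this is roughly what such a pipeline looks like in Python. It is a minimal sketch, not Whisper Voice's actual source: it assumes the faster-whisper package is installed, and `transcribe_file` and `join_segments` are hypothetical names chosen for illustration. Passing `vad_filter=True` asks faster-whisper to run its Silero VAD pass so silent stretches are skipped before decoding.

```python
def join_segments(texts):
    """Join segment texts into one line, dropping empty pieces and stray spaces."""
    return " ".join(t.strip() for t in texts if t.strip())


def transcribe_file(path, model_size="small", device="cpu", compute_type="int8"):
    """Transcribe one audio file locally and return (detected_language, text).

    Hypothetical helper: the defaults are illustrative, not the app's settings.
    """
    # Imported lazily so this module still loads where faster-whisper is absent.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    # vad_filter=True enables the Silero VAD pre-pass bundled with faster-whisper.
    segments, info = model.transcribe(path, vad_filter=True)
    # `segments` is a generator; decoding happens as it is consumed.
    return info.language, join_segments(seg.text for seg in segments)
```

Calling `transcribe_file("note.wav")` on a machine with the package and a model available would return the detected language code and the joined text. Nothing in this path touches the network once the model files are on disk.
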
Intelligent Correction

LLM-powered text refinement.

A bundled Llama 3.2 1B model fixes grammar, punctuation, and phrasing after transcription. Choose Standard, Grammar, or Rewrite modes depending on how much you want corrected.

3 correction modes
Llama 3.2 1B
Runs locally
Neural Translation

Speak any language, type in English.

Whisper's built-in translation pipeline converts speech from 99 languages directly into English text. Press F10 instead of F9 and your words arrive translated.

F10 hotkey
99 → English
Real-time
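
A rough idea of how two hotkeys can share one model: in faster-whisper the difference between transcribing and translating is the `task` argument. The sketch below is an assumption about how that wiring could look, not the app's real code. `HOTKEY_TASKS`, `task_for_hotkey`, and `speech_to_text` are hypothetical names, and `model` is expected to be a `faster_whisper.WhisperModel`.

```python
# Hypothetical mapping from hotkey to Whisper task (illustrative only).
HOTKEY_TASKS = {"F9": "transcribe", "F10": "translate"}


def task_for_hotkey(key):
    """Return the Whisper task bound to a hotkey name, case-insensitively."""
    try:
        return HOTKEY_TASKS[key.upper()]
    except KeyError:
        raise ValueError(f"no task bound to {key!r}") from None


def speech_to_text(path, key, model):
    """Run the task bound to `key` on an audio file and return the text.

    With task="translate", Whisper emits English text whatever the spoken language.
    """
    segments, _info = model.transcribe(
        path, task=task_for_hotkey(key), vad_filter=True
    )
    return " ".join(seg.text.strip() for seg in segments)
```

The same loaded model serves both keys, so switching from F9 to F10 costs no extra memory.
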
Hardware Flexibility

From potato laptops to RTX rigs.

Low VRAM mode runs alongside games and heavy apps. Choose a model from Tiny to Large-v3 to match your hardware. INT8 quantization keeps things fast on modest GPUs, and CPU-only mode covers machines with no GPU at all.

Low VRAM mode
Tiny → Large-v3
INT8 quantization
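
To make the trade-off concrete, here is a sketch of how a model size and CTranslate2 compute type could be picked from available VRAM. The thresholds are illustrative assumptions, not Whisper Voice's real heuristics, and `WhisperConfig` and `choose_config` are hypothetical names. The compute type strings (`int8`, `int8_float16`, `float16`) are standard CTranslate2 values.

```python
from typing import NamedTuple, Optional


class WhisperConfig(NamedTuple):
    model_size: str    # e.g. "tiny", "small", "medium", "large-v3"
    device: str        # "cpu" or "cuda"
    compute_type: str  # CTranslate2 compute type


def choose_config(vram_gb: Optional[float], low_vram: bool = False) -> WhisperConfig:
    """Pick a configuration from free VRAM in GB; None means no usable GPU.

    All cut-off values below are assumptions made for this example.
    """
    if vram_gb is None:
        # CPU-only fallback: INT8 weights keep memory use and latency down.
        return WhisperConfig("small", "cpu", "int8")
    if low_vram or vram_gb < 4:
        # Leave headroom for a game or other GPU-heavy app.
        return WhisperConfig("small", "cuda", "int8")
    if vram_gb < 8:
        return WhisperConfig("medium", "cuda", "int8_float16")
    return WhisperConfig("large-v3", "cuda", "float16")
```

The returned fields line up with the arguments of faster-whisper's `WhisperModel(model_size, device=..., compute_type=...)`.
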
Hotkey Control

System-wide, always ready.

F9 for transcription, F10 for translation. Works in any app, any window, with no need to switch to Whisper Voice first. Hotkeys are customizable, and you can pick hold-to-speak or toggle mode.

System-wide hotkeys
Customizable keys
Hold or toggle
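
The difference between hold-to-speak and toggle comes down to a small state machine. The class below is a hypothetical sketch (`DictationTrigger` is not a name from the app) that takes raw key-down and key-up events and reports whether recording should be active. It deliberately leaves out the system-wide hook itself, which depends on the OS.

```python
class DictationTrigger:
    """Tracks recording state for one hotkey in "hold" or "toggle" mode."""

    def __init__(self, mode="hold"):
        if mode not in ("hold", "toggle"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.recording = False
        self._key_is_down = False

    def key_down(self):
        """Handle a key press; returns True while recording should run."""
        if self._key_is_down:
            # The OS repeats key-down events while a key is held; ignore them.
            return self.recording
        self._key_is_down = True
        if self.mode == "hold":
            self.recording = True
        else:
            # Toggle mode flips state on each fresh press.
            self.recording = not self.recording
        return self.recording

    def key_up(self):
        """Handle a key release; only hold mode stops recording here."""
        self._key_is_down = False
        if self.mode == "hold":
            self.recording = False
        return self.recording
```

In hold mode recording lasts exactly as long as the key is down. In toggle mode the first press starts it and the next press stops it.
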
Privacy & Portable

One folder. Zero footprint.

Portable executable with all models stored locally. No installer, no registry entries, and no internet connection needed once the models have downloaded on first run. Delete the folder to uninstall. Your voice data never touches a network.

Portable exe
Zero telemetry
No internet needed
Multi-language Correction

LLM correction in 8+ languages.

The bundled language model corrects transcriptions in English, German, French, Spanish, Italian, Portuguese, Dutch, and more. Each language gets grammar-aware refinement.

8+ languages
Grammar-aware
Context-aware
Gaming Compatible

Voice chat without the cloud.

Low VRAM mode lets Whisper Voice run alongside GPU-heavy games. Use system hotkeys to dictate in any game chat, Discord, or voice channel without alt-tabbing.

Low VRAM mode
No alt-tab needed
Global hotkeys
Accessibility

WCAG 2.2 AAA compliant.

7:1+ contrast ratios, complete keyboard navigation with 2px focus rings, semantic screen reader support with descriptive labels, 24px minimum touch targets, I/O toggle indicators beyond color, and automatic reduced motion detection.

7:1+ contrast
Full keyboard nav
Screen reader support
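
The 7:1 figure refers to the WCAG contrast ratio, which anyone can check for a pair of colors. The sketch below follows the relative luminance and contrast ratio definitions in WCAG 2.x. The function names are hypothetical, and it is offered as a way to verify a color pair, not as the project's own tooling.

```python
def _linear(channel_8bit):
    """Convert one 8-bit sRGB channel to its linear-light value."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """WCAG relative luminance of an (R, G, B) tuple with 0-255 channels."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aaa(color_a, color_b):
    """True if the pair reaches the 7:1 AAA threshold for normal-size text."""
    return contrast_ratio(color_a, color_b) >= 7.0
```

Black text on a white background gives the maximum ratio of 21:1, while a mid-gray such as `(118, 118, 118)` on white lands near 4.5:1 and would not pass at the AAA level.
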

Three steps.
Nothing else.

1

Download

Grab the ~24 MB portable executable. Models download automatically on first run (~2 GB total).

2

Run

Launch it. No installer, no setup wizard. It sits in your system tray, ready to go.

3

Speak

Press F9 in any app. Your speech becomes text, locally, instantly.

Built on proven
open-source.

Faster-Whisper for speech recognition, CTranslate2 for model inference, Silero VAD for voice detection, Llama 3.2 1B for correction, and a Qt 6 / QML interface. Everything runs locally through Python, with GPU acceleration when a GPU is available.

Faster-Whisper · CTranslate2 · Silero VAD · Llama 3.2 1B · Qt 6 / QML · Python

Run it. Speak.
Stay private.

Voice recognition should work for you, not harvest your data. Every word stays on your machine.

Windows 10/11 · Portable · No install · v1.2.0