Whisper Voice

v1.2.0 - Windows · Portable · ~24 MB

Your voice stays
yours.

Local speech recognition that never leaves your machine.
99 languages. LLM correction. Neural translation. Zero cloud.

Download for Windows · Source Code · Buy me a coffee · Support on Liberapay

Zero telemetry · No internet needed · CC0 Public Domain

99 Languages

4x Faster than realtime

~24 MB Download size

CC0 Public domain

Philosophy

Your words belong
to you.

Every word you speak is intimate data. Cloud transcription services process your voice on distant servers you don't control, building profiles you never consented to. Whisper Voice runs entirely on your hardware. Your thoughts never leave your machine.

Fully local

Every model runs on your GPU or CPU. No audio ever leaves your machine, not even for "quality improvement."

Zero telemetry

No analytics, no crash reports, no usage tracking. We don't even know you exist.

Free forever

Not "free tier." Free unconditionally. No subscription, no token limits, no API keys.

Yours to own

CC0 public domain. Fork it, modify it, sell it. The code belongs to everyone.

Languages

99 languages.
One hotkey.

Afrikaans Albanian Amharic Arabic Armenian Assamese Azerbaijani Bashkir Basque Belarusian Bengali Bosnian Breton Bulgarian Burmese Castilian Catalan Chinese Croatian Czech Danish Dutch English Estonian Faroese Finnish Flemish French Galician Georgian German Greek Gujarati Haitian Hausa Hawaiian Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Javanese Kannada Kazakh Khmer Korean Lao Latin Latvian Lingala Lithuanian Luxembourgish Macedonian Malagasy Malay Malayalam Maltese Maori Marathi Moldavian Mongolian Myanmar Nepali Norwegian Occitan Panjabi Pashto Persian Polish Portuguese Punjabi Romanian Russian Sanskrit Serbian Shona Sindhi Sinhala Slovak Slovenian Somali Spanish Sundanese Swahili Swedish Tagalog Tajik Tamil Tatar Telugu Thai Tibetan Turkish Turkmen Ukrainian Urdu Uzbek Vietnamese Welsh Yiddish Yoruba

Features

Speech recognition that
respects your privacy.

Speech-to-Text

Press F9. Speak. Done.

Powered by Faster-Whisper with CTranslate2 acceleration. Automatic punctuation, recognition of 99 languages, and Silero VAD for precise voice activity detection. Just press a hotkey and talk.

F9 system hotkey

Auto-punctuation

99 languages
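For readers curious about what this pipeline looks like in code, here is a minimal sketch using the public Faster-Whisper Python API. It is not Whisper Voice's actual source: the `Segment` stand-in, the function names `join_segments` and `transcribe_file`, and the default model size are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """Hypothetical stand-in for the segment objects Faster-Whisper yields."""
    start: float
    end: float
    text: str


def join_segments(segments: Iterable) -> str:
    """Concatenate segment texts into one line, skipping empty segments."""
    parts = (s.text.strip() for s in segments)
    return " ".join(p for p in parts if p)


def transcribe_file(path: str, model_size: str = "small") -> str:
    """Transcribe an audio file locally with Silero VAD filtering enabled."""
    # Imported lazily so join_segments works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="auto", compute_type="int8")
    # vad_filter=True removes non-speech audio before decoding.
    segments, _info = model.transcribe(path, vad_filter=True)
    return join_segments(segments)
```

Calling `transcribe_file("clip.wav")` would require the `faster-whisper` package and a downloaded model; `join_segments` can be tried on its own with any objects that carry a `text` attribute.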

Intelligent Correction

LLM-powered text refinement.

A bundled Llama 3.2 1B model fixes grammar, punctuation, and phrasing after transcription. Choose Standard, Grammar, or Rewrite modes depending on how much you want corrected.

3 correction modes

Llama 3.2 1B

Runs locally
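One plausible way to drive three correction modes from a single local model is to vary the instruction sent with the transcript. The sketch below is an assumption about how that could work, not the app's real prompts: the mode names come from the description above, while the instruction wording, `CORRECTION_MODES`, and `build_correction_prompt` are hypothetical.

```python
# Hypothetical instruction text per mode; only the mode names are from the page.
CORRECTION_MODES = {
    "standard": "Fix obvious transcription errors and punctuation. Keep the wording.",
    "grammar": "Fix grammar, spelling, and punctuation. Keep the meaning and tone.",
    "rewrite": "Rewrite the text so it reads clearly and fluently. Keep the meaning.",
}


def build_correction_prompt(text: str, mode: str = "standard") -> str:
    """Build the prompt a local LLM would receive for the chosen mode."""
    key = mode.lower()
    if key not in CORRECTION_MODES:
        raise ValueError(f"unknown correction mode: {mode!r}")
    return (
        f"{CORRECTION_MODES[key]}\n"
        "Return only the corrected text.\n\n"
        f"Text: {text.strip()}\n"
        "Corrected:"
    )
```

The resulting string would then be passed to whatever local runtime hosts the Llama 3.2 1B model; that step is left out because the page does not say which runtime is used.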

Neural Translation

Speak any language, type in English.

Whisper's built-in translation pipeline converts speech from 99 languages directly into English text. Press F10 instead of F9 and your words arrive translated.

F10 hotkey

99 → English

Real-time
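In Faster-Whisper, translation into English is selected with the `task` argument of `transcribe`. The sketch below shows how the two hotkeys could map onto that argument. The `HOTKEY_TASKS` table and the `run_hotkey` helper are illustrative assumptions, and `model` is expected to be a loaded `WhisperModel` (or anything with the same `transcribe` method).

```python
# Hypothetical mapping from hotkey name to Whisper task.
HOTKEY_TASKS = {"F9": "transcribe", "F10": "translate"}


def task_for_hotkey(key: str) -> str:
    """Return the Whisper task for a hotkey, defaulting to transcription."""
    return HOTKEY_TASKS.get(key.upper(), "transcribe")


def run_hotkey(key: str, audio_path: str, model):
    """Run the model with the task implied by the hotkey.

    Returns (detected_language, text). With task="translate" the text is
    English regardless of the spoken language.
    """
    segments, info = model.transcribe(
        audio_path, task=task_for_hotkey(key), vad_filter=True
    )
    text = " ".join(s.text.strip() for s in segments)
    return info.language, text
```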

Hardware Flexibility

From potato laptops to RTX rigs.

Low VRAM mode runs alongside games and heavy apps. Choose from Tiny to Large-v3 models based on your hardware. INT8 quantization keeps things fast on modest GPUs. CPU-only mode when no GPU is available.

Low VRAM mode

Tiny → Large-v3

INT8 quantization
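Choosing a model for the available hardware can be reduced to a small lookup. The sketch below is only an illustration: the VRAM thresholds, the halving rule for Low VRAM mode, and the `pick_model` helper are assumptions, not the app's real selection logic. The model names and compute types are values that Faster-Whisper and CTranslate2 accept.

```python
from typing import Optional, Tuple


def pick_model(vram_gb: Optional[float], low_vram: bool = False) -> Tuple[str, str, str]:
    """Return (model_size, device, compute_type) for the given GPU memory.

    vram_gb=None means no usable GPU. All thresholds are illustrative guesses.
    """
    if vram_gb is None:
        # CPU-only fallback: a small model with INT8 weights.
        return ("base", "cpu", "int8")

    # Assumption: Low VRAM mode leaves half of the memory to other apps.
    budget = vram_gb / 2 if low_vram else vram_gb

    if budget >= 8:
        return ("large-v3", "cuda", "float16")
    if budget >= 4:
        return ("medium", "cuda", "int8_float16")
    if budget >= 2:
        return ("small", "cuda", "int8")
    return ("tiny", "cuda", "int8")
```

The returned tuple lines up with the `WhisperModel(model_size, device=..., compute_type=...)` constructor arguments.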

Hotkey Control

System-wide, always ready.

F9 for transcription, F10 for translation. Works in any app, any window, with no need to switch to Whisper Voice first. Hotkeys are customizable, and you can choose hold-to-speak or toggle mode.

System-wide hotkeys

Customizable keys

Hold or toggle
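The difference between hold-to-speak and toggle comes down to how key press and release events change the recording state. Here is a minimal sketch of that state machine. The `HotkeyRecorder` class is hypothetical and leaves out the system-wide key hook, which the page does not describe.

```python
class HotkeyRecorder:
    """Tracks whether recording is active for 'hold' or 'toggle' mode."""

    def __init__(self, mode: str = "hold"):
        if mode not in ("hold", "toggle"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.recording = False
        self._key_down = False  # guards against keyboard auto-repeat

    def on_press(self) -> bool:
        """Handle a key-down event; returns the new recording state."""
        if self._key_down:
            return self.recording  # ignore repeated key-down while held
        self._key_down = True
        if self.mode == "hold":
            self.recording = True
        else:
            self.recording = not self.recording
        return self.recording

    def on_release(self) -> bool:
        """Handle a key-up event; returns the new recording state."""
        self._key_down = False
        if self.mode == "hold":
            self.recording = False
        return self.recording
```

In hold mode recording lasts exactly as long as the key is down; in toggle mode the first press starts it and the next press stops it.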

Privacy & Portable

One folder. Zero footprint.

Portable executable with all models stored locally. No installer, no registry, and no internet connection required once the models have downloaded on first run. Delete the folder to uninstall. Your voice data never touches a network.

Portable exe

Zero telemetry

No internet needed

Multi-language Correction

LLM correction in 8+ languages.

The bundled language model corrects transcription in English, German, French, Spanish, Italian, Portuguese, Dutch, and more. Each language gets grammar-aware refinement.

8+ languages

Grammar-aware

Context-aware

Gaming Compatible

Voice chat without the cloud.

Low VRAM mode lets Whisper Voice run alongside GPU-heavy games. Use system hotkeys to dictate in any game chat, Discord, or voice channel without alt-tabbing.

Low VRAM mode

No alt-tab needed

Global hotkeys

Accessibility

WCAG 2.2 AAA compliant.

Contrast ratios of 7:1 or higher, complete keyboard navigation with 2px focus rings, semantic screen reader support with descriptive labels, 24px minimum touch targets, I/O indicators on toggles so state is not conveyed by color alone, and automatic reduced motion detection.

7:1+ contrast

Full keyboard nav

Screen reader support
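The 7:1 figure refers to the WCAG contrast ratio, which is computed from the relative luminance of the foreground and background colors. The sketch below implements the formula from the WCAG definition; the function names are illustrative, and the sample colors in the usage note are not taken from the app's theme.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    """WCAG relative luminance of an (R, G, B) color with 0-255 channels."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg) -> float:
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aaa(fg, bg, large_text: bool = False) -> bool:
    """AAA requires 7:1 for normal text and 4.5:1 for large text."""
    return contrast_ratio(fg, bg) >= (4.5 if large_text else 7.0)
```

Black on white gives the maximum ratio of 21:1, while a mid-gray such as (119, 119, 119) on white falls short of the AAA threshold.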

Screenshots

See it in action.

[Screenshot: Main UI]

How It Works

Three steps.
Nothing else.

Download

Grab the ~24 MB portable executable. Models download automatically on first run (~2 GB total), so that first launch needs an internet connection.

Run

Launch it. No installer, no setup wizard. It sits in your system tray, ready to go.

Speak

Press F9 in any app. Your speech becomes text, locally, instantly.

Under the Hood

Built on proven
open-source.

Faster-Whisper for speech recognition, CTranslate2 for model inference, Silero VAD for voice detection, and a Qt 6 / QML interface. Everything runs locally through Python with GPU acceleration.

Faster-Whisper · CTranslate2 · Silero VAD · Llama 3.2 1B · Qt 6 / QML · Python

Run it. Speak.
Stay private.

Voice recognition should work for you, not harvest your data. Every word stays on your machine.

Download for Windows · View Source · Buy me a coffee · Support on Liberapay

Windows 10/11 · Portable · No install · v1.2.0
