Offline Transcriber

User Manual

Overview

Offline Transcriber converts speech in an audio or video file into text. It decodes the audio in your browser, resamples it for the model, and runs Whisper to produce a transcript broken into timestamped segments. You can read the result, copy the plain text, or download subtitles as .srt or .vtt. Everything happens on your own machine: the file you choose is never uploaded anywhere. The only network use is downloading the model files from the Hugging Face CDN, once per model.
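For readers curious about the "resamples it for the model" step: Whisper models expect 16 kHz mono input, so decoded audio has to be downmixed and resampled before transcription. The sketch below shows one simple way to do that with linear interpolation. It is an illustration only, not necessarily how this tool does it; the function names are hypothetical, and a production resampler would normally add a low-pass filter to avoid aliasing.

```typescript
// Sample rate Whisper models expect for their input audio.
const TARGET_SAMPLE_RATE = 16000;

/** Average all channels into a single mono channel. */
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

/**
 * Resample by linear interpolation (hypothetical helper).
 * No anti-aliasing filter is applied, so this is illustrative only.
 */
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number = TARGET_SAMPLE_RATE
): Float32Array {
  if (fromRate === toRate || input.length === 0) return input.slice();
  const outLength = Math.round((input.length * toRate) / fromRate);
  const output = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```

For example, one second of 48 kHz stereo audio (two channels of 48,000 samples) would become a single channel of 16,000 samples after both steps.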

Getting Started

Loading a file

Drag and drop — drop an audio or video file onto the dashed drop zone.
Click to browse — click the drop zone and pick a file from your computer.

Then choose your model, language and acceleration options and press Transcribe.

Choosing a model

Pick a model based on how much accuracy you need versus how long you're willing to wait:

Model	Download	Best for
Tiny	~75 MB	Quick drafts, clear speech, fastest results
Base	~145 MB	Good everyday balance (default)
Small	~480 MB	Best multilingual accuracy, slower

Each model is downloaded the first time you use it and is then cached by your browser. Switching to a different model later triggers one download for that model as well.

Saving / exporting

After transcription, use the export buttons above the results:

SRT — subtitle file for most video players and editors.
VTT — WebVTT subtitles for HTML5 <video> and the web.
TXT — the plain transcript with no timestamps.
JSON — full text plus every segment with start/end times, for further processing.
Copy text — copies the plain transcript to your clipboard.
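If you process the JSON export yourself, the subtitle formats are straightforward to rebuild from the segments. The sketch below assumes each segment has `start` and `end` in seconds plus a `text` field; those field names, and the function names, are assumptions for illustration and may not match the tool's actual JSON exactly. SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a dot.

```typescript
// Assumed segment shape; the real JSON export may use different field names.
interface Segment {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

/** Left-pad a non-negative integer with zeros to the given width. */
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

/** Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT). */
function formatTimestamp(seconds: number, separator: "," | "."): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + separator + pad(ms, 3);
}

/** Build SRT text: numbered cues separated by blank lines. */
function toSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${formatTimestamp(seg.start, ",")} --> ${formatTimestamp(seg.end, ",")}\n${seg.text.trim()}\n`
    )
    .join("\n");
}

/** Build WebVTT text: a WEBVTT header, then cues separated by blank lines. */
function toVtt(segments: Segment[]): string {
  const cues = segments.map(
    (seg) =>
      `${formatTimestamp(seg.start, ".")} --> ${formatTimestamp(seg.end, ".")}\n${seg.text.trim()}\n`
  );
  return ["WEBVTT\n"].concat(cues).join("\n");
}
```

Rounding to whole milliseconds before splitting into hours, minutes and seconds avoids timestamps such as `00:00:59,1000` when a value sits just below a second boundary.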

The Controls

Control	Action
Model	Choose Tiny, Base or Small — accuracy versus speed.
Language	Auto-detect, or pick English, Dutch, German or French to force that language.
Task	Transcribe keeps the spoken language; Translate → English outputs an English translation.
Acceleration	Auto uses WebGPU when available, otherwise CPU. You can force WebGPU or CPU (WASM).
Transcribe	Start processing the loaded file.
Clear	Remove the current file and results.

Languages

The Tiny / Base / Small models are multilingual and recognise dozens of languages. This tool exposes English, Dutch, German and French directly, plus an Auto-detect option. Forcing the correct language usually gives cleaner results than Auto-detect, especially on short or noisy clips. The Translate → English task will translate speech in any supported language into English text.

Reading the results

Results appear in two tabs. Segments shows each line of speech with its start and end time — this is what becomes your subtitles. Plain text shows the full transcript as one flowing block. A summary line reports how many segments were found, the total duration, and which model was used.

Acceleration & performance

If your browser supports WebGPU (recent versions of Chrome, Edge, and other Chromium-based browsers), transcription runs on your GPU and is typically several times faster. Otherwise it falls back to the CPU via WebAssembly, which still works but is slower; for long files on CPU, pick the Tiny model. The control panel shows whether WebGPU was detected. Processing happens in a background worker, so the page stays responsive while it runs.
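The Auto behaviour described above can be sketched as a small decision function. This is a minimal illustration, not the tool's actual code: the type and function names are hypothetical, and the sketch simply honours a forced setting without checking whether that backend will work. Browsers that support WebGPU expose a `gpu` property on `navigator`, which is what the detection helper looks for.

```typescript
// Hypothetical names for the three choices offered by the Acceleration control.
type AccelerationSetting = "auto" | "webgpu" | "wasm";
type Backend = "webgpu" | "wasm";

/**
 * Report whether a navigator-like object exposes WebGPU.
 * Pass `navigator` in a browser; `undefined` is treated as "no WebGPU".
 */
function hasWebGpu(nav: object | undefined): boolean {
  return nav !== undefined && "gpu" in nav;
}

/**
 * Resolve the user's setting to a concrete backend.
 * Assumption: a forced setting is returned unchanged, even if unsupported.
 */
function chooseBackend(
  setting: AccelerationSetting,
  webGpuAvailable: boolean
): Backend {
  if (setting === "auto") {
    return webGpuAvailable ? "webgpu" : "wasm";
  }
  return setting;
}

// In a browser this could be called as:
//   chooseBackend("auto", hasWebGpu(navigator));
```

Note that the presence of `navigator.gpu` does not guarantee a usable GPU; requesting an adapter can still come back empty, so a real implementation would likely fall back to WASM in that case too.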

Processing time grows with the length of the recording, and the first run also has to download the model. To confirm that everything works, start with a short clip.

Privacy

Offline Transcriber processes everything locally in your browser. The audio or video you load is decoded and transcribed on your device and is never transmitted to any server. The only network request the tool makes is the download of the Whisper model files from the Hugging Face CDN, once per model; after that the tool works without re-downloading. Your settings (model, language, task, acceleration) are stored in localStorage on your device and never leave it.
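To make the localStorage point concrete, here is a minimal sketch of how settings like these can be saved and restored entirely on the device. The storage key, the value spellings, and every default other than the Base model are assumptions made for illustration; the tool's real keys and values may differ. The functions accept any object with `getItem` and `setItem`, so in a browser you would pass `window.localStorage`.

```typescript
// Hypothetical settings shape mirroring the four controls in this manual.
interface Settings {
  model: "tiny" | "base" | "small";
  language: string; // e.g. "auto" (assumed spelling)
  task: "transcribe" | "translate";
  acceleration: "auto" | "webgpu" | "wasm";
}

// The subset of the Web Storage API this sketch relies on.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical key; Base as the default model comes from the manual,
// the other defaults are assumptions.
const SETTINGS_KEY = "offline-transcriber-settings";
const DEFAULT_SETTINGS: Settings = {
  model: "base",
  language: "auto",
  task: "transcribe",
  acceleration: "auto",
};

function saveSettings(store: KeyValueStore, settings: Settings): void {
  store.setItem(SETTINGS_KEY, JSON.stringify(settings));
}

/** Load settings, falling back to defaults if nothing valid is stored. */
function loadSettings(store: KeyValueStore): Settings {
  const raw = store.getItem(SETTINGS_KEY);
  if (raw === null) return { ...DEFAULT_SETTINGS };
  try {
    return { ...DEFAULT_SETTINGS, ...JSON.parse(raw) };
  } catch {
    return { ...DEFAULT_SETTINGS };
  }
}

/** In-memory stand-in for localStorage, useful outside a browser. */
function createMemoryStore(): KeyValueStore {
  const data: Record<string, string> = {};
  return {
    getItem: (key) => (key in data ? data[key] : null),
    setItem: (key, value) => {
      data[key] = value;
    },
  };
}
```

Nothing in this flow touches the network: reads and writes go only to the browser's own storage for the page.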

Supported File Types

Common audio: .mp3 · .wav · .m4a · .aac · .ogg · .flac
Video (audio track is extracted): .mp4 · .webm · .mov

Decoding relies on your browser's built-in audio decoders, so the exact set of supported formats depends on your browser. Chrome handles the widest range. If a file can't be decoded, try converting it to MP3 or WAV first.
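If you are preparing many files, a quick check against the extensions listed above can tell you which ones are worth trying. The sketch below is a hypothetical helper, not part of the tool. The extension is only a hint: whether a file actually decodes still depends on your browser and on the codec inside the file.

```typescript
// Extensions taken from the lists in this manual.
const AUDIO_EXTENSIONS = ["mp3", "wav", "m4a", "aac", "ogg", "flac"];
const VIDEO_EXTENSIONS = ["mp4", "webm", "mov"];

type FileKind = "audio" | "video" | "unknown";

/** Classify a file name by its extension (case-insensitive). */
function classifyFile(fileName: string): FileKind {
  const dot = fileName.lastIndexOf(".");
  // No dot, or a trailing dot, means there is no usable extension.
  if (dot < 0 || dot === fileName.length - 1) return "unknown";
  const ext = fileName.slice(dot + 1).toLowerCase();
  if (AUDIO_EXTENSIONS.indexOf(ext) !== -1) return "audio";
  if (VIDEO_EXTENSIONS.indexOf(ext) !== -1) return "video";
  return "unknown";
}
```

A file classified as "unknown" may still work, and one classified as "audio" or "video" may still fail to decode; converting to MP3 or WAV remains the fallback.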
