File Transcription PRO

Transcribe audio and video files without using your microphone. Import recordings, podcasts, interviews, or any other media file in a supported format and get a text transcription.

Pro Feature

File transcription is available exclusively to Pro subscribers. Upgrade at starwhisper.ai/upgrade to unlock this feature.

Supported Formats

Format	Type	Common Use
MP3	Audio	Music, podcasts, voice memos
M4A	Audio	Apple voice memos, AAC encoded audio
WAV	Audio	Professional recordings, lossless audio
FLAC	Audio	Lossless compressed audio
OGG	Audio	Open container format, typically Vorbis or Opus audio
MP4	Video	Most common video format
MKV	Video	High-quality video container
MOV	Video	Apple QuickTime video
AVI	Video	Legacy Windows video
WebM	Video	Web-optimized video
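
If you want to check a folder of files before importing them, the table above can be expressed as a simple extension lookup. This is a minimal sketch for pre-filtering on your side; the function name `classify_media` is hypothetical, and it only looks at the file extension, which may not be how StarWhisper itself validates files.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the Supported Formats table.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wav", ".flac", ".ogg"}
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".avi", ".webm"}


def classify_media(path: str) -> Optional[str]:
    """Return "audio", "video", or None if the extension is not listed.

    Hypothetical helper: it checks the extension only, not the file contents.
    """
    suffix = Path(path).suffix.lower()  # case-insensitive match
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return None
```

A renamed or corrupted file can still pass this check, so treat it as a quick filter rather than a guarantee that the file will transcribe.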

Video Files

For video files, StarWhisper extracts the audio track automatically. The video content is not analyzed.
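
StarWhisper does this step for you, and this page does not describe how. If you ever need to extract the audio track yourself, for example to shrink a large video before importing it, one common approach is the ffmpeg command-line tool. The sketch below only builds the command; it assumes ffmpeg is installed, and the helper name and the choice of 16 kHz mono WAV (the sample rate Whisper models work with) are assumptions, not StarWhisper settings.

```python
from typing import List


def build_extract_command(video_path: str, wav_path: str) -> List[str]:
    """Build an ffmpeg command that writes the audio track as 16 kHz mono WAV.

    Hypothetical helper; not StarWhisper's internal pipeline.
    """
    return [
        "ffmpeg",
        "-i", video_path,     # input video file
        "-vn",                # drop the video stream
        "-ac", "1",           # downmix to one channel (mono)
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit PCM, the usual WAV encoding
        wav_path,
    ]
```

To run it, pass the list to `subprocess.run(cmd, check=True)`.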

Transcribing a Single File

Right-click the StarWhisper circle → Settings
Go to the Files tab
Click "Select Audio/Video Files"
Choose your file from the file picker
Wait for the progress bar to complete
Your transcription appears below

Figure: The Files tab, where you select files, track progress, and view results.

Batch Processing

You can select multiple files at once to transcribe them in sequence.

Click "Select Audio/Video Files"
Select multiple files (hold Ctrl or Shift to select several)
Files are added to a queue and processed one by one
Each file shows individual progress
An overall progress bar tracks the entire batch

Queue Management

The queue processes files in the order they were selected. You can clear the queue at any time with the "Clear Queue" button.
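
The queue behaviour described above (first in, first out, one file at a time, with an overall progress figure and a clear operation) can be modelled in a few lines. This is an illustrative sketch, not StarWhisper's code: the class `TranscriptionQueue`, its methods, and the `transcribe` callback are all hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Tuple


class TranscriptionQueue:
    """Hypothetical model of a sequential, first-in-first-out file queue."""

    def __init__(self, transcribe: Callable[[str], str]) -> None:
        self._transcribe = transcribe          # stand-in for the real engine
        self._pending: Deque[str] = deque()
        self.results: List[Tuple[str, str]] = []
        self._total = 0                        # files accepted into the batch

    def add(self, paths: Iterable[str]) -> None:
        """Append files in the order they were selected."""
        for path in paths:
            self._pending.append(path)
            self._total += 1

    def clear(self) -> None:
        """Drop files that have not been processed yet."""
        self._total -= len(self._pending)
        self._pending.clear()

    def overall_progress(self) -> float:
        """Fraction of the batch that has finished, from 0.0 to 1.0."""
        if self._total == 0:
            return 0.0
        return len(self.results) / self._total

    def run(self) -> None:
        """Process the pending files one by one, oldest first."""
        while self._pending:
            path = self._pending.popleft()
            self.results.append((path, self._transcribe(path)))
```

Using a `deque` keeps both appending and taking the next file cheap, and makes the selection order the processing order.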

Viewing Results

After transcription completes, results appear in the Files tab under "Transcribed Files". Each entry shows:

The original filename
Timestamp of when it was transcribed
The full transcription text
Option to copy the text

Tips for Best Results

Use high-quality source audio: clearer recordings produce better transcriptions
Enable GPU acceleration: it speeds up file transcription considerably, especially for long files
Choose the right model: use Medium or Large for important files where accuracy matters, and expect longer processing times than with smaller models
Quiet recordings work best: files with heavy background noise or music will have lower accuracy
Split very long files: files over 60 minutes can take a long time to process, so consider splitting them into shorter parts
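
For the last tip, it can help to plan the split points before cutting a long file in an audio editor. The sketch below computes start and end offsets for fixed-length parts. The helper name `plan_chunks` and the 30-minute default are assumptions chosen for illustration; StarWhisper does not prescribe a part length.

```python
from typing import List, Tuple


def plan_chunks(duration_s: float, chunk_s: float = 1800.0) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds covering the whole file.

    Hypothetical helper; the 1800 s (30 min) default is an arbitrary choice.
    """
    if duration_s <= 0 or chunk_s <= 0:
        raise ValueError("duration_s and chunk_s must be positive")
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)  # last part may be shorter
        chunks.append((start, end))
        start = end
    return chunks
```

Fixed offsets can land in the middle of a word, so nudging each cut to a nearby pause is likely to give cleaner transcriptions at the boundaries.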

Processing Time

File transcription uses the same Whisper model as live dictation. Processing time depends on file length, model size, and whether GPU acceleration is enabled. A 10-minute file with the Small model on GPU typically takes 30–60 seconds.
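
The example figure above implies a speed of roughly 10 to 20 times real time: 600 seconds of audio in 60 seconds is 10x, and in 30 seconds is 20x. The sketch below turns that into a rough estimate for other file lengths. It assumes processing time scales linearly with file length and applies only to the Small model on GPU; both are simplifications, and the function name is hypothetical.

```python
from typing import Tuple

# Derived from the stated example: a 10-minute file takes 30-60 seconds.
EXAMPLE_AUDIO_S = 10 * 60
FASTEST_FACTOR = EXAMPLE_AUDIO_S / 30   # 20x real time
SLOWEST_FACTOR = EXAMPLE_AUDIO_S / 60   # 10x real time


def estimate_processing_seconds(audio_minutes: float) -> Tuple[float, float]:
    """Return a (best case, worst case) estimate in seconds.

    Assumes linear scaling from the single example above (Small model, GPU).
    """
    audio_s = audio_minutes * 60
    return (audio_s / FASTEST_FACTOR, audio_s / SLOWEST_FACTOR)
```

Under these assumptions a 60-minute file would take about 3 to 6 minutes. Larger models or CPU-only processing will be slower, and this page gives no figures for those cases.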