Real-Time Transcription

Real-time transcription shows your words appearing in a floating window as you speak, giving you instant visual feedback. Instead of waiting until you stop recording, you can see the text forming in real time.

How It Works

When enabled, StarWhisper processes your speech in small chunks while you're still talking. A transparent overlay window appears showing the live transcription. When you stop recording, the final transcription uses the complete audio for maximum accuracy.

Key points:

  • Live preview is capped at the Small model to keep processing fast
  • Final transcription uses your full model selection for best accuracy
  • Requires Local Whisper mode (not available with OpenAI API)
  • GPU acceleration is strongly recommended for smooth real-time performance
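
The two-stage flow described above can be sketched roughly as follows. This is an illustration only, not StarWhisper's actual code: the function name, the callables, and the choice to re-run the preview on all audio captured so far are assumptions made for the example.

```python
from typing import Callable, Iterable, List

# Hypothetical type: a function that turns raw audio samples into text.
Transcriber = Callable[[List[float]], str]


def record_with_live_preview(
    chunks: Iterable[List[float]],
    preview_model: Transcriber,  # stands in for the fast preview model
    final_model: Transcriber,    # stands in for the user's selected model
    show_overlay: Callable[[str], None],
) -> str:
    """Update the overlay after every chunk, then transcribe the full audio once."""
    buffer: List[float] = []
    for chunk in chunks:
        buffer.extend(chunk)
        # Live preview: run the fast model on the audio captured so far (assumed).
        show_overlay(preview_model(buffer))
    # Recording stopped: one pass over the complete audio with the full model.
    return final_model(buffer)
```

Passing the models in as plain callables keeps the sketch independent of any particular Whisper binding; the preview text is throwaway, and only the final pass produces the result that is kept.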

Setting Up

  1. Right-click the StarWhisper circle → Settings
  2. Go to the Transcription tab
  3. Toggle "Real-Time Transcription" on
  4. Adjust the Chunk Size slider if needed (see below)

[Figure: Real-time transcription options in the Transcription tab]

Chunk Size

The chunk size controls how frequently StarWhisper processes audio during recording. It's measured in milliseconds.

Setting          Update Speed               CPU/GPU Load  Best For
300ms (Fast)     Words appear very quickly  Higher        Powerful GPUs, presentations
500ms (Default)  Good balance               Moderate      Most users
750ms            Slightly delayed           Lower         Older hardware
1000ms (Slow)    Noticeable lag             Lowest        CPU-only mode

If you notice stuttering or dropped words in real-time mode, try increasing the chunk size. If your GPU handles it well, a lower chunk size gives a more responsive experience.
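
To make the trade-off concrete, the short calculation below shows how much audio each chunk holds and how often the preview would refresh at each setting. It assumes 16 kHz mono audio (the input rate Whisper models expect) and one preview update per chunk; whether StarWhisper captures at exactly this rate is an assumption, and `chunk_stats` is a name invented for this example.

```python
from typing import Tuple

SAMPLE_RATE_HZ = 16_000  # assumed capture rate; Whisper models take 16 kHz audio


def chunk_stats(chunk_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> Tuple[int, float]:
    """Return (samples per chunk, preview updates per second)."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    samples = sample_rate_hz * chunk_ms // 1000
    updates_per_second = 1000 / chunk_ms
    return samples, updates_per_second


for ms in (300, 500, 750, 1000):
    samples, rate = chunk_stats(ms)
    print(f"{ms:>4} ms -> {samples:>5} samples per chunk, {rate:.2f} updates/s")
```

Under these assumptions, the default 500ms setting means 8,000 samples per chunk and two preview updates per second, while 300ms asks for more than three model runs per second, which is why smaller chunks load the hardware more heavily.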

Guidance Prompt

The guidance prompt provides context hints to the Whisper model, helping it recognize specialized vocabulary more accurately. This is especially useful for domain-specific dictation.

How to Use

  1. In Settings → Transcription, enable "Use Guidance Prompt"
  2. Enter context text in the text area (up to 244 characters)
  3. Describe the topic or list key terms the model should expect

Examples

Context               Guidance Prompt Text
Medical dictation     "Medical consultation discussing cardiovascular symptoms, ECG results, and statin therapy"
Legal dictation       "Contract review for intellectual property licensing, discussing indemnification clauses"
Software development  "Code review discussing React components, TypeScript interfaces, and API endpoints"
Company meeting       "Team meeting at Acme Corp discussing Q4 targets, Project Phoenix, and the Berlin office"

Keep It Short

The guidance prompt is limited to 244 characters. Focus on the most important terms and context rather than writing full sentences. Even a short prompt like "Cardiology consultation" can noticeably improve recognition of domain-specific terms.
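
If you assemble prompts from a list of terms, a small helper can keep them within the limit. The sketch below is hypothetical (`build_guidance_prompt` is not part of StarWhisper): it assumes terms are listed most important first and drops the remaining ones once the 244-character limit would be exceeded.

```python
from typing import List

MAX_PROMPT_CHARS = 244  # limit stated in this guide


def build_guidance_prompt(topic: str, terms: List[str], limit: int = MAX_PROMPT_CHARS) -> str:
    """Join a topic and key terms, stopping before the prompt exceeds the limit."""
    prompt = topic.strip()[:limit]
    for term in terms:  # assumed to be ordered by importance
        term = term.strip()
        candidate = f"{prompt}, {term}" if prompt else term
        if len(candidate) > limit:
            break  # this term and all less important ones are left out
        prompt = candidate
    return prompt


print(build_guidance_prompt("Cardiology consultation", ["ECG", "statin therapy"]))
```

The resulting string can be pasted into the guidance prompt text area as usual.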

Requirements

Requirement           Details
Transcription method  Local Whisper (not OpenAI API)
GPU acceleration      Strongly recommended
Minimum model         Tiny (but Small recommended for accuracy)
CPU-only mode         Works but may be slow; increase chunk size to 750ms+
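
The requirements above can be read as a simple checklist. The sketch below encodes them as a check that returns a list of problems; the function name and the `"local_whisper"` identifier are made up for illustration and do not correspond to real StarWhisper settings keys.

```python
from typing import List


def check_realtime_settings(method: str, has_gpu: bool, chunk_ms: int) -> List[str]:
    """Return human-readable problems with a real-time setup; an empty list means OK."""
    problems: List[str] = []
    if method != "local_whisper":  # hypothetical identifier for Local Whisper mode
        problems.append("Real-time transcription requires Local Whisper mode.")
    if not has_gpu and chunk_ms < 750:
        problems.append("CPU-only mode: increase the chunk size to 750 ms or more.")
    return problems


for problem in check_realtime_settings("local_whisper", has_gpu=False, chunk_ms=500):
    print(problem)
```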

Real-time transcription puts continuous load on your GPU while recording. If you're also running games, video editing, or AI tools, you may experience degraded performance. See Quality & Performance for more details.