title: Trelis Chorus
emoji: ๐๏ธ
colorFrom: blue
colorTo: yellow
sdk: docker
app_port: 7860
pinned: false
Trelis Chorus: Multi-Speaker Whisper
Upload audio of two people talking (possibly overlapping) and get separate transcripts for each speaker with timestamps.
This Space runs on CPU (roughly 30-60 s of processing per 30 s of audio). For production use, Chorus is also available as a hosted GPU API (see below).
Hosted API (Trelis Router)
Chorus is available as a hosted GPU endpoint on Trelis Router, which handles long audio end-to-end (VAD chunking plus cross-chunk speaker clustering). Model id: trelis/chorus-v1. Base URL: https://router.trelis.com/api/v1.
curl -X POST https://router.trelis.com/api/v1/transcribe \
-H "Authorization: Bearer $TRELIS_ROUTER_API_KEY" \
-F model=trelis/chorus-v1 \
-F file=@meeting.wav
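The curl call above can also be issued from Python. The sketch below uses only the standard library and mirrors the same endpoint, header, and form fields. The helper names (`build_transcribe_request`, `transcribe`) are illustrative, and because the response schema is not described here, the body is returned as raw text rather than parsed.

```python
import os
import urllib.request
import uuid

BASE_URL = "https://router.trelis.com/api/v1"
MODEL_ID = "trelis/chorus-v1"


def build_transcribe_request(audio: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build the multipart POST that the curl command sends (no network I/O)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{MODEL_ID}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/transcribe",
        data=head + audio + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path: str) -> str:
    """Upload one audio file and return the raw response body as text."""
    api_key = os.environ["TRELIS_ROUTER_API_KEY"]
    with open(path, "rb") as f:
        request = build_transcribe_request(f.read(), os.path.basename(path), api_key)
    with urllib.request.urlopen(request) as response:
        # The response format is an unknown here, so it is not parsed.
        return response.read().decode("utf-8")
```

Calling `transcribe("meeting.wav")` with `TRELIS_ROUTER_API_KEY` set should be equivalent to the curl command.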
How it works
Chorus is a LoRA fine-tune of whisper-large-v3-turbo that adds two speaker-conditioned tokens (<|speaker1|>, <|speaker2|>). At inference time the decoder prefix includes the speaker token, which biases cross-attention toward that speaker's audio regions. Two forward passes (one per speaker) produce a transcript per speaker.
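The two-pass scheme can be sketched without any model code. In the sketch below, the prefix layout (standard Whisper prompt tokens followed by the speaker token) is an assumption, not the confirmed Chorus format, and `decode` is a hypothetical callable standing in for a real Whisper generation call.

```python
from typing import Callable, Dict, List

SPEAKER_TOKENS = ["<|speaker1|>", "<|speaker2|>"]


def build_prefix(speaker_token: str, language: str = "en") -> List[str]:
    """Return a decoder prefix ending in the speaker token.

    Assumed layout: the usual Whisper prompt, then the speaker token.
    The real position of the speaker token may differ.
    """
    return ["<|startoftranscript|>", f"<|{language}|>", "<|transcribe|>", speaker_token]


def transcribe_per_speaker(
    audio: bytes,
    decode: Callable[[bytes, List[str]], str],
) -> Dict[str, str]:
    """Run one decoding pass per speaker token over the same audio."""
    transcripts = {}
    for token in SPEAKER_TOKENS:
        # Same audio each time; only the decoder prefix changes.
        transcripts[token] = decode(audio, build_prefix(token))
    return transcripts
```

The point of the sketch is that the audio input is identical across passes; only the conditioning token in the decoder prefix differs.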
Trained on a mix of:
- VoxPopuli (parliamentary speech, synthetically mixed pairs)
- AMI Meeting Corpus (real conversational meeting speech)
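As a rough illustration of what "synthetically mixed pairs" can mean, the sketch below overlays two single-speaker waveforms with a sample offset so that they partially overlap. This is a generic mixing recipe under stated assumptions (float samples in [-1, 1], same sample rate), not the actual Chorus training pipeline, which is private; `mix_pair` is a hypothetical helper.

```python
from typing import List


def mix_pair(first: List[float], second: List[float], offset: int) -> List[float]:
    """Overlay `second` onto `first`, delayed by `offset` samples.

    Samples are summed where the two signals overlap and clipped to [-1, 1].
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    length = max(len(first), offset + len(second))
    mixed = [0.0] * length
    for i, sample in enumerate(first):
        mixed[i] += sample
    for i, sample in enumerate(second):
        mixed[offset + i] += sample
    # Clip so the summed overlap stays in the valid amplitude range.
    return [max(-1.0, min(1.0, s)) for s in mixed]
```

Each source utterance keeps its own transcript, so a mixed pair yields one target transcript per speaker token.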
See the Trelis Studio repo (private) for the full training pipeline.
Model
- Chorus v1: Trelis/Chorus-v1, a merged standalone Whisper model (base + LoRA merged + expanded tokenizer)
Environment
The Space requires HF_TOKEN (Space secret) to pull the private model weights.
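A minimal sketch of how the token might be checked and used at startup. It assumes the `transformers` library, a version recent enough to accept the `token=` argument, and that the Trelis/Chorus-v1 repo ships processor files alongside the weights. The helper names are illustrative.

```python
import os
from typing import Mapping

MODEL_REPO = "Trelis/Chorus-v1"


def require_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return HF_TOKEN from the environment, or fail with a clear message."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it as a Space secret.")
    return token


def load_chorus():
    """Load the processor and merged model from the private repo."""
    # Imported lazily so the token check works without transformers installed.
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    token = require_hf_token()
    processor = WhisperProcessor.from_pretrained(MODEL_REPO, token=token)
    model = WhisperForConditionalGeneration.from_pretrained(MODEL_REPO, token=token)
    return processor, model
```

Failing early in `require_hf_token` gives a clearer error in the Space logs than an authentication failure during download.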