metadata
title: Trelis Chorus
emoji: 🎙️
colorFrom: blue
colorTo: yellow
sdk: docker
app_port: 7860
pinned: false

Trelis Chorus: Multi-Speaker Whisper

Upload audio of two people talking (possibly overlapping) and get separate transcripts for each speaker with timestamps.

This Space runs on CPU (~30–60s per 30s of audio, i.e. roughly 1–2× real time). For production use, Chorus is also available as a hosted GPU API; see below.

Hosted API (Trelis Router)

Chorus is available as a hosted GPU endpoint on Trelis Router, which handles long audio end-to-end (VAD chunking plus cross-chunk speaker clustering). The model id is trelis/chorus-v1 and the base URL is https://router.trelis.com/api/v1.

curl -X POST https://router.trelis.com/api/v1/transcribe \
  -H "Authorization: Bearer $TRELIS_ROUTER_API_KEY" \
  -F model=trelis/chorus-v1 \
  -F file=@meeting.wav
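
The same request can be sketched in Python using only the standard library. The URL, model id, form field names, and environment variable are taken from the curl call above; the helper names (build_transcribe_request, transcribe) are hypothetical, and the response schema is not documented here, so the sketch returns the raw body rather than parsing it.

```python
import os
import urllib.request
import uuid

BASE_URL = "https://router.trelis.com/api/v1"  # from the text above
MODEL_ID = "trelis/chorus-v1"                  # from the text above


def build_transcribe_request(api_key: str, filename: str, audio: bytes) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST mirroring the curl call."""
    boundary = uuid.uuid4().hex

    # Plain form field, equivalent to: -F model=trelis/chorus-v1
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n'
        "\r\n"
        f"{MODEL_ID}\r\n"
    ).encode("utf-8")

    # File field, equivalent to: -F file=@meeting.wav
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")

    closing = f"--{boundary}--\r\n".encode("utf-8")
    body = model_part + file_header + audio + b"\r\n" + closing

    return urllib.request.Request(
        f"{BASE_URL}/transcribe",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path: str) -> str:
    """Send the file at `path` and return the raw response body as text."""
    with open(path, "rb") as f:
        audio = f.read()
    req = build_transcribe_request(
        os.environ["TRELIS_ROUTER_API_KEY"], os.path.basename(path), audio
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling transcribe("meeting.wav") with TRELIS_ROUTER_API_KEY set should issue the same POST as the curl command. Inspect the returned text before relying on any particular field layout.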

How it works

Chorus is a LoRA fine-tune of whisper-large-v3-turbo that adds two speaker-conditioned tokens (<|speaker1|>, <|speaker2|>). At inference time the decoder prefix includes the speaker token, which biases cross-attention toward that speaker's audio regions. Two forward passes (one per speaker) produce a transcript per speaker.
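
The two-pass scheme can be illustrated with a minimal sketch. It is not the Chorus inference code: the decode callable stands in for a real Whisper decoding call, the function names are hypothetical, and the position of the speaker token relative to the standard Whisper prompt tokens is an assumption made for illustration only.

```python
from typing import Callable, Dict, List

# Speaker tokens named in the text above.
SPEAKER_TOKENS = ("<|speaker1|>", "<|speaker2|>")

# Standard Whisper prompt tokens. Where Chorus places the speaker token
# relative to these is an assumption of this sketch.
BASE_PREFIX = ["<|startoftranscript|>", "<|en|>", "<|transcribe|>"]

# Placeholder signature for a decoder: (audio, decoder_prefix) -> transcript.
DecodeFn = Callable[[bytes, List[str]], str]


def speaker_prefix(speaker_token: str) -> List[str]:
    """Return the decoder prefix that conditions decoding on one speaker."""
    return BASE_PREFIX + [speaker_token]


def transcribe_two_speakers(audio: bytes, decode: DecodeFn) -> Dict[str, str]:
    """Run one decoding pass per speaker token over the same audio."""
    return {tok: decode(audio, speaker_prefix(tok)) for tok in SPEAKER_TOKENS}
```

The same audio is decoded twice; only the prefix changes between passes, which is what lets a single model yield one transcript per speaker.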

Trained on a mix of:

  • VoxPopuli (parliamentary speech, synthetically mixed pairs)
  • AMI Meeting Corpus (real conversational meeting speech)

See the Trelis Studio repo (private) for the full training pipeline.

Model

  • Chorus v1: Trelis/Chorus-v1, a merged standalone Whisper model (base + LoRA merged + expanded tokenizer)

Environment

The Space requires HF_TOKEN (Space secret) to pull the private model weights.
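
A startup check along these lines can make a missing secret fail fast with a clear message. This is a sketch, not the Space's actual code: the helper name require_hf_token is hypothetical, and how the Space passes the token to its model loader is an assumption.

```python
import os
from typing import Mapping, Optional


def require_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return HF_TOKEN from the environment, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a Space secret so the private "
            "model weights can be downloaded."
        )
    return token


# Assumed usage: pass the token to whatever downloads the weights, e.g.
# huggingface_hub.snapshot_download(repo_id="Trelis/Chorus-v1", token=token).
```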