RonanMcGovern

Add Hosted API (Trelis Router) section

8c9bb33 verified about 1 month ago

metadata

title: Trelis Chorus
emoji: 🎙️
colorFrom: blue
colorTo: yellow
sdk: docker
app_port: 7860
pinned: false

Trelis Chorus — Multi-Speaker Whisper

Upload audio of two people talking (possibly overlapping) and get a separate, timestamped transcript for each speaker.

This Space runs on CPU and takes roughly 30–60 s to process 30 s of audio. For production use, Chorus is also available as a hosted GPU API (see the Hosted API section below).

Hosted API (Trelis Router)

Chorus is available as a hosted GPU endpoint on Trelis Router, which handles long audio end-to-end (VAD chunking plus cross-chunk speaker clustering). Model id: trelis/chorus-v1. Base URL: https://router.trelis.com/api/v1.

curl -X POST https://router.trelis.com/api/v1/transcribe \
  -H "Authorization: Bearer $TRELIS_ROUTER_API_KEY" \
  -F model=trelis/chorus-v1 \
  -F file=@meeting.wav
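
For callers who prefer Python, the same request can be sketched with only the standard library. The endpoint, model id, form field names, and environment variable are taken from the curl command above; the helper names `build_transcribe_request` and `transcribe` are hypothetical, and the response schema is not documented here, so the sketch returns the raw response body unparsed.

```python
import os
import urllib.request
import uuid

BASE_URL = "https://router.trelis.com/api/v1"  # from the README
MODEL_ID = "trelis/chorus-v1"                  # from the README


def build_transcribe_request(audio_bytes: bytes, filename: str,
                             api_key: str) -> urllib.request.Request:
    """Build a multipart/form-data POST with the same fields as the curl command."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    parts = [
        delimiter,
        b'Content-Disposition: form-data; name="model"',
        b"",
        MODEL_ID.encode(),
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        audio_bytes,
        delimiter + b"--",  # closing boundary
        b"",
    ]
    return urllib.request.Request(
        f"{BASE_URL}/transcribe",
        data=b"\r\n".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path: str) -> str:
    """Upload an audio file and return the raw response body as text."""
    with open(path, "rb") as f:
        request = build_transcribe_request(
            f.read(), os.path.basename(path), os.environ["TRELIS_ROUTER_API_KEY"]
        )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `transcribe("meeting.wav")` with `TRELIS_ROUTER_API_KEY` set would send the request; building the request separately from sending it keeps the first function easy to inspect without network access.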

How it works

Chorus is a LoRA fine-tune of whisper-large-v3-turbo that adds two speaker-conditioned tokens (<|speaker1|>, <|speaker2|>). At inference time the decoder prefix includes the speaker token, which biases cross-attention toward that speaker's audio regions. Two forward passes (one per speaker) produce a transcript per speaker.
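
The two-pass structure can be illustrated with a minimal sketch. Only the speaker token names come from the description above; the `decode` callable is a hypothetical stand-in for a Whisper decoding call that places the given speaker token in the decoder prefix, and the exact position of that token in the prefix is not specified here.

```python
from typing import Callable, Dict, Sequence

SPEAKER_TOKENS = ("<|speaker1|>", "<|speaker2|>")  # token names from the README

# Hypothetical decoder signature: (audio samples, speaker token) -> transcript text.
DecodeFn = Callable[[Sequence[float], str], str]


def transcribe_per_speaker(audio: Sequence[float], decode: DecodeFn) -> Dict[str, str]:
    """Run one decoding pass per speaker token over the same audio."""
    return {token: decode(audio, token) for token in SPEAKER_TOKENS}


def fake_decode(audio: Sequence[float], speaker_token: str) -> str:
    """Stand-in decoder so the sketch runs without model weights."""
    return f"[{speaker_token} transcript of {len(audio)} samples]"


# One second of silence at 16 kHz, decoded once per speaker.
result = transcribe_per_speaker([0.0] * 16000, fake_decode)
for token, text in result.items():
    print(token, text)
```

Both passes see identical audio; only the conditioning token differs, which is why the cost is about twice that of a single-speaker transcription.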

Trained on a mix of:

VoxPopuli (parliamentary speech, synthetically mixed pairs)
AMI Meeting Corpus (real conversational meeting speech)

See the Trelis Studio repo (private) for the full training pipeline.

Model

Chorus v1: Trelis/Chorus-v1, a standalone Whisper model with the LoRA weights merged into the base model and an expanded tokenizer

Environment

The Space requires HF_TOKEN (Space secret) to pull the private model weights.
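
A startup check along these lines could fail early with a clear message when the secret is missing. This is a sketch, not the Space's actual code: `require_hf_token` and `download_weights` are hypothetical helpers, and the download step assumes the `huggingface_hub` package is installed.

```python
import os
from typing import Mapping

MODEL_REPO = "Trelis/Chorus-v1"  # repo id from the README


def require_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return HF_TOKEN from the environment, or raise a descriptive error."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a Space secret to pull the private weights."
        )
    return token


def download_weights() -> str:
    """Download the model snapshot and return its local directory."""
    # Imported lazily so the token check works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=MODEL_REPO, token=require_hf_token())
```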