🔀 DyCAST

A variable-frame-rate 16 kHz speech codec based on FocalCodec.

This repository contains the checkpoint trained on LibriTTS 960, as described in the preprint.

The IVF index (index.faiss) contains continuous latents at 50 Hz from LibriSpeech train-clean-100, dev-clean, and test-clean.
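
To make the two rates mentioned above concrete, here is a minimal sketch of how 50 Hz latents relate to 16 kHz audio. It assumes a uniform hop and ignores any padding the model may apply; the helper names `num_latents` and `latent_start_time` are hypothetical and not part of the DyCAST API.

```python
SAMPLE_RATE_HZ = 16_000  # codec input rate stated above
LATENT_RATE_HZ = 50      # rate of the continuous latents stored in the index

# Audio samples covered by one latent frame (assumption: uniform hop).
SAMPLES_PER_LATENT = SAMPLE_RATE_HZ // LATENT_RATE_HZ


def num_latents(num_samples: int) -> int:
    """Approximate latent count for a waveform, ignoring any padding."""
    return num_samples // SAMPLES_PER_LATENT


def latent_start_time(frame_index: int) -> float:
    """Start time in seconds of the given latent frame."""
    return frame_index / LATENT_RATE_HZ


if __name__ == "__main__":
    print(SAMPLES_PER_LATENT)               # samples per latent frame
    print(num_latents(3 * SAMPLE_RATE_HZ))  # latents in 3 s of audio
    print(latent_start_time(25))            # start time of frame 25
```

Under these assumptions each latent spans 320 samples (20 ms), so a 3-second utterance yields roughly 150 latent vectors. The variable-rate tokens the codec itself produces need not follow this fixed spacing.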


▶️ Quickstart

See the readme at: https://github.com/lucadellalib/dycast


@ Citing

@article{dellalibera2026dycast,
    title   = {Beyond Fixed Frames: Dynamic Character-Aligned Speech Tokenization},
    author  = {Luca {Della Libera} and Cem Subakan and Mirco Ravanelli},
    journal = {arXiv preprint arXiv:2601.23174},
    year    = {2026},
}
@article{dellalibera2025focalcodecstream,
    title   = {{FocalCodec-Stream}: Streaming Low-Bitrate Speech Coding via Causal Distillation},
    author  = {Luca {Della Libera} and Cem Subakan and Mirco Ravanelli},
    journal = {arXiv preprint arXiv:2509.16195},
    year    = {2025},
}
@inproceedings{dellalibera2025focalcodec,
    title     = {{FocalCodec}: Low-Bitrate Speech Coding via Focal Modulation Networks},
    author    = {Luca {Della Libera} and Francesco Paissan and Cem Subakan and Mirco Ravanelli},
    booktitle = {Advances in Neural Information Processing Systems},
    year      = {2025},
}

📧 Contact

luca.dellalib@gmail.com

