---
tags:
- vanitas
- spoken-dialogue
- mamba-ssm
- flow-matching
license: mit
datasets:
- kyutai/DailyTalkContiguous
---

# Vanitas SFT Model

Supervised fine-tuned model for real-time spoken dialogue, trained on [kyutai/DailyTalkContiguous](https://huggingface.co/datasets/kyutai/DailyTalkContiguous).

## Architecture

- **Perception Stream**: Mamba-2 SSM (4 layers, d=256)
- **Cognition Core**: Sparse Attention (4 layers, d=256)
- **Production Stream**: Mamba-2 + Flow Matching (4 layers, d=256)

## Training

- **Dataset**: kyutai/DailyTalkContiguous (2,286 dialogues)
- **Epochs**: 50
- **Batch Size**: 16
- **Learning Rate**: 2e-4
- **Hardware**: NVIDIA A100 (Modal Cloud)

## Files

- `best_model.pt`: Checkpoint with the lowest validation loss
- `final_model.pt`: Checkpoint after completing all 50 epochs
- `config.json`: Model configuration
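
As a rough sanity check on the training figures above, the sketch below estimates how many optimizer steps the run would involve. It is only an illustration: the `TrainingSetup` class and its method names are hypothetical and not part of this repository, and it assumes that each dialogue counts as one training sample and that all 2,286 dialogues are used for training. Since `best_model.pt` is selected by validation loss, some data was presumably held out, so the real step count is likely somewhat lower.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hypothetical container for the hyperparameters listed in this card."""

    num_dialogues: int = 2286   # dataset size stated in the card
    epochs: int = 50
    batch_size: int = 16
    learning_rate: float = 2e-4

    def steps_per_epoch(self, drop_last: bool = False) -> int:
        # Assumption: one dialogue is one sample and every dialogue is trained on.
        if drop_last:
            # The final, incomplete batch is discarded.
            return self.num_dialogues // self.batch_size
        # The final, incomplete batch is kept.
        return math.ceil(self.num_dialogues / self.batch_size)

    def total_steps(self, drop_last: bool = False) -> int:
        return self.steps_per_epoch(drop_last) * self.epochs


setup = TrainingSetup()
print(setup.steps_per_epoch())  # 143
print(setup.total_steps())      # 7150
```

Under these assumptions the run comes to roughly 7,000 steps, which is a useful reference point when choosing a warmup length or checkpoint interval for a comparable fine-tune.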