---
tags:
- vanitas
- spoken-dialogue
- mamba-ssm
- flow-matching
license: mit
datasets:
- kyutai/DailyTalkContiguous
---
# Vanitas SFT Model

A supervised fine-tuned (SFT) model for real-time spoken dialogue, trained on [kyutai/DailyTalkContiguous](https://huggingface.co/datasets/kyutai/DailyTalkContiguous).

## Architecture

- **Perception Stream**: Mamba-2 SSM (4 layers, d=256)
- **Cognition Core**: Sparse Attention (4 layers, d=256)
- **Production Stream**: Mamba-2 + Flow Matching (4 layers, d=256)

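The three stages listed above can be written down as a small configuration sketch. This is only an illustration: the class and field names (`StageConfig`, `block`, `n_layers`, `d_model`) are hypothetical and are not taken from the repository's `config.json`, and the perception, cognition, production ordering is assumed from the order of the list, since the card does not say how the stages are wired together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Hypothetical description of one stage; not the real config schema."""
    name: str
    block: str      # block type as named in the model card
    n_layers: int
    d_model: int


# Values copied from the Architecture list; the ordering is an assumption.
VANITAS_STAGES = (
    StageConfig("perception", "mamba2", n_layers=4, d_model=256),
    StageConfig("cognition", "sparse_attention", n_layers=4, d_model=256),
    StageConfig("production", "mamba2+flow_matching", n_layers=4, d_model=256),
)


def total_layers(stages) -> int:
    """Sum the layer counts over all stages."""
    return sum(stage.n_layers for stage in stages)


def shared_width(stages) -> int:
    """Return the common hidden width, or raise if the stages disagree."""
    widths = {stage.d_model for stage in stages}
    if len(widths) != 1:
        raise ValueError(f"stages use different widths: {sorted(widths)}")
    return widths.pop()


if __name__ == "__main__":
    print(total_layers(VANITAS_STAGES), shared_width(VANITAS_STAGES))
```

Because all three stages share d=256, a hidden state could in principle pass from one stage to the next without a projection layer; whether the actual model does so is not stated in this card.
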
## Training

- **Dataset**: kyutai/DailyTalkContiguous (2,286 dialogues)
- **Epochs**: 50
- **Batch Size**: 16
- **Learning Rate**: 2e-4
- **Hardware**: NVIDIA A100 (Modal Cloud)

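For a rough sense of scale, the figures above give an upper bound on the number of optimizer steps. The sketch below assumes that all 2,286 dialogues are used for training, that each dialogue is one sample, that the last partial batch is kept, and that no gradient accumulation is used. None of these is stated in the card, and a held-out validation split would lower the count.

```python
import math

NUM_DIALOGUES = 2286   # from the model card
BATCH_SIZE = 16        # from the model card
EPOCHS = 50            # from the model card


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Batches per epoch when the final partial batch is kept (assumption)."""
    return math.ceil(num_samples / batch_size)


def total_steps(num_samples: int, batch_size: int, epochs: int) -> int:
    """Upper bound on optimizer steps, assuming no gradient accumulation."""
    return steps_per_epoch(num_samples, batch_size) * epochs


if __name__ == "__main__":
    print(steps_per_epoch(NUM_DIALOGUES, BATCH_SIZE))      # 143
    print(total_steps(NUM_DIALOGUES, BATCH_SIZE, EPOCHS))  # 7150
```
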
## Files

- `best_model.pt`: Checkpoint with the lowest validation loss
- `final_model.pt`: Checkpoint after completing all 50 epochs
- `config.json`: Model configuration
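
One possible way to fetch these files is with `huggingface_hub` and PyTorch, both third-party packages that are imported lazily in the sketch below. The helper names (`checkpoint_filename`, `download_files`, `load_raw`) are hypothetical. The card does not document the internal layout of the `.pt` files, so the sketch only returns whatever object `torch.load` produces and does not attempt to build the model.

```python
import json
from pathlib import Path

REPO_ID = "md13/vanitas-sft"
CHECKPOINTS = {"best": "best_model.pt", "final": "final_model.pt"}


def checkpoint_filename(which: str = "best") -> str:
    """Map 'best' or 'final' to the file name listed in the model card."""
    if which not in CHECKPOINTS:
        raise ValueError(f"expected one of {sorted(CHECKPOINTS)}, got {which!r}")
    return CHECKPOINTS[which]


def download_files(which: str = "best"):
    """Download one checkpoint plus config.json; returns their local paths."""
    from huggingface_hub import hf_hub_download  # third-party, needs network

    ckpt = hf_hub_download(repo_id=REPO_ID, filename=checkpoint_filename(which))
    config = hf_hub_download(repo_id=REPO_ID, filename="config.json")
    return Path(ckpt), Path(config)


def load_raw(which: str = "best"):
    """Return (config dict, raw checkpoint object) without building a model."""
    import torch  # third-party

    ckpt_path, config_path = download_files(which)
    config = json.loads(config_path.read_text())
    # weights_only=True refuses arbitrary pickled objects. If the checkpoint
    # stores more than tensors and plain containers, this call may fail;
    # only relax it for files you trust.
    checkpoint = torch.load(ckpt_path, map_location="cpu", weights_only=True)
    return config, checkpoint
```

Calling `load_raw("best")` would give the configuration and the checkpoint with the lowest validation loss; `load_raw("final")` would give the checkpoint from the end of epoch 50. Inspecting the returned checkpoint (for example its top-level keys, if it is a dictionary) is a reasonable first step before writing any model-construction code.
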