metadata
tags:
  - vanitas
  - spoken-dialogue
  - mamba-ssm
  - flow-matching
license: mit
datasets:
  - kyutai/DailyTalkContiguous

Vanitas SFT Model

Vanitas SFT is a supervised fine-tuned model for real-time spoken dialogue, trained on the kyutai/DailyTalkContiguous dataset.

Architecture

  • Perception Stream: Mamba-2 SSM (4 layers, d=256)
  • Cognition Core: Sparse Attention (4 layers, d=256)
  • Production Stream: Mamba-2 + Flow Matching (4 layers, d=256)
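
As a rough illustration, the three stages listed above can be written down as a small typed specification. This is only a sketch: the class name `StageSpec`, the field names, and the block labels are hypothetical and are not taken from the repository's `config.json` or model code. Only the layer counts and widths come from this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSpec:
    """Hypothetical description of one stage; not the repository's actual config schema."""
    name: str
    block: str        # kind of layer used in the stage (label is illustrative)
    num_layers: int
    d_model: int


# Values copied from the Architecture list; names are assumptions.
VANITAS_STAGES = (
    StageSpec("perception_stream", "mamba2", num_layers=4, d_model=256),
    StageSpec("cognition_core", "sparse_attention", num_layers=4, d_model=256),
    StageSpec("production_stream", "mamba2+flow_matching", num_layers=4, d_model=256),
)

total_layers = sum(stage.num_layers for stage in VANITAS_STAGES)
shared_widths = {stage.d_model for stage in VANITAS_STAGES}

print(f"{total_layers} layers in total, widths used: {sorted(shared_widths)}")
```

All three stages use the same hidden width (d=256), so, assuming the stages are chained, no projection between them would be required on account of dimensionality alone.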

Training

  • Dataset: kyutai/DailyTalkContiguous (2,286 dialogues)
  • Epochs: 50
  • Batch Size: 16
  • Learning Rate: 2e-4
  • Hardware: NVIDIA A100 (Modal Cloud)
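
The settings above imply an approximate number of optimizer steps. The sketch below works this out under two assumptions that this card does not state: each dialogue counts as one training example, and all 2,286 dialogues are used for training. Because a validation loss is reported, part of the data was probably held out, so the result is best read as an upper bound.

```python
import math

NUM_DIALOGUES = 2286  # from this card
BATCH_SIZE = 16       # from this card
EPOCHS = 50           # from this card


def steps_per_epoch(num_examples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches in one pass over the data."""
    if drop_last:
        # The final, incomplete batch is discarded.
        return num_examples // batch_size
    # The final, incomplete batch is kept.
    return math.ceil(num_examples / batch_size)


per_epoch = steps_per_epoch(NUM_DIALOGUES, BATCH_SIZE)
total_steps = per_epoch * EPOCHS

print(f"steps per epoch: {per_epoch}")  # 143
print(f"total steps:     {total_steps}")  # 7150
```

With `drop_last=True` the figures become 142 steps per epoch and 7,100 steps in total.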

Files

  • best_model.pt — Checkpoint with the lowest validation loss
  • final_model.pt — Checkpoint after completing all 50 epochs
  • config.json — Model configuration
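
A minimal sketch of reading these files from a local copy of the repository follows. The helper names `load_config` and `load_checkpoint` are hypothetical, and the internal layout of the checkpoint and of `config.json` is not documented on this card, so the code only loads the objects and leaves their interpretation to the caller. PyTorch is imported lazily so that the configuration can be inspected without it.

```python
import json
from pathlib import Path


def load_config(model_dir: str) -> dict:
    """Read config.json from a local directory holding the repository files."""
    with open(Path(model_dir) / "config.json", encoding="utf-8") as f:
        return json.load(f)


def load_checkpoint(model_dir: str, name: str = "best_model.pt"):
    """Load a checkpoint onto the CPU.

    The structure of the returned object (plain state dict or a wrapper
    dict) is an unknown here; inspect it before use.
    """
    import torch  # imported lazily; requires PyTorch to be installed

    return torch.load(Path(model_dir) / name, map_location="cpu")
```

Usage, assuming the files were downloaded to `./vanitas-sft`: call `load_config("./vanitas-sft")` to inspect the configuration, then `load_checkpoint("./vanitas-sft")` for the best checkpoint or `load_checkpoint("./vanitas-sft", "final_model.pt")` for the final one. Recent PyTorch releases default to `weights_only=True` in `torch.load`; if the checkpoint stores objects other than tensors, loading may fail until that flag is changed, which should only be done for files from a trusted source.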