---
language:
- en
license: apache-2.0
tags:
- speech
- text-to-speech
- dialogue
- emotion
- empathy
- glm4voice
- lora
base_model: THUDM/glm-4-voice-9b
datasets:
- anonymous2222/Sympatheia-18k
---

# Sympatheia

This is the model checkpoint for **Sympatheia**, an emotionally adaptive speech-to-speech dialogue model submitted to NeurIPS 2026 (anonymous review). It includes LoRA adapter checkpoint files.

[[Paper]](https://anonymous.4open.science/r/sympatheia-9327/sympatheia_neurips_2026.pdf) | [[Demo]](https://anonymous.4open.science/r/sympatheia-9327/) | [[Dataset]](https://huggingface.co/datasets/anonymous2222/Sympatheia-18k) | [[Code]](https://anonymous.4open.science/r/sympatheia-9327)

---

## Model description

Sympatheia fine-tunes [GLM-4-Voice-9B](https://huggingface.co/THUDM/glm-4-voice-9b) with LoRA to generate spoken responses conditioned on a continuous **valence–arousal (VA)** affect signal injected into the system prompt as `User emotion (valence=v, arousal=a)`.

It is trained on [Sympatheia-18k](https://huggingface.co/datasets/anonymous2222/Sympatheia-18k), a synthetic corpus of 18k emotion-conditioned spoken dialogue pairs spanning 12 emotion anchors (happy, sad, angry, excited, frustrated, anxious, relaxed, surprised, disgusted, tired, content, neutral).

## How to use

This checkpoint is a LoRA adapter for GLM-4-Voice-9B. You also need:

- The GLM-4-Voice-9B base model (`THUDM/glm-4-voice-9b`)
- The GLM-4-Voice decoder weights (`flow.pt` and `hift.pt` from `THUDM/glm-4-voice-decoder`)

See the project code at https://anonymous.4open.science/r/sympatheia-1181 for full inference and evaluation scripts.
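
As a rough illustration of the conditioning format from the model description, the sketch below renders a VA pair into the `User emotion (valence=v, arousal=a)` line and appends it to a system prompt. The helper names (`format_affect_line`, `build_system_prompt`), the two-decimal rounding, the example base prompt, and the newline join are all assumptions made for illustration; the project's inference script is the authoritative reference for the exact prompt layout.

```python
import math


def format_affect_line(valence: float, arousal: float, decimals: int = 2) -> str:
    """Render a VA pair in the `User emotion (valence=v, arousal=a)` form.

    The number of decimals is an assumption; the model card does not
    specify how the values are rounded.
    """
    for name, value in (("valence", valence), ("arousal", arousal)):
        if not math.isfinite(value):
            raise ValueError(f"{name} must be a finite number, got {value!r}")
    return (
        f"User emotion (valence={valence:.{decimals}f}, "
        f"arousal={arousal:.{decimals}f})"
    )


def build_system_prompt(base_prompt: str, valence: float, arousal: float) -> str:
    """Append the affect line to a base system prompt (hypothetical layout)."""
    return f"{base_prompt.rstrip()}\n{format_affect_line(valence, arousal)}"


if __name__ == "__main__":
    # Hypothetical base prompt; the real one lives in the project code.
    print(build_system_prompt("You are a helpful voice assistant.", -0.6, 0.7))
```

In an integration experiment, the `valence` and `arousal` arguments would come from an external emotion sensing module, and the resulting string would be passed wherever the inference script expects the system prompt.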
```bash
# Download this checkpoint
huggingface-cli download anonymous2222/Sympatheia --local-dir /path/to/checkpoint

# Run inference (from the project src/ directory)
python inference_sympatheia.py --checkpoint /path/to/checkpoint

# Interactive Gradio demo
python gradio_demo.py --checkpoint /path/to/checkpoint --port 7860
```

## Training data

[Sympatheia-18k](https://huggingface.co/datasets/anonymous2222/Sympatheia-18k): 18k synthetic emotion-conditioned spoken dialogue pairs (Emotional split: 12k; Neutral split: 6k). Generated with Qwen3-32B (text) and Qwen3-TTS (speech).

## Training procedure

LoRA fine-tuning of GLM-4-Voice-9B with DeepSpeed ZeRO Stage 3 and BF16 precision. See `src/config.yaml` in the project code for full hyperparameter details.

## Intended use

- Research on emotionally adaptive voice assistants.
- Evaluation of continuous affect conditioning for speech-to-speech dialogue.
- Integration experiments with external emotion sensing modules.

Not intended for: covert emotion sensing, clinical diagnosis, or any deployment without explicit user consent and opt-in affect sensing.

## License

Apache 2.0. The GLM-4-Voice-9B base model is subject to the [GLM-4-Voice License](https://huggingface.co/THUDM/glm-4-voice-9b).