---
language:
- en
license: apache-2.0
tags:
- speech
- text-to-speech
- dialogue
- emotion
- empathy
- glm4voice
- lora
base_model: THUDM/glm-4-voice-9b
datasets:
- anonymous2222/Sympatheia-18k
---

# Sympatheia

This repository hosts the LoRA adapter checkpoint for **Sympatheia**, an emotionally adaptive speech-to-speech dialogue model submitted to NeurIPS 2026 (under anonymous review).

[[Paper]](https://anonymous.4open.science/r/sympatheia-9327/sympatheia_neurips_2026.pdf) | [[Demo]](https://anonymous.4open.science/r/sympatheia-9327/) | [[Dataset]](https://huggingface.co/datasets/anonymous2222/Sympatheia-18k) | [[Code]](https://anonymous.4open.science/r/sympatheia-9327)

---

## Model description

Sympatheia fine-tunes [GLM-4-Voice-9B](https://huggingface.co/THUDM/glm-4-voice-9b) with LoRA to generate spoken responses conditioned on a continuous **valence–arousal (VA)** affect signal. The signal is injected into the system prompt as `User emotion (valence=v, arousal=a)`. The model is trained on [Sympatheia-18k](https://huggingface.co/datasets/anonymous2222/Sympatheia-18k), a synthetic corpus of 18k emotion-conditioned spoken dialogue pairs spanning 12 emotion anchors (happy, sad, angry, excited, frustrated, anxious, relaxed, surprised, disgusted, tired, content, neutral).

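As an illustration of the conditioning format, the sketch below renders a VA reading into the `User emotion (valence=v, arousal=a)` line and appends it to a system prompt. The function names, the two-decimal formatting, and the `[-1, 1]` value range are assumptions made for this example; the project code defines the actual conventions.

```python
def format_affect_line(valence: float, arousal: float, ndigits: int = 2) -> str:
    """Render a VA reading as `User emotion (valence=v, arousal=a)`."""
    for name, value in (("valence", valence), ("arousal", arousal)):
        # Assumption: readings are normalized to [-1, 1]. Adjust this check
        # if the project uses a different scale.
        if not -1.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside the assumed [-1, 1] range")
    return (
        f"User emotion (valence={valence:.{ndigits}f}, "
        f"arousal={arousal:.{ndigits}f})"
    )


def build_system_prompt(base_prompt: str, valence: float, arousal: float) -> str:
    """Append the affect line to a (hypothetical) base system prompt."""
    return f"{base_prompt.rstrip()}\n{format_affect_line(valence, arousal)}"


if __name__ == "__main__":
    print(build_system_prompt("You are a helpful voice assistant.", 0.8, -0.25))
```

Keeping the formatting in one helper makes it easier to match whatever numeric precision the adapter saw during training.
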
## How to use

This checkpoint is a LoRA adapter for GLM-4-Voice-9B and cannot be used on its own. You also need:
- The GLM-4-Voice-9B base model (`THUDM/glm-4-voice-9b`)
- The GLM-4-Voice decoder weights (`flow.pt` and `hift.pt` from `THUDM/glm-4-voice-decoder`)

See the project code at https://anonymous.4open.science/r/sympatheia-1181 for full inference and evaluation scripts.

```bash
# Download this checkpoint
huggingface-cli download anonymous2222/Sympatheia --local-dir /path/to/checkpoint

# Run inference (from the project src/ directory)
python inference_sympatheia.py --checkpoint /path/to/checkpoint

# Interactive Gradio demo
python gradio_demo.py --checkpoint /path/to/checkpoint --port 7860
```

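For readers who want to attach the adapter outside the project scripts, the following is a minimal sketch using `transformers` and `peft`. It assumes the downloaded directory follows the standard PEFT layout (`adapter_config.json` plus adapter weights), which this card does not confirm; `load_sympatheia` is a hypothetical helper name, and the project's `inference_sympatheia.py` remains the reference path.

```python
def load_sympatheia(adapter_dir: str, base_id: str = "THUDM/glm-4-voice-9b"):
    """Load the base model and attach the LoRA adapter (sketch, not the official loader)."""
    # Heavy dependencies are imported lazily so this module stays importable
    # on machines without torch, transformers, or peft installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModel, AutoTokenizer

    # GLM-4-Voice ships custom modeling code, hence trust_remote_code=True.
    tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
    base = AutoModel.from_pretrained(
        base_id,
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
    )

    # Assumption: adapter_dir is a standard PEFT adapter directory.
    model = PeftModel.from_pretrained(base, adapter_dir)
    return tokenizer, model.eval()


if __name__ == "__main__":
    tokenizer, model = load_sympatheia("/path/to/checkpoint")
```

This covers only the language model. Turning its audio tokens into a waveform still requires the GLM-4-Voice decoder weights (`flow.pt`, `hift.pt`).
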
## Training data

[Sympatheia-18k](https://huggingface.co/datasets/anonymous2222/Sympatheia-18k): 18k synthetic emotion-conditioned spoken dialogue pairs, divided into an Emotional split (12k) and a Neutral split (6k). The text was generated with Qwen3-32B and the speech with Qwen3-TTS.

## Training procedure

LoRA fine-tuning of GLM-4-Voice-9B with DeepSpeed ZeRO Stage 3 in BF16 precision. See `src/config.yaml` in the project code for the full hyperparameters.

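For orientation, a DeepSpeed configuration for ZeRO Stage 3 with BF16 might look like the sketch below. Only the stage and precision come from this card. Every other key and value is an illustrative assumption, and the `"auto"` placeholders are resolved only when DeepSpeed is launched through the Hugging Face `Trainer` integration. The authoritative settings are in `src/config.yaml`.

```python
import json

# Sketch of a ZeRO Stage 3 + BF16 DeepSpeed config. Only "stage": 3 and
# bf16 are taken from the model card; the remaining keys are assumptions.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        # Gather full 16-bit weights on save so checkpoints are usable
        # without the sharded ZeRO state.
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    # "auto" is filled in by the Hugging Face Trainer integration.
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
}


def write_ds_config(path: str = "ds_zero3_bf16.json") -> str:
    """Write the config to disk and return the path."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(ds_config, f, indent=2)
    return path


if __name__ == "__main__":
    print(write_ds_config())
```
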
## Intended use

- Research on emotionally adaptive voice assistants.
- Evaluation of continuous affect conditioning for speech-to-speech dialogue.
- Integration experiments with external emotion sensing modules.

Not intended for covert emotion sensing, clinical diagnosis, or any deployment that lacks explicit user consent and opt-in affect sensing.

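One way to honor the opt-in requirement when wiring in an external emotion sensor is to gate the affect line on explicit consent, as in the sketch below. `AffectReading` and `affect_line_if_consented` are hypothetical names introduced for this example and are not part of the project code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AffectReading:
    """A single valence-arousal estimate from a (hypothetical) external sensor."""
    valence: float
    arousal: float


def affect_line_if_consented(
    reading: Optional[AffectReading], user_opted_in: bool
) -> Optional[str]:
    """Return the affect line only when the user has opted in and a reading exists."""
    if not user_opted_in or reading is None:
        # No consent or no reading: the caller should send an
        # unconditioned prompt instead.
        return None
    return (
        f"User emotion (valence={reading.valence:.2f}, "
        f"arousal={reading.arousal:.2f})"
    )


if __name__ == "__main__":
    print(affect_line_if_consented(AffectReading(0.5, 0.1), user_opted_in=True))
    print(affect_line_if_consented(AffectReading(0.5, 0.1), user_opted_in=False))
```

Returning `None` rather than a default reading keeps the non-consenting path free of any affect signal.
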
## License

This adapter is released under Apache 2.0. The GLM-4-Voice-9B base model is subject to the [GLM-4-Voice License](https://huggingface.co/THUDM/glm-4-voice-9b).