---
language:
- en
license: apache-2.0
tags:
- speech
- text-to-speech
- dialogue
- emotion
- empathy
- glm4voice
- lora
base_model: THUDM/glm-4-voice-9b
datasets:
- anonymous2222/Sympatheia-18k
---

# Sympatheia

This repository hosts the LoRA adapter checkpoint for **Sympatheia**, an emotionally adaptive speech-to-speech dialogue model submitted to NeurIPS 2026 (anonymous review).

[[Paper]](https://anonymous.4open.science/r/sympatheia-9327/sympatheia_neurips_2026.pdf) | [[Demo]](https://anonymous.4open.science/r/sympatheia-9327/) | [[Dataset]](https://huggingface.co/datasets/anonymous2222/Sympatheia-18k) | [[Code]](https://anonymous.4open.science/r/sympatheia-9327)

---

## Model description

Sympatheia fine-tunes [GLM-4-Voice-9B](https://huggingface.co/THUDM/glm-4-voice-9b) with LoRA to generate spoken responses conditioned on a continuous **valence–arousal (VA)** affect signal injected into the system prompt as `User emotion (valence=v, arousal=a)`. It is trained on [Sympatheia-18k](https://huggingface.co/datasets/anonymous2222/Sympatheia-18k), a synthetic corpus of 18k emotion-conditioned spoken dialogue pairs spanning 12 emotion anchors (happy, sad, angry, excited, frustrated, anxious, relaxed, surprised, disgusted, tired, content, neutral).
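
The affect tag described above can be sketched in a few lines of Python. This is only an illustration of the `User emotion (valence=v, arousal=a)` format: the helper names, the two-decimal formatting, the example base prompt, and the choice to append the tag on its own line are all assumptions, not the project's actual prompt-building code.

```python
def affect_tag(valence: float, arousal: float) -> str:
    """Format a valence-arousal pair in the tag format quoted in this card.

    Two-decimal formatting is an assumption made for this sketch.
    """
    return f"User emotion (valence={valence:.2f}, arousal={arousal:.2f})"


def build_system_prompt(base_prompt: str, valence: float, arousal: float) -> str:
    """Append the affect tag to a base system prompt (hypothetical layout)."""
    return f"{base_prompt.rstrip()}\n{affect_tag(valence, arousal)}"


# Hypothetical base prompt, used only to show the shape of the result.
prompt = build_system_prompt("You are a helpful voice assistant.", 0.5, -0.25)
print(prompt)
```

An external emotion sensing module would supply the two numbers; the value range it should use is not stated here, so check the project code before relying on any particular scale.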

## How to use

This checkpoint is a LoRA adapter for GLM-4-Voice-9B. You also need:

- The GLM-4-Voice-9B base model (`THUDM/glm-4-voice-9b`)
- The GLM-4-Voice decoder weights (`flow.pt` and `hift.pt` from `THUDM/glm-4-voice-decoder`)

See the project code at https://anonymous.4open.science/r/sympatheia-9327 for full inference and evaluation scripts.

    # Download this checkpoint
    huggingface-cli download anonymous2222/Sympatheia --local-dir /path/to/checkpoint

    # Run inference (from the project src/ directory)
    python inference_sympatheia.py --checkpoint /path/to/checkpoint

    # Interactive Gradio demo
    python gradio_demo.py --checkpoint /path/to/checkpoint --port 7860
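
The adapter, base model, and decoder can also be fetched from Python. The sketch below uses `snapshot_download` from `huggingface_hub`; the repository ids come from this card, while the local sub-directory names and the helper functions are hypothetical. The project scripts may expect a different directory layout.

```python
from pathlib import Path

# Repository ids named in this model card. The local sub-directory names
# are hypothetical and chosen only for this sketch.
REQUIRED_REPOS = {
    "anonymous2222/Sympatheia": "checkpoint",
    "THUDM/glm-4-voice-9b": "glm-4-voice-9b",
    "THUDM/glm-4-voice-decoder": "glm-4-voice-decoder",
}


def plan_downloads(root):
    """Map each repo id to the local directory it would be stored in."""
    return {repo: Path(root) / sub for repo, sub in REQUIRED_REPOS.items()}


def fetch_all(root):
    """Download every required repo (needs network access and huggingface_hub)."""
    # Imported lazily so the planning helper works without the library installed.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in plan_downloads(root).items():
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
```

Calling `fetch_all("/path/to/models")` would download all three repositories; the base model alone is several gigabytes, so check available disk space first.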

## Training data

[Sympatheia-18k](https://huggingface.co/datasets/anonymous2222/Sympatheia-18k): 18k synthetic emotion-conditioned spoken dialogue pairs (Emotional split: 12k; Neutral split: 6k). The text was generated with Qwen3-32B and the speech with Qwen3-TTS.

## Training procedure

LoRA fine-tuning of GLM-4-Voice-9B with DeepSpeed ZeRO Stage 3 in BF16 precision. See `src/config.yaml` in the project code for full hyperparameter details.
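
For orientation, a minimal DeepSpeed configuration matching the two settings named above (ZeRO Stage 3 and BF16) might look like the following. Only those two settings come from this card; the batch-size and accumulation values are placeholders, the helper names are hypothetical, and the authors' actual configuration lives in the project code.

```python
import json


def make_ds_config(micro_batch_size=1, grad_accum_steps=1):
    """Build a minimal DeepSpeed config dict with ZeRO Stage 3 and BF16.

    The batch-size arguments are placeholder values, not the authors' settings.
    """
    return {
        "bf16": {"enabled": True},
        "zero_optimization": {"stage": 3},
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
    }


def write_ds_config(path, **kwargs):
    """Serialize the config to a JSON file that a DeepSpeed launcher can read."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(make_ds_config(**kwargs), f, indent=2)
```

Stage 3 shards parameters, gradients, and optimizer state across GPUs, which is the usual reason to choose it when the frozen base model is as large as 9B parameters.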

## Intended use

- Research on emotionally adaptive voice assistants.
- Evaluation of continuous affect conditioning for speech-to-speech dialogue.
- Integration experiments with external emotion sensing modules.

Not intended for: covert emotion sensing, clinical diagnosis, or any deployment without explicit user consent and opt-in affect sensing.

## License

Apache 2.0. The GLM-4-Voice-9B base model is subject to the GLM-4-Voice License (https://huggingface.co/THUDM/glm-4-voice-9b).