---
license: apache-2.0
language:
  - en
tags:
  - negotiation
  - emotion
  - llm-agent
  - lora
  - peft
  - offline-rl
  - iql
  - small-language-model
  - edge-deployable
  - arxiv:2605.26785
  - arxiv:2503.21080
  - arxiv:2511.03370
  - arxiv:2509.04310
  - arxiv:2604.07003
library_name: peft
base_model: Qwen/Qwen2.5-7B-Instruct
datasets:
  - humanlong/emotion-negotiation-benchmarks
pipeline_tag: text-generation
---

# EmoDistill-7b

> **Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation.**
>
> [![arXiv](https://img.shields.io/badge/arXiv-2605.26785-b31b1b.svg)](https://arxiv.org/abs/2605.26785) [![HF Paper](https://img.shields.io/badge/🤗-Paper-orange.svg)](https://huggingface.co/papers/2605.26785) [![GitHub](https://img.shields.io/badge/GitHub-code-black.svg)](https://github.com/Yunbo-max/EmoDistill) [![Dataset](https://img.shields.io/badge/🤗-Dataset-orange.svg)](https://huggingface.co/datasets/humanlong/emotion-negotiation-benchmarks) [![HF Collection](https://img.shields.io/badge/🤗-Collection-orange.svg)](https://huggingface.co/collections/humanlong/emotion-aware-llm-negotiation-6a25d88adcd0b6d41c9d8c75)

**EmoDistill turns a 7B base LLM into a domain-adaptive emotion-aware negotiation agent.** It decouples *what emotion to show* (an IQL emotion selector over a 28-emotion vocabulary) from *how to express it* (LoRA-SFT imitation followed by JPO refinement against a per-turn LLM judge) — both learned from a fixed **offline** corpus of LLM-vs-LLM negotiations.

This repository hosts **all eight model variants** from the paper: a full **IQL + LoRA-SFT + JPO** stack and an **emotion-free LoRA-SFT-only baseline**, one of each per benchmark domain — **CRAD**, **DESRD**, **SSAD**, **SSD** — for direct head-to-head comparison.

![EmoDistill workflow](figs/workflow.png)

> 🚧 **Status:** model card and repository layout live; **trained checkpoint weights are uploading rolling**. Each domain folder will hold its adapter once final training completes. Subscribe to the repo to be notified.

---

## 📦 What's in this repo

Every domain comes in two variants:

| Variant | What it is | Folder pattern |
|---|---|---|
| **EmoDistill (full)** — IQL + LoRA-SFT + JPO | The main method: IQL emotion selector picks the emotion, LoRA-SFT adapter expresses it, JPO refines against an LLM judge. Reported as **best** in the paper. | `<domain>/emodistill/` |
| **Emotion-free baseline** — LoRA-SFT only | LoRA fine-tune on the same offline corpus **without** the IQL emotion controller and **without** the JPO judge loop. Isolates "imitation alone" so you can attribute gains to the emotion control + judge components. | `<domain>/emotionfree/` |

Across the four benchmark domains:

| Domain | Paper acronym | EmoDistill (full) | Emotion-free baseline |
|---|---|---|---|
| Credit / debt recovery | **CRAD** | [`crad/emodistill/`](./crad/emodistill) | [`crad/emotionfree/`](./crad/emotionfree) |
| Disaster / emergency response | **DESRD** | [`desrd/emodistill/`](./desrd/emodistill) | [`desrd/emotionfree/`](./desrd/emotionfree) |
| Student bedtime negotiation | **SSAD** | [`ssad/emodistill/`](./ssad/emodistill) | [`ssad/emotionfree/`](./ssad/emotionfree) |
| Surgical scheduling | **SSD** | [`ssd/emodistill/`](./ssd/emodistill) | [`ssd/emotionfree/`](./ssd/emotionfree) |

Inside each `emodistill/` subfolder:
- `adapter/` — LoRA-SFT+JPO adapter weights (`adapter_model.safetensors`, `adapter_config.json`)
- `iql/` — IQL emotion selector weights (`q_net.pt`, `v_net.pt`, `policy.pt`)
- `config.json` — IQL hyperparameters, emotion vocabulary, JPO settings

Inside each `emotionfree/` subfolder:
- `adapter/` — LoRA-SFT-only adapter weights

---

## 📐 Method

EmoDistill composes **three offline-trained components** at inference (full variant):

1. **IQL emotion selector** — Implicit Q-Learning over a **28-emotion vocabulary**, trained on logged LLM-vs-LLM negotiation trajectories. Picks the emotion to express at each turn.
2. **LoRA-SFT expression imitation** — LoRA adapter on top of the 7B base, trained by *imitation* on top-K advantage-filtered offline turns. Learns to verbalize emotion-conditioned utterances.
3. **JPO (Judge Policy Optimization)** — PPO-clipped surrogate against a per-turn LLM judge, anchored by KL to the SFT init. Refines the LoRA adapter for naturalness and strategic effectiveness without destabilizing the SFT skills.

All three components are **fully offline** — no live LLM API at training time after the negotiation log is collected — and **edge-deployable**: at inference, the runtime is a single 7B model with a LoRA adapter (a few hundred MB) plus a small Q-network for emotion selection.

The **emotion-free baseline** isolates the contribution of the IQL + JPO components by training only the LoRA-SFT step on the same offline turns, with no emotion conditioning and no judge refinement.

## 🚀 Intended use

- **Primary task:** emotion-aware negotiation in agent-to-agent settings across the four domains.
- **Deployment:** on-device / edge, where data-privacy constraints make calling a frontier LLM infeasible.
- **Base model:** [`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) for all eight variants. Compatible with both OpenAI and DashScope serving stacks via the `LLMClient` wrapper in the [code repo](https://github.com/Yunbo-max/EmoDistill).

## 📊 Evaluation

All eight variants are evaluated on their respective subset of [`humanlong/emotion-negotiation-benchmarks`](https://huggingface.co/datasets/humanlong/emotion-negotiation-benchmarks) (100 scenarios per domain). The paper reports identical metrics across the 4 domains for direct comparison.

Companion baselines (same benchmarks, same protocol — full numbers in the paper):

- **[EmoDebt](https://github.com/Yunbo-max/EmoDebt)** (AAMAS 2026 Main, [arXiv:2503.21080](https://arxiv.org/abs/2503.21080)) — Bayesian-optimized emotional intelligence engine.
- **[EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator)** (NeurIPS 2025, [arXiv:2511.03370](https://arxiv.org/abs/2511.03370)) — persona + HMM + WSLS, learning-free.
- **[EvoEmo](https://github.com/Yunbo-max/EvoEmo)** ([arXiv:2509.04310](https://arxiv.org/abs/2509.04310)) — online evolutionary emotion policies.
- **[EmoMAS](https://github.com/Yunbo-max/EmoMAS)** (ACL 2026 Main, top 9%, [arXiv:2604.07003](https://arxiv.org/abs/2604.07003)) — Bayesian multi-agent orchestration, no pre-training.
- Vanilla 7B (no adapter, no emotion guidance).

**Headline result:** EmoDistill (full) achieves the highest utility across all four domains, surpassing both vanilla and emotion-free baselines, and outperforming the other emotion-aware methods on edge-deployable 7B compute budgets.

## 📦 Quick start (after checkpoint release)

Loading any variant follows the same pattern — just change the `subfolder` argument:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-7B-Instruct"
repo = "humanlong/EmoDistill-7b"

# Pick: ("crad" | "desrd" | "ssad" | "ssd") x ("emodistill" | "emotionfree")
domain  = "crad"
variant = "emodistill"          # full IQL + SFT + JPO
# variant = "emotionfree"        # LoRA-SFT-only baseline

tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(model, repo, subfolder=f"{domain}/{variant}/adapter")
```

For the **full pipeline** (IQL emotion selection → LoRA generation → JPO-refined responses), use the helper code in the [EmoDistill GitHub repo](https://github.com/Yunbo-max/EmoDistill):

```python
from emodistill import EmoDistillAgent
agent = EmoDistillAgent.from_pretrained("humanlong/EmoDistill-7b", domain="crad")
reply = agent.respond(conversation_history, opponent_state)
```

## ⚠️ Limitations

- All adapters are trained for **English**. Cross-lingual transfer is not evaluated.
- The IQL emotion selector uses a fixed 28-emotion vocabulary; unseen emotions are not supported.
- Each adapter is domain-specific — using `crad/emodistill` on a disaster scenario will degrade gracefully but is not the recommended use.
- The model is designed to be persuasive but ethical — adversarial use to manipulate vulnerable users (debtors, patients, children, disaster survivors) is **out of scope** and explicitly discouraged.

## 📝 License

Apache 2.0 — matches the base model.

## 📚 Citation

```bibtex
@article{long2026emodistill,
  title   = {EmoDistill: Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation},
  author  = {Long, Yunbo and Zhao, Haolang and Beckenbauer, Lukas and Xu, Liming and Brintrup, Alexandra},
  journal = {arXiv preprint arXiv:2605.26785},
  year    = {2026}
}
```

## 🔗 The full research thread

| Work | Venue | Role |
|---|---|---|
| [EmoDebt](https://github.com/Yunbo-max/EmoDebt) | AAMAS 2026 Main | Bayesian-optimized emotional intelligence (foundational) |
| [EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator) | NeurIPS 2025 | Personas + HMM + WSLS for SLMs |
| [EvoEmo](https://github.com/Yunbo-max/EvoEmo) | arXiv preprint | Online evolutionary emotion policies |
| [EmoMAS](https://github.com/Yunbo-max/EmoMAS) | ACL 2026 (top 9%) | Bayesian multi-agent orchestration + 4 benchmarks |
| **EmoDistill** *(this repo)* | under review | Offline distillation: **4 domain models + 4 emotion-free baselines** in a 7B SLM |

🌟 All five papers + dataset + model in one place: [HF Collection — Emotion-Aware LLM Negotiation](https://huggingface.co/collections/humanlong/emotion-aware-llm-negotiation-6a25d88adcd0b6d41c9d8c75)