---
license: apache-2.0
language:
- en
tags:
- negotiation
- emotion
- llm-agent
- lora
- peft
- offline-rl
- iql
- small-language-model
- edge-deployable
- arxiv:2605.26785
- arxiv:2503.21080
- arxiv:2511.03370
- arxiv:2509.04310
- arxiv:2604.07003
library_name: peft
base_model: Qwen/Qwen2.5-7B-Instruct
datasets:
- humanlong/emotion-negotiation-benchmarks
pipeline_tag: text-generation
---

# EmoDistill-7b

> **Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation.**
>
> [![arXiv](https://img.shields.io/badge/arXiv-2605.26785-b31b1b.svg)](https://arxiv.org/abs/2605.26785) [![HF Paper](https://img.shields.io/badge/🤗-Paper-orange.svg)](https://huggingface.co/papers/2605.26785) [![GitHub](https://img.shields.io/badge/GitHub-code-black.svg)](https://github.com/Yunbo-max/EmoDistill) [![Dataset](https://img.shields.io/badge/🤗-Dataset-orange.svg)](https://huggingface.co/datasets/humanlong/emotion-negotiation-benchmarks) [![HF Collection](https://img.shields.io/badge/🤗-Collection-orange.svg)](https://huggingface.co/collections/humanlong/emotion-aware-llm-negotiation-6a25d88adcd0b6d41c9d8c75)

**EmoDistill turns a 7B base LLM into a domain-adaptive emotion-aware negotiation agent.** It decouples *what emotion to show* (an IQL emotion selector over a 28-emotion vocabulary) from *how to express it* (LoRA-SFT imitation followed by JPO refinement against a per-turn LLM judge). Both are learned from a fixed **offline** corpus of LLM-vs-LLM negotiations.

This repository hosts **all eight model variants** from the paper: a full **IQL + LoRA-SFT + JPO** stack and an **emotion-free LoRA-SFT-only baseline**, one of each per benchmark domain (**CRAD**, **DESRD**, **SSAD**, **SSD**), for direct head-to-head comparison.

![EmoDistill workflow](figs/workflow.png)

> 🚧 **Status:** the model card and repository layout are live; **trained checkpoint weights are being uploaded on a rolling basis**. Each domain folder will hold its adapter once final training completes.
Subscribe to the repo to be notified.

---

## 📦 What's in this repo

Every domain comes in two variants:

| Variant | What it is | Folder pattern |
|---|---|---|
| **EmoDistill (full)**: IQL + LoRA-SFT + JPO | The main method: the IQL emotion selector picks the emotion, the LoRA-SFT adapter expresses it, and JPO refines the adapter against an LLM judge. Reported as **best** in the paper. | `<domain>/emodistill/` |
| **Emotion-free baseline**: LoRA-SFT only | LoRA fine-tune on the same offline corpus **without** the IQL emotion controller and **without** the JPO judge loop. Isolates "imitation alone" so you can attribute gains to the emotion control + judge components. | `<domain>/emotionfree/` |

Across the four benchmark domains:

| Domain | Paper acronym | EmoDistill (full) | Emotion-free baseline |
|---|---|---|---|
| Credit / debt recovery | **CRAD** | [`crad/emodistill/`](./crad/emodistill) | [`crad/emotionfree/`](./crad/emotionfree) |
| Disaster / emergency response | **DESRD** | [`desrd/emodistill/`](./desrd/emodistill) | [`desrd/emotionfree/`](./desrd/emotionfree) |
| Student bedtime negotiation | **SSAD** | [`ssad/emodistill/`](./ssad/emodistill) | [`ssad/emotionfree/`](./ssad/emotionfree) |
| Surgical scheduling | **SSD** | [`ssd/emodistill/`](./ssd/emodistill) | [`ssd/emotionfree/`](./ssd/emotionfree) |

Inside each `emodistill/` subfolder:

- `adapter/`: LoRA-SFT+JPO adapter weights (`adapter_model.safetensors`, `adapter_config.json`)
- `iql/`: IQL emotion selector weights (`q_net.pt`, `v_net.pt`, `policy.pt`)
- `config.json`: IQL hyperparameters, emotion vocabulary, JPO settings

Inside each `emotionfree/` subfolder:

- `adapter/`: LoRA-SFT-only adapter weights

---

## 📐 Method

EmoDistill composes **three offline-trained components** at inference (full variant):

1. **IQL emotion selector**: Implicit Q-Learning over a **28-emotion vocabulary**, trained on logged LLM-vs-LLM negotiation trajectories. Picks the emotion to express at each turn.
2. **LoRA-SFT expression imitation**: LoRA adapter on top of the 7B base, trained by *imitation* on top-K advantage-filtered offline turns. Learns to verbalize emotion-conditioned utterances.
3. **JPO (Judge Policy Optimization)**: PPO-clipped surrogate against a per-turn LLM judge, anchored by a KL term to the SFT initialization. Refines the LoRA adapter for naturalness and strategic effectiveness without destabilizing the SFT skills.

All three components are **fully offline** (no live LLM API is needed at training time once the negotiation log is collected) and **edge-deployable**: at inference, the runtime is a single 7B model with a LoRA adapter (a few hundred MB) plus a small Q-network for emotion selection.

The **emotion-free baseline** isolates the contribution of the IQL + JPO components by training only the LoRA-SFT step on the same offline turns, with no emotion conditioning and no judge refinement.

## 🚀 Intended use

- **Primary task:** emotion-aware negotiation in agent-to-agent settings across the four domains.
- **Deployment:** on-device / edge, where data-privacy constraints make calling a frontier LLM infeasible.
- **Base model:** [`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) for all eight variants. Compatible with both OpenAI and DashScope serving stacks via the `LLMClient` wrapper in the [code repo](https://github.com/Yunbo-max/EmoDistill).

## 📊 Evaluation

All eight variants are evaluated on their respective subset of [`humanlong/emotion-negotiation-benchmarks`](https://huggingface.co/datasets/humanlong/emotion-negotiation-benchmarks) (100 scenarios per domain). The paper reports the same metrics across all four domains for direct comparison.

Companion baselines (same benchmarks, same protocol; full numbers in the paper):

- **[EmoDebt](https://github.com/Yunbo-max/EmoDebt)** (AAMAS 2026 Main, [arXiv:2503.21080](https://arxiv.org/abs/2503.21080)): Bayesian-optimized emotional intelligence engine.
- **[EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator)** (NeurIPS 2025, [arXiv:2511.03370](https://arxiv.org/abs/2511.03370)): persona + HMM + WSLS, learning-free.
- **[EvoEmo](https://github.com/Yunbo-max/EvoEmo)** ([arXiv:2509.04310](https://arxiv.org/abs/2509.04310)): online evolutionary emotion policies.
- **[EmoMAS](https://github.com/Yunbo-max/EmoMAS)** (ACL 2026 Main, top 9%, [arXiv:2604.07003](https://arxiv.org/abs/2604.07003)): Bayesian multi-agent orchestration, no pre-training.
- Vanilla 7B (no adapter, no emotion guidance).

**Headline result:** EmoDistill (full) achieves the highest utility across all four domains, surpassing both vanilla and emotion-free baselines, and outperforming the other emotion-aware methods on edge-deployable 7B compute budgets.

## 📦 Quick start (after checkpoint release)

Loading any variant follows the same pattern; only the `subfolder` argument changes:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-7B-Instruct"
repo = "humanlong/EmoDistill-7b"

# Pick: ("crad" | "desrd" | "ssad" | "ssd") x ("emodistill" | "emotionfree")
domain = "crad"
variant = "emodistill"     # full IQL + SFT + JPO
# variant = "emotionfree"  # LoRA-SFT-only baseline

# Load the base model, then attach the chosen LoRA adapter from its subfolder.
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(model, repo, subfolder=f"{domain}/{variant}/adapter")
```

For the **full pipeline** (IQL emotion selection → LoRA generation → JPO-refined responses), use the helper code in the [EmoDistill GitHub repo](https://github.com/Yunbo-max/EmoDistill):

```python
from emodistill import EmoDistillAgent

agent = EmoDistillAgent.from_pretrained("humanlong/EmoDistill-7b", domain="crad")

# conversation_history and opponent_state are supplied by the caller
# (they are not defined in this snippet).
reply = agent.respond(conversation_history, opponent_state)
```

## ⚠️ Limitations

- All adapters are trained for **English**. Cross-lingual transfer is not evaluated.
- The IQL emotion selector uses a fixed 28-emotion vocabulary; unseen emotions are not supported.
- Each adapter is domain-specific: using `crad/emodistill` on a disaster scenario will degrade gracefully but is not the recommended use.
- The model is designed to be persuasive but ethical. Adversarial use to manipulate vulnerable users (debtors, patients, children, disaster survivors) is **out of scope** and explicitly discouraged.

## 📝 License

Apache 2.0, matching the base model.

## 📚 Citation

```bibtex
@article{long2026emodistill,
  title   = {EmoDistill: Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation},
  author  = {Long, Yunbo and Zhao, Haolang and Beckenbauer, Lukas and Xu, Liming and Brintrup, Alexandra},
  journal = {arXiv preprint arXiv:2605.26785},
  year    = {2026}
}
```

## 🔗 The full research thread

| Work | Venue | Role |
|---|---|---|
| [EmoDebt](https://github.com/Yunbo-max/EmoDebt) | AAMAS 2026 Main | Bayesian-optimized emotional intelligence (foundational) |
| [EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator) | NeurIPS 2025 | Personas + HMM + WSLS for SLMs |
| [EvoEmo](https://github.com/Yunbo-max/EvoEmo) | arXiv preprint | Online evolutionary emotion policies |
| [EmoMAS](https://github.com/Yunbo-max/EmoMAS) | ACL 2026 (top 9%) | Bayesian multi-agent orchestration + 4 benchmarks |
| **EmoDistill** *(this repo)* | under review | Offline distillation: **4 domain models + 4 emotion-free baselines** in a 7B SLM |

🌟 All five papers + dataset + model in one place: [HF Collection: Emotion-Aware LLM Negotiation](https://huggingface.co/collections/humanlong/emotion-aware-llm-negotiation-6a25d88adcd0b6d41c9d8c75)
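
## 🧩 Enumerating the adapter folders

The repository layout described in this card (four domains, two variants each, adapters under `<domain>/<variant>/adapter`) lends itself to a small helper for scripts that loop over all eight variants. The following is a minimal sketch, not part of the EmoDistill codebase: the names `DOMAINS`, `VARIANTS`, `adapter_subfolder`, and `all_adapter_subfolders` are hypothetical, and the paths assume the folder pattern stated above holds once the checkpoints are uploaded.

```python
from itertools import product

# Domain and variant folder names as listed in this model card.
DOMAINS = ("crad", "desrd", "ssad", "ssd")
VARIANTS = ("emodistill", "emotionfree")


def adapter_subfolder(domain: str, variant: str) -> str:
    """Build the adapter subfolder path for one domain/variant pair.

    Hypothetical helper: assumes the `<domain>/<variant>/adapter` layout
    described in this card.
    """
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain {domain!r}; expected one of {DOMAINS}")
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return f"{domain}/{variant}/adapter"


def all_adapter_subfolders() -> list[str]:
    """Return the subfolder paths of all eight variants, grouped by domain."""
    return [adapter_subfolder(d, v) for d, v in product(DOMAINS, VARIANTS)]
```

Each returned string can be passed as the `subfolder` argument of `PeftModel.from_pretrained`, as in the quick start, for example to compare the full and emotion-free variants of a domain side by side.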