Text Generation
PEFT
English
negotiation
emotion
llm-agent
lora
offline-rl
iql
small-language-model
edge-deployable
Instructions to use humanlong/EmoDistill-7b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use humanlong/EmoDistill-7b with PEFT:
Task type is invalid.
- Notebooks
- Google Colab
- Kaggle
File size: 9,766 Bytes
10aac26 241747f 72b0a6b 241747f 10aac26 f3e0310 10aac26 055dbff f3e0310 10aac26 f3e0310 10aac26 89ab86b 055dbff 10aac26 f3e0310 89ab86b f3e0310 89ab86b f3e0310 89ab86b f3e0310 89ab86b f3e0310 10aac26 f3e0310 10aac26 f3e0310 10aac26 f3e0310 89ab86b 10aac26 f3e0310 10aac26 f3e0310 10aac26 f3e0310 055dbff f3e0310 10aac26 f3e0310 055dbff 72b0a6b 10aac26 f3e0310 10aac26 89ab86b 10aac26 f3e0310 10aac26 f3e0310 89ab86b f3e0310 89ab86b 10aac26 f3e0310 10aac26 f3e0310 10aac26 f3e0310 10aac26 f3e0310 10aac26 055dbff 10aac26 055dbff 10aac26 72b0a6b 10aac26 72b0a6b 10aac26 89ab86b 055dbff f3e0310 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 | ---
license: apache-2.0
language:
- en
tags:
- negotiation
- emotion
- llm-agent
- lora
- peft
- offline-rl
- iql
- small-language-model
- edge-deployable
- arxiv:2605.26785
- arxiv:2503.21080
- arxiv:2511.03370
- arxiv:2509.04310
- arxiv:2604.07003
library_name: peft
base_model: Qwen/Qwen2.5-7B-Instruct
datasets:
- humanlong/emotion-negotiation-benchmarks
pipeline_tag: text-generation
---
# EmoDistill-7b
> **Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation.**
>
> [](https://arxiv.org/abs/2605.26785) [](https://huggingface.co/papers/2605.26785) [](https://github.com/Yunbo-max/EmoDistill) [](https://huggingface.co/datasets/humanlong/emotion-negotiation-benchmarks) [](https://huggingface.co/collections/humanlong/emotion-aware-llm-negotiation-6a25d88adcd0b6d41c9d8c75)
**EmoDistill turns a 7B base LLM into a domain-adaptive emotion-aware negotiation agent.** It decouples *what emotion to show* (an IQL emotion selector over a 28-emotion vocabulary) from *how to express it* (LoRA-SFT imitation followed by JPO refinement against a per-turn LLM judge) β both learned from a fixed **offline** corpus of LLM-vs-LLM negotiations.
This repository hosts **all eight model variants** from the paper: a full **IQL + LoRA-SFT + JPO** stack and an **emotion-free LoRA-SFT-only baseline**, one of each per benchmark domain β **CRAD**, **DESRD**, **SSAD**, **SSD** β for direct head-to-head comparison.

> π§ **Status:** model card and repository layout live; **trained checkpoint weights are uploading rolling**. Each domain folder will hold its adapter once final training completes. Subscribe to the repo to be notified.
---
## π¦ What's in this repo
Every domain comes in two variants:
| Variant | What it is | Folder pattern |
|---|---|---|
| **EmoDistill (full)** β IQL + LoRA-SFT + JPO | The main method: IQL emotion selector picks the emotion, LoRA-SFT adapter expresses it, JPO refines against an LLM judge. Reported as **best** in the paper. | `<domain>/emodistill/` |
| **Emotion-free baseline** β LoRA-SFT only | LoRA fine-tune on the same offline corpus **without** the IQL emotion controller and **without** the JPO judge loop. Isolates "imitation alone" so you can attribute gains to the emotion control + judge components. | `<domain>/emotionfree/` |
Across the four benchmark domains:
| Domain | Paper acronym | EmoDistill (full) | Emotion-free baseline |
|---|---|---|---|
| Credit / debt recovery | **CRAD** | [`crad/emodistill/`](./crad/emodistill) | [`crad/emotionfree/`](./crad/emotionfree) |
| Disaster / emergency response | **DESRD** | [`desrd/emodistill/`](./desrd/emodistill) | [`desrd/emotionfree/`](./desrd/emotionfree) |
| Student bedtime negotiation | **SSAD** | [`ssad/emodistill/`](./ssad/emodistill) | [`ssad/emotionfree/`](./ssad/emotionfree) |
| Surgical scheduling | **SSD** | [`ssd/emodistill/`](./ssd/emodistill) | [`ssd/emotionfree/`](./ssd/emotionfree) |
Inside each `emodistill/` subfolder:
- `adapter/` β LoRA-SFT+JPO adapter weights (`adapter_model.safetensors`, `adapter_config.json`)
- `iql/` β IQL emotion selector weights (`q_net.pt`, `v_net.pt`, `policy.pt`)
- `config.json` β IQL hyperparameters, emotion vocabulary, JPO settings
Inside each `emotionfree/` subfolder:
- `adapter/` β LoRA-SFT-only adapter weights
---
## π Method
EmoDistill composes **three offline-trained components** at inference (full variant):
1. **IQL emotion selector** β Implicit Q-Learning over a **28-emotion vocabulary**, trained on logged LLM-vs-LLM negotiation trajectories. Picks the emotion to express at each turn.
2. **LoRA-SFT expression imitation** β LoRA adapter on top of the 7B base, trained by *imitation* on top-K advantage-filtered offline turns. Learns to verbalize emotion-conditioned utterances.
3. **JPO (Judge Policy Optimization)** β PPO-clipped surrogate against a per-turn LLM judge, anchored by KL to the SFT init. Refines the LoRA adapter for naturalness and strategic effectiveness without destabilizing the SFT skills.
All three components are **fully offline** β no live LLM API at training time after the negotiation log is collected β and **edge-deployable**: at inference, the runtime is a single 7B model with a LoRA adapter (a few hundred MB) plus a small Q-network for emotion selection.
The **emotion-free baseline** isolates the contribution of the IQL + JPO components by training only the LoRA-SFT step on the same offline turns, with no emotion conditioning and no judge refinement.
## π Intended use
- **Primary task:** emotion-aware negotiation in agent-to-agent settings across the four domains.
- **Deployment:** on-device / edge, where data-privacy constraints make calling a frontier LLM infeasible.
- **Base model:** [`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) for all eight variants. Compatible with both OpenAI and DashScope serving stacks via the `LLMClient` wrapper in the [code repo](https://github.com/Yunbo-max/EmoDistill).
## π Evaluation
All eight variants are evaluated on their respective subset of [`humanlong/emotion-negotiation-benchmarks`](https://huggingface.co/datasets/humanlong/emotion-negotiation-benchmarks) (100 scenarios per domain). The paper reports identical metrics across the 4 domains for direct comparison.
Companion baselines (same benchmarks, same protocol β full numbers in the paper):
- **[EmoDebt](https://github.com/Yunbo-max/EmoDebt)** (AAMAS 2026 Main, [arXiv:2503.21080](https://arxiv.org/abs/2503.21080)) β Bayesian-optimized emotional intelligence engine.
- **[EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator)** (NeurIPS 2025, [arXiv:2511.03370](https://arxiv.org/abs/2511.03370)) β persona + HMM + WSLS, learning-free.
- **[EvoEmo](https://github.com/Yunbo-max/EvoEmo)** ([arXiv:2509.04310](https://arxiv.org/abs/2509.04310)) β online evolutionary emotion policies.
- **[EmoMAS](https://github.com/Yunbo-max/EmoMAS)** (ACL 2026 Main, top 9%, [arXiv:2604.07003](https://arxiv.org/abs/2604.07003)) β Bayesian multi-agent orchestration, no pre-training.
- Vanilla 7B (no adapter, no emotion guidance).
**Headline result:** EmoDistill (full) achieves the highest utility across all four domains, surpassing both vanilla and emotion-free baselines, and outperforming the other emotion-aware methods on edge-deployable 7B compute budgets.
## π¦ Quick start (after checkpoint release)
Loading any variant follows the same pattern β just change the `subfolder` argument:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base = "Qwen/Qwen2.5-7B-Instruct"
repo = "humanlong/EmoDistill-7b"
# Pick: ("crad" | "desrd" | "ssad" | "ssd") x ("emodistill" | "emotionfree")
domain = "crad"
variant = "emodistill" # full IQL + SFT + JPO
# variant = "emotionfree" # LoRA-SFT-only baseline
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(model, repo, subfolder=f"{domain}/{variant}/adapter")
```
For the **full pipeline** (IQL emotion selection β LoRA generation β JPO-refined responses), use the helper code in the [EmoDistill GitHub repo](https://github.com/Yunbo-max/EmoDistill):
```python
from emodistill import EmoDistillAgent
agent = EmoDistillAgent.from_pretrained("humanlong/EmoDistill-7b", domain="crad")
reply = agent.respond(conversation_history, opponent_state)
```
## β οΈ Limitations
- All adapters are trained for **English**. Cross-lingual transfer is not evaluated.
- The IQL emotion selector uses a fixed 28-emotion vocabulary; unseen emotions are not supported.
- Each adapter is domain-specific β using `crad/emodistill` on a disaster scenario will degrade gracefully but is not the recommended use.
- The model is designed to be persuasive but ethical β adversarial use to manipulate vulnerable users (debtors, patients, children, disaster survivors) is **out of scope** and explicitly discouraged.
## π License
Apache 2.0 β matches the base model.
## π Citation
```bibtex
@article{long2026emodistill,
title = {EmoDistill: Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation},
author = {Long, Yunbo and Zhao, Haolang and Beckenbauer, Lukas and Xu, Liming and Brintrup, Alexandra},
journal = {arXiv preprint arXiv:2605.26785},
year = {2026}
}
```
## π The full research thread
| Work | Venue | Role |
|---|---|---|
| [EmoDebt](https://github.com/Yunbo-max/EmoDebt) | AAMAS 2026 Main | Bayesian-optimized emotional intelligence (foundational) |
| [EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator) | NeurIPS 2025 | Personas + HMM + WSLS for SLMs |
| [EvoEmo](https://github.com/Yunbo-max/EvoEmo) | arXiv preprint | Online evolutionary emotion policies |
| [EmoMAS](https://github.com/Yunbo-max/EmoMAS) | ACL 2026 (top 9%) | Bayesian multi-agent orchestration + 4 benchmarks |
| **EmoDistill** *(this repo)* | under review | Offline distillation: **4 domain models + 4 emotion-free baselines** in a 7B SLM |
π All five papers + dataset + model in one place: [HF Collection β Emotion-Aware LLM Negotiation](https://huggingface.co/collections/humanlong/emotion-aware-llm-negotiation-6a25d88adcd0b6d41c9d8c75)
|