| license: apache-2.0 | |
| language: | |
| - en | |
| tags: | |
| - negotiation | |
| - emotion | |
| - llm-agent | |
| - lora | |
| - peft | |
| - offline-rl | |
| - iql | |
| - small-language-model | |
| - edge-deployable | |
| - arxiv:2605.26785 | |
| - arxiv:2503.21080 | |
| - arxiv:2511.03370 | |
| - arxiv:2509.04310 | |
| - arxiv:2604.07003 | |
| library_name: peft | |
| base_model: Qwen/Qwen2.5-7B-Instruct | |
| datasets: | |
| - humanlong/emotion-negotiation-benchmarks | |
| pipeline_tag: text-generation | |
| # EmoDistill-7b | |
| > **Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation.** | |
| > | |
| > [arXiv paper](https://arxiv.org/abs/2605.26785) · [Hugging Face paper page](https://huggingface.co/papers/2605.26785) · [Code](https://github.com/Yunbo-max/EmoDistill) · [Dataset](https://huggingface.co/datasets/humanlong/emotion-negotiation-benchmarks) · [Collection](https://huggingface.co/collections/humanlong/emotion-aware-llm-negotiation-6a25d88adcd0b6d41c9d8c75) | |
| **EmoDistill turns a 7B base LLM into a domain-adaptive emotion-aware negotiation agent.** It decouples *what emotion to show* (an IQL emotion selector over a 28-emotion vocabulary) from *how to express it* (LoRA-SFT imitation followed by JPO refinement against a per-turn LLM judge). Both are learned from a fixed **offline** corpus of LLM-vs-LLM negotiations. | |
| This repository hosts **all eight model variants** from the paper: a full **IQL + LoRA-SFT + JPO** stack and an **emotion-free LoRA-SFT-only baseline**, one of each per benchmark domain (**CRAD**, **DESRD**, **SSAD**, **SSD**), for direct head-to-head comparison. | |
| > **Status:** the model card and repository layout are live; **trained checkpoint weights are being uploaded on a rolling basis**. Each domain folder will hold its adapter once final training completes. Subscribe to the repo to be notified. | |
| --- | |
| ## What's in this repo | |
| Every domain comes in two variants: | |
| | Variant | What it is | Folder pattern | | |
| |---|---|---| | |
| **EmoDistill (full)**: IQL + LoRA-SFT + JPO | The main method: IQL emotion selector picks the emotion, LoRA-SFT adapter expresses it, JPO refines against an LLM judge. Reported as **best** in the paper. | `<domain>/emodistill/` | |
| **Emotion-free baseline**: LoRA-SFT only | LoRA fine-tune on the same offline corpus **without** the IQL emotion controller and **without** the JPO judge loop. Isolates "imitation alone" so you can attribute gains to the emotion control + judge components. | `<domain>/emotionfree/` | |
| Across the four benchmark domains: | |
| | Domain | Paper acronym | EmoDistill (full) | Emotion-free baseline | | |
| |---|---|---|---| | |
| | Credit / debt recovery | **CRAD** | [`crad/emodistill/`](./crad/emodistill) | [`crad/emotionfree/`](./crad/emotionfree) | | |
| | Disaster / emergency response | **DESRD** | [`desrd/emodistill/`](./desrd/emodistill) | [`desrd/emotionfree/`](./desrd/emotionfree) | | |
| | Student bedtime negotiation | **SSAD** | [`ssad/emodistill/`](./ssad/emodistill) | [`ssad/emotionfree/`](./ssad/emotionfree) | | |
| | Surgical scheduling | **SSD** | [`ssd/emodistill/`](./ssd/emodistill) | [`ssd/emotionfree/`](./ssd/emotionfree) | | |
| Inside each `emodistill/` subfolder: | |
| - `adapter/`: LoRA-SFT+JPO adapter weights (`adapter_model.safetensors`, `adapter_config.json`) | |
| - `iql/`: IQL emotion selector weights (`q_net.pt`, `v_net.pt`, `policy.pt`) | |
| - `config.json`: IQL hyperparameters, emotion vocabulary, JPO settings | |
| Inside each `emotionfree/` subfolder: | |
| - `adapter/`: LoRA-SFT-only adapter weights | |
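The layout above implies eight adapter paths, one per domain and variant. The following is a minimal sketch that enumerates them; the helper name `adapter_subfolder` and the constants are hypothetical and not part of the EmoDistill code.

```python
from itertools import product

# Domain and variant folder names as listed in the tables above.
DOMAINS = ("crad", "desrd", "ssad", "ssd")
VARIANTS = ("emodistill", "emotionfree")


def adapter_subfolder(domain: str, variant: str) -> str:
    """Return the repo-relative adapter folder for one domain/variant pair.

    Hypothetical helper for illustration only.
    """
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"{domain}/{variant}/adapter"


# All eight adapter locations, in domain-major order.
ALL_ADAPTERS = [adapter_subfolder(d, v) for d, v in product(DOMAINS, VARIANTS)]
```

Each returned string has the same shape as the `subfolder` value passed to `PeftModel.from_pretrained` when loading an adapter from this repository.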
| --- | |
| ## Method | |
| EmoDistill composes **three offline-trained components** at inference (full variant): | |
| 1. **IQL emotion selector**: Implicit Q-Learning over a **28-emotion vocabulary**, trained on logged LLM-vs-LLM negotiation trajectories. Picks the emotion to express at each turn. | |
| 2. **LoRA-SFT expression imitation**: LoRA adapter on top of the 7B base, trained by *imitation* on top-K advantage-filtered offline turns. Learns to verbalize emotion-conditioned utterances. | |
| 3. **JPO (Judge Policy Optimization)**: PPO-clipped surrogate against a per-turn LLM judge, anchored by KL to the SFT init. Refines the LoRA adapter for naturalness and strategic effectiveness without destabilizing the SFT skills. | |
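This card does not spell out the training losses, so the sketch below only illustrates the two generic building blocks named in the list: the expectile regression loss from standard IQL, and a PPO-style clipped surrogate with a log-ratio penalty toward a reference policy. The function names and the default values of `tau`, `clip_eps`, and `kl_coef` are assumptions for illustration, not the paper's settings.

```python
import math


def expectile_loss(q_value: float, v_value: float, tau: float = 0.7) -> float:
    """Asymmetric squared error that IQL uses to fit V(s) toward an upper expectile of Q(s, a).

    With tau > 0.5, cases where V underestimates Q are penalized more heavily.
    """
    diff = q_value - v_value
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff * diff


def clipped_surrogate_with_kl(
    logp_new: float,
    logp_old: float,
    logp_ref: float,
    advantage: float,
    clip_eps: float = 0.2,  # assumed value
    kl_coef: float = 0.1,   # assumed value
) -> float:
    """Per-sample loss: negative PPO clipped surrogate plus a penalty toward a reference policy.

    logp_ref stands in for the log-probability under the SFT initialization.
    The penalty term is a simple per-sample log-ratio, used here as a stand-in
    for a KL estimate.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    surrogate = min(ratio * advantage, clipped_ratio * advantage)
    log_ratio_to_ref = logp_new - logp_ref
    return -(surrogate - kl_coef * log_ratio_to_ref)
```

In a JPO-like setup the `advantage` would presumably be derived from the per-turn judge score; how that score is normalized is not specified here.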
| All three components are **fully offline** (no live LLM API is needed at training time once the negotiation log is collected) and **edge-deployable**: at inference, the runtime is a single 7B model with a LoRA adapter (a few hundred MB) plus a small Q-network for emotion selection. | |
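At inference, the emotion-selection step could be as simple as an argmax over per-emotion Q-values. The sketch below assumes the Q-network emits one score per vocabulary entry; the function name and the three-entry vocabulary are placeholders, not the paper's 28-emotion list or the repository's API. Since each `iql/` folder also ships a `policy.pt`, the actual selector may use a learned policy rather than a greedy argmax.

```python
from typing import Sequence


def select_emotion(q_values: Sequence[float], vocabulary: Sequence[str]) -> str:
    """Greedy selection: return the vocabulary entry with the highest Q-value."""
    if not vocabulary:
        raise ValueError("vocabulary must not be empty")
    if len(q_values) != len(vocabulary):
        raise ValueError("q_values and vocabulary must have the same length")
    best_index = max(range(len(q_values)), key=lambda i: q_values[i])
    return vocabulary[best_index]


# Placeholder vocabulary and scores, for illustration only.
PLACEHOLDER_VOCAB = ("neutral", "joy", "anger")
chosen = select_emotion([0.1, 0.4, -0.2], PLACEHOLDER_VOCAB)
```

The chosen label would then condition the prompt given to the LoRA-adapted model for that turn.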
| The **emotion-free baseline** isolates the contribution of the IQL + JPO components by training only the LoRA-SFT step on the same offline turns, with no emotion conditioning and no judge refinement. | |
| ## Intended use | |
| - **Primary task:** emotion-aware negotiation in agent-to-agent settings across the four domains. | |
| - **Deployment:** on-device / edge, where data-privacy constraints make calling a frontier LLM infeasible. | |
| - **Base model:** [`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) for all eight variants. Compatible with both OpenAI and DashScope serving stacks via the `LLMClient` wrapper in the [code repo](https://github.com/Yunbo-max/EmoDistill). | |
| ## Evaluation | |
| All eight variants are evaluated on their respective subset of [`humanlong/emotion-negotiation-benchmarks`](https://huggingface.co/datasets/humanlong/emotion-negotiation-benchmarks) (100 scenarios per domain). The paper reports the same metrics across all four domains for direct comparison. | |
| Companion baselines (same benchmarks, same protocol; full numbers in the paper): | |
| - **[EmoDebt](https://github.com/Yunbo-max/EmoDebt)** (AAMAS 2026 Main, [arXiv:2503.21080](https://arxiv.org/abs/2503.21080)): Bayesian-optimized emotional intelligence engine. | |
| - **[EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator)** (NeurIPS 2025, [arXiv:2511.03370](https://arxiv.org/abs/2511.03370)): persona + HMM + WSLS, learning-free. | |
| - **[EvoEmo](https://github.com/Yunbo-max/EvoEmo)** ([arXiv:2509.04310](https://arxiv.org/abs/2509.04310)): online evolutionary emotion policies. | |
| - **[EmoMAS](https://github.com/Yunbo-max/EmoMAS)** (ACL 2026 Main, top 9%, [arXiv:2604.07003](https://arxiv.org/abs/2604.07003)): Bayesian multi-agent orchestration, no pre-training. | |
| - Vanilla 7B (no adapter, no emotion guidance). | |
| **Headline result:** EmoDistill (full) achieves the highest utility across all four domains, surpassing both vanilla and emotion-free baselines, and outperforming the other emotion-aware methods on edge-deployable 7B compute budgets. | |
| ## Quick start (after checkpoint release) | |
| Loading any variant follows the same pattern; just change the `subfolder` argument: | |
| ```python | |
| from peft import PeftModel | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| base = "Qwen/Qwen2.5-7B-Instruct" | |
| repo = "humanlong/EmoDistill-7b" | |
| # Pick: ("crad" | "desrd" | "ssad" | "ssd") x ("emodistill" | "emotionfree") | |
| domain = "crad" | |
| variant = "emodistill" # full IQL + SFT + JPO | |
| # variant = "emotionfree" # LoRA-SFT-only baseline | |
| tok = AutoTokenizer.from_pretrained(base) | |
| model = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype="auto") | |
| model = PeftModel.from_pretrained(model, repo, subfolder=f"{domain}/{variant}/adapter") | |
| ``` | |
| For the **full pipeline** (IQL emotion selection → LoRA generation → JPO-refined responses), use the helper code in the [EmoDistill GitHub repo](https://github.com/Yunbo-max/EmoDistill): | |
| ```python | |
| from emodistill import EmoDistillAgent | |
| agent = EmoDistillAgent.from_pretrained("humanlong/EmoDistill-7b", domain="crad") | |
| reply = agent.respond(conversation_history, opponent_state) | |
| ``` | |
| ## Limitations | |
| - All adapters are trained for **English**. Cross-lingual transfer is not evaluated. | |
| - The IQL emotion selector uses a fixed 28-emotion vocabulary; unseen emotions are not supported. | |
| - Each adapter is domain-specific; using `crad/emodistill` on a disaster scenario will degrade gracefully but is not the recommended use. | |
| - The model is designed to be persuasive but ethical; adversarial use to manipulate vulnerable users (debtors, patients, children, disaster survivors) is **out of scope** and explicitly discouraged. | |
| ## License | |
| Apache 2.0, matching the base model. | |
| ## Citation | |
| ```bibtex | |
| @article{long2026emodistill, | |
| title = {EmoDistill: Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation}, | |
| author = {Long, Yunbo and Zhao, Haolang and Beckenbauer, Lukas and Xu, Liming and Brintrup, Alexandra}, | |
| journal = {arXiv preprint arXiv:2605.26785}, | |
| year = {2026} | |
| } | |
| ``` | |
| ## The full research thread | |
| | Work | Venue | Role | | |
| |---|---|---| | |
| | [EmoDebt](https://github.com/Yunbo-max/EmoDebt) | AAMAS 2026 Main | Bayesian-optimized emotional intelligence (foundational) | | |
| | [EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator) | NeurIPS 2025 | Personas + HMM + WSLS for SLMs | | |
| | [EvoEmo](https://github.com/Yunbo-max/EvoEmo) | arXiv preprint | Online evolutionary emotion policies | | |
| | [EmoMAS](https://github.com/Yunbo-max/EmoMAS) | ACL 2026 (top 9%) | Bayesian multi-agent orchestration + 4 benchmarks | | |
| | **EmoDistill** *(this repo)* | under review | Offline distillation: **4 domain models + 4 emotion-free baselines** in a 7B SLM | | |
| All five papers + dataset + model in one place: [HF Collection: Emotion-Aware LLM Negotiation](https://huggingface.co/collections/humanlong/emotion-aware-llm-negotiation-6a25d88adcd0b6d41c9d8c75) | |