Instructions to use humanlong/EmoDistill-7b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use humanlong/EmoDistill-7b with PEFT:
Task type is invalid.
- Notebooks
- Google Colab
- Kaggle
EmoDistill-7b
Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation.
EmoDistill turns a 7B base LLM into a domain-adaptive emotion-aware negotiation agent. It decouples what emotion to show (an IQL emotion selector over a 28-emotion vocabulary) from how to express it (LoRA-SFT imitation followed by JPO refinement against a per-turn LLM judge) β both learned from a fixed offline corpus of LLM-vs-LLM negotiations.
This repository hosts all eight model variants from the paper: a full IQL + LoRA-SFT + JPO stack and an emotion-free LoRA-SFT-only baseline, one of each per benchmark domain β CRAD, DESRD, SSAD, SSD β for direct head-to-head comparison.
π§ Status: model card and repository layout live; trained checkpoint weights are uploading rolling. Each domain folder will hold its adapter once final training completes. Subscribe to the repo to be notified.
π¦ What's in this repo
Every domain comes in two variants:
| Variant | What it is | Folder pattern |
|---|---|---|
| EmoDistill (full) β IQL + LoRA-SFT + JPO | The main method: IQL emotion selector picks the emotion, LoRA-SFT adapter expresses it, JPO refines against an LLM judge. Reported as best in the paper. | <domain>/emodistill/ |
| Emotion-free baseline β LoRA-SFT only | LoRA fine-tune on the same offline corpus without the IQL emotion controller and without the JPO judge loop. Isolates "imitation alone" so you can attribute gains to the emotion control + judge components. | <domain>/emotionfree/ |
Across the four benchmark domains:
| Domain | Paper acronym | EmoDistill (full) | Emotion-free baseline |
|---|---|---|---|
| Credit / debt recovery | CRAD | crad/emodistill/ |
crad/emotionfree/ |
| Disaster / emergency response | DESRD | desrd/emodistill/ |
desrd/emotionfree/ |
| Student bedtime negotiation | SSAD | ssad/emodistill/ |
ssad/emotionfree/ |
| Surgical scheduling | SSD | ssd/emodistill/ |
ssd/emotionfree/ |
Inside each emodistill/ subfolder:
adapter/β LoRA-SFT+JPO adapter weights (adapter_model.safetensors,adapter_config.json)iql/β IQL emotion selector weights (q_net.pt,v_net.pt,policy.pt)config.jsonβ IQL hyperparameters, emotion vocabulary, JPO settings
Inside each emotionfree/ subfolder:
adapter/β LoRA-SFT-only adapter weights
π Method
EmoDistill composes three offline-trained components at inference (full variant):
- IQL emotion selector β Implicit Q-Learning over a 28-emotion vocabulary, trained on logged LLM-vs-LLM negotiation trajectories. Picks the emotion to express at each turn.
- LoRA-SFT expression imitation β LoRA adapter on top of the 7B base, trained by imitation on top-K advantage-filtered offline turns. Learns to verbalize emotion-conditioned utterances.
- JPO (Judge Policy Optimization) β PPO-clipped surrogate against a per-turn LLM judge, anchored by KL to the SFT init. Refines the LoRA adapter for naturalness and strategic effectiveness without destabilizing the SFT skills.
All three components are fully offline β no live LLM API at training time after the negotiation log is collected β and edge-deployable: at inference, the runtime is a single 7B model with a LoRA adapter (a few hundred MB) plus a small Q-network for emotion selection.
The emotion-free baseline isolates the contribution of the IQL + JPO components by training only the LoRA-SFT step on the same offline turns, with no emotion conditioning and no judge refinement.
π Intended use
- Primary task: emotion-aware negotiation in agent-to-agent settings across the four domains.
- Deployment: on-device / edge, where data-privacy constraints make calling a frontier LLM infeasible.
- Base model:
Qwen/Qwen2.5-7B-Instructfor all eight variants. Compatible with both OpenAI and DashScope serving stacks via theLLMClientwrapper in the code repo.
π Evaluation
All eight variants are evaluated on their respective subset of humanlong/emotion-negotiation-benchmarks (100 scenarios per domain). The paper reports identical metrics across the 4 domains for direct comparison.
Companion baselines (same benchmarks, same protocol β full numbers in the paper):
- EmoDebt (AAMAS 2026 Main, arXiv:2503.21080) β Bayesian-optimized emotional intelligence engine.
- EQ-Negotiator (NeurIPS 2025, arXiv:2511.03370) β persona + HMM + WSLS, learning-free.
- EvoEmo (arXiv:2509.04310) β online evolutionary emotion policies.
- EmoMAS (ACL 2026 Main, top 9%, arXiv:2604.07003) β Bayesian multi-agent orchestration, no pre-training.
- Vanilla 7B (no adapter, no emotion guidance).
Headline result: EmoDistill (full) achieves the highest utility across all four domains, surpassing both vanilla and emotion-free baselines, and outperforming the other emotion-aware methods on edge-deployable 7B compute budgets.
π¦ Quick start (after checkpoint release)
Loading any variant follows the same pattern β just change the subfolder argument:
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base = "Qwen/Qwen2.5-7B-Instruct"
repo = "humanlong/EmoDistill-7b"
# Pick: ("crad" | "desrd" | "ssad" | "ssd") x ("emodistill" | "emotionfree")
domain = "crad"
variant = "emodistill" # full IQL + SFT + JPO
# variant = "emotionfree" # LoRA-SFT-only baseline
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(model, repo, subfolder=f"{domain}/{variant}/adapter")
For the full pipeline (IQL emotion selection β LoRA generation β JPO-refined responses), use the helper code in the EmoDistill GitHub repo:
from emodistill import EmoDistillAgent
agent = EmoDistillAgent.from_pretrained("humanlong/EmoDistill-7b", domain="crad")
reply = agent.respond(conversation_history, opponent_state)
β οΈ Limitations
- All adapters are trained for English. Cross-lingual transfer is not evaluated.
- The IQL emotion selector uses a fixed 28-emotion vocabulary; unseen emotions are not supported.
- Each adapter is domain-specific β using
crad/emodistillon a disaster scenario will degrade gracefully but is not the recommended use. - The model is designed to be persuasive but ethical β adversarial use to manipulate vulnerable users (debtors, patients, children, disaster survivors) is out of scope and explicitly discouraged.
π License
Apache 2.0 β matches the base model.
π Citation
@article{long2026emodistill,
title = {EmoDistill: Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation},
author = {Long, Yunbo and Zhao, Haolang and Beckenbauer, Lukas and Xu, Liming and Brintrup, Alexandra},
journal = {arXiv preprint arXiv:2605.26785},
year = {2026}
}
π The full research thread
| Work | Venue | Role |
|---|---|---|
| EmoDebt | AAMAS 2026 Main | Bayesian-optimized emotional intelligence (foundational) |
| EQ-Negotiator | NeurIPS 2025 | Personas + HMM + WSLS for SLMs |
| EvoEmo | arXiv preprint | Online evolutionary emotion policies |
| EmoMAS | ACL 2026 (top 9%) | Bayesian multi-agent orchestration + 4 benchmarks |
| EmoDistill (this repo) | under review | Offline distillation: 4 domain models + 4 emotion-free baselines in a 7B SLM |
π All five papers + dataset + model in one place: HF Collection β Emotion-Aware LLM Negotiation
- Downloads last month
- -
