Text Generation
PEFT
English
negotiation
emotion
llm-agent
lora
offline-rl
iql
small-language-model
edge-deployable
Repository: humanlong/EmoDistill-7b
Polish model card: add workflow figure, real arxiv ID, drop EMNLP (under review)
- .gitattributes +1 -0
- README.md +26 -18
- figs/workflow.png +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+figs/workflow.png filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -21,13 +21,17 @@ pipeline_tag: text-generation
 
 # EmoDistill-creditor-7b
 
-**Offline
 
-EmoDistill turns a 7B base LLM into a domain-adaptive negotiation agent by decoupling *what emotion to show* from *how to express it*. It learns both from a fixed offline corpus of LLM-vs-LLM negotiations (**no online rollouts, no human feedback**) and refines the expression policy with a per-turn LLM judge.
 
-This
 
-> **Status:**
 
 ---
 
@@ -39,24 +43,26 @@ EmoDistill composes **three offline-trained components** at inference:
 2. **LoRA-SFT expression imitation**: LoRA adapter on top of the 7B base, trained by *imitation* on top-K advantage-filtered offline turns. Learns to verbalize emotion-conditioned creditor utterances.
 3. **JPO (Judge Policy Optimization)**: PPO-clipped surrogate against a per-turn LLM judge, anchored by KL to the SFT init. Refines the LoRA adapter for naturalness and strategic effectiveness without destabilizing the SFT skills.
 
-The three components are designed to be **fully offline** (no live LLM API needed at training time after the negotiation log is collected) and **edge-deployable**: the runtime is a single 7B model with a LoRA adapter and a small Q-network for emotion selection.
 
 ## Intended use
 
 - **Primary task:** automated, emotion-aware credit-recovery negotiation in agent-to-agent settings.
 - **Deployment:** on-device / edge, where data-privacy constraints make calling a frontier LLM infeasible.
-- **Base model:** [`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). Compatible with both OpenAI
 
 ## Evaluation
 
-
 
-- **[EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator)** (NeurIPS 2025, [arXiv:2511.03370](https://arxiv.org/abs/2511.03370)): persona + HMM + WSLS,
 - **[EmoMAS](https://github.com/Yunbo-max/EmoMAS)** (ACL 2026 Main, top 9%, [arXiv:2604.07003](https://arxiv.org/abs/2604.07003)): Bayesian multi-agent orchestration, no pre-training.
 - **[EvoEmo](https://github.com/Yunbo-max/EvoEmo)** (AAMAS 2026, [arXiv:2509.04310](https://arxiv.org/abs/2509.04310)): online evolutionary emotion policies.
-- Vanilla and fixed-emotion 7B baselines.
 
-Headline
 
 ## Quick start (after checkpoint release)
 
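The LoRA-SFT item in the hunk above mentions training on "top-K advantage-filtered offline turns". A minimal sketch of that filtering step follows, assuming each logged turn already carries a scalar advantage estimate. The `Turn` record, the `top_k_by_advantage` helper, and the emotion labels are hypothetical illustrations, not names taken from the EmoDistill code.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    emotion: str      # emotion label the utterance was conditioned on (hypothetical labels)
    utterance: str    # creditor utterance taken from the offline negotiation log
    advantage: float  # assumed per-turn advantage estimate, e.g. Q(s, a) - V(s)


def top_k_by_advantage(turns, k):
    """Return the k turns with the highest advantage, best first."""
    if k <= 0:
        return []
    return sorted(turns, key=lambda t: t.advantage, reverse=True)[:k]


turns = [
    Turn("calm", "Let's find a plan that works for both of us.", 0.8),
    Turn("anger", "Pay now.", -0.3),
    Turn("empathy", "I understand this month was difficult.", 0.5),
]

# Keep only the two highest-advantage turns as imitation targets.
sft_set = top_k_by_advantage(turns, k=2)
```

Filtering before imitation means the adapter only sees turns that the value estimate rates well, which is one plausible reading of "advantage-filtered"; the actual selection rule may differ.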
@@ -81,7 +87,7 @@ For the full pipeline (IQL emotion selection → LoRA generation → JPO-refined
 
 ## ⚠️ Limitations
 
-- Trained for **credit recovery** in English. Generalization to the other three domains (disaster, education, hospital) in the benchmark suite is not
 - The IQL emotion selector uses a fixed 28-emotion vocabulary; unseen emotions are not supported.
 - The model is designed to be persuasive but ethical; adversarial use to manipulate vulnerable debtors is **out of scope** and explicitly discouraged.
 
@@ -92,19 +98,21 @@ Apache 2.0, matches the base model.
 ## Citation
 
 ```bibtex
-@
-title
-author
-
-
 }
 ```
 
-##
 
 | Work | Venue | Role |
 |---|---|---|
 | [EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator) | NeurIPS 2025 | Personas + HMM + WSLS for SLMs |
 | [EvoEmo](https://github.com/Yunbo-max/EvoEmo) | AAMAS 2026 | Online evolutionary emotion policies |
 | [EmoMAS](https://github.com/Yunbo-max/EmoMAS) | ACL 2026 (top 9%) | Bayesian multi-agent orchestration + 4 benchmarks |
-| **EmoDistill** *(this repo)* |
@@ -21,13 +21,17 @@ pipeline_tag: text-generation
 
 # EmoDistill-creditor-7b
 
+> **Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation.**
+>
+> [arXiv](https://arxiv.org/abs/2605.26785) [HF Papers](https://huggingface.co/papers/2605.26785) [Code](https://github.com/Yunbo-max/EmoDistill) [Collection](https://huggingface.co/collections/humanlong/emotion-aware-llm-negotiation-6a25d88adcd0b6d41c9d8c75)
 
+**EmoDistill turns a 7B base LLM into a domain-adaptive emotion-aware negotiation agent** by decoupling *what emotion to show* from *how to express it*. It learns both from a fixed offline corpus of LLM-vs-LLM negotiations (**no online rollouts, no human feedback**) and refines the expression policy with a per-turn LLM judge.
 
+This repository hosts the **EmoDistill credit-recovery checkpoint**: a LoRA adapter on top of `Qwen2.5-7B-Instruct` plus the IQL emotion selector weights. See the [code repository](https://github.com/Yunbo-max/EmoDistill) for training and the full inference pipeline.
 
+> **Status:** model card live, **trained checkpoint coming soon**. The IQL emotion selector + LoRA adapter weights will be uploaded once final training completes; the card here documents the method, intended use, and evaluation protocol.
+
+[Figure: figs/workflow.png]
 
 ---
 
@@ -39,24 +43,26 @@ EmoDistill composes **three offline-trained components** at inference:
 2. **LoRA-SFT expression imitation**: LoRA adapter on top of the 7B base, trained by *imitation* on top-K advantage-filtered offline turns. Learns to verbalize emotion-conditioned creditor utterances.
 3. **JPO (Judge Policy Optimization)**: PPO-clipped surrogate against a per-turn LLM judge, anchored by KL to the SFT init. Refines the LoRA adapter for naturalness and strategic effectiveness without destabilizing the SFT skills.
 
+The three components are designed to be **fully offline** (no live LLM API needed at training time after the negotiation log is collected) and **edge-deployable**: at inference, the runtime is a single 7B model with a LoRA adapter and a small Q-network for emotion selection.
 
 ## Intended use
 
 - **Primary task:** automated, emotion-aware credit-recovery negotiation in agent-to-agent settings.
 - **Deployment:** on-device / edge, where data-privacy constraints make calling a frontier LLM infeasible.
+- **Base model:** [`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). Compatible with both the OpenAI and DashScope serving stacks via the `LLMClient` wrapper in the [code repo](https://github.com/Yunbo-max/EmoDistill).
 
 ## Evaluation
 
+Evaluated on the **`credit_recovery`** subset of [`humanlong/emotion-negotiation-benchmarks`](https://huggingface.co/datasets/humanlong/emotion-negotiation-benchmarks) (100 scenarios). The paper reports the same evaluation across all 4 domains in the benchmark suite.
+
+Companion baselines for direct comparison (same benchmark, same protocol):
 
+- **[EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator)** (NeurIPS 2025, [arXiv:2511.03370](https://arxiv.org/abs/2511.03370)): persona + HMM + WSLS, learning-free.
 - **[EmoMAS](https://github.com/Yunbo-max/EmoMAS)** (ACL 2026 Main, top 9%, [arXiv:2604.07003](https://arxiv.org/abs/2604.07003)): Bayesian multi-agent orchestration, no pre-training.
 - **[EvoEmo](https://github.com/Yunbo-max/EvoEmo)** (AAMAS 2026, [arXiv:2509.04310](https://arxiv.org/abs/2509.04310)): online evolutionary emotion policies.
+- Vanilla 7B and fixed-emotion 7B baselines.
 
+Headline result from the paper: **EmoDistill achieves the highest utility across all four domains**, surpassing both vanilla baselines and emotion-selection-only approaches. Full numbers will be cross-linked here when the checkpoint is uploaded.
 
 ## Quick start (after checkpoint release)
 
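The JPO item in this hunk describes a "PPO-clipped surrogate ... anchored by KL to the SFT init". The sketch below shows the generic form of such an objective on per-turn log-probabilities. It is not the authors' implementation: the function name `jpo_surrogate`, the default `clip_eps` and `kl_coef` values, the use of judge-derived advantages, and the simple log-ratio KL estimate are all assumptions made for illustration.

```python
import math


def jpo_surrogate(logp_new, logp_old, logp_sft, advantages,
                  clip_eps=0.2, kl_coef=0.1):
    """Mean clipped-surrogate objective minus a KL-style penalty (to be maximized).

    logp_new:   log-probs of the sampled utterances under the policy being trained
    logp_old:   log-probs under the policy that produced the samples
    logp_sft:   log-probs under the frozen SFT initialization (the KL anchor)
    advantages: assumed to be derived from per-turn judge scores
    """
    total = 0.0
    for lp_new, lp_old, lp_sft, adv in zip(logp_new, logp_old, logp_sft, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # PPO takes the pessimistic (smaller) of the clipped and unclipped terms.
        surrogate = min(ratio * adv, clipped * adv)
        # Per-sample log-ratio, used here as a crude estimate of divergence from SFT.
        kl_estimate = lp_new - lp_sft
        total += surrogate - kl_coef * kl_estimate
    return total / len(advantages)


# When all three policies agree, the objective reduces to the mean advantage.
same = [-1.0, -2.0]
baseline_value = jpo_surrogate(same, same, same, [1.0, 3.0])

# A large probability increase on a positive-advantage turn is capped at 1 + clip_eps.
capped_value = jpo_surrogate([0.0], [-1.0], [0.0], [1.0])
```

The clip bounds how far a single update can move the policy away from the sampling policy, while the penalty term discourages drift from the SFT adapter, which matches the card's stated goal of refining without destabilizing the SFT skills.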
@@ -81,7 +87,7 @@ For the full pipeline (IQL emotion selection → LoRA generation → JPO-refined
 
 ## ⚠️ Limitations
 
+- Trained for **credit recovery** in English. Generalization to the other three domains (disaster, education, hospital) in the benchmark suite is reported in the paper, but checkpoints for those domains are not yet released.
 - The IQL emotion selector uses a fixed 28-emotion vocabulary; unseen emotions are not supported.
 - The model is designed to be persuasive but ethical; adversarial use to manipulate vulnerable debtors is **out of scope** and explicitly discouraged.
 
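The second limitation above follows from how a Q-network over a fixed label set works: it emits one value per known emotion, so a label outside the set has no output slot. A minimal sketch under that assumption is shown below. The four-entry `VOCAB` is a hypothetical stand-in for the 28-emotion vocabulary, and `select_emotion` and `emotion_index` are illustrative helpers, not functions from the EmoDistill code.

```python
# Hypothetical stand-in; the real selector is described as using 28 labels.
VOCAB = ["neutral", "joy", "anger", "sadness"]


def select_emotion(q_values, vocabulary):
    """Greedy choice: return the label whose Q-value is largest."""
    if len(q_values) != len(vocabulary):
        raise ValueError("expected one Q-value per vocabulary entry")
    best = max(range(len(vocabulary)), key=lambda i: q_values[i])
    return vocabulary[best]


def emotion_index(emotion, vocabulary):
    """Map a label to its output index; labels outside the vocabulary are rejected."""
    try:
        return vocabulary.index(emotion)
    except ValueError:
        raise KeyError(f"unsupported emotion: {emotion!r}") from None


# One Q-value per label, as a Q-network head would produce for a single state.
chosen = select_emotion([0.1, 0.7, -0.2, 0.3], VOCAB)
```

Supporting a new emotion would require adding an output dimension and retraining, which is why unseen labels are rejected here rather than mapped to a nearest neighbour.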
@@ -92,19 +98,21 @@ Apache 2.0, matches the base model.
 ## Citation
 
 ```bibtex
+@article{long2026emodistill,
+  title = {EmoDistill: Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation},
+  author = {Long, Yunbo and Zhao, Haolang and Beckenbauer, Lukas and Xu, Liming and Brintrup, Alexandra},
+  journal = {arXiv preprint arXiv:2605.26785},
+  year = {2026}
 }
 ```
 
+## The full research thread
 
 | Work | Venue | Role |
 |---|---|---|
 | [EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator) | NeurIPS 2025 | Personas + HMM + WSLS for SLMs |
 | [EvoEmo](https://github.com/Yunbo-max/EvoEmo) | AAMAS 2026 | Online evolutionary emotion policies |
 | [EmoMAS](https://github.com/Yunbo-max/EmoMAS) | ACL 2026 (top 9%) | Bayesian multi-agent orchestration + 4 benchmarks |
+| **EmoDistill** *(this repo)* | under review | Offline distillation into a 7B SLM |
+
+All four in one place: [HF Collection: Emotion-Aware LLM Negotiation](https://huggingface.co/collections/humanlong/emotion-aware-llm-negotiation-6a25d88adcd0b6d41c9d8c75)
figs/workflow.png
ADDED