humanlong committed on
Commit 055dbff · verified · 1 Parent(s): 10aac26

Polish model card: add workflow figure, real arxiv ID, drop EMNLP (under review)

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +26 -18
  3. figs/workflow.png +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ figs/workflow.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -21,13 +21,17 @@ pipeline_tag: text-generation

  # EmoDistill-creditor-7b

- **Offline-distilled 7B emotion-aware credit-recovery negotiation agent.**
+ > **Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation.**
+ >
+ > [![arXiv](https://img.shields.io/badge/arXiv-2605.26785-b31b1b.svg)](https://arxiv.org/abs/2605.26785) [![HF Paper](https://img.shields.io/badge/🤗-Paper-orange.svg)](https://huggingface.co/papers/2605.26785) [![GitHub](https://img.shields.io/badge/GitHub-code-black.svg)](https://github.com/Yunbo-max/EmoDistill) [![HF Collection](https://img.shields.io/badge/🤗-Collection-orange.svg)](https://huggingface.co/collections/humanlong/emotion-aware-llm-negotiation-6a25d88adcd0b6d41c9d8c75)

- EmoDistill turns a 7B base LLM into a domain-adaptive negotiation agent by decoupling *what emotion to show* from *how to express it*. It learns both from a fixed offline corpus of LLM-vs-LLM negotiations - **no online rollouts, no human feedback** - and refines the expression policy with a per-turn LLM judge.
+ **EmoDistill turns a 7B base LLM into a domain-adaptive emotion-aware negotiation agent** by decoupling *what emotion to show* from *how to express it*. It learns both from a fixed offline corpus of LLM-vs-LLM negotiations - **no online rollouts, no human feedback** - and refines the expression policy with a per-turn LLM judge.

- This repo will host the released checkpoint and adapter weights for the EmoDistill 7B creditor agent. See the [code repository](https://github.com/Yunbo-max/EmoDistill) for training and inference.
+ This repository hosts the **EmoDistill credit-recovery checkpoint**: a LoRA adapter on top of `Qwen2.5-7B-Instruct` plus the IQL emotion selector weights. See the [code repository](https://github.com/Yunbo-max/EmoDistill) for training and full inference pipeline.

- > 🚧 **Status:** Pretrained 7B fine-tuned creditor checkpoint coming soon. This repo currently hosts the model card and configuration; the LoRA adapter and IQL emotion-selector weights will be uploaded once training finalizes.
+ > 🚧 **Status:** model card live, **trained checkpoint coming soon**. The IQL emotion selector + LoRA adapter weights will be uploaded once final training completes; the card here documents the method, intended use, and evaluation protocol.
+
+ ![EmoDistill workflow](figs/workflow.png)

  ---

@@ -39,24 +43,26 @@ EmoDistill composes **three offline-trained components** at inference:
  2. **LoRA-SFT expression imitation** - LoRA adapter on top of the 7B base, trained by *imitation* on top-K advantage-filtered offline turns. Learns to verbalize emotion-conditioned creditor utterances.
  3. **JPO (Judge Policy Optimization)** - PPO-clipped surrogate against a per-turn LLM judge, anchored by KL to the SFT init. Refines the LoRA adapter for naturalness and strategic effectiveness without destabilizing the SFT skills.

- The three components are designed to be **fully offline** - no live LLM API needed at training time after the negotiation log is collected - and **edge-deployable**: the runtime is a single 7B model with a LoRA adapter and a small Q-network for emotion selection.
+ The three components are designed to be **fully offline** - no live LLM API needed at training time after the negotiation log is collected - and **edge-deployable**: at inference, the runtime is a single 7B model with a LoRA adapter and a small Q-network for emotion selection.

  ## 🚀 Intended use

  - **Primary task:** automated, emotion-aware credit-recovery negotiation in agent-to-agent settings.
  - **Deployment:** on-device / edge, where data-privacy constraints make calling a frontier LLM infeasible.
- - **Base model:** [`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). Compatible with both OpenAI-API and DashScope-API serving via the `LLMClient` wrapper in the [code repo](https://github.com/Yunbo-max/EmoDistill).
+ - **Base model:** [`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). Compatible with both the OpenAI and DashScope serving stacks via the `LLMClient` wrapper in the [code repo](https://github.com/Yunbo-max/EmoDistill).

  ## 📊 Evaluation

- The model is evaluated on the **credit_recovery** subset of [`humanlong/emotion-negotiation-benchmarks`](https://huggingface.co/datasets/humanlong/emotion-negotiation-benchmarks) (100 scenarios). Companion baselines for comparison:
+ Evaluated on the **`credit_recovery`** subset of [`humanlong/emotion-negotiation-benchmarks`](https://huggingface.co/datasets/humanlong/emotion-negotiation-benchmarks) (100 scenarios). The paper reports the same evaluation across all 4 domains in the benchmark suite.
+
+ Companion baselines for direct comparison (same benchmark, same protocol):

- - **[EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator)** (NeurIPS 2025, [arXiv:2511.03370](https://arxiv.org/abs/2511.03370)) - persona + HMM + WSLS, no learning.
+ - **[EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator)** (NeurIPS 2025, [arXiv:2511.03370](https://arxiv.org/abs/2511.03370)) - persona + HMM + WSLS, learning-free.
  - **[EmoMAS](https://github.com/Yunbo-max/EmoMAS)** (ACL 2026 Main, top 9%, [arXiv:2604.07003](https://arxiv.org/abs/2604.07003)) - Bayesian multi-agent orchestration, no pre-training.
  - **[EvoEmo](https://github.com/Yunbo-max/EvoEmo)** (AAMAS 2026, [arXiv:2509.04310](https://arxiv.org/abs/2509.04310)) - online evolutionary emotion policies.
- - Vanilla and fixed-emotion 7B baselines.
+ - Vanilla 7B and fixed-emotion 7B baselines.

- Headline results will be filled in here after the checkpoint upload.
+ Headline result from the paper: **EmoDistill achieves the highest utility across all four domains**, surpassing both vanilla baselines and emotion-selection-only approaches. Full numbers will be cross-linked here when the checkpoint is uploaded.

  ## 📦 Quick start (after checkpoint release)

@@ -81,7 +87,7 @@ For the full pipeline (IQL emotion selection → LoRA generation → JPO-refined

  ## ⚠️ Limitations

- - Trained for **credit recovery** in English. Generalization to the other three domains (disaster, education, hospital) in the benchmark suite is not yet evaluated.
+ - Trained for **credit recovery** in English. Generalization to the other three domains (disaster, education, hospital) in the benchmark suite is reported in the paper but not separately released as checkpoints yet.
  - The IQL emotion selector uses a fixed 28-emotion vocabulary; unseen emotions are not supported.
  - The model is designed to be persuasive but ethical - adversarial use to manipulate vulnerable debtors is **out of scope** and explicitly discouraged.

@@ -92,19 +98,21 @@ Apache 2.0 - matches the base model.
  ## 📚 Citation

  ```bibtex
- @inproceedings{emodistill2026,
- title = {EmoDistill: Offline Emotion Skill Distillation for LM Negotiation Agents},
- author = {Long, Yunbo and others},
- year = {2026},
- note = {Submitted to EMNLP}
+ @article{long2026emodistill,
+ title = {EmoDistill: Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation},
+ author = {Long, Yunbo and Zhao, Haolang and Beckenbauer, Lukas and Xu, Liming and Brintrup, Alexandra},
+ journal = {arXiv preprint arXiv:2605.26785},
+ year = {2026}
  }
  ```

- ## 🔗 Related work - the full thread
+ ## 🔗 The full research thread

  | Work | Venue | Role |
  |---|---|---|
  | [EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator) | NeurIPS 2025 | Personas + HMM + WSLS for SLMs |
  | [EvoEmo](https://github.com/Yunbo-max/EvoEmo) | AAMAS 2026 | Online evolutionary emotion policies |
  | [EmoMAS](https://github.com/Yunbo-max/EmoMAS) | ACL 2026 (top 9%) | Bayesian multi-agent orchestration + 4 benchmarks |
- | **EmoDistill** *(this repo)* | EMNLP submission | Offline distillation into 7B SLM |
+ | **EmoDistill** *(this repo)* | under review | Offline distillation into a 7B SLM |
+
+ 🌟 All four in one place: [HF Collection - Emotion-Aware LLM Negotiation](https://huggingface.co/collections/humanlong/emotion-aware-llm-negotiation-6a25d88adcd0b6d41c9d8c75)
figs/workflow.png ADDED

Git LFS Details

  • SHA256: d4a62fba35f59c28df61691b8653f3186e4c435e1b861843199bb94b01faa84e
  • Pointer size: 132 Bytes
  • Size of remote file: 1.59 MB
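
The 132-byte pointer size is consistent with the standard Git LFS pointer layout: a `version` line, an `oid sha256:` line, and a `size` line. The following is a minimal sketch, not part of the commit, that builds such a pointer and shows how a downloaded copy of `figs/workflow.png` could be checked against the recorded digest. The byte count `1_590_000` is a placeholder assumption (the listing only gives the rounded 1.59 MB), and the helper names are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 copied from the Git LFS details listed above.
EXPECTED_OID = "d4a62fba35f59c28df61691b8653f3186e4c435e1b861843199bb94b01faa84e"


def lfs_pointer(oid: str, size: int) -> str:
    """Return the text of a Git LFS pointer file (spec v1 layout)."""
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {size}\n"
    )


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pointer(path: Path, oid: str = EXPECTED_OID) -> bool:
    """True if the file at `path` hashes to the oid recorded in the pointer."""
    return sha256_of(path) == oid


# ASSUMPTION: placeholder size; the real byte count is not shown in the listing.
pointer = lfs_pointer(EXPECTED_OID, 1_590_000)
pointer_size = len(pointer.encode("utf-8"))
```

With any seven-digit size the pointer comes to 132 bytes, which matches the figure reported above. After fetching the image, `matches_pointer(Path("figs/workflow.png"))` would confirm that the local file is the one the pointer refers to.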