Polish model card: add workflow figure, real arxiv ID, drop EMNLP (under review)


Files changed (3)

.gitattributes +1 -0
README.md +26 -18
figs/workflow.png +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+figs/workflow.png filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -21,13 +21,17 @@ pipeline_tag: text-generation
 # EmoDistill-creditor-7b
-**Offline-distilled 7B emotion-aware credit-recovery negotiation agent.**
-EmoDistill turns a 7B base LLM into a domain-adaptive negotiation agent by decoupling *what emotion to show* from *how to express it*. It learns both from a fixed offline corpus of LLM-vs-LLM negotiations — **no online rollouts, no human feedback** — and refines the expression policy with a per-turn LLM judge.
-This repo will host the released checkpoint and adapter weights for the EmoDistill 7B creditor agent. See the [code repository](https://github.com/Yunbo-max/EmoDistill) for training and inference.
-> 🚧 **Status:** Pretrained 7B fine-tuned creditor checkpoint coming soon. This repo currently hosts the model card and configuration; the LoRA adapter and IQL emotion-selector weights will be uploaded once training finalizes.
 ---
@@ -39,24 +43,26 @@ EmoDistill composes **three offline-trained components** at inference:
 2. **LoRA-SFT expression imitation** — LoRA adapter on top of the 7B base, trained by *imitation* on top-K advantage-filtered offline turns. Learns to verbalize emotion-conditioned creditor utterances.
 3. **JPO (Judge Policy Optimization)** — PPO-clipped surrogate against a per-turn LLM judge, anchored by KL to the SFT init. Refines the LoRA adapter for naturalness and strategic effectiveness without destabilizing the SFT skills.
-The three components are designed to be **fully offline** — no live LLM API needed at training time after the negotiation log is collected — and **edge-deployable**: the runtime is a single 7B model with a LoRA adapter and a small Q-network for emotion selection.
 ## 🚀 Intended use
 - **Primary task:** automated, emotion-aware credit-recovery negotiation in agent-to-agent settings.
 - **Deployment:** on-device / edge, where data-privacy constraints make calling a frontier LLM infeasible.
-- **Base model:** [`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). Compatible with both OpenAI-API and DashScope-API serving via the `LLMClient` wrapper in the [code repo](https://github.com/Yunbo-max/EmoDistill).
 ## 📊 Evaluation
-The model is evaluated on the **credit_recovery** subset of [`humanlong/emotion-negotiation-benchmarks`](https://huggingface.co/datasets/humanlong/emotion-negotiation-benchmarks) (100 scenarios). Companion baselines for comparison:
-- **[EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator)** (NeurIPS 2025, [arXiv:2511.03370](https://arxiv.org/abs/2511.03370)) — persona + HMM + WSLS, no learning.
 - **[EmoMAS](https://github.com/Yunbo-max/EmoMAS)** (ACL 2026 Main, top 9%, [arXiv:2604.07003](https://arxiv.org/abs/2604.07003)) — Bayesian multi-agent orchestration, no pre-training.
 - **[EvoEmo](https://github.com/Yunbo-max/EvoEmo)** (AAMAS 2026, [arXiv:2509.04310](https://arxiv.org/abs/2509.04310)) — online evolutionary emotion policies.
-- Vanilla and fixed-emotion 7B baselines.
-Headline results will be filled in here after the checkpoint upload.
 ## 📦 Quick start (after checkpoint release)
@@ -81,7 +87,7 @@ For the full pipeline (IQL emotion selection → LoRA generation → JPO-refined
 ## ⚠️ Limitations
-- Trained for **credit recovery** in English. Generalization to the other three domains (disaster, education, hospital) in the benchmark suite is not yet evaluated.
 - The IQL emotion selector uses a fixed 28-emotion vocabulary; unseen emotions are not supported.
 - The model is designed to be persuasive but ethical — adversarial use to manipulate vulnerable debtors is **out of scope** and explicitly discouraged.
@@ -92,19 +98,21 @@ Apache 2.0 — matches the base model.
 ## 📚 Citation
 ```bibtex
-@inproceedings{emodistill2026,
-  title  = {EmoDistill: Offline Emotion Skill Distillation for LM Negotiation Agents},
-  author = {Long, Yunbo and others},
-  year   = {2026},
-  note   = {Submitted to EMNLP}
 }
 ```
-## 🔗 Related work — the full thread
 | Work | Venue | Role |
 |---|---|---|
 | [EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator) | NeurIPS 2025 | Personas + HMM + WSLS for SLMs |
 | [EvoEmo](https://github.com/Yunbo-max/EvoEmo) | AAMAS 2026 | Online evolutionary emotion policies |
 | [EmoMAS](https://github.com/Yunbo-max/EmoMAS) | ACL 2026 (top 9%) | Bayesian multi-agent orchestration + 4 benchmarks |
-| **EmoDistill** *(this repo)* | EMNLP submission | Offline distillation into 7B SLM |

 # EmoDistill-creditor-7b
+> **Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation.**
+>
+> [![arXiv](https://img.shields.io/badge/arXiv-2605.26785-b31b1b.svg)](https://arxiv.org/abs/2605.26785) [![HF Paper](https://img.shields.io/badge/🤗-Paper-orange.svg)](https://huggingface.co/papers/2605.26785) [![GitHub](https://img.shields.io/badge/GitHub-code-black.svg)](https://github.com/Yunbo-max/EmoDistill) [![HF Collection](https://img.shields.io/badge/🤗-Collection-orange.svg)](https://huggingface.co/collections/humanlong/emotion-aware-llm-negotiation-6a25d88adcd0b6d41c9d8c75)
+**EmoDistill turns a 7B base LLM into a domain-adaptive emotion-aware negotiation agent** by decoupling *what emotion to show* from *how to express it*. It learns both from a fixed offline corpus of LLM-vs-LLM negotiations — **no online rollouts, no human feedback** — and refines the expression policy with a per-turn LLM judge.
+This repository hosts the **EmoDistill credit-recovery checkpoint**: a LoRA adapter on top of `Qwen2.5-7B-Instruct` plus the IQL emotion-selector weights. See the [code repository](https://github.com/Yunbo-max/EmoDistill) for training and the full inference pipeline.
+> 🚧 **Status:** model card live, **trained checkpoint coming soon**. The IQL emotion selector + LoRA adapter weights will be uploaded once final training completes; the card here documents the method, intended use, and evaluation protocol.
+![EmoDistill workflow](figs/workflow.png)
 ---
 2. **LoRA-SFT expression imitation** — LoRA adapter on top of the 7B base, trained by *imitation* on top-K advantage-filtered offline turns. Learns to verbalize emotion-conditioned creditor utterances.
 3. **JPO (Judge Policy Optimization)** — PPO-clipped surrogate against a per-turn LLM judge, anchored by KL to the SFT init. Refines the LoRA adapter for naturalness and strategic effectiveness without destabilizing the SFT skills.
+The three components are designed to be **fully offline** — no live LLM API needed at training time after the negotiation log is collected — and **edge-deployable**: at inference, the runtime is a single 7B model with a LoRA adapter and a small Q-network for emotion selection.
 ## 🚀 Intended use
 - **Primary task:** automated, emotion-aware credit-recovery negotiation in agent-to-agent settings.
 - **Deployment:** on-device / edge, where data-privacy constraints make calling a frontier LLM infeasible.
+- **Base model:** [`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). Compatible with both the OpenAI and DashScope serving stacks via the `LLMClient` wrapper in the [code repo](https://github.com/Yunbo-max/EmoDistill).
 ## 📊 Evaluation
+Evaluated on the **`credit_recovery`** subset of [`humanlong/emotion-negotiation-benchmarks`](https://huggingface.co/datasets/humanlong/emotion-negotiation-benchmarks) (100 scenarios). The paper reports the same evaluation across all four domains in the benchmark suite.
+Companion baselines for direct comparison (same benchmark, same protocol):
+- **[EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator)** (NeurIPS 2025, [arXiv:2511.03370](https://arxiv.org/abs/2511.03370)) — persona + HMM + WSLS, learning-free.
 - **[EmoMAS](https://github.com/Yunbo-max/EmoMAS)** (ACL 2026 Main, top 9%, [arXiv:2604.07003](https://arxiv.org/abs/2604.07003)) — Bayesian multi-agent orchestration, no pre-training.
 - **[EvoEmo](https://github.com/Yunbo-max/EvoEmo)** (AAMAS 2026, [arXiv:2509.04310](https://arxiv.org/abs/2509.04310)) — online evolutionary emotion policies.
+- Vanilla 7B and fixed-emotion 7B baselines.
+Headline result from the paper: **EmoDistill achieves the highest utility across all four domains**, surpassing both vanilla baselines and emotion-selection-only approaches. Full numbers will be cross-linked here when the checkpoint is uploaded.
 ## 📦 Quick start (after checkpoint release)
 ## ⚠️ Limitations
+- Trained for **credit recovery** in English. Results on the other three domains (disaster, education, hospital) in the benchmark suite are reported in the paper, but no separate checkpoints for those domains have been released yet.
 - The IQL emotion selector uses a fixed 28-emotion vocabulary; unseen emotions are not supported.
 - The model is designed to be persuasive but ethical — adversarial use to manipulate vulnerable debtors is **out of scope** and explicitly discouraged.
 ## 📚 Citation
 ```bibtex
+@article{long2026emodistill,
+  title   = {EmoDistill: Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation},
+  author  = {Long, Yunbo and Zhao, Haolang and Beckenbauer, Lukas and Xu, Liming and Brintrup, Alexandra},
+  journal = {arXiv preprint arXiv:2605.26785},
+  year    = {2026}
 }
 ```
+## 🔗 The full research thread
 | Work | Venue | Role |
 |---|---|---|
 | [EQ-Negotiator](https://github.com/Yunbo-max/EQ-Negotiator) | NeurIPS 2025 | Personas + HMM + WSLS for SLMs |
 | [EvoEmo](https://github.com/Yunbo-max/EvoEmo) | AAMAS 2026 | Online evolutionary emotion policies |
 | [EmoMAS](https://github.com/Yunbo-max/EmoMAS) | ACL 2026 (top 9%) | Bayesian multi-agent orchestration + 4 benchmarks |
+| **EmoDistill** *(this repo)* | under review | Offline distillation into a 7B SLM |
+🌟 All four in one place: [HF Collection — Emotion-Aware LLM Negotiation](https://huggingface.co/collections/humanlong/emotion-aware-llm-negotiation-6a25d88adcd0b6d41c9d8c75)
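
Reviewer note on the JPO description in item 3 above: a "PPO-clipped surrogate ... anchored by KL to the SFT init" can be read as the standard clipped policy-gradient objective plus a KL penalty toward a frozen reference policy. The sketch below illustrates that general shape only; it is not taken from the EmoDistill code. The function name `jpo_style_loss`, the defaults `clip_eps=0.2` and `kl_coef=0.1`, the use of judge scores as advantages, and the simple log-ratio KL estimate are all assumptions made for illustration.

```python
import math


def jpo_style_loss(logp_new, logp_old, logp_ref, advantages,
                   clip_eps=0.2, kl_coef=0.1):
    """Clipped surrogate loss with a KL penalty (hypothetical sketch).

    logp_new:   log-probs of sampled turns under the policy being trained
    logp_old:   log-probs under the policy that generated the samples
    logp_ref:   log-probs under the frozen reference (here: the SFT init)
    advantages: per-turn advantages (assumed to come from judge scores)
    """
    total = 0.0
    for ln, lo, lr, adv in zip(logp_new, logp_old, logp_ref, advantages):
        ratio = math.exp(ln - lo)
        # Keep the probability ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic (minimum) surrogate, as in PPO.
        surrogate = min(ratio * adv, clipped * adv)
        # Per-sample log-ratio estimate of KL(new || ref).
        kl_estimate = ln - lr
        # Minimise the negative surrogate plus the weighted KL term.
        total += -surrogate + kl_coef * kl_estimate
    return total / len(advantages)


# With identical policies the loss reduces to minus the mean advantage.
loss_same = jpo_style_loss([0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, -0.5])
# A large ratio with a positive advantage is clipped at 1 + clip_eps.
loss_clipped = jpo_style_loss([1.0], [0.0], [0.0], [1.0])
```

In the second call the ratio is e ≈ 2.72, so the surrogate is capped at 1.2 and the KL term adds 0.1, which shows how the clip and the anchor both limit how far one update can move the adapter.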

figs/workflow.png ADDED Viewed

Git LFS Details

SHA256: d4a62fba35f59c28df61691b8653f3186e4c435e1b861843199bb94b01faa84e
Pointer size: 132 Bytes
Size of remote file: 1.59 MB
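
The 132-byte pointer size can be sanity-checked against the Git LFS v1 pointer format, which stores a version line, the SHA256 object ID, and the byte size, each terminated by a newline. The sketch below rebuilds such a pointer from the SHA256 listed above. The exact byte count of the image is not shown on this page (only the rounded 1.59 MB), so `ASSUMED_SIZE` is a hypothetical value; any seven-digit size yields the same pointer length.

```python
def lfs_pointer(oid_sha256: str, size_bytes: int) -> str:
    """Build the text of a Git LFS v1 pointer file."""
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid_sha256}\n"
        f"size {size_bytes}\n"
    )


# SHA256 copied from the "Git LFS Details" panel above.
OID = "d4a62fba35f59c28df61691b8653f3186e4c435e1b861843199bb94b01faa84e"
# Hypothetical byte count: the page only gives the rounded "1.59 MB".
ASSUMED_SIZE = 1_590_000

pointer = lfs_pointer(OID, ASSUMED_SIZE)
pointer_len = len(pointer.encode("utf-8"))
print(pointer_len)  # 132
```

The total is 43 bytes for the version line, 76 for the `oid` line, and 13 for a `size` line with seven digits, which matches the reported pointer size.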