Improve model card: Add paper link, pipeline tag, library name, links, and usage
This PR significantly enhances the model card for the `lil-lab/respect` model by:
- Updating the paper reference to point to the Hugging Face paper page: [The Era of Real-World Human Interaction: RL from User Conversations](https://huggingface.co/papers/2509.25137).
- Adding `pipeline_tag: image-text-to-text` for improved discoverability on the Hub.
- Specifying `library_name: transformers` based on the explicit usage of the `transformers` library in the GitHub README.
- Including direct links to the project page and GitHub repository.
- Providing a concise overview of the model based on the paper abstract.
- Adding a comprehensive sample usage section, including environment setup, data download, and model loading, directly from the GitHub README.
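As a quick sanity check on the metadata changes listed above, the sketch below confirms that a card's front matter carries `pipeline_tag` and `library_name`. It is illustrative only: `read_front_matter` is a hypothetical helper, not part of the Hub tooling, and it is a deliberately minimal parser (top-level `key: value` lines only, not general YAML). The `CARD` string reproduces the front matter proposed in this PR.

```python
def read_front_matter(text):
    """Return top-level keys from a '---'-delimited front-matter block.

    Hypothetical helper for illustration. Handles `key: value` and bare
    `key:` lines only; list items and nested values are skipped.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter reached
        # Skip list items, indented continuation lines, and non key/value lines.
        if line.startswith(("-", " ", "\t")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as having no front matter


# Front matter as proposed in this PR.
CARD = """---
base_model:
- HuggingFaceM4/idefics2-8b
language:
- en
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---

# The Era of Real-World Human Interaction: RL from User Conversations
"""

fields = read_front_matter(CARD)
missing = [k for k in ("pipeline_tag", "library_name") if not fields.get(k)]
print(missing or "metadata complete")
```

For the front matter in this PR, the check reports no missing fields. A real validation step would more likely use a full YAML parser; the hand-rolled version here only keeps the example dependency-free.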
README.md
CHANGED
@@ -1,9 +1,63 @@
 ---
-license: apache-2.0
-language:
-- en
 base_model:
 - HuggingFaceM4/idefics2-8b
+language:
+- en
+license: apache-2.0
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
 
-
+# The Era of Real-World Human Interaction: RL from User Conversations
+
+This repository contains the `lil-lab/respect` model, based on the paper [The Era of Real-World Human Interaction: RL from User Conversations](https://huggingface.co/papers/2509.25137).
+
+## Model Description
+The model introduces Reinforcement Learning from Human Interaction (RLHI), a paradigm that learns directly from in-the-wild user conversations to achieve continual model improvement and multifaceted alignment. It develops two complementary methods: (1) RLHI with User-Guided Rewrites, which revises unsatisfactory model outputs based on users' natural-language follow-up responses, and (2) RLHI with User-Based Rewards, which learns via a reward model conditioned on knowledge of the user's long-term interaction history (termed persona). These methods link long-term user personas to turn-level preferences via persona-conditioned preference optimization.
+
+## Project Resources
+* **Project Page:** [https://lil-lab.github.io/respect](https://lil-lab.github.io/respect)
+* **Code Repository:** [https://github.com/lil-lab/respect](https://github.com/lil-lab/respect)
+
+## Sample Usage
+
+To get started with the model, follow these steps:
+
+### 1. Setting up Environment
+
+Prepare your conda environment:
+
+```bash
+conda create -n respect python=3.9.18 && conda activate respect
+pip install -r requirements.txt
+pip install -e .
+```
+
+### 2. Download Data
+
+```python
+from datasets import load_dataset
+
+ds = load_dataset("lil-lab/respect", name="turn", split="train")
+```
+
+### 3. Load Model Checkpoints
+
+Download checkpoints and load the model using `transformers` and `peft`:
+
+```python
+import torch
+from transformers import Idefics2ForConditionalGeneration
+from peft import PeftModel
+
+checkpoint = "HuggingFaceM4/idefics2-8b"
+model_id = 'lil-lab/respect'
+
+model = Idefics2ForConditionalGeneration.from_pretrained(
+    checkpoint, torch_dtype=torch.bfloat16)
+peft_model = PeftModel.from_pretrained(
+    model, model_id, adapter_name="r6_bp", revision="r6_bp")
+```
+
+## Reproducibility
+To generate plots from the paper, run `analysis/plots.ipynb` in the [GitHub repository](https://github.com/lil-lab/respect).