nielsr (HF Staff) committed · verified
Commit 6526d4d · 1 Parent(s): 4e69256

Improve model card: Add paper link, pipeline tag, library name, links, and usage


This PR significantly enhances the model card for the `lil-lab/respect` model by:
- Updating the paper reference to the paper's Hugging Face page: [The Era of Real-World Human Interaction: RL from User Conversations](https://huggingface.co/papers/2509.25137).
- Adding `pipeline_tag: image-text-to-text` for improved discoverability on the Hub.
- Specifying `library_name: transformers` based on the explicit usage of the `transformers` library in the GitHub README.
- Including direct links to the project page and GitHub repository.
- Providing a concise overview of the model based on the paper abstract.
- Adding a comprehensive sample usage section, including environment setup, data download, and model loading, directly from the GitHub README.

Files changed (1)
  1. README.md +58 -4
README.md CHANGED
@@ -1,9 +1,63 @@
  ---
- license: apache-2.0
- language:
- - en
  base_model:
  - HuggingFaceM4/idefics2-8b
+ language:
+ - en
+ license: apache-2.0
+ pipeline_tag: image-text-to-text
+ library_name: transformers
  ---

- <https://arxiv.org/abs/2410.13852>
+ # The Era of Real-World Human Interaction: RL from User Conversations
+
+ This repository contains the `lil-lab/respect` model, based on the paper [The Era of Real-World Human Interaction: RL from User Conversations](https://huggingface.co/papers/2509.25137).
+
+ ## Model Description
+ The model introduces Reinforcement Learning from Human Interaction (RLHI), a paradigm that learns directly from in-the-wild user conversations to achieve continual model improvement and multifaceted alignment. It develops two complementary methods: (1) RLHI with User-Guided Rewrites, which revises unsatisfactory model outputs based on users' natural-language follow-up responses, and (2) RLHI with User-Based Rewards, which learns via a reward model conditioned on knowledge of the user's long-term interaction history (termed persona). These methods link long-term user personas to turn-level preferences via persona-conditioned preference optimization.
+
+ ## Project Resources
+ * **Project Page:** [https://lil-lab.github.io/respect](https://lil-lab.github.io/respect)
+ * **Code Repository:** [https://github.com/lil-lab/respect](https://github.com/lil-lab/respect)
+
+ ## Sample Usage
+
+ To get started with the model, follow these steps:
+
+ ### 1. Setting up Environment
+
+ Prepare your conda environment:
+
+ ```bash
+ conda create -n respect python=3.9.18
+ pip install -r requirements.txt
+ pip install -e .
+ ```
+
+ ### 2. Download Data
+
+ ```python
+ from datasets import load_dataset
+
+ ds = load_dataset("lil-lab/respect", name="turn", split="train")
+ ```
+
+ ### 3. Load Model Checkpoints
+
+ Download checkpoints and load the model using `transformers` and `peft`:
+
+ ```python
+ import torch
+ from transformers import Idefics2ForConditionalGeneration
+ from peft import PeftModel
+
+ checkpoint = "HuggingFaceM4/idefics2-8b"
+ model_id = 'lil-lab/respect'
+
+ model = Idefics2ForConditionalGeneration.from_pretrained(
+     checkpoint, torch_dtype=torch.bfloat16)
+ peft_model = PeftModel.from_pretrained(
+     model, model_id, adapter_name="r6_bp", revision="r6_bp")
+ ```
+
+ ## Reproducibility
+ To generate plots from the paper, run `analysis/plots.ipynb` in the [GitHub repository](https://github.com/lil-lab/respect).