---
base_model:
- HuggingFaceM4/idefics2-8b
language:
- en
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---

# The Era of Real-World Human Interaction: RL from User Conversations

This repository contains the `lil-lab/respect` model, based on the paper [The Era of Real-World Human Interaction: RL from User Conversations](https://huggingface.co/papers/2509.25137).

## Model Description

The paper introduces Reinforcement Learning from Human Interaction (RLHI), a paradigm that learns directly from in-the-wild user conversations to achieve continual model improvement and multifaceted alignment. RLHI comprises two complementary methods: (1) RLHI with User-Guided Rewrites, which revises unsatisfactory model outputs based on users' natural-language follow-up responses, and (2) RLHI with User-Based Rewards, which learns via a reward model conditioned on the user's long-term interaction history (termed a persona). Both methods link long-term user personas to turn-level preferences via persona-conditioned preference optimization.

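As a rough intuition for persona-conditioned preference optimization, consider a DPO-style objective where the chosen and rejected responses are scored conditioned on the persona as well as the prompt. This is a minimal illustrative sketch, not the paper's implementation; the function name and the toy numbers are assumptions:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style preference loss on (persona + prompt)-conditioned
    log-probabilities of a chosen and a rejected response.
    Illustrative only; not the code used in the paper."""
    # Implicit reward margin of the policy relative to the reference model
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy numbers: the policy prefers the chosen response more strongly than
# the reference model does, so the loss falls below log(2) ~= 0.693
loss = dpo_loss(logp_chosen=-5.0, logp_rejected=-7.0,
                ref_logp_chosen=-6.0, ref_logp_rejected=-6.5)
```

The loss decreases as the policy widens the chosen-vs-rejected margin beyond the reference model's; conditioning both log-probabilities on the persona is what ties the turn-level preference to the long-term user history.
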
## Project Resources

* **Project Page:** [https://lil-lab.github.io/respect](https://lil-lab.github.io/respect)
* **Code Repository:** [https://github.com/lil-lab/respect](https://github.com/lil-lab/respect)

## Sample Usage

To get started with the model, follow these steps:

### 1. Setting up Environment

Prepare your conda environment:

```bash
conda create -n respect python=3.9.18
conda activate respect
pip install -r requirements.txt
pip install -e .
```

### 2. Download Data

```python
from datasets import load_dataset

ds = load_dataset("lil-lab/respect", name="turn", split="train")
```

### 3. Load Model Checkpoints

Download checkpoints and load the model using `transformers` and `peft`:

```python
import torch
from transformers import Idefics2ForConditionalGeneration
from peft import PeftModel

checkpoint = "HuggingFaceM4/idefics2-8b"
model_id = "lil-lab/respect"

# Load the base model in bfloat16, then attach the fine-tuned LoRA adapter
model = Idefics2ForConditionalGeneration.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16)
peft_model = PeftModel.from_pretrained(
    model, model_id, adapter_name="r6_bp", revision="r6_bp")
```

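Once the adapter is attached, generation follows the standard Idefics2 pattern. The sketch below continues from the snippet above (it reuses `checkpoint` and `peft_model`); the placeholder image and prompt text are illustrative assumptions, so substitute a real image for meaningful output:

```python
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(checkpoint)

# Placeholder image; replace with a real PIL image
image = Image.new("RGB", (224, 224))

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What is in this image?"},
    ]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

generated_ids = peft_model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```
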
## Reproducibility

To generate plots from the paper, run `analysis/plots.ipynb` in the [GitHub repository](https://github.com/lil-lab/respect).