---
base_model:
- HuggingFaceM4/idefics2-8b
language:
- en
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---

# The Era of Real-World Human Interaction: RL from User Conversations

This repository contains the `lil-lab/respect` model, based on the paper [The Era of Real-World Human Interaction: RL from User Conversations](https://huggingface.co/papers/2509.25137).

## Model Description

The paper introduces Reinforcement Learning from Human Interaction (RLHI), a paradigm that learns directly from in-the-wild user conversations to achieve continual model improvement and multifaceted alignment. RLHI comprises two complementary methods: (1) RLHI with User-Guided Rewrites, which revises unsatisfactory model outputs based on users' natural-language follow-up responses, and (2) RLHI with User-Based Rewards, which learns via a reward model conditioned on the user's long-term interaction history (termed a persona). Both methods link long-term user personas to turn-level preferences via persona-conditioned preference optimization.

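As a rough intuition for persona-conditioned preference optimization, consider a DPO-style objective where the chosen and rejected responses are scored conditioned on the persona as well as the prompt. This is a minimal illustrative sketch, not the paper's implementation; the function name and the toy numbers are assumptions:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style preference loss on (persona + prompt)-conditioned
    log-probabilities of a chosen and a rejected response.
    Illustrative only; not the code used in the paper."""
    # Implicit reward margin of the policy relative to the reference model
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy numbers: the policy prefers the chosen response more strongly than
# the reference model does, so the loss falls below log(2) ~= 0.693
loss = dpo_loss(logp_chosen=-5.0, logp_rejected=-7.0,
                ref_logp_chosen=-6.0, ref_logp_rejected=-6.5)
```

The loss decreases as the policy widens the chosen-vs-rejected margin beyond the reference model's; conditioning both log-probabilities on the persona is what ties the turn-level preference to the long-term user history.
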
## Project Resources

* **Project Page:** [https://lil-lab.github.io/respect](https://lil-lab.github.io/respect)
* **Code Repository:** [https://github.com/lil-lab/respect](https://github.com/lil-lab/respect)

## Sample Usage

To get started with the model, follow these steps:

### 1. Setting up Environment

Prepare your conda environment:

```bash
conda create -n respect python=3.9.18
conda activate respect
pip install -r requirements.txt
pip install -e .
```

### 2. Download Data

```python
from datasets import load_dataset

ds = load_dataset("lil-lab/respect", name="turn", split="train")
```

### 3. Load Model Checkpoints

Download checkpoints and load the model using `transformers` and `peft`:

```python
import torch
from transformers import Idefics2ForConditionalGeneration
from peft import PeftModel

checkpoint = "HuggingFaceM4/idefics2-8b"
model_id = "lil-lab/respect"

# Load the base model in bfloat16, then attach the fine-tuned LoRA adapter
model = Idefics2ForConditionalGeneration.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16)
peft_model = PeftModel.from_pretrained(
    model, model_id, adapter_name="r6_bp", revision="r6_bp")
```

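Once the adapter is attached, generation follows the standard Idefics2 pattern. The sketch below continues from the snippet above (it reuses `checkpoint` and `peft_model`); the placeholder image and prompt text are illustrative assumptions, so substitute a real image for meaningful output:

```python
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(checkpoint)

# Placeholder image; replace with a real PIL image
image = Image.new("RGB", (224, 224))

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What is in this image?"},
    ]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

generated_ids = peft_model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```
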
## Reproducibility

To generate plots from the paper, run `analysis/plots.ipynb` in the [GitHub repository](https://github.com/lil-lab/respect).