nielsr (HF Staff) committed · verified
Commit 6526d4d · 1 Parent(s): 4e69256

Improve model card: Add paper link, pipeline tag, library name, links, and usage


This PR significantly enhances the model card for the `lil-lab/respect` model by:
- Updating the paper reference to the paper's Hugging Face page: [The Era of Real-World Human Interaction: RL from User Conversations](https://huggingface.co/papers/2509.25137).
- Adding `pipeline_tag: image-text-to-text` for improved discoverability on the Hub.
- Specifying `library_name: transformers` based on the explicit usage of the `transformers` library in the GitHub README.
- Including direct links to the project page and GitHub repository.
- Providing a concise overview of the model based on the paper abstract.
- Adding a comprehensive sample usage section, including environment setup, data download, and model loading, directly from the GitHub README.

Files changed (1)
  1. README.md +58 -4
README.md CHANGED
@@ -1,9 +1,63 @@
  ---
- license: apache-2.0
- language:
- - en
  base_model:
  - HuggingFaceM4/idefics2-8b
+ language:
+ - en
+ license: apache-2.0
+ pipeline_tag: image-text-to-text
+ library_name: transformers
  ---

- <https://arxiv.org/abs/2410.13852>
+ # The Era of Real-World Human Interaction: RL from User Conversations
+
+ This repository contains the `lil-lab/respect` model, based on the paper [The Era of Real-World Human Interaction: RL from User Conversations](https://huggingface.co/papers/2509.25137).
+
+ ## Model Description
+ The model introduces Reinforcement Learning from Human Interaction (RLHI), a paradigm that learns directly from in-the-wild user conversations to achieve continual model improvement and multifaceted alignment. It develops two complementary methods: (1) RLHI with User-Guided Rewrites, which revises unsatisfactory model outputs based on users' natural-language follow-up responses, and (2) RLHI with User-Based Rewards, which learns via a reward model conditioned on knowledge of the user's long-term interaction history (termed persona). These methods link long-term user personas to turn-level preferences via persona-conditioned preference optimization.
+
+ ## Project Resources
+ * **Project Page:** [https://lil-lab.github.io/respect](https://lil-lab.github.io/respect)
+ * **Code Repository:** [https://github.com/lil-lab/respect](https://github.com/lil-lab/respect)
+
+ ## Sample Usage
+
+ To get started with the model, follow these steps:
+
+ ### 1. Setting up Environment
+
+ Prepare your conda environment:
+
+ ```bash
+ conda create -n respect python=3.9.18
+ pip install -r requirements.txt
+ pip install -e .
+ ```
+
+ ### 2. Download Data
+
+ ```python
+ from datasets import load_dataset
+
+ ds = load_dataset("lil-lab/respect", name="turn", split="train")
+ ```
+
+ ### 3. Load Model Checkpoints
+
+ Download checkpoints and load the model using `transformers` and `peft`:
+
+ ```python
+ import torch
+ from transformers import Idefics2ForConditionalGeneration
+ from peft import PeftModel
+
+ checkpoint = "HuggingFaceM4/idefics2-8b"
+ model_id = 'lil-lab/respect'
+
+ model = Idefics2ForConditionalGeneration.from_pretrained(
+     checkpoint, torch_dtype=torch.bfloat16)
+ peft_model = PeftModel.from_pretrained(
+     model, model_id, adapter_name="r6_bp", revision="r6_bp")
+ ```
+
+ ## Reproducibility
+ To generate plots from the paper, run `analysis/plots.ipynb` in the [GitHub repository](https://github.com/lil-lab/respect).