How to use CaraJ/ORM-T2I-R1 with libraries and local apps.

Libraries

How to use CaraJ/ORM-T2I-R1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="CaraJ/ORM-T2I-R1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
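# Run the pipeline on the chat-style messages and print the generated output
outputs = pipe(text=messages)
print(outputs)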

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("CaraJ/ORM-T2I-R1", dtype="auto")

Local Apps

vLLM

How to use CaraJ/ORM-T2I-R1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "CaraJ/ORM-T2I-R1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CaraJ/ORM-T2I-R1",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
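
The same request can also be issued from Python. The following is a minimal sketch using only the standard library, assuming a server is already running at the address used in the curl example above. The helper names `build_chat_request` and `send_chat_request` are hypothetical and chosen for illustration; the payload mirrors the curl body.

```python
import json
import urllib.request

# Values copied from the curl example above.
MODEL_ID = "CaraJ/ORM-T2I-R1"
IMAGE_URL = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"


def build_chat_request(prompt, image_url, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (without sending) a POST request for /v1/chat/completions.

    Hypothetical helper: the JSON body has the same shape as the curl payload.
    """
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the text of the first choice.

    Assumes the server replies in the OpenAI chat-completions response format.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Building the request needs no network access; sending it needs a live server.
req = build_chat_request("Describe this image in one sentence.", IMAGE_URL)
# print(send_chat_request(req))  # uncomment once the server is running
```

To target a server on another port, such as one listening on 30000, pass `base_url="http://localhost:30000"` to `build_chat_request`.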

Use Docker

docker model run hf.co/CaraJ/ORM-T2I-R1

SGLang

How to use CaraJ/ORM-T2I-R1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "CaraJ/ORM-T2I-R1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CaraJ/ORM-T2I-R1",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "CaraJ/ORM-T2I-R1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
# send the same request as in the pip-based SGLang example above to http://localhost:30000/v1/chat/completions

Docker Model Runner
How to use CaraJ/ORM-T2I-R1 with Docker Model Runner:
```
docker model run hf.co/CaraJ/ORM-T2I-R1
```

CaraJ

nielsr (HF Staff) committed on Jul 2, 2025

Commit 233150b (verified) · 1 Parent(s): cfa5d4e

Improve model card: Add license and expand description (#2)

- Improve model card: Add license and expand description (c5f91643658e30c873fa3ac478babf3264b1e1b4)

Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)

README.md +11 -4


@@ -1,12 +1,19 @@
 ---
-library_name: transformers
-pipeline_tag: image-text-to-text
 base_model:
 - lmms-lab/llava-onevision-qwen2-7b-ov
 ---
-This is the output reward model (ORM) used in [T2I-R1](https://github.com/CaraJ7/T2I-R1).
 This model is fine-tuned from [lmms-lab/llava-onevision-qwen2-7b-ov](https://huggingface.co/lmms-lab/llava-onevision-qwen2-7b-ov).
-Please check our paper: "[T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT](https://arxiv.org/pdf/2505.00703)" and [GitHub](https://github.com/CaraJ7/T2I-R1) for more information.

 ---
 base_model:
 - lmms-lab/llava-onevision-qwen2-7b-ov
+library_name: transformers
+pipeline_tag: image-text-to-text
+license: cc-by-nc-4.0
 ---
+This is the **Output Reward Model (ORM)** used in the paper [T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT](https://arxiv.org/pdf/2505.00703).
+T2I-R1 is a novel reasoning-enhanced text-to-image generation model powered by Reinforcement Learning (RL) with a bi-level Chain-of-Thought (CoT) reasoning process. This ORM is crucial for evaluating image generation by leveraging two levels of CoT:
+1.  **Semantic-level CoT**: for high-level planning of the prompt.
+2.  **Token-level CoT**: for low-level pixel processing during patch-by-patch generation.
+The paper introduces BiCoT-GRPO with an ensemble of generation rewards, which seamlessly optimizes both generation CoTs within the same training step. By applying these reasoning strategies to the baseline model, Janus-Pro, T2I-R1 achieves superior performance with a 13% improvement on T2I-CompBench and 19% improvement on the WISE benchmark, even surpassing the state-of-the-art model FLUX.1.
 This model is fine-tuned from [lmms-lab/llava-onevision-qwen2-7b-ov](https://huggingface.co/lmms-lab/llava-onevision-qwen2-7b-ov).
+For more details, please refer to the [official paper](https://arxiv.org/pdf/2505.00703) and the [GitHub repository](https://github.com/CaraJ7/T2I-R1).