Instructions for using RenlyH/CodeV-RL with libraries, inference providers, notebooks, and local apps.

Libraries

How to use RenlyH/CodeV-RL with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="RenlyH/CodeV-RL")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("RenlyH/CodeV-RL")
model = AutoModelForImageTextToText.from_pretrained("RenlyH/CodeV-RL")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use RenlyH/CodeV-RL with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RenlyH/CodeV-RL"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RenlyH/CodeV-RL",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
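
The same request can be issued from Python instead of curl. The following is a minimal sketch using only the standard library; the helper names `build_chat_payload` and `chat` are hypothetical, and it assumes the server started above is reachable at `http://localhost:8000` and returns the usual OpenAI-style response shape.

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build the same JSON body that the curl command above sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(base_url: str, payload: dict, timeout: float = 120.0) -> str:
    """POST the payload to /v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: the reply follows the OpenAI chat-completions schema,
    # with the generated text under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("http://localhost:8000", build_chat_payload("RenlyH/CodeV-RL", "Describe this image in one sentence.", image_url))` should return the model's answer as a string. Pointing `base_url` at port 30000 would target an SGLang server launched with the settings shown in the SGLang section.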

Use Docker

docker model run hf.co/RenlyH/CodeV-RL

SGLang

How to use RenlyH/CodeV-RL with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "RenlyH/CodeV-RL" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RenlyH/CodeV-RL",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "RenlyH/CodeV-RL" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RenlyH/CodeV-RL",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use RenlyH/CodeV-RL with Docker Model Runner:
```
docker model run hf.co/RenlyH/CodeV-RL
```

nielsr (HF Staff) committed on Dec 4, 2025

Commit e28d411 (verified) · 1 Parent(s): 63c59a1

Improve model card: Add pipeline tag, library name, paper and code links

This PR enhances the model card for the CodeV model by adding crucial metadata and expanding the content for better discoverability and user experience.

Key improvements include:
- Adding `pipeline_tag: image-text-to-text` to improve discoverability for vision-language models on the Hugging Face Hub.
- Adding `library_name: transformers` as the model is compatible with the Transformers library, which will enable the automated "How to use" widget.
- Updating the paper link to point to the Hugging Face Papers page (`https://huggingface.co/papers/2511.19661`) for easier access and consistency.
- Including a direct link to the GitHub repository (`https://github.com/RenlyH/CodeV`) for users to access the code.
- Expanding the model description with key information from the paper's abstract, providing a more comprehensive overview of CodeV's purpose and capabilities.

Please review and merge if these improvements are satisfactory!

Files changed (1)

README.md +10 -4

@@ -1,14 +1,20 @@
 ---
-license: mit
+base_model:
+- Qwen/Qwen2.5-VL-7B-Instruct
 datasets:
 - RenlyH/CodeV-RL-Data
 language:
 - en
 - zh
+license: mit
 metrics:
 - accuracy
-base_model:
-- Qwen/Qwen2.5-VL-7B-Instruct
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
 
-The model CodeV is trained with TAPO described in [paper](https://arxiv.org/abs/2511.19661).
+CodeV is a code-based visual agent trained with Tool-Aware Policy Optimization (TAPO) for faithful visual reasoning. This agentic vision-language model is designed to "think with images" by calling image operations, addressing unfaithful visual reasoning in prior models. CodeV achieves competitive accuracy and substantially increases faithful tool-use rates on visual search benchmarks, also demonstrating strong performance on multimodal reasoning and math benchmarks.
+
+This model was presented in the paper [CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization](https://huggingface.co/papers/2511.19661).
+
+Code: https://github.com/RenlyH/CodeV
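
The front matter introduced by this change can be sanity-checked locally before merging. The sketch below is illustrative only: `parse_front_matter` is a hypothetical helper that handles just the flat subset of YAML used in this README (scalar values and simple lists), and a real YAML parser would be the safer choice for anything more complex.

```python
def parse_front_matter(readme_text: str) -> dict:
    """Parse a simple front matter block (scalar values and flat lists only)."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    metadata = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing delimiter: end of front matter
        if stripped.startswith("- ") and isinstance(metadata.get(current_key), list):
            # List item belonging to the most recently seen key.
            metadata[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # In this simplified subset, a key with no inline value starts a list.
            metadata[current_key] = value if value else []
    return metadata


# Front matter as it appears on the "+" side of the diff above.
README = """---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- RenlyH/CodeV-RL-Data
language:
- en
- zh
license: mit
metrics:
- accuracy
pipeline_tag: image-text-to-text
library_name: transformers
---
"""

meta = parse_front_matter(README)
print(meta["pipeline_tag"], meta["library_name"])
```

Checking `pipeline_tag` and `library_name` this way confirms that the two keys the PR description highlights are present and spelled as intended.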