Instructions to use RenlyH/CodeV-RL with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use RenlyH/CodeV-RL with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="RenlyH/CodeV-RL")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("RenlyH/CodeV-RL")
model = AutoModelForImageTextToText.from_pretrained("RenlyH/CodeV-RL")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use RenlyH/CodeV-RL with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "RenlyH/CodeV-RL"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "RenlyH/CodeV-RL",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
Use Docker
docker model run hf.co/RenlyH/CodeV-RL
- SGLang
How to use RenlyH/CodeV-RL with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "RenlyH/CodeV-RL" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "RenlyH/CodeV-RL",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "RenlyH/CodeV-RL" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "RenlyH/CodeV-RL",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
- Docker Model Runner
How to use RenlyH/CodeV-RL with Docker Model Runner:
docker model run hf.co/RenlyH/CodeV-RL
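The curl calls shown for vLLM and SGLang send the same OpenAI-compatible chat payload, so a small Python client can serve both. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `post_chat` are hypothetical, and the base URLs assume the default ports from the commands in this section (8000 for vLLM, 30000 for SGLang).

```python
import json
import urllib.request


def build_chat_request(model, prompt, image_url):
    """Build the JSON body used by the curl examples (hypothetical helper)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(base_url, payload, timeout=60):
    """POST the payload to <base_url>/v1/chat/completions and return parsed JSON.

    Assumes a server started as shown in this section is already running.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


payload = build_chat_request(
    "RenlyH/CodeV-RL",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
print(json.dumps(payload, indent=2))

# With a server running, the call would look like:
#   reply = post_chat("http://localhost:8000", payload)    # vLLM
#   reply = post_chat("http://localhost:30000", payload)   # SGLang
```

Building the payload separately from sending it keeps the request shape easy to inspect before any server is started.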
Improve model card: Add pipeline tag, library name, paper and code links
This PR enhances the model card for the CodeV model by adding crucial metadata and expanding the content for better discoverability and user experience.
Key improvements include:
- Adding `pipeline_tag: image-text-to-text` to improve discoverability for vision-language models on the Hugging Face Hub.
- Adding `library_name: transformers` as the model is compatible with the Transformers library, which will enable the automated "How to use" widget.
- Updating the paper link to point to the Hugging Face Papers page (`https://huggingface.co/papers/2511.19661`) for easier access and consistency.
- Including a direct link to the GitHub repository (`https://github.com/RenlyH/CodeV`) for users to access the code.
- Expanding the model description with key information from the paper's abstract, providing a more comprehensive overview of CodeV's purpose and capabilities.
Please review and merge if these improvements are satisfactory!
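As a rough illustration of the metadata this PR targets, the sketch below checks a model card's YAML front matter for the top-level keys the updated card contains. It is a simplified stdlib-only check under stated assumptions, not the Hub's own validation: the helper names and the `REQUIRED_KEYS` tuple are hypothetical, and the parser only recognizes top-level `key:` lines rather than full YAML.

```python
import re

# Hypothetical checklist: the keys present in the updated card's front matter.
REQUIRED_KEYS = ("pipeline_tag", "library_name", "license", "base_model")


def front_matter_keys(card_text):
    """Return the top-level keys in the block delimited by leading '---' lines."""
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()  # no front matter at the top of the card
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys  # closing delimiter reached
        match = re.match(r"^([A-Za-z_][\w-]*):", line)
        if match:  # list items such as "- en" do not match and are skipped
            keys.add(match.group(1))
    return set()  # unterminated block: treat as missing front matter


def missing_keys(card_text, required=REQUIRED_KEYS):
    """List the required keys absent from the card, in checklist order."""
    present = front_matter_keys(card_text)
    return [key for key in required if key not in present]


CARD = """---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- RenlyH/CodeV-RL-Data
language:
- en
- zh
license: mit
metrics:
- accuracy
pipeline_tag: image-text-to-text
library_name: transformers
---

CodeV is a code-based visual agent.
"""

print(missing_keys(CARD))
```

A real workflow would more likely parse the front matter with a YAML library; the regex approach here is only meant to keep the example dependency-free.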
```diff
@@ -1,14 +1,20 @@
 ---
-
+base_model:
+- Qwen/Qwen2.5-VL-7B-Instruct
 datasets:
 - RenlyH/CodeV-RL-Data
 language:
 - en
 - zh
+license: mit
 metrics:
 - accuracy
-
-
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
 
-
+CodeV is a code-based visual agent trained with Tool-Aware Policy Optimization (TAPO) for faithful visual reasoning. This agentic vision-language model is designed to "think with images" by calling image operations, addressing unfaithful visual reasoning in prior models. CodeV achieves competitive accuracy and substantially increases faithful tool-use rates on visual search benchmarks, also demonstrating strong performance on multimodal reasoning and math benchmarks.
+
+This model was presented in the paper [CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization](https://huggingface.co/papers/2511.19661).
+
+Code: https://github.com/RenlyH/CodeV
```