---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- RenlyH/CodeV-RL-Data
language:
- en
- zh
license: mit
metrics:
- accuracy
pipeline_tag: image-text-to-text
library_name: transformers
---

CodeV is a code-based visual agent trained with Tool-Aware Policy Optimization (TAPO) for faithful visual reasoning. This agentic vision-language model is designed to "think with images" by invoking image operations through code, addressing the unfaithful visual reasoning seen in prior models. CodeV achieves competitive accuracy while substantially increasing faithful tool-use rates on visual search benchmarks, and it also performs strongly on multimodal reasoning and math benchmarks.

This model was presented in the paper [CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization](https://huggingface.co/papers/2511.19661).

Code: https://github.com/RenlyH/CodeV