starriver030515 committed on Jan 19

Commit

d5c7574

1 Parent(s): df54d71

Create README.md

Files changed (1)

README.md +186 -0

README.md ADDED

	@@ -0,0 +1,186 @@

+---
+license: apache-2.0
+language:
+- en
+base_model:
+- Qwen/Qwen2.5-Coder-7B-Instruct
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- chart
+- code-generation
+- visualization
+- matplotlib
+- data-visualization
+- complexity-aware
+datasets:
+- opendatalab/ChartVerse-Coder-Data
+---
+**ChartVerse-Coder** is a complexity-aware chart code generator that autonomously synthesizes diverse, high-complexity chart code from scratch. It was developed as part of the **[opendatalab/ChartVerse](https://huggingface.co/collections/opendatalab/chartverse)** project. For more details about our method, datasets, and full model series, please visit our [GitHub Repository](https://github.com/starriver030515/ChartVerse) and [Project Page](https://chartverse.github.io).
+Unlike prior template-based or seed-conditioned approaches, ChartVerse-Coder generates chart code via high-temperature sampling, enabling broad exploration of the long-tail chart distribution and producing diverse, realistic charts with high structural complexity.
+## 🔥 Highlights
+- **Autonomous Synthesis**: Generates diverse chart code from scratch without templates or seed charts
+- **Complexity-Aware**: Trained with RPE-guided filtering to master high-complexity visualizations
+- **High Diversity**: Produces charts spanning 3D plots, hierarchical structures, multi-subplot layouts, and more
+- **Iterative Self-Enhancement**: Progressively improves code quality through generation-filtering-retraining loops
+## 🔬 Method Overview
+### Rollout Posterior Entropy (RPE)
+<div align="center">
+  <img src="https://raw.githubusercontent.com/chartverse/chartverse.github.io/main/static/images/rpe_illustration.png" width="100%" alt="RPE Illustration">
+</div>
+We propose **Rollout Posterior Entropy (RPE)** to quantify intrinsic chart complexity via generative stability:
+1. **VLM Rollout**: Given a chart, prompt a VLM to generate executable code 8 times with temperature 1.0
+2. **Feature Extraction**: Extract CLIP embeddings from reconstructed images and compute Gram matrix
+3. **Spectral Entropy**: Calculate entropy from normalized singular values
+**Key Insight**: Simple charts yield consistent reconstructions (low RPE), while complex charts result in divergent outcomes (high RPE). We retain only samples with **RPE ≥ 0.4**.
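
The three RPE steps above can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes the rollout image embeddings (for example, CLIP features of the 8 reconstructed charts) are already computed, and the row normalization and the division by `log(n)` are assumptions, since the README does not state how the entropy is scaled. The function name `rollout_posterior_entropy` is hypothetical.

```python
import numpy as np


def rollout_posterior_entropy(embeddings, normalize=True):
    """Spectral entropy of the Gram matrix of rollout image embeddings.

    embeddings: array of shape (n_rollouts, dim), one embedding per
    reconstructed chart image (assumed to be computed elsewhere).
    """
    E = np.asarray(embeddings, dtype=np.float64)
    # Assumption: L2-normalize rows so the Gram matrix holds cosine similarities.
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    gram = E @ E.T
    # Singular values of the Gram matrix, normalized to sum to 1.
    s = np.linalg.svd(gram, compute_uv=False)
    p = s / s.sum()
    # Drop numerically zero entries; 0 * log(0) is treated as 0.
    p = p[p > 1e-12]
    entropy = float(-(p * np.log(p)).sum())
    if normalize and len(E) > 1:
        # Assumption: scale by log(n) so the score lies in [0, 1].
        entropy /= np.log(len(E))
    return entropy
```

With this scaling, identical reconstructions give a score near 0 and mutually orthogonal embeddings give 1, which matches the stated intuition that stable reconstructions mean low RPE.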
+### Training Pipeline
+<div align="center">
+  <img src="https://raw.githubusercontent.com/chartverse/chartverse.github.io/main/static/images/pipeline.png" width="100%" alt="ChartVerse Pipeline">
+</div>
+**Stage 1: Difficulty-Filtered Cold Start**
+- Aggregate charts from existing datasets and filter by RPE ≥ 0.4
+- Use Claude-4-Sonnet to infer source code for high-complexity charts
+- Curate **60K** high-quality seed samples
+**Stage 2: Iterative Self-Enhancement**
+- Generate 2M raw candidates via high-temperature sampling
+- Apply tri-fold filtering:
+  - ✅ Valid Execution
+  - ✅ High Complexity (RPE ≥ 0.4)
+  - ✅ Low Similarity to existing data (Cosine Sim ≤ 0.65)
+- Retrain coder on expanded dataset
+- Repeat for 2 iterations
+**Final Output**: Generate **700K** high-complexity chart code samples for downstream QA synthesis.
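
The tri-fold filter in Stage 2 can be read as three sequential checks per candidate. The sketch below is illustrative only: the dictionary keys (`executed_ok`, `rpe`, `embedding`), the function names, and the choice to add accepted samples to the comparison pool are all assumptions, and the README does not say which embedding the cosine similarity is computed on. Only the two thresholds come from the text.

```python
import numpy as np

RPE_MIN = 0.4   # complexity threshold stated above
SIM_MAX = 0.65  # similarity threshold stated above


def max_cosine_similarity(vec, pool):
    """Highest cosine similarity between vec and any vector in pool."""
    if len(pool) == 0:
        return 0.0
    pool = np.asarray(pool, dtype=np.float64)
    vec = np.asarray(vec, dtype=np.float64)
    sims = pool @ vec / (np.linalg.norm(pool, axis=1) * np.linalg.norm(vec))
    return float(sims.max())


def tri_fold_filter(candidates, existing_embeddings):
    """Keep candidates that executed, are complex enough, and are novel.

    candidates: list of dicts with hypothetical keys
    'executed_ok' (bool), 'rpe' (float), 'embedding' (1-D array).
    """
    pool = list(existing_embeddings)
    kept = []
    for cand in candidates:
        if not cand["executed_ok"]:
            continue  # check 1: code must run
        if cand["rpe"] < RPE_MIN:
            continue  # check 2: high complexity
        if max_cosine_similarity(cand["embedding"], pool) > SIM_MAX:
            continue  # check 3: low similarity to existing data
        kept.append(cand)
        # Assumption: accepted samples join the pool for later comparisons.
        pool.append(cand["embedding"])
    return kept
```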
+## 🏋️ Training Details
+- **Base Model**: Qwen2.5-Coder-7B-Instruct
+- **Cold Start Data**: 60K high-complexity samples
+- **Boost Data**: 200K iteratively filtered samples
+- **Training**: Full-parameter fine-tuning with LLaMA-Factory
+- **Learning Rate**: 2.0 × 10⁻⁵
+- **Batch Size**: 16
+- **Context Length**: 4,096 tokens
+- **Epochs**: 5
+- **Precision**: BF16
+## 📊 Synthesized Data Quality
+### Comparison with Existing Datasets
+<div align="center">
+  <img src="https://raw.githubusercontent.com/chartverse/chartverse.github.io/main/static/images/chart_cmp.png" width="100%" alt="Dataset Comparison">
+</div>
+ChartVerse-Coder synthesizes charts with higher complexity and diversity than the existing datasets compared in the figure above.
+### Synthesized Chart Examples
+<div align="center">
+  <img src="https://raw.githubusercontent.com/chartverse/chartverse.github.io/main/static/images/complex_images.png" width="100%" alt="Complex Chart Examples">
+</div>
+Our synthesized charts demonstrate exceptional diversity:
+- **3D Visualizations**: Surface plots, 3D bar charts, scatter plots
+- **Hierarchical Structures**: Treemaps, sunburst charts, dendrograms
+- **Statistical Plots**: Violin plots, radar charts, box plots with annotations
+- **Multi-Subplot Layouts**: Complex dashboards with mixed chart types
+- **Specialized Charts**: Sankey diagrams, chord diagrams, heatmaps with clustering
+## 🚀 Quick Start
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+# Load Model
+model_path = "opendatalab/ChartVerse-Coder"
+model = AutoModelForCausalLM.from_pretrained(
+    model_path, torch_dtype="auto", device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained(model_path)
+# Generation prompt (sent below as the user message)
+prompt = """You are a Python visualization expert. Generate a random Python visualization code focusing on charts, tables, or diagrams.
+Requirements:
+- Choose any visualization type (chart, table, flowchart, diagram, etc.)
+- Create sample data
+- Use Python visualization library (matplotlib, graphviz, etc.)
+- Make it visually appealing with proper labels, titles, and colors
+- Include sufficient visual elements
+- Carefully design the layout to avoid any overlapping text or elements
+- Adjust figure size, margins, and spacing for optimal clarity
+- Make it visually appealing with proper labels, titles, and colors
+Output format: Only output the Python visualization code wrapped in ```python```
+"""
+# Generate Chart Code
+messages = [
+    {"role": "user", "content": prompt}
+]
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+# High-temperature sampling for diversity
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=4096,
+    temperature=1.0,
+    top_p=0.95,
+    top_k=20,
+    do_sample=True
+)
+# Decode only the newly generated tokens, not the echoed prompt
+generated_code = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
+print(generated_code)
+```
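
To make the sampling settings above concrete (`temperature=1.0`, `top_k=20`, `top_p=0.95`), here is a small NumPy sketch of how these three filters can reshape a next-token distribution. It is a simplified illustration rather than the Transformers implementation; the function name `filter_logits` and the order of application (temperature, then top-k, then top-p) are assumptions made for this example.

```python
import numpy as np


def filter_logits(logits, temperature=1.0, top_k=20, top_p=0.95):
    """Return sampling probabilities after temperature, top-k, and top-p."""
    logits = np.asarray(logits, dtype=np.float64) / temperature
    # Top-k: mask everything below the k-th largest logit.
    if top_k and top_k < logits.size:
        kth = np.sort(logits)[-top_k]
        logits = np.where(logits < kth, -np.inf, logits)
    # Softmax over the surviving tokens.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Top-p: keep the smallest set of most likely tokens whose mass reaches top_p.
    order = np.argsort(-probs)
    cumulative = np.cumsum(probs[order])
    # A token is kept if the mass accumulated before it is still below top_p.
    keep_sorted = (cumulative - probs[order]) < top_p
    keep = np.zeros_like(probs, dtype=bool)
    keep[order[keep_sorted]] = True
    probs = np.where(keep, probs, 0.0)
    return probs / probs.sum()


rng = np.random.default_rng(0)
probs = filter_logits([2.0, 1.0, 0.0, -1.0], top_k=3, top_p=0.9)
next_token = rng.choice(len(probs), p=probs)
```

A temperature of 1.0 leaves the model's distribution unscaled, so diversity comes from sampling itself, while top-k and top-p trim the low-probability tail.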
+### Execute Generated Code
+```python
+import re
+import matplotlib.pyplot as plt
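+# Extract the first fenced Python block from the response
+code_match = re.search(r'```python\n(.*?)```', generated_code, re.DOTALL)
+if code_match:
+    code = code_match.group(1)
+    # exec runs model-generated code in the current process; only run output you trust.
+    # The generated script is expected to save the figure as 'image.png'.
+    exec(code)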
+```
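
Because `exec` runs generated code inside the current interpreter, one alternative is to run it in a separate Python process with a timeout. The sketch below is a hedged example, not part of the original README: the helper names `extract_code` and `run_generated_code`, the script name `chart.py`, and the 60-second default are illustrative choices. It sets matplotlib's `MPLBACKEND` environment variable to `Agg` so plots render without a display. A subprocess isolates crashes and hangs but is not a security sandbox.

```python
import os
import re
import subprocess
import sys
import tempfile
from pathlib import Path


def extract_code(response):
    """Return the first fenced Python block in the response, or None."""
    match = re.search(r"```python\n(.*?)```", response, re.DOTALL)
    return match.group(1) if match else None


def run_generated_code(response, timeout=60):
    """Run extracted code in a child process inside a fresh working directory.

    Returns (CompletedProcess, workdir), or None if no code block is found.
    Raises subprocess.TimeoutExpired if the script runs longer than timeout.
    """
    code = extract_code(response)
    if code is None:
        return None
    workdir = Path(tempfile.mkdtemp(prefix="chart_"))
    script = workdir / "chart.py"
    script.write_text(code, encoding="utf-8")
    # Headless backend so matplotlib does not try to open a window.
    env = {**os.environ, "MPLBACKEND": "Agg"}
    result = subprocess.run(
        [sys.executable, str(script)],
        cwd=workdir,
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result, workdir
```

Any files the script writes, such as a saved figure, end up in the returned `workdir`, and a non-zero `result.returncode` can serve as the "valid execution" signal.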
+## 📖 Citation
+```bibtex
+@article{chartverse2026,
+  title={ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch},
+  author={Anonymous Authors},
+  journal={Anonymous ACL Submission},
+  year={2026}
+}
+```
+## 📄 License
+This model is released under the Apache 2.0 License.
+## 🙏 Acknowledgements
+- Base model: [Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct)
+- Training framework: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)
+- Code inference: Claude-4-Sonnet for cold start data generation