Instructions to use GAIR/daVinci-Dev-32B-MT with libraries and local apps.

Libraries

How to use GAIR/daVinci-Dev-32B-MT with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="GAIR/daVinci-Dev-32B-MT")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("GAIR/daVinci-Dev-32B-MT")
model = AutoModelForCausalLM.from_pretrained("GAIR/daVinci-Dev-32B-MT")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use GAIR/daVinci-Dev-32B-MT with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "GAIR/daVinci-Dev-32B-MT"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GAIR/daVinci-Dev-32B-MT",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/GAIR/daVinci-Dev-32B-MT

SGLang

How to use GAIR/daVinci-Dev-32B-MT with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "GAIR/daVinci-Dev-32B-MT" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GAIR/daVinci-Dev-32B-MT",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "GAIR/daVinci-Dev-32B-MT" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GAIR/daVinci-Dev-32B-MT",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use GAIR/daVinci-Dev-32B-MT with Docker Model Runner:
```
docker model run hf.co/GAIR/daVinci-Dev-32B-MT
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

daVinci-Dev: Agent-native Mid-training for Software Engineering

Overview
Key Results
Model Zoo
Datasets
Pipeline
Quick Start
Training
Evaluation
License
Citation

Overview

daVinci-Dev is a family of large language models trained for agentic software engineering.

This work presents a systematic study of agentic mid-training and introduces agent-native data to reduce the distribution mismatch between static pretraining corpora and the dynamic, feedback-rich environments faced by real code agents.

Our training uses two complementary trajectory types (details in the paper):

- Contextually-native trajectories $\mathcal{D}^{\text{ctx}}_{\text{py}}$ (PR-derived): preserve the full information flow by bundling file discovery/context retrieval together with sequential edits. This provides broad coverage and diversity.
- Environmentally-native trajectories $\mathcal{D}^{\text{env}}_{\text{pass}}$ (executable rollouts): collected from real executable repositories with genuine tool/test outputs, capturing authentic feedback loops.

Resources (open-source / open-release):

- Paper + data processing pipeline: https://github.com/GAIR-NLP/daVinci-Dev
- Dataset: https://huggingface.co/datasets/GAIR/daVinci-Dev

Key Results

SWE-Bench Verified

Both models reach state-of-the-art results among open training recipes that use agentic scaffolds at their respective model sizes, despite starting from the non-coder Qwen2.5-Base family.

| Model | SWE-Bench Verified (Pass@1) | Notes |
| --- | --- | --- |
| `daVinci-Dev-72B` | 58.5% | Agent-native MT + SFT |
| `daVinci-Dev-32B` | 56.1% | Agent-native MT + SFT |

Generalization gains: improvements are also observed on standard code benchmarks (e.g., HumanEval/EvalPlus) and scientific reasoning benchmarks (e.g., GPQA/SciBench) as reported in the paper.

Model Zoo

We will open-source model checkpoints on Hugging Face:

| Model | Description | Link |
| --- | --- | --- |
| `daVinci-Dev-72B` | Final model (agent-native mid-training + env-native SFT) | https://huggingface.co/GAIR/daVinci-Dev-72B |
| `daVinci-Dev-32B` | Final model (agent-native mid-training + env-native SFT) | https://huggingface.co/GAIR/daVinci-Dev-32B |
| `daVinci-Dev-72B-MT` | MT checkpoint (after agent-native mid-training, before SFT) | https://huggingface.co/GAIR/daVinci-Dev-72B-MT |
| `daVinci-Dev-32B-MT` | MT checkpoint (after agent-native mid-training, before SFT) | https://huggingface.co/GAIR/daVinci-Dev-32B-MT |

Datasets

We will open-source our datasets through Hugging Face:

| Dataset | Description | Link |
| --- | --- | --- |
| `daVinci-Dev` | Agent-native data used in our training recipe (as permitted) | https://huggingface.co/datasets/GAIR/daVinci-Dev |

Pipeline

The GitHub repository contains a high-performance pipeline that calls the GitHub API and constructs the structured PR representation used to build $\mathcal{D}^{\text{ctx}}_{\text{py}}$.

| Pipeline | Description | Link |
| --- | --- | --- |
| daVinci-Dev Pipeline | A high-performance pipeline used to build $\mathcal{D}^{\text{ctx}}_{\text{py}}$ | `GAIR-NLP/daVinci-Dev` |
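
To make the idea of a structured PR representation concrete, here is a minimal sketch that pulls one pull request from the public GitHub REST API and folds it into a single record. This is not the authors' pipeline: the `FileChange` and `PullRequestRecord` classes and the function names are hypothetical and chosen only for illustration. The two endpoints (`/repos/{owner}/{repo}/pulls/{number}` and its `/files` sub-resource) and the `filename`, `status`, and `patch` fields are part of the GitHub REST API.

```python
import json
import urllib.request
from dataclasses import dataclass, field


@dataclass
class FileChange:
    """One file touched by the PR (hypothetical schema)."""
    path: str
    status: str  # e.g. "added", "modified", "removed"
    patch: str   # unified diff text; empty when GitHub omits it (binary or very large files)


@dataclass
class PullRequestRecord:
    """PR metadata plus its per-file changes (hypothetical schema)."""
    repo: str
    number: int
    title: str
    changes: list[FileChange] = field(default_factory=list)


def build_record(repo, number, pr_json, files_json):
    """Combine the PR metadata and the per-file diffs into one record."""
    changes = [
        FileChange(f["filename"], f["status"], f.get("patch", ""))
        for f in files_json
    ]
    return PullRequestRecord(repo, number, pr_json["title"], changes)


def fetch_json(url, token=None):
    """GET a GitHub API URL and decode the JSON body."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fetch_record(repo, number, token=None):
    """Fetch one PR. Only the first page of changed files is read here."""
    base = f"https://api.github.com/repos/{repo}/pulls/{number}"
    return build_record(repo, number, fetch_json(base), fetch_json(base + "/files"))
```

A real pipeline would also need pagination, rate-limit handling, and the retrieval context that the contextually-native trajectories bundle with edits. None of that is shown here.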

Quick Start

These checkpoints are intended to be used inside the SWE-Agent scaffold. They are also compatible with standard inference frameworks.

Start with HF Transformers

# accelerate is required for device_map="auto" below
pip install transformers torch accelerate

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "GAIR/daVinci-Dev-72B"  # or any checkpoint in the model zoo

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {
        "role": "system",
        "content": (
            "You are a software engineering agent. "
            "When solving tasks, reason about the repo structure, propose minimal edits, "
            "and describe how you would validate with tests."
        ),
    },
    {"role": "user", "content": "Bug: tests fail when X. Please fix it."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        temperature=0.2,
        do_sample=True,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Start with vLLM

pip install vllm

python -m vllm.entrypoints.openai.api_server \
  --model GAIR/daVinci-Dev-72B \
  --tensor-parallel-size 8 \
  --max-model-len 131072
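
Once the server is running, it can be queried through its OpenAI-compatible chat completions endpoint. The sketch below uses only the Python standard library and assumes the server listens on vLLM's default port 8000 on localhost. The helper names `build_chat_request` and `chat`, the prompt, and the default `max_tokens` value are illustrative choices, not part of the release.

```python
import json
import urllib.request


def build_chat_request(model, user_prompt, max_tokens=512, temperature=0.0):
    """Build the JSON payload for a single-turn chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def chat(base_url, payload, timeout=600):
    """POST the payload to /v1/chat/completions and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example usage (requires the server started above; URL and port are assumptions):
# payload = build_chat_request("GAIR/daVinci-Dev-72B", "Write a unit test for a FIFO queue.")
# print(chat("http://localhost:8000", payload))
```

The `model` field should match the name the server was launched with, which by default is the value passed to `--model`.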

Training

This section summarizes the methodology described in the paper.

Data

- Contextually-native PR trajectories: 68.6B tokens (constructed from GitHub pull requests, preserving the coupling between context retrieval and edits).
- Environmentally-native executable trajectories: 3.1B raw tokens (4.5B effective tokens), collected by running an agent in real executable environments with tool and test feedback. Trajectories include both test-passing and non-passing rollouts.
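
As a quick sanity check on the scale of the two sources, the snippet below works through the arithmetic using only the token counts quoted above. It assumes that "effective tokens" is the figure that counts toward the training mix and that both sources are mixed together. This page does not define the term, so treat the derived ratios as illustrative rather than as figures from the paper.

```python
# Token counts quoted above, in billions.
ctx_tokens_b = 68.6       # contextually-native PR trajectories
env_raw_b = 3.1           # environmentally-native trajectories, raw
env_effective_b = 4.5     # environmentally-native trajectories, effective

# Assumption: the mix is the PR tokens plus the *effective* executable tokens.
total_b = ctx_tokens_b + env_effective_b
env_share = env_effective_b / total_b
raw_to_effective = env_effective_b / env_raw_b

print(f"combined mix: {total_b:.1f}B tokens")
print(f"executable share of mix: {env_share:.1%}")
print(f"effective / raw ratio: {raw_to_effective:.2f}x")
```

Under these assumptions the executable rollouts are a small fraction of the combined data, which matches the description of the PR-derived set as the source of broad coverage.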

Recipe (high level)

1. Start from the Qwen2.5 base model family (32B / 72B).
2. Perform agent-native mid-training on PR-derived trajectories (optionally mixed with executable trajectories).
3. Perform SFT on the test-passing subset of environmentally-native trajectories.

Checkpoints with the `-MT` suffix correspond to the state after mid-training and before SFT.
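
The SFT stage in the recipe above trains only on rollouts whose tests passed. A minimal sketch of that selection step follows. The `Rollout` class and its fields are hypothetical, since this page does not describe the actual trajectory schema.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One executable agent trajectory (hypothetical schema)."""
    task_id: str
    steps: list[str]      # agent actions and tool/test outputs, in order
    tests_passed: bool    # whether the final patch passed the task's tests


def select_sft_subset(rollouts):
    """Keep only test-passing rollouts, preserving their original order."""
    return [r for r in rollouts if r.tests_passed]


rollouts = [
    Rollout("task-a", ["open file", "edit", "run tests"], True),
    Rollout("task-b", ["open file", "edit", "run tests"], False),
    Rollout("task-c", ["search", "edit", "run tests"], True),
]
sft_subset = select_sft_subset(rollouts)
print([r.task_id for r in sft_subset])
```

The non-passing rollouts are not discarded from the data description above. They remain part of the environmentally-native set and are only excluded from this SFT subset.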

Evaluation

We report performance on SWE-Bench Verified using SWE-Agent with the setup described in the paper (including temperature 0, 128k context, and a 100-step budget). Results are reported as Pass@1 (averaged across 4 runs).
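
Averaging Pass@1 across runs can be sketched as follows: each run yields one resolved/unresolved flag per task, the per-run Pass@1 is the fraction resolved, and the reported number is the mean over runs. The function names and the toy data are illustrative only and are not taken from the paper's evaluation code.

```python
from statistics import mean


def pass_at_1(resolved_flags):
    """Fraction of tasks resolved in a single run (one attempt per task)."""
    return sum(resolved_flags) / len(resolved_flags)


def mean_pass_at_1(runs):
    """Average the per-run Pass@1 over independent runs."""
    return mean(pass_at_1(flags) for flags in runs)


# Toy data: 4 runs over the same 4 tasks (made-up values, not real results).
runs = [
    [True, False, True, True],
    [True, True, False, False],
    [True, False, False, True],
    [False, True, True, True],
]
score = mean_pass_at_1(runs)
print(f"Pass@1 averaged over {len(runs)} runs: {score:.1%}")
```

Averaging over several runs reduces the effect of run-to-run variation in the agent scaffold on the reported score.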

License

This project is a mixed release:

- Contextually-native PR-derived subset: only PRs from repositories detected as having a permissive license are included. Each repo’s license is provided in ./ctx-native/filtered_repos/part-0000.parquet.
- Environmentally-native subset: derived from SWE-rebench, licensed under CC-BY-4.0.
- daVinci-Dev models: released under the Qwen license. Users should verify the licensing status of any generated code before using it in production.
- daVinci-Dev pipeline: released under the Apache-2.0 license.

Users are responsible for ensuring their downstream usage complies with the licenses of the underlying sources.

Citation

If you use this work, please cite the daVinci-Dev paper.

@misc{zeng2026davincidevagentnativemidtrainingsoftware,
      title={daVinci-Dev: Agent-native Mid-training for Software Engineering},
      author={Ji Zeng and Dayuan Fu and Tiantian Mi and Yumin Zhuang and Yaxing Huang and Xuefeng Li and Lyumanshan Ye and Muhang Xie and Qishuo Hua and Zhen Huang and Mohan Jiang and Hanning Wang and Jifan Lin and Yang Xiao and Jie Sun and Yunze Wu and Pengfei Liu},
      year={2026},
      eprint={2601.18418},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2601.18418},
}

Model size: 33B params (Safetensors)
Tensor type: BF16
Base model: Qwen/Qwen2.5-32B
