# daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently
Paper | GitHub Repository | Dataset
## Model Card
daVinci-Agency is a long-horizon agentic model finetuned on GLM-4.6. It is designed to master long-horizon tasks by learning from the iterative evolutionary process of real-world software development.
Unlike standard instruction-tuned models, daVinci-Agency is trained on trajectories that explicitly embody key meta-skills: task decomposition, long-term consistency, and iterative refinement. These trajectories, mined from real GitHub Pull Request chains, average 85k tokens and 116 tool invocations, enabling the model to handle complex, multi-step agentic workflows with superior proficiency.
## Model Details
- Base Model: GLM-4.6 (Zhipu AI)
- Context Window: 200K Tokens (Inherited from GLM-4.6, optimized for long-horizon history)
- Training Method: Finetuned on daVinci-Agency synthetic data.
- Primary Capabilities: Long-horizon planning, complex tool usage, autonomous coding, and self-correction.
## Highlights
- **Unlocking Long-Horizon Agency**: By training on interaction trajectories that mirror the "software evolution" process, daVinci-Agency breaks the teacher bounds of traditional data synthesis, learning to maintain consistency over long operational windows.
- **Advanced Meta-Skills**: The model demonstrates internalized capabilities for decomposing massive tasks into manageable units (PRs) and refining solutions based on feedback, achieving a 47% relative gain on Toolathlon compared to the base model.
- **Built on GLM-4.6**: Leveraging the strong coding and reasoning foundation of GLM-4.6, daVinci-Agency inherits a 200K context window, enabling it to process the extensive context required for complex agentic tasks.
## Performance Benchmarks
Models finetuned on daVinci-Agency data (239 samples) consistently outperform those trained on much larger synthetic datasets.
| Training Data | Samples | SWE-bench (SWE-agent) | Toolathlon | $\tau^2$-bench | Overall Avg. |
|---|---|---|---|---|---|
| GLM-4.6 (Base) | - | 0.608 | 0.157 | 0.675 | 0.441 |
| SWE-Smith | 66,000 | 0.404 | 0.093 | 0.586 | 0.373 |
| CC-bench | 260 | 0.618 | 0.000 | 0.697 | 0.436 |
| daVinci-Agency | 239 | 0.632 | 0.231 | 0.707 | 0.475 |
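The 47% relative gain on Toolathlon cited in the highlights follows directly from this table; a quick arithmetic check in plain Python, with the numbers copied from the rows above:

```python
# Toolathlon scores copied from the table above (GLM-4.6 base vs. daVinci-Agency).
base_toolathlon = 0.157
davinci_toolathlon = 0.231

# Relative gain = (new - old) / old
rel_gain = (davinci_toolathlon - base_toolathlon) / base_toolathlon
print(f"Toolathlon relative gain: {rel_gain:.0%}")  # about 47%
```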
### Baseline Models Comparison
| Model | SWE-bench (SWE-agent) | Toolathlon | AgencyBench | Overall Avg. |
|---|---|---|---|---|
| DeepSeek-v3.2 | 0.456 | 0.250 | 11.6 | 0.366 |
| Qwen3-235B | 0.504 | 0.046 | 4.6 | 0.309 |
| Kimi-K2-Thinking | 0.318 | 0.213 | 11.8 | 0.404 |
| GLM-4.6 | 0.608 | 0.157 | 11.9 | 0.441 |
| GLM-4.6-daVinci-Agency | 0.632 | 0.231 | 15.9 | 0.475 |
## Inference
Since daVinci-Agency is based on GLM-4.6, it utilizes the standard GLM tokenizer and chat template.
### Recommended Parameters
Following the GLM-4.6 guidelines, we recommend the following parameters for optimal performance, especially in code-related tasks:
- Temperature: 1.0 (General)
- Top_p: 0.95
- Top_k: 40
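For intuition, the sketch below shows how these three settings interact in a standard sampling pipeline: temperature scaling, then top-k truncation, then nucleus (top-p) filtering. This is a toy, self-contained illustration over a five-token vocabulary, not the model's actual decoder:

```python
import math

def sample_filter(logits, temperature=1.0, top_p=0.95, top_k=40):
    """Toy illustration of the recommended sampling parameters.
    Returns the surviving (token_id, prob) pairs after filtering."""
    # 1. Temperature scaling + numerically stable softmax
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [(i, e / total) for i, e in enumerate(exps)]

    # 2. Top-k: keep only the k most probable tokens
    probs.sort(key=lambda t: t[1], reverse=True)
    probs = probs[:top_k]

    # 3. Top-p (nucleus): keep the smallest prefix whose mass reaches top_p
    nucleus, mass = [], 0.0
    for tok, p in probs:
        nucleus.append((tok, p))
        mass += p
        if mass >= top_p:
            break

    # 4. Renormalize over the surviving tokens
    z = sum(p for _, p in nucleus)
    return [(tok, p / z) for tok, p in nucleus]

# Five-token toy vocabulary with made-up logits
kept = sample_filter([2.0, 1.0, 0.5, 0.1, -1.0], temperature=1.0, top_p=0.95, top_k=3)
print(kept)
```

Real inference engines (Transformers, vLLM, SGLang) implement this filtering internally; in practice you only pass `temperature`, `top_p`, and `top_k` as generation arguments, as in the quick-start example below.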
### Quick Start (Transformers)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_path = "GAIR/daVinci-Agency"  # Replace with actual path

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).to(device).eval()

# Example: Complex Long-Horizon Task
query = "Refactor the authentication module in this repository to support OAuth2, ensuring backward compatibility."
messages = [
    {"role": "system", "content": "You are an intelligent software engineering agent capable of long-horizon planning and execution."},
    {"role": "user", "content": query}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True
).to(device)

gen_kwargs = {
    "max_new_tokens": 4096,
    "do_sample": True,
    "top_p": 0.95,
    "temperature": 1.0,
    "top_k": 40
}

with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)

# Decode only the newly generated tokens, skipping the prompt
response = outputs[:, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(response[0], skip_special_tokens=True))
```
## Citation
If you use daVinci-Agency in your research, please cite our work:
```bibtex
@article{jiang2026davinci,
  title={daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently},
  author={Mohan Jiang and Dayuan Fu and Junhao Shi and Ji Zeng and Weiye Si and Keyu Li and Xuefeng Li and Yang Xiao and Wenjie Li and Dequan Wang and Pengfei Liu},
  journal={arXiv preprint arXiv:2602.02619},
  year={2026}
}
```