
daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently

πŸ“„ Paper   |   πŸ’» GitHub Repository   |   πŸ“š Dataset

Model Card

daVinci-Agency is a long-horizon agentic model finetuned on GLM-4.6. It is designed to master long-horizon tasks by learning from the iterative evolutionary process of real-world software development.

Unlike standard instruction-tuned models, daVinci-Agency is trained on trajectories that explicitly embody key meta-skills: task decomposition, long-term consistency, and iterative refinement. These trajectories, mined from real GitHub Pull Request chains, average 85k tokens and 116 tool invocations, enabling the model to handle complex, multi-step agentic workflows.

Model Details

  • Base Model: GLM-4.6 (Zhipu AI)
  • Parameters: 353B (BF16)
  • Context Window: 200K tokens (inherited from GLM-4.6, optimized for long-horizon history)
  • Training Method: Finetuned on daVinci-Agency synthetic data.
  • Primary Capabilities: Long-horizon planning, complex tool usage, autonomous coding, and self-correction.

Highlights

  • πŸš€ Unlocking Long-Horizon Agency: By training on interaction trajectories that mirror the "software evolution" process, daVinci-Agency breaks through the teacher-model ceiling of traditional data synthesis, learning to maintain consistency over long operational windows.
  • 🧠 Advanced Meta-Skills: The model demonstrates internalized capabilities for decomposing massive tasks into manageable units (PRs) and refining solutions based on feedback, achieving a 47% relative gain on Toolathlon compared to the base model.
  • πŸ—οΈ Built on GLM-4.6: Leveraging the superior coding and reasoning foundation of GLM-4.6, daVinci-Agency inherits a 200K context window, enabling it to process the extensive context required for complex agentic tasks.

πŸ“Š Performance Benchmarks

Trained on only 239 samples, daVinci-Agency consistently outperforms models finetuned on much larger synthetic datasets.

| Training Data | Samples | SWE-bench (SWE-agent) | Toolathlon | $\tau^2$-bench | Overall Avg. |
|---|---|---|---|---|---|
| GLM-4.6 (Base) | - | 0.608 | 0.157 | 0.675 | 0.441 |
| SWE-Smith | 66,000 | 0.404 | 0.093 | 0.586 | 0.373 |
| CC-bench | 260 | 0.618 | 0.000 | 0.697 | 0.436 |
| daVinci-Agency | 239 | 0.632 | 0.231 | 0.707 | 0.475 |
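The 47% relative gain on Toolathlon cited in the Highlights follows directly from the scores in this table; a quick sanity calculation:

```python
# Toolathlon scores taken from the table above
base_score = 0.157     # GLM-4.6 (Base)
davinci_score = 0.231  # daVinci-Agency

# Relative improvement over the base model
relative_gain = (davinci_score - base_score) / base_score
print(f"{relative_gain:.1%}")  # -> 47.1%
```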

Baseline Models Comparison

| Model | SWE-bench (SWE-agent) | Toolathlon | AgencyBench | Overall Avg. |
|---|---|---|---|---|
| DeepSeek-v3.2 | 0.456 | 0.250 | 11.6 | 0.366 |
| Qwen3-235B | 0.504 | 0.046 | 4.6 | 0.309 |
| Kimi-K2-Thinking | 0.318 | 0.213 | 11.8 | 0.404 |
| GLM-4.6 | 0.608 | 0.157 | 11.9 | 0.441 |
| GLM-4.6-daVinci-Agency | 0.632 | 0.231 | 15.9 | 0.475 |

Inference

Since daVinci-Agency is based on GLM-4.6, it utilizes the standard GLM tokenizer and chat template.

Recommended Parameters

Following the GLM-4.6 guidelines, we recommend the following parameters for optimal performance, especially in code-related tasks:

  • Temperature: 1.0 (General)
  • Top_p: 0.95
  • Top_k: 40
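If you serve the model behind an OpenAI-compatible API (for example with vLLM, whose server extends the OpenAI schema with a `top_k` field), the recommended parameters map directly onto the request body. A minimal sketch; serving daVinci-Agency this way, and the example query, are assumptions rather than part of this release:

```python
# Sketch of a chat-completions request body carrying the recommended
# sampling parameters. The deployment setup is an assumption; `top_k`
# is a vLLM schema extension, not part of the base OpenAI API.
payload = {
    "model": "GAIR/daVinci-Agency",
    "messages": [
        {"role": "system", "content": "You are an intelligent software engineering agent."},
        {"role": "user", "content": "List the steps to add OAuth2 support to this service."},
    ],
    "temperature": 1.0,  # recommended general setting
    "top_p": 0.95,
    "top_k": 40,         # vLLM extension field
    "max_tokens": 4096,
}
```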

Quick Start (Transformers)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_path = "GAIR/daVinci-Agency" # Replace with actual path

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).to(device).eval()

# Example: Complex Long-Horizon Task
query = "Refactor the authentication module in this repository to support OAuth2, ensuring backward compatibility."

messages = [
    {"role": "system", "content": "You are an intelligent software engineering agent capable of long-horizon planning and execution."},
    {"role": "user", "content": query}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True
).to(device)

gen_kwargs = {
    "max_new_tokens": 4096,
    "do_sample": True,
    "top_p": 0.95,
    "temperature": 1.0,
    "top_k": 40
}

with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    response = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(response[0], skip_special_tokens=True))

Citation

If you use daVinci-Agency in your research, please cite our work:

@article{jiang2026davinci,
  title={daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently},
  author={Mohan Jiang and Dayuan Fu and Junhao Shi and Ji Zeng and Weiye Si and Keyu Li and Xuefeng Li and Yang Xiao and Wenjie Li and Dequan Wang and Pengfei Liu},
  journal={arXiv preprint arXiv:2602.02619},
  year={2026}
}