---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---
# Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
This repository contains the model presented in the paper *Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL*. This work introduces Chain-of-Agents (CoA), a novel LLM reasoning paradigm that enables native, end-to-end complex problem-solving in the manner of a multi-agent system (i.e., multi-turn problem-solving with multiple tools and multiple agents) within a single model. The resulting models are called Agent Foundation Models (AFMs).
- Paper
- Project Page
- GitHub Repository
- Models Collection
- Datasets Collection
## Overview
Recent advances in large language models (LLMs) and multi-agent systems have demonstrated remarkable capabilities in complex problem-solving tasks. Chain-of-Agents (CoA) addresses inefficiencies in existing multi-agent systems by enabling end-to-end complex problem-solving within a single model. The model dynamically activates different tool agents and role-playing agents to simulate multi-agent collaboration in an end-to-end fashion. Agent Foundation Models (AFMs) are trained using a multi-agent distillation framework and agentic reinforcement learning, establishing new state-of-the-art performance across diverse benchmarks in both web agent and code agent settings.
## Quick Feature Summary
| Feature Category | Supported Capabilities |
|---|---|
| Core Paradigm | ✅ Chain-of-Agents (CoA) for end-to-end problem-solving<br>✅ Single-model simulation of multi-agent collaboration<br>✅ Dynamic activation of tool agents and role-playing agents |
| Training Framework | ✅ Multi-Agent Distillation pipeline<br>✅ Agentic Reinforcement Learning support<br>✅ Mask fine-tuning for selective learning |
| Agent Capabilities | ✅ Web interaction (Web Agent)<br>✅ Multi-hop question answering (MHQA Agent)<br>✅ Code execution (Code Agent) |
| Tool Integration | ✅ Web search and crawling servers<br>✅ Secure code sandbox (via nsjail)<br>✅ Configurable multi-tool collaboration |
| Evaluation | ✅ Multi-scenario benchmark testing<br>✅ Custom reward model integration |
## Performance Highlights
AFM achieves state-of-the-art performance across multiple benchmarks in both web agent and code agent settings. For instance, our 32B AFM model reaches an average success rate of 55.3% (Pass@1) on the GAIA benchmark, 11.1% on BrowseComp, 63.0% on WebWalker, and 18.0% on HLE. Test-time scaling further enhances AFM's performance across all benchmarks.
## Usage
This model can be loaded and used with the Hugging Face `transformers` library. Ensure `trust_remote_code=True` is set for proper functionality, as this model uses custom code.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Replace "PersonalAILab/AFM-32B" with the specific model ID you wish to use,
# e.g., "PersonalAILab/AFM-CodeAgent-32B-rl" or "PersonalAILab/AFM-WebAgent-32B-RL"
model_id = "PersonalAILab/AFM-32B"  # Example model ID

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",           # Automatically distributes the model across available devices (GPUs/CPU)
    torch_dtype=torch.bfloat16,  # Uses bfloat16 for efficient memory and computation
    trust_remote_code=True       # Required for custom modeling and tokenization code
)

# Example chat interaction for a multi-hop question answering task
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France and what is its population?"}
]

# Apply the chat template for proper Qwen2-based chat formatting
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a response (passing the full inputs also supplies the attention mask)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_k=20,
    top_p=0.8,
    repetition_penalty=1.05
)

# Decode only the newly generated tokens
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

For more detailed usage, including training, evaluation, and tool integration, please refer to the official GitHub repository.
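The decode-only-new-tokens step above can be illustrated with plain Python lists, independent of any model. The token IDs below are hypothetical values chosen purely for illustration; `generate()` returns each prompt's tokens followed by the newly generated ones, so each output sequence is sliced past its prompt's length:

```python
# Toy token IDs (hypothetical values, no model required)
prompt_ids = [[101, 2054, 2003], [101, 7592]]
output_ids = [[101, 2054, 2003, 3000, 102], [101, 7592, 2088, 102]]

# Same slicing pattern as in the usage example: drop each prompt's tokens
new_only = [out[len(inp):] for inp, out in zip(prompt_ids, output_ids)]
print(new_only)  # [[3000, 102], [2088, 102]]
```

This is why `skip_special_tokens=True` alone is not enough: without the slice, the decoded string would also contain the rendered prompt.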
## Citation
If you find AFM useful in your research or applications, we would appreciate it if you could cite our work:
```bibtex
@misc{li2025chainofagentsendtoendagentfoundation,
      title={Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL},
      author={Weizhen Li and Jianbo Lin and Zhuosong Jiang and Jingyi Cao and Xinpeng Liu and Jiayu Zhang and Zhenqiang Huang and Qianben Chen and Weichen Sun and Qiexiang Wang and Hongxuan Lu and Tianrui Qin and Chenghao Zhu and Yi Yao and Shuying Fan and Xiaowan Li and Tiannan Wang and Pai Liu and King Zhu and He Zhu and Dingfeng Shi and Piaohong Wang and Yeyi Guan and Xiangru Tang and Minghao Liu and Yuchen Eleanor Jiang and Jian Yang and Jiaheng Liu and Ge Zhang and Wangchunshu Zhou},
      year={2025},
      eprint={2508.13167},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.13167},
}
```