AFM-WebAgent-7B-rl / README.md
---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---

# Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

This repository contains the model presented in the paper *Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL*. This work introduces Chain-of-Agents (CoA), a novel paradigm of LLM reasoning that enables native, end-to-end complex problem-solving within a single model, in the same way a multi-agent system does (i.e., multi-turn problem-solving with multiple tools and multiple agents). The resulting models are called Agent Foundation Models (AFMs).

*(Figure: Chain-of-Agents overview)*

## Overview

Recent advances in large language models (LLMs) and multi-agent systems have demonstrated remarkable capabilities in complex problem-solving tasks. Chain-of-Agents (CoA) addresses inefficiencies in existing multi-agent systems by enabling end-to-end complex problem-solving within a single model. The model dynamically activates different tool agents and role-playing agents to simulate multi-agent collaboration in an end-to-end fashion. Agent Foundation Models (AFMs) are trained using a multi-agent distillation framework and agentic reinforcement learning, establishing new state-of-the-art performance across diverse benchmarks in both web agent and code agent settings.
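To make the single-model loop concrete, here is a minimal sketch of how one model can drive a multi-turn, multi-tool episode: the model's generation is scanned for tool-call markup, each call is dispatched to a registered tool, and the observations are fed back on the next turn. The tag names (`<think>`, `<tool_call>`) and the `name: argument` payload format are illustrative assumptions for this sketch, not AFM's documented output protocol.

```python
import re

# Assumed markup for this sketch only; AFM's actual format may differ.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def extract_tool_calls(generation: str) -> list[str]:
    """Pull out every tool-call payload the model emitted in one turn."""
    return [m.strip() for m in TOOL_CALL_RE.findall(generation)]

def agent_step(generation: str, tools: dict) -> list[str]:
    """Dispatch each parsed call to a registered tool; collect observations
    to append to the context for the next generation turn."""
    observations = []
    for payload in extract_tool_calls(generation):
        name, _, arg = payload.partition(":")
        tool = tools.get(name.strip())
        observations.append(tool(arg.strip()) if tool else f"unknown tool: {name}")
    return observations

# One illustrative turn with a stubbed search tool.
tools = {"search": lambda q: f"results for '{q}'"}
turn = "<think>I should look this up.</think><tool_call>search: capital of France</tool_call>"
print(agent_step(turn, tools))  # ["results for 'capital of France'"]
```

The point of the paradigm is that role switching and tool selection happen inside the model's own generation stream, rather than being orchestrated by an external multi-agent framework.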

## Quick Feature Summary

| Feature Category | Supported Capabilities |
|---|---|
| Core Paradigm | ✅ Chain-of-Agents (CoA) for end-to-end problem-solving<br>✅ Single-model simulation of multi-agent collaboration<br>✅ Dynamic activation of tool agents and role-playing agents |
| Training Framework | ✅ Multi-Agent Distillation pipeline<br>✅ Agentic Reinforcement Learning support<br>✅ Mask fine-tuning for selective learning |
| Agent Capabilities | ✅ Web interaction (Web Agent)<br>✅ Multi-hop question answering (MHQA Agent)<br>✅ Code execution (Code Agent) |
| Tool Integration | ✅ Web search and crawling servers<br>✅ Secure code sandbox (via nsjail)<br>✅ Configurable multi-tool collaboration |
| Evaluation | ✅ Multi-scenario benchmark testing<br>✅ Custom reward model integration |
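The "secure code sandbox" row refers to nsjail, which confines filesystem, network, and syscall access for model-generated code. As a rough illustration of the idea (not the actual AFM tooling), the sketch below runs code in a fresh, isolated Python interpreter with a hard timeout; a production sandbox like nsjail adds much stronger OS-level isolation on top of this.

```python
import subprocess
import sys

def run_in_sandbox(code: str, timeout_s: float = 5.0) -> str:
    """Simplified stand-in for a code-execution sandbox: run the snippet in a
    fresh interpreter with isolated mode (-I, ignores user site-packages and
    environment) and kill it if it exceeds the timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "error: execution timed out"
    return proc.stdout if proc.returncode == 0 else f"error: {proc.stderr.strip()}"

print(run_in_sandbox("print(2 + 2)").strip())  # 4
print(run_in_sandbox("import time; time.sleep(10)", timeout_s=0.5))
```

A subprocess timeout alone does not stop hostile code from touching the filesystem or network, which is exactly why the real stack delegates isolation to nsjail.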

## Performance Highlights

AFM achieves state-of-the-art performance across multiple benchmarks in both web agent and code agent settings. For instance, our 32B AFM model reaches an average success rate of 55.3% (Pass@1) on the GAIA benchmark, 11.1% on BrowseComp, 63.0% on WebWalker, and 18.0% on HLE. Test-time scaling further enhances AFM's performance across all benchmarks.
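The numbers above are reported as Pass@1. For reference, the standard unbiased pass@k estimator (Chen et al., 2021) computes, from n sampled attempts of which c are correct, the probability that at least one of k drawn samples succeeds; the paper's own evaluation scripts may differ in detail.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k), the probability
    that at least one of k samples (out of n attempts, c correct) passes."""
    if n - c < k:
        return 1.0  # too few failures left to fill k samples: guaranteed pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one sample per task, pass@1 reduces to the raw success rate:
print(pass_at_k(n=1, c=1, k=1))   # 1.0
print(pass_at_k(n=10, c=3, k=1))  # ~0.3, the fraction of correct samples
```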

*(Figure: performance comparison table)*

## Usage

This model can be loaded and used with the Hugging Face `transformers` library. Ensure `trust_remote_code=True` is set, as the model uses custom code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Replace "PersonalAILab/AFM-32B" with the specific model ID you wish to use,
# e.g., "PersonalAILab/AFM-CodeAgent-32B-rl" or "PersonalAILab/AFM-WebAgent-32B-RL"
model_id = "PersonalAILab/AFM-32B"  # Example model ID

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # Automatically loads the model across available devices (GPUs/CPU)
    torch_dtype=torch.bfloat16, # Uses bfloat16 for efficient memory and computation
    trust_remote_code=True      # Required for custom modeling and tokenization
)

# Example chat interaction for a multi-hop question answering task
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France and what is its population?"}
]

# Apply chat template and tokenize inputs for proper Qwen2-based chat formatting
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate response
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_k=20,
    top_p=0.8,
    repetition_penalty=1.05
)

# Decode only the newly generated tokens
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

For more detailed usage, including training, evaluation, and tool integration, please refer to the official GitHub repository.

## Citation

If you find AFM useful in your research or applications, we would appreciate it if you could cite our work:

```bibtex
@misc{li2025chainofagentsendtoendagentfoundation,
      title={Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL},
      author={Weizhen Li and Jianbo Lin and Zhuosong Jiang and Jingyi Cao and Xinpeng Liu and Jiayu Zhang and Zhenqiang Huang and Qianben Chen and Weichen Sun and Qiexiang Wang and Hongxuan Lu and Tianrui Qin and Chenghao Zhu and Yi Yao and Shuying Fan and Xiaowan Li and Tiannan Wang and Pai Liu and King Zhu and He Zhu and Dingfeng Shi and Piaohong Wang and Yeyi Guan and Xiangru Tang and Minghao Liu and Yuchen Eleanor Jiang and Jian Yang and Jiaheng Liu and Ge Zhang and Wangchunshu Zhou},
      year={2025},
      eprint={2508.13167},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.13167},
}
```