Commit 8844c78 by shanghengdu (verified) · 1 parent: 47cab4e

Update README.md

Files changed (1)
  1. README.md +301 -25
README.md CHANGED
@@ -1,27 +1,303 @@
- ---
- title: LLM Agent Optimization PaperList
- emoji: 🧠
- colorFrom: yellow
- colorTo: indigo
- sdk: static
- pinned: false
- license: apache-2.0
- short_description: This is the reading list for the survey "A Survey on the Opt
- ---
-
- # Nerfies
-
- This is the repository that contains source code for the [Nerfies website](https://nerfies.github.io).
-
- If you find Nerfies useful for your work please cite:
- ```
- @article{park2021nerfies
- author = {Park, Keunhong and Sinha, Utkarsh and Barron, Jonathan T. and Bouaziz, Sofien and Goldman, Dan B and Seitz, Steven M. and Martin-Brualla, Ricardo},
- title = {Nerfies: Deformable Neural Radiance Fields},
- journal = {ICCV},
- year = {2021},
  }
  ```
-
- # Website License
- <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
 
+ # LLM-Agent-Optimization
+ This is the reading list for the survey **"A Survey of LLM-based Agents Optimization" ([Paper Link](https://arxiv.org/abs/2503.12434))**, which systematically explores various optimization techniques for enhancing LLM-based agents. The survey categorizes existing works into parameter-driven optimization, parameter-free optimization, datasets and benchmarks, and real-world applications. We will keep adding papers and improving the list. Any suggestions and PRs are welcome!
+
+
+ <div align="center">
+ <img src="https://github.com/user-attachments/assets/7ad2d1e2-17c7-42bc-bcbc-a615209b1a5a" width="50%">
+ </div>
+
+
+ # Parameter-driven Optimization
+
+ ## Conventional Fine-Tuning-based
+
+ - FireAct: Toward Language Agent Fine-tuning (arXiv 2023) [[paper](https://arxiv.org/pdf/2310.05915)] [[code](https://github.com/anchen1011/FireAct)]
+ - AgentTuning: Enabling Generalized Agent Abilities for LLMs (ACL-Findings 2024) [[paper](https://arxiv.org/pdf/2310.12823)] [[code](https://github.com/THUDM/AgentTuning)]
+ - SMART: Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks (arXiv 2024) [[paper](https://arxiv.org/abs/2407.09893)] [[code](https://github.com/yueshengbin/SMART)]
+ - Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models (ACL-Findings 2024) [[paper](https://arxiv.org/abs/2403.12881)] [[code](https://github.com/InternLM/Agent-FLAN)]
+ - Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk (arXiv 2024) [[paper](https://arxiv.org/abs/2401.05033)]
+ - SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales (EMNLP 2024) [[paper](https://arxiv.org/abs/2405.20974)] [[code](https://github.com/tianyang-x/SaySelf)]
+ - AgentGym: Evolving Large Language Model-based Agents across Diverse Environments (arXiv 2024) [[paper](https://arxiv.org/abs/2406.04151)] [[code](https://github.com/WooooDyy/AgentGym)]
+ - Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents (ACL 2024) [[paper](https://aclanthology.org/2024.acl-long.409.pdf)] [[code](https://github.com/Yifan-Song793/ETO)]
+ - Agent LUMOS: Unified and Modular Training for Open-Source Language Agents (ACL 2024) [[paper](https://arxiv.org/pdf/2311.05657v3)] [[code](https://allenai.github.io/lumos/)]
+ - LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error (ACL 2024) [[paper](https://arxiv.org/abs/2403.04746)] [[code](https://github.com/microsoft/simulated-trial-and-error)]
+ - NAT: Learning From Failure: Integrating Negative Examples when Fine-tuning LLMs as Agents (arXiv 2024) [[paper](https://arxiv.org/abs/2402.11651)] [[code](https://github.com/Reason-Wang/NAT)]
+ - OPTIMA: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System (arXiv 2024) [[paper](https://arxiv.org/abs/2410.08115)] [[code](https://chenweize1998.github.io/optima-project-page/)]
+ - Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning (NAACL-Findings 2024) [[paper](https://aclanthology.org/2024.findings-naacl.184/)] [[code](https://github.com/HAIV-Lab/LLM-TMBR)]
+ - AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning (arXiv 2024) [[paper](https://arxiv.org/abs/2402.15506)] [[code](https://github.com/SalesforceAIResearch/xLAM)]
+ - ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving (ICLR 2024) [[paper](https://arxiv.org/abs/2309.17452)] [[code](https://github.com/microsoft/tora)]
+ - ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent (arXiv 2023) [[paper](https://arxiv.org/pdf/2312.10003)]
+ - AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories (ACL-Findings 2024) [[paper](https://arxiv.org/abs/2410.07706)] [[code](https://huggingface.co/datasets/Solaris99/AgentBank)]
+ - AdaSwitch: Adaptive Switching between Small and Large Agents for Effective Cloud-Local Collaborative Learning (EMNLP 2024) [[paper](https://arxiv.org/abs/2410.13181)]
+ - Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement (EMNLP 2024) [[paper](https://arxiv.org/abs/2406.11176)] [[code](https://github.com/WeiminXiong/IPR)]
+ - Re-ReST: Reflection-Reinforced Self-Training for Language Agents (EMNLP 2024) [[paper](https://arxiv.org/abs/2406.01495)] [[code](https://github.com/PlusLabNLP/Re-ReST)]
+ - Retrospex: Language Agent Meets Offline Reinforcement Learning Critic (EMNLP 2024) [[paper](https://aclanthology.org/2024.emnlp-main.268/)] [[code](https://github.com/Yufei-Xiang/Retrospex)]
+ - ATM: Adversarial Tuning Multi-agent System Makes a Robust Retrieval-Augmented Generator (EMNLP 2024) [[paper](https://arxiv.org/abs/2405.18111)] [[code](https://github.com/chuhac/ATM-RAG)]
+ - SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks (NeurIPS 2023) [[paper](https://arxiv.org/abs/2305.17390)] [[code](https://github.com/SwiftSage/SwiftSage)]
+ - NLRL: Natural Language Reinforcement Learning (arXiv 2024) [[paper](https://arxiv.org/abs/2411.14251)] [[code](https://github.com/waterhorse1/Natural-language-RL)]
+ - AGILE: A Novel Reinforcement Learning Framework of LLM Agents (NeurIPS 2024) [[paper](https://arxiv.org/abs/2405.14751)] [[code](https://github.com/bytarnish/AGILE)]
+ - CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation (arXiv 2024) [[paper](https://arxiv.org/abs/2406.07054)] [[code](https://github.com/lirenhao1997/CoEvol)]
+ - E2CL: Exploration-based Error Correction Learning for Embodied Agents (EMNLP-Findings 2024) [[paper](https://aclanthology.org/2024.findings-emnlp.448/)] [[code](https://github.com/WangHanLinHenry/E2CL)]
+ - STeCa: Step-level Trajectory Calibration for LLM Agent Learning (arXiv 2025) [[paper](https://arxiv.org/abs/2502.14276)] [[code](https://github.com/WangHanLinHenry/STeCa)]
+ - Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning (**NeurIPS 2024**) [[paper](https://arxiv.org/abs/2411.14497)] [[code](https://github.com/CANGLETIAN/Star-Agents)]
+ - ATLaS: Agent Tuning via Learning Critical Steps (**ACL 2025**) [[paper](https://arxiv.org/abs/2503.02197)]
+ - Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies (**arXiv 2025**) [[paper](https://arxiv.org/pdf/2502.02533)]
+ - Agent Planning with World Knowledge Model (**NeurIPS 2024**) [[paper](https://arxiv.org/pdf/2405.14205)] [[code](https://github.com/zjunlp/WKM)]
+ - Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains (**ICLR 2025**) [[paper](https://arxiv.org/abs/2501.05707)] [[code](https://github.com/vsubramaniam851/multiagent-ft)]
+ - Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning (**ACL 2025**) [[paper](https://arxiv.org/pdf/2412.14780)]
+
+ ## Reinforcement Learning-based
+
+ - CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models (arXiv 2024) [[paper](https://arxiv.org/pdf/2404.01663)] [[code](https://github.com/heimy2000/CMAT)]
+ - From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning (arXiv 2024) [[paper](https://arxiv.org/abs/2411.03817)]
+ - WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning (arXiv 2024) [[paper](https://arxiv.org/abs/2411.02337)] [[code](https://github.com/THUDM/WebRL)]
+ - SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales (EMNLP 2024) [[paper](https://arxiv.org/abs/2405.20974)] [[code](https://github.com/tianyang-x/SaySelf)]
+ - AgentGym: Evolving Large Language Model-based Agents across Diverse Environments (arXiv 2024) [[paper](https://arxiv.org/abs/2406.04151)] [[code](https://github.com/WooooDyy/AgentGym)]
+ - Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning (arXiv 2024) [[paper](https://arxiv.org/abs/2410.06101)]
+ - GELI: Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents (EMNLP 2024) [[paper](https://aclanthology.org/2024.emnlp-main.881/)]
+ - AGILE: A Novel Reinforcement Learning Framework of LLM Agents (NeurIPS 2024) [[paper](https://arxiv.org/abs/2405.14751)] [[code](https://github.com/bytarnish/AGILE)]
+ - Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents (arXiv 2024) [[paper](https://arxiv.org/abs/2408.07199)]
+ - DMPO: Direct Multi-Turn Preference Optimization for Language Agents (EMNLP 2024) [[paper](https://arxiv.org/abs/2406.14868)] [[code](https://github.com/swt-user/DMPO)]
+ - Re-ReST: Reflection-Reinforced Self-Training for Language Agents (EMNLP 2024) [[paper](https://arxiv.org/abs/2406.01495)] [[code](https://github.com/PlusLabNLP/Re-ReST)]
+ - ATM: Adversarial Tuning Multi-agent System Makes a Robust Retrieval-Augmented Generator (EMNLP 2024) [[paper](https://arxiv.org/abs/2405.18111)] [[code](https://github.com/chuhac/ATM-RAG)]
+ - OPTIMA: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System (arXiv 2024) [[paper](https://arxiv.org/abs/2410.08115)] [[code](https://chenweize1998.github.io/optima-project-page/)]
+ - EPO: Hierarchical LLM Agents with Environment Preference Optimization (EMNLP 2024) [[paper](https://arxiv.org/abs/2408.16090)] [[code](https://github.com/kevinz8866/EPO)]
+ - Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement (EMNLP 2024) [[paper](https://arxiv.org/abs/2406.11176)] [[code](https://github.com/WeiminXiong/IPR)]
+ - AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback (NeurIPS 2024) [[paper](https://arxiv.org/abs/2402.01469)] [[code](https://github.com/JianGuanTHU/AMOR)]
+ - SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks (**arXiv 2025**) [[paper](https://arxiv.org/pdf/2503.15478)] [[code](https://github.com/facebookresearch/sweet_rl)]
+ - Reinforcing Language Agents via Policy Optimization with Action Decomposition (**NeurIPS 2024**) [[paper](https://arxiv.org/abs/2405.15821)] [[code](https://github.com/morning9393/ADRL)]
+ - STeCa: Step-level Trajectory Calibration for LLM Agent Learning (arXiv 2025) [[paper](https://arxiv.org/abs/2502.14276)] [[code](https://github.com/WangHanLinHenry/STeCa)]
+ - DITS: Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search (**arXiv 2025**) [[paper](https://arxiv.org/abs/2502.00955)]
+ - EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning (**ACL 2025**) [[paper](https://arxiv.org/abs/2502.12486)] [[code](https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/EPO)]
+ - DAPO: Decoupled Clip and Dynamic Sampling Policy Optimization (**arXiv 2025**) [[paper](https://arxiv.org/pdf/2503.14476)]
+ - MARFT: Multi-Agent Reinforcement Fine-Tuning (**arXiv 2025**) [[paper](https://arxiv.org/pdf/2504.16129)] [[code](https://github.com/jwliao-ai/MARFT)]
+
+ ## Hybrid Fine-Tuning Optimization
+
+ - ReFT: Reasoning with Reinforced Fine-Tuning (ACL 2024) [[paper](https://arxiv.org/abs/2401.08967)] [[code](https://github.com/lqtrung1998/mwp_ReFT)]
+ - AgentGym: Evolving Large Language Model-based Agents across Diverse Environments (arXiv 2024) [[paper](https://arxiv.org/abs/2406.04151)] [[code](https://github.com/WooooDyy/AgentGym)]
+ - AGILE: A Novel Reinforcement Learning Framework of LLM Agents (NeurIPS 2024) [[paper](https://arxiv.org/abs/2405.14751)] [[code](https://github.com/bytarnish/AGILE)]
+ - Re-ReST: Reflection-Reinforced Self-Training for Language Agents (EMNLP 2024) [[paper](https://arxiv.org/abs/2406.01495)] [[code](https://github.com/PlusLabNLP/Re-ReST)]
+ - AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback (NeurIPS 2024) [[paper](https://arxiv.org/abs/2402.01469)] [[code](https://github.com/JianGuanTHU/AMOR)]
+ - Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents (ACL 2024) [[paper](https://aclanthology.org/2024.acl-long.409.pdf)] [[code](https://github.com/Yifan-Song793/ETO)]
+ - OPTIMA: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System (arXiv 2024) [[paper](https://arxiv.org/abs/2410.08115)] [[code](https://chenweize1998.github.io/optima-project-page/)]
+ - Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement (EMNLP 2024) [[paper](https://arxiv.org/abs/2406.11176)] [[code](https://github.com/WeiminXiong/IPR)]
+ - Retrospex: Language Agent Meets Offline Reinforcement Learning Critic (EMNLP 2024) [[paper](https://aclanthology.org/2024.emnlp-main.268/)] [[code](https://github.com/Yufei-Xiang/Retrospex)]
+ - ENVISIONS: Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models (arXiv 2024) [[paper](https://arxiv.org/abs/2406.11736)] [[code](https://github.com/xufangzhi/ENVISIONS)]
+ - DITS: Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search (**arXiv 2025**) [[paper](https://arxiv.org/abs/2502.00955)]
+
+ # Parameter-Free Optimization
+
+ ## Experience-based
+
+ - Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks (NeurIPS 2024) [[paper](https://arxiv.org/abs/2408.03615)] [[code](https://cybertronagent.github.io/Optimus-1.github.io/)]
+ - Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents (arXiv 2024) [[paper](https://arxiv.org/abs/2405.02957)]
+ - ExpeL: LLM Agents Are Experiential Learners (AAAI 2024) [[paper](https://arxiv.org/abs/2308.10144)] [[code](https://github.com/LeapLabTHU/ExpeL)]
+ - AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning (NeurIPS 2024) [[paper](https://arxiv.org/abs/2405.16247)] [[code](https://github.com/minghchen/automanual)]
+ - AutoGuide: Automated Generation and Selection of Context-Aware Guidelines for Large Language Model Agents (NeurIPS 2024) [[paper](https://arxiv.org/abs/2403.08978)]
+ - Experiential Co-Learning of Software-Developing Agents (ACL 2024) [[paper](https://arxiv.org/abs/2312.17025)] [[code](https://github.com/OpenBMB/ChatDev)]
+
+ ## Feedback-based
+
+ - Reflexion: Language Agents with Verbal Reinforcement Learning (NeurIPS 2023) [[paper](https://arxiv.org/pdf/2303.11366)] [[code](https://github.com/noahshinn/reflexion)]
+ - QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback based Self-Correction (ACL 2024) [[paper](https://arxiv.org/abs/2403.11886)] [[code](https://github.com/cdhx/QueryAgent)]
+ - Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization (ACL 2024) [[paper](https://arxiv.org/abs/2402.17574)] [[code](https://github.com/zwq2018/Agent-Pro)]
+ - SAGE: Self-Evolving Agents with Reflective and Memory-Augmented Abilities (arXiv 2024) [[paper](https://arxiv.org/abs/2409.00872)]
+ - ReCon: Boosting LLM Agents with Recursive Contemplation for Effective Deception Handling (ACL-Findings 2024) [[paper](https://openreview.net/pdf?id=LO-NO1-PwJR)]
+ - Symbolic Learning Enables Self-Evolving Agents (arXiv 2024) [[paper](https://arxiv.org/abs/2406.18532)] [[code](https://github.com/aiwaves-cn/agents)]
+ - COPPER: Reflective Multi-Agent Collaboration based on Large Language Models (NeurIPS 2024) [[paper](https://neurips.cc/virtual/2024/poster/93147)]
+ - MetaReflection: Learning Instructions for Language Agents using Past Reflections (EMNLP 2024) [[paper](https://arxiv.org/abs/2405.13009)] [[code](https://github.com/microsoft/prose/tree/main/misc/MetaReflection)]
+ - InteRecAgent: Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations (arXiv 2023) [[paper](https://arxiv.org/abs/2308.16505)] [[code](https://github.com/microsoft/RecAI/tree/main/InteRecAgent)]
+ - NLRL: Natural Language Reinforcement Learning (arXiv 2024) [[paper](https://arxiv.org/abs/2411.14251)] [[code](https://github.com/waterhorse1/Natural-language-RL)]
+ - Chain-of-Experts: When LLMs Meet Complex Operation Research Problems (ICLR 2024) [[paper](https://openreview.net/forum?id=HobyL1B9CZ)] [[code](https://github.com/xzymustbexzy/Chain-of-Experts)]
+ - Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization (arXiv 2024) [[paper](https://arxiv.org/abs/2308.02151)]
+ - Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching (arXiv 2024) [[paper](https://arxiv.org/pdf/2406.06326)]
+ - OPRO: Large Language Models as Optimizers (ICLR 2024) [[paper](https://arxiv.org/abs/2309.03409)] [[code](https://github.com/google-deepmind/opro)]
+ - MPO: Boosting LLM Agents with Meta Plan Optimization (**arXiv 2025**) [[paper](https://arxiv.org/abs/2503.02682)]
+
+ ## Tool-based
+
+ - Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (**EMNLP 2024**) [[paper](https://aclanthology.org/2024.emnlp-main.436/)] [[code](https://github.com/OSU-NLP-Group/Middleware)]
+ - AVATAR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval (**NeurIPS 2024**) [[paper](https://arxiv.org/abs/2406.11200)] [[code](https://github.com/zou-group/avatar)]
+ - AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning (**ACL 2024**) [[paper](https://arxiv.org/abs/2401.05268)] [[code](https://github.com/zjunlp/AutoAct)]
+ - TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage (**arXiv 2023**) [[paper](https://arxiv.org/abs/2308.03427)]
+ - Lyra: Orchestrating Dual Correction in Automated Theorem Proving (**TMLR 2024**) [[paper](https://arxiv.org/abs/2309.15806)] [[code](https://github.com/chuanyang-zheng/lyra-theorem-prover)]
+ - Offline Training of Language Model Agents with Functions as Learnable Weights (**arXiv 2024**) [[paper](https://arxiv.org/abs/2402.11359)]
+ - VideoAgent: A Memory-Augmented Multimodal Agent for Video Understanding (**ECCV 2024**) [[paper](https://arxiv.org/pdf/2403.11481)] [[code](https://videoagent.github.io)]
+ - ALITA: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution (**arXiv 2025**) [[paper](https://arxiv.org/abs/2505.20286)] [[code](https://github.com/CharlesQ9/Alita)]
+ - Search-o1: Agentic Search-Enhanced Large Reasoning Models (**arXiv 2025**) [[paper](https://arxiv.org/pdf/2501.05366)] [[code](https://github.com/sunnynexus/Search-o1)]
+
+ ## RAG-based
+
+ - Crafting Personalized Agents through Retrieval-Augmented Generation on Editable Memory Graphs (**EMNLP 2024**) [[paper](https://arxiv.org/abs/2409.19401)]
+ - RaDA: Retrieval-augmented Web Agent Planning with LLMs (**ACL-Findings 2024**) [[paper](https://aclanthology.org/2024.findings-acl.802/)] [[code](https://github.com/ldilab/RaDA)]
+ - AutoRAG: Automated Framework for Optimization of Retrieval Augmented Generation Pipeline (**arXiv 2024**) [[paper](https://arxiv.org/abs/2410.20878)] [[code](https://github.com/Marker-Inc-Korea/AutoRAG_ARAGOG_Paper)]
+ - RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents (**arXiv 2024**) [[paper](https://arxiv.org/abs/2402.03610)] [[code](https://github.com/PanasonicConnect/rap)]
+ - MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance (**arXiv 2024**) [[paper](https://arxiv.org/abs/2408.01869v1)] [[code](https://github.com/jihyechoi77/malade)]
+ - PaperQA: Retrieval-Augmented Generative Agent for Scientific Research (**arXiv 2023**) [[paper](https://arxiv.org/abs/2312.07559)] [[code](https://github.com/future-house/paper-qa)]
+ - Search-o1: Agentic Search-Enhanced Large Reasoning Models (**arXiv 2025**) [[paper](https://arxiv.org/pdf/2501.05366)] [[code](https://github.com/sunnynexus/Search-o1)]
+
+ ## Multi-Agent
+
+ - CAPO: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation (arXiv 2024) [[paper](https://arxiv.org/abs/2411.04679)]
+ - A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions via Iterative Refinement and LLM-Driven Feedback Loops (arXiv 2024) [[paper](https://arxiv.org/abs/2412.17149)] [[code](https://anonymous.4open.science/r/evolver-1D11/)]
+ - Training Agents with Weakly Supervised Feedback from Large Language Models (arXiv 2024) [[paper](https://arxiv.org/abs/2411.19547)]
+ - ChatDev: Communicative Agents for Software Development (**arXiv 2023**) [[paper](https://arxiv.org/abs/2307.07924)] [[code](https://github.com/OpenBMB/ChatDev)]
+ - MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (**arXiv 2023**) [[paper](https://arxiv.org/abs/2308.00352)] [[code](https://github.com/geekan/MetaGPT)]
+ - MapCoder: Multi-Agent Code Generation for Competitive Problem Solving (**arXiv 2024**) [[paper](https://arxiv.org/abs/2405.11403)] [[code](https://github.com/Md-Ashraful-Pramanik/MapCoder)]
+ - A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration (**COLM 2024**) [[paper](https://arxiv.org/abs/2310.02170)] [[code](https://github.com/SALT-NLP/DyLAN)]
+ - Scaling Large-Language-Model-based Multi-Agent Collaboration (**arXiv 2024**) [[paper](https://arxiv.org/abs/2406.07155)] [[code](https://github.com/OpenBMB/ChatDev)]
+ - AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors (**arXiv 2023**) [[paper](https://arxiv.org/abs/2308.10848)] [[code](https://github.com/OpenBMB/AgentVerse)]
+ - SMoA: Improving Multi-Agent Large Language Models with Sparse Mixture-of-Agents (**arXiv 2024**) [[paper](https://arxiv.org/abs/2411.03284)] [[code](https://github.com/David-Li0406/SMoA)]
+ - Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate (**EMNLP 2024**) [[paper](https://arxiv.org/abs/2305.19118)] [[code](https://github.com/Skytliang/Multi-Agents-Debate)]
+ - AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation (**arXiv 2023**) [[paper](https://arxiv.org/abs/2308.08155)] [[code](https://github.com/microsoft/autogen)]
+
+ # Datasets and Benchmarks
+
+ ## Datasets and Benchmarks for Evaluation
+
+ ### General Evaluation Tasks
+
+ - WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents (NeurIPS 2022) [[paper](https://arxiv.org/abs/2207.01206)] [[code](https://webshop-pnlp.github.io/)]
+ - WebArena: A Realistic Web Environment for Building Autonomous Agents (arXiv 2024) [[paper](https://arxiv.org/abs/2307.13854)] [[code](https://webarena.dev/)]
+ - Mind2Web: Towards a Generalist Agent for the Web (NeurIPS 2023 Spotlight) [[paper](https://arxiv.org/abs/2306.06070)] [[code](https://osu-nlp-group.github.io/Mind2Web/)]
+ - Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration (ICLR 2018) [[paper](https://arxiv.org/abs/1802.08802)] [[code](https://github.com/Farama-Foundation/miniwob-plusplus)]
+ - ScienceWorld: Is your Agent Smarter than a 5th Grader? (EMNLP 2022) [[paper](https://aclanthology.org/2022.emnlp-main.775/)] [[code](https://sciworld.apps.allenai.org/)]
+ - ALFWorld: Aligning Text and Embodied Environments for Interactive Learning (ICLR 2021) [[paper](https://arxiv.org/abs/2010.03768)] [[code](https://alfworld.github.io/)]
+ - Building Cooperative Embodied Agents Modularly with Large Language Models (ICLR 2024) [[paper](https://arxiv.org/abs/2307.02485)] [[code](https://vis-www.cs.umass.edu/Co-LLM-Agents/)]
+ - ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks (CVPR 2020) [[paper](https://arxiv.org/abs/1912.01734)] [[code](https://askforalfred.com/)]
+ - RLCard: A Toolkit for Reinforcement Learning in Card Games (AAAI-Workshop 2020) [[paper](https://arxiv.org/abs/1910.04376)] [[code](https://github.com/datamllab/rlcard)]
+ - OpenSpiel: A Framework for Reinforcement Learning in Games (arXiv 2019) [[paper](https://arxiv.org/abs/1908.09453)] [[code](https://github.com/google-deepmind/open_spiel)]
+ - HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (EMNLP 2018) [[paper](https://arxiv.org/abs/1809.09600)] [[code](https://hotpotqa.github.io/)]
+ - StrategyQA: Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies (TACL 2021) [[paper](https://arxiv.org/abs/2101.02235)] [[code](https://github.com/eladsegal/strategyqa)]
+ - MMLU: Measuring Massive Multitask Language Understanding (ICLR 2021) [[paper](https://arxiv.org/abs/2009.03300)] [[code](https://github.com/hendrycks/test)]
+ - TruthfulQA: Measuring How Models Mimic Human Falsehoods (ACL 2022) [[paper](https://arxiv.org/abs/2109.07958)] [[code](https://github.com/sylinrl/TruthfulQA)]
+ - TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension (ACL 2017) [[paper](https://aclanthology.org/P17-1147/)]
+ - PubMedQA: A Dataset for Biomedical Research Question Answering (EMNLP 2019) [[paper](https://arxiv.org/abs/1909.06146)] [[code](https://pubmedqa.github.io/)]
+ - MuSiQue: Multihop Questions via Single-hop Question Composition (TACL 2022) [[paper](https://arxiv.org/abs/2108.00573)] [[code](https://github.com/stonybrooknlp/musique)]
+ - Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps (COLING 2020) [[paper](https://arxiv.org/abs/2011.01060)] [[code](https://github.com/Alab-NII/2wikimultihop)]
+ - A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers (NAACL 2021) [[paper](https://arxiv.org/abs/2105.03011)] [[code](https://huggingface.co/datasets/allenai/qasper)]
+ - Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge (arXiv 2018) [[paper](https://arxiv.org/abs/1803.05457)] [[code](https://huggingface.co/datasets/allenai/ai2_arc)]
+ - Training Verifiers to Solve Math Word Problems (arXiv 2021) [[paper](https://arxiv.org/abs/2110.14168)] [[code](https://openai.com/index/solving-math-word-problems/)]
+ - A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers (ACL 2020) [[paper](https://arxiv.org/abs/2106.15772)] [[code](https://github.com/chaochun/nlu-asdiv-dataset)]
+ - SVAMP: Are NLP Models Really Able to Solve Simple Math Word Problems? (NAACL 2021) [[paper](https://arxiv.org/abs/2103.07191)] [[code](https://github.com/arkilpatel/SVAMP)]
+ - Measuring Mathematical Problem Solving with the MATH Dataset (NeurIPS 2021) [[paper](https://arxiv.org/abs/2103.03874)] [[code](https://github.com/hendrycks/math/)]
+ - T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step (ACL 2024) [[paper](https://arxiv.org/abs/2312.14033)] [[code](https://open-compass.github.io/T-Eval/)]
+ - ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs (ICLR 2024) [[paper](https://arxiv.org/abs/2307.16789v2)] [[code](https://github.com/OpenBMB/ToolBench)]
+ - MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback (ICLR 2024) [[paper](https://arxiv.org/abs/2309.10691)] [[code](https://xwang.dev/mint-bench/)]
+ - API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs (EMNLP 2023) [[paper](https://arxiv.org/abs/2304.08244)] [[code](https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/api-bank)]
+ - A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge (ECCV 2022) [[paper](https://arxiv.org/abs/2206.01718)] [[code](https://github.com/allenai/aokvqa)]
+ - Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering (NeurIPS 2022) [[paper](https://arxiv.org/abs/2209.09513)] [[code](https://scienceqa.github.io/)]
+ - VQA: Visual Question Answering (ICCV 2015) [[paper](https://arxiv.org/abs/1505.00468)] [[code](https://visualqa.org/)]
+ - EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding (NeurIPS 2023) [[paper](https://arxiv.org/abs/2308.09126)] [[code](https://egoschema.github.io/)]
+ - NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR 2021) [[paper](https://arxiv.org/abs/2105.08276)] [[code](https://github.com/doc-doc/NExT-QA)]
+ - SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (ICLR 2024) [[paper](https://arxiv.org/abs/2310.06770)] [[code](https://github.com/swe-bench/SWE-bench)]
+ - Evaluating Large Language Models Trained on Code (arXiv 2021) [[paper](https://arxiv.org/abs/2107.03374)] [[code](https://github.com/openai/human-eval)]
+ - LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code (arXiv 2024) [[paper](https://arxiv.org/abs/2403.07974v2)] [[code](https://github.com/LiveCodeBench/LiveCodeBench)]
+ - Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs (NeurIPS 2023) [[paper](https://arxiv.org/abs/2305.03111)] [[code](https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/bird)]
+ - InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback (NeurIPS 2023) [[paper](https://arxiv.org/abs/2306.14898)] [[code](https://github.com/princeton-nlp/intercode)]
+
200
+ ### Multi-task Benchmarks
201
+
202
+ - AgentBench: Evaluating LLMs as Agents (**ICLR** **2024**) [[paper](https://arxiv.org/abs/2308.03688)] [[code](https://github.com/THUDM/AgentBench)]
203
+ - AgentGym: Evolving Large Language Model-based Agents across Diverse Environments (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2406.04151)] [[code](https://github.com/WooooDyy/AgentGym)]
+ - Just-Eval: The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning (**ICLR** **2024**) [[paper](https://arxiv.org/abs/2312.01552)] [[code](https://github.com/Re-Align/just-eval)]
+ - StreamBench: Towards Benchmarking Continuous Improvement of Language Agents (**NeurIPS 2024**) [[paper](https://arxiv.org/abs/2406.08747)] [[code](https://github.com/stream-bench/stream-bench)]
+ - AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agent (**NeurIPS 2024**) [[paper](https://arxiv.org/abs/2401.13178)] [[code](https://github.com/hkust-nlp/AgentBoard)]
+ - GAIA: A Benchmark for General AI Assistants (**arXiv 2023**) [[paper](https://arxiv.org/abs/2311.12983)] [[code](https://huggingface.co/gaia-benchmark)]
+ - Humanity’s Last Exam (**arXiv 2025**) [[paper](https://arxiv.org/abs/2501.14249)] [[code](https://huggingface.co/datasets/cais/hle)]
+ - NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls (**arXiv 2024**) [[paper](https://arxiv.org/abs/2409.03797v3)] [[code](https://github.com/IBM/NESTFUL)]
+ - MCP_RADAR: Multi-Dimensional Benchmark for Evaluating Tool Use Capabilities in Large Language Models (**arXiv 2025**) [[paper](https://arxiv.org/pdf/2505.16700)] [[code](https://anonymous.4open.science/r/MCPRadar-B143)]
+
+ # Application
+
+ ## Healthcare
+
+ - Med-PaLM: Large language models encode clinical knowledge (**Nature 2023**) [[paper](https://arxiv.org/abs/2212.13138)]
+ - DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task (**arXiv** **2023**) [[paper](https://arxiv.org/abs/2304.01097)] [[code](https://github.com/xionghonglin/DoctorGLM)]
+ - BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT (**arXiv** **2023**) [[paper](https://arxiv.org/abs/2310.15896)] [[code](https://github.com/scutcyr/BianQue)]
+ - DISC-MedLLM: Bridging General Large Language Models and Real-World Medical Consultation (**arXiv** **2023**) [[paper](https://arxiv.org/abs/2308.14346)] [[code](https://github.com/FudanDISC/DISC-MedLLM)]
+ - ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning (**BCB 2024**) [[paper](https://dl.acm.org/doi/abs/10.1145/3698587.3701359)] [[code](https://github.com/LeoYML/clinical-agent)]
+ - MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning (**ACL-findings 2024**) [[paper](https://aclanthology.org/2024.findings-acl.33/)] [[code](https://github.com/gersteinlab/MedAgents)]
+ - MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making (**NeurIPS 2024**) [[paper](https://arxiv.org/abs/2404.15155)] [[code](https://github.com/mitmedialab/MDAgents)]
+ - Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2405.02957)]
+ - AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator (**COLING 2025**) [[paper](https://arxiv.org/abs/2402.09742)] [[code](https://github.com/LibertFan/AI_Hospital)]
+ - KG4Diagnosis: A Hierarchical Multi-Agent LLM Framework with Knowledge Graph Enhancement for Medical Diagnosis (**AAAI-25 Bridge Program**) [[paper](https://arxiv.org/abs/2412.16833)]
+ - AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2402.13225)] [[code](https://github.com/ncbi-nlp/Clinical-Tool-Learning)]
+ - MMedAgent: Learning to Use Medical Tools with Multi-modal Agent (**EMNLP-findings 2024**) [[paper](https://arxiv.org/abs/2407.02483)] [[code](https://github.com/Wangyixinxin/MMedAgent)]
+ - HuatuoGPT-o1: Towards Medical Complex Reasoning with LLMs (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2412.18925)] [[code](https://github.com/FreedomIntelligence/HuatuoGPT-o1)]
+ - IIMedGPT: Promoting Large Language Model Capabilities of Medical Tasks by Efficient Human Preference Alignment (**arXiv** **2025**) [[paper](https://arxiv.org/abs/2501.02869)]
+
+ ## Science
+
+ - CellAgent: An LLM-driven Multi-Agent Framework for Automated Single-cell Data Analysis (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2407.09811)] [[code](https://github.com/lsq2wal/CellAgent)]
+ - BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2405.17631)] [[code](https://github.com/snap-stanford/BioDiscoveryAgent/)]
+ - ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning (**Digital Discovery 2024**) [[paper](https://arxiv.org/abs/2402.04268)] [[code](https://github.com/lamm-mit/ProtAgents)]
+ - CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2404.18021)] [[code](https://github.com/cong-lab/crispr-gpt-pub)]
+ - ChemCrow: Augmenting Large-Language Models with Chemistry Tools (**Nature** **Machine Intelligence** **2024**) [[paper](https://arxiv.org/abs/2304.05376)] [[code](https://github.com/ur-whitelab/chemcrow-public)]
+ - DrugAssist: A Large Language Model for Molecule Optimization (**Briefings in Bioinformatics 2024**) [[paper](https://arxiv.org/abs/2401.10334)] [[code](https://github.com/blazerye/DrugAssist)]
+ - Agent-based Learning of Materials Datasets from Scientific Literature (**Digital Discovery 2024**) [[paper](https://arxiv.org/abs/2312.11690)] [[code](https://github.com/AI4ChemS/Eunomia)]
+ - DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2411.15692)] [[code](https://github.com/anrohanro/DrugAgent)]
+ - MProt-DPO: Breaking the ExaFLOPS Barrier for Multimodal Protein Design Workflows with Direct Preference Optimization (**SC 2024**) [[paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00013)]
+ - Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2410.09403)] [[code](https://github.com/open-sciencelab/Virtual-Scientists)]
+ - SciAgent: Tool-augmented Language Models for Scientific Reasoning (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2402.11451)]
+
+ ## Embodied Intelligence
+
+ - Building Cooperative Embodied Agents Modularly with Large Language Models (**ICLR** **2024**) [[paper](https://arxiv.org/abs/2307.02485)] [[code](https://vis-www.cs.umass.edu/Co-LLM-Agents/)]
+ - Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (**PMLR 2023**) [[paper](https://arxiv.org/abs/2204.01691)] [[code](https://github.com/google-research/google-research/tree/master/saycan)]
+ - RoCo: Dialectic Multi-Robot Collaboration with Large Language Models (**ICRA 2024**) [[paper](https://arxiv.org/abs/2307.04738)] [[code](https://github.com/MandiZhao/robot-collab)]
+ - Voyager: An Open-Ended Embodied Agent with Large Language Models (**arXiv** **2023**) [[paper](https://arxiv.org/abs/2305.16291)] [[code](https://github.com/MineDojo/Voyager)]
+ - MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World (**CVPR** **2024**) [[paper](https://arxiv.org/abs/2401.08577)] [[code](https://github.com/UMass-Foundation-Model/MultiPLY)]
+ - Retrospex: Language Agent Meets Offline Reinforcement Learning Critic (**EMNLP 2024**) [[paper](https://aclanthology.org/2024.emnlp-main.268/)] [[code](https://github.com/Yufei-Xiang/Retrospex)]
+ - EPO: Hierarchical LLM Agents with Environment Preference Optimization (**EMNLP 2024**) [[paper](https://arxiv.org/abs/2408.16090)] [[code](https://github.com/kevinz8866/EPO)]
+ - AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning (**NeurIPS 2024**) [[paper](https://arxiv.org/abs/2405.16247)] [[code](https://github.com/minghchen/automanual)]
+ - MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making (**EMNLP 2024**) [[paper](https://arxiv.org/abs/2409.16686)]
+ - iVideoGPT: Interactive VideoGPTs are Scalable World Models (**NeurIPS 2024**) [[paper](https://arxiv.org/abs/2405.15223)] [[code](https://github.com/thuml/iVideoGPT)]
+ - AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2401.12963)]
+ - Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld (**CVPR** **2024**) [[paper](https://arxiv.org/abs/2311.16714)] [[code](https://github.com/stevenyangyj/Emma-Alfworld)]
+
+ ## Finance
+
+ - Large Language Model Agent in Financial Trading: A Survey (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2408.06361)]
+ - TradingGPT: Multi-Agent System with Layered Memory and Distinct Characters for Enhanced Financial Trading Performance (**arXiv** **2023**) [[paper](https://arxiv.org/abs/2309.03736)]
+ - FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design (**AAAI-SS**) [[paper](https://arxiv.org/abs/2311.13743)] [[code](https://github.com/pipiku915/FinMem-LLM-StockTrading)]
+ - A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist (**SIGKDD 2024**) [[paper](https://arxiv.org/abs/2402.18485)]
+ - Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models (**WWW 2024**) [[paper](https://arxiv.org/abs/2402.03659)] [[code](https://github.com/koa-fin/sep)]
+ - FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making (**NeurIPS 2024**) [[paper](https://arxiv.org/abs/2407.06567)]
+ - TradingAgents: Multi-Agents LLM Financial Trading Framework (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2412.20138)] [[code](https://github.com/TradingAgents-AI/TradingAgents)]
+ - FinVision: A Multi-Agent Framework for Stock Market Prediction (**ICAIF 2024**) [[paper](https://arxiv.org/abs/2411.08899)]
+ - Simulating Financial Market via Large Language Model-based Agents (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2406.19966)]
+ - FinVerse: An Autonomous Agent System for Versatile Financial Analysis (**arXiv 2024**) [[paper](https://arxiv.org/abs/2406.06379)]
+ - FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2405.14767)] [[code](https://github.com/AI4Finance-Foundation/FinRobot)]
+
+ ## Programming
+
+ - Agents in Software Engineering: Survey, Landscape, and Vision (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2409.09030)] [[code](https://github.com/DeepSoftwareAnalytics/Awesome-Agent4SE)]
+ - Large Language Model-Based Agents for Software Engineering: A Survey (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2409.02977)] [[code](https://github.com/FudanSELab/Agent4SE-Paper-List)]
+ - ChatDev: Communicative Agents for Software Development (**ACL 2024**) [[paper](https://arxiv.org/abs/2307.07924)] [[code](https://github.com/OpenBMB/ChatDev)]
+ - MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (**ICLR** **2024**) [[paper](https://arxiv.org/abs/2308.00352)] [[code](https://github.com/geekan/MetaGPT)]
+ - MapCoder: Multi-Agent Code Generation for Competitive Problem Solving (**ACL** **2024**) [[paper](https://arxiv.org/abs/2405.11403)] [[code](https://github.com/Md-Ashraful-Pramanik/MapCoder)]
+ - Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2404.02183)] [[code](https://github.com/tsukushiAI/self-organized-agent)]
+ - Multi-Agent Software Development through Cross-Team Collaboration (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2406.08979)] [[code](https://github.com/OpenBMB/ChatDev)]
+ - SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering (**NeurIPS 2024**) [[paper](http://arxiv.org/abs/2405.15793)] [[code](https://swe-agent.com/latest/)]
+ - CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges (**ACL** **2024**) [[paper](https://arxiv.org/abs/2401.07339)]
+ - AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation (**arXiv** **2023**) [[paper](https://arxiv.org/abs/2312.13010)] [[code](https://github.com/huangd1999/AgentCoder)]
+ - RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2410.02089)]
+ - Lemur: Harmonizing Natural Language and Code for Language Agents (**ICLR** **2024**) [[paper](https://arxiv.org/abs/2310.06830)] [[code](https://github.com/OpenLemur/Lemur)]
+ - AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology (**FORGE 2025**) [[paper](https://arxiv.org/abs/2406.11912)] [[code](https://github.com/FSoft-AI4Code/AgileCoder)]
+
+ ## 📄 Citation
+
+ If you find this project or the related paper helpful, please consider citing our work:
+
+ **A Survey on the Optimization of Large Language Model-based Agents** 📚 [arXiv:2503.12434](https://arxiv.org/abs/2503.12434)
+
+ ```bibtex
+ @article{du2025survey,
+ title={A Survey on the Optimization of Large Language Model-based Agents},
+ author={Du, Shangheng and Zhao, Jiabao and Shi, Jinxin and Xie, Zhentao and Jiang, Xin and Bai, Yanhong and He, Liang},
+ journal={arXiv preprint arXiv:2503.12434},
+ year={2025}
  }
  ```