Spaces:

shanghengdu
/

LLM-Agent-Optimization-PaperList


shanghengdu committed on Jul 6

Commit

8844c78

1 Parent(s): 47cab4e

Update README.md

Files changed (1)

README.md +301 -25

@@ -1,27 +1,303 @@
----
-title: LLM Agent Optimization PaperList
-emoji: 🧠
-colorFrom: yellow
-colorTo: indigo
-sdk: static
-pinned: false
-license: apache-2.0
-short_description: This is the reading list for the survey "A Survey on the Opt
----
-# Nerfies
-This is the repository that contains source code for the [Nerfies website](https://nerfies.github.io).
-If you find Nerfies useful for your work please cite:
-```
-@article{park2021nerfies
-  author    = {Park, Keunhong and Sinha, Utkarsh and Barron, Jonathan T. and Bouaziz, Sofien and Goldman, Dan B and Seitz, Steven M. and Martin-Brualla, Ricardo},
-  title     = {Nerfies: Deformable Neural Radiance Fields},
-  journal   = {ICCV},
-  year      = {2021},
 }
 ```
-# Website License
-<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.

+# LLM-Agent-Optimization
+This is the reading list for the survey **"A Survey on the Optimization of Large Language Model-based Agents" ([Paper Link](https://arxiv.org/abs/2503.12434))**, which systematically reviews optimization techniques for LLM-based agents. The survey groups existing work into parameter-driven optimization, parameter-free optimization, datasets and benchmarks, and real-world applications. We will keep adding papers and improving the list. Suggestions and PRs are welcome!
+<div align="center">
+  <img src="https://github.com/user-attachments/assets/7ad2d1e2-17c7-42bc-bcbc-a615209b1a5a" width="50%">
+</div>
+# Parameter-driven Optimization
+## Conventional Fine-Tuning-based
+- FireAct: Toward Language Agent Fine-Tuning (arXiv 2023) [[paper](https://arxiv.org/pdf/2310.05915)] [[code](https://github.com/anchen1011/FireAct)]
+- AgentTuning: Enabling Generalized Agent Abilities for LLMs (ACL-findings 2024) [[paper](https://arxiv.org/pdf/2310.12823)] [[code](https://github.com/THUDM/AgentTuning)]
+- SMART: Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks (arXiv 2024) [[paper](https://arxiv.org/abs/2407.09893)] [[code](https://github.com/yueshengbin/SMART)]
+- Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models (ACL-findings 2024) [[paper](https://arxiv.org/abs/2403.12881)] [[code](https://github.com/InternLM/Agent-FLAN)]
+- Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk (arXiv 2024) [[paper](https://arxiv.org/abs/2401.05033)]
+- SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales (EMNLP 2024) [[paper](https://arxiv.org/abs/2405.20974)] [[code](https://github.com/tianyang-x/SaySelf)]
+- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments (arXiv 2024) [[paper](https://arxiv.org/abs/2406.04151)] [[code](https://github.com/WooooDyy/AgentGym)]
+- Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents (ACL 2024) [[paper](https://aclanthology.org/2024.acl-long.409.pdf)] [[code](https://github.com/Yifan-Song793/ETO)]
+- Agent LUMOS: Unified and Modular Training for Open-Source Language Agents (ACL 2024) [[paper](https://arxiv.org/pdf/2311.05657v3)] [[code](https://allenai.github.io/lumos/)]
+- LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error (ACL 2024) [[paper](https://arxiv.org/abs/2403.04746)] [[code](https://github.com/microsoft/simulated-trial-and-error)]
+- NAT: Learning From Failure: Integrating Negative Examples when Fine-tuning LLMs as Agents (arXiv 2024) [[paper](https://arxiv.org/abs/2402.11651)] [[code](https://github.com/Reason-Wang/NAT)]
+- OPTIMA: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System (arXiv 2024) [[paper](https://arxiv.org/abs/2410.08115)] [[code](https://chenweize1998.github.io/optima-project-page/)]
+- Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning (NAACL-findings 2024) [[paper](https://aclanthology.org/2024.findings-naacl.184/)] [[code](https://github.com/HAIV-Lab/LLM-TMBR)]
+- AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning (arXiv 2024) [[paper](https://arxiv.org/abs/2402.15506)] [[code](https://github.com/SalesforceAIResearch/xLAM)]
+- ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving (ICLR 2024) [[paper](https://arxiv.org/abs/2309.17452)] [[code](https://github.com/microsoft/tora)]
+- ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent (arXiv 2023) [[paper](https://arxiv.org/pdf/2312.10003)]
+- AGENTBANK: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories (ACL-Findings 2024) [[paper](https://arxiv.org/abs/2410.07706)] [[code](https://huggingface.co/datasets/Solaris99/AgentBank)]
+- ADASWITCH: Adaptive Switching between Small and Large Agents for Effective Cloud-Local Collaborative Learning (EMNLP 2024) [[paper](https://arxiv.org/abs/2410.13181)]
+- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement (EMNLP 2024) [[paper](https://arxiv.org/abs/2406.11176)] [[code](https://github.com/WeiminXiong/IPR)]
+- Re-ReST: Reflection-Reinforced Self-Training for Language Agents (EMNLP 2024) [[paper](https://arxiv.org/abs/2406.01495)] [[code](https://github.com/PlusLabNLP/Re-ReST)]
+- Retrospex: Language Agent Meets Offline Reinforcement Learning Critic (EMNLP 2024) [[paper](https://aclanthology.org/2024.emnlp-main.268/)] [[code](https://github.com/Yufei-Xiang/Retrospex)]
+- ATM: Adversarial Tuning Multi-agent System Makes a Robust Retrieval-Augmented Generator (EMNLP 2024) [[paper](https://arxiv.org/abs/2405.18111)] [[code](https://github.com/chuhac/ATM-RAG)]
+- SWIFTSAGE: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks (NeurIPS 2023) [[paper](https://arxiv.org/abs/2305.17390)] [[code](https://github.com/SwiftSage/SwiftSage)]
+- NLRL: Natural Language Reinforcement Learning (arXiv 2024) [[paper](https://arxiv.org/abs/2411.14251)] [[code](https://github.com/waterhorse1/Natural-language-RL)]
+- AGILE: A Novel Reinforcement Learning Framework of LLM Agents (NeurIPS 2024) [[paper](https://arxiv.org/abs/2405.14751)] [[code](https://github.com/bytarnish/AGILE)]
+- COEVOL: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation (arXiv 2024) [[paper](https://arxiv.org/abs/2406.07054)] [[code](https://github.com/lirenhao1997/CoEvol)]
+- E2CL: Exploration-based Error Correction Learning for Embodied Agents (EMNLP-Findings 2024) [[paper](https://aclanthology.org/2024.findings-emnlp.448/)] [[code](https://github.com/WangHanLinHenry/E2CL)]
+- STeCa: Step-level Trajectory Calibration for LLM Agent Learning (arXiv 2025) [[paper](https://arxiv.org/abs/2502.14276)] [[code](https://github.com/WangHanLinHenry/STeCa)]
+- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning (**NeurIPS 2024**) [[paper](https://arxiv.org/abs/2411.14497)] [[code](https://github.com/CANGLETIAN/Star-Agents)]
+- ATLaS: Agent Tuning via Learning Critical Steps (**ACL 2025**) [[paper](https://arxiv.org/abs/2503.02197)]
+- Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies (**arXiv 2025**) [[paper](https://arxiv.org/pdf/2502.02533)]
+- Agent Planning with World Knowledge Model (**NeurIPS 2024**) [[paper](https://arxiv.org/pdf/2405.14205)] [[code](https://github.com/zjunlp/WKM)]
+- Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains (**ICLR 2025**) [[paper](https://arxiv.org/abs/2501.05707)] [[code](https://github.com/vsubramaniam851/multiagent-ft)]
+- Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning (**ACL 2025**) [[paper](https://arxiv.org/pdf/2412.14780)]
+-
+## Reinforcement Learning-based
+- CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models (arXiv 2024) [[paper](https://arxiv.org/pdf/2404.01663)] [[code](https://github.com/heimy2000/CMAT)]
+- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning (arXiv 2024) [[paper](https://arxiv.org/abs/2411.03817)]
+- WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning (arXiv 2024) [[paper](https://arxiv.org/abs/2411.02337)] [[code](https://github.com/THUDM/WebRL)]
+- SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales (EMNLP 2024) [[paper](https://arxiv.org/abs/2405.20974)] [[code](https://github.com/tianyang-x/SaySelf)]
+- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments (arXiv 2024) [[paper](https://arxiv.org/abs/2406.04151)] [[code](https://github.com/WooooDyy/AgentGym)]
+- Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning (arXiv 2024) [[paper](https://arxiv.org/abs/2410.06101)]
+- GELI: Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents (EMNLP 2024) [[paper](https://aclanthology.org/2024.emnlp-main.881/)]
+- AGILE: A Novel Reinforcement Learning Framework of LLM Agents (NeurIPS 2024) [[paper](https://arxiv.org/abs/2405.14751)] [[code](https://github.com/bytarnish/AGILE)]
+- Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents (arXiv 2024) [[paper](https://arxiv.org/abs/2408.07199)]
+- DMPO: Direct Multi-Turn Preference Optimization for Language Agents (EMNLP 2024) [[paper](https://arxiv.org/abs/2406.14868)] [[code](https://github.com/swt-user/DMPO)]
+- Re-ReST: Reflection-Reinforced Self-Training for Language Agents (EMNLP 2024) [[paper](https://arxiv.org/abs/2406.01495)] [[code](https://github.com/PlusLabNLP/Re-ReST)]
+- ATM: Adversarial Tuning Multi-agent System Makes a Robust Retrieval-Augmented Generator (EMNLP 2024) [[paper](https://arxiv.org/abs/2405.18111)] [[code](https://github.com/chuhac/ATM-RAG)]
+- OPTIMA: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System (arXiv 2024) [[paper](https://arxiv.org/abs/2410.08115)] [[code](https://chenweize1998.github.io/optima-project-page/)]
+- EPO: Hierarchical LLM Agents with Environment Preference Optimization (EMNLP 2024) [[paper](https://arxiv.org/abs/2408.16090)] [[code](https://github.com/kevinz8866/EPO)]
+- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement (EMNLP 2024) [[paper](https://arxiv.org/abs/2406.11176)] [[code](https://github.com/WeiminXiong/IPR)]
+- AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback (NeurIPS 2024) [[paper](https://arxiv.org/abs/2402.01469)] [[code](https://github.com/JianGuanTHU/AMOR)]
+- SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks (**arXiv 2025**) [[paper](https://arxiv.org/pdf/2503.15478)] [[code](https://github.com/facebookresearch/sweet_rl)]
+- Reinforcing Language Agents via Policy Optimization with Action Decomposition (**NeurIPS 2024**) [[paper](https://arxiv.org/abs/2405.15821)] [[code](https://github.com/morning9393/ADRL)]
+- STeCa: Step-level Trajectory Calibration for LLM Agent Learning (arXiv 2025) [[paper](https://arxiv.org/abs/2502.14276)] [[code](https://github.com/WangHanLinHenry/STeCa)]
+- DITS: Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search (**arXiv 2025**) [[paper](https://arxiv.org/abs/2502.00955)]
+- EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning (**ACL 2025**) [[paper](https://arxiv.org/abs/2502.12486)] [[code](https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/EPO)]
+- DAPO: Decoupled Clip and Dynamic Sampling Policy Optimization (**arXiv 2025**) [[paper](https://arxiv.org/pdf/2503.14476)]
+- MARFT: Multi-Agent Reinforcement Fine-Tuning (**arXiv 2025**) [[paper](https://arxiv.org/pdf/2504.16129)] [[code](https://github.com/jwliao-ai/MARFT)]
+## Hybrid Fine-Tuning Optimization
+- ReFT: Reasoning with Reinforced Fine-Tuning (ACL 2024) [[paper](https://arxiv.org/abs/2401.08967)] [[code](https://github.com/lqtrung1998/mwp_ReFT)]
+- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments (arXiv 2024) [[paper](https://arxiv.org/abs/2406.04151)] [[code](https://github.com/WooooDyy/AgentGym)]
+- AGILE: A Novel Reinforcement Learning Framework of LLM Agents (NeurIPS 2024) [[paper](https://arxiv.org/abs/2405.14751)] [[code](https://github.com/bytarnish/AGILE)]
+- Re-ReST: Reflection-Reinforced Self-Training for Language Agents (EMNLP 2024) [[paper](https://arxiv.org/abs/2406.01495)] [[code](https://github.com/PlusLabNLP/Re-ReST)]
+- AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback (NeurIPS 2024) [[paper](https://arxiv.org/abs/2402.01469)] [[code](https://github.com/JianGuanTHU/AMOR)]
+- Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents (ACL 2024) [[paper](https://aclanthology.org/2024.acl-long.409.pdf)] [[code](https://github.com/Yifan-Song793/ETO)]
+- OPTIMA: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System (arXiv 2024) [[paper](https://arxiv.org/abs/2410.08115)] [[code](https://chenweize1998.github.io/optima-project-page/)]
+- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement (EMNLP 2024) [[paper](https://arxiv.org/abs/2406.11176)] [[code](https://github.com/WeiminXiong/IPR)]
+- Retrospex: Language Agent Meets Offline Reinforcement Learning Critic (EMNLP 2024) [[paper](https://aclanthology.org/2024.emnlp-main.268/)] [[code](https://github.com/Yufei-Xiang/Retrospex)]
+- ENVISIONS: Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models (arXiv 2024) [[paper](https://arxiv.org/abs/2406.11736)] [[code](https://github.com/xufangzhi/ENVISIONS)]
+- DITS: Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search (**arXiv 2025**) [[paper](https://arxiv.org/abs/2502.00955)]
+-
+# Parameter-Free Optimization
+## Experience-based
+- Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks (NeurIPS 2024) [[paper](https://arxiv.org/abs/2408.03615)] [[code](https://cybertronagent.github.io/Optimus-1.github.io/)]
+- Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents (arXiv 2024) [[paper](https://arxiv.org/abs/2405.02957)]
+- ExpeL: LLM Agents Are Experiential Learners (AAAI 2024) [[paper](https://arxiv.org/abs/2308.10144)] [[code](https://github.com/LeapLabTHU/ExpeL)]
+- AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning (NeurIPS 2024) [[paper](https://arxiv.org/abs/2405.16247)] [[code](https://github.com/minghchen/automanual)]
+- AutoGuide: Automated Generation and Selection of Context-Aware Guidelines for Large Language Model Agents (NeurIPS 2024) [[paper](https://arxiv.org/abs/2403.08978)]
+- Experiential Co-Learning of Software-Developing Agents (ACL 2024) [[paper](https://arxiv.org/abs/2312.17025)] [[code](https://github.com/OpenBMB/ChatDev)]
+## Feedback-based
+- Reflexion: Language Agents with Verbal Reinforcement Learning (NeurIPS 2023) [[paper](https://arxiv.org/pdf/2303.11366)] [[code](https://github.com/noahshinn/reflexion)]
+- QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback based Self-Correction (ACL 2024) [[paper](https://arxiv.org/abs/2403.11886)] [[code](https://github.com/cdhx/QueryAgent)]
+- Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization (ACL 2024) [[paper](https://arxiv.org/abs/2402.17574)] [[code](https://github.com/zwq2018/Agent-Pro)]
+- SAGE: Self-Evolving Agents with Reflective and Memory-Augmented Abilities (arXiv 2024) [[paper](https://arxiv.org/abs/2409.00872)]
+- ReCon: Boosting LLM Agents with Recursive Contemplation for Effective Deception Handling (ACL-findings 2024) [[paper](https://openreview.net/pdf?id=LO-NO1-PwJR)]
+- Symbolic Learning Enables Self-Evolving Agents (arXiv 2024) [[paper](https://arxiv.org/abs/2406.18532)] [[code](https://github.com/aiwaves-cn/agents)]
+- COPPER: Reflective Multi-Agent Collaboration based on Large Language Models (NeurIPS 2024) [[paper](https://neurips.cc/virtual/2024/poster/93147)]
+- METAREFLECTION: Learning Instructions for Language Agents using Past Reflections (EMNLP 2024) [[paper](https://arxiv.org/abs/2405.13009)] [[code](https://github.com/microsoft/prose/tree/main/misc/MetaReflection)]
+- InteRecAgent: Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations (arXiv 2023) [[paper](https://arxiv.org/abs/2308.16505)] [[code](https://github.com/microsoft/RecAI/tree/main/InteRecAgent)]
+- NLRL: Natural Language Reinforcement Learning (arXiv 2024) [[paper](https://arxiv.org/abs/2411.14251)] [[code](https://github.com/waterhorse1/Natural-language-RL)]
+- Chain-of-Experts: When LLMs Meet Complex Operation Research Problems (ICLR 2024) [[paper](https://openreview.net/forum?id=HobyL1B9CZ)] [[code](https://github.com/xzymustbexzy/Chain-of-Experts)]
+- Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization (arXiv 2024) [[paper](https://arxiv.org/abs/2308.02151)]
+- SELF-TUNING: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching (arXiv 2024) [[paper](https://arxiv.org/pdf/2406.06326)]
+- OPRO: Large Language Models as Optimizers (ICLR 2024) [[paper](https://arxiv.org/abs/2309.03409)] [[code](https://github.com/google-deepmind/opro)]
+- MPO: Boosting LLM Agents with Meta Plan Optimization (**arXiv 2025**) [[paper](https://arxiv.org/abs/2503.02682)]
+## Tool-based
+- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (**EMNLP 2024**) [[paper](https://aclanthology.org/2024.emnlp-main.436/)] [[code](https://github.com/OSU-NLP-Group/Middleware)]
+- AVATAR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval (**NeurIPS 2024**) [[paper](https://arxiv.org/abs/2406.11200)] [[code](https://github.com/zou-group/avatar)]
+- AUTOACT: Automatic Agent Learning from Scratch for QA via Self-Planning (**ACL** **2024**) [[paper](https://arxiv.org/abs/2401.05268)] [[code](https://github.com/zjunlp/AutoAct)]
+- TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage (**arXiv** **2023**) [[paper](https://arxiv.org/abs/2308.03427)]
+- Lyra: Orchestrating Dual Correction in Automated Theorem Proving (**TMLR 2024**) [[paper](https://arxiv.org/abs/2309.15806)] [[code](https://github.com/chuanyang-zheng/lyra-theorem-prover)]
+- Offline Training of Language Model Agents with Functions as Learnable Weights (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2402.11359)]
+- VideoAgent: A Memory-Augmented Multimodal Agent for Video Understanding (**ECCV 2024**) [[paper](https://arxiv.org/pdf/2403.11481)] [[code](https://videoagent.github.io)]
+- ALITA: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution (**arXiv 2025**) [[paper](https://arxiv.org/abs/2505.20286)] [[code](https://github.com/CharlesQ9/Alita)]
+- Search-o1: Agentic Search-Enhanced Large Reasoning Models (**arXiv 2025**) [[paper](https://arxiv.org/pdf/2501.05366)] [[code](https://github.com/sunnynexus/Search-o1)]
+## RAG-based
+- Crafting Personalized Agents through Retrieval-Augmented Generation on Editable Memory Graphs (**EMNLP 2024**) [[paper](https://arxiv.org/abs/2409.19401)]
+- RaDA: Retrieval-augmented Web Agent Planning with LLMs (**ACL-findings 2024**) [[paper](https://aclanthology.org/2024.findings-acl.802/)] [[code](https://github.com/ldilab/RaDA)]
+- AutoRAG: Automated Framework for Optimization of Retrieval Augmented Generation Pipeline (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2410.20878)] [[code](https://github.com/Marker-Inc-Korea/AutoRAG_ARAGOG_Paper)]
+- RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2402.03610)] [[code](https://github.com/PanasonicConnect/rap)]
+- MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2408.01869v1)] [[code](https://github.com/jihyechoi77/malade)]
+- PaperQA: Retrieval-Augmented Generative Agent for Scientific Research (**arXiv 2023**) [[paper](https://arxiv.org/abs/2312.07559)] [[code](https://github.com/future-house/paper-qa)]
+- Search-o1: Agentic Search-Enhanced Large Reasoning Models (**arXiv 2025**) [[paper](https://arxiv.org/pdf/2501.05366)] [[code](https://github.com/sunnynexus/Search-o1)]
+## Multi-Agent
+- CAPO: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation (arXiv 2024) [[paper](https://arxiv.org/abs/2411.04679)]
+- A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions via Iterative Refinement and LLM-Driven Feedback Loops (arXiv 2024) [[paper](https://arxiv.org/abs/2412.17149)] [[code](https://anonymous.4open.science/r/evolver-1D11/)]
+- Training Agents with Weakly Supervised Feedback from Large Language Models (arXiv 2024) [[paper](https://arxiv.org/abs/2411.19547)]
+- ChatDev: Communicative Agents for Software Development (**arXiv** **2023**) [[paper](https://arxiv.org/abs/2307.07924)] [[code](https://github.com/OpenBMB/ChatDev)]
+- MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (**arXiv** **2023**) [[paper](https://arxiv.org/abs/2308.00352)] [[code](https://github.com/geekan/MetaGPT)]
+- MapCoder: Multi-Agent Code Generation for Competitive Problem Solving (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2405.11403)] [[code](https://github.com/Md-Ashraful-Pramanik/MapCoder)]
+- A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration (**COLM 2024**) [[paper](https://arxiv.org/abs/2310.02170)] [[code](https://github.com/SALT-NLP/DyLAN)]
+- Scaling Large-Language-Model-based Multi-Agent Collaboration (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2406.07155)] [[code](https://github.com/OpenBMB/ChatDev)]
+- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors (**arXiv** **2023**) [[paper](https://arxiv.org/abs/2308.10848)] [[code](https://github.com/OpenBMB/AgentVerse)]
+- SMoA: Improving Multi-Agent Large Language Models with Sparse Mixture-of-Agents (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2411.03284)] [[code](https://github.com/David-Li0406/SMoA)]
+- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate (**EMNLP 2024**) [[paper](https://arxiv.org/abs/2305.19118)] [[code](https://github.com/Skytliang/Multi-Agents-Debate)]
+- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation (**arXiv** **2023**) [[paper](https://arxiv.org/abs/2308.08155)] [[code](https://github.com/microsoft/autogen)]
+# Datasets and Benchmarks
+## Datasets and Benchmarks for Evaluation
+### General Evaluation Tasks
+- WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents (NeurIPS 2022) [[paper](https://arxiv.org/abs/2207.01206)] [[code](https://webshop-pnlp.github.io/)]
+- WebArena: A Realistic Web Environment for Building Autonomous Agents (arXiv 2024) [[paper](https://arxiv.org/abs/2307.13854)] [[code](https://webarena.dev/)]
+- Mind2Web: Towards a Generalist Agent for the Web (NeurIPS 2023 Spotlight) [[paper](https://arxiv.org/abs/2306.06070)] [[code](https://osu-nlp-group.github.io/Mind2Web/)]
+- Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration (ICLR 2018) [[paper](https://arxiv.org/abs/1802.08802)] [[code](https://github.com/Farama-Foundation/miniwob-plusplus)]
+- ScienceWorld: Is your Agent Smarter than a 5th Grader? (EMNLP 2022) [[paper](https://aclanthology.org/2022.emnlp-main.775/)] [[code](https://sciworld.apps.allenai.org/)]
+- ALFWorld: Aligning Text and Embodied Environments for Interactive Learning (ICLR 2021) [[paper](https://arxiv.org/abs/2010.03768)] [[code](https://alfworld.github.io/)]
+- Building Cooperative Embodied Agents Modularly with Large Language Models (ICLR 2024) [[paper](https://arxiv.org/abs/2307.02485)] [[code](https://vis-www.cs.umass.edu/Co-LLM-Agents/)]
+- ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks (CVPR 2020) [[paper](https://arxiv.org/abs/1912.01734)] [[code](https://askforalfred.com/)]
+- RLCard: A Toolkit for Reinforcement Learning in Card Games (AAAI-Workshop 2020) [[paper](https://arxiv.org/abs/1910.04376)] [[code](https://github.com/datamllab/rlcard)]
+- OpenSpiel: A Framework for Reinforcement Learning in Games (arXiv 2019) [[paper](https://arxiv.org/abs/1908.09453)] [[code](https://github.com/google-deepmind/open_spiel)]
+- HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (EMNLP 2018) [[paper](https://arxiv.org/abs/1809.09600)] [[code](https://hotpotqa.github.io/)]
+- StrategyQA: Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies (TACL 2021) [[paper](https://arxiv.org/abs/2101.02235)] [[code](https://github.com/eladsegal/strategyqa)]
+- MMLU: Measuring Massive Multitask Language Understanding (ICLR 2021) [[paper](https://arxiv.org/abs/2009.03300)] [[code](https://github.com/hendrycks/test)]
+- TruthfulQA: Measuring How Models Mimic Human Falsehoods (ACL 2022) [[paper](https://arxiv.org/abs/2109.07958)] [[code](https://github.com/sylinrl/TruthfulQA)]
+- TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension (ACL 2017) [[paper](https://aclanthology.org/P17-1147/)]
+- PubMedQA: A Dataset for Biomedical Research Question Answering (EMNLP 2019) [[paper](https://arxiv.org/abs/1909.06146)] [[code](https://pubmedqa.github.io/)]
+- MuSiQue: Multihop Questions via Single-hop Question Composition (TACL 2022) [[paper](https://arxiv.org/abs/2108.00573)] [[code](https://github.com/stonybrooknlp/musique)]
+- Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps (COLING 2020) [[paper](https://arxiv.org/abs/2011.01060)] [[code](https://github.com/Alab-NII/2wikimultihop)]
+- A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers (NAACL 2021) [[paper](https://arxiv.org/abs/2105.03011)] [[code](https://huggingface.co/datasets/allenai/qasper)]
+- Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge (arXiv 2018) [[paper](https://arxiv.org/abs/1803.05457)] [[code](https://huggingface.co/datasets/allenai/ai2_arc)]
+- Training Verifiers to Solve Math Word Problems (arXiv 2021) [[paper](https://arxiv.org/abs/2110.14168)] [[code](https://openai.com/index/solving-math-word-problems/)]
+- A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers (ACL 2020) [[paper](https://arxiv.org/abs/2106.15772)] [[code](https://github.com/chaochun/nlu-asdiv-dataset)]
+- SVAMP: Are NLP Models Really Able to Solve Simple Math Word Problems? (NAACL 2021) [[paper](https://arxiv.org/abs/2103.07191)] [[code](https://github.com/arkilpatel/SVAMP)]
+- Measuring Mathematical Problem Solving with the MATH Dataset (NeurIPS 2021) [[paper](https://arxiv.org/abs/2103.03874)] [[code](https://github.com/hendrycks/math/)]
+- T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step (ACL 2024) [[paper](https://arxiv.org/abs/2312.14033)] [[code](https://open-compass.github.io/T-Eval/)]
+- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs (ICLR 2024) [[paper](https://arxiv.org/abs/2307.16789v2)] [[code](https://github.com/OpenBMB/ToolBench)]
+- MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback (ICLR 2024) [[paper](https://arxiv.org/abs/2309.10691)] [[code](https://xwang.dev/mint-bench/)]
+- API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs (EMNLP 2023) [[paper](https://arxiv.org/abs/2304.08244)] [[code](https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/api-bank)]
+- A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge (ECCV 2022) [[paper](https://arxiv.org/abs/2206.01718)] [[code](https://github.com/allenai/aokvqa)]
+- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering (NeurIPS 2022) [[paper](https://arxiv.org/abs/2209.09513)] [[code](https://scienceqa.github.io/)]
+- VQA: Visual Question Answering (ICCV 2015) [[paper](https://arxiv.org/abs/1505.00468)] [[code](https://visualqa.org/)]
+- EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding (NeurIPS 2023) [[paper](https://arxiv.org/abs/2308.09126)] [[code](https://egoschema.github.io/)]
+- NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR 2021) [[paper](https://arxiv.org/abs/2105.08276)] [[code](https://github.com/doc-doc/NExT-QA)]
+- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (ICLR 2024) [[paper](https://arxiv.org/abs/2310.06770)] [[code](https://github.com/swe-bench/SWE-bench)]
+- Evaluating Large Language Models Trained on Code (arXiv 2021) [[paper](https://arxiv.org/abs/2107.03374)] [[code](https://github.com/openai/human-eval)]
+- LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code (arXiv 2024) [[paper](https://arxiv.org/abs/2403.07974v2)] [[code](https://github.com/LiveCodeBench/LiveCodeBench)]
+- Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs (NeurIPS 2023) [[paper](https://arxiv.org/abs/2305.03111)] [[code](https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/bird)]
+- InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback (NeurIPS 2023) [[paper](https://arxiv.org/abs/2306.14898)] [[code](https://github.com/princeton-nlp/intercode)]
+### Multi-task Benchmarks
+- AgentBench: Evaluating LLMs as Agents (**ICLR** **2024**) [[paper](https://arxiv.org/abs/2308.03688)] [[code](https://github.com/THUDM/AgentBench)]
+- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2406.04151)] [[code](https://github.com/WooooDyy/AgentGym)]
+- Just-Eval: The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning (**ICLR** **2024**) [[paper](https://arxiv.org/abs/2312.01552)] [[code](https://github.com/Re-Align/just-eval)]
+- StreamBench: Towards Benchmarking Continuous Improvement of Language Agents (**NeurIPS 2024**) [[paper](https://arxiv.org/abs/2406.08747)] [[code](https://github.com/stream-bench/stream-bench)]
+- AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agent (**NeurIPS 2024**) [[paper](https://arxiv.org/abs/2401.13178)] [[code](https://github.com/hkust-nlp/AgentBoard)]
+- GAIA: A Benchmark for General AI Assistants (**arXiv 2023**) [[paper](https://arxiv.org/abs/2311.12983)] [[code](https://huggingface.co/gaia-benchmark)]
+- Humanity’s Last Exam (**arXiv 2025**) [[paper](https://arxiv.org/abs/2501.14249)] [[code](https://huggingface.co/datasets/cais/hle)]
+- NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls (**arXiv 2024**) [[paper](https://arxiv.org/abs/2409.03797v3)] [[code](https://github.com/IBM/NESTFUL)]
+- MCP_RADAR: Multi-Dimensional Benchmark for Evaluating Tool Use Capabilities in Large Language Models (**arXiv 2025**) [[paper](https://arxiv.org/pdf/2505.16700)] [[code](https://anonymous.4open.science/r/MCPRadar-B143)]
+# Application
+## Healthcare
+- Med-PaLM: Large language models encode clinical knowledge (**Nature 2023**) [[paper](https://arxiv.org/abs/2212.13138)]
+- DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task (**arXiv** **2023**) [[paper](https://arxiv.org/abs/2304.01097)] [[code](https://github.com/xionghonglin/DoctorGLM)]
+- BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT (**arXiv** **2023**) [[paper](https://arxiv.org/abs/2310.15896)] [[code](https://github.com/scutcyr/BianQue)]
+- DISC-MedLLM: Bridging General Large Language Models and Real-World Medical Consultation (**arXiv** **2023**) [[paper](https://arxiv.org/abs/2308.14346)] [[code](https://github.com/FudanDISC/DISC-MedLLM)]
+- ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning (**BCB 2024**) [[paper](https://dl.acm.org/doi/abs/10.1145/3698587.3701359)] [[code](https://github.com/LeoYML/clinical-agent)]
+- MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning (**ACL-findings 2024**) [[paper](https://aclanthology.org/2024.findings-acl.33/)] [[code](https://github.com/gersteinlab/MedAgents)]
+- MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making (**NeurIPS 2024**) [[paper](https://arxiv.org/abs/2404.15155)] [[code](https://github.com/mitmedialab/MDAgents)]
+- Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2405.02957)]
+- AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator (**COLING 2025**) [[paper](https://arxiv.org/abs/2402.09742)] [[code](https://github.com/LibertFan/AI_Hospital)]
+- KG4Diagnosis: A Hierarchical Multi-Agent LLM Framework with Knowledge Graph Enhancement for Medical Diagnosis (**AAAI-25 Bridge Program**) [[paper](https://arxiv.org/abs/2412.16833)]
+- AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2402.13225)] [[code](https://github.com/ncbi-nlp/Clinical-Tool-Learning)]
+- MMedAgent: Learning to Use Medical Tools with Multi-modal Agent (**EMNLP-findings 2024**) [[paper](https://arxiv.org/abs/2407.02483)] [[code](https://github.com/Wangyixinxin/MMedAgent)]
+- HuatuoGPT-o1: Towards Medical Complex Reasoning with LLMs (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2412.18925)] [[code](https://github.com/FreedomIntelligence/HuatuoGPT-o1)]
+- IIMedGPT: Promoting Large Language Model Capabilities of Medical Tasks by Efficient Human Preference Alignment (**arXiv** **2025**) [[paper](https://arxiv.org/abs/2501.02869)]
+## Science
+- CellAgent: An LLM-driven Multi-Agent Framework for Automated Single-cell Data Analysis (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2407.09811)] [[code](https://github.com/lsq2wal/CellAgent)]
+- BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2405.17631)] [[code](https://github.com/snap-stanford/BioDiscoveryAgent/)]
+- ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning (**Digital Discovery 2024**) [[paper](https://arxiv.org/abs/2402.04268)] [[code](https://github.com/lamm-mit/ProtAgents)]
+- CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2404.18021)] [[code](https://github.com/cong-lab/crispr-gpt-pub)]
+- ChemCrow: Augmenting Large-Language Models with Chemistry Tools (**Nature** **Machine Intelligence** **2024**) [[paper](https://arxiv.org/abs/2304.05376)] [[code](https://github.com/ur-whitelab/chemcrow-public)]
+- DrugAssist: A Large Language Model for Molecule Optimization (**Briefings in Bioinformatics 2024**) [[paper](https://arxiv.org/abs/2401.10334)] [[code](https://github.com/blazerye/DrugAssist)]
+- Agent-based Learning of Materials Datasets from Scientific Literature (**Digital Discovery 2024**) [[paper](https://arxiv.org/abs/2312.11690)] [[code](https://github.com/AI4ChemS/Eunomia)]
+- DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2411.15692)] [[code](https://github.com/anrohanro/DrugAgent)]
+- MProt-DPO: Breaking the ExaFLOPS Barrier for Multimodal Protein Design Workflows with Direct Preference Optimization (**SC 2024**) [[paper](https://dl.acm.org/doi/10.1109/SC41406.2024.00013)]
+- Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2410.09403)] [[code](https://github.com/open-sciencelab/Virtual-Scientists)]
+- SciAgent: Tool-augmented Language Models for Scientific Reasoning (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2402.11451)]
+## Embodied Intelligence
+- Building Cooperative Embodied Agents Modularly with Large Language Models (**ICLR** **2024**) [[paper](https://arxiv.org/abs/2307.02485)] [[code](https://vis-www.cs.umass.edu/Co-LLM-Agents/)]
+- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (**CoRL 2022**) [[paper](https://arxiv.org/abs/2204.01691)] [[code](https://github.com/google-research/google-research/tree/master/saycan)]
+- RoCo: Dialectic Multi-Robot Collaboration with Large Language Models (**ICRA 2024**) [[paper](https://arxiv.org/abs/2307.04738)] [[code](https://github.com/MandiZhao/robot-collab)]
+- Voyager: An Open-Ended Embodied Agent with Large Language Models (**arXiv** **2023**) [[paper](https://arxiv.org/abs/2305.16291)] [[code](https://github.com/MineDojo/Voyager)]
+- MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World (**CVPR** **2024**) [[paper](https://arxiv.org/abs/2401.08577)] [[code](https://github.com/UMass-Foundation-Model/MultiPLY)]
+- Retrospex: Language Agent Meets Offline Reinforcement Learning Critic (**EMNLP 2024**) [[paper](https://aclanthology.org/2024.emnlp-main.268/)] [[code](https://github.com/Yufei-Xiang/Retrospex)]
+- EPO: Hierarchical LLM Agents with Environment Preference Optimization (**EMNLP 2024**) [[paper](https://arxiv.org/abs/2408.16090)] [[code](https://github.com/kevinz8866/EPO)]
+- AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning (**NeurIPS 2024**) [[paper](https://arxiv.org/abs/2405.16247)] [[code](https://github.com/minghchen/automanual)]
+- MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making (**EMNLP 2024**) [[paper](https://arxiv.org/abs/2409.16686)]
+- iVideoGPT: Interactive VideoGPTs are Scalable World Models (**NeurIPS 2024**) [[paper](https://arxiv.org/abs/2405.15223)] [[code](https://github.com/thuml/iVideoGPT)]
+- AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2401.12963)]
+- Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld (**CVPR** **2024**) [[paper](https://arxiv.org/abs/2311.16714)] [[code](https://github.com/stevenyangyj/Emma-Alfworld)]
+## Finance
+- Large Language Model Agent in Financial Trading: A Survey (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2408.06361)]
+- TradingGPT: Multi-Agent System with Layered Memory and Distinct Characters for Enhanced Financial Trading Performance (**arXiv** **2023**) [[paper](https://arxiv.org/abs/2309.03736)]
+- FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design (**AAAI-SS**) [[paper](https://arxiv.org/abs/2311.13743)] [[code](https://github.com/pipiku915/FinMem-LLM-StockTrading)]
+- A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist (**SIGKDD 2024**) [[paper](https://arxiv.org/abs/2402.18485)]
+- Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models (**WWW 2024**) [[paper](https://arxiv.org/abs/2402.03659)] [[code](https://github.com/koa-fin/sep)]
+- FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making (**NeurIPS 2024**) [[paper](https://arxiv.org/abs/2407.06567)]
+- TradingAgents: Multi-Agents LLM Financial Trading Framework (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2412.20138)] [[code](https://github.com/TradingAgents-AI/TradingAgents)]
+- FinVision: A Multi-Agent Framework for Stock Market Prediction (**ICAIF 2024**) [[paper](https://arxiv.org/abs/2411.08899)]
+- Simulating Financial Market via Large Language Model-based Agents (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2406.19966)]
+- FinVerse: An Autonomous Agent System for Versatile Financial Analysis (**arXiv 2024**) [[paper](https://arxiv.org/abs/2406.06379)]
+- FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2405.14767)] [[code](https://github.com/AI4Finance-Foundation/FinRobot)]
+## Programming
+- Agents in Software Engineering: Survey, Landscape, and Vision (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2409.09030)] [[code](https://github.com/DeepSoftwareAnalytics/Awesome-Agent4SE)]
+- Large Language Model-Based Agents for Software Engineering: A Survey (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2409.02977)] [[code](https://github.com/FudanSELab/Agent4SE-Paper-List)]
+- ChatDev: Communicative Agents for Software Development (**ACL 2024**) [[paper](https://arxiv.org/abs/2307.07924)] [[code](https://github.com/OpenBMB/ChatDev)]
+- MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (**ICLR** **2024**) [[paper](https://arxiv.org/abs/2308.00352)] [[code](https://github.com/geekan/MetaGPT)]
+- MapCoder: Multi-Agent Code Generation for Competitive Problem Solving (**ACL** **2024**) [[paper](https://arxiv.org/abs/2405.11403)] [[code](https://github.com/Md-Ashraful-Pramanik/MapCoder)]
+- Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2404.02183)] [[code](https://github.com/tsukushiAI/self-organized-agent)]
+- Multi-Agent Software Development through Cross-Team Collaboration (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2406.08979)] [[code](https://github.com/OpenBMB/ChatDev)]
+- SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering (**NeurIPS 2024**) [[paper](http://arxiv.org/abs/2405.15793)] [[code](https://swe-agent.com/latest/)]
+- CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges (**ACL** **2024**) [[paper](https://arxiv.org/abs/2401.07339)]
+- AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation (**arXiv** **2023**) [[paper](https://arxiv.org/abs/2312.13010)] [[code](https://github.com/huangd1999/AgentCoder)]
+- RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning (**arXiv** **2024**) [[paper](https://arxiv.org/abs/2410.02089)]
+- Lemur: Harmonizing Natural Language and Code for Language Agents (**ICLR** **2024**) [[paper](https://arxiv.org/abs/2310.06830)] [[code](https://github.com/OpenLemur/Lemur)]
+- AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology (**FORGE 2025**) [[paper](https://arxiv.org/abs/2406.11912)] [[code](https://github.com/FSoft-AI4Code/AgileCoder)]
+## 📄 Citation
+If you find this project or the related paper helpful, please consider citing our work:
+**A Survey on the Optimization of Large Language Model-based Agents** 📚 [arXiv:2503.12434](https://arxiv.org/abs/2503.12434)
+```bibtex
+@article{du2025survey,
+  title={A Survey on the Optimization of Large Language Model-based Agents},
+  author={Du, Shangheng and Zhao, Jiabao and Shi, Jinxin and Xie, Zhentao and Jiang, Xin and Bai, Yanhong and He, Liang},
+  journal={arXiv preprint arXiv:2503.12434},
+  year={2025}
 }
 ```