Agentneed committed on
Commit 4b95d23 · 1 Parent(s): 00499db
.gitignore ADDED
@@ -0,0 +1,16 @@
1
+ venv*
2
+ .idea*
3
+ tst*
4
+ state*
5
+ junk/*
6
+ huggingface_*
7
+ junk/logs
8
+ papers*
9
+ hf*
10
+ junk/data/*
11
+ research_dir/*
12
+ .DS_Store
13
+ state_saves/*
14
+ __pycache__/*
15
+ Figure*.png
16
+ testrun.py
Dockerfile ADDED
@@ -0,0 +1,30 @@
1
+ # Use Python 3.12 as recommended
2
+ FROM python:3.12-slim
3
+
4
+ # Install LaTeX (optional but recommended if compile_latex=true)
5
+ RUN apt-get update && \
6
+ apt-get install -y --no-install-recommends texlive-latex-extra && \
7
+ rm -rf /var/lib/apt/lists/*
8
+
9
+ # Create non-root user
10
+ RUN useradd -m -u 1000 user
11
+ USER user
12
+ ENV PATH="/home/user/.local/bin:$PATH"
13
+
14
+ WORKDIR /app
15
+
16
+ # Install Python deps
17
+ COPY --chown=user ./requirements.txt requirements.txt
18
+ RUN pip install --no-cache-dir --upgrade -r requirements.txt
19
+
20
+ # Install uvicorn for FastAPI
21
+ RUN pip install --no-cache-dir uvicorn fastapi pydantic
22
+
23
+ # Copy app code
24
+ COPY --chown=user . /app
25
+
26
+ # Expose default HF port
27
+ EXPOSE 7860
28
+
29
+ # Start FastAPI app
30
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
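As a quick local sanity check, a minimal sketch for building and running this image is shown below; the `agent-lab` tag and the repository root as build context are assumptions, while port 7860 comes from the Dockerfile above.

```bash
# Build the image from the repository root (the tag name is only an example)
docker build -t agent-lab .

# Run it and publish the port the Dockerfile exposes (7860)
docker run --rm -p 7860:7860 agent-lab
```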
LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Samuel Schmidgall
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
README.md CHANGED
@@ -1,11 +1,189 @@
1
- ---
2
- title: Agent Paper
3
- emoji: 🦀
4
- colorFrom: gray
5
- colorTo: indigo
6
- sdk: docker
7
- pinned: false
8
- short_description: An end-to-end autonomous research workflow designed to assist you, the human researcher, in implementing your research ideas
9
- ---
10
-
11
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ # Agent Laboratory: Using LLM Agents as Research Assistants
2
+
3
+
4
+ <p align="center">
5
+ <img src="media/AgentLabLogo.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
6
+ </p>
7
+
8
+ <p align="center">
9
+ 【English | <a href="readme/README-chinese.md">中文</a> | <a href="readme/README-japanese.md">日本語</a> | <a href="readme/README-korean.md">한국어</a> | <a href="readme/README-filipino.md">Filipino</a> | <a href="readme/README-french.md">Français</a> | <a href="readme/README-slovak.md">Slovenčina</a> | <a href="readme/README-portugese.md">Português</a> | <a href="readme/README-spanish.md">Español</a> | <a href="readme/README-turkish.md">Türkçe</a> | <a href="readme/README-hindi.md">हिंदी</a> | <a href="readme/README-bengali.md">বাংলা</a> | <a href="readme/README-vietnamese.md">Tiếng Việt</a> | <a href="readme/README-russian.md">Русский</a> | <a href="readme/README-arabic.md">العربية</a> | <a href="readme/README-farsi.md">فارسی</a> | <a href="readme/README-italian.md">Italiano</a>】
10
+ </p>
11
+
12
+ <p align="center">
13
+ 【📝 <a href="https://arxiv.org/pdf/2501.04227">Paper</a> | 🌐 <a href="https://agentlaboratory.github.io/">Website</a> | 🌐 <a href="https://agentrxiv.github.io/">AgentRxiv Website</a> | 💻 <a href="https://github.com/SamuelSchmidgall/AgentLaboratory">Software</a> | 📰 <a href="https://agentlaboratory.github.io/#citation-ref">Citation</a>】
14
+ </p>
15
+
16
+ ### News
17
+ * [March/24/2025] 🎉 🎊 🎉 Now introducing **AgentRxiv**, a framework where autonomous research agents can upload, retrieve, and build on each other’s research. This allows agents to make cumulative progress on their research.
18
+
19
+ ## 📖 Overview
20
+
21
+ - **Agent Laboratory** is an end-to-end autonomous research workflow meant to assist **you** as the human researcher toward **implementing your research ideas**. Agent Laboratory consists of specialized agents driven by large language models to support you through the entire research workflow—from conducting literature reviews and formulating plans to executing experiments and writing comprehensive reports.
22
+ - This system is not designed to replace your creativity but to complement it, enabling you to focus on ideation and critical thinking while automating repetitive and time-intensive tasks like coding and documentation. By accommodating varying levels of computational resources and human involvement, Agent Laboratory aims to accelerate scientific discovery and optimize your research productivity.
23
+ <p align="center">
24
+ <img src="media/AgentLab.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
25
+ </p>
26
+
27
+ - Agent Laboratory also supports **AgentRxiv**, a framework where autonomous research agents can upload, retrieve, and build on each other’s research. This allows agents to make cumulative progress on their research.
28
+
29
+ <p align="center">
30
+ <img src="media/agentrxiv.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
31
+ </p>
32
+
33
+
34
+ ### 🔬 How does Agent Laboratory work?
35
+
36
+ - Agent Laboratory consists of three primary phases that systematically guide the research process: (1) Literature Review, (2) Experimentation, and (3) Report Writing. During each phase, specialized agents driven by LLMs collaborate to accomplish distinct objectives, integrating external tools like arXiv, Hugging Face, Python, and LaTeX to optimize outcomes. This structured workflow begins with the independent collection and analysis of relevant research papers, progresses through collaborative planning and data preparation, and results in automated experimentation and comprehensive report generation. Details on specific agent roles and their contributions across these phases are discussed in the paper.
37
+
38
+ <p align="center">
39
+ <img src="media/AgentLabWF.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
40
+ </p>
41
+
42
+
43
+ ### 👾 Currently supported models
44
+
45
+ * **OpenAI**: o1, o1-preview, o1-mini, gpt-4o, o3-mini
46
+ * **DeepSeek**: deepseek-chat (deepseek-v3)
47
+
48
+ To select a specific LLM, set the flag `--llm-backend="llm_model"`, for example `--llm-backend="gpt-4o"` or `--llm-backend="deepseek-chat"`. Please feel free to open a PR to add support for new models as needed!
49
+
50
+ ## 🖥️ Installation
51
+
52
+ ### Python venv option
53
+
54
+ * We recommend using Python 3.12
55
+
56
+ 1. **Clone the GitHub Repository**: Begin by cloning the repository using the command:
57
+ ```bash
58
+ git clone git@github.com:SamuelSchmidgall/AgentLaboratory.git
59
+ ```
60
+
61
+ 2. **Set up and Activate Python Environment**
62
+ ```bash
63
+ python -m venv venv_agent_lab
64
+ ```
65
+ - Now activate this environment:
66
+ ```bash
67
+ source venv_agent_lab/bin/activate
68
+ ```
69
+
70
+ 3. **Install required libraries**
71
+ ```bash
72
+ pip install -r requirements.txt
73
+ ```
74
+
75
+ 4. **Install pdflatex [OPTIONAL]**
76
+ ```bash
77
+ sudo apt install pdflatex
78
+ ```
79
+ - This enables the agents to compile the LaTeX source.
80
+ - **[IMPORTANT]** If this step cannot be run because you do not have sudo access, you can turn off PDF compilation by running Agent Laboratory with the `--compile-latex` flag set to false: `--compile-latex "false"`
81
+
82
+
83
+
84
+ 5. **Now run Agent Laboratory!**
85
+
86
+ `python ai_lab_repo.py --yaml-location "experiment_configs/MATH_agentlab.yaml"`
87
+
88
+
89
+ ### Co-Pilot mode
90
+
91
+ To run Agent Laboratory in copilot mode, simply set the `copilot-mode` flag in your YAML config to `"true"`.
92
+
93
+ -----
94
+ ## Tips for better research outcomes
95
+
96
+
97
+ #### [Tip #1] 📝 Make sure to write extensive notes! 📝
98
+
99
+ **Writing extensive notes is important** for helping your agent understand what you're looking to accomplish in your project, as well as any style preferences. Notes can include any experiments you want the agents to perform, API keys they should use, specific plots or figures you want included, or anything else you want the agent to know when performing research.
100
+
101
+ This is also your opportunity to let the agent know **what compute resources it has access to**, e.g. GPUs (how many, what type of GPU, how many GBs), CPUs (how many cores, what type of CPUs), storage limitations, and hardware specs.
102
+
103
+ To add notes, modify the `task_notes_LLM` structure inside `ai_lab_repo.py`. Provided below is an example set of notes used for some of our experiments.
104
+
105
+
106
+ ```yaml
107
+ task-notes:
108
+ plan-formulation:
109
+ - 'You should come up with a plan for only ONE experiment aimed at maximizing performance on the test set of MATH using prompting techniques.'
110
+ - 'Please use gpt-4o-mini for your experiments'
111
+ - 'You must evaluate on the entire 500 test questions of MATH'
112
+ data-preparation:
113
+ - 'Please use gpt-4o-mini for your experiments'
114
+ - 'You must evaluate on the entire 500 test questions of MATH'
115
+ - 'Here is a sample code you can use to load MATH\nfrom datasets import load_dataset\nMATH_test_set = load_dataset("HuggingFaceH4/MATH-500")["test"]'
116
+ ...
117
+ ```
118
+
119
+ --------
120
+
121
+ #### [Tip #2] 🚀 Using more powerful models generally leads to better research 🚀
122
+
123
+ When conducting research, **the choice of model can significantly impact the quality of results**. More powerful models tend to have higher accuracy, better reasoning capabilities, and better report generation. If computational resources allow, prioritize the use of advanced models such as o1-(mini/preview) or similar state-of-the-art large language models.
124
+
125
+ However, **it’s important to balance performance and cost-effectiveness**. While powerful models may yield better results, they are often more expensive and time-consuming to run. Consider using them selectively—for instance, for key experiments or final analyses—while relying on smaller, more efficient models for iterative tasks or initial prototyping.
126
+
127
+ When resources are limited, **optimize by fine-tuning smaller models** on your specific dataset or combining pre-trained models with task-specific prompts to achieve the desired balance between performance and computational efficiency.
128
+
129
+ -----
130
+
131
+ #### [Tip #3] ✅ You can load previous saves from checkpoints ✅
132
+
133
+ **If you lose progress, internet connection, or if a subtask fails, you can always load from a previous state.** All of your progress is saved by default in the `state_saves` directory, which stores each individual checkpoint.
134
+
135
+ -----
136
+
137
+
138
+ #### [Tip #4] 🈯 If you are running in a language other than English 🈲
139
+
140
+ If you are running Agent Laboratory in a language other than English, no problem. Just make sure to provide a language flag to the agents so they perform research in your preferred language. Note that we have not extensively studied running Agent Laboratory in other languages, so be sure to report any problems you encounter.
141
+
142
+ For example, if you are running in Chinese, set the language in the YAML:
143
+
144
+ `language: "中文"`
145
+
146
+ ----
147
+
148
+
149
+ #### [Tip #5] 🌟 There is a lot of room for improvement 🌟
150
+
151
+ There is a lot of room to improve this codebase, so if you end up making changes and want to help the community, please feel free to share the changes you've made! We hope this tool helps you!
152
+
153
+
154
+ ## 📜 License
155
+
156
+ Source Code Licensing: Our project's source code is licensed under the MIT License. This license permits the use, modification, and distribution of the code, subject to certain conditions outlined in the MIT License.
157
+
158
+ ## 📬 Contact
159
+
160
+ If you would like to get in touch, feel free to reach out to [sschmi46@jhu.edu](mailto:sschmi46@jhu.edu)
161
+
162
+ ## Reference / Bibtex
163
+
164
+
165
+ ### Agent Laboratory
166
+ ```bibtex
167
+ @misc{schmidgall2025agentlaboratoryusingllm,
168
+ title={Agent Laboratory: Using LLM Agents as Research Assistants},
169
+ author={Samuel Schmidgall and Yusheng Su and Ze Wang and Ximeng Sun and Jialian Wu and Xiaodong Yu and Jiang Liu and Michael Moor and Zicheng Liu and Emad Barsoum},
170
+ year={2025},
171
+ eprint={2501.04227},
172
+ archivePrefix={arXiv},
173
+ primaryClass={cs.HC},
174
+ url={https://arxiv.org/abs/2501.04227},
175
+ }
176
+ ```
177
+
178
+ ### AgentRxiv
179
+ ```bibtex
180
+ @misc{schmidgall2025agentrxiv,
181
+ title={AgentRxiv: Towards Collaborative Autonomous Research},
182
+ author={Samuel Schmidgall and Michael Moor},
183
+ year={2025},
184
+ eprint={2503.18102},
185
+ archivePrefix={arXiv},
186
+ primaryClass={cs.AI},
187
+ url={https://arxiv.org/abs/2503.18102},
188
+ }
189
+ ```
agents.py ADDED
1
+ from utils import *
2
+ from tools import *
3
+ from inference import *
4
+ import json, random, re, string  # json and re are used by the helpers below
5
+
6
+
7
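+ # Extract the first parsable JSON object from an LLM response: prefer a ```json fenced block, fall back to brace-delimited text, and return None if nothing parses.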
+ def extract_json_between_markers(llm_output):
8
+ # Regular expression pattern to find JSON content between ```json and ```
9
+ json_pattern = r"```json(.*?)```"
10
+ matches = re.findall(json_pattern, llm_output, re.DOTALL)
11
+
12
+ if not matches:
13
+ # Fallback: Try to find any JSON-like content in the output
14
+ json_pattern = r"\{.*?\}"
15
+ matches = re.findall(json_pattern, llm_output, re.DOTALL)
16
+
17
+ for json_string in matches:
18
+ json_string = json_string.strip()
19
+ try:
20
+ parsed_json = json.loads(json_string)
21
+ return parsed_json
22
+ except json.JSONDecodeError:
23
+ # Attempt to fix common JSON issues
24
+ try:
25
+ # Remove invalid control characters
26
+ json_string_clean = re.sub(r"[\x00-\x1F\x7F]", "", json_string)
27
+ parsed_json = json.loads(json_string_clean)
28
+ return parsed_json
29
+ except json.JSONDecodeError:
30
+ continue # Try next match
31
+
32
+ return None # No valid JSON found
33
+
34
+
35
+
36
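+ # Ask the reward-model LLM to review the paper with a NeurIPS-style form, parse the review JSON, and collapse the ratings into a weighted score out of 10; returns (performance, message, success).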
+ def get_score(outlined_plan, latex, reward_model_llm, reviewer_type=None, attempts=3, openai_api_key=None):
37
+ e = str()
38
+ for _attempt in range(attempts):
39
+ try:
40
+ # todo: have a reward function here
41
+ # template inherited from the AI Scientist (good work on this prompt Sakana AI team :D)
42
+ template_instructions = """
43
+ Respond in the following format:
44
+
45
+ THOUGHT:
46
+ <THOUGHT>
47
+
48
+ REVIEW JSON:
49
+ ```json
50
+ <JSON>
51
+ ```
52
+
53
+ In <THOUGHT>, first briefly discuss your intuitions and reasoning for the evaluation.
54
+ Detail your high-level arguments, necessary choices and desired outcomes of the review.
55
+ Do not make generic comments here, but be specific to your current paper.
56
+ Treat this as the note-taking phase of your review.
57
+
58
+ In <JSON>, provide the review in JSON format with the following fields in the order:
59
+ - "Summary": A summary of the paper content and its contributions.
60
+ - "Strengths": A list of strengths of the paper.
61
+ - "Weaknesses": A list of weaknesses of the paper.
62
+ - "Originality": A rating from 1 to 4 (low, medium, high, very high).
63
+ - "Quality": A rating from 1 to 4 (low, medium, high, very high).
64
+ - "Clarity": A rating from 1 to 4 (low, medium, high, very high).
65
+ - "Significance": A rating from 1 to 4 (low, medium, high, very high).
66
+ - "Questions": A set of clarifying questions to be answered by the paper authors.
67
+ - "Limitations": A set of limitations and potential negative societal impacts of the work.
68
+ - "Ethical Concerns": A boolean value indicating whether there are ethical concerns.
69
+ - "Soundness": A rating from 1 to 4 (poor, fair, good, excellent).
70
+ - "Presentation": A rating from 1 to 4 (poor, fair, good, excellent).
71
+ - "Contribution": A rating from 1 to 4 (poor, fair, good, excellent).
72
+ - "Overall": A rating from 1 to 10 (very strong reject to award quality).
73
+ - "Confidence": A rating from 1 to 5 (low, medium, high, very high, absolute).
74
+ - "Decision": A decision that has to be one of the following: Accept, Reject.
75
+
76
+ For the "Decision" field, don't use Weak Accept, Borderline Accept, Borderline Reject, or Strong Reject. Instead, only use Accept or Reject.
77
+ This JSON will be automatically parsed, so ensure the format is precise.
78
+ """
79
+ neurips_form = ("""
80
+ ## Review Form
81
+ Below is a description of the questions you will be asked on the review form for each paper and some guidelines on what to consider when answering these questions.
82
+ When writing your review, please keep in mind that after decisions have been made, reviews and meta-reviews of accepted papers and opted-in rejected papers will be made public.
83
+
84
+ 1. Summary: Briefly summarize the paper and its contributions. This is not the place to critique the paper; the authors should generally agree with a well-written summary.
85
+ - Strengths and Weaknesses: Please provide a thorough assessment of the strengths and weaknesses of the paper, touching on each of the following dimensions:
86
+ - Originality: Are the tasks or methods new? Is the work a novel combination of well-known techniques? (This can be valuable!) Is it clear how this work differs from previous contributions? Is related work adequately cited
87
+ - Quality: Is the submission technically sound? Are claims well supported (e.g., by theoretical analysis or experimental results)? Are the methods used appropriate? Is this a complete piece of work or work in progress? Are the authors careful and honest about evaluating both the strengths and weaknesses of their work
88
+ - Clarity: Is the submission clearly written? Is it well organized? (If not, please make constructive suggestions for improving its clarity.) Does it adequately inform the reader? (Note that a superbly written paper provides enough information for an expert reader to reproduce its results.)
89
+ - Significance: Are the results important? Are others (researchers or practitioners) likely to use the ideas or build on them? Does the submission address a difficult task in a better way than previous work? Does it advance the state of the art in a demonstrable way? Does it provide unique data, unique conclusions about existing data, or a unique theoretical or experimental approach?
90
+
91
+ 2. Questions: Please list up and carefully describe any questions and suggestions for the authors. Think of the things where a response from the author can change your opinion, clarify a confusion or address a limitation. This can be very important for a productive rebuttal and discussion phase with the authors.
92
+
93
+ 3. Limitations: Have the authors adequately addressed the limitations and potential negative societal impact of their work? If not, please include constructive suggestions for improvement.
94
+ In general, authors should be rewarded rather than punished for being up front about the limitations of their work and any potential negative societal impact. You are encouraged to think through whether any critical points are missing and provide these as feedback for the authors.
95
+
96
+ 4. Ethical concerns: If there are ethical issues with this paper, please flag the paper for an ethics review. For guidance on when this is appropriate, please review the NeurIPS ethics guidelines.
97
+
98
+ 5. Soundness: Please assign the paper a numerical rating on the following scale to indicate the soundness of the technical claims, experimental and research methodology and on whether the central claims of the paper are adequately supported with evidence.
99
+ 4: excellent
100
+ 3: good
101
+ 2: fair
102
+ 1: poor
103
+
104
+ 6. Presentation: Please assign the paper a numerical rating on the following scale to indicate the quality of the presentation. This should take into account the writing style and clarity, as well as contextualization relative to prior work.
105
+ 4: excellent
106
+ 3: good
107
+ 2: fair
108
+ 1: poor
109
+
110
+ 7. Contribution: Please assign the paper a numerical rating on the following scale to indicate the quality of the overall contribution this paper makes to the research area being studied. Are the questions being asked important? Does the paper bring a significant originality of ideas and/or execution? Are the results valuable to share with the broader NeurIPS community.
111
+ 4: excellent
112
+ 3: good
113
+ 2: fair
114
+ 1: poor
115
+
116
+ 8. Overall: Please provide an "overall score" for this submission. Choices:
117
+ 10: Award quality: Technically flawless paper with groundbreaking impact on one or more areas of AI, with exceptionally strong evaluation, reproducibility, and resources, and no unaddressed ethical considerations.
118
+ 9: Very Strong Accept: Technically flawless paper with groundbreaking impact on at least one area of AI and excellent impact on multiple areas of AI, with flawless evaluation, resources, and reproducibility, and no unaddressed ethical considerations.
119
+ 8: Strong Accept: Technically strong paper, with novel ideas, excellent impact on at least one area of AI or high-to-excellent impact on multiple areas of AI, with excellent evaluation, resources, and reproducibility, and no unaddressed ethical considerations.
120
+ 7: Accept: Technically solid paper, with high impact on at least one sub-area of AI or moderate-to-high impact on more than one area of AI, with good-to-excellent evaluation, resources, reproducibility, and no unaddressed ethical considerations.
121
+ 6: Weak Accept: Technically solid, moderate-to-high impact paper, with no major concerns with respect to evaluation, resources, reproducibility, ethical considerations.
122
+ 5: Borderline accept: Technically solid paper where reasons to accept outweigh reasons to reject, e.g., limited evaluation. Please use sparingly.
123
+ 4: Borderline reject: Technically solid paper where reasons to reject, e.g., limited evaluation, outweigh reasons to accept, e.g., good evaluation. Please use sparingly.
124
+ 3: Reject: For instance, a paper with technical flaws, weak evaluation, inadequate reproducibility and incompletely addressed ethical considerations.
125
+ 2: Strong Reject: For instance, a paper with major technical flaws, and/or poor evaluation, limited impact, poor reproducibility and mostly unaddressed ethical considerations.
126
+ 1: Very Strong Reject: For instance, a paper with trivial results or unaddressed ethical considerations
127
+
128
+ 9. Confidence: Please provide a "confidence score" for your assessment of this submission to indicate how confident you are in your evaluation. Choices:
129
+ 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.
130
+ 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
131
+ 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
132
+ 2: You are willing to defend your assessment, but it is quite likely that you did not understand the central parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
133
+ 1: Your assessment is an educated guess. The submission is not in your area or the submission was difficult to understand. Math/other details were not carefully checked.
134
+
135
+ You must make sure that all sections are properly created: abstract, introduction, methods, results, and discussion. Points must be reduced from your scores if any of these are missing.
136
+ """ + template_instructions)
137
+ if reviewer_type is None: reviewer_type = ""
138
+ sys = (
139
+ "You are an AI researcher who is reviewing a paper that was submitted to a prestigious ML venue. "
140
+ f"Be critical and cautious in your decision. {reviewer_type}\n"
141
+ ) + neurips_form
142
+ scoring = query_model(
143
+ model_str=f"{reward_model_llm}",
144
+ system_prompt=sys,
145
+ openai_api_key=openai_api_key,
146
+ prompt=(
147
+ f"Outlined in the following text is the research plan that the machine learning engineer was tasked with building: {outlined_plan}\n\n"
148
+ f"The following text is the research latex that the model produced: \n{latex}\n\n"), temp=0.0)
149
+ review_json = extract_json_between_markers(scoring)
150
+
151
+ overall = int(review_json["Overall"]) / 10
152
+ soundness = int(review_json["Soundness"]) / 4
153
+ confidence = int(review_json["Confidence"]) / 5
154
+ contribution = int(review_json["Contribution"]) / 4
155
+ presentation = int(review_json["Presentation"]) / 4
156
+ clarity = int(review_json["Clarity"]) / 4
157
+ originality = int(review_json["Originality"]) / 4
158
+ quality = int(review_json["Quality"]) / 4
159
+ significance = int(review_json["Significance"]) / 4
160
+
161
+ clarity_weight = 0.1
162
+ quality_weight = 0.1
163
+ overall_weight = 1.0
164
+ soundness_weight = 0.1
165
+ confidence_weight = 0.1
166
+ originality_weight = 0.1
167
+ significance_weight = 0.1
168
+ contribution_weight = 0.4
169
+ presentation_weight = 0.2
170
+
171
+ # max possible
172
+ max_score = (
173
+ clarity_weight + quality_weight + overall_weight + soundness_weight + confidence_weight + originality_weight + significance_weight + contribution_weight + presentation_weight)
174
+
175
+ performance = ((
176
+ soundness_weight * soundness + presentation_weight * presentation + confidence_weight * confidence + contribution_weight * contribution + overall_weight * overall + originality_weight * originality + significance * significance_weight + clarity_weight * clarity + quality_weight * quality) / max_score) * 10
177
+ return performance, f"The performance of your submission is: {performance}" + scoring, True
178
+ except Exception as ex:
+ # keep the error message and retry, so the `attempts` parameter is actually used
+ e = str(ex)
+ print(e)
+ continue
+ return 0, e, False
182
+
183
+
184
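+ # Panel of three differently-prompted reviewer personas; each scores the same plan/report through get_score and the raw reviews are concatenated.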
+ class ReviewersAgent:
185
+ def __init__(self, model="gpt-4o-mini", notes=None, openai_api_key=None):
186
+ if notes is None: self.notes = []
187
+ else: self.notes = notes
188
+ self.model = model
189
+ self.openai_api_key = openai_api_key
190
+
191
+ def inference(self, plan, report):
192
+ reviewer_1 = "You are a harsh but fair reviewer and expect good experiments that lead to insights for the research topic."
193
+ review_1 = get_score(outlined_plan=plan, latex=report, reward_model_llm=self.model, reviewer_type=reviewer_1, openai_api_key=self.openai_api_key)
194
+
195
+ reviewer_2 = "You are a harsh and critical but fair reviewer who is looking for an idea that would be impactful in the field."
196
+ review_2 = get_score(outlined_plan=plan, latex=report, reward_model_llm=self.model, reviewer_type=reviewer_2, openai_api_key=self.openai_api_key)
197
+
198
+ reviewer_3 = "You are a harsh but fair open-minded reviewer that is looking for novel ideas that have not been proposed before."
199
+ review_3 = get_score(outlined_plan=plan, latex=report, reward_model_llm=self.model, reviewer_type=reviewer_3, openai_api_key=self.openai_api_key)
200
+
201
+ return f"Reviewer #1:\n{review_1}, \nReviewer #2:\n{review_2}, \nReviewer #3:\n{review_3}"
202
+
203
+
204
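+ # Base class holding the shared state (plan, code, results, dialogue history with optional expiration) and the generic prompt-building/inference loop used by every role agent below.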
+ class BaseAgent:
205
+ def __init__(self, model="gpt-4o-mini", notes=None, max_steps=100, openai_api_key=None):
206
+ if notes is None: self.notes = []
207
+ else: self.notes = notes
208
+ self.max_steps = max_steps
209
+ self.model = model
210
+ self.phases = []
211
+ self.plan = str()
212
+ self.report = str()
213
+ self.history = list()
214
+ self.prev_comm = str()
215
+ self.prev_report = str()
216
+ self.exp_results = str()
217
+ self.dataset_code = str()
218
+ self.results_code = str()
219
+ self.lit_review_sum = str()
220
+ self.interpretation = str()
221
+ self.prev_exp_results = str()
222
+ self.reviewer_response = str()
223
+ self.prev_results_code = str()
224
+ self.prev_interpretation = str()
225
+ self.openai_api_key = openai_api_key
226
+
227
+ self.second_round = False
228
+ self.max_hist_len = 15
229
+
230
+ def set_model_backbone(self, model):
231
+ self.model = model
232
+
233
+ @staticmethod
234
+ def clean_text(text):
235
+ """
236
+ Fix minor corrections
237
+ :return: (str) corrected text
238
+ """
239
+ text = text.replace("```\n", "```")
240
+ return text
241
+
242
+ def override_inference(self, query, temp=0.0):
243
+ sys_prompt = f"""You are {self.role_description()}"""
244
+ model_resp = query_model(model_str=self.model, system_prompt=sys_prompt, prompt=query, temp=temp, openai_api_key=self.openai_api_key)
245
+ return model_resp
246
+
247
+ def inference(self, research_topic, phase, step, feedback="", temp=None):
248
+ sys_prompt = f"""You are {self.role_description()} \nTask instructions: {self.phase_prompt(phase)}\n{self.command_descriptions(phase)}"""
249
+ context = self.context(phase)
250
+ history_str = "\n".join([_[1] for _ in self.history])
251
+ phase_notes = [_note for _note in self.notes if phase in _note["phases"]]
252
+ notes_str = f"Notes for the task objective: {phase_notes}\n" if len(phase_notes) > 0 else ""
253
+ complete_str = str()
254
+ if step/(self.max_steps-1) > 0.7: complete_str = "You must finish this task and submit as soon as possible!"
255
+ prompt = (
256
+ f"""{context}\n{'~' * 10}\nHistory: {history_str}\n{'~' * 10}\n"""
257
+ f"Current Step #{step}, Phase: {phase}\n{complete_str}\n"
258
+ f"[Objective] Your goal is to perform research on the following topic: {research_topic}\n"
259
+ f"Feedback: {feedback}\nNotes: {notes_str}\nYour previous command was: {self.prev_comm}. Make sure your new output is very different.\nPlease produce a single command below:\n")
260
+ model_resp = query_model(model_str=self.model, system_prompt=sys_prompt, prompt=prompt, temp=temp, openai_api_key=self.openai_api_key)
261
+ print("^"*50, phase, "^"*50)
262
+ model_resp = self.clean_text(model_resp)
263
+ self.prev_comm = model_resp
264
+ steps_exp = None
265
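+ # Feedback prefixed with a ```EXPIRATION N``` marker is only kept in the history for N further steps before being dropped below.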
+ if feedback is not None and "```EXPIRATION" in feedback:
266
+ steps_exp = int(feedback.split("\n")[0].replace("```EXPIRATION ", ""))
267
+ feedback = extract_prompt(feedback, "EXPIRATION")
268
+ self.history.append((steps_exp, f"Step #{step}, Phase: {phase}, Feedback: {feedback}, Your response: {model_resp}"))
269
+ # remove histories that have expiration dates
270
+ for _i in reversed(range(len(self.history))):
271
+ if self.history[_i][0] is not None:
272
+ self.history[_i] = (self.history[_i][0] - 1, self.history[_i][1])
273
+ if self.history[_i][0] < 0:
274
+ self.history.pop(_i)
275
+ if len(self.history) >= self.max_hist_len:
276
+ self.history.pop(0)
277
+ return model_resp
278
+
279
+ def reset(self):
280
+ self.history.clear() # Clear the deque
281
+ self.prev_comm = ""
282
+
283
+ def context(self, phase):
284
+ raise NotImplementedError("Subclasses should implement this method.")
285
+
286
+ def phase_prompt(self, phase):
287
+ raise NotImplementedError("Subclasses should implement this method.")
288
+
289
+ def role_description(self):
290
+ raise NotImplementedError("Subclasses should implement this method.")
291
+
292
+ def command_descriptions(self, phase):
293
+ raise NotImplementedError("Subclasses should implement this method.")
294
+
295
+ def example_command(self, phase):
296
+ raise NotImplementedError("Subclasses should implement this method.")
297
+
298
+
299
+ class ProfessorAgent(BaseAgent):
300
+ def __init__(self, model="gpt4omini", notes=None, max_steps=100, openai_api_key=None):
301
+ super().__init__(model, notes, max_steps, openai_api_key)
302
+ self.phases = ["report writing"]
303
+
304
+ def generate_readme(self):
305
+ sys_prompt = f"""You are {self.role_description()} \n Here is the written paper \n{self.report}. Task instructions: Your goal is to integrate all of the knowledge, code, reports, and notes provided to you and generate a readme.md for a github repository."""
306
+ history_str = "\n".join([_[1] for _ in self.history])
307
+ prompt = (
308
+ f"""History: {history_str}\n{'~' * 10}\n"""
309
+ f"Please produce the readme below in markdown:\n")
310
+ model_resp = query_model(model_str=self.model, system_prompt=sys_prompt, prompt=prompt, openai_api_key=self.openai_api_key)
311
+ return model_resp.replace("```markdown", "")
312
+
313
+ def context(self, phase):
314
+ #sr_str = str()
315
+ #if self.second_round:
316
+ # sr_str = (
317
+ # f"The following are results from the previous experiments\n",
318
+ # f"Previous Experiment code: {self.prev_results_code}\n"
319
+ # f"Previous Results: {self.prev_exp_results}\n"
320
+ # f"Previous Interpretation of results: {self.prev_interpretation}\n"
321
+ # f"Previous Report: {self.prev_report}\n"
322
+ # f"{self.reviewer_response}\n\n\n"
323
+ # )
324
+ #if phase == "report writing":
325
+ # return (
326
+ # sr_str,
327
+ # f"Current Literature Review: {self.lit_review_sum}\n"
328
+ # f"Current Plan: {self.plan}\n"
329
+ # f"Current Dataset code: {self.dataset_code}\n"
330
+ # f"Current Experiment code: {self.results_code}\n"
331
+ # f"Current Results: {self.exp_results}\n"
332
+ # f"Current Interpretation of results: {self.interpretation}\n"
333
+ # )
334
+ return ""
335
+
336
+ def example_command(self, phase):
337
+ if phase not in self.phases:
338
+ raise Exception(f"Invalid phase: {phase}")
339
+ return (
340
+ "You can produce dialogue using the following command: ```DIALOGUE\ndialogue here\n```\n where dialogue here is the actual dialogue you will send and DIALOGUE is just the word DIALOGUE.\n"
341
+ "When performing a command, make sure to include the three ticks (```) at the top and bottom ```COMMAND\n<Insert command here>\n``` where COMMAND is the specific command you want to run (e.g. REPORT, DIALOGUE).\n")
342
+
343
+ def command_descriptions(self, phase):
344
+ if phase not in self.phases:
345
+ raise Exception(f"Invalid phase: {phase}")
346
+ return (
347
+ "When you believe a good report has been arrived at between you and the PhD student you can use the following command to end the dialogue and submit the plan ```LATEX\nreport here\n```\n where report here is the actual report written in compilable latex to be transmitted and LATEX is just the word LATEX.\n"
348
+ "Your report should include numbers, relevant metrics to the experiment (e.g. accuracy or loss) and measures of significance. You must propagate this information accurately. You must also submit the report promptly. Do not delay too long.\n"
349
+ "You must be incredibly detailed about what you did for the experiment and all of the findings.\n"
350
+ )
351
+
352
+ def phase_prompt(self, phase):
353
+ if phase not in self.phases:
354
+ raise Exception(f"Invalid phase: {phase}")
355
+ phase_str = (
356
+ "You are directing a PhD student to help them write a report in latex based on results from an experiment, and you interact with them through dialogue.\n"
357
+ "Your goal is to write a report in latex for an experiment. You should read through the code, read through the interpretation, and look at the results to understand what occurred. You should then discuss with the PhD student how they can write up the results and give their feedback to improve their thoughts.\n"
358
+ )
359
+ return phase_str
360
+
361
+ def role_description(self):
362
+ return "a computer science professor at a top university."
363
+
364
+
365
+ class PostdocAgent(BaseAgent):
366
+ def __init__(self, model="gpt4omini", notes=None, max_steps=100, openai_api_key=None):
367
+ super().__init__(model, notes, max_steps, openai_api_key)
368
+ self.phases = ["plan formulation", "results interpretation"]
369
+
370
+ def context(self, phase):
371
+ sr_str = str()
372
+ if self.second_round:
373
+ sr_str = (
374
+ f"The following are results from the previous experiments\n",
375
+ f"Previous Experiment code: {self.prev_results_code}\n"
376
+ f"Previous Results: {self.prev_exp_results}\n"
377
+ f"Previous Interpretation of results: {self.prev_interpretation}\n"
378
+ f"Previous Report: {self.prev_report}\n"
379
+ f"{self.reviewer_response}\n\n\n"
380
+ )
381
+ if phase == "plan formulation":
382
+ return (
383
+ sr_str,
384
+ f"Current Literature Review: {self.lit_review_sum}",
385
+ )
386
+ elif phase == "results interpretation":
387
+ return (
388
+ sr_str,
389
+ f"Current Literature Review: {self.lit_review_sum}\n"
390
+ f"Current Plan: {self.plan}\n"
391
+ f"Current Dataset code: {self.dataset_code}\n"
392
+ f"Current Experiment code: {self.results_code}\n"
393
+ f"Current Results: {self.exp_results}"
394
+ )
395
+ return ""
396
+
397
+ def example_command(self, phase):
398
+ if phase not in self.phases:
399
+ raise Exception(f"Invalid phase: {phase}")
400
+ return ()
401
+
402
+ def command_descriptions(self, phase):
403
+ if phase not in self.phases:
404
+ raise Exception(f"Invalid phase: {phase}")
405
+ if phase == "plan formulation":
406
+ return (
407
+ "You can produce dialogue using the following command: ```DIALOGUE\ndialogue here\n```\n where dialogue here is the actual dialogue you will send and DIALOGUE is just the word DIALOGUE.\n"
408
+ "When you believe a good plan has been arrived at between you and the PhD student you can use the following command to end the dialogue and submit the plan ```PLAN\nplan here\n```\n where plan here is the actual plan to be transmitted and PLAN is just the word PLAN. Plan here should provide a clear outline for how to achieve the task, including what machine learning models to use and implement, what types of datasets should be searched for and used to train the model, and the exact details of the experiment.\n"
409
+ "You can only use a SINGLE command per inference turn. Do not use more than one command per inference. If you use multiple commands, then only one of them will be executed, NOT BOTH.\n"
410
+ "Make sure not to produce too much dialogue and to submit an plan in reasonable time."
411
+ "When performing a command, make sure to include the three ticks (```) at the top and bottom ```COMMAND\ntext\n``` where COMMAND is the specific command you want to run (e.g. PLAN, DIALOGUE).\n"
412
+ )
413
+ elif phase == "results interpretation":
414
+ return (
415
+ "When you believe a good interpretation has been arrived at between you and the PhD student you can use the following command to end the dialogue and submit the plan ```INTERPRETATION\ninterpretation here\n```\n where interpretation here is the actual interpretation to be transmitted and INTERPRETATION is just the word INTERPRETATION. Please provide an INTERPRETATION in a reasonable amount of time.\n"
416
+ "You can produce dialogue using the following command: ```DIALOGUE\ndialogue here\n```\n where dialogue here is the actual dialogue you will send and DIALOGUE is just the word DIALOGUE.\n"
417
+ "You must submit the interpretation during this phase in a reasonable amount of time. Do not delay the submission."
418
+ "When performing a command, make sure to include the three ticks (```) at the top and bottom ```COMMAND\ntext\n``` where COMMAND is the specific command you want to run (e.g. INTERPRETATION, DIALOGUE).\n"
419
+ )
420
+
421
+ def phase_prompt(self, phase):
422
+ if phase not in self.phases:
423
+ raise Exception(f"Invalid phase: {phase}")
424
+ if phase == "plan formulation":
425
+ phase_str = (
426
+ "You are directing a PhD student to help them come up with a good plan, and you interact with them through dialogue.\n"
427
+ "Your goal is to produce plans that would make good experiments for the given topic. You should aim for a very simple experiment that showcases your plan, not a complex one. You should integrate the provided literature review and come up with plans on how to expand and build on these works for the given topic. Your plans should provide a clear outline for how to achieve the task, including what machine learning models to use and implement, what types of datasets should be searched for and used to train the model, and the exact details of the experiment. Your idea should be very innovative and unlike anything seen before.\n"
428
+ )
429
+ elif phase == "results interpretation":
430
+ phase_str = (
431
+ "You are directing a PhD student to help them come up with an interpretation for results from an experiment, and you interact with them through dialogue.\n"
432
+ "Your goal is to interpret results from experiments that were previously run. You should read through the code and look at the results to understand what occurred. You should then discuss with the PhD student how they can interpret the results and give their feedback to improve their thoughts. You should integrate the provided literature review, code, and plans to come up with an exciting interpretation that could make a compelling paper. Your plans should provide a clear outline that can be used to write an academic paper.\n"
433
+ "Your interpretation should include numbers, relevant metrics to the experiment (e.g. accuracy or loss) and measures of significance. You must propagate this information accurately. You must also complete this in a reasonable amount of time and then submit your results.\n"
434
+ )
435
+ return phase_str
436
+
437
+ def role_description(self):
438
+ return "a computer science postdoctoral student at a top university."
439
+
440
+
441
+ class MLEngineerAgent(BaseAgent):
442
+ def __init__(self, model="gpt4omini", notes=None, max_steps=100, openai_api_key=None):
443
+ super().__init__(model, notes, max_steps, openai_api_key)
444
+ self.phases = [
445
+ "data preparation",
446
+ "running experiments",
447
+ ]
448
+
449
+ def context(self, phase):
450
+ sr_str = str()
451
+ if self.second_round:
452
+ sr_str = (
453
+ f"The following are results from the previous experiments\n",
454
+ f"Previous Experiment code: {self.prev_results_code}\n"
455
+ f"Previous Results: {self.prev_exp_results}\n"
456
+ f"Previous Interpretation of results: {self.prev_interpretation}\n"
457
+ f"Previous Report: {self.prev_report}\n"
458
+ f"{self.reviewer_response}\n\n\n"
459
+ )
460
+ if phase == "data preparation":
461
+ return (
462
+ sr_str,
463
+ f"Current Literature Review: {self.lit_review_sum}\nPlan: {self.plan}",
464
+ f"Current Plan: {self.plan}")
465
+ #elif phase == "running experiments":
466
+ # return (
467
+ # sr_str,
468
+ # f"Current Literature Review: {self.lit_review_sum}\n"
469
+ # f"Current Plan: {self.plan}\n"
470
+ # f"Current Dataset code: {self.dataset_code}\n"
471
+ # )
472
+ return ""
473
+
474
+ def example_command(self, phase):
475
+ if phase not in self.phases:
476
+ raise Exception(f"Invalid phase: {phase}")
477
+ return ()
478
+
479
+ def command_descriptions(self, phase):
480
+ if phase not in self.phases:
481
+ raise Exception(f"Invalid phase: {phase}")
482
+ if phase == "data preparation":
483
+ return (
484
+ "You can produce code using the following command: ```python\ncode here\n```\n where code here is the actual code you will execute in a Python terminal, and python is just the word python. Try to incorporate some print functions. Do not use any classes or functions. If your code returns any errors, they will be provided to you, and you are also able to see print statements. You will receive all print statement results from the code. Make sure function variables are created inside the function or passed as a function parameter.\n" # Try to avoid creating functions.
485
+ "You can produce dialogue using the following command: ```DIALOGUE\ndialogue here\n```\n where dialogue here is the actual dialogue you will send, and DIALOGUE is just the word DIALOGUE.\n"
486
+ "You also have access to HuggingFace datasets. You can search the datasets repository using the following command: ```SEARCH_HF\nsearch query here\n``` where search query here is the query used to search HuggingFace datasets, and SEARCH_HF is the word SEARCH_HF. This will return a list of HuggingFace dataset descriptions which can be loaded into Python using the datasets library. Your code MUST use an external HuggingFace directory.\n"
487
+ "You MUST use a HuggingFace dataset in your code. DO NOT CREATE A MAIN FUNCTION. Try to make the code very simple.\n"
488
+ "You can only use a SINGLE command per inference turn. Do not use more than one command per inference. If you use multiple commands, then only one of them will be executed, NOT BOTH.\n"
489
+ "When performing a command, make sure to include the three ticks (```) at the top and bottom ```COMMAND\ntext\n``` where COMMAND is the specific command you want to run (e.g. python, DIALOGUE, SEARCH_HF).\n")
490
+ return ()
491
+
492
+ def phase_prompt(self, phase):
493
+ if phase not in self.phases:
494
+ raise Exception(f"Invalid phase: {phase}")
495
+ if phase == "data preparation":
496
+ phase_str = (
497
+ "You are a machine learning engineer being directed by a PhD student who will help you write the code, and you can interact with them through dialogue.\n"
498
+ "Your goal is to produce code that prepares the data for the provided experiment. You should aim for simple code to prepare the data, not complex code. You should integrate the provided literature review and the plan and come up with code to prepare data for this experiment.\n"
499
+ )
500
+ return phase_str
501
+
502
+ def role_description(self):
503
+ return "a machine learning engineer working at a top university."
504
+
505
+
506
+
507
+ class SWEngineerAgent(BaseAgent):
508
+ def __init__(self, model="gpt4omini", notes=None, max_steps=100, openai_api_key=None):
509
+ super().__init__(model, notes, max_steps, openai_api_key)
510
+ self.phases = [
511
+ "data preparation",
512
+ ]
513
+
514
+ def context(self, phase):
515
+ sr_str = str()
516
+ if self.second_round:
517
+ sr_str = (
518
+ f"The following are results from the previous experiments\n",
519
+ f"Previous Experiment code: {self.prev_results_code}\n"
520
+ f"Previous Results: {self.prev_exp_results}\n"
521
+ f"Previous Interpretation of results: {self.prev_interpretation}\n"
522
+ f"Previous Report: {self.prev_report}\n"
523
+ f"{self.reviewer_response}\n\n\n"
524
+ )
525
+ if phase == "data preparation":
526
+ return (
527
+ sr_str,
528
+ f"Current Literature Review: {self.lit_review_sum}\nPlan: {self.plan}",
529
+ f"Current Plan: {self.plan}")
530
+ return ""
531
+
532
+ def example_command(self, phase):
533
+ if phase not in self.phases:
534
+ raise Exception(f"Invalid phase: {phase}")
535
+ return ()
536
+
537
+ def command_descriptions(self, phase):
538
+ if phase not in self.phases:
539
+ raise Exception(f"Invalid phase: {phase}")
540
+ if phase == "data preparation":
541
+ return (
542
+ "You can produce dialogue using the following command: ```DIALOGUE\ndialogue here\n```\n where 'dialogue here' is the actual dialogue you will send and DIALOGUE is just the word DIALOGUE.\n"
543
+ "When you and the ML engineer have finalized your dataset preparation code and are ready to submit the final code, please use the following command: ```SUBMIT_CODE\ncode here\n```\n where 'code here' is the finalized code you will send and SUBMIT_CODE is just the word SUBMIT_CODE. Do not use any classes or functions. The submitted code must have a HuggingFace dataset import and must use an external HuggingFace dataset. If your code returns any errors, they will be provided to you, and you are also able to see print statements. Make sure function variables are created inside the function or passed as a function parameter. DO NOT CREATE A MAIN FUNCTION.\n"
544
+ "Make sure to submit code in a reasonable amount of time. Do not make the code too complex, try to make it simple. Do not take too long to submit code. Submit the code early. You should submit the code ASAP.\n"
545
+ "You can only use a single command per inference turn. Do not use more than one command per inference. If you use multiple commands, then only one of them will be executed, not both.\n"
546
+ "When performing a command, make sure to include the three ticks (```) at the top and bottom ```COMMAND\ntext\n``` where COMMAND is the specific command you want to run (e.g. SUBMIT_CODE, DIALOGUE).\n")
547
+ return ""
548
+
549
+ def phase_prompt(self, phase):
550
+ if phase not in self.phases:
551
+ raise Exception(f"Invalid phase: {phase}")
552
+ elif phase == "data preparation":
553
+ phase_str = (
554
+ "You are a software engineer directing a machine learning engineer, where the machine learning engineer will be writing the code, and you can interact with them through dialogue.\n"
555
+ "Your goal is to help the ML engineer produce code that prepares the data for the provided experiment. You should aim for very simple code to prepare the data, not complex code. You should integrate the provided literature review and the plan and come up with code to prepare data for this experiment.\n"
556
+ )
557
+ return phase_str
558
+
559
+ def role_description(self):
560
+ return "a software engineer working at a top university."
561
+
562
+
563
+ class PhDStudentAgent(BaseAgent):
564
+ def __init__(self, model="gpt4omini", notes=None, max_steps=100, openai_api_key=None):
565
+ super().__init__(model, notes, max_steps, openai_api_key)
566
+ self.phases = [
567
+ "literature review",
568
+ "plan formulation",
569
+ "running experiments",
570
+ "results interpretation",
571
+ "report writing",
572
+ "report refinement",
573
+ ]
574
+ self.lit_review = []
575
+
576
+ def context(self, phase):
577
+ sr_str = str()
578
+ if self.second_round:
579
+ sr_str = (
580
+ f"The following are results from the previous experiments\n",
581
+ f"Previous Experiment code: {self.prev_results_code}\n"
582
+ f"Previous Results: {self.prev_exp_results}\n"
583
+ f"Previous Interpretation of results: {self.prev_interpretation}\n"
584
+ f"Previous Report: {self.prev_report}\n"
585
+ f"{self.reviewer_response}\n\n\n"
586
+ )
587
+ if phase == "plan formulation":
588
+ return (
589
+ sr_str,
590
+ f"Current Literature Review: {self.lit_review_sum}",)
591
+ elif phase == "data preparation":
592
+ return (
593
+ sr_str,
594
+ f"Current Literature Review: {self.lit_review_sum}\n"
595
+ f"Current Plan: {self.plan}"
596
+ )
597
+ elif phase == "results interpretation":
598
+ return (
599
+ sr_str,
600
+ f"Current Literature Review: {self.lit_review_sum}\n"
601
+ f"Current Plan: {self.plan}\n"
602
+ f"Current Dataset code: {self.dataset_code}\n"
603
+ f"Current Experiment code: {self.results_code}\n"
604
+ f"Current Results: {self.exp_results}"
605
+ )
606
+ elif phase == "report refinement":
607
+ return (
608
+ sr_str,
609
+ f"Current Literature Review: {self.lit_review_sum}\n"
610
+ f"Current Plan: {self.plan}\n"
611
+ f"Current Dataset code: {self.dataset_code}\n"
612
+ f"Current Experiment code: {self.results_code}\n"
613
+ f"Current Results: {self.exp_results}\n"
614
+ f"Current Interpretation of results: {self.interpretation}"
615
+ )
616
+ elif phase == "literature review":
617
+ return sr_str
618
+ else:
619
+ return ""
620
+
621
+ def requirements_txt(self):
622
+ sys_prompt = f"""You are {self.role_description()} \nTask instructions: Your goal is to integrate all of the knowledge, code, reports, and notes provided to you and generate a requirements.txt for a github repository for all of the code."""
623
+ history_str = "\n".join([_[1] for _ in self.history])
624
+ prompt = (
625
+ f"""History: {history_str}\n{'~' * 10}\n"""
626
+ f"Please produce the requirements.txt below in markdown:\n")
627
+ model_resp = query_model(model_str=self.model, system_prompt=sys_prompt, prompt=prompt, openai_api_key=self.openai_api_key)
628
+ return model_resp
629
+
630
+ def example_command(self, phase):
631
+ if phase not in self.phases:
632
+ raise Exception(f"Invalid phase: {phase}")
633
+ return ()
634
+
635
+ def command_descriptions(self, phase):
636
+ if phase not in self.phases:
637
+ raise Exception(f"Invalid phase: {phase}")
638
+ if phase == "literature review":
639
+ return (
640
+ "To collect paper summaries, use the following command: ```SUMMARY\nSEARCH QUERY\n```\n where SEARCH QUERY is a string that will be used to find papers with semantically similar content and SUMMARY is just the word SUMMARY. Make sure your search queries are very short.\n"
641
+ "To get the full paper text for an arXiv paper, use the following command: ```FULL_TEXT\narXiv paper ID\n```\n where arXiv paper ID is the ID of the arXiv paper (which can be found by using the SUMMARY command), and FULL_TEXT is just the word FULL_TEXT. Make sure to read the full text using the FULL_TEXT command before adding it to your list of relevant papers.\n"
642
+ "If you believe a paper is relevant to the research project proposal, you can add it to the official review after reading using the following command: ```ADD_PAPER\narXiv_paper_ID\nPAPER_SUMMARY\n```\nwhere arXiv_paper_ID is the ID of the arXiv paper, PAPER_SUMMARY is a brief summary of the paper, and ADD_PAPER is just the word ADD_PAPER. You can only add one paper at a time. \n"
643
+ "Make sure to use ADD_PAPER when you see a relevant paper. DO NOT use SUMMARY too many times."
644
+ "You can only use a single command per inference turn. Do not use more than one command per inference. If you use multiple commands, then only one of them will be executed, not both.\n"
645
+ "Make sure to extensively discuss the experimental results in your summary.\n"
646
+ "When performing a command, make sure to include the three ticks (```) at the top and bottom ```COMMAND\ntext\n``` where COMMAND is the specific command you want to run (e.g. ADD_PAPER, FULL_TEXT, SUMMARY). Do not use the word COMMAND make sure to use the actual command, e.g. your command should look exactly like this: ```ADD_PAPER\ntext\n``` (where the command could be from ADD_PAPER, FULL_TEXT, SUMMARY)\n")
647
+ elif phase == "plan formulation":
648
+ return (
649
+ "You can produce dialogue using the following command: ```DIALOGUE\ndialogue here\n```\n where 'dialogue here' is the actual dialogue you will send and DIALOGUE is just the word DIALOGUE.\n"
650
+ "You can only use a single command per inference turn. Do not use more than one command per inference. If you use multiple commands, then only one of them will be executed, not both.\n"
651
+ "When performing a command, make sure to include the three ticks (```) at the top and bottom ```COMMAND\ntext\n``` where COMMAND is the specific command you want to run (e.g. DIALOGUE).\n"
652
+ )
653
+ elif phase == "data preparation":
654
+ return (
655
+ "You can produce dialogue using the following command: ```DIALOGUE\ndialogue here\n```\n where 'dialogue here' is the actual dialogue you will send and DIALOGUE is just the word DIALOGUE.\n"
656
+ "When you and the ML engineer have finalized your dataset preparation code and are ready to submit the final code, please use the following command: ```SUBMIT_CODE\ncode here\n```\n where 'code here' is the finalized code you will send and SUBMIT_CODE is just the word SUBMIT_CODE. Do not use any classes or functions. The submitted code must have a HuggingFace dataset import and must use an external HuggingFace dataset. If your code returns any errors, they will be provided to you, and you are also able to see print statements. Make sure function variables are created inside the function or passed as a function parameter. DO NOT CREATE A MAIN FUNCTION.\n"
657
+ "Make sure to submit code in a reasonable amount of time. Do not make the code too complex, try to make it simple. Do not take too long to submit code. Submit the code early. You should submit the code ASAP.\n"
658
+ "You can only use a single command per inference turn. Do not use more than one command per inference. If you use multiple commands, then only one of them will be executed, not both.\n"
659
+ "When performing a command, make sure to include the three ticks (```) at the top and bottom ```COMMAND\ntext\n``` where COMMAND is the specific command you want to run (e.g. SUBMIT_CODE, DIALOGUE).\n")
660
+ elif phase == "results interpretation":
661
+ return (
662
+ "You can produce dialogue using the following command: ```DIALOGUE\ndialogue here\n```\n where 'dialogue here' is the actual dialogue you will send and DIALOGUE is just the word DIALOGUE.\n"
663
+ "When performing a command, make sure to include the three ticks (```) at the top and bottom ```COMMAND\ntext\n``` where COMMAND is the specific command you want to run (e.g. DIALOGUE).\n"
664
+ )
665
+ #elif phase == "report writing":
666
+ # return (
667
+ # "You can produce dialogue using the following command: ```DIALOGUE\ndialogue here\n```\n where 'dialogue here' is the actual dialogue you will send and DIALOGUE is just the word DIALOGUE.\n"
668
+ # "When performing a command, make sure to include the three ticks (```) at the top and bottom ```COMMAND\ntext\n``` where COMMAND is the specific command you want to run (e.g. DIALOGUE).\n")
669
+ elif phase == "report refinement":
670
+ return ""
671
+ return ""
672
+
673
+ def phase_prompt(self, phase):
674
+ if phase not in self.phases:
675
+ raise Exception(f"Invalid phase: {phase}")
676
+
677
+ if phase == "literature review":
678
+ phase_str = (
679
+ "Your goal is to perform a literature review for the presented task and add papers to the literature review.\n"
680
+ "You have access to arXiv and can perform two search operations: (1) finding many different paper summaries from a search query and (2) getting a single full paper text for an arXiv paper.\n"
681
+ )
682
+ rev_papers = "Papers in your review so far: " + " ".join([_paper["arxiv_id"] for _paper in self.lit_review])
683
+ phase_str += rev_papers if len(self.lit_review) > 0 else ""
684
+ elif phase == "plan formulation":
685
+ phase_str = (
686
+ "You are a PhD student being directed by a postdoc who will help you come up with a good plan, and you interact with them through dialogue.\n"
687
+ "Your goal is to produce plans that would make good experiments for the given topic. You should aim for a very simple experiment that showcases your plan, not a complex one. You should integrate the provided literature review and come up with plans on how to expand and build on these works for the given topic. Your plans should provide a clear outline for how to achieve the task, including what machine learning models to use and implement, what types of datasets should be searched for and used to train the model, and the exact details of the experiment. Your idea should be very innovative and unlike anything seen before.\n"
688
+ )
689
+ elif phase == "results interpretation":
690
+ phase_str = (
691
+ "You are a PhD student being directed by a postdoc who will help you come up with an interpretation for results from an experiment, and you interact with them through dialogue.\n"
692
+ "Your goal is to interpret results from experiments that were previously run. You should read through the code and look at the results to understand what occurred. You should then discuss with the postdoc your interpretation and use their feedback to improve your thoughts. You should integrate the provided literature review, code, and plans to come up with an exciting interpretation that could make a compelling paper. Your plans should provide a clear outline that can be used to write an academic paper.\n"
693
+ "Your interpretation should include numbers, relevant metrics to the experiment (e.g. accuracy or loss) and measures of significance. You must propagate this information accurately.\n"
694
+ "You must submit the interpretation during this phase in a reasonable amount of time. Do not delay the submission."
695
+ )
696
+ #elif phase == "report writing":
697
+ # phase_str = (
698
+ # "You are a PhD student being directed by a professor who will help you write a report based on results from an experiment, and you interact with them through dialogue.\n"
699
+ # "Your goal is to write a report for an experiment entirely in latex. You should read through the code, read through the interpretation, and look at the results to understand what occurred. You should then discuss with the professor how you can write up the results and receive their feedback to improve your thoughts.\n"
700
+ # "Your report should include numbers, relevant metrics to the experiment (e.g. accuracy or loss) and measures of significance in latex. You must propagate this information accurately.\n"
701
+ # "You must be incredibly detailed about what you did for the experiment and all of the findings.\n"
702
+ # )
703
+ elif phase == "report refinement":
704
+ phase_str = (
705
+ "You are a PhD student who has submitted their paper to an ML conference called ICLR. Your goal was to write a research paper and get high scores from the reviewers so that it get accepted to the conference.\n"
706
+ )
707
+ else:
708
+ phase_str = ""
709
+ return phase_str
710
+
711
+ def role_description(self):
712
+ return "a computer science PhD student at a top university."
713
+
714
+ def add_review(self, review, arx_eng, agentrxiv=False, GLOBAL_AGENTRXIV=None):
715
+ try:
716
+ if agentrxiv:
717
+ arxiv_id = review.split("\n")[0]
718
+ review_text = "\n".join(review.split("\n")[1:])
719
+ full_text = GLOBAL_AGENTRXIV.retrieve_full_text(arxiv_id,)
720
+ else:
721
+ arxiv_id, review_text = review.strip().split("\n", 1)
722
+ full_text = arx_eng.retrieve_full_paper_text(arxiv_id)
723
+ review_entry = {
724
+ "arxiv_id": arxiv_id,
725
+ "full_text": full_text,
726
+ "summary": review_text,
727
+ }
728
+ self.lit_review.append(review_entry)
729
+ return f"Successfully added paper {arxiv_id}", full_text
730
+ except Exception as e:
731
+ return f"Error trying to add review -- bad formatting, try again: {str(e)}. Your provided Arxiv ID might not be valid. Make sure it references a real paper, which can be found using the SUMMARY command.", ""
732
+
733
+ def format_review(self):
734
+ return "Provided here is a literature review on this topic:\n" + "\n".join(
735
+ f"arXiv ID: {_l['arxiv_id']}, Summary: {_l['summary']}"
736
+ for _l in self.lit_review)
737
+
738
+
739
+
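The literature-review commands described above (SUMMARY, FULL_TEXT, ADD_PAPER) and the DIALOGUE / SUBMIT_CODE commands share the same fenced-block wire format. As a minimal sketch, the snippet below shows how such a block might be pulled out of a model response; the helper name `extract_command` is hypothetical (the repository relies on its own `extract_prompt` utility), and the exact regex is an assumption about the fence layout.

```python
import re

def extract_command(response: str, command: str):
    """Hypothetical helper mirroring the fenced-command protocol described in
    command_descriptions(); the repository uses its own extract_prompt() utility,
    so treat this as a sketch rather than the actual implementation."""
    fence = "```"
    # Match <fence>COMMAND ... <fence> and capture the body in between.
    pattern = re.escape(fence + command) + r"\s*\n(.*?)" + re.escape(fence)
    match = re.search(pattern, response, flags=re.DOTALL)
    return match.group(1).strip() if match else None

# Example response using the ADD_PAPER format from the literature review phase.
resp = "```ADD_PAPER\n2501.00001\nA short summary of the paper.\n```"
print(extract_command(resp, "ADD_PAPER"))
```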
ai_lab_repo.py ADDED
@@ -0,0 +1,891 @@
1
+ import PyPDF2
2
+ import threading
3
+ from app import *
4
+ from agents import *
5
+ from copy import copy
6
+ from pathlib import Path
7
+ from datetime import date
8
+ from common_imports import *
9
+ from mlesolver import MLESolver
10
+ import argparse, pickle, yaml
11
+
12
+ GLOBAL_AGENTRXIV = None
13
+ DEFAULT_LLM_BACKBONE = "o3-mini"
14
+ RESEARCH_DIR_PATH = "MATH_research_dir"
15
+
16
+ os.environ["TOKENIZERS_PARALLELISM"] = "false"
17
+
18
+
19
+ class LaboratoryWorkflow:
20
+ def __init__(self, research_topic, openai_api_key, max_steps=100, num_papers_lit_review=5, agent_model_backbone=f"{DEFAULT_LLM_BACKBONE}", notes=list(), human_in_loop_flag=None, compile_pdf=True, mlesolver_max_steps=3, papersolver_max_steps=5, paper_index=0, except_if_fail=False, parallelized=False, lab_dir=None, lab_index=0, agentRxiv=False, agentrxiv_papers=5):
21
+ """
22
+ Initialize laboratory workflow
23
+ @param research_topic: (str) description of research idea to explore
24
+ @param max_steps: (int) max number of steps for each phase, i.e. compute tolerance budget
25
+ @param num_papers_lit_review: (int) number of papers to include in the lit review
26
+ @param agent_model_backbone: (str or dict) model backbone to use for agents
27
+ @param notes: (list) notes for agent to follow during tasks
28
+ """
29
+ self.agentRxiv = agentRxiv
30
+ self.max_prev_papers = 10
31
+ self.parallelized = parallelized
32
+ self.notes = notes
33
+ self.lab_dir = lab_dir
34
+ self.lab_index = lab_index
35
+ self.max_steps = max_steps
36
+ self.compile_pdf = compile_pdf
37
+ self.paper_index = paper_index
38
+ self.openai_api_key = openai_api_key
39
+ self.except_if_fail = except_if_fail
40
+ self.research_topic = research_topic
41
+ self.model_backbone = agent_model_backbone
42
+ self.num_papers_lit_review = num_papers_lit_review
43
+
44
+ self.print_cost = True
45
+ self.review_override = True # should review be overridden?
46
+ self.review_ovrd_steps = 0 # review steps so far
47
+ self.arxiv_paper_exp_time = 3
48
+ self.reference_papers = list()
49
+
50
+ ##########################################
51
+ ####### COMPUTE BUDGET PARAMETERS ########
52
+ ##########################################
53
+ self.num_ref_papers = 1
54
+ self.review_total_steps = 0 # num steps to take if overridden
55
+ self.arxiv_num_summaries = 5
56
+ self.num_agentrxiv_papers = agentrxiv_papers
57
+ self.mlesolver_max_steps = mlesolver_max_steps
58
+ self.papersolver_max_steps = papersolver_max_steps
59
+
60
+ self.phases = [
61
+ ("literature review", ["literature review"]),
62
+ ("plan formulation", ["plan formulation"]),
63
+ ("experimentation", ["data preparation", "running experiments"]),
64
+ ("results interpretation", ["results interpretation", "report writing", "report refinement"]),
65
+ ]
66
+ self.phase_status = dict()
67
+ for phase, subtasks in self.phases:
68
+ for subtask in subtasks:
69
+ self.phase_status[subtask] = False
70
+
71
+ self.phase_models = dict()
72
+ if type(agent_model_backbone) == str:
73
+ for phase, subtasks in self.phases:
74
+ for subtask in subtasks:
75
+ self.phase_models[subtask] = agent_model_backbone
76
+ elif type(agent_model_backbone) == dict:
77
+ # todo: check if valid
78
+ self.phase_models = agent_model_backbone
79
+
80
+ self.human_in_loop_flag = human_in_loop_flag
81
+
82
+ self.statistics_per_phase = {
83
+ "literature review": {"time": 0.0, "steps": 0.0,},
84
+ "plan formulation": {"time": 0.0, "steps": 0.0,},
85
+ "data preparation": {"time": 0.0, "steps": 0.0,},
86
+ "running experiments": {"time": 0.0, "steps": 0.0,},
87
+ "results interpretation": {"time": 0.0, "steps": 0.0,},
88
+ "report writing": {"time": 0.0, "steps": 0.0,},
89
+ "report refinement": {"time": 0.0, "steps": 0.0,},
90
+ }
91
+
92
+ self.save = True
93
+ self.verbose = True
94
+ self.reviewers = ReviewersAgent(model=self.model_backbone, notes=self.notes, openai_api_key=self.openai_api_key)
95
+ self.phd = PhDStudentAgent(model=self.model_backbone, notes=self.notes, max_steps=self.max_steps, openai_api_key=self.openai_api_key)
96
+ self.postdoc = PostdocAgent(model=self.model_backbone, notes=self.notes, max_steps=self.max_steps, openai_api_key=self.openai_api_key)
97
+ self.professor = ProfessorAgent(model=self.model_backbone, notes=self.notes, max_steps=self.max_steps, openai_api_key=self.openai_api_key)
98
+ self.ml_engineer = MLEngineerAgent(model=self.model_backbone, notes=self.notes, max_steps=self.max_steps, openai_api_key=self.openai_api_key)
99
+ self.sw_engineer = SWEngineerAgent(model=self.model_backbone, notes=self.notes, max_steps=self.max_steps, openai_api_key=self.openai_api_key)
100
+
101
+
102
+ def set_model(self, model):
103
+ self.set_agent_attr("model", model)
104
+ self.reviewers.model = model
105
+
106
+ def save_state(self, phase):
107
+ """
108
+ Save state for phase
109
+ @param phase: (str) phase string
110
+ @return: None
111
+ """
112
+ with open(f"state_saves/Paper{self.paper_index}.pkl", "wb") as f:
113
+ pickle.dump(self, f)
114
+
115
+ def set_agent_attr(self, attr, obj):
116
+ """
117
+ Set attribute for all agents
118
+ @param attr: (str) agent attribute
119
+ @param obj: (object) object attribute
120
+ @return: None
121
+ """
122
+ setattr(self.phd, attr, obj)
123
+ setattr(self.postdoc, attr, obj)
124
+ setattr(self.professor, attr, obj)
125
+ setattr(self.ml_engineer, attr, obj)
126
+ setattr(self.sw_engineer, attr, obj)
127
+
128
+ def reset_agents(self):
129
+ """
130
+ Reset all agent states
131
+ @return: None
132
+ """
133
+ self.phd.reset()
134
+ self.postdoc.reset()
135
+ self.professor.reset()
136
+ self.ml_engineer.reset()
137
+ self.sw_engineer.reset()
138
+
139
+ def perform_research(self):
140
+ """
141
+ Loop through all research phases
142
+ @return: None
143
+ """
144
+ for phase, subtasks in self.phases:
145
+ phase_start_time = time.time() # Start timing the phase
146
+ if self.verbose: print(f"{'*'*50}\nBeginning phase: {phase}\n{'*'*50}")
147
+ for subtask in subtasks:
148
+ if self.agentRxiv:
149
+ if self.verbose: print(f"{'&' * 30}\n[Lab #{self.lab_index} Paper #{self.paper_index}] Beginning subtask: {subtask}\n{'&' * 30}")
150
+ else:
151
+ if self.verbose: print(f"{'&'*30}\nBeginning subtask: {subtask}\n{'&'*30}")
152
+ if type(self.phase_models) == dict:
153
+ if subtask in self.phase_models:
154
+ self.set_model(self.phase_models[subtask])
155
+ else: self.set_model(f"{DEFAULT_LLM_BACKBONE}")
156
+ if (subtask not in self.phase_status or not self.phase_status[subtask]) and subtask == "literature review":
157
+ repeat = True
158
+ while repeat: repeat = self.literature_review()
159
+ self.phase_status[subtask] = True
160
+ if (subtask not in self.phase_status or not self.phase_status[subtask]) and subtask == "plan formulation":
161
+ repeat = True
162
+ while repeat: repeat = self.plan_formulation()
163
+ self.phase_status[subtask] = True
164
+ if (subtask not in self.phase_status or not self.phase_status[subtask]) and subtask == "data preparation":
165
+ repeat = True
166
+ while repeat: repeat = self.data_preparation()
167
+ self.phase_status[subtask] = True
168
+ if (subtask not in self.phase_status or not self.phase_status[subtask]) and subtask == "running experiments":
169
+ repeat = True
170
+ while repeat: repeat = self.running_experiments()
171
+ self.phase_status[subtask] = True
172
+ if (subtask not in self.phase_status or not self.phase_status[subtask]) and subtask == "results interpretation":
173
+ repeat = True
174
+ while repeat: repeat = self.results_interpretation()
175
+ self.phase_status[subtask] = True
176
+ if (subtask not in self.phase_status or not self.phase_status[subtask]) and subtask == "report writing":
177
+ repeat = True
178
+ while repeat: repeat = self.report_writing()
179
+ self.phase_status[subtask] = True
180
+ if (subtask not in self.phase_status or not self.phase_status[subtask]) and subtask == "report refinement":
181
+ return_to_exp_phase = self.report_refinement()
182
+
183
+ if not return_to_exp_phase:
184
+ if self.save: self.save_state(subtask)
185
+ return
186
+
187
+ self.set_agent_attr("second_round", return_to_exp_phase)
188
+ self.set_agent_attr("prev_report", copy(self.phd.report))
189
+ self.set_agent_attr("prev_exp_results", copy(self.phd.exp_results))
190
+ self.set_agent_attr("prev_results_code", copy(self.phd.results_code))
191
+ self.set_agent_attr("prev_interpretation", copy(self.phd.interpretation))
192
+
193
+ self.phase_status["plan formulation"] = False
194
+ self.phase_status["data preparation"] = False
195
+ self.phase_status["running experiments"] = False
196
+ self.phase_status["results interpretation"] = False
197
+ self.phase_status["report writing"] = False
198
+ self.phase_status["report refinement"] = False
199
+ self.perform_research()
200
+ if self.save: self.save_state(subtask)
201
+ # Calculate and print the duration of the phase
202
+ phase_end_time = time.time()
203
+ phase_duration = phase_end_time - phase_start_time
204
+ print(f"Subtask '{subtask}' completed in {phase_duration:.2f} seconds.")
205
+ self.statistics_per_phase[subtask]["time"] = phase_duration
206
+
207
+ def report_refinement(self):
208
+ """
209
+ Perform report refinement phase
210
+ @return: (bool) whether to repeat the phase
211
+ """
212
+ reviews = self.reviewers.inference(self.phd.plan, self.phd.report)
213
+ print("Reviews:", reviews)
214
+ if self.human_in_loop_flag["report refinement"]:
215
+ print(f"Provided are reviews from a set of three reviewers: {reviews}")
216
+ input("Would you like to be completed with the project or should the agents go back and improve their experimental results?\n (y) for go back (n) for complete project: ")
217
+ else:
218
+ review_prompt = f"Provided are reviews from a set of three reviewers: {reviews}. Would you like to be completed with the project or do you want to go back to the planning phase and improve your experiments?\n Type y and nothing else to go back, type n and nothing else for complete project."
219
+ self.phd.phases.append("report refinement")
220
+ if self.review_override:
221
+ if self.review_total_steps == self.review_ovrd_steps:
222
+ response = "n"
223
+ else:
224
+ response = "y"
225
+ self.review_ovrd_steps += 1
226
+ else:
227
+ response = self.phd.inference(
228
+ research_topic=self.research_topic, phase="report refinement", feedback=review_prompt, step=0)
229
+ if len(response) == 0:
230
+ raise Exception("Model did not respond")
231
+ response = response.lower().strip()[0]
232
+ if response == "n":
233
+ if self.verbose: print("*"*40, "\n", "REVIEW COMPLETE", "\n", "*"*40)
234
+ return False
235
+ elif response == "y":
236
+ self.set_agent_attr("reviewer_response", f"Provided are reviews from a set of three reviewers: {reviews}.")
237
+ return True
238
+ else: raise Exception("Model did not respond")
239
+
240
+ def report_writing(self):
241
+ """
242
+ Perform report writing phase
243
+ @return: (bool) whether to repeat the phase
244
+ """
245
+ # experiment notes
246
+ report_notes = [_note["note"] for _note in self.ml_engineer.notes if "report writing" in _note["phases"]]
247
+ report_notes = f"Notes for the task objective: {report_notes}\n" if len(report_notes) > 0 else ""
248
+ # instantiate mle-solver
249
+ from papersolver import PaperSolver
250
+ self.reference_papers = []
251
+ solver = PaperSolver(notes=report_notes, max_steps=self.papersolver_max_steps, plan=self.phd.plan, exp_code=self.phd.results_code, exp_results=self.phd.exp_results, insights=self.phd.interpretation, lit_review=self.phd.lit_review, ref_papers=self.reference_papers, topic=self.research_topic, openai_api_key=self.openai_api_key, llm_str=self.model_backbone["report writing"], compile_pdf=self.compile_pdf, save_loc=self.lab_dir)
252
+ # run initialization for solver
253
+ solver.initial_solve()
254
+ # run solver for N mle optimization steps
255
+ for _ in range(self.papersolver_max_steps): solver.solve()
256
+ # get best report results
257
+ report = "\n".join(solver.best_report[0][0])
258
+ score = solver.best_report[0][1]
259
+ match = re.search(r'\\title\{([^}]*)\}', report)
260
+ if match: report_title = match.group(1).replace(" ", "_")
261
+ else: report_title = "\n".join([str(random.randint(0, 10)) for _ in range(10)])
262
+ if self.agentRxiv: shutil.copyfile(self.lab_dir + "/tex/temp.pdf", f"uploads/{report_title}.pdf")
263
+ if self.verbose: print(f"Report writing completed, reward function score: {score}")
264
+ if self.human_in_loop_flag["report writing"]:
265
+ retry = self.human_in_loop("report writing", report)
266
+ if retry: return retry
267
+ self.set_agent_attr("report", report)
268
+ readme = self.professor.generate_readme()
269
+ save_to_file(f"./{self.lab_dir}", "readme.md", readme)
270
+ save_to_file(f"./{self.lab_dir}", "report.txt", report)
271
+ self.reset_agents()
272
+ return False
273
+
274
+ def results_interpretation(self):
275
+ """
276
+ Perform results interpretation phase
277
+ @return: (bool) whether to repeat the phase
278
+ """
279
+ max_tries = self.max_steps
280
+ dialogue = str()
281
+ # iterate until max num tries to complete task is exhausted
282
+ for _i in range(max_tries):
283
+ print(f"@@ Lab #{self.lab_index} Paper #{self.paper_index} @@")
284
+ resp = self.postdoc.inference(self.research_topic, "results interpretation", feedback=dialogue, step=_i)
285
+ if self.verbose: print("Postdoc: ", resp, "\n~~~~~~~~~~~")
286
+ dialogue = str()
287
+ if "```DIALOGUE" in resp:
288
+ dialogue = extract_prompt(resp, "DIALOGUE")
289
+ dialogue = f"The following is dialogue produced by the postdoctoral researcher: {dialogue}"
290
+ if self.verbose: print("#"*40, "\n", "Postdoc Dialogue:", dialogue, "\n", "#"*40)
291
+ if "```INTERPRETATION" in resp:
292
+ interpretation = extract_prompt(resp, "INTERPRETATION")
293
+ if self.human_in_loop_flag["results interpretation"]:
294
+ retry = self.human_in_loop("results interpretation", interpretation)
295
+ if retry: return retry
296
+ self.set_agent_attr("interpretation", interpretation)
297
+ # reset agent state
298
+ self.reset_agents()
299
+ self.statistics_per_phase["results interpretation"]["steps"] = _i
300
+ return False
301
+ resp = self.phd.inference(self.research_topic, "results interpretation", feedback=dialogue, step=_i)
302
+ if self.verbose: print("PhD Student: ", resp, "\n~~~~~~~~~~~")
303
+ dialogue = str()
304
+ if "```DIALOGUE" in resp:
305
+ dialogue = extract_prompt(resp, "DIALOGUE")
306
+ dialogue = f"The following is dialogue produced by the PhD student: {dialogue}"
307
+ if self.verbose: print("#"*40, "\n", "PhD Dialogue:", dialogue, "#"*40, "\n")
308
+ raise Exception("Max tries during phase: Results Interpretation")
309
+
310
+ def running_experiments(self):
311
+ """
312
+ Perform running experiments phase
313
+ @return: (bool) whether to repeat the phase
314
+ """
315
+ # experiment notes
316
+ experiment_notes = [_note["note"] for _note in self.ml_engineer.notes if "running experiments" in _note["phases"]]
317
+ experiment_notes = f"Notes for the task objective: {experiment_notes}\n" if len(experiment_notes) > 0 else ""
318
+ # instantiate mle-solver
319
+ solver = MLESolver(dataset_code=self.ml_engineer.dataset_code, notes=experiment_notes, insights=self.ml_engineer.lit_review_sum, max_steps=self.mlesolver_max_steps, plan=self.ml_engineer.plan, openai_api_key=self.openai_api_key, llm_str=self.model_backbone["running experiments"])
320
+ # run initialization for solver
321
+ solver.initial_solve()
322
+ # run solver for N mle optimization steps
323
+ for _ in range(self.mlesolver_max_steps-1):
324
+ solver.solve()
325
+ # get best code results
326
+ code = "\n".join(solver.best_codes[0][0])
327
+ # regenerate figures from top code
328
+ #execute_code(code)
329
+ score = solver.best_codes[0][1]
330
+ exp_results = solver.best_codes[0][2]
331
+ if self.verbose: print(f"Running experiments completed, reward function score: {score}")
332
+ if self.human_in_loop_flag["running experiments"]:
333
+ retry = self.human_in_loop("data preparation", code)
334
+ if retry: return retry
335
+ save_to_file(f"./{self.lab_dir}/src", "run_experiments.py", code)
336
+ save_to_file(f"./{self.lab_dir}/src", "experiment_output.log", exp_results)
337
+ self.set_agent_attr("results_code", code)
338
+ self.set_agent_attr("exp_results", exp_results)
339
+ # reset agent state
340
+ self.reset_agents()
341
+ return False
342
+
343
+ def data_preparation(self):
344
+ """
345
+ Perform data preparation phase
346
+ @return: (bool) whether to repeat the phase
347
+ """
348
+ max_tries = self.max_steps
349
+ ml_feedback = str()
350
+ ml_dialogue = str()
351
+ swe_feedback = str()
352
+ ml_command = str()
353
+ hf_engine = HFDataSearch()
354
+ # iterate until max num tries to complete task is exhausted
355
+ for _i in range(max_tries):
356
+ print(f"@@ Lab #{self.lab_index} Paper #{self.paper_index} @@")
357
+ if ml_feedback != "":
358
+ ml_feedback_in = "Feedback provided to the ML agent: " + ml_feedback
359
+ else: ml_feedback_in = ""
360
+ resp = self.sw_engineer.inference(self.research_topic, "data preparation", feedback=f"{ml_dialogue}\nFeedback from previous command: {swe_feedback}\n{ml_command}{ml_feedback_in}", step=_i)
361
+ swe_feedback = str()
362
+ swe_dialogue = str()
363
+ if "```DIALOGUE" in resp:
364
+ dialogue = extract_prompt(resp, "DIALOGUE")
365
+ swe_dialogue = f"\nThe following is dialogue produced by the SW Engineer: {dialogue}\n"
366
+ if self.verbose: print("#"*40, f"\nThe following is dialogue produced by the SW Engineer: {dialogue}", "\n", "#"*40)
367
+ if "```SUBMIT_CODE" in resp:
368
+ final_code = extract_prompt(resp, "SUBMIT_CODE")
369
+ code_resp = execute_code(final_code, timeout=60)
370
+ if self.verbose: print("!"*100, "\n", f"CODE RESPONSE: {code_resp}")
371
+ swe_feedback += f"\nCode Response: {code_resp}\n"
372
+ if "[CODE EXECUTION ERROR]" in code_resp:
373
+ swe_feedback += "\nERROR: Final code had an error and could not be submitted! You must address and fix this error.\n"
374
+ else:
375
+ if self.human_in_loop_flag["data preparation"]:
376
+ retry = self.human_in_loop("data preparation", final_code)
377
+ if retry: return retry
378
+ save_to_file(f"./{self.lab_dir}/src", "load_data.py", final_code)
379
+ self.set_agent_attr("dataset_code", final_code)
380
+ # reset agent state
381
+ self.reset_agents()
382
+ self.statistics_per_phase["data preparation"]["steps"] = _i
383
+ return False
384
+
385
+ if ml_feedback != "":
386
+ ml_feedback_in = "Feedback from previous command: " + ml_feedback
387
+ else:
388
+ ml_feedback_in = ""
389
+ resp = self.ml_engineer.inference(
390
+ self.research_topic, "data preparation",
391
+ feedback=f"{swe_dialogue}\n{ml_feedback_in}", step=_i)
392
+ #if self.verbose: print("ML Engineer: ", resp, "\n~~~~~~~~~~~")
393
+ ml_feedback = str()
394
+ ml_dialogue = str()
395
+ ml_command = str()
396
+ if "```DIALOGUE" in resp:
397
+ dialogue = extract_prompt(resp, "DIALOGUE")
398
+ ml_dialogue = f"\nThe following is dialogue produced by the ML Engineer: {dialogue}\n"
399
+ if self.verbose: print("#" * 40, f"\nThe following is dialogue produced by the ML Engineer: {dialogue}", "#" * 40, "\n")
400
+ if "```python" in resp:
401
+ code = extract_prompt(resp, "python")
402
+ code = self.ml_engineer.dataset_code + "\n" + code
403
+ code_resp = execute_code(code, timeout=120)
404
+ ml_command = f"Code produced by the ML agent:\n{code}"
405
+ ml_feedback += f"\nCode Response: {code_resp}\n"
406
+ if self.verbose: print("!"*100, "\n", f"CODE RESPONSE: {code_resp}")
407
+ if "```SEARCH_HF" in resp:
408
+ hf_query = extract_prompt(resp, "SEARCH_HF")
409
+ hf_res = "\n".join(hf_engine.results_str(hf_engine.retrieve_ds(hf_query)))
410
+ ml_command = f"HF search command produced by the ML agent:\n{hf_query}"
411
+ ml_feedback += f"Huggingface results: {hf_res}\n"
412
+ raise Exception("Max tries during phase: Data Preparation")
413
+
414
+ def plan_formulation(self):
415
+ """
416
+ Perform plan formulation phase
417
+ @return: (bool) whether to repeat the phase
418
+ """
419
+ max_tries = self.max_steps
420
+ dialogue = str()
421
+ # iterate until max num tries to complete task is exhausted
422
+ for _i in range(max_tries):
423
+ print(f"@@ Lab #{self.lab_index} Paper #{self.paper_index} @@")
424
+ # inference postdoc to
425
+ resp = self.postdoc.inference(self.research_topic, "plan formulation", feedback=dialogue, step=_i)
426
+ if self.verbose: print("Postdoc: ", resp, "\n~~~~~~~~~~~")
427
+ dialogue = str()
428
+
429
+ if "```DIALOGUE" in resp:
430
+ dialogue = extract_prompt(resp, "DIALOGUE")
431
+ dialogue = f"The following is dialogue produced by the postdoctoral researcher: {dialogue}"
432
+ if self.verbose: print("#"*40, "\n", "Postdoc Dialogue:", dialogue, "\n", "#"*40)
433
+
434
+ if "```PLAN" in resp:
435
+ plan = extract_prompt(resp, "PLAN")
436
+ if self.human_in_loop_flag["plan formulation"]:
437
+ retry = self.human_in_loop("plan formulation", plan)
438
+ if retry: return retry
439
+ self.set_agent_attr("plan", plan)
440
+ # reset agent state
441
+ self.reset_agents()
442
+ self.statistics_per_phase["plan formulation"]["steps"] = _i
443
+ return False
444
+
445
+ resp = self.phd.inference(self.research_topic, "plan formulation", feedback=dialogue, step=_i)
446
+ if self.verbose: print("PhD Student: ", resp, "\n~~~~~~~~~~~")
447
+
448
+ dialogue = str()
449
+ if "```DIALOGUE" in resp:
450
+ dialogue = extract_prompt(resp, "DIALOGUE")
451
+ dialogue = f"The following is dialogue produced by the PhD student: {dialogue}"
452
+ if self.verbose: print("#"*40, "\n", "PhD Dialogue:", dialogue, "#"*40, "\n")
453
+ if self.except_if_fail:
454
+ raise Exception("Max tries during phase: Plan Formulation")
455
+ else:
456
+ plan = "No plan specified."
457
+ if self.human_in_loop_flag["plan formulation"]:
458
+ retry = self.human_in_loop("plan formulation", plan)
459
+ if retry: return retry
460
+ self.set_agent_attr("plan", plan)
461
+ # reset agent state
462
+ self.reset_agents()
463
+ return False
464
+
465
+ def literature_review(self):
466
+ """
467
+ Perform literature review phase
468
+ @return: (bool) whether to repeat the phase
469
+ """
470
+ arx_eng = ArxivSearch()
471
+ max_tries = self.max_steps # lit review often requires extra steps
472
+ # get initial response from PhD agent
473
+ resp = self.phd.inference(self.research_topic, "literature review", step=0, temp=0.4)
474
+ if self.verbose: print(resp, "\n~~~~~~~~~~~")
475
+ # iterate until max num tries to complete task is exhausted
476
+ for _i in range(max_tries):
477
+ print(f"@@ Lab #{self.lab_index} Paper #{self.paper_index} @@")
478
+ feedback = str()
479
+ # grab summary of papers from arxiv
480
+ if "```SUMMARY" in resp:
481
+ query = extract_prompt(resp, "SUMMARY")
482
+ papers = arx_eng.find_papers_by_str(query, N=self.arxiv_num_summaries)
483
+ if self.agentRxiv:
484
+ if GLOBAL_AGENTRXIV.num_papers() > 0:
485
+ papers += GLOBAL_AGENTRXIV.search_agentrxiv(query, self.num_agentrxiv_papers,)
486
+ feedback = f"You requested arXiv papers related to the query {query}, here was the response\n{papers}"
487
+
488
+ # grab full text from arxiv ID
489
+ elif "```FULL_TEXT" in resp:
490
+ query = extract_prompt(resp, "FULL_TEXT")
491
+ if self.agentRxiv and "AgentRxiv" in query: full_text = GLOBAL_AGENTRXIV.retrieve_full_text(query,)
492
+ else: full_text = arx_eng.retrieve_full_paper_text(query)
493
+ # expiration timer so that paper does not remain in context too long
494
+ arxiv_paper = f"```EXPIRATION {self.arxiv_paper_exp_time}\n" + full_text + "```"
495
+ feedback = arxiv_paper
496
+
497
+ # if add paper, extract and add to lit review, provide feedback
498
+ elif "```ADD_PAPER" in resp:
499
+ query = extract_prompt(resp, "ADD_PAPER")
500
+ if self.agentRxiv and "AgentRxiv" in query: feedback, text = self.phd.add_review(query, arx_eng, agentrxiv=True, GLOBAL_AGENTRXIV=GLOBAL_AGENTRXIV)
501
+ else: feedback, text = self.phd.add_review(query, arx_eng)
502
+ if len(self.reference_papers) < self.num_ref_papers:
503
+ self.reference_papers.append(text)
504
+
505
+ # completion condition
506
+ if len(self.phd.lit_review) >= self.num_papers_lit_review:
507
+ # generate formal review
508
+ lit_review_sum = self.phd.format_review()
509
+ # if human in loop -> check if human is happy with the produced review
510
+ if self.human_in_loop_flag["literature review"]:
511
+ retry = self.human_in_loop("literature review", lit_review_sum)
512
+ # if not happy, repeat the process with human feedback
513
+ if retry:
514
+ self.phd.lit_review = []
515
+ return retry
516
+ # otherwise, return lit review and move on to next stage
517
+ if self.verbose: print(lit_review_sum)
518
+ # set agent
519
+ self.set_agent_attr("lit_review_sum", lit_review_sum)
520
+ # reset agent state
521
+ self.reset_agents()
522
+ self.statistics_per_phase["literature review"]["steps"] = _i
523
+ return False
524
+ resp = self.phd.inference(self.research_topic, "literature review", feedback=feedback, step=_i + 1, temp=0.4)
525
+ if self.verbose: print(resp, "\n~~~~~~~~~~~")
526
+ if self.except_if_fail: raise Exception("Max tries during phase: Literature Review")
527
+ else:
528
+ if len(self.phd.lit_review) >= self.num_papers_lit_review:
529
+ # generate formal review
530
+ lit_review_sum = self.phd.format_review()
531
+ # if human in loop -> check if human is happy with the produced review
532
+ if self.human_in_loop_flag["literature review"]:
533
+ retry = self.human_in_loop("literature review", lit_review_sum)
534
+ # if not happy, repeat the process with human feedback
535
+ if retry:
536
+ self.phd.lit_review = []
537
+ return retry
538
+ # otherwise, return lit review and move on to next stage
539
+ if self.verbose: print(lit_review_sum)
540
+ # set agent
541
+ self.set_agent_attr("lit_review_sum", lit_review_sum)
542
+ # reset agent state
543
+ self.reset_agents()
544
+ self.statistics_per_phase["literature review"]["steps"] = _i
545
+ return False
546
+
547
+ def human_in_loop(self, phase, phase_prod):
548
+ """
549
+ Get human feedback for phase output
550
+ @param phase: (str) current phase
551
+ @param phase_prod: (str) current phase result
552
+ @return: (bool) whether to repeat the loop
553
+ """
554
+ print("\n\n\n\n\n")
555
+ print(f"Presented is the result of the phase [{phase}]: {phase_prod}")
556
+ y_or_no = None
557
+ # repeat until a valid answer is provided
558
+ while y_or_no not in ["y", "n"]:
559
+ y_or_no = input("\n\n\nAre you happy with the presented content? Respond Y or N: ").strip().lower()
560
+ # if person is happy with feedback, move on to next stage
561
+ if y_or_no == "y": pass
562
+ # if not ask for feedback and repeat
563
+ elif y_or_no == "n":
564
+ # ask the human for feedback
565
+ notes_for_agent = input("Please provide notes for the agent so that they can try again and improve performance: ")
566
+ # reset agent state
567
+ self.reset_agents()
568
+ # add suggestions to the notes
569
+ self.notes.append({
570
+ "phases": [phase],
571
+ "note": notes_for_agent})
572
+ return True
573
+ else: print("Invalid response, type Y or N")
574
+ return False
575
+
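save_state() above pickles the entire LaboratoryWorkflow to state_saves/Paper{index}.pkl, and phase_status records which subtasks have already finished. The load-previous path is not shown in this diff, so the snippet below is only a sketch of how a run might be resumed manually from such a checkpoint; the file name Paper0.pkl is illustrative.

```python
import pickle

# Hypothetical resume: state_saves/Paper0.pkl is whatever save_state() wrote earlier.
# Unpickling requires LaboratoryWorkflow (and its agents) to be importable here.
with open("state_saves/Paper0.pkl", "rb") as f:
    lab = pickle.load(f)

# perform_research() consults phase_status, so subtasks already marked True are skipped.
lab.perform_research()
```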
576
+ class AgentRxiv:
577
+ def __init__(self, lab_index=0):
578
+ self.lab_index = lab_index
579
+ self.server_thread = None
580
+ self.initialize_server()
581
+ self.pdf_text = dict()
582
+ self.summaries = dict()
583
+
584
+ def initialize_server(self):
585
+ # Calculate the port dynamically
586
+ port = 5000 + self.lab_index
587
+ # Start the server on the computed port using a lambda to pass the port value
588
+ self.server_thread = threading.Thread(target=lambda: self.run_server(port))
589
+ self.server_thread.daemon = True
590
+ self.server_thread.start()
591
+ time.sleep(5) # allow time for the server to start up
592
+
593
+ @staticmethod
594
+ def num_papers():
595
+ return len(os.listdir("uploads"))
596
+
597
+ def retrieve_full_text(self, arxiv_id):
598
+ try:
599
+ return self.pdf_text[arxiv_id]
600
+ except Exception:
601
+ return "Paper ID not found?"
602
+
603
+ @staticmethod
604
+ def read_pdf_pypdf2(pdf_path):
605
+ with open(pdf_path, 'rb') as pdf_file:
606
+ reader = PyPDF2.PdfReader(pdf_file)
607
+ text = ''
608
+ for page_num in range(len(reader.pages)):
609
+ page = reader.pages[page_num]
610
+ text += page.extract_text()
611
+ return text
612
+
613
+ def search_agentrxiv(self, search_query, num_papers):
614
+ # Use the dynamic port here as well
615
+ url = f'http://127.0.0.1:{5000 + self.lab_index}/api/search?q={search_query}'
616
+ return_str = str()
617
+ try:
618
+ with app.app_context():
619
+ update_papers_from_uploads()
620
+ response = requests.get(url)
621
+ response.raise_for_status()
622
+ data = response.json()
623
+ return_str += "Search Query:" + data['query']
624
+ return_str += "Results:"
625
+ for result in data['results'][:num_papers]:
626
+ arxiv_id = f"AgentRxiv:ID_{result['id']}"
627
+ if arxiv_id not in self.summaries:
628
+ filename = Path(f'_tmp_{self.lab_index}.pdf')
629
+ response = requests.get(result['pdf_url'])
630
+ filename.write_bytes(response.content)
631
+ self.pdf_text[arxiv_id] = self.read_pdf_pypdf2(f'_tmp_{self.lab_index}.pdf')
632
+ self.summaries[arxiv_id] = query_model(
633
+ prompt=self.pdf_text[arxiv_id],
634
+ system_prompt="Please provide a 5 sentence summary of this paper.",
635
+ openai_api_key=os.getenv('OPENAI_API_KEY'),
636
+ model_str="gpt-4o-mini"
637
+ )
638
+ return_str += f"Title: {result['filename']}"
639
+ return_str += f"Summary: {self.summaries[arxiv_id]}\n"
640
+ formatted_date = date.today().strftime("%d/%m/%Y")
641
+ return_str += f"Publication Date: {formatted_date}\n"
642
+ return_str += f"arXiv paper ID: AgentRxiv:ID_{result['id']}"
643
+ return_str += "-" * 40
644
+ except Exception as e:
645
+ print(f"AgentRxiv Error: {e}")
646
+ return_str += f"Error: {e}"
647
+ return return_str
648
+
649
+ def run_server(self, port):
650
+ run_app(port=port)
651
+
652
+
653
+ def parse_arguments():
654
+ parser = argparse.ArgumentParser(description="AgentLaboratory Research Workflow")
655
+
656
+ parser.add_argument(
657
+ '--yaml-location',
658
+ type=str,
659
+ default="experiment_configs/MATH_agentlab.yaml",
660
+ help='Location of YAML to load config data.'
661
+ )
662
+
663
+ return parser.parse_args()
664
+
665
+
666
+ def parse_yaml(yaml_file_loc):
667
+ with open(yaml_file_loc, 'r') as file: agentlab_data = yaml.safe_load(file)
668
+ class YamlDataHolder:
669
+ def __init__(self): pass
670
+ parser = YamlDataHolder()
671
+ if "copilot_mode" in agentlab_data: parser.copilot_mode = agentlab_data["copilot_mode"]
672
+ else: parser.copilot_mode = False
673
+ if 'load-previous' in agentlab_data: parser.load_previous = agentlab_data["load-previous"]
674
+ else: parser.load_previous = False
675
+ if 'research-topic' in agentlab_data: parser.research_topic = agentlab_data["research-topic"]
676
+ if 'api-key' in agentlab_data: parser.api_key = agentlab_data["api-key"]
677
+ if 'deepseek-api-key' in agentlab_data: parser.deepseek_api_key = agentlab_data["deepseek-api-key"]
678
+ if 'compile-latex' in agentlab_data: parser.compile_latex = agentlab_data["compile-latex"]
679
+ else: parser.compile_latex = True
680
+ if 'llm-backend' in agentlab_data: parser.llm_backend = agentlab_data["llm-backend"]
681
+ else: parser.llm_backend = "o3-mini"
682
+ if 'lit-review-backend' in agentlab_data: parser.lit_review_backend = agentlab_data["lit-review-backend"]
683
+ else: parser.lit_review_backend = "gpt-4o-mini"
684
+ if 'language' in agentlab_data: parser.language = agentlab_data["language"]
685
+ else: parser.language = "English"
686
+ if 'num-papers-lit-review' in agentlab_data: parser.num_papers_lit_review = agentlab_data["num-papers-lit-review"]
687
+ else: parser.num_papers_lit_review = 5
688
+ if 'mlesolver-max-steps' in agentlab_data: parser.mlesolver_max_steps = agentlab_data["mlesolver-max-steps"]
689
+ else: parser.mlesolver_max_steps = 3
690
+ if 'papersolver-max-steps' in agentlab_data: parser.papersolver_max_steps = agentlab_data["papersolver-max-steps"]
691
+ else: parser.papersolver_max_steps = 5
692
+ if 'task-notes' in agentlab_data: parser.task_notes = agentlab_data["task-notes"]
693
+ else: parser.task_notes = []
694
+ if 'num-papers-to-write' in agentlab_data: parser.num_papers_to_write = agentlab_data["num-papers-to-write"]
695
+ else: parser.num_papers_to_write = 100
696
+ if 'parallel-labs' in agentlab_data: parser.parallel_labs = agentlab_data["parallel-labs"]
697
+ else: parser.parallel_labs = False
698
+ if 'num-parallel-labs' in agentlab_data: parser.num_parallel_labs = agentlab_data["num-parallel-labs"]
699
+ else: parser.num_parallel_labs = 8
700
+ if 'except-if-fail' in agentlab_data: parser.except_if_fail = agentlab_data["except-if-fail"]
701
+ else: parser.except_if_fail = False
702
+ if 'agentRxiv' in agentlab_data: parser.agentRxiv = agentlab_data["agentRxiv"]
703
+ else: parser.agentRxiv = False
704
+ if 'construct-agentRxiv' in agentlab_data: parser.construct_agentRxiv = agentlab_data["construct-agentRxiv"]
705
+ else: parser.construct_agentRxiv = False
706
+ if 'agentrxiv-papers' in agentlab_data: parser.agentrxiv_papers = agentlab_data["agentrxiv-papers"]
707
+ else: parser.agentrxiv_papers = 5
708
+
709
+ if 'lab-index' in agentlab_data: parser.lab_index = agentlab_data["lab-index"]
710
+ else: parser.lab_index = 0
711
+ return parser
712
+
713
+
714
+ if __name__ == "__main__":
715
+ user_args = parse_arguments()
716
+ yaml_to_use = user_args.yaml_location
717
+ args = parse_yaml(yaml_to_use)
718
+
719
+ llm_backend = args.llm_backend
720
+ human_mode = args.copilot_mode.lower() == "true" if type(args.copilot_mode) == str else args.copilot_mode
721
+ compile_pdf = args.compile_latex.lower() == "true" if type(args.compile_latex) == str else args.compile_latex
722
+ load_previous = args.load_previous.lower() == "true" if type(args.load_previous) == str else args.load_previous
723
+ parallel_labs = args.parallel_labs.lower() == "true" if type(args.parallel_labs) == str else args.parallel_labs
724
+ except_if_fail = args.except_if_fail.lower() == "true" if type(args.except_if_fail) == str else args.except_if_fail
725
+ agentRxiv = args.agentRxiv.lower() == "true" if type(args.agentRxiv) == str else args.agentRxiv
726
+ construct_agentRxiv = args.construct_agentRxiv.lower() == "true" if type(args.construct_agentRxiv) == str else args.construct_agentRxiv
727
+ lab_index = int(args.lab_index) if type(args.lab_index) == str else args.lab_index
728
+
729
+ try: num_papers_to_write = int(args.num_papers_to_write.lower()) if type(args.num_papers_to_write) == str else args.num_papers_to_write
730
+ except Exception: raise Exception("args.num_papers_lit_review must be a valid integer!")
731
+ try: num_papers_lit_review = int(args.num_papers_lit_review.lower()) if type(args.num_papers_lit_review) == str else args.num_papers_lit_review
732
+ except Exception: raise Exception("args.num_papers_lit_review must be a valid integer!")
733
+ try: papersolver_max_steps = int(args.papersolver_max_steps.lower()) if type(args.papersolver_max_steps) == str else args.papersolver_max_steps
734
+ except Exception: raise Exception("args.papersolver_max_steps must be a valid integer!")
735
+ try: mlesolver_max_steps = int(args.mlesolver_max_steps.lower()) if type(args.mlesolver_max_steps) == str else args.mlesolver_max_steps
736
+ except Exception: raise Exception("args.mlesolver_max_steps must be a valid integer!")
737
+ if parallel_labs:
738
+ num_parallel_labs = int(args.num_parallel_labs)
739
+ print("="*20 , f"RUNNING {num_parallel_labs} LABS IN PARALLEL", "="*20)
740
+ else: num_parallel_labs = 0
741
+
742
+ api_key = (os.getenv('OPENAI_API_KEY') or args.api_key) if (hasattr(args, 'api_key') or os.getenv('OPENAI_API_KEY')) else None
743
+ deepseek_api_key = (os.getenv('DEEPSEEK_API_KEY') or args.deepseek_api_key) if (hasattr(args, 'deepseek_api_key') or os.getenv('DEEPSEEK_API_KEY')) else None
744
+ if api_key is not None and os.getenv('OPENAI_API_KEY') is None: os.environ["OPENAI_API_KEY"] = args.api_key
745
+ if deepseek_api_key is not None and os.getenv('DEEPSEEK_API_KEY') is None: os.environ["DEEPSEEK_API_KEY"] = args.deepseek_api_key
746
+
747
+ if not api_key and not deepseek_api_key: raise ValueError("An API key must be provided via the api-key / deepseek-api-key config entries or the OPENAI_API_KEY / DEEPSEEK_API_KEY environment variables.")
748
+
749
+ if human_mode or args.research_topic is None: research_topic = input("Please name an experiment idea for AgentLaboratory to perform: ")
750
+ else: research_topic = args.research_topic
751
+
752
+ task_notes_LLM = list()
753
+ task_notes = args.task_notes
754
+ for _task in task_notes:
755
+ for _note in task_notes[_task]:
756
+ task_notes_LLM.append({"phases": [_task.replace("-", " ")], "note": _note})
757
+
758
+ if args.language != "English":
759
+ task_notes_LLM.append(
760
+ {"phases": ["literature review", "plan formulation", "data preparation", "running experiments", "results interpretation", "report writing", "report refinement"],
761
+ "note": f"You should always write in the following language to converse and to write the report {args.language}"},
762
+ )
763
+
764
+ human_in_loop = {
765
+ "literature review": human_mode,
766
+ "plan formulation": human_mode,
767
+ "data preparation": human_mode,
768
+ "running experiments": human_mode,
769
+ "results interpretation": human_mode,
770
+ "report writing": human_mode,
771
+ "report refinement": human_mode,
772
+ }
773
+
774
+ agent_models = {
775
+ "literature review": llm_backend,
776
+ "plan formulation": llm_backend,
777
+ "data preparation": llm_backend,
778
+ "running experiments": llm_backend,
779
+ "report writing": llm_backend,
780
+ "results interpretation": llm_backend,
781
+ "paper refinement": llm_backend,
782
+ }
783
+ if parallel_labs:
784
+ remove_figures()
785
+ GLOBAL_AGENTRXIV = AgentRxiv()
786
+ remove_directory(f"{RESEARCH_DIR_PATH}")
787
+ os.mkdir(os.path.join(".", f"{RESEARCH_DIR_PATH}"))
788
+ from concurrent.futures import ThreadPoolExecutor, as_completed
789
+ if not compile_pdf: raise Exception("PDF compilation must be used with agentRxiv!")
790
+ def run_lab(parallel_lab_index):
791
+ time_str = str()
792
+ time_now = time.time()
793
+ for _paper_index in range(num_papers_to_write):
794
+ lab_dir = os.path.join(RESEARCH_DIR_PATH, f"research_dir_lab{parallel_lab_index}_paper{_paper_index}")
795
+ os.mkdir(lab_dir)
796
+ os.mkdir(os.path.join(lab_dir, "src"))
797
+ os.mkdir(os.path.join(lab_dir, "tex"))
798
+ lab_instance = LaboratoryWorkflow(
799
+ parallelized=True,
800
+ research_topic=research_topic,
801
+ notes=task_notes_LLM,
802
+ agent_model_backbone=agent_models,
803
+ human_in_loop_flag=human_in_loop,
804
+ openai_api_key=api_key,
805
+ compile_pdf=compile_pdf,
806
+ num_papers_lit_review=num_papers_lit_review,
807
+ papersolver_max_steps=papersolver_max_steps,
808
+ mlesolver_max_steps=mlesolver_max_steps,
809
+ paper_index=_paper_index,
810
+ lab_index=parallel_lab_index,
811
+ except_if_fail=except_if_fail,
812
+ lab_dir=lab_dir,
813
+ agentRxiv=True,
814
+ agentrxiv_papers=args.agentrxiv_papers
815
+ )
816
+ lab_instance.perform_research()
817
+ time_str += str(time.time() - time_now) + " | "
818
+ with open(f"agent_times_{parallel_lab_index}.txt", "w") as f:
819
+ f.write(time_str)
820
+ time_now = time.time()
821
+
822
+ with ThreadPoolExecutor(max_workers=num_parallel_labs) as executor:
823
+ futures = [executor.submit(run_lab, lab_idx) for lab_idx in range(num_parallel_labs)]
824
+ for future in as_completed(futures):
825
+ try: future.result()
826
+ except Exception as e: print(f"Error in lab: {e}")
827
+
828
+ raise NotImplementedError("Todo: implement parallel labs")
829
+ else:
830
+ # remove previous files
831
+ remove_figures()
832
+ if agentRxiv: GLOBAL_AGENTRXIV = AgentRxiv(lab_index)
833
+ if not agentRxiv:
834
+ remove_directory(f"{RESEARCH_DIR_PATH}")
835
+ os.mkdir(os.path.join(".", f"{RESEARCH_DIR_PATH}"))
836
+ # make src and research directory
837
+ if not os.path.exists("state_saves"): os.mkdir(os.path.join(".", "state_saves"))
838
+ time_str = str()
839
+ time_now = time.time()
840
+ for _paper_index in range(num_papers_to_write):
841
+ lab_direct = f"{RESEARCH_DIR_PATH}/research_dir_{_paper_index}_lab_{lab_index}"
842
+ os.mkdir(os.path.join(".", lab_direct))
843
+ os.mkdir(os.path.join(f"./{lab_direct}", "src"))
844
+ os.mkdir(os.path.join(f"./{lab_direct}", "tex"))
845
+ lab = LaboratoryWorkflow(
846
+ research_topic=research_topic,
847
+ notes=task_notes_LLM,
848
+ agent_model_backbone=agent_models,
849
+ human_in_loop_flag=human_in_loop,
850
+ openai_api_key=api_key,
851
+ compile_pdf=compile_pdf,
852
+ num_papers_lit_review=num_papers_lit_review,
853
+ papersolver_max_steps=papersolver_max_steps,
854
+ mlesolver_max_steps=mlesolver_max_steps,
855
+ paper_index=_paper_index,
856
+ except_if_fail=except_if_fail,
857
+ agentRxiv=False,
858
+ lab_index=lab_index,
859
+ lab_dir=f"./{lab_direct}"
860
+ )
861
+ lab.perform_research()
862
+ time_str += str(time.time() - time_now) + " | "
863
+ with open(f"agent_times_{lab_index}.txt", "w") as f:
864
+ f.write(time_str)
865
+ time_now = time.time()
866
+
867
+
868
+
869
+
870
+
871
+
872
+
873
+ """
874
+ @@@@@@@@@@@@@@@ CHECKLIST @@@@@@@@@@@@@@@
875
+ Practical:
876
+ ----------
877
+ - Make a better config system (YAML?)
878
+
879
+ Advancements:
880
+ -------------
881
+ - Make the ability to have agents build on top of their own research
882
+ - Run agent labs in parallel (asynch)
883
+
884
+ """
885
+
886
+
887
+
888
+
889
+
890
+
891
+
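parse_yaml() above drives the whole run from a YAML file (the default is experiment_configs/MATH_agentlab.yaml). The snippet below sketches a minimal configuration using the keys that function actually looks for; every value shown is an illustrative placeholder, not the configuration shipped with the repository.

```python
import yaml

# Minimal illustrative config; values are placeholders, not shipped defaults.
example_yaml = """
research-topic: "Improving chain-of-thought prompting for grade-school math"
api-key: "sk-placeholder"   # or set the OPENAI_API_KEY environment variable instead
llm-backend: "o3-mini"
lit-review-backend: "gpt-4o-mini"
copilot_mode: false
compile-latex: true
num-papers-lit-review: 5
mlesolver-max-steps: 3
papersolver-max-steps: 5
num-papers-to-write: 1
parallel-labs: false
agentRxiv: false
task-notes:
  plan-formulation:
    - "Keep the experiment simple and runnable on CPU."
"""

config = yaml.safe_load(example_yaml)
print(config["research-topic"], config["llm-backend"])
```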
app.py ADDED
@@ -0,0 +1,171 @@
1
+ import random, time
2
+
3
+ from flask import Flask, render_template, request, redirect, url_for, flash, send_from_directory, jsonify
4
+ from werkzeug.utils import secure_filename
5
+ import os
6
+ from PyPDF2 import PdfReader
7
+ from flask_sqlalchemy import SQLAlchemy
8
+ from sentence_transformers import SentenceTransformer
9
+ from sklearn.metrics.pairwise import cosine_similarity
10
+ import numpy as np
11
+
12
+ app = Flask(__name__)
13
+ app.config['SECRET_KEY'] = 'your-secret-key'
14
+ app.config['UPLOAD_FOLDER'] = 'uploads/'
15
+ app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///papers.db'
16
+ app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
17
+
18
+ db = SQLAlchemy(app)
19
+
20
+ class Paper(db.Model):
21
+ id = db.Column(db.Integer, primary_key=True)
22
+ filename = db.Column(db.String(120), nullable=False)
23
+ text = db.Column(db.Text, nullable=True)
24
+
25
+ def update_papers_from_uploads():
26
+ for _tries in range(5):
27
+ try:
28
+ uploads_dir = app.config['UPLOAD_FOLDER']
29
+ file_list = os.listdir(uploads_dir)
30
+ print("Files in uploads folder:", file_list)
31
+ for filename in file_list:
32
+ if filename.lower().endswith('.pdf'):
33
+ # Check if file is already in the DB
34
+ if not Paper.query.filter_by(filename=filename).first():
35
+ print("Processing file:", filename)
36
+ file_path = os.path.join(uploads_dir, filename)
37
+ extracted_text = ""
38
+ try:
39
+ reader = PdfReader(file_path)
40
+ for page in reader.pages:
41
+ text = page.extract_text()
42
+ if text:
43
+ extracted_text += text
44
+ except Exception as e:
45
+ flash(f'Error processing {filename}: {e}')
46
+ continue
47
+ if not extracted_text.strip():
48
+ print(f"Warning: No text extracted from {filename}")
49
+ else:
50
+ print(f"Extracted {len(extracted_text)} characters from {filename}")
51
+ new_paper = Paper(filename=filename, text=extracted_text)
52
+ db.session.add(new_paper)
53
+ db.session.commit()
54
+ return
55
+ except Exception as e:
56
+ print("WEB SERVER LOAD EXCEPTION", e, str(e))
57
+ time.sleep(random.randint(5, 15))
58
+ return
59
+ #raise Exception("FAILED TO UPDATE")
60
+
61
+ # Load a pre-trained sentence transformer model
62
+ model = SentenceTransformer('all-MiniLM-L6-v2')
63
+
64
+ @app.route('/update', methods=['GET'])
65
+ def update_on_demand():
66
+ update_papers_from_uploads()
67
+ return jsonify({"message": "Uploads folder processed successfully."})
68
+
69
+ @app.route('/')
70
+ def index():
71
+ update_papers_from_uploads()
72
+ papers = Paper.query.all()
73
+ return render_template('index.html', papers=papers)
74
+
75
+ @app.route('/upload', methods=['GET', 'POST'])
76
+ def upload():
77
+ if request.method == 'POST':
78
+ if 'pdf' not in request.files:
79
+ flash('No file part')
80
+ return redirect(request.url)
81
+ file = request.files['pdf']
82
+ if file.filename == '':
83
+ flash('No selected file')
84
+ return redirect(request.url)
85
+ if file:
86
+ filename = secure_filename(file.filename)
87
+ file_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
88
+ file.save(file_path)
89
+ extracted_text = ""
90
+ try:
91
+ reader = PdfReader(file_path)
92
+ for page in reader.pages:
93
+ text = page.extract_text()
94
+ if text:
95
+ extracted_text += text
96
+ except Exception as e:
97
+ flash(f'Error processing PDF: {e}')
98
+ new_paper = Paper(filename=filename, text=extracted_text)
99
+ db.session.add(new_paper)
100
+ db.session.commit()
101
+ flash('File uploaded and processed successfully!')
102
+ return redirect(url_for('index'))
103
+ return render_template('upload.html')
104
+
105
+ @app.route('/search')
106
+ def search():
107
+ query = request.args.get('q', '')
108
+ if query:
109
+ papers = Paper.query.all()
110
+ query_embedding = model.encode([query])
111
+ paper_texts = [paper.text for paper in papers if paper.text]
112
+ if not paper_texts:
113
+ return render_template('search.html', papers=[], query=query)
114
+ paper_embeddings = model.encode(paper_texts)
115
+ similarities = cosine_similarity(query_embedding, paper_embeddings)[0]
116
+ papers_with_scores = list(zip([p for p in papers if p.text], similarities))
117
+ papers_sorted = sorted(papers_with_scores, key=lambda x: x[1], reverse=True)
118
+ return render_template('search.html', papers=papers_sorted, query=query)
119
+ return render_template('search.html', papers=[], query=query)
120
+
121
+ @app.route('/api/search')
122
+ def api_search():
123
+ query = request.args.get('q', '')
124
+ if not query:
125
+ return jsonify({'error': 'No query provided'}), 400
126
+ papers = Paper.query.all()
127
+ if not papers:
128
+ return jsonify({'query': query, 'results': []})
129
+ query_embedding = model.encode([query])
130
+ paper_texts = [paper.text for paper in papers if paper.text]
131
+ if not paper_texts:
132
+ return jsonify({'query': query, 'results': []})
133
+ paper_embeddings = model.encode(paper_texts)
134
+ similarities = cosine_similarity(query_embedding, paper_embeddings)[0]
135
+ papers_with_scores = list(zip([p for p in papers if p.text], similarities))
136
+ papers_sorted = sorted(papers_with_scores, key=lambda x: x[1], reverse=True)
137
+ results = []
138
+ for paper, score in papers_sorted:
139
+ pdf_url = url_for('uploaded_file', filename=paper.filename, _external=True)
140
+ results.append({
141
+ 'id': paper.id,
142
+ 'filename': paper.filename,
143
+ 'similarity': float(score),
144
+ 'pdf_url': pdf_url
145
+ })
146
+ return jsonify({'query': query, 'results': results})
147
+
148
+ @app.route('/uploads/<path:filename>')
149
+ def uploaded_file(filename):
150
+ return send_from_directory(app.config['UPLOAD_FOLDER'], filename, mimetype='application/pdf')
151
+
152
+ @app.route('/view/<int:paper_id>')
153
+ def view_pdf(paper_id):
154
+ paper = Paper.query.get_or_404(paper_id)
155
+ pdf_url = url_for('uploaded_file', filename=paper.filename, _external=True)
156
+ return render_template('view.html', paper=paper, pdf_url=pdf_url)
157
+
158
+
159
+ def run_app(port=5000):
160
+ # Reset the database by removing the existing file
161
+ db_path = "papers.db"
162
+ if os.path.exists("instance/" + db_path):
163
+ os.remove("instance/" + db_path)
164
+ with app.app_context():
165
+ db.create_all()
166
+ if not os.path.exists(app.config['UPLOAD_FOLDER']):
167
+ os.makedirs(app.config['UPLOAD_FOLDER'])
168
+ app.run(debug=False, port=port)
169
+
170
+ if __name__ == '__main__':
171
+ run_app()
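
The routes above expose a small JSON search API alongside the HTML views. As a hedged illustration only (assuming the server is running locally on the default port 5000 and that the `requests` package is available; file names and queries below are placeholders), a client could exercise the upload, update, and search endpoints roughly like this:

```python
# Hypothetical client sketch for the Flask paper server defined above.
# Assumes the app is running locally on its default port (5000) and that
# `requests` is installed; the PDF name and query are examples only.
import requests

BASE = "http://127.0.0.1:5000"

# Upload a PDF through the /upload endpoint (form field name 'pdf', as in upload()).
with open("example_paper.pdf", "rb") as f:
    requests.post(f"{BASE}/upload", files={"pdf": f})

# Ask the server to re-scan its uploads folder, then run a semantic search.
requests.get(f"{BASE}/update")
resp = requests.get(f"{BASE}/api/search", params={"q": "prompt engineering for math reasoning"})
for result in resp.json().get("results", []):
    # Each result carries the paper id, filename, cosine similarity, and a direct PDF URL.
    print(result["similarity"], result["filename"], result["pdf_url"])
```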
common_imports.py ADDED
@@ -0,0 +1,113 @@
1
+ # General-purpose imports
2
+ import os
3
+ import sys
4
+ import json
5
+ import time
6
+ import re
7
+ import math
8
+ import logging
9
+ import random
10
+ import shutil
11
+ import pathlib
12
+ import argparse
13
+ import itertools
14
+ import datetime
15
+ import collections
16
+ import subprocess
17
+
18
+ # Data manipulation and analysis
19
+ import pandas as pd
20
+ import numpy as np
21
+ import csv
22
+ import json
23
+ import yaml
24
+ import h5py
25
+ import sqlite3
26
+ import pickle
27
+
28
+ # Visualization
29
+ import matplotlib.pyplot as plt
30
+ import seaborn as sns
31
+ import plotly.express as px
32
+ import plotly.graph_objects as go
33
+
34
+ # Hugging Face & Transformers
35
+ import transformers
36
+
37
+ # Deep learning frameworks
38
+ import torch
39
+ import torch.nn as nn
40
+ import torch.optim as optim
41
+ import torch.nn.functional as F
42
+ from torch.utils.data import DataLoader, Dataset, random_split
43
+ import tensorflow as tf
44
+ #import keras
45
+
46
+ # NLP Libraries
47
+ import tiktoken
48
+ import nltk
49
+ from nltk.tokenize import word_tokenize, sent_tokenize
50
+ from nltk.corpus import stopwords
51
+ from nltk.stem import PorterStemmer, WordNetLemmatizer
52
+ import spacy
53
+ import sacremoses
54
+ # Diffusers for image generation and stable diffusion
55
+ import diffusers
56
+ from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
57
+
58
+ # Performance acceleration libraries
59
+ import accelerate
60
+ from accelerate import Accelerator
61
+
62
+ # Hugging Face Hub utilities
63
+ import huggingface_hub
64
+ from huggingface_hub import HfApi, notebook_login
65
+
66
+ # Scikit-learn for machine learning
67
+ import sklearn
68
+ from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
69
+ from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report, confusion_matrix
70
+ from sklearn.preprocessing import StandardScaler, MinMaxScaler, LabelEncoder
71
+ from sklearn.decomposition import PCA
72
+ from sklearn.cluster import KMeans
73
+ from sklearn.svm import SVC
74
+ from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
75
+ from sklearn.metrics.pairwise import linear_kernel, cosine_similarity
76
+
77
+ # Statistical analysis
78
+ import scipy
79
+ from scipy import stats, signal, spatial
80
+ from scipy.optimize import minimize
81
+ from scipy.spatial.distance import euclidean, cosine
82
+ from scipy.linalg import svd, eig
83
+ from statsmodels.api import OLS, Logit
84
+ from statsmodels.tsa.arima_model import ARIMA
85
+ from statsmodels.tsa.stattools import adfuller, pacf, acf
86
+
87
+ # Image processing and handling
88
+ from PIL import Image
89
+ import imageio
90
+ from skimage import io, color, filters, transform, exposure
91
+
92
+ # File handling and I/O
93
+ import gzip
94
+ import zipfile
95
+ import tarfile
96
+ import glob
97
+
98
+ # Parallel processing
99
+ import multiprocessing
100
+ from multiprocessing import Pool
101
+
102
+ # Miscellaneous utilities
103
+ import hashlib
104
+ import uuid
105
+ import base64
106
+ import warnings
107
+ from tqdm import tqdm
108
+ from functools import partial, lru_cache
109
+
110
+ # Other advanced libraries
111
+ import pydantic
112
+ import requests
113
+ import aiohttp
experiment_configs/MATH_agentlab.yaml ADDED
@@ -0,0 +1,70 @@
1
+ # If you want to have user input or be a human-in-the-loop
2
+ copilot-mode: True
3
+
4
+ # Here is the research prompt. If num-papers-to-write > 1, you can treat this as a "research direction" otherwise it can be *very* specific and can be treated as a full research idea
5
+ research-topic: "Your goal is to design reasoning and prompt engineering techniques to maximize accuracy on the entire 500 test questions of MATH500 benchmark. Your idea should be very novel."
6
+
7
+ # Here you can put your OpenAI API key--if you don't have one or OpenAI doesn't work for you, you can also instead use `deepseek-api-key`
8
+ api-key: "OPENAI-API-KEY-HERE"
9
+ # or deepseek-api-key: "DEEPSEEK-API-KEY-HERE"
10
+ # Agent Laboratory backend
11
+ llm-backend: "o3-mini"
12
+ # Literature review backend
13
+ lit-review-backend: "o3-mini"
14
+
15
+ # Base language
16
+ language: "English"
17
+
18
+ # Number of arxiv papers to lit review
19
+ num-papers-lit-review: 5
20
+ # Total number of papers to write in sequence
21
+ num-papers-to-write: 1
22
+ # Do you want to run multiple agent labs in parallel?
23
+ parallel-labs: False
24
+
25
+ # Total mle-solver steps per lab
26
+ mlesolver-max-steps: 3
27
+ # Total paper-solver steps per lab
28
+ papersolver-max-steps: 1
29
+ # The lab index for this lab (used for parallel runs)
30
+ lab-index: 1
31
+ # If you want to load an existing save
32
+ load-existing: False
33
+ # If fail, run exception?
34
+ except-if-fail: False
35
+ # Compile latex into PDFs during paper-solver
36
+ compile-latex: False
37
+
38
+ # Task notes
39
+ task-notes:
40
+ plan-formulation:
41
+ - 'You should come up with a plan for only ONE experiment aimed at maximizing performance on the test set of MATH using prompting techniques.'
42
+ - 'The baseline performance of gpt-4o-mini on MATH-500 is 70.2%'
43
+ - 'Please use gpt-4o-mini for your experiments'
44
+ - 'You must evaluate on the entire 500 test questions of MATH'
45
+ - 'Your plan should be a novel prompting technique'
46
+ - 'Your evaluation should aim to achieve state-of-the-art performance on the MATH dataset using a novel prompting idea'
47
+ - "DO NOT PLAN FOR TOO LONG. Submit your plan soon."
48
+ data-preparation:
49
+ - 'Please use gpt-4o-mini for your experiments'
50
+ - 'You must evaluate on the entire 500 test questions of MATH'
51
+ - 'Here is a sample code you can use to load MATH\nfrom datasets import load_dataset\nMATH_test_set = load_dataset("HuggingFaceH4/MATH-500")["test"]'
52
+ running-experiments:
53
+ - "For all strings you instantiate you must use triple quotes (''')"
54
+ - 'Please use gpt-4o-mini for your experiments'
55
+ - 'Do not try to obtain baseline accuracy or any comparison points. The baseline performance of gpt-4o-mini on MATH-500 is 70.2%'
56
+ - 'You can just use the query_gpt4omini(prompt=prompt, system=system_prompt) to prompt gpt-4o-mini. You can also access temperature by setting the temperature value query_gpt4omini(prompt=prompt, system=system_prompt, temperature=0.5) for example.'
57
+ - 'You must evaluate on the entire 500 test questions of MATH-500'
58
+ - "You should come up with a plan for ONE experiment aimed at maximizing performance on MATH using prompting techniques"
59
+ - "Make sure to use is_equiv() to evaluate if two answers are equivalent."
60
+ - 'Use the following code to run inference with gpt-4o-mini\nresponse = query_gpt4omini(prompt=prompt, system=system_prompt)'
61
+ - "Your code should parallelize inference. Make sure to write parallelized code."
62
+ - "YOU MUST MAKE YOUR CODE PARALLELIZED."
63
+ - "Create very thoughtful figures, that would make a good research study."
64
+ - 'You have access to only gpt-4o-mini'
65
+ - 'Here is some sample code to evaluate on MATH:\nimport multiprocessing\nimport concurrent.futures\nfrom datasets import load_dataset\n\ndef process_example(example):\n problem = example["problem"]\n solution = example["solution"]\n true_answer = remove_boxed(last_boxed_only_string(solution))\n prompt = f"""Solve the following math problem and provide your final answer enclosed in a LaTeX \\boxed{{...}} command.\n\nProblem: {problem}\n\nFinal Answer:"""\n response = query_gpt4omini(prompt=prompt, system="You are a skilled mathematician.")\n llm_answer = remove_boxed(last_boxed_only_string(response))\n correct = is_equiv(llm_answer, true_answer)\n return llm_answer, true_answer, correct\n\ndef main():\n math_test_set = load_dataset("HuggingFaceH4/MATH-500")["test"]\n total, correct_count = 0, 0\n max_workers = multiprocessing.cpu_count()\n with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:\n futures = [executor.submit(process_example, example) for example in math_test_set]\n for future in concurrent.futures.as_completed(futures):\n try: llm_answer, true_answer, correct = future.result()\n except Exception: continue\n total += 1\n if correct: correct_count += 1\n print(f"Step: {total}, LLM answer: {llm_answer}, True answer: {true_answer}, Accuracy: {(correct_count / total) * 100:.2f}%")\n print(f"Complete, final accuracy: {(correct_count / total) * 100:.2f}%")\n\nif __name__ == "__main__":\n main()'
66
+ - 'Generate figures with very colorful and artistic design'
67
+ results-interpretation:
68
+ - 'The baseline performance of gpt-4o-mini on MATH-500 is 70.2%'
69
+ report-writing:
70
+ - 'The baseline performance of gpt-4o-mini on MATH-500 is 70.2%'
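
For readability, the escaped sample snippet embedded in the `running-experiments` note above expands to roughly the following. This is only a sketch of that same snippet: `query_gpt4omini`, `is_equiv`, `remove_boxed`, and `last_boxed_only_string` are helpers the solver environment is assumed to inject at runtime, so the code will not run standalone.

```python
# Readable expansion of the escaped MATH-500 evaluation snippet from the config above.
# query_gpt4omini, is_equiv, remove_boxed, and last_boxed_only_string are assumed to be
# provided by the Agent Laboratory runtime; they are not defined here.
import multiprocessing
import concurrent.futures
from datasets import load_dataset

def process_example(example):
    problem = example["problem"]
    solution = example["solution"]
    true_answer = remove_boxed(last_boxed_only_string(solution))
    prompt = f"""Solve the following math problem and provide your final answer enclosed in a LaTeX \\boxed{{...}} command.

Problem: {problem}

Final Answer:"""
    response = query_gpt4omini(prompt=prompt, system="You are a skilled mathematician.")
    llm_answer = remove_boxed(last_boxed_only_string(response))
    return llm_answer, true_answer, is_equiv(llm_answer, true_answer)

def main():
    math_test_set = load_dataset("HuggingFaceH4/MATH-500")["test"]
    total, correct_count = 0, 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=multiprocessing.cpu_count()) as executor:
        futures = [executor.submit(process_example, ex) for ex in math_test_set]
        for future in concurrent.futures.as_completed(futures):
            try:
                llm_answer, true_answer, correct = future.result()
            except Exception:
                continue
            total += 1
            correct_count += int(correct)
            print(f"Step: {total}, Accuracy: {(correct_count / total) * 100:.2f}%")
    print(f"Complete, final accuracy: {(correct_count / total) * 100:.2f}%")

if __name__ == "__main__":
    main()
```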
experiment_configs/MATH_agentrxiv.yaml ADDED
@@ -0,0 +1,72 @@
1
+ # If you want to have user input or be a human-in-the-loop
2
+ copilot-mode: False
3
+
4
+ # Here is the research prompt. If num-papers-to-write > 1, you can treat this as a "research direction" otherwise it can be *very* specific and can be treated as a full research idea
5
+ research-topic: "Your goal is to design reasoning and prompt engineering techniques to maximize accuracy on the entire 500 test questions of MATH500 benchmark. Your idea should be very novel."
6
+
7
+ # Here you can put your OpenAI API key--if you don't have one or OpenAI doesn't work for you, you can also instead use `deepseek-api-key`
8
+ api-key: "OPENAI-API-KEY-HERE"
9
+ # or deepseek-api-key: "DEEPSEEK-API-KEY-HERE"
10
+ # Agent Laboratory backend
11
+ llm-backend: "o3-mini"
12
+ # Literature review backend
13
+ lit-review-backend: "o3-mini"
14
+
15
+ # Base language
16
+ language: "English"
17
+
18
+ # Number of arxiv papers to lit review
19
+ num-papers-lit-review: 5
20
+ # Number of agentRxiv papers to lit review
21
+ agentrxiv-papers: 5
22
+ # Total number of papers to write in sequence
23
+ num-papers-to-write: 40
24
+ # Do you want to run multiple agent labs in parallel?
25
+ parallel-labs: False
26
+
27
+ # Total mle-solver steps per lab
28
+ mlesolver-max-steps: 3
29
+ # Total paper-solver steps per lab
30
+ papersolver-max-steps: 1
31
+ # The lab index for this lab (used for parallel runs)
32
+ lab-index: 1
33
+ # If you want to load an existing save
34
+ load-existing: False
35
+ # If fail, run exception?
36
+ except-if-fail: False
37
+ # Compile latex into PDFs during paper-solver
38
+ compile-latex: False
39
+
40
+ # Task notes
41
+ task-notes:
42
+ plan-formulation:
43
+ - 'You should come up with a plan for only ONE experiment aimed at maximizing performance on the test set of MATH using prompting techniques.'
44
+ - 'The baseline performance of gpt-4o-mini on MATH-500 is 70.2%'
45
+ - 'Please use gpt-4o-mini for your experiments'
46
+ - 'You must evaluate on the entire 500 test questions of MATH'
47
+ - 'Your plan should be a novel prompting technique'
48
+ - 'Your evaluation should aim to achieve state-of-the-art performance on the MATH dataset using a novel prompting idea'
49
+ - "DO NOT PLAN FOR TOO LONG. Submit your plan soon."
50
+ data-preparation:
51
+ - 'Please use gpt-4o-mini for your experiments'
52
+ - 'You must evaluate on the entire 500 test questions of MATH'
53
+ - 'Here is a sample code you can use to load MATH\nfrom datasets import load_dataset\nMATH_test_set = load_dataset("HuggingFaceH4/MATH-500")["test"]'
54
+ running-experiments:
55
+ - "For all strings you instantiate you must use triple quotes (''')"
56
+ - 'Please use gpt-4o-mini for your experiments'
57
+ - 'Do not try to obtain baseline accuracy or any comparison points. The baseline performance of gpt-4o-mini on MATH-500 is 70.2%'
58
+ - 'You can just use the query_gpt4omini(prompt=prompt, system=system_prompt) to prompt gpt-4o-mini. You can also access temperature by setting the temperature value query_gpt4omini(prompt=prompt, system=system_prompt, temperature=0.5) for example.'
59
+ - 'You must evaluate on the entire 500 test questions of MATH-500'
60
+ - "You should come up with a plan for ONE experiment aimed at maximizing performance on MATH using prompting techniques"
61
+ - "Make sure to use is_equiv() to evaluate if two answers are equivalent."
62
+ - 'Use the following code to run inference with gpt-4o-mini\nresponse = query_gpt4omini(prompt=prompt, system=system_prompt)'
63
+ - "Your code should parallelize inference. Make sure to write parallelized code."
64
+ - "YOU MUST MAKE YOUR CODE PARALLELIZED."
65
+ - "Create very thoughtful figures, that would make a good research study."
66
+ - 'You have access to only gpt-4o-mini'
67
+ - 'Here is some sample code to evaluate on MATH:\nimport multiprocessing\nimport concurrent.futures\nfrom datasets import load_dataset\n\ndef process_example(example):\n problem = example["problem"]\n solution = example["solution"]\n true_answer = remove_boxed(last_boxed_only_string(solution))\n prompt = f"""Solve the following math problem and provide your final answer enclosed in a LaTeX \\boxed{{...}} command.\n\nProblem: {problem}\n\nFinal Answer:"""\n response = query_gpt4omini(prompt=prompt, system="You are a skilled mathematician.")\n llm_answer = remove_boxed(last_boxed_only_string(response))\n correct = is_equiv(llm_answer, true_answer)\n return llm_answer, true_answer, correct\n\ndef main():\n math_test_set = load_dataset("HuggingFaceH4/MATH-500")["test"]\n total, correct_count = 0, 0\n max_workers = multiprocessing.cpu_count()\n with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:\n futures = [executor.submit(process_example, example) for example in math_test_set]\n for future in concurrent.futures.as_completed(futures):\n try: llm_answer, true_answer, correct = future.result()\n except Exception: continue\n total += 1\n if correct: correct_count += 1\n print(f"Step: {total}, LLM answer: {llm_answer}, True answer: {true_answer}, Accuracy: {(correct_count / total) * 100:.2f}%")\n print(f"Complete, final accuracy: {(correct_count / total) * 100:.2f}%")\n\nif __name__ == "__main__":\n main()'
68
+ - 'Generate figures with very colorful and artistic design'
69
+ results-interpretation:
70
+ - 'The baseline performance of gpt-4o-mini on MATH-500 is 70.2%'
71
+ report-writing:
72
+ - 'The baseline performance of gpt-4o-mini on MATH-500 is 70.2%'
inference.py ADDED
@@ -0,0 +1,214 @@
1
+ import openai
2
+ import time, tiktoken
3
+ from openai import OpenAI
4
+ import os, anthropic, json
5
+ import google.generativeai as genai
6
+
7
+ TOKENS_IN = dict()
8
+ TOKENS_OUT = dict()
9
+
10
+ encoding = tiktoken.get_encoding("cl100k_base")
11
+
12
+ def curr_cost_est():
13
+ costmap_in = {
14
+ "gpt-4o": 2.50 / 1000000,
15
+ "gpt-4o-mini": 0.150 / 1000000,
16
+ "o1-preview": 15.00 / 1000000,
17
+ "o1-mini": 3.00 / 1000000,
18
+ "claude-3-5-sonnet": 3.00 / 1000000,
19
+ "deepseek-chat": 1.00 / 1000000,
20
+ "o1": 15.00 / 1000000,
21
+ "o3-mini": 1.10 / 1000000,
22
+ }
23
+ costmap_out = {
24
+ "gpt-4o": 10.00/ 1000000,
25
+ "gpt-4o-mini": 0.6 / 1000000,
26
+ "o1-preview": 60.00 / 1000000,
27
+ "o1-mini": 12.00 / 1000000,
28
+ "claude-3-5-sonnet": 12.00 / 1000000,
29
+ "deepseek-chat": 5.00 / 1000000,
30
+ "o1": 60.00 / 1000000,
31
+ "o3-mini": 4.40 / 1000000,
32
+ }
33
+ return sum([costmap_in[_]*TOKENS_IN[_] for _ in TOKENS_IN]) + sum([costmap_out[_]*TOKENS_OUT[_] for _ in TOKENS_OUT])
34
+
35
+ def query_model(model_str, prompt, system_prompt, openai_api_key=None, gemini_api_key=None, anthropic_api_key=None, tries=5, timeout=5.0, temp=None, print_cost=True, version="1.5"):
36
+ preloaded_api = os.getenv('OPENAI_API_KEY')
37
+ if openai_api_key is None and preloaded_api is not None:
38
+ openai_api_key = preloaded_api
39
+ if openai_api_key is None and anthropic_api_key is None:
40
+ raise Exception("No API key provided in query_model function")
41
+ if openai_api_key is not None:
42
+ openai.api_key = openai_api_key
43
+ os.environ["OPENAI_API_KEY"] = openai_api_key
44
+ if anthropic_api_key is not None:
45
+ os.environ["ANTHROPIC_API_KEY"] = anthropic_api_key
46
+ if gemini_api_key is not None:
47
+ os.environ["GEMINI_API_KEY"] = gemini_api_key
48
+ for _ in range(tries):
49
+ try:
50
+ if model_str == "gpt-4o-mini" or model_str == "gpt4omini" or model_str == "gpt-4omini" or model_str == "gpt4o-mini":
51
+ model_str = "gpt-4o-mini"
52
+ messages = [
53
+ {"role": "system", "content": system_prompt},
54
+ {"role": "user", "content": prompt}]
55
+ if version == "0.28":
56
+ if temp is None:
57
+ completion = openai.ChatCompletion.create(
58
+ model=f"{model_str}", # engine = "deployment_name".
59
+ messages=messages
60
+ )
61
+ else:
62
+ completion = openai.ChatCompletion.create(
63
+ model=f"{model_str}", # engine = "deployment_name".
64
+ messages=messages, temperature=temp
65
+ )
66
+ else:
67
+ client = OpenAI()
68
+ if temp is None:
69
+ completion = client.chat.completions.create(
70
+ model="gpt-4o-mini-2024-07-18", messages=messages, )
71
+ else:
72
+ completion = client.chat.completions.create(
73
+ model="gpt-4o-mini-2024-07-18", messages=messages, temperature=temp)
74
+ answer = completion.choices[0].message.content
75
+
76
+ elif model_str == "gemini-2.0-pro":
77
+ genai.configure(api_key=gemini_api_key)
78
+ model = genai.GenerativeModel(model_name="gemini-2.0-pro-exp-02-05", system_instruction=system_prompt)
79
+ answer = model.generate_content(prompt).text
80
+ elif model_str == "gemini-1.5-pro":
81
+ genai.configure(api_key=gemini_api_key)
82
+ model = genai.GenerativeModel(model_name="gemini-1.5-pro", system_instruction=system_prompt)
83
+ answer = model.generate_content(prompt).text
84
+ elif model_str == "o3-mini":
85
+ model_str = "o3-mini"
86
+ messages = [
87
+ {"role": "user", "content": system_prompt + prompt}]
88
+ if version == "0.28":
89
+ completion = openai.ChatCompletion.create(
90
+ model=f"{model_str}", messages=messages)
91
+ else:
92
+ client = OpenAI()
93
+ completion = client.chat.completions.create(
94
+ model="o3-mini-2025-01-31", messages=messages)
95
+ answer = completion.choices[0].message.content
96
+
97
+ elif model_str == "claude-3.5-sonnet":
98
+ client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
99
+ message = client.messages.create(
100
+ model="claude-3-5-sonnet-latest",
101
+ system=system_prompt,
102
+ messages=[{"role": "user", "content": prompt}])
103
+ answer = json.loads(message.to_json())["content"][0]["text"]
104
+ elif model_str == "gpt4o" or model_str == "gpt-4o":
105
+ model_str = "gpt-4o"
106
+ messages = [
107
+ {"role": "system", "content": system_prompt},
108
+ {"role": "user", "content": prompt}]
109
+ if version == "0.28":
110
+ if temp is None:
111
+ completion = openai.ChatCompletion.create(
112
+ model=f"{model_str}", # engine = "deployment_name".
113
+ messages=messages
114
+ )
115
+ else:
116
+ completion = openai.ChatCompletion.create(
117
+ model=f"{model_str}", # engine = "deployment_name".
118
+ messages=messages, temperature=temp)
119
+ else:
120
+ client = OpenAI()
121
+ if temp is None:
122
+ completion = client.chat.completions.create(
123
+ model="gpt-4o-2024-08-06", messages=messages, )
124
+ else:
125
+ completion = client.chat.completions.create(
126
+ model="gpt-4o-2024-08-06", messages=messages, temperature=temp)
127
+ answer = completion.choices[0].message.content
128
+ elif model_str == "deepseek-chat":
129
+ model_str = "deepseek-chat"
130
+ messages = [
131
+ {"role": "system", "content": system_prompt},
132
+ {"role": "user", "content": prompt}]
133
+ if version == "0.28":
134
+ raise Exception("Please upgrade your OpenAI version to use DeepSeek client")
135
+ else:
136
+ deepseek_client = OpenAI(
137
+ api_key=os.getenv('DEEPSEEK_API_KEY'),
138
+ base_url="https://api.deepseek.com/v1"
139
+ )
140
+ if temp is None:
141
+ completion = deepseek_client.chat.completions.create(
142
+ model="deepseek-chat",
143
+ messages=messages)
144
+ else:
145
+ completion = deepseek_client.chat.completions.create(
146
+ model="deepseek-chat",
147
+ messages=messages,
148
+ temperature=temp)
149
+ answer = completion.choices[0].message.content
150
+ elif model_str == "o1-mini":
151
+ model_str = "o1-mini"
152
+ messages = [
153
+ {"role": "user", "content": system_prompt + prompt}]
154
+ if version == "0.28":
155
+ completion = openai.ChatCompletion.create(
156
+ model=f"{model_str}", # engine = "deployment_name".
157
+ messages=messages)
158
+ else:
159
+ client = OpenAI()
160
+ completion = client.chat.completions.create(
161
+ model="o1-mini-2024-09-12", messages=messages)
162
+ answer = completion.choices[0].message.content
163
+ elif model_str == "o1":
164
+ model_str = "o1"
165
+ messages = [
166
+ {"role": "user", "content": system_prompt + prompt}]
167
+ if version == "0.28":
168
+ completion = openai.ChatCompletion.create(
169
+ model="o1-2024-12-17", # engine = "deployment_name".
170
+ messages=messages)
171
+ else:
172
+ client = OpenAI()
173
+ completion = client.chat.completions.create(
174
+ model="o1-2024-12-17", messages=messages)
175
+ answer = completion.choices[0].message.content
176
+ elif model_str == "o1-preview":
177
+ model_str = "o1-preview"
178
+ messages = [
179
+ {"role": "user", "content": system_prompt + prompt}]
180
+ if version == "0.28":
181
+ completion = openai.ChatCompletion.create(
182
+ model=f"{model_str}", # engine = "deployment_name".
183
+ messages=messages)
184
+ else:
185
+ client = OpenAI()
186
+ completion = client.chat.completions.create(
187
+ model="o1-preview", messages=messages)
188
+ answer = completion.choices[0].message.content
189
+
190
+ try:
191
+ if model_str in ["o1-preview", "o1-mini", "claude-3.5-sonnet", "o1", "o3-mini"]:
192
+ encoding = tiktoken.encoding_for_model("gpt-4o")
193
+ elif model_str in ["deepseek-chat"]:
194
+ encoding = tiktoken.encoding_for_model("cl100k_base")
195
+ else:
196
+ encoding = tiktoken.encoding_for_model(model_str)
197
+ if model_str not in TOKENS_IN:
198
+ TOKENS_IN[model_str] = 0
199
+ TOKENS_OUT[model_str] = 0
200
+ TOKENS_IN[model_str] += len(encoding.encode(system_prompt + prompt))
201
+ TOKENS_OUT[model_str] += len(encoding.encode(answer))
202
+ if print_cost:
203
+ print(f"Current experiment cost = ${curr_cost_est()}, ** Approximate values, may not reflect true cost")
204
+ except Exception as e:
205
+ if print_cost: print(f"Cost approximation has an error? {e}")
206
+ return answer
207
+ except Exception as e:
208
+ print("Inference Exception:", e)
209
+ time.sleep(timeout)
210
+ continue
211
+ raise Exception("Max retries: timeout")
212
+
213
+
214
+ #print(query_model(model_str="o1-mini", prompt="hi", system_prompt="hey"))
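
As a minimal, hedged usage sketch of the `query_model` helper defined above (the API key value is a placeholder; as the function shows, the key can also be picked up from the `OPENAI_API_KEY` environment variable):

```python
# Minimal sketch of calling query_model from this module; values are illustrative only.
from inference import query_model, curr_cost_est

answer = query_model(
    model_str="gpt-4o-mini",
    prompt="Summarize the MATH-500 benchmark in one sentence.",
    system_prompt="You are a concise research assistant.",
    openai_api_key="sk-...",  # placeholder; OPENAI_API_KEY in the environment also works
    temp=0.7,
)
print(answer)
print(f"Approximate running cost: ${curr_cost_est():.4f}")
```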
mlesolver.py ADDED
@@ -0,0 +1,566 @@
1
+ import random
2
+ from copy import copy
3
+ from copy import deepcopy
4
+ from common_imports import *
5
+ from abc import abstractmethod
6
+
7
+
8
+ from tools import *
9
+ from inference import *
10
+ from pathlib import Path
11
+
12
+
13
+ from contextlib import contextmanager
14
+ import sys, os
15
+
16
+
17
+ os.environ["JOBLIB_VERBOSITY"] = "0"
18
+ logging.basicConfig(level=logging.WARNING)
19
+ warnings.filterwarnings("ignore")
20
+ warnings.simplefilter(action='ignore', category=FutureWarning)
21
+ import logging
22
+ logging.getLogger('sklearn.model_selection').setLevel(logging.WARNING)
23
+
24
+
25
+ GLOBAL_REPAIR_ATTEMPTS = 2
26
+
27
+
28
+ class Command:
29
+ def __init__(self):
30
+ self.cmd_type = "OTHER"
31
+
32
+ @abstractmethod
33
+ def docstring(self) -> str:
34
+ pass
35
+
36
+ @abstractmethod
37
+ def execute_command(self, *args) -> str:
38
+ pass
39
+
40
+ @abstractmethod
41
+ def matches_command(self, cmd_str) -> bool:
42
+ pass
43
+
44
+ @abstractmethod
45
+ def parse_command(self, cmd_str) -> tuple:
46
+ pass
47
+
48
+
49
+ """
50
+ @@@@@@@@@@@@@@@@@@
51
+ @@ CODING TOOLS @@
52
+ @@@@@@@@@@@@@@@@@@
53
+ """
54
+
55
+ class Replace(Command):
56
+ def __init__(self):
57
+ super().__init__()
58
+ self.cmd_type = "CODE-replace"
59
+
60
+ def docstring(self) -> str:
61
+ return (
62
+ "============= REWRITE CODE EDITING TOOL =============\n"
63
+ "You also have access to a code replacing tool. \n"
64
+ "This tool allows you to entirely re-write/replace all of the current code and erase all existing code.\n"
65
+ "You can use this tool via the following command: ```REPLACE\n<code here>\n```, where REPLACE is the word REPLACE and <code here> will be the new code that is replacing the entire set of old code. This tool is useful if you want to make very significant changes, such as entirely changing the model, or the learning process. Before changing the existing code to be your new code, your new code will be tested and if it returns an error it will not replace the existing code. Try limiting the use of rewriting and aim for editing the code more."
66
+ )
67
+
68
+ def execute_command(self, *args) -> str:
69
+ # args[0] -> new code
70
+ args = args[0]
71
+ return args[0]
72
+
73
+ def matches_command(self, cmd_str) -> bool:
74
+ if "```REPLACE" in cmd_str: return True
75
+ return False
76
+
77
+ def parse_command(self, *args) -> tuple:
78
+ new_code = extract_prompt(args[0], "REPLACE")
79
+ code_exec = f"{args[1]}\n{new_code}"
80
+ code_ret = execute_code(code_exec)
81
+ if "[CODE EXECUTION ERROR]" in code_ret: return False, (None, code_ret,)
82
+ return True, (new_code.split("\n"), code_ret)
83
+
84
+
85
+
86
+ class Edit(Command):
87
+ def __init__(self):
88
+ super().__init__()
89
+ self.cmd_type = "CODE-edit"
90
+
91
+ def docstring(self) -> str:
92
+ return (
93
+ "============= CODE EDITING TOOL =============\n"
94
+ "You also have access to a code editing tool. \n"
95
+ "This tool allows you to replace lines indexed n through m (n:m) of the current code with as many lines of new code as you want to add. This removal is inclusive meaning that line n and m and everything between n and m is removed. This will be the primary way that you interact with code. \n"
96
+ "You can edit code using the following command: ```EDIT N M\n<new lines to replace old lines>\n``` EDIT is the word EDIT, N is the first line index you want to replace and M the the last line index you want to replace (everything inbetween will also be removed), and <new lines to replace old lines> will be the new code that is replacing the old code. Before changing the existing code to be your new code, your new code will be tested and if it returns an error it will not replace the existing code. Your changes should significantly change the functionality of the code."
97
+ )
98
+
99
+ def execute_command(self, *args) -> str:
100
+ # args[0] -> N (int)
101
+ # args[1] -> M (int)
102
+ # args[2] -> old code
103
+ # args[3] -> new lines to replace
104
+ # args[4] -> dataset code prepended before execution
105
+ try:
106
+ args = args[0]
107
+ current_code = args[2]
108
+ lines_to_add = list(reversed(args[3]))
109
+ lines_to_replace = list(reversed(range(args[0], args[1]+1)))
110
+ for _ln in lines_to_replace:
111
+ current_code.pop(_ln)
112
+ for _line in lines_to_add:
113
+ current_code.insert(args[0], _line)
114
+ new_code = "\n".join(current_code)
115
+ code_exec = f"{args[4]}\n{new_code}"
116
+ code_ret = execute_code(code_exec)
117
+ if "CODE EXECUTION ERROR" in code_ret: return (False, None, code_ret)
118
+ return (True, current_code, code_ret)
119
+ except Exception as e:
120
+ return (False, None, str(e))
121
+
122
+ def matches_command(self, cmd_str) -> bool:
123
+ if "```EDIT" in cmd_str: return True
124
+ return False
125
+
126
+ def parse_command(self, *args) -> tuple:
127
+ cmd_str, codelines, datasetcode = args[0], args[1], args[2]
128
+ success = True
129
+ try:
130
+ text = extract_prompt(cmd_str, "EDIT").split("\n")
131
+ if len(text) == 0: return False, None
132
+ lines_to_edit = text[0].split(" ")
133
+ if len(lines_to_edit) != 2: return False, None
134
+ lines_to_edit = [int(_) for _ in lines_to_edit]
135
+ if len(text[1:]) == 0: return False, None
136
+ return success, (lines_to_edit[0], lines_to_edit[1], codelines, text[1:], datasetcode)
137
+ except Exception as e:
138
+ return False, (None, None, None, None, None)
139
+
140
+
141
+ def get_score(outlined_plan, code, code_return, REWARD_MODEL_LLM, attempts=3, openai_api_key=None):
142
+ e = str()
143
+ for _attempt in range(attempts):
144
+ try:
145
+ # todo: have a reward function here
146
+ sys = (
147
+ f"You are a professor agent who is serving as an expert reward model that can read a research plan, research code, and code output and are able to determine how well a model followed the plan, built the code, and got the proper output scored from 0 to 1 as a float.\n\n"
148
+ f"You must structure your score exactly in the following way: ```SCORE\n<score here>\n``` where SCORE is just the word score, <score here> is a floating point number between 0 and 1 representing how well the model followed the plan, built the code, and got the proper output."
149
+ )
150
+ scoring = query_model(
151
+ model_str=f"{REWARD_MODEL_LLM}",
152
+ system_prompt=sys,
153
+ openai_api_key=openai_api_key,
154
+ prompt=(
155
+ f"Outlined in the following text is the research plan that the machine learning engineer was tasked with building: {outlined_plan}\n\n"
156
+ f"The following text is the research code that the model produced: \n{code}\n\n"
157
+ f"The following is the output from the model: {code_return}\n\n"), temp=0.6)
158
+ performance = extract_prompt(text=scoring, word="SCORE")
159
+ performance = float(performance)
160
+ return performance, f"The performance of your submission is: {performance}", True
161
+ except Exception as e:
162
+ return None, str(e), False
163
+ return 0, e
164
+
165
+
166
+ def code_repair(code, error, ctype, REPAIR_LLM, openai_api_key=None):
167
+ if ctype == "replace":
168
+ repair_sys = (
169
+ "You are an automated code repair tool.\n"
170
+ "Your goal is to take in code and an error and repair the code to make sure the same error does not repeat itself, and also to remove any other potential errors from the code without affecting the code output.\n"
171
+ "Your output should match the original code as closely as possible.\n"
172
+ "You must wrap the code in the following ```python\n<code here>\n```\n"
173
+ "Do not forget the opening ```python and the closing ```."
174
+ )
175
+ model_resp = query_model(
176
+ openai_api_key=openai_api_key,
177
+ model_str=f"{REPAIR_LLM}",
178
+ system_prompt=repair_sys,
179
+ prompt=f"Provided here is the error: {error}\n\nProvided below is the code:\n\n{code}", temp=0.8)
180
+ return extract_prompt(model_resp, "python")
181
+ elif ctype == "edit":
182
+ repair_sys = (
183
+ "You are an automated code repair tool.\n"
184
+ "Your goal is to take in code and an error and repair the code to make sure the same error does not repeat itself, and also to remove any other potential errors from the code without affecting the code output.\n"
185
+ "Your output should match the original code as closely as possible.\n"
186
+
187
+ "============= CODE EDITING TOOL =============\n"
188
+ "You have access to a code editing tool. \n"
189
+ "This tool allows you to replace lines indexed n through m (n:m) of the current code with as many lines of new code as you want to add. This removal is inclusive meaning that line n and m and everything between n and m is removed. This will be the primary way that you interact with code. \n"
190
+ "You can edit code using the following command: ```EDIT N M\n<new lines to replace old lines>\n``` EDIT is the word EDIT, N is the first line index you want to replace and M the the last line index you want to replace (everything inbetween will also be removed), and <new lines to replace old lines> will be the new code that is replacing the old code. Before changing the existing code to be your new code, your new code will be tested and if it returns an error it will not replace the existing code.\n"
191
+ "Please use the code editing tool to fix this code."
192
+ "Do not forget the opening ```EDIT N M and the closing ```."
193
+ "Your output should look like the following\n\n```EDIT N M\n<new lines to replace old lines>\n```"
194
+ )
195
+ model_resp = query_model(
196
+ openai_api_key=openai_api_key,
197
+ model_str=f"{REPAIR_LLM}",
198
+ system_prompt=repair_sys,
199
+ prompt=f"Provided here is the error: {error}\n\nProvided below is the code:\n\n{code}", temp=0.2)
200
+ return model_resp
201
+
202
+
203
+ class MLESolver:
204
+ def __init__(self, dataset_code, openai_api_key=None, notes=None, max_steps=10, insights=None, plan=None, llm_str=None):
205
+ self.supress_print = False
206
+ if notes is None: self.notes = []
207
+ else: self.notes = notes
208
+ self.dataset_code = dataset_code
209
+ if plan is None: self.plan = ""
210
+ else: self.plan = plan
211
+ self.llm_str = llm_str
212
+ self.verbose = False
213
+ self.max_codes = 1
214
+ self.st_hist_len = 2
215
+ self.min_gen_trials = 1
216
+ self.code_lines = str()
217
+ self.st_history = list()
218
+ self.insights = insights
219
+ self.code_reflect = str()
220
+ self.max_steps = max_steps
221
+ self.prev_code_ret = str()
222
+ self.should_execute_code = True
223
+ self.openai_api_key = openai_api_key
224
+
225
+ def initial_solve(self):
226
+ """
227
+ Initialize the solver and get an initial set of code and a return
228
+ @return: None
229
+ """
230
+ # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
231
+ # @@ Initial CodeGen Commands @@
232
+ # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
233
+ self.best_score = None
234
+ self.commands = [Replace()]
235
+ self.model = f"{self.llm_str}"
236
+ init_code, init_return, self.best_score = self.gen_initial_code()
237
+ self.best_codes = [(copy(init_code), self.best_score, init_return) for _ in range(1)]
238
+
239
+ self.code_lines = init_code
240
+ self.model = f"{self.llm_str}"
241
+ self.commands = [Edit(), Replace()]
242
+ self.prev_working_code = copy(self.code_lines)
243
+
244
+ @staticmethod
245
+ def clean_text(text):
246
+ text = text.replace("```\n", "```")
247
+ text = text.replace("```python\n", "```REPLACE\n")
248
+ return text
249
+
250
+ def gen_initial_code(self):
251
+ num_attempts = 0
252
+ error_hist = list()
253
+ while True:
254
+ if num_attempts == 0:
255
+ err = str()
256
+ err_hist = str()
257
+ else:
258
+ err = f"The following was the previous command generated: {model_resp}. This was the error return {cmd_str}. You should make sure not to repeat this error and to solve the presented problem."
259
+ error_hist.append(err)
260
+ if len(error_hist) == 5: _ = error_hist.pop(0)
261
+ err = "\n".join(error_hist)
262
+ err_hist = "The following is a history of your previous errors\n" + err + "\nDO NOT REPEAT THESE."
263
+ model_resp = query_model(
264
+ openai_api_key=self.openai_api_key,
265
+ model_str=self.model,
266
+ system_prompt=self.system_prompt(),
267
+ prompt=f"{err_hist}\nYou should now use ```REPLACE to create initial code to solve the challenge. Now please enter the ```REPLACE command below:\n ", temp=1.0)
268
+ model_resp = self.clean_text(model_resp)
269
+ cmd_str, code_lines, prev_code_ret, should_execute_code, score = self.process_command(model_resp)
270
+ if not self.supress_print: print(f"@@@ INIT ATTEMPT: Command Exec // Attempt {num_attempts}: ", str(cmd_str).replace("\n", " | "))
271
+ if not self.supress_print: print(f"$$$ Score: {score}")
272
+ if score is not None: break
273
+ num_attempts += 1
274
+ return code_lines, prev_code_ret, score
275
+
276
+ def solve(self):
277
+ num_attempts = 0
278
+ best_pkg = None
279
+ top_score = None
280
+ self.prev_code_ret = None
281
+ self.should_execute_code = False
282
+ while True:
283
+ if len(self.commands) == 2: cmd_app_str = "You must output either the ```EDIT or ```REPLACE command immediately. "
284
+ else: cmd_app_str = ""
285
+ model_resp = query_model(
286
+ openai_api_key=self.openai_api_key,
287
+ model_str=self.model,
288
+ system_prompt=self.system_prompt(),
289
+ prompt=f"The following is your history:{self.history_str()}\n\n{cmd_app_str}Now please enter a command: ", temp=1.0)
290
+ model_resp = self.clean_text(model_resp)
291
+ self.code_lines = copy(random.choice(self.best_codes)[0])
292
+ cmd_str, code_lines, prev_code_ret, should_execute_code, score = self.process_command(model_resp)
293
+ self.st_history.append([model_resp, prev_code_ret, code_lines, cmd_str])
294
+ if len(self.st_history) > self.st_hist_len: self.st_history.pop(0)
295
+ if score is not None:
296
+ if top_score is None:
297
+ best_pkg = copy(code_lines), copy(prev_code_ret), copy(should_execute_code), copy(model_resp), copy(cmd_str)
298
+ top_score = score
299
+ elif score > top_score:
300
+ best_pkg = copy(code_lines), copy(prev_code_ret), copy(should_execute_code), copy(model_resp), copy(cmd_str)
301
+ top_score = score
302
+ if not self.supress_print: print(f"@@@ Command Exec // Attempt {num_attempts}: ", str(cmd_str).replace("\n", " | "))
303
+ if not self.supress_print: print(f"$$$ Score: {score}")
304
+ if num_attempts >= self.min_gen_trials and top_score is not None: break
305
+ num_attempts += 1
306
+ self.code_lines, self.prev_code_ret, self.should_execute_code, model_resp, cmd_str = best_pkg
307
+ if not self.supress_print: print(prev_code_ret)
308
+ # add top scoring code that was successful to the best codes
309
+ if top_score > self.best_codes[-1][1]:
310
+ # replace the lowest scoring one
311
+ if len(self.best_codes) >= self.max_codes:
312
+ self.best_codes.pop(-1)
313
+ self.code_reflect = self.reflect_code()
314
+ self.best_codes.append((copy(self.code_lines), copy(top_score), self.prev_code_ret))
315
+ # sort by score, to make sure lowest are removed in future
316
+ self.best_codes.sort(key=lambda x: x[1], reverse=True)
317
+ return model_resp, cmd_str
318
+
319
+ def reflect_code(self):
320
+ """
321
+ Provide a reflection on produced behavior for next execution
322
+ @return: (str) language model-produced reflection
323
+ """
324
+ code_strs = ("$"*40 + "\n\n").join([self.generate_code_lines(_code[0]) + f"\nCode Return {_code[1]}" for _code in self.best_codes])
325
+ code_strs = f"Please reflect on the following sets of code: {code_strs} and come up with generalizable insights that will help you improve your performance on this benchmark."
326
+ syst = self.system_prompt(commands=False) + code_strs
327
+ return query_model(prompt="Please reflect on ideas for how to improve your current code. Examine the provided code and think very specifically (with precise ideas) on how to improve performance, which methods to use, how to improve generalization on the test set with line-by-line examples below:\n", system_prompt=syst, model_str=f"{self.llm_str}", openai_api_key=self.openai_api_key)
328
+
329
+ def process_command(self, model_resp):
330
+ """
331
+ Take command from language model and execute if valid
332
+ @param model_resp: (str) language model output
333
+ @return: (tuple) tuple containing the following items
334
+ - cmd_str: (str) code execution return and success flag
335
+ - code_lines: (list) list of code lines as strings
336
+ - prev_code_ret: (str) output from running code
337
+ - should_execute_code: (bool) did the code change, if so we need to re-execute it
338
+ - score: (float) score of model
339
+ """
340
+ prev_code_ret = self.prev_code_ret
341
+ should_execute_code = self.should_execute_code
342
+ code_lines = copy(self.code_lines)
343
+ remove_figures()
344
+ for cmd in self.commands:
345
+ if cmd.matches_command(model_resp):
346
+ # attempt to execute the code edit command
347
+ if cmd.cmd_type == "CODE-edit":
348
+ score = None
349
+ failed = True
350
+ code_err = str()
351
+ for _tries in range(GLOBAL_REPAIR_ATTEMPTS):
352
+ success, args = cmd.parse_command(model_resp, copy(self.code_lines), self.dataset_code)
353
+ if success:
354
+ cmd_return = cmd.execute_command(args)
355
+ code_err = f"Return from executing code: {cmd_return[2]}"
356
+ if cmd_return[0]: # if success
357
+ code_lines = copy(cmd_return[1])
358
+ score, cmd_str, is_valid = get_score(self.plan, "\n".join(code_lines), cmd_return[2], openai_api_key=self.openai_api_key, REWARD_MODEL_LLM=self.llm_str)
359
+ if is_valid:
360
+ failed = False
361
+ break
362
+ code_err += f"\nReturn from executing code on real test set {cmd_str}"
363
+ repaired_code = code_repair(model_resp, code_err, REPAIR_LLM=self.llm_str, ctype="edit", openai_api_key=self.openai_api_key)
364
+ model_resp = repaired_code
365
+ if not self.supress_print: print(f" * Attempting repair // try {_tries}*")
366
+ if failed:
367
+ cmd_str = f"Code editing FAILED due to the following error: {code_err}. Code was reverted back to original state before edits."
368
+ if not self.supress_print: print("$$$$ CODE EDIT (failed)")
369
+ else:
370
+ cmd_str = "Code was successfully edited."
371
+ prev_code_ret = copy(cmd_return[2])
372
+ if not self.supress_print: print("$$$$ CODE EDIT (success)")
373
+ should_execute_code = True
374
+ return cmd_str, code_lines, prev_code_ret, should_execute_code, score
375
+ # attempt to execute the code replace command
376
+ elif cmd.cmd_type == "CODE-replace": # DONE
377
+ score = None
378
+ failed = True
379
+ code_err = str()
380
+ for _tries in range(GLOBAL_REPAIR_ATTEMPTS):
381
+ success, args = cmd.parse_command(model_resp, self.dataset_code)
382
+ code_err = f"Return from executing code: {args[1]}"
383
+ if success:
384
+ code_lines = copy(args[0])
385
+ score, cmd_str, is_valid = get_score(self.plan, "\n".join(code_lines), args[1], openai_api_key=self.openai_api_key, REWARD_MODEL_LLM=self.llm_str)
386
+ if is_valid:
387
+ failed = False
388
+ break
389
+ code_err += f"\nReturn from executing code on real test set {cmd_str}"
390
+ repaired_code = code_repair(extract_prompt(model_resp, "REPLACE", ), code_err, ctype="replace", openai_api_key=self.openai_api_key, REPAIR_LLM=self.llm_str)
391
+ repaired_code = f"```REPLACE\n{repaired_code}\n```"
392
+ model_resp = repaired_code
393
+ if not self.supress_print: print(f" * Attempting repair // try {_tries}*")
394
+ if failed:
395
+ cmd_str = f"Code replacement FAILED due to the following error: {code_err}. Code was reverted back to original state before edits."
396
+ if not self.supress_print: print("$$$$ CODE REPLACE (failed)")
397
+ else:
398
+ cmd_str = "Code was successfully replaced."
399
+ code_lines = copy(args[0])
400
+ prev_code_ret = copy(args[1])
401
+ if not self.supress_print: print("$$$$ CODE REPLACE (success)")
402
+ should_execute_code = True
403
+ return cmd_str, code_lines, prev_code_ret, should_execute_code, score
404
+ if not self.supress_print: print("$$$$ INVALID COMMAND (failed)")
405
+ return "Command not supported, choose from existing commands", None, None, None, None
406
+
407
+ def history_str(self):
408
+ """
409
+ Well-formatted history string
410
+ @return: (str) history string
411
+ """
412
+ hist_str = ""
413
+ for _hist in range(len(self.st_history)):
414
+ hist_str += f"-------- History ({len(self.st_history)-_hist} steps ago) -----\n"
415
+ hist_str += f"Because of the following response: {self.st_history[_hist][0]}\n" if len(self.st_history[_hist][0]) > 0 else ""
416
+ hist_str += f"and the following COMMAND response output: {self.st_history[_hist][3]}\n"
417
+ hist_str += f"With the following code used: {'#'*20}\n{self.st_history[_hist][2]}\n{'#'*20}\n\n"
418
+ hist_str += f"The environment feedback and reflection was as follows: {self.st_history[_hist][1]}\n"
419
+ hist_str += f"-------- End of history ({len(self.st_history)-_hist} steps ago) -------\n"
420
+ return hist_str
421
+
422
+ def system_prompt(self, commands=True):
423
+ """
424
+ Produce a system prompt for the mle-solver to solve ml problems
425
+ @param commands: (bool) whether to use command prompt
426
+ @return: (str) system prompt
427
+ """
428
+ return (
429
+ # ROLE DESCRIPTION
430
+ f"{self.role_description()}.\n"
431
+ # TASK INSTRUCTIONS
432
+ f"The following are your task instructions: {self.phase_prompt()}\n"
433
+ # LIT REVIEW INSIGHTS
434
+ f"Provided below are some insights from a literature review summary:\n{self.insights}\n"
435
+ # CODE INSIGHTS
436
+ f"{self.code_reflect}"
437
+ # NOTES
438
+ f"The following are notes, instructions, and general tips for you: {self.notes}"
439
+ # PLAN DESCRIPTION
440
+ f"You are given a machine learning research task described, where the plan is described as follows: {self.plan}\n"
441
+ # DATASET DESCRIPTION
442
+ f"{self.generate_dataset_descr_prompt()}"
443
+ # Create Figures
444
+ f"You should also try generating at least two figures to showcase the results, titled Figure_1.png and Figure_2.png\n"
445
+ f"Your method MUST not get 0% accuracy. If it does, you have done something wrong and must correct this. Make sure to check your accuracy calculation is correct.\n"
446
+ # transition
447
+ f"Your goal is to solve the research plan as well as possible. You will receive a score after you write the code and should aim to maximize the score by following the plan instructions and writing high quality code.\n"
448
+ f"Before each experiment please include a print statement explaining exactly what the results are meant to show in great detail before printing the results out.\n"
449
+ # COMMAND SET
450
+ f"The following are commands you have access to: {self.command_descriptions()}\n. You should try to have a diversity of command responses if appropriate. Do not repeat the same commend too many times. Please consider looking through your history and not repeating commands too many times." if commands else ""
451
+ )
452
+
453
+ def generate_code_lines(self, code):
454
+ """
455
+ Generate well-formatted code lines with line numbers
456
+ @param code: (list) list of code line strings
457
+ @return: (str) code lines formatted with line numbers
458
+ """
459
+ codestr = str()
460
+ for _index in range(len(code)):
461
+ codestr += f"{_index} |{code[_index]}\n"
462
+ return codestr
463
+
464
+ def feedback(self, code_return):
465
+ """
466
+ Provide execution feedback after command is run
467
+ @param code_return: (str) return from code execution
468
+ @return: (str) feedback string
469
+ """
470
+ if code_return is not None:
471
+ code_str = self.generate_code_lines(self.code_lines)
472
+ if "[CODE EXECUTION ERROR]" in code_return:
473
+ if not self.supress_print: print(f"@@@@ ERROR") # , {code_return.replace('\n', '')}")
474
+ reflect_prompt = f"This is your code: {code_str}\n\nYour code returned the following error {code_return}. Please provide a detailed reflection on why this error was returned, which lines in the code caused this error, and exactly (line by line) how you hope to fix this in the next update. This step is mostly meant to reflect in order to help your future self fix the error better. Do not provide entirely new code but provide suggestions on how to fix the bug using LINE EDITS."
475
+ elif os.path.exists("submission.csv"):
476
+ self.prev_working_code = copy(self.code_lines)
477
+ grade_return = get_score(self.plan, "\n".join(self.prev_working_code), code_return, openai_api_key=self.openai_api_key, REWARD_MODEL_LLM=self.llm_str)[0]
478
+ if not self.supress_print: print(f"@@@@ SUBMISSION: model score {grade_return}", REWARD_MODEL_LLM=self.llm_str)
479
+ f"Your code was properly submitted and you have just received a grade for your model.\nYour score was {grade_return}.\n\n"
480
+ reflect_prompt = f"This is your code: {code_str}\n\nYour code successfully returned a submission csv. Consider further improving your technique through advanced learning techniques, data augmentation, or hyperparamter tuning to increase the score. Please provide a detailed reflection on how to improve your performance, which lines in the code could be improved upon, and exactly (line by line) how you hope to improve this in the next update. This step is mostly meant to reflect in order to help your future self."
481
+
482
+ for file in os.listdir("."):
483
+ if file.endswith(".csv"):
484
+ os.system(f"rm {file}")
485
+ else:
486
+ if not self.supress_print: print("@@@@ No return")
487
+ reflect_prompt = f"This is your code: {code_str}\n\nYour code did not return an error, but also did not successfully submit a submission csv file. Please reflect on how you can improve your submission for the next cycle to submit a file and obtain a high score."
488
+ elif not self.should_execute_code:
489
+ code_return = "No changes were made to the code."
490
+ reflect_prompt = "Reflect on your future plans and next steps to improve the code."
491
+ reflection = self.reflection(reflect_prompt, code_str, code_return)
492
+ return f"Code return: {code_return}\n\nReflection: {reflection}"
493
+
494
+ def reflection(self, reflect_prompt, code_str, code_return):
495
+ """
496
+ Reflect on your future plans and next steps to improve the code
497
+ @param reflect_prompt: (str) reflection prompt
498
+ @param code_str: (str) code string
499
+ @return: (str) reflection string
500
+ """
501
+ refl = query_model(prompt=reflect_prompt, system_prompt=self.system_prompt(commands=False), model_str=f"{self.llm_str}", openai_api_key=self.openai_api_key)
502
+ return f"During the previous execution, the following code was run: \n\n{code_str}\n\nThis code returned the following: \n{code_return}\nThe following is your reflection from this feedback {refl}\n"
503
+
504
+ def generate_dataset_descr_prompt(self):
505
+ """
506
+ Generate description prompt for kaggle dataset
507
+ @param data_loader: (DataLoader) data loader
508
+ @return: (str) description prompt
509
+ """
510
+ return f"\n- The following dataset code will be added to the beginning of your code always, so this does not need to be rewritten: {self.dataset_code}"
511
+
512
+ def phase_prompt(self,):
513
+ """
514
+ Describe system role and general tips for mle-solver
515
+ @return: (str) system role
516
+ """
517
+ phase_str = (
518
+ "You are an ML engineer and you will be writing the code for a research project.\n"
519
+ "Your goal is to produce code that obtains final results for a set of research experiments. You should aim for simple code to collect all results, not complex code. You should integrate the provided literature review and the plan to make sure you are implementing everything outlined in the plan. The dataset code will be added to the beginning of your code always, so this does not need to be rewritten. Make sure you do not write functions, only loose code.\n"
520
+ "I would recommend writing smaller code so you do not run out of time but make sure to work on all points in the plan in the same code. You code should run every experiment outlined in the plan for a single code.\n",
521
+ "You cannot pip install new libraries, but many machine learning libraries already work. If you wish to use a language model in your code, please use the following:\nAnything you decide to print inside your code will be provided to you as input, and you will be able to see that part of the code. Using print statements is useful for figuring out what is wrong and understanding your code better."
522
+ )
523
+ return phase_str
524
+
525
+ def role_description(self):
526
+ """
527
+ Provide role description
528
+ @return: (str) role description
529
+ """
530
+ return "You are an expert machine learning engineer working at a top university to write code to solve machine learning research challenges using your machine learning expertise."
531
+
532
+ @staticmethod
533
+ def _common_code_errors():
534
+ """
535
+ Some general tips to avoid common code errors, also TF has many errors so we avoid this and ask to use pytorch
536
+ @return: (str) common code errors
537
+ """
538
+ return (
539
+ "Make sure to import everything that you are using.\n"
540
+ "Reflect on the code before writing it to make sure there are no bugs or compilation issues.\n"
541
+ "YOU MUST USE COMMANDS PROPERLY. Do not use the word COMMAND for the command that is incorrect. You must use an actual command (e.g. EDIT, REPLACE...) NOT THE WORD COMMAND. Do not make this mistake.\n"
542
+ "Under no circumstances should you use tensorflow or keras. Only use pytorch for scikitlearn for deep learning.\n"
543
+ )
544
+
545
+ def command_descriptions(self):
546
+ """
547
+ Provide command descriptions
548
+ @return: (str) command descriptions
549
+ """
550
+ cmd_strings = "\n".join([_cmd.docstring() for _cmd in self.commands])
551
+ return f"\nYou also have access to tools which can be interacted with using the following structure: ```COMMAND\n<command information here>\n```, where COMMAND is whichever command you want to run (e.g. EDIT, REPLACE...), <command information here> is information used for the command, such as code to run or a search query, and ``` are meant to encapsulate the command. ``` must be included as part of the command both at the beginning and at the end of the code. DO NOT FORGOT TO HAVE ``` AT THE TOP AND BOTTOM OF CODE. and this structure must be followed to execute a command correctly. YOU CAN ONLY EXECUTE A SINGLE COMMAND AT A TIME! Do not try to perform multiple commands EVER only one. {self._common_code_errors()}" + cmd_strings
552
+
553
+ def run_code(self):
554
+ """
555
+ Actually execute the code that was generated
556
+ @return: (str) code return
557
+ """
558
+ if self.prev_code_ret is not None:
559
+ return self.prev_code_ret
560
+ elif self.should_execute_code:
561
+ return execute_code("\n".join(self.code_lines))
562
+ return "Changes have not yet been made to the code."
563
+
564
+
565
+
566
+
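
To make the control flow above concrete, here is a hedged sketch of how `MLESolver` might be driven directly (argument values are illustrative placeholders, not the canonical entry point; `tools.py` is assumed to provide the `execute_code`, `extract_prompt`, and `remove_figures` helpers imported at the top of this file):

```python
# Illustrative driver sketch for MLESolver; keys, plans, and step counts are placeholders.
from mlesolver import MLESolver

dataset_code = (
    'from datasets import load_dataset\n'
    'dataset = load_dataset("HuggingFaceH4/MATH-500")["test"]'
)

solver = MLESolver(
    dataset_code=dataset_code,
    plan="Evaluate a chain-of-thought prompt on MATH-500 and report accuracy.",
    llm_str="o3-mini",
    openai_api_key="sk-...",  # placeholder
    max_steps=3,
)

solver.initial_solve()                 # generate and score a first REPLACE-command program
for _ in range(solver.max_steps - 1):
    solver.solve()                     # propose EDIT/REPLACE commands, keep the best-scoring code
best_code_lines, best_score, best_output = solver.best_codes[0]
print(best_score)
print("\n".join(best_code_lines))
```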
readme/README-arabic.md ADDED
@@ -0,0 +1,162 @@
1
+ # مختبر الوكيل: استخدام وكلاء النماذج اللغوية الكبيرة كمساعدين بحثيين
2
+
3
+ <p align="center">
4
+ <img src="../media/AgentLabLogo.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
5
+ </p>
6
+
7
+
8
+ <p align="center">
9
+ 【<a href="../README.md">English</a> | <a href="../readme/README-chinese.md">中文</a> | <a href="../readme/README-japanese.md">日本語</a> | <a href="../readme/README-korean.md">한국어</a> | <a href="../readme/README-filipino.md">Filipino</a> | <a href="../readme/README-french.md">Français</a> | <a href="../readme/README-slovak.md">Slovenčina</a> | <a href="../readme/README-portugese.md">Português</a> | <a href="../readme/README-spanish.md">Español</a> | <a href="../readme/README-turkish.md">Türkçe</a> | <a href="../readme/README-hindi.md">हिंदी</a> | <a href="../readme/README-bengali.md">বাংলা</a> | <a href="../readme/README-vietnamese.md">Tiếng Việt</a> | <a href="../readme/README-russian.md">Русский</a> | العربية | <a href="../readme/README-farsi.md">فارسی</a> | <a href="../readme/README-italian.md">Italiano</a>】
10
+ </p>
11
+
12
+
13
+ <p align="center">
14
+ 【🌐 <a href="https://agentlaboratory.github.io/">الموقع الإلكتروني</a> | 💻 <a href="https://github.com/SamuelSchmidgall/AgentLaboratory">البرمجيات</a> | 🎥 <a href="https://agentlaboratory.github.io/#youtube-video">الفيديو</a> | 📚 <a href="https://agentlaboratory.github.io/#examples-goto">مثال على ورقة بحثية</a> | 📰 <a href="https://agentlaboratory.github.io/#citation-ref">الاستشهاد</a>】
15
+ </p>
16
+
17
+ ## 📖 نظرة عامة
18
+
19
+ - **مختبر الوكيل** هو سير عمل بحثي مستقل من البداية للنهاية مصمم لمساعدتك كباحث بشري في **تنفيذ أفكار بحثك**. يتكون مختبر الوكيل من وكلاء متخصصين مدفوعين بنماذج لغوية كبيرة لدعمك طوال سير العمل البحثي بالكامل — من إجراء مراجعات الأدبيات وصياغة الخطط إلى تنفيذ التجارب وكتابة تقارير شاملة.
20
+ - هذا النظام ليس مصممًا لاستبدال إبداعك بل لتكملته، مما يتيح لك التركيز على توليد الأفكار والتفكير النقدي بينما يقوم بأتمتة المهام المتكررة والتي تستغرق وقتًا طويلاً مثل البرمجة والتوثيق. من خلال استيعاب مستويات مختلفة من الموارد الحاسوبية والمشاركة البشرية، يهدف مختبر الوكيل إلى تسريع الاكتشافات العلمية وتحسين إنتاجيتك البحثية.
21
+
22
+ <p align="center">
23
+ <img src="../media/AgentLab.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
24
+ </p>
25
+
26
+ ### 🔬 كيف يعمل مختبر الوكيل؟
27
+
28
+ - يتكون مختبر الوكيل من ثلاث مراحل رئيسية توجه عملية البحث بشكل منهجي: (1) مراجعة الأدبيات، (2) التجارب، و(3) كتابة التقارير. خلال كل مرحلة، يتعاون وكلاء متخصصون مدفوعون بنماذج لغوية كبيرة لتحقيق أهداف مميزة، مع دمج أدوات خارجية مثل arXiv، Hugging Face، Python، وLaTeX لتحسين النتائج. يبدأ سير العمل هذا بجمع وتحليل مستقل للأوراق البحثية ذات الصلة، يتقدم من خلال التخطيط التعاوني وإعداد البيانات، وينتهي بتنفيذ التجارب تلقائيًا وتوليد تقارير شاملة. يتم مناقشة تفاصيل أدوار الوكلاء المحددة ومساهماتهم عبر هذه المراحل في الورقة البحثية.
29
+
30
+ <p align="center">
31
+ <img src="../media/AgentLabWF.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
32
+ </p>
33
+
34
+ ## 🖥️ التثبيت
35
+
36
+
37
+ ### خيار البيئة الافتراضية للبايثون
38
+
39
+ 1. **استنساخ مستودع GitHub**: ابدأ باستنساخ المستودع باستخدام الأمر:
40
+ ```bash
41
+ git clone git@github.com:SamuelSchmidgall/AgentLaboratory.git
42
+ ```
43
+
44
+ 2. **إعداد وتفعيل بيئة البايثون**
45
+ ```bash
46
+ python -m venv venv_agent_lab
47
+ ```
48
+
49
+ - الآن قم بتفعيل هذه البيئة:
50
+ ```bash
51
+ source venv_agent_lab/bin/activate
52
+ ```
53
+
54
+ 3. **تثبيت المكتبات المطلوبة**
55
+ ```bash
56
+ pip install -r requirements.txt
57
+ ```
58
+
59
+ 4. **تثبيت pdflatex [اختياري]**
60
+ ```bash
61
+ sudo apt install pdflatex
62
+ ```
63
+
64
+ - هذا يمكن الوكلاء من تجميع مصدر LaTeX.
65
+ - **[مهم]** إذا لم تتمكن من تشغيل هذه الخطوة بسبب عدم وجود صلاحيات sudo، يمكن إيقاف تجميع PDF عن طريق تشغيل مختبر الوكيل مع تعيين العلم --compile_latex إلى false:
66
+ ```bash
67
+ --compile_latex=False
68
+ ```
69
+
70
+ 5. **الآن قم بتشغيل مختبر الوكيل!**
71
+ ```bash
72
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA"
73
+ ```
74
+
75
+ أو، إذا لم يكن لديك pdflatex مثبتًا
76
+ ```bash
77
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA" --compile_latex=False
78
+ ```
79
+
80
+ -----
81
+ ## نصائح لتحقيق نتائج بحثية أفضل
82
+
83
+ #### [نصيحة #1] 📝 تأكد من كتابة ملاحظات شاملة! 📝
84
+
85
+ **كتابة ملاحظات شاملة أمر مهم** لمساعدة وكيلك على فهم ما تسعى إلى تحقيقه في مشروعك، بالإضافة إلى أي تفضيلات أسلوبية. يمكن أن تشمل الملاحظات أي تجارب ترغب في أن يقوم الوكلاء بتنفيذها، توفير مفاتيح API، بعض الرسوم البيانية أو الأشكال التي ترغب في تضمينها، أو أي شيء تريد أن يعرفه الوكيل عند إجراء البحث.
86
+
87
+ هذه أيضًا فرصتك لإعلام الوكيل **بالموارد الحاسوبية التي يمكنه الوصول إليها**، مثل وحدات معالجة الرسومات (عددها، نوعها، حجم الذاكرة)، وحدات المعالجة المركزية (عدد النوى، نوعها)، قيود التخزين، ومواصفات الأجهزة.
88
+
89
+ لإضافة ملاحظات، يجب تعديل هيكل task_notes_LLM داخل ملف ai_lab_repo.py. فيما يلي مثال على مجموعة من الملاحظات المستخدمة لبعض تجاربنا.
90
+
91
+ ```python
92
+ task_notes_LLM = [
93
+ {"phases": ["plan formulation"],
94
+ "note": f"You should come up with a plan for TWO experiments."},
95
+
96
+ {"phases": ["plan formulation", "data preparation", "running experiments"],
97
+ "note": "Please use gpt-4o-mini for your experiments."},
98
+
99
+ {"phases": ["running experiments"],
100
+ "note": f"Use the following code to run inference with gpt-4o-mini: \nfrom openai import OpenAI\nos.environ['OPENAI_API_KEY'] = '{api_key}'\nclient = OpenAI()\ncompletion = client.chat.completions.create(\nmodel='gpt-4o-mini-2024-07-18', messages=messages)\nanswer = completion.choices[0].message.content\n"},
101
+
102
+ {"phases": ["running experiments"],
103
+ "note": f"You have access to only gpt-4o-mini using the OpenAI API, please use the following key {api_key} but do not use too many inferences. Do not use openai.ChatCompletion.create or any openai==0.28 commands. Instead use the provided inference code."},
104
+
105
+ {"phases": ["running experiments"],
106
+ "note": "I would recommend using a small dataset (approximately only 100 data points) to run experiments in order to save time. Do not use much more than this unless you have to or are running the final tests."},
107
+
108
+ {"phases": ["data preparation", "running experiments"],
109
+ "note": "You are running on a MacBook laptop. You can use 'mps' with PyTorch"},
110
+
111
+ {"phases": ["data preparation", "running experiments"],
112
+ "note": "Generate figures with very colorful and artistic design."},
113
+ ]
114
+ ```
115
+
116
+ --------
117
+
118
+ #### [نصيحة #2] 🚀 استخدام نماذج أكثر قوة يؤدي عمومًا إلى أبحاث أفضل 🚀
119
+
120
+ عند إجراء البحث، **يمكن أن يؤثر اختيار النموذج بشكل كبير على جودة النتائج**. النماذج الأكثر قوة تميل إلى أن تكون أكثر دقة، ولديها قدرات تفكير أفضل، وتوليد تقارير أفضل. إذا سمحت الموارد الحاسوبية، أعطِ الأولوية لاستخدام النماذج المتقدمة مثل o1-(mini/preview) أو نماذج لغوية كبيرة حديثة مماثلة.
121
+
122
+ ومع ذلك، **من المهم تحقيق التوازن بين الأداء والفعالية من حيث التكلفة**. بينما قد تؤدي النماذج القوية إلى نتائج أفضل، فهي غالبًا ما تكون أكثر تكلفة وتستغرق وقتًا أطول للتشغيل. فكر في استخدامها بشكل انتقائي — على سبيل المثال، للتجارب الرئيسية أو التحليلات النهائية — بينما تعتمد على نماذج أصغر وأكثر كفاءة للمهام التكرارية أو النمذجة الأولية.
123
+
124
+ عندما تكون الموارد محدودة، **قم بتحسين الأداء عن طريق ضبط النماذج الأصغر** على مجموعة البيانات الخاصة بك أو عن طريق دمج النماذج المدربة مسبقًا مع مطالبات محددة بالمهام لتحقيق التوازن المطلوب بين الأداء والكفاءة الحاسوبية.
125
+
126
+ -----
127
+
128
+ #### [نصيحة #3] ✅ يمكنك تحميل الحفظات السابقة من نقاط التفتيش ✅
129
+
130
+ **إذا فقدت تقدمك، أو انقطعت اتصال الإنترنت، أو فشلت مهمة فرعية، يمكنك دائمًا التحميل من حالة سابقة.** يتم حفظ كل تقدمك افتراضيًا في متغير state_saves، الذي يخزن كل نقطة تفتيش فردية. فقط مرر الحجج التالية عند تشغيل ai_lab_repo.py
131
+
132
+ ```bash
133
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA" --llm-backend "o1-mini" --load-existing True --load-existing-path "save_states/LOAD_PATH"
134
+ ```
135
+
136
+ -----
137
+
138
+ #### [نصيحة #4] 🈯 إذا كنت تعمل بلغة غير الإنجليزية 🈲
139
+
140
+ إذا كنت تشغل مختبر الوكيل بلغة غير الإنجليزية، لا مشكلة، فقط تأكد من توفير علم اللغة للوكلاء لأداء البحث بلغتك المفضلة. لاحظ أننا لم ندرس تشغيل مختبر الوكيل بلغات أخرى بشكل موسع، لذا تأكد من الإبلاغ عن أي مشكلات تواجهها.
141
+
142
+ على سبيل المثال، إذا كنت تعمل بالصينية:
143
+
144
+ ```bash
145
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA (in your language)" --llm-backend "o1-mini" --language "中文"
146
+ ```
147
+
148
+ ----
149
+
150
+ #### [نصيحة #5] 🌟 هناك الكثير من المجال للتحسين 🌟
151
+
152
+ هناك الكثير من المجال لتحسين قاعدة الشيفرة هذه، لذا إذا قمت بإجراء تغييرات وترغب في مساعدة المجتمع، لا تتردد في مشاركة التغييرات التي قمت بها! نأمل أن تساعدك هذه الأداة!
153
+
154
+ ## المرجع / Bibtex
155
+
156
+ ```bibtex
157
+ @preprint{schmidgall2025AgentLaboratory,
158
+ title={Agent Laboratory: Using LLM Agents as Research Assistants},
159
+ author={Schmidgall, Samuel and Su, Yusheng and Wang, Ze and Sun, Ximeng and Wu, Jialian and Yu, Xiadong and Liu, Jiang and Liu, Zicheng and Barsoum, Emad},
160
+ year={2025}
161
+ }
162
+ ```
readme/README-bengali.md ADDED
@@ -0,0 +1,156 @@
1
+ # এজেন্ট ল্যাবরেটরি: গবেষণা সহকারী হিসেবে LLM এজেন্ট ব্যবহার
2
+
3
+ <p align="center">
4
+ <img src="../media/AgentLabLogo.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
5
+ </p>
6
+
7
+ <p align="center">
8
+ 【<a href="../README.md">English</a> | <a href="../readme/README-chinese.md">中文</a> | <a href="../readme/README-japanese.md">日本語</a> | <a href="../readme/README-korean.md">한국어</a> | <a href="../readme/README-filipino.md">Filipino</a> | <a href="../readme/README-french.md">Français</a> | <a href="../readme/README-slovak.md">Slovenčina</a> | <a href="../readme/README-portugese.md">Português</a> | <a href="../readme/README-spanish.md">Español</a> | <a href="../readme/README-turkish.md">Türkçe</a> | <a href="../readme/README-hindi.md">हिंदी</a> | বাংলা | <a href="../readme/README-vietnamese.md">Tiếng Việt</a> | <a href="../readme/README-russian.md">Русский</a> | <a href="../readme/README-arabic.md">العربية</a> | <a href="../readme/README-farsi.md">فارسی</a> | <a href="../readme/README-italian.md">Italiano</a>】
9
+ </p>
10
+
11
+ <p align="center">
12
+ 【🌐 <a href="https://agentlaboratory.github.io/">Website</a> | 💻 <a href="https://github.com/SamuelSchmidgall/AgentLaboratory">Software</a> | 🎥 <a href="https://agentlaboratory.github.io/#youtube-video">Video</a> | 📚 <a href="https://agentlaboratory.github.io/#examples-goto">Example Paper</a> | 📰 <a href="https://agentlaboratory.github.io/#citation-ref">Citation</a>】
13
+ </p>
14
+
15
+ ## 📖 ওভারভিউ
16
+
17
+ - **এজেন্ট ল্যাবরেটরি** একটি এন্ড-টু-এন্ড স্বায়ত্তশাসিত গবেষণা ওয়ার্কফ্লো যা **আপনাকে** মানব গবেষক হিসেবে **আপনার গবেষণা ধারণাগুলি বাস্তবায়নে** সহায়তা করার জন্য ডিজাইন করা হয়েছে। এজেন্ট ল্যাবরেটরি বড় ভাষা মডেল দ্বারা চালিত বিশেষায়িত এজেন্টের সমন্বয়ে গঠিত যা আপনাকে সম্পূর্ণ গবেষণা ওয়ার্কফ্লো জুড়ে সহায়তা করে—সাহিত্য পর্যালোচনা পরিচালনা থেকে পরিকল্পনা গঠন, পরীক্ষা সম্পাদন এবং বিস্তৃত প্রতিবেদন লেখা পর্যন্ত।
18
+ - এই সিস্টেমটি আপনার সৃজনশীলতাকে প্রতিস্থাপন করার জন্য ডিজাইন করা হয়নি বরং এটি সম্পূরক করার জন্য, আপনাকে ধারণা গঠন এবং সমালোচনামূলক চিন্তাভাবনায় মনোনিবেশ করার পাশাপাশি কোডিং এবং ডকুমেন্টেশন মত পুনরাবৃত্তিমূলক এবং সময়সাপেক্ষ কাজগুলি স্বয়ংক্রিয়করণের সুযোগ দেয়। বিভিন্ন স্তরের গণনামূলক সম্পদ এবং মানব সম্পৃক্ততাকে সমন্বিত করে, এজেন্ট ল্যাবরেটরি বৈজ্ঞানিক আবিষ্কারকে ত্বরান্বিত করা এবং আপনার গবেষণা উৎপাদনশীলতাকে সর্বাধিক করতে লক্ষ্য রাখে।
19
+
20
+ <p align="center">
21
+ <img src="../media/AgentLab.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
22
+ </p>
23
+
24
+ ### 🔬 এজেন্ট ল্যাবরেটরি কীভাবে কাজ করে?
25
+
26
+ - এজেন্ট ল্যাবরেটরি তিনটি প্রধান পর্যায় নিয়ে গঠিত যা পদ্ধতিগতভাবে গবেষণা প্রক্রিয়াকে নির্দেশ করে: (১) সাহিত্য পর্যালোচনা, (২) পরীক্ষা, এবং (৩) প্রতিবেদন লেখা। প্রতিটি পর্যায়ে, LLM দ্বারা চালিত বিশেষায়িত এজেন্টরা পৃথক লক্ষ্য অর্জনের জন্য সহযোগিতা করে, ফলাফল অপ্টিমাইজ করার জন্য arXiv, Hugging Face, Python এবং LaTeX এর মত বহিরাগত সরঞ্জামগুলিকে সংহত করে। এই কাঠামোবদ্ধ ওয়ার্কফ্লো প্রাসঙ্গিক গবেষণা পত্রের স্বাধীন সংগ্রহ এবং বিশ্লেষণ দিয়ে শুরু হয়, সহযোগিতামূলক পরিকল্পনা এবং তথ্য প্রস্তুতির মাধ্যমে অগ্রসর হয়, এবং স্বয়ংক্রিয় পরীক্ষণ এবং বিস্তৃত প্রতিবেদন তৈরিতে শেষ হয়। এই পর্যায়গুলির জুড়ে নির্দিষ্ট এজেন্ট ভূমিকা এবং তাদের অবদান সম্পর্কে বিস্তারিত গবেষণাপত্রে আলোচনা করা হয়েছে।
27
+
28
+ <p align="center">
29
+ <img src="../media/AgentLabWF.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
30
+ </p>
31
+
32
+ ## 🖥️ ইনস্টলেশন
33
+
34
+ ### পাইথন venv বিকল্প
35
+
36
+ 1. **GitHub রিপোজিটরি ক্লোন করুন**: কমান্ডটি ব্যবহার করে রিপোজিটরিটি ক্লোন করা শুরু করুন:
37
+ ```bash
38
+ git clone git@github.com:SamuelSchmidgall/AgentLaboratory.git
39
+ ```
40
+
41
+ 2. **পাইথন পরিবেশ সেট আপ এবং সক্রিয় করুন**
42
+ ```bash
43
+ python -m venv venv_agent_lab
44
+ ```
45
+
46
+ - এখন এই পরিবেশটি সক্রিয় করুন:
47
+ ```bash
48
+ source venv_agent_lab/bin/activate
49
+ ```
50
+
51
+ 3. **প্রয়োজনীয় লাইব্রেরিগুলি ইনস্টল করুন**
52
+ ```bash
53
+ pip install -r requirements.txt
54
+ ```
55
+
56
+ 4. **pdflatex ইনস্টল করুন [ঐচ্ছিক]**
57
+ ```bash
58
+ sudo apt install pdflatex
59
+ ```
60
+
61
+ - এটি এজেন্ট দ্বারা ল্যাটেক্স সোর্স কম্পাইল করা সক্ষম করে।
62
+ - **[গুরুত্বপূর্ণ]** যদি sudo অ্যাক্সেস না থাকার কারণে এই ধাপটি চালানো না যায়, তাহলে --compile_latex ফ্ল্যাগটি false এ সেট করে এজেন্ট ল্যাবরেটরি চালিয়ে pdf কম্পাইলিং বন্ধ করা যেতে পারে: --compile_latex=False
63
+
64
+ 5. **এখন এজেন্ট ল্যাবরেটরি চালান!**
65
+ ```bash
66
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA"
67
+ ```
68
+ অথবা, যদি আপনি pdflatex ইনস্টল না করে থাকেন
69
+ ```bash
70
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA" --compile_latex=False
71
+ ```
72
+
73
+ -----
74
+
75
+ ## গবেষণার ফলাফল উন্নত করার টিপস
76
+
77
+ #### [টিপ #১] 📝 ব্যাপক নোট লেখার বিষয়টি নিশ্চিত করুন! 📝
78
+
79
+ **ব্যাপক নোট লেখা গুরুত্বপূর্ণ** কারণ এটি আপনার এজেন্টকে আপনার প্রকল্পে আপনি কী অর্জন করতে চাইছেন তা বোঝাতে এবং যে কোনও স্টাইল পছন্দ রয়েছে তা বুঝতে সাহায্য করে। নোটগুলিতে যে কোনও পরীক্ষা আপনি এজেন্টদের সম্পাদন করতে চান, API কী সরবরাহ করা, আপনি যে নির্দিষ্ট প্লট বা চিত্র অন্তর্ভুক্ত করতে চান, অথবা গবেষণা পরিচালনা করার সময় এজেন্টকে যা কিছু জানাতে চান তা অন্তর্ভুক্ত থাকতে পারে।
80
+
81
+ এটি এছাড়াও আপনার সুযোগ আপনার এজেন্টকে জানানোর **কোন কম্পিউট সম্পদগুলিতে এটি প্রবেশাধিকার রয়েছে**, উদাহরণস্বরূপ, GPUs (কতগুলো, কোন ধরণের GPU, কতগুলো GB), CPUs (কতগুলো কোর, কোন ধরণের CPU), স্টোরেজ সীমাবদ্ধতা, এবং হার্ডওয়্যার স্পেসিফিকেশন।
82
+
83
+ নোট যুক্ত করার জন্য, আপনাকে ai_lab_repo.py এর ভিতরে task_notes_LLM গঠনটি পরিবর্তন করতে হবে। নীচে কিছু পরীক্ষার জন্য ব্যবহৃত নোটগুলির একটি উদাহরণ দেওয়া হল।
84
+
85
+ ```python
86
+ task_notes_LLM = [
87
+ {"phases": ["plan formulation"],
88
+ "note": f"You should come up with a plan for TWO experiments."},
89
+
90
+ {"phases": ["plan formulation", "data preparation", "running experiments"],
91
+ "note": "Please use gpt-4o-mini for your experiments."},
92
+
93
+ {"phases": ["running experiments"],
94
+ "note": f"Use the following code to run inference with gpt-4o-mini: \nfrom openai import OpenAI\nos.environ['OPENAI_API_KEY'] = '{api_key}'\nclient = OpenAI()\ncompletion = client.chat.completions.create(\nmodel='gpt-4o-mini-2024-07-18', messages=messages)\nanswer = completion.choices[0].message.content\n"},
95
+
96
+ {"phases": ["running experiments"],
97
+ "note": f"You have access to only gpt-4o-mini using the OpenAI API, please use the following key {api_key} but do not use too many inferences. Do not use openai.ChatCompletion.create or any openai==0.28 commands. Instead use the provided inference code."},
98
+
99
+ {"phases": ["running experiments"],
100
+ "note": "I would recommend using a small dataset (approximately only 100 data points) to run experiments in order to save time. Do not use much more than this unless you have to or are running the final tests."},
101
+
102
+ {"phases": ["data preparation", "running experiments"],
103
+ "note": "You are running on a MacBook laptop. You can use 'mps' with PyTorch"},
104
+
105
+ {"phases": ["data preparation", "running experiments"],
106
+ "note": "Generate figures with very colorful and artistic design."},
107
+ ]
108
+ ```
109
+
110
+ --------
111
+
112
+ #### [টিপ #২] 🚀 আরও শক্তিশালী মডেলগুলি সাধারণত আরও ভাল গবেষণার দিকে নিয়ে যায় 🚀
113
+
114
+ গবেষণা পরিচালনার সময়, **মডেলের নির্বাচন ফলাফলের গুণমানকে উল্লেখযোগ্যভাবে প্রভাবিত করতে পারে**। আরও শক্তিশালী মডেলগুলির সাধারণত উচ্চতর নির্ভুলতা, উন্নত যুক্তিবিদ্যা ক্ষমতা, এবং উন্নত প্রতিবেদন তৈরির ক্ষমতা থাকে। যদি গণনামূলক সম্পদ অনুমতি দেয়, তাহলে o1-(mini/preview) বা অনুরূপ অত্যাধুনিক বড় ভাষা মডেলগুলির মতো উন্নত মডেলগুলির ব্যবহারে অগ্রাধিকার দিন।
115
+
116
+ তবে, **কর্মক্ষমতা এবং ব্যয়-কার্যকারিতা মধ্যে ভারসাম্য বজায় রাখা গুরুত্বপূর্ণ**। শক্তিশালী মডেলগুলি যদিও ভাল ফলাফল দিতে পারে, তবে এগুলি প্রায়শই চালাতে বেশি ব্যয়বহুল এবং সময়সাপেক্ষ হয়। সেগুলি নির্বাচিতভাবে ব্যবহার করার কথা বিবেচনা করুন—উদাহরণস্বরূপ, মূল পরীক্ষাগুলি বা চূড়ান্ত বিশ্লেষণের জন্য—পুনরাবৃত্তিমূলক কাজ বা প্রাথমিক প্রোটোটাইপিংয়ের জন্য ছোট, আরও দক্ষ মডেলগুলির উপর নির্ভর করে।
117
+
118
+ যখন সম্পদ সীমিত থাকে, **আপনার নির্দিষ্ট ডেটাসেটে ছোট মডেলগুলিকে সূক্ষ্ম-সংশোধন করে বা কার্য-নির্দিষ্ট প্রম্পটগুলির সাথে পূর্ব-প্রশিক্ষিত মডেলগুলিকে সংযোজন করে কর্মক্ষমতা এবং গণনামূলক দক্ষতার মধ্যে কাঙ্ক্ষিত ভারসাম্য অর্জন করুন**।
119
+
120
+ -----
121
+
122
+ #### [টিপ #৩] ✅ আপনি চেকপয়েন্টগুলি থেকে পূর্ববর্তী সেভগুলি লোড করতে পারেন ✅
123
+
124
+ **যদি আপনি অগ্রগতি হারান, ইন্টারনেট সংযোগ হারান, বা যদি একটি উপ-কাজ ব্যর্থ হয়, তবে আপনি সর্বদা পূর্ববর্তী অবস্থান থেকে লোড করতে পারেন।** আপনার সম��্ত অগ্রগতি ডিফল্টভাবে state_saves ভেরিয়েবলে সংরক্ষিত থাকে, যা প্রতিটি পৃথক চেকপয়েন্ট সংরক্ষণ করে। ai_lab_repo.py চালানোর সময় কেবল নিম্নলিখিত আর্গুমেন্টগুলি প্রদান করুন
125
+
126
+ ```bash
127
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA" --llm-backend "o1-mini" --load-existing True --load-existing-path "save_states/LOAD_PATH"
128
+ ```
129
+
130
+ -----
131
+
132
+ #### [টিপ #৪] 🈯 আপনি যদি ইংরেজির বাইরে অন্য কোনো ভাষায় চালাচ্ছেন 🈲
133
+
134
+ আপনি যদি এজেন্ট ল্যাবরেটরি ইংরেজির বাইরে অন্য কোনো ভাষায় চালাচ্ছেন, সমস্যা নেই, কেবল নিশ্চিত করুন যে আপনি এজেন্টদের আপনার পছন্দের ভাষায় গবেষণা সম্পাদনের জন্য একটি ভাষা ফ্ল্যাগ সরবরাহ করেছেন। লক্ষ্য করুন যে আমরা অন্যান্য ভাষায় এজেন্ট ল্যাবরেটরি চালানোর ব্যাপকভাবে অধ্যয়ন করি নি, তাই আপনি যে কোনও সমস্যা সম্মুখীন হলে তা রিপোর্ট করতে ভুলবেন না।
135
+
136
+ উদাহরণস্বরূপ, আপনি যদি চীনা ভাষায় চালাচ্ছেন:
137
+
138
+ ```bash
139
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA (in your language)" --llm-backend "o1-mini" --language "中文"
140
+ ```
141
+
142
+ ----
143
+
144
+ #### [টিপ #৫] 🌟 উন্নতির জন্য অনেক জায়গা রয়েছে 🌟
145
+
146
+ এই কোডবেস উন্নত করার জন্য অনেক সুযোগ রয়েছে, তাই আপনি যদি পরিবর্তন করতে পারেন এবং কমিউনিটির সাহায্য করতে চান, তবে দয়া করে আপনার করা পরিবর্তনগুলি ভাগ করতে দ্বিধা করবেন না! আমরা আশা করি এই টুলটি আপনাকে সাহায্য করবে!
147
+
148
+ ## রেফারেন্স / Bibtex
149
+
150
+ ```bibtex
151
+ @preprint{schmidgall2025AgentLaboratory,
152
+ title={Agent Laboratory: Using LLM Agents as Research Assistants},
153
+ author={Schmidgall, Samuel and Su, Yusheng and Wang, Ze and Sun, Ximeng and Wu, Jialian and Yu, Xiadong and Liu, Jiang and Liu, Zicheng and Barsoum, Emad},
154
+ year={2025}
155
+ }
156
+ ```
readme/README-chinese.md ADDED
@@ -0,0 +1,150 @@
1
+ # Agent Laboratory: 使用大型语言模型代理作为研究助理
2
+
3
+ <p align="center">
4
+ <img src="../media/AgentLabLogo.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
5
+ </p>
6
+
7
+ <p align="center">
8
+ 【<a href="../README.md">English</a> | 中文 | <a href="../readme/README-japanese.md">日本語</a> | <a href="../readme/README-korean.md">한국어</a> | <a href="../readme/README-filipino.md">Filipino</a> | <a href="../readme/README-french.md">Français</a> | <a href="../readme/README-slovak.md">Slovenčina</a> | <a href="../readme/README-portugese.md">Português</a> | <a href="../readme/README-spanish.md">Español</a> | <a href="../readme/README-turkish.md">Türkçe</a> | <a href="../readme/README-hindi.md">हिंदी</a> | <a href="../readme/README-bengali.md">বাংলা</a> | <a href="../readme/README-vietnamese.md">Tiếng Việt</a> | <a href="../readme/README-russian.md">Русский</a> | <a href="../readme/README-arabic.md">العربية</a> | <a href="../readme/README-farsi.md">فارسی</a> | <a href="../readme/README-italian.md">Italiano</a>】
9
+ </p>
10
+
11
+ <p align="center">
12
+ 【🌐 <a href="https://agentlaboratory.github.io/">网站</a> | 💻 <a href="https://github.com/SamuelSchmidgall/AgentLaboratory">软件</a> | 🎥 <a href="https://agentlaboratory.github.io/#youtube-video">视频</a> | 📚 <a href="https://agentlaboratory.github.io/#examples-goto">示例论文</a> | 📰 <a href="https://agentlaboratory.github.io/#citation-ref">引用</a>】
13
+ </p>
14
+
15
+ ## 📖 概述
16
+
17
+ - **Agent Laboratory** 是一个端到端的自主研究工作流程,旨在协助**您**作为人类研究人员**实现您的研究想法**。Agent Laboratory 由由大型语言模型驱动的专业代理组成,支持您完成整个研究工作流程——从进行文献综述和制定计划,到执行实验和撰写综合报告。
18
+ - 该系统并非旨在取代您的创造力,而是为了补充它,使您能够专注于创意和批判性思维,同时自动化重复且耗时的任务,如编码和文档编写。通过适应不同水平的计算资源和人类参与,Agent Laboratory 旨在加速科学发现并优化您的研究生产力。
19
+
20
+ <p align="center">
21
+ <img src="../media/AgentLab.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
22
+ </p>
23
+
24
+ ### 🔬 Agent Laboratory 如何工作?
25
+
26
+ - Agent Laboratory 包含三个主要阶段,系统地引导研究过程:(1)文献综述,(2)实验,(3)报告撰写。在每个阶段,由大型语言模型驱动的专业代理协作完成不同的目标,整合了如 arXiv、Hugging Face、Python 和 LaTeX 等外部工具以优化结果。这一结构化的工作流程始于独立收集和分析相关研究论文,经过协作计划和数据准备,最终实现自动化实验和综合报告生成。论文中讨论了具体代理角色及其在这些阶段的贡献。
27
+
28
+ <p align="center">
29
+ <img src="../media/AgentLabWF.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
30
+ </p>
31
+
32
+ ## 🖥️ 安装
33
+
34
+
35
+ ### Python 虚拟环境选项
36
+
37
+ 1. **克隆 GitHub 仓库**:首先使用以下命令克隆仓库:
38
+ ```bash
39
+ git clone git@github.com:SamuelSchmidgall/AgentLaboratory.git
40
+ ```
41
+
42
+ 2. **设置并激活 Python 环境**
43
+ ```bash
44
+ python -m venv venv_agent_lab
45
+ ```
46
+ - 现在激活此环境:
47
+ ```bash
48
+ source venv_agent_lab/bin/activate
49
+ ```
50
+
51
+ 3. **安装所需库**
52
+ ```bash
53
+ pip install -r requirements.txt
54
+ ```
55
+
56
+ 4. **安装 pdflatex [可选]**
57
+ ```bash
58
+ sudo apt install pdflatex
59
+ ```
60
+ - 这使得代理能够编译 latex 源代码。
61
+ - **[重要]** 如果由于没有 sudo 权限而无法运行此步骤,可以通过将 `--compile_latex` 标志设置为 false 来关闭 pdf 编译:`--compile_latex=False`
62
+
63
+ 5. **现在运行 Agent Laboratory!**
64
+
65
+ `python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA"`
66
+
67
+ 或者,如果您没有安装 pdflatex
68
+
69
+ `python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA" --compile_latex=False`
70
+
71
+ -----
72
+
73
+ ## 提高研究成果的技巧
74
+
75
+ #### [技巧 #1] 📝 确保写下详尽的笔记! 📝
76
+
77
+ **写下详尽的笔记非常重要**,帮助您的代理理解您在项目中希望实现的目标,以及任何风格偏好。笔记可以包括您希望代理执行的任何实验、提供 API 密钥、希望包含的特定图表或图形,或任何您希望代理在进行研究时了解的内容。
78
+
79
+ 这也是您让代理知道**它可以访问的计算资源**的机会,例如 GPU(数量、类型、内存大小)、CPU(核心数量、类型)、存储限制和硬件规格。
80
+
81
+ 为了添加笔记,您必须修改 `ai_lab_repo.py` 中的 `task_notes_LLM` 结构。以下是我们的一些实验中使用的笔记示例。
82
+
83
+ ```python
84
+ task_notes_LLM = [
85
+ {"phases": ["plan formulation"],
86
+ "note": f"You should come up with a plan for TWO experiments."},
87
+
88
+ {"phases": ["plan formulation", "data preparation", "running experiments"],
89
+ "note": "Please use gpt-4o-mini for your experiments."},
90
+
91
+ {"phases": ["running experiments"],
92
+ "note": f"Use the following code to run inference with gpt-4o-mini: \nfrom openai import OpenAI\nos.environ['OPENAI_API_KEY'] = '{api_key}'\nclient = OpenAI()\ncompletion = client.chat.completions.create(\nmodel='gpt-4o-mini-2024-07-18', messages=messages)\nanswer = completion.choices[0].message.content\n"},
93
+
94
+ {"phases": ["running experiments"],
95
+ "note": f"You have access to only gpt-4o-mini using the OpenAI API, please use the following key {api_key} but do not use too many inferences. Do not use openai.ChatCompletion.create or any openai==0.28 commands. Instead use the provided inference code."},
96
+
97
+ {"phases": ["running experiments"],
98
+ "note": "I would recommend using a small dataset (approximately only 100 data points) to run experiments in order to save time. Do not use much more than this unless you have to or are running the final tests."},
99
+
100
+ {"phases": ["data preparation", "running experiments"],
101
+ "note": "You are running on a MacBook laptop. You can use 'mps' with PyTorch"},
102
+
103
+ {"phases": ["data preparation", "running experiments"],
104
+ "note": "Generate figures with very colorful and artistic design."},
105
+ ]
106
+ ```
107
+
108
+ --------
109
+
110
+ #### [技巧 #2] 🚀 使用更强大的模型通常会带来更好的研究 🚀
111
+
112
+ 在进行研究时,**模型的选择会显著影响结果的质量**。更强大的模型往往具有更高的准确性、更好的推理能力和更优秀的报告生成能力。如果计算资源允许,优先使用先进的模型,如 o1-(mini/preview) 或类似的最先进大型语言模型。
113
+
114
+ 然而,**在性能和成本效益之间取得平衡也很重要**。虽然强大的模型可能会产生更好的结果,但它们通常更昂贵且运行时间更长。考虑选择性地使用它们,例如用于关键实验或最终分析,同时在迭代任务或初步原型设计中依赖较小、更高效的模型。
115
+
116
+ 当资源有限时,**通过在您的特定数据集上微调较小的模型或将预训练模型与特定任务的提示相结合来优化,以实现性能与计算效率之间的理想平衡**。
117
+
118
+ -----
119
+
120
+ #### [技巧 #3] ✅ 您可以从检查点加载之前的保存 ✅
121
+
122
+ **如果您丢失了进度、互联网连接中断或子任务失败,您始终可以从先前的状态加载。** 您的所有进度默认保存在 `state_saves` 变量中,该变量存储每个单独的检查点。只需在运行 `ai_lab_repo.py` 时传递以下参数
123
+
124
+ `python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA" --llm-backend "o1-mini" --load-existing True --load-existing-path "save_states/LOAD_PATH"`
125
+
126
+ -----
127
+
128
+ #### [技巧 #4] 🈯 如果您使用非英语语言运行 🈲
129
+
130
+ 如果您使用非英语语言运行 Agent Laboratory,没问题,只需确保向代理提供一个语言标志,以便用您喜欢的语言进行研究。请注意,我们尚未广泛研究使用其他语言运行 Agent Laboratory,因此请务必报告您遇到的任何问题。
131
+
132
+ 例如,如果您使用中文运行:
133
+
134
+ `python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA (in your language)" --llm-backend "o1-mini" --language "中文"`
135
+
136
+ ----
137
+
138
+ #### [技巧 #5] 🌟 还有很大的改进空间 🌟
139
+
140
+ 这个代码库还有很大的改进空间,因此如果您进行了更改并希望帮助社区,请随时分享您所做的更改!我们希望这个工具对您有帮助!
141
+
142
+ ## 参考文献 / Bibtex
143
+
144
+ ```bibtex
145
+ @preprint{schmidgall2025AgentLaboratory,
146
+ title={Agent Laboratory: Using LLM Agents as Research Assistants},
147
+ author={Schmidgall, Samuel and Su, Yusheng and Wang, Ze and Sun, Ximeng and Wu, Jialian and Yu, Xiadong and Liu, Jiang and Liu, Zicheng and Barsoum, Emad},
148
+ year={2025}
149
+ }
150
+ ```
readme/README-farsi.md ADDED
@@ -0,0 +1,160 @@
1
+ # آزمایشگاه ایجینت ها: استفاده از نمایندگان مدل‌های زبانی بزرگ به عنوان دستیار برای محققان
2
+
3
+ <p align="center">
4
+ <img src="../media/AgentLabLogo.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
5
+ </p>
6
+
7
+
8
+ <p align="center">
9
+ 【<a href="../README.md">English</a> | <a href="../readme/README-chinese.md">中文</a> | <a href="../readme/README-japanese.md">日本語</a> | <a href="../readme/README-korean.md">한국어</a> | <a href="../readme/README-filipino.md">Filipino</a> | <a href="../readme/README-french.md">Français</a> | <a href="../readme/README-slovak.md">Slovenčina</a> | <a href="../readme/README-portugese.md">Português</a> | <a href="../readme/README-spanish.md">Español</a> | <a href="../readme/README-turkish.md">Türkçe</a> | <a href="../readme/README-hindi.md">हिंदी</a> | <a href="../readme/README-bengali.md">বাংলা</a> | <a href="../readme/README-vietnamese.md">Tiếng Việt</a> | <a href="../readme/README-russian.md">Русский</a> | <a href="../readme/README-arabic.md">العربية</a> | فارسی | <a href="../readme/README-italian.md">Italiano</a>】
10
+ </p>
11
+
12
+ <p align="center">
13
+ 【🌐 <a href="https://agentlaboratory.github.io/">Website</a> | 💻 <a href="https://github.com/SamuelSchmidgall/AgentLaboratory">Software</a> | 🎥 <a href="https://agentlaboratory.github.io/#youtube-video">Video</a> | 📚 <a href="https://agentlaboratory.github.io/#examples-goto">Example Paper</a> | 📰 <a href="https://agentlaboratory.github.io/#citation-ref">Citation</a>】
14
+ </p>
15
+
16
+ ## 📖 نمای کلی
17
+
18
+ - **آزمایشگاه ایجینت ها** یک سیستم کاملا اتوماتیک برای کارهای تحقیقاتی است که به منظور کمک به **شما** به عنوان پژوهشگر انسانی برای **اجرای ایده‌های تحقیقاتی خود** طراحی شده است. آزمایشگاه ایجینت ها شامل نمایندگان تخصصی است که توسط مدل‌های زبان بزرگ هدایت می‌شوند تا در تمام مراحل تحقیق از انجام مطالعه و تدوین برنامه‌ها تا اجرای آزمایش‌ها و نوشتن گزارش‌های جامع از شما حمایت کنند.
19
+ - این سیستم برای جایگزینی خلاقیت شما طراحی نشده است، بلکه برای تکمیل آن است، به شما این امکان را می‌دهد که بر ایده‌پردازی و تفکر انتقادی تمرکز کنید در حالی که وظایف تکراری و زمان‌بر مانند کدنویسی و مستندسازی خودکار می‌شوند. با پذیرش سطوح مختلف منابع محاسباتی و مشارکت انسانی، آزمایشگاه ایجنت ها هدف دارد تا کشف علمی را تسریع کرده و بهره‌وری تحقیقاتی شما را بهینه کند.
20
+
21
+ <p align="center">
22
+ <img src="../media/AgentLab.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
23
+ </p>
24
+
25
+ ### 🔬 آزمایشگاه ایجنت ها چگونه کار می‌کند؟
26
+
27
+ - آزمایشگاه ایجنت ها شامل سه مرحله اصلی است که به طور سیستماتیک فرآیند تحقیق را هدایت می‌کنند: (1) مرور ادبیات، (2) آزمایش‌گری، و (3) نوشتن گزارش. در هر مرحله، عوامل تخصصی هدایت‌شده توسط مدل‌های زبان بزرگ با هم همکاری می‌کنند تا اهداف متمایز را محقق کنند و ابزارهای خارجی مانند arXiv، Hugging Face، Python، و LaTeX را برای بهینه‌سازی نتایج ادغام می‌کنند. این جریان کاری ساختاریافته با جمع‌آوری و تحلیل مستقل مقالات تحقیقاتی مرتبط آغاز می‌شود، از طریق برنامه‌ریزی مشارکتی و آماده‌سازی داده‌ها پیش می‌رود، و به آزمایش‌گری خودکار و تولید گزارش جامع منتهی می‌شود. جزئیات نقش‌های خاص عوامل و مشارکت‌های آن‌ها در این مراحل در مقاله مورد بحث قرار گرفته است.
28
+
29
+ <p align="center">
30
+ <img src="../media/AgentLabWF.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
31
+ </p>
32
+
33
+ ## 🖥️ نصب
34
+
35
+ ### گزینه محیط مجازی پایتون (venv)
36
+
37
+ 1. **کلون کردن مخزن گیت‌هاب**: با استفاده از دستور زیر، مخزن را کلون کنید:
38
+ ```bash
39
+ git clone git@github.com:SamuelSchmidgall/AgentLaboratory.git
40
+ ```
41
+
42
+ 2. **تنظیم و فعال‌سازی محیط پایتون**
43
+ ```bash
44
+ python -m venv venv_agent_lab
45
+ ```
46
+
47
+ - این محیط را فعال کنید:
48
+ ```bash
49
+ source venv_agent_lab/bin/activate
50
+ ```
51
+
52
+ 3. **نصب کتابخانه‌های مورد نیاز**
53
+ ```bash
54
+ pip install -r requirements.txt
55
+ ```
56
+
57
+ 4. **نصب pdflatex [اختیاری]**
58
+ ```bash
59
+ sudo apt install pdflatex
60
+ ```
61
+
62
+ - این امکان را می‌دهد تا منبع LaTeX توسط عوامل کامپایل شود.
63
+ - **[مهم]** اگر به دلیل نداشتن دسترسی sudo نمی‌توانید این مرحله را اجرا کنید، می‌توانید کامپایل PDF را با اجرای آزمایشگاه ایجنت ها و تنظیم فلگ --compile_latex به false غیرفعال کنید:
64
+ ```
65
+ --compile_latex=False
66
+ ```
67
+
68
+ 5. **اکنون آزمایشگاه ایجنت ها را اجرا کنید!**
69
+ ```bash
70
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA"
71
+ ```
72
+
73
+ یا اگر pdflatex نصب نکرده‌اید:
74
+ ```bash
75
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA" --compile_latex=False
76
+ ```
77
+
78
+ -----
79
+ ## نکات برای نتایج بهتر تحقیق
80
+
81
+ #### [نکته #1] 📝 حتماً یادداشت‌های گسترده‌ای بنویسید! 📝
82
+
83
+ **نوشتن یادداشت‌های دقیق مهم است** تا به ایجنت ها شما در درک آنچه می‌خواهید در پروژه‌تان انجام دهید و همچنین هرگونه ترجیحات سبک کمک کند. یادداشت‌ها می‌توانند شامل هر آزمایشی باشند که می‌خواهید عوامل انجام دهند، ارائه کلیدهای API، نمودارها یا شکل‌های خاصی که می‌خواهید گنجانده شوند، یا هر چیزی که می‌خواهید ایجنت ها هنگام انجام تحقیق بداند.
84
+
85
+ این همچنین فرصت شماست تا به ایجنت ها اطلاع دهید **به چه منابع محاسباتی دسترسی دارد**، مثلاً GPUها (تعداد، نوع GPU، میزان GB)، CPUها (تعداد هسته، نوع CPUها)، محدودیت‌های ذخیره‌سازی، و مشخصات سخت‌افزاری.
86
+
87
+ برای افزودن یادداشت‌ها، باید ساختار task_notes_LLM را در داخل ai_lab_repo.py تغییر دهید. در زیر نمونه‌ای از مجموعه یادداشت‌هایی که برای برخی از آزمایش‌های ما استفاده شده است ارائه شده است.
88
+
89
+ ```python
90
+ task_notes_LLM = [
91
+ {"phases": ["plan formulation"],
92
+ "note": f"You should come up with a plan for TWO experiments."},
93
+
94
+ {"phases": ["plan formulation", "data preparation", "running experiments"],
95
+ "note": "Please use gpt-4o-mini for your experiments."},
96
+
97
+ {"phases": ["running experiments"],
98
+ "note": f"Use the following code to run inference with gpt-4o-mini: \nfrom openai import OpenAI\nos.environ['OPENAI_API_KEY'] = '{api_key}'\nclient = OpenAI()\ncompletion = client.chat.completions.create(\nmodel='gpt-4o-mini-2024-07-18', messages=messages)\nanswer = completion.choices[0].message.content\n"},
99
+
100
+ {"phases": ["running experiments"],
101
+ "note": f"You have access to only gpt-4o-mini using the OpenAI API, please use the following key {api_key} but do not use too many inferences. Do not use openai.ChatCompletion.create or any openai==0.28 commands. Instead use the provided inference code."},
102
+
103
+ {"phases": ["running experiments"],
104
+ "note": "I would recommend using a small dataset (approximately only 100 data points) to run experiments in order to save time. Do not use much more than this unless you have to or are running the final tests."},
105
+
106
+ {"phases": ["data preparation", "running experiments"],
107
+ "note": "You are running on a MacBook laptop. You can use 'mps' with PyTorch"},
108
+
109
+ {"phases": ["data preparation", "running experiments"],
110
+ "note": "Generate figures with very colorful and artistic design."},
111
+ ]
112
+ ```
113
+
114
+ --------
115
+
116
+ #### [نکته #2] 🚀 استفاده از مدل‌های قدرتمندتر به طور کلی منجر به تحقیقات بهتر می‌شود 🚀
117
+
118
+ هنگام انجام تحقیقات، **انتخاب مدل می‌تواند به طور قابل توجهی بر کیفیت نتایج تأثیر بگذارد**. مدل‌های قدرتمندتر معمولاً دقت بالاتری دارند، قابلیت‌های استدلال بهتری ارائه می‌دهند و گزارش‌های بهتری تولید می‌کنند. اگر منابع محاسباتی اجازه می‌دهد، استفاده از مدل‌های پیشرفته مانند o1-(mini/preview) یا مدل‌های زبان بزرگ مشابه پیشرفته را در اولویت قرار دهید.
119
+
120
+ با این حال، **مهم است که تعادل بین عملکرد و هزینه را رعایت کنید**. در حالی که مدل‌های قدرتمند ممکن است نتایج بهتری ارائه دهند، اغلب ه��ینه‌بر و زمان‌بر هستند. در نظر بگیرید که از آن‌ها به صورت انتخابی استفاده کنید — به عنوان مثال، برای آزمایش‌های کلیدی یا تحلیل‌های نهایی — در حالی که برای وظایف تکراری یا نمونه‌سازی اولیه از مدل‌های کوچک‌تر و کارآمدتر استفاده کنید.
121
+
122
+ وقتی منابع محدود هستند، **با تنظیم دقیق مدل‌های کوچک‌تر بر روی مجموعه داده‌های خاص خود یا ترکیب مدل‌های پیش‌آموزش‌دیده با پرامپت‌های خاص وظیفه‌ای بهینه‌سازی کنید** تا تعادل مطلوب بین عملکرد و کارایی محاسباتی را به دست آورید.
123
+
124
+ -----
125
+
126
+ #### [نکته #3] ✅ می‌توانید ذخیره‌های قبلی را از نقاط بازگشت بارگذاری کنید ✅
127
+
128
+ **اگر پیشرفت خود را از دست دادید، اتصال اینترنت قطع شد، یا یک زیروظیفه شکست خورد، همیشه می‌توانید از وضعیت قبلی بارگذاری کنید.** تمام پیشرفت‌های شما به طور پیش‌فرض در متغیر state_saves ذخیره می‌شوند که هر نقطه بازگشت را ذخیره می‌کند. فقط هنگام اجرای ai_lab_repo.py از آرگومان‌های زیر استفاده کنید:
129
+
130
+ ```bash
131
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA" --llm-backend "o1-mini" --load-existing True --load-existing-path "save_states/LOAD_PATH"
132
+ ```
133
+
134
+ -----
135
+
136
+ #### [نکته #4] 🈯 اگر به زبانی غیر از انگلیسی اجرا می‌کنید 🈲
137
+
138
+ اگر آزمایشگاه ایحنت ها را به زبانی غیر از انگلیسی اجرا می‌کنید، مشکلی نیست، فقط مطمئن شوید که پرچم زبان را به عوامل ارائه دهید تا به زبان مورد نظر شما تحقیق انجام دهند. توجه داشته باشید که ما به طور گسترده‌ای اجرای آزمایشگاه ایجنت ها را به زبان‌های دیگر مطالعه نکرده‌ایم، بنابراین حتماً هر مشکلی که با آن مواجه شدید را گزارش دهید.
139
+
140
+ برای مثال، اگر به زبان چینی اجرا می‌کنید:
141
+
142
+ ```bash
143
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA (in your language)" --llm-backend "o1-mini" --language "中文"
144
+ ```
145
+
146
+ ----
147
+
148
+ #### [نکته #5] 🌟 جای پیشرفت زیادی وجود دارد 🌟
149
+
150
+ جای پیشرفت زیادی برای بهبود این کدبیس وجود دارد، بنابراین اگر در نهایت تغییراتی ایجاد کردید و می‌خواهید به جامعه کمک کنید، لطفاً تغییراتی که ایجاد کرده‌اید را به اشتراک بگذارید! امیدواریم این ابزار به شما کمک کند!
151
+
152
+ ## مراجع / Bibtex
153
+
154
+ ```bibtex
155
+ @preprint{schmidgall2025AgentLaboratory,
156
+ title={Agent Laboratory: Using LLM Agents as Research Assistants},
157
+ author={Schmidgall, Samuel and Su, Yusheng and Wang, Ze and Sun, Ximeng and Wu, Jialian and Yu, Xiadong and Liu, Jiang and Liu, Zicheng and Barsoum, Emad},
158
+ year={2025}
159
+ }
160
+ ```
readme/README-filipino.md ADDED
@@ -0,0 +1,157 @@
1
+ # Agent Laboratory: Paggamit ng LLM Agents bilang mga Tagapag-Asistang Pang-research
2
+
3
+ <p align="center">
4
+ <img src="../media/AgentLabLogo.png" alt="Demonstrasyon ng daloy ng AgentClinic" style="width: 99%;">
5
+ </p>
6
+
7
+ <p align="center">
8
+ 【<a href="../README.md">English</a> | <a href="../readme/README-chinese.md">中文</a> | <a href="../readme/README-japanese.md">日本語</a> | <a href="../readme/README-korean.md">한국어</a> | Filipino | <a href="../readme/README-french.md">Français</a> | <a href="../readme/README-slovak.md">Slovenčina</a> | <a href="../readme/README-portugese.md">Português</a> | <a href="../readme/README-spanish.md">Español</a> | <a href="../readme/README-turkish.md">Türkçe</a> | <a href="../readme/README-hindi.md">हिंदी</a> | <a href="../readme/README-bengali.md">বাংলা</a> | <a href="../readme/README-vietnamese.md">Tiếng Việt</a> | <a href="../readme/README-russian.md">Русский</a> | <a href="../readme/README-arabic.md">العربية</a> | <a href="../readme/README-farsi.md">فارسی</a> | <a href="../readme/README-italian.md">Italiano</a>】
9
+ </p>
10
+
11
+ <p align="center">
12
+ 【🌐 <a href="https://agentlaboratory.github.io/">Website</a> | 💻 <a href="https://github.com/SamuelSchmidgall/AgentLaboratory">Software</a> | 🎥 <a href="https://agentlaboratory.github.io/#youtube-video">Video</a> | 📚 <a href="https://agentlaboratory.github.io/#examples-goto">Example Paper</a> | 📰 <a href="https://agentlaboratory.github.io/#citation-ref">Citation</a>】
13
+ </p>
14
+
15
+ ## 📖 Pangkalahatang-ideya
16
+
17
+ - **Agent Laboratory** ay isang end-to-end na autonomous na workflow sa pananaliksik na nilalayong tulungan **ikaw** bilang isang human researcher sa **pagpapatupad ng iyong mga ideya sa pananaliksik**. Binubuo ang Agent Laboratory ng mga espesyalistang ahente na pinapagana ng malalaking modelo ng wika upang suportahan ka sa buong workflow ng pananaliksik—mula sa pagsasagawa ng mga pagsusuri sa literatura at pagbuo ng mga plano hanggang sa pagpapatupad ng mga eksperimento at pagsulat ng komprehensibong mga ulat.
18
+ - Ang sistemang ito ay hindi dinisenyo upang palitan ang iyong pagkamalikhain kundi upang kumpletuhin ito, na nagbibigay-daan sa iyo na magpokus sa ideasyon at kritikal na pag-iisip habang ina-automate ang mga paulit-ulit at matagal na gawain tulad ng pag-cocode at dokumentasyon. Sa pamamagitan ng pag-aakma sa iba't ibang antas ng computational na mga mapagkukunan at partisipasyon ng tao, layunin ng Agent Laboratory na pabilisin ang siyentipikong pagtuklas at i-optimize ang iyong produktibidad sa pananaliksik.
19
+
20
+ <p align="center">
21
+ <img src="../media/AgentLab.png" alt="Demonstrasyon ng daloy ng AgentClinic" style="width: 99%;">
22
+ </p>
23
+
24
+ ### 🔬 Paano gumagana ang Agent Laboratory?
25
+
26
+ - Binubuo ang Agent Laboratory ng tatlong pangunahing yugto na sistematikong ginagabayan ang proseso ng pananaliksik: (1) Pagsusuri ng Literatura, (2) Eksperimentasyon, at (3) Pagsulat ng Ulat. Sa bawat yugto, ang mga espesyalistang ahente na pinapagana ng LLMs ay nagtutulungan upang makamit ang mga natatanging layunin, na nag-iintegrate ng mga panlabas na kagamitan tulad ng arXiv, Hugging Face, Python, at LaTeX upang i-optimize ang mga resulta. Nagsisimula ang estrukturadong workflow na ito sa malayang koleksyon at pagsusuri ng mga kaugnay na papel sa pananaliksik, sumusulong sa pamamagitan ng kolaboratibong pagpaplano at paghahanda ng datos, at nagreresulta sa automated na eksperimento at komprehensibong paggawa ng ulat. Ang mga detalye tungkol sa mga tiyak na papel ng ahente at kanilang mga kontribusyon sa mga yugtong ito ay tinalakay sa papel.
27
+
28
+ <p align="center">
29
+ <img src="../media/AgentLabWF.png" alt="Demonstrasyon ng daloy ng AgentClinic" style="width: 99%;">
30
+ </p>
31
+
32
+ ## 🖥️ Pag-install
33
+
34
+ ### Python venv na opsyon
35
+
36
+ 1. **I-clone ang GitHub Repository**: Magsimula sa pamamagitan ng pag-clone ng repository gamit ang utos:
37
+ ```bash
38
+ git clone git@github.com:SamuelSchmidgall/AgentLaboratory.git
39
+ ```
40
+
41
+ 2. **I-set up at I-activate ang Python Environment**
42
+ ```bash
43
+ python -m venv venv_agent_lab
44
+ ```
45
+ - Ngayon i-activate ang environment na ito:
46
+ ```bash
47
+ source venv_agent_lab/bin/activate
48
+ ```
49
+
50
+ 3. **I-install ang mga kinakailangang library**
51
+ ```bash
52
+ pip install -r requirements.txt
53
+ ```
54
+
55
+ 4. **I-install ang pdflatex [OPTIONAL]**
56
+ ```bash
57
+ sudo apt install pdflatex
58
+ ```
59
+ - Pinapayagan nitong ma-compile ng mga ahente ang latex source.
60
+ - **[MAHALAGA]** Kung hindi maisagawa ang hakbang na ito dahil sa kawalan ng sudo access, maaaring i-off ang pdf compiling sa pamamagitan ng pagpapatakbo ng Agent Laboratory gamit ang pag-set ng `--compile_latex` flag sa false:
61
+ ```bash
62
+ --compile_latex=False
63
+ ```
64
+
65
+ 5. **Ngayon patakbuhin ang Agent Laboratory!**
66
+ ```bash
67
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA"
68
+ ```
69
+ o, kung wala kang naka-install na pdflatex
70
+ ```bash
71
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA" --compile_latex=False
72
+ ```
73
+
74
+ -----
75
+
76
+ ## Mga Tip para sa Mas Mabuting Resulta ng Pananaliksik
77
+
78
+ #### [Tip #1] 📝 Tiyaking sumulat ng malawak na mga tala! 📝
79
+
80
+ **Mahalaga ang pagsusulat ng malawak na mga tala** upang matulungan ang iyong ahente na maunawaan kung ano ang nais mong makamit sa iyong proyekto, pati na rin ang anumang mga paboritong estilo. Maaaring kabilang sa mga tala ang anumang mga eksperimento na nais mong isagawa ng mga ahente, pagbibigay ng mga API key, tiyak na mga plot o figure na nais mong isama, o anumang nais mong malaman ng ahente kapag nagsasagawa ng pananaliksik.
81
+
82
+ Ito rin ang iyong pagkakataon upang ipaalam sa ahente **kung anong mga compute resources ang mayroon ito**, halimbawa, GPUs (ilan, anong uri ng GPU, ilang GBs), CPUs (ilang cores, anong uri ng CPUs), mga limitasyon sa storage, at mga specs ng hardware.
83
+
84
+ Upang magdagdag ng mga tala, kailangan mong baguhin ang `task_notes_LLM` na istraktura sa loob ng `ai_lab_repo.py`. Ibinigay sa ibaba ang isang halimbawa ng mga tala na ginamit para sa ilan sa aming mga eksperimento.
85
+
86
+ ```python
87
+ task_notes_LLM = [
88
+ {"phases": ["plan formulation"],
89
+ "note": f"You should come up with a plan for TWO experiments."},
90
+
91
+ {"phases": ["plan formulation", "data preparation", "running experiments"],
92
+ "note": "Please use gpt-4o-mini for your experiments."},
93
+
94
+ {"phases": ["running experiments"],
95
+ "note": f"Use the following code to run inference with gpt-4o-mini: \nfrom openai import OpenAI\nos.environ['OPENAI_API_KEY'] = '{api_key}'\nclient = OpenAI()\ncompletion = client.chat.completions.create(\nmodel='gpt-4o-mini-2024-07-18', messages=messages)\nanswer = completion.choices[0].message.content\n"},
96
+
97
+ {"phases": ["running experiments"],
98
+ "note": f"You have access to only gpt-4o-mini using the OpenAI API, please use the following key {api_key} but do not use too many inferences. Do not use openai.ChatCompletion.create or any openai==0.28 commands. Instead use the provided inference code."},
99
+
100
+ {"phases": ["running experiments"],
101
+ "note": "I would recommend using a small dataset (approximately only 100 data points) to run experiments in order to save time. Do not use much more than this unless you have to or are running the final tests."},
102
+
103
+ {"phases": ["data preparation", "running experiments"],
104
+ "note": "You are running on a MacBook laptop. You can use 'mps' with PyTorch"},
105
+
106
+ {"phases": ["data preparation", "running experiments"],
107
+ "note": "Generate figures with very colorful and artistic design."},
108
+ ]
109
+ ```
110
+
111
+ --------
112
+
113
+ #### [Tip #2] 🚀 Ang paggamit ng mas malalakas na mga modelo ay karaniwang nagdudulot ng mas magagandang pananaliksik 🚀
114
+
115
+ Kapag nagsasagawa ng pananaliksik, **ang pagpili ng modelo ay maaaring malaki ang epekto sa kalidad ng mga resulta**. Ang mas malalakas na mga modelo ay karaniwang may mas mataas na katumpakan, mas mahusay na kakayahan sa pag-iisip, at mas magaling na paggawa ng ulat. Kung pinapayagan ng mga computational na mapagkukunan, bigyang prioridad ang paggamit ng mga advanced na modelo tulad ng o1-(mini/preview) o katulad na mga state-of-the-art na malalaking modelo ng wika.
116
+
117
+ Gayunpaman, **mahalagang balansehin ang pagganap at pagiging cost-effective**. Habang ang mga malalakas na modelo ay maaaring magbigay ng mas magagandang resulta, madalas silang mas mahal at mas matagal patakbuhin. Isaalang-alang ang paggamit ng mga ito nang selektibo—halimbawa, para sa mga pangunahing eksperimento o panghuling pagsusuri—habang umaasa sa mas maliit, mas mahusay na mga modelo para sa mga iteratibong gawain o paunang prototyping.
118
+
119
+ Kapag limitado ang mga mapagkukunan, **i-optimize sa pamamagitan ng fine-tuning ng mas maliliit na mga modelo** sa iyong partikular na dataset o pagsasama ng mga pre-trained na modelo sa mga task-specific na prompt upang makamit ang nais na balanse sa pagitan ng pagganap at computational na kahusayan.
120
+
121
+ -----
122
+
123
+ #### [Tip #3] ✅ Maaari kang mag-load ng mga naunang save mula sa mga checkpoint ✅
124
+
125
+ **Kung mawalan ka ng progreso, koneksyon sa internet, o kung mabigo ang isang subtask, maaari mong laging i-load mula sa isang naunang estado.** Ang lahat ng iyong progreso ay naka-save bilang default sa `state_saves` variable, na nag-iimbak ng bawat indibidwal na checkpoint. Ibigay lamang ang mga sumusunod na argumento kapag nagpapatakbo ng `ai_lab_repo.py`:
126
+
127
+ ```bash
128
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA" --llm-backend "o1-mini" --load-existing True --load-existing-path "save_states/LOAD_PATH"
129
+ ```
130
+
131
+ -----
132
+
133
+ #### [Tip #4] 🈯 Kung ikaw ay nagpapatakbo sa isang wika maliban sa Ingles 🈲
134
+
135
+ Kung nagpapatakbo ka ng Agent Laboratory sa isang wika maliban sa Ingles, walang problema, siguraduhing magbigay ng language flag sa mga ahente upang magsagawa ng pananaliksik sa iyong nais na wika. Tandaan na hindi pa namin lubusang pinag-aralan ang pagpapatakbo ng Agent Laboratory sa ibang mga wika, kaya siguraduhing iulat ang anumang mga problemang iyong makaharap.
136
+
137
+ Halimbawa, kung nagpapatakbo ka sa Chinese:
138
+
139
+ ```bash
140
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA (in your language)" --llm-backend "o1-mini" --language "中文"
141
+ ```
142
+
143
+ ----
144
+
145
+ #### [Tip #5] 🌟 Mayroong maraming puwang para sa pagpapabuti 🌟
146
+
147
+ Mayroong maraming puwang upang mapabuti ang codebase na ito, kaya kung ikaw ay gagawa ng mga pagbabago at nais makatulong sa komunidad, huwag mag-atubiling ibahagi ang mga pagbabagong iyong ginawa! Inaasahan naming makakatulong ang tool na ito sa iyo!
148
+
149
+ ## Reference / Bibtex
150
+
151
+ ```bibtex
152
+ @preprint{schmidgall2025AgentLaboratory,
153
+ title={Agent Laboratory: Using LLM Agents as Research Assistants},
154
+ author={Schmidgall, Samuel and Su, Yusheng and Wang, Ze and Sun, Ximeng and Wu, Jialian and Yu, Xiadong and Liu, Jiang and Liu, Zicheng and Barsoum, Emad},
155
+ year={2025}
156
+ }
157
+ ```
readme/README-french.md ADDED
@@ -0,0 +1,158 @@
1
+ # Laboratoire d'Agent : Utilisation des agents LLM comme assistants de recherche
2
+
3
+ <p align="center">
4
+ <img src="../media/AgentLabLogo.png" alt="Démonstration du flux de AgentClinic" style="width: 99%;">
5
+ </p>
6
+
7
+
8
+ <p align="center">
9
+ 【<a href="../README.md">English</a> | <a href="../readme/README-chinese.md">中文</a> | <a href="../readme/README-japanese.md">日本語</a> | <a href="../readme/README-korean.md">한국어</a> | <a href="../readme/README-filipino.md">Filipino</a> | Français | <a href="../readme/README-slovak.md">Slovenčina</a> | <a href="../readme/README-portugese.md">Português</a> | <a href="../readme/README-spanish.md">Español</a> | <a href="../readme/README-turkish.md">Türkçe</a> | <a href="../readme/README-hindi.md">हिंदी</a> | <a href="../readme/README-bengali.md">বাংলা</a> | <a href="../readme/README-vietnamese.md">Tiếng Việt</a> | <a href="../readme/README-russian.md">Русский</a> | <a href="../readme/README-arabic.md">العربية</a> | <a href="../readme/README-farsi.md">فارسی</a> | <a href="../readme/README-italian.md">Italiano</a>】
10
+ </p>
11
+
12
+ <p align="center">
13
+ 【🌐 <a href="https://agentlaboratory.github.io/">Site Web</a> | 💻 <a href="https://github.com/SamuelSchmidgall/AgentLaboratory">Logiciel</a> | 🎥 <a href="https://agentlaboratory.github.io/#youtube-video">Vidéo</a> | 📚 <a href="https://agentlaboratory.github.io/#examples-goto">Article Exemple</a> | 📰 <a href="https://agentlaboratory.github.io/#citation-ref">Citation</a>】
14
+ </p>
15
+
16
+ ## 📖 Aperçu
17
+
18
+ - **Laboratoire d'Agent** est un flux de travail de recherche autonome de bout en bout destiné à vous assister en tant que chercheur humain dans **la mise en œuvre de vos idées de recherche**. Le Laboratoire d'Agent est composé d'agents spécialisés alimentés par de grands modèles de langage pour vous soutenir tout au long du processus de recherche—de la réalisation des revues de littérature et de la formulation de plans à l'exécution des expériences et à la rédaction de rapports complets.
19
+ - Ce système n'est pas conçu pour remplacer votre créativité, mais pour la compléter, vous permettant de vous concentrer sur l’idéation et la pensée critique tout en automatisant les tâches répétitives et chronophages telles que la programmation et la documentation. En s'adaptant à différents niveaux de ressources informatiques et d'implication humaine, le Laboratoire d'Agent vise à accélérer la découverte scientifique et à optimiser votre productivité en recherche.
20
+
21
+ <p align="center">
22
+ <img src="../media/AgentLab.png" alt="Démonstration du flux de AgentClinic" style="width: 99%;">
23
+ </p>
24
+
25
+ ### 🔬 Comment fonctionne le Laboratoire d'Agent ?
26
+
27
+ - Le Laboratoire d'Agent se compose de trois phases principales qui guident systématiquement le processus de recherche : (1) Revue de littérature, (2) Expérimentation et (3) Rédaction de rapports. Pendant chaque phase, des agents spécialisés alimentés par des LLM collaborent pour atteindre des objectifs distincts, en intégrant des outils externes tels qu'arXiv, Hugging Face, Python et LaTeX afin d'optimiser les résultats. Ce flux de travail structuré commence par la collecte et l'analyse indépendantes des articles de recherche pertinents, progresse par la planification collaborative et la préparation des données, et aboutit à l'expérimentation automatisée et à la génération de rapports complets. Les détails sur les rôles spécifiques des agents et leurs contributions au cours de ces phases sont abordés dans l'article.
28
+
29
+ <p align="center">
30
+ <img src="../media/AgentLabWF.png" alt="Démonstration du flux de AgentClinic" style="width: 99%;">
31
+ </p>
32
+
33
+ ## 🖥️ Installation
34
+
35
+ ### Option d'environnement virtuel Python
36
+
37
+ 1. **Cloner le dépôt GitHub** : Commencez par cloner le dépôt en utilisant la commande :
38
+ ```bash
39
+ git clone git@github.com:SamuelSchmidgall/AgentLaboratory.git
40
+ ```
41
+
42
+ 2. **Configurer et activer l'environnement Python**
43
+ ```bash
44
+ python -m venv venv_agent_lab
45
+ ```
46
+
47
+ - Activez maintenant cet environnement :
48
+ ```bash
49
+ source venv_agent_lab/bin/activate
50
+ ```
51
+
52
+ 3. **Installer les bibliothèques requises**
53
+ ```bash
54
+ pip install -r requirements.txt
55
+ ```
56
+
57
+ 4. **Installer pdflatex [OPTIONNEL]**
58
+ ```bash
59
+ sudo apt install pdflatex
60
+ ```
61
+
62
+ - Cela permet aux agents de compiler le code source LaTeX.
63
+ - **[IMPORTANT]** Si cette étape ne peut pas être exécutée en raison de l'absence d'accès sudo, la compilation PDF peut être désactivée en exécutant le Laboratoire d'Agent avec le drapeau `--compile_latex` défini sur `false` : `--compile_latex=False`
64
+
65
+ 5. **Lancez maintenant le Laboratoire d'Agent !**
66
+ ```bash
67
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "VOTRE IDÉE DE RECHERCHE"
68
+ ```
69
+
70
+ ou, si vous n'avez pas installé pdflatex
71
+
72
+ ```bash
73
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "VOTRE IDÉE DE RECHERCHE" --compile_latex=False
74
+ ```
75
+
76
+ -----
77
+ ## Conseils pour de meilleurs résultats de recherche
78
+
79
+ #### [Conseil n°1] 📝 Assurez-vous de prendre des notes détaillées ! 📝
80
+
81
+ **Prendre des notes détaillées est important** pour aider votre agent à comprendre ce que vous cherchez à accomplir dans votre projet, ainsi que toute préférence de style. Les notes peuvent inclure les expériences que vous souhaitez que les agents réalisent, la fourniture de clés API, certains graphiques ou figures que vous souhaitez inclure, ou tout ce que vous souhaitez que l'agent sache lors de la réalisation de recherches.
82
+
83
+ C'est également votre opportunité d'informer l'agent **quelles ressources informatiques il peut utiliser**, par exemple les GPU (combien, quel type de GPU, combien de Go), les CPU (combien de cœurs, quel type de CPU), les limitations de stockage et les spécifications matérielles.
84
+
85
+ Pour ajouter des notes, vous devez modifier la structure `task_notes_LLM` à l'intérieur de `ai_lab_repo.py`. Ci-dessous, un exemple de jeu de notes utilisé pour certaines de nos expériences.
86
+
87
+ ```python
88
+ task_notes_LLM = [
89
+ {"phases": ["plan formulation"],
90
+ "note": f"You should come up with a plan for TWO experiments."},
91
+
92
+ {"phases": ["plan formulation", "data preparation", "running experiments"],
93
+ "note": "Please use gpt-4o-mini for your experiments."},
94
+
95
+ {"phases": ["running experiments"],
96
+ "note": f"Use the following code to inference gpt-4o-mini: \nfrom openai import OpenAI\nos.environ["OPENAI_API_KEY"] = "{api_key}"\nclient = OpenAI()\ncompletion = client.chat.completions.create(\nmodel="gpt-4o-mini-2024-07-18", messages=messages)\nanswer = completion.choices[0].message.content\n"},
97
+
98
+ {"phases": ["running experiments"],
99
+ "note": f"You have access to only gpt-4o-mini using the OpenAI API, please use the following key {api_key} but do not use too many inferences. Do not use openai.ChatCompletion.create or any openai==0.28 commands. Instead use the provided inference code."},
100
+
101
+ {"phases": ["running experiments"],
102
+ "note": "I would recommend using a small dataset (approximately only 100 data points) to run experiments in order to save time. Do not use much more than this unless you have to or are running the final tests."},
103
+
104
+ {"phases": ["data preparation", "running experiments"],
105
+ "note": "You are running on a MacBook laptop. You can use 'mps' with PyTorch"},
106
+
107
+ {"phases": ["data preparation", "running experiments"],
108
+ "note": "Generate figures with very colorful and artistic design."},
109
+ ]
110
+ ```
111
+
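+ Pour plus de lisibilité, l'extrait d'inférence intégré dans la troisième note ci-dessus correspond approximativement au code Python autonome suivant (simple esquisse : `api_key` et `messages` sont des variables fournies par votre propre code) :
+
+ ```python
+ import os
+ from openai import OpenAI
+
+ # api_key et messages sont fournis par le code d'expérimentation appelant.
+ os.environ["OPENAI_API_KEY"] = api_key
+ client = OpenAI()
+ completion = client.chat.completions.create(
+     model="gpt-4o-mini-2024-07-18",
+     messages=messages,
+ )
+ answer = completion.choices[0].message.content
+ ```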
112
+ --------
113
+
114
+ #### [Conseil n°2] 🚀 Utiliser des modèles plus puissants conduit généralement à une meilleure recherche 🚀
115
+
116
+ Lors de la conduite de recherches, **le choix du modèle peut avoir un impact significatif sur la qualité des résultats**. Les modèles plus puissants ont tendance à avoir une précision plus élevée, de meilleures capacités de raisonnement et une meilleure génération de rapports. Si les ressources informatiques le permettent, privilégiez l'utilisation de modèles avancés tels que o1-(mini/preview) ou d'autres grands modèles de langage à la pointe de la technologie.
117
+
118
+ Cependant, **il est important de trouver un équilibre entre performance et rentabilité**. Bien que les modèles puissants puissent donner de meilleurs résultats, ils sont souvent plus coûteux et plus longs à exécuter. Envisagez de les utiliser de manière sélective—par exemple, pour des expériences clés ou des analyses finales—tout en comptant sur des modèles plus petits et plus efficaces pour des tâches itératives ou du prototypage initial.
119
+
120
+ Lorsque les ressources sont limitées, **optimisez en affinant des modèles plus petits** sur votre jeu de données spécifique ou en combinant des modèles pré-entraînés avec des invites spécifiques à la tâche afin d'atteindre l'équilibre souhaité entre performance et efficacité computationnelle.
121
+
122
+ -----
123
+
124
+ #### [Conseil n°3] ✅ Vous pouvez charger des sauvegardes précédentes depuis des points de contrôle ✅
125
+
126
+ **Si vous perdez des progrès, la connexion Internet ou si une sous-tâche échoue, vous pouvez toujours charger à partir d'un état précédent.** Tous vos progrès sont enregistrés par défaut dans la variable `state_saves`, qui stocke chaque point de contrôle individuel. Il vous suffit de passer les arguments suivants lors de l'exécution de `ai_lab_repo.py`
127
+
128
+ ```bash
129
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA" --llm-backend "o1-mini" --load-existing True --load-existing-path "state_saves/LOAD_PATH"
130
+ ```
131
+
132
+ -----
133
+
134
+ #### [Conseil n°4] 🈯 Si vous utilisez une langue autre que l'anglais 🈲
135
+
136
+ Si vous exécutez le Laboratoire d'Agent dans une langue autre que l'anglais, pas de problème, assurez-vous simplement de fournir un drapeau de langue aux agents pour effectuer des recherches dans votre langue préférée. Notez que nous n'avons pas étudié de manière approfondie l'exécution du Laboratoire d'Agent dans d'autres langues, alors assurez-vous de signaler tout problème que vous rencontrez.
137
+
138
+ Par exemple, si vous utilisez le chinois :
139
+
140
+ ```bash
141
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA (in your language)" --llm-backend "o1-mini" --language "中文"
142
+ ```
143
+
144
+ ----
145
+
146
+ #### [Conseil n°5] 🌟 Il y a beaucoup de place pour l'amélioration 🌟
147
+
148
+ Il y a beaucoup de possibilités d'améliorer cette base de code, donc si vous finissez par apporter des modifications et souhaitez aider la communauté, n'hésitez pas à partager les changements que vous avez effectués ! Nous espérons que cet outil vous sera utile !
149
+
150
+ ## Référence / Bibtex
151
+
152
+ ```bibtex
153
+ @preprint{schmidgall2025AgentLaboratory,
154
+ title={Agent Laboratory: Using LLM Agents as Research Assistants},
155
+ author={Schmidgall, Samuel and Su, Yusheng and Wang, Ze and Sun, Ximeng and Wu, Jialian and Yu, Xiaodong and Liu, Jiang and Liu, Zicheng and Barsoum, Emad},
156
+ year={2025}
157
+ }
158
+ ```
readme/README-hindi.md ADDED
@@ -0,0 +1,153 @@
1
+
2
+ # एजेंट लैबोरेटरी: अनुसंधान सहायकों के रूप में LLM एजेंटों का उपयोग
3
+
4
+ <p align="center">
5
+ <img src="../media/AgentLabLogo.png" alt="AgentClinic के प्रवाह का प्रदर्शन" style="width: 99%;">
6
+ </p>
7
+
8
+
9
+ <p align="center">
10
+ 【<a href="../README.md">English</a> | <a href="../readme/README-chinese.md">中文</a> | <a href="../readme/README-japanese.md">日本語</a> | <a href="../readme/README-korean.md">한국어</a> | <a href="../readme/README-filipino.md">Filipino</a> | <a href="../readme/README-french.md">Français</a> | <a href="../readme/README-slovak.md">Slovenčina</a> | <a href="../readme/README-portugese.md">Português</a> | <a href="../readme/README-spanish.md">Español</a> | <a href="../readme/README-turkish.md">Türkçe</a> | हिंदी | <a href="../readme/README-bengali.md">বাংলা</a> | <a href="../readme/README-vietnamese.md">Tiếng Việt</a> | <a href="../readme/README-russian.md">Русский</a> | <a href="../readme/README-arabic.md">العربية</a> | <a href="../readme/README-farsi.md">فارسی</a> | <a href="../readme/README-italian.md">Italiano</a>】
11
+ </p>
12
+
13
+ <p align="center">
14
+ 【🌐 <a href="https://agentlaboratory.github.io/">Website</a> | 💻 <a href="https://github.com/SamuelSchmidgall/AgentLaboratory">Software</a> | 🎥 <a href="https://agentlaboratory.github.io/#youtube-video">Video</a> | 📚 <a href="https://agentlaboratory.github.io/#examples-goto">Example Paper</a> | 📰 <a href="https://agentlaboratory.github.io/#citation-ref">Citation</a>】
15
+ </p>
16
+
17
+ ## 📖 अवलोकन
18
+
19
+ - **एजेंट लैबोरेटरी** एक अंत-से-अंत स्वायत्त अनुसंधान कार्यप्रवाह है जिसे **आप** को मानव शोधकर्ता के रूप में **अपने अनुसंधान विचारों को लागू करने** में सहायता करने के लिए डिज़ाइन किया गया है। एजेंट लैबोरेटरी में बड़े भाषा मॉडल द्वारा संचालित विशेषीकृत एजेंट शामिल हैं जो आपको संपूर्ण अनुसंधान कार्यप्रवाह के माध्यम से समर्थन करते हैं—साहित्य समीक्षा करने और योजनाएँ बनाने से लेकर प्रयोगों को निष्पादित करने और व्यापक रिपोर्ट लिखने तक।
20
+ - यह प्रणाली आपकी रचनात्मकता को बदलने के लिए नहीं बल्कि इसे पूरा करने के लिए डिज़ाइन की गई है, जिससे आप विचार-विमर्श और महत्वपूर्ण सोच पर ध्यान केंद्रित कर सकते हैं, जबकि कोडिंग और दस्तावेजीकरण जैसे दोहराए जाने वाले और समय-गहन कार्यों को स्वचालित किया जाता है। विभिन्न स्तर के संगणनात्मक संसाधनों और मानव भागीदारी को समायोजित करके, एजेंट लैबोरेटरी वैज्ञानिक खोज को तेज करने और आपके अनुसंधान उत्पादकता को अनुकूलित करने का लक्ष्य रखता है।
21
+
22
+ <p align="center">
23
+ <img src="../media/AgentLab.png" alt="AgentClinic के प्रवाह का प्रदर्शन" style="width: 99%;">
24
+ </p>
25
+
26
+ ### 🔬 एजेंट लैबोरेटरी कैसे काम करता है?
27
+
28
+ - एजेंट लैबोरेटरी तीन मुख्य चरणों से मिलकर बनता है जो अनुसंधान प्रक्रिया का व्यवस्थित रूप से मार्गदर्शन करते हैं: (1) साहित्य समीक्षा, (2) प्रयोग, और (3) रिपोर्ट लेखन। प्रत्येक चरण के दौरान, LLM द्वारा संचालित विशेषीकृत एजेंट विशिष्ट उद्देश्यों को प्राप्त करने के लिए सहयोग करते हैं, परिणामों को अनुकूलित करने के लिए arXiv, Hugging Face, Python, और LaTeX जैसे बाहरी उपकरणों को एकीकृत करते हैं। यह संरचित कार्यप्रवाह संबंधित अनुसंधान पत्रों के स्वतंत्र संग्रह और विश्लेषण से शुरू होता है, सहयोगात्मक योजना और डेटा तैयारी के माध्यम से प्रगति करता है, और स्वचालित प्रयोग और व्यापक रिपोर्ट जनरेशन में समाप्त होता है। इन चरणों में विशिष्ट एजेंट भूमिकाओं और उनके योगदान के विवरण पेपर में चर्चा किए गए हैं।
29
+
30
+ <p align="center">
31
+ <img src="../media/AgentLabWF.png" alt="AgentClinic के प्रवाह का प्रदर्शन" style="width: 99%;">
32
+ </p>
33
+
34
+ ## 🖥️ स्थापना
35
+
36
+ ### Python venv विकल्प
37
+
38
+ 1. **GitHub रिपॉजिटरी क्लोन करें**: रिपॉजिटरी को क्लोन करना शुरू करें निम्न कमांड का उपयोग करके:
39
+ ```bash
40
+ git clone git@github.com:SamuelSchmidgall/AgentLaboratory.git
41
+ ```
42
+
43
+ 2. **पायथन पर्यावरण सेटअप और सक्रिय करें**
44
+ ```bash
45
+ python -m venv venv_agent_lab
46
+ ```
47
+ - अब इस पर्यावरण को सक्रिय करें:
48
+ ```bash
49
+ source venv_agent_lab/bin/activate
50
+ ```
51
+
52
+ 3. **आवश्यक पुस्तकालय स्थापित करें**
53
+ ```bash
54
+ pip install -r requirements.txt
55
+ ```
56
+
57
+ 4. **pdflatex स्थापित करें [वैकल्पिक]**
58
+ ```bash
59
+ sudo apt install texlive-latex-extra
60
+ ```
61
+ - यह एजेंटों द्वारा latex स्रोत को संकलित करने में सक्षम बनाता है।
62
+ - **[महत्वपूर्ण]** यदि इस चरण को sudo एक्सेस न होने के कारण नहीं चलाया जा सकता है, तो Agent Laboratory को --compile_latex फ्लैग को false सेट करके pdf संकलन बंद किया जा सकता है: `--compile_latex=False`
63
+
64
+ 5. **अब Agent Laboratory चलाएं!**
65
+ ```bash
66
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA"
67
+ ```
68
+ या, यदि आपने pdflatex स्थापित नहीं किया है:
69
+ ```bash
70
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA" --compile_latex=False
71
+ ```
72
+
73
+ -----
74
+ ## बेहतर अनुसंधान परिणामों के लिए सुझाव
75
+
76
+ #### [सुझाव #1] 📝 विस्तृत नोट्स लिखना सुनिश्चित करें! 📝
77
+
78
+ **विस्तृत नोट्स लिखना महत्वपूर्ण है** ताकि आपका एजेंट समझ सके कि आप अपने प्रोजेक्ट में क्या हासिल करना चाहते हैं, साथ ही किसी भी शैली की प्राथमिकताएँ। नोट्स में उन किसी भी प्रयोगों को शामिल किया जा सकता है जिन्हें आप एजेंटों से करने के लिए चाहते हैं, API कुंजी प्रदान करना, कुछ प्लॉट या आकृतियाँ शामिल करना, या कुछ भी जिसे आप अनुसंधान करते समय एजेंट को जानना चाहते हैं।
79
+
80
+ यह आपका अवसर भी है कि एजेंट को बताएं **कौन से कंप्यूट संसाधनों तक उसकी पहुंच है**, जैसे GPUs (कितने, किस प्रकार के GPU, कितने GBs), CPUs (कितने कोर, किस प्रकार के CPUs), स्टोरेज सीमाएँ, और हार्डवेयर स्पेसिफिकेशन।
81
+
82
+ नोट्स जोड़ने के लिए, आपको ai_lab_repo.py के अंदर task_notes_LLM संरचना को संशोधित करना होगा। नीचे हमारे कुछ प्रयोगों के लिए उपयोग किए गए नोट्स का एक उदाहरण दिया गया है।
83
+
84
+ ```python
85
+ task_notes_LLM = [
86
+ {"phases": ["plan formulation"],
87
+ "note": f"You should come up with a plan for TWO experiments."},
88
+
89
+ {"phases": ["plan formulation", "data preparation", "running experiments"],
90
+ "note": "Please use gpt-4o-mini for your experiments."},
91
+
92
+ {"phases": ["running experiments"],
93
+ "note": f"Use the following code to inference gpt-4o-mini: \nfrom openai import OpenAI\nos.environ["OPENAI_API_KEY"] = "{api_key}"\nclient = OpenAI()\ncompletion = client.chat.completions.create(\nmodel="gpt-4o-mini-2024-07-18", messages=messages)\nanswer = completion.choices[0].message.content\n"},
94
+
95
+ {"phases": ["running experiments"],
96
+ "note": f"You have access to only gpt-4o-mini using the OpenAI API, please use the following key {api_key} but do not use too many inferences. Do not use openai.ChatCompletion.create or any openai==0.28 commands. Instead use the provided inference code."},
97
+
98
+ {"phases": ["running experiments"],
99
+ "note": "I would recommend using a small dataset (approximately only 100 data points) to run experiments in order to save time. Do not use much more than this unless you have to or are running the final tests."},
100
+
101
+ {"phases": ["data preparation", "running experiments"],
102
+ "note": "You are running on a MacBook laptop. You can use 'mps' with PyTorch"},
103
+
104
+ {"phases": ["data preparation", "running experiments"],
105
+ "note": "Generate figures with very colorful and artistic design."},
106
+ ]
107
+ ```
108
+
109
+ --------
110
+
111
+ #### [सुझाव #2] 🚀 अधिक शक्तिशाली मॉडल का उपयोग सामान्यतः बेहतर अनुसंधान की ओर ले जाता है 🚀
112
+
113
+ अनुसंधान करते समय, **मॉडल का चयन परिणामों की गुणवत्ता पर महत्वपूर्ण प्रभाव डाल सकता है**। अधिक शक्तिशाली मॉडल आमतौर पर उच्च सटीकता, बेहतर तर्क क्षमताओं, और बेहतर रिपोर्ट जनरेशन प्रदान करते हैं। यदि संगणनात्मक संसाधन अनुमति देते हैं, तो o1-(mini/preview) या इसी तरह के अत्याधुनिक बड़े भाषा मॉडल जैसे उन्नत मॉडलों के उपयोग को प्राथमिकता दें।
114
+
115
+ हालांकि, **प्रदर्शन और लागत-प्रभावशीलता के बीच संतुलन बनाना महत्वपूर्ण है**। जबकि शक्तिशाली मॉडल बेहतर परिणाम दे सकते हैं, उन्हें चलाने में अक्सर अधिक खर्च और समय लगता है। उन्हें चयनात्मक रूप से उपयोग करने पर विचार करें—उदाहरण के लिए, मुख्य प्रयोगों या अंतिम विश्लेषणों के लिए—जबकि पुनरावृत्त कार्यों या प्रारंभिक प्रोटोटाइपिंग के लिए छोटे, अधिक कुशल मॉडलों पर निर्भर रहें।
116
+
117
+ जब संसाधन सीमित हों, **अपने विशिष्ट डेटासेट पर छोटे मॉडलों को फाइन-ट्यून करके या कार्य-विशिष्ट प्रॉम्प्ट के साथ पूर्व-प्रशिक्षित मॉडलों को मिलाकर प्रदर्शन और संगणनात्मक दक्षता के बीच वांछित संतुलन प्राप्त करें**।
118
+
119
+ -----
120
+
121
+ #### [सुझाव #3] ✅ आप चेकपॉइंट से पिछले सहेजनों को लोड कर सकते हैं ✅
122
+
123
+ **यदि आप प्रगति खो देते हैं, इंटरनेट कनेक्शन खोते हैं, या कोई उपकार्य विफल हो जाता है, तो आप हमेशा पिछली स्थिति से लोड कर सकते हैं।** आपकी सभी प्रगति डिफ़ॉल्ट रूप से state_saves वेरिएबल में सहेजी जाती है, जो प्रत्येक व्यक्तिगत चेकपॉइंट को संग्रहीत करता है। बस ai_lab_repo.py चलाते समय निम्नलिखित तर्क पास करें:
124
+ ```bash
125
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA" --llm-backend "o1-mini" --load-existing True --load-existing-path "state_saves/LOAD_PATH"
126
+ ```
127
+
128
+ -----
129
+
130
+ #### [सुझाव #4] 🈯 यदि आप अंग्रेजी के अलावा किसी अन्य भाषा में चला रहे हैं 🈲
131
+
132
+ यदि आप एजेंट लैबोरेटरी को अंग्रेजी के अलावा किसी अन्य भाषा में चला रहे हैं, तो कोई समस्या नहीं है, बस सुनिश्चित करें कि एजेंटों को आपके पसंदीदा भाषा में अनुसंधान करने के लिए एक भाषा फ्लैग प्रदान करें। ध्यान दें कि हमने अन्य भाषाओं में एजेंट लैबोरेटरी चलाने का व्यापक अध्ययन नहीं किया है, इसलिए किसी भी समस्या का सामना करने पर रिपोर्ट करना सुनिश्चित करें।
133
+
134
+ उदाहरण के लिए, यदि आप चीनी में चला रहे हैं:
135
+ ```bash
136
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA (in your language)" --llm-backend "o1-mini" --language "中文"
137
+ ```
138
+
139
+ ----
140
+
141
+ #### [सुझाव #5] 🌟 सुधार के लिए बहुत गुंजाइश है 🌟
142
+
143
+ इस कोडबेस में सुधार की बहुत गुंजाइश है, इसलिए यदि आप अंततः परिवर्तन करते हैं और समुदाय की मदद करना चाहते हैं, तो कृपया आप जो परिवर्तन किए हैं उन्हें साझा करने में संकोच न करें! हमें उम्मीद है कि यह उपकरण आपकी मदद करेगा!
144
+
145
+ ## संदर्भ / Bibtex
146
+
147
+ ```bibtex
148
+ @preprint{schmidgall2025AgentLaboratory,
149
+ title={Agent Laboratory: Using LLM Agents as Research Assistants},
150
+ author={Schmidgall, Samuel and Su, Yusheng and Wang, Ze and Sun, Ximeng and Wu, Jialian and Yu, Xiaodong and Liu, Jiang and Liu, Zicheng and Barsoum, Emad},
151
+ year={2025}
152
+ }
153
+ ```
readme/README-italian.md ADDED
@@ -0,0 +1,155 @@
1
+ # Laboratorio Agenti: Utilizzo di Agenti LLM come Assistenti di Ricerca
2
+
3
+ <p align="center">
4
+ <img src="../media/AgentLabLogo.png" alt="Dimostrazione del flusso di AgentClinic" style="width: 99%;">
5
+ </p>
6
+
7
+
8
+ <p align="center">
9
+ 【<a href="../README.md">English</a> | <a href="../readme/README-chinese.md">中文</a> | <a href="../readme/README-japanese.md">日本語</a> | <a href="../readme/README-korean.md">한국어</a> | <a href="../readme/README-filipino.md">Filipino</a> | <a href="../readme/README-french.md">Français</a> | <a href="../readme/README-slovak.md">Slovenčina</a> | <a href="../readme/README-portugese.md">Português</a> | <a href="../readme/README-spanish.md">Español</a> | <a href="../readme/README-turkish.md">Türkçe</a> | <a href="../readme/README-hindi.md">हिंदी</a> | <a href="../readme/README-bengali.md">বাংলা</a> | <a href="../readme/README-vietnamese.md">Tiếng Việt</a> | <a href="../readme/README-russian.md">Русский</a> | <a href="../readme/README-arabic.md">العربية</a> | <a href="../readme/README-farsi.md">فارسی</a> | Italiano】
10
+ </p>
11
+
12
+ <p align="center">
13
+ 【🌐 <a href="https://agentlaboratory.github.io/">Sito web</a> | 💻 <a href="https://github.com/SamuelSchmidgall/AgentLaboratory">Software</a> | 🎥 <a href="https://agentlaboratory.github.io/#youtube-video">Video</a> | 📚 <a href="https://agentlaboratory.github.io/#examples-goto">Documento di esempio</a> | 📰 <a href="https://agentlaboratory.github.io/#citation-ref">Citazione</a>】
14
+ </p>
15
+
16
+ ## 📖 Panoramica
17
+
18
+ - **Agent Laboratory** è un flusso di lavoro di ricerca autonomo end-to-end progettato per assistere **te** come ricercatore umano nell'**implementazione delle tue idee di ricerca**. Agent Laboratory è composto da agenti specializzati guidati da grandi modelli linguistici per supportarti durante l'intero flusso di lavoro di ricerca—dalla conduzione di revisioni della letteratura e formulazione di piani all'esecuzione di esperimenti e alla scrittura di rapporti completi.
19
+ - Questo sistema non è progettato per sostituire la tua creatività ma per complementarla, permettendoti di concentrarti sull'ideazione e il pensiero critico mentre automatizza compiti ripetitivi e che richiedono tempo come la codifica e la documentazione. Accomodando diversi livelli di risorse computazionali e coinvolgimento umano, Agent Laboratory mira ad accelerare la scoperta scientifica e ottimizzare la tua produttività di ricerca.
20
+
21
+ <p align="center">
22
+ <img src="../media/AgentLab.png" alt="Dimostrazione del flusso di AgentClinic" style="width: 99%;">
23
+ </p>
24
+
25
+ ### 🔬 Come funziona Agent Laboratory?
26
+
27
+ - Agent Laboratory è composto da tre fasi principali che guidano sistematicamente il processo di ricerca: (1) Revisione della letteratura, (2) Sperimentazione e (3) Scrittura del rapporto. Durante ogni fase, agenti specializzati guidati da LLM collaborano per raggiungere obiettivi distinti, integrando strumenti esterni come arXiv, Hugging Face, Python e LaTeX per ottimizzare i risultati. Questo flusso di lavoro strutturato inizia con la raccolta e analisi indipendente di documenti di ricerca pertinenti, prosegue attraverso la pianificazione collaborativa e la preparazione dei dati, e si conclude con la sperimentazione automatizzata e la generazione di rapporti completi. I dettagli sui ruoli specifici degli agenti e i loro contributi in queste fasi sono discussi nel documento.
28
+
29
+ <p align="center">
30
+ <img src="../media/AgentLabWF.png" alt="Dimostrazione del flusso di AgentClinic" style="width: 99%;">
31
+ </p>
32
+
33
+ ## 🖥️ Installazione
34
+
35
+ ### Opzione Python venv
36
+
37
+ 1. **Clona il Repository GitHub**: Inizia clonando il repository usando il comando:
38
+ ```bash
39
+ git clone git@github.com:SamuelSchmidgall/AgentLaboratory.git
40
+ ```
41
+
42
+ 2. **Configura e Attiva l'Ambiente Python**
43
+ ```bash
44
+ python -m venv venv_agent_lab
45
+ ```
46
+ - Ora attiva questo ambiente:
47
+ ```bash
48
+ source venv_agent_lab/bin/activate
49
+ ```
50
+
51
+ 3. **Installa le librerie richieste**
52
+ ```bash
53
+ pip install -r requirements.txt
54
+ ```
55
+
56
+ 4. **Installa pdflatex [OPZIONALE]**
57
+ ```bash
58
+ sudo apt install texlive-latex-extra
59
+ ```
60
+ - Questo permette agli agenti di compilare il codice sorgente LaTeX.
61
+ - **[IMPORTANTE]** Se questo passaggio non può essere eseguito a causa della mancanza di accesso sudo, la compilazione del pdf può essere disattivata eseguendo Agent Laboratory impostando il flag --compile_latex su false: --compile_latex=False
62
+
63
+ 5. **Ora esegui Agent Laboratory!**
64
+ ```bash
65
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA"
66
+ ```
67
+ oppure, se non hai installato pdflatex
68
+ ```bash
69
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA" --compile_latex=False
70
+ ```
71
+
72
+ -----
73
+
74
+ ## Consigli per migliori risultati di ricerca
75
+
76
+ #### [Consiglio #1] 📝 Assicurati di scrivere appunti dettagliati! 📝
77
+
78
+ **Scrivere appunti dettagliati è importante** per aiutare il tuo agente a comprendere cosa intendi realizzare nel tuo progetto, nonché eventuali preferenze di stile. Gli appunti possono includere qualsiasi esperimento che desideri che gli agenti eseguano, fornire chiavi API, determinati grafici o figure che desideri includere, o qualsiasi cosa tu voglia che l'agente sappia durante la ricerca.
79
+
80
+ Questa è anche la tua opportunità di far sapere all'agente **a quali risorse computazionali ha accesso**, ad esempio GPU (quante, che tipo di GPU, quanti GB), CPU (quanti core, che tipo di CPU), limitazioni di archiviazione e specifiche hardware.
81
+
82
+ Per aggiungere appunti, devi modificare la struttura task_notes_LLM all'interno di ai_lab_repo.py. Di seguito è fornito un esempio di set di appunti utilizzati per alcuni dei nostri esperimenti.
83
+
84
+ ```python
85
+ task_notes_LLM = [
86
+ {"phases": ["plan formulation"],
87
+ "note": f"You should come up with a plan for TWO experiments."},
88
+
89
+ {"phases": ["plan formulation", "data preparation", "running experiments"],
90
+ "note": "Please use gpt-4o-mini for your experiments."},
91
+
92
+ {"phases": ["running experiments"],
93
+ "note": f"Use the following code to inference gpt-4o-mini: \nfrom openai import OpenAI\nos.environ["OPENAI_API_KEY"] = "{api_key}"\nclient = OpenAI()\ncompletion = client.chat.completions.create(\nmodel="gpt-4o-mini-2024-07-18", messages=messages)\nanswer = completion.choices[0].message.content\n"},
94
+
95
+ {"phases": ["running experiments"],
96
+ "note": f"You have access to only gpt-4o-mini using the OpenAI API, please use the following key {api_key} but do not use too many inferences. Do not use openai.ChatCompletion.create or any openai==0.28 commands. Instead use the provided inference code."},
97
+
98
+ {"phases": ["running experiments"],
99
+ "note": "I would recommend using a small dataset (approximately only 100 data points) to run experiments in order to save time. Do not use much more than this unless you have to or are running the final tests."},
100
+
101
+ {"phases": ["data preparation", "running experiments"],
102
+ "note": "You are running on a MacBook laptop. You can use 'mps' with PyTorch"},
103
+
104
+ {"phases": ["data preparation", "running experiments"],
105
+ "note": "Generate figures with very colorful and artistic design."},
106
+ ]
107
+ ```
108
+
109
+ --------
110
+
111
+ #### [Consiglio #2] 🚀 Utilizzare modelli più potenti generalmente porta a migliori ricerche 🚀
112
+
113
+ Quando si conduce una ricerca, **la scelta del modello può influenzare significativamente la qualità dei risultati**. I modelli più potenti tendono ad avere una maggiore accuratezza, migliori capacità di ragionamento e una migliore generazione dei rapporti. Se le risorse computazionali lo consentono, dà priorità all'uso di modelli avanzati come o1-(mini/preview) o simili modelli linguistici di grandi dimensioni all'avanguardia.
114
+
115
+ Tuttavia, **è importante bilanciare le prestazioni e l'efficienza dei costi**. Sebbene i modelli potenti possano fornire risultati migliori, spesso sono più costosi e richiedono più tempo per essere eseguiti. Considera di usarli selettivamente—ad esempio, per esperimenti chiave o analisi finali—mentre ti affidi a modelli più piccoli ed efficienti per compiti iterativi o prototipazione iniziale.
116
+
117
+ Quando le risorse sono limitate, **ottimizza effettuando il fine-tuning di modelli più piccoli** sul tuo dataset specifico o combinando modelli pre-addestrati con prompt specifici per il compito per raggiungere l'equilibrio desiderato tra prestazioni ed efficienza computazionale.
118
+
119
+ -----
120
+
121
+ #### [Consiglio #3] ✅ Puoi caricare salvataggi precedenti dai checkpoint ✅
122
+
123
+ **Se perdi i progressi, la connessione a internet o se un sotto-compito fallisce, puoi sempre caricare da uno stato precedente.** Tutti i tuoi progressi vengono salvati di default nella variabile state_saves, che memorizza ogni singolo checkpoint. Basta passare i seguenti argomenti quando esegui ai_lab_repo.py
124
+
125
+ ```bash
126
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA" --llm-backend "o1-mini" --load-existing True --load-existing-path "state_saves/LOAD_PATH"
127
+ ```
128
+
129
+ -----
130
+
131
+ #### [Consiglio #4] 🈯 Se stai utilizzando una lingua diversa dall'inglese 🈲
132
+
133
+ Se stai utilizzando Agent Laboratory in una lingua diversa dall'inglese, nessun problema, basta assicurarti di fornire un flag di lingua agli agenti per eseguire la ricerca nella tua lingua preferita. Nota che non abbiamo studiato approfonditamente l'utilizzo di Agent Laboratory in altre lingue, quindi assicurati di segnalare eventuali problemi che incontri.
134
+
135
+ Ad esempio, se stai utilizzando in cinese:
136
+
137
+ ```bash
138
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA (in your language)" --llm-backend "o1-mini" --language "中文"
139
+ ```
140
+
141
+ ----
142
+
143
+ #### [Consiglio #5] 🌟 C'è molto spazio per miglioramenti 🌟
144
+
145
+ C'è molto spazio per migliorare questo codice, quindi se alla fine apporti modifiche e vuoi aiutare la comunità, sentiti libero di condividere le modifiche che hai effettuato! Speriamo che questo strumento ti sia d'aiuto!
146
+
147
+ ## Riferimenti / Bibtex
148
+
149
+ ```bibtex
150
+ @preprint{schmidgall2025AgentLaboratory,
151
+ title={Agent Laboratory: Using LLM Agents as Research Assistants},
152
+ author={Schmidgall, Samuel and Su, Yusheng and Wang, Ze and Sun, Ximeng and Wu, Jialian and Yu, Xiaodong and Liu, Jiang and Liu, Zicheng and Barsoum, Emad},
153
+ year={2025}
154
+ }
155
+ ```
readme/README-japanese.md ADDED
@@ -0,0 +1,163 @@
1
+ # Agent Laboratory: Using LLM Agents as Research Assistants
2
+
3
+ <p align="center">
4
+ <img src="../media/AgentLabLogo.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
5
+ </p>
6
+
7
+
8
+ <p align="center">
9
+ 【<a href="../README.md">English</a> | <a href="../readme/README-chinese.md">中文</a> | 日本語 | <a href="../readme/README-korean.md">한국어</a> | <a href="../readme/README-filipino.md">Filipino</a> | <a href="../readme/README-french.md">Français</a> | <a href="../readme/README-slovak.md">Slovenčina</a> | <a href="../readme/README-portugese.md">Português</a> | <a href="../readme/README-spanish.md">Español</a> | <a href="../readme/README-turkish.md">Türkçe</a> | <a href="../readme/README-hindi.md">हिंदी</a> | <a href="../readme/README-bengali.md">বাংলা</a> | <a href="../readme/README-vietnamese.md">Tiếng Việt</a> | <a href="../readme/README-russian.md">Русский</a> | <a href="../readme/README-arabic.md">العربية</a> | <a href="../readme/README-farsi.md">فارسی</a> | <a href="../readme/README-italian.md">Italiano</a>】
10
+ </p>
11
+
12
+ <p align="center">
13
+ 【🌐 <a href="https://agentlaboratory.github.io/">Website</a> | 💻 <a href="https://github.com/SamuelSchmidgall/AgentLaboratory">Software</a> | 🎥 <a href="https://agentlaboratory.github.io/#youtube-video">Video</a> | 📚 <a href="https://agentlaboratory.github.io/#examples-goto">Example Paper</a> | 📰 <a href="https://agentlaboratory.github.io/#citation-ref">Citation</a>】
14
+ </p>
15
+
16
+ ## 📖 概要
17
+
18
+ - **Agent Laboratory**は、**あなた**が**研究アイデアを実現する**ために支援するエンドツーエンドの自律的な研究ワークフローです。Agent Laboratoryは、大規模言語モデルによって駆動される専門のエージェントで構成されており、文献レビューの実施や計画の策定から実験の実行、包括的な報告書の作成まで、研究の全過程をサポートします。
19
+ - このシステムはあなたの創造性を置き換えるものではなく、補完するために設計されています。アイデアの創出や批判的思考に集中できるようにし、コーディングやドキュメント作成のような反復的で時間のかかる作業を自動化します。計算資源や人間の関与のレベルに応じて対応することで、Agent Laboratoryは科学的発見を加速し、研究の生産性を最適化することを目指しています。
20
+
21
+ <p align="center">
22
+ <img src="../media/AgentLab.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
23
+ </p>
24
+
25
+ ### 🔬 Agent Laboratoryはどのように機能しますか?
26
+
27
+ - Agent Laboratoryは、研究プロセスを体系的に導く3つの主要なフェーズから構成されています:(1)文献レビュー、(2)実験、(3)報告書作成。各フェーズでは、LLMによって駆動される専門のエージェントが協力してそれぞれの目標を達成し、arXiv、Hugging Face、Python、LaTeXなどの外部ツールを統合して成果を最適化します。この構造化されたワークフローは、関連する研究論文の独立した収集と分析から始まり、協力的な計画とデータ準備を経て、自動化された実験と包括的な報告書の生成に至ります。これらのフェーズ全体にわたる具体的なエージェントの役割と貢献の詳細は論文で説明されています。
28
+
29
+ <p align="center">
30
+ <img src="../media/AgentLabWF.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
31
+ </p>
32
+
33
+ ## 🖥️ インストール
34
+
35
+ ### Python venv オプション
36
+
37
+ 1. **GitHubリポジトリをクローンする**: 以下のコマンドを使用してリポジトリをクローンします:
38
+ ```bash
39
+ git clone git@github.com:SamuelSchmidgall/AgentLaboratory.git
40
+ ```
41
+
42
+ 2. **Python環境を設定してアクティベートする**
43
+ ```bash
44
+ python -m venv venv_agent_lab
45
+ ```
46
+
47
+ - 次に、この環境をアクティベートします:
48
+ ```bash
49
+ source venv_agent_lab/bin/activate
50
+ ```
51
+
52
+ 3. **必要なライブラリをインストールする**
53
+ ```bash
54
+ pip install -r requirements.txt
55
+ ```
56
+
57
+ 4. **pdflatexをインストールする [オプション]**
58
+ ```bash
59
+ sudo apt install texlive-latex-extra
60
+ ```
61
+
62
+ - これにより、エージェントがLaTeXソースをコンパイルできるようになります。
63
+ - **[重要]** sudo権限がないためにこのステップを実行できない場合、Agent Laboratoryを実行する際に --compile_latexフラグをfalseに設定してPDFのコンパイルをオフにすることができます: --compile_latex=False
64
+
65
+ 5. **Agent Laboratoryを実行します!**
66
+
67
+ ```bash
68
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA"
69
+ ```
70
+
71
+ または、pdflatexがインストールされていない場合
72
+
73
+ ```bash
74
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA" --compile_latex=False
75
+ ```
76
+
77
+ -----
78
+ ## より良い研究成果を得るためのヒント
79
+
80
+
81
+ #### [ヒント #1] 📝 詳細なノートを書くことを忘れずに! 📝
82
+
83
+ **詳細なノートを書くことは重要です**。これにより、エージェントがプロジェクトで達成しようとしていることや、スタイルの好みを理解するのに役立ちます。ノートには、エージェントに実行してほしい実験、APIキーの提供、含めたい特定のプロットや図、研究を行う際にエージェントに知っておいてほしいことなどを含めることができます。
84
+
85
+ また、**エージェントがアクセスできる計算資源**を知らせる機会でもあります。例えば、GPU(数、種類、GB数)、CPU(コア数、種類)、ストレージの制限、ハードウェア仕様などです。
86
+
87
+ ノートを追加するには、ai_lab_repo.py内のtask_notes_LLM構造を変更する必要があります。以下に、いくつかの実験で使用されたノートの例を示します。
88
+
89
+ ```python
90
+ task_notes_LLM = [
91
+ {"phases": ["plan formulation"],
92
+ "note": f"You should come up with a plan for TWO experiments."},
93
+
94
+ {"phases": ["plan formulation", "data preparation", "running experiments"],
95
+ "note": "Please use gpt-4o-mini for your experiments."},
96
+
97
+ {"phases": ["running experiments"],
98
+ "note": f"Use the following code to inference gpt-4o-mini: \nfrom openai import OpenAI\nos.environ["OPENAI_API_KEY"] = "{api_key}"\nclient = OpenAI()\ncompletion = client.chat.completions.create(\nmodel="gpt-4o-mini-2024-07-18", messages=messages)\nanswer = completion.choices[0].message.content\n"},
99
+
100
+ {"phases": ["running experiments"],
101
+ "note": f"You have access to only gpt-4o-mini using the OpenAI API, please use the following key {api_key} but do not use too many inferences. Do not use openai.ChatCompletion.create or any openai==0.28 commands. Instead use the provided inference code."},
102
+
103
+ {"phases": ["running experiments"],
104
+ "note": "I would recommend using a small dataset (approximately only 100 data points) to run experiments in order to save time. Do not use much more than this unless you have to or are running the final tests."},
105
+
106
+ {"phases": ["data preparation", "running experiments"],
107
+ "note": "You are running on a MacBook laptop. You can use 'mps' with PyTorch"},
108
+
109
+ {"phases": ["data preparation", "running experiments"],
110
+ "note": "Generate figures with very colorful and artistic design."},
111
+ ]
112
+ ```
113
+
114
+ --------
115
+
116
+ #### [ヒント #2] 🚀 より強力なモデルを使用することで、一般的により良い研究が可能になります 🚀
117
+
118
+ 研究を行う際、**モデルの選択は結果の質に大きな影響を与える可能性があります**。より強力なモデルは、通常、精度が高く、推論能力が優れており、報告書の生成も優れています。計算資源が許す場合は、o1-(mini/preview)などの先進的な大規模言語モデルの使用を優先してください。
119
+
120
+ ただし、**性能と費用対効果のバランスを取ることが重要です**。強力なモデルはより良い結果をもたらす可能性がありますが、実行には時間と費用がかかることが多いです。重要な実験や最終分析には選択的に使用し、反復作業や初期のプロトタイピングには小さく効率的なモデルを使用することを検討してください。
121
+
122
+ 資源が限られている場合は、**小さなモデルを特定のデータセットでファインチューニングするか、タスク固有のプロンプトと組み合わせて使用することで、性能と計算効率の間の望ましいバランスを達成します**。
123
+
124
+ -----
125
+
126
+ #### [ヒント #3] ✅ チェックポイントから以前の保存をロードできます ✅
127
+
128
+ **進捗が失われた場合、インターネット接続が切れた場合、またはサブタスクが失敗した場合でも、以前の状態から常にロードできます。** すべての進捗はデフォルトでstate_saves変数に保存され、各チェックポイントが保存されます。ai_lab_repo.pyを実行する際に、以下の引数を渡すだけです
129
+
130
+ ```bash
131
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA" --llm-backend "o1-mini" --load-existing True --load-existing-path "state_saves/LOAD_PATH"
132
+ ```
133
+
134
+ -----
135
+
136
+
137
+
138
+
139
+ #### [ヒント #4] 🈯 英語以外の言語で実行している場合 🈲
140
+
141
+ Agent Laboratoryを英語以外の言語で実行している場合でも問題ありません。エージェントが希望する言語で研究を行えるように、言語フラグを提供することを確認してください。Agent Laboratoryを他の言語で実行することについては十分に研究していないため、問題が発生した場合は必ず報告してください。
142
+
143
+ 例えば、中国語で実行する場合:
144
+
145
+ ```bash
146
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA (in your language)" --llm-backend "o1-mini" --language "中文"
147
+ ```
148
+
149
+ ----
150
+
151
+ #### [ヒント #5] 🌟 改善の余地がたくさんあります 🌟
152
+
153
+ このコードベースには改善の余地がたくさんありますので、変更を加えてコミュニティに貢献したい場合は、ぜひ変更内容を共有してください!このツールが皆さんのお役に立つことを願っています!
154
+
155
+ ## 参考文献 / Bibtex
156
+
157
+ ```bibtex
158
+ @preprint{schmidgall2025AgentLaboratory,
159
+ title={Agent Laboratory: Using LLM Agents as Research Assistants},
160
+ author={Schmidgall, Samuel and Su, Yusheng and Wang, Ze and Sun, Ximeng and Wu, Jialian and Yu, Xiaodong and Liu, Jiang and Liu, Zicheng and Barsoum, Emad},
161
+ year={2025}
162
+ }
163
+ ```
readme/README-korean.md ADDED
@@ -0,0 +1,166 @@
1
+ # Agent Laboratory: Using LLM Agents as Research Assistants
2
+
3
+ <p align="center">
4
+ <img src="../media/AgentLabLogo.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
5
+ </p>
6
+
7
+
8
+ <p align="center">
9
+ 【<a href="../README.md">English</a> | <a href="../readme/README-chinese.md">中文</a> | <a href="../readme/README-japanese.md">日本語</a> | 한국어 | <a href="../readme/README-filipino.md">Filipino</a> | <a href="../readme/README-french.md">Français</a> | <a href="../readme/README-slovak.md">Slovenčina</a> | <a href="../readme/README-portugese.md">Português</a> | <a href="../readme/README-spanish.md">Español</a> | <a href="../readme/README-turkish.md">Türkçe</a> | <a href="../readme/README-hindi.md">हिंदी</a> | <a href="../readme/README-bengali.md">বাংলা</a> | <a href="../readme/README-vietnamese.md">Tiếng Việt</a> | <a href="../readme/README-russian.md">Русский</a> | <a href="../readme/README-arabic.md">العربية</a> | <a href="../readme/README-farsi.md">فارسی</a> | <a href="../readme/README-italian.md">Italiano</a>】
10
+ </p>
11
+
12
+ <p align="center">
13
+ 【🌐 <a href="https://agentlaboratory.github.io/">Website</a> | 💻 <a href="https://github.com/SamuelSchmidgall/AgentLaboratory">Software</a> | 🎥 <a href="https://agentlaboratory.github.io/#youtube-video">Video</a> | 📚 <a href="https://agentlaboratory.github.io/#examples-goto">Example Paper</a> | 📰 <a href="https://agentlaboratory.github.io/#citation-ref">Citation</a>】
14
+ </p>
15
+
16
+ ## 📖 개요
17
+
18
+ - **Agent Laboratory**는 **당신**이 인간 연구자로서 **연구 아이디어를 구현**할 수 있도록 지원하는 엔드 투 엔드 자율 연구 워크플로우입니다. Agent Laboratory는 대규모 언어 모델에 의해 구동되는 전문화된 에이전트들로 구성되어 문헌 검토 수행, 계획 수립, 실험 실행, 종합 보고서 작성에 이르기까지 전체 연구 워크플로우를 지원합니다.
19
+ - 이 시스템은 당신의 창의성을 대체하기 위해 설계된 것이 아니라 보완하기 위해 설계되었습니다. 아이디어 발상과 비판적 사고에 집중할 수 있도록 하면서 코딩 및 문서화와 같은 반복적이고 시간이 많이 소요되는 작업을 자동화합니다. 다양한 수준의 컴퓨팅 자원과 인간의 참여를 수용함으로써 Agent Laboratory는 과학적 발견을 가속화하고 연구 생산성을 최적화하는 것을 목표로 합니다.
20
+
21
+ <p align="center">
22
+ <img src="../media/AgentLab.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
23
+ </p>
24
+
25
+ ### 🔬 Agent Laboratory는 어떻게 작동하나요?
26
+
27
+ - Agent Laboratory는 연구 과정을 체계적으로 안내하는 세 가지 주요 단계로 구성됩니다: (1) 문헌 검토, (2) 실험, (3) 보고서 작성. 각 단계 동안 LLM에 의해 구동되는 전문화된 에이전트들이 협력하여 개별 목표를 달성하며, arXiv, Hugging Face, Python, LaTeX와 같은 외부 도구를 통합하여 결과를 최적화합니다. 이 구조화된 워크플로우는 관련 연구 논문의 독립적인 수집 및 분석으로 시작하여, 협력적인 계획 수립 및 데이터 준비를 거쳐, 자동화된 실험 실행 및 종합적인 보고서 생성으로 이어집니다. 이러한 단계 전반에 걸친 특정 에이전트 역할과 기여에 대한 자세한 내용은 논문에서 논의됩니다.
28
+
29
+ <p align="center">
30
+ <img src="../media/AgentLabWF.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
31
+ </p>
32
+
33
+ ## 🖥️ 설치
34
+
35
+ ### Python venv 옵션
36
+
37
+
38
+ 1. **GitHub 저장소 복제**: 다음 명령어를 사용하여 저장소를 복제합니다:
39
+ ```bash
40
+ git clone git@github.com:SamuelSchmidgall/AgentLaboratory.git
41
+ ```
42
+
43
+
44
+ 2. **Python 환경 설정 및 활성화**
45
+ ```bash
46
+ python -m venv venv_agent_lab
47
+ ```
48
+
49
+ - 이제 이 환경을 활성화합니다:
50
+ ```bash
51
+ source venv_agent_lab/bin/activate
52
+ ```
53
+
54
+
55
+ 3. **필수 라이브러리 설치**
56
+ ```bash
57
+ pip install -r requirements.txt
58
+ ```
59
+
60
+
61
+ 4. **pdflatex 설치 [옵션]**
62
+ ```bash
63
+ sudo apt install pdflatex
64
+ ```
65
+
66
+ - 이는 에이전트들이 LaTeX 소스를 컴파일할 수 있도록 합니다.
67
+ - **[중요]** sudo 접근 권한이 없어 이 단계를 실행할 수 없는 경우, --compile_latex 플래그를 false로 설정하여 Agent Laboratory 실행 시 PDF 컴파일을 비활성화할 수 있습니다: `--compile_latex=False`
68
+
69
+
70
+ 5. **이제 Agent Laboratory를 실행하세요!**
71
+
72
+ ```bash
73
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA"
74
+ ```
75
+
76
+ 또는, pdflatex가 설치되어 있지 않은 경우
77
+
78
+ ```bash
79
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA" --compile_latex=False
80
+ ```
81
+
82
+ -----
83
+ ## 더 나은 연구 결과를 위한 팁
84
+
85
+ #### [팁 #1] 📝 광범위한 노트를 작성하세요! 📝
86
+
87
+ **광범위한 노트 작성은** 에이전트가 프로젝트에서 달성하려는 목표와 스타일 선호도를 이해하는 데 중요합니다. 노트에는 에이전트에게 수행하도록 원하는 실험, API 키 제공, 포함하고 싶은 특정 플롯이나 그림, 또는 연구를 수행할 때 에이전트가 알아야 할 모든 내용을 포함할 수 있습니다.
88
+
89
+ 또한, **에이전트가 접근할 수 있는 컴퓨팅 자원**을 알려줄 수 있는 기회이기도 합니다. 예를 들어 GPU (몇 개, 어떤 유형의 GPU, GB 수), CPU (코어 수, CPU 유형), 저장 한계 및 하드웨어 사양 등을 포함할 수 있습니다.
90
+
91
+ 노트를 추가하려면, ai_lab_repo.py 내부의 `task_notes_LLM` 구조를 수정해야 합니다. 아래는 일부 실험에 사용된 노트의 예시입니다.
92
+
93
+ ```python
94
+ task_notes_LLM = [
95
+ {"phases": ["plan formulation"],
96
+ "note": f"You should come up with a plan for TWO experiments."},
97
+
98
+ {"phases": ["plan formulation", "data preparation", "running experiments"],
99
+ "note": "Please use gpt-4o-mini for your experiments."},
100
+
101
+ {"phases": ["running experiments"],
102
+ "note": f"Use the following code to inference gpt-4o-mini: \nfrom openai import OpenAI\nos.environ["OPENAI_API_KEY"] = "{api_key}"\nclient = OpenAI()\ncompletion = client.chat.completions.create(\nmodel="gpt-4o-mini-2024-07-18", messages=messages)\nanswer = completion.choices[0].message.content\n"},
103
+
104
+ {"phases": ["running experiments"],
105
+ "note": f"You have access to only gpt-4o-mini using the OpenAI API, please use the following key {api_key} but do not use too many inferences. Do not use openai.ChatCompletion.create or any openai==0.28 commands. Instead use the provided inference code."},
106
+
107
+ {"phases": ["running experiments"],
108
+ "note": "I would recommend using a small dataset (approximately only 100 data points) to run experiments in order to save time. Do not use much more than this unless you have to or are running the final tests."},
109
+
110
+ {"phases": ["data preparation", "running experiments"],
111
+ "note": "You are running on a MacBook laptop. You can use 'mps' with PyTorch"},
112
+
113
+ {"phases": ["data preparation", "running experiments"],
114
+ "note": "Generate figures with very colorful and artistic design."},
115
+ ]
116
+ ```
117
+
118
+ --------
119
+
120
+ #### [팁 #2] 🚀 더 강력한 모델을 사용하는 것이 일반적으로 더 나은 연구로 이어집니다 🚀
121
+
122
+ 연구를 수행할 때, **모델의 선택은 결과의 질에 상당한 영향을 미칠 수 있습니다**. 더 강력한 모델은 일반적으로 더 높은 정확도, 더 나은 추론 능력, 더 우수한 보고서 생성을 제공합니다. 컴퓨팅 자원이 허용한다면, o1-(mini/preview)와 같은 최첨단 대규모 언어 모델과 같은 고급 모델의 사용을 우선시하세요.
123
+
124
+ 그러나, **성능과 비용 효율성의 균형을 맞추는 것이 중요합니다**. 강력한 모델은 더 나은 결과를 제공할 수 있지만, 실행하는 데 비용과 시간이 더 많이 소요되는 경우가 많습니다. 예를 들어, 핵심 실험이나 최종 분석에는 고급 모델을 선택적으로 사용하고, 반복 작업이나 초기 프로토타이핑에는 더 작고 효율적인 모델을 사용하는 것을 고려하세요.
125
+
126
+ 자원이 제한된 경우, **작은 모델을 특정 데이터셋에 맞게 미세 조정하거나, 사전 훈련된 모델과 작업 특화 프롬프트를 결합하여 성능과 컴퓨팅 효율성 사이의 원하는 균형을 달성할 수 있습니다**.
127
+
128
+ -----
129
+
130
+ #### [팁 #3] ✅ 체크포인트에서 이전 저장 상태를 불러올 수 있습니다 ✅
131
+
132
+ **진행 상황을 잃었거나 인터넷 연결이 끊기거나 하위 작업이 실패한 경우, 이전 상태에서 항상 불러올 수 있습니다.** 모든 진행 상황은 기본적으로 `state_saves` 변수에 저장되며, 이는 각 개별 체크포인트를 저장합니다. ai_lab_repo.py를 실행할 때 다음 인수를 전달하면 됩니다.
133
+
134
+ ```bash
135
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA" --llm-backend "o1-mini" --load-existing True --load-existing-path "save_states/LOAD_PATH"
136
+ ```
137
+
138
+ -----
139
+
140
+ #### [팁 #4] 🈯 영어가 아닌 다른 언어로 실행하는 경우 🈲
141
+
142
+ Agent Laboratory를 영어가 아닌 다른 언어로 실행하는 경우, 문제 없습니다. 단, 에이전트가 선호하는 언어로 연구를 수행할 수 있도록 언어 플래그를 제공해야 합니다. 다른 언어로 Agent Laboratory를 실행하는 것에 대해 광범위하게 연구하지 않았으므로, 발생하는 문제를 반드시 보고해 주세요.
143
+
144
+ 예를 들어, 중국어로 실행하는 경우:
145
+
146
+ ```bash
147
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA (in your language)" --llm-backend "o1-mini" --language "中文"
148
+ ```
149
+
150
+ ----
151
+
152
+ #### [팁 #5] 🌟 개선의 여지가 많습니다 🌟
153
+
154
+ 이 코드��이스를 개선할 여지가 많으므로, 변경을 가하고 커뮤니티에 기여하고 싶다면, 변경한 사항을 자유롭게 공유해 주세요! 이 도구가 여러분에게 도움이 되길 바랍니다!
155
+
156
+ ## 참고 문헌 / Bibtex
157
+
158
+
159
+
160
+ ```bibtex
161
+ @preprint{schmidgall2025AgentLaboratory,
162
+ title={Agent Laboratory: Using LLM Agents as Research Assistants},
163
+ author={Schmidgall, Samuel and Su, Yusheng and Wang, Ze and Sun, Ximeng and Wu, Jialian and Yu, Xiadong and Liu, Jiang, Liu, Zicheng and Barsoum, Emad},
164
+ year={2025}
165
+ }
166
+ ```
readme/README-portugese.md ADDED
@@ -0,0 +1,159 @@
1
+ # Agent Laboratory: Usando Agentes LLM como Assistentes de Pesquisa
2
+
3
+ <p align="center">
4
+ <img src="../media/AgentLabLogo.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
5
+ </p>
6
+
7
+
8
+ <p align="center">
9
+ 【<a href="../README.md">English</a> | <a href="../readme/README-chinese.md">中文</a> | <a href="../readme/README-japanese.md">日本語</a> | <a href="../readme/README-korean.md">한국어</a> | <a href="../readme/README-filipino.md">Filipino</a> | <a href="../readme/README-french.md">Français</a> | <a href="../readme/README-slovak.md">Slovenčina</a> | Português | <a href="../readme/README-spanish.md">Español</a> | <a href="../readme/README-turkish.md">Türkçe</a> | <a href="../readme/README-hindi.md">हिंदी</a> | <a href="../readme/README-bengali.md">বাংলা</a> | <a href="../readme/README-vietnamese.md">Tiếng Việt</a> | <a href="../readme/README-russian.md">Русский</a> | <a href="../readme/README-arabic.md">العربية</a> | <a href="../readme/README-farsi.md">فارسی</a> | <a href="../readme/README-italian.md">Italiano</a>】
10
+ </p>
11
+
12
+ <p align="center">
13
+ 【🌐 <a href="https://agentlaboratory.github.io/">Website</a> | 💻 <a href="https://github.com/SamuelSchmidgall/AgentLaboratory">Software</a> | 🎥 <a href="https://agentlaboratory.github.io/#youtube-video">Video</a> | 📚 <a href="https://agentlaboratory.github.io/#examples-goto">Example Paper</a> | 📰 <a href="https://agentlaboratory.github.io/#citation-ref">Citation</a>】
14
+ </p>
15
+
16
+ ## 📖 Visão Geral
17
+
18
+ - **Agent Laboratory** é um fluxo de trabalho de pesquisa autônomo de ponta a ponta, destinado a auxiliar **você** como pesquisador humano na **implementação das suas ideias de pesquisa**. O Agent Laboratory consiste em agentes especializados movidos por grandes modelos de linguagem para apoiá-lo durante todo o fluxo de trabalho de pesquisa — desde a condução de revisões de literatura e formulação de planos até a execução de experimentos e a redação de relatórios abrangentes.
19
+ - Este sistema não foi projetado para substituir a sua criatividade, mas para complementá-la, permitindo que você se concentre na ideação e no pensamento crítico enquanto automatiza tarefas repetitivas e que consomem muito tempo, como codificação e documentação. Ao acomodar diferentes níveis de recursos computacionais e envolvimento humano, o Agent Laboratory visa acelerar a descoberta científica e otimizar a sua produtividade em pesquisa.
20
+
21
+ <p align="center">
22
+ <img src="../media/AgentLab.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
23
+ </p>
24
+
25
+ ### 🔬 Como funciona o Agent Laboratory?
26
+
27
+ - O Agent Laboratory consiste em três fases principais que orientam sistematicamente o processo de pesquisa: (1) Revisão de Literatura, (2) Experimentação e (3) Redação de Relatórios. Durante cada fase, agentes especializados movidos por LLMs colaboram para alcançar objetivos distintos, integrando ferramentas externas como arXiv, Hugging Face, Python e LaTeX para otimizar os resultados. Este fluxo de trabalho estruturado começa com a coleta e análise independentes de artigos de pesquisa relevantes, avança através do planejamento colaborativo e preparação de dados, e resulta em experimentação automatizada e geração de relatórios abrangentes. Detalhes sobre os papéis específicos dos agentes e suas contribuições ao longo dessas fases são discutidos no artigo.
28
+
29
+ <p align="center">
30
+ <img src="../media/AgentLabWF.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
31
+ </p>
32
+
33
+ ## 🖥️ Instalação
34
+
35
+ ### Opção de ambiente virtual Python (venv)
36
+
37
+ 1. **Clone o Repositório do GitHub**: Comece clonando o repositório usando o comando:
38
+ ```bash
39
+ git clone git@github.com:SamuelSchmidgall/AgentLaboratory.git
40
+ ```
41
+
42
+ 2. **Configure e Ative o Ambiente Python**
43
+ ```bash
44
+ python -m venv venv_agent_lab
45
+ ```
46
+
47
+ - Agora, ative este ambiente:
48
+ ```bash
49
+ source venv_agent_lab/bin/activate
50
+ ```
51
+
52
+ 3. **Instale as bibliotecas necessárias**
53
+ ```bash
54
+ pip install -r requirements.txt
55
+ ```
56
+
57
+ 4. **Instale o pdflatex [OPCIONAL]**
58
+ ```bash
59
+ sudo apt install pdflatex
60
+ ```
61
+
62
+ - Isso permite que o código LaTeX seja compilado pelos agentes.
63
+ - **[IMPORTANTE]** Se esta etapa não puder ser executada devido à falta de acesso sudo, a compilação de PDF pode ser desativada executando o Agent Laboratory com a flag --compile_latex definida como false: --compile_latex=False
64
+
65
+ 5. **Agora execute o Agent Laboratory!**
66
+
67
+ ```bash
68
+ python ai_lab_repo.py --api-key "API_KEY_AQUI" --llm-backend "o1-mini" --research-topic "SUA IDEIA DE PESQUISA"
69
+ ```
70
+
71
+ ou, se você não tiver o pdflatex instalado
72
+
73
+ ```bash
74
+ python ai_lab_repo.py --api-key "API_KEY_AQUI" --llm-backend "o1-mini" --research-topic "SUA IDEIA DE PESQUISA" --compile_latex=False
75
+ ```
76
+
77
+ -----
78
+ ## Dicas para melhores resultados de pesquisa
79
+
80
+ #### [Dica #1] 📝 Certifique-se de escrever notas extensas! 📝
81
+
82
+ **Escrever notas extensas é importante** para ajudar seu agente a entender o que você está tentando realizar em seu projeto, bem como quaisquer preferências de estilo. As notas podem incluir quaisquer experimentos que você deseja que os agentes realizem, fornecendo chaves de API, certos gráficos ou figuras que você deseja incluir, ou qualquer coisa que você queira que o agente saiba ao realizar a pesquisa.
83
+
84
+ Esta também é sua oportunidade de informar ao agente **a quais recursos de computação ele tem acesso**, por exemplo, GPUs (quantas, que tipo de GPU, quantos GBs), CPUs (quantos núcleos, que tipo de CPUs), limitações de armazenamento e especificações de hardware.
85
+
86
+ Para adicionar notas, você deve modificar a estrutura task_notes_LLM dentro de ai_lab_repo.py. Abaixo está um exemplo de conjunto de notas usadas em alguns de nossos experimentos.
87
+
88
+ ```python
89
+ task_notes_LLM = [
90
+ {"phases": ["plan formulation"],
91
+ "note": f"You should come up with a plan for TWO experiments."},
92
+
93
+ {"phases": ["plan formulation", "data preparation", "running experiments"],
94
+ "note": "Please use gpt-4o-mini for your experiments."},
95
+
96
+ {"phases": ["running experiments"],
97
+ "note": f"Use the following code to inference gpt-4o-mini: \nfrom openai import OpenAI\nos.environ["OPENAI_API_KEY"] = "{api_key}"\nclient = OpenAI()\ncompletion = client.chat.completions.create(\nmodel="gpt-4o-mini-2024-07-18", messages=messages)\nanswer = completion.choices[0].message.content\n"},
98
+
99
+ {"phases": ["running experiments"],
100
+ "note": f"You have access to only gpt-4o-mini using the OpenAI API, please use the following key {api_key} but do not use too many inferences. Do not use openai.ChatCompletion.create or any openai==0.28 commands. Instead use the provided inference code."},
101
+
102
+ {"phases": ["running experiments"],
103
+ "note": "I would recommend using a small dataset (approximately only 100 data points) to run experiments in order to save time. Do not use much more than this unless you have to or are running the final tests."},
104
+
105
+ {"phases": ["data preparation", "running experiments"],
106
+ "note": "You are running on a MacBook laptop. You can use 'mps' with PyTorch"},
107
+
108
+ {"phases": ["data preparation", "running experiments"],
109
+ "note": "Generate figures with very colorful and artistic design."},
110
+ ]
111
+ ```
112
+
113
+ --------
114
+
115
+ #### [Dica #2] 🚀 Usar modelos mais poderosos geralmente leva a melhores pesquisas 🚀
116
+
117
+ Ao conduzir pesquisas, **a escolha do modelo pode impactar significativamente a qualidade dos resultados**. Modelos mais poderosos tendem a ter maior precisão, melhores capacidades de raciocínio e melhor geração de relatórios. Se os recursos computacionais permitirem, priorize o uso de modelos avançados como o1-(mini/preview) ou modelos de linguagem grandes de última geração similares.
118
+
119
+ No entanto, **é importante equilibrar desempenho e custo-benefício**. Embora modelos poderosos possam gerar melhores resultados, eles geralmente são mais caros e consomem mais tempo para serem executados. Considere usá-los seletivamente — por exemplo, para experimentos chave ou análises finais — enquanto confia em modelos menores e mais eficientes para tarefas iterativas ou prototipagem inicial.
120
+
121
+ Quando os recursos são limitados, **otimize ajustando modelos menores** no seu conjunto de dados específico ou combinando modelos pré-treinados com prompts específicos para a tarefa para alcançar o equilíbrio desejado entre desempenho e eficiência computacional.
122
+
123
+ -----
124
+
125
+ #### [Dica #3] ✅ Você pode carregar salvamentos anteriores a partir de checkpoints ✅
126
+
127
+ **Se você perder o progresso, conexão com a internet ou se uma subtarefa falhar, você sempre pode carregar a partir de um estado anterior.** Todo o seu progresso é salvo por padrão na variável state_saves, que armazena cada checkpoint individual. Basta passar os seguintes argumentos ao executar ai_lab_repo.py
128
+
129
+ ```bash
130
+ python ai_lab_repo.py --api-key "API_KEY_AQUI" --research-topic "SUA IDEIA DE PESQUISA" --llm-backend "o1-mini" --load-existing True --load-existing-path "save_states/LOAD_PATH"
131
+ ```
132
+
133
+ -----
134
+
135
+ #### [Dica #4] 🈯 Se você estiver executando em um idioma diferente do inglês 🈲
136
+
137
+ Se você estiver executando o Agent Laboratory em um idioma diferente do inglês, sem problema, apenas certifique-se de fornecer uma flag de idioma para que os agentes realizem a pesquisa no seu idioma preferido. Observe que não estudamos extensivamente a execução do Agent Laboratory em outros idiomas, portanto, certifique-se de relatar quaisquer problemas que encontrar.
138
+
139
+ Por exemplo, se você estiver executando em chinês:
140
+
141
+ ```bash
142
+ python ai_lab_repo.py --api-key "API_KEY_AQUI" --research-topic "SUA IDEIA DE PESQUISA (no seu idioma)" --llm-backend "o1-mini" --language "中文"
143
+ ```
144
+
145
+ ----
146
+
147
+ #### [Dica #5] 🌟 Há muito espaço para melhorias 🌟
148
+
149
+ Há muito espaço para melhorar esta base de código, então se você acabar fazendo alterações e quiser ajudar a comunidade, sinta-se à vontade para compartilhar as mudanças que você fez! Esperamos que esta ferramenta lhe seja útil!
150
+
151
+ ## Referência / Bibtex
152
+
153
+ ```bibtex
154
+ @preprint{schmidgall2025AgentLaboratory,
155
+ title={Agent Laboratory: Using LLM Agents as Research Assistants},
156
+ author={Schmidgall, Samuel and Su, Yusheng and Wang, Ze and Sun, Ximeng and Wu, Jialian and Yu, Xiadong and Liu, Jiang, Liu, Zicheng and Barsoum, Emad},
157
+ year={2025}
158
+ }
159
+ ```
readme/README-russian.md ADDED
@@ -0,0 +1,161 @@
1
+ # Лаборатория Агентов: Использование агентов на основе больших языковых моделей в качестве научных ассистентов
2
+
3
+ <p align="center">
4
+ <img src="../media/AgentLabLogo.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
5
+ </p>
6
+
7
+
8
+ <p align="center">
9
+ 【<a href="../README.md">English</a> | <a href="../readme/README-chinese.md">中文</a> | <a href="../readme/README-japanese.md">日本語</a> | <a href="../readme/README-korean.md">한국어</a> | <a href="../readme/README-filipino.md">Filipino</a> | <a href="../readme/README-french.md">Français</a> | <a href="../readme/README-slovak.md">Slovenčina</a> | <a href="../readme/README-portugese.md">Português</a> | <a href="../readme/README-spanish.md">Español</a> | <a href="../readme/README-turkish.md">Türkçe</a> | <a href="../readme/README-hindi.md">हिंदी</a> | <a href="../readme/README-bengali.md">বাংলা</a> | <a href="../readme/README-vietnamese.md">Tiếng Việt</a> | Русский | <a href="../readme/README-arabic.md">العربية</a> | <a href="../readme/README-farsi.md">فارسی</a> | <a href="../readme/README-italian.md">Italiano</a>】
10
+ </p>
11
+
12
+ <p align="center">
13
+ 【🌐 <a href="https://agentlaboratory.github.io/">Веб-сайт</a> | 💻 <a href="https://github.com/SamuelSchmidgall/AgentLaboratory">Программное обеспечение</a> | 🎥 <a href="https://agentlaboratory.github.io/#youtube-video">Видео</a> | 📚 <a href="https://agentlaboratory.github.io/#examples-goto">Пример статьи</a> | 📰 <a href="https://agentlaboratory.github.io/#citation-ref">Цитирование</a>】
14
+ </p>
15
+
16
+ ## 📖 Обзор
17
+
18
+ - **Лаборатория Агентов** — это автономный исследовательский процесс от начала до конца, предназначенный для помощи **вам** как человеческому исследователю в **реализации ваших исследовательских идей**. Лаборатория Агентов состоит из специализированных агентов, управляемых большими языковыми моделями, которые поддерживают вас на протяжении всего исследовательского процесса — от проведения обзора литературы и формулирования планов до выполнения экспериментов и написания подробных отчетов.
19
+ - Эта система не предназначена для замены вашего творчества, а дополняет его, позволяя вам сосредоточиться на генерации идей и критическом мышлении, одновременно автоматизируя повторяющиеся и времязатратные задачи, такие как кодирование и документирование. Адаптируясь к различным уровням вычислительных ресурсов и вовлеченности человека, Лаборатория Агентов стремится ускорить научные открытия и оптимизировать вашу исследовательскую продуктивность.
20
+
21
+ <p align="center">
22
+ <img src="../media/AgentLab.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
23
+ </p>
24
+
25
+ ### 🔬 Как работает Лаборатория Агентов?
26
+
27
+ - Лаборатория Агентов состоит из трех основных фаз, которые систематически направляют исследовательский процесс: (1) Обзор литературы, (2) Экспериментирование и (3) Написание отчета. В каждой фазе специализированные агенты, управляемые большими языковыми моделями, сотрудничают для достижения отдельных целей, интегрируя внешние инструменты, такие как arXiv, Hugging Face, Python и LaTeX, для оптимизации результатов. Эта структурированная рабочая схема начинается с независимого сбора и анализа соответствующих научных работ, проходит через совместное планирование и подготовку данных и заканчивается автоматизированным проведением экспериментов и созданием подробных отчетов. Детали конкретных ролей агентов и их вклад на каждом этапе обсуждаются в статье.
28
+
29
+ <p align="center">
30
+ <img src="../media/AgentLabWF.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
31
+ </p>
32
+
33
+ ## 🖥️ Установка
34
+
35
+ ### Вариант с использованием Python venv
36
+
37
+ 1. **Клонируйте репозиторий GitHub**: Начните с клонирования репозитория с помощью команды:
38
+ ```bash
39
+ git clone git@github.com:SamuelSchmidgall/AgentLaboratory.git
40
+ ```
41
+
42
+ 2. **Настройте и активируйте Python окружение**
43
+ ```bash
44
+ python -m venv venv_agent_lab
45
+ ```
46
+
47
+ - Теперь активируйте это окружение:
48
+ ```bash
49
+ source venv_agent_lab/bin/activate
50
+ ```
51
+
52
+ 3. **Установите необходимые библиотеки**
53
+ ```bash
54
+ pip install -r requirements.txt
55
+ ```
56
+
57
+ 4. **Установите pdflatex [ОПЦИОНАЛЬНО]**
58
+ ```bash
59
+ sudo apt install pdflatex
60
+ ```
61
+
62
+ - Это позволяет агентам компилировать исходный код LaTeX.
63
+ - **[ВАЖНО]** Если этот шаг невозможно выполнить из-за отсутствия прав sudo, можно отключить компиляцию pdf, запустив Лабораторию Агентов с флагом --compile_latex, установленным в false: `--compile_latex=False`
64
+
65
+ 5. **Теперь запустите Лабораторию Агентов!**
66
+
67
+ ```bash
68
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "ВАША ИССЛЕДОВАТЕЛЬСКАЯ ИДЕЯ"
69
+ ```
70
+
71
+ или, если у вас не установлен pdflatex
72
+
73
+ ```bash
74
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "ВАША ИССЛЕДОВАТЕЛЬСКАЯ ИДЕЯ" --compile_latex=False
75
+ ```
76
+
77
+ -----
78
+
79
+ ## Советы для лучших исследовательских результатов
80
+
81
+ #### [Совет №1] 📝 Обязательно записывайте подробные заметки! 📝
82
+
83
+ **Ведение подробных заметок важно** для того, чтобы ваш агент понимал, что вы хотите достичь в вашем проекте, а также любые предпочтения в стиле. Заметки могут включать любые эксперименты, которые вы хотите, чтобы агенты выполняли, предоставление API-ключей, определенные графики или фигуры, которые вы хотите включить, или любую информацию, которую вы хотите, чтобы агент знал при проведении исследований.
84
+
85
+ Это также ваша возможность сообщить агенту, **какие вычислительные ресурсы у него есть**, например, GPU (сколько, какой тип GPU, сколько GB), CPU (сколько ядер, какой тип CPU), ограничения по памяти и спецификации оборудования.
86
+
87
+ Чтобы добавить заметки, необходимо изменить структуру task_notes_LLM внутри файла ai_lab_repo.py. Ниже приведен пример набора заметок, использованных в некоторых наших экспериментах.
88
+
89
+ ```python
90
+ task_notes_LLM = [
91
+ {"phases": ["plan formulation"],
92
+ "note": f"You should come up with a plan for TWO experiments."},
93
+
94
+ {"phases": ["plan formulation", "data preparation", "running experiments"],
95
+ "note": "Please use gpt-4o-mini for your experiments."},
96
+
97
+ {"phases": ["running experiments"],
98
+ "note": f"Use the following code to inference gpt-4o-mini: \nfrom openai import OpenAI\nos.environ["OPENAI_API_KEY"] = "{api_key}"\nclient = OpenAI()\ncompletion = client.chat.completions.create(\nmodel="gpt-4o-mini-2024-07-18", messages=messages)\nanswer = completion.choices[0].message.content\n"},
99
+
100
+ {"phases": ["running experiments"],
101
+ "note": f"You have access to only gpt-4o-mini using the OpenAI API, please use the following key {api_key} but do not use too many inferences. Do not use openai.ChatCompletion.create or any openai==0.28 commands. Instead use the provided inference code."},
102
+
103
+ {"phases": ["running experiments"],
104
+ "note": "I would recommend using a small dataset (approximately only 100 data points) to run experiments in order to save time. Do not use much more than this unless you have to or are running the final tests."},
105
+
106
+ {"phases": ["data preparation", "running experiments"],
107
+ "note": "You are running on a MacBook laptop. You can use 'mps' with PyTorch"},
108
+
109
+ {"phases": ["data preparation", "running experiments"],
110
+ "note": "Generate figures with very colorful and artistic design."},
111
+ ]
112
+ ```
113
+
114
+ --------
115
+
116
+ #### [Совет №2] 🚀 Использование более мощных моделей обычно приводит к лучшим исследованиям 🚀
117
+
118
+ При проведении исследований, **выбор модели может значительно повлиять на качество результатов**. Более мощные модели, как правило, имеют более высокую точность, лучшие способности к рассуждению и более качественное генерирование отчетов. Если вычислительные ресурсы позволяют, отдавайте предпочтение использованию продвинутых моделей, таких как o1-(mini/preview) или подобных современных больших языковых моделей.
119
+
120
+ Однако, **важно балансировать между производительностью и экономической эффективностью**. Хотя мощные модели могут давать лучшие результаты, они часто дороже и требуют больше времени для выполнения. Рассмотрите возможность использования их выборочно — например, для ключевых экспериментов или окончательных анализов — в то время как для итеративных задач или начального прототипирования полагайтесь на более маленькие и эффективные модели.
121
+
122
+ Когда ресурсы ограничены, **оптимизируйте, дорабатывая более маленькие модели** на вашем конкретном наборе данных или комбинируя предобученные модели с специфическими для задачи подсказками, чтобы достичь желаемого баланса между производительностью и вычислительной эффективностью.
123
+
124
+ -----
125
+
126
+ #### [Совет №3] ✅ Вы можете загрузить предыдущие сохранения из контрольных точек ✅
127
+
128
+ **Если вы потеряете прогресс, потеряете интернет-соединение или если подзадача завершится неудачей, вы всегда можете загрузить предыдущую версию.** Весь ваш прогресс сохраняется по умолчанию в переменной state_saves, которая хранит каждую отдельную контрольную точку. Просто передайте следующие аргументы при запуске ai_lab_repo.py
129
+
130
+ ```bash
131
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "ВАША ИССЛЕДОВАТЕЛЬСКАЯ ИДЕЯ" --llm-backend "o1-mini" --load-existing True --load-existing-path "save_states/LOAD_PATH"
132
+ ```
133
+
134
+ -----
135
+
136
+ #### [Совет №4] 🈯 Если вы работаете на другом языке, кроме английского 🈲
137
+
138
+ Если вы запускаете Лабораторию Агентов на другом языке, кроме английского, это не проблема, просто убедитесь, что вы предоставили языковой флаг агентам для проведения исследований на предпочитаемом вами языке. Обратите внимание, что мы не проводили обширных исследований по запуску Лаборатории Агентов на других языках, поэтому обязательно сообщайте о любых возникающих проблемах.
139
+
140
+ Например, если вы работаете на китайском языке:
141
+
142
+ ```bash
143
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "ВАША ИССЛЕДОВАТЕЛЬСКАЯ ИДЕЯ (на вашем языке)" --llm-backend "o1-mini" --language "中文"
144
+ ```
145
+
146
+ ----
147
+
148
+ #### [Совет №5] 🌟 Есть много возможностей для улучшения 🌟
149
+
150
+ Есть много возможностей для улучшения этой кодовой базы, поэтому если вы внесете изменения и захотите помочь сообществу, пожалуйста, не стесняйтесь поделиться внесенными изменениями! Мы надеемся, что этот инструмент вам поможет!
151
+
152
+ ## Ссылки / Bibtex
153
+
154
155
+ ```bibtex
156
+ @preprint{schmidgall2025AgentLaboratory,
157
+ title={Agent Laboratory: Using LLM Agents as Research Assistants},
158
+ author={Schmidgall, Samuel and Su, Yusheng and Wang, Ze and Sun, Ximeng and Wu, Jialian and Yu, Xiaodong and Liu, Jiang and Liu, Zicheng and Barsoum, Emad},
159
+ year={2025}
160
+ }
161
+ ```
readme/README-slovak.md ADDED
@@ -0,0 +1,157 @@
1
+ # Agent Laboratory: Používanie LLM Agentov ako Výskumných Asistentov
2
+
3
+ <p align="center">
4
+ <img src="../media/AgentLabLogo.png" alt="Demonstrácia toku AgentClinic" style="width: 99%;">
5
+ </p>
6
+
7
+
8
+ <p align="center">
9
+ 【<a href="../README.md">English</a> | <a href="../readme/README-chinese.md">中文</a> | <a href="../readme/README-japanese.md">日本語</a> | <a href="../readme/README-korean.md">한국어</a> | <a href="../readme/README-filipino.md">Filipino</a> | <a href="../readme/README-french.md">Français</a> | Slovenčina | <a href="../readme/README-portugese.md">Português</a> | <a href="../readme/README-spanish.md">Español</a> | <a href="../readme/README-turkish.md">Türkçe</a> | <a href="../readme/README-hindi.md">हिंदी</a> | <a href="../readme/README-bengali.md">বাংলা</a> | <a href="../readme/README-vietnamese.md">Tiếng Việt</a> | <a href="../readme/README-russian.md">Русский</a> | <a href="../readme/README-arabic.md">العربية</a> | <a href="../readme/README-farsi.md">فارسی</a> | <a href="../readme/README-italian.md">Italiano</a>】
10
+ </p>
11
+
12
+ <p align="center">
13
+ 【🌐 <a href="https://agentlaboratory.github.io/">Webová stránka</a> | 💻 <a href="https://github.com/SamuelSchmidgall/AgentLaboratory">Softvér</a> | 🎥 <a href="https://agentlaboratory.github.io/#youtube-video">Video</a> | 📚 <a href="https://agentlaboratory.github.io/#examples-goto">Príkladový článok</a> | 📰 <a href="https://agentlaboratory.github.io/#citation-ref">Citácia</a>】
14
+ </p>
15
+
16
+ ## 📖 Prehľad
17
+
18
+ - **Agent Laboratory** je autonómny výskumný pracovný postup od začiatku do konca, ktorý má za úlohu asistovať **vám** ako ľudskému výskumníkovi pri **realizácii vašich výskumných nápadov**. Agent Laboratory pozostáva zo špecializovaných agentov poháňaných veľkými jazykovými modelmi, ktorí vás podporujú počas celého výskumného procesu – od vykonávania literárnych prehľadov a formulovania plánov až po realizáciu experimentov a písanie komplexných správ.
19
+ - Tento systém nie je navrhnutý na nahradenie vašej kreativity, ale na jej doplnenie, čo vám umožňuje sústrediť sa na tvorivosť a kritické myslenie pri automatizácii opakujúcich sa a časovo náročných úloh, ako je kódovanie a dokumentácia. Tým, že zohľadňuje rôzne úrovne výpočtových zdrojov a ľudského zapojenia, Agent Laboratory má za cieľ urýchliť vedecké objavy a optimalizovať vašu výskumnú produktivitu.
20
+
21
+ <p align="center">
22
+ <img src="../media/AgentLab.png" alt="Demonstrácia toku AgentClinic" style="width: 99%;">
23
+ </p>
24
+
25
+ ### 🔬 Ako Agent Laboratory funguje?
26
+
27
+ - Agent Laboratory sa skladá z troch hlavných fáz, ktoré systematicky usmerňujú výskumný proces: (1) Literárny prehľad, (2) Experimentovanie a (3) Písanie správ. Počas každej fázy špecializovaní agenti poháňaní LLM spolupracujú na dosiahnutí konkrétnych cieľov, integrujúc externé nástroje ako arXiv, Hugging Face, Python a LaTeX na optimalizáciu výsledkov. Táto štruktúrovaná pracovná postupnosť začína nezávislým zhromažďovaním a analýzou relevantných výskumných prác, pokračuje cez kolaboratívne plánovanie a prípravu dát a končí automatizovaným experimentovaním a komplexnou generáciou správ. Podrobnosti o konkrétnych rolách agentov a ich príspevkoch v rámci týchto fáz sú diskutované v článku.
28
+
29
+ <p align="center">
30
+ <img src="../media/AgentLabWF.png" alt="Demonstrácia toku AgentClinic" style="width: 99%;">
31
+ </p>
32
+
33
+ ## 🖥️ Inštalácia
34
+
35
+ ### Python venv možnosť
36
+
37
+ 1. **Naklonujte GitHub repozitár**: Začnite klonovaním repozitára pomocou príkazu:
38
+ ```bash
39
+ git clone git@github.com:SamuelSchmidgall/AgentLaboratory.git
40
+ ```
41
+
42
+ 2. **Nastavte a aktivujte Python prostredie**
43
+ ```bash
44
+ python -m venv venv_agent_lab
45
+ ```
46
+
47
+ - Teraz aktivujte toto prostredie:
48
+ ```bash
49
+ source venv_agent_lab/bin/activate
50
+ ```
51
+
52
+ 3. **Nainštalujte požadované knižnice**
53
+ ```bash
54
+ pip install -r requirements.txt
55
+ ```
56
+
57
+ 4. **Nainštalujte pdflatex [VOLITEĽNÉ]**
58
+ ```bash
59
+ sudo apt install pdflatex
60
+ ```
61
+
62
+ - Toto umožňuje agentom kompilovať latex zdroj.
63
+ - **[DÔLEŽITÉ]** Ak tento krok nemôžete vykonať kvôli absencii sudo prístupu, kompiláciu pdf môžete vypnúť spustením Agent Laboratory s nastavením vlajky --compile_latex na false: `--compile_latex=False`
64
+
65
+ 5. **Teraz spustite Agent Laboratory!**
66
+ ```bash
67
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA"
68
+ ```
69
+
70
+ alebo, ak nemáte nainštalovaný pdflatex
71
+ ```bash
72
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA" --compile_latex=False
73
+ ```
74
+
75
+ -----
76
+ ## Tipy pre lepšie výskumné výsledky
77
+
78
+ #### [Tip #1] 📝 Uistite sa, že píšete rozsiahle poznámky! 📝
79
+
80
+ **Písanie rozsiahlych poznámok je dôležité** pre pomoc vášmu agentovi pochopiť, čo sa snažíte dosiahnuť vo vašom projekte, ako aj akékoľvek preferencie štýlu. Poznámky môžu obsahovať akékoľvek experimenty, ktoré chcete, aby agenti vykonali, poskytovanie API kľúčov, určité grafy alebo figúry, ktoré chcete zahrnúť, alebo čokoľvek, čo chcete, aby agent vedel pri vykonávaní výskumu.
81
+
82
+ Je to tiež vaša príležitosť informovať agenta, **aké výpočtové zdroje má k dispozícii**, napr. GPU (koľko, aký typ GPU, koľko GB), CPU (koľko jadier, aký typ CPU), obmedzenia úložiska a hardvérové špecifikácie.
83
+
84
+ Aby ste pridali poznámky, musíte upraviť štruktúru `task_notes_LLM` v súbore `ai_lab_repo.py`. Nižšie je uvedený príklad sady poznámok použitých pre niektoré naše experimenty.
85
+
86
+ ```python
87
+ task_notes_LLM = [
88
+ {"phases": ["plan formulation"],
89
+ "note": f"You should come up with a plan for TWO experiments."},
90
+
91
+ {"phases": ["plan formulation", "data preparation", "running experiments"],
92
+ "note": "Please use gpt-4o-mini for your experiments."},
93
+
94
+ {"phases": ["running experiments"],
95
+ "note": f"Use the following code to inference gpt-4o-mini: \nfrom openai import OpenAI\nos.environ["OPENAI_API_KEY"] = "{api_key}"\nclient = OpenAI()\ncompletion = client.chat.completions.create(\nmodel="gpt-4o-mini-2024-07-18", messages=messages)\nanswer = completion.choices[0].message.content\n"},
96
+
97
+ {"phases": ["running experiments"],
98
+ "note": f"You have access to only gpt-4o-mini using the OpenAI API, please use the following key {api_key} but do not use too many inferences. Do not use openai.ChatCompletion.create or any openai==0.28 commands. Instead use the provided inference code."},
99
+
100
+ {"phases": ["running experiments"],
101
+ "note": "I would recommend using a small dataset (approximately only 100 data points) to run experiments in order to save time. Do not use much more than this unless you have to or are running the final tests."},
102
+
103
+ {"phases": ["data preparation", "running experiments"],
104
+ "note": "You are running on a MacBook laptop. You can use 'mps' with PyTorch"},
105
+
106
+ {"phases": ["data preparation", "running experiments"],
107
+ "note": "Generate figures with very colorful and artistic design."},
108
+ ]
109
+ ```
110
+
111
+ --------
112
+
113
+ #### [Tip #2] 🚀 Používanie výkonnejších modelov zvyčajne vedie k lepšiemu výskumu 🚀
114
+
115
+ Pri vykonávaní výskumu môže **výber modelu významne ovplyvniť kvalitu výsledkov**. Výkonnejšie modely majú tendenciu mať vyššiu presnosť, lepšie schopnosti logického uvažovania a lepšiu generáciu správ. Ak výpočtové zdroje umožňujú, uprednostnite používanie pokročilých modelov, ako sú o1-(mini/preview) alebo podobné najmodernejšie veľké jazykové modely.
116
+
117
+ Avšak, **je dôležité nájsť rovnováhu medzi výkonom a nákladovou efektívnosťou**. Zatiaľ čo výkonnejšie modely môžu priniesť lepšie výsledky, často sú drahšie a časovo náročnejšie na spustenie. Zvážte ich selektívne používanie – napríklad pre kľúčové experimenty alebo konečné analýzy – zatiaľ čo na iteratívne úlohy alebo počiatočné prototypovanie sa spoliehajte na menšie, efektívnejšie modely.
118
+
119
+ Keď sú zdroje obmedzené, **optimalizujte jemným ladením menších modelov** na vašich špecifických dátach alebo kombinovaním predtrénovaných modelov s úlohovo špecifickými promptami, aby ste dosiahli požadovanú rovnováhu medzi výkonom a výpočtovou efektívnosťou.
120
+
121
+ -----
122
+
123
+ #### [Tip #3] ✅ Môžete načítať predchádzajúce uloženia z kontrolných bodov ✅
124
+
125
+ **Ak stratíte postup, internetové pripojenie alebo ak sa podúloha nepodarí, môžete vždy načítať z predchádzajúceho stavu.** Všetok váš postup je predvolene uložený v premennej `state_saves`, ktorá ukladá každý jednotlivý kontrolný bod. Stačí pri spúšťaní `ai_lab_repo.py` zadať nasledujúce argumenty:
126
+
127
+ ```bash
128
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA" --llm-backend "o1-mini" --load-existing True --load-existing-path "save_states/LOAD_PATH"
129
+ ```
130
+
131
+ -----
132
+
133
+ #### [Tip #4] 🈯 Ak pracujete v inom jazyku než angličtine 🈲
134
+
135
+ Ak spúšťate Agent Laboratory v inom jazyku než v angličtine, nie je problém, stačí zabezpečiť, aby ste agentom poskytli jazykovú vlajku pre vykonávanie výskumu vo vašom preferovanom jazyku. Všimnite si, že sme neštudovali dôkladne spúšťanie Agent Laboratory v iných jazykoch, preto určite hláste akékoľvek problémy, na ktoré narazíte.
136
+
137
+ Napríklad, ak pracujete v čínštine:
138
+
139
+ ```bash
140
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA (in your language)" --llm-backend "o1-mini" --language "中文"
141
+ ```
142
+
143
+ ----
144
+
145
+ #### [Tip #5] 🌟 Je tu veľa priestoru na zlepšenie 🌟
146
+
147
+ Je tu veľa priestoru na zlepšenie tohto kódu, takže ak urobíte zmeny a chcete pomôcť komunite, neváhajte zdieľať zmeny, ktoré ste vykonali! Dúfame, že vám tento nástroj pomôže!
148
+
149
+ ## Reference / Bibtex
150
+
151
+ ```bibtex
152
+ @preprint{schmidgall2025AgentLaboratory,
153
+ title={Agent Laboratory: Using LLM Agents as Research Assistants},
154
+ author={Schmidgall, Samuel and Su, Yusheng and Wang, Ze and Sun, Ximeng and Wu, Jialian and Yu, Xiaodong and Liu, Jiang and Liu, Zicheng and Barsoum, Emad},
155
+ year={2025}
156
+ }
157
+ ```
readme/README-spanish.md ADDED
@@ -0,0 +1,168 @@
1
+ # Agent Laboratory: Using LLM Agents as Research Assistants
2
+
3
+
4
+ <p align="center">
5
+ <img src="../media/AgentLabLogo.png" alt="Demostración del flujo de AgentClinic" style="width: 99%;">
6
+ </p>
7
+
8
+
9
+ <p align="center">
10
+ 【<a href="../README.md">English</a> | <a href="../readme/README-chinese.md">中文</a> | <a href="../readme/README-japanese.md">日本語</a> | <a href="../readme/README-korean.md">한국어</a> | <a href="../readme/README-filipino.md">Filipino</a> | <a href="../readme/README-french.md">Français</a> | <a href="../readme/README-slovak.md">Slovenčina</a> | <a href="../readme/README-portugese.md">Português</a> | Español | <a href="../readme/README-turkish.md">Türkçe</a> | <a href="../readme/README-hindi.md">हिंदी</a> | <a href="../readme/README-bengali.md">বাংলা</a> | <a href="../readme/README-vietnamese.md">Tiếng Việt</a> | <a href="../readme/README-russian.md">Русский</a> | <a href="../readme/README-arabic.md">العربية</a> | <a href="../readme/README-farsi.md">فارسی</a> | <a href="../readme/README-italian.md">Italiano</a>】
11
+ </p>
12
+
13
+ <p align="center">
14
+ 【🌐 <a href="https://agentlaboratory.github.io/">Sitio web</a> | 💻 <a href="https://github.com/SamuelSchmidgall/AgentLaboratory">Software</a> | 🎥 <a href="https://agentlaboratory.github.io/#youtube-video">Video</a> | 📚 <a href="https://agentlaboratory.github.io/#examples-goto">Artículo de ejemplo</a> | 📰 <a href="https://agentlaboratory.github.io/#citation-ref">Citación</a>】
15
+ </p>
16
+
17
+ ## 📖 Overview
18
+
19
+ - **Agent Laboratory** es un flujo de trabajo de investigación autónomo de extremo a extremo diseñado para asistir **a ti** como investigador humano en **implementar tus ideas de investigación**. Agent Laboratory consiste en agentes especializados impulsados por grandes modelos de lenguaje para apoyarte a lo largo de todo el flujo de trabajo de investigación, desde la realización de revisiones bibliográficas y la formulación de planes hasta la ejecución de experimentos y la redacción de informes comprensivos.
20
+ - Este sistema no está diseñado para reemplazar tu creatividad, sino para complementarla, permitiéndote enfocarte en la ideación y el pensamiento crítico mientras automatiza tareas repetitivas y que consumen mucho tiempo, como la programación y la documentación. Al acomodar diferentes niveles de recursos computacionales e implicación humana, Agent Laboratory tiene como objetivo acelerar el descubrimiento científico y optimizar tu productividad en la investigación.
21
+
22
+ <p align="center">
23
+ <img src="../media/AgentLab.png" alt="Demostración del flujo de AgentClinic" style="width: 99%;">
24
+ </p>
25
+
26
+ ### 🔬 How does Agent Laboratory work?
27
+
28
+ - Agent Laboratory consta de tres fases principales que guían sistemáticamente el proceso de investigación: (1) Revisión de Literatura, (2) Experimentación y (3) Redacción de Informes. Durante cada fase, agentes especializados impulsados por LLM colaboran para lograr objetivos distintos, integrando herramientas externas como arXiv, Hugging Face, Python y LaTeX para optimizar los resultados. Este flujo de trabajo estructurado comienza con la recolección y análisis independiente de artículos de investigación relevantes, avanza a través de la planificación colaborativa y la preparación de datos, y culmina en la experimentación automatizada y la generación de informes comprensivos. Los detalles sobre roles específicos de los agentes y sus contribuciones a lo largo de estas fases se discuten en el documento.
29
+
30
+ <p align="center">
31
+ <img src="../media/AgentLabWF.png" alt="Demostración del flujo de AgentClinic" style="width: 99%;">
32
+ </p>
33
+
34
+ ## 🖥️ Installation
35
+
36
+ ### Python venv option
37
+
38
+
39
+ 1. **Clonar el Repositorio de GitHub**: Comienza clonando el repositorio usando el comando:
40
+ ```bash
41
+ git clone git@github.com:SamuelSchmidgall/AgentLaboratory.git
42
+ ```
43
+
44
+
45
+ 2. **Configurar y Activar el Entorno de Python**
46
+ ```bash
47
+ python -m venv venv_agent_lab
48
+ ```
49
+
50
+ - Ahora activa este entorno:
51
+ ```bash
52
+ source venv_agent_lab/bin/activate
53
+ ```
54
+
55
+
56
+ 3. **Instalar las librerías requeridas**
57
+ ```bash
58
+ pip install -r requirements.txt
59
+ ```
60
+
61
+
62
+ 4. **Instalar pdflatex [OPCIONAL]**
63
+ ```bash
64
+ sudo apt install pdflatex
65
+ ```
66
+
67
+ - Esto permite que las fuentes de LaTeX sean compiladas por los agentes.
68
+ - **[IMPORTANTE]** Si no puedes ejecutar este paso debido a la falta de acceso sudo, la compilación de PDF puede desactivarse ejecutando Agent Laboratory configurando la bandera `--compile_latex` a falso: `--compile_latex=False`
69
+
70
+
71
+ 5. **¡Ahora ejecuta Agent Laboratory!**
72
+
73
+ ```bash
74
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA"
75
+ ```
76
+
77
+ o, si no tienes pdflatex instalado
78
+
79
+ ```bash
80
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA" --compile_latex=False
81
+ ```
82
+
83
+ -----
84
+ ## Consejos para mejores resultados de investigación
85
+
86
+
87
+ #### [Consejo #1] 📝 ¡Asegúrate de escribir notas extensas! 📝
88
+
89
+ **Escribir notas extensas es importante** para ayudar a tu agente a comprender lo que buscas lograr en tu proyecto, así como cualquier preferencia de estilo. Las notas pueden incluir cualquier experimento que desees que los agentes realicen, proporcionar claves de API, ciertos gráficos o figuras que quieras incluir, o cualquier cosa que quieras que el agente sepa al realizar la investigación.
90
+
91
+ Esta también es tu oportunidad para informar al agente **a qué recursos computacionales tiene acceso**, por ejemplo, GPUs (cuántas, qué tipo de GPU, cuántos GB), CPUs (cuántos núcleos, qué tipo de CPUs), limitaciones de almacenamiento y especificaciones de hardware.
92
+
93
+ Para agregar notas, debes modificar la estructura `task_notes_LLM` dentro de `ai_lab_repo.py`. A continuación se proporciona un ejemplo de conjunto de notas utilizadas en algunos de nuestros experimentos.
94
+
95
+ ```python
96
+ task_notes_LLM = [
97
+ {"phases": ["plan formulation"],
98
+ "note": f"You should come up with a plan for TWO experiments."},
99
+
100
+ {"phases": ["plan formulation", "data preparation", "running experiments"],
101
+ "note": "Please use gpt-4o-mini for your experiments."},
102
+
103
+ {"phases": ["running experiments"],
104
+ "note": f"Use the following code to inference gpt-4o-mini: \nfrom openai import OpenAI\nos.environ["OPENAI_API_KEY"] = "{api_key}"\nclient = OpenAI()\ncompletion = client.chat.completions.create(\nmodel="gpt-4o-mini-2024-07-18", messages=messages)\nanswer = completion.choices[0].message.content\n"},
105
+
106
+ {"phases": ["running experiments"],
107
+ "note": f"You have access to only gpt-4o-mini using the OpenAI API, please use the following key {api_key} but do not use too many inferences. Do not use openai.ChatCompletion.create or any openai==0.28 commands. Instead use the provided inference code."},
108
+
109
+ {"phases": ["running experiments"],
110
+ "note": "I would recommend using a small dataset (approximately only 100 data points) to run experiments in order to save time. Do not use much more than this unless you have to or are running the final tests."},
111
+
112
+ {"phases": ["data preparation", "running experiments"],
113
+ "note": "You are running on a MacBook laptop. You can use 'mps' with PyTorch"},
114
+
115
+ {"phases": ["data preparation", "running experiments"],
116
+ "note": "Generate figures with very colorful and artistic design."},
117
+ ]
118
+ ```
119
+
120
+ --------
121
+
122
+ #### [Consejo #2] 🚀 ¡Usar modelos más potentes generalmente conduce a una mejor investigación! 🚀
123
+
124
+ Al realizar investigaciones, **la elección del modelo puede impactar significativamente la calidad de los resultados**. Los modelos más potentes tienden a tener mayor precisión, mejores capacidades de razonamiento y mejor generación de informes. Si los recursos computacionales lo permiten, prioriza el uso de modelos avanzados como o1-(mini/preview) o modelos de lenguaje grandes similares de última generación.
125
+
126
+ Sin embargo, **es importante equilibrar el rendimiento y la rentabilidad**. Aunque los modelos potentes pueden ofrecer mejores resultados, a menudo son más costosos y requieren más tiempo para ejecutarse. Considera usarlos de manera selectiva, por ejemplo, para experimentos clave o análisis finales, mientras confías en modelos más pequeños y eficientes para tareas iterativas o prototipos iniciales.
127
+
128
+ Cuando los recursos son limitados, **optimiza ajustando finamente modelos más pequeños** en tu conjunto de datos específico o combinando modelos preentrenados con prompts específicos para tareas para lograr el equilibrio deseado entre rendimiento y eficiencia computacional.
129
+
130
+ -----
131
+
132
+ #### [Consejo #3] ✅ Puedes cargar guardados anteriores desde puntos de control ✅
133
+
134
+ **Si pierdes progreso, la conexión a internet o si una subtarea falla, siempre puedes cargar desde un estado anterior.** Todo tu progreso se guarda por defecto en la variable `state_saves`, que almacena cada punto de control individual. Simplemente pasa los siguientes argumentos al ejecutar `ai_lab_repo.py`
135
+
136
+ ```bash
137
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA" --llm-backend "o1-mini" --load-existing True --load-existing-path "save_states/LOAD_PATH"
138
+ ```
139
+
140
+ -----
141
+
142
+ #### [Consejo #4] 🈯 Si estás ejecutando en un idioma que no sea inglés 🈲
143
+
144
+ Si estás ejecutando Agent Laboratory en un idioma que no sea inglés, no hay problema, solo asegúrate de proporcionar una bandera de idioma a los agentes para realizar la investigación en tu idioma preferido. Ten en cuenta que no hemos estudiado extensivamente la ejecución de Agent Laboratory en otros idiomas, así que asegúrate de reportar cualquier problema que encuentres.
145
+
146
+ Por ejemplo, si estás ejecutando en chino:
147
+
148
+ ```bash
149
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA (in your language)" --llm-backend "o1-mini" --language "中文"
150
+ ```
151
+
152
+ ----
153
+
154
+ #### [Consejo #5] 🌟 Hay mucho margen para mejorar 🌟
155
+
156
+ Hay mucho margen para mejorar esta base de código, así que si terminas haciendo cambios y quieres ayudar a la comunidad, ¡no dudes en compartir los cambios que has realizado! ¡Esperamos que esta herramienta te sea de ayuda!
157
+
158
+ ## Referencia / Bibtex
159
+
160
+
161
+
162
+ ```bibtex
163
+ @preprint{schmidgall2025AgentLaboratory,
164
+ title={Agent Laboratory: Using LLM Agents as Research Assistants},
165
+ author={Schmidgall, Samuel and Su, Yusheng and Wang, Ze and Sun, Ximeng and Wu, Jialian and Yu, Xiaodong and Liu, Jiang and Liu, Zicheng and Barsoum, Emad},
166
+ year={2025}
167
+ }
168
+ ```
readme/README-turkish.md ADDED
@@ -0,0 +1,158 @@
1
+ # Agent Laboratuvarı: LLM Ajanlarını Araştırma Asistanı Olarak Kullanma
2
+
3
+ <p align="center">
4
+ <img src="../media/AgentLabLogo.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
5
+ </p>
6
+
7
+
8
+ <p align="center">
9
+ 【<a href="../README.md">English</a> | <a href="../readme/README-chinese.md">中文</a> | <a href="../readme/README-japanese.md">日本語</a> | <a href="../readme/README-korean.md">한국어</a> | <a href="../readme/README-filipino.md">Filipino</a> | <a href="../readme/README-french.md">Français</a> | <a href="../readme/README-slovak.md">Slovenčina</a> | <a href="../readme/README-portugese.md">Português</a> | <a href="../readme/README-spanish.md">Español</a> | Türkçe | <a href="../readme/README-hindi.md">हिंदी</a> | <a href="../readme/README-bengali.md">বাংলা</a> | <a href="../readme/README-vietnamese.md">Tiếng Việt</a> | <a href="../readme/README-russian.md">Русский</a> | <a href="../readme/README-arabic.md">العربية</a> | <a href="../readme/README-farsi.md">فارسی</a> | <a href="../readme/README-italian.md">Italiano</a>】
10
+ </p>
11
+
12
+ <p align="center">
13
+ 【🌐 <a href="https://agentlaboratory.github.io/">Website</a> | 💻 <a href="https://github.com/SamuelSchmidgall/AgentLaboratory">Software</a> | 🎥 <a href="https://agentlaboratory.github.io/#youtube-video">Video</a> | 📚 <a href="https://agentlaboratory.github.io/#examples-goto">Example Paper</a> | 📰 <a href="https://agentlaboratory.github.io/#citation-ref">Citation</a>】
14
+ </p>
15
+
16
+ ## 📖 Genel Bakış
17
+
18
+ - **Agent Laboratuvarı**, **araştırma fikirlerinizi uygulamanıza** yardımcı olmak amacıyla **siz** insan araştırmacıyı desteklemek için tasarlanmış uçtan uca otonom bir araştırma iş akışıdır. Agent Laboratuvarı, literatür taramaları yapmaktan planlar oluşturmaya, deneyler yürütmekten kapsamlı raporlar yazmaya kadar tüm araştırma süreci boyunca sizi desteklemek için büyük dil modelleriyle desteklenen uzman ajanlardan oluşur.
19
+ - Bu sistem, yaratıcılığınızı yerine koymak için değil, onu tamamlamak için tasarlanmıştır; böylece kodlama ve dokümantasyon gibi tekrarlayan ve zaman alıcı görevleri otomatikleştirirken, fikir üretimi ve eleştirel düşünmeye odaklanabilirsiniz. Farklı düzeylerde hesaplama kaynakları ve insan katılımını karşılayarak, Agent Laboratuvarı bilimsel keşfi hızlandırmayı ve araştırma verimliliğinizi optimize etmeyi amaçlamaktadır.
20
+
21
+ <p align="center">
22
+ <img src="../media/AgentLab.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
23
+ </p>
24
+
25
+ ### 🔬 Agent Laboratuvarı Nasıl Çalışır?
26
+
27
+ - Agent Laboratuvarı, araştırma sürecini sistematik olarak yönlendiren üç ana aşamadan oluşur: (1) Literatür Taraması, (2) Deney Yapma ve (3) Rapor Yazımı. Her aşamada, LLM'ler tarafından yönlendirilen uzman ajanlar, arXiv, Hugging Face, Python ve LaTeX gibi dış araçları entegre ederek farklı hedeflere ulaşmak için iş birliği yapar ve sonuçları optimize eder. Bu yapılandırılmış iş akışı, ilgili araştırma makalelerinin bağımsız olarak toplanması ve analiz edilmesiyle başlar, ortak planlama ve veri hazırlama aşamalarından geçer ve otomatik deney yapma ile kapsamlı rapor oluşturma ile sona erer. Bu aşamalarda belirli ajan rollerinin ve katkılarının detayları makalede tartışılmaktadır.
28
+
29
+ <p align="center">
30
+ <img src="../media/AgentLabWF.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
31
+ </p>
32
+
33
+ ## 🖥️ Kurulum
34
+
35
+ ### Python venv seçeneği
36
+
37
+ 1. **GitHub Deposu Klonlayın**: Depoyu aşağıdaki komutu kullanarak klonlayarak başlayın:
38
+ ```bash
39
+ git clone git@github.com:SamuelSchmidgall/AgentLaboratory.git
40
+ ```
41
+
42
+ 2. **Python Ortamını Kurun ve Aktif Hale Getirin**
43
+ ```bash
44
+ python -m venv venv_agent_lab
45
+ ```
46
+
47
+ - Şimdi bu ortamı etkinleştirin:
48
+ ```bash
49
+ source venv_agent_lab/bin/activate
50
+ ```
51
+
52
+ 3. **Gerekli Kütüphaneleri Yükleyin**
53
+ ```bash
54
+ pip install -r requirements.txt
55
+ ```
56
+
57
+ 4. **pdflatex'i Yükleyin [İSTEĞE BAĞLI]**
58
+ ```bash
59
+ sudo apt install pdflatex
60
+ ```
61
+
62
+ - Bu, ajanların LaTeX kaynaklarını derleyebilmesini sağlar.
63
+ - **[ÖNEMLİ]** Bu adımı sudo erişiminiz yoksa çalıştıramıyorsanız, Agent Laboratuvarı'nı çalıştırırken --compile_latex bayrağını false olarak ayarlayarak PDF derlemeyi kapatabilirsiniz: `--compile_latex=False`
64
+
65
+ 5. **Şimdi Agent Laboratuvarı'nı Çalıştırın!**
66
+ ```bash
67
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA"
68
+ ```
69
+
70
+ veya, pdflatex yüklü değilse
71
+
72
+ ```bash
73
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA" --compile_latex=False
74
+ ```
75
+
76
+ -----
77
+ ## Daha İyi Araştırma Sonuçları için İpuçları
78
+
79
+ #### [İpucu #1] 📝 Kapsamlı Notlar Yazdığınızdan Emin Olun! 📝
80
+
81
+ **Kapsamlı notlar yazmak**, ajanın projenizde neyi başarmak istediğinizi ve herhangi bir stil tercihlerinizi anlamasına yardımcı olduğu için önemlidir. Notlar, ajanların gerçekleştirmesini istediğiniz deneyler, API anahtarları sağlamak, dahil edilmesini istediğiniz belirli grafikler veya figürler veya araştırma yaparken ajanın bilmesi gereken her şey gibi unsurları içerebilir.
82
+
83
+ Ayrıca, ajana **erişebileceği hesaplama kaynaklarını** bildirmeniz için bir fırsattır, örneğin GPU'lar (kaç tane, hangi tür GPU, kaç GB), CPU'lar (kaç çekirdek, hangi tür CPU'lar), depolama sınırlamaları ve donanım özellikleri.
84
+
85
+ Not eklemek için, ai_lab_repo.py içindeki task_notes_LLM yapısını değiştirmeniz gerekir. Aşağıda, bazı deneylerimizde kullanılan örnek notlar verilmiştir.
86
+
87
+ ```python
88
+ task_notes_LLM = [
89
+ {"phases": ["plan formulation"],
90
+ "note": f"You should come up with a plan for TWO experiments."},
91
+
92
+ {"phases": ["plan formulation", "data preparation", "running experiments"],
93
+ "note": "Please use gpt-4o-mini for your experiments."},
94
+
95
+ {"phases": ["running experiments"],
96
+ "note": f"Use the following code to inference gpt-4o-mini: \nfrom openai import OpenAI\nos.environ["OPENAI_API_KEY"] = "{api_key}"\nclient = OpenAI()\ncompletion = client.chat.completions.create(\nmodel="gpt-4o-mini-2024-07-18", messages=messages)\nanswer = completion.choices[0].message.content\n"},
97
+
98
+ {"phases": ["running experiments"],
99
+ "note": f"You have access to only gpt-4o-mini using the OpenAI API, please use the following key {api_key} but do not use too many inferences. Do not use openai.ChatCompletion.create or any openai==0.28 commands. Instead use the provided inference code."},
100
+
101
+ {"phases": ["running experiments"],
102
+ "note": "I would recommend using a small dataset (approximately only 100 data points) to run experiments in order to save time. Do not use much more than this unless you have to or are running the final tests."},
103
+
104
+ {"phases": ["data preparation", "running experiments"],
105
+ "note": "You are running on a MacBook laptop. You can use 'mps' with PyTorch"},
106
+
107
+ {"phases": ["data preparation", "running experiments"],
108
+ "note": "Generate figures with very colorful and artistic design."},
109
+ ]
110
+ ```
111
+
112
+ --------
113
+
114
+ #### [İpucu #2] 🚀 Daha Güçlü Modeller Kullanmak Genellikle Daha İyi Araştırma Sonuçlarına Yol Açar 🚀
115
+
116
+ Araştırma yaparken, **model seçimi sonuçların kalitesi üzerinde önemli bir etkiye sahip olabilir**. Daha güçlü modeller genellikle daha yüksek doğruluk, daha iyi akıl yürütme yetenekleri ve daha iyi rapor oluşturma özelliklerine sahiptir. Hesaplama kaynaklarınız izin veriyorsa, o1-(mini/preview) gibi gelişmiş modellerin veya benzeri en son büyük dil modellerinin kullanımını önceliklendirin.
117
+
118
+ Ancak, **performans ve maliyet etkinliği arasında denge kurmak önemlidir**. Güçlü modeller daha iyi sonuçlar verebilirken, genellikle çalıştırmaları daha pahalı ve zaman alıcıdır. Bunları seçici olarak kullanmayı düşünün—örneğin, ana deneyler veya son analizler için—iteratif görevler veya ilk prototipler için daha küçük, daha verimli modelleri kullanmaya devam edin.
119
+
120
+ Kaynaklar sınırlı olduğunda, **daha küçük modelleri özel veri setinizde ince ayar yaparak veya görev odaklı istemlerle önceden eğitilmiş modelleri birleştirerek performans ve hesaplama verimliliği arasında istenen dengeyi sağlayın**.
121
+
122
+ -----
123
+
124
+ #### [İpucu #3] ✅ Önceki Kontrol Noktalarından Kaydedilenleri Yükleyebilirsiniz ✅
125
+
126
+ **İlerlemenizi kaybederseniz, internet bağlantınız kesilirse veya bir alt görev başarısız olursa, her zaman önceki bir durumdan yükleme yapabilirsiniz.** Tüm ilerlemeniz varsayılan olarak her bir kontrol noktasını saklayan state_saves değişkeninde kaydedilir. ai_lab_repo.py çalıştırılırken aşağıdaki argümanları geçmeniz yeterlidir:
127
+
128
+ ```bash
129
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA" --llm-backend "o1-mini" --load-existing True --load-existing-path "save_states/LOAD_PATH"
130
+ ```
131
+
132
+ -----
133
+
134
+ #### [İpucu #4] 🈯 İngilizce Dışında Bir Dil Kullanıyorsanız 🈲
135
+
136
+ Agent Laboratuvarı'nı İngilizce dışında bir dilde çalıştırıyorsanız sorun yok, sadece ajanlara araştırmayı tercih ettiğiniz dilde gerçekleştirmeleri için bir dil bayrağı sağlamanız yeterlidir. Agent Laboratuvarı'nı diğer dillerde çalıştırmayı kapsamlı bir şekilde incelemediğimizi unutmayın, bu yüzden karşılaştığınız herhangi bir problemi bildirdiğinizden emin olun.
137
+
138
+ Örneğin, Çincede çalıştırıyorsanız:
139
+
140
+ ```bash
141
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA (in your language)" --llm-backend "o1-mini" --language "中文"
142
+ ```
143
+
144
+ ----
145
+
146
+ #### [İpucu #5] 🌟 Geliştirme İçin Çok Fazla Alan Var 🌟
147
+
148
+ Bu kod tabanını geliştirmek için çok fazla alan var, bu yüzden değişiklik yaparsanız ve topluluğa yardımcı olmak isterseniz, yaptığınız değişiklikleri paylaşmaktan çekinmeyin! Umarız bu araç size yardımcı olur!
149
+
150
+ ## Referans / Bibtex
151
+
152
+ ```bibtex
153
+ @preprint{schmidgall2025AgentLaboratory,
154
+ title={Agent Laboratory: Using LLM Agents as Research Assistants},
155
+ author={Schmidgall, Samuel and Su, Yusheng and Wang, Ze and Sun, Ximeng and Wu, Jialian and Yu, Xiaodong and Liu, Jiang and Liu, Zicheng and Barsoum, Emad},
156
+ year={2025}
157
+ }
158
+ ```
readme/README-vietnamese.md ADDED
@@ -0,0 +1,163 @@
1
+ # Agent Laboratory: Sử dụng Đại Diện LLM làm Trợ Lý Nghiên Cứu
2
+
3
+ <p align="center">
4
+ <img src="../media/AgentLabLogo.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
5
+ </p>
6
+
7
+
8
+ <p align="center">
9
+ 【<a href="../README.md">English</a> | <a href="../readme/README-chinese.md">中文</a> | <a href="../readme/README-japanese.md">日本語</a> | <a href="../readme/README-korean.md">한국어</a> | <a href="../readme/README-filipino.md">Filipino</a> | <a href="../readme/README-french.md">Français</a> | <a href="../readme/README-slovak.md">Slovenčina</a> | <a href="../readme/README-portugese.md">Português</a> | <a href="../readme/README-spanish.md">Español</a> | <a href="../readme/README-turkish.md">Türkçe</a> | <a href="../readme/README-hindi.md">हिंदी</a> | <a href="../readme/README-bengali.md">বাংলা</a> | Tiếng Việt | <a href="../readme/README-russian.md">Русский</a> | <a href="../readme/README-arabic.md">العربية</a> | <a href="../readme/README-farsi.md">فارسی</a> | <a href="../readme/README-italian.md">Italiano</a>】
10
+ </p>
11
+
12
+ <p align="center">
13
+ 【🌐 <a href="https://agentlaboratory.github.io/">Website</a> | 💻 <a href="https://github.com/SamuelSchmidgall/AgentLaboratory">Software</a> | 🎥 <a href="https://agentlaboratory.github.io/#youtube-video">Video</a> | 📚 <a href="https://agentlaboratory.github.io/#examples-goto">Example Paper</a> | 📰 <a href="https://agentlaboratory.github.io/#citation-ref">Citation</a>】
14
+ </p>
15
+
16
+ ## 📖 Tổng Quan
17
+
18
+ - **Agent Laboratory** là một quy trình nghiên cứu tự động từ đầu đến cuối, nhằm hỗ trợ **bạn** với tư cách là nhà nghiên cứu con người trong việc **triển khai các ý tưởng nghiên cứu của bạn**. Agent Laboratory bao gồm các đại diện chuyên biệt được điều khiển bởi các mô hình ngôn ngữ lớn để hỗ trợ bạn trong toàn bộ quy trình nghiên cứu—từ việc thực hiện đánh giá tài liệu và xây dựng kế hoạch đến thực hiện các thí nghiệm và viết các báo cáo toàn diện.
19
+ - Hệ thống này không được thiết kế để thay thế sự sáng tạo của bạn mà để bổ sung cho nó, cho phép bạn tập trung vào ý tưởng và tư duy phản biện trong khi tự động hóa các nhiệm vụ lặp đi lặp lại và tốn thời gian như mã hóa và tài liệu hóa. Bằng cách đáp ứng các mức độ tài nguyên tính toán và sự tham gia của con người khác nhau, Agent Laboratory nhằm mục tiêu tăng tốc khám phá khoa học và tối ưu hóa năng suất nghiên cứu của bạn.
20
+
21
+ <p align="center">
22
+ <img src="../media/AgentLab.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
23
+ </p>
24
+
25
+ ### 🔬 Agent Laboratory hoạt động như thế nào?
26
+
27
+ - Agent Laboratory bao gồm ba giai đoạn chính hướng dẫn hệ thống quy trình nghiên cứu một cách có hệ thống: (1) Đánh giá Tài liệu, (2) Thực nghiệm, và (3) Viết Báo cáo. Trong mỗi giai đoạn, các đại diện chuyên biệt được điều khiển bởi LLM hợp tác để đạt được các mục tiêu riêng biệt, tích hợp các công cụ bên ngoài như arXiv, Hugging Face, Python, và LaTeX để tối ưu hóa kết quả. Quy trình làm việc có cấu trúc này bắt đầu với việc thu thập và phân tích độc lập các bài báo nghiên cứu liên quan, tiến tới lập kế hoạch hợp tác và chuẩn bị dữ liệu, và kết thúc với việc thực hiện các thí nghiệm tự động và tạo báo cáo toàn diện. Chi tiết về các vai trò cụ thể của đại diện và đóng góp của họ trong các giai đoạn này được thảo luận trong bài báo.
28
+
29
+ <p align="center">
30
+ <img src="../media/AgentLabWF.png" alt="Demonstration of the flow of AgentClinic" style="width: 99%;">
31
+ </p>
32
+
33
+ ## 🖥️ Cài Đặt
34
+
35
+ ### Tùy chọn môi trường ảo Python
36
+
37
+
38
+ 1. **Nhân bản kho lưu trữ GitHub**: Bắt đầu bằng cách nhân bản kho lưu trữ bằng lệnh:
39
+ ```bash
40
+ git clone git@github.com:SamuelSchmidgall/AgentLaboratory.git
41
+ ```
42
+
43
+ 2. **Thiết lập và Kích hoạt Môi trường Python**
44
+ ```bash
45
+ python -m venv venv_agent_lab
46
+ ```
47
+
48
+ - Bây giờ kích hoạt môi trường này:
49
+ ```bash
50
+ source venv_agent_lab/bin/activate
51
+ ```
52
+
53
+ 3. **Cài đặt các thư viện cần thiết**
54
+ ```bash
55
+ pip install -r requirements.txt
56
+ ```
57
+
58
+ 4. **Cài đặt pdflatex [TÙY CHỌN]**
59
+ ```bash
60
+ sudo apt install pdflatex
61
+ ```
62
+
63
+ - Điều này cho phép mã nguồn latex được biên dịch bởi các đại diện.
64
+ - **[QUAN TRỌNG]** Nếu bước này không thể chạy do không có quyền sudo, việc biên dịch pdf có thể được tắt bằng cách chạy Agent Laboratory với cờ --compile_latex đặt thành false: --compile_latex=False
65
+
66
+ 5. **Bây giờ chạy Agent Laboratory!**
67
+
68
+ ```bash
69
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA"
70
+ ```
71
+
72
+ hoặc, nếu bạn không cài đặt pdflatex
73
+
74
+ ```bash
75
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --llm-backend "o1-mini" --research-topic "YOUR RESEARCH IDEA" --compile_latex=False
76
+ ```
77
+
78
+ -----
79
+
80
+ ## Mẹo để đạt được kết quả nghiên cứu tốt hơn
81
+
82
+
83
+ #### [Mẹo #1] 📝 Hãy chắc chắn ghi chép kỹ lưỡng! 📝
84
+
85
+ **Việc ghi chép kỹ lưỡng là quan trọng** để giúp đại diện của bạn hiểu bạn đang muốn đạt được điều gì trong dự án của mình, cũng như bất kỳ sở thích về phong cách nào. Ghi chú có thể bao gồm bất kỳ thí nghiệm nào bạn muốn các đại diện thực hiện, cung cấp các khóa API, các biểu đồ hoặc hình vẽ cụ thể bạn muốn bao gồm, hoặc bất cứ điều gì bạn muốn đại diện biết khi thực hiện nghiên cứu.
86
+
87
+ Đây cũng là cơ hội của bạn để cho đại diện biết **các tài nguyên tính toán mà nó có quyền truy cập**, ví dụ: GPU (số lượng, loại GPU, số GB), CPU (số lượng lõi, loại CPU), hạn chế về lưu trữ, và các thông số phần cứng.
88
+
89
+ Để thêm ghi chú, bạn phải sửa cấu trúc task_notes_LLM bên trong ai_lab_repo.py. Dưới đây là một ví dụ về bộ ghi chú được sử dụng cho một số thí nghiệm của chúng tôi.
90
+
91
+
92
+ ```python
93
+ task_notes_LLM = [
94
+ {"phases": ["plan formulation"],
95
+ "note": f"You should come up with a plan for TWO experiments."},
96
+
97
+ {"phases": ["plan formulation", "data preparation", "running experiments"],
98
+ "note": "Please use gpt-4o-mini for your experiments."},
99
+
100
+ {"phases": ["running experiments"],
101
+ "note": f"Use the following code to inference gpt-4o-mini: \nfrom openai import OpenAI\nos.environ["OPENAI_API_KEY"] = "{api_key}"\nclient = OpenAI()\ncompletion = client.chat.completions.create(\nmodel="gpt-4o-mini-2024-07-18", messages=messages)\nanswer = completion.choices[0].message.content\n"},
102
+
103
+ {"phases": ["running experiments"],
104
+ "note": f"You have access to only gpt-4o-mini using the OpenAI API, please use the following key {api_key} but do not use too many inferences. Do not use openai.ChatCompletion.create or any openai==0.28 commands. Instead use the provided inference code."},
105
+
106
+ {"phases": ["running experiments"],
107
+ "note": "I would recommend using a small dataset (approximately only 100 data points) to run experiments in order to save time. Do not use much more than this unless you have to or are running the final tests."},
108
+
109
+ {"phases": ["data preparation", "running experiments"],
110
+ "note": "You are running on a MacBook laptop. You can use 'mps' with PyTorch"},
111
+
112
+ {"phases": ["data preparation", "running experiments"],
113
+ "note": "Generate figures with very colorful and artistic design."},
114
+ ]
115
+ ```
116
+
117
+ --------
118
+
119
+ #### [Mẹo #2] 🚀 Sử dụng các mô hình mạnh mẽ hơn thường dẫn đến nghiên cứu tốt hơn 🚀
120
+
121
+ Khi tiến hành nghiên cứu, **lựa chọn mô hình có thể ảnh hưởng đáng kể đến chất lượng kết quả**. Các mô hình mạnh mẽ hơn thường có độ chính xác cao hơn, khả năng lý luận tốt hơn và khả năng tạo báo cáo tốt hơn. Nếu tài nguyên tính toán cho phép, hãy ưu tiên sử dụng các mô hình tiên tiến như o1-(mini/preview) hoặc các mô hình ngôn ngữ lớn tiên tiến tương tự.
122
+
123
+ Tuy nhiên, **quan trọng là phải cân bằng giữa hiệu suất và chi phí hiệu quả**. Trong khi các mô hình mạnh mẽ có thể mang lại kết quả tốt hơn, chúng thường đắt hơn và tốn thời gian chạy. Hãy cân nhắc sử dụng chúng một cách chọn lọc—ví dụ, cho các thí nghiệm chính hoặc phân tích cuối cùng—trong khi dựa vào các mô hình nhỏ hơn, hiệu quả hơn cho các nhiệm vụ lặp đi lặp lại hoặc phát mẫu ban đầu.
124
+
125
+ Khi tài nguyên hạn chế, **tối ưu hóa bằng cách tinh chỉnh các mô hình nhỏ hơn** trên bộ dữ liệu cụ thể của bạn hoặc kết hợp các mô hình đã được huấn luyện trước với các gợi ý cụ thể cho nhiệm vụ để đạt được sự cân bằng mong muốn giữa hiệu suất và hiệu quả tính toán.
126
+
127
+ -----
128
+
129
+ #### [Mẹo #3] ✅ Bạn có thể tải lại các lưu trạng thái trước từ các điểm kiểm tra ✅
130
+
131
+ **Nếu bạn mất tiến độ, kết nối internet, hoặc nếu một nhiệm vụ phụ thất bại, bạn luôn có thể tải lại từ trạng thái trước đó.** Tất cả tiến độ của bạn được lưu mặc định trong biến state_saves, lưu trữ từng điểm kiểm tra riêng lẻ. Chỉ cần truyền các tham số sau khi chạy ai_lab_repo.py
132
+
133
+ ```bash
134
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA" --llm-backend "o1-mini" --load-existing True --load-existing-path "save_states/LOAD_PATH"
135
+ ```
136
+
137
+ -----
138
+
139
+ #### [Mẹo #4] 🈯 Nếu bạn đang chạy bằng ngôn ngữ khác tiếng Anh 🈲
140
+
141
+ Nếu bạn đang chạy Agent Laboratory bằng ngôn ngữ khác tiếng Anh, không vấn đề gì, chỉ cần đảm bảo cung cấp cờ ngôn ngữ cho các đại diện để thực hiện nghiên cứu bằng ngôn ngữ bạn mong muốn. Lưu ý rằng chúng tôi chưa nghiên cứu kỹ việc chạy Agent Laboratory bằng các ngôn ngữ khác, vì vậy hãy chắc chắn báo cáo bất kỳ vấn đề nào bạn gặp phải.
142
+
143
+ Ví dụ, nếu bạn đang chạy bằng tiếng Trung:
144
+
145
+ ```bash
146
+ python ai_lab_repo.py --api-key "API_KEY_HERE" --research-topic "YOUR RESEARCH IDEA (in your language)" --llm-backend "o1-mini" --language "中文"
147
+ ```
148
+
149
+ ----
150
+
151
+ #### [Mẹo #5] 🌟 Có rất nhiều cơ hội để cải thiện 🌟
152
+
153
+ Có rất nhiều cơ hội để cải thiện cơ sở mã này, vì vậy nếu bạn cuối cùng thay đổi và muốn giúp cộng đồng, hãy cảm thấy tự do chia sẻ các thay đổi mà bạn đã thực hiện! Chúng tôi hy vọng công cụ này sẽ giúp bạn!
154
+
155
+ ## Tài liệu Tham khảo / Bibtex
156
+
157
+ ```bibtex
158
+ @preprint{schmidgall2025AgentLaboratory,
159
+ title={Agent Laboratory: Using LLM Agents as Research Assistants},
160
+ author={Schmidgall, Samuel and Su, Yusheng and Wang, Ze and Sun, Ximeng and Wu, Jialian and Yu, Xiaodong and Liu, Jiang and Liu, Zicheng and Barsoum, Emad},
161
+ year={2025}
162
+ }
163
+ ```
requirements.txt ADDED
@@ -0,0 +1,135 @@
1
+ absl-py==2.1.0
2
+ accelerate==1.1.1
3
+ aiohappyeyeballs==2.4.3
4
+ aiohttp==3.11.7
5
+ aiosignal==1.3.1
6
+ annotated-types==0.7.0
7
+ anthropic==0.39.0
8
+ anyio==4.6.2.post1
9
+ arxiv==2.1.3
10
+ astunparse==1.6.3
11
+ async-timeout==5.0.1
12
+ attrs==24.2.0
13
+ blis==1.0.1
14
+ catalogue==2.0.10
15
+ certifi==2024.8.30
16
+ charset-normalizer==3.4.0
17
+ click==8.1.7
18
+ cloudpathlib==0.20.0
19
+ confection==0.1.5
20
+ contourpy==1.3.0
21
+ cycler==0.12.1
22
+ cymem==2.0.10
23
+ datasets==3.1.0
24
+ diffusers==0.31.0
25
+ dill==0.3.8
26
+ distro==1.9.0
27
+ exceptiongroup==1.2.2
28
+ feedparser==6.0.11
29
+ filelock==3.16.1
30
+ flatbuffers==24.3.25
31
+ fonttools==4.55.0
32
+ frozenlist==1.5.0
33
+ fsspec==2024.9.0
34
+ gast==0.6.0
35
+ google-pasta==0.2.0
36
+ grpcio==1.68.0
37
+ h11==0.14.0
38
+ h5py==3.12.1
39
+ httpcore==1.0.7
40
+ httpx==0.27.2
41
+ huggingface-hub==0.26.2
42
+ idna==3.10
43
+ imageio==2.36.0
44
+ importlib_metadata==8.5.0
45
+ importlib_resources==6.4.5
46
+ Jinja2==3.1.4
47
+ jiter==0.7.1
48
+ joblib==1.4.2
49
+ keras==3.7.0
50
+ kiwisolver==1.4.7
51
+ langcodes==3.5.0
52
+ language_data==1.3.0
53
+ lazy_loader==0.4
54
+ libclang==18.1.1
55
+ marisa-trie==1.2.1
56
+ Markdown==3.7
57
+ markdown-it-py==3.0.0
58
+ MarkupSafe==3.0.2
59
+ matplotlib==3.9.2
60
+ mdurl==0.1.2
61
+ ml-dtypes==0.4.1
62
+ mpmath==1.3.0
63
+ multidict==6.1.0
64
+ multiprocess==0.70.16
65
+ murmurhash==1.0.11
66
+ namex==0.0.8
67
+ nest-asyncio==1.6.0
68
+ networkx==3.2.1
69
+ nltk==3.9.1
70
+ numpy==2.0.2
71
+ openai==1.55.1
72
+ opt_einsum==3.4.0
73
+ optree==0.13.1
74
+ packaging==24.2
75
+ pandas==2.2.3
76
+ patsy==1.0.1
77
+ pillow==11.0.0
78
+ plotly==5.24.1
79
+ preshed==3.0.9
80
+ propcache==0.2.0
81
+ protobuf==5.28.3
82
+ psutil==6.1.0
83
+ pyarrow==18.1.0
84
+ pydantic==2.10.2
85
+ pydantic_core==2.27.1
86
+ Pygments==2.18.0
87
+ pyparsing==3.2.0
88
+ pypdf==5.1.0
89
+ python-dateutil==2.9.0.post0
90
+ pytz==2024.2
91
+ PyYAML==6.0.2
92
+ regex==2024.11.6
93
+ requests==2.32.3
94
+ rich==13.9.4
95
+ sacremoses==0.1.1
96
+ safetensors==0.4.5
97
+ scikit-image==0.24.0
98
+ scikit-learn==1.5.2
99
+ scipy==1.13.1
100
+ seaborn==0.13.2
101
+ semanticscholar==0.8.4
102
+ sgmllib3k==1.0.0
103
+ shellingham==1.5.4
104
+ six==1.16.0
105
+ smart-open==7.0.5
106
+ sniffio==1.3.1
107
+ spacy==3.8.2
108
+ spacy-legacy==3.0.12
109
+ spacy-loggers==1.0.5
110
+ srsly==2.4.8
111
+ statsmodels==0.14.4
112
+ sympy==1.13.1
113
+ tenacity==9.0.0
114
+ termcolor==2.5.0
115
+ thinc==8.3.2
116
+ threadpoolctl==3.5.0
117
+ tifffile==2024.8.30
118
+ tiktoken==0.8.0
119
+ tokenizers==0.20.4
120
+ torch==2.5.1
121
+ tqdm==4.67.1
122
+ transformers==4.46.3
123
+ typer==0.13.1
124
+ typing_extensions==4.12.2
125
+ tzdata==2024.2
126
+ urllib3==2.2.3
127
+ wasabi==1.1.3
128
+ weasel==0.4.1
129
+ Werkzeug==3.1.3
130
+ wrapt==1.17.0
131
+ xxhash==3.5.0
132
+ yarl==1.18.0
133
+ zipp==3.21.0
134
+ google-generativeai
135
+ PyPDF2
tools.py ADDED
@@ -0,0 +1,325 @@
1
+ from utils import *
2
+
3
+ import os
4
+ import time
5
+ import arxiv
6
+ import io, sys
7
+ import traceback
8
+ import matplotlib
9
+ import numpy as np
10
+ import multiprocessing
11
+ from pypdf import PdfReader
12
+ from datasets import load_dataset
13
+ from psutil._common import bytes2human
14
+ from datasets import load_dataset_builder
15
+ from semanticscholar import SemanticScholar
16
+ from sklearn.metrics.pairwise import linear_kernel
17
+ from sklearn.feature_extraction.text import TfidfVectorizer
18
+
19
+
20
+
21
+ class HFDataSearch:
22
+ def __init__(self, like_thr=3, dwn_thr=50) -> None:
23
+ """
24
+ Class for finding relevant Hugging Face datasets.
25
+ :param like_thr: minimum number of likes a dataset must have to be considered
26
+ :param dwn_thr: minimum number of downloads a dataset must have to be considered
27
+ """
28
+ self.dwn_thr = dwn_thr
29
+ self.like_thr = like_thr
30
+ self.ds = load_dataset("nkasmanoff/huggingface-datasets")["train"]
31
+
32
+ # Initialize lists to collect filtered data
33
+ filtered_indices = []
34
+ filtered_descriptions = []
35
+ filtered_likes = []
36
+ filtered_downloads = []
37
+
38
+ # Iterate over the dataset and filter based on criteria
39
+ for idx, item in enumerate(self.ds):
40
+ # Get likes and downloads, handling None values
41
+ likes = int(item['likes']) if item['likes'] is not None else 0
42
+ downloads = int(item['downloads']) if item['downloads'] is not None else 0
43
+
44
+ # Check if likes and downloads meet the thresholds
45
+ if likes >= self.like_thr and downloads >= self.dwn_thr:
46
+ # Check if the description is a non-empty string
47
+ description = item['description']
48
+ if isinstance(description, str) and description.strip():
49
+ # Collect the data
50
+ filtered_indices.append(idx)
51
+ filtered_descriptions.append(description)
52
+ filtered_likes.append(likes)
53
+ filtered_downloads.append(downloads)
54
+
55
+ # Check if any datasets meet all criteria
56
+ if not filtered_indices:
57
+ print("No datasets meet the specified criteria.")
58
+ self.ds = []
59
+ self.descriptions = []
60
+ self.likes_norm = []
61
+ self.downloads_norm = []
62
+ self.description_vectors = None
63
+ return # Exit the constructor
64
+
65
+ # Filter the datasets using the collected indices
66
+ self.ds = self.ds.select(filtered_indices)
67
+
68
+ # Update descriptions, likes, and downloads
69
+ self.descriptions = filtered_descriptions
70
+ self.likes = np.array(filtered_likes)
71
+ self.downloads = np.array(filtered_downloads)
72
+
73
+ # Normalize likes and downloads
74
+ self.likes_norm = self._normalize(self.likes)
75
+ self.downloads_norm = self._normalize(self.downloads)
76
+
77
+ # Vectorize the descriptions
78
+ self.vectorizer = TfidfVectorizer()
79
+ self.description_vectors = self.vectorizer.fit_transform(self.descriptions)
80
+
81
+ def _normalize(self, arr):
82
+ min_val = arr.min()
83
+ max_val = arr.max()
84
+ if max_val - min_val == 0:
85
+ return np.zeros_like(arr, dtype=float)
86
+ return (arr - min_val) / (max_val - min_val)
87
+
88
+ def retrieve_ds(self, query, N=10, sim_w=1.0, like_w=0.0, dwn_w=0.0):
89
+ """
90
+ Retrieves the top N datasets matching the query, weighted by likes and downloads.
91
+ :param query: The search query string.
92
+ :param N: The number of results to return.
93
+ :param sim_w: Weight for cosine similarity.
94
+ :param like_w: Weight for likes.
95
+ :param dwn_w: Weight for downloads.
96
+ :return: List of top N dataset items.
97
+ """
98
+ if not self.ds or self.description_vectors is None:
99
+ print("No datasets available to search.")
100
+ return []
101
+
102
+ query_vector = self.vectorizer.transform([query])
103
+ cosine_similarities = linear_kernel(query_vector, self.description_vectors).flatten()
104
+ # Normalize cosine similarities
105
+ cosine_similarities_norm = self._normalize(cosine_similarities)
106
+ # Compute final scores
107
+ final_scores = (
108
+ sim_w * cosine_similarities_norm +
109
+ like_w * self.likes_norm +
110
+ dwn_w * self.downloads_norm
111
+ )
112
+ # Get top N indices
113
+ top_indices = final_scores.argsort()[-N:][::-1]
114
+ # Convert indices to Python ints
115
+ top_indices = [int(i) for i in top_indices]
116
+ top_datasets = [self.ds[i] for i in top_indices]
117
+ # check if dataset has a test & train set
118
+ has_test_set = list()
119
+ has_train_set = list()
120
+ ds_size_info = list()
121
+ for i in top_indices:
122
+ try:
123
+ dbuilder = load_dataset_builder(self.ds[i]["id"], trust_remote_code=True).info
124
+ except Exception as e:
125
+ has_test_set.append(False)
126
+ has_train_set.append(False)
127
+ ds_size_info.append((None, None, None, None))
128
+ continue
129
+
130
+ if dbuilder.splits is None:
131
+ has_test_set.append(False)
132
+ has_train_set.append(False)
133
+ ds_size_info.append((None, None, None, None))
134
+ continue
135
+ # Record which splits (train/test) exist and how large they are
136
+ has_test, has_train = "test" in dbuilder.splits, "train" in dbuilder.splits
137
+ has_test_set.append(has_test)
138
+ has_train_set.append(has_train)
139
+ test_dwn_size, test_elem_size = None, None
140
+ train_dwn_size, train_elem_size = None, None
141
+ if has_test:
142
+ test_dwn_size = bytes2human(dbuilder.splits["test"].num_bytes)
143
+ test_elem_size = dbuilder.splits["test"].num_examples
144
+ if has_train:
145
+ train_dwn_size = bytes2human(dbuilder.splits["train"].num_bytes)
146
+ train_elem_size = dbuilder.splits["train"].num_examples
147
+ ds_size_info.append((test_dwn_size, test_elem_size, train_dwn_size, train_elem_size))
148
+ for _i in range(len(top_datasets)):
149
+ top_datasets[_i]["has_test_set"] = has_test_set[_i]
150
+ top_datasets[_i]["has_train_set"] = has_train_set[_i]
151
+ top_datasets[_i]["test_download_size"] = ds_size_info[_i][0]
152
+ top_datasets[_i]["test_element_size"] = ds_size_info[_i][1]
153
+ top_datasets[_i]["train_download_size"] = ds_size_info[_i][2]
154
+ top_datasets[_i]["train_element_size"] = ds_size_info[_i][3]
155
+ return top_datasets
156
+
157
+ def results_str(self, results):
158
+ """
159
+ Convert a list of search results into human-readable strings.
160
+ :param results: (list(dict)) list of results from search
161
+ :return: (list(str)) list of results in human-readable format
162
+ """
163
+ result_strs = list()
164
+ for result in results:
165
+ res_str = f"Dataset ID: {result['id']}\n"
166
+ res_str += f"Description: {result['description']}\n"
167
+ res_str += f"Likes: {result['likes']}\n"
168
+ res_str += f"Downloads: {result['downloads']}\n"
169
+ res_str += f"Has Testing Set: {result['has_test_set']}\n"
170
+ res_str += f"Has Training Set: {result['has_train_set']}\n"
171
+ res_str += f"Test Download Size: {result['test_download_size']}\n"
172
+ res_str += f"Test Dataset Size: {result['test_element_size']}\n"
173
+ res_str += f"Train Download Size: {result['train_download_size']}\n"
174
+ res_str += f"Train Dataset Size: {result['train_element_size']}\n"
175
+ result_strs.append(res_str)
176
+ return result_strs
177
+
178
+
179
+ class SemanticScholarSearch:
180
+ def __init__(self):
181
+ self.sch_engine = SemanticScholar(retry=False)
182
+
183
+ def find_papers_by_str(self, query, N=10):
184
+ paper_sums = list()
185
+ results = self.sch_engine.search_paper(query, limit=N, min_citation_count=3, open_access_pdf=True)
186
+ for _i in range(len(results)):
187
+ paper_sum = f'Title: {results[_i].title}\n'
188
+ paper_sum += f'Abstract: {results[_i].abstract}\n'
189
+ paper_sum += f'Citations: {results[_i].citationCount}\n'
190
+ paper_sum += f'Release Date: year {results[_i].publicationDate.year}, month {results[_i].publicationDate.month}, day {results[_i].publicationDate.day}\n'
191
+ paper_sum += f'Venue: {results[_i].venue}\n'
192
+ paper_sum += f'Paper ID: {results[_i].externalIds["DOI"]}\n'
193
+ paper_sums.append(paper_sum)
194
+ return paper_sums
195
+
196
+ def retrieve_full_paper_text(self, query):
197
+ pass
198
+
199
+
200
+ class ArxivSearch:
201
+ def __init__(self):
202
+ # Construct the default API client.
203
+ self.sch_engine = arxiv.Client()
204
+
205
+ def _process_query(self, query: str) -> str:
206
+ """Process query string to fit within MAX_QUERY_LENGTH while preserving as much information as possible"""
207
+ MAX_QUERY_LENGTH = 300
208
+
209
+ if len(query) <= MAX_QUERY_LENGTH:
210
+ return query
211
+
212
+ # Split into words
213
+ words = query.split()
214
+ processed_query = []
215
+ current_length = 0
216
+
217
+ # Add words while staying under the limit
218
+ # Account for spaces between words
219
+ for word in words:
220
+ # +1 for the space that will be added between words
221
+ if current_length + len(word) + 1 <= MAX_QUERY_LENGTH:
222
+ processed_query.append(word)
223
+ current_length += len(word) + 1
224
+ else:
225
+ break
226
+
227
+ return ' '.join(processed_query)
228
+
229
+ def find_papers_by_str(self, query, N=20):
230
+ processed_query = self._process_query(query)
231
+ max_retries = 3
232
+ retry_count = 0
233
+
234
+ while retry_count < max_retries:
235
+ try:
236
+ search = arxiv.Search(
237
+ query="abs:" + processed_query,
238
+ max_results=N,
239
+ sort_by=arxiv.SortCriterion.Relevance)
240
+
241
+ paper_sums = list()
242
+ # `results` is a generator; you can iterate over its elements one by one...
243
+ for r in self.sch_engine.results(search):
244
+ paperid = r.pdf_url.split("/")[-1]
245
+ pubdate = str(r.published).split(" ")[0]
246
+ paper_sum = f"Title: {r.title}\n"
247
+ paper_sum += f"Summary: {r.summary}\n"
248
+ paper_sum += f"Publication Date: {pubdate}\n"
249
+ #paper_sum += f"Categories: {' '.join(r.categories)}\n"
250
+ paper_sum += f"arXiv paper ID: {paperid}\n"
251
+ paper_sums.append(paper_sum)
252
+ time.sleep(2.0)
253
+ return "\n".join(paper_sums)
254
+
255
+ except Exception as e:
256
+ retry_count += 1
257
+ if retry_count < max_retries:
258
+ time.sleep(2 * retry_count)
259
+ continue
260
+ return None
261
+
262
+ def retrieve_full_paper_text(self, query, MAX_LEN=50000):
263
+ pdf_text = str()
264
+ paper = next(arxiv.Client().results(arxiv.Search(id_list=[query])))
265
+ # Download the PDF to the PWD with a custom filename.
266
+ paper.download_pdf(filename="downloaded-paper.pdf")
267
+ # creating a pdf reader object
268
+ reader = PdfReader('downloaded-paper.pdf')
269
+ # Iterate over all the pages
270
+ for page_number, page in enumerate(reader.pages, start=1):
271
+ # Extract text from the page
272
+ try:
273
+ text = page.extract_text()
274
+ except Exception as e:
275
+ os.remove("downloaded-paper.pdf")
276
+ time.sleep(2.0)
277
+ return "EXTRACTION FAILED"
278
+
279
+ # Append this page's text to the running document string
280
+ pdf_text += f"--- Page {page_number} ---"
281
+ pdf_text += text
282
+ pdf_text += "\n"
283
+ os.remove("downloaded-paper.pdf")
284
+ time.sleep(2.0)
285
+ return pdf_text[:MAX_LEN]
286
+
287
+
288
+ # Set the non-interactive backend early in the module
289
+ matplotlib.use('Agg')
290
+ import matplotlib.pyplot as plt
291
+
292
+ def worker_run_code(code_str, output_queue):
293
+ output_capture = io.StringIO()
294
+ sys.stdout = output_capture
295
+ try:
296
+ # Create a globals dictionary with __name__ set to "__main__"
297
+ globals_dict = {"__name__": "__main__"}
298
+ exec(code_str, globals_dict)
299
+ except Exception as e:
300
+ output_capture.write(f"[CODE EXECUTION ERROR]: {str(e)}\n")
301
+ traceback.print_exc(file=output_capture)
302
+ finally:
303
+ sys.stdout = sys.__stdout__
304
+ output_queue.put(output_capture.getvalue())
305
+
306
+ def execute_code(code_str, timeout=600, MAX_LEN=1000):
307
+ #code_str = code_str.replace("\\n", "\n")
308
+ code_str = "from utils import *\n" + code_str
309
+ if "load_dataset('pubmed" in code_str:
310
+ return "[CODE EXECUTION ERROR] pubmed Download took way too long. Program terminated"
311
+ if "exit(" in code_str:
312
+ return "[CODE EXECUTION ERROR] The exit() command is not allowed you must remove this."
313
+ output_queue = multiprocessing.Queue()
314
+ proc = multiprocessing.Process(target=worker_run_code, args=(code_str, output_queue))
315
+ proc.start()
316
+ proc.join(timeout)
317
+ if proc.is_alive():
318
+ proc.terminate() # Forcefully kill the process
319
+ proc.join()
320
+ return (f"[CODE EXECUTION ERROR]: Code execution exceeded the timeout limit of {timeout} seconds. "
321
+ "You must reduce the time complexity of your code.")
322
+ else:
323
+ if not output_queue.empty(): output = output_queue.get()
324
+ else: output = ""
325
+ return output[:MAX_LEN]  # truncate captured output to MAX_LEN characters
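Below is a minimal, hypothetical usage sketch for the helpers added in tools.py above. The imported names come from this file, but the query string, the timeout, and the assumption that the pinned requirements are installed and arXiv is reachable are illustrative only.

```python
# Hypothetical usage sketch for tools.py (assumes the pinned requirements are
# installed and arXiv is reachable; query text and timeout are illustrative).
from tools import ArxivSearch, execute_code

searcher = ArxivSearch()
# Search arXiv abstracts and print short, human-readable paper summaries.
print(searcher.find_papers_by_str("LLM agents as research assistants", N=3))

# Run a small snippet inside the sandboxed subprocess with a short timeout.
print(execute_code("print(2 + 2)", timeout=30))
```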
utils.py ADDED
@@ -0,0 +1,480 @@
1
+ import os, re
2
+ import shutil
3
+ import time
4
+ import tiktoken, openai
5
+ import subprocess, string
6
+ from openai import OpenAI
7
+ import google.generativeai as genai
8
+ from huggingface_hub import InferenceClient
9
+
10
+
11
+ def query_deepseekv3(prompt, system, api_key, attempt=0, temperature=0.0):
12
+ try:
13
+ client = OpenAI(api_key=api_key, base_url="https://api.deepseek.com")
14
+ response = client.chat.completions.create(
15
+ model="deepseek-chat",
16
+ messages=[
17
+ {"role": "system", "content": system},
18
+ {"role": "user", "content": prompt},
19
+ ],
20
+ stream=False, temperature=temperature,
21
+ )
22
+ return response.choices[0].message.content
23
+ except Exception as e:
24
+ print(f"Query qwen error: {e}")
25
+ if attempt >= 10: return f"Your attempt to query deepseekv3 failed: {e}"
26
+ return query_deepseekv3(prompt, system, attempt+1)
27
+
28
+
29
+ def query_qwen(prompt, system, api_key, attempt=0, temperature=0.0):
30
+ try:
31
+ client = InferenceClient(api_key=api_key)
32
+ if system is not None:
33
+ messages = [
34
+ {"role": "system", "content": system},
35
+ {"role": "user", "content": prompt}]
36
+ else:
37
+ messages = [
38
+ {"role": "user", "content": prompt}]
39
+
40
+ completion = client.chat.completions.create(
41
+ model="Qwen/QwQ-32B",
42
+ messages=messages,
43
+ max_tokens=500,
44
+ temperature=temperature
45
+ )
46
+ return completion.choices[0].message.content.strip()
47
+ except Exception as e:
48
+ print(f"Query qwen error: {e}")
49
+ if attempt >= 10: return f"Your attempt to query qwen failed: {e}"
50
+ return query_qwen(prompt, system, api_key, attempt+1, temperature)
51
+
52
+
53
+ def query_gpt4omini(prompt, system, api_key, attempt=0, temperature=0.0):
54
+ try:
55
+ openai_api_key = api_key
56
+ openai.api_key = openai_api_key
57
+ os.environ["OPENAI_API_KEY"] = openai_api_key
58
+ if system is not None:
59
+ messages = [
60
+ {"role": "system", "content": system},
61
+ {"role": "user", "content": prompt}]
62
+ else:
63
+ messages = [
64
+ {"role": "user", "content": prompt}]
65
+ client = OpenAI()
66
+ response = client.chat.completions.create(
67
+ model="gpt-4o-mini", messages=messages, temperature=temperature).choices[0].message.content.strip()
68
+ return response
69
+ except Exception as e:
70
+ print(f"Query 4o-mini error: {e}")
71
+ if attempt >= 10: return f"Your attempt to query gpt-4o-mini failed: {e}"
72
+ return query_gpt4omini(prompt, system, api_key, attempt+1, temperature)
73
+
74
+
75
+
76
+ def query_gpt4o(prompt, system, api_key, attempt=0, temperature=0.0):
77
+ try:
78
+ openai_api_key = api_key
79
+ openai.api_key = openai_api_key
80
+ os.environ["OPENAI_API_KEY"] = openai_api_key
81
+ if system is not None:
82
+ messages = [
83
+ {"role": "user", "content":system + prompt}]
84
+ else:
85
+ messages = [
86
+ {"role": "user", "content": prompt}]
87
+ client = OpenAI()
88
+ response = client.chat.completions.create(
89
+ model="gpt-4o", messages=messages, temperature=temperature).choices[0].message.content.strip()
90
+ return response
91
+ except Exception as e:
92
+ print(f"Query gpr-4o error: {e}")
93
+ if attempt >= 10: return f"Your attempt to inference gemini failed: {e}"
94
+ return query_gpt4o(prompt, system, attempt+1)
95
+
96
+
97
+
98
+ def query_gemini(prompt, system, api_key, attempt=0, temperature=0.0):
99
+ try:
100
+ genai.configure(api_key=api_key)
101
+ model = genai.GenerativeModel(model_name="gemini-1.5-pro", system_instruction=system)
102
+ response = model.generate_content(prompt, generation_config=genai.types.GenerationConfig(temperature=temperature)).text.strip()
103
+ time.sleep(1)
104
+ return response
105
+ except Exception as e:
106
+ print(f"Gemini error: {e}")
107
+ if attempt >= 10: return f"Your attempt to query gemini failed: {e}"
108
+ time.sleep(1)
109
+ return query_gemini(prompt, system, api_key, attempt+1, temperature)
110
+
111
+
112
+
113
+ def query_gemini2p0(prompt, system, api_key, attempt=0, temperature=0.0,):
114
+ try:
115
+ genai.configure(api_key=api_key)
116
+ model = genai.GenerativeModel(model_name="gemini-2.0-flash", system_instruction=system)
117
+ response = model.generate_content(prompt, generation_config=genai.types.GenerationConfig(temperature=temperature)).text.strip()
118
+ time.sleep(1)
119
+ return response
120
+ except Exception as e:
121
+ print(f"Gemini error: {e}")
122
+ if attempt >= 10: return f"Your attempt to query gemini-2.0-flash failed: {e}"
123
+ time.sleep(1)
124
+ return query_gemini2p0(prompt, system, api_key, attempt+1, temperature)
125
+
126
+
127
+ def compile_latex(latex_code, output_path, compile=True, timeout=30):
128
+ latex_code = latex_code.replace(
129
+ r"\documentclass{article}",
130
+ "\\documentclass{article}\n\\usepackage{amsmath}\n\\usepackage{amssymb}\n\\usepackage{array}\n\\usepackage{algorithm}\n\\usepackage{algorithmicx}\n\\usepackage{algpseudocode}\n\\usepackage{booktabs}\n\\usepackage{colortbl}\n\\usepackage{color}\n\\usepackage{enumitem}\n\\usepackage{fontawesome5}\n\\usepackage{float}\n\\usepackage{graphicx}\n\\usepackage{hyperref}\n\\usepackage{listings}\n\\usepackage{makecell}\n\\usepackage{multicol}\n\\usepackage{multirow}\n\\usepackage{pgffor}\n\\usepackage{pifont}\n\\usepackage{soul}\n\\usepackage{sidecap}\n\\usepackage{subcaption}\n\\usepackage{titletoc}\n\\usepackage[symbol]{footmisc}\n\\usepackage{url}\n\\usepackage{wrapfig}\n\\usepackage{xcolor}\n\\usepackage{xspace}")
131
+ #print(latex_code)
132
+ dir_path = f"{output_path}/tex"
133
+ tex_file_path = os.path.join(dir_path, "temp.tex")
134
+ # Write the LaTeX code to the .tex file in the specified directory
135
+ with open(tex_file_path, "w") as f:
136
+ f.write(latex_code)
137
+
138
+ if not compile:
139
+ return f"Compilation successful"
140
+
141
+ # Compiling the LaTeX code using pdflatex with non-interactive mode and timeout
142
+ try:
143
+ result = subprocess.run(
144
+ ["pdflatex", "-interaction=nonstopmode", "temp.tex"],
145
+ check=True, # Raises a CalledProcessError on non-zero exit codes
146
+ stdout=subprocess.PIPE, # Capture standard output
147
+ stderr=subprocess.PIPE, # Capture standard error
148
+ timeout=timeout, # Timeout for the process
149
+ cwd=dir_path
150
+ )
151
+
152
+ # If compilation is successful, return the success message
153
+ return f"Compilation successful: {result.stdout.decode('utf-8')}"
154
+
155
+ except subprocess.TimeoutExpired:
156
+ # If the compilation takes too long, return a timeout message
157
+ return "[CODE EXECUTION ERROR]: Compilation timed out after {} seconds".format(timeout)
158
+ except subprocess.CalledProcessError as e:
159
+ # If there is an error during LaTeX compilation, return the error message
160
+ return f"[CODE EXECUTION ERROR]: Compilation failed. There was an error in your latex."
161
+
162
+
163
+ def count_tokens(messages, model="gpt-4"):
164
+ enc = tiktoken.encoding_for_model(model)
165
+ num_tokens = sum([len(enc.encode(message["content"])) for message in messages])
166
+ return num_tokens
167
+
168
+ def remove_figures():
169
+ """Remove a directory if it exists."""
170
+ for _file in os.listdir("."):
171
+ if "Figure_" in _file and ".png" in _file:
172
+ os.remove(_file)
173
+
174
+ def remove_directory(dir_path):
175
+ """Remove a directory if it exists."""
176
+ if os.path.exists(dir_path) and os.path.isdir(dir_path):
177
+ try:
178
+ shutil.rmtree(dir_path)
179
+ print(f"Directory {dir_path} removed successfully.")
180
+ except Exception as e:
181
+ print(f"Error removing directory {dir_path}: {e}")
182
+ else:
183
+ print(f"Directory {dir_path} does not exist or is not a directory.")
184
+
185
+
186
+ def save_to_file(location, filename, data):
187
+ """Utility function to save data as plain text."""
188
+ filepath = os.path.join(location, filename)
189
+ try:
190
+ with open(filepath, 'w') as f:
191
+ f.write(data) # Write the raw string instead of using json.dump
192
+ print(f"Data successfully saved to {filepath}")
193
+ except Exception as e:
194
+ print(f"Error saving file {filename}: {e}")
195
+
196
+
197
+ def clip_tokens(messages, model="gpt-4", max_tokens=100000):
198
+ enc = tiktoken.encoding_for_model(model)
199
+ total_tokens = sum([len(enc.encode(message["content"])) for message in messages])
200
+
201
+ if total_tokens <= max_tokens:
202
+ return messages # No need to clip if under the limit
203
+
204
+ # Start removing tokens from the beginning
205
+ tokenized_messages = []
206
+ for message in messages:
207
+ tokenized_content = enc.encode(message["content"])
208
+ tokenized_messages.append({"role": message["role"], "content": tokenized_content})
209
+
210
+ # Flatten all tokens
211
+ all_tokens = [token for message in tokenized_messages for token in message["content"]]
212
+
213
+ # Remove tokens from the beginning
214
+ clipped_tokens = all_tokens[total_tokens - max_tokens:]
215
+
216
+ # Rebuild the clipped messages
217
+ clipped_messages = []
218
+ current_idx = 0
219
+ for message in tokenized_messages:
220
+ message_token_count = len(message["content"])
221
+ if current_idx + message_token_count > len(clipped_tokens):
222
+ clipped_message_content = clipped_tokens[current_idx:]
223
+ clipped_message = enc.decode(clipped_message_content)
224
+ clipped_messages.append({"role": message["role"], "content": clipped_message})
225
+ break
226
+ else:
227
+ clipped_message_content = clipped_tokens[current_idx:current_idx + message_token_count]
228
+ clipped_message = enc.decode(clipped_message_content)
229
+ clipped_messages.append({"role": message["role"], "content": clipped_message})
230
+ current_idx += message_token_count
231
+ return clipped_messages
232
+
233
+
234
+
235
+ def extract_prompt(text, word):
236
+ code_block_pattern = rf"```{word}(.*?)```"
237
+ code_blocks = re.findall(code_block_pattern, text, re.DOTALL)
238
+ extracted_code = "\n".join(code_blocks).strip()
239
+ return extracted_code
240
+
241
+ from typing import Dict, List
242
+
243
+ import datasets
244
+
245
+
246
+ def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:
247
+ def _process_doc(doc: dict) -> dict:
248
+ out_doc = {
249
+ "problem": doc["problem"],
250
+ "solution": doc["solution"],
251
+ "answer": remove_boxed(last_boxed_only_string(doc["solution"])),
252
+ }
253
+ return out_doc
254
+
255
+ return dataset.map(_process_doc)
256
+
257
+
258
+ def process_results(doc: dict, results: List[str]) -> Dict[str, int]:
259
+ retval = 0
260
+ indices = [pos for pos, char in enumerate(results[0]) if char == "$"]
261
+ if len(indices) <= 1:
262
+ answer = results[0]
263
+ else:
264
+ answer = results[0][indices[0] + 1 : indices[-1]]
265
+
266
+ if is_equiv(answer, remove_boxed(last_boxed_only_string(doc["solution"]))):
267
+ retval = 1
268
+
269
+ results = {
270
+ "exact_match": retval,
271
+ }
272
+ return results
273
+
274
+
275
+ # string normalization from https://github.com/EleutherAI/lm-evaluation-harness/blob/master/lm_eval/tasks/hendrycks_math.py
276
+ def is_equiv(str1, str2, verbose=False):
277
+ if str1 is None and str2 is None:
278
+ print("WARNING: Both None")
279
+ return True
280
+ if str1 is None or str2 is None:
281
+ return False
282
+
283
+ try:
284
+ ss1 = strip_string(str1)
285
+ ss2 = strip_string(str2)
286
+ if verbose:
287
+ print(ss1, ss2)
288
+ return ss1 == ss2
289
+ except Exception:
290
+ return str1 == str2
291
+
292
+
293
+ def clean_answer(s):
294
+ s = s.replace("\\dfrac", "\\frac") # makes no difference but can lead to errors
295
+ s = s.replace("x \\in", "")
296
+ return s
297
+
298
+ def remove_boxed(s):
299
+ if "\\boxed " in s:
300
+ left = "\\boxed "
301
+ assert s[: len(left)] == left
302
+ return s[len(left) :]
303
+
304
+ left = "\\boxed{"
305
+
306
+ assert s[: len(left)] == left
307
+ assert s[-1] == "}"
308
+
309
+ return clean_answer(s[len(left) : -1])
310
+
311
+
312
+ def last_boxed_only_string(string):
313
+ idx = string.rfind("\\boxed")
314
+ if "\\boxed " in string:
315
+ return "\\boxed " + string.split("\\boxed ")[-1].split("$")[0]
316
+ if idx < 0:
317
+ idx = string.rfind("\\fbox")
318
+ if idx < 0:
319
+ return None
320
+
321
+ i = idx
322
+ right_brace_idx = None
323
+ num_left_braces_open = 0
324
+ while i < len(string):
325
+ if string[i] == "{":
326
+ num_left_braces_open += 1
327
+ if string[i] == "}":
328
+ num_left_braces_open -= 1
329
+ if num_left_braces_open == 0:
330
+ right_brace_idx = i
331
+ break
332
+ i += 1
333
+
334
+ if right_brace_idx is None:
335
+ retval = None
336
+ else:
337
+ retval = string[idx : right_brace_idx + 1]
338
+
339
+ return retval
340
+
341
+
342
+ def fix_fracs(string):
343
+ substrs = string.split("\\frac")
344
+ new_str = substrs[0]
345
+ if len(substrs) > 1:
346
+ substrs = substrs[1:]
347
+ for substr in substrs:
348
+ new_str += "\\frac"
349
+ if substr[0] == "{":
350
+ new_str += substr
351
+ else:
352
+ try:
353
+ assert len(substr) >= 2
354
+ except AssertionError:
355
+ return string
356
+ a = substr[0]
357
+ b = substr[1]
358
+ if b != "{":
359
+ if len(substr) > 2:
360
+ post_substr = substr[2:]
361
+ new_str += "{" + a + "}{" + b + "}" + post_substr
362
+ else:
363
+ new_str += "{" + a + "}{" + b + "}"
364
+ else:
365
+ if len(substr) > 2:
366
+ post_substr = substr[2:]
367
+ new_str += "{" + a + "}" + b + post_substr
368
+ else:
369
+ new_str += "{" + a + "}" + b
370
+ string = new_str
371
+ return string
372
+
373
+
374
+ def fix_a_slash_b(string):
375
+ if len(string.split("/")) != 2:
376
+ return string
377
+ a = string.split("/")[0]
378
+ b = string.split("/")[1]
379
+ try:
380
+ a = int(a)
381
+ b = int(b)
382
+ assert string == "{}/{}".format(a, b)
383
+ new_string = "\\frac{" + str(a) + "}{" + str(b) + "}"
384
+ return new_string
385
+ except AssertionError:
386
+ return string
387
+
388
+
389
+ def remove_right_units(string):
390
+ # "\\text{ " only ever occurs (at least in the val set) when describing units
391
+ if "\\text{ " in string:
392
+ splits = string.split("\\text{ ")
393
+ assert len(splits) == 2
394
+ return splits[0]
395
+ else:
396
+ return string
397
+
398
+
399
+ def fix_sqrt(string):
400
+ if "\\sqrt" not in string:
401
+ return string
402
+ splits = string.split("\\sqrt")
403
+ new_string = splits[0]
404
+ for split in splits[1:]:
405
+ if split[0] != "{":
406
+ a = split[0]
407
+ new_substr = "\\sqrt{" + a + "}" + split[1:]
408
+ else:
409
+ new_substr = "\\sqrt" + split
410
+ new_string += new_substr
411
+ return new_string
412
+
413
+
414
+ def strip_string(string):
415
+ # linebreaks
416
+ string = string.replace("\n", "")
417
+
418
+ # remove inverse spaces
419
+ string = string.replace("\\!", "")
420
+
421
+ # replace \\ with \
422
+ string = string.replace("\\\\", "\\")
423
+
424
+ # replace tfrac and dfrac with frac
425
+ string = string.replace("tfrac", "frac")
426
+ string = string.replace("dfrac", "frac")
427
+
428
+ # remove \left and \right
429
+ string = string.replace("\\left", "")
430
+ string = string.replace("\\right", "")
431
+
432
+ # Remove circ (degrees)
433
+ string = string.replace("^{\\circ}", "")
434
+ string = string.replace("^\\circ", "")
435
+
436
+ # remove dollar signs
437
+ string = string.replace("\\$", "")
438
+
439
+ # remove units (on the right)
440
+ string = remove_right_units(string)
441
+
442
+ # remove percentage
443
+ string = string.replace("\\%", "")
444
+ string = string.replace("\%", "") # noqa: W605
445
+
446
+ # " 0." equivalent to " ." and "{0." equivalent to "{." Alternatively, add "0" if "." is the start of the string
447
+ string = string.replace(" .", " 0.")
448
+ string = string.replace("{.", "{0.")
449
+ # if empty, return empty string
450
+ if len(string) == 0:
451
+ return string
452
+ if string[0] == ".":
453
+ string = "0" + string
454
+
455
+ # to consider: get rid of e.g. "k = " or "q = " at beginning
456
+ if len(string.split("=")) == 2:
457
+ if len(string.split("=")[0]) <= 2:
458
+ string = string.split("=")[1]
459
+
460
+ # fix sqrt3 --> sqrt{3}
461
+ string = fix_sqrt(string)
462
+
463
+ # remove spaces
464
+ string = string.replace(" ", "")
465
+
466
+ # \frac1b or \frac12 --> \frac{1}{b} and \frac{1}{2}, etc. Even works with \frac1{72} (but not \frac{72}1). Also does a/b --> \\frac{a}{b}
467
+ string = fix_fracs(string)
468
+
469
+ # manually change 0.5 --> \frac{1}{2}
470
+ if string == "0.5":
471
+ string = "\\frac{1}{2}"
472
+ if string == "5.5":
473
+ string = "\\frac{11}{2}"
474
+ if "(x - 3)(x + 3)" in string:
475
+ string = string.replace("(x - 3)(x + 3)", "(x+3)(x-3)")
476
+
477
+ # NOTE: X/Y changed to \frac{X}{Y} in dataset, but in simple cases fix in case the model output is X/Y
478
+ string = fix_a_slash_b(string)
479
+
480
+ return string
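And a similarly hedged sketch of how the model wrappers and the MATH-style answer normalization in utils.py might be called; `YOUR_OPENAI_KEY` is a placeholder, and picking gpt-4o-mini here is only an example.

```python
# Hypothetical usage sketch for utils.py (YOUR_OPENAI_KEY is a placeholder).
from utils import query_gpt4omini, extract_prompt, strip_string

reply = query_gpt4omini(
    prompt="Answer with a fenced python code block that prints 6 * 7.",
    system="You are a terse coding assistant.",
    api_key="YOUR_OPENAI_KEY",
)
# Pull the fenced python block out of the reply, if one is present.
print(extract_prompt(reply, "python"))

# Normalize a LaTeX answer the same way the scoring helpers do.
print(strip_string("\\dfrac{1}{2}"))  # prints \frac{1}{2}
```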