DuyguJones committed on
Commit
54741ab
·
1 Parent(s): a0958e5
test.ipynb → agents/__init__.py RENAMED
File without changes
test_gaia.ipynb ADDED
@@ -0,0 +1,584 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "9eedd8e4",
6
+ "metadata": {},
7
+ "source": [
8
+ "# Testing the Agent"
9
+ ]
10
+ },
11
+ {
12
+ "cell_type": "markdown",
13
+ "id": "f906ea45",
14
+ "metadata": {},
15
+ "source": [
16
+ "## Set the Env"
17
+ ]
18
+ },
19
+ {
20
+ "cell_type": "code",
21
+ "execution_count": 3,
22
+ "id": "1f26b276",
23
+ "metadata": {},
24
+ "outputs": [],
25
+ "source": [
26
+ "# Enable auto-reloading of external modules when they change\n",
27
+ "%load_ext autoreload\n",
28
+ "\n",
29
+ "# Set auto-reload mode to 2: reload all modules (except those excluded) before executing a line\n",
30
+ "%autoreload 2"
31
+ ]
32
+ },
33
+ {
34
+ "cell_type": "code",
35
+ "execution_count": 19,
36
+ "id": "28e55a4e",
37
+ "metadata": {},
38
+ "outputs": [],
39
+ "source": [
40
+ "import os\n",
41
+ "from dotenv import load_dotenv\n",
42
+ "load_dotenv()\n",
43
+ "\n",
44
+ "## Langsmith Tracking\n",
45
+ "os.environ[\"LANGCHAIN_API_KEY\"]=os.getenv(\"LANGCHAIN_API_KEY\")\n",
46
+ "os.environ[\"LANGCHAIN_TRACING_V2\"]=\"true\"\n",
47
+ "os.environ[\"LANGCHAIN_PROJECT\"]=os.getenv(\"LANGCHAIN_PROJECT\")"
48
+ ]
49
+ },
50
+ {
51
+ "cell_type": "markdown",
52
+ "id": "2ce694a1",
53
+ "metadata": {},
54
+ "source": [
55
+ "## Load Data"
56
+ ]
57
+ },
58
+ {
59
+ "cell_type": "code",
60
+ "execution_count": null,
61
+ "id": "1a66ce56",
62
+ "metadata": {},
63
+ "outputs": [],
64
+ "source": [
65
+ "import json\n",
66
+ "import os\n",
67
+ "\n",
68
+ "# Load the metadata.jsonl file\n",
69
+ "with open('data/metadata.jsonl', 'r') as f:\n",
70
+ " json_list = list(f)\n",
71
+ "\n",
72
+ "json_QA = []\n",
73
+ "for json_str in json_list:\n",
74
+ " json_data = json.loads(json_str)\n",
75
+ " json_QA.append(json_data) "
76
+ ]
77
+ },
78
+ {
79
+ "cell_type": "markdown",
80
+ "id": "7b11f6ca",
81
+ "metadata": {},
82
+ "source": [
83
+ "### Data Analyses\n",
84
+ "\n",
85
+ "Here we analyze the data in order to find what tools we need to build a robust agent"
86
+ ]
87
+ },
88
+ {
89
+ "cell_type": "code",
90
+ "execution_count": null,
91
+ "id": "ef2da36e",
92
+ "metadata": {},
93
+ "outputs": [
94
+ {
95
+ "data": {
96
+ "text/plain": [
97
+ "{'task_id': 'c61d22de-5f6c-4958-a7f6-5e9707bd3466',\n",
98
+ " 'Question': 'A paper about AI regulation that was originally submitted to arXiv.org in June 2022 shows a figure with three axes, where each axis has a label word at both ends. Which of these words is used to describe a type of society in a Physics and Society article submitted to arXiv.org on August 11, 2016?',\n",
99
+ " 'Level': 2,\n",
100
+ " 'Final answer': 'egalitarian',\n",
101
+ " 'file_name': '',\n",
102
+ " 'Annotator Metadata': {'Steps': '1. Go to arxiv.org and navigate to the Advanced Search page.\\n2. Enter \"AI regulation\" in the search box and select \"All fields\" from the dropdown.\\n3. Enter 2022-06-01 and 2022-07-01 into the date inputs, select \"Submission date (original)\", and submit the search.\\n4. Go through the search results to find the article that has a figure with three axes and labels on each end of the axes, titled \"Fairness in Agreement With European Values: An Interdisciplinary Perspective on AI Regulation\".\\n5. Note the six words used as labels: deontological, egalitarian, localized, standardized, utilitarian, and consequential.\\n6. Go back to arxiv.org\\n7. Find \"Physics and Society\" and go to the page for the \"Physics and Society\" category.\\n8. Note that the tag for this category is \"physics.soc-ph\".\\n9. Go to the Advanced Search page.\\n10. Enter \"physics.soc-ph\" in the search box and select \"All fields\" from the dropdown.\\n11. Enter 2016-08-11 and 2016-08-12 into the date inputs, select \"Submission date (original)\", and submit the search.\\n12. Search for instances of the six words in the results to find the paper titled \"Phase transition from egalitarian to hierarchical societies driven by competition between cognitive and social constraints\", indicating that \"egalitarian\" is the correct answer.',\n",
103
+ " 'Number of steps': '12',\n",
104
+ " 'How long did this take?': '8 minutes',\n",
105
+ " 'Tools': '1. Web browser\\n2. Image recognition tools (to identify and parse a figure with three axes)',\n",
106
+ " 'Number of tools': '2'}}"
107
+ ]
108
+ },
109
+ "execution_count": 7,
110
+ "metadata": {},
111
+ "output_type": "execute_result"
112
+ }
113
+ ],
114
+ "source": [
115
+ "json_QA[0] # Display the first entry to check the structure"
116
+ ]
117
+ },
118
+ {
119
+ "cell_type": "code",
120
+ "execution_count": null,
121
+ "id": "597d7911",
122
+ "metadata": {},
123
+ "outputs": [
124
+ {
125
+ "data": {
126
+ "text/plain": [
127
+ "'A paper about AI regulation that was originally submitted to arXiv.org in June 2022 shows a figure with three axes, where each axis has a label word at both ends. Which of these words is used to describe a type of society in a Physics and Society article submitted to arXiv.org on August 11, 2016?'"
128
+ ]
129
+ },
130
+ "execution_count": 15,
131
+ "metadata": {},
132
+ "output_type": "execute_result"
133
+ }
134
+ ],
135
+ "source": [
136
+ "json_QA[0]['Question'] # Access the 'Question' field"
137
+ ]
138
+ },
139
+ {
140
+ "cell_type": "code",
141
+ "execution_count": null,
142
+ "id": "53f27fdd",
143
+ "metadata": {},
144
+ "outputs": [
145
+ {
146
+ "name": "stdout",
147
+ "output_type": "stream",
148
+ "text": [
149
+ "==================================================\n",
150
+ "Task ID: e1fc63a2-da7a-432f-be78-7c4a95598703\n",
151
+ "Question: If Eliud Kipchoge could maintain his record-making marathon pace indefinitely, how many thousand hours would it take him to run the distance between the Earth and the Moon its closest approach? Please use the minimum perigee value on the Wikipedia page for the Moon when carrying out your calculation. Round your result to the nearest 1000 hours and do not use any comma separators if necessary.\n",
152
+ "Level: 1\n",
153
+ "Final Answer: 17\n",
154
+ "Annotator Metadata: \n",
155
+ " β”œβ”€β”€ Steps: \n",
156
+ " β”‚ β”œβ”€β”€ 1. Googled Eliud Kipchoge marathon pace to find 4min 37sec/mile\n",
157
+ " β”‚ β”œβ”€β”€ 2. Converted into fractions of hours.\n",
158
+ " β”‚ β”œβ”€β”€ 3. Found moon periapsis in miles (225,623 miles).\n",
159
+ " β”‚ β”œβ”€β”€ 4. Multiplied the two to find the number of hours and rounded to the nearest 100 hours.\n",
160
+ " β”œβ”€β”€ Number of steps: 4\n",
161
+ " β”œβ”€β”€ How long did this take?: 20 Minutes\n",
162
+ " β”œβ”€β”€ Tools:\n",
163
+ " β”‚ β”œβ”€β”€ 1. A web browser.\n",
164
+ " β”‚ β”œβ”€β”€ 2. A search engine.\n",
165
+ " β”‚ β”œβ”€β”€ 3. A calculator.\n",
166
+ " └── Number of tools: 3\n",
167
+ "==================================================\n",
168
+ "Task ID: 6359a0b1-8f7b-499b-9336-840f9ab90688\n",
169
+ "Question: What is the area of the green polygon in the attached file? The numbers in purple represent the lengths of the side they are next to.\n",
170
+ "Level: 2\n",
171
+ "Final Answer: 39\n",
172
+ "Annotator Metadata: \n",
173
+ " β”œβ”€β”€ Steps: \n",
174
+ " β”‚ β”œβ”€β”€ 1. Open the attached file.\n",
175
+ " β”‚ β”œβ”€β”€ 2. Split the shape into five rectangles.\n",
176
+ " β”‚ β”œβ”€β”€ 3. Find the missing side lengths from the side lengths that are given.\n",
177
+ " β”‚ β”œβ”€β”€ 4. Find the area for each rectangle.\n",
178
+ " β”‚ β”œβ”€β”€ 5. Add the areas together to get the area of the entire shape, 39.\n",
179
+ " β”œβ”€β”€ Number of steps: 5\n",
180
+ " β”œβ”€β”€ How long did this take?: 5-10 minutes\n",
181
+ " β”œβ”€β”€ Tools:\n",
182
+ " β”‚ β”œβ”€β”€ 1. Image recognition\n",
183
+ " β”‚ β”œβ”€β”€ 2. OCR\n",
184
+ " β”‚ β”œβ”€β”€ 3. Calculator\n",
185
+ " └── Number of tools: 3\n",
186
+ "==================================================\n",
187
+ "Task ID: b7f857e4-d8aa-4387-af2a-0e844df5b9d8\n",
188
+ "Question: The attached image contains a Python script. Run the Python code against an array of strings, listed below. The output of the Python script will be a URL containing C++ source code. Compile and run this C++ code against the array [35, 12, 8, 99, 21, 5] and return the sum of the third and fifth integers in the sorted list.\n",
189
+ "\n",
190
+ "arr = ['_alg', 'ghi', 'C++', 'jkl', 'tps', '/Q', 'pqr', 'stu', ':', '//', 'rose', 'vwx', 'yz1', '234', 'tta', '567', '890', 'cod', 'e.', 'or', 'g/', 'wiki', '/', 'ing', 'sort', 'abc' , 'or', 'it', 'hms', 'mno' , 'uic', 'ksort', '#', 'ht' ]\n",
191
+ "Level: 2\n",
192
+ "Final Answer: 47\n",
193
+ "Annotator Metadata: \n",
194
+ " β”œβ”€β”€ Steps: \n",
195
+ " β”‚ β”œβ”€β”€ 1. Extract the Python code from the image\n",
196
+ " β”‚ β”œβ”€β”€ 2. Run the code against the provided array. \n",
197
+ " β”‚ β”œβ”€β”€ 3. Navigate to the returned URL (https://web.archive.org/web/20230609112831/https://rosettacode.org/wiki/sorting_algorithms/Quicksort#C++)\n",
198
+ " β”‚ β”œβ”€β”€ 4. Extract the C++ code from the page.\n",
199
+ " β”‚ β”œβ”€β”€ 5. Insert the provided array into the C++ source code:\n",
200
+ " β”‚ β”œβ”€β”€ int main() {\n",
201
+ " β”‚ β”œβ”€β”€ std::vector<int> arr = {35, 12, 8, 99, 21, 5};\n",
202
+ " β”‚ β”œβ”€β”€ quicksort(arr.begin(), arr.end());\n",
203
+ " β”‚ β”œβ”€β”€ for (const auto& num : arr) {\n",
204
+ " β”‚ β”œβ”€β”€ std::cout << num << \" \";\n",
205
+ " β”‚ β”œβ”€β”€ }\n",
206
+ " β”‚ β”œβ”€β”€ std::cout << \"\\n\";\n",
207
+ " β”‚ β”œβ”€β”€ return 0;\n",
208
+ " β”‚ β”œβ”€β”€ }\n",
209
+ " β”‚ β”œβ”€β”€ 6. Compile the edited code.\n",
210
+ " β”‚ β”œβ”€β”€ 7. Run the compiled binary\n",
211
+ " β”œβ”€β”€ Number of steps: 7\n",
212
+ " β”œβ”€β”€ How long did this take?: 45 minutes\n",
213
+ " β”œβ”€β”€ Tools:\n",
214
+ " β”‚ β”œβ”€β”€ 1. File handling\n",
215
+ " β”‚ β”œβ”€β”€ 2. Computer vision or OCR\n",
216
+ " β”‚ β”œβ”€β”€ 3. Web browser\n",
217
+ " β”‚ β”œβ”€β”€ 4. Python\n",
218
+ " β”‚ β”œβ”€β”€ 5. C++ compiler\n",
219
+ " β”‚ β”œβ”€β”€ 6. Calculator \n",
220
+ " └── Number of tools: 6\n",
221
+ "==================================================\n"
222
+ ]
223
+ }
224
+ ],
225
+ "source": [
226
+ "import random\n",
227
+ "\n",
228
+ "random_samples = random.sample(json_QA, 3)\n",
229
+ "for sample in random_samples:\n",
230
+ " print(\"=\" * 50)\n",
231
+ " print(f\"Task ID: {sample['task_id']}\")\n",
232
+ " print(f\"Question: {sample['Question']}\")\n",
233
+ " print(f\"Level: {sample['Level']}\")\n",
234
+ " print(f\"Final Answer: {sample['Final answer']}\")\n",
235
+ " \n",
236
+ " print(f\"Annotator Metadata: \")\n",
237
+ " print(f\" β”œβ”€β”€ Steps: \")\n",
238
+ " for step in sample['Annotator Metadata']['Steps'].split('\\n'):\n",
239
+ " print(f\" β”‚ β”œβ”€β”€ {step}\")\n",
240
+ " print(f\" β”œβ”€β”€ Number of steps: {sample['Annotator Metadata']['Number of steps']}\")\n",
241
+ " print(f\" β”œβ”€β”€ How long did this take?: {sample['Annotator Metadata']['How long did this take?']}\")\n",
242
+ " print(f\" β”œβ”€β”€ Tools:\")\n",
243
+ " for tool in sample['Annotator Metadata']['Tools'].split('\\n'):\n",
244
+ " print(f\" β”‚ β”œβ”€β”€ {tool}\")\n",
245
+ " print(f\" └── Number of tools: {sample['Annotator Metadata']['Number of tools']}\")\n",
246
+ "print(\"=\" * 50)"
247
+ ]
248
+ },
249
+ {
250
+ "cell_type": "markdown",
251
+ "id": "92c785ac",
252
+ "metadata": {},
253
+ "source": [
254
+ "## Create Database \n",
255
+ "\n",
256
+ "- supabase database: https://supabase.com/docs/guides/database/overview "
257
+ ]
258
+ },
259
+ {
260
+ "cell_type": "code",
261
+ "execution_count": null,
262
+ "id": "ac93270f",
263
+ "metadata": {},
264
+ "outputs": [],
265
+ "source": [
266
+ "### Linking to supabase server\n",
267
+ "import os\n",
268
+ "from dotenv import load_dotenv\n",
269
+ "from langchain_huggingface import HuggingFaceEmbeddings\n",
270
+ "from langchain_community.vectorstores import SupabaseVectorStore\n",
271
+ "from supabase.client import Client, create_client\n",
272
+ "\n",
273
+ "\n",
274
+ "load_dotenv()\n",
275
+ "embeddings = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-mpnet-base-v2\") # dim=768\n",
276
+ "\n",
277
+ "supabase_url = os.environ.get(\"SUPABASE_URL\")\n",
278
+ "supabase_key = os.environ.get(\"SUPABASE_SERVICE_KEY\")\n",
279
+ "supabase: Client = create_client(supabase_url, supabase_key)"
280
+ ]
281
+ },
282
+ {
283
+ "cell_type": "markdown",
284
+ "id": "b745387a",
285
+ "metadata": {},
286
+ "source": [
287
+ "## Embedding and Vector Storage"
288
+ ]
289
+ },
290
+ {
291
+ "cell_type": "code",
292
+ "execution_count": null,
293
+ "id": "fb595623",
294
+ "metadata": {},
295
+ "outputs": [],
296
+ "source": [
297
+ "# Convert question-answer pairs to a list of Document objects for vector database storage\n",
298
+ "from langchain.schema import Document\n",
299
+ "docs = []\n",
300
+ "for sample in json_QA:\n",
301
+ " # Create a formatted text combining question and answer\n",
302
+ " content = f\"Question : {sample['Question']}\\n\\nFinal answer : {sample['Final answer']}\"\n",
303
+ " \n",
304
+ " # Create a document dictionary with content, metadata, and vector embedding\n",
305
+ " doc = {\n",
306
+ " \"content\" : content, # The actual text (question + answer)\n",
307
+ " \"metadata\" : { \n",
308
+ " \"source\" : sample['task_id'] \n",
309
+ " },\n",
310
+ " \"embedding\" : embeddings.embed_query(content), \n",
311
+ " }\n",
312
+ " docs.append(doc)\n",
313
+ "\n",
314
+ "# Insert all documents into Supabase vector database\n",
315
+ "try:\n",
316
+ " response = supabase.table(\"documents\").insert(docs).execute()\n",
317
+ " print(\"Data inserted successfully:\", response)\n",
318
+ "except Exception as e:\n",
319
+ " print(\"Error inserting data into Supabase:\", e)"
320
+ ]
321
+ },
322
+ {
323
+ "cell_type": "markdown",
324
+ "id": "d95564f5",
325
+ "metadata": {},
326
+ "source": [
327
+ "## Vector Store and Create Retriever"
328
+ ]
329
+ },
330
+ {
331
+ "cell_type": "code",
332
+ "execution_count": null,
333
+ "id": "935d2193",
334
+ "metadata": {},
335
+ "outputs": [],
336
+ "source": [
337
+ "# add items to vector database\n",
338
+ "vector_store = SupabaseVectorStore(\n",
339
+ " client=supabase,\n",
340
+ " embedding= embeddings,\n",
341
+ " table_name=\"documents\",\n",
342
+ " query_name=\"match_documents\",\n",
343
+ ")\n",
344
+ "retriever = vector_store.as_retriever()"
345
+ ]
346
+ },
347
+ {
348
+ "cell_type": "markdown",
349
+ "id": "321e3452",
350
+ "metadata": {},
351
+ "source": [
352
+ "## Test Query"
353
+ ]
354
+ },
355
+ {
356
+ "cell_type": "code",
357
+ "execution_count": null,
358
+ "id": "cb568e4f",
359
+ "metadata": {},
360
+ "outputs": [],
361
+ "source": [
362
+ "query = \"On June 6, 2023, an article by Carolyn Collins Petersen was published in Universe Today. This article mentions a team that produced a paper about their observations, linked at the bottom of the article. Find this paper. Under what NASA award number was the work performed by R. G. Arendt supported by?\"\n",
363
+ "\n",
364
+ "# matched_docs = vector_store.similarity_search(query, 2)\n",
365
+ "docs = retriever.invoke(query)\n",
366
+ "docs[0]"
367
+ ]
368
+ },
369
+ {
370
+ "cell_type": "markdown",
371
+ "id": "27e9e2b4",
372
+ "metadata": {},
373
+ "source": [
374
+ "## Tool Usage Frequency in GAIA Benchmark"
375
+ ]
376
+ },
377
+ {
378
+ "cell_type": "code",
379
+ "execution_count": null,
380
+ "id": "5f2de7f1",
381
+ "metadata": {},
382
+ "outputs": [],
383
+ "source": [
384
+ "from collections import Counter, OrderedDict\n",
385
+ "\n",
386
+ "tools = []\n",
387
+ "for sample in json_QA:\n",
388
+ " for tool in sample['Annotator Metadata']['Tools'].split('\\n'):\n",
389
+ " tool = tool[2:].strip().lower()\n",
390
+ " if tool.startswith(\"(\"):\n",
391
+ " tool = tool[11:].strip()\n",
392
+ " tools.append(tool)\n",
393
+ "tools_counter = OrderedDict(Counter(tools))\n",
394
+ "print(\"List of tools used in all samples:\")\n",
395
+ "print(\"Total number of tools used:\", len(tools_counter))\n",
396
+ "for tool, count in tools_counter.items():\n",
397
+ " print(f\" β”œβ”€β”€ {tool}: {count}\")"
398
+ ]
399
+ },
400
+ {
401
+ "cell_type": "markdown",
402
+ "id": "41e913b0",
403
+ "metadata": {},
404
+ "source": [
405
+ "# Graph Implementation\n",
406
+ "\n",
407
+ "Here we build the agent graph with LangGraph to solve those tasks!"
408
+ ]
409
+ },
410
+ {
411
+ "cell_type": "code",
412
+ "execution_count": null,
413
+ "id": "24406a79",
414
+ "metadata": {},
415
+ "outputs": [],
416
+ "source": [
417
+ "system_prompt = \"\"\" \n",
418
+ "You are a helpful assistant tasked with answering questions using a set of tools.\n",
419
+ "If the tool is not available, you can try to find the information online. You can also use your own knowledge to answer the question. \n",
420
+ "You need to provide a step-by-step explanation of how you arrived at the answer.\n",
421
+ "==========================\n",
422
+ "Here is a few examples showing you how to answer the question step by step.\n",
423
+ "\"\"\"\n",
424
+ "for i, samples in enumerate(random_samples):\n",
425
+ " system_prompt += f\"\\nQuestion {i+1}: {samples['Question']}\\nSteps:\\n{samples['Annotator Metadata']['Steps']}\\nTools:\\n{samples['Annotator Metadata']['Tools']}\\nFinal Answer: {samples['Final answer']}\\n\"\n",
426
+ "system_prompt += \"\\n==========================\\n\"\n",
427
+ "system_prompt += \"Now, please answer the following question step by step.\\n\"\n",
428
+ "\n",
429
+ "# save the system_prompt to a file\n",
430
+ "with open('system_prompt.txt', 'w') as f:\n",
431
+ " f.write(system_prompt)"
432
+ ]
433
+ },
434
+ {
435
+ "cell_type": "code",
436
+ "execution_count": null,
437
+ "id": "15ab2939",
438
+ "metadata": {},
439
+ "outputs": [],
440
+ "source": [
441
+ "# load the system prompt from the file\n",
442
+ "with open('system_prompt.txt', 'r') as f:\n",
443
+ " system_prompt = f.read()\n",
444
+ "print(system_prompt) "
445
+ ]
446
+ },
447
+ {
448
+ "cell_type": "code",
449
+ "execution_count": null,
450
+ "id": "5e33194b",
451
+ "metadata": {},
452
+ "outputs": [],
453
+ "source": [
454
+ "import dotenv\n",
455
+ "from langgraph.graph import MessagesState, START, StateGraph\n",
456
+ "from langgraph.prebuilt import tools_condition\n",
457
+ "from langgraph.prebuilt import ToolNode\n",
458
+ "from langchain_google_genai import ChatGoogleGenerativeAI\n",
459
+ "from langchain_core.messages import HumanMessage, SystemMessage\n",
460
+ "from tools.searchtools import question_retrieve_tool, wiki_search, web_search, arvix_search\n",
461
+ "from tools.mathtools import multiply, add, subtract, divide, modulus\n",
462
+ "\n",
463
+ "# Define the retriever from supabase\n",
464
+ "load_dotenv()\n",
465
+ "\n",
466
+ "tools = [\n",
467
+ " multiply,\n",
468
+ " add,\n",
469
+ " subtract,\n",
470
+ " divide,\n",
471
+ " modulus,\n",
472
+ " wiki_search,\n",
473
+ " web_search,\n",
474
+ " arvix_search,\n",
475
+ " question_retrieve_tool\n",
476
+ "]\n",
477
+ "\n",
478
+ "llm = ChatGoogleGenerativeAI(model=\"gemini-2.0-flash\")\n",
479
+ "llm_with_tools = llm.bind_tools(tools)"
480
+ ]
481
+ },
482
+ {
483
+ "cell_type": "code",
484
+ "execution_count": null,
485
+ "id": "2e8901db",
486
+ "metadata": {},
487
+ "outputs": [],
488
+ "source": [
489
+ "# load the system prompt from the file\n",
490
+ "with open('system_prompt.txt', 'r') as f:\n",
491
+ " system_prompt = f.read()\n",
492
+ "\n",
493
+ "\n",
494
+ "# System message\n",
495
+ "sys_msg = SystemMessage(content=system_prompt)\n",
496
+ "\n",
497
+ "# Node\n",
498
+ "def assistant(state: MessagesState):\n",
499
+ " \"\"\"Assistant node\"\"\"\n",
500
+ " return {\"messages\": [llm_with_tools.invoke([sys_msg] + state[\"messages\"])]}\n",
501
+ "\n",
502
+ "# Build graph\n",
503
+ "builder = StateGraph(MessagesState)\n",
504
+ "builder.add_node(\"assistant\", assistant)\n",
505
+ "builder.add_node(\"tools\", ToolNode(tools))\n",
506
+ "builder.add_edge(START, \"assistant\")\n",
507
+ "builder.add_conditional_edges(\n",
508
+ " \"assistant\",\n",
509
+ " # If the latest message (result) from assistant is a tool call -> tools_condition routes to tools\n",
510
+ " # If the latest message (result) from assistant is a not a tool call -> tools_condition routes to END\n",
511
+ " tools_condition,\n",
512
+ ")\n",
513
+ "builder.add_edge(\"tools\", \"assistant\")\n",
514
+ "\n",
515
+ "# Compile graph\n",
516
+ "graph = builder.compile()"
517
+ ]
518
+ },
519
+ {
520
+ "cell_type": "code",
521
+ "execution_count": null,
522
+ "id": "74ba4cd8",
523
+ "metadata": {},
524
+ "outputs": [],
525
+ "source": [
526
+ "from IPython.display import Image, display\n",
527
+ "\n",
528
+ "display(Image(graph.get_graph(xray=True).draw_mermaid_png()))"
529
+ ]
530
+ },
531
+ {
532
+ "cell_type": "code",
533
+ "execution_count": null,
534
+ "id": "af40c038",
535
+ "metadata": {},
536
+ "outputs": [],
537
+ "source": [
538
+ "question = \"On June 6, 2023, an article by Carolyn Collins Petersen was published in Universe Today. This article mentions a team that produced a paper about their observations, linked at the bottom of the article. Find this paper. Under what NASA award number was the work performed by R. G. Arendt supported by?\"\n",
539
+ "messages = [HumanMessage(content=question)]\n",
540
+ "messages = graph.invoke({\"messages\": messages})"
541
+ ]
542
+ },
543
+ {
544
+ "cell_type": "code",
545
+ "execution_count": null,
546
+ "id": "133aba59",
547
+ "metadata": {},
548
+ "outputs": [],
549
+ "source": [
550
+ "for m in messages['messages']:\n",
551
+ " m.pretty_print()"
552
+ ]
553
+ },
554
+ {
555
+ "cell_type": "code",
556
+ "execution_count": null,
557
+ "id": "f6738871",
558
+ "metadata": {},
559
+ "outputs": [],
560
+ "source": []
561
+ }
562
+ ],
563
+ "metadata": {
564
+ "kernelspec": {
565
+ "display_name": "Python 3",
566
+ "language": "python",
567
+ "name": "python3"
568
+ },
569
+ "language_info": {
570
+ "codemirror_mode": {
571
+ "name": "ipython",
572
+ "version": 3
573
+ },
574
+ "file_extension": ".py",
575
+ "mimetype": "text/x-python",
576
+ "name": "python",
577
+ "nbconvert_exporter": "python",
578
+ "pygments_lexer": "ipython3",
579
+ "version": "3.12.6"
580
+ }
581
+ },
582
+ "nbformat": 4,
583
+ "nbformat_minor": 5
584
+ }
tools/__init__.py ADDED
File without changes