AheedTahir committed on
Commit 223e45d · 1 Parent(s): 81917a3

Final Working Implementation

Files changed (7)
  1. .env.example +24 -0
  2. .gitignore +49 -0
  3. README.md +89 -6
  4. agent.py +356 -0
  5. evaluation_app.py +217 -0
  6. requirements.txt +16 -1
  7. test_agent.py +84 -0
.env.example ADDED
@@ -0,0 +1,24 @@
+ # API Keys Configuration
+ # Copy this file to .env and fill in your actual API keys
+
+ # Groq API Key (for LLM)
+ # Get from: https://console.groq.com
+ GROQ_API_KEY=gsk_your_groq_api_key_here
+
+ # Tavily API Key (for web search)
+ # Get from: https://tavily.com
+ TAVILY_API_KEY=tvly-your_tavily_api_key_here
+
+ # Optional: Supabase (if using vector database)
+ SUPABASE_URL=your_supabase_url_here
+ SUPABASE_SERVICE_ROLE_KEY=your_supabase_key_here
+
+ # Optional: HuggingFace (if using HF models)
+ HUGGINGFACEHUB_API_TOKEN=hf_your_token_here
+
+ # Optional: LangSmith (for debugging/tracing)
+ LANGSMITH_API_KEY=lsv2_your_key_here
+ LANGSMITH_TRACING=true
+ LANGSMITH_PROJECT=ai_agent_course
+ LANGSMITH_ENDPOINT=https://api.smith.langchain.com
+
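The keys above are loaded at runtime with python-dotenv (listed in requirements.txt). As a rough illustration of what that loading amounts to, here is a stdlib-only sketch; the `load_env_file` helper is hypothetical and not part of this repo:

```python
import os

def load_env_file(path: str) -> dict:
    """Parse KEY=VALUE lines from a .env-style file, skipping comments and blanks."""
    values = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Ignore blank lines and comment lines like the ones in .env.example
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values
```

In practice `load_dotenv()` from python-dotenv does this (and more) and exports the values into `os.environ`, which is what `test_agent.py` relies on.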
.gitignore ADDED
@@ -0,0 +1,49 @@
+ # Environment variables
+ .env
+ .env.local
+
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual environments
+ venv/
+ ENV/
+ env/
+ .venv
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Logs
+ *.log
+
+ # Test outputs
+ test_results/
+
README.md CHANGED
@@ -1,15 +1,98 @@
  ---
- title: Template Final Assignment
- emoji: 🕵🏻‍♂️
+ title: GAIA Agent - Certification
+ emoji: 🤖
  colorFrom: indigo
- colorTo: indigo
+ colorTo: purple
  sdk: gradio
  sdk_version: 5.25.2
- app_file: app.py
+ app_file: evaluation_app.py
  pinned: false
  hf_oauth: true
- # optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
  hf_oauth_expiration_minutes: 480
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # GAIA Agent - Hugging Face Agents Course Certification
+
+ This is a LangGraph-based AI agent built to answer questions from the GAIA benchmark for the Hugging Face Agents Course Unit 4 certification.
+
+ ## Goal
+
+ Achieve **30%+ accuracy** on the GAIA benchmark to earn the certification.
+
+ ## Agent Architecture
+
+ The agent is built using:
+ - **LLM**: Groq's Llama 3.3 70B (fast and free)
+ - **Framework**: LangGraph for agent orchestration
+ - **Tools**: 5 essential tools for maximum coverage
+
+ ### Tools Implemented
+
+ 1. **Web Search** (Tavily) - Search the internet for current information
+ 2. **Wikipedia Search** - Access encyclopedic knowledge (Wikipedia API)
+ 3. **Calculator** - Perform mathematical calculations
+ 4. **Python Executor** - Execute Python code for complex computations
+ 5. **File Reader** - Read CSV, JSON, and text files
+
+ ## Answer Format Rules
+
+ The agent follows GAIA's strict formatting requirements:
+ - **Numbers**: No commas, no units (unless requested)
+ - **Text**: No articles (a, an, the), no abbreviations
+ - **Lists**: Comma-separated with one space after commas
+ - **Dates**: ISO format (YYYY-MM-DD) unless specified
+
+ ## Usage
+
+ ### Local Testing
+
+ ```bash
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Set up environment variables in .env
+ GROQ_API_KEY=your_key_here
+ TAVILY_API_KEY=your_key_here
+
+ # Test the agent
+ python test_agent.py
+ ```
+
+ ### Running Evaluation
+
+ 1. Open the Space URL
+ 2. Log in with your Hugging Face account
+ 3. Click "Run Evaluation & Submit All Answers"
+ 4. Wait for results (takes ~1-2 hours due to rate limiting)
+
+ ## Project Structure
+
+ ```
+ .
+ ├── agent.py             # Main agent implementation
+ ├── evaluation_app.py    # Gradio app for evaluation
+ ├── test_agent.py        # Local testing script
+ ├── requirements.txt     # Python dependencies
+ ├── .env                 # API keys (not committed)
+ └── README.md            # This file
+ ```
+
+ ## Required API Keys
+
+ - **GROQ_API_KEY**: Get from [console.groq.com](https://console.groq.com)
+ - **TAVILY_API_KEY**: Get from [tavily.com](https://tavily.com)
+
+ ## Expected Performance
+
+ With the current tool set:
+ - **Web Search + Wikipedia + Calculator**: ~25-30%
+ - **+ File Processing**: ~35-40%
+ - **+ Python Execution**: ~40-45%
+
+ ## Course Information
+
+ This project is part of the [Hugging Face Agents Course](https://huggingface.co/learn/agents-course) Unit 4 certification.
+
+ ## License
+
+ MIT License - feel free to use and modify for your own certification!
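The answer-format rules the README describes are enforced only through the system prompt in agent.py. A post-processing sketch of two of those rules (no thousands separators in numbers, comma-plus-one-space lists); the helper names are hypothetical and not present in the repo:

```python
def normalize_number(text: str) -> str:
    """GAIA number rule: drop thousands separators and surrounding whitespace."""
    return text.strip().replace(",", "")

def normalize_list(items) -> str:
    """GAIA list rule: comma-separated with exactly one space after each comma."""
    return ", ".join(str(item).strip() for item in items)
```

Applying such helpers to the model's final message would guard against formatting slips that the prompt alone cannot guarantee, e.g. `normalize_number("1,000")` giving `"1000"`.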
agent.py ADDED
@@ -0,0 +1,356 @@
+ """
+ GAIA Agent with Essential Tools for 30%+ Accuracy
+ Built with LangGraph and Groq LLM
+ """
+ import os
+ import re
+ import json
+ from typing import Annotated
+ from langchain_core.tools import tool
+ from langchain_core.messages import SystemMessage
+ from langchain_community.tools.tavily_search import TavilySearchResults
+ from langchain_community.document_loaders import WikipediaLoader
+ from langchain_groq import ChatGroq
+ from langgraph.graph import StateGraph, MessagesState, START, END
+ from langgraph.prebuilt import ToolNode, tools_condition
+ from langgraph.checkpoint.memory import MemorySaver
+
+ # Initialize LLM
+ def get_llm():
+     """Get Groq LLM instance"""
+     return ChatGroq(
+         model="llama-3.3-70b-versatile",
+         temperature=0,
+         max_tokens=8000,
+         timeout=60,
+         max_retries=2
+     )
+
+ # ============================================================================
+ # TOOL DEFINITIONS
+ # ============================================================================
+
+ @tool
+ def web_search(query: str) -> str:
+     """
+     Search the web for current information using Tavily.
+     Use this for finding recent information, facts, statistics, or any data not in your training.
+
+     Args:
+         query: The search query string
+
+     Returns:
+         Search results as formatted text
+     """
+     try:
+         tavily = TavilySearchResults(
+             max_results=5,
+             search_depth="advanced",
+             include_answer=True,
+             include_raw_content=False
+         )
+         results = tavily.invoke(query)
+
+         if not results:
+             return "No results found."
+
+         # Format results nicely
+         formatted = []
+         for i, result in enumerate(results, 1):
+             title = result.get('title', 'No title')
+             content = result.get('content', 'No content')
+             url = result.get('url', '')
+             formatted.append(f"Result {i}:\nTitle: {title}\nContent: {content}\nURL: {url}\n")
+
+         return "\n".join(formatted)
+     except Exception as e:
+         return f"Error searching web: {str(e)}"
+
+
+ @tool
+ def wikipedia_search(query: str) -> str:
+     """
+     Search Wikipedia for encyclopedic information.
+     Use this for historical facts, biographies, scientific concepts, etc.
+
+     Args:
+         query: The Wikipedia search query
+
+     Returns:
+         Wikipedia article content
+     """
+     try:
+         loader = WikipediaLoader(query=query, load_max_docs=2, doc_content_chars_max=4000)
+         docs = loader.load()
+
+         if not docs:
+             return f"No Wikipedia article found for '{query}'"
+
+         # Combine the documents
+         content = "\n\n---\n\n".join([doc.page_content for doc in docs])
+         return f"Wikipedia results for '{query}':\n\n{content}"
+     except Exception as e:
+         return f"Error searching Wikipedia: {str(e)}"
+
+
+ @tool
+ def calculate(expression: str) -> str:
+     """
+     Evaluate a mathematical expression safely.
+     Supports basic arithmetic: +, -, *, /, //, %, **, parentheses.
+     Also supports common math functions: abs, round, min, max, sum, sqrt, log, and trigonometry.
+
+     Args:
+         expression: Mathematical expression as a string (e.g., "2 + 2", "sqrt(16)", "10 ** 2")
+
+     Returns:
+         The calculated result
+     """
+     try:
+         # Import math for advanced functions
+         import math
+
+         # Create a safe namespace with math functions
+         safe_dict = {
+             'abs': abs, 'round': round, 'min': min, 'max': max, 'sum': sum,
+             'sqrt': math.sqrt, 'pow': pow, 'log': math.log, 'log10': math.log10,
+             'sin': math.sin, 'cos': math.cos, 'tan': math.tan,
+             'pi': math.pi, 'e': math.e, 'ceil': math.ceil, 'floor': math.floor
+         }
+
+         # Clean the expression
+         expression = expression.strip()
+
+         # Evaluate safely (builtins stripped, only the whitelist above is visible)
+         result = eval(expression, {"__builtins__": {}}, safe_dict)
+         return str(result)
+     except Exception as e:
+         return f"Error calculating '{expression}': {str(e)}"
+
+
+ @tool
+ def python_executor(code: str) -> str:
+     """
+     Execute Python code safely for data processing and calculations.
+     Use this for complex calculations, data manipulation, or multi-step computations.
+     The code should print its output.
+
+     Args:
+         code: Python code to execute
+
+     Returns:
+         The output of the code execution
+     """
+     import io
+     import sys
+     import math
+     import json
+     from datetime import datetime, timedelta
+
+     # Capture stdout
+     old_stdout = sys.stdout
+     sys.stdout = buffer = io.StringIO()
+
+     try:
+         # Create safe execution environment
+         safe_globals = {
+             '__builtins__': {
+                 'print': print, 'len': len, 'range': range, 'str': str,
+                 'int': int, 'float': float, 'list': list, 'dict': dict,
+                 'set': set, 'tuple': tuple, 'sorted': sorted, 'sum': sum,
+                 'min': min, 'max': max, 'abs': abs, 'round': round,
+                 'enumerate': enumerate, 'zip': zip, 'map': map, 'filter': filter,
+             },
+             'math': math,
+             'json': json,
+             'datetime': datetime,
+             'timedelta': timedelta,
+         }
+
+         # Execute code
+         exec(code, safe_globals)
+
+         # Get output
+         output = buffer.getvalue()
+         return output if output else "Code executed successfully (no output)"
+     except Exception as e:
+         return f"Error executing code: {str(e)}"
+     finally:
+         # Always restore stdout, even when exec raises
+         sys.stdout = old_stdout
+
+
+ @tool
+ def read_file(filepath: str) -> str:
+     """
+     Read and return the contents of a file.
+     Supports text files, CSV, JSON, and basic file formats.
+
+     Args:
+         filepath: Path to the file to read
+
+     Returns:
+         File contents as string
+     """
+     try:
+         # Check if file exists
+         if not os.path.exists(filepath):
+             return f"File not found: {filepath}"
+
+         # Read based on file extension
+         if filepath.endswith('.json'):
+             with open(filepath, 'r', encoding='utf-8') as f:
+                 data = json.load(f)
+             return json.dumps(data, indent=2)
+
+         elif filepath.endswith('.csv'):
+             try:
+                 import pandas as pd
+                 df = pd.read_csv(filepath)
+                 return f"CSV file with {len(df)} rows and {len(df.columns)} columns:\n\n{df.to_string()}"
+             except ImportError:
+                 # Fallback if pandas is not available
+                 with open(filepath, 'r', encoding='utf-8') as f:
+                     return f.read()
+
+         else:
+             # Read as text
+             with open(filepath, 'r', encoding='utf-8') as f:
+                 content = f.read()
+             return content
+     except Exception as e:
+         return f"Error reading file '{filepath}': {str(e)}"
+
+
+ # ============================================================================
+ # SYSTEM PROMPT - GAIA Specific Instructions
+ # ============================================================================
+
+ GAIA_SYSTEM_PROMPT = """You are a helpful AI assistant designed to answer questions from the GAIA benchmark.
+
+ CRITICAL ANSWER FORMAT RULES:
+ 1. For numbers: NO commas, NO units (unless explicitly requested)
+    - CORRECT: "1000" or "1000 meters" (if units requested)
+    - WRONG: "1,000" or "1000 meters" (if units not requested)
+
+ 2. For text answers: No articles (a, an, the), no abbreviations
+    - CORRECT: "United States"
+    - WRONG: "The United States" or "USA"
+
+ 3. For lists: Comma-separated with one space after each comma
+    - CORRECT: "apple, banana, orange"
+    - WRONG: "apple,banana,orange" or "apple, banana, orange."
+
+ 4. For dates: Use the format specified in the question
+    - If not specified, use ISO format: YYYY-MM-DD
+
+ 5. Be precise and concise - answer ONLY what is asked
+
+ APPROACH:
+ 1. Read the question carefully and identify what information is needed
+ 2. Use tools to gather information (web search, Wikipedia, calculations)
+ 3. For multi-step questions, break down the problem and solve step by step
+ 4. Verify your answer matches the format requirements above
+ 5. Return ONLY the final answer in the correct format
+
+ AVAILABLE TOOLS:
+ - web_search: Search the internet for current information
+ - wikipedia_search: Search Wikipedia for encyclopedic knowledge
+ - calculate: Perform mathematical calculations
+ - python_executor: Execute Python code for complex computations
+ - read_file: Read files (CSV, JSON, text)
+
+ Remember: Your final response should be ONLY the answer in the correct format, nothing else.
+ """
+
+ # ============================================================================
+ # AGENT GRAPH CONSTRUCTION
+ # ============================================================================
+
+ def build_graph():
+     """Build the LangGraph agent with tools"""
+
+     # Initialize LLM
+     llm = get_llm()
+
+     # Define tools
+     tools = [
+         web_search,
+         wikipedia_search,
+         calculate,
+         python_executor,
+         read_file
+     ]
+
+     # Bind tools to LLM
+     llm_with_tools = llm.bind_tools(tools)
+
+     # Define the assistant node
+     def assistant(state: MessagesState):
+         """Assistant node that calls the LLM"""
+         messages = state["messages"]
+
+         # Add system message if not present
+         if not any(isinstance(msg, SystemMessage) for msg in messages):
+             messages = [SystemMessage(content=GAIA_SYSTEM_PROMPT)] + messages
+
+         response = llm_with_tools.invoke(messages)
+         return {"messages": [response]}
+
+     # Build the graph
+     builder = StateGraph(MessagesState)
+
+     # Add nodes
+     builder.add_node("assistant", assistant)
+     builder.add_node("tools", ToolNode(tools))
+
+     # Add edges
+     builder.add_edge(START, "assistant")
+     builder.add_conditional_edges(
+         "assistant",
+         tools_condition,
+     )
+     builder.add_edge("tools", "assistant")
+
+     # Compile with memory
+     memory = MemorySaver()
+     graph = builder.compile(checkpointer=memory)
+
+     return graph
+
+
+ # ============================================================================
+ # TESTING
+ # ============================================================================
+
+ if __name__ == "__main__":
+     # Test the agent with sample questions
+     from langchain_core.messages import HumanMessage
+
+     # Build agent
+     print("Building agent...")
+     agent = build_graph()
+
+     # Test questions
+     test_questions = [
+         "What is 25 * 4 + 100?",
+         "Who was the first president of the United States?",
+         "Search for the population of Tokyo in 2024"
+     ]
+
+     for i, question in enumerate(test_questions, 1):
+         print(f"\n{'='*60}")
+         print(f"Test {i}: {question}")
+         print('='*60)
+
+         try:
+             config = {"configurable": {"thread_id": f"test_{i}"}}
+             result = agent.invoke(
+                 {"messages": [HumanMessage(content=question)]},
+                 config=config
+             )
+             answer = result['messages'][-1].content
+             print(f"Answer: {answer}")
+         except Exception as e:
+             print(f"Error: {e}")
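`python_executor` in agent.py captures output by swapping `sys.stdout` by hand and restoring it afterwards. An equivalent sketch using `contextlib.redirect_stdout`, which restores stdout automatically even when the executed code raises; the `run_and_capture` helper is illustrative, not part of the repo, and uses a much smaller builtins whitelist than the real tool:

```python
import contextlib
import io

def run_and_capture(code: str) -> str:
    """Execute code and return whatever it printed; stdout is restored automatically."""
    buffer = io.StringIO()
    try:
        # redirect_stdout undoes the redirection on exit, error or not
        with contextlib.redirect_stdout(buffer):
            exec(code, {"__builtins__": {"print": print, "range": range, "sum": sum}})
    except Exception as e:
        return f"Error executing code: {e}"
    return buffer.getvalue() or "Code executed successfully (no output)"
```

The context-manager form avoids the subtle failure mode of manual swapping, where an exception raised before the restore line leaves stdout pointing at the buffer.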
evaluation_app.py ADDED
@@ -0,0 +1,217 @@
+ """Basic Agent Evaluation Runner"""
+ import os
+ import inspect
+ import gradio as gr
+ import requests
+ import pandas as pd
+ import time
+ from langchain_core.messages import HumanMessage
+ from agent import build_graph
+
+
+ # --- Constants ---
+ DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
+
+ # --- Basic Agent Definition ---
+ # ----- THIS IS WHERE YOU CAN BUILD WHAT YOU WANT ------
+
+
+ class BasicAgent:
+     """A LangGraph agent."""
+     def __init__(self):
+         print("BasicAgent initialized.")
+         self.graph = build_graph()
+
+     def __call__(self, question: str) -> str:
+         print(f"Agent received question (first 50 chars): {question[:50]}...")
+         # Wrap the question in a HumanMessage from langchain_core
+         messages = [HumanMessage(content=question)]
+         config = {"configurable": {"thread_id": "evaluation"}}
+         result = self.graph.invoke({"messages": messages}, config=config)
+         answer = result['messages'][-1].content
+
+         # Extract the final answer if it has a "Final Answer:" prefix
+         if "Final Answer:" in answer:
+             answer = answer.split("Final Answer:")[-1].strip()
+
+         return answer
+
+
+ def run_and_submit_all(profile: gr.OAuthProfile | None):
+     """
+     Fetches all questions, runs the BasicAgent on them, submits all answers,
+     and displays the results.
+     """
+     # --- Determine HF Space Runtime URL and Repo URL ---
+     space_id = os.getenv("SPACE_ID")  # SPACE_ID is used to send a link to the code
+
+     if profile:
+         username = f"{profile.username}"
+         print(f"User logged in: {username}")
+     else:
+         print("User not logged in.")
+         return "Please log in to Hugging Face with the button.", None
+
+     api_url = DEFAULT_API_URL
+     questions_url = f"{api_url}/questions"
+     submit_url = f"{api_url}/submit"
+
+     # 1. Instantiate Agent (modify this part to create your agent)
+     try:
+         agent = BasicAgent()
+     except Exception as e:
+         print(f"Error instantiating agent: {e}")
+         return f"Error initializing agent: {e}", None
+     # For an app running as a Hugging Face Space, this link points to your codebase (useful for others, so please keep it public)
+     agent_code = f"https://huggingface.co/spaces/{space_id}/tree/main"
+     print(agent_code)
+
+     # 2. Fetch Questions
+     print(f"Fetching questions from: {questions_url}")
+     try:
+         response = requests.get(questions_url, timeout=15)
+         response.raise_for_status()
+         questions_data = response.json()
+         if not questions_data:
+             print("Fetched questions list is empty.")
+             return "Fetched questions list is empty or invalid format.", None
+         print(f"Fetched {len(questions_data)} questions.")
+     except requests.exceptions.RequestException as e:
+         print(f"Error fetching questions: {e}")
+         return f"Error fetching questions: {e}", None
+     except requests.exceptions.JSONDecodeError as e:
+         print(f"Error decoding JSON response from questions endpoint: {e}")
+         print(f"Response text: {response.text[:500]}")
+         return f"Error decoding server response for questions: {e}", None
+     except Exception as e:
+         print(f"An unexpected error occurred fetching questions: {e}")
+         return f"An unexpected error occurred fetching questions: {e}", None
+
+     # 3. Run your Agent
+     results_log = []
+     answers_payload = []
+     print(f"Running agent on {len(questions_data)} questions...")
+     for item in questions_data:
+         task_id = item.get("task_id")
+         question_text = item.get("question")
+         if not task_id or question_text is None:
+             print(f"Skipping item with missing task_id or question: {item}")
+             continue
+
+         # Pause between questions to stay within the LLM provider's rate limits
+         time.sleep(30)
+
+         try:
+             submitted_answer = agent(question_text)
+             answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
+             results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": submitted_answer})
+         except Exception as e:
+             print(f"Error running agent on task {task_id}: {e}")
+             results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": f"AGENT ERROR: {e}"})
+
+     if not answers_payload:
+         print("Agent did not produce any answers to submit.")
+         return "Agent did not produce any answers to submit.", pd.DataFrame(results_log)
+
+     # 4. Prepare Submission
+     submission_data = {"username": username.strip(), "agent_code": agent_code, "answers": answers_payload}
+     status_update = f"Agent finished. Submitting {len(answers_payload)} answers for user '{username}'..."
+     print(status_update)
+
+     # 5. Submit
+     print(f"Submitting {len(answers_payload)} answers to: {submit_url}")
+     try:
+         response = requests.post(submit_url, json=submission_data, timeout=60)
+         response.raise_for_status()
+         result_data = response.json()
+         final_status = (
+             f"Submission Successful!\n"
+             f"User: {result_data.get('username')}\n"
+             f"Overall Score: {result_data.get('score', 'N/A')}% "
+             f"({result_data.get('correct_count', '?')}/{result_data.get('total_attempted', '?')} correct)\n"
+             f"Message: {result_data.get('message', 'No message received.')}"
+         )
+         print("Submission successful.")
+         results_df = pd.DataFrame(results_log)
+         return final_status, results_df
+     except requests.exceptions.HTTPError as e:
+         error_detail = f"Server responded with status {e.response.status_code}."
+         try:
+             error_json = e.response.json()
+             error_detail += f" Detail: {error_json.get('detail', e.response.text)}"
+         except requests.exceptions.JSONDecodeError:
+             error_detail += f" Response: {e.response.text[:500]}"
+         status_message = f"Submission Failed: {error_detail}"
+         print(status_message)
+         results_df = pd.DataFrame(results_log)
+         return status_message, results_df
+     except requests.exceptions.Timeout:
+         status_message = "Submission Failed: The request timed out."
+         print(status_message)
+         results_df = pd.DataFrame(results_log)
+         return status_message, results_df
+     except requests.exceptions.RequestException as e:
+         status_message = f"Submission Failed: Network error - {e}"
+         print(status_message)
+         results_df = pd.DataFrame(results_log)
+         return status_message, results_df
+     except Exception as e:
+         status_message = f"An unexpected error occurred during submission: {e}"
+         print(status_message)
+         results_df = pd.DataFrame(results_log)
+         return status_message, results_df
+
+
+ # --- Build Gradio Interface using Blocks ---
+ with gr.Blocks() as demo:
+     gr.Markdown("# Basic Agent Evaluation Runner")
+     gr.Markdown(
+         """
+         **Instructions:**
+         1. Please clone this Space, then modify the code to define your agent's logic, the tools, the necessary packages, etc.
+         2. Log in to your Hugging Face account using the button below. This uses your HF username for submission.
+         3. Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
+         ---
+         **Disclaimers:**
+         Once you click the submit button, it can take quite some time (this is how long the agent takes to work through all the questions).
+         This Space provides a basic setup and is intentionally sub-optimal to encourage you to develop your own, more robust solution. For instance, to avoid the long-running submit button, you could cache the answers and submit them in a separate action, or answer the questions asynchronously.
+         """
+     )
+
+     gr.LoginButton()
+
+     run_button = gr.Button("Run Evaluation & Submit All Answers")
+
+     status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
+     # Removed max_rows=10 from DataFrame constructor
+     results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
+
+     run_button.click(
+         fn=run_and_submit_all,
+         outputs=[status_output, results_table]
+     )
+
+ if __name__ == "__main__":
+     print("\n" + "-"*30 + " App Starting " + "-"*30)
+     # Check for SPACE_HOST and SPACE_ID at startup for information
+     space_host_startup = os.getenv("SPACE_HOST")
+     space_id_startup = os.getenv("SPACE_ID")  # Get SPACE_ID at startup
+
+     if space_host_startup:
+         print(f"✅ SPACE_HOST found: {space_host_startup}")
+         print(f"   Runtime URL should be: https://{space_host_startup}.hf.space")
+     else:
+         print("ℹ️ SPACE_HOST environment variable not found (running locally?).")
+
+     if space_id_startup:  # Print repo URLs if SPACE_ID is found
+         print(f"✅ SPACE_ID found: {space_id_startup}")
+         print(f"   Repo URL: https://huggingface.co/spaces/{space_id_startup}")
+         print(f"   Repo Tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main")
+     else:
+         print("ℹ️ SPACE_ID environment variable not found (running locally?). Repo URL cannot be determined.")
+
+     print("-"*(60 + len(" App Starting ")) + "\n")
+
+     print("Launching Gradio Interface for Basic Agent Evaluation...")
+     demo.launch(debug=True, share=False)
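The evaluation loop above paces itself with a fixed `time.sleep(30)` before every question, which is the simplest way to respect the LLM provider's rate limits. A sketch of retry-with-exponential-backoff, a common alternative that only waits when a call actually fails; the `with_backoff` helper is hypothetical and not part of the app:

```python
import time

def with_backoff(fn, retries: int = 3, base_delay: float = 2.0):
    """Call fn(); on failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                # Out of retries: surface the last error to the caller
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Wrapping `agent(question_text)` in such a helper would recover from intermittent rate-limit errors without stalling for 30 seconds on every question, at the cost of slightly more complex error handling.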
requirements.txt CHANGED
@@ -1,2 +1,17 @@
+ # Core dependencies
  gradio
- requests
+ requests
+ pandas
+
+ # LangChain and LangGraph
+ langchain-core
+ langchain-community
+ langchain-groq
+ langgraph
+
+ # Tools and APIs
+ tavily-python
+ wikipedia
+
+ # Utilities
+ python-dotenv
test_agent.py ADDED
@@ -0,0 +1,84 @@
+ """
+ Simple test script for the GAIA agent
+ """
+ import os
+ from dotenv import load_dotenv
+ from langchain_core.messages import HumanMessage
+ from agent import build_graph
+
+ # Load environment variables
+ load_dotenv()
+
+ # Verify API keys are set
+ print("Checking API keys...")
+ groq_key = os.getenv("GROQ_API_KEY")
+ tavily_key = os.getenv("TAVILY_API_KEY")
+
+ if not groq_key:
+     print("❌ GROQ_API_KEY not found in environment")
+ else:
+     print(f"✅ GROQ_API_KEY found: {groq_key[:10]}...")
+
+ if not tavily_key:
+     print("❌ TAVILY_API_KEY not found in environment")
+ else:
+     print(f"✅ TAVILY_API_KEY found: {tavily_key[:10]}...")
+
+ print("\n" + "="*60)
+ print("Building agent...")
+ print("="*60)
+
+ try:
+     agent = build_graph()
+     print("✅ Agent built successfully!")
+ except Exception as e:
+     print(f"❌ Error building agent: {e}")
+     exit(1)
+
+ # Test questions (simple ones to verify functionality)
+ test_questions = [
+     {
+         "question": "What is 25 * 4?",
+         "expected_type": "number",
+         "description": "Simple calculation test"
+     },
+     {
+         "question": "Who was the first president of the United States? Answer with just the name.",
+         "expected_type": "text",
+         "description": "Simple knowledge test"
+     }
+ ]
+
+ print("\n" + "="*60)
+ print("Running tests...")
+ print("="*60)
+
+ for i, test in enumerate(test_questions, 1):
+     print(f"\n{'='*60}")
+     print(f"Test {i}: {test['description']}")
+     print(f"Question: {test['question']}")
+     print('='*60)
+
+     try:
+         config = {"configurable": {"thread_id": f"test_{i}"}}
+         result = agent.invoke(
+             {"messages": [HumanMessage(content=test['question'])]},
+             config=config
+         )
+         answer = result['messages'][-1].content
+
+         # Extract the final answer if it has a "Final Answer:" prefix
+         if "Final Answer:" in answer:
+             answer = answer.split("Final Answer:")[-1].strip()
+
+         print(f"✅ Answer: {answer}")
+
+     except Exception as e:
+         print(f"❌ Error: {e}")
+         import traceback
+         traceback.print_exc()
+
+ print("\n" + "="*60)
+ print("Tests completed!")
+ print("="*60)