Spaces: Gralon/MyGAIASimpleAgent_clean


Gralon committed on Apr 25, 2025

Commit dab1aa6 (verified) · 1 Parent(s): bff8ed8

Upload 4 files


Files changed (4)

README.md +76 -5
app.py +393 -0
requirements.txt +11 -0
tools.py +376 -0

README.md CHANGED

@@ -1,12 +1,83 @@
 ---
-title: MyGAIASimpleAgent Clean
-emoji: 👀
-colorFrom: red
 colorTo: purple
 sdk: gradio
-sdk_version: 5.26.0
 app_file: app.py
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: GAIA Agent - Hugging Face Agents Course
+emoji: 🧠
+colorFrom: blue
 colorTo: purple
 sdk: gradio
+sdk_version: 5.25.2
 app_file: app.py
 pinned: false
+hf_oauth: true
+hf_oauth_expiration_minutes: 480
 ---
+# GAIA Agent - Hugging Face Agents Course
+This project implements an intelligent agent for the final assessment of the Hugging Face Agents course. The agent is designed to achieve a score of 30% or higher on the GAIA benchmark.
+## Features
+- **Efficient Implementation**: Minimal yet powerful solution using smolagents
+- **OpenAI Integration**: Option to use gpt-4o-mini (cost efficient) or gpt-4o (higher accuracy)
+- **Web Search Capabilities**: Leverages DuckDuckGo search with rate limiting protections
+- **File Processing**: Handles various file types like CSV, Excel, and images
+- **Reverse Text Detection**: Automatically detects and handles reversed text questions
+- **Cost Controls**: Sample size slider and model selection options to manage API costs
+## Usage
+1. Clone this repository
+2. Install the required dependencies:
+```bash
+pip install -r requirements.txt
+```
+3. Create a `.env` file with your OpenAI API key:
+```
+OPENAI_API_KEY=your_key_here
+OPENAI_MODEL_ID=gpt-4o-mini  # or gpt-4o for higher accuracy
+```
+4. Run the application:
+```bash
+python app.py
+```
+## How it Works
+The agent uses a CodeAgent from smolagents with enhanced prompting and multiple tools to solve the GAIA questions. It employs a straightforward approach that:
+1. Receives questions from the GAIA API
+2. Processes questions with specialized handling for reversed text
+3. Uses appropriate tools based on the question type
+4. Returns precise answers in the expected format
+The agent is specifically designed to follow the GAIA benchmark format requirements, ensuring all answers are provided in the exact format expected by the evaluation system.
+## Tools
+- Web search (DuckDuckGo with rate limiting protection)
+- Reverse text analysis
+- File processing tools for CSV and Excel files
+- Image OCR capabilities
+- Date and time utilities
+- File download handling
+## Deployment
+To deploy on Hugging Face Spaces:
+1. Create a new Space on Hugging Face
+2. Upload all files from this repository (EXCLUDING the .env file)
+3. Add the following secrets in your Space settings:
+   - OPENAI_API_KEY: Your OpenAI API key
+   - OPENAI_MODEL_ID: The model to use (gpt-4o-mini or gpt-4o)
+4. Make sure `hf_oauth: true` is set in the metadata block at the top of README.md to enable login/authentication
+## Testing
+You can use the test_single.py script to test the agent with individual questions locally:
+```bash
+python test_single.py
+```
+This helps verify functionality without incurring high API costs during development.
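
Review note on the "Reverse Text Detection" feature and step 2 of "How it Works": the detection described in the README can be sketched as a small standalone helper. This mirrors the marker strings used by `_is_reversed_text` in app.py; the function names `looks_reversed` and `normalize_question` are hypothetical and chosen only for illustration.

```python
def looks_reversed(text: str) -> bool:
    """Heuristic: a leading period or a known reversed marker word suggests reversed text."""
    markers = (".rewsna eht sa", "esrever", "sdrawkcab")
    return text.startswith(".") or any(marker in text for marker in markers)


def normalize_question(text: str) -> str:
    """Return the question in reading order, un-reversing it only when the heuristic fires."""
    return text[::-1] if looks_reversed(text) else text


if __name__ == "__main__":
    original = 'Write the word "right" backwards.'
    # The reversed form starts with "." and contains "sdrawkcab", so it is restored.
    print(normalize_question(original[::-1]))
```

The heuristic is deliberately loose: any question that happens to start with a period is treated as reversed, so it may misfire on unusual inputs.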

app.py ADDED

	@@ -0,0 +1,393 @@

+import os
+import gradio as gr
+import requests
+import inspect
+import pandas as pd
+from dotenv import load_dotenv
+from smolagents import CodeAgent, DuckDuckGoSearchTool, OpenAIServerModel
+from tools import (
+    ReverseTextTool,
+    ExtractTextFromImageTool,
+    AnalyzeCSVTool,
+    AnalyzeExcelTool,
+    DateCalculatorTool,
+    DownloadFileTool
+)
+# Try to load environment variables
+try:
+    load_dotenv()
+    print("Loaded environment variables from .env file")
+except Exception as e:
+    print(f"Note: Could not load .env file - {e}")
+# --- Constants ---
+DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
+# --- GAIA Agent Definition ---
+class GAIAAgent:
+    def __init__(self, verbose=False):
+        self.verbose = verbose
+        print("Initializing GAIA Agent...")
+        # Get API key
+        api_key = os.environ.get("OPENAI_API_KEY")
+        if not api_key:
+            raise ValueError("OpenAI API key not found. Please set the OPENAI_API_KEY environment variable.")
+        # Initialize model with gpt-4o-mini for cost efficiency
+        model_id = os.environ.get("OPENAI_MODEL_ID", "gpt-4o-mini")  # Use environment variable or default to mini
+        print(f"Using OpenAI model: {model_id}")
+        model = OpenAIServerModel(
+            model_id=model_id,
+            api_key=api_key,
+            temperature=0.1
+        )
+        # Initialize tools. The search tool itself is not rate limited here:
+        # instead of passing a wait_time parameter, pacing is requested in the
+        # agent prompts (time.sleep() between searches)
+        duck_search_tool = DuckDuckGoSearchTool()
+        self.tools = [
+            duck_search_tool,         # Web search
+            ReverseTextTool(),        # Handling reversed text
+            ExtractTextFromImageTool(), # OCR for images
+            AnalyzeCSVTool(),         # CSV analysis
+            AnalyzeExcelTool(),       # Excel analysis
+            DateCalculatorTool(),     # Date calculations
+            DownloadFileTool()        # File downloads
+        ]
+        # Add more authorized imports to prevent common errors
+        # (entries are importable module names, so Pillow is listed as "PIL")
+        additional_imports = [
+            "PyPDF2", "pdf2image", "PIL", "nltk", "sklearn",
+            "networkx", "matplotlib", "seaborn", "scipy", "time"
+        ]
+        # Initialize CodeAgent with planning, base tools and additional imports
+        self.agent = CodeAgent(
+            tools=self.tools,
+            model=model,
+            add_base_tools=True,  # Also register smolagents' built-in default tools
+            planning_interval=3,   # Refresh planning every 3 steps
+            verbosity_level=2 if self.verbose else 0,
+            additional_authorized_imports=additional_imports
+        )
+        print("GAIA Agent initialized and ready")
+    def _is_reversed_text(self, text):
+        """Check if the text appears to be reversed"""
+        # Common patterns in reversed text
+        return (
+            text.startswith(".") or
+            ".rewsna eht sa" in text or
+            "esrever" in text or
+            "sdrawkcab" in text
+        )
+    def __call__(self, question: str) -> str:
+        """Process a question and return the answer"""
+        if self.verbose:
+            print(f"Processing question: {question[:100]}..." if len(question) > 100 else f"Processing question: {question}")
+        # Check if the question contains reversed text
+        if self._is_reversed_text(question):
+            if self.verbose:
+                print("Detected reversed text, will handle accordingly")
+            # Create a prompt that explicitly mentions the reversed text with GAIA guidelines
+            # Add guidance to limit tool usage and prevent infinite loops
+            prompt = f"""
+You are a general AI assistant. I will ask you a question.
+This question appears to be in reversed text. Here is the reversed version for clarity:
+{question[::-1]}
+Report your thoughts, and finish your answer with the following template: FINAL ANSWER: [YOUR FINAL ANSWER].
+YOUR FINAL ANSWER should be a number OR as few words as possible OR a comma separated list of numbers and/or strings.
+- If you are asked for a number, don't use comma to write your number neither use units such as $ or percent sign unless specified otherwise.
+- If you are asked for a string, don't use articles, neither abbreviations (e.g. for cities), and write the digits in plain text unless specified otherwise.
+- If you are asked for a comma separated list, apply the above rules depending of whether the element to be put in the list is a number or a string.
+IMPORTANT NOTES TO LIMIT COSTS AND PREVENT ERRORS:
+- Use web search sparingly and only when absolutely necessary.
+- Limit to 1-2 web searches per question.
+- If a search fails due to rate limiting, add a 3-5 second delay using time.sleep() before retrying with a different search term.
+- Do not import libraries that aren't available - stick to basic Python and the tools provided.
+- Focus on answering directly with what you already know when possible.
+- If you've made more than 3 attempts to solve a problem, prioritize providing your best guess.
+- Always add a delay of 2-3 seconds between web searches using time.sleep() to avoid rate limiting.
+Remember to structure your response in Python code format using the final_answer() function.
+"""
+        else:
+            # For normal questions, create a standard prompt following GAIA guidelines
+            # Add guidance to limit tool usage and prevent infinite loops
+            prompt = f"""
+You are a general AI assistant. I will ask you a question. Report your thoughts, and finish your answer with the following template: FINAL ANSWER: [YOUR FINAL ANSWER].
+YOUR FINAL ANSWER should be a number OR as few words as possible OR a comma separated list of numbers and/or strings.
+- If you are asked for a number, don't use comma to write your number neither use units such as $ or percent sign unless specified otherwise.
+- If you are asked for a string, don't use articles, neither abbreviations (e.g. for cities), and write the digits in plain text unless specified otherwise.
+- If you are asked for a comma separated list, apply the above rules depending of whether the element to be put in the list is a number or a string.
+Question: {question}
+IMPORTANT NOTES TO LIMIT COSTS AND PREVENT ERRORS:
+- Use web search sparingly and only when absolutely necessary.
+- Limit to 1-2 web searches per question.
+- If a search fails due to rate limiting, add a 3-5 second delay using time.sleep() before retrying with a different search term.
+- Do not import libraries that aren't available - stick to basic Python and the tools provided.
+- Focus on answering directly with what you already know when possible.
+- If you've made more than 3 attempts to solve a problem, prioritize providing your best guess.
+- Always add a delay of 2-3 seconds between web searches using time.sleep() to avoid rate limiting.
+Remember to structure your response in Python code format using the final_answer() function.
+"""
+        # Run the agent
+        try:
+            answer = self.agent.run(prompt)
+            if self.verbose:
+                print(f"Generated answer: {answer}")
+            return answer
+        except Exception as e:
+            error_msg = f"Error processing question: {e}"
+            if self.verbose:
+                print(error_msg)
+            return error_msg
+def run_and_submit_all(profile: gr.OAuthProfile | None, sample_size: int = 0):
+    """
+    Fetches all questions, runs the agent on them, submits all answers,
+    and displays the results.
+    Args:
+        profile: User profile for authentication
+        sample_size: Number of questions to process (0 for all questions)
+    """
+    # --- Determine HF Space Runtime URL and Repo URL ---
+    space_id = os.getenv("SPACE_ID") # Get the SPACE_ID for sending link to the code
+    if profile:
+        username = f"{profile.username}"
+        print(f"User logged in: {username}")
+    else:
+        print("User not logged in.")
+        return "Please Login to Hugging Face with the button.", None
+    api_url = DEFAULT_API_URL
+    questions_url = f"{api_url}/questions"
+    submit_url = f"{api_url}/submit"
+    # 1. Instantiate Agent
+    try:
+        agent = GAIAAgent(verbose=True)
+    except Exception as e:
+        print(f"Error instantiating agent: {e}")
+        return f"Error initializing agent: {e}", None
+    # Get the code URL for submission
+    agent_code = f"https://huggingface.co/spaces/{space_id}/tree/main"
+    print(f"Agent code URL: {agent_code}")
+    # 2. Fetch Questions
+    print(f"Fetching questions from: {questions_url}")
+    try:
+        response = requests.get(questions_url, timeout=15)
+        response.raise_for_status()
+        questions_data = response.json()
+        if not questions_data:
+            print("Fetched questions list is empty.")
+            return "Fetched questions list is empty or invalid format.", None
+        print(f"Fetched {len(questions_data)} questions.")
+    except requests.exceptions.JSONDecodeError as e:
+        # Must come before RequestException: requests' JSONDecodeError is a subclass of it
+        print(f"Error decoding JSON response from questions endpoint: {e}")
+        print(f"Response text: {response.text[:500]}")
+        return f"Error decoding server response for questions: {e}", None
+    except requests.exceptions.RequestException as e:
+        print(f"Error fetching questions: {e}")
+        return f"Error fetching questions: {e}", None
+    except Exception as e:
+        print(f"An unexpected error occurred fetching questions: {e}")
+        return f"An unexpected error occurred fetching questions: {e}", None
+    # 3. Run Agent on Questions
+    results_log = []
+    answers_payload = []
+    # Limit number of questions if sample_size is specified
+    if sample_size > 0 and sample_size < len(questions_data):
+        import random
+        print(f"Using a sample of {sample_size} questions from {len(questions_data)} total questions")
+        questions_data = random.sample(questions_data, sample_size)
+    print(f"Running agent on {len(questions_data)} questions...")
+    for i, item in enumerate(questions_data):
+        task_id = item.get("task_id")
+        question_text = item.get("question")
+        if not task_id or question_text is None:
+            print(f"Skipping item with missing task_id or question: {item}")
+            continue
+        try:
+            print(f"Processing question {i+1}/{len(questions_data)}: Task ID {task_id}")
+            submitted_answer = agent(question_text)
+            answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
+            results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": submitted_answer})
+            print(f"Successfully processed question {i+1}")
+            # Add a delay between questions to avoid rate limiting
+            if i < len(questions_data) - 1:
+                import time
+                print("Waiting 2 seconds before next question...")
+                time.sleep(2)
+        except Exception as e:
+            print(f"Error running agent on task {task_id}: {e}")
+            results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": f"AGENT ERROR: {e}"})
+    if not answers_payload:
+        print("Agent did not produce any answers to submit.")
+        return "Agent did not produce any answers to submit.", pd.DataFrame(results_log)
+    # 4. Prepare Submission
+    submission_data = {"username": username.strip(), "agent_code": agent_code, "answers": answers_payload}
+    status_update = f"Agent finished. Submitting {len(answers_payload)} answers for user '{username}'..."
+    print(status_update)
+    # 5. Submit
+    print(f"Submitting {len(answers_payload)} answers to: {submit_url}")
+    try:
+        response = requests.post(submit_url, json=submission_data, timeout=60)
+        response.raise_for_status()
+        result_data = response.json()
+        final_status = (
+            f"Submission Successful!\n"
+            f"User: {result_data.get('username')}\n"
+            f"Overall Score: {result_data.get('score', 'N/A')}% "
+            f"({result_data.get('correct_count', '?')}/{result_data.get('total_attempted', '?')} correct)\n"
+            f"Message: {result_data.get('message', 'No message received.')}"
+        )
+        print("Submission successful.")
+        results_df = pd.DataFrame(results_log)
+        return final_status, results_df
+    except requests.exceptions.HTTPError as e:
+        error_detail = f"Server responded with status {e.response.status_code}."
+        try:
+            error_json = e.response.json()
+            error_detail += f" Detail: {error_json.get('detail', e.response.text)}"
+        except requests.exceptions.JSONDecodeError:
+            error_detail += f" Response: {e.response.text[:500]}"
+        status_message = f"Submission Failed: {error_detail}"
+        print(status_message)
+        results_df = pd.DataFrame(results_log)
+        return status_message, results_df
+    except requests.exceptions.Timeout:
+        status_message = "Submission Failed: The request timed out."
+        print(status_message)
+        results_df = pd.DataFrame(results_log)
+        return status_message, results_df
+    except requests.exceptions.RequestException as e:
+        status_message = f"Submission Failed: Network error - {e}"
+        print(status_message)
+        results_df = pd.DataFrame(results_log)
+        return status_message, results_df
+    except Exception as e:
+        status_message = f"An unexpected error occurred during submission: {e}"
+        print(status_message)
+        results_df = pd.DataFrame(results_log)
+        return status_message, results_df
+def test_single_question(question: str) -> str:
+    """Test the agent on a single question"""
+    try:
+        agent = GAIAAgent(verbose=True)
+        answer = agent(question)
+        return answer
+    except Exception as e:
+        return f"Error: {e}"
+# --- Build Gradio Interface using Blocks ---
+with gr.Blocks() as demo:
+    gr.Markdown("# GAIA Agent Evaluation Runner")
+    gr.Markdown(
+        """
+        ## Instructions:
+        1. Log in to your Hugging Face account using the button below
+        2. Test your agent on individual questions in the Testing tab
+        3. Run the full evaluation on the GAIA benchmark in the Evaluation tab
+        This agent is designed to achieve a score of 30% or higher on the GAIA benchmark.
+        """
+    )
+    gr.LoginButton()
+    with gr.Tab("Test Single Question"):
+        test_input = gr.Textbox(label="Enter a question to test", lines=3)
+        test_output = gr.Textbox(label="Answer", lines=3)
+        test_button = gr.Button("Test Question")
+        test_button.click(
+            fn=test_single_question,
+            inputs=test_input,
+            outputs=test_output
+        )
+    with gr.Tab("Full Evaluation"):
+        with gr.Row():
+            sample_size = gr.Slider(
+                minimum=0,
+                maximum=20,
+                value=0,
+                step=1,
+                label="Sample Size (0 for all questions)",
+                info="Set a number to limit how many questions to process (reduces costs)"
+            )
+        run_button = gr.Button("Run Evaluation & Submit All Answers")
+        status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
+        results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
+        run_button.click(
+            fn=run_and_submit_all,
+            inputs=[sample_size],  # gr.OAuthProfile is injected by Gradio from the type hint
+            outputs=[status_output, results_table]
+        )
+if __name__ == "__main__":
+    print("\n" + "-"*30 + " GAIA Agent Starting " + "-"*30)
+    # Check for API key
+    api_key = os.environ.get("OPENAI_API_KEY")
+    if not api_key:
+        print("WARNING: OpenAI API key not found. Please set OPENAI_API_KEY environment variable.")
+    else:
+        print("OpenAI API key found.")
+    # Check environment variables
+    space_host = os.getenv("SPACE_HOST")
+    space_id = os.getenv("SPACE_ID")
+    if space_host:
+        print(f"✅ Running in Hugging Face Space: {space_host}")
+        # SPACE_HOST is expected to already include the .hf.space suffix
+        print(f"   Runtime URL: https://{space_host}")
+    else:
+        print("ℹ️ Running locally")
+    if space_id:
+        print(f"✅ Space ID: {space_id}")
+        print(f"   Repo URL: https://huggingface.co/spaces/{space_id}")
+        print(f"   Code URL: https://huggingface.co/spaces/{space_id}/tree/main")
+    print("-"*78 + "\n")
+    print("Launching Gradio Interface...")
+    demo.launch(debug=True)
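
Review note on answer formatting: the prompts in `GAIAAgent.__call__` ask for both a `FINAL ANSWER: ...` template and a `final_answer()` call, and `run_and_submit_all` submits whatever `agent.run()` returns unchanged. Assuming the scoring endpoint compares answers as exact strings (the source does not state this), a leftover prefix could cost points. A minimal sketch of a cleanup step follows; `extract_final_answer` is a hypothetical helper, not part of this commit.

```python
import re


def extract_final_answer(raw: object) -> str:
    """Return the text after the last 'FINAL ANSWER:' marker, or the whole text if absent."""
    text = str(raw).strip()
    # Splitting keeps everything after the last marker; with no marker the list has one item.
    parts = re.split(r"FINAL ANSWER:", text, flags=re.IGNORECASE)
    return parts[-1].strip()


if __name__ == "__main__":
    print(extract_final_answer("The capital is well known. FINAL ANSWER: Paris"))
    print(extract_final_answer(42))
```

It could be applied to `submitted_answer` before it is appended to `answers_payload`; non-string results such as integers are converted to strings first.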

requirements.txt ADDED

	@@ -0,0 +1,11 @@

+gradio
+gradio[oauth]
+itsdangerous
+requests
+pandas
+numpy
+smolagents
+smolagents[openai]
+python-dotenv
+openai>=1.0.0
+litellm
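
Review note on dependencies: tools.py imports `pytesseract` and `PIL` lazily inside `ExtractTextFromImageTool`, and `pd.read_excel` needs an Excel engine such as `openpyxl` for .xlsx files, yet none of these appear in requirements.txt. The sketch below is one way to check at startup which of them are importable; the module list and the helper name `missing_modules` are assumptions made for illustration.

```python
from importlib.util import find_spec

# Import names (not pip package names) that the tools rely on at call time.
OPTIONAL_MODULES = {
    "pytesseract": "OCR in ExtractTextFromImageTool",
    "PIL": "image loading (provided by the Pillow package)",
    "openpyxl": "reading .xlsx files through pandas.read_excel",
}


def missing_modules(names):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_modules(OPTIONAL_MODULES):
        print(f"Missing optional module '{name}': needed for {OPTIONAL_MODULES[name]}")
```

Note that `pytesseract` also needs the Tesseract binary installed on the system, which a Python-level check like this cannot detect.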

tools.py ADDED

	@@ -0,0 +1,376 @@

+from smolagents import Tool
+import pandas as pd
+import os
+import tempfile
+import requests
+from urllib.parse import urlparse
+import json
+import re
+from datetime import datetime, timedelta
+class ReverseTextTool(Tool):
+    name = "reverse_text"
+    description = "Reverses the text in a string."
+    inputs = {
+        "text": {
+            "type": "string",
+            "description": "The text to reverse."
+        }
+    }
+    output_type = "string"
+    def forward(self, text: str) -> str:
+        return text[::-1]
+class ExtractTextFromImageTool(Tool):
+    name = "extract_text_from_image"
+    description = "Extracts text from an image file using OCR."
+    inputs = {
+        "image_path": {
+            "type": "string",
+            "description": "Path to the image file."
+        }
+    }
+    output_type = "string"
+    def forward(self, image_path: str) -> str:
+        try:
+            # Try to import pytesseract
+            import pytesseract
+            from PIL import Image
+            # Open the image
+            image = Image.open(image_path)
+            # Try different configurations for better results
+            configs = [
+                '--psm 6',  # Assume a single uniform block of text
+                '--psm 3',  # Automatic page segmentation, but no OSD
+                '--psm 1',  # Automatic page segmentation with OSD
+            ]
+            results = []
+            for config in configs:
+                try:
+                    text = pytesseract.image_to_string(image, config=config)
+                    if text.strip():
+                        results.append(text)
+                except Exception:
+                    continue
+            if results:
+                # Return the longest result, which is likely the most complete
+                return f"Extracted text from image:\n\n{max(results, key=len)}"
+            else:
+                return "No text could be extracted from the image."
+        except ImportError:
+            return "Error: pytesseract is not installed. Please install it with 'pip install pytesseract' and ensure Tesseract OCR is installed on your system."
+        except Exception as e:
+            return f"Error extracting text from image: {str(e)}"
+class AnalyzeCSVTool(Tool):
+    name = "analyze_csv_file"
+    description = "Analyzes a CSV file and provides information about its contents."
+    inputs = {
+        "file_path": {
+            "type": "string",
+            "description": "Path to the CSV file."
+        },
+        "query": {
+            "type": "string",
+            "description": "Optional query about the data.",
+            "default": "",
+            "nullable": True
+        }
+    }
+    output_type = "string"
+    def forward(self, file_path: str, query: str = "") -> str:
+        try:
+            # Read CSV file with different encodings if needed
+            for encoding in ['utf-8', 'latin1', 'iso-8859-1', 'cp1252']:
+                try:
+                    df = pd.read_csv(file_path, encoding=encoding)
+                    break
+                except UnicodeDecodeError:
+                    continue
+            else:
+                return "Error: Could not read the CSV file with any of the attempted encodings."
+            # Basic information
+            result = f"CSV file has {len(df)} rows and {len(df.columns)} columns.\n"
+            result += f"Columns: {', '.join(map(str, df.columns))}\n\n"
+            # If there's a specific query
+            if query:
+                if "count" in query.lower():
+                    result += f"Row count: {len(df)}\n"
+                # Look for column-specific queries
+                for col in df.columns:
+                    if col.lower() in query.lower():
+                        result += f"\nColumn '{col}' information:\n"
+                        if pd.api.types.is_numeric_dtype(df[col]):
+                            result += f"Min: {df[col].min()}\n"
+                            result += f"Max: {df[col].max()}\n"
+                            result += f"Mean: {df[col].mean()}\n"
+                            result += f"Median: {df[col].median()}\n"
+                        else:
+                            # For categorical data
+                            value_counts = df[col].value_counts().head(10)
+                            result += f"Unique values: {df[col].nunique()}\n"
+                            result += f"Top values:\n{value_counts.to_string()}\n"
+            # General statistics for all columns
+            else:
+                # For numeric columns
+                numeric_cols = df.select_dtypes(include=['number']).columns
+                if len(numeric_cols) > 0:
+                    result += "Numeric columns statistics:\n"
+                    result += df[numeric_cols].describe().to_string()
+                    result += "\n\n"
+                # For categorical columns, show counts of unique values
+                cat_cols = df.select_dtypes(exclude=['number']).columns
+                if len(cat_cols) > 0:
+                    result += "Categorical columns:\n"
+                    for col in cat_cols[:5]:  # Limit to first 5 columns
+                        result += f"- {col}: {df[col].nunique()} unique values\n"
+            return result
+        except Exception as e:
+            return f"Error analyzing CSV file: {str(e)}"
+class AnalyzeExcelTool(Tool):
+    name = "analyze_excel_file"
+    description = "Analyzes an Excel file and provides information about its contents."
+    inputs = {
+        "file_path": {
+            "type": "string",
+            "description": "Path to the Excel file."
+        },
+        "query": {
+            "type": "string",
+            "description": "Optional query about the data.",
+            "default": "",
+            "nullable": True
+        },
+        "sheet_name": {
+            "type": "string",
+            "description": "Name of the sheet to analyze (defaults to first sheet).",
+            "default": None,
+            "nullable": True
+        }
+    }
+    output_type = "string"
+    def forward(self, file_path: str, query: str = "", sheet_name: str = None) -> str:
+        try:
+            # Read sheet names first
+            excel_file = pd.ExcelFile(file_path)
+            sheet_names = excel_file.sheet_names
+            # Info about all sheets
+            result = f"Excel file contains {len(sheet_names)} sheets: {', '.join(sheet_names)}\n\n"
+            # If sheet name is specified, use it; otherwise use first sheet
+            if sheet_name is None:
+                sheet_name = sheet_names[0]
+            elif sheet_name not in sheet_names:
+                return f"Error: Sheet '{sheet_name}' not found. Available sheets: {', '.join(sheet_names)}"
+            # Read the specified sheet
+            df = pd.read_excel(file_path, sheet_name=sheet_name)
+            # Basic information
+            result += f"Sheet '{sheet_name}' has {len(df)} rows and {len(df.columns)} columns.\n"
+            result += f"Columns: {', '.join(map(str, df.columns))}\n\n"
+            # Handle query similar to CSV tool
+            if query:
+                if "count" in query.lower():
+                    result += f"Row count: {len(df)}\n"
+                # Look for column-specific queries
+                for col in df.columns:
+                    if col.lower() in query.lower():
+                        result += f"\nColumn '{col}' information:\n"
+                        if pd.api.types.is_numeric_dtype(df[col]):
+                            result += f"Min: {df[col].min()}\n"
+                            result += f"Max: {df[col].max()}\n"
+                            result += f"Mean: {df[col].mean()}\n"
+                            result += f"Median: {df[col].median()}\n"
+                        else:
+                            # For categorical data
+                            value_counts = df[col].value_counts().head(10)
+                            result += f"Unique values: {df[col].nunique()}\n"
+                            result += f"Top values:\n{value_counts.to_string()}\n"
+            else:
+                # For numeric columns
+                numeric_cols = df.select_dtypes(include=['number']).columns
+                if len(numeric_cols) > 0:
+                    result += "Numeric columns statistics:\n"
+                    result += df[numeric_cols].describe().to_string()
+                    result += "\n\n"
+                # For categorical columns, show counts of unique values
+                cat_cols = df.select_dtypes(exclude=['number']).columns
+                if len(cat_cols) > 0:
+                    result += "Categorical columns:\n"
+                    for col in cat_cols[:5]:  # Limit to first 5 columns
+                        result += f"- {col}: {df[col].nunique()} unique values\n"
+            return result
+        except Exception as e:
+            return f"Error analyzing Excel file: {str(e)}"
+class DateCalculatorTool(Tool):
+    name = "date_calculator"
+    description = "Performs date calculations like adding days, formatting dates, etc."
+    inputs = {
+        "query": {
+            "type": "string",
+            "description": "The date calculation to perform (e.g., 'What day is 10 days from today?', 'Format 2023-05-15 as MM/DD/YYYY')"
+        }
+    }
+    output_type = "string"
+    def forward(self, query: str) -> str:
+        try:
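+            # Get current date/time, but only when the query is not an add or format
+            # request (otherwise "10 days from today" would stop here on "today")
+            is_calculation = re.search(r'(what|when).+?\d+\s+(days?|weeks?|months?|years?)\s+(from|after)\s+.+|format\s+.+?\s+as\s+.+', query, re.IGNORECASE)
+            if not is_calculation and re.search(r'\b(today|now|current date|current time)\b', query, re.IGNORECASE):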
+                now = datetime.now()
+                if 'time' in query.lower():
+                    return f"Current date and time: {now.strftime('%Y-%m-%d %H:%M:%S')}"
+                else:
+                    return f"Today's date: {now.strftime('%Y-%m-%d')}"
+            # Add days to a date
+            add_match = re.search(r'(what|when).+?(\d+)\s+(day|days|week|weeks|month|months|year|years)\s+(from|after)\s+(.+)', query, re.IGNORECASE)
+            if add_match:
+                amount = int(add_match.group(2))
+                unit = add_match.group(3).lower()
+                # Drop trailing punctuation so "today?" still matches 'today'
+                date_text = add_match.group(5).strip().rstrip('?.!')
+                # Parse the date
+                if date_text.lower() in ['today', 'now']:
+                    base_date = datetime.now()
+                else:
+                    try:
+                        # Try various date formats
+                        for fmt in ['%Y-%m-%d', '%m/%d/%Y', '%d/%m/%Y', '%B %d, %Y']:
+                            try:
+                                base_date = datetime.strptime(date_text, fmt)
+                                break
+                            except ValueError:
+                                continue
+                        else:
+                            return f"Could not parse date: {date_text}"
+                    except Exception as e:
+                        return f"Error parsing date: {e}"
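+                # Calculate new date
+                import calendar
+                if 'day' in unit:
+                    new_date = base_date + timedelta(days=amount)
+                elif 'week' in unit:
+                    new_date = base_date + timedelta(weeks=amount)
+                elif 'month' in unit:
+                    # Roll the year over, then clamp the day to the target month's length
+                    # (e.g. Jan 31 + 1 month becomes the last day of February)
+                    new_month = base_date.month + amount
+                    new_year = base_date.year + (new_month - 1) // 12
+                    new_month = ((new_month - 1) % 12) + 1
+                    last_day = calendar.monthrange(new_year, new_month)[1]
+                    new_date = base_date.replace(year=new_year, month=new_month, day=min(base_date.day, last_day))
+                elif 'year' in unit:
+                    # Clamp Feb 29 to Feb 28 when the target year is not a leap year
+                    new_year = base_date.year + amount
+                    last_day = calendar.monthrange(new_year, base_date.month)[1]
+                    new_date = base_date.replace(year=new_year, day=min(base_date.day, last_day))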
+                return f"Date {amount} {unit} from {base_date.strftime('%Y-%m-%d')} is {new_date.strftime('%Y-%m-%d')}"
+            # Format a date
+            format_match = re.search(r'format\s+(.+?)\s+as\s+(.+)', query, re.IGNORECASE)
+            if format_match:
+                date_text = format_match.group(1).strip()
+                format_spec = format_match.group(2).strip()
+                # Parse the date
+                if date_text.lower() in ['today', 'now']:
+                    date_obj = datetime.now()
+                else:
+                    try:
+                        # Try various date formats
+                        for fmt in ['%Y-%m-%d', '%m/%d/%Y', '%d/%m/%Y', '%B %d, %Y']:
+                            try:
+                                date_obj = datetime.strptime(date_text, fmt)
+                                break
+                            except ValueError:
+                                continue
+                        else:
+                            return f"Could not parse date: {date_text}"
+                    except Exception as e:
+                        return f"Error parsing date: {e}"
+                # Convert format specification to strftime format
+                format_mapping = {
+                    'YYYY': '%Y',
+                    'YY': '%y',
+                    'MM': '%m',
+                    'DD': '%d',
+                    'HH': '%H',
+                    'mm': '%M',
+                    'ss': '%S'
+                }
+                strftime_format = format_spec
+                for key, value in format_mapping.items():
+                    strftime_format = strftime_format.replace(key, value)
+                return f"Formatted date: {date_obj.strftime(strftime_format)}"
+            return "I couldn't understand the date calculation query."
+        except Exception as e:
+            return f"Error performing date calculation: {str(e)}"
+class DownloadFileTool(Tool):
+    name = "download_file"
+    description = "Downloads a file from a URL and saves it locally."
+    inputs = {
+        "url": {
+            "type": "string",
+            "description": "The URL to download from."
+        },
+        "filename": {
+            "type": "string",
+            "description": "Optional filename to save as (default: derived from URL).",
+            "default": None,
+            "nullable": True
+        }
+    }
+    output_type = "string"
+    def forward(self, url: str, filename: str = None) -> str:
+        try:
+            # Parse URL to get filename if not provided
+            if not filename:
+                path = urlparse(url).path
+                filename = os.path.basename(path)
+                if not filename:
+                    # Generate a random name if we couldn't extract one
+                    import uuid
+                    filename = f"downloaded_{uuid.uuid4().hex[:8]}"
+            # Create temporary file
+            temp_dir = tempfile.gettempdir()
+            filepath = os.path.join(temp_dir, filename)
+            # Download the file
+            response = requests.get(url, stream=True, timeout=30)  # avoid hanging on unresponsive hosts
+            response.raise_for_status()
+            # Save the file
+            with open(filepath, 'wb') as f:
+                for chunk in response.iter_content(chunk_size=8192):
+                    f.write(chunk)
+            return f"File downloaded to {filepath}. You can now analyze this file."
+        except Exception as e:
+            return f"Error downloading file: {str(e)}"
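
Review note on `DownloadFileTool`: the tool joins a caller-supplied `filename` directly onto the temp directory, so a value such as `../../etc/passwd` would resolve outside it. A minimal sketch of deriving a safer target path is shown below; `safe_download_path` is a hypothetical helper, and it keeps the tool's existing fallback of a random `downloaded_` name.

```python
import os
import tempfile
import uuid
from typing import Optional
from urllib.parse import urlparse


def safe_download_path(url: str, filename: Optional[str] = None) -> str:
    """Build a path inside the temp directory, keeping only the final path component."""
    candidate = filename or urlparse(url).path
    # Normalize Windows separators, then drop any directory parts.
    name = os.path.basename(candidate.replace("\\", "/"))
    if name in ("", ".", ".."):
        # Same fallback as the tool: a short random name when none can be derived.
        name = f"downloaded_{uuid.uuid4().hex[:8]}"
    return os.path.join(tempfile.gettempdir(), name)


if __name__ == "__main__":
    print(safe_download_path("https://example.com/data/report.csv"))
    print(safe_download_path("https://example.com/x", "../../etc/passwd"))
```

The returned path could replace the `os.path.join(temp_dir, filename)` step in `forward`; it does not protect against overwriting an existing file of the same name in the temp directory.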