shekkari21 committed
Commit 4dbe519 · 1 Parent(s): ec96f6b

added files for creating basic agentic loop

.gitignore CHANGED
@@ -214,3 +214,6 @@ __marimo__/
 
 # Streamlit
 .streamlit/secrets.toml
+my_code.ipynb
+__pycache__/
+.venv/
README.md CHANGED
@@ -1,7 +1,21 @@
-This repository contains the code for Manning Publications' "Build an AI Agent From Scratch".
+# AI Agent Framework
 
-### Install uv (docs: https://docs.astral.sh/uv/getting-started/installation/)
-- macOS/Linux (official script):
+A flexible framework for building AI agents with tool support, MCP integration, and multi-step reasoning.
+
+## Features
+
+- 🤖 **Agent System**: Multi-step reasoning with tool execution
+- 🛠️ **Tool Framework**: Easy tool creation and integration
+- 🔌 **MCP Integration**: Load tools from MCP servers
+- 💬 **LLM Client**: Unified interface for LLM API calls via LiteLLM
+- 📦 **Modular Design**: Clean, organized package structure
+
+## Installation
+
+### Prerequisites
+
+Install `uv` (recommended package manager):
+- macOS/Linux:
 ```bash
 curl -LsSf https://astral.sh/uv/install.sh | sh
 ```
@@ -9,31 +23,99 @@ curl -LsSf https://astral.sh/uv/install.sh | sh
 ```bash
 brew install uv
 ```
-- Verify installation:
-```bash
-uv --version
-```
 
-### Create a virtual environment (uv venv)
+### Setup
+
+1. Create a virtual environment:
 ```bash
 uv venv
 source .venv/bin/activate
 ```
 
-### Install dependencies with uv
+2. Install dependencies:
 ```bash
 uv pip install -r requirements.txt
 ```
 
-### Install scratch_agents package (Required for Chapter 4+)
-- For Chapter 4 and beyond, install the scratch_agents package in editable mode:
+3. Set up environment variables:
 ```bash
-uv pip install -e .
+cp .env.example .env
+# Edit .env and add your API keys (OPENAI_API_KEY, TAVILY_API_KEY, etc.)
 ```
 
-### Environment variables
-- Copy the example env file and set your API keys:
-```bash
-cp .env.example .env
+## Quick Start
+
+```python
+from agent_framework import Agent, LlmClient, FunctionTool
+
+# Define a tool
+def calculator(expression: str) -> float:
+    """Calculate mathematical expressions."""
+    return eval(expression)
+
+# Create the agent
+agent = Agent(
+    model=LlmClient(model="gpt-5-mini"),
+    tools=[FunctionTool(calculator)],
+    instructions="You are a helpful assistant.",
+)
+
+# Run the agent
+result = await agent.run("What is 1234 * 5678?")
+print(result.output)  # "7006652"
+```
+
+## Package Structure
+
+```
+agent_framework/
+├── __init__.py      # Package exports
+├── models.py        # Core data models (Message, ToolCall, Event, ExecutionContext)
+├── tools.py         # BaseTool and FunctionTool classes
+├── llm.py           # LlmClient and request/response models
+├── agent.py         # Agent and AgentResult classes
+├── mcp.py           # MCP tool loading utilities
+└── utils.py         # Helper functions for tool definitions
 ```
-- Open `.env` and provide the necessary keys (e.g., `OPENAI_API_KEY=...`).
+
+## Usage Examples
+
+### Using the @tool Decorator
+
+```python
+from agent_framework import tool
+
+@tool
+def multiply(a: float, b: float) -> float:
+    """Multiply two numbers."""
+    return a * b
+
+# multiply is now a FunctionTool instance
+```
+
+### MCP Tool Integration
+
+```python
+from agent_framework import load_mcp_tools
+import os
+
+connection = {
+    "command": "npx",
+    "args": ["-y", "tavily-mcp@latest"],
+    "env": {"TAVILY_API_KEY": os.getenv("TAVILY_API_KEY")}
+}
+
+mcp_tools = await load_mcp_tools(connection)
+agent = Agent(
+    model=LlmClient(model="gpt-5-mini"),
+    tools=mcp_tools,
+)
+```
+
+## Documentation
+
+See `agent_framework/README.md` for detailed API documentation.
+
+## License
+
+See LICENSE file for details.
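[Editor's note, not part of the commit] The Quick Start snippet in the new README uses top-level `await`, which only works in a notebook or async REPL. In a plain script it would be wrapped in an `asyncio` entry point; here is a minimal sketch with a placeholder computation standing in for the real `agent.run` call:

```python
import asyncio

async def main():
    # Placeholder for: result = await agent.run("What is 1234 * 5678?")
    # (Agent/LlmClient are the classes added in this commit.)
    return 1234 * 5678

answer = asyncio.run(main())
print(answer)  # 7006652
```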
agent_framework/README.md ADDED
@@ -0,0 +1,128 @@
+# Agent Framework
+
+A flexible framework for building AI agents with tool support, MCP integration, and multi-step reasoning.
+
+## Structure
+
+```
+agent_framework/
+├── __init__.py      # Package exports
+├── models.py        # Core data models (Message, ToolCall, Event, ExecutionContext)
+├── tools.py         # BaseTool and FunctionTool classes
+├── llm.py           # LlmClient and request/response models
+├── agent.py         # Agent and AgentResult classes
+├── mcp.py           # MCP tool loading utilities
+└── utils.py         # Helper functions for tool definitions
+```
+
+## Quick Start
+
+```python
+from agent_framework import Agent, LlmClient, FunctionTool
+
+# Define a tool
+def calculator(expression: str) -> float:
+    """Calculate mathematical expressions."""
+    return eval(expression)
+
+# Create the agent
+agent = Agent(
+    model=LlmClient(model="gpt-5-mini"),
+    tools=[FunctionTool(calculator)],
+    instructions="You are a helpful assistant.",
+)
+
+# Run the agent
+result = await agent.run("What is 1234 * 5678?")
+print(result.output)  # "7006652"
+```
+
+## Components
+
+### Models (`models.py`)
+- `Message`: Text messages in conversations
+- `ToolCall`: LLM's request to execute a tool
+- `ToolResult`: Result from tool execution
+- `Event`: Recorded occurrence during agent execution
+- `ExecutionContext`: Central storage for execution state
+
+### Tools (`tools.py`)
+- `BaseTool`: Abstract base class for all tools
+- `FunctionTool`: Wraps Python functions as tools
+
+### LLM (`llm.py`)
+- `LlmClient`: Client for LLM API calls using LiteLLM
+- `LlmRequest`: Request object for LLM calls
+- `LlmResponse`: Response object from LLM calls
+
+### Agent (`agent.py`)
+- `Agent`: Main agent class that orchestrates reasoning and tool execution
+- `AgentResult`: Result of an agent execution
+
+### MCP (`mcp.py`)
+- `load_mcp_tools()`: Load tools from MCP servers
+
+### Utils (`utils.py`)
+- `function_to_input_schema()`: Convert function signature to JSON Schema
+- `format_tool_definition()`: Format tool definition in OpenAI format
+- `tool`: Decorator to convert functions to tools
+
+## Usage Examples
+
+### Basic Tool Usage
+
+```python
+from agent_framework import FunctionTool
+
+def my_function(x: int, y: int) -> int:
+    """Add two numbers."""
+    return x + y
+
+tool = FunctionTool(my_function)
+result = await tool.execute(context, x=5, y=3)  # 8
+```
+
+### Using the @tool Decorator
+
+```python
+from agent_framework import tool
+
+@tool
+def multiply(a: float, b: float) -> float:
+    """Multiply two numbers."""
+    return a * b
+
+# multiply is now a FunctionTool instance
+```
+
+### MCP Tool Integration
+
+```python
+from agent_framework import load_mcp_tools
+import os
+
+connection = {
+    "command": "npx",
+    "args": ["-y", "tavily-mcp@latest"],
+    "env": {"TAVILY_API_KEY": os.getenv("TAVILY_API_KEY")}
+}
+
+mcp_tools = await load_mcp_tools(connection)
+agent = Agent(
+    model=LlmClient(model="gpt-5-mini"),
+    tools=mcp_tools,
+)
+```
+
+## Installation
+
+The framework uses:
+- `pydantic` for data validation
+- `litellm` for LLM API calls
+- `mcp` for MCP server integration
+
+Install dependencies:
+```bash
+pip install pydantic litellm mcp
+```
+
agent_framework/__init__.py ADDED
@@ -0,0 +1,51 @@
+"""Agent Framework - A flexible framework for building AI agents with tool support."""
+
+from .models import (
+    Message,
+    ToolCall,
+    ToolResult,
+    ContentItem,
+    Event,
+    ExecutionContext,
+)
+from .tools import BaseTool, FunctionTool, tool
+from .llm import LlmClient, LlmRequest, LlmResponse
+from .agent import Agent, AgentResult
+from .mcp import load_mcp_tools
+from .utils import (
+    function_to_input_schema,
+    format_tool_definition,
+    function_to_tool_definition,
+    mcp_tools_to_openai_format,
+)
+
+__all__ = [
+    # Models
+    "Message",
+    "ToolCall",
+    "ToolResult",
+    "ContentItem",
+    "Event",
+    "ExecutionContext",
+    # Tools
+    "BaseTool",
+    "FunctionTool",
+    "tool",
+    # LLM
+    "LlmClient",
+    "LlmRequest",
+    "LlmResponse",
+    # Agent
+    "Agent",
+    "AgentResult",
+    # MCP
+    "load_mcp_tools",
+    # Utils
+    "function_to_input_schema",
+    "format_tool_definition",
+    "function_to_tool_definition",
+    "mcp_tools_to_openai_format",
+]
+
+__version__ = "0.1.0"
+
agent_framework/agent.py ADDED
@@ -0,0 +1,171 @@
+"""Agent class for executing multi-step reasoning with tools."""
+
+from dataclasses import dataclass
+from typing import List, Optional
+from pydantic import BaseModel
+
+from .models import (
+    ExecutionContext,
+    Event,
+    Message,
+    ToolCall,
+    ToolResult
+)
+from .tools import BaseTool
+from .llm import LlmClient, LlmRequest, LlmResponse
+
+
+@dataclass
+class AgentResult:
+    """Result of an agent execution."""
+    output: str | BaseModel
+    context: ExecutionContext
+
+
+class Agent:
+    """Agent that can reason and use tools to solve tasks."""
+
+    def __init__(
+        self,
+        model: LlmClient,
+        tools: List[BaseTool] = None,
+        instructions: str = "",
+        max_steps: int = 10,
+        name: str = "agent",
+    ):
+        self.model = model
+        self.instructions = instructions
+        self.max_steps = max_steps
+        self.name = name
+        self.tools = self._setup_tools(tools or [])
+
+    def _setup_tools(self, tools: List[BaseTool]) -> List[BaseTool]:
+        return tools
+
+    def _prepare_llm_request(self, context: ExecutionContext) -> LlmRequest:
+        """Convert execution context to LLM request."""
+        # Flatten events into content items
+        flat_contents = []
+        for event in context.events:
+            flat_contents.extend(event.content)
+
+        return LlmRequest(
+            instructions=[self.instructions] if self.instructions else [],
+            contents=flat_contents,
+            tools=self.tools,
+            tool_choice="auto" if self.tools else None,
+        )
+
+    async def think(self, llm_request: LlmRequest) -> LlmResponse:
+        """Get LLM's response/decision."""
+        return await self.model.generate(llm_request)
+
+    async def act(
+        self,
+        context: ExecutionContext,
+        tool_calls: List[ToolCall]
+    ) -> List[ToolResult]:
+        """Execute tool calls and return results."""
+        tools_dict = {tool.name: tool for tool in self.tools}
+        results = []
+
+        for tool_call in tool_calls:
+            if tool_call.name not in tools_dict:
+                results.append(ToolResult(
+                    tool_call_id=tool_call.tool_call_id,
+                    name=tool_call.name,
+                    status="error",
+                    content=[f"Tool '{tool_call.name}' not found"],
+                ))
+                continue
+
+            tool = tools_dict[tool_call.name]
+
+            try:
+                output = await tool.execute(context, **tool_call.arguments)
+                results.append(ToolResult(
+                    tool_call_id=tool_call.tool_call_id,
+                    name=tool_call.name,
+                    status="success",
+                    content=[str(output)],
+                ))
+            except Exception as e:
+                results.append(ToolResult(
+                    tool_call_id=tool_call.tool_call_id,
+                    name=tool_call.name,
+                    status="error",
+                    content=[str(e)],
+                ))
+
+        return results
+
+    async def step(self, context: ExecutionContext):
+        """Execute one step of the agent loop."""
+        # Prepare what to send to the LLM
+        llm_request = self._prepare_llm_request(context)
+
+        # Get LLM's decision
+        llm_response = await self.think(llm_request)
+
+        # Record LLM response as an event
+        response_event = Event(
+            execution_id=context.execution_id,
+            author=self.name,
+            content=llm_response.content,
+        )
+        context.add_event(response_event)
+
+        # Execute tools if the LLM requested any
+        tool_calls = [c for c in llm_response.content if isinstance(c, ToolCall)]
+        if tool_calls:
+            tool_results = await self.act(context, tool_calls)
+            tool_event = Event(
+                execution_id=context.execution_id,
+                author=self.name,
+                content=tool_results,
+            )
+            context.add_event(tool_event)
+
+        context.increment_step()
+
+    async def run(
+        self,
+        user_input: str,
+        context: ExecutionContext = None
+    ) -> AgentResult:
+        """Run the agent with user input."""
+        # Create or reuse context
+        if context is None:
+            context = ExecutionContext()
+
+        # Add user input as the first event
+        user_event = Event(
+            execution_id=context.execution_id,
+            author="user",
+            content=[Message(role="user", content=user_input)]
+        )
+        context.add_event(user_event)
+
+        # Execute steps until completion or max steps reached
+        while not context.final_result and context.current_step < self.max_steps:
+            await self.step(context)
+
+            # Check if the last event is a final response
+            last_event = context.events[-1]
+            if self._is_final_response(last_event):
+                context.final_result = self._extract_final_result(last_event)
+
+        return AgentResult(output=context.final_result, context=context)
+
+    def _is_final_response(self, event: Event) -> bool:
+        """Check if this event contains a final response."""
+        has_tool_calls = any(isinstance(c, ToolCall) for c in event.content)
+        has_tool_results = any(isinstance(c, ToolResult) for c in event.content)
+        return not has_tool_calls and not has_tool_results
+
+    def _extract_final_result(self, event: Event) -> str:
+        """Extract the final result from an event."""
+        for item in event.content:
+            if isinstance(item, Message) and item.role == "assistant":
+                return item.content
+        return None
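[Editor's note, not part of the commit] The core of `agent.py` above is the think → act → record loop: the LLM either requests a tool call (which the agent executes and records) or produces a final answer. A self-contained sketch of that loop, with a hypothetical stubbed "LLM" in place of a real model call:

```python
import asyncio

# Stub standing in for LlmClient.generate: requests one tool call,
# then answers once a tool result is present in the history.
async def fake_llm(history):
    if not any(e.get("tool_result") for e in history):
        return {"tool_call": {"name": "calculator",
                              "arguments": {"expression": "2 * 3"}}}
    return {"final": "The answer is 6."}

async def run_agent(user_input, tools, max_steps=10):
    history = [{"message": user_input}]
    for _ in range(max_steps):
        decision = await fake_llm(history)          # think
        if "final" in decision:
            return decision["final"]                # done
        call = decision["tool_call"]                # act
        output = tools[call["name"]](**call["arguments"])
        history.append({"tool_result": str(output)})  # record
    return None

result = asyncio.run(run_agent(
    "What is 2 * 3?",
    {"calculator": lambda expression: eval(expression)},
))
print(result)  # The answer is 6.
```

The `max_steps` bound mirrors `Agent.run`: without it, a model that keeps requesting tools would loop forever.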
agent_framework/llm.py ADDED
@@ -0,0 +1,118 @@
+"""LLM client and request/response models."""
+
+import json
+from typing import Any, List, Optional, Dict
+from pydantic import BaseModel, Field, ConfigDict
+from litellm import acompletion
+
+from .models import Message, ToolCall, ToolResult, ContentItem
+
+
+class LlmRequest(BaseModel):
+    """Request object for LLM calls."""
+    model_config = ConfigDict(arbitrary_types_allowed=True)
+
+    instructions: List[str] = Field(default_factory=list)
+    contents: List[ContentItem] = Field(default_factory=list)
+    tools: List[Any] = Field(default_factory=list)
+    tool_choice: Optional[str] = None
+
+
+class LlmResponse(BaseModel):
+    """Response object from LLM calls."""
+    content: List[ContentItem] = Field(default_factory=list)
+    error_message: Optional[str] = None
+    usage_metadata: Dict[str, Any] = Field(default_factory=dict)
+
+
+class LlmClient:
+    """Client for LLM API calls using LiteLLM."""
+
+    def __init__(self, model: str, **config):
+        self.model = model
+        self.config = config
+
+    async def generate(self, request: LlmRequest) -> LlmResponse:
+        """Generate a response from the LLM."""
+        try:
+            messages = self._build_messages(request)
+            tools = [t.tool_definition for t in request.tools] if request.tools else None
+
+            response = await acompletion(
+                model=self.model,
+                messages=messages,
+                tools=tools,
+                **({"tool_choice": request.tool_choice}
+                   if request.tool_choice else {}),
+                **self.config
+            )
+
+            return self._parse_response(response)
+        except Exception as e:
+            return LlmResponse(error_message=str(e))
+
+    def _build_messages(self, request: LlmRequest) -> List[dict]:
+        """Convert LlmRequest to API message format."""
+        messages = []
+
+        for instruction in request.instructions:
+            messages.append({"role": "system", "content": instruction})
+
+        for item in request.contents:
+            if isinstance(item, Message):
+                messages.append({"role": item.role, "content": item.content})
+
+            elif isinstance(item, ToolCall):
+                tool_call_dict = {
+                    "id": item.tool_call_id,
+                    "type": "function",
+                    "function": {
+                        "name": item.name,
+                        "arguments": json.dumps(item.arguments)
+                    }
+                }
+                # Append to previous assistant message if exists
+                if messages and messages[-1]["role"] == "assistant":
+                    messages[-1].setdefault("tool_calls", []).append(tool_call_dict)
+                else:
+                    messages.append({
+                        "role": "assistant",
+                        "content": None,
+                        "tool_calls": [tool_call_dict]
+                    })
+
+            elif isinstance(item, ToolResult):
+                messages.append({
+                    "role": "tool",
+                    "tool_call_id": item.tool_call_id,
+                    "content": str(item.content[0]) if item.content else ""
+                })
+
+        return messages
+
+    def _parse_response(self, response) -> LlmResponse:
+        """Convert API response to LlmResponse."""
+        choice = response.choices[0]
+        content_items = []
+
+        if choice.message.content:
+            content_items.append(Message(
+                role="assistant",
+                content=choice.message.content
+            ))
+
+        if choice.message.tool_calls:
+            for tc in choice.message.tool_calls:
+                content_items.append(ToolCall(
+                    tool_call_id=tc.id,
+                    name=tc.function.name,
+                    arguments=json.loads(tc.function.arguments)
+                ))
+
+        return LlmResponse(
+            content=content_items,
+            usage_metadata={
+                "input_tokens": response.usage.prompt_tokens,
+                "output_tokens": response.usage.completion_tokens,
+            }
+        )
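[Editor's note, not part of the commit] `_build_messages` above flattens the event history into the OpenAI chat format: a `ToolCall` becomes (part of) an assistant message with a `tool_calls` array, and a `ToolResult` becomes a `role: "tool"` message keyed by `tool_call_id`. A standalone sketch of the resulting message shapes, using plain dicts:

```python
import json

# A tool round-trip as three chat messages in the OpenAI format.
tool_call = {"tool_call_id": "call_1", "name": "calculator",
             "arguments": {"expression": "2 + 2"}}
tool_result = {"tool_call_id": "call_1", "content": ["4"]}

messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    # ToolCall -> assistant message with a tool_calls entry;
    # arguments are JSON-encoded as a string, as the API expects.
    {"role": "assistant", "content": None, "tool_calls": [{
        "id": tool_call["tool_call_id"],
        "type": "function",
        "function": {"name": tool_call["name"],
                     "arguments": json.dumps(tool_call["arguments"])},
    }]},
    # ToolResult -> role "tool" message linked back via tool_call_id.
    {"role": "tool", "tool_call_id": tool_result["tool_call_id"],
     "content": str(tool_result["content"][0])},
]

print([m["role"] for m in messages])  # ['user', 'assistant', 'tool']
```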
agent_framework/mcp.py ADDED
@@ -0,0 +1,84 @@
+"""MCP (Model Context Protocol) tool integration."""
+
+import os
+from typing import Dict, List
+from mcp import ClientSession, StdioServerParameters
+from mcp.client.stdio import stdio_client
+
+from .tools import BaseTool, FunctionTool
+
+
+def _extract_text_content(result) -> str:
+    """Extract text content from MCP tool result."""
+    if not hasattr(result, 'content'):
+        return str(result)
+
+    texts = []
+    for item in result.content:
+        if hasattr(item, 'text'):
+            texts.append(item.text)
+        else:
+            texts.append(str(item))
+
+    return "\n\n".join(texts)
+
+
+async def load_mcp_tools(connection: Dict) -> List[BaseTool]:
+    """Load tools from an MCP server and convert to FunctionTools.
+
+    Args:
+        connection: Dictionary with connection parameters:
+            - command: Command to run the MCP server
+            - args: Arguments for the command
+            - env: Environment variables (optional)
+
+    Returns:
+        List of BaseTool instances wrapping MCP tools
+
+    Example:
+        connection = {
+            "command": "npx",
+            "args": ["-y", "tavily-mcp@latest"],
+            "env": {"TAVILY_API_KEY": os.getenv("TAVILY_API_KEY")}
+        }
+        tools = await load_mcp_tools(connection)
+    """
+    tools = []
+
+    async with stdio_client(StdioServerParameters(**connection)) as (read, write):
+        async with ClientSession(read, write) as session:
+            await session.initialize()
+            mcp_tools = await session.list_tools()
+
+            for mcp_tool in mcp_tools.tools:
+                func_tool = _create_mcp_tool(mcp_tool, connection)
+                tools.append(func_tool)
+
+    return tools
+
+
+def _create_mcp_tool(mcp_tool, connection: Dict) -> FunctionTool:
+    """Create a FunctionTool that wraps an MCP tool."""
+
+    async def call_mcp(**kwargs):
+        async with stdio_client(StdioServerParameters(**connection)) as (read, write):
+            async with ClientSession(read, write) as session:
+                await session.initialize()
+                result = await session.call_tool(mcp_tool.name, kwargs)
+                return _extract_text_content(result)
+
+    tool_definition = {
+        "type": "function",
+        "function": {
+            "name": mcp_tool.name,
+            "description": mcp_tool.description,
+            "parameters": mcp_tool.inputSchema,
+        }
+    }
+
+    return FunctionTool(
+        func=call_mcp,
+        name=mcp_tool.name,
+        description=mcp_tool.description,
+        tool_definition=tool_definition
+    )
agent_framework/models.py ADDED
@@ -0,0 +1,62 @@
+"""Core data models for the agent framework."""
+
+from typing import Literal, Union, List, Dict, Optional, Any
+from pydantic import BaseModel, Field
+from dataclasses import dataclass, field
+import uuid
+from datetime import datetime
+
+
+class Message(BaseModel):
+    """A text message in the conversation."""
+    type: Literal["message"] = "message"
+    role: Literal["system", "user", "assistant"]
+    content: str
+
+
+class ToolCall(BaseModel):
+    """LLM's request to execute a tool."""
+    type: Literal["tool_call"] = "tool_call"
+    tool_call_id: str
+    name: str
+    arguments: dict
+
+
+class ToolResult(BaseModel):
+    """Result from tool execution."""
+    type: Literal["tool_result"] = "tool_result"
+    tool_call_id: str
+    name: str
+    status: Literal["success", "error"]
+    content: list
+
+
+ContentItem = Union[Message, ToolCall, ToolResult]
+
+
+class Event(BaseModel):
+    """A recorded occurrence during agent execution."""
+    id: str = Field(default_factory=lambda: str(uuid.uuid4()))
+    execution_id: str
+    timestamp: float = Field(default_factory=lambda: datetime.now().timestamp())
+    author: str  # "user" or agent name
+    content: List[ContentItem] = Field(default_factory=list)
+
+
+@dataclass
+class ExecutionContext:
+    """Central storage for all execution state."""
+
+    execution_id: str = field(default_factory=lambda: str(uuid.uuid4()))
+    events: List[Event] = field(default_factory=list)
+    current_step: int = 0
+    state: Dict[str, Any] = field(default_factory=dict)
+    final_result: Optional[str | BaseModel] = None
+
+    def add_event(self, event: Event):
+        """Append an event to the execution history."""
+        self.events.append(event)
+
+    def increment_step(self):
+        """Move to the next execution step."""
+        self.current_step += 1
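[Editor's note, not part of the commit] The `ExecutionContext`/`Event` pair above is an append-only event log: every user message, LLM response, and tool result is recorded as an `Event`, and the step counter bounds the loop. A dependency-free sketch of the pattern using plain dataclasses (the real models use pydantic):

```python
import uuid
from dataclasses import dataclass, field
from typing import List

@dataclass
class Event:
    execution_id: str
    author: str          # "user" or agent name
    content: list        # Message / ToolCall / ToolResult items

@dataclass
class ExecutionContext:
    execution_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    events: List[Event] = field(default_factory=list)
    current_step: int = 0

    def add_event(self, event: Event) -> None:
        self.events.append(event)

    def increment_step(self) -> None:
        self.current_step += 1

ctx = ExecutionContext()
ctx.add_event(Event(ctx.execution_id, "user",
                    [{"role": "user", "content": "hi"}]))
ctx.increment_step()
print(len(ctx.events), ctx.current_step)  # 1 1
```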
agent_framework/tools.py ADDED
@@ -0,0 +1,115 @@
+"""Tool system for the agent framework."""
+
+from abc import ABC, abstractmethod
+from typing import Dict, Any, Callable
+import inspect
+from .models import ExecutionContext
+from .utils import function_to_input_schema, format_tool_definition
+
+
+class BaseTool(ABC):
+    """Abstract base class for all tools."""
+
+    def __init__(
+        self,
+        name: str = None,
+        description: str = None,
+        tool_definition: Dict[str, Any] = None,
+    ):
+        self.name = name or self.__class__.__name__
+        self.description = description or self.__doc__ or ""
+        self._tool_definition = tool_definition
+
+    @property
+    def tool_definition(self) -> Dict[str, Any] | None:
+        return self._tool_definition
+
+    @abstractmethod
+    async def execute(self, context: ExecutionContext, **kwargs) -> Any:
+        pass
+
+    async def __call__(self, context: ExecutionContext, **kwargs) -> Any:
+        return await self.execute(context, **kwargs)
+
+
+class FunctionTool(BaseTool):
+    """Wraps a Python function as a BaseTool."""
+
+    def __init__(
+        self,
+        func: Callable,
+        name: str = None,
+        description: str = None,
+        tool_definition: Dict[str, Any] = None
+    ):
+        self.func = func
+        self.needs_context = 'context' in inspect.signature(func).parameters
+
+        self.name = name or func.__name__
+        self.description = description or (func.__doc__ or "").strip()
+        tool_definition = tool_definition or self._generate_definition()
+
+        super().__init__(
+            name=self.name,
+            description=self.description,
+            tool_definition=tool_definition
+        )
+
+    async def execute(self, context: ExecutionContext = None, **kwargs) -> Any:
+        """Execute the wrapped function.
+
+        Context is only required if the wrapped function has a 'context' parameter.
+        """
+        if self.needs_context:
+            if context is None:
+                raise ValueError(
+                    f"Tool '{self.name}' requires a context parameter. "
+                    f"Please provide an ExecutionContext instance."
+                )
+            result = self.func(context=context, **kwargs)
+        else:
+            result = self.func(**kwargs)
+
+        # Handle both sync and async functions
+        if inspect.iscoroutine(result):
+            return await result
+        return result
+
+    def _generate_definition(self) -> Dict[str, Any]:
+        """Generate tool definition from function signature."""
+        parameters = function_to_input_schema(self.func)
+        return format_tool_definition(self.name, self.description, parameters)
+
+
+def tool(
+    func: Callable = None,
+    *,
+    name: str = None,
+    description: str = None,
+    tool_definition: Dict[str, Any] = None
+):
+    """Decorator to convert a function into a FunctionTool.
+
+    Usage:
+        @tool
+        def my_function(x: int) -> int:
+            return x * 2
+
+        # Or with parameters:
+        @tool(name="custom_name", description="Custom description")
+        def my_function(x: int) -> int:
+            return x * 2
+    """
+    def decorator(f: Callable) -> FunctionTool:
+        return FunctionTool(
+            func=f,
+            name=name,
+            description=description,
+            tool_definition=tool_definition
+        )
+
+    if func is not None:
+        return decorator(func)
+    return decorator
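[Editor's note, not part of the commit] `FunctionTool.execute` above supports both sync and async functions with one trick: call the function first, then `await` only if the call returned a coroutine. A self-contained sketch of just that dispatch:

```python
import asyncio
import inspect

async def call_tool(func, **kwargs):
    result = func(**kwargs)
    # A sync function returns its value directly; an async function
    # returns a coroutine that still needs to be awaited.
    if inspect.iscoroutine(result):
        return await result
    return result

def add(x: int, y: int) -> int:
    return x + y

async def async_add(x: int, y: int) -> int:
    return x + y

sync_result = asyncio.run(call_tool(add, x=2, y=3))
async_result = asyncio.run(call_tool(async_add, x=2, y=3))
print(sync_result, async_result)  # 5 5
```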
agent_framework/utils.py ADDED
@@ -0,0 +1,79 @@
+"""Utility functions for the agent framework."""
+
+import inspect
+from typing import Dict, Any
+
+
+def function_to_input_schema(func) -> dict:
+    """Convert a function signature to JSON Schema input format."""
+    type_map = {
+        str: "string",
+        int: "integer",
+        float: "number",
+        bool: "boolean",
+        list: "array",
+        dict: "object",
+        type(None): "null",
+    }
+
+    try:
+        signature = inspect.signature(func)
+    except ValueError as e:
+        raise ValueError(
+            f"Failed to get signature for function {func.__name__}: {str(e)}"
+        )
+
+    parameters = {}
+    for param in signature.parameters.values():
+        try:
+            param_type = type_map.get(param.annotation, "string")
+        except KeyError as e:
+            raise KeyError(
+                f"Unknown type annotation {param.annotation} for parameter {param.name}: {str(e)}"
+            )
+        parameters[param.name] = {"type": param_type}
+
+    required = [
+        param.name
+        for param in signature.parameters.values()
+        if param.default == inspect.Parameter.empty
+    ]
+
+    return {
+        "type": "object",
+        "properties": parameters,
+        "required": required,
+    }
+
+
+def format_tool_definition(name: str, description: str, parameters: dict) -> dict:
+    """Format a tool definition in OpenAI function calling format."""
+    return {
+        "type": "function",
+        "function": {
+            "name": name,
+            "description": description,
+            "parameters": parameters,
+        },
+    }
+
+
+def function_to_tool_definition(func) -> dict:
+    """Convert a function to OpenAI tool definition format."""
+    return format_tool_definition(
+        func.__name__,
+        func.__doc__ or "",
+        function_to_input_schema(func)
+    )
+
+
+def mcp_tools_to_openai_format(mcp_tools) -> list[dict]:
+    """Convert MCP tool definitions to OpenAI tool format."""
+    return [
+        format_tool_definition(
+            name=tool.name,
+            description=tool.description,
+            parameters=tool.inputSchema,
+        )
+        for tool in mcp_tools.tools
+    ]
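[Editor's note, not part of the commit] To see what `function_to_input_schema` produces, here is the signature-to-schema mapping re-implemented standalone (so it runs without the package installed) and applied to a function with one required and one defaulted parameter:

```python
import inspect

# Minimal re-implementation of utils.function_to_input_schema for illustration.
TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean",
            list: "array", dict: "object", type(None): "null"}

def function_to_input_schema(func) -> dict:
    signature = inspect.signature(func)
    properties = {p.name: {"type": TYPE_MAP.get(p.annotation, "string")}
                  for p in signature.parameters.values()}
    required = [p.name for p in signature.parameters.values()
                if p.default is inspect.Parameter.empty]
    return {"type": "object", "properties": properties, "required": required}

def search_web(query: str, max_results: int = 5) -> str:
    """Search the web for information."""
    return query

schema = function_to_input_schema(search_web)
print(schema)
# Only 'query' lands in "required": max_results has a default value.
```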
example.py ADDED
@@ -0,0 +1,64 @@
+"""Example usage of the agent framework."""
+
+import asyncio
+import os
+from dotenv import load_dotenv
+from agent_framework import Agent, LlmClient, FunctionTool, load_mcp_tools
+
+load_dotenv()
+
+
+# Example 1: Simple calculator tool
+def calculator(expression: str) -> float:
+    """Calculate mathematical expressions."""
+    return eval(expression)
+
+
+# Example 2: Using the @tool decorator
+from agent_framework import tool
+
+@tool
+def search_web(query: str, max_results: int = 5) -> str:
+    """Search the web for information."""
+    # This is a placeholder - in real usage, you'd call an actual search API
+    return f"Search results for: {query}"
+
+
+async def main():
+    # Create a calculator tool
+    calc_tool = FunctionTool(calculator)
+
+    # Create the agent
+    agent = Agent(
+        model=LlmClient(model="gpt-5-mini"),
+        tools=[calc_tool, search_web],
+        instructions="You are a helpful assistant that can calculate and search the web.",
+    )
+
+    # Run the agent
+    result = await agent.run("What is 1234 * 5678?")
+    print(f"Result: {result.output}")
+    print(f"Steps taken: {result.context.current_step}")
+
+    # Example with MCP tools
+    if os.getenv("TAVILY_API_KEY"):
+        connection = {
+            "command": "npx",
+            "args": ["-y", "tavily-mcp@latest"],
+            "env": {"TAVILY_API_KEY": os.getenv("TAVILY_API_KEY")}
+        }
+        mcp_tools = await load_mcp_tools(connection)
+
+        agent_with_mcp = Agent(
+            model=LlmClient(model="gpt-5-mini"),
+            tools=[calc_tool, *mcp_tools],
+            instructions="You are a helpful assistant with web search capabilities.",
+        )
+
+        result = await agent_with_mcp.run("What is the capital of France?")
+        print(f"Result: {result.output}")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
+
my_code.ipynb CHANGED
@@ -1,926 +1 @@
- {
- "cells": [
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "bd396f3a",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "True"
- ]
- },
- "execution_count": 1,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "from dotenv import load_dotenv, find_dotenv\n",
- "\n",
- "load_dotenv(find_dotenv())\n",
- "\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "bdc55e33",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "ChatCompletionMessage(content='The capital of India is New Delhi.', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None)\n",
- "The capital of India is New Delhi.\n"
- ]
- }
- ],
- "source": [
- "from openai import OpenAI\n",
- "client = OpenAI()\n",
- "\n",
- "response = client.chat.completions.create(\n",
- " model = 'gpt-5-mini',\n",
- " messages = [\n",
- " {'role': 'system', 'content' : 'You are a helpful assistant !'},\n",
- " {'role': 'user', 'content': 'What is the capital of India ?'}\n",
- " ]\n",
- ")\n",
- "\n",
- "print(response.choices[0].message)\n",
- "print(response.choices[0].message.content)\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "id": "396e8826",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Hello! How can I help you today?\n"
- ]
- }
- ],
- "source": [
- "## with this we can unify all providers\n",
- "\n",
- "from litellm import completion\n",
- "response = completion(\n",
- " model = 'gpt-5-mini',\n",
- " messages = [{'role' : 'user', 'content' : 'Hello !' }]\n",
- ")\n",
- "\n",
- "print(response.choices[0].message.content)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "cb505eb4",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Nice to meet you, Akhil — how can I help you today?\n",
- "I don't know — I don't have access to personal details unless you tell me. What would you like me to call you in this chat? (I can use that name for this conversation, but I can't remember it across separate sessions unless you set it in your app/profile.)\n"
- ]
- }
- ],
- "source": [
- "from litellm import completion\n",
- "\n",
- "response1 = completion(\n",
- " model = 'gpt-5-mini',\n",
- " messages = [{'role' : 'user', 'content':'My name is Akhil'}]\n",
- ")\n",
- "\n",
- "response2 = completion(\n",
- " model = 'gpt-5-mini',\n",
- " messages = [{'role' : 'user', 'content':'what\\'s my name'}]\n",
- ")\n",
- "\n",
- "print(response1.choices[0].message.content)\n",
- "print(response2.choices[0].message.content)\n",
- "\n",
- "### This proves that each LLM call is independent. Our Model doesn't have memory"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 8,
- "id": "cd3ade31",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Great — love the ambition, Akhil. If you want to be “the future of AI,” I can help you get there. How would you like me to help right now? (Pick one: roadmap, project ideas, resume/LinkedIn copy, interview prep, or a 12‑month actionable plan.)\n",
- "\n",
- "Below are a few immediately useful things you can use or ask me to expand.\n",
- "\n",
- "Quick elevator pitch / LinkedIn headline\n",
- "- Headline: Akhil — Building safe, scalable AI that augments human creativity and solves real-world problems\n",
- "- 1‑line pitch: “I build trustworthy AI systems that turn complex data into products people love — with a focus on safety, scalability, and real-world impact.”\n",
- "\n",
- "High‑level skills to prioritize\n",
- "- Foundations: probability, linear algebra, optimization\n",
- "- Core ML: supervised learning, neural networks, transfer learning, transformers\n",
- "- Systems & infra: PyTorch/TensorFlow, Docker, Kubernetes, model serving, MLOps\n",
- "- Specialized: LLMs, RL, generative models, multimodal models (vision+language)\n",
- "- Soft skills: product sense, communication, writing and presenting research\n",
- "- Ethics & safety: alignment concepts, bias mitigation, robust evaluation\n",
- "\n",
- "3 concrete projects (increasing complexity)\n",
- "1. End‑to‑end ML app: simple image classifier with deployment (Flask/FastAPI + Docker + test pipeline)\n",
- "2. LLM product prototype: retrieval-augmented chatbot for a specific domain (docs → vector DB → RAG)\n",
- "3. Research/engineering hybrid: fine-tune or distill a model for efficiency and publish a blog post + code on GitHub\n",
- "\n",
- "Practical 12‑month roadmap (high level)\n",
- "- Months 0–2: Fill gaps — math refresher, PyTorch, small projects, GitHub portfolio\n",
- "- Months 3–5: Build and deploy 2 production prototypes (one LLM-based), publish writeups\n",
- "- Months 6–9: Contribute to OSS or collaborate on a research project; attend conferences/meetups\n",
- "- Months 10–12: Target internships/roles, refine portfolio, prepare interviews, publish a substantial case study or replication\n",
- "\n",
- "Quick resources\n",
- "- Fast theory/math: “Mathematics for Machine Learning” + 3Blue1Brown playlists\n",
- "- Practical ML: Deep Learning Book (selected chapters), PyTorch docs, Hugging Face course\n",
- "- MLOps/RAG: LangChain/HF tutorials, Vector DB docs (Pinecone/Weaviate)\n",
- "\n",
- "If you want, I can:\n",
- "- Create a personalized 6‑ or 12‑month plan based on your background and time availability\n",
- "- Draft a LinkedIn summary, resume bullets, or a cover letter\n",
- "- Design a project roadmap with milestones and tech stack\n",
- "Tell me which and give me your experience level (student / early-career / senior / founder) and how many hours per week you can commit.\n",
- "Short answer: you’re Akhil — the person who just told me “I am going to be the Future of AI.” Beyond that, only you can fully answer “who am I,” but I can help you shape a clear, useful version of that identity for career, confidence, and action.\n",
- "\n",
- "Pick one of these and I’ll build it for you:\n",
- "- A crisp personal identity/mission statement (1–2 lines)\n",
- "- A short LinkedIn “About” summary\n",
- "- A 12‑month plan to become a leader in AI\n",
- "- A set of interview/resume bullets matched to your level\n",
- "\n",
- "If you want to explore it yourself first, answer 5 quick prompts (one sentence each):\n",
- "1. What technical skills do you already have (languages, frameworks, papers/projects)?\n",
- "2. What do you enjoy doing most in AI (research, building products, deploying models, safety/ethics)?\n",
- "3. What impact do you want to have (industry, research, social good, startups)?\n",
- "4. What are your top 2 strengths and top 1 weakness you want to fix?\n",
- "5. How many hours/week can you commit to learning or working toward this goal?\n",
- "\n",
- "Or, if you want an immediate example identity statement based on your earlier claim:\n",
- "- “I’m Akhil — an aspiring AI leader building safe, scalable systems that augment human creativity. My mission is to bridge cutting‑edge research and real‑world impact.”\n",
- "\n",
- "Tell me which option you want or answer the 5 prompts and I’ll draft something tailored.\n"
- ]
- }
- ],
- "source": [
- "### Managing conversation history\n",
- "\n",
- "\n",
- "from litellm import completion\n",
- "\n",
- "## Maintain a messages object\n",
- "messages = []\n",
- "\n",
- "## append your message/conversation\n",
- "messages.append({'role':'user', 'content':'My name is Akhil and I am going to be the Future of AI'})\n",
- "response3 = completion(model = 'gpt-5-mini', messages = messages)\n",
- "\n",
- "print(response3.choices[0].message.content)\n",
- "\n",
- "## append the message from assistant\n",
- "messages.append({'role':'assistant', 'content':response3.choices[0].message.content})\n",
- "\n",
- "## write a new message\n",
- "messages.append({'role':'user', 'content':'who am i'})\n",
- "response4 = completion(model = 'gpt-5-mini', messages = messages)\n",
- "\n",
- "print(response4.choices[0].message.content)\n",
- "\n",
- "\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "e0868cf6",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "{\"name\":\"Akhil\",\"email\":\"akhil.masters21@gmail.com\",\"phone\":\"9550303420\"}\n"
- ]
- }
- ],
- "source": [
- "### Structured output\n",
- "\n",
- "from pydantic import BaseModel\n",
- "from litellm import completion\n",
- "\n",
- "class ExtractedInfo(BaseModel):\n",
- " name : str\n",
- " email : str\n",
- " phone : str | None = None\n",
- "\n",
- "response = completion(\n",
- " model=\"gpt-5-mini\",\n",
- " messages=[{\n",
- " \"role\": \"user\", \n",
- " \"content\": \"My name is Akhil, my email is akhil.masters21@gmail.com, and my phone is 9550303420.\"\n",
- " }],\n",
- " response_format=ExtractedInfo\n",
- ")\n",
- "\n",
- "print(response.choices[0].message.content)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "03d48814",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Q: What is 2 + 2?\n",
- "A: 2 + 2 = 4.\n",
- "\n",
- "Q: What is the capital of Japan?\n",
- "A: The capital of Japan is Tokyo.\n",
- "\n",
- "Q: Who wrote Romeo and Juliet?\n",
- "A: Romeo and Juliet was written by William Shakespeare. It was likely written and first performed in the mid-1590s (published in 1597).\n",
- "\n"
- ]
- }
- ],
- "source": [
- "### Asynchronus calls\n",
- "\n",
- "import asyncio\n",
- "from litellm import acompletion\n",
- "async def get_response(prompt: str) -> str:\n",
- " response = await acompletion(\n",
- " model = 'gpt-5-mini',\n",
- " messages=[{\"role\": \"user\", \"content\": prompt}]\n",
- " )\n",
- " return response.choices[0].message.content\n",
- " \n",
- "prompts = [\n",
- " \"What is 2 + 2?\",\n",
- " \"What is the capital of Japan?\",\n",
- " \"Who wrote Romeo and Juliet?\"\n",
- "]\n",
- "\n",
- "### here \n",
- "## tasks = [get_response(What is 2 + 2?), get_response(What is the capital of Japan?)] \n",
- "## doesnt run the function, it just creates a coroutine object. Thats the difference in async.\n",
- "## functions are called in gather step\n",
- "\n",
- "tasks = [get_response(p) for p in prompts]\n",
- "results = await asyncio.gather(*tasks)\n",
- "\n",
- "for prompt, result in zip(prompts, results):\n",
- " print(f\"Q: {prompt}\")\n",
- " print(f\"A: {result}\\n\")\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 14,
- "id": "3333de1d",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Q: What is 0 + 0?\n",
- "A: 0\n",
- "\n",
- "Because adding zero to zero yields zero.\n",
- "\n",
- "Q: What is 1 + 1?\n",
- "A: 1 + 1 = 2.\n",
- "\n",
- "Q: What is 2 + 2?\n",
- "A: 2 + 2 = 4.\n",
- "\n",
- "Q: What is 3 + 3?\n",
- "A: 3 + 3 = 6.\n",
- "\n",
- "Q: What is 4 + 4?\n",
- "A: 8\n",
- "\n",
- "Q: What is 5 + 5?\n",
- "A: 10\n",
- "\n",
- "Q: What is 6 + 6?\n",
- "A: 12\n",
- "\n",
- "Q: What is 7 + 7?\n",
- "A: 14\n",
- "\n",
- "Q: What is 8 + 8?\n",
- "A: 16\n",
- "\n",
- "Q: What is 9 + 9?\n",
- "A: 18\n",
- "\n",
- "Q: What is 10 + 10?\n",
- "A: 10 + 10 = 20\n",
- "\n",
- "Q: What is 11 + 11?\n",
- "A: 22\n",
- "\n",
- "Q: What is 12 + 12?\n",
- "A: 24\n",
- "\n",
- "Q: What is 13 + 13?\n",
- "A: 26\n",
- "\n",
- "Q: What is 14 + 14?\n",
- "A: 28\n",
- "\n",
- "Q: What is 15 + 15?\n",
- "A: 30\n",
- "\n",
- "Q: What is 16 + 16?\n",
- "A: 32\n",
- "\n",
- "Q: What is 17 + 17?\n",
- "A: 34\n",
- "\n",
- "Q: What is 18 + 18?\n",
- "A: 36\n",
- "\n",
- "Q: What is 19 + 19?\n",
- "A: 38\n",
- "\n"
- ]
- }
- ],
- "source": [
- "### rate limiting queries\n",
- "semaphore = asyncio.Semaphore(10)\n",
- "\n",
- "async def call_llm(prompt : str) -> str:\n",
- " async with semaphore:\n",
- " response = await acompletion(\n",
- " model=\"gpt-5-mini\",\n",
- " messages=[{\"role\": \"user\", \"content\": prompt}],\n",
- " num_retries=3 # Automatic retry with exponential backoff\n",
- " )\n",
- " return response.choices[0].message.content\n",
- "prompts = [f\"What is {i} + {i}?\" for i in range(20)]\n",
- "tasks = [call_llm(p) for p in prompts]\n",
- "results = await asyncio.gather(*tasks, return_exceptions=True)\n",
- "\n",
- "\n",
- "for prompt, result in zip(prompts, results):\n",
- " print(f\"Q: {prompt}\")\n",
- " print(f\"A: {result}\\n\")\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 16,
- "id": "1caef766",
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "Generating test split: 100%|██████████| 93/93 [00:00<00:00, 1653.78 examples/s]\n",
- "Generating validation split: 100%|██████████| 53/53 [00:00<00:00, 32022.20 examples/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Number of Level 1 problems: 53\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "\n"
- ]
- }
- ],
- "source": [
- "## loading the GAIA dataset\n",
- "\n",
- "from datasets import load_dataset\n",
- "level1_problems = load_dataset(\"gaia-benchmark/GAIA\", \"2023_level1\", split=\"validation\")\n",
- "print(f\"Number of Level 1 problems: {len(level1_problems)}\")\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 17,
- "id": "733c211c",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "{'task_id': '8e867cd7-cff9-4e6c-867a-ff5ddc2550be',\n",
- " 'Question': 'How many studio albums were published by Mercedes Sosa between 2000 and 2009 (included)? You can use the latest 2022 version of english wikipedia.',\n",
- " 'Level': '1',\n",
- " 'Final answer': '3',\n",
- " 'file_name': '',\n",
- " 'file_path': '',\n",
- " 'Annotator Metadata': {'Steps': '1. I did a search for Mercedes Sosa\\n2. I went to the Wikipedia page for her\\n3. I scrolled down to \"Studio albums\"\\n4. I counted the ones between 2000 and 2009',\n",
- " 'Number of steps': '4',\n",
- " 'How long did this take?': '5 minutes',\n",
- " 'Tools': '1. web browser\\n2. google search',\n",
- " 'Number of tools': '2'}}"
- ]
- },
- "execution_count": 17,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "level1_problems[1]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 19,
- "id": "3d5bcb22",
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "100%|██████████| 40/40 [02:23<00:00, 3.58s/it]\n"
- ]
- }
- ],
- "source": [
- "## defining a respose for gaia\n",
- "from pydantic import BaseModel\n",
- "from tqdm.asyncio import tqdm\n",
- "gaia_prompt = \"\"\"You are a general AI assistant. I will ask you a question.\n",
- "First, determine if you can solve this problem with your current capabilities and set \"is_solvable\" accordingly.\n",
- "If you can solve it, set \"is_solvable\" to true and provide your answer in \"final_answer\".\n",
- "If you cannot solve it, set \"is_solvable\" to false and explain why in \"unsolvable_reason\".\n",
- "Your final answer should be a number OR as few words as possible OR a comma separated list of numbers and/or strings.\n",
- "If you are asked for a number, don't use comma to write your number neither use units such as $ or percent sign unless specified otherwise.\n",
- "If you are asked for a string, don't use articles, neither abbreviations (e.g. for cities), and write the digits in plain text unless specified otherwise.\n",
- "If you are asked for a comma separated list, apply the above rules depending on whether the element is a number or a string.\"\"\"\n",
- "\n",
- "class GaiaOutput(BaseModel):\n",
- " is_solvable: bool\n",
- " unsolvable_reason: str = \"\"\n",
- " final_answer: str = \"\"\n",
- "\n",
- "PROVIDER_SEMAPHORES = {'openai': asyncio.Semaphore(30), 'anthropic': asyncio.Semaphore(10)}\n",
- "\n",
- "def get_provider(model: str) -> str:\n",
- " return \"anthropic\" if model.startswith(\"anthropic/\") else \"openai\"\n",
- "\n",
- "\n",
- "async def solve_problem(model: str, question: str) -> GaiaOutput:\n",
- " provider = get_provider(model)\n",
- " async with PROVIDER_SEMAPHORES[provider]:\n",
- " response = await acompletion(\n",
- " model = model,\n",
- " messages=[\n",
- " {\"role\": \"system\", \"content\": gaia_prompt},\n",
- " {\"role\": \"user\", \"content\": question},\n",
- " ],\n",
- " response_format=GaiaOutput,\n",
- " num_retries=2,\n",
- " )\n",
- " finish_reason = response.choices[0].finish_reason\n",
- " content = response.choices[0].message.content\n",
- " if finish_reason == \"refusal\" or content is None:\n",
- " return GaiaOutput(\n",
- " is_solvable=False,\n",
- " unsolvable_reason=f\"Model refused to answer (finish_reason: {finish_reason})\",\n",
- " final_answer=\"\"\n",
- " )\n",
- " return GaiaOutput.model_validate_json(content)\n",
- "\n",
- "def is_correct(prediction: str | None, answer: str) -> bool:\n",
- " \"\"\"Check exact match between prediction and answer (case-insensitive).\"\"\"\n",
- " if prediction is None:\n",
- " return False\n",
- " return prediction.strip().lower() == answer.strip().lower()\n",
- "\n",
- "async def evaluate_gaia_single(problem: dict, model: str) -> dict:\n",
- " \"\"\"Evaluate a single problem-model pair and return result.\"\"\"\n",
- " try:\n",
- " output = await solve_problem(model, problem[\"Question\"])\n",
- " return {\n",
- " \"task_id\": problem[\"task_id\"],\n",
- " \"model\": model,\n",
- " \"correct\": is_correct(output.final_answer, problem[\"Final answer\"]),\n",
- " \"is_solvable\": output.is_solvable,\n",
- " \"prediction\": output.final_answer,\n",
- " \"answer\": problem[\"Final answer\"],\n",
- " \"unsolvable_reason\": output.unsolvable_reason,\n",
- " }\n",
- " except Exception as e:\n",
- " return {\n",
- " \"task_id\": problem[\"task_id\"],\n",
- " \"model\": model,\n",
- " \"correct\": False,\n",
- " \"is_solvable\": None,\n",
- " \"prediction\": None,\n",
- " \"answer\": problem[\"Final answer\"],\n",
- " \"error\": str(e),\n",
- " }\n",
- "\n",
- "async def run_experiment(\n",
- " problems: list[dict],\n",
- " models: list[str],\n",
- ") -> dict[str, list]:\n",
- " \"\"\"Evaluate all models on all problems.\"\"\"\n",
- " tasks = [\n",
- " evaluate_gaia_single(problem, model)\n",
- " for problem in problems\n",
- " for model in models\n",
- " ]\n",
- " \n",
- " all_results = await tqdm.gather(*tasks)\n",
- " \n",
- " # Group results by model\n",
- " results = {model: [] for model in models}\n",
- " for result in all_results:\n",
- " results[result[\"model\"]].append(result)\n",
- " \n",
- " return results\n",
- "\n",
- "MODELS = [\n",
- " \"gpt-5\",\n",
- " \"gpt-5-mini\"\n",
- "]\n",
- " \n",
- "subset = level1_problems.select(range(20))\n",
- "results = await run_experiment(subset, MODELS)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 20,
- "id": "04f60efa",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "{'gpt-5': [{'task_id': 'e1fc63a2-da7a-432f-be78-7c4a95598703',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': True,\n",
- " 'is_solvable': True,\n",
- " 'prediction': '17',\n",
- " 'answer': '17',\n",
- " 'unsolvable_reason': ''},\n",
- " {'task_id': '8e867cd7-cff9-4e6c-867a-ff5ddc2550be',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': False,\n",
- " 'is_solvable': True,\n",
- " 'prediction': '4',\n",
- " 'answer': '3',\n",
- " 'unsolvable_reason': ''},\n",
- " {'task_id': 'ec09fa32-d03f-4bf8-84b0-1f16922c3ae4',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': True,\n",
- " 'is_solvable': True,\n",
- " 'prediction': '3',\n",
- " 'answer': '3',\n",
- " 'unsolvable_reason': ''},\n",
- " {'task_id': '5d0080cb-90d7-4712-bc33-848150e917d3',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': '0.1777',\n",
- " 'unsolvable_reason': 'I don’t have access to the specific paper text or its figures and can’t browse to retrieve the exact calculated volume.'},\n",
- " {'task_id': 'a1e91b78-d3d8-4675-bb8d-62741b4b68a6',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': '3',\n",
- " 'unsolvable_reason': 'I can’t access or watch the linked video to determine the number.'},\n",
- " {'task_id': '46719c30-f4c3-4cad-be07-d5cb21eee6bb',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': 'Mapping Human Oriented Information to Software Agents for Online Systems Usage',\n",
- " 'unsolvable_reason': 'I need to look up the 2015 paper’s author list and their publication histories, which I cannot access without web browsing or additional details.'},\n",
- " {'task_id': '4b6bb5f7-f634-410e-815d-e673ab7f8632',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': True,\n",
- " 'is_solvable': True,\n",
- " 'prediction': 'THE CASTLE',\n",
- " 'answer': 'THE CASTLE',\n",
- " 'unsolvable_reason': ''},\n",
- " {'task_id': 'cffe0e32-c9a6-4c52-9877-78ceb4aaa9fb',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': 'Fred',\n",
- " 'unsolvable_reason': 'The referenced document with employee profiles and gift assignments is not provided, so the giver who failed to give a gift cannot be determined.'},\n",
- " {'task_id': '2d83110e-a098-4ebb-9987-066c06fa42d0',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': True,\n",
- " 'is_solvable': True,\n",
- " 'prediction': 'right',\n",
- " 'answer': 'Right',\n",
- " 'unsolvable_reason': ''},\n",
- " {'task_id': '5cfb274c-0207-4aa7-9575-6ac0bd95d9b2',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': 'No',\n",
- " 'unsolvable_reason': 'Missing the spreadsheet/layout of green plots, so I cannot determine if a non-backtracking loop exists.'},\n",
- " {'task_id': '27d5d136-8563-469e-92bf-fd103c28b57c',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': True,\n",
- " 'is_solvable': True,\n",
- " 'prediction': '(¬A → B) ↔ (A ∨ ¬B)',\n",
- " 'answer': '(¬A → B) ↔ (A ∨ ¬B)',\n",
- " 'unsolvable_reason': ''},\n",
- " {'task_id': 'dc28cf18-6431-458b-83ef-64b3ce566c10',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': True,\n",
- " 'is_solvable': True,\n",
- " 'prediction': '2',\n",
- " 'answer': '2',\n",
- " 'unsolvable_reason': ''},\n",
- " {'task_id': 'b816bfce-3d80-4913-a07d-69b752ce6377',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': False,\n",
- " 'is_solvable': True,\n",
- " 'prediction': 'cute',\n",
- " 'answer': 'fluffy',\n",
- " 'unsolvable_reason': ''},\n",
- " {'task_id': '72e110e7-464c-453c-a309-90a95aed6538',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': 'Guatemala',\n",
- " 'unsolvable_reason': 'I don’t have browsing access to verify the 2020 BASE DDC 633 page and its flags.'},\n",
- " {'task_id': '42576abe-0deb-4869-8c63-225c2d75a95a',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': True,\n",
- " 'is_solvable': True,\n",
- " 'prediction': 'Maktay Mato Apple',\n",
- " 'answer': 'Maktay mato apple',\n",
- " 'unsolvable_reason': ''},\n",
- " {'task_id': 'b415aba4-4b68-4fc6-9b89-2c812e55a3e1',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': 'diamond',\n",
- " 'unsolvable_reason': 'I don’t have browsing tools to look up the specific 2012 Scientific Reports conference proceedings article and identify the nano-compound without external access.'},\n",
- " {'task_id': 'cca530fc-4052-43b2-b130-b30968d8aa44',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': 'Rd5',\n",
- " 'unsolvable_reason': 'Cannot view the chessboard image'},\n",
- " {'task_id': '935e2cff-ae78-4218-b3f5-115589b19dae',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': True,\n",
- " 'is_solvable': True,\n",
- " 'prediction': 'research',\n",
- " 'answer': 'research',\n",
- " 'unsolvable_reason': ''},\n",
- " {'task_id': '4fc2f1ae-8625-45b5-ab34-ad4433bc21f8',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': True,\n",
- " 'is_solvable': True,\n",
- " 'prediction': 'FunkMonk',\n",
- " 'answer': 'FunkMonk',\n",
- " 'unsolvable_reason': ''},\n",
- " {'task_id': '5188369a-3bbe-43d8-8b94-11558f909a08',\n",
- " 'model': 'gpt-5',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': 'Annie Levin',\n",
- " 'unsolvable_reason': 'I need to look up Merriam-Webster’s Word of the Day page for June 27, 2022 to see the quoted writer, but I don’t have browsing access.'}],\n",
- " 'gpt-5-mini': [{'task_id': 'e1fc63a2-da7a-432f-be78-7c4a95598703',\n",
- " 'model': 'gpt-5-mini',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': '17',\n",
- " 'unsolvable_reason': 'I cannot access external websites such as Wikipedia to retrieve the exact minimum perigee value required for the calculation.'},\n",
- " {'task_id': '8e867cd7-cff9-4e6c-867a-ff5ddc2550be',\n",
- " 'model': 'gpt-5-mini',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': '3',\n",
- " 'unsolvable_reason': \"I cannot access the 2022 English Wikipedia from here to verify Mercedes Sosa's discography and reliably count studio albums released between 2000 and 2009.\"},\n",
- " {'task_id': 'ec09fa32-d03f-4bf8-84b0-1f16922c3ae4',\n",
- " 'model': 'gpt-5-mini',\n",
- " 'correct': True,\n",
- " 'is_solvable': True,\n",
- " 'prediction': '3',\n",
- " 'answer': '3',\n",
- " 'unsolvable_reason': ''},\n",
- " {'task_id': '5d0080cb-90d7-4712-bc33-848150e917d3',\n",
- " 'model': 'gpt-5-mini',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': '0.1777',\n",
- " 'unsolvable_reason': \"I cannot access external documents or the internet and do not have the paper's calculated fish bag volume memorized.\"},\n",
- " {'task_id': 'a1e91b78-d3d8-4675-bb8d-62741b4b68a6',\n",
- " 'model': 'gpt-5-mini',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': '3',\n",
- " 'unsolvable_reason': 'I cannot access or view external video content (YouTube) to count bird species on screen.'},\n",
- " {'task_id': '46719c30-f4c3-4cad-be07-d5cb21eee6bb',\n",
- " 'model': 'gpt-5-mini',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': 'Mapping Human Oriented Information to Software Agents for Online Systems Usage',\n",
- " 'unsolvable_reason': \"I cannot access external databases or the internet to look up the 2015 paper's authors and their publication histories, and I do not have that specific bibliographic information memorized.\"},\n",
- " {'task_id': '4b6bb5f7-f634-410e-815d-e673ab7f8632',\n",
- " 'model': 'gpt-5-mini',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': 'THE CASTLE',\n",
- " 'unsolvable_reason': 'I cannot reliably recall the exact wording of the first scene heading from the official script and I cannot access external resources to check the script to provide the precise, verbatim setting.'},\n",
- " {'task_id': 'cffe0e32-c9a6-4c52-9877-78ceb4aaa9fb',\n",
- " 'model': 'gpt-5-mini',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': 'Fred',\n",
- " 'unsolvable_reason': 'Insufficient information: the document with the employees, their likes, and assignment/gift details was not provided.'},\n",
- " {'task_id': '2d83110e-a098-4ebb-9987-066c06fa42d0',\n",
- " 'model': 'gpt-5-mini',\n",
- " 'correct': True,\n",
- " 'is_solvable': True,\n",
- " 'prediction': 'right',\n",
- " 'answer': 'Right',\n",
- " 'unsolvable_reason': ''},\n",
- " {'task_id': '5cfb274c-0207-4aa7-9575-6ac0bd95d9b2',\n",
- " 'model': 'gpt-5-mini',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': 'No',\n",
- " 'unsolvable_reason': 'I cannot access the attached spreadsheet or any images. Paste the grid (use G for Earl plots and . for others) or give coordinates so I can analyze the path.'},\n",
- " {'task_id': '27d5d136-8563-469e-92bf-fd103c28b57c',\n",
- " 'model': 'gpt-5-mini',\n",
- " 'correct': True,\n",
- " 'is_solvable': True,\n",
- " 'prediction': '(¬A → B) ↔ (A ∨ ¬B)',\n",
- " 'answer': '(¬A → B) ↔ (A ∨ ¬B)',\n",
- " 'unsolvable_reason': ''},\n",
- " {'task_id': 'dc28cf18-6431-458b-83ef-64b3ce566c10',\n",
- " 'model': 'gpt-5-mini',\n",
- " 'correct': True,\n",
- " 'is_solvable': True,\n",
- " 'prediction': '2',\n",
- " 'answer': '2',\n",
- " 'unsolvable_reason': ''},\n",
- " {'task_id': 'b816bfce-3d80-4913-a07d-69b752ce6377',\n",
- " 'model': 'gpt-5-mini',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': 'fluffy',\n",
- " 'unsolvable_reason': \"I cannot access external sources to read Emily Midkiff's June 2014 article in Fafnir and so cannot determine the quoted word.\"},\n",
- " {'task_id': '72e110e7-464c-453c-a309-90a95aed6538',\n",
- " 'model': 'gpt-5-mini',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': 'Guatemala',\n",
- " 'unsolvable_reason': 'I cannot browse the Bielefeld University Library BASE site or view its 2020 content to inspect the article flags. Determining which country’s flag was unique requires accessing that specific webpage or an archived snapshot, which I cannot do.'},\n",
- " {'task_id': '42576abe-0deb-4869-8c63-225c2d75a95a',\n",
- " 'model': 'gpt-5-mini',\n",
- " 'correct': True,\n",
- " 'is_solvable': True,\n",
- " 'prediction': 'Maktay Mato Apple',\n",
- " 'answer': 'Maktay mato apple',\n",
- " 'unsolvable_reason': ''},\n",
- " {'task_id': 'b415aba4-4b68-4fc6-9b89-2c812e55a3e1',\n",
- " 'model': 'gpt-5-mini',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': '',\n",
- " 'answer': 'diamond',\n",
- " 'unsolvable_reason': 'I cannot access external web resources or the specific 2012 Scientific Reports conference proceedings to identify that article and its studied compound.'},\n",
- " {'task_id': 'cca530fc-4052-43b2-b130-b30968d8aa44',\n",
- " 'model': 'gpt-5-mini',\n",
- " 'correct': False,\n",
- " 'is_solvable': False,\n",
- " 'prediction': 'image not available',\n",
- " 'answer': 'Rd5',\n",
- " 'unsolvable_reason': 'Image not provided or inaccessible; cannot determine board position and legal winning move'},
856
- " {'task_id': '935e2cff-ae78-4218-b3f5-115589b19dae',\n",
857
- " 'model': 'gpt-5-mini',\n",
858
- " 'correct': False,\n",
859
- " 'is_solvable': False,\n",
860
- " 'prediction': '',\n",
861
- " 'answer': 'research',\n",
862
- " 'unsolvable_reason': \"I cannot access or view the specific Wikipedia public logs for the Legume page from 2022; determining what 'R' stood for requires looking at those logs or contemporaneous Wikipedia discussion, which I cannot browse from here.\"},\n",
863
- " {'task_id': '4fc2f1ae-8625-45b5-ab34-ad4433bc21f8',\n",
864
- " 'model': 'gpt-5-mini',\n",
865
- " 'correct': False,\n",
866
- " 'is_solvable': False,\n",
867
- " 'prediction': '',\n",
868
- " 'answer': 'FunkMonk',\n",
869
- " 'unsolvable_reason': 'I cannot access Wikipedia or external web sources to check which dinosaur article was promoted in November 2016 and who nominated it.'},\n",
870
- " {'task_id': '5188369a-3bbe-43d8-8b94-11558f909a08',\n",
871
- " 'model': 'gpt-5-mini',\n",
872
- " 'correct': False,\n",
873
- " 'is_solvable': False,\n",
874
- " 'prediction': '',\n",
875
- " 'answer': 'Annie Levin',\n",
876
- " 'unsolvable_reason': 'I cannot access the Merriam-Webster Word of the Day archive or the web to verify the quoted writer for June 27 2022.'}]}"
877
- ]
878
- },
879
- "execution_count": 20,
880
- "metadata": {},
881
- "output_type": "execute_result"
882
- }
883
- ],
884
- "source": [
885
- "results"
886
- ]
887
- },
888
- {
889
- "cell_type": "markdown",
890
- "id": "99926f44",
891
- "metadata": {},
892
- "source": [
893
- "## Tool Usage"
894
- ]
895
- },
896
- {
897
- "cell_type": "code",
898
- "execution_count": null,
899
- "id": "ba50100c",
900
- "metadata": {},
901
- "outputs": [],
902
- "source": []
903
- }
904
- ],
905
- "metadata": {
906
- "kernelspec": {
907
- "display_name": ".venv",
908
- "language": "python",
909
- "name": "python3"
910
- },
911
- "language_info": {
912
- "codemirror_mode": {
913
- "name": "ipython",
914
- "version": 3
915
- },
916
- "file_extension": ".py",
917
- "mimetype": "text/x-python",
918
- "name": "python",
919
- "nbconvert_exporter": "python",
920
- "pygments_lexer": "ipython3",
921
- "version": "3.12.11"
922
- }
923
- },
924
- "nbformat": 4,
925
- "nbformat_minor": 5
926
- }
 
+
 
tavily_mcp_server.py ADDED
@@ -0,0 +1,38 @@
+ import os
+ from tavily import TavilyClient
+ from dotenv import load_dotenv
+ from mcp.server.fastmcp import FastMCP
+
+ load_dotenv()
+
+ tavily_client = TavilyClient(os.getenv("TAVILY_API_KEY"))
+
+ mcp = FastMCP("custom-tavily-search")
+
+ @mcp.tool()
+ def search_web(query: str, max_results: int = 5) -> str:
+     """
+     Search the web using Tavily API.
+
+     Args:
+         query: Search query string
+         max_results: Maximum number of results to return (default: 5)
+
+     Returns:
+         Search results as formatted string
+     """
+     try:
+         response = tavily_client.search(
+             query,
+             max_results=max_results,
+         )
+         results = response.get("results", [])
+         return "\n\n".join(
+             f"Title: {r['title']}\nURL: {r['url']}\nContent: {r['content']}"
+             for r in results
+         )
+     except Exception as e:
+         return f"Error searching web: {str(e)}"
+
+ if __name__ == "__main__":
+     mcp.run(transport='stdio')
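The formatting step inside `search_web` can be exercised offline by factoring out the join logic and feeding it a mock payload shaped like Tavily's `results` field. This is a sketch: `mock_response` and `format_results` are illustrative names for testing, not part of `tavily_mcp_server.py`, and the dicts are made-up data.

```python
# Mock payload mirroring the {"results": [...]} shape that search_web reads
# from tavily_client.search (hypothetical data, for illustration only).
mock_response = {
    "results": [
        {"title": "MCP intro", "url": "https://example.com/mcp",
         "content": "Model Context Protocol basics."},
        {"title": "Tavily docs", "url": "https://example.com/tavily",
         "content": "Search API overview."},
    ]
}

def format_results(response: dict) -> str:
    """Same join logic as search_web, isolated so it runs without an API key."""
    results = response.get("results", [])
    return "\n\n".join(
        f"Title: {r['title']}\nURL: {r['url']}\nContent: {r['content']}"
        for r in results
    )

print(format_results(mock_response))
```

Running the server itself still needs a `TAVILY_API_KEY` in `.env` and an MCP client that speaks the stdio transport.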