benkassmi committed commit bd67676 (verified) · 1 parent: 707cca1

Upload 2 files

Files changed (2):
  1. README.md +115 -15
  2. app.py +303 -103
README.md CHANGED
@@ -11,22 +11,122 @@ app_port: 7860
 
 # Crawl4AI MCP Server
 
-This is a Crawl4AI MCP (Model Context Protocol) server deployed on Hugging Face Spaces.
-
-## Features
-
-- Web scraping with Playwright
-- Markdown extraction
-- JavaScript execution
-- Batch URL processing
-- Screenshot capture
-- PDF generation
-
-## API Endpoints
-
-- `GET /` - Health check
-- `GET /health` - Detailed health status
-- `POST /md` - Extract markdown from URL
-- `POST /crawl` - Batch process multiple URLs
-- `POST /execute_js` - Execute JavaScript on page
-- `GET /mcp/sse` - MCP Server-Sent Events endpoint
 
 # Crawl4AI MCP Server
 
+MCP (Model Context Protocol) server for web scraping with Crawl4AI, compatible with **Microsoft Copilot Studio**.
+
+## ⚠️ Important - Streamable HTTP Transport
+
+This server uses the **Streamable HTTP** transport (not SSE, which is deprecated).
+
+- **MCP endpoint**: `POST /mcp`
+- **Protocol**: MCP Streamable HTTP 1.0
+
+## 🛠️ Available tools
+
+| Tool | Description |
+|------|-------------|
+| `md` | Extract markdown content from a web page |
+| `html` | Extract raw HTML from a web page |
+| `crawl` | Crawl multiple URLs in a batch |
+| `execute_js` | Execute JavaScript on a page |
+
+## 🔗 API endpoints
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/mcp` | POST | **Main MCP endpoint** (Streamable HTTP) |
+| `/mcp` | GET | Returns a 405 error (expected by Copilot Studio) |
+| `/` | GET | Health check and server info |
+| `/health` | GET | Detailed status |
+| `/debug/tools` | GET | List of tools (debug) |
+
+## 🚀 Setup with Microsoft Copilot Studio
+
+### Step 1: Verify that the server is running
+
+Open `https://YOUR_SPACE.hf.space/mcp` - you should see:
+
+```json
+{"jsonrpc":"2.0","error":{"code":-32000,"message":"Method not allowed."},"id":null}
+```
+
+That's expected! It confirms the server is configured correctly.
+
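The Step 1 check above can also be scripted. A minimal sketch using only the standard library (`YOUR_SPACE` stays a placeholder; the helper names are ours, not part of the commit) that treats the documented 405 JSON-RPC error as the success signal:

```python
import json
import urllib.error
import urllib.request

def is_expected_mcp_405(payload: dict) -> bool:
    """True if the body matches the JSON-RPC error documented for GET /mcp."""
    error = payload.get("error") or {}
    return payload.get("jsonrpc") == "2.0" and error.get("code") == -32000

def check_server(base_url: str) -> bool:
    """GET {base_url}/mcp and verify the expected 405 response (needs a live Space)."""
    try:
        urllib.request.urlopen(f"{base_url}/mcp")
    except urllib.error.HTTPError as exc:
        return exc.code == 405 and is_expected_mcp_405(json.loads(exc.read()))
    return False  # a 2xx on GET /mcp means the server is not set up as described
```

Call `check_server("https://YOUR_SPACE.hf.space")` once the Space is up.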
+### Step 2: Create a Custom Connector
+
+1. Go to [Power Apps Custom Connectors](https://make.preview.powerapps.com/customconnectors)
+2. Click **+ New custom connector** → **Import from GitHub**
+3. Select:
+   - **Connector Type**: `Custom`
+   - **Branch**: `dev`
+   - **Connector**: `MCP-Streamable-HTTP`
+4. Click **Continue**
+5. Edit:
+   - **Connector Name**: `Crawl4AI MCP`
+   - **Host**: `YOUR_SPACE.hf.space` (without https://)
+6. Click **Create connector**
+
+### Step 3: Add the tool to your agent
+
+1. Go to [Copilot Studio](https://copilotstudio.preview.microsoft.com/)
+2. Select your agent
+3. Enable **Generative Orchestration**
+4. Go to **Tools** → **Add a tool**
+5. Filter by **Model Context Protocol**
+6. Select **Crawl4AI MCP**
+7. Create a new connection
+8. Click **Add to agent**
+
+### Step 4: Test
+
+In the test panel, try:
+```
+Can you extract the content from https://example.com?
+```
+
+## 📋 Copilot Studio prerequisites
+
+- An environment with **"Get new features early"** enabled
+- **Generative Orchestration** enabled on the agent
+- A correctly configured Custom Connector
+
+## 🧪 Local testing
+
+```bash
+# Test the MCP endpoint
+curl -X POST https://YOUR_SPACE.hf.space/mcp \
+  -H "Content-Type: application/json" \
+  -d '{"jsonrpc":"2.0","method":"tools/list","id":1}'
+```
+
+Expected response:
+```json
+{
+  "jsonrpc": "2.0",
+  "id": 1,
+  "result": {
+    "tools": [
+      {"name": "md", ...},
+      {"name": "html", ...},
+      {"name": "crawl", ...},
+      {"name": "execute_js", ...}
+    ]
+  }
+}
+```
+
+## 🔧 Local development
+
+```bash
+# Install dependencies
+pip install -r requirements.txt
+playwright install chromium
+
+# Start the server
+python app.py
+```
+
+## 📝 Notes
+
+- SSE (`/mcp/sse`) is deprecated; it now returns an informational message
+- The server returns `405 Method Not Allowed` for GET on `/mcp` (expected behavior)
+- Tools are discovered automatically by Copilot Studio via `tools/list`
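Since tools are discovered via `tools/list`, invoking one uses the same JSON-RPC envelope. For example, a `tools/call` request for the `md` tool would look like the following sketch (the URL is illustrative); the markdown comes back in `result.content[0].text`:

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "md",
    "arguments": {"url": "https://example.com", "filter_mode": "fit"}
  }
}
```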
app.py CHANGED
@@ -1,11 +1,16 @@
 #!/usr/bin/env python3
 import asyncio
 import json
 import logging
-from typing import Any
-from fastapi import FastAPI, Request
-from fastapi.responses import StreamingResponse, JSONResponse
-from sse_starlette.sse import EventSourceResponse
 from crawl4ai import AsyncWebCrawler
 import uvicorn
 
@@ -14,25 +19,127 @@ logger = logging.getLogger(__name__)
 
 app = FastAPI(title="Crawl4AI MCP Server")
 
 # === MCP PROTOCOL HANDLERS ===
 
-async def handle_mcp_request(request_data: dict) -> dict:
     """Handle MCP JSON-RPC 2.0 requests"""
     method = request_data.get("method")
     params = request_data.get("params", {})
     request_id = request_data.get("id")
 
-    logger.info(f"MCP Request: {method}")
 
     try:
         if method == "initialize":
             return {
                 "jsonrpc": "2.0",
                 "id": request_id,
                 "result": {
                     "protocolVersion": "2024-11-05",
                     "capabilities": {
-                        "tools": {}
                     },
                     "serverInfo": {
                         "name": "crawl4ai-mcp-server",
@@ -41,66 +148,25 @@ async def handle_mcp_request(request_data: dict) -> dict:
                 }
             }
 
         elif method == "tools/list":
-            tools = [
-                {
-                    "name": "md",
-                    "description": "Extract markdown content from a webpage",
-                    "inputSchema": {
-                        "type": "object",
-                        "properties": {
-                            "url": {"type": "string", "description": "URL to scrape"},
-                            "filter_mode": {"type": "string", "enum": ["raw", "fit"], "default": "fit"}
-                        },
-                        "required": ["url"]
-                    }
-                },
-                {
-                    "name": "html",
-                    "description": "Extract HTML from a webpage",
-                    "inputSchema": {
-                        "type": "object",
-                        "properties": {
-                            "url": {"type": "string", "description": "URL to scrape"}
-                        },
-                        "required": ["url"]
-                    }
-                },
-                {
-                    "name": "crawl",
-                    "description": "Batch crawl multiple URLs",
-                    "inputSchema": {
-                        "type": "object",
-                        "properties": {
-                            "urls": {"type": "array", "items": {"type": "string"}},
-                            "filter_mode": {"type": "string", "enum": ["raw", "fit"], "default": "fit"}
-                        },
-                        "required": ["urls"]
-                    }
-                },
-                {
-                    "name": "execute_js",
-                    "description": "Execute JavaScript on a page",
-                    "inputSchema": {
-                        "type": "object",
-                        "properties": {
-                            "url": {"type": "string"},
-                            "scripts": {"type": "array", "items": {"type": "string"}}
-                        },
-                        "required": ["url", "scripts"]
-                    }
-                }
-            ]
             return {
                 "jsonrpc": "2.0",
                 "id": request_id,
-                "result": {"tools": tools}
             }
 
         elif method == "tools/call":
             tool_name = params.get("name")
             tool_args = params.get("arguments", {})
 
             result = await execute_tool(tool_name, tool_args)
 
             return {
@@ -112,11 +178,40 @@ async def handle_mcp_request(request_data: dict) -> dict:
                             "type": "text",
                             "text": result
                         }
-                    ]
                 }
             }
 
         else:
             return {
                 "jsonrpc": "2.0",
                 "id": request_id,
@@ -127,7 +222,7 @@ async def handle_mcp_request(request_data: dict) -> dict:
             }
 
     except Exception as e:
-        logger.error(f"Error handling MCP request: {str(e)}")
         return {
            "jsonrpc": "2.0",
            "id": request_id,
@@ -145,6 +240,7 @@ async def execute_tool(name: str, args: dict) -> str:
         url = args.get("url")
         filter_mode = args.get("filter_mode", "fit")
 
         async with AsyncWebCrawler(headless=True, verbose=False) as crawler:
             result = await crawler.arun(
                 url=url,
@@ -160,11 +256,12 @@ async def execute_tool(name: str, args: dict) -> str:
     elif name == "html":
         url = args.get("url")
 
         async with AsyncWebCrawler(headless=True, verbose=False) as crawler:
             result = await crawler.arun(url=url, bypass_cache=True)
 
         if result.success:
-            return result.html[:10000]  # Limit the size
         else:
             return f"❌ Failed: {result.error_message}"
 
@@ -172,6 +269,7 @@ async def execute_tool(name: str, args: dict) -> str:
         urls = args.get("urls", [])[:10]
         results_text = f"# Batch Crawl Results ({len(urls)} URLs)\n\n"
 
         async with AsyncWebCrawler(headless=True, verbose=False) as crawler:
             for idx, url in enumerate(urls, 1):
                 result = await crawler.arun(url=url, bypass_cache=True)
@@ -186,6 +284,7 @@ async def execute_tool(name: str, args: dict) -> str:
         url = args.get("url")
         scripts = args.get("scripts", [])
 
         async with AsyncWebCrawler(headless=True, verbose=False) as crawler:
             result = await crawler.arun(url=url, js_code=scripts, bypass_cache=True)
 
@@ -198,70 +297,171 @@ async def execute_tool(name: str, args: dict) -> str:
         return f"❌ Unknown tool: {name}"
 
     except Exception as e:
-        logger.error(f"Error executing tool {name}: {str(e)}")
         return f"❌ Error: {str(e)}"
 
 
-# === ENDPOINTS ===
 
 @app.get("/")
 async def root():
     return {
         "status": "running",
         "server": "crawl4ai-mcp-server",
         "version": "1.0.0",
-        "protocol": "MCP over SSE",
-        "endpoint": "/mcp/sse"
     }
 
 
 @app.get("/mcp/sse")
-async def mcp_sse_get():
-    """SSE endpoint for MCP protocol"""
-    async def event_generator():
-        # Send initial connection message
-        yield {
-            "event": "message",
-            "data": json.dumps({
-                "jsonrpc": "2.0",
-                "method": "notifications/initialized",
-                "params": {}
-            })
-        }
-
-        # Keep connection alive
-        while True:
-            await asyncio.sleep(30)
-            yield {"event": "ping", "data": ""}
-
-    return EventSourceResponse(event_generator())
 
 
-@app.post("/mcp/sse")
-async def mcp_sse_post(request: Request):
-    """Handle MCP requests via POST"""
     try:
         body = await request.json()
-        logger.info(f"Received MCP request: {body}")
 
-        response = await handle_mcp_request(body)
-        return JSONResponse(content=response)
-
    except Exception as e:
-        logger.error(f"Error in MCP POST: {str(e)}")
-        return JSONResponse(
-            content={
-                "jsonrpc": "2.0",
-                "error": {
-                    "code": -32700,
-                    "message": f"Parse error: {str(e)}"
-                },
-                "id": None
-            },
-            status_code=500
-        )
 
 
 if __name__ == "__main__":
-    logger.info("🚀 Starting Crawl4AI MCP Server on port 7860")
     uvicorn.run(app, host="0.0.0.0", port=7860)
 #!/usr/bin/env python3
+"""
+Crawl4AI MCP Server - Compatible with Microsoft Copilot Studio
+Uses Streamable HTTP transport (not deprecated SSE)
+"""
 import asyncio
 import json
 import logging
+import uuid
+from typing import Any, Dict, Optional
+from fastapi import FastAPI, Request, Response, HTTPException
+from fastapi.responses import JSONResponse, StreamingResponse
+from fastapi.middleware.cors import CORSMiddleware
 from crawl4ai import AsyncWebCrawler
 import uvicorn
 
 
 app = FastAPI(title="Crawl4AI MCP Server")
 
+# Add CORS middleware for cross-origin requests
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+
+# Session storage for stateful connections
+sessions: Dict[str, Dict] = {}
+
+
+# === MCP TOOLS DEFINITION ===
+
+MCP_TOOLS = [
+    {
+        "name": "md",
+        "description": "Extract markdown content from a webpage. Use this to scrape and convert web pages to clean markdown format.",
+        "inputSchema": {
+            "type": "object",
+            "properties": {
+                "url": {
+                    "type": "string",
+                    "description": "The URL of the webpage to scrape"
+                },
+                "filter_mode": {
+                    "type": "string",
+                    "enum": ["raw", "fit"],
+                    "default": "fit",
+                    "description": "Filter mode: 'fit' for cleaned content, 'raw' for all content"
+                }
+            },
+            "required": ["url"]
+        }
+    },
+    {
+        "name": "html",
+        "description": "Extract raw HTML from a webpage",
+        "inputSchema": {
+            "type": "object",
+            "properties": {
+                "url": {
+                    "type": "string",
+                    "description": "The URL of the webpage to scrape"
+                }
+            },
+            "required": ["url"]
+        }
+    },
+    {
+        "name": "crawl",
+        "description": "Batch crawl multiple URLs and extract markdown content from each",
+        "inputSchema": {
+            "type": "object",
+            "properties": {
+                "urls": {
+                    "type": "array",
+                    "items": {"type": "string"},
+                    "description": "List of URLs to crawl (max 10)"
+                },
+                "filter_mode": {
+                    "type": "string",
+                    "enum": ["raw", "fit"],
+                    "default": "fit",
+                    "description": "Filter mode for content extraction"
+                }
+            },
+            "required": ["urls"]
+        }
+    },
+    {
+        "name": "execute_js",
+        "description": "Execute JavaScript code on a webpage and return the resulting content",
+        "inputSchema": {
+            "type": "object",
+            "properties": {
+                "url": {
+                    "type": "string",
+                    "description": "The URL of the webpage"
+                },
+                "scripts": {
+                    "type": "array",
+                    "items": {"type": "string"},
+                    "description": "List of JavaScript code snippets to execute"
+                }
+            },
+            "required": ["url", "scripts"]
+        }
+    }
+]
+
+
 # === MCP PROTOCOL HANDLERS ===
 
+async def handle_mcp_request(request_data: dict, session_id: str = None) -> dict:
     """Handle MCP JSON-RPC 2.0 requests"""
     method = request_data.get("method")
     params = request_data.get("params", {})
     request_id = request_data.get("id")
 
+    logger.info(f"MCP Request: method={method}, id={request_id}, session={session_id}")
 
     try:
         if method == "initialize":
+            # Store session info
+            if session_id:
+                sessions[session_id] = {
+                    "initialized": True,
+                    "protocol_version": params.get("protocolVersion", "2024-11-05")
+                }
+
             return {
                 "jsonrpc": "2.0",
                 "id": request_id,
                 "result": {
                     "protocolVersion": "2024-11-05",
                     "capabilities": {
+                        "tools": {
+                            "listChanged": False
+                        }
                     },
                     "serverInfo": {
                         "name": "crawl4ai-mcp-server",
                 }
             }
 
+        elif method == "notifications/initialized":
+            # Client acknowledgment - no response needed for notifications
+            return None
+
         elif method == "tools/list":
+            logger.info(f"Returning {len(MCP_TOOLS)} tools")
             return {
                 "jsonrpc": "2.0",
                 "id": request_id,
+                "result": {
+                    "tools": MCP_TOOLS
+                }
             }
 
         elif method == "tools/call":
             tool_name = params.get("name")
             tool_args = params.get("arguments", {})
 
+            logger.info(f"Calling tool: {tool_name} with args: {tool_args}")
             result = await execute_tool(tool_name, tool_args)
 
             return {
                         "type": "text",
                         "text": result
                     }
+                ],
+                "isError": False
+                }
+            }
+
+        elif method == "ping":
+            return {
+                "jsonrpc": "2.0",
+                "id": request_id,
+                "result": {}
+            }
+
+        elif method == "resources/list":
+            # We don't have resources, return empty list
+            return {
+                "jsonrpc": "2.0",
+                "id": request_id,
+                "result": {
+                    "resources": []
+                }
+            }
+
+        elif method == "prompts/list":
+            # We don't have prompts, return empty list
+            return {
+                "jsonrpc": "2.0",
+                "id": request_id,
+                "result": {
+                    "prompts": []
                 }
             }
 
         else:
+            logger.warning(f"Unknown method: {method}")
             return {
                 "jsonrpc": "2.0",
                 "id": request_id,
             }
 
     except Exception as e:
+        logger.error(f"Error handling MCP request: {str(e)}", exc_info=True)
         return {
             "jsonrpc": "2.0",
             "id": request_id,
 
         url = args.get("url")
         filter_mode = args.get("filter_mode", "fit")
 
+        logger.info(f"Extracting markdown from: {url}")
         async with AsyncWebCrawler(headless=True, verbose=False) as crawler:
             result = await crawler.arun(
                 url=url,
 
     elif name == "html":
         url = args.get("url")
 
+        logger.info(f"Extracting HTML from: {url}")
         async with AsyncWebCrawler(headless=True, verbose=False) as crawler:
             result = await crawler.arun(url=url, bypass_cache=True)
 
         if result.success:
+            return result.html[:10000]  # Limit size
         else:
             return f"❌ Failed: {result.error_message}"
 
         urls = args.get("urls", [])[:10]
         results_text = f"# Batch Crawl Results ({len(urls)} URLs)\n\n"
 
+        logger.info(f"Batch crawling {len(urls)} URLs")
         async with AsyncWebCrawler(headless=True, verbose=False) as crawler:
             for idx, url in enumerate(urls, 1):
                 result = await crawler.arun(url=url, bypass_cache=True)
 
         url = args.get("url")
         scripts = args.get("scripts", [])
 
+        logger.info(f"Executing JS on: {url}")
         async with AsyncWebCrawler(headless=True, verbose=False) as crawler:
             result = await crawler.arun(url=url, js_code=scripts, bypass_cache=True)
 
         return f"❌ Unknown tool: {name}"
 
     except Exception as e:
+        logger.error(f"Error executing tool {name}: {str(e)}", exc_info=True)
         return f"❌ Error: {str(e)}"
 
 
+# === STREAMABLE HTTP ENDPOINT (for Copilot Studio) ===
+
+@app.post("/mcp")
+async def mcp_streamable_http(request: Request):
+    """
+    Main MCP endpoint using Streamable HTTP transport.
+    This is what Microsoft Copilot Studio expects.
+    """
+    try:
+        body = await request.json()
+        logger.info(f"MCP POST /mcp: {json.dumps(body)[:500]}")
+
+        # Get or create session from header
+        session_id = request.headers.get("mcp-session-id", str(uuid.uuid4()))
+
+        response = await handle_mcp_request(body, session_id)
+
+        if response is None:
+            # For notifications, return 202 Accepted
+            return Response(status_code=202)
+
+        # Return JSON response with session header
+        return JSONResponse(
+            content=response,
+            headers={
+                "mcp-session-id": session_id,
+                "Content-Type": "application/json"
+            }
+        )
+
+    except json.JSONDecodeError as e:
+        logger.error(f"JSON decode error: {str(e)}")
+        return JSONResponse(
+            content={
+                "jsonrpc": "2.0",
+                "error": {
+                    "code": -32700,
+                    "message": f"Parse error: {str(e)}"
+                },
+                "id": None
+            },
+            status_code=400
+        )
+    except Exception as e:
+        logger.error(f"Error in MCP endpoint: {str(e)}", exc_info=True)
+        return JSONResponse(
+            content={
+                "jsonrpc": "2.0",
+                "error": {
+                    "code": -32603,
+                    "message": f"Internal error: {str(e)}"
+                },
+                "id": None
+            },
+            status_code=500
+        )
+
+
+@app.get("/mcp")
+async def mcp_get_not_allowed():
+    """
+    GET requests to /mcp are not allowed in Streamable HTTP.
+    This error message is expected by Copilot Studio to validate the server.
+    """
+    return JSONResponse(
+        content={
+            "jsonrpc": "2.0",
+            "error": {
+                "code": -32000,
+                "message": "Method not allowed."
+            },
+            "id": None
+        },
+        status_code=405
+    )
+
+
+@app.delete("/mcp")
+async def mcp_delete_session(request: Request):
+    """Handle session termination"""
+    session_id = request.headers.get("mcp-session-id")
+    if session_id and session_id in sessions:
+        del sessions[session_id]
+        logger.info(f"Session deleted: {session_id}")
+    return Response(status_code=204)
+
+
+# === HEALTH & INFO ENDPOINTS ===
 
 @app.get("/")
 async def root():
+    """Root endpoint with server information"""
     return {
         "status": "running",
         "server": "crawl4ai-mcp-server",
         "version": "1.0.0",
+        "protocol": "MCP Streamable HTTP",
+        "mcp_endpoint": "/mcp",
+        "tools_count": len(MCP_TOOLS),
+        "tools": [t["name"] for t in MCP_TOOLS]
+    }
+
+
+@app.get("/health")
+async def health():
+    """Health check endpoint"""
+    return {
+        "status": "healthy",
+        "tools_count": len(MCP_TOOLS),
+        "tools": [t["name"] for t in MCP_TOOLS],
+        "active_sessions": len(sessions)
     }
 
 
+# === SSE ENDPOINTS (Legacy - for backward compatibility) ===
+
+@app.get("/sse")
 @app.get("/mcp/sse")
+async def sse_legacy_redirect():
+    """
+    Legacy SSE endpoint - redirects to info about the new endpoint.
+    SSE is deprecated, use Streamable HTTP at /mcp instead.
+    """
+    return JSONResponse(
+        content={
+            "message": "SSE transport is deprecated. Use Streamable HTTP instead.",
+            "mcp_endpoint": "/mcp",
+            "method": "POST",
+            "documentation": "https://modelcontextprotocol.io/specification/basic/transports#streamable-http"
+        },
+        status_code=200
+    )
+
+
+# === DEBUG ENDPOINTS ===
+
+@app.get("/debug/tools")
+async def debug_tools():
+    """Debug endpoint to verify tools configuration"""
+    return {
+        "tools_count": len(MCP_TOOLS),
+        "tools": MCP_TOOLS
+    }
+
+
+@app.post("/debug/test-tool")
+async def debug_test_tool(request: Request):
+    """Debug endpoint to test a tool directly"""
     try:
         body = await request.json()
+        tool_name = body.get("name")
+        tool_args = body.get("arguments", {})
 
+        result = await execute_tool(tool_name, tool_args)
+        return {"result": result}
     except Exception as e:
+        return {"error": str(e)}
 
 
 if __name__ == "__main__":
+    logger.info("🚀 Starting Crawl4AI MCP Server (Streamable HTTP)")
+    logger.info(f"📋 Available tools: {[t['name'] for t in MCP_TOOLS]}")
+    logger.info("🔗 MCP Endpoint: POST /mcp")
     uvicorn.run(app, host="0.0.0.0", port=7860)