muhammadmaazuddin committed on
Commit
a5e74de
·
1 Parent(s): ed02112
.gitignore CHANGED
@@ -11,3 +11,4 @@ wheels/
  .env
 
  images/
+ tempimgs/
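Review note: the new ignore entry is spelled `tempimgs/`, while `element_screenshot` in `src/_agents.py` creates a directory named `tempImgs`. Git matches ignore patterns case-sensitively unless `core.ignorecase` is enabled, so on a case-sensitive filesystem the screenshots may not be ignored. A minimal sketch of a check for this kind of mismatch, assuming plain directory names without wildcards (`case_only_mismatches` is a hypothetical helper, not part of the commit):

```python
from typing import Iterable, List, Tuple


def case_only_mismatches(
    ignore_entries: Iterable[str], written_dirs: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return (written_dir, ignore_entry) pairs that differ only by letter case.

    Assumption: entries are literal directory names (no glob patterns).
    """
    # Normalise "name/" to "name" so entries compare against directory names.
    entries = {entry.rstrip("/") for entry in ignore_entries}
    by_lowercase = {entry.lower(): entry for entry in entries}

    mismatches = []
    for directory in written_dirs:
        # An exact match is fine; a match only after lowercasing is suspicious.
        if directory not in entries and directory.lower() in by_lowercase:
            mismatches.append((directory, by_lowercase[directory.lower()]))
    return mismatches


found = case_only_mismatches([".env", "images/", "tempimgs/"], ["tempImgs", "images"])
print(found)
```

Aligning the two spellings (either in `.gitignore` or in the code) would remove the ambiguity.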
browser_agent_data/browseruse_agent_data/extracted_content_0.md DELETED
@@ -1,154 +0,0 @@
1
- <url>
2
- https://firebase.google.com/pricing
3
- </url>
4
- <query>
5
- Extract all pricing plan details, including plan names, features, and costs.
6
- </query>
7
- <result>
8
- **Pricing Plans:**
9
-
10
- **1. No-cost (Spark plan)**
11
- * **Features:** Generous no-cost usage limits, no payment method needed.
12
- * **Products & Costs:**
13
- * **A/B Testing:** No-cost
14
- * **Analytics:** No-cost
15
- * **App Check:** No-cost, subject to quotas and limits that vary based on attestation provider.
16
- * **App Distribution:** No-cost
17
- * **App Hosting:**
18
- * Outgoing bandwidth (Uncached/Cached): Not applicable
19
- * Storage: Not applicable
20
- * Cloud Products (Cloud Run, Cloud Build, Artifact Registry, Cloud Logging, Cloud Secrets Manager): Not applicable
21
- * **Authentication:**
22
- * Phone Auth - All regions: Not applicable
23
- * Other Authentication services: Included
24
- * With Identity Platform (Monthly active users): 50K MAUs
25
- * With Identity Platform (Monthly active users - SAML/OIDC): 50 MAUs
26
- * **Cloud Firestore (Standard edition):**
27
- * Stored data: 1 GiB total
28
- * Network egress: 10 GiB/month
29
- * Document writes: 20K writes/day
30
- * Document reads: 50K reads/day
31
- * Document deletes: 20K deletes/day
32
- * **Cloud Firestore (Enterprise edition):**
33
- * Stored data: 1 GiB total
34
- * Network egress: 10 GiB/month
35
- * Document writes - includes writes and deletes: 40K writes/day
36
- * Document reads: 50K reads/day
37
- * **Cloud Functions:** Not applicable for Invocations, GB-seconds, CPU-seconds, Outbound networking, Cloud Build minutes, Container storage in Artifact Registry.
38
- * **Cloud Messaging (FCM):** No-cost
39
- * **Cloud Storage (`*.appspot.com` legacy buckets):**
40
- * GB stored: 5 GB
41
- * GB downloaded: 1 GB/day
42
- * Upload operations: 20K/day
43
- * Download operations: 50K/day
44
- * Multiple buckets per project: Not included
45
- * **Cloud Storage (`*.firebasestorage.app` and any additional buckets):** Not applicable for GB stored, GB downloaded, Upload operations, Download operations, Multiple buckets per project.
46
- * **Crashlytics:** No-cost
47
- * **Data Connect:** Not applicable for Network egress, Operation count, Cloud SQL for PostgreSQL.
48
- * **Hosting:**
49
- * Storage: 10 GB
50
- * Data transfer: 360 MB/day
51
- * Custom domain & SSL: Included
52
- * Multiple sites per project: Included
53
- * **In-App Messaging:** No-cost
54
- * **Firebase ML:**
55
- * Custom Model Deployment: Included
56
- * Cloud Vision APIs: Not included
57
- * **Performance Monitoring:** No-cost
58
- * **Realtime Database:**
59
- * Simultaneous connections: 100
60
- * GB stored: 1 GB
61
- * GB downloaded: 10 GB/month
62
- * Multiple databases per project: Not included
63
- * **Remote Config:** No-cost
64
- * **Test Lab:**
65
- * Virtual Device Tests: 10 tests/day
66
- * Physical Device Tests: 5 tests/day
67
- * Android Device Streaming: 30 no-cost minutes per project, per month
68
- * **Firebase AI Logic client SDKs:** Included
69
- * **Google Cloud (BigQuery):** Included (sandbox limits)
70
- * **Google Cloud (Other IaaS):** Not included
71
- * **Gemini in Firebase:** No-cost for individuals or groups not using Google Workspace. Google Workspace users require a valid Gemini Code Assist subscription.
72
- * **Firebase Studio:** No-cost for three workspaces. Google Developer Program members can create: Standard (no-cost): 10 workspaces; Premium: 30 workspaces and an increased Gemini quota for the App Prototyping agent.
73
-
74
- **2. Pay as you go (Blaze plan)**
75
- * **Features:** Eligible developers can claim $300 of credits to get started, no-cost usage limits from Spark plan included*.
76
- * **Products & Costs:**
77
- * **A/B Testing:** No-cost
78
- * **Analytics:** No-cost
79
- * **App Check:** No-cost, subject to quotas and limits that vary based on attestation provider.
80
- * **App Distribution:** No-cost
81
- * **App Hosting:** (Starting August 1, 2025)
82
- * Outgoing bandwidth (Uncached): No-cost up to 10 GiB/month, then $0.20/GiB
83
- * Outgoing bandwidth (Cached): No-cost up to 10 GiB/month, then $0.15/GiB
84
- * Storage: No-cost up to 5 GB, then $0.10/GB
85
- * Cloud Products (Cloud Run, Cloud Build, Artifact Registry, Cloud Logging, Cloud Secrets Manager): Billed at Google Cloud pricing (links provided for each).
86
- * **Authentication:**
87
- * Phone Auth - All regions: Billed per SMS sent (see current rates)
88
- * Other Authentication services: Included
89
- * With Identity Platform (Monthly active users): No-cost up to 50K MAUs, then Google Cloud pricing
90
- * With Identity Platform (Monthly active users - SAML/OIDC): No-cost up to 50 MAUs, then Google Cloud pricing
91
- * **Cloud Firestore (Standard edition):**
92
- * Stored data: No-cost up to 1 GiB total, then Google Cloud pricing
93
- * Network egress: No-cost up to 10 GiB/month, then Google Cloud pricing
94
- * Document writes: No-cost up to 20K writes/day, then Google Cloud pricing
95
- * Document reads: No-cost up to 50K reads/day, then Google Cloud pricing
96
- * Document deletes: No-cost up to 20K deletes/day, then Google Cloud standard edition pricing
97
- * **Cloud Firestore (Enterprise edition):**
98
- * Stored data: No-cost up to 1 GiB total, then Google Cloud enterprise edition pricing
99
- * Network egress: No-cost up to 10 GiB/month, then Google Cloud enterprise edition pricing
100
- * Document writes - includes writes and deletes: No-cost up to 40K writes/day, then Google Cloud enterprise edition pricing
101
- * Document reads: No-cost up to 50K reads/day, then Google Cloud enterprise edition pricing
102
- * **Cloud Functions:**
103
- * Invocations: No-cost up to 2M/month, then $0.40/million
104
- * GB-seconds: No-cost up to 400K/month, then Google Cloud pricing
105
- * CPU-seconds: No-cost up to 200K/month, then Google Cloud pricing
106
- * Outbound networking: No-cost up to 5 GB/month, then $0.12/GB
107
- * Cloud Build minutes: No-cost up to 120 min/day, then $0.003/min
108
- * Container storage in Artifact Registry: No-cost up to 500MB of storage, then Google Cloud pricing (pricing varies based on location)
109
- * **Cloud Messaging (FCM):** No-cost
110
- * **Cloud Storage (`*.appspot.com` legacy buckets):**
111
- * GB stored: No-cost up to 5 GB, then $0.026/GB
112
- * GB downloaded: No-cost up to 1 GB/day, then $0.12/GB
113
- * Upload operations: No-cost up to 20K/day, then $0.05/10K
114
- * Download operations: No-cost up to 50K/day, then $0.004/10K
115
- * Multiple buckets per project: Included
116
- * **Cloud Storage (`*.firebasestorage.app` and any additional buckets):** (No-cost quotas only for `us-central1`, `us-west1`, `us-east1`)
117
- * GB stored: No-cost up to 5 GB-months, then Cloud Storage pricing
118
- * GB downloaded: No-cost up to 100 GB/month, then Cloud Storage pricing
119
- * Upload operations: No-cost up to 5K/month, then Cloud Storage pricing
120
- * Download operations: No-cost up to 50K/month, then Cloud Storage pricing
121
- * Multiple buckets per project: Included
122
- * **Crashlytics:** No-cost
123
- * **Data Connect:**
124
- * Network egress: No-cost up to 10 GiB/month, then Google Cloud Internet Data Transfer Rate Premium Tier pricing
125
- * Operation count: No-cost up to 250K operations per month, then $4.00 per million operations
126
- * Cloud SQL for PostgreSQL: 3 month no-cost trial for the first default Cloud SQL instance, then starting as low as $9.37/month (pricing varies based on regions and configurations, see Google Cloud pricing).
127
- * **Hosting:**
128
- * Storage: No-cost up to 10 GB, then $0.026/GB
129
- * Data transfer: No-cost up to 360 MB/day, then $0.15/GB
130
- * Custom domain & SSL: Included
131
- * Multiple sites per project: Included
132
- * **In-App Messaging:** No-cost
133
- * **Firebase ML:** (First 1000 Cloud Vision API calls/month have no costs)
134
- * Custom Model Deployment: Included
135
- * Cloud Vision APIs: $1.50/K (see Cloud Vision pricing)
136
- * **Performance Monitoring:** No-cost
137
- * **Realtime Database:**
138
- * Simultaneous connections: 200K per database
139
- * GB stored: No-cost up to 1 GB, then $5/GB
140
- * GB downloaded: No-cost up to 10 GB/month, then $1/GB
141
- * Multiple databases per project: Included
142
- * **Remote Config:** No-cost
143
- * **Test Lab:** (Charged for testing time only, rounded up to the nearest minute)
144
- * Virtual Device Tests: No-cost up to 60 min/day, then $1/device/hour
145
- * Physical Device Tests: No-cost up to 30 min/day, then $5/device/hour
146
- * Android Device Streaming: 30 no-cost minutes per project, per month, then 15 cents for each additional minute
147
- * **Firebase AI Logic client SDKs:** Billed according to current Google Cloud or Gemini Developer API pricing
148
- * **Google Cloud (BigQuery):** Included
149
- * **Google Cloud (Other IaaS):** Included
150
- * **Gemini in Firebase:** No-cost for individuals or groups not using Google Workspace. Google Workspace users require a valid Gemini Code Assist subscription.
151
- * **Firebase Studio:** No-cost for three workspaces. Google Developer Program members can create: Standard (no-cost): 10 workspaces; Premium: 30 workspaces and an increased Gemini quota for the App Prototyping agent.
152
-
153
- *Note: No-cost usage on Blaze plan is calculated daily. Details differ slightly for Cloud Functions, Firebase ML, Phone Auth, and Test Lab. No-cost usage quotas apply at the project-level, not at the app-level or for individual resources.*
154
- </result>
 
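The Blaze figures in the deleted extraction follow one pattern: a no-cost quota, then a per-unit price. A minimal sketch of that arithmetic, assuming purely linear billing with no rounding or proration (the usage numbers are made up for illustration; `overage_cost` is a hypothetical helper):

```python
def overage_cost(usage: float, free_quota: float, unit_price: float) -> float:
    """Cost of usage beyond a no-cost quota, billed linearly per unit."""
    billable = max(0.0, usage - free_quota)
    return billable * unit_price


# Cloud Functions invocations: no-cost up to 2M/month, then $0.40 per million.
# Hypothetical usage: 5 million invocations in a month.
invocations_cost = overage_cost(usage=5.0, free_quota=2.0, unit_price=0.40)

# Realtime Database storage: no-cost up to 1 GB, then $5/GB.
# Hypothetical usage: 3 GB stored.
rtdb_storage_cost = overage_cost(usage=3.0, free_quota=1.0, unit_price=5.0)

print(f"Cloud Functions invocations: ${invocations_cost:.2f}")
print(f"Realtime Database storage:   ${rtdb_storage_cost:.2f}")
```

Items listed as "then Google Cloud pricing" have no rate in the extraction, so they cannot be estimated from this data alone.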
browser_agent_data/browseruse_agent_data/todo.md CHANGED
@@ -1,10 +0,0 @@
1
- # Firebase Pricing and Brand Identity Extraction
2
-
3
- ## Goal: Extract pricing plan content and brand identity assets for a LinkedIn post.
4
-
5
- ## Tasks:
6
- - [ ] Navigate to https://firebase.google.com/pricing
7
- - [x] Extract content related to pricing plans.
8
- - [x] Extract brand's visual identity (primary/secondary colors, full palette, typography, design system elements, social media brand kit details).
9
- - [ ] Format and return the extracted data in the specified JSON schema.
10
- - [ ] Call done action.
github_pricing_header.png ADDED

Git LFS Details

  • SHA256: 0be4f0455583cf7a2bb6b88bbaed994bf81fb3d198754379fd1cd806fd1c570b
  • Pointer size: 130 Bytes
  • Size of remote file: 16.4 kB
pyproject.toml CHANGED
@@ -23,6 +23,7 @@ dependencies = [
  "openai-agents>=0.2.8",
  "pathlib>=1.0.1",
  "pillow>=11.3.0",
+ "playwright>=1.55.0",
  "pydantic>=2.11.7",
  "pydantic-ai[logfire]>=1.0.1",
  "urljoin>=1.0.0",
src/_agents.py CHANGED
@@ -1,33 +1,37 @@
1
  # type: ignore
2
- from agents import Agent, RunContextWrapper
3
- from model import get_model
4
  import os
5
- from dotenv import load_dotenv
6
- from agents import Agent, AsyncOpenAI, Runner,function_tool, AgentHooks, RunHooks, TContext
7
- from model import get_model
8
- from typing import Any, Optional, Dict
 
 
9
  import re
 
 
 
 
 
 
 
 
 
 
10
  import requests
11
- from markdownify import markdownify
12
  from requests.exceptions import RequestException
13
- from langchain_community.tools import DuckDuckGoSearchResults
14
  from bs4 import BeautifulSoup
15
  from urllib.parse import urljoin
 
16
  from langchain_core.output_parsers import JsonOutputParser
17
- import json
18
- import time
19
- import fal_client
20
- from PIL import Image
21
- from io import BytesIO
22
- from IPython.display import display
23
  from google import genai
24
- import logging
25
- import asyncio
26
- from datetime import datetime
27
- from browser_use import Agent as AgentBrowser, ChatGoogle, ChatOpenAI as ChatOpenAIBrowserUse, BrowserSession
28
- from pathlib import Path
29
-
30
-
31
  # anchor_client = Anchorbrowser(
32
  # api_key=os.getenv("ANCHOR_API_KEY")
33
  # )
@@ -87,9 +91,6 @@ content_agent = Agent(
87
 
88
 
89
 
90
-
91
-
92
-
93
  post_schema = """
94
  {
95
  "meta": {
@@ -301,7 +302,7 @@ You are Media Agent, a professional and specialized in creating social media for
301
 
302
  Your task:
303
  1. Receive a high-level user brief describing a social media post idea.
304
- 2. Generate a detailed DesignSpec (JSON structured specification) from the brief using 'generate_designSpec_from_brief', including platform, style, content, visuals, colors, typography, composition, lighting, mood, and finishing details.
305
  3. Using the generated DesignSpec, create a high-quality, brand-aligned social media image using 'generate_post_image' tool, (Don't change the schema use same as generated)
306
 
307
  Be concise, professional, and strictly follow the structured DesignSpec and design guidelines provided.
@@ -481,106 +482,538 @@ WebInspectorAgent = Agent(
481
 
482
  llm = ChatGoogle(model="gemini-2.5-flash", api_key=os.getenv("GEMINI_API_KEY"))
483
 
484
- # llm = ChatOpenAIBrowserUse(
485
- # model='openai/gpt-4.1-mini',
486
- # base_url='https://openrouter.ai/api/v1',
487
- # api_key=os.getenv('OPENROUTER_API_KEY'),
488
- # )
489
 
490
 
491
 
492
 
493
- import asyncio
494
- from datetime import datetime
495
- from pathlib import Path
496
 
497
 
498
 
499
- from pydantic import BaseModel, Field
500
- from browser_use import Tools, ActionResult
501
- from browser_use.browser import BrowserSession
502
- # from playwright.async_api import Page
503
- import os
504
-
505
- # Reuse the same Tools instance
506
  tools = Tools()
507
 
 
 
508
  class ElementScreenshotParams(BaseModel):
509
- selector: str = Field(
510
- ..., description="CSS selector for the element (e.g., '#login-button')"
 
511
  )
512
  filename: str = Field(
513
- default="element_screenshot.png", description="Output filename"
514
  )
515
 
516
  @tools.action(
517
- description="Capture a screenshot of a specific element on the page using its CSS selector.",
 
518
  )
519
  async def element_screenshot(params: ElementScreenshotParams, browser_session: BrowserSession) -> ActionResult:
 
 
 
 
 
 
 
 
 
 
 
520
  try:
521
- page = browser_session.page
522
- output_path = os.path.join(browser_session.file_system_path, params.filename)
523
- element = page.locator(params.selector)
524
-
525
- # Wait for element to be visible
526
- await element.wait_for(state="visible", timeout=5000)
527
 
528
- await element.screenshot(path=output_path)
529
 
530
- success_msg = f"Element screenshot saved at: {output_path} (selector: {params.selector})"
531
  return ActionResult(
532
  extracted_content=success_msg,
533
  include_in_memory=True,
534
- long_term_memory=f"Element screenshot taken: {params.selector} -> {output_path}",
535
- vision_content=[{"type": "image", "path": output_path}] # For vision analysis
536
  )
 
 
537
  except Exception as e:
538
- return ActionResult(error=f"Element screenshot failed: {str(e)}. Check selector: {params.selector}")
539
 
 
 
 
540
 
541
 
542
 
543
 
544
 
545
- task = f"""
546
  You are a Browser Intelligence Agent specialized in extracting website content and brand identity assets.
547
- Your job is to always return structured JSON output in the given schema.
548
 
549
  Follow these steps strictly:
550
 
551
- 1. Visit the given website URL.
 
 
 
552
 
553
  2. Content Extraction:
554
- - If the user provides a query:
555
- Search across multiple related pages within the same domain (navigation links, internal links, related pages).
556
- Extract only the relevant text or sections that match the query.
557
- Summarize results across all visited pages into a single coherent output.
558
- - If no query is provided:
559
- • Extract the full visible text from the landing page only.
560
 
561
  3. Brand & Design Extraction:
562
- - Extract the brand's visual identity:
563
- - Primary and secondary theme colors (hex codes).
564
- - Full palette if available.
565
- - Typography (fonts, weights, styles).
566
- - Design System or Style Guide elements.
567
- - Social Media Brand Kit details (logos, icons, button styles, heading styles).
568
-
569
- 4. Screenshots (Custom Tools via browser_use):
570
- - If the user specifies components (e.g., “screenshot all buttons” or “screenshot hero section”), locate those elements and take full-resolution screenshots.
571
- - Save screenshots with meaningful names (e.g., `button_styles.png`, `hero_banner.png`).
572
- - If no specific component is requested, skip this step.
573
 
574
  5. Output:
575
- - Always return the result in this JSON schema:
 
576
 
577
  Today is {datetime.now().strftime('%Y-%m-%d')}
578
 
579
- User's query: Go to https://firebase.google.com/pricing and extract content and brand identity assets for linkedin post, Topic is pricing plans.
580
  """
581
 
582
 
583
 
 
 
 
 
584
 
585
  class PageVisited(BaseModel):
586
  url: str
@@ -639,21 +1072,149 @@ class BrowserAgentOutput(BaseModel):
639
 
640
 
641
  async def run_search() -> None:
642
- print('requested to run search')
643
- browser_agent = AgentBrowser(
644
- task=task,
645
- llm=llm,
646
- use_vision=True,
647
- generate_gif=False,
648
- # extend_system_message="Use the execute_js tool for extracting data/information from websites.",
649
- max_failures=3,
650
- file_system_path="./browser_agent_data",
651
- tools=tools,
652
- output_model_schema=BrowserAgentOutput,
653
- )
654
- history = await browser_agent.run(max_steps=15)
655
 
656
- print(history.final_result)
657
 
658
 
659
 
 
1
  # type: ignore
 
 
2
  import os
3
+ import sys
4
+ import time
5
+ import json
6
+ import logging
7
+ import asyncio
8
+
9
  import re
10
+ from playwright.async_api import TimeoutError as PlaywrightTimeoutError
11
+ import aiohttp
12
+ from typing import Any, Optional, Dict
13
+ from datetime import datetime
14
+ from pathlib import Path
15
+ from dotenv import load_dotenv
16
+ from pydantic import BaseModel, Field, conint
17
+ from PIL import Image
18
+ from io import BytesIO
19
+ from IPython.display import display
20
  import requests
21
+ import base64
22
  from requests.exceptions import RequestException
23
+ from markdownify import markdownify
24
  from bs4 import BeautifulSoup
25
  from urllib.parse import urljoin
26
+ from langchain_community.tools import DuckDuckGoSearchResults
27
  from langchain_core.output_parsers import JsonOutputParser
 
 
 
 
 
 
28
  from google import genai
29
+ import fal_client
30
+ from agents import Agent, AsyncOpenAI, Runner, function_tool, RunContextWrapper, AgentHooks, RunHooks, TContext
31
+ from model import get_model
32
+ from browser_use import Agent as AgentBrowser, ChatGoogle, ChatOpenAI as ChatOpenAIBrowserUse, Tools, ActionResult
33
+ from browser_use.browser import BrowserSession, BrowserProfile
34
+ from utils.chrome_playwright import start_chrome_with_debug_port, connect_playwright_to_cdp
 
35
  # anchor_client = Anchorbrowser(
36
  # api_key=os.getenv("ANCHOR_API_KEY")
37
  # )
 
91
 
92
 
93
 
 
 
 
94
  post_schema = """
95
  {
96
  "meta": {
 
302
 
303
  Your task:
304
  1. Receive a high-level user brief describing a social media post idea.
305
+ 2. Generate a detailed DesignSpec (JSON structured specification) from the brief using 'generate_designSpec_from_brief', including platform, style, content, visuals, colors, typography, composition, lighting, mood, and finishing requirements.
306
  3. Using the generated DesignSpec, create a high-quality, brand-aligned social media image using 'generate_post_image' tool, (Don't change the schema use same as generated)
307
 
308
  Be concise, professional, and strictly follow the structured DesignSpec and design guidelines provided.
 
482
 
483
  llm = ChatGoogle(model="gemini-2.5-flash", api_key=os.getenv("GEMINI_API_KEY"))
484
 
485
+ llm_browser = ChatOpenAIBrowserUse(
486
+ model='openai/gpt-4.1',
487
+ base_url='https://openrouter.ai/api/v1',
488
+ api_key=os.getenv('OPENROUTER_API_KEY'),
489
+ )
490
 
491
 
492
 
493
 
 
 
 
494
 
495
 
496
 
497
+ # Global Playwright variables and Tools instance
498
+ playwright_browser = None
499
+ playwright_page = None
 
 
 
 
500
  tools = Tools()
501
 
502
+
503
+
504
  class ElementScreenshotParams(BaseModel):
505
+ selectors: list[str] = Field(
506
+ ...,
507
+ description="A list of CSS selectors to try for locating the element(s). The first valid selector will be used."
508
  )
509
  filename: str = Field(
510
+ default="element_screenshot.png",
511
+ description="Output filename for the screenshot."
512
+ )
513
+ highlight: bool = Field(
514
+ default=True,
515
+ description="If True, draw a red border around the element before taking the screenshot."
516
+ )
517
+ padding: conint(ge=0) = Field(
518
+ default=10,
519
+ description="Padding (in pixels) to add around the element in the screenshot."
520
+ )
521
+ scroll_if_needed: bool = Field(
522
+ default=True,
523
+ description="If True, scroll the element into view before taking the screenshot."
524
+ )
525
+ fallback_to_full_page: bool = Field(
526
+ default=True,
527
+ description="If no element is found, fallback to taking a full page screenshot."
528
  )
529
 
530
  @tools.action(
531
+ description="Captures a screenshot of one or more elements on a page using CSS selectors, with options for highlighting, padding, and scrolling. It can try multiple selectors and fall back to a full-page screenshot.",
532
+ param_model=ElementScreenshotParams,
533
  )
534
  async def element_screenshot(params: ElementScreenshotParams, browser_session: BrowserSession) -> ActionResult:
535
+ """
536
+ A robust tool to capture screenshots of web elements.
537
+ - It can use JavaScript-based targeting for selectors.
538
+ - Tries multiple selectors to find the target element.
539
+ - Adds padding to provide context around the element.
540
+
541
+ """
542
+ print("-----------------browser_session_---------")
543
+ page = await browser_session.get_current_page()
544
+
545
+ # Prefer a session-owned file system path if the BrowserSession provides one
546
  try:
547
+ session_base = getattr(browser_session, 'file_system_path', None)
548
+ if session_base:
549
+ base_path = os.path.abspath(session_base)
550
+ else:
551
+ base_path = os.path.abspath(".")
552
+
553
+ # Create a unique directory for screenshots from this website and session
554
+ from urllib.parse import urlparse
555
+ import time
556
+
557
+ parsed_url = urlparse(await page.get_url())
558
+ # Sanitize website name to be filesystem-friendly
559
+ website_name = parsed_url.netloc.replace('www.', '').replace('.', '_').replace(':', '_')
560
+ timestamp = int(time.time())
561
+
562
+ screenshot_dir = os.path.join(base_path, "tempImgs", f"{website_name}-{timestamp}")
563
+
564
+ os.makedirs(screenshot_dir, exist_ok=True)
565
+
566
+ output_path = os.path.join(screenshot_dir, params.filename)
567
+ except Exception as e:
568
+ print(e)
569
+ # Fallback to current working directory if there's an issue creating the new one
570
+ output_path = os.path.join(os.path.abspath('.'), params.filename)
571
+
572
+
573
+ element = None
574
+ used_selector = None
575
+ error_messages = []
576
+ print("Trying to find element :", params)
577
+ for selector in params.selectors:
578
+ try:
579
+ print(selector)
580
+ loc = await page.evaluate("""
581
+ (selector, padding) => {
582
+ const el = document.querySelector(selector);
583
+ if (!el) {
584
+ return {
585
+ clip: { x: null, y: null, width: null, height: null },
586
+ tag: null,
587
+ selector: selector,
588
+ id: null,
589
+ classList: [],
590
+ };
591
+ }
592
+ const rect = el.getBoundingClientRect();
593
+ return {
594
+ clip: {
595
+ x: rect.x - padding,
596
+ y: rect.y - padding,
597
+ width: rect.width + 2 * padding,
598
+ height: rect.height + 2 * padding
599
+ },
600
+ tag: el.tagName,
601
+ selector: selector,
602
+ id: el.id || null,
603
+ classList: Array.from(el.classList || []),
604
 
605
+ };
606
+ }
607
+ """, selector, params.padding)
608
+ element = json.loads(loc)
609
+ # if await loc.count() > 0:
610
+ # element = loc.first() # Use the first element if multiple are found
611
+ # used_selector = selector
612
+ # await element.wait_for(state="attached", timeout=3000)
613
+ # break
614
+ # else:
615
+ # error_messages.append(f"Selector '{selector}' found no elements.")
616
+ except Exception as e:
617
+ error_messages.append(f"Error with selector '{selector}': {str(e)}")
618
+ print('Element found:', element)
619
+ print('at 1')
620
+ if not element:
621
+ # Full-page fallback screenshot disabled — prefer explicit errors instead of taking full-page screenshots.
622
+ # If you want to re-enable the fallback, uncomment the lines below.
623
+ # if params.fallback_to_full_page:
624
+ # try:
625
+ # await page.screenshot(path=output_path, full_page=True)
626
+ # fallback_msg = f"No element found for selectors {params.selectors}. Fell back to full-page screenshot at: {output_path}"
627
+ # return ActionResult(
628
+ # extracted_content=fallback_msg,
629
+ # long_term_memory=fallback_msg,
630
+ # vision_content=[{"type": "image", "path": output_path}]
631
+ # )
632
+ # except Exception as e:
633
+ # return ActionResult(error=f"Element not found and full-page screenshot failed: {str(e)}")
634
+
635
+ return ActionResult(error=f"Could not find any element using selectors: {params.selectors}. Errors: {'; '.join(error_messages)}")
636
+
637
+ print('at 2')
638
+ print(type(element))
639
+ try:
640
+ # Scroll element into view if needed
641
+ # if params.scroll_if_needed:
642
+ # await element.scroll_into_view_if_needed(timeout=5000)
643
 
644
+ # Wait for the element to be stable and visible
645
+ # await element.wait_for(state="visible", timeout=5000)
646
+ # await element.wait_for(state="attached", timeout=5000)
647
+
648
+ # # Highlight the element with a red border
649
+ # original_style = ""
650
+ # if params.highlight:
651
+ # original_style = await element.get_attribute("style") or ""
652
+ # print('evaluaiton 1')
653
+ # await element.evaluate("el => el.style.border = '3px solid red'")
654
+
655
+ # print('evaluaiton 2')
656
+ # Get bounding box and take screenshot with padding
657
+ clip_obj = dict(element).get('clip')
658
+
659
+ if not clip_obj or clip_obj.get('x') is None:
660
+ raise Exception("Could not get bounding box for the element.")
661
+
662
+ try:
663
+ # Get session id and client from the page wrapper
664
+ session_id = await page.session_id
665
+ client = page._client
666
+
667
+ params = {
668
+ 'format': 'png',
669
+ 'clip': {
670
+ 'x': float(clip_obj['x']),
671
+ 'y': float(clip_obj['y']),
672
+ 'width': float(clip_obj['width']),
673
+ 'height': float(clip_obj['height']),
674
+ 'scale': 1,
675
+ },
676
+ }
677
+ result = await client.send.Page.captureScreenshot(params, session_id=session_id)
678
+ img_b64 = result.get('data')
679
+ if not img_b64:
680
+ raise Exception('CDP captureScreenshot returned no data')
681
+ with open(output_path, 'wb') as f:
682
+ f.write(base64.b64decode(img_b64))
683
+ except Exception as e:
684
+ # Re-raise with context
685
+ raise Exception(f'Failed to take clipped screenshot via CDP: {e}')
686
+
687
+
688
+
689
+ success_msg = f"Element screenshot saved at: {output_path} (selector: '{used_selector}')"
690
  return ActionResult(
691
  extracted_content=success_msg,
692
  include_in_memory=True,
693
+ long_term_memory=f"Element screenshot taken: {used_selector} -> {output_path}",
694
+ vision_content=[{"type": "image", "path": output_path}]
695
  )
696
+ except PlaywrightTimeoutError:
697
+ return ActionResult(error=f"Element screenshot failed: Timeout waiting for element '{used_selector}' to be visible or stable.")
698
  except Exception as e:
699
+ return ActionResult(error=f"Element screenshot failed for selector '{used_selector}': {str(e)}")
700
+
701
+
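Review note: the JavaScript inside `element_screenshot` pads the element's bounding rectangle before the result is passed to `Page.captureScreenshot` as `clip`. The same arithmetic can be sketched in Python as below. `Clip` and `padded_clip` are hypothetical names, and clamping the origin at zero is an added assumption; the committed code subtracts the padding without clamping, so an element near the top-left corner yields negative coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    x: float
    y: float
    width: float
    height: float


def padded_clip(x: float, y: float, width: float, height: float, padding: float) -> Clip:
    """Grow a bounding rectangle by `padding` on every side.

    Assumption: the origin is clamped at (0, 0) so the clip never starts
    outside the viewport; the far edges are left unclamped.
    """
    left = max(0.0, x - padding)
    top = max(0.0, y - padding)
    right = x + width + padding
    bottom = y + height + padding
    return Clip(left, top, right - left, bottom - top)


# An element well inside the viewport gains `padding` on each side.
print(padded_clip(100, 50, 200, 80, 10))
# An element near the corner loses the part of the padding that would go negative.
print(padded_clip(5, 5, 20, 20, 10))
```

Note that `getBoundingClientRect()` is viewport-relative, so an element scrolled out of view would still need to be scrolled into view before the clip is meaningful.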
702
+ # ------------------------ Custom helper tools ------------------------
703
+
704
+
705
+ @tools.action(
706
+ description="Finds a web page element using a natural language prompt and returns its selector, backend node id, and the element object.",
707
+ # param_model=,
708
+ )
709
+ async def find_element_by_prompt(query: str, browser_session: BrowserSession) -> dict:
710
+ """
711
+ Use the page's must_get_element_by_prompt (LLM-powered) to robustly locate an element matching the query.
712
 
713
+ Args:
714
+ query (str): Natural language description of the element to find (e.g., "footer section", "pricing table").
715
+ browser_session (BrowserSession): The active browser session object.
716
 
717
+ Returns:
718
+ dict: {
719
+ "selector": <css selector or None>,
720
+ "backend_node_id": int,
721
+ "element": <element object or None>,
722
+ "reason": <string>
723
+ }
724
+ - selector: CSS selector string for the matched element, or None if not found.
725
+ - backend_node_id: Unique backend node id for direct reference (int or None).
726
+ - element: The matched element object, or None if not found.
727
+ - reason: Reason for match or error (string).
728
+ """
729
+ page = await browser_session.get_current_page()
730
+ try:
731
+ # Use the LLM-powered method to get the element
732
+ element = await page.must_get_element_by_prompt(query)
733
+ # Try to build a selector from id/class/tag
734
+ selector = None
735
+ if hasattr(element, 'id') and element.id:
736
+ selector = f"#{element.id}"
737
+ elif hasattr(element, 'class_name') and element.class_name:
738
+ first_cls = element.class_name.split()[0]
739
+ selector = f".{first_cls}"
740
+ elif hasattr(element, 'tag_name') and element.tag_name:
741
+ selector = element.tag_name.lower()
742
+ # Always return backend_node_id for direct reference
743
+ backend_node_id = getattr(element, 'backend_node_id', None)
744
+ return {
745
+ "selector": selector,
746
+ "backend_node_id": backend_node_id,
747
+ "element": element,
748
+ "reason": "llm_match"
749
+ }
750
+ except Exception as e:
751
+ return {
752
+ "selector": None,
753
+ "backend_node_id": None,
754
+ "element": None,
755
+ "reason": f"llm_error: {e}"
756
+ }
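Review note: `find_element_by_prompt` derives a selector by preferring the element's id, then its first class, then its tag name. That fallback order can be isolated into a pure function and tested without a browser. The sketch below is an assumption about how it could be factored out (`build_selector` is a hypothetical name); it also guards against a whitespace-only class string, which would raise `IndexError` in `split()[0]`, and it does not escape special characters in ids or class names.

```python
from typing import Optional


def build_selector(
    element_id: Optional[str],
    class_name: Optional[str],
    tag_name: Optional[str],
) -> Optional[str]:
    """Pick the most specific simple CSS selector available: id, class, then tag."""
    if element_id:
        return f"#{element_id}"
    classes = class_name.split() if class_name else []
    if classes:
        # Only the first class is used, mirroring the committed logic.
        return f".{classes[0]}"
    if tag_name:
        return tag_name.lower()
    return None


print(build_selector("pricing", "card wide", "DIV"))
print(build_selector(None, "card wide", "DIV"))
print(build_selector(None, "   ", "SECTION"))
```

A class or tag selector can match many elements, so the returned `backend_node_id` remains the more reliable reference when it is available.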
757
 
758
 
759
 
760
 
761
+ @tools.action(
762
+ description="Injects or removes a visible red outline around the element identified by selector or selector dict for browser agent visual verification.",
763
+ )
764
+ async def highlight_element(selector_or_obj: str | dict, browser_session: BrowserSession) -> dict:
765
+ """
766
+ Inject or remove a visible red outline around the element identified by selector (or dict{selector}).
767
+
768
+ Args:
769
+ selector_or_obj (str | dict): CSS selector string or dict with 'selector' key to identify the element.
770
+ browser_session (BrowserSession): The active browser session object.
771
+ remove (bool, optional, dict key only): Passed as selector_or_obj['remove'], not as a function argument. If True, removes the highlight; if False or omitted, adds it.
772
+
773
+ Returns:
774
+ dict: {ok: True/False, selector: used_selector, reason: str}
775
+ """
776
+ page = await browser_session.get_current_page()
777
+ remove = False
778
+ # Support dict with 'remove' key
779
+ if isinstance(selector_or_obj, dict):
780
+ selector = selector_or_obj.get('selector')
781
+ remove = selector_or_obj.get('remove', False)
782
+ else:
783
+ selector = selector_or_obj
784
+
785
+ if remove:
786
+ js = """
787
+ (sel) => {
788
+ const el = document.querySelector(sel);
789
+ if (!el) return { ok: false, reason: 'not_found', selector: sel };
790
+ if (el.dataset.__highlighted === '1') {
791
+ el.style.outline = el.dataset.__orig_outline || '';
792
+ delete el.dataset.__highlighted;
793
+ delete el.dataset.__orig_outline;
794
+ return { ok: true, selector: sel, reason: 'highlight_removed' };
795
+ }
796
+ return { ok: false, selector: sel, reason: 'no_highlight_to_remove' };
797
+ }
798
+ """
799
+ else:
800
+ js = """
801
+ (sel) => {
802
+ const el = document.querySelector(sel);
803
+ if (!el) return { ok: false, reason: 'not_found', selector: sel };
804
+ // store original outline to restore later
805
+ el.dataset.__orig_outline = el.style.outline || '';
806
+ el.style.outline = '3px solid red';
807
+ el.dataset.__highlighted = '1';
808
+ return { ok: true, selector: sel, reason: 'highlight_applied' };
809
+ }
810
+ """
811
+
812
+ try:
813
+ raw = await page.evaluate(js, selector)
814
+ return json.loads(raw)
815
+ except Exception as e:
816
+ return {"ok": False, "reason": str(e), "selector": selector}
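Both `highlight_element` and `get_bounding_box` accept either a plain selector string or a dict. A small normalizer, sketched here under the assumption that only the `selector` and `remove` keys matter, keeps that branching in one place. The name `normalize_selector_arg` is hypothetical and does not appear in the commit.

```python
from __future__ import annotations


def normalize_selector_arg(selector_or_obj: str | dict) -> tuple[str | None, bool]:
    """Return (selector, remove) for a string selector or a selector dict.

    A plain string never requests removal; a dict may set 'remove': True.
    """
    if isinstance(selector_or_obj, dict):
        selector = selector_or_obj.get("selector")
        remove = bool(selector_or_obj.get("remove", False))
        return selector, remove
    return selector_or_obj, False


print(normalize_selector_arg({"selector": "#pricing", "remove": True}))
print(normalize_selector_arg(".site-footer"))
```

Returning `None` for a missing `selector` key lets the caller reject the request before any JavaScript is evaluated in the page.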
817
+
818
+
819
+ @tools.action(
820
+ description="Returns the bounding box (x, y, width, height) for a given CSS selector or selector dict on the current page. Useful for element positioning, cropping, or screenshot tasks.",
821
+ )
822
+ async def get_bounding_box(selector_or_obj: str | dict, browser_session: BrowserSession) -> dict:
823
+ """
824
+ Description:
825
+ Returns the bounding box for a given CSS selector or selector dict on the current page.
826
+
827
+ Args:
828
+ selector_or_obj (str | dict): CSS selector string or dict with 'selector' key to identify the element.
829
+ browser_session (BrowserSession): The active browser session object.
830
+
831
+ Returns:
832
+ dict: {x: float or None, y: float or None, width: float or None, height: float or None, error: str (optional)}
833
+ - x, y: Top-left coordinates of the element (relative to viewport)
834
+ - width, height: Size of the element
835
+ - error: Error message if bounding box could not be retrieved
836
+ """
837
+ page = await browser_session.get_current_page()
838
+ if isinstance(selector_or_obj, dict):
839
+ selector = selector_or_obj.get('selector')
840
+ else:
841
+ selector = selector_or_obj
842
+
843
+ js = """
844
+ (sel) => {
845
+ const el = document.querySelector(sel);
846
+ if (!el) return { x: null, y: null, width: null, height: null };
847
+ const r = el.getBoundingClientRect();
848
+ return { x: r.x, y: r.y, width: r.width, height: r.height };
849
+ }
850
+ """
851
+
852
+ try:
853
+ raw = await page.evaluate(js, selector)
854
+ return json.loads(raw)
855
+ except Exception as e:
856
+ return {"x": None, "y": None, "width": None, "height": None, "error": str(e)}
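The dict returned by `get_bounding_box` has the same keys that `element_screenshot_clip` expects in `clip`, but it can contain `None` values when the selector matches nothing. The following sketch shows one way to validate and pad the box before passing it on. `bbox_to_clip` and its `padding` parameter are assumptions introduced for illustration.

```python
from __future__ import annotations


def bbox_to_clip(bbox: dict, padding: float = 0.0) -> dict | None:
    """Turn a bounding-box dict into a clip dict, or None if it is unusable."""
    keys = ("x", "y", "width", "height")
    # A missing element yields None values; refuse to build a clip from them.
    if any(bbox.get(key) is None for key in keys):
        return None
    if bbox["width"] <= 0 or bbox["height"] <= 0:
        return None
    return {
        "x": float(bbox["x"]) - padding,
        "y": float(bbox["y"]) - padding,
        "width": float(bbox["width"]) + 2 * padding,
        "height": float(bbox["height"]) + 2 * padding,
    }


print(bbox_to_clip({"x": 10, "y": 20, "width": 100, "height": 50}, padding=4))
```

The coordinates are viewport-relative, as the docstring above notes, so an element that is scrolled out of view may produce negative or off-screen values. This sketch does not clamp them.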
857
+
858
+
859
+ @tools.action(
860
+ description="Takes a screenshot of a specific region (clip) of the current page, defined by x, y, width, height. Returns the saved image path and status.",
861
+ )
862
+ async def element_screenshot_clip(clip: dict, filename: str = 'element_clip.png', browser_session: BrowserSession = None) -> dict:
863
+ """
864
+ Description:
865
+ Takes a screenshot of a specific region (clip) of the current page, defined by x, y, width, height.
866
+
867
+ Args:
868
+ clip (dict): Dictionary with keys 'x', 'y', 'width', 'height' (all float/int) specifying the region to capture.
869
+ filename (str, optional): Output filename for the screenshot. Defaults to 'element_clip.png'.
870
+ browser_session (BrowserSession, optional): The active browser session object. Required.
871
+
872
+ Returns:
873
+ dict: {ok: True/False, path: str (if ok), error: str (if not ok)}
874
+ - ok: True if screenshot was successful, False otherwise
875
+ - path: Absolute path to the saved screenshot image (if ok)
876
+ - error: Error message if screenshot failed
877
+ """
878
+ if browser_session is None:
879
+ return {"ok": False, "error": "browser_session required"}
880
+
881
+ page = await browser_session.get_current_page()
882
+ try:
883
+ session_id = await page.session_id
884
+ client = page._client
885
+ params = {
886
+ 'format': 'png',
887
+ 'clip': {
888
+ 'x': float(clip['x']),
889
+ 'y': float(clip['y']),
890
+ 'width': float(clip['width']),
891
+ 'height': float(clip['height']),
892
+ 'scale': 1,
893
+ },
894
+ }
895
+ result = await client.send.Page.captureScreenshot(params, session_id=session_id)
896
+ img_b64 = result.get('data')
897
+ if not img_b64:
898
+ return {"ok": False, "error": 'no_data'}
899
+
900
+ # Resolve the filename against the current working directory (no tempimgs/ directory is prepended here)
901
+ out_path = os.path.abspath(filename)
902
+ with open(out_path, 'wb') as f:
903
+ f.write(base64.b64decode(img_b64))
904
+
905
+ return {"ok": True, "path": out_path}
906
+ except Exception as e:
907
+ return {"ok": False, "error": str(e)}
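`element_screenshot_clip` writes to `os.path.abspath(filename)`, so the file lands wherever the process was started. If the intent is to keep captures under a dedicated directory such as the `tempimgs/` entry added to `.gitignore`, a helper along these lines could be used. This is a sketch under that assumption; `save_png_b64` and its `out_dir` parameter are hypothetical names.

```python
import base64
import os
import tempfile


def save_png_b64(img_b64: str, filename: str, out_dir: str) -> str:
    """Decode base64 image data and write it inside out_dir.

    Only the basename of `filename` is used, so a value like
    '../../x.png' cannot escape the output directory.
    """
    os.makedirs(out_dir, exist_ok=True)
    safe_name = os.path.basename(filename) or "element_clip.png"
    out_path = os.path.abspath(os.path.join(out_dir, safe_name))
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(img_b64))
    return out_path


# Demo with a throwaway directory and placeholder bytes (not a real PNG).
with tempfile.TemporaryDirectory() as demo_dir:
    payload = base64.b64encode(b"placeholder").decode("ascii")
    print(os.path.basename(save_png_b64(payload, "pricing_table.png", demo_dir)))
```

Since an agent supplies the filename at run time, restricting it to a basename is a cheap guard against writing outside the intended folder.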
908
+
909
+
910
+ @function_tool
911
+ async def verify_element_visual(query: str, screenshot_path: str, browser_session: BrowserSession, tolerance: int = 20) -> dict:
912
+ """Verify that the screenshot corresponds to the element found for `query`.
913
+
914
+ Strategy: find element by prompt, get bounding box, compare image size to bbox within tolerance.
915
+ Returns {verified: bool, selector: str or None, screenshot: path, details: ...}
916
+ """
917
+ # 1) locate element
918
+ found = await find_element_by_prompt(query, browser_session)
919
+ selector = found.get('selector')
920
+ if not selector:
921
+ return {"verified": False, "selector": None, "screenshot": screenshot_path, "details": "could_not_find_element"}
922
+
923
+ # 2) get bbox
924
+ bbox = await get_bounding_box(selector, browser_session)
925
+ if not bbox or bbox.get('width') is None:
926
+ return {"verified": False, "selector": selector, "screenshot": screenshot_path, "details": "could_not_get_bbox"}
927
+
928
+ # 3) load screenshot and compare sizes
929
+ try:
930
+ img = Image.open(screenshot_path)
931
+ w, h = img.size
932
+ except Exception as e:
933
+ return {"verified": False, "selector": selector, "screenshot": screenshot_path, "details": f"could_not_open_image: {e}"}
934
+
935
+ # Compare pixel sizes to bbox width/height
936
+ bw = int(round(bbox['width']))
937
+ bh = int(round(bbox['height']))
938
+
939
+ if abs(bw - w) <= tolerance and abs(bh - h) <= tolerance:
940
+ return {"verified": True, "selector": selector, "screenshot": screenshot_path, "details": "size_match"}
941
+ else:
942
+ return {"verified": False, "selector": selector, "screenshot": screenshot_path, "details": {"bbox": bbox, "image_size": [w, h]}}
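The size check at the end of `verify_element_visual` compares CSS-pixel bounding-box dimensions with the image's pixel dimensions. On a high-DPI display the captured image may be larger than the CSS box by the device pixel ratio, so the sketch below makes that factor an explicit parameter. `sizes_match` and `device_pixel_ratio` are assumptions added for illustration. The comparison itself follows the diff.

```python
def sizes_match(bbox, image_size, tolerance=20, device_pixel_ratio=1.0):
    """Return True if the image size is within `tolerance` pixels of the bbox.

    bbox holds CSS-pixel 'width' and 'height'; image_size is (width, height)
    in image pixels. device_pixel_ratio scales the bbox before comparing.
    """
    expected_w = round(bbox["width"] * device_pixel_ratio)
    expected_h = round(bbox["height"] * device_pixel_ratio)
    actual_w, actual_h = image_size
    return (
        abs(expected_w - actual_w) <= tolerance
        and abs(expected_h - actual_h) <= tolerance
    )


print(sizes_match({"width": 300.4, "height": 120.0}, (310, 118)))
```

A size match only shows that the screenshot has roughly the right dimensions. It does not confirm that the pixels depict the intended element, so it is best treated as a cheap first filter.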
943
+
944
+
945
+
946
+
947
+
948
+
949
+
950
+ task_old_1 = f"""
951
  You are a Browser Intelligence Agent specialized in extracting website content and brand identity assets.
952
+ Your goal is to visit the given website URL and return a structured, comprehensive extraction.
953
 
954
  Follow these steps strictly:
955
 
956
+ 1. Website Navigation:
957
+ - Open the provided URL.
958
+ - If a user query is provided, search across multiple related internal pages (navigation links, relevant subpages) that may contain information about the query.
959
+ - If no query is provided, focus on the landing page only.
960
 
961
  2. Content Extraction:
962
+ - If a query is provided:
963
+ Extract and summarize text relevant to the query from all visited pages.
964
+ Provide a coherent summary that highlights key points across pages.
965
+ - If no query:
966
+ Extract the full visible text from the landing page.
 
967
 
968
  3. Brand & Design Extraction:
969
+ - Identify and extract the brand's visual identity, including:
970
+ Primary and secondary colors (hex codes).
971
+ Extended color palette if available.
972
+ Typography (fonts, weights, styles).
973
+ Design system or style guide elements.
974
+ Social media brand kit details (logos, icons, button styles, heading styles).
975
+
976
+ 4. Screenshots (via custom tools):
977
+ - Capture screenshots of **topic-related content** (e.g., pricing tables, signup buttons, hero sections if the query is “pricing plans”).
978
+ - Capture screenshots of **brand identity elements** (e.g., color swatches, typography samples, buttons, logos, icons, headings).
979
+ - Save screenshots with clear, descriptive filenames (e.g., `pricing_table.png`, `signup_button.png`, `primary_colors.png`, `typography_styles.png`).
980
 
981
  5. Output:
982
+ - Return the extracted content, brand identity data, and screenshot metadata in a clean and structured JSON format.
983
+ - Do not include free text or commentary outside the JSON.
984
 
985
  Today is {datetime.now().strftime('%Y-%m-%d')}
986
 
987
+ User's query: Go to https://github.com/pricing and extract content, brand identity assets, and screenshots for a LinkedIn post. The topic is pricing plans.
988
  """
989
 
990
+ task_old_2="""
991
+
992
+ ### Selector Discovery, Verification & Screenshot Instructions
993
+
994
+ When identifying selectors for taking screenshots of elements or sections:
995
+ Verify each selector's element or section, then capture its screenshot immediately after successful verification.
996
+
997
+ 1. **Analyze** the HTML DOM structure of the page to identify potential selectors for the target elements or sections based on the query.
998
+ 2. **Generate** a list of possible selectors that could uniquely identify each target element.
999
+ 3. **Locate the Target Section or Element:**
1000
+ - Identify the element or section that visually and contextually matches the target.
1001
+ - Focus on the most relevant container or element that directly represents the intended target — not its parent or unrelated siblings.
1002
+ 4. For each candidate selector:
1003
+ - Use the `"execute_js"` tool to verify that the selector matches exactly the target.
1004
+ - **Highlight** the matched element by injecting a visible red border (`2px solid red`) or a temporary background color.
1005
+ 5. **Validate the Finalized Selector Against the Query:**
1006
+ - Once a selector is finalized, confirm that it accurately represents the element or section described in the query.
1007
+ - Ensure it precisely corresponds to the query intent and does not include unrelated, broader, or nested regions.
1008
+ 6. **Remove injected visual styles or modifications** from the DOM to restore the page to its original state before proceeding to the next selector.
1009
+ 7. **After verification**, immediately **capture a screenshot** of the verified element or section.
1010
+ 8. Continue this process until **all target selectors** have been verified and their screenshots captured.
1011
 
1012
 
1013
+ After successful verification, remove all injected visual styles or temporary DOM modifications.
1014
+ User's query: Go to https://github.com/pricing and take screenshot of header and pricing details
1015
+ """
1016
+
1017
 
1018
  class PageVisited(BaseModel):
1019
  url: str
 
1072
 
1073
 
1074
  async def run_search() -> None:
1075
+ print('====================================================')
1076
+ print('Starting run_search() function')
1077
+ print('====================================================')
1078
+
1079
+ # Check installed packages that might be relevant
1080
+ try:
1081
+ import importlib
1082
+ packages = ['browser_use', 'playwright', 'aiohttp']
1083
+ for package in packages:
1084
+ try:
1085
+ mod = importlib.import_module(package)
1086
+ print(f"✅ {package} is installed: {getattr(mod, '__version__', 'unknown version')}")
1087
+ except ImportError:
1088
+ print(f"❌ {package} is NOT installed")
1089
+ except Exception as e:
1090
+ print(f"Error checking packages: {e}")
1091
+
1092
+ # Check environment variables (redacted for security)
1093
+ for key in ['GEMINI_API_KEY', 'OPENROUTER_API_KEY']:
1094
+ if os.environ.get(key):
1095
+ print(f"✅ {key} environment variable is set")
1096
+ else:
1097
+ print(f"❌ {key} environment variable is NOT set")
1098
+
1099
+ chrome_process = None
1100
+ browser_session = None
1101
+
1102
+ try:
1103
+ # Launch the browser via BrowserSession so only the agent opens a window.
1104
+ print('🔄 Launching browser via BrowserSession (agent-managed launch)')
1105
+ browser_profile = BrowserProfile(
1106
+ is_local=True,
1107
+ headless=False,
1108
+ launch_args=[
1109
+ '--no-first-run',
1110
+ '--no-default-browser-check',
1111
+ '--disable-extensions',
1112
+ '--disable-background-networking',
1113
+ '--disable-background-timer-throttling',
1114
+ '--disable-backgrounding-occluded-windows',
1115
+ '--disable-popup-blocking',
1116
+ '--disable-renderer-backgrounding',
1117
+ '--force-color-profile=srgb',
1118
+ '--metrics-recording-only',
1119
+ '--mute-audio',
1120
+ ],
1121
+ )
1122
 
1123
+ print('Creating BrowserSession (this will launch Chrome once, managed by browser-use)')
1124
+ browser_session = BrowserSession(browser_profile=browser_profile)
1125
+ print(f"✅ Browser session created successfully: {browser_session}")
1126
+
1127
+ # Build the Browser Agent using the created session. Skip internal launch to avoid duplicates.
1128
+ print('🔄 Creating Browser Agent with provided BrowserSession...')
1129
+ browser_agent = AgentBrowser(
1130
+ task=task,
1131
+ llm=llm_browser,
1132
+ use_vision=True,
1133
+ generate_gif=False,
1134
+ max_failures=3,
1135
+ file_system_path="./browser_agent_data",
1136
+ tools=tools,
1137
+ output_model_schema=BrowserAgentOutput,
1138
+ browser_session=browser_session,
1139
+ skip_browser_launch=True,
1140
+ )
1141
+ print('✅ Browser Agent created with provided session')
1142
+
1143
+ print('🚀 Running browser agent...')
1144
+ try:
1145
+ print("Starting browser agent.run() with max_steps=15")
1146
+ history = await browser_agent.run(max_steps=15)
1147
+ print("-------------Agent run completed---------------")
1148
+ print("Steps executed:", len(history.steps) if hasattr(history, 'steps') else "Unknown")
1149
+ print("-------------Final result---------------")
1150
+ print(history.final_result())
1151
+ except Exception as run_error:
1152
+ print(f'❌ Error during browser agent run: {type(run_error).__name__}: {run_error}')
1153
+ import traceback
1154
+ print("Detailed traceback:")
1155
+ traceback.print_exc()
1156
+ raise
1157
+ except Exception as e:
1158
+ print(f'❌ Error: {e}')
1159
+ raise
1160
+ finally:
1161
+ # Clean up resources in proper order
1162
+ print('🧹 Cleaning up resources...')
1163
+
1164
+ # First close the browser session which will close its page
1165
+ try:
1166
+ if browser_session:
1167
+ print(f"Attempting to close browser session: {browser_session}")
1168
+ await browser_session.close()
1169
+ print('✅ Closed browser session')
1170
+ else:
1171
+ print('ℹ️ No browser session was created')
1172
+ except Exception as e:
1173
+ print(f'⚠️ Error closing browser session: {type(e).__name__}: {e}')
1174
+ import traceback
1175
+ traceback.print_exc()
1176
+
1177
+ # Then close the playwright browser
1178
+ if playwright_browser:
1179
+ try:
1180
+ print(f"Attempting to close Playwright browser: {playwright_browser}")
1181
+ await playwright_browser.close()
1182
+ print('✅ Closed Playwright browser')
1183
+ except Exception as e:
1184
+ print(f'⚠️ Error closing Playwright browser: {type(e).__name__}: {e}')
1185
+ import traceback
1186
+ traceback.print_exc()
1187
+
1188
+ # Finally terminate the Chrome process
1189
+ if chrome_process:
1190
+ try:
1191
+ print(f"Attempting to terminate Chrome process (PID: {chrome_process.pid})")
1192
+ chrome_process.terminate()
1193
+ print("Waiting for Chrome process to exit (timeout: 5s)")
1194
+ await asyncio.wait_for(chrome_process.wait(), 5)
1195
+ print('✅ Terminated Chrome process')
1196
+ except asyncio.TimeoutError:
1197
+ print('⚠️ Chrome process did not exit after 5s timeout, forcing kill')
1198
+ chrome_process.kill()
1199
+ print("Sent SIGKILL to Chrome process")
1200
+ except Exception as e:
1201
+ print(f'⚠️ Error terminating Chrome process: {type(e).__name__}: {e}')
1202
+ import traceback
1203
+ traceback.print_exc()
1204
+
1205
+ # Check if Chrome is still running via CDP
1206
+ try:
1207
+ print("Checking if Chrome CDP is still accessible...")
1208
+ async with aiohttp.ClientSession() as session:
1209
+ async with session.get('http://localhost:9222/json/version', timeout=aiohttp.ClientTimeout(total=1)) as response:
1210
+ if response.status == 200:
1211
+ print('⚠️ WARNING: Chrome with CDP is still running after cleanup!')
1212
+ else:
1213
+ print('✅ Chrome CDP no longer accessible (status code != 200)')
1214
+ except Exception:
1215
+ print('✅ Chrome CDP no longer accessible (connection failed)')
1216
+
1217
+ print('✅ All cleanup complete')
1218
 
1219
 
1220
 
src/utils/chrome_playwright.py ADDED
@@ -0,0 +1,142 @@
1
+ import os
2
+ import tempfile
3
+ import asyncio
4
+ import aiohttp
5
+ from playwright.async_api import async_playwright
6
+
7
+ async def start_chrome_with_debug_port(port: int = 9222):
8
+ """
9
+ Start Chrome with remote debugging enabled.
10
+ Returns the Chrome process.
11
+ """
12
+ user_data_dir = tempfile.mkdtemp(prefix='chrome_cdp_')
13
+ print(f"Created temp user data dir: {user_data_dir}")
14
+
15
+ chrome_paths = [
16
+ r'C:\Program Files\Google\Chrome\Application\chrome.exe',
17
+ 'chrome.exe',
18
+ 'chrome',
19
+ ]
20
+
21
+ chrome_exe = None
22
+ print(f"Looking for Chrome executable in these locations: {chrome_paths}")
23
+ for path in chrome_paths:
24
+ if os.path.exists(path):
25
+ print(f"Found Chrome at: {path}")
26
+ try:
27
+ print(f"Testing executable: {path}")
28
+ test_proc = await asyncio.create_subprocess_exec(
29
+ path, '--version', stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
30
+ )
31
+ stdout, stderr = await test_proc.communicate()
32
+ if test_proc.returncode == 0:
33
+ version = stdout.decode().strip() if stdout else "Unknown version"
34
+ print(f"Chrome executable works! Version: {version}")
35
+ chrome_exe = path
36
+ break
37
+ else:
38
+ error = stderr.decode().strip() if stderr else "Unknown error"
39
+ print(f"Chrome executable test failed: {error}")
40
+ except Exception as e:
41
+ print(f"Error testing Chrome executable {path}: {e}")
42
+ continue
43
+ elif path in ['chrome', 'chromium', 'chrome.exe']:
44
+ print(f"Checking PATH for {path}")
45
+ try:
46
+ test_proc = await asyncio.create_subprocess_exec(
47
+ path, '--version', stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
48
+ )
49
+ stdout, stderr = await test_proc.communicate()
50
+ if test_proc.returncode == 0:
51
+ version = stdout.decode().strip() if stdout else "Unknown version"
52
+ print(f"Chrome executable works via PATH! Version: {version}")
53
+ chrome_exe = path
54
+ break
55
+ else:
56
+ error = stderr.decode().strip() if stderr else "Unknown error"
57
+ print(f"Chrome executable test via PATH failed: {error}")
58
+ except Exception as e:
59
+ print(f"Error testing Chrome executable via PATH {path}: {e}")
60
+ continue
61
+
62
+ if not chrome_exe:
63
+ raise RuntimeError('❌ Chrome not found. Please install Chrome or Chromium.')
64
+
65
+ cmd = [
66
+ chrome_exe,
67
+ f'--remote-debugging-port={port}',
68
+ f'--user-data-dir={user_data_dir}',
69
+ '--no-first-run',
70
+ '--no-default-browser-check',
71
+ '--disable-extensions',
72
+ '--disable-background-networking',
73
+ '--disable-background-timer-throttling',
74
+ '--disable-backgrounding-occluded-windows',
75
+ '--disable-breakpad',
76
+ '--disable-component-extensions-with-background-pages',
77
+ '--disable-features=TranslateUI,BlinkGenPropertyTrees',
78
+ '--disable-ipc-flooding-protection',
79
+ '--disable-popup-blocking',
80
+ '--disable-prompt-on-repost',
81
+ '--disable-renderer-backgrounding',
82
+ '--force-color-profile=srgb',
83
+ '--metrics-recording-only',
84
+ '--mute-audio',
85
+ 'about:blank',
86
+ ]
87
+
88
+ print(f"Starting Chrome with command: {cmd}")
89
+ process = await asyncio.create_subprocess_exec(*cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)
90
+ print(f"Chrome process started with PID: {process.pid}")
91
+
92
+ print(f"Waiting for Chrome CDP to be available at http://localhost:{port}/json/version...")
93
+ cdp_ready = False
94
+ for attempt in range(20):
95
+ try:
96
+ async with aiohttp.ClientSession() as session:
97
+ print(f"CDP check attempt {attempt+1}/20...")
98
+ async with session.get(
99
+ f'http://localhost:{port}/json/version', timeout=aiohttp.ClientTimeout(total=1)
100
+ ) as response:
101
+ if response.status == 200:
102
+ data = await response.json()
103
+ print(f"CDP connected successfully! Chrome version: {data.get('Browser', 'Unknown')}")
104
+ cdp_ready = True
105
+ break
106
+ else:
107
+ print(f"CDP check failed with status: {response.status}")
108
+ except Exception as e:
109
+ print(f"CDP check failed with error: {type(e).__name__}: {e}")
110
+ await asyncio.sleep(1)
111
+
112
+ if not cdp_ready:
113
+ print(f"ERROR: Chrome DevTools Protocol not available after timeout on port {port}")
114
+ process.terminate()  # stop Chrome first; communicate() blocks until the process exits
115
+ stdout_data, stderr_data = await process.communicate()
116
+ print(f"Chrome STDOUT: {stdout_data.decode('utf-8', errors='ignore')}")
117
+ print(f"Chrome STDERR: {stderr_data.decode('utf-8', errors='ignore')}")
118
+ raise RuntimeError('❌ Chrome failed to start with CDP')
119
+
120
+ return process
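The executable lookup in `start_chrome_with_debug_port` tests one hard-coded Windows path and then tries bare names by spawning them. A more compact alternative is sketched below using the standard library's `shutil.which`, which resolves a name against `PATH` without starting a process. `find_executable` is a hypothetical helper, and the candidate names in the demo are assumptions rather than values from the commit.

```python
from __future__ import annotations

import os
import shutil
import sys


def find_executable(candidates: list[str]) -> str | None:
    """Return the first candidate that is an existing file or resolves on PATH."""
    for candidate in candidates:
        # Absolute or relative paths that already point at a file.
        if os.path.isfile(candidate):
            return candidate
        # Bare names such as 'chrome' are looked up on PATH.
        resolved = shutil.which(candidate)
        if resolved:
            return resolved
    return None


# Candidate list for illustration only; adjust per platform.
chrome = find_executable([
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    "google-chrome",
    "chromium",
    "chrome",
])
print(chrome if chrome else "Chrome not found")
```

This only locates the binary. Running it with `--version`, as the diff does, is still a reasonable follow-up to confirm that it actually starts.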
121
+
122
+ async def connect_playwright_to_cdp(cdp_url: str):
123
+ """
124
+ Connect Playwright to the same Chrome instance Browser-Use is using.
125
+ Returns the Playwright browser and page.
126
+ """
127
+ print(f"Connecting Playwright to CDP URL: {cdp_url}")
128
+ playwright = await async_playwright().start()
129
+ playwright_browser = await playwright.chromium.connect_over_cdp(cdp_url)
130
+ print("Playwright connected to browser")
131
+
132
+ if playwright_browser and playwright_browser.contexts and playwright_browser.contexts[0].pages:
133
+ playwright_page = playwright_browser.contexts[0].pages[0]
134
+ print(f"Using existing page: {await playwright_page.title()}")
135
+ elif playwright_browser:
136
+ print("No existing pages found, creating a new context and page")
137
+ context = await playwright_browser.new_context()
138
+ playwright_page = await context.new_page()
139
+ else:
140
+ playwright_page = None
141
+ print("Playwright page setup complete")
142
+ return playwright_browser, playwright_page
uv.lock CHANGED
@@ -2175,6 +2175,25 @@ wheels = [
2175
  { url = "https://files.pythonhosted.org/packages/34/e7/ae39f538fd6844e982063c3a5e4598b8ced43b9633baa3a85ef33af8c05c/pillow-11.3.0-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:c84d689db21a1c397d001aa08241044aa2069e7587b398c8cc63020390b1c1b8", size = 6984598 },
2176
  ]
2177
 
2178
  [[package]]
2179
  name = "portalocker"
2180
  version = "2.10.1"
@@ -2606,6 +2625,18 @@ wheels = [
2606
  { url = "https://files.pythonhosted.org/packages/58/f0/427018098906416f580e3cf1366d3b1abfb408a0652e9f31600c24a1903c/pydantic_settings-2.10.1-py3-none-any.whl", hash = "sha256:a60952460b99cf661dc25c29c0ef171721f98bfcb52ef8d9ea4c943d7c8cc796", size = 45235 },
2607
  ]
2608
 
2609
  [[package]]
2610
  name = "pygments"
2611
  version = "2.19.2"
@@ -5733,6 +5764,7 @@ dependencies = [
5733
  { name = "openai-agents" },
5734
  { name = "pathlib" },
5735
  { name = "pillow" },
5736
  { name = "pydantic" },
5737
  { name = "pydantic-ai" },
5738
  { name = "urljoin" },
@@ -5758,6 +5790,7 @@ requires-dist = [
5758
  { name = "openai-agents", specifier = ">=0.2.8" },
5759
  { name = "pathlib", specifier = ">=1.0.1" },
5760
  { name = "pillow", specifier = ">=11.3.0" },
5761
  { name = "pydantic", specifier = ">=2.11.7" },
5762
  { name = "pydantic-ai", extras = ["logfire"], specifier = ">=1.0.1" },
5763
  { name = "urljoin", specifier = ">=1.0.0" },
 
2175
  { url = "https://files.pythonhosted.org/packages/34/e7/ae39f538fd6844e982063c3a5e4598b8ced43b9633baa3a85ef33af8c05c/pillow-11.3.0-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:c84d689db21a1c397d001aa08241044aa2069e7587b398c8cc63020390b1c1b8", size = 6984598 },
2176
  ]
2177
 
2178
+ [[package]]
2179
+ name = "playwright"
2180
+ version = "1.55.0"
2181
+ source = { registry = "https://pypi.org/simple" }
2182
+ dependencies = [
2183
+ { name = "greenlet" },
2184
+ { name = "pyee" },
2185
+ ]
2186
+ wheels = [
2187
+ { url = "https://files.pythonhosted.org/packages/80/3a/c81ff76df266c62e24f19718df9c168f49af93cabdbc4608ae29656a9986/playwright-1.55.0-py3-none-macosx_10_13_x86_64.whl", hash = "sha256:d7da108a95001e412effca4f7610de79da1637ccdf670b1ae3fdc08b9694c034", size = 40428109 },
2188
+ { url = "https://files.pythonhosted.org/packages/cf/f5/bdb61553b20e907196a38d864602a9b4a461660c3a111c67a35179b636fa/playwright-1.55.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:8290cf27a5d542e2682ac274da423941f879d07b001f6575a5a3a257b1d4ba1c", size = 38687254 },
2189
+ { url = "https://files.pythonhosted.org/packages/4a/64/48b2837ef396487807e5ab53c76465747e34c7143fac4a084ef349c293a8/playwright-1.55.0-py3-none-macosx_11_0_universal2.whl", hash = "sha256:25b0d6b3fd991c315cca33c802cf617d52980108ab8431e3e1d37b5de755c10e", size = 40428108 },
2190
+ { url = "https://files.pythonhosted.org/packages/08/33/858312628aa16a6de97839adc2ca28031ebc5391f96b6fb8fdf1fcb15d6c/playwright-1.55.0-py3-none-manylinux1_x86_64.whl", hash = "sha256:c6d4d8f6f8c66c483b0835569c7f0caa03230820af8e500c181c93509c92d831", size = 45905643 },
2191
+ { url = "https://files.pythonhosted.org/packages/83/83/b8d06a5b5721931aa6d5916b83168e28bd891f38ff56fe92af7bdee9860f/playwright-1.55.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:29a0777c4ce1273acf90c87e4ae2fe0130182100d99bcd2ae5bf486093044838", size = 45296647 },
2192
+ { url = "https://files.pythonhosted.org/packages/06/2e/9db64518aebcb3d6ef6cd6d4d01da741aff912c3f0314dadb61226c6a96a/playwright-1.55.0-py3-none-win32.whl", hash = "sha256:29e6d1558ad9d5b5c19cbec0a72f6a2e35e6353cd9f262e22148685b86759f90", size = 35476046 },
2193
+ { url = "https://files.pythonhosted.org/packages/46/4f/9ba607fa94bb9cee3d4beb1c7b32c16efbfc9d69d5037fa85d10cafc618b/playwright-1.55.0-py3-none-win_amd64.whl", hash = "sha256:7eb5956473ca1951abb51537e6a0da55257bb2e25fc37c2b75af094a5c93736c", size = 35476048 },
2194
+ { url = "https://files.pythonhosted.org/packages/21/98/5ca173c8ec906abde26c28e1ecb34887343fd71cc4136261b90036841323/playwright-1.55.0-py3-none-win_arm64.whl", hash = "sha256:012dc89ccdcbd774cdde8aeee14c08e0dd52ddb9135bf10e9db040527386bd76", size = 31225543 },
2195
+ ]
2196
+
2197
  [[package]]
2198
  name = "portalocker"
2199
  version = "2.10.1"
 
2625
  { url = "https://files.pythonhosted.org/packages/58/f0/427018098906416f580e3cf1366d3b1abfb408a0652e9f31600c24a1903c/pydantic_settings-2.10.1-py3-none-any.whl", hash = "sha256:a60952460b99cf661dc25c29c0ef171721f98bfcb52ef8d9ea4c943d7c8cc796", size = 45235 },
2626
  ]
2627
 
2628
+ [[package]]
2629
+ name = "pyee"
2630
+ version = "13.0.0"
2631
+ source = { registry = "https://pypi.org/simple" }
2632
+ dependencies = [
2633
+ { name = "typing-extensions" },
2634
+ ]
2635
+ sdist = { url = "https://files.pythonhosted.org/packages/95/03/1fd98d5841cd7964a27d729ccf2199602fe05eb7a405c1462eb7277945ed/pyee-13.0.0.tar.gz", hash = "sha256:b391e3c5a434d1f5118a25615001dbc8f669cf410ab67d04c4d4e07c55481c37", size = 31250 }
2636
+ wheels = [
2637
+ { url = "https://files.pythonhosted.org/packages/9b/4d/b9add7c84060d4c1906abe9a7e5359f2a60f7a9a4f67268b2766673427d8/pyee-13.0.0-py3-none-any.whl", hash = "sha256:48195a3cddb3b1515ce0695ed76036b5ccc2ef3a9f963ff9f77aec0139845498", size = 15730 },
2638
+ ]
2639
+
2640
  [[package]]
2641
  name = "pygments"
2642
  version = "2.19.2"
 
5764
  { name = "openai-agents" },
5765
  { name = "pathlib" },
5766
  { name = "pillow" },
5767
+ { name = "playwright" },
5768
  { name = "pydantic" },
5769
  { name = "pydantic-ai" },
5770
  { name = "urljoin" },
 
5790
  { name = "openai-agents", specifier = ">=0.2.8" },
5791
  { name = "pathlib", specifier = ">=1.0.1" },
5792
  { name = "pillow", specifier = ">=11.3.0" },
5793
+ { name = "playwright", specifier = ">=1.55.0" },
5794
  { name = "pydantic", specifier = ">=2.11.7" },
5795
  { name = "pydantic-ai", extras = ["logfire"], specifier = ">=1.0.1" },
5796
  { name = "urljoin", specifier = ">=1.0.0" },