Spaces: muhammadmaazuddin/smgp


muhammadmaazuddin committed on Oct 5, 2025

Commit

a5e74de

1 Parent(s): ed02112

worked

Files changed (8)

.gitignore +1 -0
browser_agent_data/browseruse_agent_data/extracted_content_0.md +0 -154
browser_agent_data/browseruse_agent_data/todo.md +0 -10
github_pricing_header.png +3 -0
pyproject.toml +1 -0
src/_agents.py +652 -91
src/utils/chrome_playwright.py +142 -0
uv.lock +33 -0

.gitignore CHANGED

@@ -11,3 +11,4 @@ wheels/
 .env
 images/
+tempimgs/

browser_agent_data/browseruse_agent_data/extracted_content_0.md DELETED

@@ -1,154 +0,0 @@
-<url>
-https://firebase.google.com/pricing
-</url>
-<query>
-Extract all pricing plan details, including plan names, features, and costs.
-</query>
-<result>
-**Pricing Plans:**
-**1. No-cost (Spark plan)**
-*   **Features:** Generous no-cost usage limits, no payment method needed.
-*   **Products & Costs:**
-    *   **A/B Testing:** No-cost
-    *   **Analytics:** No-cost
-    *   **App Check:** No-cost, subject to quotas and limits that vary based on attestation provider.
-    *   **App Distribution:** No-cost
-    *   **App Hosting:**
-        *   Outgoing bandwidth (Uncached/Cached): Not applicable
-        *   Storage: Not applicable
-        *   Cloud Products (Cloud Run, Cloud Build, Artifact Registry, Cloud Logging, Cloud Secrets Manager): Not applicable
-    *   **Authentication:**
-        *   Phone Auth - All regions: Not applicable
-        *   Other Authentication services: Included
-        *   With Identity Platform (Monthly active users): 50K MAUs
-        *   With Identity Platform (Monthly active users - SAML/OIDC): 50 MAUs
-    *   **Cloud Firestore (Standard edition):**
-        *   Stored data: 1 GiB total
-        *   Network egress: 10 GiB/month
-        *   Document writes: 20K writes/day
-        *   Document reads: 50K reads/day
-        *   Document deletes: 20K deletes/day
-    *   **Cloud Firestore (Enterprise edition):**
-        *   Stored data: 1 GiB total
-        *   Network egress: 10 GiB/month
-        *   Document writes - includes writes and deletes: 40K writes/day
-        *   Document reads: 50K reads/day
-    *   **Cloud Functions:** Not applicable for Invocations, GB-seconds, CPU-seconds, Outbound networking, Cloud Build minutes, Container storage in Artifact Registry.
-    *   **Cloud Messaging (FCM):** No-cost
-    *   **Cloud Storage (`*.appspot.com` legacy buckets):**
-        *   GB stored: 5 GB
-        *   GB downloaded: 1 GB/day
-        *   Upload operations: 20K/day
-        *   Download operations: 50K/day
-        *   Multiple buckets per project: Not included
-    *   **Cloud Storage (`*.firebasestorage.app` and any additional buckets):** Not applicable for GB stored, GB downloaded, Upload operations, Download operations, Multiple buckets per project.
-    *   **Crashlytics:** No-cost
-    *   **Data Connect:** Not applicable for Network egress, Operation count, Cloud SQL for PostgreSQL.
-    *   **Hosting:**
-        *   Storage: 10 GB
-        *   Data transfer: 360 MB/day
-        *   Custom domain & SSL: Included
-        *   Multiple sites per project: Included
-    *   **In-App Messaging:** No-cost
-    *   **Firebase ML:**
-        *   Custom Model Deployment: Included
-        *   Cloud Vision APIs: Not included
-    *   **Performance Monitoring:** No-cost
-    *   **Realtime Database:**
-        *   Simultaneous connections: 100
-        *   GB stored: 1 GB
-        *   GB downloaded: 10 GB/month
-        *   Multiple databases per project: Not included
-    *   **Remote Config:** No-cost
-    *   **Test Lab:**
-        *   Virtual Device Tests: 10 tests/day
-        *   Physical Device Tests: 5 tests/day
-        *   Android Device Streaming: 30 no-cost minutes per project, per month
-    *   **Firebase AI Logic client SDKs:** Included
-    *   **Google Cloud (BigQuery):** Included (sandbox limits)
-    *   **Google Cloud (Other IaaS):** Not included
-    *   **Gemini in Firebase:** No-cost for individuals or groups not using Google Workspace. Google Workspace users require a valid Gemini Code Assist subscription.
-    *   **Firebase Studio:** No-cost for three workspaces. Google Developer Program members can create: Standard (no-cost): 10 workspaces; Premium: 30 workspaces and an increased Gemini quota for the App Prototyping agent.
-**2. Pay as you go (Blaze plan)**
-*   **Features:** Eligible developers can claim $300 of credits to get started, no-cost usage limits from Spark plan included*.
-*   **Products & Costs:**
-    *   **A/B Testing:** No-cost
-    *   **Analytics:** No-cost
-    *   **App Check:** No-cost, subject to quotas and limits that vary based on attestation provider.
-    *   **App Distribution:** No-cost
-    *   **App Hosting:** (Starting August 1, 2025)
-        *   Outgoing bandwidth (Uncached): No-cost up to 10 GiB/month, then $0.20/GiB
-        *   Outgoing bandwidth (Cached): No-cost up to 10 GiB/month, then $0.15/GiB
-        *   Storage: No-cost up to 5 GB, then $0.10/GB
-        *   Cloud Products (Cloud Run, Cloud Build, Artifact Registry, Cloud Logging, Cloud Secrets Manager): Billed at Google Cloud pricing (links provided for each).
-    *   **Authentication:**
-        *   Phone Auth - All regions: Billed per SMS sent (see current rates)
-        *   Other Authentication services: Included
-        *   With Identity Platform (Monthly active users): No-cost up to 50K MAUs, then Google Cloud pricing
-        *   With Identity Platform (Monthly active users - SAML/OIDC): No-cost up to 50 MAUs, then Google Cloud pricing
-    *   **Cloud Firestore (Standard edition):**
-        *   Stored data: No-cost up to 1 GiB total, then Google Cloud pricing
-        *   Network egress: No-cost up to 10 GiB/month, then Google Cloud pricing
-        *   Document writes: No-cost up to 20K writes/day, then Google Cloud pricing
-        *   Document reads: No-cost up to 50K reads/day, then Google Cloud pricing
-        *   Document deletes: No-cost up to 20K deletes/day, then Google Cloud standard edition pricing
-    *   **Cloud Firestore (Enterprise edition):**
-        *   Stored data: No-cost up to 1 GiB total, then Google Cloud enterprise edition pricing
-        *   Network egress: No-cost up to 10 GiB/month, then Google Cloud enterprise edition pricing
-        *   Document writes - includes writes and deletes: No-cost up to 40K writes/day, then Google Cloud enterprise edition pricing
-        *   Document reads: No-cost up to 50K reads/day, then Google Cloud enterprise edition pricing
-    *   **Cloud Functions:**
-        *   Invocations: No-cost up to 2M/month, then $0.40/million
-        *   GB-seconds: No-cost up to 400K/month, then Google Cloud pricing
-        *   CPU-seconds: No-cost up to 200K/month, then Google Cloud pricing
-        *   Outbound networking: No-cost up to 5 GB/month, then $0.12/GB
-        *   Cloud Build minutes: No-cost up to 120 min/day, then $0.003/min
-        *   Container storage in Artifact Registry: No-cost up to 500MB of storage, then Google Cloud pricing (pricing varies based on location)
-    *   **Cloud Messaging (FCM):** No-cost
-    *   **Cloud Storage (`*.appspot.com` legacy buckets):**
-        *   GB stored: No-cost up to 5 GB, then $0.026/GB
-        *   GB downloaded: No-cost up to 1 GB/day, then $0.12/GB
-        *   Upload operations: No-cost up to 20K/day, then $0.05/10K
-        *   Download operations: No-cost up to 50K/day, then $0.004/10K
-        *   Multiple buckets per project: Included
-    *   **Cloud Storage (`*.firebasestorage.app` and any additional buckets):** (No-cost quotas only for `us-central1`, `us-west1`, `us-east1`)
-        *   GB stored: No-cost up to 5 GB-months, then Cloud Storage pricing
-        *   GB downloaded: No-cost up to 100 GB/month, then Cloud Storage pricing
-        *   Upload operations: No-cost up to 5K/month, then Cloud Storage pricing
-        *   Download operations: No-cost up to 50K/month, then Cloud Storage pricing
-        *   Multiple buckets per project: Included
-    *   **Crashlytics:** No-cost
-    *   **Data Connect:**
-        *   Network egress: No-cost up to 10 GiB/month, then Google Cloud Internet Data Transfer Rate Premium Tier pricing
-        *   Operation count: No-cost up to 250K operations per month, then $4.00 per million operations
-        *   Cloud SQL for PostgreSQL: 3 month no-cost trial for the first default Cloud SQL instance, then starting as low as $9.37/month (pricing varies based on regions and configurations, see Google Cloud pricing).
-    *   **Hosting:**
-        *   Storage: No-cost up to 10 GB, then $0.026/GB
-        *   Data transfer: No-cost up to 360 MB/day, then $0.15/GB
-        *   Custom domain & SSL: Included
-        *   Multiple sites per project: Included
-    *   **In-App Messaging:** No-cost
-    *   **Firebase ML:** (First 1000 Cloud Vision API calls/month have no costs)
-        *   Custom Model Deployment: Included
-        *   Cloud Vision APIs: $1.50/K (see Cloud Vision pricing)
-    *   **Performance Monitoring:** No-cost
-    *   **Realtime Database:**
-        *   Simultaneous connections: 200K per database
-        *   GB stored: No-cost up to 1 GB, then $5/GB
-        *   GB downloaded: No-cost up to 10 GB/month, then $1/GB
-        *   Multiple databases per project: Included
-    *   **Remote Config:** No-cost
-    *   **Test Lab:** (Charged for testing time only, rounded up to the nearest minute)
-        *   Virtual Device Tests: No-cost up to 60 min/day, then $1/device/hour
-        *   Physical Device Tests: No-cost up to 30 min/day, then $5/device/hour
-        *   Android Device Streaming: 30 no-cost minutes per project, per month, then 15 cents for each additional minute
-    *   **Firebase AI Logic client SDKs:** Billed according to current Google Cloud or Gemini Developer API pricing
-    *   **Google Cloud (BigQuery):** Included
-    *   **Google Cloud (Other IaaS):** Included
-    *   **Gemini in Firebase:** No-cost for individuals or groups not using Google Workspace. Google Workspace users require a valid Gemini Code Assist subscription.
-    *   **Firebase Studio:** No-cost for three workspaces. Google Developer Program members can create: Standard (no-cost): 10 workspaces; Premium: 30 workspaces and an increased Gemini quota for the App Prototyping agent.
-*Note: No-cost usage on Blaze plan is calculated daily. Details differ slightly for Cloud Functions, Firebase ML, Phone Auth, and Test Lab. No-cost usage quotas apply at the project-level, not at the app-level or for individual resources.*
-</result>
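
Most Blaze-plan entries in the deleted snapshot above share one shape: a no-cost quota, then a per-unit rate. As a rough sketch of how such a line turns into a charge, the helper below applies that rule to two figures quoted in the snapshot. `overage_cost` is a hypothetical name and not part of this repository. The sketch assumes simple linear billing past the quota and ignores the daily calculation mentioned in the snapshot's note; the prices are the snapshot's own and may be out of date.

```python
from decimal import Decimal


def overage_cost(usage: Decimal, free_quota: Decimal, rate_per_unit: Decimal) -> Decimal:
    """Cost of usage beyond a no-cost quota, billed linearly per unit (assumed model)."""
    billable = max(usage - free_quota, Decimal("0"))
    return billable * rate_per_unit


# Cloud Functions invocations: no-cost up to 2M/month, then $0.40 per million.
# The unit here is "millions of invocations".
invocations_cost = overage_cost(Decimal("5"), Decimal("2"), Decimal("0.40"))

# Hosting storage: no-cost up to 10 GB, then $0.026 per GB.
hosting_cost = overage_cost(Decimal("25"), Decimal("10"), Decimal("0.026"))

print(f"5M invocations: ${invocations_cost}")
print(f"25 GB hosting storage: ${hosting_cost}")
```

Under these assumptions, 5M invocations leave 3M billable, and 25 GB of hosting storage leaves 15 GB billable. Usage at or below the quota costs nothing.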

browser_agent_data/browseruse_agent_data/todo.md CHANGED Viewed

@@ -1,10 +0,0 @@
-# Firebase Pricing and Brand Identity Extraction
-## Goal: Extract pricing plan content and brand identity assets for a LinkedIn post.
-## Tasks:
-- [ ] Navigate to https://firebase.google.com/pricing
-- [x] Extract content related to pricing plans.
-- [x] Extract brand's visual identity (primary/secondary colors, full palette, typography, design system elements, social media brand kit details).
-- [ ] Format and return the extracted data in the specified JSON schema.
-- [ ] Call done action.

github_pricing_header.png ADDED

Git LFS Details

SHA256: 0be4f0455583cf7a2bb6b88bbaed994bf81fb3d198754379fd1cd806fd1c570b
Pointer size: 130 Bytes
Size of remote file: 16.4 kB

pyproject.toml CHANGED

@@ -23,6 +23,7 @@ dependencies = [
     "openai-agents>=0.2.8",
     "pathlib>=1.0.1",
     "pillow>=11.3.0",
+    "playwright>=1.55.0",
     "pydantic>=2.11.7",
     "pydantic-ai[logfire]>=1.0.1",
     "urljoin>=1.0.0",

src/_agents.py CHANGED

@@ -1,33 +1,37 @@
 # type: ignore
-from agents import Agent, RunContextWrapper
-from model import get_model
 import os
-from dotenv import load_dotenv
-from agents import Agent, AsyncOpenAI, Runner,function_tool, AgentHooks, RunHooks, TContext
-from model import get_model
-from typing import Any, Optional, Dict
 import re
 import requests
-from markdownify import markdownify
 from requests.exceptions import RequestException
-from langchain_community.tools import DuckDuckGoSearchResults
 from bs4 import BeautifulSoup
 from urllib.parse import urljoin
 from langchain_core.output_parsers import JsonOutputParser
-import json
-import time
-import fal_client
-from PIL import Image
-from io import BytesIO
-from IPython.display import display
 from google import genai
-import logging
-import asyncio
-from datetime import datetime
-from browser_use import Agent as AgentBrowser, ChatGoogle, ChatOpenAI as ChatOpenAIBrowserUse, BrowserSession
-from pathlib import Path
 # anchor_client = Anchorbrowser(
 #     api_key=os.getenv("ANCHOR_API_KEY")
 # )
@@ -87,9 +91,6 @@ content_agent = Agent(
 post_schema = """
 {
   "meta": {
@@ -301,7 +302,7 @@ You are Media Agent, a professional and specialized in creating social media for
 Your task:
 1. Receive a high-level user brief describing a social media post idea.
-2. Generate a detailed DesignSpec (JSON structured specification) from the brief using 'generate_designSpec_from_brief', including platform, style, content, visuals, colors, typography, composition, lighting, mood, and finishing details.
 3. Using the generated DesignSpec, create a high-quality, brand-aligned social media image using 'generate_post_image' tool, (Don't change the schema use same as generated)
 Be concise, professional, and strictly follow the structured DesignSpec and design guidelines provided.
@@ -481,106 +482,538 @@ WebInspectorAgent = Agent(
 llm = ChatGoogle(model="gemini-2.5-flash", api_key=os.getenv("GEMINI_API_KEY"))
-# llm = ChatOpenAIBrowserUse(
-# 	model='openai/gpt-4.1-mini',
-# 	base_url='https://openrouter.ai/api/v1',
-# 	api_key=os.getenv('OPENROUTER_API_KEY'),
-# )
-import asyncio
-from datetime import datetime
-from pathlib import Path
-from pydantic import BaseModel, Field
-from browser_use import Tools, ActionResult
-from browser_use.browser import BrowserSession
-# from playwright.async_api import Page
-import os
-# Reuse the same Tools instance
 tools = Tools()
 class ElementScreenshotParams(BaseModel):
-    selector: str = Field(
-        ..., description="CSS selector for the element (e.g., '#login-button')"
     )
     filename: str = Field(
-        default="element_screenshot.png", description="Output filename"
     )
 @tools.action(
-    description="Capture a screenshot of a specific element on the page using its CSS selector.",
 )
 async def element_screenshot(params: ElementScreenshotParams, browser_session: BrowserSession) -> ActionResult:
     try:
-        page = browser_session.page
-        output_path = os.path.join(browser_session.file_system_path, params.filename)
-        element = page.locator(params.selector)
-        # Wait for element to be visible
-        await element.wait_for(state="visible", timeout=5000)
-        await element.screenshot(path=output_path)
-        success_msg = f"Element screenshot saved at: {output_path} (selector: {params.selector})"
         return ActionResult(
             extracted_content=success_msg,
             include_in_memory=True,
-            long_term_memory=f"Element screenshot taken: {params.selector} -> {output_path}",
-            vision_content=[{"type": "image", "path": output_path}]  # For vision analysis
         )
     except Exception as e:
-        return ActionResult(error=f"Element screenshot failed: {str(e)}. Check selector: {params.selector}")
-task = f"""
 You are a Browser Intelligence Agent specialized in extracting website content and brand identity assets.
-Your job is to always return structured JSON output in the given schema.
 Follow these steps strictly:
-1. Visit the given website URL.
 2. Content Extraction:
-   - If the user provides a query:
-     • Search across multiple related pages within the same domain (navigation links, internal links, related pages).
-     • Extract only the relevant text or sections that match the query.
-     • Summarize results across all visited pages into a single coherent output.
-   - If no query is provided:
-     • Extract the full visible text from the landing page only.
 3. Brand & Design Extraction:
-   - Extract the brand's visual identity:
-     - Primary and secondary theme colors (hex codes).
-     - Full palette if available.
-     - Typography (fonts, weights, styles).
-     - Design System or Style Guide elements.
-     - Social Media Brand Kit details (logos, icons, button styles, heading styles).
-4. Screenshots (Custom Tools via browser_use):
-   - If the user specifies components (e.g., “screenshot all buttons” or “screenshot hero section”), locate those elements and take full-resolution screenshots.
-   - Save screenshots with meaningful names (e.g., `button_styles.png`, `hero_banner.png`).
-   - If no specific component is requested, skip this step.
 5. Output:
-   - Always return the result in this JSON schema:
 Today is {datetime.now().strftime('%Y-%m-%d')}
-User's query: Go to https://firebase.google.com/pricing and extract content and brand identity assets for linkedin post, Topic is pricing plans.
 """
 class PageVisited(BaseModel):
     url: str
@@ -639,21 +1072,149 @@ class BrowserAgentOutput(BaseModel):
 async def run_search() -> None:
-    print('requested to run search')
-    browser_agent = AgentBrowser(
-        task=task,
-        llm=llm,
-        use_vision=True,
-        generate_gif=False,
-        # extend_system_message="Use the execute_js tool for extracting data/information from websites.",
-        max_failures=3,
-        file_system_path="./browser_agent_data",
-        tools=tools,
-        output_model_schema=BrowserAgentOutput,
-    )
-    history = await browser_agent.run(max_steps=15)
-    print(history.final_result)

 # type: ignore
 import os
+import sys
+import time
+import json
+import logging
+import asyncio
 import re
+from playwright.async_api import TimeoutError as PlaywrightTimeoutError
+import aiohttp
+from typing import Any, Optional, Dict
+from datetime import datetime
+from pathlib import Path
+from dotenv import load_dotenv
+from pydantic import BaseModel, Field, conint
+from PIL import Image
+from io import BytesIO
+from IPython.display import display
 import requests
+import base64
 from requests.exceptions import RequestException
+from markdownify import markdownify
 from bs4 import BeautifulSoup
 from urllib.parse import urljoin
+from langchain_community.tools import DuckDuckGoSearchResults
 from langchain_core.output_parsers import JsonOutputParser
 from google import genai
+import fal_client
+from agents import Agent, AsyncOpenAI, Runner, function_tool, RunContextWrapper, AgentHooks, RunHooks, TContext
+from model import get_model
+from browser_use import Agent as AgentBrowser, ChatGoogle, ChatOpenAI as ChatOpenAIBrowserUse, Tools, ActionResult
+from browser_use.browser import BrowserSession, BrowserProfile
+from utils.chrome_playwright import start_chrome_with_debug_port, connect_playwright_to_cdp
 # anchor_client = Anchorbrowser(
 #     api_key=os.getenv("ANCHOR_API_KEY")
 # )
 post_schema = """
 {
   "meta": {
 Your task:
 1. Receive a high-level user brief describing a social media post idea.
+2. Generate a detailed DesignSpec (JSON structured specification) from the brief using 'generate_designSpec_from_brief', including platform, style, content, visuals, colors, typography, composition, lighting, mood, and finishing requirements.
 3. Using the generated DesignSpec, create a high-quality, brand-aligned social media image using 'generate_post_image' tool, (Don't change the schema use same as generated)
 Be concise, professional, and strictly follow the structured DesignSpec and design guidelines provided.
 llm = ChatGoogle(model="gemini-2.5-flash", api_key=os.getenv("GEMINI_API_KEY"))
+llm_browser = ChatOpenAIBrowserUse(
+	model='openai/gpt-4.1',
+	base_url='https://openrouter.ai/api/v1',
+	api_key=os.getenv('OPENROUTER_API_KEY'),
+)
+# Global Playwright variables and Tools instance
+playwright_browser = None
+playwright_page = None
 tools = Tools()
 class ElementScreenshotParams(BaseModel):
+    selectors: list[str] = Field(
+        ...,
+        description="A list of CSS selectors to try for locating the element(s). The first valid selector will be used."
     )
     filename: str = Field(
+        default="element_screenshot.png",
+        description="Output filename for the screenshot."
+    )
+    highlight: bool = Field(
+        default=True,
+        description="If True, draw a red border around the element before taking the screenshot."
+    )
+    padding: conint(ge=0) = Field(
+        default=10,
+        description="Padding (in pixels) to add around the element in the screenshot."
+    )
+    scroll_if_needed: bool = Field(
+        default=True,
+        description="If True, scroll the element into view before taking the screenshot."
+    )
+    fallback_to_full_page: bool = Field(
+        default=True,
+        description="If no element is found, fallback to taking a full page screenshot."
     )
 @tools.action(
+    description="Captures a screenshot of one or more elements on a page using CSS selectors, with options for highlighting, padding, and scrolling. It can try multiple selectors and fall back to a full-page screenshot.",
+    param_model=ElementScreenshotParams,
 )
 async def element_screenshot(params: ElementScreenshotParams, browser_session: BrowserSession) -> ActionResult:
+    """
+A robust tool to capture screenshots of web elements.
+- It can use JavaScript-based targeting for selectors.
+- Tries multiple selectors to find the target element.
+- Adds padding to provide context around the element.
+    """
+    print("-----------------browser_session_---------")
+    page = await browser_session.get_current_page()
+    # Prefer a session-owned file system path if the BrowserSession provides one
     try:
+        session_base = getattr(browser_session, 'file_system_path', None)
+        if session_base:
+            base_path = os.path.abspath(session_base)
+        else:
+            base_path = os.path.abspath(".")
+        # Create a unique directory for screenshots from this website and session
+        from urllib.parse import urlparse
+        import time
+        parsed_url = urlparse(await page.get_url())
+        # Sanitize website name to be filesystem-friendly
+        website_name = parsed_url.netloc.replace('www.', '').replace('.', '_').replace(':', '_')
+        timestamp = int(time.time())
+        screenshot_dir = os.path.join(base_path, "tempimgs", f"{website_name}-{timestamp}")
+        os.makedirs(screenshot_dir, exist_ok=True)
+        output_path = os.path.join(screenshot_dir, params.filename)
+    except Exception as e:
+        print(e)
+        # Fallback to current working directory if there's an issue creating the new one
+        output_path = os.path.join(os.path.abspath('.'), params.filename)
+    element = None
+    used_selector = None
+    error_messages = []
+    print("Trying to find element :", params)
+    for selector in params.selectors:
+        try:
+            print(selector)
+            loc = await page.evaluate("""
+(selector, padding) => {
+    const el = document.querySelector(selector);
+    if (!el) {
+        return {
+            clip: { x: null, y: null, width: null, height: null },
+            tag: null,
+            selector: selector,
+            id: null,
+            classList: [],
+        };
+    }
+    const rect = el.getBoundingClientRect();
+    return {
+        clip: {
+            x: rect.x - padding,
+            y: rect.y - padding,
+            width: rect.width + 2 * padding,
+            height: rect.height + 2 * padding
+        },
+        tag: el.tagName,
+        selector: selector,
+        id: el.id || null,
+        classList: Array.from(el.classList || []),
+    };
+}
+""", selector, params.padding)
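+            candidate = json.loads(loc)
+            # Accept the first selector that matched (a miss comes back with a null clip) and remember it
+            if candidate.get('clip', {}).get('x') is not None:
+                element = candidate
+                used_selector = selector
+                break
+            error_messages.append(f"Selector '{selector}' found no elements.")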
+            # if await loc.count() > 0:
+            #     element = loc.first()  # Use the first element if multiple are found
+            #     used_selector = selector
+            #     await element.wait_for(state="attached", timeout=3000)
+            #     break
+            # else:
+            #     error_messages.append(f"Selector '{selector}' found no elements.")
+        except Exception as e:
+            error_messages.append(f"Error with selector '{selector}': {str(e)}")
+    print('Element found:', element)
+    print('at 1')
+    if not element:
+        # Full-page fallback screenshot disabled — prefer explicit errors instead of taking full-page screenshots.
+        # If you want to re-enable the fallback, uncomment the lines below.
+        # if params.fallback_to_full_page:
+        #     try:
+        #         await page.screenshot(path=output_path, full_page=True)
+        #         fallback_msg = f"No element found for selectors {params.selectors}. Fell back to full-page screenshot at: {output_path}"
+        #         return ActionResult(
+        #             extracted_content=fallback_msg,
+        #             long_term_memory=fallback_msg,
+        #             vision_content=[{"type": "image", "path": output_path}]
+        #         )
+        #     except Exception as e:
+        #         return ActionResult(error=f"Element not found and full-page screenshot failed: {str(e)}")
+        return ActionResult(error=f"Could not find any element using selectors: {params.selectors}. Errors: {'; '.join(error_messages)}")
+    print('at 2')
+    print(type(element))
+    try:
+        # Scroll element into view if needed
+        # if params.scroll_if_needed:
+        #     await element.scroll_into_view_if_needed(timeout=5000)
+        # Wait for the element to be stable and visible
+        # await element.wait_for(state="visible", timeout=5000)
+        # await element.wait_for(state="attached", timeout=5000)
+        # # Highlight the element with a red border
+        # original_style = ""
+        # if params.highlight:
+        #     original_style = await element.get_attribute("style") or ""
+        #     print('evaluation 1')
+        #     await element.evaluate("el => el.style.border = '3px solid red'")
+        # print('evaluation 2')
+        # Get bounding box and take screenshot with padding
+        clip_obj = dict(element).get('clip')
+        if not clip_obj or clip_obj.get('x') is None:
+            raise Exception("Could not get bounding box for the element.")
+        try:
+            # Get session id and client from the page wrapper
+            session_id = await page.session_id
+            client = page._client
+            # Use a separate name so the tool's `params` argument is not shadowed
+            capture_params = {
+                'format': 'png',
+                'clip': {
+                    'x': float(clip_obj['x']),
+                    'y': float(clip_obj['y']),
+                    'width': float(clip_obj['width']),
+                    'height': float(clip_obj['height']),
+                    'scale': 1,
+                },
+            }
+            result = await client.send.Page.captureScreenshot(capture_params, session_id=session_id)
+            img_b64 = result.get('data')
+            if not img_b64:
+                raise Exception('CDP captureScreenshot returned no data')
+            with open(output_path, 'wb') as f:
+                f.write(base64.b64decode(img_b64))
+        except Exception as e:
+            # Re-raise with context
+            raise Exception(f'Failed to take clipped screenshot via CDP: {e}')
+        success_msg = f"Element screenshot saved at: {output_path} (selector: '{used_selector}')"
         return ActionResult(
             extracted_content=success_msg,
             include_in_memory=True,
+            long_term_memory=f"Element screenshot taken: {used_selector} -> {output_path}",
+            vision_content=[{"type": "image", "path": output_path}]
         )
+    except PlaywrightTimeoutError:
+        return ActionResult(error=f"Element screenshot failed: Timeout waiting for element '{used_selector}' to be visible or stable.")
     except Exception as e:
+        return ActionResult(error=f"Element screenshot failed for selector '{used_selector}': {str(e)}")
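
Two pieces of `element_screenshot` are plain arithmetic and string handling, so they can be tried without a browser: the padded clip rectangle built from the element's bounding box, and the per-site screenshot directory name. The sketch below mirrors both under stated assumptions. `padded_clip` and `site_dir_name` are hypothetical helper names, not functions in this commit. Clamping the top-left corner to zero is an addition of this sketch; the committed JavaScript does not clamp.

```python
from urllib.parse import urlparse


def padded_clip(x: float, y: float, width: float, height: float, padding: int = 10) -> dict:
    """Expand a bounding box by `padding` on every side.

    The top-left corner is clamped to 0 (an assumption of this sketch) so an
    element near the page edge does not yield negative clip coordinates.
    """
    if padding < 0:
        raise ValueError("padding must be non-negative")
    left = max(x - padding, 0.0)
    top = max(y - padding, 0.0)
    # The right and bottom edges stay where the padding puts them.
    right = x + width + padding
    bottom = y + height + padding
    return {"x": left, "y": top, "width": right - left, "height": bottom - top, "scale": 1}


def site_dir_name(url: str, timestamp: int) -> str:
    """Build a filesystem-friendly '<site>-<timestamp>' name, as the diff does."""
    netloc = urlparse(url).netloc
    name = netloc.replace("www.", "").replace(".", "_").replace(":", "_")
    return f"{name}-{timestamp}"


print(padded_clip(100, 50, 200, 80))
print(site_dir_name("https://firebase.google.com/pricing", 1700000000))
```

Keeping these steps as pure functions makes them easy to unit-test separately from the CDP call that takes the screenshot.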
+# ------------------------ Custom helper tools ------------------------
+@tools.action(
+    description="Finds a web page element using a natural language prompt and returns its selector, backend node id, and the element object.",
+    # param_model=,
+)
+async def find_element_by_prompt(query: str, browser_session: BrowserSession) -> dict:
+    """
+    Use the page's must_get_element_by_prompt (LLM-powered) to robustly locate an element matching the query.
+    Args:
+        query (str): Natural language description of the element to find (e.g., "footer section", "pricing table").
+        browser_session (BrowserSession): The active browser session object.
+    Returns:
+        dict: {
+            "selector": <css selector or None>,
+            "backend_node_id": int,
+            "element": <element object or None>,
+            "reason": <string>
+        }
+            - selector: CSS selector string for the matched element, or None if not found.
+            - backend_node_id: Unique backend node id for direct reference (int or None).
+            - element: The matched element object, or None if not found.
+            - reason: Reason for match or error (string).
+    """
+    page = await browser_session.get_current_page()
+    try:
+        # Use the LLM-powered method to get the element
+        element = await page.must_get_element_by_prompt(query)
+        # Try to build a selector from id/class/tag
+        selector = None
+        if hasattr(element, 'id') and element.id:
+            selector = f"#{element.id}"
+        elif hasattr(element, 'class_name') and element.class_name:
+            first_cls = element.class_name.split()[0]
+            selector = f".{first_cls}"
+        elif hasattr(element, 'tag_name') and element.tag_name:
+            selector = element.tag_name.lower()
+        # Always return backend_node_id for direct reference
+        backend_node_id = getattr(element, 'backend_node_id', None)
+        return {
+            "selector": selector,
+            "backend_node_id": backend_node_id,
+            "element": element,
+            "reason": "llm_match"
+        }
+    except Exception as e:
+        return {
+            "selector": None,
+            "backend_node_id": None,
+            "element": None,
+            "reason": f"llm_error: {e}"
+        }
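
The selector fallback in `find_element_by_prompt` (id, then first class, then tag name) can be exercised in isolation. The sketch below uses `ElementInfo`, a hypothetical stand-in whose attribute names are copied from the `hasattr` checks in the diff. Whether the real element object exposes those attributes is not confirmed here. Unlike the committed code, the sketch also tolerates a `class_name` that is only whitespace.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElementInfo:
    """Hypothetical stand-in for the element returned by the LLM-powered lookup."""
    id: Optional[str] = None
    class_name: Optional[str] = None
    tag_name: Optional[str] = None


def derive_selector(element: ElementInfo) -> Optional[str]:
    """Return the most specific simple CSS selector available, or None."""
    if element.id:
        return f"#{element.id}"
    classes = (element.class_name or "").split()
    if classes:
        # Only the first class is used, matching the logic in the diff.
        return f".{classes[0]}"
    if element.tag_name:
        return element.tag_name.lower()
    return None


print(derive_selector(ElementInfo(id="pricing-table")))
print(derive_selector(ElementInfo(class_name="hero banner", tag_name="DIV")))
print(derive_selector(ElementInfo(tag_name="FOOTER")))
```

A first-class or tag-name selector may match several elements, so the `backend_node_id` returned alongside it is the more reliable handle when it is available.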
+@tools.action(
+    description="Injects or removes a visible red outline around the element identified by selector or selector dict for browser agent visual verification.",
+)
+async def highlight_element(selector_or_obj: str | dict, browser_session: BrowserSession) -> dict:
+    """
+    Inject or remove a visible red outline around the element identified by selector (or dict{selector}).
+    Args:
+        selector_or_obj (str | dict): CSS selector string, or a dict with a 'selector' key and an optional
+            boolean 'remove' key. If 'remove' is True the highlight is removed; otherwise it is added.
+        browser_session (BrowserSession): The active browser session object.
+    Returns:
+        dict: {ok: True/False, selector: used_selector, reason: str}
+    """
+    page = await browser_session.get_current_page()
+    remove = False
+    # Support dict with 'remove' key
+    if isinstance(selector_or_obj, dict):
+        selector = selector_or_obj.get('selector')
+        remove = selector_or_obj.get('remove', False)
+    else:
+        selector = selector_or_obj
+    if remove:
+        js = """
+(sel) => {
+    const el = document.querySelector(sel);
+    if (!el) return { ok: false, reason: 'not_found', selector: sel };
+    if (el.dataset.__highlighted === '1') {
+        el.style.outline = el.dataset.__orig_outline || '';
+        delete el.dataset.__highlighted;
+        delete el.dataset.__orig_outline;
+        return { ok: true, selector: sel, reason: 'highlight_removed' };
+    }
+    return { ok: false, selector: sel, reason: 'no_highlight_to_remove' };
+}
+"""
+    else:
+        js = """
+(sel) => {
+    const el = document.querySelector(sel);
+    if (!el) return { ok: false, reason: 'not_found', selector: sel };
+    // store the original outline only once, so a repeated call does not overwrite it with the red outline
+    if (el.dataset.__highlighted !== '1') {
+        el.dataset.__orig_outline = el.style.outline || '';
+    }
+    el.style.outline = '3px solid red';
+    el.dataset.__highlighted = '1';
+    return { ok: true, selector: sel, reason: 'highlight_applied' };
+}
+"""
+    try:
+        raw = await page.evaluate(js, selector)
+        return json.loads(raw)
+    except Exception as e:
+        return {"ok": False, "reason": str(e), "selector": selector}
+@tools.action(
+    description="Returns the bounding box (x, y, width, height) for a given CSS selector or selector dict on the current page. Useful for element positioning, cropping, or screenshot tasks.",
+)
+async def get_bounding_box(selector_or_obj: str | dict, browser_session: BrowserSession) -> dict:
+    """
+    Description:
+        Returns the bounding box for a given CSS selector or selector dict on the current page.
+    Args:
+        selector_or_obj (str | dict): CSS selector string or dict with 'selector' key to identify the element.
+        browser_session (BrowserSession): The active browser session object.
+    Returns:
+        dict: {x: float or None, y: float or None, width: float or None, height: float or None, error: str (optional)}
+            - x, y: Top-left coordinates of the element (relative to viewport)
+            - width, height: Size of the element
+            - error: Error message if bounding box could not be retrieved
+    """
+    page = await browser_session.get_current_page()
+    if isinstance(selector_or_obj, dict):
+        selector = selector_or_obj.get('selector')
+    else:
+        selector = selector_or_obj
+    js = """
+(sel) => {
+    const el = document.querySelector(sel);
+    if (!el) return { x: null, y: null, width: null, height: null };
+    const r = el.getBoundingClientRect();
+    return { x: r.x, y: r.y, width: r.width, height: r.height };
+}
+"""
+    try:
+        raw = await page.evaluate(js, selector)
+        return json.loads(raw)
+    except Exception as e:
+        return {"x": None, "y": None, "width": None, "height": None, "error": str(e)}
+@tools.action(
+    description="Takes a screenshot of a specific region (clip) of the current page, defined by x, y, width, height. Returns the saved image path and status.",
+)
+async def element_screenshot_clip(clip: dict, filename: str = 'element_clip.png', browser_session: BrowserSession = None) -> dict:
+    """
+    Description:
+        Takes a screenshot of a specific region (clip) of the current page, defined by x, y, width, height.
+    Args:
+        clip (dict): Dictionary with keys 'x', 'y', 'width', 'height' (all float/int) specifying the region to capture.
+        filename (str, optional): Output filename for the screenshot. Defaults to 'element_clip.png'.
+        browser_session (BrowserSession, optional): The active browser session object. Required.
+    Returns:
+        dict: {ok: True/False, path: str (if ok), error: str (if not ok)}
+            - ok: True if screenshot was successful, False otherwise
+            - path: Absolute path to the saved screenshot image (if ok)
+            - error: Error message if screenshot failed
+    """
+    if browser_session is None:
+        return {"ok": False, "error": "browser_session required"}
+    page = await browser_session.get_current_page()
+    try:
+        session_id = await page.session_id
+        client = page._client
+        params = {
+            'format': 'png',
+            'clip': {
+                'x': float(clip['x']),
+                'y': float(clip['y']),
+                'width': float(clip['width']),
+                'height': float(clip['height']),
+                'scale': 1,
+            },
+        }
+        result = await client.send.Page.captureScreenshot(params, session_id=session_id)
+        img_b64 = result.get('data')
+        if not img_b64:
+            return {"ok": False, "error": 'no_data'}
+        # resolve a relative filename against the current working directory
+        out_path = os.path.abspath(filename)
+        with open(out_path, 'wb') as f:
+            f.write(base64.b64decode(img_b64))
+        return {"ok": True, "path": out_path}
+    except Exception as e:
+        return {"ok": False, "error": str(e)}
+@function_tool
+async def verify_element_visual(query: str, screenshot_path: str, browser_session: BrowserSession, tolerance: int = 20) -> dict:
+    """Verify that the screenshot corresponds to the element found for `query`.
+    Strategy: find element by prompt, get bounding box, compare image size to bbox within tolerance.
+    Returns {verified: bool, selector: str or None, screenshot: path, details: ...}
+    """
+    # 1) locate element
+    found = await find_element_by_prompt(query, browser_session)
+    selector = found.get('selector')
+    if not selector:
+        return {"verified": False, "selector": None, "screenshot": screenshot_path, "details": "could_not_find_element"}
+    # 2) get bbox
+    bbox = await get_bounding_box(selector, browser_session)
+    if not bbox or bbox.get('width') is None:
+        return {"verified": False, "selector": selector, "screenshot": screenshot_path, "details": "could_not_get_bbox"}
+    # 3) load screenshot and compare sizes
+    try:
+        img = Image.open(screenshot_path)
+        w, h = img.size
+    except Exception as e:
+        return {"verified": False, "selector": selector, "screenshot": screenshot_path, "details": f"could_not_open_image: {e}"}
+    # Compare pixel sizes to bbox width/height
+    bw = int(round(bbox['width']))
+    bh = int(round(bbox['height']))
+    if abs(bw - w) <= tolerance and abs(bh - h) <= tolerance:
+        return {"verified": True, "selector": selector, "screenshot": screenshot_path, "details": "size_match"}
+    else:
+        return {"verified": False, "selector": selector, "screenshot": screenshot_path, "details": {"bbox": bbox, "image_size": [w, h]}}
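The size comparison at the end of `verify_element_visual` can be isolated into a small pure function, which makes it easier to test without a browser. The sketch below is illustrative only: `sizes_match` and its `scale` parameter are hypothetical names that do not appear in the commit. `scale` is an assumption covering the case where a screenshot is measured in device pixels while the bounding box is in CSS pixels.

```python
from typing import Mapping, Tuple


def sizes_match(
    bbox: Mapping[str, float],
    image_size: Tuple[int, int],
    tolerance: int = 20,
    scale: float = 1.0,  # hypothetical: device pixel ratio, 1.0 keeps the commit's behavior
) -> bool:
    """Return True when the image is within `tolerance` pixels of the scaled bbox."""
    expected_w = int(round(bbox["width"] * scale))
    expected_h = int(round(bbox["height"] * scale))
    actual_w, actual_h = image_size
    return (
        abs(expected_w - actual_w) <= tolerance
        and abs(expected_h - actual_h) <= tolerance
    )


if __name__ == "__main__":
    header_bbox = {"x": 0.0, "y": 0.0, "width": 1280.4, "height": 96.0}
    print(sizes_match(header_bbox, (1280, 100)))  # height differs by 4 px
    print(sizes_match(header_bbox, (640, 100)))   # width differs by 640 px
```

A size match is a weak signal: two different elements with similar dimensions would both pass, so this check is best treated as a sanity filter and not as proof that the right element was captured.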
+task_old_1 = f"""
 You are a Browser Intelligence Agent specialized in extracting website content and brand identity assets.
+Your goal is to visit the given website URL and return a structured, comprehensive extraction.
 Follow these steps strictly:
+1. Website Navigation:
+   - Open the provided URL.
+   - If a user query is provided, search across multiple related internal pages (navigation links, relevant subpages) that may contain information about the query.
+   - If no query is provided, focus on the landing page only.
 2. Content Extraction:
+   - If a query is provided:
+     • Extract and summarize text relevant to the query from all visited pages.
+     • Provide a coherent summary that highlights key points across pages.
+   - If no query:
+     • Extract the full visible text from the landing page.
 3. Brand & Design Extraction:
+   - Identify and extract the brand’s visual identity, including:
+     • Primary and secondary colors (hex codes).
+     • Extended color palette if available.
+     • Typography (fonts, weights, styles).
+     • Design system or style guide elements.
+     • Social media brand kit details (logos, icons, button styles, heading styles).
+4. Screenshots (via custom tools):
+   - Capture screenshots of **topic-related content** (e.g., pricing tables, signup buttons, hero sections if the query is “pricing plans”).
+   - Capture screenshots of **brand identity elements** (e.g., color swatches, typography samples, buttons, logos, icons, headings).
+   - Save screenshots with clear, descriptive filenames (e.g., `pricing_table.png`, `signup_button.png`, `primary_colors.png`, `typography_styles.png`).
 5. Output:
+   - Return the extracted content, brand identity data, and screenshot metadata in a clean and structured JSON format.
+   - Do not include free text or commentary outside the JSON.
 Today is {datetime.now().strftime('%Y-%m-%d')}
+User's query: Go to https://github.com/pricing and extract content, brand identity assets, and screenshots for a LinkedIn post. The topic is pricing plans.
 """
+task_old_2="""
+### Selector Discovery, Verification & Screenshot Instructions
+When identifying selectors for screenshots of elements or sections:
+Verify each selector's element or section, then capture its screenshot immediately after successful verification.
+1. **Analyze** the HTML DOM structure of the page to identify potential selectors for the target elements or sections based on the query.
+2. **Generate** a list of possible selectors that could uniquely identify each target element.
+3. **Locate the Target Section or Element:**
+   - Identify the element or section that visually and contextually matches the target.
+   - Focus on the most relevant container or element that directly represents the intended target — not its parent or unrelated siblings.
+4. For each candidate selector:
+   - Use the `"execute_js"` tool to verify that the selector matches exactly the target.
+   - **Highlight** the matched element by injecting a visible red border (`2px solid red`) or a temporary background color.
+5. **Validate the Finalized Selector Against the Query:**
+   - Once a selector is finalized, confirm that it accurately represents the element or section described in the query.
+   - Ensure it precisely corresponds to the query intent and does not include unrelated, broader, or nested regions.
+6. **Remove injected visual styles or modifications** from the DOM to restore the page to its original state before proceeding to the next selector.
+7. **After verification**, immediately **capture a screenshot** of the verified element or section.
+8. Continue this process until **all target selectors** have been verified and their screenshots captured.
+After successful verification, remove all injected visual styles or temporary DOM modifications.
+User's query: Go to https://github.com/pricing and take screenshot of header and pricing details
+"""
 class PageVisited(BaseModel):
     url: str
 async def run_search() -> None:
+    print('====================================================')
+    print('Starting run_search() function')
+    print('====================================================')
+    # Check installed packages that might be relevant
+    try:
+        import importlib
+        packages = ['browser_use', 'playwright', 'aiohttp']
+        for package in packages:
+            try:
+                mod = importlib.import_module(package)
+                print(f"✅ {package} is installed: {getattr(mod, '__version__', 'unknown version')}")
+            except ImportError:
+                print(f"❌ {package} is NOT installed")
+    except Exception as e:
+        print(f"Error checking packages: {e}")
+    # Check environment variables (redacted for security)
+    for key in ['GEMINI_API_KEY', 'OPENROUTER_API_KEY']:
+        if os.environ.get(key):
+            print(f"✅ {key} environment variable is set")
+        else:
+            print(f"❌ {key} environment variable is NOT set")
+    chrome_process = None
+    playwright_browser = None
+    browser_session = None
+    try:
+        # Launch the browser via BrowserSession so only the agent opens a window.
+        print('🔄 Launching browser via BrowserSession (agent-managed launch)')
+        browser_profile = BrowserProfile(
+            is_local=True,
+            headless=False,
+            launch_args=[
+                '--no-first-run',
+                '--no-default-browser-check',
+                '--disable-extensions',
+                '--disable-background-networking',
+                '--disable-background-timer-throttling',
+                '--disable-backgrounding-occluded-windows',
+                '--disable-popup-blocking',
+                '--disable-renderer-backgrounding',
+                '--force-color-profile=srgb',
+                '--metrics-recording-only',
+                '--mute-audio',
+            ],
+        )
+        print('Creating BrowserSession (this will launch Chrome once, managed by browser-use)')
+        browser_session = BrowserSession(browser_profile=browser_profile)
+        print(f"✅ Browser session created successfully: {browser_session}")
+        # Build the Browser Agent using the created session. Skip internal launch to avoid duplicates.
+        print('🔄 Creating Browser Agent with provided BrowserSession...')
+        browser_agent = AgentBrowser(
+            task=task,
+            llm=llm_browser,
+            use_vision=True,
+            generate_gif=False,
+            max_failures=3,
+            file_system_path="./browser_agent_data",
+            tools=tools,
+            output_model_schema=BrowserAgentOutput,
+            browser_session=browser_session,
+            skip_browser_launch=True,
+        )
+        print('✅ Browser Agent created with provided session')
+        print('🚀 Running browser agent...')
+        try:
+            print("Starting browser agent.run() with max_steps=15")
+            history = await browser_agent.run(max_steps=15)
+            print("-------------Agent run completed---------------")
+            print("Steps executed:", len(history.steps) if hasattr(history, 'steps') else "Unknown")
+            print("-------------Final result---------------")
+            print(history.final_result())
+        except Exception as run_error:
+            print(f'❌ Error during browser agent run: {type(run_error).__name__}: {run_error}')
+            import traceback
+            print("Detailed traceback:")
+            traceback.print_exc()
+            raise
+    except Exception as e:
+        print(f'❌ Error: {e}')
+        raise
+    finally:
+        # Clean up resources in proper order
+        print('🧹 Cleaning up resources...')
+        # First close the browser session which will close its page
+        try:
+            if browser_session:
+                print(f"Attempting to close browser session: {browser_session}")
+                await browser_session.close()
+                print('✅ Closed browser session')
+            else:
+                print('ℹ️ No browser session was created')
+        except Exception as e:
+            print(f'⚠️ Error closing browser session: {type(e).__name__}: {e}')
+            import traceback
+            traceback.print_exc()
+        # Then close the playwright browser
+        if playwright_browser:
+            try:
+                print(f"Attempting to close Playwright browser: {playwright_browser}")
+                await playwright_browser.close()
+                print('✅ Closed Playwright browser')
+            except Exception as e:
+                print(f'⚠️ Error closing Playwright browser: {type(e).__name__}: {e}')
+                import traceback
+                traceback.print_exc()
+        # Finally terminate the Chrome process
+        if chrome_process:
+            try:
+                print(f"Attempting to terminate Chrome process (PID: {chrome_process.pid})")
+                chrome_process.terminate()
+                print("Waiting for Chrome process to exit (timeout: 5s)")
+                await asyncio.wait_for(chrome_process.wait(), 5)
+                print('✅ Terminated Chrome process')
+            except asyncio.TimeoutError:
+                print('⚠️ Chrome process did not exit after 5s timeout, forcing kill')
+                chrome_process.kill()
+                print("Sent SIGKILL to Chrome process")
+            except Exception as e:
+                print(f'⚠️ Error terminating Chrome process: {type(e).__name__}: {e}')
+                import traceback
+                traceback.print_exc()
+        # Check if Chrome is still running via CDP
+        try:
+            print("Checking if Chrome CDP is still accessible...")
+            async with aiohttp.ClientSession() as session:
+                async with session.get('http://localhost:9222/json/version', timeout=aiohttp.ClientTimeout(total=1)) as response:
+                    if response.status == 200:
+                        print('⚠️ WARNING: Chrome with CDP is still running after cleanup!')
+                    else:
+                        print('✅ Chrome CDP no longer accessible (status code != 200)')
+        except Exception:
+            print('✅ Chrome CDP no longer accessible (connection failed)')
+        print('✅ All cleanup complete')
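The terminate-then-kill sequence used for `chrome_process` in the cleanup above can be sketched as a standalone helper. This is a minimal sketch under stated assumptions: `stop_process` is a hypothetical name, and a sleeping child Python process stands in for Chrome. Unlike the cleanup code above, it also awaits the process after `kill()` so the child is reaped.

```python
import asyncio
import sys


async def stop_process(proc: asyncio.subprocess.Process, timeout: float = 5.0) -> int:
    """Ask the process to exit, then force-kill it if it ignores the request."""
    if proc.returncode is not None:
        return proc.returncode  # already exited, nothing to do
    proc.terminate()
    try:
        await asyncio.wait_for(proc.wait(), timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()  # reap the child so it does not linger
    return proc.returncode


async def demo() -> int:
    # Stand-in for Chrome: a child interpreter that sleeps for a minute.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "import time; time.sleep(60)"
    )
    return await stop_process(proc, timeout=5.0)


if __name__ == "__main__":
    print("child exited with code", asyncio.run(demo()))
```

The exit code differs by platform, so callers should only rely on it being set once the helper returns.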

src/utils/chrome_playwright.py ADDED

@@ -0,0 +1,142 @@

+import os
+import tempfile
+import asyncio
+import aiohttp
+from playwright.async_api import async_playwright
+async def start_chrome_with_debug_port(port: int = 9222):
+    """
+    Start Chrome with remote debugging enabled.
+    Returns the Chrome process.
+    """
+    user_data_dir = tempfile.mkdtemp(prefix='chrome_cdp_')
+    print(f"Created temp user data dir: {user_data_dir}")
+    chrome_paths = [
+        r'C:\Program Files\Google\Chrome\Application\chrome.exe',
+        'chrome.exe',
+        'chrome',
+    ]
+    chrome_exe = None
+    print(f"Looking for Chrome executable in these locations: {chrome_paths}")
+    for path in chrome_paths:
+        if os.path.exists(path):
+            print(f"Found Chrome at: {path}")
+            try:
+                print(f"Testing executable: {path}")
+                test_proc = await asyncio.create_subprocess_exec(
+                    path, '--version', stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
+                )
+                stdout, stderr = await test_proc.communicate()
+                if test_proc.returncode == 0:
+                    version = stdout.decode().strip() if stdout else "Unknown version"
+                    print(f"Chrome executable works! Version: {version}")
+                    chrome_exe = path
+                    break
+                else:
+                    error = stderr.decode().strip() if stderr else "Unknown error"
+                    print(f"Chrome executable test failed: {error}")
+            except Exception as e:
+                print(f"Error testing Chrome executable {path}: {e}")
+                continue
+        elif path in ['chrome', 'chromium', 'chrome.exe']:
+            print(f"Checking PATH for {path}")
+            try:
+                test_proc = await asyncio.create_subprocess_exec(
+                    path, '--version', stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
+                )
+                stdout, stderr = await test_proc.communicate()
+                if test_proc.returncode == 0:
+                    version = stdout.decode().strip() if stdout else "Unknown version"
+                    print(f"Chrome executable works via PATH! Version: {version}")
+                    chrome_exe = path
+                    break
+                else:
+                    error = stderr.decode().strip() if stderr else "Unknown error"
+                    print(f"Chrome executable test via PATH failed: {error}")
+            except Exception as e:
+                print(f"Error testing Chrome executable via PATH {path}: {e}")
+                continue
+    if not chrome_exe:
+        raise RuntimeError('❌ Chrome not found. Please install Chrome or Chromium.')
+    cmd = [
+        chrome_exe,
+        f'--remote-debugging-port={port}',
+        f'--user-data-dir={user_data_dir}',
+        '--no-first-run',
+        '--no-default-browser-check',
+        '--disable-extensions',
+        '--disable-background-networking',
+        '--disable-background-timer-throttling',
+        '--disable-backgrounding-occluded-windows',
+        '--disable-breakpad',
+        '--disable-component-extensions-with-background-pages',
+        '--disable-features=TranslateUI,BlinkGenPropertyTrees',
+        '--disable-ipc-flooding-protection',
+        '--disable-popup-blocking',
+        '--disable-prompt-on-repost',
+        '--disable-renderer-backgrounding',
+        '--force-color-profile=srgb',
+        '--metrics-recording-only',
+        '--mute-audio',
+        'about:blank',
+    ]
+    print(f"Starting Chrome with command: {cmd}")
+    process = await asyncio.create_subprocess_exec(*cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)
+    print(f"Chrome process started with PID: {process.pid}")
+    print(f"Waiting for Chrome CDP to be available at http://localhost:{port}/json/version...")
+    cdp_ready = False
+    for attempt in range(20):
+        try:
+            async with aiohttp.ClientSession() as session:
+                print(f"CDP check attempt {attempt+1}/20...")
+                async with session.get(
+                    f'http://localhost:{port}/json/version', timeout=aiohttp.ClientTimeout(total=1)
+                ) as response:
+                    if response.status == 200:
+                        data = await response.json()
+                        print(f"CDP connected successfully! Chrome version: {data.get('Browser', 'Unknown')}")
+                        cdp_ready = True
+                        break
+                    else:
+                        print(f"CDP check failed with status: {response.status}")
+        except Exception as e:
+            print(f"CDP check failed with error: {type(e).__name__}: {e}")
+        await asyncio.sleep(1)
+    if not cdp_ready:
+        print(f"ERROR: Chrome DevTools Protocol not available after timeout on port {port}")
+        # Stop Chrome first: communicate() waits for the process to exit and would block on a live browser.
+        process.terminate()
+        stdout_data, stderr_data = await process.communicate()
+        print(f"Chrome STDOUT: {stdout_data.decode('utf-8', errors='ignore')}")
+        print(f"Chrome STDERR: {stderr_data.decode('utf-8', errors='ignore')}")
+        raise RuntimeError('❌ Chrome failed to start with CDP')
+    return process
+async def connect_playwright_to_cdp(cdp_url: str):
+    """
+    Connect Playwright to the same Chrome instance Browser-Use is using.
+    Returns the Playwright browser and page.
+    """
+    print(f"Connecting Playwright to CDP URL: {cdp_url}")
+    playwright = await async_playwright().start()
+    playwright_browser = await playwright.chromium.connect_over_cdp(cdp_url)
+    print("Playwright connected to browser")
+    if playwright_browser and playwright_browser.contexts and playwright_browser.contexts[0].pages:
+        playwright_page = playwright_browser.contexts[0].pages[0]
+        print(f"Using existing page: {await playwright_page.title()}")
+    elif playwright_browser:
+        print("No existing pages found, creating a new context and page")
+        context = await playwright_browser.new_context()
+        playwright_page = await context.new_page()
+    else:
+        playwright_page = None
+    print("Playwright page setup complete")
+    return playwright_browser, playwright_page
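The readiness loop in `start_chrome_with_debug_port` follows a common polling pattern: probe, sleep, retry, give up after a fixed number of attempts. The sketch below separates that pattern from the HTTP details so it can be exercised without Chrome. `wait_until_ready` and `fake_probe` are hypothetical names, and the probe is an assumption standing in for the `/json/version` request.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_until_ready(
    probe: Callable[[], Awaitable[bool]],
    attempts: int = 20,
    delay: float = 1.0,
) -> bool:
    """Call `probe` until it returns True or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            if await probe():
                return True
        except Exception as exc:
            # A refused connection is expected while the browser is still starting.
            print(f"probe {attempt}/{attempts} failed: {type(exc).__name__}: {exc}")
        if attempt < attempts:
            await asyncio.sleep(delay)
    return False


async def demo() -> bool:
    calls = {"n": 0}

    async def fake_probe() -> bool:
        # Simulates an endpoint that starts answering on the third request.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("not listening yet")
        return True

    return await wait_until_ready(fake_probe, attempts=5, delay=0.01)


if __name__ == "__main__":
    print("ready:", asyncio.run(demo()))
```

Keeping the probe injectable means the same loop could wrap an `aiohttp` request in production and a fake in tests.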

uv.lock CHANGED

@@ -2175,6 +2175,25 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/34/e7/ae39f538fd6844e982063c3a5e4598b8ced43b9633baa3a85ef33af8c05c/pillow-11.3.0-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:c84d689db21a1c397d001aa08241044aa2069e7587b398c8cc63020390b1c1b8", size = 6984598 },
 ]
+[[package]]
+name = "playwright"
+version = "1.55.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "greenlet" },
+    { name = "pyee" },
+]
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/80/3a/c81ff76df266c62e24f19718df9c168f49af93cabdbc4608ae29656a9986/playwright-1.55.0-py3-none-macosx_10_13_x86_64.whl", hash = "sha256:d7da108a95001e412effca4f7610de79da1637ccdf670b1ae3fdc08b9694c034", size = 40428109 },
+    { url = "https://files.pythonhosted.org/packages/cf/f5/bdb61553b20e907196a38d864602a9b4a461660c3a111c67a35179b636fa/playwright-1.55.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:8290cf27a5d542e2682ac274da423941f879d07b001f6575a5a3a257b1d4ba1c", size = 38687254 },
+    { url = "https://files.pythonhosted.org/packages/4a/64/48b2837ef396487807e5ab53c76465747e34c7143fac4a084ef349c293a8/playwright-1.55.0-py3-none-macosx_11_0_universal2.whl", hash = "sha256:25b0d6b3fd991c315cca33c802cf617d52980108ab8431e3e1d37b5de755c10e", size = 40428108 },
+    { url = "https://files.pythonhosted.org/packages/08/33/858312628aa16a6de97839adc2ca28031ebc5391f96b6fb8fdf1fcb15d6c/playwright-1.55.0-py3-none-manylinux1_x86_64.whl", hash = "sha256:c6d4d8f6f8c66c483b0835569c7f0caa03230820af8e500c181c93509c92d831", size = 45905643 },
+    { url = "https://files.pythonhosted.org/packages/83/83/b8d06a5b5721931aa6d5916b83168e28bd891f38ff56fe92af7bdee9860f/playwright-1.55.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:29a0777c4ce1273acf90c87e4ae2fe0130182100d99bcd2ae5bf486093044838", size = 45296647 },
+    { url = "https://files.pythonhosted.org/packages/06/2e/9db64518aebcb3d6ef6cd6d4d01da741aff912c3f0314dadb61226c6a96a/playwright-1.55.0-py3-none-win32.whl", hash = "sha256:29e6d1558ad9d5b5c19cbec0a72f6a2e35e6353cd9f262e22148685b86759f90", size = 35476046 },
+    { url = "https://files.pythonhosted.org/packages/46/4f/9ba607fa94bb9cee3d4beb1c7b32c16efbfc9d69d5037fa85d10cafc618b/playwright-1.55.0-py3-none-win_amd64.whl", hash = "sha256:7eb5956473ca1951abb51537e6a0da55257bb2e25fc37c2b75af094a5c93736c", size = 35476048 },
+    { url = "https://files.pythonhosted.org/packages/21/98/5ca173c8ec906abde26c28e1ecb34887343fd71cc4136261b90036841323/playwright-1.55.0-py3-none-win_arm64.whl", hash = "sha256:012dc89ccdcbd774cdde8aeee14c08e0dd52ddb9135bf10e9db040527386bd76", size = 31225543 },
+]
 [[package]]
 name = "portalocker"
 version = "2.10.1"
@@ -2606,6 +2625,18 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/58/f0/427018098906416f580e3cf1366d3b1abfb408a0652e9f31600c24a1903c/pydantic_settings-2.10.1-py3-none-any.whl", hash = "sha256:a60952460b99cf661dc25c29c0ef171721f98bfcb52ef8d9ea4c943d7c8cc796", size = 45235 },
 ]
+[[package]]
+name = "pyee"
+version = "13.0.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/95/03/1fd98d5841cd7964a27d729ccf2199602fe05eb7a405c1462eb7277945ed/pyee-13.0.0.tar.gz", hash = "sha256:b391e3c5a434d1f5118a25615001dbc8f669cf410ab67d04c4d4e07c55481c37", size = 31250 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/9b/4d/b9add7c84060d4c1906abe9a7e5359f2a60f7a9a4f67268b2766673427d8/pyee-13.0.0-py3-none-any.whl", hash = "sha256:48195a3cddb3b1515ce0695ed76036b5ccc2ef3a9f963ff9f77aec0139845498", size = 15730 },
+]
 [[package]]
 name = "pygments"
 version = "2.19.2"
@@ -5733,6 +5764,7 @@ dependencies = [
     { name = "openai-agents" },
     { name = "pathlib" },
     { name = "pillow" },
+    { name = "playwright" },
     { name = "pydantic" },
     { name = "pydantic-ai" },
     { name = "urljoin" },
@@ -5758,6 +5790,7 @@ requires-dist = [
     { name = "openai-agents", specifier = ">=0.2.8" },
     { name = "pathlib", specifier = ">=1.0.1" },
     { name = "pillow", specifier = ">=11.3.0" },
+    { name = "playwright", specifier = ">=1.55.0" },
     { name = "pydantic", specifier = ">=2.11.7" },
     { name = "pydantic-ai", extras = ["logfire"], specifier = ">=1.0.1" },
     { name = "urljoin", specifier = ">=1.0.0" },