Browser Actor

Browser Actor is a web automation library built on CDP (Chrome DevTools Protocol) that provides low-level browser automation capabilities within the browser-use ecosystem.

Usage

Integrated with Browser (Recommended)

from browser_use import Browser  # Alias for BrowserSession

# Create and start browser session
browser = Browser()
await browser.start()

# Create new tabs and navigate
page = await browser.new_page("https://example.com")
pages = await browser.get_pages()
current_page = await browser.get_current_page()

Direct Page Access (Advanced)

from browser_use.actor import Page, Element, Mouse

# Create page with existing browser session
page = Page(browser_session, target_id, session_id)

Basic Operations

# Tab Management
page = await browser.new_page()  # Create blank tab
page = await browser.new_page("https://example.com")  # Create tab with URL
pages = await browser.get_pages()  # Get all existing tabs
await browser.close_page(page)  # Close specific tab

# Navigation
await page.goto("https://example.com")
await page.go_back()
await page.go_forward()
await page.reload()

Element Operations

# Find elements by CSS selector
elements = await page.get_elements_by_css_selector("input[type='text']")
buttons = await page.get_elements_by_css_selector("button.submit")

# Get element by backend node ID
element = await page.get_element(backend_node_id=12345)

# AI-powered element finding (requires LLM)
element = await page.get_element_by_prompt("search button", llm=your_llm)
element = await page.must_get_element_by_prompt("login form", llm=your_llm)

Note: get_elements_by_css_selector returns immediately without waiting for visibility.
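Because the lookup does not wait, content that renders late may need a small polling loop. The sketch below is a hypothetical helper (`wait_for_elements` is not part of browser-use); it relies only on `page.get_elements_by_css_selector()` as documented here, and it checks for presence in the DOM, not visibility.

```python
import asyncio
import time


async def wait_for_elements(page, selector, timeout=5.0, interval=0.1):
    """Poll page.get_elements_by_css_selector() until it returns matches.

    Hypothetical helper, not part of browser-use. Returns the matching
    elements, or an empty list if nothing matched before the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        elements = await page.get_elements_by_css_selector(selector)
        if elements:
            return elements
        if time.monotonic() >= deadline:
            return []
        await asyncio.sleep(interval)
```

Typical use would be `buttons = await wait_for_elements(page, "button.submit")`, followed by a check that the list is not empty.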

Element Interactions

# Element actions
await element.click(button='left', click_count=1, modifiers=['Control'])
await element.fill("Hello World")  # Clears first, then types
await element.hover()
await element.focus()
await element.check()  # Toggle checkbox/radio
await element.select_option(["option1", "option2"])  # For dropdown/select
await element.drag_to(target_element)  # Drag and drop

# Element properties
value = await element.get_attribute("value")
box = await element.get_bounding_box()  # Returns BoundingBox or None
info = await element.get_basic_info()  # Comprehensive element info
screenshot_b64 = await element.screenshot(format='jpeg')

# Execute JavaScript on element (this context is the element)
text = await element.evaluate("() => this.textContent")
await element.evaluate("(color) => this.style.backgroundColor = color", "yellow")
classes = await element.evaluate("() => Array.from(this.classList)")

Mouse Operations

# Mouse operations
mouse = await page.mouse
await mouse.click(x=100, y=200, button='left', click_count=1)
await mouse.move(x=300, y=400, steps=1)
await mouse.down(button='left')  # Press button
await mouse.up(button='left')    # Release button
await mouse.scroll(x=0, y=100, delta_x=0, delta_y=-500)  # Scroll at coordinates
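
The low-level calls above can be combined into a coordinate-based drag. This is a minimal sketch, assuming only the `move`, `down` and `up` signatures listed in this document; the `drag` helper is hypothetical. It interpolates the intermediate points itself, so it does not depend on how `steps` is handled internally.

```python
async def drag(mouse, start, end, steps=5):
    """Press at start, move to end through `steps` intermediate points, release.

    Hypothetical helper, not part of browser-use. `start` and `end` are
    (x, y) tuples in page coordinates.
    """
    sx, sy = start
    ex, ey = end
    await mouse.move(x=sx, y=sy)
    await mouse.down(button="left")
    try:
        for i in range(1, steps + 1):
            # Linear interpolation between the start and end points.
            x = round(sx + (ex - sx) * i / steps)
            y = round(sy + (ey - sy) * i / steps)
            await mouse.move(x=x, y=y)
    finally:
        # Always release the button, even if a move fails.
        await mouse.up(button="left")
```

With a page in hand this would be called as `await drag(await page.mouse, (100, 200), (300, 400))`.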

Page Operations

# JavaScript evaluation
result = await page.evaluate('() => document.title')  # Must use arrow function format
result = await page.evaluate('(x, y) => x + y', 10, 20)  # With arguments

# Keyboard input
await page.press("Control+A")  # Key combinations supported
await page.press("Escape")     # Single keys

# Page controls
await page.set_viewport_size(width=1920, height=1080)
page_screenshot = await page.screenshot()  # JPEG by default
page_png = await page.screenshot(format="png")  # quality is omitted: CDP documents it as JPEG-only

# Page information
url = await page.get_url()
title = await page.get_title()
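
Screenshots come back as base64 strings, so they need decoding before they can be written to disk. The sketch below uses only the standard library; `decode_screenshot` is a hypothetical helper that also sniffs the image format from the leading bytes instead of trusting the requested format.

```python
import base64


def decode_screenshot(b64_data):
    """Return (raw_bytes, detected_format) for a base64 screenshot string.

    Hypothetical helper, not part of browser-use. The format is detected
    from the standard JPEG and PNG file signatures.
    """
    raw = base64.b64decode(b64_data)
    if raw.startswith(b"\xff\xd8\xff"):
        fmt = "jpeg"
    elif raw.startswith(b"\x89PNG\r\n\x1a\n"):
        fmt = "png"
    else:
        fmt = "unknown"
    return raw, fmt
```

A possible use is `raw, fmt = decode_screenshot(await page.screenshot())`, then `pathlib.Path(f"page.{fmt}").write_bytes(raw)`.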

AI-Powered Features

# Content extraction using LLM
from pydantic import BaseModel

class ProductInfo(BaseModel):
    name: str
    price: float
    description: str

# Extract structured data from current page
products = await page.extract_content(
    "Find all products with their names, prices and descriptions",
    ProductInfo,
    llm=your_llm
)

Core Classes

  • BrowserSession (aliased as Browser): Main browser session manager with tab operations
  • Page: Represents a single browser tab or iframe for page-level operations
  • Element: Individual DOM element for interactions and property access
  • Mouse: Mouse operations within a page (click, move, scroll)

API Reference

BrowserSession Methods (Tab Management)

  • start() - Initialize and start the browser session
  • stop() - Stop the browser session (keeps browser alive)
  • kill() - Kill the browser process and reset all state
  • new_page(url=None) → Page - Create blank tab or navigate to URL
  • get_pages() → list[Page] - Get all available pages
  • get_current_page() → Page | None - Get the currently focused page
  • close_page(page: Page | str) - Close page by object or ID
  • Session management and CDP client operations

Page Methods (Page Operations)

  • get_elements_by_css_selector(selector: str) → list[Element] - Find elements by CSS selector
  • get_element(backend_node_id: int) → Element - Get element by backend node ID
  • get_element_by_prompt(prompt: str, llm) → Element | None - AI-powered element finding
  • must_get_element_by_prompt(prompt: str, llm) → Element - AI element finding (raises if not found)
  • extract_content(prompt: str, structured_output: type[T], llm) → T - Extract structured data using LLM
  • goto(url: str) - Navigate this page to URL
  • go_back(), go_forward() - Navigate history (with error handling)
  • reload() - Reload the current page
  • evaluate(page_function: str, *args) → str - Execute JavaScript (MUST use (...args) => format)
  • press(key: str) - Press key on page (supports "Control+A" format)
  • set_viewport_size(width: int, height: int) - Set viewport dimensions
  • screenshot(format='jpeg', quality=None) → str - Take page screenshot, return base64
  • get_url() → str, get_title() → str - Get page information
  • mouse → Mouse - Get mouse interface for this page

Element Methods (DOM Interactions)

  • click(button='left', click_count=1, modifiers=None) - Click element with advanced fallbacks
  • fill(text: str, clear=True) - Fill input with text (clears first by default)
  • hover() - Hover over element
  • focus() - Focus the element
  • check() - Toggle checkbox/radio button (clicks to change state)
  • select_option(values: str | list[str]) - Select dropdown options
  • drag_to(target_element: Element | Position, source_position=None, target_position=None) - Drag to a target element or position
  • evaluate(page_function: str, *args) → str - Execute JavaScript on element (this = element)
  • get_attribute(name: str) → str | None - Get attribute value
  • get_bounding_box() → BoundingBox | None - Get element position/size
  • screenshot(format='jpeg', quality=None) → str - Take element screenshot, return base64
  • get_basic_info() → ElementInfo - Get comprehensive element information

Mouse Methods (Coordinate-Based Operations)

  • click(x: int, y: int, button='left', click_count=1) - Click at coordinates
  • move(x: int, y: int, steps=1) - Move to coordinates
  • down(button='left', click_count=1), up(button='left', click_count=1) - Press/release button
  • scroll(x=0, y=0, delta_x=None, delta_y=None) - Scroll page at coordinates

Type Definitions

Position

class Position(TypedDict):
    x: float
    y: float

BoundingBox

class BoundingBox(TypedDict):
    x: float
    y: float
    width: float
    height: float
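
A bounding box is often converted to a single point for coordinate-based mouse calls. The sketch below assumes that `x` and `y` are the top-left corner of the box (this document does not state it), and `center_of` is a hypothetical helper. It returns integers because the Mouse methods take `int` coordinates.

```python
from typing import Optional, Tuple, TypedDict


class BoundingBox(TypedDict):
    x: float
    y: float
    width: float
    height: float


def center_of(box: Optional[BoundingBox]) -> Optional[Tuple[int, int]]:
    """Return the integer center of a box, or None if it is missing or empty.

    Hypothetical helper. Assumes (x, y) is the top-left corner.
    """
    if box is None or box["width"] <= 0 or box["height"] <= 0:
        return None
    return (
        round(box["x"] + box["width"] / 2),
        round(box["y"] + box["height"] / 2),
    )
```

The result can be passed to `mouse.click(x=..., y=...)` after checking that it is not `None`, which covers the case where `get_bounding_box()` found no box.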

ElementInfo

class ElementInfo(TypedDict):
    backendNodeId: int          # CDP backend node ID
    nodeId: int | None          # CDP node ID
    nodeName: str               # HTML tag name (e.g., "DIV", "INPUT")
    nodeType: int               # DOM node type
    nodeValue: str | None       # Text content for text nodes
    attributes: dict[str, str]  # HTML attributes
    boundingBox: BoundingBox | None  # Element position and size
    error: str | None           # Error message if info retrieval failed
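
For logging or debugging, an `ElementInfo` value can be condensed into one line. This is an illustrative sketch only: `describe` is a hypothetical helper that reads the fields defined above and treats a non-empty `error` as taking precedence over everything else.

```python
def describe(info):
    """Return a one-line summary of an ElementInfo-shaped dict.

    Hypothetical helper, not part of browser-use.
    """
    if info.get("error"):
        return f"<error: {info['error']}>"
    tag = info["nodeName"].lower()
    # Sort attributes so the output is stable between runs.
    attrs = "".join(f' {k}="{v}"' for k, v in sorted(info["attributes"].items()))
    box = info.get("boundingBox")
    if box:
        where = f" at ({box['x']:g}, {box['y']:g}) {box['width']:g}x{box['height']:g}"
    else:
        where = " (no box)"
    return f"<{tag}{attrs}>{where}"
```

It would typically be called as `print(describe(await element.get_basic_info()))`.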

Important Usage Notes

This is the browser-use actor, NOT Playwright or Selenium. Only use the methods documented above.

Critical JavaScript Rules

  • page.evaluate() and element.evaluate() MUST use (...args) => {} arrow function format
  • Both always return a string (objects are JSON-stringified automatically)
  • Use single quotes around the function: page.evaluate('() => document.title')
  • For complex selectors in JS: '() => document.querySelector("input[name=\\"email\\"]")'
  • element.evaluate(): this context is bound to the element automatically
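
Two of the rules above, quoting selectors inside the function string and receiving results as strings, can be handled with the standard `json` module. The helpers below are hypothetical (not part of browser-use): `json.dumps` produces a valid JavaScript string literal for the selector, and `json.loads` recovers objects from the stringified result.

```python
import json


def query_selector_fn(selector: str) -> str:
    """Build an arrow function that tests whether a selector matches.

    json.dumps handles the quote escaping that is easy to get wrong by hand.
    """
    return f"() => document.querySelector({json.dumps(selector)}) !== null"


def parse_result(raw):
    """Decode an evaluate() result; fall back to the raw string.

    Hypothetical helper. Plain text that is not valid JSON is returned as-is.
    """
    try:
        return json.loads(raw)
    except (TypeError, ValueError):
        return raw
```

A call might look like `found = parse_result(await page.evaluate(query_selector_fn('input[name="email"]')))`. One caveat: a plain string result that happens to look like JSON (for example a page title of "30") will be decoded as a number, so skip `parse_result` when a string is expected.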

Method Restrictions

  • get_elements_by_css_selector() returns immediately (no automatic waiting)
  • For dropdowns: use element.select_option(), NOT element.fill()
  • Form submission: click submit button or use page.press("Enter")
  • No methods like: element.submit(), element.dispatch_event(), element.get_property()
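
Since there is no `element.submit()`, a form is usually submitted by filling a field and pressing Enter. The sketch below is a hypothetical helper built only from `get_elements_by_css_selector()`, `fill()` and `page.press()`; it assumes the filled field keeps focus so that Enter reaches the form.

```python
async def fill_and_submit(page, selector, text):
    """Fill the first element matching `selector`, then press Enter.

    Hypothetical helper, not part of browser-use. Raises LookupError when
    the selector matches nothing, because the lookup does not wait.
    """
    elements = await page.get_elements_by_css_selector(selector)
    if not elements:
        raise LookupError(f"no element matches {selector!r}")
    await elements[0].fill(text)
    await page.press("Enter")
```

If Enter does not trigger submission on a given site, clicking the submit button with `element.click()` is the alternative named above.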

Error Prevention

  • Always verify page state changes with page.get_url() and page.get_title()
  • Use element.get_attribute() to check element properties
  • Validate CSS selectors before use
  • Handle navigation timing with appropriate asyncio.sleep() calls
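
A fixed `asyncio.sleep()` can be replaced by polling the page state until it changes. The sketch below is one possible approach, using only `page.get_url()` from this document; `wait_for_url_change` is a hypothetical helper and the timeout values are arbitrary.

```python
import asyncio
import time


async def wait_for_url_change(page, old_url, timeout=10.0, interval=0.2):
    """Poll page.get_url() until it differs from `old_url`.

    Hypothetical helper, not part of browser-use. Returns the new URL, or
    raises TimeoutError if the URL has not changed within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        url = await page.get_url()
        if url != old_url:
            return url
        if time.monotonic() >= deadline:
            raise TimeoutError(f"URL still {old_url!r} after {timeout}s")
        await asyncio.sleep(interval)
```

A typical sequence is to read `old = await page.get_url()`, click a link, then `await wait_for_url_change(page, old)` before interacting with the new page. A changed URL does not guarantee the new page has finished loading, so a follow-up element lookup may still need its own retry.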

AI Features

  • get_element_by_prompt() and extract_content() require an LLM instance
  • These methods use DOM analysis and structured output parsing
  • Best for complex page understanding and data extraction tasks