---
title: "Agent Settings"
description: "Learn how to configure the agent"
icon: "gear"
---

## Overview

The `Agent` class is the core component of Browser Use that handles browser automation. Here are the main configuration options you can use when initializing an agent.

## Basic Settings

```python
from browser_use import Agent
from langchain_openai import ChatOpenAI

agent = Agent(
    task="Search for latest news about AI",
    llm=ChatOpenAI(model="gpt-4o"),
)
```

### Required Parameters

- `task`: The instruction for the agent to execute
- `llm`: A LangChain chat model instance. See <a href="/customize/supported-models">LangChain Models</a> for supported models.
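
Putting the two required parameters together: `run()` is asynchronous, so a complete script needs an event loop. A minimal sketch, assuming `OPENAI_API_KEY` is set in your environment:

```python
import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    # Only `task` and `llm` are required; everything else has defaults
    agent = Agent(
        task="Search for latest news about AI",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    await agent.run()

asyncio.run(main())
```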

## Agent Behavior

Control how the agent operates:

```python
agent = Agent(
    task="your task",
    llm=llm,
    controller=custom_controller,
    use_vision=True,
    save_conversation_path="logs/conversation",
)
```

### Behavior Parameters

- `controller`: Registry of functions the agent can call. Defaults to the base Controller. See <a href="/customize/custom-functions">Custom Functions</a> for details.
- `use_vision`: Enable/disable vision capabilities. Defaults to `True`.
  - When enabled, the model processes visual information from web pages
  - Disable to reduce costs or to use models without vision support
  - For GPT-4o, image processing costs approximately 800-1000 tokens (~$0.002 USD) per image, depending on the configured screen size
- `save_conversation_path`: Path to save the complete conversation history. Useful for debugging.
- `system_prompt_class`: Custom system prompt class. See <a href="/customize/system-prompt">System Prompt</a> for customization options.

<Note>
Vision capabilities are recommended for better web interaction understanding, but can be disabled to reduce costs or when using models without vision support.
</Note>

## Browser Setup

You can configure how the agent interacts with the browser. To see more `Browser` options refer to the <a href="/customize/browser-settings">Browser Settings</a> documentation.

### Reuse Existing Browser

`browser`: A Browser Use `Browser` instance. When provided, the agent will reuse this browser instance and automatically create new contexts for each `run()`.

```python
from browser_use import Agent, Browser

# Reuse a single browser instance across agents
browser = Browser()

agent = Agent(
    task="your task",
    llm=llm,
    browser=browser,
)

await agent.run()

# Close the browser manually when you are done with it
await browser.close()
```

<Note>
Remember: in this scenario the `Browser` will not be closed automatically.
</Note>

### Reuse Existing Browser Context

`browser_context`: A Playwright browser context. Useful for maintaining persistent sessions. See <a href="/customize/persistent-browser">Persistent Browser</a> for more details.

```python
from browser_use import Agent, Browser
from playwright.async_api import BrowserContext

browser = Browser()

async with await browser.new_context() as context:
    agent = Agent(
        task="your task",
        llm=llm,
        browser_context=context,
    )

    await agent.run()

    # The same context can be handed to a follow-up agent
    next_agent = Agent(
        task="your next task",
        llm=llm,
        browser_context=context,
    )

    ...

await browser.close()
```

For more information about how browser context works, refer to the [Playwright documentation](https://playwright.dev/docs/api/class-browsercontext).

<Note>
You can reuse the same context for multiple agents. If you do nothing, the browser will be automatically created and closed on `run()` completion.
</Note>

## Running the Agent

The agent is executed using the async `run()` method:

- `max_steps` (default: `100`)
  Maximum number of steps the agent can take during execution. This prevents infinite loops and helps control execution time.
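
For example, you can lower the cap for short tasks. A sketch, assuming `agent` has been created as shown earlier:

```python
# Allow at most 50 steps for this run instead of the default 100
history = await agent.run(max_steps=50)
```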

## Agent History

The method returns an `AgentHistoryList` object containing the complete execution history. This history is invaluable for debugging, analysis, and creating reproducible scripts.

```python
# Run the agent and collect its execution history
history = await agent.run()

# Access useful information from the history
history.urls()               # Visited URLs
history.screenshots()        # Screenshot paths
history.action_names()       # Names of executed actions
history.extracted_content()  # Content extracted during execution
history.errors()             # Any errors that occurred
history.model_actions()      # All actions with their parameters
```

The `AgentHistoryList` provides many helper methods to analyze the execution:

- `final_result()`: Get the final extracted content
- `is_done()`: Check if the agent completed successfully
- `has_errors()`: Check if any errors occurred
- `model_thoughts()`: Get the agent's reasoning process
- `action_results()`: Get results of all actions
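
A typical post-run check might combine a few of these helpers. A sketch, with `agent` configured as earlier:

```python
history = await agent.run()

# Branch on the outcome of the run
if history.has_errors():
    print("Run failed:", history.errors())
else:
    print("Completed:", history.is_done())
    print("Result:", history.final_result())
```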

<Note>
For a complete list of helper methods and detailed history analysis capabilities, refer to the [AgentHistoryList source code](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/views.py#L111).
</Note>

## Run initial actions without LLM

With [this example](https://github.com/browser-use/browser-use/blob/main/examples/features/initial_actions.py) you can run initial actions without the LLM.

Specify the action as a dictionary where the key is the action name and the value is the action parameters. You can find all our actions in the [Controller](https://github.com/browser-use/browser-use/blob/main/browser_use/controller/service.py) source code.

```python
initial_actions = [
    {'open_tab': {'url': 'https://www.google.com'}},
    {'open_tab': {'url': 'https://en.wikipedia.org/wiki/Randomness'}},
    {'scroll_down': {'amount': 1000}},
]
agent = Agent(
    task='What theories are displayed on the page?',
    initial_actions=initial_actions,
    llm=llm,
)
```

## Run with planner model

You can configure the agent to use a separate planner model for high-level task planning:

```python
from langchain_openai import ChatOpenAI

# Initialize models
llm = ChatOpenAI(model='gpt-4o')
planner_llm = ChatOpenAI(model='o3-mini')

agent = Agent(
    task="your task",
    llm=llm,
    planner_llm=planner_llm,       # Separate model for planning
    use_vision_for_planner=False,  # Disable vision for the planner
    planner_interval=4,            # Plan every 4 steps
)
```

### Planner Parameters

- `planner_llm`: A LangChain chat model instance used for high-level task planning. Can be a smaller/cheaper model than the main LLM.
- `use_vision_for_planner`: Enable/disable vision capabilities for the planner model. Defaults to `True`.
- `planner_interval`: Number of steps between planning phases. Defaults to `1`.

Using a separate planner model can help:

- Reduce costs by using a smaller model for high-level planning
- Improve task decomposition and strategic thinking
- Better handle complex, multi-step tasks

<Note>
The planner model is optional. If not specified, the agent runs without a separate planning phase.
</Note>