Browser Use
What it is
Browser Use is an open-source framework that allows LLMs to interact with real browsers, enabling them to perform web-based tasks like form-filling, scraping, and application navigation.
What problem it solves
It bridges the gap between static scraping (which fails on dynamic, JavaScript-heavy sites) and hand-written browser automation scripts, letting agents read pages and act on them much as a human would.
Where it fits in the stack
Infrastructure / Framework. It provides the interface for agents to drive browsers via Playwright or similar drivers.
Typical use cases
- Complex Scraping: Extracting data from authenticated or multi-step web processes.
- Workflow Automation: Automating tasks on web apps that lack official APIs.
- Agent Testing: Verifying browser-based agent behaviors.
Example company use cases
- Finance ops: log into a supplier portal, download monthly statements, and hand them to a document pipeline.
- Lead generation: gather structured data from directories or websites that do not expose a practical API.
- QA and support: reproduce user-reported UI issues or verify that a browser-based workflow still works after changes.
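For cases like lead generation, the agent's final answer is free text unless the task constrains it. One common pattern, assumed here rather than prescribed by Browser Use, is to ask for a JSON array in the task and validate it before storing. `Lead` and `parse_leads` are hypothetical names; the input string would come from the agent's final result (in 0.1.x-era releases, typically `history.final_result()`; check your version).

```python
import json
from dataclasses import dataclass


@dataclass
class Lead:
    # Hypothetical record shape: adjust fields to what the task asks the agent to extract
    company: str
    website: str


def parse_leads(final_text: str) -> list[Lead]:
    """Parse a JSON array from the agent's final answer into Lead records.

    Items missing a required field are skipped instead of being stored half-filled.
    """
    try:
        raw = json.loads(final_text)
    except json.JSONDecodeError:
        return []  # the agent did not return valid JSON
    if not isinstance(raw, list):
        return []
    leads = []
    for item in raw:
        if isinstance(item, dict) and item.get("company") and item.get("website"):
            leads.append(Lead(company=str(item["company"]), website=str(item["website"])))
    return leads


if __name__ == "__main__":
    sample = '[{"company": "Acme", "website": "https://acme.example"}, {"company": "NoSite"}]'
    print(parse_leads(sample))
```

Rejecting malformed items at this boundary keeps agent mistakes out of downstream systems such as a CRM.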
Example workflow shape
Find target page -> authenticate -> navigate multi-step flow -> extract result -> store structured output
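Because Browser Use takes the whole job as a single natural-language `task` string, one way to encode this workflow shape is to render each stage as a numbered step. This is a minimal sketch; `Stage`, `build_task`, and the portal URL are hypothetical, and the resulting string would be passed as `Agent(task=...)`.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    # One stage of the workflow shape (hypothetical structure, not a Browser Use type)
    name: str
    instruction: str


def build_task(stages: list[Stage]) -> str:
    """Render ordered stages as one numbered task prompt for a browser agent."""
    lines = [f"{i}. {s.name}: {s.instruction}" for i, s in enumerate(stages, start=1)]
    return "Complete the following steps in order:\n" + "\n".join(lines)


if __name__ == "__main__":
    stages = [
        Stage("Find target page", "Open https://portal.example.com"),
        Stage("Authenticate", "Log in with the provided credentials"),
        Stage("Navigate", "Open Billing > Statements and select last month"),
        Stage("Extract", "Download the statement PDF and read the total amount"),
        Stage("Store", 'Return the total as JSON: {"total": <number>}'),
    ]
    print(build_task(stages))
```

Explicit ordering gives the agent checkpoints to reason about, and the final stage pins down the output format for storage.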
Strengths
- Native MCP Support: Can be used as an MCP server with Claude Desktop.
- High Success Rate: The project reports strong results on web-agent benchmarks such as WebVoyager; these figures are self-reported.
- Multi-LLM: Works with any major LLM through standard providers.
- Active Community: Rapidly growing GitHub star count (78k+).
Limitations
- Overhead: Driving a real browser is slower and more resource-intensive than API calls.
- Cost: High token consumption for vision-based or detailed DOM-reasoning tasks.
- Fragility: Still subject to breakage on major UI changes, though more robust than hard-coded XPath selectors.
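Given this fragility and the general flakiness of live pages, individual runs can fail. When retries are not handled by an external orchestrator, a small async wrapper can retry with exponential backoff. A minimal sketch; `run_with_retries` is a hypothetical helper, not part of Browser Use, and it retries on any exception, which you may want to narrow.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_with_retries(
    make_run: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 2.0,
) -> T:
    """Await make_run() up to `attempts` times with exponential backoff between failures.

    make_run must create a fresh awaitable on every call (a coroutine can only be awaited once).
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return await make_run()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles each time (base, 2*base, 4*base, ...) with +/-50% jitter
            delay = base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5)
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")
```

Usage would look like `history = await run_with_retries(lambda: Agent(task=task, llm=llm).run())`; building a fresh `Agent` per attempt avoids carrying over state from a failed run.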
When to use it
- When an application has no API but needs to be automated.
- For deep web research that requires multi-tab navigation or interactive sessions.
When not to use it
- When a fast, stable REST API is available for the same task.
- For high-frequency, low-latency data extraction.
Selection comments
- Prefer APIs first, browser automation second.
- Browser Use is strongest when the workflow is interactive, stateful, and human-like.
- Pair it with n8n for scheduling and retries, and with mem0 if the agent must remember prior interactions.
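The "APIs first, browser automation second" rule can be expressed as a small router. A sketch with hypothetical `api_call` and `browser_call` coroutines: in practice `api_call` would wrap a REST client and `browser_call` would wrap an `Agent(...).run()` plus result parsing.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional, Tuple


async def fetch_with_fallback(
    api_call: Callable[[], Awaitable[Optional[dict]]],
    browser_call: Callable[[], Awaitable[dict]],
) -> Tuple[str, dict]:
    """Try the API first; use browser automation only if the API fails or returns nothing.

    Returns (source, data) so callers can track how often the slower path is needed.
    """
    try:
        data = await api_call()
        if data is not None:
            return "api", data
        logging.info("API returned no data; falling back to browser automation")
    except Exception as exc:
        logging.warning("API call failed (%s); falling back to browser automation", exc)
    return "browser", await browser_call()


if __name__ == "__main__":
    async def api_unavailable() -> Optional[dict]:
        return None  # stand-in for "no API covers this task"

    async def browser_extract() -> dict:
        return {"price": 11}  # stand-in for an agent result parsed into a dict

    print(asyncio.run(fetch_with_fallback(api_unavailable, browser_extract)))
```

Returning the source alongside the data makes it easy to measure how often the expensive browser path actually runs.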
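Getting started
Installation
Requires Python 3.11 or newer.
```bash
pip install browser-use
# Older, Playwright-based releases also need a Chromium binary
playwright install chromium
```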
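Browser Use's import paths have changed across releases, so it helps to confirm which version is installed before copying snippets. A minimal check using only the standard library; `installed_browser_use_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_browser_use_version() -> Optional[str]:
    """Return the installed browser-use version string, or None if it is not installed."""
    try:
        return version("browser-use")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_browser_use_version() or "browser-use is not installed")
```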
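Basic Usage
The examples on this page use the LangChain-based interface of the 0.1.x releases. Newer releases ship their own chat-model classes (for example `from browser_use import Agent, ChatOpenAI`), so adjust imports to match the version you install.
```python
import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI  # reads OPENAI_API_KEY from the environment


async def main():
    # The agent plans and executes browser actions until it judges the task done
    agent = Agent(
        task="Go to Hacker News and find the top story about AI.",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    result = await agent.run()  # returns the run history, including the final answer
    print(result)


asyncio.run(main())
```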
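CLI examples
Command-line entry points have changed between releases; check `browser-use --help` for the version you install. Recent releases expose an interactive terminal UI through an optional extra:
```bash
# Install the optional CLI extra, then start the interactive terminal UI
pip install "browser-use[cli]"
browser-use
# Inside the UI, enter a task such as: Search for the latest news on SpaceX
```
A browser-based UI for creating tasks is maintained as a separate project (browser-use/web-ui) rather than as a flag on the core package.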
API examples
```python
import asyncio

from browser_use import Agent, Browser, BrowserConfig
from langchain_anthropic import ChatAnthropic  # reads ANTHROPIC_API_KEY from the environment


async def run_task():
    # Pass in your own Browser so you control its lifecycle;
    # headless=False opens a visible window, useful for watching or debugging a run
    browser = Browser(config=BrowserConfig(headless=False))
    agent = Agent(
        task="Log in to my dashboard and download the last 3 reports",
        llm=ChatAnthropic(model="claude-3-5-sonnet-20240620"),
        browser=browser,
    )
    try:
        history = await agent.run()
        print(f"Task completed. Steps taken: {len(history.history)}")
    finally:
        await browser.close()  # release the browser even if the run fails


asyncio.run(run_task())
```
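The dashboard task above logs in, so credentials must reach the agent somehow; writing them into `task` would send them to the LLM provider. Browser Use documents a `sensitive_data` argument on `Agent` (added during the 0.1.x series; confirm against your installed version) that maps placeholder names in the task to real values at action time. The hypothetical `load_sensitive_data` helper below only builds that mapping from environment variables; `PORTAL_USER`, `PORTAL_PASS`, and the dashboard URL are illustrative assumptions.

```python
import os


def load_sensitive_data(env_vars: dict[str, str]) -> dict[str, str]:
    """Map placeholder names used in the task text to secret values read from the environment.

    env_vars maps placeholder -> environment variable name, e.g. {"portal_user": "PORTAL_USER"}.
    Raises KeyError for the first missing variable so a run never starts half-configured.
    """
    secrets = {}
    for placeholder, var in env_vars.items():
        value = os.environ.get(var)
        if value is None:
            raise KeyError(f"missing environment variable {var}")
        secrets[placeholder] = value
    return secrets


# The task mentions only placeholder names; the real values never appear in the prompt text
TASK = (
    "Log in to https://dashboard.example.com with username portal_user "
    "and password portal_pass, then download the last 3 reports"
)
```

The result would then be passed as `Agent(task=TASK, llm=..., sensitive_data=load_sensitive_data({"portal_user": "PORTAL_USER", "portal_pass": "PORTAL_PASS"}))`.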
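Licensing and cost
- Open Source: Yes (MIT)
- Cost: Free to self-host; LLM API usage is billed separately by your model provider. A paid hosted cloud service is also offered.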
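Self-hosting is free, but each agent step is a separate LLM call that includes the current page state, so model spend grows with the number of steps. A back-of-the-envelope estimator, assuming roughly constant tokens per step (real runs vary); `estimate_run_cost` is a hypothetical helper and the numbers in the example are illustrative, not measured Browser Use figures or current list prices.

```python
def estimate_run_cost(
    steps: int,
    input_tokens_per_step: int,
    output_tokens_per_step: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Rough LLM cost of one agent run, in USD, assuming constant tokens per step."""
    input_cost = steps * input_tokens_per_step * usd_per_million_input / 1_000_000
    output_cost = steps * output_tokens_per_step * usd_per_million_output / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # Illustrative inputs: 20 steps, 8k input and 300 output tokens per step
    print(f"${estimate_run_cost(20, 8_000, 300, 2.50, 10.00):.2f}")
```

With these inputs, input tokens dominate (20 × 8,000 × $2.50 / 1M = $0.40 versus $0.06 for output), which is why trimming page state or step count usually matters more than output length.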
- Self-hostable: Yes
Related tools / concepts
Sources / References
Contribution Metadata
- Last reviewed: 2026-05-22
- Confidence: high