Scaling Web Automation with Pilo and Agentic AI
Building reliable AI web agents is incredibly complex. To solve this, we are open sourcing Pilo. Pilo is a robust execution engine that uses the accessibility tree, smart context compression, and agentic reasoning loops to reliably navigate the modern web. Run it entirely on your own hardware.

The Chaos of the Modern Web
Building reliable web automation is notoriously difficult. Modern websites are dynamic, heavily reliant on JavaScript, and constantly changing. When you add Large Language Models to the mix to create autonomous web agents, the complexity multiplies. You suddenly have to manage browser orchestration, keep token costs down while maintaining full page context, handle flaky networks, and build complex reasoning loops just to get an AI to reliably extract data or click a button. The result is a large amount of infrastructure in service of what should be a basic task.
We didn't just stumble upon these problems; we ran into them headfirst while building Tabstack. We needed a reliable engine to power our own /automate endpoint, but nothing off the shelf could handle the chaos of the modern web without constant breakage. So we built our own solution.
We call it Pilo. It became the bedrock of our platform, and we realized that developers everywhere need this same robust, transparent foundation. Today, we are very excited to share Pilo as an open source project, enabling anyone to run this powerful web automation engine completely on their own infrastructure.
You can find the full source code and documentation on GitHub.
Giving AI the Steering Wheel
At its core, Pilo moves away from rigid scripts and embraces an agentic loop built around Playwright's accessibility tree. Instead of telling a program exactly where to click, you give Pilo a natural language goal. Pilo then takes over and makes decisions at every step based on what it actually sees on the screen.
The system operates in a continuous, intelligent loop:
- Observe: Pilo captures the current page state using the browser's accessibility tree. This provides a semantic, stable structure of the page rather than a chaotic mess of raw HTML tags.
- Decide: Pilo passes this state to an LLM provider of your choice. The LLM evaluates the context and selects a specific tool to use next.
- Act: Pilo executes the chosen action, whether that is navigating to a URL, clicking a button, filling out a form, or extracting structured data.
Pilo handles all the gritty details under the hood. It features layered retries for flaky navigation, automatic context truncation to prevent context window bloat during long tasks, and a robust error recovery philosophy. If an action fails, Pilo does not simply crash. It feeds the error back to the LLM so the agent can adapt and try a new approach.
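The loop and its error-recovery behavior can be illustrated with a minimal Python sketch. This is not Pilo's actual code: ToolCall, ActionError, run_agent, and the stub functions are hypothetical names chosen for illustration, and the stubs stand in for a real browser and LLM provider. It only shows how a failed action becomes input to the next decision instead of ending the run.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """A tool selection returned by the model (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


class ActionError(Exception):
    """Hypothetical recoverable failure raised while executing a tool."""


def run_agent(goal, observe, decide, act, max_steps=10):
    """Run observe -> decide -> act until the model calls `done`."""
    history = []  # transcript of outcomes, passed to the model on every turn
    for _ in range(max_steps):
        snapshot = observe()                    # Observe: capture page state
        call = decide(goal, snapshot, history)  # Decide: model picks a tool
        if call.name == "done":
            return call.args.get("result"), history
        try:
            outcome = act(call)                 # Act: execute the chosen tool
            history.append(f"ok: {call.name} -> {outcome}")
        except ActionError as err:
            # Do not crash: record the failure so the next decision can adapt.
            history.append(f"error: {call.name} failed: {err}")
    raise RuntimeError("step budget exhausted before the agent finished")


# Stubs standing in for a browser and an LLM provider.
def fake_observe():
    return "button [e2]: Search"


def fake_decide(goal, snapshot, history):
    if not history:
        return ToolCall("click", {"ref": "e1"})  # stale reference
    if history[-1].startswith("error"):
        return ToolCall("click", {"ref": "e2"})  # adapted after the error
    return ToolCall("done", {"result": "clicked e2"})


def fake_act(call):
    if call.args["ref"] != "e2":
        raise ActionError(f"unknown ref {call.args['ref']}")
    return "clicked"


result, history = run_agent(
    "press the search button", fake_observe, fake_decide, fake_act
)
```

In this run the first click fails, the failure is appended to the history, and the stub model picks a different reference on the next turn.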

Anatomy of an Autonomous Action
To understand how Pilo handles the complexities of the web, let's look under the hood at a task like "find the best pizza restaurants in Seattle".
1. The Planning Phase: Before launching a browser, Pilo forces the LLM to pause and strategize. It calls a create_plan tool to generate a step-by-step path and, crucially, defines strict success criteria. For a search-heavy task like this, the planner might set the starting URL to about:blank, signaling the agent to trigger its search workflow immediately rather than waste time navigating to a specific homepage.
2. Layered Navigation: Navigating to a URL is the single most failure-prone operation in browser automation. Pilo wraps navigation in a layered defense system. It starts with timeout escalation, doubling the allowed wait time on each failure. If the network is truly flaky and throws a DNS error, Pilo will automatically kill and restart the entire browser instance to clear any bad state before retrying.
3. Compressing the Matrix: Once the page loads, Pilo needs to show it to the LLM. Passing raw HTML is too expensive and noisy, so Pilo captures the accessibility tree, a semantic map of the page's structure. It pipes this tree through a compression engine that maps verbose tags like listitem to li, shortens reference IDs, and deduplicates repetitive text. This process slashes token usage by 60 to 80 percent while keeping every interactive element accessible.
4. The Decision Loop: The LLM reviews the compressed snapshot and selects a tool, such as click. Pilo resolves the reference ID (e.g., e45) to a live Playwright locator and executes the action. If the page shifted and e45 is gone, Pilo catches the InvalidRefException and feeds it back to the LLM as a recoverable error. The agent sees the failure, realizes the page state has changed, and naturally requests a fresh snapshot to try again.
5. Quality Control: When the agent thinks it's finished and calls done, Pilo doesn't just take its word for it. It triggers an isolated validation step where a second LLM acts as a grader. It compares the final results against the success criteria defined in step one. If the agent found only two restaurants when the plan demanded five, the validator marks it as "partial" quality and sends the agent back to work with specific feedback on what's missing.
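The layered navigation described in step 2 can be sketched roughly as follows. This is an illustration under stated assumptions, not Pilo's implementation: NavigationTimeout, DnsError, navigate_with_retries, the 5000 ms base timeout, and the four-attempt limit are all hypothetical. The sketch doubles the timeout after a timeout failure and restarts the browser after a DNS failure; whether the real engine also escalates the timeout after a restart is not something the description above settles.

```python
class NavigationTimeout(Exception):
    """Hypothetical error: the page did not load within the allowed time."""


class DnsError(Exception):
    """Hypothetical error: the host name could not be resolved."""


def navigate_with_retries(url, goto, restart_browser,
                          base_timeout_ms=5000, max_attempts=4):
    """Try to load `url`, escalating the timeout and resetting bad state."""
    timeout_ms = base_timeout_ms
    last_error = None
    for _ in range(max_attempts):
        try:
            return goto(url, timeout_ms)
        except NavigationTimeout as err:
            last_error = err
            timeout_ms *= 2      # layer 1: give the next attempt twice as long
        except DnsError as err:
            last_error = err
            restart_browser()    # layer 2: fresh browser clears bad state
    raise RuntimeError(
        f"navigation to {url} failed after {max_attempts} attempts"
    ) from last_error


# Stub browser: two timeouts, then a DNS failure, then success.
seen_timeouts = []
restarts = []
failures = [NavigationTimeout("slow"), NavigationTimeout("slow"),
            DnsError("no such host")]


def fake_goto(url, timeout_ms):
    seen_timeouts.append(timeout_ms)
    if failures:
        raise failures.pop(0)
    return f"loaded {url}"


page = navigate_with_retries(
    "https://example.com", fake_goto, lambda: restarts.append("restart")
)
```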
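The compression in step 3 can be illustrated on a toy snapshot. The line format below (role, a [ref=...] marker, optional text) is an assumption made for this sketch and may not match the snapshot format Pilo actually uses. Only the listitem to li mapping comes from the description above. The link to a alias, the compress_snapshot function, and the rule of dropping text that repeats the previous node are hypothetical.

```python
import re

# listitem -> li is from the article; the link alias is an assumption.
ROLE_ALIASES = {"listitem": "li", "link": "a"}

# Assumed line shape: "<indent>- <role> [ref=<id>]: <optional text>"
LINE = re.compile(
    r"^(?P<indent>\s*)- (?P<role>\w+) \[ref=(?P<ref>\w+)\](?:: (?P<text>.*))?$"
)


def compress_snapshot(snapshot: str) -> str:
    """Shorten roles and refs, and drop text that repeats the previous node."""
    out = []
    previous_text = ""
    for line in snapshot.splitlines():
        m = LINE.match(line)
        if m is None:
            out.append(line)  # leave unrecognized lines untouched
            previous_text = ""
            continue
        role = ROLE_ALIASES.get(m["role"], m["role"])
        text = m["text"] or ""
        # A child that repeats its neighbor's label adds tokens, not meaning.
        repeated = bool(text) and text == previous_text
        suffix = f": {text}" if text and not repeated else ""
        out.append(f'{m["indent"]}- {role} [{m["ref"]}]{suffix}')
        previous_text = text
    return "\n".join(out)


snapshot = "\n".join([
    "- list [ref=e10]",
    "  - listitem [ref=e11]: Pizzeria A",
    "    - link [ref=e12]: Pizzeria A",
    "  - listitem [ref=e13]: Pizzeria B",
    "    - link [ref=e14]: Pizzeria B",
])
compressed = compress_snapshot(snapshot)
print(compressed)
```

Every reference ID survives the compression, so each element can still be targeted by a later action. The savings on this toy input say nothing about the reduction on real pages.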
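The quality-control step in step 5 follows a similar pattern: a result is accepted only when a separate grader agrees it meets the criteria from the plan. The sketch below is hypothetical throughout. Verdict, finish_with_validation, the min_results criterion, and the "complete" label are invented for illustration; only the "partial" label appears in the description above. A rule-based function stands in for the second LLM.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    quality: str        # "partial" is from the article; "complete" is assumed
    feedback: str = ""  # what the grader says is still missing


def finish_with_validation(criteria, attempt, grade, max_rounds=3):
    """Accept a result only once the grader marks it complete."""
    feedback = ""
    for _ in range(max_rounds):
        result = attempt(feedback)         # agent works, guided by feedback
        verdict = grade(criteria, result)  # isolated check against the plan
        if verdict.quality == "complete":
            return result
        feedback = verdict.feedback        # send the agent back to work
    raise RuntimeError("validator did not accept the result in time")


# Stubs: the agent finds two restaurants first, then five after feedback.
feedback_seen = []


def fake_attempt(feedback):
    feedback_seen.append(feedback)
    found = ["A", "B"]
    if feedback:
        found += ["C", "D", "E"]
    return found


def fake_grade(criteria, result):
    missing = criteria["min_results"] - len(result)
    if missing > 0:
        return Verdict("partial", f"find {missing} more restaurants")
    return Verdict("complete")


restaurants = finish_with_validation(
    {"min_results": 5}, fake_attempt, fake_grade
)
```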
Seeing is Believing: Pilo in Your Browser
While Pilo is a powerful backend engine, we wanted to make its capabilities completely tangible. Because the event system decouples the core logic from where it runs, Pilo is built to work seamlessly in different contexts, including a browser extension. You can install the extension, give it a natural language prompt, and literally watch the agentic loop drive your browser in real time. It is one thing to read about accessibility trees and LLM decision making; it is entirely another to sit back and watch an AI navigate, click, and extract data across the live web right in front of your eyes. It provides a perfect sandbox to test tasks and understand exactly how Pilo thinks and operates.
The Engine Inside Tabstack
We built Pilo to be completely standalone. You can run it locally, plug in your own API keys, and use it to power your own applications without ever signing up for Tabstack. Our goal is to provide developers with a reliable, open source foundation for web agents.
However, building and scaling this kind of infrastructure in production is incredibly resource intensive. You have to manage compute pools, handle persistent browser sessions, and scale concurrent tasks. This exact challenge is why we built Tabstack in the first place.
Pilo is the core engine that underlies our /automate endpoint. If you want to leverage the power and resilience of Pilo but do not want to manage the underlying infrastructure, browser orchestration, or scaling logistics, Tabstack provides all of this out of the box.