Every major AI platform can now browse websites autonomously. Chrome’s auto browse scrolls and clicks. ChatGPT Atlas fills forms and completes purchases. Perplexity Comet researches across tabs. But none of these agents sees your website the way a human does.
This is Part 4 in a five-part series on optimizing websites for the agentic web. Part 1 covered the evolution from SEO to AAIO. Part 2 explained how to get your content cited in AI responses. Part 3 mapped the protocols forming the infrastructure layer. This article gets technical: how AI agents actually perceive your website, and what to build for them.
The core insight is one that keeps coming up in my research: The most impactful thing you can do for AI agent compatibility is the same work web accessibility advocates have been pushing for decades. The accessibility tree, originally built for screen readers, is becoming the primary interface between AI agents and your website.
According to the 2025 Imperva Bad Bot Report (Imperva is a cybersecurity company), automated traffic surpassed human traffic for the first time in 2024, constituting 51% of all web interactions. Not all of that is agentic browsing, but the direction is clear: the non-human audience for your website is already larger than the human one, and it’s growing. Throughout this article, we draw only from official documentation, peer-reviewed research, and announcements from the companies building this infrastructure.
Three Ways Agents See Your Website
When a human visits your website, they see colors, layout, images, and typography. When an AI agent visits, it sees something entirely different. Understanding what agents actually perceive is the foundation for building websites that work for them.
The major AI platforms use three distinct approaches, and the differences have direct implications for how you should structure your website.
Vision: Reading Screenshots
Anthropic’s Computer Use takes the most literal approach. Claude captures screenshots of the browser, analyzes the visual content, and decides what to click or type based on what it “sees.” It’s a continuous feedback loop: screenshot, reason, act, screenshot. The agent operates at the pixel level, identifying buttons by their visual appearance and reading text from the rendered image.
Google’s Project Mariner follows a similar pattern with what Google describes as an “observe-plan-act” loop: observe captures visual elements and underlying code structures, plan formulates action sequences, and act simulates user interactions. Mariner achieved an 83.5% success rate on the WebVoyager benchmark.
The vision approach works, but it’s computationally expensive, sensitive to layout changes, and limited by what’s visually rendered on screen.
Accessibility Tree: Reading Structure
ChatGPT Atlas uses ARIA tags, the same labels and roles that support screen readers, to interpret page structure and interactive elements.
Atlas is built on Chromium, but rather than analyzing rendered pixels, it queries the accessibility tree for elements with specific roles (“button”, “link”) and accessible names. This is the same data structure that screen readers like VoiceOver and NVDA use to help people with visual disabilities navigate the web.
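To make "querying the accessibility tree by role and name" concrete, here is a minimal sketch. The `AXNode` shape, the `findByRole` helper, and the sample page are all invented for illustration; they are not Atlas's internal API, which is not public in this form.

```typescript
// Hypothetical, simplified shape of one accessibility-tree node.
interface AXNode {
  role: string; // e.g. "button", "link", "textbox"
  name: string; // accessible name; empty when the element is unlabeled
  children?: AXNode[];
}

// Depth-first search for nodes with a given role and, optionally, an exact name.
function findByRole(root: AXNode, role: string, name?: string): AXNode[] {
  const matches: AXNode[] = [];
  const visit = (node: AXNode): void => {
    if (node.role === role && (name === undefined || node.name === name)) {
      matches.push(node);
    }
    for (const child of node.children ?? []) {
      visit(child);
    }
  };
  visit(root);
  return matches;
}

// Illustrative tree for a flight-search page (invented for this example).
const samplePage: AXNode = {
  role: "main",
  name: "",
  children: [
    { role: "heading", name: "Find a flight" },
    { role: "textbox", name: "From" },
    { role: "textbox", name: "To" },
    { role: "button", name: "Search flights" },
    { role: "link", name: "Manage booking" },
  ],
};

const searchButtons = findByRole(samplePage, "button", "Search flights");
console.log(searchButtons.length); // 1
```

The point of the sketch is that the lookup needs no pixel coordinates or CSS classes: a role and a name are enough, provided the page actually exposes them.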
Microsoft’s Playwright MCP, the official MCP server for browser automation, takes the same approach. It provides accessibility snapshots rather than screenshots, giving AI models a structured representation of the page. Microsoft deliberately chose accessibility data over visual rendering for their browser automation standard.
Hybrid: Both At Once
In practice, the most capable agents combine approaches. OpenAI’s Computer-Using Agent (CUA), which powers both Operator and Atlas, layers screenshot analysis with DOM processing and accessibility tree parsing. It prioritizes ARIA labels and roles, falling back to text content and structural selectors when accessibility data isn’t available.
Perplexity’s research confirms the same pattern. Their BrowseSafe paper, which details the safety infrastructure behind Comet’s browser agent, describes using “hybrid context management combining accessibility tree snapshots with selective vision.”
| Platform | Primary Approach | Details |
| --- | --- | --- |
| Anthropic Computer Use | Vision (screenshots) | Screenshot, reason, act feedback loop |
| Google Project Mariner | Vision + code structure | Observe-plan-act with visual and structural data |
| OpenAI Atlas | Accessibility tree | Explicitly uses ARIA tags and roles |
| OpenAI CUA | Hybrid | Screenshots + DOM + accessibility tree |
| Microsoft Playwright MCP | Accessibility tree | Accessibility snapshots, no screenshots |
| Perplexity Comet | Hybrid | Accessibility tree + selective vision |
The pattern is clear. Even platforms that started with vision-first approaches are incorporating accessibility data. And the platforms optimizing for reliability and efficiency (Atlas, Playwright MCP) lead with the accessibility tree.
Your website’s accessibility tree isn’t a compliance artifact. It’s increasingly the primary interface agents use to understand and interact with your website.
Last year, before the European Accessibility Act took effect, I half-joked that it would be ironic if the thing that finally got people to care about accessibility was AI agents, not the people accessibility was designed for. That’s no longer a joke.
The Accessibility Tree Is Your Agent Interface
The accessibility tree is a simplified representation of your page’s DOM that browsers generate for assistive technologies. Where the full DOM contains every div, span, style, and script, the accessibility tree strips away the noise and exposes only what matters: interactive elements, their roles, their names, and their states.
This is why it works so well for agents. A typical page’s DOM might contain thousands of nodes. The accessibility tree reduces that to the elements a user (or agent) can actually interact with: buttons, links, form fields, headings, landmarks. For AI models that process web pages within a limited context window, that reduction is significant.
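The reduction described above can be sketched in a few lines. This is a deliberately tiny model, not how browsers actually build the tree: the `DomNode` shape is hypothetical, only a handful of implicit roles are mapped, static text nodes are dropped, and real accessible-name computation is far more involved.

```typescript
// Hypothetical, stripped-down DOM node used only for this sketch.
interface DomNode {
  tag: string;
  text?: string;
  href?: string;
  children?: DomNode[];
}

interface ExposedNode {
  role: string;
  name: string;
}

// Tiny subset of the implicit roles browsers assign to native elements.
const IMPLICIT_ROLES: { [tag: string]: string } = {
  button: "button",
  nav: "navigation",
  main: "main",
  h1: "heading",
  h2: "heading",
};

// Roles whose accessible name comes from their text content (simplified).
const NAMED_FROM_CONTENT = ["button", "link", "heading"];

function roleOf(node: DomNode): string | undefined {
  // An <a> only counts as a link when it has an href.
  if (node.tag === "a") return node.href !== undefined ? "link" : undefined;
  const role = IMPLICIT_ROLES[node.tag];
  return typeof role === "string" ? role : undefined;
}

// Concatenate a node's own text with the text of all descendants.
function textOf(node: DomNode): string {
  const nested = (node.children ?? []).map(textOf).join(" ");
  return `${node.text ?? ""} ${nested}`.replace(/\s+/g, " ").trim();
}

// Keep only nodes that carry a role; generic <div>/<span> wrappers vanish.
function exposedNodes(node: DomNode): ExposedNode[] {
  const result: ExposedNode[] = [];
  const role = roleOf(node);
  if (role !== undefined) {
    const named = NAMED_FROM_CONTENT.indexOf(role) !== -1;
    result.push({ role: role, name: named ? textOf(node) : "" });
  }
  for (const child of node.children ?? []) {
    result.push(...exposedNodes(child));
  }
  return result;
}

function countNodes(node: DomNode): number {
  return 1 + (node.children ?? []).reduce((sum, c) => sum + countNodes(c), 0);
}

// Invented sample markup: wrappers around one nav link, a heading, a button.
const sampleDom: DomNode = {
  tag: "div",
  children: [
    { tag: "div", children: [{ tag: "span", text: "Sale ends soon" }] },
    { tag: "nav", children: [{ tag: "a", href: "/deals", text: "Deals" }] },
    {
      tag: "main",
      children: [
        { tag: "h1", text: "Find a flight" },
        { tag: "div", children: [{ tag: "button", children: [{ tag: "span", text: "Search flights" }] }] },
      ],
    },
  ],
};

console.log(countNodes(sampleDom)); // 10
console.log(exposedNodes(sampleDom).length); // 5
```

Even in this toy case half the nodes disappear. On a production page with thousands of wrapper elements, the ratio is what makes the tree fit in a model's context window.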
OpenAI’s guidance for site owners makes the same point: “Follow WAI-ARIA best practices by adding descriptive roles, labels, and states to interactive elements like buttons, menus, and forms. This helps ChatGPT recognize what each element does and interact with your site more accurately.”
And:
“Making your website more accessible helps ChatGPT Agent in Atlas understand it better.”
Research backs this up. The most rigorous data comes from a UC Berkeley and University of Michigan study published for CHI 2026, the premier academic conference on human-computer interaction. The researchers tested Claude Sonnet 4.5 on 60 real-world web tasks under different accessibility conditions, collecting 40.4 hours of interaction data across 158,325 events. The results were striking:
| Condition | Task Success Rate | Avg. Completion Time |
| --- | --- | --- |
| Standard (default) | 78.33% | 324.87 seconds |
| Keyboard-only | 41.67% | 650.91 seconds |
| Magnified viewport | 28.33% | 1,072.20 seconds |
Under standard conditions, the agent succeeded nearly 80% of the time. Restrict it to keyboard-only interaction (simulating how screen reader users navigate) and success drops to 42%, taking twice as long. Restrict the viewport (simulating magnification tools), and success drops to 28%, taking over three times as long.
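The "twice as long" and "over three times as long" figures follow directly from the table. A short sketch of the arithmetic, using only the numbers reported above (the type and function names are mine):

```typescript
interface ConditionResult {
  condition: string;
  successRate: number; // percent of tasks completed
  avgSeconds: number; // average completion time
}

// Figures copied from the results table above.
const standardRun: ConditionResult = { condition: "Standard (default)", successRate: 78.33, avgSeconds: 324.87 };
const constrainedRuns: ConditionResult[] = [
  { condition: "Keyboard-only", successRate: 41.67, avgSeconds: 650.91 },
  { condition: "Magnified viewport", successRate: 28.33, avgSeconds: 1072.2 },
];

// Slowdown factor and percentage points of success lost versus the baseline.
function compareToBaseline(row: ConditionResult, base: ConditionResult): { slowdown: string; pointsLost: string } {
  return {
    slowdown: (row.avgSeconds / base.avgSeconds).toFixed(2),
    pointsLost: (base.successRate - row.successRate).toFixed(2),
  };
}

for (const row of constrainedRuns) {
  const delta = compareToBaseline(row, standardRun);
  console.log(`${row.condition}: ${delta.slowdown}x slower, ${delta.pointsLost} points lower`);
}
// Keyboard-only: 2.00x slower, 36.66 points lower
// Magnified viewport: 3.30x slower, 50.00 points lower
```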
The paper identifies three classes of gaps:
Perception gaps: agents can’t reliably access screen reader announcements or ARIA state changes that would tell them what happened after an action.
Cognitive gaps: agents struggle to track task state across multiple steps.
Action gaps: agents underutilize keyboard shortcuts and fail at interactions like drag-and-drop.
The implication is direct. Websites that present a rich, well-labeled accessibility tree give agents the information they need to succeed. Websites that rely on visual cues, hover states, or complex JavaScript interactions without accessible alternatives create the conditions for agent failure.
Perplexity’s search API architecture paper from September 2025 reinforces this from the content side. Their indexing system prioritizes content that is “high quality in both substance and form, with information captured in a manner that preserves the original content structure and layout.” Websites “heavy on well-structured data in list or table form” benefit from “more formulaic parsing and extraction rules.” Structure isn’t just helpful. It’s what makes reliable parsing possible.
Semantic HTML: The Agent Basis
The accessibility tree is built from your HTML. Use semantic elements, and the browser generates a useful accessibility tree automatically. Skip them, and the tree is sparse or misleading.
This isn’t new advice. Web standards advocates have been shouting “use semantic HTML” for 20 years. Not everyone listened. What’s new is that the audience has expanded. It used to be about screen readers and a relatively small percentage of users. Now it’s about every AI agent that visits your website.
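Use native elements. A <button> element automatically appears in the accessibility tree with the role “button” and its text content as the accessible name. A <div> with a click handler does not, so the agent doesn’t know it’s clickable. For example, <button>Search flights</button> is exposed as a button named “Search flights”; the same text inside a clickable <div> is exposed as plain text.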
Label your forms. Every input needs an associated label. Agents read labels to understand what data a field expects.
The autocomplete attribute deserves attention. It tells agents (and browsers) exactly what kind of data a field expects, using standardized values like name, email, tel, street-address, and organization. When an agent fills a form on someone’s behalf, autocomplete attributes make the difference between confident field mapping and guessing.
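A rough sketch of why autocomplete tokens matter to an agent filling a form. The `FormField` shape, the `planFill` helper, and the profile data are hypothetical; only the token names (name, email, tel) come from the HTML standard. How any particular agent maps fields internally is an assumption here.

```typescript
// Hypothetical description of a form field as an agent might read it.
interface FormField {
  label: string;
  autocomplete?: string; // value of the HTML autocomplete attribute, if present
}

interface FieldPlan {
  label: string;
  value: string | null;
  confident: boolean;
}

// Data the user asked the agent to enter, keyed by standard autocomplete tokens.
const userProfile: { [token: string]: string } = {
  name: "Ada Example",
  email: "ada@example.com",
  tel: "+1 555 0100",
};

// With a known token the mapping is a lookup; without one the agent must guess.
function planFill(fields: FormField[], profile: { [token: string]: string }): FieldPlan[] {
  return fields.map((field): FieldPlan => {
    const value = field.autocomplete === undefined ? undefined : profile[field.autocomplete];
    if (typeof value === "string") {
      return { label: field.label, value: value, confident: true };
    }
    // No usable token: the agent would have to infer meaning from the label text.
    return { label: field.label, value: null, confident: false };
  });
}

const checkoutFields: FormField[] = [
  { label: "Full name", autocomplete: "name" },
  { label: "Email", autocomplete: "email" },
  { label: "Contact" }, // ambiguous: phone? email? no autocomplete hint
];

const fillPlan = planFill(checkoutFields, userProfile);
console.log(fillPlan.filter((p) => !p.confident).map((p) => p.label)); // [ 'Contact' ]
```

The field labeled "Contact" is the one an agent is most likely to get wrong, and a single attribute would remove the ambiguity.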
Establish heading hierarchy. Use h1 through h6 in logical order. Agents use headings to understand page structure and locate specific content sections. Skipped levels (jumping from h1 to h4) create confusion about content relationships.
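Skipped levels are easy to detect mechanically. A minimal sketch, assuming you have already extracted the heading levels of a page in document order (the function name is mine):

```typescript
// Returns one message for each heading that jumps more than one level deeper
// than the heading before it. Moving back up (h4 to h2) is allowed.
function findSkippedLevels(levels: number[]): string[] {
  const problems: string[] = [];
  let previous = 0; // 0 means "no heading seen yet"
  for (const level of levels) {
    if (level > previous + 1) {
      const before = previous === 0 ? "the start of the page" : `h${previous}`;
      problems.push(`h${level} follows ${before}`);
    }
    previous = level;
  }
  return problems;
}

console.log(findSkippedLevels([1, 2, 3, 2, 3])); // []
console.log(findSkippedLevels([1, 4, 2])); // [ 'h4 follows h1' ]
```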
Use landmark regions. HTML5 landmark elements (<header>, <nav>, <main>, <aside>, <footer>) tell agents where they are on the page. A <nav> element is unambiguously navigation. A <div> with a navigation-sounding class name requires interpretation. Clarity for the win, always.
Microsoft’s Playwright test agents, introduced in October 2025, generate test code that uses accessible selectors by default. When the AI generates a Playwright test, it writes:
const todoInput = page.getByRole('textbox', { name: 'What needs to be done?' });
Not CSS selectors. Not XPath. Accessible roles and names. Microsoft built its AI testing tools to find elements the same way screen readers do, because it’s more reliable.
The final slide of my Conversion Hotel keynote about optimizing websites for AI agents. (Image Credit: Slobodan Manic)
ARIA: Helpful, Not Magic
OpenAI recommends ARIA (Accessible Rich Internet Applications), the W3C standard for making dynamic web content accessible. But ARIA is a supplement, not a substitute. Like protein shakes: useful on top of a real diet, counterproductive as a replacement for actual food.
The W3C’s first rule of ARIA use reads: “If you can use a native HTML element or attribute with the semantics and behavior you require already built in, instead of re-purposing an element and adding an ARIA role, state or property to make it accessible, then do so.”
The fact that the W3C had to make “don’t use ARIA” the first rule of ARIA tells you everything about how often it gets misused.
Adrian Roselli, a recognized web accessibility expert, raised an important concern in his October 2025 analysis of OpenAI’s guidance. He argues that recommending ARIA without sufficient context risks encouraging misuse. Websites that use ARIA are generally less accessible according to WebAIM’s annual survey of the top million websites, because ARIA is often applied incorrectly as a band-aid over poor HTML structure. Roselli warns that OpenAI’s guidance could incentivize practices like keyword-stuffing in aria-label attributes, the same kind of gaming that plagued meta keywords in early SEO.
The right approach is layered:
Start with semantic HTML. Use <button>, <a>, <label>, <select>, and other native elements. These work correctly by default.
Add ARIA when native HTML isn’t enough. Custom components that don’t have HTML equivalents (tab panels, tree views, disclosure widgets) need ARIA roles and states to be understandable.
Use ARIA states for dynamic content. When JavaScript changes the page, ARIA attributes such as aria-expanded communicate what happened.
Keep aria-label descriptive and honest. Use it to provide context that isn’t visible on screen, like distinguishing between multiple “Delete” buttons on the same page. Don’t stuff it with keywords.
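As an illustration of the third point, here is a minimal sketch of a menu toggle that keeps aria-expanded in sync with what is visible. The `AttrTarget` interface is my own minimal subset of the DOM Element API (real elements satisfy it), and `FakeElement` is a stand-in so the sketch runs outside a browser. The element names are invented.

```typescript
// Minimal attribute interface; browser DOM Elements satisfy it.
interface AttrTarget {
  getAttribute(name: string): string | null;
  setAttribute(name: string, value: string): void;
  removeAttribute(name: string): void;
}

// Flip a disclosure open or closed and report the new state to assistive
// technology (and agents) through aria-expanded on the trigger.
function toggleDisclosure(trigger: AttrTarget, panel: AttrTarget): boolean {
  const willOpen = trigger.getAttribute("aria-expanded") !== "true";
  trigger.setAttribute("aria-expanded", String(willOpen));
  if (willOpen) {
    panel.removeAttribute("hidden");
  } else {
    panel.setAttribute("hidden", "");
  }
  return willOpen;
}

// Stand-in element for running the sketch without a DOM (illustration only).
class FakeElement implements AttrTarget {
  private attrs: { [name: string]: string } = {};
  getAttribute(name: string): string | null {
    const value = this.attrs[name];
    return value === undefined ? null : value;
  }
  setAttribute(name: string, value: string): void {
    this.attrs[name] = value;
  }
  removeAttribute(name: string): void {
    delete this.attrs[name];
  }
}

const menuButton = new FakeElement();
menuButton.setAttribute("aria-expanded", "false");
menuButton.setAttribute("aria-controls", "main-menu");
const menuPanel = new FakeElement();
menuPanel.setAttribute("hidden", "");

const isOpen = toggleDisclosure(menuButton, menuPanel);
console.log(isOpen, menuButton.getAttribute("aria-expanded")); // true true
```

In a browser you would pass the real button and panel elements from a click handler. The design choice worth copying is that the state attribute and the visual change happen in the same function, so they cannot drift apart.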
The principle is the same one that applies to good SEO: build for the user first, optimize for the system second. Semantic HTML is building for the user. ARIA is fine-tuning for edge cases where HTML falls short.
The Rendering Query
Browser-based agents like Chrome auto browse, ChatGPT Atlas, and Perplexity Comet run on Chromium. They execute JavaScript. They can render your single-page application.
But not everything that visits your website is a full browser agent.
AI crawlers (PerplexityBot, OAI-SearchBot, ClaudeBot) index your content for retrieval and citation. Many of these crawlers do not execute client-side JavaScript. If your page is a blank shell until React renders on the client, these crawlers see an empty page. Your content is invisible to the AI search ecosystem.
Part 2 of this series covered the citation side: AI systems select fragments from indexed content. If your content isn’t in the initial HTML, it’s not in the index. If it’s not in the index, it doesn’t get cited. Server-side rendering isn’t just a performance optimization.
It’s a visibility requirement.
Even for full browser agents, JavaScript-heavy websites create friction. Dynamic content that loads after interactions, infinite scroll that never signals completion, and forms that rebuild themselves after each input all create opportunities for agents to lose track of state. The A11y-CUA research attributed part of agent failure to “cognitive gaps”: agents losing track of what’s happening during complex multi-step interactions. Simpler, more predictable rendering reduces these failures.
Microsoft’s guidance from Part 2 applies here directly: “Don’t hide important answers in tabs or expandable menus: AI systems may not render hidden content, so key details may be skipped.” If information matters, put it in the visible HTML. Don’t require interaction to reveal it.
Practical rendering priorities:
Server-side render or pre-render content pages. If an AI crawler can’t see it, it doesn’t exist in the AI ecosystem.
Avoid blank-shell SPAs for content pages. Frameworks like Next.js (which powers this website), Nuxt, and Astro make SSR straightforward.
Don’t hide critical information behind interactions. Prices, specifications, availability, and key details should be in the initial HTML, not behind accordions or tabs.
Use standard links for navigation. Client-side routing that doesn’t update the URL or uses onClick handlers instead of real links breaks agent navigation.
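The first and third priorities above can be spot-checked with a very simple test: does the raw HTML response, before any JavaScript runs, contain the phrases that matter? A naive sketch, with invented sample markup and a helper name of my own:

```typescript
// Reports which key phrases are absent from the raw HTML response.
// Naive substring match: it does not parse HTML, so text that only appears
// inside a <script> payload would still count as present.
function missingFromInitialHtml(html: string, phrases: string[]): string[] {
  const haystack = html.toLowerCase();
  return phrases.filter((phrase) => haystack.indexOf(phrase.toLowerCase()) === -1);
}

// Invented examples of what a non-rendering crawler would receive.
const blankShell =
  '<html><body><div id="root"></div><script src="/app.js"></script></body></html>';
const serverRendered =
  "<html><body><main><h1>Trail Shoe X</h1><p>Price: $129</p></main></body></html>";

const mustHave = ["Trail Shoe X", "Price: $129"];
console.log(missingFromInitialHtml(blankShell, mustHave)); // [ 'Trail Shoe X', 'Price: $129' ]
console.log(missingFromInitialHtml(serverRendered, mustHave)); // []
```

To run this against a live page, pass in the response body of a plain HTTP request (for example via `fetch(url)` followed by `response.text()`), which returns the HTML without executing scripts.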
Testing Your Agent Interface
You wouldn’t ship a website without testing it in a browser. Testing how agents perceive your website is becoming equally important.
Screen reader testing is the best proxy. If VoiceOver (macOS), NVDA (Windows), or TalkBack (Android) can navigate your website successfully, identifying buttons, reading form labels, and following the content structure, agents can likely do the same. Both audiences rely on the same accessibility tree. This isn’t a perfect proxy (agents have capabilities screen readers don’t, and vice versa), but it catches the majority of issues.
Microsoft’s Playwright MCP provides direct accessibility snapshots. If you want to see exactly what an AI agent sees, Playwright MCP generates structured accessibility snapshots of any page. These snapshots strip away visual presentation and show you the roles, names, and states that agents work with. Published as @playwright/mcp on npm, it’s the most direct way to view your website through an agent’s eyes.
The output is a nested outline of roles and accessible names. If your critical interactive elements don’t appear in the snapshot, or appear without useful names, agents will struggle with your website.
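The check described above (interactive elements that appear without useful names) can be automated once a snapshot has been loaded into a tree. The sketch below uses a hypothetical `SnapshotNode` shape, not the actual Playwright MCP output format, and the list of interactive roles is a small illustrative subset.

```typescript
// Hypothetical tree shape; adapt it to whatever your snapshot tool emits.
interface SnapshotNode {
  role: string;
  name?: string;
  children?: SnapshotNode[];
}

// Small illustrative subset of roles an agent needs to act on.
const INTERACTIVE_ROLES = ["button", "link", "textbox", "checkbox", "combobox"];

// Returns the role path of every interactive node with a missing or blank name.
function findUnnamedControls(node: SnapshotNode, trail: string[] = []): string[] {
  const here = trail.concat(node.role);
  const found: string[] = [];
  const isInteractive = INTERACTIVE_ROLES.indexOf(node.role) !== -1;
  if (isInteractive && (node.name ?? "").trim() === "") {
    found.push(here.join(" > "));
  }
  for (const child of node.children ?? []) {
    found.push(...findUnnamedControls(child, here));
  }
  return found;
}

// Invented snapshot: one labeled field, one icon-only button, one blank link.
const sampleSnapshot: SnapshotNode = {
  role: "main",
  children: [
    { role: "textbox", name: "Email" },
    { role: "button" },
    { role: "form", children: [{ role: "link", name: "  " }] },
  ],
};

console.log(findUnnamedControls(sampleSnapshot));
// [ 'main > button', 'main > form > link' ]
```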
Browserbase’s Stagehand (v3, released October 2025, and humbly self-described as “the best browser automation framework”) provides another angle. It parses both DOM and accessibility trees, and its self-healing execution adapts to DOM changes in real time. It’s useful for testing whether agents can complete specific workflows on your website, like filling a form or completing a checkout.
The Lynx browser is a low-tech option worth trying. It’s a text-only browser that strips away all visual rendering, showing you roughly what a non-visual agent parses. A trick I picked up from Jes Scholz on the podcast.
A practical testing workflow:
Run VoiceOver or NVDA through your website’s key user flows. Can you complete the core tasks without vision?
Generate Playwright MCP accessibility snapshots of critical pages. Are interactive elements labeled and identifiable?
View your page source. Is the primary content in the HTML, or does it require JavaScript to render?
Load your page in Lynx or disable CSS and check if the content order and hierarchy still make sense. Agents don’t see your layout.
A Checklist For Your Development Team
If you’re sharing this article with your developers (and you should), here’s the prioritized implementation list. It is ordered by impact and effort, starting with the changes that affect the most agent interactions for the least work.
High impact, low effort:
Use native HTML elements. <button> for actions, <a href> for links, <select> for dropdowns. Replace <div onclick> patterns wherever they exist.
Label every form input. Associate <label> elements with inputs using the for attribute. Add autocomplete attributes with standard values.
Server-side render content pages. Ensure primary content is in the initial HTML response.
High impact, moderate effort:
Implement landmark regions. Wrap content in <header>, <nav>, <main>, and <footer> elements. Add aria-label when multiple landmarks of the same type exist on the same page.
Fix heading hierarchy. Ensure a single h1, with h2 through h6 in logical order without skipping levels.
Move critical content out of hidden containers. Prices, specifications, and key details should not require clicks or interactions to reveal.
Moderate impact, low effort:
Add ARIA states to dynamic components. Use aria-expanded, aria-controls, and aria-hidden for menus, accordions, and toggles.
Use descriptive link text. “Read the full report” instead of “Click here.” Agents use link text to understand where links lead.
Test with a screen reader. Make it part of your QA process, not a one-time audit.
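Some checklist items lend themselves to automated linting. As one example, the "descriptive link text" item can be approximated with a short filter. The list of generic phrases below is my own starting point, not a standard, and you would extend it for your site's language and conventions.

```typescript
// Phrases that say nothing about a link's destination (illustrative list).
const GENERIC_LINK_TEXT = ["click here", "here", "read more", "learn more", "more"];

// True when the link text, ignoring case and trailing punctuation, is generic.
function isGenericLinkText(text: string): boolean {
  const normalized = text
    .trim()
    .toLowerCase()
    .replace(/[.!»›\s]+$/, "");
  return GENERIC_LINK_TEXT.indexOf(normalized) !== -1;
}

console.log(isGenericLinkText("Click here")); // true
console.log(isGenericLinkText("Read more »")); // true
console.log(isGenericLinkText("Read the full report")); // false
```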
Key Takeaways
AI agents perceive websites through three approaches: vision, DOM parsing, and the accessibility tree. The industry is converging on the accessibility tree as the most reliable method. OpenAI Atlas, Microsoft Playwright MCP, and Perplexity’s Comet all rely on accessibility data.
Web accessibility is no longer just about compliance. The accessibility tree is the literal interface AI agents use to understand your website. The UC Berkeley/University of Michigan study shows agent success rates drop significantly when accessibility features are constrained.
Semantic HTML is the foundation. Native elements like <button>, <a>, <label>, and <nav> automatically create a useful accessibility tree. No framework required. No ARIA needed for the basics.
ARIA is a supplement, not a substitute. Use it for dynamic states and custom components. But start with semantic HTML and add ARIA only where native elements fall short. Misused ARIA makes websites less accessible, not more.
Server-side rendering is an agent visibility requirement. AI crawlers that don’t execute JavaScript can’t see content in blank-shell SPAs. If your content isn’t in the initial HTML, it doesn’t exist in the AI ecosystem.
Screen reader testing is the best proxy for agent compatibility. If VoiceOver or NVDA can navigate your website, agents probably can too. For direct inspection, Playwright MCP accessibility snapshots show exactly what agents see.
The first three parts of this series covered why the shift matters, how to get cited, and what protocols are being built. This article covered the implementation layer. The encouraging news is that these aren’t separate workstreams. Accessible, well-structured websites perform better for humans, rank better in search, get cited more often by AI, and work better for agents. It’s the same work serving four audiences.
And the work builds on itself. The semantic HTML and structured data covered here are exactly what WebMCP builds on for its declarative form approach. The accessibility tree your website exposes today becomes the foundation for the structured tool interfaces of tomorrow.
Up next in Part 5: the commerce layer. How Stripe, Shopify, and OpenAI are building the infrastructure for AI agents to complete purchases, and what it means for your checkout flow.