F12: Agent Vision & Browser Automation

Problem Statement

Today, Claude Code and Forge agents are blind. They can write code, publish pages, and call APIs, but they can't verify what a page actually looks like, whether a button works, or whether a layout broke. Visual verification requires a human to open a browser, take a screenshot, and paste it back.

This creates three gaps:

No post-publish verification — pages go live without visual QA
No wizard/UI testing — F1 Page Builder "shipped" but nobody confirmed it actually works
No agent autonomy — Forge agents can't navigate, click, or interact with web UIs

Architecture

Claude Code (local)            Forge VPS Agents
        │                             │
        │ MCP tool calls              │ MCP tool calls
        ▼                             ▼
┌─────────────────────────────────────────────┐
│             Browser MCP Server              │
│  ┌──────────┬──────────┬──────────────────┐ │
│  │screenshot│ navigate │  click / fill    │ │
│  │   tool   │   tool   │       tool       │ │
│  └──────────┴──────────┴──────────────────┘ │
│              ┌──────────────┐               │
│              │  Auth Store  │               │
│              │  (cookies,   │               │
│              │   sessions)  │               │
│              └──────────────┘               │
│                     │                       │
│              ┌──────────────┐               │
│              │  Playwright  │               │
│              │  (Chromium)  │               │
│              └──────────────┘               │
└─────────────────────────────────────────────┘
                      │
                      ▼
Live web pages (ideas.asapai.net, dashboard, etc.)

Key Decision

Playwright as the base layer. It's proven, headless-capable, supports auth persistence, and runs on both Windows (local) and Linux (VPS). Wrapped in an MCP server so any agent — Claude Code, Forge commander, outreach agents — can call it as a tool.

Key Decision

Implementation lives in the Forge repo. Browser automation is agent infrastructure, not app code. folio-saas consumes it via MCP config; outreach-agents consume it for prospect research and LinkedIn interaction.

MCP Server Tools

The Browser MCP server exposes these tools to any connected agent:

Tool	Parameters	Returns	Use Case
`screenshot`	url, selector?, fullPage?	PNG image (base64 or file path)	Visual verification after publish
`navigate`	url	Page title, status code, final URL	Check redirects, verify page loads
`click`	selector	Success/failure, new page state	Test buttons, wizard steps, nav
`fill`	selector, value	Success/failure	Fill forms, login fields
`evaluate`	javascript	Return value	Check DOM state, read elements
`login`	site (enum), credentials?	Auth cookie stored	Authenticate to Supabase, dashboard
`pdf`	url	PDF file path	Generate PDF from page
`wait_for`	selector, timeout?	Found/timeout	Wait for dynamic content to load
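
The table above can be read as a small schema: each tool has required parameters and optional ones (those marked with ?). The following is a minimal Python sketch of validating an incoming tool call against that table before it reaches the browser. The names ToolSpec, TOOLS, and validate_call are illustrative assumptions, not part of any MCP SDK or of the Forge codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """Parameter contract for one browser tool (mirrors the tool table)."""
    required: tuple[str, ...]
    optional: tuple[str, ...] = ()


# Parameters marked "?" in the table are treated as optional.
TOOLS: dict[str, ToolSpec] = {
    "screenshot": ToolSpec(("url",), ("selector", "fullPage")),
    "navigate": ToolSpec(("url",)),
    "click": ToolSpec(("selector",)),
    "fill": ToolSpec(("selector", "value")),
    "evaluate": ToolSpec(("javascript",)),
    "login": ToolSpec(("site",), ("credentials",)),
    "pdf": ToolSpec(("url",)),
    "wait_for": ToolSpec(("selector",), ("timeout",)),
}


def validate_call(tool: str, args: dict) -> list[str]:
    """Return a list of problems with a tool call; an empty list means valid."""
    spec = TOOLS.get(tool)
    if spec is None:
        return [f"unknown tool: {tool}"]
    problems = [f"missing parameter: {p}" for p in spec.required if p not in args]
    allowed = set(spec.required) | set(spec.optional)
    problems += [f"unexpected parameter: {p}" for p in sorted(args) if p not in allowed]
    return problems
```

Rejecting malformed calls up front keeps error messages agent-readable instead of surfacing a Playwright stack trace.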

Implementation Phases

Phase 1: Playwright MCP Server (Forge repo). Status: Not Started

Build the core MCP server that wraps Playwright in a tool interface.

Deliverables

forge/mcp-servers/browser/ — MCP server package (TypeScript or Python)
8 tools: screenshot, navigate, click, fill, evaluate, login, pdf, wait_for
Auth store: persistent cookie jar for Supabase sessions
Login presets: nowpage-dashboard (Supabase auth), vercel (if needed)
Screenshot output: saves to /tmp/screenshots/ with timestamp naming
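
As one possible shape for the screenshot deliverable, here is a minimal Python sketch that builds a timestamped file name under /tmp/screenshots/ and captures a page with Playwright's sync API. The naming scheme (UTC timestamp plus a slug of the URL) and the function names are assumptions for illustration; the spec only says "timestamp naming". On Windows a different output directory would be needed.

```python
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse

SCREENSHOT_DIR = Path("/tmp/screenshots")  # directory named in the deliverables


def screenshot_path(url: str, now: Optional[datetime] = None) -> Path:
    """Build a timestamped path; `now` is expected to be in UTC (hence the Z suffix)."""
    now = now or datetime.now(timezone.utc)
    parsed = urlparse(url)
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", parsed.netloc + parsed.path).strip("-")
    return SCREENSHOT_DIR / f"{now:%Y%m%dT%H%M%SZ}-{slug}.png"


def take_screenshot(url: str, selector: Optional[str] = None, full_page: bool = False) -> Path:
    """Capture a page (or a single element) in headless Chromium; return the file path."""
    # Imported lazily so the path helper works without Playwright installed.
    from playwright.sync_api import sync_playwright

    out = screenshot_path(url)
    out.parent.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="load")
            if selector:
                page.locator(selector).screenshot(path=str(out))
            else:
                page.screenshot(path=str(out), full_page=full_page)
        finally:
            browser.close()
    return out
```

Launching a fresh browser per call is the simplest option; a long-running server would more likely keep one browser open and reuse it.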

Acceptance Criteria

Can screenshot https://ideas.asapai.net/day-1-complete-v2 and return a PNG
Can login to NowPage dashboard and screenshot the page list
Can click "+ Build Page" button and screenshot the wizard
Works on Windows (local dev) and Linux (VPS) with headless Chromium

Phase 2: Claude Code Integration. Status: Not Started

Wire the MCP server into Claude Code so sessions can see pages.

Deliverables

MCP config entry in the project's .mcp.json (the file Claude Code reads for project-scoped MCP servers) pointing to the browser server
Post-publish verification pattern: publish → screenshot → confirm
Wizard testing: navigate dashboard → click Build Page → screenshot each step
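
To make the config deliverable concrete, the sketch below adds a stdio server entry under the mcpServers key of an MCP config without disturbing servers that are already registered. The server name "browser", the node command, and the script path are hypothetical placeholders; the real values depend on how the package under forge/mcp-servers/browser/ is built.

```python
import json
from copy import deepcopy


def add_mcp_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of an MCP config with one stdio server entry added or replaced."""
    updated = deepcopy(config)
    updated.setdefault("mcpServers", {})[name] = {"command": command, "args": args}
    return updated


if __name__ == "__main__":
    # Hypothetical entry point, shown only to illustrate the resulting JSON shape.
    config = add_mcp_server({}, "browser", "node", ["path/to/browser-server.js"])
    print(json.dumps(config, indent=2))
```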

Integration Pattern

// After publishing a page:
// 1. Call screenshot tool with the live URL
// 2. Claude sees the PNG and can verify layout
// 3. If broken, fix and re-publish

// After building a feature:
// 1. Login to dashboard
// 2. Navigate to feature
// 3. Screenshot and verify
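
The publish pattern above could be sketched as a bounded loop. In this Python sketch the four callables (publish, screenshot, looks_ok, fix) are hypothetical stand-ins for the publish script, the screenshot tool, the agent's visual judgement, and the repair step; none of them is an existing Forge function, and the retry limit is an assumption.

```python
from typing import Callable


def publish_and_verify(
    publish: Callable[[], str],
    screenshot: Callable[[str], str],
    looks_ok: Callable[[str], bool],
    fix: Callable[[str], None],
    max_attempts: int = 3,
) -> tuple[bool, int]:
    """Run publish -> screenshot -> confirm, fixing and re-publishing on failure.

    Returns (verified, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        url = publish()              # push the page live and get its URL
        image = screenshot(url)      # capture the live page
        if looks_ok(image):          # the agent inspects the PNG
            return True, attempt
        if attempt < max_attempts:
            fix(image)               # repair before the next publish
    return False, max_attempts
```

Capping the number of attempts keeps a page that never renders correctly from trapping the agent in an endless fix loop.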

Phase 3: Forge VPS Headless. Status: Not Started

Deploy the browser server on the VPS so Forge agents have vision.

Deliverables

Headless Chromium installed on VPS (apt install chromium-browser, or Playwright's bundled build via playwright install --with-deps chromium)
MCP server runs as systemd service alongside other Forge services
Forge commander can call browser tools in agent loops
Outreach agents can research prospect websites and take screenshots
Memory budget: cap at 512MB for Chromium (VPS has limited RAM)

Acceptance Criteria

Forge agent can publish a page AND verify it visually in one loop
Outreach scout agent can visit a prospect's website and extract info
Chromium restarts cleanly if it crashes (systemd restart policy)
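
The restart policy and memory budget can both be expressed in the systemd unit. Below is a minimal Python sketch that renders such a unit as text. Restart=, RestartSec=, and MemoryMax= are standard systemd directives; the service user, the ExecStart command, and the 5-second delay are assumptions. Note that MemoryMax= applies to the whole service cgroup, so the 512MB budget would cover the MCP server process as well as its Chromium children.

```python
def render_unit(exec_start: str, user: str = "forge", memory_max: str = "512M") -> str:
    """Render a systemd service unit with a restart policy and a memory ceiling."""
    return "\n".join([
        "[Unit]",
        "Description=Browser MCP server (Playwright, headless Chromium)",
        "After=network-online.target",
        "",
        "[Service]",
        f"User={user}",
        f"ExecStart={exec_start}",
        "Restart=on-failure",        # restart on unclean exit (non-zero code, signal, timeout)
        "RestartSec=5",              # wait 5 seconds between restart attempts
        f"MemoryMax={memory_max}",   # hard limit for the service's whole cgroup
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])
```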

Phase 4: Antigravity Evaluation. Status: Not Started

Evaluate whether Google Antigravity (or browser-use) provides enough value over raw Playwright to justify switching.

Evaluation Criteria

Semantic understanding — can it identify "the login button" without a CSS selector?
MCP compatibility — does it have an MCP server or can we wrap it?
Resource overhead — memory/CPU cost vs raw Playwright on VPS
Auth handling — does it manage sessions better than our Playwright wrapper?
Maturity — is it production-ready or experimental?

Decision Gate

If Antigravity adds meaningful capability (especially semantic element finding), migrate. If it's just a wrapper with overhead, stay on raw Playwright. Document decision as an HC page.

Priority Use Cases

1. Post-Publish Visual QA

After hc-publish.js publishes a page, automatically screenshot the live URL and present to the agent/user. Catches broken layouts, missing styles, failed JS rendering.

2. F1 Wizard Verification

Login to dashboard → click "+ Build Page" → screenshot each wizard step → verify forms render, preview works, publish succeeds. This is the immediate blocker — we don't know if F1 actually works.
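
A minimal sketch of the wizard walk, assuming the agent already holds a logged-in page object. It relies only on three methods that Playwright's sync Page provides (click, wait_for_selector, screenshot), declared here as a Protocol so the loop can also be exercised with a stand-in object. The step selectors in the usage note are hypothetical; the real wizard's selectors are not given in this spec.

```python
from pathlib import Path
from typing import Protocol


class PageLike(Protocol):
    """The subset of Playwright's sync Page API this sketch relies on."""
    def click(self, selector: str) -> None: ...
    def wait_for_selector(self, selector: str) -> object: ...
    def screenshot(self, *, path: str) -> bytes: ...


def walk_wizard(page: PageLike, steps: list[tuple[str, str]], out_dir: Path) -> list[Path]:
    """Click through wizard steps, saving one screenshot per step.

    Each step is (selector_to_click, selector_that_shows_the_step_rendered).
    """
    shots = []
    for index, (click_selector, ready_selector) in enumerate(steps, start=1):
        page.click(click_selector)
        page.wait_for_selector(ready_selector)  # Playwright raises on timeout
        path = out_dir / f"wizard-step-{index:02d}.png"
        page.screenshot(path=str(path))
        shots.append(path)
    return shots
```

A call might look like walk_wizard(page, [("text=+ Build Page", "form")], Path("/tmp/screenshots")), where "form" stands in for whatever element proves the first step rendered.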

3. Outreach Prospect Research

Scout agent visits prospect's website, screenshots their current setup, extracts key info (tech stack, team size, content quality) to inform outreach messaging.
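
Of the items listed, tech stack is the one most amenable to simple extraction. The sketch below scans a page's HTML (for example, the result of an evaluate call returning document.documentElement.outerHTML) for a few widely known markers. The marker list is a small illustrative sample, and a match is only a hint; team size and content quality would need a different approach and are not covered here.

```python
import re

# Well-known path or host markers; a match is a hint, not proof.
STACK_HINTS = {
    "WordPress": re.compile(r"/wp-content/|/wp-includes/", re.I),
    "Next.js": re.compile(r'/_next/static/|id="__NEXT_DATA__"', re.I),
    "Shopify": re.compile(r"cdn\.shopify\.com", re.I),
}

# Matches <meta name="generator" content="..."> when name precedes content.
GENERATOR = re.compile(
    r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']([^"\']+)["\']', re.I
)


def stack_hints(html: str) -> dict:
    """Return coarse tech-stack hints found in a page's HTML."""
    found = [name for name, pattern in STACK_HINTS.items() if pattern.search(html)]
    match = GENERATOR.search(html)
    return {"stack": found, "generator": match.group(1) if match else None}
```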

4. Regression Testing

Before deploying code changes, screenshot key pages (registries, dashboard, public pages) and compare with previous screenshots. Flag visual regressions.
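
A coarse first pass at the comparison could hash each screenshot and compare against stored baseline hashes, as in this Python sketch. The function names and the idea of a name-to-hash baseline mapping are assumptions. Byte-exact hashing flags any difference at all, including dynamic content such as timestamps, so a pixel-level diff with a tolerance would likely be needed before this is useful in practice.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path) -> str:
    """SHA-256 hex digest of a screenshot file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_pages(baseline: dict[str, str], current: dict[str, Path]) -> list[str]:
    """Names of pages whose current screenshot differs from, or lacks, a baseline hash."""
    return sorted(
        name for name, path in current.items()
        if baseline.get(name) != fingerprint(path)
    )
```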

5. HC Page Quality Gate

After generating an HC page, render it in a browser to verify: metadata parses correctly, JS executes, responsive layout works, fonts load. Gate before publish.
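
One way to implement the gate is a set of JavaScript expressions run through the evaluate tool, each expected to return true. The specific checks below are assumptions about what "fonts load" or "responsive layout works" might mean in practice; they use standard DOM APIs (document.fonts.status, scrollWidth, innerWidth). The evaluate callable is a stand-in for Playwright's page.evaluate or the MCP evaluate tool.

```python
from typing import Callable

# Each check is a JavaScript expression that should evaluate to true in the page.
GATE_CHECKS = {
    "title present": "document.title.trim().length > 0",
    "fonts loaded": "document.fonts.status === 'loaded'",
    "viewport meta present": "document.querySelector('meta[name=viewport]') !== null",
    "no horizontal overflow": "document.documentElement.scrollWidth <= window.innerWidth",
}


def run_gate(evaluate: Callable[[str], object]) -> list[str]:
    """Run every check through `evaluate`; return the names of the checks that failed."""
    return [name for name, expr in GATE_CHECKS.items() if evaluate(expr) is not True]
```

An empty result means the page passes the gate; any returned names can be reported back to the agent as the reason publishing was blocked.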

Non-Goals (v1)

Full E2E test suite — this is agent tooling, not a testing framework
Browser extension replacement — Chrome extension still works for human-in-the-loop
Multi-tab orchestration — single page context per tool call is sufficient
Video recording — screenshots only for v1 (video is expensive on VPS)
Mobile emulation — desktop viewport only for v1

Dependencies & Risks

Dependency	Risk	Mitigation
Playwright on Windows	Some features behave differently on Win vs Linux	Test on both platforms; headless mode minimizes differences
VPS memory (2GB total)	Chromium is memory-hungry	Cap at 512MB, single browser context, auto-close after 60s idle
Supabase auth cookies	Session tokens expire	Auth store refreshes automatically; login tool re-authenticates on 401
Antigravity maturity	May be too experimental for production	Phase 4 is evaluation-only; Playwright is the safe base
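
The session-expiry mitigation in the table (re-authenticate on 401) could be sketched as a bounded retry. Here navigate is a stand-in for a call that returns the HTTP status of the page load (with Playwright, page.goto returns a response object with a status attribute), and login is a stand-in for the login tool; both names and the single-retry default are assumptions.

```python
from typing import Callable


def fetch_with_reauth(
    navigate: Callable[[str], int],
    login: Callable[[], None],
    url: str,
    max_logins: int = 1,
) -> int:
    """Navigate to url; on HTTP 401, log in again and retry a bounded number of times."""
    status = navigate(url)
    logins = 0
    while status == 401 and logins < max_logins:
        login()                  # refresh the stored session
        logins += 1
        status = navigate(url)   # retry with the new cookies
    return status
```

Bounding the retries means bad credentials surface as a final 401 instead of a login loop.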

Success Metrics

Visual verification coverage: 100% of published pages get auto-screenshotted
Wizard confirmed: F1 Page Builder verified working (or bugs found and fixed)
Agent autonomy: Forge publish loop includes visual QA step without human intervention
Time to verify: <10 seconds from publish to screenshot available


Related Resources