Give Claude Code and Forge agents eyes and hands in the browser
Today, Claude Code and Forge agents are blind. They can write code, publish pages, and call APIs, but they can't verify what a page actually looks like, whether a button works, or whether a layout broke. Visual verification requires a human to open a browser, take a screenshot, and paste it back.
This creates three gaps:
Playwright as the base layer. It's proven, headless-capable, supports auth persistence, and runs on both Windows (local) and Linux (VPS). Wrapped in an MCP server so any agent — Claude Code, Forge commander, outreach agents — can call it as a tool.
Implementation lives in the Forge repo. Browser automation is agent infrastructure, not app code. folio-saas consumes it via MCP config; outreach-agents consume it for prospect research and LinkedIn interaction.
The Browser MCP server exposes these tools to any connected agent:
| Tool | Parameters | Returns | Use Case |
|---|---|---|---|
| screenshot | url, selector?, fullPage? | PNG image (base64 or file path) | Visual verification after publish |
| navigate | url | Page title, status code, final URL | Check redirects, verify page loads |
| click | selector | Success/failure, new page state | Test buttons, wizard steps, nav |
| fill | selector, value | Success/failure | Fill forms, login fields |
| evaluate | javascript | Return value | Check DOM state, read elements |
| login | site (enum), credentials? | Auth cookie stored | Authenticate to Supabase, dashboard |
| pdf | url | PDF file path | Generate PDF from page |
| wait_for | selector, timeout? | Found/timeout | Wait for dynamic content to load |
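To make the tool contract concrete, the table can be sketched as TypeScript types plus a small parameter check. This is only an illustration: the parameter names come from the table, but the type names, the `Site` values (guessed from the sites mentioned for the auth store), the shape of `credentials`, and the `validateCall` helper are all assumptions, not the actual server code.

```typescript
// Login targets. Assumption: the `site` enum covers these two values.
type Site = "nowpage-dashboard" | "vercel";

// One variant per row of the tool table; parameter names mirror the table.
type ToolCall =
  | { tool: "screenshot"; url: string; selector?: string; fullPage?: boolean }
  | { tool: "navigate"; url: string }
  | { tool: "click"; selector: string }
  | { tool: "fill"; selector: string; value: string }
  | { tool: "evaluate"; javascript: string }
  | { tool: "login"; site: Site; credentials?: Record<string, string> } // shape assumed
  | { tool: "pdf"; url: string }
  | { tool: "wait_for"; selector: string; timeout?: number };

// Required string parameters per tool (optional parameters are not listed).
const REQUIRED_PARAMS = new Map<string, string[]>([
  ["screenshot", ["url"]],
  ["navigate", ["url"]],
  ["click", ["selector"]],
  ["fill", ["selector", "value"]],
  ["evaluate", ["javascript"]],
  ["login", ["site"]],
  ["pdf", ["url"]],
  ["wait_for", ["selector"]],
]);

// Hypothetical helper: returns a list of problems, empty when the call looks valid.
function validateCall(tool: string, params: Record<string, unknown>): string[] {
  const required = REQUIRED_PARAMS.get(tool);
  if (required === undefined) return [`unknown tool: ${tool}`];
  return required
    .filter((name) => typeof params[name] !== "string")
    .map((name) => `${tool}: missing or non-string parameter "${name}"`);
}

// Usage: a well-formed screenshot call produces no errors.
const example: ToolCall = { tool: "screenshot", url: "https://example.com", fullPage: true };
const exampleErrors = validateCall(example.tool, example);
```

Rejecting malformed calls before they reach the browser keeps agent mistakes cheap, since a bad selector or missing URL fails fast instead of spending a page load.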
Build the core MCP server that wraps Playwright in a tool interface.
- forge/mcp-servers/browser/: MCP server package (TypeScript or Python)
- nowpage-dashboard (Supabase auth), vercel (if needed)
- /tmp/screenshots/ with timestamp naming

Wire the MCP server into Claude Code so sessions can see pages.
- .claude/mcp.json pointing to the browser server

// After publishing a page:
// 1. Call screenshot tool with the live URL
// 2. Claude sees the PNG and can verify layout
// 3. If broken, fix and re-publish
// After building a feature:
// 1. Login to dashboard
// 2. Navigate to feature
// 3. Screenshot and verify
Deploy the browser server on the VPS so Forge agents have vision.
- apt install chromium-browser)

Evaluate whether Google Antigravity (or browser-use) provides enough value over raw Playwright to justify switching.
If Antigravity adds meaningful capability (especially semantic element finding), migrate. If it's just a wrapper with overhead, stay on raw Playwright. Document decision as an HC page.
After hc-publish.js publishes a page, automatically screenshot the live URL and present to the agent/user. Catches broken layouts, missing styles, failed JS rendering.
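For the post-publish screenshot, one way to apply the "/tmp/screenshots/ with timestamp naming" convention is a small path helper. The sketch below is an assumption about what that naming could look like: the function name `screenshotPath` and the slug scheme are hypothetical, and only the directory and the idea of a timestamp come from this spec.

```typescript
// Hypothetical helper: derive a screenshot file path from the published URL
// and the capture time. Only the directory and "timestamp naming" are from the spec.
function screenshotPath(url: string, takenAt: Date, dir: string = "/tmp/screenshots"): string {
  const slug = url
    .replace(/^[a-z]+:\/\//i, "") // drop the protocol
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric into "-"
    .replace(/^-+|-+$/g, ""); // trim leading/trailing dashes
  // ISO timestamp with ":" and "." replaced so it is safe on Windows and Linux.
  const stamp = takenAt.toISOString().replace(/[:.]/g, "-");
  return `${dir}/${slug}-${stamp}.png`;
}
```

Putting the timestamp last keeps repeated screenshots of the same page adjacent when the directory is sorted by name.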
Login to dashboard → click "+ Build Page" → screenshot each wizard step → verify forms render, preview works, publish succeeds. This is the immediate blocker — we don't know if F1 actually works.
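The wizard walk-through above could be scripted roughly as follows. This is a sketch under assumptions: `PageLike` is a hand-written subset of methods that Playwright's `Page` also provides (`goto`, `fill`, `click`, `screenshot`), so a real page should be usable in its place, while the login selectors, the `WizardPlan` fields, and the file naming are hypothetical and would need to match the actual dashboard markup.

```typescript
// Minimal structural subset of a browser page. Playwright's Page exposes
// methods with these names, so a real page can be passed in.
interface PageLike {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
  screenshot(options: { path: string; fullPage?: boolean }): Promise<unknown>;
}

// Hypothetical description of one wizard run.
interface WizardPlan {
  loginUrl: string;
  email: string;
  password: string;
  stepSelectors: string[]; // one click per wizard step, in order
  outDir: string;
}

// Logs in, opens the wizard, and screenshots after every step.
// Returns the screenshot paths in the order they were taken.
async function walkWizard(page: PageLike, plan: WizardPlan): Promise<string[]> {
  const shots: string[] = [];
  const snap = async (name: string): Promise<void> => {
    const index = String(shots.length + 1).padStart(2, "0");
    const path = `${plan.outDir}/${index}-${name}.png`;
    await page.screenshot({ path, fullPage: true });
    shots.push(path);
  };

  await page.goto(plan.loginUrl);
  await page.fill('input[type="email"]', plan.email); // assumed selector
  await page.fill('input[type="password"]', plan.password); // assumed selector
  await page.click('button[type="submit"]'); // assumed selector

  await page.click("text=+ Build Page"); // Playwright text selector for the button label
  await snap("wizard-start");

  for (let i = 0; i < plan.stepSelectors.length; i++) {
    await page.click(plan.stepSelectors[i]);
    await snap(`step-${i + 1}`);
  }
  return shots;
}
```

Writing the flow against a small interface rather than the Playwright import keeps it testable with a fake page, which matters while it is still unknown whether F1 works.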
Scout agent visits prospect's website, screenshots their current setup, extracts key info (tech stack, team size, content quality) to inform outreach messaging.
Before deploying code changes, screenshot key pages (registries, dashboard, public pages) and compare with previous screenshots. Flag visual regressions.
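The comparison step could be as simple as counting how many pixels changed between the previous and current screenshot. The sketch below assumes both PNGs have already been decoded to raw RGBA bytes by whatever image library is chosen; the `RawImage` shape, the per-channel tolerance, and the 1% threshold are illustrative values, not decisions made in this spec.

```typescript
// Decoded screenshot: RGBA bytes, 4 per pixel, row-major. Decoding the PNG
// itself is assumed to happen elsewhere.
interface RawImage {
  width: number;
  height: number;
  data: Uint8Array;
}

// Fraction of pixels (0..1) whose R, G, or B channel differs by more than
// `tolerance`. Images of different dimensions count as fully different.
function diffRatio(before: RawImage, after: RawImage, tolerance: number = 16): number {
  if (before.width !== after.width || before.height !== after.height) return 1;
  const pixels = before.width * before.height;
  if (pixels === 0) return 0;
  let changed = 0;
  for (let p = 0; p < pixels; p++) {
    const i = p * 4;
    const delta = Math.max(
      Math.abs(before.data[i] - after.data[i]),
      Math.abs(before.data[i + 1] - after.data[i + 1]),
      Math.abs(before.data[i + 2] - after.data[i + 2]),
    );
    if (delta > tolerance) changed++;
  }
  return changed / pixels;
}

// Flags a regression when more than `maxRatio` of the pixels changed.
function hasRegression(before: RawImage, after: RawImage, maxRatio: number = 0.01): boolean {
  return diffRatio(before, after) > maxRatio;
}
```

A tolerance above zero is there to absorb anti-aliasing and font-rendering noise, which is likely to differ between the Windows and Linux environments.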
After generating an HC page, render it in a browser to verify: metadata parses correctly, JS executes, responsive layout works, fonts load. Gate before publish.
| Dependency | Risk | Mitigation |
|---|---|---|
| Playwright on Windows | Some features behave differently on Win vs Linux | Test on both platforms; headless mode minimizes differences |
| VPS memory (2GB total) | Chromium is memory-hungry | Cap at 512MB, single browser context, auto-close after 60s idle |
| Supabase auth cookies | Session tokens expire | Auth store refreshes automatically; login tool re-authenticates on 401 |
| Antigravity maturity | May be too experimental for production | Phase 4 is evaluation-only; Playwright is the safe base |
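Two of the mitigations in the table, closing the browser after 60s idle and re-authenticating on a 401, can be sketched independently of Playwright. The names `createIdleTracker`, `withReauth`, and `ToolResponse` are hypothetical; the server would still need to wire them to the real browser and login tool.

```typescript
// Idle tracking: decides when the browser has been unused long enough to close.
// The clock is injected so the logic can be exercised without real timers.
function createIdleTracker(idleMs: number, now: () => number = () => Date.now()) {
  let lastUsed = now();
  return {
    // Call on every tool invocation.
    touch(): void {
      lastUsed = now();
    },
    // True once at least `idleMs` have passed since the last use.
    isIdle(): boolean {
      return now() - lastUsed >= idleMs;
    },
  };
}

// Hypothetical response shape for a request made through the browser server.
interface ToolResponse {
  status: number;
  body: string;
}

// Re-authentication: runs the request, and if it comes back 401, logs in
// once and retries. A second 401 is returned to the caller as-is.
async function withReauth(
  request: () => Promise<ToolResponse>,
  login: () => Promise<void>,
): Promise<ToolResponse> {
  const first = await request();
  if (first.status !== 401) return first;
  await login();
  return request();
}
```

Limiting the retry to a single attempt avoids a login loop when the stored credentials themselves are wrong.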