Feature Request F12

Agent Vision & Browser Automation

Give Claude Code and Forge agents eyes and hands in the browser

Priority: High
Effort: Medium (4 phases)
Owner: Jason
Repo: forge
Date: Feb 25, 2026

Problem Statement

Today, Claude Code and Forge agents are blind. They can write code, publish pages, and call APIs, but they can't verify what a page actually looks like, whether a button works, or whether a layout broke. Visual verification requires a human to open a browser, take a screenshot, and paste it back into the session.

This creates three gaps:

Architecture

Claude Code (local)         Forge VPS Agents
        │                          │
        │ MCP tool calls           │ MCP tool calls
        ▼                          ▼
┌────────────────────────────────────────────┐
│             Browser MCP Server             │
│ ┌──────────┬──────────┬──────────────────┐ │
│ │screenshot│ navigate │   click / fill   │ │
│ │   tool   │   tool   │       tool       │ │
│ └──────────┴──────────┴──────────────────┘ │
│              ┌──────────────┐              │
│              │  Auth Store  │              │
│              │  (cookies,   │              │
│              │   sessions)  │              │
│              └──────────────┘              │
│                     │                      │
│              ┌──────────────┐              │
│              │  Playwright  │              │
│              │  (Chromium)  │              │
│              └──────────────┘              │
└────────────────────────────────────────────┘
                      │
                      ▼
Live web pages (ideas.asapai.net, dashboard, etc.)

Key Decision

Playwright as the base layer. It's proven, headless-capable, supports auth persistence, and runs on both Windows (local) and Linux (VPS). Wrapped in an MCP server so any agent — Claude Code, Forge commander, outreach agents — can call it as a tool.

Key Decision

Implementation lives in the Forge repo. Browser automation is agent infrastructure, not app code. folio-saas consumes it via MCP config. outreach-agents consume it for prospect research and LinkedIn interaction.

MCP Server Tools

The Browser MCP server exposes these tools to any connected agent:

Tool       | Parameters                | Returns                            | Use Case
-----------|---------------------------|------------------------------------|------------------------------------
screenshot | url, selector?, fullPage? | PNG image (base64 or file path)    | Visual verification after publish
navigate   | url                       | Page title, status code, final URL | Check redirects, verify page loads
click      | selector                  | Success/failure, new page state    | Test buttons, wizard steps, nav
fill       | selector, value           | Success/failure                    | Fill forms, login fields
evaluate   | javascript                | Return value                       | Check DOM state, read elements
login      | site (enum), credentials? | Auth cookie stored                 | Authenticate to Supabase, dashboard
pdf        | url                       | PDF file path                      | Generate PDF from page
wait_for   | selector, timeout?        | Found/timeout                      | Wait for dynamic content to load
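
To make the tool contract concrete, here is a minimal sketch of how the table could be encoded and used to validate incoming tool calls before they reach the browser. The `ToolSpec` shape and the `validate_call` helper are hypothetical names introduced for illustration; only the tool and parameter names come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """One row of the tool table: required and optional parameter names."""
    name: str
    required: tuple[str, ...]
    optional: tuple[str, ...] = ()


# Mirrors the tool table; the ToolSpec structure itself is an assumption.
TOOLS = {
    spec.name: spec
    for spec in (
        ToolSpec("screenshot", ("url",), ("selector", "fullPage")),
        ToolSpec("navigate", ("url",)),
        ToolSpec("click", ("selector",)),
        ToolSpec("fill", ("selector", "value")),
        ToolSpec("evaluate", ("javascript",)),
        ToolSpec("login", ("site",), ("credentials",)),
        ToolSpec("pdf", ("url",)),
        ToolSpec("wait_for", ("selector",), ("timeout",)),
    )
}


def validate_call(tool: str, args: dict) -> list[str]:
    """Return a list of problems with a tool call; empty means it looks valid."""
    spec = TOOLS.get(tool)
    if spec is None:
        return [f"unknown tool: {tool}"]
    problems = [f"missing parameter: {p}" for p in spec.required if p not in args]
    allowed = set(spec.required) | set(spec.optional)
    problems += [f"unexpected parameter: {p}" for p in sorted(args) if p not in allowed]
    return problems
```

Rejecting malformed calls up front keeps agent mistakes (a misspelled tool, a missing selector) from turning into slow browser timeouts.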

Implementation Phases

Phase 1: Playwright MCP Server (Forge repo)
Status: Not Started

Build the core MCP server that wraps Playwright in a tool interface.

Deliverables

  • forge/mcp-servers/browser/ — MCP server package (TypeScript or Python)
  • 8 tools: screenshot, navigate, click, fill, evaluate, login, pdf, wait_for
  • Auth store: persistent cookie jar for Supabase sessions
  • Login presets: nowpage-dashboard (Supabase auth), vercel (if needed)
  • Screenshot output: saves to /tmp/screenshots/ with timestamp naming
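
The timestamp naming for screenshots could look something like the sketch below. The exact filename layout (UTC timestamp, underscore, URL slug) is an assumption; the spec only says files go to /tmp/screenshots/ with a timestamp in the name.

```python
import re
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

SCREENSHOT_DIR = PurePosixPath("/tmp/screenshots")


def screenshot_path(url: str, now: Optional[datetime] = None) -> PurePosixPath:
    """Build a timestamped PNG path for a URL under /tmp/screenshots/."""
    # Normalize to UTC so the trailing "Z" in the stamp is accurate.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    parsed = urlparse(url)
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", parsed.netloc + parsed.path).strip("-")
    return SCREENSHOT_DIR / f"{stamp}_{slug or 'page'}.png"
```

Putting the timestamp first means a plain directory listing sorts captures chronologically. On Windows the /tmp prefix would need to be swapped for a local directory.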

Acceptance Criteria

  • Can screenshot https://ideas.asapai.net/day-1-complete-v2 and return a PNG
  • Can login to NowPage dashboard and screenshot the page list
  • Can click "+ Build Page" button and screenshot the wizard
  • Works on Windows (local dev) and Linux (VPS) with headless Chromium
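
A minimal sketch of the screenshot tool's core, assuming the Python flavor of Playwright and its sync API. `capture` and `capture_with_playwright` are hypothetical names; the Playwright calls used (`sync_playwright`, `chromium.launch`, `new_page`, `goto`, `locator`, `screenshot`) are part of its public API. The page is passed in so the logic can be exercised without a real browser.

```python
from typing import Optional, Protocol


class PageLike(Protocol):
    """The subset of Playwright's sync Page API this sketch relies on."""
    def goto(self, url: str): ...
    def locator(self, selector: str): ...
    def screenshot(self, *, path: str, full_page: bool = False): ...


def capture(page: PageLike, url: str, out_path: str,
            selector: Optional[str] = None, full_page: bool = False) -> str:
    """Navigate to url and save a PNG of the page or of one element."""
    page.goto(url)
    if selector:
        # Element-level screenshot: only the matched element is captured.
        page.locator(selector).screenshot(path=out_path)
    else:
        page.screenshot(path=out_path, full_page=full_page)
    return out_path


def capture_with_playwright(url: str, out_path: str, **kwargs) -> str:
    """Run capture() in headless Chromium; needs the playwright package."""
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            return capture(browser.new_page(), url, out_path, **kwargs)
        finally:
            browser.close()
```

For the first acceptance criterion, the call would be roughly `capture_with_playwright("https://ideas.asapai.net/day-1-complete-v2", "day1.png", full_page=True)`.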

Phase 2: Claude Code Integration
Status: Not Started

Wire the MCP server into Claude Code so sessions can see pages.

Deliverables

  • MCP config entry in .claude/mcp.json pointing to the browser server
  • Post-publish verification pattern: publish → screenshot → confirm
  • Wizard testing: navigate dashboard → click Build Page → screenshot each step
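
As a rough illustration of the config entry, the sketch below builds a server entry in the `mcpServers` layout (a `command` plus `args` per server). The server name `browser`, the `python` launcher, and the `server.py` filename are all assumptions; the spec leaves the implementation language open.

```python
import json


def browser_mcp_entry(server_script: str) -> dict:
    """Build an mcpServers entry that launches the browser server over stdio."""
    return {
        "mcpServers": {
            "browser": {
                "command": "python",
                "args": [server_script],
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical entry point inside forge/mcp-servers/browser/.
    entry = browser_mcp_entry("forge/mcp-servers/browser/server.py")
    print(json.dumps(entry, indent=2))
```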

Integration Pattern

// After publishing a page:
// 1. Call screenshot tool with the live URL
// 2. Claude sees the PNG and can verify layout
// 3. If broken, fix and re-publish

// After building a feature:
// 1. Login to dashboard
// 2. Navigate to feature
// 3. Screenshot and verify
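
The publish → screenshot → confirm pattern above can be sketched as a small retry loop. Everything here is hypothetical scaffolding: the four callables stand in for the real publish step, the screenshot tool, the agent's visual judgment, and whatever fix the agent applies.

```python
from typing import Callable


def publish_and_verify(publish: Callable[[], str],
                       screenshot: Callable[[str], str],
                       looks_ok: Callable[[str], bool],
                       fix: Callable[[], None],
                       max_attempts: int = 3) -> tuple[bool, int]:
    """Publish, screenshot the live URL, and retry after a fix until it passes.

    Returns (passed, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        url = publish()           # returns the live URL
        png = screenshot(url)     # returns a path to the PNG
        if looks_ok(png):
            return True, attempt
        if attempt < max_attempts:
            fix()                 # only fix when another attempt will follow
    return False, max_attempts
```

Bounding the loop with `max_attempts` keeps an agent from republishing forever when the layout problem is not something it can fix.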

Phase 3: Forge VPS Headless
Status: Not Started

Deploy the browser server on the VPS so Forge agents have vision.

Deliverables

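  • Headless Chromium installed on VPS (apt install chromium-browser; the package is named chromium on Debian, and Playwright can alternatively download its own build with playwright install --with-deps chromium)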
  • MCP server runs as systemd service alongside other Forge services
  • Forge commander can call browser tools in agent loops
  • Outreach agents can research prospect websites and take screenshots
  • Memory budget: cap at 512MB for Chromium (VPS has limited RAM)
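
One way to stay inside the memory budget is to close Chromium whenever it sits unused; the risk table in this document suggests 60 s of idle time. The sketch below shows that bookkeeping only. `IdleCloser` is a hypothetical helper, and the `close` callable would wrap whatever actually shuts the browser down.

```python
import time
from typing import Callable, Optional


class IdleCloser:
    """Closes a browser once no tool call has touched it for idle_seconds."""

    def __init__(self, close: Callable[[], None], idle_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._close = close
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._last_used: Optional[float] = None  # None means browser is closed

    def touch(self) -> None:
        """Record activity; call at the start of every tool invocation."""
        self._last_used = self._clock()

    def tick(self) -> bool:
        """Close the browser if idle too long; returns True if it closed."""
        if self._last_used is None:
            return False
        if self._clock() - self._last_used >= self._idle_seconds:
            self._close()
            self._last_used = None
            return True
        return False
```

The server would call `touch()` on each tool call and `tick()` from a periodic timer. Injecting the clock keeps the timeout testable without sleeping.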

Acceptance Criteria

  • Forge agent can publish a page AND verify it visually in one loop
  • Outreach scout agent can visit a prospect's website and extract info
  • Chromium restarts cleanly if it crashes (systemd restart policy)

Phase 4: Antigravity Evaluation
Status: Not Started

Evaluate whether Google Antigravity (or browser-use) provides enough value over raw Playwright to justify switching.

Evaluation Criteria

  • Semantic understanding — can it identify "the login button" without a CSS selector?
  • MCP compatibility — does it have an MCP server or can we wrap it?
  • Resource overhead — memory/CPU cost vs raw Playwright on VPS
  • Auth handling — does it manage sessions better than our Playwright wrapper?
  • Maturity — is it production-ready or experimental?

Decision Gate

If Antigravity adds meaningful capability (especially semantic element finding), migrate. If it's just a wrapper with overhead, stay on raw Playwright. Document decision as an HC page.

Priority Use Cases

1. Post-Publish Visual QA

After hc-publish.js publishes a page, automatically screenshot the live URL and present to the agent/user. Catches broken layouts, missing styles, failed JS rendering.

2. F1 Wizard Verification

Login to dashboard → click "+ Build Page" → screenshot each wizard step → verify forms render, preview works, publish succeeds. This is the immediate blocker — we don't know if F1 actually works.

3. Outreach Prospect Research

Scout agent visits prospect's website, screenshots their current setup, extracts key info (tech stack, team size, content quality) to inform outreach messaging.

4. Regression Testing

Before deploying code changes, screenshot key pages (registries, dashboard, public pages) and compare with previous screenshots. Flag visual regressions.
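
A coarse first pass at the comparison step could simply hash each screenshot and flag any file whose bytes changed. This is a sketch under that assumption, not a perceptual diff: any rendering noise (anti-aliasing, a clock on the page) will show up as a change, and a tolerance-based pixel comparison would need an image library. The directory layout and function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_pages(baseline_dir: Path, current_dir: Path) -> list[str]:
    """Names of PNGs that are new or whose bytes differ from the baseline."""
    changed = []
    for current in sorted(current_dir.glob("*.png")):
        baseline = baseline_dir / current.name
        if not baseline.exists() or file_digest(baseline) != file_digest(current):
            changed.append(current.name)
    return changed
```

Flagged files can then be handed to the agent or a human for an actual visual comparison.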

5. HC Page Quality Gate

After generating an HC page, render it in a browser to verify: metadata parses correctly, JS executes, responsive layout works, fonts load. Gate before publish.

Non-Goals (v1)

Dependencies & Risks

Dependency             | Risk                                             | Mitigation
-----------------------|--------------------------------------------------|----------------------------------------------------------------------
Playwright on Windows  | Some features behave differently on Win vs Linux | Test on both platforms; headless mode minimizes differences
VPS memory (2GB total) | Chromium is memory-hungry                        | Cap at 512MB, single browser context, auto-close after 60s idle
Supabase auth cookies  | Session tokens expire                            | Auth store refreshes automatically; login tool re-authenticates on 401
Antigravity maturity   | May be too experimental for production           | Phase 4 is evaluation-only; Playwright is the safe base
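
The "re-authenticates on 401" mitigation can be sketched as a small wrapper around any request the server makes. `fetch_with_reauth` is a hypothetical helper: `fetch` stands in for a navigation that returns an HTTP status code, and `login` stands in for the login tool.

```python
from typing import Callable


def fetch_with_reauth(fetch: Callable[[], int], login: Callable[[], None],
                      max_logins: int = 1) -> int:
    """Call fetch(); on HTTP 401, log in again and retry, at most max_logins times."""
    status = fetch()
    logins = 0
    while status == 401 and logins < max_logins:
        login()
        logins += 1
        status = fetch()
    return status
```

Capping the number of logins avoids hammering the auth endpoint when the stored credentials themselves are wrong.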

Success Metrics

Related Resources