RALPH v5 — Discovery Brief

Date: 2026-02-26  |  Author: Claude (Forge Agent) + Jason MacDonald  |  Status: Discovery Complete — Ready for Design Phase

1. What Is Ralph Today?

Ralph is Forge's autonomous coding agent — an external bash loop that spawns fresh Claude Code instances per task. It's the build factory that executes queued coding work while Jason sleeps, works on other things, or is between sessions.

Current Architecture (v4 — Production)

┌─────────────────────────────────────────────┐
│ systemd timer (every 60s)                    │
│ → ralph-poller.sh                            │
│   ├── Check Supabase ralph_queue             │
│   ├── Call Gatekeeper (:5015/v1/gate)        │
│   ├── Atomic task claim (PATCH status)       │
│   └── nohup ralph.sh "project" "task" &      │
└─────────────────────────────────────────────┘
           ↓ spawns fresh process
┌─────────────────────────────────────────────┐
│ ralph.sh (672 lines)                         │
│ ├── Model Router (4-tier cascade)            │
│ │   ├── T0: Claude Max CLI (free, preferred) │
│ │   ├── T1: Ollama Qwen 7B (free, simple)   │
│ │   ├── T2: LiteLLM API (paid, budget-capped)│
│ │   └── T3: Poll & Wait (24h max)           │
│ ├── Self-healing (env strip, escalation)     │
│ ├── Circuit breaker (3 consecutive fails)    │
│ ├── Factory Pulse events → Supabase          │
│ ├── Host mode (forge) / Docker mode (other)  │
│ └── Queue drain (chain next task on success) │
└─────────────────────────────────────────────┘
           ↓ fresh Claude instance
┌─────────────────────────────────────────────┐
│ claude -p "task description"                 │
│ ├── Fresh 200k context window                │
│ ├── Reads CLAUDE.md + skills automatically   │
│ ├── Executes task, writes code, runs tests   │
│ ├── Git commit on success                    │
│ └── Exits → loop continues                   │
└─────────────────────────────────────────────┘
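
The 4-tier cascade and circuit breaker in the diagram above can be modeled in a few lines. This is an illustrative Python sketch, not the logic in model-router.sh: the function name pick_tier, the CircuitBreaker class, the availability flags, and the rule that only "simple" tasks go to Ollama are all assumptions made for the example.

```python
from dataclasses import dataclass


def pick_tier(claude_max_available: bool, task_is_simple: bool,
              ollama_available: bool, budget_remaining_usd: float) -> str:
    """Return the first usable tier, walking the cascade top to bottom.

    Tier labels follow the diagram; the selection rules are assumptions.
    """
    if claude_max_available:
        return "T0"  # Claude Max CLI: free and preferred
    if task_is_simple and ollama_available:
        return "T1"  # local Ollama model: free, simple tasks only
    if budget_remaining_usd > 0:
        return "T2"  # paid API, only while budget remains
    return "T3"      # nothing usable: poll and wait


@dataclass
class CircuitBreaker:
    """Opens after `threshold` consecutive failures (3 in the diagram)."""
    threshold: int = 3
    consecutive_failures: int = 0

    def record(self, success: bool) -> None:
        # A success resets the streak; a failure extends it.
        if success:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1

    @property
    def is_open(self) -> bool:
        return self.consecutive_failures >= self.threshold
```

Keeping tier selection as a pure function of a few inputs makes the fallback order easy to test without touching any model endpoint.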

What's Working

What's NOT Working / Missing


2. History: From PRD v2.1 to v4

January 2026: PRD v2.1 Written

January-February 2026: Architecture Pivot

What the PRD Got Right

What the PRD Proposed That Was Superseded

PRD Proposed | What Actually Exists | Why Different
Claude Service (port 5001) | claude -p direct CLI | Claude Code IS the service
Bash Service (port 5004) | Claude Code bash tool | Claude Code handles this natively
GitHub Service (port 5005) | git commands in ralph.sh + deploy.sh | No wrapper needed
Router Service (port 5002) | ClawdRouter (8080) + model-router.sh | More sophisticated cascade
Credit Service (port 5003) | budget_log table + budget.yaml | Simpler, config-driven
Skills Service (port 5006) | .claude/skills/ native loading | Claude Code loads skills natively
Service Launcher | systemd units + guardian | Production-grade vs dev scripts
Doc Generator (port 5007) | /discover → /design → /plan skills | Manual flow, not automated
Discussion Service (port 5008) | Human conversation in Claude Code | Not automated
Research Service (port 5009) | /discover skill | Single-agent, not 4-parallel
Quick Mode (port 5012) | Complexity routing via Gatekeeper | Simple tasks auto-route to cheap models

What the PRD Proposed That's Still Unbuilt

Component | Status | Still Needed?
Orchestrator subagent spawning | Not built | YES (quality improvement)
4-document structure (PROJECT/REQ/ROADMAP/STATE) | Not built | MAYBE (skills flow works)
Automated PRD pipeline | Not built | YES (biggest gap)
UAT Service | Not built | YES (quality gate)
Session Manager / HANDOFF.md | Partial (SESSION-QUEUE.md) | MAYBE (current approach works)
Practice modes (progressive trust) | Not built | YES (trust_level column unused)
Multi-node distributed Ralph | Not built | YES (Jason mentioned this)

3. Competitive Landscape: Build Orchestrators in 2026

What Exists Now

Tool | Approach | Strengths | Weaknesses
Claude Code CLI | Single agent, interactive | Best quality, full IDE | No automated loops
Ralph v4 (Forge) | External bash loop, queue-driven | Production-tested, self-healing | No planning pipeline, single node
Devin | Full autonomous agent | End-to-end, cloud-native | $500/mo, black box, variable quality
Cursor Agent | IDE-integrated autonomous | Good UX, integrated | Tied to Cursor IDE
Aider | CLI pair programmer | Simple, effective | No orchestration, no queue
Sweep AI | GitHub-integrated | Auto-PR from issues | Limited scope, GitHub-only
OpenHands (formerly OpenDevin) | Open-source autonomous | Free, customizable | Complex setup, less reliable
SWE-Agent | Research-grade agent | Well-studied | Academic, not production

Ralph's Unique Position

Among the tools compared above, Ralph is the only build orchestrator that:

  1. Uses queue-based task dispatch (not just PR-based or chat-based)
  2. Has 4-tier model cascade with budget controls
  3. Has self-healing (circuit breaker, env healing, escalation)
  4. Integrates with a personal AI operating system (Forge)
  5. Is already in production processing real tasks
  6. Supports both host and containerized execution modes

Key Insight

Ralph isn't competing with Devin or Cursor. It's a personal build factory optimized for a single operator (Jason): the goal is leveraged output for that operator, not general-purpose autonomy.


4. Gap Analysis: What Ralph v5 Should Address

Gap 1: No Planning Pipeline (CRITICAL)

Current: Tasks are manually written and queued. Quality depends entirely on how well the human wrote the task description.

Joe's #4: "Ralph amplifies whatever you feed it — garbage in, garbage out."

Impact: Bad task descriptions → wasted tokens, failed attempts, scope creep.

Solution: Automated PRD pipeline — task description → structured plan → decomposed subtasks → approval gate → execution.
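
One way to picture the approval gate at the end of that pipeline is as a structural check on the plan before anything reaches ralph_queue. The sketch below is a hypothetical data model, not an existing Forge schema: Plan, Subtask, validate_plan, and ready_to_queue are names invented for illustration, and "every subtask needs acceptance criteria" is an assumed rule.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subtask:
    description: str
    acceptance_criteria: List[str]


@dataclass
class Plan:
    task: str                                   # original task description
    subtasks: List[Subtask] = field(default_factory=list)
    approved: bool = False                      # set by the human at the gate


def validate_plan(plan: Plan) -> List[str]:
    """Return a list of problems; an empty list means the plan is well-formed."""
    problems = []
    if not plan.subtasks:
        problems.append("plan has no subtasks")
    for i, sub in enumerate(plan.subtasks, start=1):
        if not sub.acceptance_criteria:
            problems.append(f"subtask {i} has no acceptance criteria")
    return problems


def ready_to_queue(plan: Plan) -> bool:
    """Only well-formed, human-approved plans reach the execution queue."""
    return plan.approved and not validate_plan(plan)
```

Separating validation from approval means a malformed plan can be bounced back to the planner automatically, so the human only reviews plans that are at least structurally complete.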

Gap 2: No Research Before Building (HIGH)

Current: Ralph jumps straight into coding. No exploration of existing patterns, libraries, or architecture.

PRD v2.1 proposed: 4 parallel research agents (stack, feature, architecture, pitfall).

Impact: Suboptimal implementations, reinventing wheels, missing best practices.

Solution: Pre-execution research phase, at minimum a quick codebase scan + existing pattern check.
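
A quick codebase scan of the kind described above could be as small as a keyword search whose results are prepended to the task prompt. This is a minimal sketch under assumptions: the function name scan_for_patterns, the default extension list, and the skipped directories are illustrative choices, not part of Ralph today.

```python
import os
from typing import Dict, Iterable, List, Tuple


def scan_for_patterns(root: str, keywords: Iterable[str],
                      extensions: Tuple[str, ...] = (".py", ".sh", ".ts")
                      ) -> Dict[str, List[str]]:
    """Map each keyword to the source files under `root` that mention it."""
    hits: Dict[str, List[str]] = {kw: [] for kw in keywords}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories that never contain project source.
        dirnames[:] = [d for d in dirnames if d not in {".git", "node_modules"}]
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    text = fh.read().lower()
            except OSError:
                continue  # unreadable file: skip rather than fail the scan
            for kw in hits:
                if kw.lower() in text:  # case-insensitive substring match
                    hits[kw].append(os.path.relpath(path, root))
    return hits
```

The result tells the agent which files already deal with a concept (for example "retry" or "lockfile") before it writes anything new.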

Gap 3: No UAT/QA Loop (HIGH)

Current: Tests pass → task complete. There is no human verification step and no automatic debugging when feedback reports a failure.

Impact: Tasks "pass" but don't actually work as intended. No learning from rejection.

Solution: Optional UAT gate — present changes, collect feedback, auto-debug if rejected.
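
The optional gate can be read as a small state machine layered on top of the existing pass/fail result. The sketch below is hypothetical: the state names, the next_state function, and the rule that a debugging pass loops back to "tests passed" (after its tests succeed) are assumptions, not an existing Ralph status model.

```python
from enum import Enum
from typing import Optional


class TaskState(Enum):
    TESTS_PASSED = "tests_passed"
    AWAITING_UAT = "awaiting_uat"
    DEBUGGING = "debugging"
    DONE = "done"


def next_state(state: TaskState, uat_required: bool,
               accepted: Optional[bool] = None) -> TaskState:
    """Advance a task one step; `accepted` is the human verdict, if any."""
    if state is TaskState.TESTS_PASSED:
        # The gate is optional: tasks that do not need UAT finish here.
        return TaskState.AWAITING_UAT if uat_required else TaskState.DONE
    if state is TaskState.AWAITING_UAT:
        if accepted is None:
            return TaskState.AWAITING_UAT  # still waiting on feedback
        return TaskState.DONE if accepted else TaskState.DEBUGGING
    if state is TaskState.DEBUGGING:
        # Assumed: a debug pass ends with green tests, then is reviewed again.
        return TaskState.TESTS_PASSED
    return state  # DONE is terminal
```

A rejected task therefore cycles DEBUGGING → TESTS_PASSED → AWAITING_UAT until the human accepts it or an attempt limit outside this sketch stops the loop.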

Gap 4: Single Node Only (MEDIUM)

Current: Ralph runs only on the Hetzner VPS at 178.156.253.142.

Jason mentioned: "Working in a different session on standing up as a separate node."

Impact: Can't scale, can't run while VPS is busy, can't run on other machines.

Solution: Distributed Ralph. The queue-based architecture already supports this; what remains is API key management, git repo access, and result reporting.
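
Multiple nodes can share one queue safely only if the claim step is a compare-and-set: a node wins a task only when the task is still unclaimed at the moment of the update. The sketch below models that with an in-memory stand-in for ralph_queue. The TaskQueue class, the "pending" and "claimed" status values, and the node names are assumptions for illustration; in production the atomicity would come from the database's conditional update, not a Python lock.

```python
import threading
from typing import Dict, Optional


class TaskQueue:
    """In-memory stand-in for the ralph_queue table (hypothetical model)."""

    def __init__(self, tasks: Dict[str, str]):
        self._status = dict(tasks)          # task id -> status
        self._owner: Dict[str, str] = {}    # task id -> claiming node
        self._lock = threading.Lock()       # stands in for DB-level atomicity

    def claim(self, task_id: str, node: str) -> bool:
        """Compare-and-set: succeed only if the task is still pending."""
        with self._lock:
            if self._status.get(task_id) != "pending":
                return False                # already claimed, done, or unknown
            self._status[task_id] = "claimed"
            self._owner[task_id] = node
            return True

    def owner(self, task_id: str) -> Optional[str]:
        return self._owner.get(task_id)
```

However many nodes race for the same task, exactly one call to claim returns True; the losers simply move on to the next queue entry.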

Gap 5: Progressive Trust Not Enforced (MEDIUM)

Current: trust_level column exists in ralph_queue but is never checked.

Joe's #7: "Don't start with the jackhammer."

Impact: All tasks treated equally — no distinction between supervised and unsupervised.

Solution: Enforce trust levels: supervised (human watches) → attended (periodic check) → unattended (overnight).
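
Enforcement could be a single check in the poller before a task is claimed: compare the supervision a task requires with the supervision currently available. This is a sketch under assumptions. The three level names come from the line above, but the Supervision enum, the may_run function, and the idea that the poller knows the current supervision mode are hypothetical, as are the string values stored in trust_level.

```python
from enum import IntEnum


class Supervision(IntEnum):
    """How much human attention is available right now (assumed modes)."""
    NONE = 0       # overnight, nobody looking
    PERIODIC = 1   # a human checks in occasionally
    WATCHING = 2   # a human watches the run


# Supervision each trust level requires before its tasks may start.
REQUIRED = {
    "supervised": Supervision.WATCHING,
    "attended": Supervision.PERIODIC,
    "unattended": Supervision.NONE,
}


def may_run(trust_level: str, available: Supervision) -> bool:
    """Allow a task only if available supervision meets its requirement."""
    # Unknown or missing levels fall back to the strictest requirement.
    required = REQUIRED.get(trust_level, Supervision.WATCHING)
    return available >= required
```

Defaulting unknown values to the strictest level means a typo in trust_level can delay a task but can never let it run unattended.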

Gap 6: No Task Progress Tracking (LOW)

Current: Binary pass/fail per attempt. No visibility into what Ralph is doing mid-task.

Impact: Can't tell if task is 90% done or completely stuck until it finishes.

Solution: Progress events (started, coding, testing, reviewing) emitted during execution.
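
A progress event only needs to say which task is in which phase and when. The sketch below builds such an event as JSON. The four phase names come from the line above; the function name make_progress_event and the payload fields (task_id, phase_index, phase_count, ts) are assumptions, not the existing Factory Pulse schema.

```python
import json
import time
from typing import Optional

# Phase order as listed in the proposal.
PHASES = ("started", "coding", "testing", "reviewing")


def make_progress_event(task_id: str, phase: str,
                        now: Optional[float] = None) -> str:
    """Serialize one progress event; raises ValueError on an unknown phase."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    event = {
        "task_id": task_id,
        "phase": phase,
        "phase_index": PHASES.index(phase),  # 0-based position in PHASES
        "phase_count": len(PHASES),
        "ts": time.time() if now is None else now,
    }
    return json.dumps(event, sort_keys=True)
```

Including the phase index and count lets a dashboard show rough progress ("phase 3 of 4") without knowing the phase names in advance.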

Gap 7: Subagent Architecture (LOW)

Current: Single Claude instance per task. No planner/coder/reviewer/tester decomposition.

PRD v2.1 proposed: Full subagent architecture with tool scoping.

Impact: The context budget is used less efficiently, and there is no separation of concerns.

Solution: Claude Code's Task tool already supports subagents. Could be enabled via CLAUDE.md instructions.


5. Build vs Buy vs Extend

Recommendation: EXTEND (v4 → v5)

Ralph v4 is production-tested and working, and its foundation is solid.

Don't rebuild. Extend what works. The PRD v2.1's 13-microservice architecture was the right instinct but over-engineered — Claude Code CLI already provides most of those capabilities natively.

What to Build (v5 Additions)

Component | Effort | Impact | Priority
Planning Pipeline | 2-3 days | Eliminates garbage-in | P0
Pre-execution Research | 1-2 days | Better implementations | P1
UAT Gate | 1-2 days | Quality verification | P1
Multi-node Support | 3-5 days | Scale + availability | P2
Progressive Trust | 0.5 day | Safety rails | P2
Progress Events | 0.5 day | Better visibility | P3
Subagent Prompting | 1 day | Context efficiency | P3
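
The effort column can be totalled per priority to see what a cut at each level would cost. The figures below are copied from the table; the helper name effort_through and the idea of cumulative cut lines are added for illustration only.

```python
from typing import Dict, Tuple

# (priority, low_days, high_days) per component, copied from the table.
ESTIMATES: Dict[str, Tuple[str, float, float]] = {
    "Planning Pipeline": ("P0", 2, 3),
    "Pre-execution Research": ("P1", 1, 2),
    "UAT Gate": ("P1", 1, 2),
    "Multi-node Support": ("P2", 3, 5),
    "Progressive Trust": ("P2", 0.5, 0.5),
    "Progress Events": ("P3", 0.5, 0.5),
    "Subagent Prompting": ("P3", 1, 1),
}


def effort_through(max_priority: str) -> Tuple[float, float]:
    """Sum (low, high) days for every component at or above `max_priority`."""
    # "P0" < "P1" < "P2" < "P3" compares correctly as plain strings.
    rows = [v for v in ESTIMATES.values() if v[0] <= max_priority]
    return (sum(r[1] for r in rows), sum(r[2] for r in rows))
```

Under these estimates, P0 alone is 2 to 3 days, P0 plus P1 is 4 to 7 days, and the full list is 9 to 14 days.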

What NOT to Build


6. Proposed Architecture: Ralph v5

┌─────────────────────────────────────────────────────────┐
│ RALPH v5: Planned Autonomous Build Factory               │
│                                                          │
│ NEW: Planning Pipeline                                   │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐    │
│ │ Task     │→│ Research  │→│ Plan     │→│ Approve  │    │
│ │ Intake   │ │ (quick)   │ │ Decompose│ │ Gate     │    │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘    │
│                                           ↓              │
│ EXISTING: Execution Engine (v4)                          │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐    │
│ │ Poller   │→│ Gatekeeper│→│ Ralph.sh │→│ Claude   │    │
│ │ (60s)    │ │ (quality) │ │ (cascade)│ │ (fresh)  │    │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘    │
│                                           ↓              │
│ NEW: Quality Loop                                        │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐                  │
│ │ Tests    │→│ UAT Gate │→│ Deploy   │                  │
│ │ (auto)   │ │ (optional)│ │ (merge)  │                  │
│ └──────────┘ └──────────┘ └──────────┘                  │
│                                                          │
│ NEW: Multi-node (Future)                                 │
│ ┌──────────┐ ┌──────────┐                                │
│ │ Node 1   │ │ Node 2   │  ← Same queue, different VPS  │
│ │ (Hetzner)│ │ (Future) │                                │
│ └──────────┘ └──────────┘                                │
└─────────────────────────────────────────────────────────┘

7. Risks & Mitigations

Risk | Likelihood | Impact | Mitigation
Over-engineering v5 like v2.1 was | HIGH | Wasted effort | Keep it bash + Claude Code. No new microservices unless truly needed.
Planning pipeline slows throughput | MEDIUM | Reduced task velocity | Make planning optional per trust level. Simple tasks skip planning.
Multi-node adds complexity | MEDIUM | Debugging nightmares | Start with second node as read-only worker, graduate to full autonomy.
Claude Max API changes | LOW | Breaking changes | model-router.sh already handles fallbacks.
Budget overruns on paid tiers | LOW | Cost surprise | Already mitigated: daily cap, per-task cap, circuit breaker.

8. Joe's 7 Mistakes — Current Status

# | Mistake | v4 Status | v5 Target
1 | Plugin ≠ Real Ralph (context rot) | External bash loop, fresh context | Maintain
2 | Vague criteria | Tests must pass | Add: lint + typecheck gates
3 | Tasks too large | Gatekeeper decomposes, but inconsistent | Planning pipeline with auto-decomposition
4 | Skip planning | No automated planning | Planning pipeline (P0)
5 | No feedback loops | Tests auto-run | Add: lint, typecheck, review agent
6 | Infinite iterations | Max 5 attempts, circuit breaker | Maintain
7 | No practice first | trust_level column exists, unused | Enforce progressive trust
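
The lint and typecheck gates proposed for mistakes 2 and 5 amount to "run several commands, and the task passes only if all of them exit 0". A minimal sketch follows, assuming each gate is an ordinary command line; the function name run_gates is hypothetical, and the actual test, lint, and typecheck commands are project-specific and not specified in this brief.

```python
import subprocess
from typing import List, Sequence, Tuple


def run_gates(gates: Sequence[Tuple[str, List[str]]]) -> Tuple[bool, List[str]]:
    """Run each (name, command) gate in order.

    Returns (all_passed, names_of_failed_gates). Every gate runs even
    after a failure, so one report lists everything that needs fixing.
    """
    failed: List[str] = []
    for name, cmd in gates:
        try:
            result = subprocess.run(cmd, capture_output=True)
            ok = result.returncode == 0
        except OSError:
            ok = False  # command missing or not executable counts as a failure
        if not ok:
            failed.append(name)
    return (not failed, failed)
```

Running all gates instead of stopping at the first failure gives the next attempt a complete list of problems, which matters when attempts are capped at 5.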

9. Next Steps

  1. Design Phase (/design) — Architecture for v5 additions (planning pipeline, UAT gate, progressive trust)
  2. Plan Phase (/plan) — Break into Ralph-executable tasks
  3. Build — Ralph builds Ralph v5 (meta! but the right approach since v4 works)

Key Decision for Jason

Multi-node: Do you want Ralph v5 to support running on a second VPS? This is the biggest architectural decision and affects everything else. If yes, the queue-based architecture already supports it — we just need API key distribution, git repo access, and result reporting. If no, we focus on planning pipeline + UAT quality improvements.


10. Files Reference

File | Purpose
/opt/forge/scripts/ralph.sh | Main Ralph loop (672 lines)
/opt/forge/scripts/ralph-poller.sh | Queue poller (367 lines)
/opt/forge/scripts/lib/model-router.sh | 4-tier cascade (560 lines)
/opt/forge/scripts/lib/lockfile.sh | Concurrency protection
/opt/forge/scripts/pulse.sh | Factory Pulse events
/opt/forge/scripts/deploy.sh | Atomic merge + smoke test
/opt/forge/config/budget.yaml | Budget caps
/opt/forge/config/cascade.yaml | Model cascade config
/opt/forge/config/gating.yaml | Gatekeeper config
/opt/forge/config/services.json | Service registry
/opt/forge/services/ralph-poller/ | Systemd unit files
/opt/forge/Vault/projects/Ralph2/ | Original PRD + work log
Published via Forge — 2026-02-26  |  ideas.asapai.net