RALPH v5 — Discovery Brief

Date: 2026-02-26 | Author: Claude (Forge Agent) + Jason MacDonald | Status: Discovery Complete — Ready for Design Phase

1. What Is Ralph Today?

Ralph is Forge's autonomous coding agent — an external bash loop that spawns fresh Claude Code instances per task. It's the build factory that executes queued coding work while Jason sleeps, works on other things, or is between sessions.

Current Architecture (v4 — Production)

┌─────────────────────────────────────────────┐
│ systemd timer (every 60s)                    │
│ → ralph-poller.sh                            │
│   ├── Check Supabase ralph_queue             │
│   ├── Call Gatekeeper (:5015/v1/gate)        │
│   ├── Atomic task claim (PATCH status)       │
│   └── nohup ralph.sh "project" "task" &      │
└─────────────────────────────────────────────┘
           ↓ spawns fresh process
┌─────────────────────────────────────────────┐
│ ralph.sh (672 lines)                         │
│ ├── Model Router (4-tier cascade)            │
│ │   ├── T0: Claude Max CLI (free, preferred) │
│ │   ├── T1: Ollama Qwen 7B (free, simple)   │
│ │   ├── T2: LiteLLM API (paid, budget-capped)│
│ │   └── T3: Poll & Wait (24h max)           │
│ ├── Self-healing (env strip, escalation)     │
│ ├── Circuit breaker (3 consecutive fails)    │
│ ├── Factory Pulse events → Supabase          │
│ ├── Host mode (forge) / Docker mode (other)  │
│ └── Queue drain (chain next task on success) │
└─────────────────────────────────────────────┘
           ↓ fresh Claude instance
┌─────────────────────────────────────────────┐
│ claude -p "task description"                 │
│ ├── Fresh 200k context window                │
│ ├── Reads CLAUDE.md + skills automatically   │
│ ├── Executes task, writes code, runs tests   │
│ ├── Git commit on success                    │
│ └── Exits → loop continues                   │
└─────────────────────────────────────────────┘
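
The 4-tier cascade in the diagram can be read as "take the first tier that is usable right now". A minimal Python sketch of that reading follows. The real logic lives in model-router.sh and may differ; the `RouterState` fields, the `task_is_simple` hint, and the cost estimate are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class RouterState:
    # Hypothetical fields for illustration; not real model-router.sh variables.
    claude_max_free: bool    # T0: Claude Max CLI not occupied by an interactive session
    ollama_up: bool          # T1: local Ollama Qwen 7B is reachable
    budget_left_usd: float   # T2: paid budget remaining for today


def pick_tier(state: RouterState, task_is_simple: bool, est_cost_usd: float) -> str:
    """Walk T0 -> T3 in order and return the first usable tier."""
    if state.claude_max_free:
        return "T0"                      # free and preferred
    if state.ollama_up and task_is_simple:
        return "T1"                      # free, but only for simple tasks
    if est_cost_usd <= state.budget_left_usd:
        return "T2"                      # paid, budget-capped
    return "T3"                          # poll and wait (the brief caps this at 24h)
```

Under this reading, a busy Claude Max session plus a complex task falls straight through to the paid tier, which matches the routing struggle listed under "What's NOT Working".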

What's Working

External bash loop with fresh context per task (Joe's #1)
4-tier model cascade with budget controls
Gatekeeper quality gate (score 0-100, decomposition)
Self-healing: env var stripping, exponential backoff, circuit breaker
Factory Pulse observability (events → Supabase → Dashboard)
Dashboard UI: Ralph status card, task queue, model tier display
Systemd timer: poller runs every 60s reliably
Anti-loop protection: spawn tracker (3 fails → mark failed)
Stale task recovery: reset orphaned "running" tasks
Queue drain: chain tasks on success without poller delay
Budget caps: $10/day, $3/task via config/budget.yaml
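
Two of the guards above, the budget caps and the circuit breaker with backoff, are simple enough to sketch. This is an illustration in Python, not the ralph.sh implementation: the cap values and the 3-failure threshold come from this brief, while the function names and the backoff base and ceiling are assumptions.

```python
DAILY_CAP_USD = 10.0  # from the brief: $10/day
TASK_CAP_USD = 3.0    # from the brief: $3/task


def can_spend(spent_today: float, spent_on_task: float, next_cost: float) -> bool:
    """Allow a paid call only if it keeps both the task and the day under cap."""
    within_task = spent_on_task + next_cost <= TASK_CAP_USD
    within_day = spent_today + next_cost <= DAILY_CAP_USD
    return within_task and within_day


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; any success resets it."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.consecutive_fails = 0

    def record(self, success: bool) -> None:
        self.consecutive_fails = 0 if success else self.consecutive_fails + 1

    @property
    def open(self) -> bool:
        return self.consecutive_fails >= self.threshold


def backoff_seconds(attempt: int, base: float = 2.0, cap: float = 300.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... limited to `cap` (assumed values)."""
    return min(cap, base * 2 ** (attempt - 1))
```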

What's NOT Working / Missing

Tasks frequently fail quickly (logs show claim → spawn → die within 60s)
Model routing struggles when Claude Max is occupied by interactive session
No automated PRD/planning pipeline (relies on manual /discover → /design → /plan)
No research agent spawning before building
No formal UAT/QA loop
No multi-node capability (single VPS only)
No progress tracking within a task (just pass/fail per attempt)
Gatekeeper decomposition queues subtasks but doesn't always work cleanly
Trust levels exist in schema but aren't enforced progressively

2. History: From PRD v2.1 to v4

January 2026: PRD v2.1 Written

Proposed 13 Python FastAPI microservices
Inspired by: GSD (4-doc structure), Alex Dunlop (external loop), Joe Njenga (7 mistakes), Claude Code patterns
Target: Windows local dev (C:\Dev\ralph-orchestrator)
6 of 8 foundation services were actually built (Tasks 1-6)
Services: claude-service, bash-service, github-service, router-service, credit-service, skills-service

January-February 2026: Architecture Pivot

Moved from Windows local → Hetzner VPS (Linux)
Pivoted from Python FastAPI services → Bash scripts + Claude Code CLI
Why: Claude Code CLI provides everything the services were wrapping (bash execution, git operations, code editing, testing)
Ralph v3 built: Self-healing, external bash loop, systemd integration
Ralph v4 added: 4-tier model cascade, Factory Pulse, Gatekeeper

What the PRD Got Right

External bash loop architecture (Joe's #1) — built and working
Fresh context per task — built and working
Budget enforcement — built and working
Max iterations (Joe's #6) — built (5 attempts default)
Feedback loops / tests (Joe's #5) — built (test detection per mode)

What the PRD Proposed That Was Superseded

PRD Proposed	What Actually Exists	Why Different
Claude Service (port 5001)	`claude -p` direct CLI	Claude Code IS the service
Bash Service (port 5004)	Claude Code bash tool	Claude Code handles this natively
GitHub Service (port 5005)	git commands in ralph.sh + deploy.sh	No wrapper needed
Router Service (port 5002)	ClawdRouter (8080) + model-router.sh	More sophisticated cascade
Credit Service (port 5003)	budget_log table + budget.yaml	Simpler, config-driven
Skills Service (port 5006)	.claude/skills/ native loading	Claude Code loads skills natively
Service Launcher	systemd units + guardian	Production-grade vs dev scripts
Doc Generator (port 5007)	/discover → /design → /plan skills	Manual flow, not automated
Discussion Service (port 5008)	Human conversation in Claude Code	Not automated
Research Service (port 5009)	/discover skill	Single-agent, not 4-parallel
Quick Mode (port 5012)	Complexity routing via Gatekeeper	Simple tasks auto-route to cheap models

What the PRD Proposed That's Still Unbuilt

Component	Status	Still Needed?
Orchestrator subagent spawning	Not built	YES — quality improvement
4-document structure (PROJECT/REQ/ROADMAP/STATE)	Not built	MAYBE — skills flow works
Automated PRD pipeline	Not built	YES — biggest gap
UAT Service	Not built	YES — quality gate
Session Manager / HANDOFF.md	Partial (SESSION-QUEUE.md)	MAYBE — current approach works
Practice modes (progressive trust)	Not built	YES — trust_level column unused
Multi-node distributed Ralph	Not built	YES — Jason mentioned this

3. Competitive Landscape: Build Orchestrators in 2026

What Exists Now

Tool	Approach	Strengths	Weaknesses
Claude Code CLI	Single agent, interactive	Best quality, full IDE	No automated loops
Ralph v4 (Forge)	External bash loop, queue-driven	Production-tested, self-healing	No planning pipeline, single node
Devin	Full autonomous agent	End-to-end, cloud-native	$500/mo, black box, variable quality
Cursor Agent	IDE-integrated autonomous	Good UX, integrated	Tied to Cursor IDE
Aider	CLI pair programmer	Simple, effective	No orchestration, no queue
Sweep AI	GitHub-integrated	Auto-PR from issues	Limited scope, GitHub-only
OpenHands (formerly OpenDevin)	Open-source autonomous	Free, customizable	Complex setup, less reliable
SWE-Agent	Research-grade agent	Well-studied	Academic, not production

Ralph's Unique Position

Among the tools compared above, Ralph is the only one that combines all of the following:

Uses queue-based task dispatch (not just PR-based or chat-based)
Has 4-tier model cascade with budget controls
Has self-healing (circuit breaker, env healing, escalation)
Integrates with a personal AI operating system (Forge)
Is already in production processing real tasks
Supports both host and containerized execution modes

Key Insight

Ralph isn't competing with Devin or Cursor. It's a personal build factory optimized for one operator (Jason); the goal is leveraged output for that operator, not general-purpose autonomy.

4. Gap Analysis: What Ralph v5 Should Address

Gap 1: No Planning Pipeline (CRITICAL)

Current: Tasks are manually written and queued. Quality depends entirely on how well the human wrote the task description.

Joe's #4: "Ralph amplifies whatever you feed it — garbage in, garbage out."

Impact: Bad task descriptions → wasted tokens, failed attempts, scope creep.

Solution: Automated PRD pipeline — task description → structured plan → decomposed subtasks → approval gate → execution.
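
The pipeline shape (description → plan → subtasks → approval gate → queue) can be sketched as follows. This is a toy under stated assumptions: the `Plan` type, the `decompose` placeholder (which just splits on semicolons where a real planner would call a model), and the queue row fields are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Plan:
    description: str
    subtasks: List[str] = field(default_factory=list)
    approved: bool = False  # flipped by the human (or policy) at the approval gate


def decompose(description: str) -> Plan:
    """Placeholder planner: one subtask per ';'-separated clause."""
    parts = [p.strip() for p in description.split(";") if p.strip()]
    return Plan(description=description, subtasks=parts)


def enqueue(plan: Plan) -> List[Dict[str, str]]:
    """Approval gate: an unapproved plan never produces queue rows."""
    if not plan.approved:
        raise PermissionError("plan has not been approved")
    return [{"task": t, "status": "queued"} for t in plan.subtasks]
```

The point of the gate is that nothing reaches ralph_queue until the decomposed plan has been looked at, which is where a bad task description gets caught cheaply.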

Gap 2: No Research Before Building (HIGH)

Current: Ralph jumps straight into coding. No exploration of existing patterns, libraries, or architecture.

PRD v2.1 proposed: 4 parallel research agents (stack, feature, architecture, pitfall).

Impact: Suboptimal implementations, reinventing wheels, missing best practices.

Solution: Pre-execution research phase, at minimum a quick codebase scan + existing pattern check.
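
The "quick codebase scan" could be as small as a keyword search whose hits get prepended to the task prompt. A rough Python sketch, assuming a hypothetical `scan_codebase` helper and an arbitrary choice of file suffixes:

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def scan_codebase(
    root: str,
    keywords: Iterable[str],
    suffixes: Tuple[str, ...] = (".sh", ".py", ".md"),  # assumed set of interesting files
) -> Dict[str, List[str]]:
    """Map each keyword to the files (relative to root) that mention it."""
    hits: Dict[str, List[str]] = {k: [] for k in keywords}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(errors="ignore")
        for keyword in hits:
            # Case-insensitive literal match; keywords are not treated as regexes.
            if re.search(re.escape(keyword), text, re.IGNORECASE):
                hits[keyword].append(str(path.relative_to(root)))
    return hits
```

An empty hit list for a keyword is itself useful: it tells the planner there is no existing pattern to reuse.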

Gap 3: No UAT/QA Loop (HIGH)

Current: Tests pass → task complete. No human verification step. No auto-debug loop driven by rejection feedback.

Impact: Tasks "pass" but don't actually work as intended. No learning from rejection.

Solution: Optional UAT gate — present changes, collect feedback, auto-debug if rejected.
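
One way to picture the gate is as a single transition on a task record: accept moves it toward deploy, reject requeues it with the feedback attached so the next attempt can use it. The status names and fields below are hypothetical, not the current ralph_queue schema.

```python
from typing import Any, Dict


def uat_step(task: Dict[str, Any], verdict: str, feedback: str = "") -> Dict[str, Any]:
    """Apply a UAT verdict ('accept' or 'reject') and return the updated task."""
    if task.get("status") != "awaiting_uat":
        raise ValueError("task is not waiting for UAT")
    if verdict == "accept":
        return {**task, "status": "ready_to_deploy"}
    if verdict == "reject":
        # Requeue with the reviewer's feedback so the next attempt can auto-debug.
        return {
            **task,
            "status": "queued",
            "feedback": feedback,
            "attempt": task.get("attempt", 1) + 1,
        }
    raise ValueError(f"unknown verdict: {verdict!r}")
```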

Gap 4: Single Node Only (MEDIUM)

Current: Ralph runs only on the Hetzner VPS at 178.156.253.142.

Jason mentioned: "Working in a different session on standing up as a separate node."

Impact: Can't scale, can't run while VPS is busy, can't run on other machines.

Solution: Distributed Ralph — queue-based architecture already supports this. Need: API key management, git repo access, result reporting.
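
The reason the queue already supports multiple nodes is the atomic claim: the status update only succeeds if the row is still unclaimed, so two pollers cannot both win. The sketch below demonstrates that conditional update using an in-memory SQLite table as a stand-in for Supabase. The `claimed_by` column and the `claim` helper are assumptions for illustration.

```python
import sqlite3


def claim(conn: sqlite3.Connection, task_id: int, node: str) -> bool:
    """Claim a task only if it is still queued; True means this node won."""
    cur = conn.execute(
        "UPDATE ralph_queue SET status = 'running', claimed_by = ? "
        "WHERE id = ? AND status = 'queued'",
        (node, task_id),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows changed means another node got there first


# Stand-in queue with a single queued task.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ralph_queue (id INTEGER PRIMARY KEY, status TEXT, claimed_by TEXT)"
)
conn.execute("INSERT INTO ralph_queue (id, status) VALUES (1, 'queued')")

first = claim(conn, 1, "node-1")   # wins the task
second = claim(conn, 1, "node-2")  # loses: status is no longer 'queued'
```

Against Supabase the same idea would be a PATCH filtered on both the id and the current status, with the node treating an empty result as "someone else claimed it".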

Gap 5: Progressive Trust Not Enforced (MEDIUM)

Current: trust_level column exists in ralph_queue but is never checked.

Joe's #7: "Don't start with the jackhammer."

Impact: All tasks treated equally — no distinction between supervised and unsupervised.

Solution: Enforce trust levels: supervised (human watches) → attended (periodic check) → unattended (overnight).
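
Enforcement could start as one check in the poller before a task is claimed. A minimal sketch, assuming three levels that map to the names above; the `operator_present` and `overnight` inputs, and the promotion threshold of 5 successes, are invented for illustration.

```python
from enum import IntEnum


class Trust(IntEnum):
    SUPERVISED = 0   # human watches the run
    ATTENDED = 1     # human checks in periodically
    UNATTENDED = 2   # may run overnight


def may_run(trust: Trust, operator_present: bool, overnight: bool) -> bool:
    """Decide whether a task at this trust level may start right now."""
    if trust == Trust.SUPERVISED:
        return operator_present
    if trust == Trust.ATTENDED:
        return operator_present or not overnight
    return True


def next_trust(trust: Trust, consecutive_successes: int, promote_after: int = 5) -> Trust:
    """Promote one level after enough consecutive successes (threshold is assumed)."""
    if consecutive_successes >= promote_after and trust < Trust.UNATTENDED:
        return Trust(trust + 1)
    return trust
```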

Gap 6: No Task Progress Tracking (LOW)

Current: Binary pass/fail per attempt. No visibility into what Ralph is doing mid-task.

Impact: Can't tell if task is 90% done or completely stuck until it finishes.

Solution: Progress events (started, coding, testing, reviewing) emitted during execution.
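
A progress event only needs the task, the phase, and enough ordering information for the dashboard to draw a bar. The sketch below builds such a payload in Python; the field names are assumptions, and in practice the payload would presumably be sent through the existing Factory Pulse path rather than returned.

```python
import json
import time
from typing import Optional

# Phase names taken from the proposal above, in execution order.
PHASES = ("started", "coding", "testing", "reviewing")


def progress_event(task_id: str, phase: str, now: Optional[float] = None) -> str:
    """Build a JSON progress event; rejects phases outside the known set."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    payload = {
        "task_id": task_id,
        "phase": phase,
        "step": PHASES.index(phase) + 1,  # 1-based position in the phase list
        "of": len(PHASES),
        "ts": time.time() if now is None else now,
    }
    return json.dumps(payload, sort_keys=True)
```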

Gap 7: Subagent Architecture (LOW)

Current: Single Claude instance per task. No planner/coder/reviewer/tester decomposition.

PRD v2.1 proposed: Full subagent architecture with tool scoping.

Impact: The context budget is used less efficiently, and there is no separation of concerns.

Solution: Claude Code's Task tool already supports subagents. Could be enabled via CLAUDE.md instructions.

5. Build vs Buy vs Extend

Recommendation: EXTEND (v4 → v5)

Ralph v4 is production-tested and working. The foundation is solid:

External loop
Fresh context
Budget controls
Self-healing
Observability
Dashboard

Don't rebuild. Extend what works. The PRD v2.1's 13-microservice architecture was the right instinct but over-engineered — Claude Code CLI already provides most of those capabilities natively.

What to Build (v5 Additions)

Component	Effort	Impact	Priority
Planning Pipeline	2-3 days	Eliminates garbage-in	P0
Pre-execution Research	1-2 days	Better implementations	P1
UAT Gate	1-2 days	Quality verification	P1
Multi-node Support	3-5 days	Scale + availability	P2
Progressive Trust	0.5 day	Safety rails	P2
Progress Events	0.5 day	Better visibility	P3
Subagent Prompting	1 day	Context efficiency	P3

What NOT to Build

Python FastAPI microservices for things Claude Code already does
Docker-compose orchestration (systemd works fine)
PowerShell/Windows support (VPS is Linux, that ship sailed)
Custom PII detection (Presidio already handles this)
Custom model routing (ClawdRouter + model-router.sh already handle this)
Custom git service (Claude Code + deploy.sh already handle this)

6. Proposed Architecture: Ralph v5

┌─────────────────────────────────────────────────────────┐
│ RALPH v5: Planned Autonomous Build Factory               │
│                                                          │
│ NEW: Planning Pipeline                                   │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐    │
│ │ Task     │→│ Research  │→│ Plan     │→│ Approve  │    │
│ │ Intake   │ │ (quick)   │ │ Decompose│ │ Gate     │    │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘    │
│                                           ↓              │
│ EXISTING: Execution Engine (v4)                          │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐    │
│ │ Poller   │→│ Gatekeeper│→│ Ralph.sh │→│ Claude   │    │
│ │ (60s)    │ │ (quality) │ │ (cascade)│ │ (fresh)  │    │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘    │
│                                           ↓              │
│ NEW: Quality Loop                                        │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐                  │
│ │ Tests    │→│ UAT Gate │→│ Deploy   │                  │
│ │ (auto)   │ │ (optional)│ │ (merge)  │                  │
│ └──────────┘ └──────────┘ └──────────┘                  │
│                                                          │
│ NEW: Multi-node (Future)                                 │
│ ┌──────────┐ ┌──────────┐                                │
│ │ Node 1   │ │ Node 2   │  ← Same queue, different VPS  │
│ │ (Hetzner)│ │ (Future) │                                │
│ └──────────┘ └──────────┘                                │
└─────────────────────────────────────────────────────────┘

7. Risks & Mitigations

Risk	Likelihood	Impact	Mitigation
Over-engineering v5, as happened with v2.1	HIGH	Wasted effort	Keep it bash + Claude Code. No new microservices unless truly needed.
Planning pipeline slows throughput	MEDIUM	Reduced task velocity	Make planning optional per trust level. Simple tasks skip planning.
Multi-node adds complexity	MEDIUM	Debugging nightmares	Start with second node as read-only worker, graduate to full autonomy.
Claude Max API changes	LOW	Breaking changes	model-router.sh already handles fallbacks.
Budget overruns on paid tiers	LOW	Cost surprise	Already mitigated: daily cap, per-task cap, circuit breaker.

8. Joe's 7 Mistakes — Current Status

#	Mistake	v4 Status	v5 Target
1	Plugin ≠ Real Ralph (context rot)	External bash loop, fresh context	Maintain
2	Vague criteria	Tests must pass	Add: lint + typecheck gates
3	Tasks too large	Gatekeeper decomposes, but inconsistent	Planning pipeline with auto-decomposition
4	Skip planning	No automated planning	Planning pipeline (P0)
5	No feedback loops	Tests auto-run	Add: lint, typecheck, review agent
6	Infinite iterations	Max 5 attempts, circuit breaker	Maintain
7	No practice first	trust_level column exists, unused	Enforce progressive trust

9. Next Steps

Design Phase (/design) — Architecture for v5 additions (planning pipeline, UAT gate, progressive trust)
Plan Phase (/plan) — Break into Ralph-executable tasks
Build — Ralph builds Ralph v5 (meta! but the right approach since v4 works)

Key Decision for Jason

Multi-node: Do you want Ralph v5 to support running on a second VPS? This is the biggest architectural decision and affects everything else. If yes, the queue-based architecture already supports it — we just need API key distribution, git repo access, and result reporting. If no, we focus on planning pipeline + UAT quality improvements.

10. Files Reference

File	Purpose
`/opt/forge/scripts/ralph.sh`	Main Ralph loop (672 lines)
`/opt/forge/scripts/ralph-poller.sh`	Queue poller (367 lines)
`/opt/forge/scripts/lib/model-router.sh`	4-tier cascade (560 lines)
`/opt/forge/scripts/lib/lockfile.sh`	Concurrency protection
`/opt/forge/scripts/pulse.sh`	Factory Pulse events
`/opt/forge/scripts/deploy.sh`	Atomic merge + smoke test
`/opt/forge/config/budget.yaml`	Budget caps
`/opt/forge/config/cascade.yaml`	Model cascade config
`/opt/forge/config/gating.yaml`	Gatekeeper config
`/opt/forge/config/services.json`	Service registry
`/opt/forge/services/ralph-poller/`	Systemd unit files
`/opt/forge/Vault/projects/Ralph2/`	Original PRD + work log

Published via Forge — 2026-02-26 | ideas.asapai.net