Ralph is Forge's autonomous coding agent — an external bash loop that spawns fresh Claude Code instances per task. It's the build factory that executes queued coding work while Jason sleeps, works on other things, or is between sessions.
┌─────────────────────────────────────────────┐
│ systemd timer (every 60s) │
│ → ralph-poller.sh │
│ ├── Check Supabase ralph_queue │
│ ├── Call Gatekeeper (:5015/v1/gate) │
│ ├── Atomic task claim (PATCH status) │
│ └── nohup ralph.sh "project" "task" & │
└─────────────────────────────────────────────┘
↓ spawns fresh process
┌─────────────────────────────────────────────┐
│ ralph.sh (672 lines) │
│ ├── Model Router (4-tier cascade) │
│ │ ├── T0: Claude Max CLI (free, preferred) │
│ │ ├── T1: Ollama Qwen 7B (free, simple) │
│ │ ├── T2: LiteLLM API (paid, budget-capped)│
│ │ └── T3: Poll & Wait (24h max) │
│ ├── Self-healing (env strip, escalation) │
│ ├── Circuit breaker (3 consecutive fails) │
│ ├── Factory Pulse events → Supabase │
│ ├── Host mode (forge) / Docker mode (other) │
│ └── Queue drain (chain next task on success) │
└─────────────────────────────────────────────┘
↓ fresh Claude instance
┌─────────────────────────────────────────────┐
│ claude -p "task description" │
│ ├── Fresh 200k context window │
│ ├── Reads CLAUDE.md + skills automatically │
│ ├── Executes task, writes code, runs tests │
│ ├── Git commit on success │
│ └── Exits → loop continues │
└─────────────────────────────────────────────┘
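The Model Router's four-tier cascade is an ordered fallback: take the first tier that is usable right now. The Python sketch below is illustrative only and is not the logic in `model-router.sh`. The `RouterState` fields (availability flags, a simple-task flag, remaining paid budget, a cost estimate) are hypothetical stand-ins for whatever signals the real router reads.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    T0_CLAUDE_MAX = "claude-max-cli"  # free, preferred
    T1_OLLAMA = "ollama-qwen-7b"      # free, simple tasks only
    T2_LITELLM = "litellm-api"        # paid, budget-capped
    T3_WAIT = "poll-and-wait"         # nothing usable: poll again later (24h max)


@dataclass
class RouterState:
    # Hypothetical signals; model-router.sh may inspect different ones.
    claude_max_available: bool
    ollama_available: bool
    task_is_simple: bool
    paid_budget_remaining: float  # dollars left under the paid-tier cap
    estimated_cost: float         # estimated dollars for this task on T2


def select_tier(state: RouterState) -> Tier:
    """Walk the cascade top-down and return the first usable tier."""
    if state.claude_max_available:
        return Tier.T0_CLAUDE_MAX
    if state.ollama_available and state.task_is_simple:
        return Tier.T1_OLLAMA
    if state.estimated_cost <= state.paid_budget_remaining:
        return Tier.T2_LITELLM
    return Tier.T3_WAIT


if __name__ == "__main__":
    # Claude Max busy, task too complex for the local model, budget available.
    state = RouterState(False, True, False, paid_budget_remaining=5.0, estimated_cost=1.2)
    print(select_tier(state))  # Tier.T2_LITELLM
```

Putting the free tiers first means a paid call only happens when neither free option is available or suitable, and the budget comparison keeps T2 inside its cap.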

PRD v2.1 (`C:\Dev\ralph-orchestrator`) versus what actually exists:

| PRD Proposed | What Actually Exists | Why Different |
|---|---|---|
| Claude Service (port 5001) | claude -p direct CLI | Claude Code IS the service |
| Bash Service (port 5004) | Claude Code bash tool | Claude Code handles this natively |
| GitHub Service (port 5005) | git commands in ralph.sh + deploy.sh | No wrapper needed |
| Router Service (port 5002) | ClawdRouter (8080) + model-router.sh | More sophisticated cascade |
| Credit Service (port 5003) | budget_log table + budget.yaml | Simpler, config-driven |
| Skills Service (port 5006) | .claude/skills/ native loading | Claude Code loads skills natively |
| Service Launcher | systemd units + guardian | Production-grade vs dev scripts |
| Doc Generator (port 5007) | /discover → /design → /plan skills | Manual flow, not automated |
| Discussion Service (port 5008) | Human conversation in Claude Code | Not automated |
| Research Service (port 5009) | /discover skill | Single-agent, not 4-parallel |
| Quick Mode (port 5012) | Complexity routing via Gatekeeper | Simple tasks auto-route to cheap models |

| Component | Status | Still Needed? |
|---|---|---|
| Orchestrator subagent spawning | Not built | YES — quality improvement |
| 4-document structure (PROJECT/REQ/ROADMAP/STATE) | Not built | MAYBE — skills flow works |
| Automated PRD pipeline | Not built | YES — biggest gap |
| UAT Service | Not built | YES — quality gate |
| Session Manager / HANDOFF.md | Partial (SESSION-QUEUE.md) | MAYBE — current approach works |
| Practice modes (progressive trust) | Not built | YES — trust_level column unused |
| Multi-node distributed Ralph | Not built | YES — Jason mentioned this |

| Tool | Approach | Strengths | Weaknesses |
|---|---|---|---|
| Claude Code CLI | Single agent, interactive | Best quality, full IDE | No automated loops |
| Ralph v4 (Forge) | External bash loop, queue-driven | Production-tested, self-healing | No planning pipeline, single node |
| Devin | Full autonomous agent | End-to-end, cloud-native | $500/mo, black box, variable quality |
| Cursor Agent | IDE-integrated autonomous | Good UX, integrated | Tied to Cursor IDE |
| Aider | CLI pair programmer | Simple, effective | No orchestration, no queue |
| Sweep AI | GitHub-integrated | Auto-PR from issues | Limited scope, GitHub-only |
| OpenHands (formerly OpenDevin) | Open-source autonomous | Free, customizable | Complex setup, less reliable |
| SWE-Agent | Research-grade agent | Well-studied | Academic, not production |
Ralph is the only build orchestrator that:
Ralph isn't competing with Devin or Cursor; it's a personal build factory optimized for a single operator (Jason). The goal is leveraged output for that operator, not general-purpose autonomy.
Current: Tasks are manually written and queued. Quality depends entirely on how well the human wrote the task description.
Joe's #4: "Ralph amplifies whatever you feed it — garbage in, garbage out."
Impact: Bad task descriptions → wasted tokens, failed attempts, scope creep.
Solution: Automated PRD pipeline — task description → structured plan → decomposed subtasks → approval gate → execution.
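A minimal sketch of that pipeline's shape, assuming a plain-text task where the first line is the goal and each following line is a step. `structure`, `decompose`, `run_pipeline`, and the two-steps-per-subtask chunking are hypothetical; a real version would more likely have Claude draft the plan. What matters is the gate: nothing reaches `ralph_queue` unless `approve` returns true.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Plan:
    goal: str
    steps: list[str] = field(default_factory=list)


def structure(description: str) -> Plan:
    # Hypothetical convention: first non-empty line is the goal, the rest are steps.
    lines = [ln.strip() for ln in description.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty task description")
    return Plan(goal=lines[0], steps=lines[1:])


def decompose(plan: Plan, steps_per_task: int = 2) -> list[list[str]]:
    # Chunk steps into small subtasks (empty if the task lists no explicit steps).
    s = plan.steps
    return [s[i:i + steps_per_task] for i in range(0, len(s), steps_per_task)]


def run_pipeline(description: str,
                 approve: Callable[[Plan, list[list[str]]], bool]) -> list[str]:
    """Return task descriptions to enqueue, or [] if the approval gate says no."""
    plan = structure(description)
    chunks = decompose(plan)
    if not approve(plan, chunks):  # human or policy approval gate
        return []
    tasks = [f"{plan.goal}: " + "; ".join(chunk) for chunk in chunks]
    return tasks or [plan.goal]  # no explicit steps: enqueue the goal as-is


if __name__ == "__main__":
    desc = "Add CSV export\nadd endpoint\nwrite tests\nupdate docs"
    print(run_pipeline(desc, approve=lambda plan, chunks: len(chunks) <= 3))
    # ['Add CSV export: add endpoint; write tests', 'Add CSV export: update docs']
```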
Current: Ralph jumps straight into coding. No exploration of existing patterns, libraries, or architecture.
PRD v2.1 proposed: 4 parallel research agents (stack, feature, architecture, pitfall).
Impact: Suboptimal implementations, reinventing the wheel, missed best practices.
Solution: Pre-execution research phase, at minimum a quick codebase scan + existing pattern check.
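The "quick codebase scan" could be as simple as ranking files by how many task keywords they mention and handing the top hits to Claude as starting context. This is a hypothetical sketch (`scan_repo`, the stopword list, and the suffix and directory filters are assumptions): a cheap heuristic, not real code search.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical filters; tune per repo.
STOPWORDS = {"the", "and", "for", "with", "add", "fix", "into", "from"}
SOURCE_SUFFIXES = {".py", ".ts", ".js", ".sh", ".go", ".rs", ".md"}
SKIP_DIRS = {".git", "node_modules"}


def keywords(task: str) -> set[str]:
    words = re.findall(r"[a-z_][a-z0-9_]{2,}", task.lower())
    return {w for w in words if w not in STOPWORDS}


def scan_repo(root: Path, task: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Rank source files by how many distinct task keywords they mention."""
    terms = keywords(task)
    scores: Counter = Counter()
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        if SKIP_DIRS.intersection(rel.parts):
            continue
        try:
            text = path.read_text(errors="ignore").lower()
        except OSError:
            continue
        hits = sum(1 for term in terms if term in text)
        if hits:
            scores[rel.as_posix()] = hits
    return scores.most_common(top_n)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "export.py").write_text("def csv_export(rows): ...")
        (root / "auth.py").write_text("def login(): ...")
        print(scan_repo(root, "Add csv export endpoint"))  # [('export.py', 2)]
```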
Current: Tests pass → task complete. No human verification step. No auto-debug on failure feedback.
Impact: Tasks "pass" but don't actually work as intended. No learning from rejection.
Solution: Optional UAT gate — present changes, collect feedback, auto-debug if rejected.
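A sketch of that loop as a bounded present/feedback/debug cycle. `review` and `auto_debug` are hypothetical callbacks (a human prompt and a fresh Claude run, say), and the 3-round limit is a placeholder. Rejection notes are returned so they can be stored, which addresses the "no learning from rejection" gap.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    approved: bool
    notes: str = ""


def uat_gate(change: str,
             review: Callable[[str], Verdict],
             auto_debug: Callable[[str, str], str],
             max_rounds: int = 3) -> tuple[str, list[str]]:
    """Return ("approved" | "escalate", rejection notes gathered on the way)."""
    rejections: list[str] = []
    for round_no in range(1, max_rounds + 1):
        verdict = review(change)            # present changes, collect feedback
        if verdict.approved:
            return "approved", rejections
        rejections.append(verdict.notes)    # keep notes so future tasks can learn
        if round_no < max_rounds:
            change = auto_debug(change, verdict.notes)  # retry with the feedback
    return "escalate", rejections


if __name__ == "__main__":
    result = uat_gate(
        "v1",
        review=lambda c: Verdict("fixed" in c, "export button does nothing"),
        auto_debug=lambda c, notes: c + " (fixed)",
    )
    print(result)  # ('approved', ['export button does nothing'])
```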
Current: Ralph runs only on the Hetzner VPS at 178.156.253.142.
Jason mentioned: "Working in a different session on standing up as a separate node."
Impact: Can't scale, can't run while VPS is busy, can't run on other machines.
Solution: Distributed Ralph — queue-based architecture already supports this. Need: API key management, git repo access, result reporting.
Current: trust_level column exists in ralph_queue but is never checked.
Joe's #7: "Don't start with the jackhammer."
Impact: All tasks treated equally — no distinction between supervised and unsupervised.
Solution: Enforce trust levels: supervised (human watches) → attended (periodic check) → unattended (overnight).
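One way to make `trust_level` actually gate execution, as a hypothetical sketch. It assumes the column stores the names `supervised`, `attended`, or `unattended`, models "periodic check" as a human check-in within the last 60 minutes, and promotes one level after 5 clean runs. All three thresholds are placeholders.

```python
from enum import IntEnum


class TrustLevel(IntEnum):
    SUPERVISED = 0   # a human is watching the run
    ATTENDED = 1     # a human checks in periodically
    UNATTENDED = 2   # may run overnight


def may_start(trust: TrustLevel, human_watching: bool,
              minutes_since_checkin: float, checkin_window: float = 60) -> bool:
    """Decide whether the poller may claim a task with this trust_level right now."""
    if trust == TrustLevel.UNATTENDED:
        return True
    if trust == TrustLevel.ATTENDED:
        return human_watching or minutes_since_checkin <= checkin_window
    return human_watching  # SUPERVISED


def next_level(trust: TrustLevel, clean_runs: int, threshold: int = 5) -> TrustLevel:
    """Promote one level after `threshold` clean runs at the current level."""
    if clean_runs >= threshold and trust < TrustLevel.UNATTENDED:
        return TrustLevel(trust + 1)
    return trust


if __name__ == "__main__":
    level = TrustLevel["attended".upper()]  # e.g. parsed from the trust_level column
    print(may_start(level, human_watching=False, minutes_since_checkin=30))  # True
    print(next_level(level, clean_runs=5).name)  # UNATTENDED
```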
Current: Binary pass/fail per attempt. No visibility into what Ralph is doing mid-task.
Impact: Can't tell if task is 90% done or completely stuck until it finishes.
Solution: Progress events (started, coding, testing, reviewing) emitted during execution.
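A sketch of what mid-task events could look like, assuming each event is a small JSON record handed to a sink (for example, something that forwards to `pulse.sh` or Supabase). The sink, the field names, and the 15-minute stall threshold are hypothetical. Phases are deliberately not forced into order, since going from testing back to coding is a normal fix loop; the useful "90% done vs. stuck" signal is the time since the last event.

```python
import json
import time
from typing import Callable, Optional

PHASES = {"started", "coding", "testing", "reviewing"}


class ProgressEmitter:
    """Emit phase events for one task (hypothetical event schema)."""

    def __init__(self, task_id: int, sink: Callable[[str], None]):
        self.task_id = task_id
        self.sink = sink  # e.g. forwards each JSON line to pulse.sh or Supabase
        self.last_phase: Optional[str] = None
        self.last_ts: Optional[float] = None

    def emit(self, phase: str, detail: str = "", now: Optional[float] = None) -> None:
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        ts = time.time() if now is None else now
        self.last_phase, self.last_ts = phase, ts
        self.sink(json.dumps({"task_id": self.task_id, "phase": phase,
                              "detail": detail, "ts": ts}))

    def looks_stuck(self, now: float, timeout_s: float = 900) -> bool:
        """True if events have started but none arrived in the last timeout_s seconds."""
        return self.last_ts is not None and now - self.last_ts > timeout_s


if __name__ == "__main__":
    events: list[str] = []
    progress = ProgressEmitter(42, events.append)
    progress.emit("started", now=0.0)
    progress.emit("testing", "3 of 40 tests failing", now=600.0)
    print(progress.last_phase, progress.looks_stuck(now=1200.0))  # testing False
    print(progress.looks_stuck(now=2000.0))  # True
```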
Current: Single Claude instance per task. No planner/coder/reviewer/tester decomposition.
PRD v2.1 proposed: Full subagent architecture with tool scoping.
Impact: Less efficient use of the context budget, and no separation of concerns.
Solution: Claude Code's Task tool already supports subagents. Could be enabled via CLAUDE.md instructions.
Ralph v4 is production-tested and working. The foundation is solid:
Don't rebuild. Extend what works. PRD v2.1's 13-microservice architecture was the right instinct but over-engineered: the Claude Code CLI already provides most of those capabilities natively.
| Component | Effort | Impact | Priority |
|---|---|---|---|
| Planning Pipeline | 2-3 days | Eliminates garbage-in | P0 |
| Pre-execution Research | 1-2 days | Better implementations | P1 |
| UAT Gate | 1-2 days | Quality verification | P1 |
| Multi-node Support | 3-5 days | Scale + availability | P2 |
| Progressive Trust | 0.5 day | Safety rails | P2 |
| Progress Events | 0.5 day | Better visibility | P3 |
| Subagent Prompting | 1 day | Context efficiency | P3 |
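Summing the Effort column makes the scope concrete: roughly 9 to 14 working days in total, of which P0 and P1 account for 4 to 7. A throwaway helper with the rows transcribed from the table, treating "1 day" and "0.5 day" as fixed values:

```python
from collections import defaultdict

# (component, min_days, max_days, priority), transcribed from the table above
ROADMAP = [
    ("Planning Pipeline", 2, 3, "P0"),
    ("Pre-execution Research", 1, 2, "P1"),
    ("UAT Gate", 1, 2, "P1"),
    ("Multi-node Support", 3, 5, "P2"),
    ("Progressive Trust", 0.5, 0.5, "P2"),
    ("Progress Events", 0.5, 0.5, "P3"),
    ("Subagent Prompting", 1, 1, "P3"),
]


def effort_by_priority(rows):
    """Sum the (min, max) day estimates per priority bucket."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for _name, low, high, prio in rows:
        totals[prio][0] += low
        totals[prio][1] += high
    return {prio: tuple(span) for prio, span in sorted(totals.items())}


def total_effort(rows):
    """Overall (min, max) days across every row."""
    return (sum(r[1] for r in rows), sum(r[2] for r in rows))


if __name__ == "__main__":
    print(effort_by_priority(ROADMAP))
    # {'P0': (2.0, 3.0), 'P1': (2.0, 4.0), 'P2': (3.5, 5.5), 'P3': (1.5, 1.5)}
    print(total_effort(ROADMAP))  # (9.0, 14.0)
```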
┌─────────────────────────────────────────────────────────┐
│ RALPH v5: Planned Autonomous Build Factory │
│ │
│ NEW: Planning Pipeline │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Task │→│ Research │→│ Plan │→│ Approve │ │
│ │ Intake │ │ (quick) │ │ Decompose│ │ Gate │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ ↓ │
│ EXISTING: Execution Engine (v4) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Poller │→│ Gatekeeper│→│ Ralph.sh │→│ Claude │ │
│ │ (60s) │ │ (quality) │ │ (cascade)│ │ (fresh) │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ ↓ │
│ NEW: Quality Loop │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Tests │→│ UAT Gate │→│ Deploy │ │
│ │ (auto) │ │ (optional)│ │ (merge) │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ NEW: Multi-node (Future) │
│ ┌──────────┐ ┌──────────┐ │
│ │ Node 1 │ │ Node 2 │ ← Same queue, different VPS │
│ │ (Hetzner)│ │ (Future) │ │
│ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────┘
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Over-engineering v5 like v2.1 was | HIGH | Wasted effort | Keep it bash + Claude Code. No new microservices unless truly needed. |
| Planning pipeline slows throughput | MEDIUM | Reduced task velocity | Make planning optional per trust level. Simple tasks skip planning. |
| Multi-node adds complexity | MEDIUM | Debugging nightmares | Start with second node as read-only worker, graduate to full autonomy. |
| Claude Max API changes | LOW | Breaking changes | model-router.sh already handles fallbacks. |
| Budget overruns on paid tiers | LOW | Cost surprise | Already mitigated: daily cap, per-task cap, circuit breaker. |
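The existing mitigations in the last row amount to a guard consulted before each paid-tier call. A hypothetical sketch: cap values would come from `budget.yaml`, the threshold of 3 mirrors the circuit breaker in the `ralph.sh` diagram, and putting budget and breaker in one class (with no daily reset shown) is a simplification, not a description of how `ralph.sh` is structured.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetGuard:
    daily_cap: float            # dollars/day (real value lives in budget.yaml)
    per_task_cap: float         # dollars/task
    spent_today: float = 0.0
    spent_by_task: dict = field(default_factory=dict)
    consecutive_failures: int = 0
    breaker_threshold: int = 3  # mirrors "3 consecutive fails"

    @property
    def breaker_open(self) -> bool:
        return self.consecutive_failures >= self.breaker_threshold

    def allow_paid_call(self, task_id: int, estimate: float) -> bool:
        """Refuse a call that would breach either cap, or any call while the breaker is open."""
        if self.breaker_open:
            return False
        task_spent = self.spent_by_task.get(task_id, 0.0)
        return (self.spent_today + estimate <= self.daily_cap
                and task_spent + estimate <= self.per_task_cap)

    def record(self, task_id: int, cost: float, success: bool) -> None:
        self.spent_today += cost
        self.spent_by_task[task_id] = self.spent_by_task.get(task_id, 0.0) + cost
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1


if __name__ == "__main__":
    guard = BudgetGuard(daily_cap=10.0, per_task_cap=3.0)
    print(guard.allow_paid_call(1, 2.0))  # True
    guard.record(1, 2.0, success=False)
    print(guard.allow_paid_call(1, 1.5))  # False: 3.5 would exceed the per-task cap
```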

| # | Mistake | v4 Status | v5 Target |
|---|---|---|---|
| 1 | Plugin ≠ Real Ralph (context rot) | External bash loop, fresh context | Maintain |
| 2 | Vague criteria | Tests must pass | Add: lint + typecheck gates |
| 3 | Tasks too large | Gatekeeper decomposes, but inconsistent | Planning pipeline with auto-decomposition |
| 4 | Skip planning | No automated planning | Planning pipeline (P0) |
| 5 | No feedback loops | Tests auto-run | Add: lint, typecheck, review agent |
| 6 | Infinite iterations | Max 5 attempts, circuit breaker | Maintain |
| 7 | No practice first | trust_level column exists, unused | Enforce progressive trust |
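Rows 2 and 5 target lint and typecheck gates on top of tests, and row 6 keeps the 5-attempt cap. A sketch of a gate runner plus a bounded retry loop. `pytest`, `ruff`, and `mypy` are placeholder commands (assumptions, not necessarily what Forge projects use), and `attempt` is a hypothetical callback standing in for whatever spawns a fresh Claude run.

```python
import subprocess
import sys
from typing import Callable, Sequence

Gate = tuple[str, list[str]]

# Placeholder commands (assumptions); each project would define its own.
DEFAULT_GATES: list[Gate] = [
    ("tests", ["pytest", "-q"]),
    ("lint", ["ruff", "check", "."]),
    ("typecheck", ["mypy", "."]),
]


def run_gates(gates: Sequence[Gate], cwd: str = ".") -> list[str]:
    """Run every gate and return the names of those that failed."""
    failed = []
    for name, cmd in gates:
        try:
            result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
        except OSError:  # e.g. the tool is not installed
            failed.append(name)
            continue
        if result.returncode != 0:
            failed.append(name)
    return failed


def attempt_until_green(attempt: Callable[[int, list[str]], None],
                        gates: Sequence[Gate], max_attempts: int = 5) -> bool:
    """Run attempt(n, last_failures) up to max_attempts times until every gate passes."""
    failures: list[str] = []
    for n in range(1, max_attempts + 1):
        attempt(n, failures)  # e.g. spawn a fresh `claude -p` told which gates failed
        failures = run_gates(gates)
        if not failures:
            return True
    return False


if __name__ == "__main__":
    demo = [("ok", [sys.executable, "-c", "pass"]),
            ("fails", [sys.executable, "-c", "raise SystemExit(1)"])]
    print(run_gates(demo))  # ['fails']
```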

- `/design`: Architecture for v5 additions (planning pipeline, UAT gate, progressive trust)
- `/plan`: Break into Ralph-executable tasks

Multi-node: Do you want Ralph v5 to support running on a second VPS? This is the biggest architectural decision and affects everything else. If yes, the queue-based architecture already supports it; we just need API key distribution, git repo access, and result reporting. If no, we focus on the planning pipeline and UAT quality improvements.
| File | Purpose |
|---|---|
| `/opt/forge/scripts/ralph.sh` | Main Ralph loop (672 lines) |
| `/opt/forge/scripts/ralph-poller.sh` | Queue poller (367 lines) |
| `/opt/forge/scripts/lib/model-router.sh` | 4-tier cascade (560 lines) |
| `/opt/forge/scripts/lib/lockfile.sh` | Concurrency protection |
| `/opt/forge/scripts/pulse.sh` | Factory Pulse events |
| `/opt/forge/scripts/deploy.sh` | Atomic merge + smoke test |
| `/opt/forge/config/budget.yaml` | Budget caps |
| `/opt/forge/config/cascade.yaml` | Model cascade config |
| `/opt/forge/config/gating.yaml` | Gatekeeper config |
| `/opt/forge/config/services.json` | Service registry |
| `/opt/forge/services/ralph-poller/` | systemd unit files |
| `/opt/forge/Vault/projects/Ralph2/` | Original PRD + work log |