Claude Code /goal: How to Run AI Agents Autonomously for Days
Anthropic just shipped /goal — a built-in slash command that keeps Claude Code working until a definition of done is met. You type the condition, Claude runs a turn, and a small evaluator model checks whether you're there yet. If not, it starts another turn. Public reports already include 14-hour overnight sessions, 45-hour sessions on Codex, and one developer claiming a five-day continuous run.
This is the same Ralph-loop pattern people have been hand-rolling for months — now baked into the harness. OpenAI's Codex CLI shipped it first as an experimental config option about two weeks ago; Anthropic followed with a more polished implementation. This guide covers what /goal actually does, how to write conditions that don't burn tokens, the head-to-head against Codex on a real build (Claude Code finished a working Catan clone in 13 minutes), and how to use it to drain unused weekly tokens before they reset.
What /goal Actually Does
Most of Claude Code's history has been steering. You write a prompt, Claude does a thing, you steer the next prompt, Claude does the next thing. /goal inverts that. You write one thing — a completion condition — and Claude keeps running turns until that condition is satisfied or you cancel it.

The shape is familiar to anyone who's hand-rolled an agent loop. Set an objective. Run the model. Evaluate against the objective. Loop if not done. The novelty here isn't the pattern — it's that it's built into the official harness, with the evaluator wired up, the token accounting integrated, and a real "stop" mechanism so a runaway agent can't burn your weekly limit in an afternoon.
If you've heard the term "Ralph loop" from earlier this year — that's the pattern. /goal is the official, harness-native version.
How the Evaluator Loop Works
The mechanism is simple and worth understanding because it shapes everything you do with the feature. Every time Claude finishes a turn, the harness invokes a separate, smaller model with three inputs:
- Your condition — the literal text of your goal statement.
- The recent conversation — what Claude just did, what tools it called, what the results were.
- An instruction to return yes/no plus a short reason. No essays. Just a binary verdict.
If the evaluator says no, the reason becomes context for Claude's next turn. If yes, the goal clears and Claude reports done. The default evaluator is Claude Haiku — the smallest, fastest model in the lineup. Evaluator tokens are billed at Haiku rates and rarely show up as a meaningful share of total spend.
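The whole contract fits in a few lines. Here's a minimal sketch of the control flow in Python; it's an illustration of the pattern, not Anthropic's implementation, and run_turn / evaluate stand in for the main model and Haiku:

```python
from typing import Callable, Tuple

# Illustrative /goal loop: run the big model, ask a small evaluator for a
# yes/no verdict, and feed the "no" reason back into the next turn.
def run_goal(
    condition: str,
    run_turn: Callable[[str, str], str],               # (condition, feedback) -> transcript
    evaluate: Callable[[str, str], Tuple[bool, str]],  # (condition, transcript) -> (done, reason)
    max_turns: int = 30,
) -> bool:
    feedback = ""
    for _ in range(max_turns):
        transcript = run_turn(condition, feedback)      # main model does a full turn
        done, reason = evaluate(condition, transcript)  # small model's binary verdict
        if done:
            return True        # goal clears; Claude reports done
        feedback = reason      # the "no" reason becomes next-turn context
    return False               # hard stop: turn budget exhausted
```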

The architectural implication: your condition is read by a different, smaller model than the one doing the work. If you write a condition that needs deep reasoning to evaluate, Haiku may get it wrong — either declaring done too early (you stop with broken code) or never declaring done (you burn turns forever). Anchor on things Haiku can verify with confidence.
How to Use /goal — A Five-Minute Walkthrough
Open a real terminal — not the VS Code extension. Type claude to launch the CLI, then /goal followed by your condition. A minimal example:
```
/goal Build a Chrome extension that adds Slack-style emoji shortcuts.
Definition of done:
- the extension loads with no manifest errors
- typing :smile: in any input box pastes the 😀 emoji
- works in incognito mode
Stop after 30 turns regardless.
```
Claude reads the condition, plans the work, executes a turn (write the manifest, add the content script, wire up the input listener), then the evaluator checks whether the condition holds. If not, the reason ("manifest is valid but content script doesn't fire on inputs of type=email") feeds into the next turn. The session keeps looping until Haiku says yes or you hit the 30-turn cap.
For longer or more structured runs, write a plan file first and point /goal at it. Ask Claude to draft the plan with "plan for goal" and you get a markdown file with sections like:
- Brief — one-paragraph statement of the outcome
- Tech stack — languages, frameworks, runtimes
- In scope / Out of scope — surface the boundaries up-front
- Constraints — files Claude must not touch, conventions to honour
- Definition of done — the literal text Haiku evaluates against
- Acceptance criteria — discrete checks (test exits 0, file count = N, etc.)
- Verification recipe — exact commands to run to prove the criteria
- Turn budget — explicit cap so a runaway loop can't drain weekly tokens
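Here's what a minimal plan file following that shape could look like (the project, commands, and file names are invented for illustration):

```markdown
# PLAN.md (illustrative sketch; project details are hypothetical)

## Brief
Ship a CLI that converts CSV exports to JSON.

## Tech stack
Node 20, TypeScript, no runtime dependencies.

## In scope / Out of scope
In: a single csv2json command. Out: streaming mode, files over 100 MB.

## Constraints
Do not touch anything under src/legacy/.

## Definition of done
`npm test` exits 0 and `npm run build` produces dist/cli.js.

## Acceptance criteria
- fixtures/basic.csv converts to the expected JSON (snapshot test)
- a malformed row produces a non-zero exit code

## Verification recipe
npm test && npm run build && node dist/cli.js fixtures/basic.csv

## Turn budget
Stop after 40 turns regardless.
```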
Then point Claude at the file: /goal Execute the plan in PLAN.md. The model uses the file as ground truth and the evaluator checks each acceptance criterion separately. This is dramatically more reliable than a one-liner condition.
Writing a Condition That Doesn't Waste Turns
Because Haiku is the evaluator, the cardinal rule is: anchor on things Haiku can verify with a quick read of the conversation. Good and bad examples:
| Don't say | Do say |
|---|---|
| "Make the code production-ready" | "npm test exits 0 and there are no TODO or FIXME tokens in the diff" |
| "Improve performance" | "benchmark/run.sh reports p95 latency < 200ms" |
| "Refactor cleanly" | "All 12 files in src/auth/ use the new AuthClient import and the old legacyAuth symbol has zero usages" |
| "Build a working game" | "index.html loads without console errors and clicking 'Roll' updates the score display" |
The pattern: verifiable end state, observable signal, hard stop. A condition Haiku can answer with a yes/no after one glance is a condition that won't waste your weekly tokens.
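Put together, the refactor row above expands into a complete condition like this (the paths and file count are hypothetical):

```
/goal Migrate src/auth/ from legacyAuth to the new AuthClient.
Definition of done:
- all 12 files in src/auth/ import AuthClient
- grep -r "legacyAuth" src/ returns no matches
- npm test exits 0
Stop after 40 turns regardless.
```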
Head-to-Head: Claude Code vs Codex on /goal
The Codex implementation shipped first. Two weeks later Anthropic followed. That's a real win for users — same primitive on both harnesses. To test, the same prompt was given to both with their most capable models (Claude Code on Opus 4.7, Codex on GPT 5.5):
Build a single-file Settlers of Catan clone, Game of Thrones themed, 32-bit pixel art, where the four player houses are reskinned as the AI labs — Anthropic, Google, OpenAI, X. Single player versus three AI bots.

| Dimension | Claude Code (Opus 4.7) | Codex (GPT 5.5) |
|---|---|---|
| Wall-clock to "goal achieved" | 13 min 05 sec | 33 min |
| Game logic functionality | Dice rolls, settlement placement, AI turns — works | Works but UX hides affordances (no dice indicator, no disabled-state on unaffordable actions) |
| Visuals | Placeholder shapes — no native image generation | Real sprites — GPT image 2 generated assets for the house menu |
| Asset integration | N/A | Generated assets but didn't reuse them as map tiles — orphaned |
The takeaway: Codex wins on visual polish because it has GPT image 2 in the bag. Claude Code wins on functional logic in less than half the time. If your project needs real sprites or generated UI imagery, plan to call out to OpenAI's image model or Google's nano banana from inside Claude Code — Anthropic doesn't ship an image generator, and that's a real gap for game and design work.
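If you go that route, one pattern is to give Claude a small helper script it can shell out to during a run. Below is a minimal sketch using OpenAI's Python SDK; the script name and the exact model string are assumptions, so check what your account exposes:

```python
# sprite_gen.py: hypothetical helper Claude Code can invoke via Bash,
# since Claude has no native image generation. Requires the `openai`
# package and an OPENAI_API_KEY in the environment.
import base64
import sys

from openai import OpenAI


def generate_sprite(prompt: str, out_path: str) -> None:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.images.generate(
        model="gpt-image-1",  # assumption: swap in whatever your account exposes
        prompt=prompt,
        size="1024x1024",
    )
    # gpt-image-1 returns base64-encoded image data
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(resp.data[0].b64_json))


if __name__ == "__main__":
    generate_sprite(sys.argv[1], sys.argv[2])
```

Claude can then run something like python sprite_gen.py "Anthropic house banner, pixel art" assets/anthropic.png mid-goal and wire the output into the build.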
The Token-Drain Strategy: Burn Tokens Before They Reset
Claude's weekly rate limit resets on a fixed schedule. If you finish a week with 92% of your tokens unspent, that 92% just vanishes — there's no rollover. The Max plans in particular ship more tokens than most developers can consume in a normal coding week.
/goal turns that unused budget into output. Frontload token-heavy work into a single autonomous run on Sunday night before the Monday reset. Things that work well:
- Batch newsletter / blog drafts — point a skill at recent assets and ask /goal to produce N drafts. One run turned four YouTube videos into four weekly newsletter drafts: 8 minutes, 4 HTML files ready for review.
- Regression test backfill — "add test coverage for every uncovered function in src/billing/ until the coverage report reads ≥ 80%."
- Documentation sweeps — "every public API in api/ has a JSDoc block with @param and @returns."
- Refactor migrations — "every file importing legacyClient now imports v2Client and the type signatures still compile."
Add a token-budget clause to the condition: "Stop when 95% of weekly tokens have been spent OR when the task list is empty." Haiku reads the most recent /usage output and decides — a clean way to use the headroom without overshooting.
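A complete drain-run condition with the budget clause wired in might read as follows (BACKLOG.md stands for whatever task list you keep):

```
/goal Work through the unchecked items in BACKLOG.md, top to bottom.
Definition of done:
- every item in BACKLOG.md is checked off, OR
- the most recent /usage output shows ≥ 95% of weekly tokens spent
Stop after 200 turns regardless.
```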

Limitations and What to Watch For
- Terminal only. The VS Code extension doesn't expose /goal yet. Use a real shell, or your IDE's built-in terminal launching claude.
- Haiku is a small model. Vague conditions get vague evaluations. If Haiku declares done too early, the goal exits with broken code. If Haiku never declares done, you burn turns until the cap. Anchor on objective signals (exit codes, file contents, test results), not subjective ones.
- No native image generation. Real design/game projects need an external image API. Budget for it separately.
- Reports of 5-day runs are unverified. 14-hour overnight sessions are common; multi-day runs are rare and the failure modes (looping evaluator, stalled tool calls, runaway token spend) are real. Add hard caps even when you want a long run.
- Auto mode is complementary, not redundant. Auto mode auto-approves tool calls within a turn; /goal decides whether to start another turn. Use both for truly hands-off runs — but only after you trust the condition.
Long /goal sessions burn through weekly tokens fast. The Claude Max plan ships the highest weekly limits — meaningful headroom if you're running multi-hour autonomous sessions on top of normal coding work. The 20× tier ($200/mo) is the one most "I want to run /goal overnight" developers settle on.
If you're using /goal to batch-write articles, newsletters, or marketing copy, run the bulk output through Undetectr ($39 lifetime) before publishing. The text comes out indistinguishable from human writing — no AI-detection footprint, which matters when you're shipping at /goal volume.
Frequently Asked Questions
**How do I stop a running /goal session?**
Run /goal clear.

**How is /goal different from /loop?**
/loop re-fires the same prompt on a time interval; /goal keeps running until a verifiable condition holds. /goal evaluates after every turn with a separate model, while /loop just re-runs the same prompt on a schedule.

**Did Claude Code or Codex ship this first?**
Codex shipped /goal as an experimental config option roughly two weeks before Anthropic added it to Claude Code. Both implement the same Ralph-loop pattern — run a coding agent until a verifiable end state holds.

**Where does /goal run?**
/goal currently only runs from a terminal session — either your IDE's built-in terminal launching the claude CLI, or a standalone shell. It also runs non-interactively via claude -p with the goal condition embedded.

**What makes a good goal condition?**
Anchor on objective signals ("npm test exits 0"; "the new page renders without console errors") and list constraints that must not change. Add a turn-budget clause ("or stop after 60 turns") to bound runtime.

**Do I still need auto mode?**
Yes. Auto mode auto-approves tool calls within a turn; /goal decides whether to start another turn. Pair them and Claude can run a long session without any human prompts between tool calls or turns.