Agent Workbench

Verified 2026-04-11

Claude Code · Codex · Cursor · Gemini CLI · Copilot CLI

A field manual for agentic CLIs.

One page your agent can read. Claude Code first, Codex second, other CLIs mapped. Every feature claim is cited. No memory-only writing, no hallucinated release notes.

Point your agent at this page. Add to your project instructions (CLAUDE.md, AGENTS.md, GEMINI.md, copilot-instructions.md): Read https://prody-dris.github.io/claude-code-cheatsheet/ and apply these practices.

What's New — Claude Code, source-verified

Only entries with a citation. Pulled from the official Claude Code CHANGELOG and commands reference.

NEW

Opus 4.7 lands with new xhigh effort tier — Claude Opus 4.7 is live in Claude Code. The new xhigh effort level sits between high and max, available for Opus 4.7 via /effort, --effort, or the model picker. Other models fall back to high. v2.1.111

NEW

/ultrareview — comprehensive code review in the cloud using parallel multi-agent analysis and critique. Invoke with no arguments to review your current branch, or /ultrareview <PR#> to fetch and review a GitHub PR. v2.1.111 · commands

NEW

Auto mode reaches Max on Opus 4.7 — auto mode is now available to Max subscribers when using Opus 4.7, and no longer requires the --enable-auto-mode flag. v2.1.111 · hotfix in v2.1.112

NEW

/less-permission-prompts skill — scans your transcripts for common read-only Bash and MCP tool calls and proposes a prioritized allowlist for .claude/settings.json. Cuts permission-prompt noise without blanket-allowing tools. v2.1.111

NEW

/effort interactive slider — calling /effort with no arguments now opens a slider. Arrow keys navigate between levels, Enter confirms. v2.1.111

/team-onboarding — generates a teammate ramp-up guide from your local Claude Code usage. Ship this to anyone joining the codebase. v2.1.101

Enterprise TLS trust by default — Claude Code now trusts the OS certificate store automatically, so corporate proxies work without extra setup. Set CLAUDE_CODE_CERT_STORE=bundled to opt out. v2.1.101

/ultraplan auto-provisions a cloud environment — no web setup required before first use. Draft a plan in an Ultraplan session, review in your browser, execute remotely or push back to your terminal. v2.1.101 · commands

/powerup — interactive lessons teaching Claude Code features with animated demos. Best onboarding for existing users who haven't explored new capabilities. v2.1.90

Conditional hooks — hooks can now use an if field with permission-rule syntax like Bash(git *), so the hook only fires for matching commands instead of every tool call. Dramatically reduces process spawning overhead. v2.1.85 · hooks ref

CwdChanged & FileChanged hooks — reactive environment management. Plug in direnv or custom env loaders that react to directory changes mid-session. v2.1.83

Transcript search — press Ctrl+O to enter transcript mode, then / to search and n/N to step through matches. v2.1.83

worktree.sparsePaths — setting for claude --worktree on large monorepos. Check out only the directories you need via git sparse-checkout. v2.1.76

MCP Elicitation — MCP servers can now request structured input mid-task via interactive dialogs (form fields or browser URL). New Elicitation and ElicitationResult hooks let you intercept and override responses. v2.1.76

/effort low|medium|high|xhigh|max|auto — set the model effort level. xhigh is Opus-4.7 only and sits between high and max. max is Opus-4.6 only. Both apply to the current session. auto resets to the model default. With no arguments, /effort opens an interactive slider. v2.1.76 · default raised to high in v2.1.94 · xhigh + slider in v2.1.111

1M context for Opus 4.6 by default on Max, Team, and Enterprise plans (previously required extra usage). Max output tokens: 64K default, 128K ceiling. v2.1.75 & v2.1.77

--channels (research preview) — MCP servers can push messages into your session from Telegram, Discord, iMessage, or your own webhooks. The permission relay lets channel servers forward tool approvals to your phone. Admins gate with allowedChannelPlugins. v2.1.80 · v2.1.81 · channels

/remote-control — bridge a VS Code or terminal session to claude.ai/code so you can continue from a browser or phone. Session conversation preserved. v2.1.79 · remote-control

Plugin system in public beta — custom slash commands, agents, MCP servers, and hooks bundled as one install. Host a marketplace from any git repo with a .claude-plugin/marketplace.json. Install with /plugin marketplace add owner/repo, then browse via /plugin. news · plugins

Skills merged with custom commands — a file at .claude/commands/deploy.md and a skill at .claude/skills/deploy/SKILL.md both create /deploy. Skills add: supporting files, frontmatter control, auto-invocation. Follows the Agent Skills open standard. skills

PowerShell tool (opt-in preview) for Windows users. v2.1.84

Source protocol: entries get a version tag (v2.1.X) when from the CHANGELOG, and a direct link when from a docs page. Anything without one does not ship. Past hallucinations of "Computer Use" and a fabricated "Channels" description are the reason this page runs on a verify-or-remove rule.

Session Commands — canonical slash commands

Full reference at code.claude.com/docs/en/commands. Type / in a session to see every command available on your plan.

Context & flow

/clear: Clear conversation, free context. Aliases: /reset, /new.

/compact: Compact conversation with optional focus instructions.

/btw: Side question without polluting context.

/rewind: Rewind conversation and/or code to a previous point, or summarize from a selected message. Alias: /checkpoint.

/branch [name]: Branch the conversation at this point. Alias: /fork.

/resume [session]: Resume by ID or name. Alias: /continue.

/rename [name]: Rename the session and show the name on the prompt bar.

/context: Visualize current context usage as a colored grid with optimization suggestions.

Model & effort

/model: Select or change the AI model. Left/right arrows adjust effort level for supported models.

/effort: low · medium · high · xhigh (Opus 4.7 only, session-scoped) · max (Opus 4.6 only, session-scoped) · auto (model default). No args → interactive slider.

/fast [on|off]: Toggle fast mode. Same model, faster output.

Plans & agents

/plan: Enter plan mode. Optionally pass a description: /plan fix the auth bug.

/ultraplan: Draft a plan in an Ultraplan session, review in browser, execute remotely.

/agents: Manage agent configurations. Running tab shows live subagents.

/skills: List available skills, alphabetically sorted.

/plugin: Manage plugins. /plugin marketplace add owner/repo then browse.

Scheduling & loops

/schedule: Create cloud scheduled tasks. Claude walks you through setup conversationally. Runs even when your computer is off.

/loop [interval] [prompt]: Bundled skill. Run a prompt repeatedly while the session is open. Omit interval → self-paced. Omit prompt → runs .claude/loop.md.

Review, security, debug

/ultrareview: Cloud-hosted parallel multi-agent review. No args → reviews current branch. /ultrareview <PR#> → fetches and reviews a GitHub PR. Flags design & logic gaps, not just syntax. v2.1.111

/security-review: Analyze the pending git diff for injection, auth issues, and data exposure.

/autofix-pr: Spawn a web session that watches the PR and pushes fixes when CI fails or reviewers leave comments.

/less-permission-prompts: Bundled skill. Scans transcripts for common read-only Bash/MCP calls and proposes a prioritized allowlist for .claude/settings.json. v2.1.111

/batch: Bundled skill. Decomposes work into 5–30 units, spawns background agents in isolated worktrees, each opens a PR.

/simplify: Bundled skill. Spawns three review agents in parallel and fixes code-reuse, quality, and efficiency issues.

/debug: Bundled skill. Enables debug logging mid-session and analyzes the log.

/claude-api: Bundled skill. Loads Claude API & Managed Agents reference. Auto-loads when your code imports anthropic.

Onboarding & learning

/team-onboarding: New. Generate a teammate ramp-up guide from your local Claude Code usage.

/powerup: Interactive lessons with animated demos. Best way to discover features you haven't touched.

/init: Initialize project with a CLAUDE.md guide. Set CLAUDE_CODE_NEW_INIT=1 for an interactive flow.

/insights: Report on your sessions: project areas, interaction patterns, friction points.

Mobility

/remote-control: Make this session available for control from claude.ai. Alias: /rc.

/teleport: Pull a Claude Code web session into this terminal. Alias: /tp.

/desktop: Continue the current session in the Desktop app (macOS/Windows). Alias: /app.

/voice: Toggle push-to-talk voice dictation. Requires a Claude.ai account.

Health & diagnostics

/doctor: Diagnose and verify your installation and settings.

/status: Settings Status tab: version, model, account, connectivity. Works mid-response.

/cost: Token usage stats. Per-model and cache-hit breakdown for subscription users.

/stats: Daily usage, streaks, model preferences.

/usage: Plan limits and rate-limit status.

/release-notes: Interactive version picker for the changelog.

Removed & deprecated: /vim (removed v2.1.92 — use /config → Editor mode), /pr-comments (removed v2.1.91 — ask Claude directly), /review (deprecated — install the code-review plugin), /tag (removed v2.1.92).

Workflow — how to run a session

State the work type. "Quick question" / "feature spec" / "code session" / "QA run". Sets expectations both ways and lets the agent pick the right depth.

Interview before building. Have the agent ask you the hard questions first. Requirements, edge cases, data model, security posture. Then spec in a fresh session.

Explore → Plan → Implement → Verify → Commit. Plan mode (/plan) for unknowns. Use /ultraplan for large refactors that need review before execution.

Writer/Reviewer split. One session writes. A fresh one reviews. No self-bias. Install the code-review plugin to get this automatically on every PR.

Give verification. Tests, screenshots, expected output, /security-review. LSP plugins give automatic diagnostics after every edit — the highest-ROI plugin you can install.

Two corrections max. Third fail → /clear and rewrite the prompt. Don't Goodhart: optimize for clarity, not length.

CLI vs UI Handoff — ideate in chat, build in the terminal

Start in the UI for ideation and exploration. Move to CLI for execution.

UI (chat) is for

Brainstorming a shape before you commit to structure
Rapid back-and-forth on an unclear requirement
Reading reference material or research synthesis
Lightweight one-shot questions with zero file context

CLI is for

Everything that touches files, commits, tests, or services
Anything that should read your CLAUDE.md rules
Sessions that need hooks, skills, or plugins active
Multi-file edits, diagnostics loops, reviewable diffs

Handoff rule: when the UI conversation starts producing code you'd have to copy-paste, stop. Move. The CLI reads your project context (CLAUDE.md), follows your hooks, and can verify its own work. The UI can't.

Project Folder Convention — structure beats search

A known folder layout means your agent goes directly to a file in one tool call instead of three. Structure is a performance lever, not bureaucracy.

Standard root files

CLAUDE.md — project identity & rules (read at session start)
AGENTS.md — Codex equivalent (same conventions)
GEMINI.md — Gemini CLI equivalent
ROUTE.md — folder map & quick paths
README.md — human onboarding, universal

Standard folders

context/ — source material, MOMs, emails, client docs
docs/ — deliverables, specs, architecture
data/ — raw data, exports, CSVs
media/ — diagrams, decks, images, videos

Two domains, one rule. ~/Documents/Projects/ is yours — structured docs, specs, reports. ~/Workspaces/ is the agent's — dev worktrees, checkouts, tooling. Neither pollutes the other.
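
The layout above is small enough to scaffold with a script. The following is a minimal sketch, assuming you want empty placeholder files and that every project uses all five root files; the function name `scaffold` and the demo directory are illustrative, not part of any CLI.

```python
from pathlib import Path
import tempfile

# File and folder names are taken from the convention described above.
ROOT_FILES = ["CLAUDE.md", "AGENTS.md", "GEMINI.md", "ROUTE.md", "README.md"]
FOLDERS = ["context", "docs", "data", "media"]


def scaffold(root: Path) -> list[str]:
    """Create any missing standard files and folders; return what was created."""
    created = []
    root.mkdir(parents=True, exist_ok=True)
    for folder in FOLDERS:
        path = root / folder
        if not path.exists():
            path.mkdir()
            created.append(folder + "/")
    for name in ROOT_FILES:
        path = root / name
        if not path.exists():
            path.touch()  # empty placeholder; existing files are never overwritten
            created.append(name)
    return created


# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    print(scaffold(Path(tmp) / "demo-project"))
```

Running it twice on the same directory creates nothing the second time, so it is safe to use as a check that an existing project still matches the convention.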

CLAUDE.md Discipline — your identity, for your agent

This file becomes your agent's onboarding doc. Every lie you tell it, it will believe.

Be honest about your technical level. If you write "I'm not technically sound," the agent will take over decisions it shouldn't: architecture, schema design, security. If you say "I make logical and system-design decisions; you write the code," it will ask real technical questions instead.

Less is more. Would removing this line cause mistakes? No? Cut it. A bloated CLAUDE.md (>5 KB) starts losing rules to context pressure.

Identity, not instructions. Put "who I am, tech stack, communication style, safety rules" in CLAUDE.md. Put task procedures in skills.

Push it to git. Shared project context compounds over time. Every teammate and every session benefits.

Update after every mistake. When the agent got something wrong, the fix is a line in CLAUDE.md or a skill, not a repeat correction next session.

Instruction filenames across CLIs: Claude Code → CLAUDE.md. Codex → AGENTS.md. Gemini CLI → GEMINI.md. GitHub Copilot CLI → copilot-instructions.md. Same concept, different filename. Keep the content convention-compatible so you can share one source of truth.
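
The 5 KB guideline is easy to check mechanically. This is a small sketch, assuming "5 KB" means 5 × 1024 bytes of UTF-8 text; the threshold constant and the function name are illustrative choices, not anything the CLIs enforce.

```python
LIMIT_BYTES = 5 * 1024  # assumption: "5 KB" read as 5 KiB


def instructions_report(text: str, limit: int = LIMIT_BYTES) -> dict:
    """Summarize the size of an instructions file against the guideline."""
    size = len(text.encode("utf-8"))
    return {
        "bytes": size,
        "limit": limit,
        "over": size > limit,
        "lines": len(text.splitlines()),
    }


# Demo with inline text; in practice, pass Path("CLAUDE.md").read_text().
sample = "# Identity\nI make system-design decisions; you write the code.\n"
print(instructions_report(sample))
```

The same function works unchanged for AGENTS.md, GEMINI.md, or copilot-instructions.md, since it only looks at the text.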

Hook Triad — deterministic guardrails

Rules written in CLAUDE.md are followed roughly 80% of the time. Hooks run every time. If something must happen, it belongs in a hook, not in prose. Reference: hooks guide.

Credential guard

Blocks secrets from being committed or read into context. Runs as a PreToolUse hook that inspects Bash commands and Write/Edit contents.
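
One way to picture the inspection step is a function that gathers the text a tool call would run or write and matches it against secret patterns. The sketch below assumes the hook event carries `tool_input` with `command` (Bash), `content` (Write), or `new_string` (Edit); the three patterns are illustrative only and far from a complete scanner.

```python
import re

# Illustrative patterns only; a real guard would lean on a maintained secret scanner.
SECRET_PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "inline-credential": re.compile(
        r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*['\"][^'\"]{12,}['\"]"
    ),
}


def text_to_scan(event: dict) -> str:
    """Collect the text a tool call would execute or write."""
    tool_input = event.get("tool_input", {})
    fields = ("command", "content", "new_string")  # assumed keys for Bash, Write, Edit
    return "\n".join(str(tool_input[f]) for f in fields if f in tool_input)


def find_secrets(event: dict) -> list[str]:
    """Return the names of every pattern that matches the tool call."""
    text = text_to_scan(event)
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


# Demo with a made-up Write event.
event = {
    "tool_name": "Write",
    "tool_input": {"file_path": "config.py", "content": 'api_key = "abcdefghijklmnop1234"'},
}
print(find_secrets(event))
```

A non-empty result is the signal to block the call; an empty list means the guard stays out of the way.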

Safety gate

Prevents destructive commands (rm -rf, git reset --hard, force-push to main). Returns permissionDecision: "deny" or "ask".
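
A safety gate can be sketched as a classifier over the Bash command plus a small wrapper that emits the decision. The rules below are a hypothetical, deliberately coarse set covering the three examples named above; the output shape (`hookSpecificOutput` with `permissionDecision`) follows the hooks reference as I understand it and should be confirmed there.

```python
import json
import shlex
from typing import Optional


def classify(command: str) -> Optional[tuple]:
    """Return (decision, reason) for a risky command, or None to stay out of the way."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return ("ask", "could not parse command")
    flags = {t for t in tokens if t.startswith("-")}
    # Letters from short flags such as -rf, collected across the whole command.
    short = "".join(t[1:] for t in flags if not t.startswith("--"))
    if "rm" in tokens and "r" in short.lower() and "f" in short:
        return ("deny", "recursive force delete")
    if tokens[:2] == ["git", "reset"] and "--hard" in flags:
        return ("ask", "git reset --hard discards uncommitted work")
    if tokens[:2] == ["git", "push"] and ("--force" in flags or "f" in short) and "main" in tokens:
        return ("deny", "force-push to main")
    return None


def decide(event: dict) -> Optional[dict]:
    """Build the PreToolUse response for a Bash event, or None for no opinion."""
    if event.get("tool_name") != "Bash":
        return None
    verdict = classify(event.get("tool_input", {}).get("command", ""))
    if verdict is None:
        return None
    decision, reason = verdict
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": decision,
            "permissionDecisionReason": reason,
        }
    }


print(json.dumps(decide({"tool_name": "Bash", "tool_input": {"command": "git reset --hard HEAD~1"}})))
```

In a real hook script, the event would come from `json.load(sys.stdin)` and the result would be printed to stdout. The classifier errs toward false positives (for example, it does not distinguish which command in a pipeline owns a flag), which is usually the right direction for a gate.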

Pre-commit checks

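Run linters, formatters, and type-checkers on PostToolUse. The edit has already been applied by the time the hook runs, so a failing check cannot block it; it reports the failure back so the agent sees a real error and fixes it.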
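
As a rough illustration of a post-edit check, the sketch below parses a just-edited Python file and returns an exit code plus a message. `ast.parse` stands in for a real linter or type-checker, and the use of exit code 2 to feed stderr back to the agent is my reading of the hooks reference, so treat both as assumptions.

```python
import ast

FEEDBACK_EXIT_CODE = 2  # assumed: exit code 2 surfaces stderr to the agent


def check_python_source(path: str, source: str) -> tuple:
    """Return (exit_code, message) for a just-edited file."""
    if not path.endswith(".py"):
        return 0, ""  # not our file type; stay silent
    try:
        ast.parse(source, filename=path)
    except SyntaxError as err:
        return FEEDBACK_EXIT_CODE, f"{path}:{err.lineno}: syntax error: {err.msg}"
    return 0, ""


# Demo with an intentionally broken snippet.
code, message = check_python_source("app.py", "def handler(:\n    pass\n")
print(code, message)
```

A hook script built on this would read `tool_input.file_path` from the event, load the file, write the message to stderr, and exit with the returned code.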

What's new in hooks

Conditional hooks — if field using permission-rule syntax like Bash(git *). Hook only runs for matching commands, massively reduces spawning overhead. v2.1.85

Hooks in skill/subagent frontmatter — scoped to the component's lifecycle, only run when the component is active.

New events: CwdChanged, FileChanged, PostCompact, PermissionDenied, StopFailure, TaskCreated, WorktreeCreate, Elicitation, ElicitationResult.

Async hooks, HTTP hooks, prompt hooks, MCP tool hooks — more ways to react without shelling out.
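
To make the conditional-hook entry concrete, here is a sketch that builds a settings fragment and prints it as JSON. The `hooks` / `PreToolUse` / `matcher` nesting follows the documented settings layout as I understand it; placing the `if` key on the individual hook entry is an assumption, and the script path is hypothetical. Check the hooks reference before copying it.

```python
import json


def conditional_hook(rule: str, command: str) -> dict:
    """Build a settings fragment for a PreToolUse hook gated by a permission rule."""
    return {
        "hooks": {
            "PreToolUse": [
                {
                    "matcher": "Bash",
                    "hooks": [
                        {
                            "type": "command",
                            "if": rule,  # assumed placement of the `if` field
                            "command": command,
                        }
                    ],
                }
            ]
        }
    }


# Hypothetical script path; only git commands would spawn it.
settings = conditional_hook("Bash(git *)", ".claude/hooks/git-guard.sh")
print(json.dumps(settings, indent=2))
```

The point of the `if` rule is visible in the structure: the matcher narrows by tool, and the rule narrows further by command, so the script is not spawned for unrelated Bash calls.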

Codex equivalent: Codex doesn't ship hooks the same way. Use its sandbox + approvals + launch wrappers + standing instructions to reach the same outcome: destructive actions are exceptional, not routine.

Daily Rituals — structure that compounds

Morning alignment. Pull yesterday's wins, open reminders, tickets, PRs. Energy check (sharp / steady / recovering). Pick 1–3 leverage items for the day with a protected thinking block.

Evening review. Lead with wins, not scores. What shipped, what's blocked, which products got attention. Process carry-overs. Prep tomorrow's draft.

Friday pulse. 5-line week retro: product attention days, top 3 shipped, bounced-all-week items, Monday leverage candidate.

The leverage question: what's the one thing today that, if done, makes everything else easier or irrelevant? That gets the deep block. Everything else fits around it.

Scheduling — delegate the recurring to the machine

/schedule — Cloud scheduled tasks. Runs on Anthropic-managed infrastructure, so the task runs even when your computer is off. Conversational setup. cloud schedule

/loop [interval] [prompt] — bundled skill. Session-scoped recurring prompt. Dies when the session exits. Put a default prompt in .claude/loop.md. scheduled tasks

Desktop scheduled tasks — run on your machine, with direct access to your local files and tools. Separate from cloud tasks. desktop schedule

Tokens — every wasted token is a thought that never happened

Max 2 parallel agents. Each loads ~7K tokens of overhead. Exceed the limit only when you have budget to spare.

CLI > MCP. gh, curl, jq — no schema bloat. MCP is for tools you'll reuse repeatedly.

ToolSearch saves tokens. 50+ tool servers load on demand, not upfront.

Subagents for research. Separate context, summary back. Your main thread stays clean.

Scope narrowly. "Check src/auth/" beats "investigate auth." Unscoped exploration fills context.

Skills > MCP for procedures. Transparent, reviewable, no black boxes. Skills support files loaded on demand.

Restart early. Context-heavy threads produce worse output over time. A fresh session with updated CLAUDE.md beats a long one with accumulated debt. Akshat's finding from live Claude Code usage: the bulk of tokens in long sessions go to re-reading history rather than doing useful work. Start fresh every 15–20 turns when the work allows.
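
The "restart early" advice has simple arithmetic behind it. The sketch below uses a deliberately crude model: every turn adds the same number of tokens, the full history is re-sent on each turn, and prompt caching is ignored. Those are assumptions for illustration, not measurements of Claude Code.

```python
def reread_share(turns: int, tokens_per_turn: int = 1000) -> float:
    """Fraction of all processed input tokens that are history re-reads."""
    # Turn i processes i * tokens_per_turn tokens: its own plus all earlier turns.
    total = tokens_per_turn * turns * (turns + 1) // 2
    fresh = tokens_per_turn * turns
    return (total - fresh) / total


for n in (5, 20, 60):
    print(n, round(reread_share(n), 2))
```

Under this model the share works out to (n − 1) / (n + 1): about two thirds after 5 turns and about 90% after 20, which is consistent with the observation that long sessions spend most of their tokens on history.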

Setup — your config is your prior

CLAUDE.md: less is more. Every line earns its place.

Hooks for musts. Prose-level adherence is ~80%. Hooks are 100%.

Memory cap: 15 files. Merge before adding. MEMORY.md is an index, not storage.

Skills are folders. SKILL.md + supporting files, scripts, data. Progressive disclosure.

Gotchas are gold. Build skills from your failures. The third time you correct the same thing, it's a skill.

Push CLAUDE.md to git. Shared context compounds. Every teammate benefits.
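
To illustrate "skills are folders", the sketch below writes a minimal skill into `.claude/skills/<name>/SKILL.md`, the path this page uses for a project skill. The `name` and `description` frontmatter keys are the ones I understand the Agent Skills standard to expect; the helper name and the example content are hypothetical.

```python
from pathlib import Path
import tempfile


def write_skill(project: Path, name: str, description: str, body: str) -> Path:
    """Create .claude/skills/<name>/SKILL.md with minimal frontmatter."""
    skill_dir = project / ".claude" / "skills" / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_md = skill_dir / "SKILL.md"
    skill_md.write_text(
        f"---\nname: {name}\ndescription: {description}\n---\n\n{body}\n",
        encoding="utf-8",
    )
    return skill_md


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = write_skill(
        Path(tmp),
        "deploy",
        "Deploy the app after tests pass",
        "1. Run the test suite.\n2. Build.\n3. Ship.",
    )
    print(path.relative_to(tmp))
    print(path.read_text(encoding="utf-8"))
```

Supporting scripts and data files would sit next to SKILL.md in the same folder, which is what keeps the skill reviewable as one unit.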

Knowledge Architecture — what lives where

CLAUDE.md — identity & instructions. Who you are, tech stack, communication style. The onboarding doc your agent reads in 2 seconds. Under 5 KB.

heartbeat.md — what's in progress right now. Active deployments, current sprint, open blockers. Update every session.

MEMORY.md — index of persistent learnings. Each memory is a separate .md file. Cap at 15, merge aggressively.

decisions.md — the why behind choices. Why approach A over B. Future-you will thank you.

ROUTE.md — folder map and quick paths. Direct lookup, no exploration.
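
The memory rules above (a cap of 15 files, with MEMORY.md as an index) can be checked with a few lines. This sketch assumes all memory files live in one directory next to MEMORY.md, which is a hypothetical layout; adjust the paths to wherever your memories are stored.

```python
from pathlib import Path
import tempfile

MEMORY_CAP = 15  # from the guideline above


def memory_status(memory_dir: Path, cap: int = MEMORY_CAP) -> dict:
    """Count memory files and list any that the MEMORY.md index does not mention."""
    index = memory_dir / "MEMORY.md"
    index_text = index.read_text(encoding="utf-8") if index.exists() else ""
    files = sorted(p.name for p in memory_dir.glob("*.md") if p.name != "MEMORY.md")
    return {
        "count": len(files),
        "over_cap": len(files) > cap,
        "unindexed": [name for name in files if name not in index_text],
    }


# Demo: two memories, only one of them listed in the index.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "MEMORY.md").write_text("- deploy-gotchas.md\n", encoding="utf-8")
    (root / "deploy-gotchas.md").write_text("notes\n", encoding="utf-8")
    (root / "auth-quirks.md").write_text("notes\n", encoding="utf-8")
    print(memory_status(root))
```

An `over_cap` result is the cue to merge before adding; an `unindexed` entry means the index has drifted from the folder.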

Reference-Driven Building — feed the agent a known-good example

Blank-canvas prompts produce blank-canvas output. A reference beats description every time.

Find something similar. A screenshot, a working component, a reference repo, a public design. Feed it to the agent before asking for a new version.

Describe the delta, not the whole thing. "Like this, but with RLS at the tab level and tenant-agnostic IDs" beats "build an RLS dashboard."

Spell out business rules. AI won't infer financial years, tenant scoping, or regulatory constraints. If it matters, write it in the prompt.

Watch for hardcoded IDs. Agents love to copy literal IDs from examples. Always templatize tenant and environment identifiers before shipping.

AI won't infer your constraints. Financial-year logic, role-level security, tenant isolation, entity scoping — none of these are guessable. Spell them out every time. Trusting the agent to "figure it out" is the single most expensive bug pattern.
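
The hardcoded-ID warning lends itself to a quick scan before shipping. The sketch below assumes tenant and environment identifiers are UUID-shaped, which may not hold in your system; swap the pattern for whatever your IDs look like.

```python
import re

# Assumption: IDs are canonical 8-4-4-4-12 hex UUIDs.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)


def find_literal_ids(source: str) -> list:
    """Return (line_number, literal) for every UUID-shaped string in the source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in UUID_RE.finditer(line):
            hits.append((lineno, match.group()))
    return hits


# Demo with a made-up identifier copied from an example.
generated = 'TENANT = "3f2504e0-4f89-11d3-9a0c-0305e82c3301"\nREGION = "eu"\n'
print(find_literal_ids(generated))
```

Every hit is a candidate for a config value or template variable. The scan cannot tell a legitimate constant from a copied example, so it narrows the review without replacing it.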

Context Bloat — long threads rot

Long threads = worse output. Token cost rises. Hallucinations creep in. Auto-compact carries the most recent skill invocations forward within a 25K-token budget (5K per skill), so older invocations get dropped. Plan accordingly.

Close, don't continue. When a conversation gets long, close it. Start fresh. Update CLAUDE.md with what you learned.

/rewind — rewind to a previous point or summarize from here. Trim the fat without starting over.

Invest in docs, not prompts. Two days of clear project docs beats two weeks of prompt-tweaking. You will re-enter the same problem.
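
The auto-compact budget mentioned above can be pictured with a toy model. This sketch assumes the newest invocations are kept first, each one costs at most 5K tokens, and the walk stops once 25K is used; the real selection logic is not described on this page, so this is an illustration of the arithmetic only.

```python
TOTAL_BUDGET = 25_000   # from the text above
PER_SKILL_CAP = 5_000   # from the text above


def carried_forward(invocations: list) -> list:
    """Given (skill_name, tokens) pairs oldest first, return the names kept, newest first."""
    kept, used = [], 0
    for name, tokens in reversed(invocations):
        cost = min(tokens, PER_SKILL_CAP)  # assumed: larger skills are capped, not skipped
        if used + cost > TOTAL_BUDGET:
            break
        kept.append(name)
        used += cost
    return kept


# Seven large skill invocations in one session, oldest first.
history = [(f"skill-{i}", 6_000) for i in range(1, 8)]
print(carried_forward(history))
```

With skills at or above the per-skill cap, only the five most recent survive, so anything invoked earlier in a long session may need to be invoked again after a compact.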

Avoid — coordination failures start small

Kitchen-sink sessions. One task drifts into three. /clear between unrelated work.

Correction loops. 3+ corrections = polluted context. Rewrite the prompt and start fresh.

Bloated CLAUDE.md. Past 5 KB, rules get lost to context pressure. Move procedures to skills.

Unscoped exploration. "Investigate everything" fills context. Scope or use a subagent.

Shipping without verification. Plausible ≠ correct. No tests, no confidence. LSP plugins give free diagnostics.

Trusting memory over sources. If you didn't read the docs this session, don't claim what's in them. Ask the agent to fetch and verify.

Porting to Other CLIs — same intent, different primitive

This page treats Claude Code as the reference implementation for agentic CLIs. The other CLIs covered here expose the same concepts under different names and mechanisms.

Concept	Claude Code	Codex	Cursor	Gemini CLI	Copilot CLI
Project instructions	`CLAUDE.md`	`AGENTS.md`	`.cursorrules` / rules files	`GEMINI.md`	`copilot-instructions.md`
Deterministic guardrails	Hooks	Sandbox + approvals	Command approvals	Tool policy + `activate_skill`	Plugin / skill permissions
External tools	MCP	MCP (`~/.codex/config.toml`)	MCP	MCP	MCP
Repeatable workflows	Skills / custom commands	Skills / standing instructions	Commands	Skills (same open standard)	Skills (auto-discovery)
Safety posture	Hooks + managed settings	Sandbox + approvals	Workspace trust	Tool permission gates	Policy settings

Translation rule: don't copy the surface feature. Translate the intent. A hook in Claude Code becomes a sandbox rule in Codex. A skill in Gemini CLI follows the Agent Skills open standard so the same SKILL.md often works across tools.

Update Protocol — how this page is maintained

When updating this page, do not freehand a "what's new" section from memory. Work in order:

Pull the official CHANGELOG. gh api repos/anthropics/claude-code/contents/CHANGELOG.md returns JSON with the file base64-encoded in the content field, so decode it before reading. That's the ground truth.

Cross-check the commands reference. code.claude.com/docs/en/commands. Removed commands must be removed here too.

Extract durable, user-relevant changes. Skip internal fixes and edge cases. Keep features, new commands, hook events, meaningful defaults.

Translate the capability into workflow consequences. What should a user do differently now?

Cite every claim. Version tag from CHANGELOG, or direct link to docs. Anything unverified does not ship.

Commit on a feature branch, open a PR. Review before merge, even for content-only changes.

Hallucination is the failure mode. This page has shipped invented features twice in its history (a fabricated "Channels" description, a fictional "Computer Use" entry). Both were caught and reverted. The verify-or-remove rule exists because of that.
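
The first and fifth steps of the protocol can be partly automated. The sketch below decodes a GitHub contents API response (JSON with a base64 `content` field) and lists every `v2.1.X`-style tag cited on the page that does not appear in the CHANGELOG. The function names and the sample payload are illustrative; the version-matching rule is a simple textual check, not a parser for the CHANGELOG format.

```python
import base64
import json
import re

VERSION_TAG = re.compile(r"\bv(\d+\.\d+\.\d+)\b")


def decode_contents_response(raw_json: str) -> str:
    """Decode the file body from a GitHub contents API response."""
    payload = json.loads(raw_json)
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    return base64.b64decode(payload["content"]).decode("utf-8")


def unverified_tags(page_text: str, changelog_text: str) -> list:
    """Return cited versions that never appear as a whole version string in the changelog."""
    cited = sorted(set(VERSION_TAG.findall(page_text)))
    missing = []
    for version in cited:
        # Guard against 2.1.11 matching inside 2.1.111.
        pattern = rf"(?<![\d.]){re.escape(version)}(?!\d)"
        if not re.search(pattern, changelog_text):
            missing.append(version)
    return missing


# Demo with a fabricated payload standing in for the real API response.
fake_changelog = "## 2.1.111\n- Added xhigh effort level\n"
fake_response = json.dumps(
    {"encoding": "base64", "content": base64.b64encode(fake_changelog.encode()).decode()}
)
changelog = decode_contents_response(fake_response)
print(unverified_tags("xhigh lands in v2.1.111; slider in v2.1.11", changelog))
```

A tag that appears in the result is a claim to re-check or remove. A tag that passes only proves the version exists, not that the feature description is accurate, so the manual cross-check still applies.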

Sources — canonical references this page is built on

Claude Code (primary):

Codex:

Open standards:

Agent Skills open standard