Claude Code – Token Optimisation Cheat Sheet

A practical checklist to reduce Claude Code token usage and cost.

Part 1 covers what you can do with zero installs. Part 2 covers external tools and repositories. Together they can save 30–70% on costs.

Claude Code can dramatically speed up development, but costs grow quickly if context and token usage aren’t managed. Most token waste comes from long conversations, large logs, unnecessary files loaded into context, and Claude re-learning your codebase from scratch every session.

I’ve collated a checklist in two parts: first, everything you can do with Claude Code’s built-in features (no installs, no external repos), then external tools that layer on top.

Part 1: Built-in Optimisation

Everything here ships with Claude Code or requires only a config file edit. No installs, no external dependencies.

CLAUDE.md hygiene

Your CLAUDE.md file loads into every single message. Every line costs tokens on every turn.

Action	Why
Keep `CLAUDE.md` under 150 lines	Every line multiplies across every message in every session. Run `wc -l CLAUDE.md` to check.
Move large workflows to `.claude/skills/`	Skills load on-demand when triggered. Rules in `.claude/rules/` and content in `CLAUDE.md` load every time.
Use `@file` references sparingly	Each `@`-referenced file adds its full content to context. Reference only what Claude needs for the current task.
Write concise instructions	“Fix lint errors before committing” beats a 20-line explanation of your lint setup. Claude already knows ESLint, Prettier, etc.
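
The line budget is easy to check automatically. Here is a minimal sketch of a shell helper you could run before committing; `check_line_budget` is a hypothetical name, not part of Claude Code, and the 150-line default simply mirrors the guideline in the table.

```shell
# Warn when an instructions file exceeds a line budget.
# Usage: check_line_budget FILE [MAX_LINES]   (MAX_LINES defaults to 150)
check_line_budget() {
  file=$1
  max=${2:-150}
  if [ ! -f "$file" ]; then
    echo "missing: $file"
    return 2
  fi
  # Redirecting into wc keeps the filename out of the output.
  lines=$(wc -l < "$file" | tr -d ' ')
  if [ "$lines" -gt "$max" ]; then
    echo "over budget: $file has $lines lines (limit $max)"
    return 1
  fi
  echo "ok: $file has $lines lines (limit $max)"
}
```

Call it as `check_line_budget CLAUDE.md`. The non-zero exit status on failure means it can also gate a pre-commit script.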

For a compact example, see the trending CLAUDE.md from Andrej Karpathy’s skills repo: it’s just 64 lines.

.claudeignore (one-time setup)

Create a .claudeignore file in your project root. It uses the same syntax as .gitignore and stops Claude from scanning or reading matched paths.

node_modules/
dist/
build/
coverage/
logs/
*.min.js
*.map
*.lock
__pycache__/
.next/

Without this, Claude may read thousands of generated files during exploration, burning tokens on content that provides no value.
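
If you're not sure which directories deserve an entry, counting files is a quick way to find the worst offenders. This is a rough sketch: `count_files` is a hypothetical helper, and the directory names are the ones from the list above.

```shell
# Count regular files under a directory (prints 0 if it does not exist).
count_files() {
  find "$1" -type f 2>/dev/null | wc -l | tr -d ' '
}

# Report the candidate directories that exist in the current project root.
for dir in node_modules dist build coverage logs .next; do
  if [ -d "$dir" ]; then
    echo "$dir: $(count_files "$dir") files"
  fi
done
```

Run it from the project root. Directories with thousands of files are the ones to ignore first.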

Model switching

Not every task needs Opus. Picking the right model per task is one of the easiest cost wins.

Model	When to use	Command
Haiku	Formatting, renaming, file moves, simple grep-and-replace, generating boilerplate, quick lookups, mechanical refactors	`/model haiku`
Sonnet	Default daily driver. Code review, bug fixes, multi-file changes, test writing, architectural questions	`/model sonnet`
Opus	Complex debugging, large refactors spanning 10+ files, nuanced design decisions, security review, when Sonnet gives wrong answers twice	`/model opus`

Rule: start with Sonnet. Drop to Haiku for mechanical work. Escalate to Opus only when Sonnet struggles. Most teams spend 60%+ of their budget on Opus even though Sonnet handles 80% of tasks fine.

Session management commands

These are built into Claude Code. Use them actively, not as a last resort.

Command	When to use	Why
`/compact`	At ~70% context usage	Don’t wait for auto-compact at 95%. Tell Claude what to preserve first: “Before we compact, note we decided X…”
`/clear`	After completing a distinct task	Starts fresh context. Cheaper than carrying stale conversation forward.
`/cost`	After each major task	Check spending so you can adjust behaviour.
`/context`	When things feel slow or expensive	Shows what’s actually filling your context. Diagnose before optimising blindly.
`/resume <name>` and `/rename <name>`	Between work sessions	Resume a named session instead of re-explaining everything.

MCP server discipline

Each enabled MCP server adds 100–500 tokens of tool definitions to every message. If you have 10 servers enabled but only use 2, you’re paying for 8 on every turn.
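
To put numbers on that, here is a back-of-the-envelope calculation using the 100–500 token range above. The 50-turn session length is an assumption for illustration. The result is a rough upper bound, since it ignores prompt caching.

```shell
# Estimate tokens spent on MCP tool definitions over a session.
# Usage: mcp_overhead SERVERS TOKENS_PER_SERVER TURNS
mcp_overhead() {
  echo $(( $1 * $2 * $3 ))
}

# Eight unused servers over a hypothetical 50-turn session:
mcp_overhead 8 100 50   # low end of the range: prints 40000
mcp_overhead 8 500 50   # high end of the range: prints 200000
```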

Know what you have before changing anything:

Command	What it shows
`/mcp`	Lists all configured MCP servers and their current status (enabled/disabled)
`/mcp details <server>`	Shows tools exposed by that server (this is what costs tokens per message)
`/mcp disable <server>`	Disables server, removes its tool definitions from context
`/mcp enable <server>`	Re-enables when needed

Workflow: run /mcp first to see the full list. Check which servers you actually used this week. Disable the rest. Re-enable when needed.

Prompt craft

The single biggest lever you control. A vague prompt causes Claude to explore broadly (reading many files, producing long responses). A precise prompt gets a precise answer.

Habit	Example	Token impact
Precise scope	`"Fix the race condition in src/auth.ts lines 42-60"`	Prevents Claude from reading unrelated files
Done criteria	`"Done when: tests pass, no lint errors, no new warnings"`	Stops Claude from continuing past the goal
Plan before implement	Press `Shift+Tab` or ask “outline your plan first”	Catches wrong approaches before Claude reads 20 files
One task per session	Don’t chain unrelated tasks	Each topic adds context that persists and compounds
Reference specific files	`"Update the handler in api/users.ts, not the route"`	Eliminates exploration time

Subagents for exploration

When Claude needs to explore many files (searching for usages, understanding a module), use the Task tool or ask Claude to use subagents. The subagent reads 20 files but returns only a summary to your main session. Your main context stays clean.

You don’t need to install anything for this. Ask: “Use a subagent to find all callers of processPayment, then summarize what you found.”

Environment variables

Set these under the `env` key in ~/.claude/settings.json or as shell env vars:

Variable	Value	Why
`MAX_THINKING_TOKENS`	`10000`	Caps extended thinking burn on simple tasks. Default can be much higher.
`CLAUDE_CODE_SUBAGENT_MODEL`	`haiku`	Uses cheaper Haiku for subagent work that doesn’t need full reasoning.
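
As shell env vars, the two settings from the table look like this. Which startup file you put them in (for example `~/.bashrc` or `~/.zshrc`) depends on your shell.

```shell
# Set before launching Claude Code, or add to your shell startup file.
export MAX_THINKING_TOKENS=10000          # cap on extended thinking tokens
export CLAUDE_CODE_SUBAGENT_MODEL=haiku   # model used for subagent work
```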

Hooks (config only, no external tools)

Hooks are shell commands in .claude/settings.json that run on events. No plugins required.

Hook	Example	Why
Session start	`git branch --show-current`	Injects current branch so Claude doesn’t need to ask or run git.
Pre-tool guard	Warn when a file >100KB is about to be read	Prevents accidental full reads of generated or binary files.
Output filter	`grep "ERROR" logs/app.log | tail -20`	Sends only relevant log lines instead of full log files.
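
The size guard and the log filter can live in small shell functions that a hook command calls. What follows is a sketch under assumptions: `within_read_limit` and `recent_errors` are hypothetical names, and how the file path reaches the script depends on the hook's input format, which isn't covered here.

```shell
# Size guard: succeed only when FILE is at most LIMIT bytes (default 100 KB).
# A hook script could call this before a read and print a warning on failure.
within_read_limit() {
  file=$1
  limit=${2:-102400}
  [ -f "$file" ] || return 0   # nothing to measure; let the read proceed
  size=$(wc -c < "$file" | tr -d ' ')
  [ "$size" -le "$limit" ]
}

# Output filter: print only the last N lines of FILE that contain "ERROR".
recent_errors() {
  grep "ERROR" "$1" | tail -n "${2:-20}"
}
```

For example, `within_read_limit dist/bundle.js || echo "warning: file is over 100KB"` prints the warning only for oversized files, and `recent_errors logs/app.log` reproduces the output filter from the table.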

Scripts for repeated tasks

If you run the same sequence often (lint, format, typecheck), put it in a script. Claude calls one script instead of three commands, and the output is shorter than three separate tool results.

#!/usr/bin/env bash
# scripts/check.sh: run lint, format, and typecheck; stop at the first failure
set -euo pipefail
npm run lint && npm run format && npm run typecheck

Quick reference file

Create docs/QUICK_REF.md with your most-used commands, architecture notes, and conventions. Keep it under 50 lines. Claude reads this small file instead of large documentation.

Callout: Never go past 12% of your context window (~120K out of 1M tokens). Above that you enter the context rot zone where every extra token costs more for lower-quality output. The 1M window is insurance for emergencies, not a target.

Part 2: External Tools & Repositories

These require third-party installation but provide capabilities beyond what’s built in.

Output compression

Tool	Install	Stars	Why
caveman	`/plugin marketplace add JuliusBrussee/caveman` · github	61K	65% mean output token reduction. Strips narrative filler while keeping code intact. One of the highest-leverage changes you can make.
RTK	`brew install rtk` → `rtk gain` → `rtk init --global` · github	—	Compresses CLI outputs before they reach Claude. Up to 90% reduction on noisy commands.

Persistent memory

Tool	Install	Stars	Why
claude-mem	`npx claude-mem install` · github	75K	Captures session work, compresses with AI, injects relevant context into future sessions. ChromaDB vector search + local SQLite. Eliminates re-explaining your architecture every session.

Context engineering

Tool	Install	Stars	Why
GSD	`npx gsd install` · github	63K	Spec-driven development. Subagents get exactly the context they need. Solves context drift on sessions longer than ~30 mins.
Language server	`npm install -g pyright` or `typescript-language-server`	—	Gives Claude type info, definitions, and references without reading large files.
code-review-graph	`pip install code-review-graph` · github	~1K	Tree-sitter knowledge graph in SQLite. Claude queries the graph instead of scanning files. 6.8x average token reduction on reviews.
claude-token-efficient	`git clone https://github.com/drona23/claude-token-efficient` · github	—	Drop-in optimised CLAUDE.md. 63% output token reduction in benchmarks.
claude-token-optimizer	github	—	Restructures your docs directory so Claude loads only what it needs. Typical: 11,000 → 1,300 tokens on session start.

Measure usage

Tool	Install	Why
ccusage	`npx ccusage@latest` → `npx ccusage daily --project myapp --breakdown` · github	Shows where tokens are spent by project, model, and time period.
claude-monitor	`pip install claude-monitor` → `claude-monitor --plan pro --refresh-rate 5`	Live visibility into token usage while you work.
token-optimizer	`/plugin marketplace add alexgreensh/token-optimizer` · github	Finds ghost tokens: skills you never use, orphaned MEMORY.md files, decisions lost on compaction.
claude-token-lens	`npm install -g claude-token-lens`	Real-time token attribution showing which tool or MCP server is burning quota.

Star-ranked tool summary

Tool	Stars	Cost impact	Install
everything-claude-code	185K	High: strategic-compact + monitoring	`npx ecc install`
claude-mem	75K	High: eliminates session re-learning	`npx claude-mem install`
ccswitch	73K	Medium: model routing + MCP management	farion1231/cc-switch
GSD	63K	High: context engineering, no drift	`npx gsd install`
caveman	61K	Very high: 65% output reduction	`/plugin marketplace add JuliusBrussee/caveman`
claude-squad	7.4K	Medium: parallel agents per worktree	`brew install claude-squad`
code-review-graph	~1K	Very high on large codebases (6–49x)	`pip install code-review-graph`

The Main Causes of Token Waste

Source	Built-in Fix	External Fix
Verbose Claude responses	Concise prompts, done criteria	caveman (65% reduction)
Large CLI outputs	Hooks to filter logs	RTK compression
Large generated files	`.claudeignore`	—
Long conversations	`/clear`, `/compact` at 70%	—
Re-explaining codebase	`/resume`, QUICK_REF.md	claude-mem
Claude reading irrelevant files	`.claudeignore`, precise prompts	code-review-graph
Unused MCP servers	`/mcp`, `/mcp disable`	—
Overspending on model tier	Start Sonnet, Haiku for mechanical, Opus only when needed	—

Optimising these areas typically reduces token usage by 30–70% while making Claude responses faster and more reliable.
