A practical checklist to reduce Claude Code token usage and cost.

Part 1 covers what you can do with zero installs. Part 2 covers external tools and repositories. Applied together, these steps typically cut token usage by 30–70%.


Claude Code can dramatically speed up development, but costs grow quickly if context and token usage aren’t managed. Most token waste comes from long conversations, large logs, unnecessary files loaded into context, and Claude re-learning your codebase from scratch every session.

I’ve collated a checklist in two parts: first, everything you can do with Claude Code’s built-in features (no installs, no external repos), then external tools that layer on top.

Part 1: Built-in Optimisation

Everything here ships with Claude Code or requires only a config file edit. No installs, no external dependencies.

CLAUDE.md hygiene

Your CLAUDE.md file loads into every single message. Every line costs tokens on every turn.

Action | Why
Keep CLAUDE.md under 150 lines | Every line multiplies across every message in every session. Run wc -l CLAUDE.md to check.
Move large workflows to .claude/skills/ | Skills load on-demand when triggered. Rules in .claude/rules/ and content in CLAUDE.md load every time.
Use @file references sparingly | Each @-referenced file adds its full content to context. Reference only what Claude needs for the current task.
Write concise instructions | “Fix lint errors before committing” beats a 20-line explanation of your lint setup. Claude already knows ESLint, Prettier, etc.
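If you want the 150-line budget enforced rather than remembered, a tiny check like this can run in CI or a pre-commit step. It's a minimal sketch: the function names and the idea of wiring it into CI are my own assumptions, not something Claude Code provides.

```python
from pathlib import Path

LINE_BUDGET = 150  # the limit suggested in the checklist above


def count_lines(text: str) -> int:
    """Count lines, including a final line with no trailing newline."""
    return len(text.splitlines())


def check_budget(text: str, budget: int = LINE_BUDGET):
    """Return (line_count, within_budget) for the given file contents."""
    lines = count_lines(text)
    return lines, lines <= budget


if __name__ == "__main__":
    path = Path("CLAUDE.md")
    if path.exists():
        lines, ok = check_budget(path.read_text(encoding="utf-8"))
        status = "OK" if ok else "over budget"
        print(f"{path}: {lines} lines ({status})")
```

Unlike wc -l, this also counts a last line that lacks a trailing newline, so the two can differ by one.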

Check out this trending CLAUDE.md from Andrej Karpathy’s skills repo — it’s just 64 lines.

.claudeignore (one-time setup)

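Create a .claudeignore file in your project root, using the same syntax as .gitignore, to stop Claude from scanning or reading matched paths. Check that your Claude Code version honors this file; if it doesn’t, Read deny rules under permissions in .claude/settings.json can block the same paths.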

node_modules/
dist/
build/
coverage/
logs/
*.min.js
*.map
*.lock
__pycache__/
.next/

Without this, Claude may read thousands of generated files during exploration, burning tokens on content that provides no value.
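To sanity-check a pattern list before relying on it, you can run a few representative paths through a matcher. The sketch below is a deliberately simplified approximation of .gitignore matching (directory names and basename globs only, no negation or anchoring), and is_ignored is a hypothetical helper, not part of Claude Code.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# The patterns from the example file above.
PATTERNS = [
    "node_modules/", "dist/", "build/", "coverage/", "logs/",
    "*.min.js", "*.map", "*.lock", "__pycache__/", ".next/",
]


def is_ignored(path: str, patterns=PATTERNS) -> bool:
    """Approximate gitignore-style matching for a relative file path."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: match if any parent directory has that name.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatch(parts[-1], pattern):
            # File pattern: match against the file name only.
            return True
    return False


for candidate in ["node_modules/react/index.js", "src/app.ts", "package-lock.json"]:
    print(candidate, "ignored" if is_ignored(candidate) else "kept")
```

One thing this surfaces: *.lock covers yarn.lock but not package-lock.json, so add that file explicitly if you want it excluded.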

Model switching

Not every task needs Opus. Picking the right model per task is one of the easiest cost wins.

Model | When to use | Command
Haiku | Formatting, renaming, file moves, simple grep-and-replace, generating boilerplate, quick lookups, mechanical refactors | /model haiku
Sonnet | Default daily driver. Code review, bug fixes, multi-file changes, test writing, architectural questions | /model sonnet
Opus | Complex debugging, large refactors spanning 10+ files, nuanced design decisions, security review, when Sonnet gives wrong answers twice | /model opus

Rule: start with Sonnet. Drop to Haiku for mechanical work. Escalate to Opus only when Sonnet struggles. Most teams spend 60%+ on Opus when Sonnet handles 80% of tasks fine.
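To see why the mix matters, here is a rough back-of-the-envelope calculation. The prices are hypothetical relative units chosen only for illustration (not actual pricing), and the two mixes are made-up shares of work, so plug in your own numbers.

```python
# Hypothetical relative cost per unit of work; NOT real pricing.
PRICES = {"haiku": 1.0, "sonnet": 3.0, "opus": 15.0}

# Made-up shares of work routed to each model (each mix sums to 1).
OPUS_HEAVY = {"haiku": 0.0, "sonnet": 0.4, "opus": 0.6}
SONNET_FIRST = {"haiku": 0.2, "sonnet": 0.7, "opus": 0.1}


def blended_cost(mix: dict, prices: dict) -> float:
    """Weighted average cost per unit of work for a given model mix."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix shares must sum to 1")
    return sum(share * prices[model] for model, share in mix.items())


before = blended_cost(OPUS_HEAVY, PRICES)
after = blended_cost(SONNET_FIRST, PRICES)
print(f"relative cost before: {before:.2f}, after: {after:.2f}")
print(f"reduction: {1 - after / before:.0%}")
```

The point is the shape of the result rather than the exact figure: the more expensive the top tier is relative to the default, the more a small shift away from it saves.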

Session management commands

These are built into Claude Code. Use them actively, not as a last resort.

Command | When to use | Why
/compact | At ~70% context usage | Don’t wait for auto-compact at 95%. Tell Claude what to preserve first: “Before we compact, note we decided X…”
/clear | After completing a distinct task | Starts fresh context. Cheaper than carrying stale conversation forward.
/cost | After each major task | Check spending so you can adjust behaviour.
/context | When things feel slow or expensive | Shows what’s actually filling your context. Diagnose before optimising blindly.
/resume <name> and /rename <name> | Between work sessions | Name a session with /rename <name>, then pick it up later with /resume <name> instead of re-explaining everything.

MCP server discipline

Each enabled MCP server adds 100–500 tokens of tool definitions to every message. If you have 10 servers enabled but only use 2, you’re paying for 8 on every turn.
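The arithmetic is worth doing once. This sketch uses the 100–500 tokens per server figure from the paragraph above; the function name and the 50-turn session are assumptions for illustration.

```python
def mcp_overhead(servers: int, turns: int, tokens_per_server=(100, 500)):
    """Return the (low, high) token overhead of tool definitions for a session."""
    low, high = tokens_per_server
    return servers * low * turns, servers * high * turns


# 8 unused servers carried through a hypothetical 50-turn session.
low, high = mcp_overhead(servers=8, turns=50)
print(f"Unused servers cost between {low:,} and {high:,} tokens")
```

With those inputs the unused servers account for 40,000 to 200,000 tokens that did no work.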

Know what you have before changing anything:

Command | What it shows
/mcp | Lists all configured MCP servers and their current status (enabled/disabled)
/mcp details <server> | Shows tools exposed by that server (this is what costs tokens per message)
/mcp disable <server> | Disables server, removes its tool definitions from context
/mcp enable <server> | Re-enables when needed

Workflow: run /mcp first to see the full list. Check which servers you actually used this week. Disable the rest. Re-enable when needed.

Prompt craft

The single biggest lever you control. A vague prompt causes Claude to explore broadly (reading many files, producing long responses). A precise prompt gets a precise answer.

Habit | Example | Token impact
Precise scope | “Fix the race condition in src/auth.ts lines 42-60” | Prevents Claude from reading unrelated files
Done criteria | “Done when: tests pass, no lint errors, no new warnings” | Stops Claude from continuing past the goal
Plan before implement | Press Shift+Tab or ask “outline your plan first” | Catches wrong approaches before Claude reads 20 files
One task per session | Don’t chain unrelated tasks | Each topic adds context that persists and compounds
Reference specific files | “Update the handler in api/users.ts, not the route” | Eliminates exploration time

Subagents for exploration

When Claude needs to explore many files (searching for usages, understanding a module), use the Task tool or ask Claude to use subagents. The subagent reads 20 files but returns only a summary to your main session. Your main context stays clean.

You don’t need to install anything for this. Ask: “Use a subagent to find all callers of processPayment, then summarize what you found.”

Environment variables

Set these in ~/.claude/settings.json or as shell env vars:

Variable | Value | Why
MAX_THINKING_TOKENS | 10000 | Caps extended thinking burn on simple tasks. Default can be much higher.
CLAUDE_CODE_SUBAGENT_MODEL | haiku | Uses cheaper Haiku for subagent work that doesn’t need full reasoning.
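For the settings.json route, the sketch below shows the shape I’d expect the file to take. It assumes settings.json accepts a top-level "env" object of string values; with_env is a hypothetical helper that merges the two variables into whatever settings you already have, so verify the key name against the settings documentation for your version.

```python
import json

# The two variables from the table above, as string values.
RECOMMENDED_ENV = {
    "MAX_THINKING_TOKENS": "10000",
    "CLAUDE_CODE_SUBAGENT_MODEL": "haiku",
}


def with_env(settings: dict, env: dict = RECOMMENDED_ENV) -> dict:
    """Return a copy of settings with env merged in, keeping existing keys."""
    merged = dict(settings)
    merged["env"] = {**settings.get("env", {}), **env}
    return merged


# Print the fragment you would end up with when starting from empty settings.
print(json.dumps(with_env({}), indent=2))
```

The shell equivalent is simply exporting the same two names in your profile.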

Hooks (config only, no external tools)

Hooks are shell commands in .claude/settings.json that run on events. No plugins required.

Hook | Example | Why
Session start | git branch --show-current | Injects current branch so Claude doesn’t need to ask or run git.
Pre-tool guard | Warn when a file >100KB is about to be read | Prevents accidental full reads of generated or binary files.
Output filter | Pipe grep "ERROR" logs/app.log through tail -20 | Sends only relevant log lines instead of full log files.
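The pre-tool guard could look something like the sketch below. It assumes the hook receives a JSON payload on stdin with the target path at tool_input.file_path, and that exiting with code 2 blocks the call and passes stderr back to Claude; check both against the hooks documentation for your version. The decide helper and the message wording are hypothetical.

```python
import json
import os
import sys

SIZE_LIMIT = 100 * 1024  # 100KB, the threshold from the table above


def decide(payload: dict, limit: int = SIZE_LIMIT):
    """Return (exit_code, message) for a hook payload."""
    # Assumption: the file path lives at payload["tool_input"]["file_path"].
    path = payload.get("tool_input", {}).get("file_path")
    if not path or not os.path.isfile(path):
        return 0, ""
    size = os.path.getsize(path)
    if size > limit:
        return 2, f"{path} is {size} bytes (limit {limit}); read a slice instead."
    return 0, ""


def main() -> int:
    # Nothing to do when run by hand without piped input.
    if sys.stdin is None or sys.stdin.isatty():
        return 0
    raw = sys.stdin.read()
    payload = json.loads(raw) if raw.strip() else {}
    code, message = decide(payload)
    if message:
        print(message, file=sys.stderr)
    return code


if __name__ == "__main__":
    exit_code = main()
    if exit_code:
        # Assumption: exit code 2 tells Claude Code to block the tool call.
        sys.exit(exit_code)
```

Note that this blocks the read rather than merely warning. Keeping the decision in a pure function makes the threshold easy to test without running Claude at all.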

Scripts for repeated tasks

If you run the same sequence often (lint, format, typecheck), put it in a script. Claude calls one script instead of three commands, and the output is shorter than three separate tool results.

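#!/usr/bin/env bash
# scripts/check.sh: run lint, format, and typecheck, stopping at the first failure
set -euo pipefail
npm run lint && npm run format && npm run typecheck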

Quick reference file

Create docs/QUICK_REF.md with your most-used commands, architecture notes, and conventions. Keep it under 50 lines. Claude reads this small file instead of large documentation.


Callout: With the 1M-token context window, avoid going past about 12% (~120K tokens). Beyond that you enter the context rot zone, where each extra token costs more and output quality drops. The 1M window is insurance for emergencies, not a target.

Part 2: External Tools & Repositories

These require third-party installation but provide capabilities beyond what’s built in.

Output compression

Tool | Install | Stars | Why
caveman | /plugin marketplace add JuliusBrussee/caveman | 61K | 65% mean output token reduction. Strips narrative filler while keeping code intact. One of the highest-leverage changes you can make.
RTK | brew install rtk; rtk gain; rtk init --global | - | Compresses CLI outputs before they reach Claude. Up to 90% reduction on noisy commands.

Persistent memory

Tool | Install | Stars | Why
claude-mem | npx claude-mem install | 75K | Captures session work, compresses with AI, injects relevant context into future sessions. ChromaDB vector search + local SQLite. Eliminates re-explaining your architecture every session.

Context engineering

Tool | Install | Stars | Why
GSD | npx gsd install | 63K | Spec-driven development. Subagents get exactly the context they need. Solves context drift on sessions longer than ~30 mins.
Language server | npm install -g pyright or typescript-language-server | - | Gives Claude type info, definitions, and references without reading large files.
code-review-graph | pip install code-review-graph | ~1K | Tree-sitter knowledge graph in SQLite. Claude queries the graph instead of scanning files. 6.8x average token reduction on reviews.
claude-token-efficient | git clone github.com/drona23/claude-token-efficient | - | Drop-in optimised CLAUDE.md. 63% output token reduction in benchmarks.
claude-token-optimizer | GitHub | - | Restructures your docs directory so Claude loads only what it needs. Typical: 11,000 → 1,300 tokens on session start.

Measure usage

Tool | Install | Why
ccusage | npx ccusage@latest; npx ccusage daily --project myapp --breakdown | Shows where tokens are spent by project, model, and time period.
claude-monitor | pip install claude-monitor; claude-monitor --plan pro --refresh-rate 5 | Live visibility into token usage while you work.
token-optimizer | /plugin marketplace add alexgreensh/token-optimizer | Finds ghost tokens: skills you never use, orphaned MEMORY.md files, decisions lost on compaction.
claude-token-lens | npm install -g claude-token-lens | Real-time token attribution showing which tool or MCP server is burning quota.

Star-ranked tool summary

Tool | Stars | Cost impact | Install
everything-claude-code | 185K | High: strategic-compact + monitoring | npx ecc install
claude-mem | 75K | High: eliminates session re-learning | npx claude-mem install
ccswitch | 73K | Medium: model routing + MCP management | farion1231/cc-switch
GSD | 63K | High: context engineering, no drift | npx gsd install
caveman | 61K | Very high: 65% output reduction | /plugin marketplace add JuliusBrussee/caveman
claude-squad | 7.4K | Medium: parallel agents per worktree | brew install claude-squad
code-review-graph | ~1K | Very high on large codebases (6–49x) | pip install code-review-graph

The Main Causes of Token Waste

Source | Built-in fix | External fix
Verbose Claude responses | Concise prompts, done criteria | caveman (65% reduction)
Large CLI outputs | Hooks to filter logs | RTK compression
Large generated files | .claudeignore | -
Long conversations | /clear, /compact at 70% | -
Re-explaining codebase | /resume, QUICK_REF.md | claude-mem
Claude reading irrelevant files | .claudeignore, precise prompts | code-review-graph
Unused MCP servers | /mcp, /mcp disable | -
Overspending on model tier | Start with Sonnet, use Haiku for mechanical work, Opus only when needed | -

Optimising these areas typically reduces token usage by 30–70% while making Claude responses faster and more reliable.