Caveman English: A Token & Context Saver for Claude Code
When you write to Claude, every "the", "please", and "could you" costs tokens. Multiplied across a long Claude Code session, polite English is one of the most expensive things you ship into context.
Caveman English strips prompts to a telegraphic core: drop articles, drop fillers, keep verbs and nouns. The model still understands. You save 30–50% of input tokens — and just as importantly, you save context window, which is the harder limit to hit on long sessions.
What it looks like
Before:
Could you please refactor the Frontend.elm file to extract the navigation
into its own module, and make sure the existing tests still pass?
After:
refactor Frontend.elm: extract nav into own module. tests must still pass.
Same instruction, ~55% fewer tokens. Claude follows it identically.
Why it works
LLMs are robust to grammatical noise. Articles ("a", "the"), modal politeness ("could you", "would you mind"), and hedges ("maybe", "if possible") are mostly redundant given the imperative verb that follows. The model uses the content words — verbs, nouns, file paths, constraints — to plan its response. The function words barely move the needle on instruction adherence.
Where caveman English does lose something: nuance and tone. If the difference between "rewrite this" and "consider whether a rewrite is appropriate" matters for your task, keep the long form. For 90% of dev work — fix this, build that, refactor X, run Y — telegraphic style is enough.
Rules of thumb
- Keep the verb first. refactor X, add Y, fix Z, run tests. The verb tells Claude what mode it's in.
- Drop articles. "the file", "a function", "an error" — gone. Almost never matters.
- Drop politeness. No "please", no "could you", no "would you mind". Claude doesn't need encouragement.
- Keep specifics. File paths, function names, error messages, line numbers — these are content words, never strip them.
- Keep constraints. "no new deps", "must pass tests", "don't touch CSS" — these are load-bearing.
- Keep the why when it changes the answer. "fast for prod" vs "readable for review" → different output. Worth the tokens.
What you save
Two things, and the second matters more:
- Input tokens = direct cost. ~30–50% reduction on prompt text. Real money on long Claude API sessions.
- Context budget = the harder limit. Every prompt, every tool call, every file read goes into a fixed-size window. Caveman prompts leave more room for Claude's output and for the next turn. On a 200k window with 50 tool calls, this compounds.
Output tokens are unaffected — Claude still writes full English in responses. Only your half of the conversation gets compressed.
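To see why this compounds: every prompt you send stays in the conversation and is re-sent as input on each later turn (ignoring prompt caching). A back-of-the-envelope sketch in Python; all numbers are placeholders, not measurements:

# Placeholder numbers, not measurements.
tokens_trimmed_per_prompt = 30   # polite prompt vs caveman prompt
turns = 100                      # prompts in one long session

# Context window at the final turn: every earlier prompt is still sitting in it.
window_tokens_freed = tokens_trimmed_per_prompt * turns

# Billed input (no caching): the prompt from turn k is re-sent on every later turn.
billed_input_saved = sum(tokens_trimmed_per_prompt * (turns - k) for k in range(turns))

print(window_tokens_freed)   # 3000
print(billed_input_saved)    # 151500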
The bigger win: make Claude speak Caveman too
Compressing your prompts saves input tokens. But the much bigger win is telling Claude to reason and respond in Caveman English — because output tokens cost ~5× more than input tokens at the API level, and Claude is naturally verbose.
One instruction in your CLAUDE.md, system prompt, or first message:
Respond in caveman English. Drop articles, drop politeness,
drop hedges. Verbs first. No preambles ("Great question!",
"Let me think...", "I'll now..."). Skip closing pleasantries.
Just answer.
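If you drive Claude through the API rather than Claude Code, the same instruction goes in the system prompt. A minimal sketch with the Anthropic Python SDK; the model name here is just an example:

import anthropic

CAVEMAN_STYLE = (
    "Respond in caveman English. Drop articles, drop politeness, drop hedges. "
    "Verbs first. No preambles, no closing pleasantries. Just answer."
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",   # example model name, swap for whatever you run
    max_tokens=1024,
    system=CAVEMAN_STYLE,        # rides along on every turn of the conversation
    messages=[{"role": "user", "content": "refactor Frontend.elm: extract nav into own module."}],
)
print(response.content[0].text)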
What it changes:
- Final responses shrink 30–60% with zero information loss. The opener/narration/closer sandwich ("Sure! Let me look at that...", "I'll now check the file...", "Let me know if...") alone is a brutal token tax across hundreds of turns.
- Extended thinking / reasoning blocks (when enabled) compress in the same way — fewer tokens spent on internal deliberation, faster latency, lower cost.
- Tool-using sessions (Claude Code, agentic loops): every tool call is followed by a response chunk. Multiplied across 50+ tool calls in a single session, the saved output tokens dominate everything else.
Pricing reality check (Anthropic API, mid-2026): every output token is roughly 5× the price of an input token across Sonnet, Opus, and Haiku. So shaving 30% off output text outweighs shaving 50% off input text. Both compound.
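The arithmetic behind that claim, with illustrative per-token prices and a session where your prompt text and Claude's output text are roughly the same size (both figures are assumptions, not current rates):

# Illustrative prices and volumes only; check current pricing before relying on this.
price_in = 3.00 / 1_000_000    # $ per input token
price_out = 15.00 / 1_000_000  # $ per output token, 5x input

input_tokens = 20_000          # your prompt text over a session
output_tokens = 20_000         # Claude's response text over the same session
# Assumes comparable input/output text volume; heavy re-sent context shifts the balance back toward input.

saved_by_50pct_input = 0.50 * input_tokens * price_in     # $0.030
saved_by_30pct_output = 0.30 * output_tokens * price_out  # $0.090

print(saved_by_50pct_input, saved_by_30pct_output)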
Caveat: you lose the teaching tone — Claude won't volunteer the why unless asked. For ship-it dev work that's a feature. For learning a new domain, drop a one-shot "explain in full English for this answer" when you actually want the lecture.
When NOT to use it
- First message of a session when you're framing intent and constraints. Spend the tokens, set the stage.
- Architecture discussions where nuance matters.
- Code review prompts where you want Claude to explain why, not just what.
- Anything user-facing. Don't ship caveman English into commit messages, PRs, or docs. It's a private input compression scheme.
Concrete savings
Here is the same task given to Claude Code in both styles:
# Polite (52 tokens)
Could you please look at the failing test in tests/AuthSpec.elm and figure out
why it's broken? It started failing after the last commit. Don't change any
production code unless absolutely necessary.
# Caveman (28 tokens)
tests/AuthSpec.elm: failing since last commit. find cause. fix in test only,
not prod code.
46% reduction. Same outcome. Now imagine 100 prompts in one session.
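If you want to measure rather than eyeball, the Anthropic SDK has a token-counting endpoint; a sketch below (exact method and model names may differ in your SDK version, so treat them as assumptions):

import anthropic

client = anthropic.Anthropic()

polite = (
    "Could you please look at the failing test in tests/AuthSpec.elm and figure out "
    "why it's broken? It started failing after the last commit. Don't change any "
    "production code unless absolutely necessary."
)
caveman = (
    "tests/AuthSpec.elm: failing since last commit. find cause. fix in test only, "
    "not prod code."
)

for label, prompt in [("polite", polite), ("caveman", caveman)]:
    count = client.messages.count_tokens(
        model="claude-sonnet-4-5",   # example model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(label, count.input_tokens)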
Bottom line
Treat your prompts like commit messages, not emails. Imperative, terse, content-rich. Save the tokens for what Claude is going to do, not what you're going to say.