You might have noticed that Claude Code tracks usage via “tokens” consumed, which you’ve probably seen from the /cost or /context commands, your Anthropic dashboard, or just in Claude conversations.
Tokens are the foundational unit that Claude uses to read and generate text. They’re not exclusive to Claude, either: tokens are the standard unit of text in NLP, and every LLM uses them to process input and generate output.
Tokenization in NLP
LLMs break text down into tokens: chunks of text that fall somewhere between a single character and a full word in length. Depending on the tokenizer, the word “programmatically” could be split into “program” + “matically”, or kept whole as one token (a character-level tokenizer, by contrast, would break it down letter by letter, which helps with unusual spellings and typos). You can see how OpenAI’s models tokenize text with this webapp.
Given a sequence of tokens, the model predicts which token is most likely to come next, based on patterns in its training data. LLMs take your input, tokenize it, and then generate output one token at a time, each token being a statistically likely continuation of everything before it.
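As a toy illustration (this is not Claude’s actual tokenizer, and the vocabulary here is made up), a subword scheme might split rare words into familiar pieces while keeping known pieces whole:

```python
# Toy subword vocabulary -- purely illustrative, not any real model's vocab.
VOCAB = {"program", "matically", "the", "cat", "sat"}

def toy_tokenize(word: str) -> list[str]:
    """Greedily split a word into the longest known vocab pieces,
    falling back to single characters for unknown spans."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest match first
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown span: fall back to one character
            i += 1
    return tokens

print(toy_tokenize("programmatically"))  # ['program', 'matically']
print(toy_tokenize("cat"))               # ['cat']
```

Real tokenizers (BPE and friends) learn their vocabularies from data rather than using a hand-written set, but the shape of the output is the same: a mix of whole words, subwords, and single characters.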
Punctuation, whitespace, and code syntax all count toward your tokens, and they tend to produce shorter tokens than written copy does. Every LLM also has a fixed context window: a memory limit on how many tokens it can hold at once (most Claude models hold ~200k). When you exceed the context window, earlier tokens get dropped.
So when you’re using Claude Code, keep in mind that code is more token-dense than natural language, and that concise languages like Python tend to be more token-efficient than verbose ones like Rust or C++.
How Claude tokenizes text
Claude uses its own tokenizer, trained on its training data, so the same string will tokenize differently than it does for other models like GPT-5. Claude’s tokens average out to about 4 letters per token, or roughly 1.5 tokens per word. Tokens for code are shorter: symbols like () or { each typically map to a single token.
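You can turn the ~4-letters-per-token average above into a quick ballpark estimator. This is a rough heuristic only, not Claude’s tokenizer (the real count comes from the model itself, e.g. via the API’s token-counting endpoint), and code will usually come out denser than the estimate:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters/token heuristic.

    Prose lands close to this figure; code tends to tokenize into
    more tokens per character, so treat code estimates as a floor.
    """
    return max(1, round(len(text) / chars_per_token))

prose = "Tokens are the foundational unit that Claude uses to read text."
print(estimate_tokens(prose))
```

Handy for back-of-the-envelope checks like “will this file fit in the context window?” before you paste it in.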
When you prompt CC with an input, it doesn’t just tokenize your message alone: it’ll process that along with the conversation history, loaded files, system prompts, command outputs, etc. As Claude processes more context, it consumes more tokens since this “input” compounds. Longer sessions will start to drain usage faster as they go on because of this.
Input vs. output tokens (and cache tokens)
Claude distinguishes between input tokens (i.e. what you submit to the model, plus history and context) and output tokens (the text that it generates). These are tracked separately and have different pricing rates.
Claude prices output tokens higher than input tokens. With Claude Sonnet, for example, output tokens cost ~5x more per million than input tokens. So when Claude writes 300 lines of code, it’s far pricier than when it reads 300 lines and generates a short answer; likewise, having CC read in your entire codebase costs much less than one-shot generating a codebase of comparable size.
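To make the asymmetry concrete, here’s the arithmetic with illustrative per-million-token rates. The $3/$15 figures are example numbers chosen to reflect the ~5x ratio above, and the token counts are rough guesses; check Anthropic’s pricing page for current rates:

```python
# Illustrative rates in USD per million tokens (example figures only).
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single call at the example per-million-token rates."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# Reading ~300 lines (~3,000 tokens in) and answering briefly (~100 tokens out)...
read_heavy = cost_usd(3_000, 100)
# ...vs. generating ~300 lines (~3,000 tokens out) from a short prompt.
write_heavy = cost_usd(100, 3_000)

print(f"read-heavy:  ${read_heavy:.4f}")
print(f"write-heavy: ${write_heavy:.4f}")  # same token volume, ~4x the cost
```

Same total token volume in both calls, but the generation-heavy one costs several times more because nearly all of its tokens are billed at the output rate.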
There’s a third token class: cache tokens. CC supports prompt caching, which lets repeated context (a long system prompt, a codebase snapshot, etc.) be stored and then reused at a much lower rate. So putting project context in your system prompt and CLAUDE.md helps keep usage lower over time (vs. repeating the same instructions in prompt after prompt).
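Under the hood, prompt caching in the Messages API works by marking a stable prefix of the request with a cache_control breakpoint. Here’s a sketch of such a request body (no API call is made here; the model ID is an example and the project context is a placeholder):

```python
# Sketch of a Messages API request body using prompt caching.
# The long, stable project context is marked with a cache breakpoint
# so subsequent requests can re-read it at the cheaper cached rate.
STABLE_PROJECT_CONTEXT = "…contents of CLAUDE.md and key project files…"  # placeholder

request_body = {
    "model": "claude-sonnet-4-20250514",  # example model ID
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": STABLE_PROJECT_CONTEXT,
            # Marks everything up to and including this block as cacheable.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "Refactor the auth module."}
    ],
}
```

On the response side, the API’s usage object reports cached reads separately (cache_read_input_tokens) from cache writes (cache_creation_input_tokens), which is how tools can show you how much the cache is saving. Claude Code manages this caching for you automatically.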
How to track your token usage
Here’s a detailed guide on tracking Claude Code tokens.
If you’re using Claude via the API, you can view your token consumption, costs by model, and usage over time in the Claude Console.
Running the /cost command during a Claude Code session shows token usage and cost for your current session. If you’re on a Pro or Max subscription rather than the API, /cost reflects what you would have paid via the API. You can also run /context to see how many tokens are currently filling your context window.
Third-party tools can give you richer metrics on your usage. ccusage is a lightweight CLI tool that reads and analyzes Claude’s local JSONL files for token metrics; you can run it without installing via npx ccusage@latest. If you want something more real-time, Claude-Code-Usage-Monitor tracks live token consumption and burn rate, and will even predict when you’re likely to hit your limits.
Environments for Claude Code
When you’re developing with Claude Code, you’ll want to make sure that every new feature is tested extensively in a secure, isolated environment. Ephemeral environments pair beautifully with CC: you can spin up an environment automatically based on a branch/PR, run tests, do QA, push patches, and then merge once you’ve determined it’s ready.
Shipyard is a plug-and-play ephemeral environment solution for devs using Claude Code. Claude can interact with the environments on its own via MCP/CLI (pull logs, get each live URL, visit the environments with Playwright MCP, etc). Try it free for 30 days and see how much faster your dev/test loop gets.