What Are Tokens? (And Why They Matter)
A practical, slightly nerdy guide to tokens: what they are, how tokenization works (BPE/WordPiece/SentencePiece), why tokens control cost, speed, and limits, plus safe ways to shrink prompts without losing quality.

If a prompt is your instruction to an AI model, tokens are the currency you spend to express it - and to receive an answer. They determine how much you pay, how fast you get results, and how much “memory” the model has for your context. On PractiqAI, where you practice real tasks and get judged on outcomes, understanding tokens makes your prompts leaner, cheaper, and more reliable - especially as you iterate through attempts and perfect prompts.
If you’ve already read our primer on prompts, consider this the companion piece that explains the “length, cost, and limits” side of the equation.
The 20-second gist
A token is a chunk of text (often ~4 characters in English). Billing, speed, and model limits are measured in tokens. Keep prompts high-signal and structured to get more done with fewer tokens.
Definition: tokens vs. characters/words
A token is a unit of text the model actually sees. It may be a single character (“a”), part of a word (“encod”), a whole word (“hello”), or punctuation and spaces. This depends on the tokenizer used by the model. Rules of thumb for English from OpenAI: 1 token ≈ 4 characters ≈ ¾ of a word; 100 tokens ≈ 75 words; a typical paragraph is ~100 tokens. Spaces, punctuation, and partial words all count. (OpenAI Help Center)
This is why your word processor’s “word count” rarely matches your API’s “token count.” Tokens are model-specific, not human-intuitive. The same sentence can tokenize differently across models.
How tokenization works (BPE, WordPiece, SentencePiece)
Tokenization converts text into token IDs. Modern LLMs use subword tokenizers because they balance vocabulary size and expressiveness:
- BPE (Byte-Pair Encoding) merges the most frequent symbol pairs to create subwords. It's reversible, works on arbitrary text, and tends to map roughly to 4 bytes per token. OpenAI's tiktoken library implements a fast BPE tokenizer used for OpenAI models. (GitHub)
- WordPiece (used in BERT-style models) grows a vocabulary by adding the merge that most increases the likelihood of the training data; you'll often see ## prefixes for subword continuations. (Hugging Face)
- SentencePiece often appears with a Unigram language model; it treats text as a stream of Unicode characters and uses a special ▁ symbol to represent spaces, making it robust for languages without whitespace segmentation (e.g., Japanese, Chinese). (Hugging Face)
Different algorithms → different token boundaries → different counts for the same text. Always match your tokenizer to your model.
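To see those boundary differences for yourself, here's a minimal sketch that runs the same sentence through three tokenizers. It assumes you have the tiktoken and transformers packages installed; the encoding and model names (o200k_base, bert-base-uncased, t5-small) are just convenient examples, and the Hugging Face tokenizers will download their files on first use.

# Sketch: the same sentence, three tokenizers (assumes tiktoken and transformers are installed)
import tiktoken
from transformers import AutoTokenizer

text = "Tokenization boundaries differ across models."

# BPE (OpenAI): an encoding used by recent OpenAI models
bpe = tiktoken.get_encoding("o200k_base")
print("BPE token count:", len(bpe.encode(text)))

# WordPiece (BERT): note the ## prefix on subword continuations
wordpiece = AutoTokenizer.from_pretrained("bert-base-uncased")
print("WordPiece pieces:", wordpiece.tokenize(text))

# SentencePiece/Unigram (T5): note the ▁ marker that stands in for spaces
sentencepiece = AutoTokenizer.from_pretrained("t5-small")
print("SentencePiece pieces:", sentencepiece.tokenize(text))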
Why tokens drive cost, speed, and limits
LLMs read and write tokens. That means:
- Cost: Most providers bill per token (input, output, sometimes “cached input” at a reduced rate). OpenAI’s public pricing pages enumerate per-million-token rates and discounts for cached prompts. (OpenAI)
- Speed: More tokens in → more compute → higher latency. Large outputs also stream more tokens to you, increasing total time.
- Limits: Every model has a maximum context window (prompt + output combined). If you exceed it, your request fails; if you approach it, critical details can be truncated. OpenAI’s help notes that each model has a maximum combined token limit and that high-capacity models now support very large contexts. (OpenAI Help Center)
Newer accounting categories
You may see cached tokens (for repeated prefixes) and reasoning tokens (internal thinking steps in some models) in usage reports. Providers price them differently, but they’re all tokens for billing purposes.
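Because billing is per token, a cost estimate is just multiplication. The sketch below illustrates the arithmetic only - the per-million-token rates are made-up placeholders, so substitute your provider's current prices.

# Back-of-the-napkin cost estimate; the rates here are PLACEHOLDERS, not real prices
def estimate_cost(input_tokens, output_tokens, cached_tokens=0,
                  in_rate=2.50, out_rate=10.00, cached_rate=1.25):
    """Rates are USD per million tokens (example values only)."""
    return (input_tokens * in_rate
            + output_tokens * out_rate
            + cached_tokens * cached_rate) / 1_000_000

# e.g., a 20k-token prompt that produces a 1.5k-token answer
print(f"${estimate_cost(20_000, 1_500):.4f}")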
Counting tokens (interactive tools, quick math)
The easiest way to “see” tokens is OpenAI’s interactive Tokenizer - paste text to visualize splits and counts. Keep it open while you write prompts. (OpenAI Platform)
If you need code, use tiktoken:
# Count tokens with OpenAI's tiktoken (Python)
import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")  # or pick an encoding directly: tiktoken.get_encoding("o200k_base")
text = "Summarize the findings in <=120 words. Use plain English."
num_tokens = len(enc.encode(text))
print(num_tokens)

tiktoken is fast, BPE-based, and aligned with OpenAI models. (GitHub)
For a back-of-the-napkin estimate: in English text, multiply characters by 0.25 or words by ~1.33 to approximate tokens - straight from OpenAI’s rules of thumb. Then confirm with a tokenizer before you ship. (OpenAI Help Center)
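Here is that rule of thumb as a tiny helper, useful for rough planning before you confirm with a real tokenizer. It applies to English text only; the approximation gets worse for other languages and for code.

# Rough token estimates from the English rules of thumb (approximation only)
def estimate_tokens_from_chars(text: str) -> int:
    return round(len(text) * 0.25)           # ~4 characters per token

def estimate_tokens_from_words(text: str) -> int:
    return round(len(text.split()) * 1.33)   # 1 token ~ 3/4 of a word

sample = "Summarize the findings in <=120 words. Use plain English."
print(estimate_tokens_from_chars(sample), estimate_tokens_from_words(sample))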
Token budgets: context vs. output
Think of your request as a budget equation:
input_tokens + output_tokens ≤ model_context_window

Three practical implications (a code sketch of this budget check follows the list):
- Reserve output space. If your context is ~20k tokens and the model’s window is 24k, you have little room to answer. Trim the prompt or set a strict output specification so the model doesn’t overflow. (OpenAI Help Center)
- Chunk large inputs. Split documents, summarize sections, or stage work: extract → summarize → finalize.
- Leverage caching when available. Static prefixes (style guides, schemas, glossaries) can be billed cheaper as cached tokens on some providers. Verify your vendor’s policy. (OpenAI)
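Here's the budget check from the list above as a small sketch. The 128k context window and the o200k_base encoding are example values - swap in the numbers for your actual model.

# Minimal budget check before sending a request; window size and encoding are example values
import tiktoken

def fits_budget(prompt: str, max_output_tokens: int,
                context_window: int = 128_000, encoding: str = "o200k_base") -> bool:
    """Return True if the prompt plus the reserved output fits the context window."""
    enc = tiktoken.get_encoding(encoding)
    input_tokens = len(enc.encode(prompt))
    return input_tokens + max_output_tokens <= context_window

print(fits_budget("Summarize the attached report in 10 bullet points.", max_output_tokens=1_000))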
Multilingual quirks & emojis
Token counts vary by language and script:
- Languages without spaces (Chinese, Japanese) tokenize differently: SentencePiece/Unigram treats text as a continuous stream and uses special markers for boundaries, often leading to more tokens per character than English. (Hugging Face)
- Spanish and other languages can exhibit higher token-to-character ratios than English; OpenAI’s help cites “Cómo estás” (10 chars → 5 tokens). (OpenAI Help Center)
- Emoji are deceptive: a visible “single emoji” may be a ZWJ sequence - multiple Unicode code points joined by an invisible zero-width joiner - which can become several tokens. Family/skin-tone variants are classic multi-code-point examples. (unicode.org)
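A quick way to make these quirks concrete is to count a few samples yourself. This sketch uses tiktoken's o200k_base encoding as an example; exact counts will differ by model, which is exactly the point.

# Compare token counts across languages and an emoji ZWJ sequence (encoding is an example)
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
samples = {
    "English": "How are you",
    "Spanish": "Cómo estás",
    "Japanese": "お元気ですか",
    "Emoji": "👩‍👩‍👧‍👦",  # family emoji: several code points joined by zero-width joiners
}
for label, text in samples.items():
    print(label, len(enc.encode(text)))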
Copy/paste surprises
Some editors normalize quotes or spaces; tokenizers “see” those differences. If counts look off, paste the exact text into a tokenizer to verify.
Practical tips to shorten prompts safely
Shorter prompts can be cheaper and faster - but only if you keep signal:
- Lead with structure, not prose. Declare the output shape first (“Return ONLY valid JSON matching this schema: …”). Structure compresses instructions and reduces meandering.
- Remove pleasantries and redundancy. “Please,” “thank you,” and repeated reminders add tokens without adding control.
- Name the audience once. Instead of repeating “for procurement managers” everywhere, set it once and keep only the constraints that truly matter.
- Prefer constraints over explanations. Replace long rationales with a short checklist of acceptance criteria.
- Use compact examples. One tight few-shot example beats five wordy ones.
- Minify JSON. Pretty-printed JSON wastes tokens on whitespace. Require compact output ({"a":1,"b":2}) unless humans need to read it - see the sketch right after this list.
- Externalize boilerplate. If the platform allows, put reusable style guides or glossaries in a cached prefix. (OpenAI)
- Cap the answer. Specify maximum length in words or tokens and what to do if constraints conflict (e.g., “truncate lower-priority sections”).
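The minification tip is easy to verify. This sketch compares pretty-printed and compact JSON for the same payload (the encoding name is an example):

# Pretty vs. minified JSON: same data, different token counts (encoding is an example)
import json
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
payload = {"summary": "Q3 revenue up 12%", "decisions": ["hire 2 engineers"], "actions": []}

pretty = json.dumps(payload, indent=2)
minified = json.dumps(payload, separators=(",", ":"))

print("pretty:", len(enc.encode(pretty)), "tokens")
print("minified:", len(enc.encode(minified)), "tokens")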
These habits line up with the way PractiqAI trains you - clear tasks, measurable criteria, compact context - so “short but strict” becomes second nature.
Tokenization pitfalls in code/JSON
Code and machine-readable outputs are fertile ground for accidental bloat:
- Pretty vs. minified JSON: Indentation and spaces are extra tokens. Ask for compact JSON in production pipelines.
- Base64 and hashes: Long, low-frequency strings tokenize poorly and explode counts. Avoid embedding them; attach as files or reference IDs instead.
- Ambiguous schemas: If you don’t pin keys and types, models drift into verbose prose. Declare schemas and require “JSON only.”
- Hidden whitespace: Tabs vs. spaces, Windows vs. Unix newlines - these can change tokenization.
- Fence leakage: If you ask for “code only,” but also include narrative instructions inside the fenced block, you pay twice: once for the instruction and again for the model to ignore it.
A tokenizer library like tiktoken makes it easy to test realistic payloads from your app before you push to production. (GitHub)
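For instance, a quick test like this shows how a base64 blob dwarfs a short reference ID. The encoding name is an example, and the blob is random data standing in for an embedded file:

# How deadweight strings inflate counts: a base64 blob vs. a short reference ID
import base64
import os
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
blob = base64.b64encode(os.urandom(3_000)).decode()  # ~4 KB of base64 text

print("base64 blob:", len(enc.encode(blob)), "tokens")
print("reference ID:", len(enc.encode("file_id: rpt-2024-0117")), "tokens")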
Quick checklist (before you hit “Send”)
You should be able to answer “yes” to most of these:
- Have I specified the output format up front (JSON schema, sections, or code-only)?
- Is the audience and tone declared once, not repeated everywhere?
- Are my acceptance criteria short and checkable?
- Did I minify machine-readable output and avoid deadweight strings (base64, UUID lists)?
- Did I reserve output tokens so the model has room to answer? (OpenAI Help Center)
- Did I confirm counts with a tokenizer tool or library? (OpenAI Platform)
Mini-exercises (optimize this prompt)
These are designed to practice meaning-preserving compression. Paste each “Before” into the OpenAI Tokenizer, then your “After,” and compare counts. Aim for –30% tokens with equal or better control. (OpenAI Platform)
Exercise 1 - Meeting recap
Before
You are a helpful assistant. Please write a concise summary of the following meeting notes that is easy to read for busy executives who do not have time to dig through details. The summary should be short, and it should also include a list of the key decisions and a list of action items with owners and dates. Keep it professional and friendly. Avoid jargon. Thank you.
Notes:
<notes here>

After
Task: Executive meeting recap.
Output (JSON only):
{"summary": "<=120 words, plain language>", "decisions": ["..."], "actions": [{"owner":"", "item":"", "due":"YYYY-MM-DD"}]}
Use only facts in Notes.
Notes: <notes here>

Exercise 2 - Bug report triage
Before
Please analyze this bug report text and help me determine severity, likely component, and a short reproduction plan. If there is missing information, politely ask a follow-up question first. Also provide a guess for how long this might take to fix in hours or days.

After
Classify bug → {severity, component, repro_steps, missing_info[], fix_eta_range}.
If critical facts are missing, return missing_info with 1–3 questions and do NOT guess.
Text: <bug report>

Exercise 3 - Research paragraph
Before
Write a paragraph that is educational for high school students, in a clear style with no complicated words, explaining why photosynthesis is important, and include two real-world examples and one misconception to avoid.After
Audience: high school students.
Write ≤120 words explaining why photosynthesis matters.
Include exactly 2 real examples and 1 common misconception (label it).
No figurative language.

Starter references
If you want to go deeper or need something to bookmark:
- OpenAI Tokenizer (interactive): visualize splits and counts. (OpenAI Platform)
- OpenAI help - token counting basics: rules of thumb, pricing concepts, limits, categories like cached/reasoning tokens. (OpenAI Help Center)
- tiktoken (OpenAI's BPE tokenizer): fast, production-grade counting and encoding APIs. (GitHub)
- Hugging Face tokenizers overview: how BPE, WordPiece, and SentencePiece differ, with concrete examples. (Hugging Face)
Connecting back to practice. Tokens are not an abstract math puzzle; they’re the operational constraint you manage on every task. PractiqAI’s task-and-judge loop is designed so you feel how structure and brevity improve results, reduce cost, and raise pass rates - skills you can carry straight into your daily work.
Where to go next
Pair this with our article “What Is a Prompt?” to align intent + structure with length + cost. Then jump into a PractiqAI course and practice trimming prompts without losing control.
Paweł Brzuszkiewicz
PractiqAI Team
PractiqAI designs guided drills and feedback loops that make learning with AI feel like muscle memory training. Follow along for product notes and workflow ideas from the team.
Ready to make AI practice part of your routine?
Explore interactive drills, daily streaks, and certification paths built by the PractiqAI team.
Explore courses