Chapter 5

Tokens & Tokenization

How text becomes numbers, pricing, and counting tokens.

Learning objectives

Define a token and explain why models do not use “words” directly
Estimate token counts for budgeting
Relate tokens to API pricing

Tokens are the meter on the pump

Models split text into tokens: chunks that might be a whole word, a fragment of a word, or a punctuation mark. English text averages roughly 4 characters per token, but the ratio varies with the content:

Text	Rough tokens
`Hello`	1
`workshopco.ca`	3–5
500-word FAQ page	~650–750
Full Book 1 chapter pasted in	Thousands — expensive
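
The ~4 characters per token rule of thumb can be turned into a quick estimator for budgeting. This is a minimal sketch, not a real tokenizer: the name `estimate_tokens` and the default ratio are assumptions for illustration, and real counts depend on the model's tokenizer.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    if not text:
        return 0
    # Any non-empty text costs at least one token.
    return max(1, round(len(text) / chars_per_token))


if __name__ == "__main__":
    for sample in ["Hello", "workshopco.ca"]:
        print(f"{sample!r}: ~{estimate_tokens(sample)} tokens")
```

Treat the result as a budgeting figure only. For anything billing-critical, count with the vendor's own tokenizer.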

Input vs output tokens

APIs bill input and output tokens separately:

Input tokens: system prompt + conversation history + user message + retrieved docs
Output tokens: the model’s reply (often priced higher per token)

Example pricing shape (illustrative — check vendor):
Input:  $0.15 / 1M tokens
Output: $0.60 / 1M tokens

One FAQ reply: 800 input + 200 output ≈ $0.00024 (about 0.02 cents)
1,000 chats/month ≈ $0.24, IF prompts stay small
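
The per-reply arithmetic above can be sketched as a small function. The function name and parameter names are hypothetical, and the default prices are the illustrative figures from this section, not any vendor's actual rates.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float = 0.15,   # illustrative, USD
    output_price_per_million: float = 0.60,  # illustrative, USD
) -> float:
    """Cost in USD of one request, billing input and output separately."""
    input_cost = input_tokens * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    per_reply = request_cost(800, 200)
    print(f"One FAQ reply: ${per_reply:.5f}")
    print(f"1,000 chats:   ${1000 * per_reply:.2f}")
```

Note that the 200 output tokens cost as much as the 800 input tokens here, because the illustrative output price is four times the input price.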

Worked example — Workshop Co. monthly estimate

Assumptions: 200 FAQ chats/month, 1,200 input + 300 output tokens each.

Input:  200 × 1,200 = 240,000 tokens
Output: 200 × 300   =  60,000 tokens

At $0.15 / $0.60 per 1M (illustrative):
  Input cost  ≈ $0.036
  Output cost ≈ $0.036
  Total API   ≈ $0.07/month + engineering time

The bill spikes when someone pastes entire log files into the chat widget.
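
The Workshop Co. estimate, and the effect of pasted logs on it, can be reproduced with a short script. This is a sketch under stated assumptions: `Pricing` and `monthly_cost` are hypothetical names, the prices are the illustrative ones used above, and the 12,500-token log size assumes a 50 KB paste at ~4 characters per token.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    input_per_million: float = 0.15   # illustrative, USD
    output_per_million: float = 0.60  # illustrative, USD


def monthly_cost(chats: int, input_tokens: int, output_tokens: int,
                 pricing: Pricing = Pricing()) -> float:
    """Monthly API cost in USD for `chats` requests of a fixed size."""
    total_input = chats * input_tokens
    total_output = chats * output_tokens
    return (total_input * pricing.input_per_million
            + total_output * pricing.output_per_million) / 1_000_000


if __name__ == "__main__":
    baseline = monthly_cost(200, 1_200, 300)
    # Hypothetical spike: every chat also carries a pasted ~12,500-token log.
    spike = monthly_cost(200, 1_200 + 12_500, 300)
    print(f"Baseline:         ${baseline:.3f}/month")
    print(f"With pasted logs: ${spike:.3f}/month")
```

Under these assumptions the baseline is about $0.07 a month, and the pasted-log scenario is roughly six times that, all of it from input tokens.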

Token blowups

Pasting 50 KB of nginx logs → roughly 12,000+ tokens in a single message (at ~4 characters per token)
Resending the full conversation history on every turn → use summarization or window limits
Repeating a huge system prompt every turn → use prompt caching where the vendor supports it
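
A window limit can be as simple as keeping only the newest messages that fit a token budget. The sketch below is one possible approach, not a vendor feature: `trim_history` and `rough_count` are hypothetical helpers, and the counter is the ~4 characters per token heuristic rather than a real tokenizer.

```python
from typing import Callable


def rough_count(text: str) -> int:
    """Heuristic stand-in for a tokenizer: ~4 characters per token."""
    return max(1, round(len(text) / 4)) if text else 0


def trim_history(
    messages: list[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = rough_count,
) -> list[str]:
    """Keep the most recent messages whose combined estimate fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Pass a real tokenizer's counting function as `count_tokens` when one is available. Summarizing the dropped messages instead of discarding them is the other option named above and is not shown here.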

Try it yourself

Use a tokenizer tool (for example, the OpenAI Tokenizer or a Hugging Face tokenizer) on Workshop Co.’s system prompt draft:

“You are Workshop Co.’s FAQ assistant. Answer only from the provided class schedule JSON. Never invent dates. If unsure, say contact support@workshopco.ca.”

Count the tokens. Then add a 400-token JSON schedule. What is the total for one request?

Ballpark

System prompt ~45–60 tokens + JSON ~400 + user question ~20 ≈ 465–480, so roughly 500 input tokens per turn before history.
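
To see what "before history" leaves out, the ballpark can be extended to later turns of a conversation where every earlier question and reply is resent. This is a sketch only: the function name is hypothetical, and the 200-token reply size is an assumption that is not part of the exercise.

```python
def input_tokens_at_turn(
    turn: int,
    system_tokens: int = 50,     # within the ~45-60 ballpark
    schedule_tokens: int = 400,  # JSON schedule from the exercise
    question_tokens: int = 20,   # typical user question
    reply_tokens: int = 200,     # assumed reply size, not from the exercise
) -> int:
    """Input tokens sent on a given turn (1-indexed) when all history is resent."""
    if turn < 1:
        raise ValueError("turn must be >= 1")
    fixed = system_tokens + schedule_tokens
    history = (turn - 1) * (question_tokens + reply_tokens)
    return fixed + history + question_tokens


if __name__ == "__main__":
    for turn in (1, 5, 10):
        print(f"Turn {turn}: ~{input_tokens_at_turn(turn)} input tokens")
```

With these numbers the first turn sends about 470 input tokens, and each later turn adds another 220 of resent history.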

Quick quiz

Why is code often more token-heavy than prose?

Answer

Symbols, indentation, and rare identifiers split into many tokens; long URLs and base64 explode counts.