Chapter 3

Tokens & Tokenization

How text becomes numbers, how tokens drive pricing, and how to count them.

Learning objectives

  • Define a token and explain why models do not use “words” directly
  • Estimate token counts for budgeting
  • Relate tokens to API pricing

Tokens are the meter on the pump

Models split text into tokens, which are chunks that might be a whole word, part of a word, or a punctuation mark. English averages ~4 characters per token, but the ratio varies with the text:

Text                             Rough tokens
Hello                            1
workshopco.ca                    3–5
500-word FAQ page                ~650–750
Full Book 1 chapter pasted in    Thousands (expensive)
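
For quick budgeting, the ~4 characters per token rule of thumb can be turned into a small estimator. This is a minimal sketch, not a real tokenizer: the function name `estimate_tokens` and its `chars_per_token` parameter are made up for illustration, and actual counts depend on the model's tokenizer.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count; NOT a real tokenizer."""
    if not text:
        return 0
    # Any non-empty text costs at least one token.
    return max(1, round(len(text) / chars_per_token))


for sample in ["Hello", "workshopco.ca"]:
    print(f"{sample!r}: ~{estimate_tokens(sample)} tokens")
```

Use an estimate like this only for rough planning. Before committing to a budget, check the real count with the vendor's tokenizer tool.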

Input vs output tokens

APIs bill separately:

  • Input tokens: system prompt + conversation history + user message + retrieved docs
  • Output tokens: the model’s reply (often costs more per token)

Example pricing shape (illustrative; check your vendor’s current rates):
Input:  $0.15 / 1M tokens
Output: $0.60 / 1M tokens

One FAQ reply: 800 input + 200 output ≈ $0.00024 (a fraction of a cent)
1,000 chats/month ≈ $0.24, still well under a dollar IF prompts stay small

Worked example — Workshop Co. monthly estimate

Assumptions: 200 FAQ chats/month, 1,200 input + 300 output tokens each.

Input:  200 × 1,200 = 240,000 tokens
Output: 200 × 300   =  60,000 tokens

At $0.15 / $0.60 per 1M (illustrative):
  Input cost  ≈ $0.036
  Output cost ≈ $0.036
  Total API   ≈ $0.07/month + engineering time
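
The arithmetic above can be checked with a few lines of Python. This is a sketch under the same illustrative prices. The function name `api_cost` and the constant names are hypothetical, and real vendor rates will differ.

```python
INPUT_PRICE_PER_M = 0.15   # USD per 1M input tokens (illustrative, not a real quote)
OUTPUT_PRICE_PER_M = 0.60  # USD per 1M output tokens (illustrative, not a real quote)


def api_cost(input_tokens: int, output_tokens: int,
             input_price: float = INPUT_PRICE_PER_M,
             output_price: float = OUTPUT_PRICE_PER_M) -> float:
    """Return the cost in USD for the given input and output token counts."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


chats = 200  # FAQ chats per month, from the assumptions above
monthly = api_cost(chats * 1_200, chats * 300)
print(f"Monthly API cost: ${monthly:.3f}")
```

Changing `chats` or the per-chat token counts shows how quickly the total moves when prompts grow, which is the real risk at these prices.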

The bill spikes when someone pastes entire log files into the chat widget.

Token blowups

  • Pasting 50 KB of nginx logs → roughly 12,000+ tokens per message at ~4 characters per token
  • Resending the full conversation history on every turn → use summarization or window limits
  • Repeating a huge system prompt every turn → use prompt caching where the vendor supports it
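
A window limit can be as simple as keeping only the newest messages that fit a token budget. The sketch below is one possible approach, not a vendor feature: `trim_history` and `rough_tokens` are hypothetical helpers, the sample messages are invented, and the counter is the ~4 characters per token heuristic rather than a real tokenizer.

```python
from typing import Callable, List


def rough_tokens(text: str) -> int:
    """Heuristic stand-in for a tokenizer: ~4 characters per token."""
    return max(1, round(len(text) / 4)) if text else 0


def trim_history(messages: List[str], max_tokens: int,
                 count: Callable[[str], int] = rough_tokens) -> List[str]:
    """Keep the most recent messages whose combined size fits max_tokens."""
    kept: List[str] = []
    used = 0
    # Walk backwards so the newest messages are considered first.
    for message in reversed(messages):
        cost = count(message)
        if used + cost > max_tokens:
            break  # stop at the first message that does not fit
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# Invented sample: one huge pasted blob followed by two short messages.
history = ["x" * 4000, "What time is the pottery class?", "It starts at 6 pm."]
print(trim_history(history, max_tokens=50))
```

Stopping at the first message that does not fit keeps the window contiguous, so the model never sees a conversation with a hole in the middle. Pass a real tokenizer's counting function as `count` when accuracy matters.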

Try it yourself

Use a tokenizer tool (OpenAI Tokenizer, Hugging Face) on Workshop Co.’s system prompt draft:

“You are Workshop Co.’s FAQ assistant. Answer only from the provided class schedule JSON. Never invent dates. If unsure, say contact support@workshopco.ca.”

Count the tokens. Then add a 400-token JSON schedule. What is the total for one request?

Ballpark

System prompt ~45–60 tokens + JSON ~400 + user question ~20 ≈ 465–480, so budget ~500 input tokens per turn before history.
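
The "before history" caveat matters because resending every earlier turn makes the input grow with each exchange. The sketch below shows that growth under stated assumptions: about 450 fixed tokens (system prompt plus JSON), 20 tokens per question, and 200 tokens per reply. The reply size is an assumption, and the function name `input_tokens_for_turn` is hypothetical.

```python
def input_tokens_for_turn(turn: int, fixed: int = 450, question: int = 20,
                          reply: int = 200) -> int:
    """Input tokens on a 1-indexed turn when all earlier turns are resent."""
    if turn < 1:
        raise ValueError("turn must be >= 1")
    # Every earlier turn contributes its question and its reply to the history.
    history = (turn - 1) * (question + reply)
    return fixed + question + history


for turn in (1, 5, 10):
    print(f"turn {turn}: {input_tokens_for_turn(turn)} input tokens")
```

Under these assumptions the first turn costs 470 input tokens and the tenth costs 2,450, which is why long chats need summarization or a window limit.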

Quick quiz

Why is code often more token-heavy than prose?

Answer

Symbols, indentation, and rare identifiers split into many tokens; long URLs and base64 explode counts.