Chapter 6

Context Length & Memory

Context windows, truncation, RAG, and staying within limits.

Learning objectives

Define context window and what happens when you exceed it
Apply strategies: truncate, summarize, RAG
Design Workshop Co. FAQ within a 128k window realistically

Context length = working memory limit

The context window is the maximum tokens the model can consider in one request (input + output combined on many APIs). Examples:

Model tier (examples)	Context window
Smaller / older chat models	8k–32k tokens
Current flagship APIs	128k–200k+ tokens
Self-hosted 7B on 16 GB GPU	Often 4k–8k practical

Bigger is not free

Larger context costs more per request and can reduce focus — “lost in the middle” effect where mid-document facts get ignored.

What happens when you overflow

Hard error — API rejects request (400)
Truncation — provider drops oldest messages (dangerous for support bots)
Summarize-then-continue — your code compresses history

RAG — Retrieval Augmented Generation

Instead of stuffing entire manuals into context:

Embed

Convert FAQ chunks to vectors (offline)

Retrieve

User question → top 3 relevant chunks only

Generate

LLM answers using those chunks as source

Worked example — Workshop Co. class FAQ

System: You answer from SOURCES only. Cite source id.

SOURCES:
[id:intro] Intro to Woodworking — Sat Mar 14, 9am, 6 seats, $189
[id:box] Box Joint — Sat Mar 21, sold out

User: Is there anything March 14 for beginners?

→ Model should cite [id:intro], not invent Apr dates

Try it yourself

Marcus wants to paste all 7 Swift Host textbooks into one support bot. Roughly 250k words. What should you tell him?

Answer

Do not paste wholesale — exceeds practical context and wastes tokens. Use RAG: chunk by chapter, embed, retrieve 2–5 relevant sections per question. Link to full chapters on site for humans.

Scenario

Support chat keeps “forgetting” the customer’s name after 30 messages. Likely cause?

Answer

Context truncation — early messages dropped when history exceeds window. Fix: rolling summary, CRM lookup by session ID, or store name server-side and re-inject in system prompt.