Context Length & Memory
Context windows, truncation, RAG, and staying within limits.
Learning objectives
- Define context window and what happens when you exceed it
- Apply strategies: truncate, summarize, RAG
- Design Workshop Co. FAQ within a 128k window realistically
Context length = working memory limit
The context window is the maximum tokens the model can consider in one request (input + output combined on many APIs). Examples:
| Model tier (examples) | Context window |
|---|---|
| Smaller / older chat models | 8k–32k tokens |
| Current flagship APIs | 128k–200k+ tokens |
| Self-hosted 7B on 16 GB GPU | Often 4k–8k practical |
Larger context costs more per request and can reduce focus — “lost in the middle” effect where mid-document facts get ignored.
What happens when you overflow
- Hard error — API rejects request (400)
- Truncation — provider drops oldest messages (dangerous for support bots)
- Summarize-then-continue — your code compresses history
RAG — Retrieval Augmented Generation
Instead of stuffing entire manuals into context:
Convert FAQ chunks to vectors (offline)
User question → top 3 relevant chunks only
LLM answers using those chunks as source
Worked example — Workshop Co. class FAQ
System: You answer from SOURCES only. Cite source id.
SOURCES:
[id:intro] Intro to Woodworking — Sat Mar 14, 9am, 6 seats, $189
[id:box] Box Joint — Sat Mar 21, sold out
User: Is there anything March 14 for beginners?
→ Model should cite [id:intro], not invent Apr dates
Try it yourself
Marcus wants to paste all 7 Swift Host textbooks into one support bot. Roughly 250k words. What should you tell him?
Answer
Do not paste wholesale — exceeds practical context and wastes tokens. Use RAG: chunk by chapter, embed, retrieve 2–5 relevant sections per question. Link to full chapters on site for humans.
Scenario
Support chat keeps “forgetting” the customer’s name after 30 messages. Likely cause?
Answer
Context truncation — early messages dropped when history exceeds window. Fix: rolling summary, CRM lookup by session ID, or store name server-side and re-inject in system prompt.