What Are LLMs?
Predictive text at scale — models, training, inference.
Learning objectives
- Explain LLM in one sentence a manager understands
- Distinguish training, fine-tuning, and inference
- Know what an LLM cannot reliably do
LLM = autocomplete at industrial scale
A Large Language Model (LLM) reads your text and predicts the most likely next token (piece of a word), again and again, until it finishes a reply. It has no persistent memory unless you send history in each API call.
ChatGPT, Claude, Gemini, Llama, Mistral — all are LLM products. “GPT-4” is a model name; “ChatGPT” is the chat app built on top.
Three phases
| Phase | Who runs it | Cost |
|---|---|---|
| Training | Vendor (months, thousands of GPUs) | Millions — not your job |
| Fine-tuning | Optional — custom style on your docs | $$ — specialist task |
| Inference | You — each API call or self-hosted GPU | Per token — your budget |
What LLMs are good / bad at
Summarizing, drafting, classifying intent, explaining docs, code snippets from examples
Math without tools, live facts after training cutoff, guaranteed truth, secrets you should not share
Worked example — FAQ without hallucination risk
Workshop Co. should not ask the model “when is the next class?” from memory. Instead:
- API fetches class list from database (ground truth)
- LLM formats answer in friendly tone from that JSON only
This pattern — retrieve facts, then generate prose — is the foundation of RAG (Chapter 6).
Try it yourself
List three Workshop Co. tasks: (1) safe for LLM alone, (2) needs database first, (3) should never use LLM.
Sample
- Safe alone: rewrite class description for social media
- Needs DB: “seats left in March intro class”
- Never: legal waiver interpretation, medical advice, storing credit cards in prompts
Quick quiz
Does the model “know” your server IP unless you paste it in the prompt?
Answer
No. It only sees what you send in that request (plus any retrieval/RAG you attach). Treat every prompt as potentially logged by the provider.