Chapter 4

What Are LLMs?

Predictive text at scale — models, training, inference.

Learning objectives

Explain LLM in one sentence a manager understands
Distinguish training, fine-tuning, and inference
Know what an LLM cannot reliably do

LLM = autocomplete at industrial scale

A Large Language Model (LLM) reads your text and predicts the most likely next token (piece of a word), again and again, until it finishes a reply. It has no persistent memory unless you send history in each API call.

Plain definition

ChatGPT, Claude, Gemini, Llama, Mistral — all are LLM products. “GPT-4” is a model name; “ChatGPT” is the chat app built on top.

Three phases

Phase	Who runs it	Cost
Training	Vendor (months, thousands of GPUs)	Millions — not your job
Fine-tuning	Optional — custom style on your docs	$$ — specialist task
Inference	You — each API call or self-hosted GPU	Per token — your budget

What LLMs are good / bad at

Good at

Summarizing, drafting, classifying intent, explaining docs, code snippets from examples

Weak at

Math without tools, live facts after training cutoff, guaranteed truth, secrets you should not share

Worked example — FAQ without hallucination risk

Workshop Co. should not ask the model “when is the next class?” from memory. Instead:

API fetches class list from database (ground truth)
LLM formats answer in friendly tone from that JSON only

This pattern — retrieve facts, then generate prose — is the foundation of RAG (Chapter 6).

Try it yourself

List three Workshop Co. tasks: (1) safe for LLM alone, (2) needs database first, (3) should never use LLM.

Sample

Safe alone: rewrite class description for social media
Needs DB: “seats left in March intro class”
Never: legal waiver interpretation, medical advice, storing credit cards in prompts

Quick quiz

Does the model “know” your server IP unless you paste it in the prompt?

Answer

No. It only sees what you send in that request (plus any retrieval/RAG you attach). Treat every prompt as potentially logged by the provider.