What is an LLM?
A large language model (LLM) predicts the next token, roughly the next chunk of text, given everything before it, and generates output by repeating that prediction one token at a time. Trained on a very large corpus of text, it learns patterns of language and knowledge well enough to answer questions, summarize, and write. Understanding tokens and context is the key to using it well.
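The generate-one-token-then-repeat loop can be sketched with a toy model. This is a minimal illustration, not how a real LLM is built: it assumes whitespace-separated words stand in for tokens, and it uses a bigram count table that looks only at the previous token, whereas a real model conditions on the whole context. The names `train_bigram` and `generate` are hypothetical.

```python
import random
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, which tokens followed it in the training text."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, prompt, max_new_tokens, rng):
    """Append one predicted token at a time, feeding each output back in."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:  # no observed continuation: stop early
            break
        choices, weights = zip(*followers.items())
        # Sample the next token in proportion to how often it followed the last one.
        out.append(rng.choices(choices, weights=weights, k=1)[0])
    return out

# Toy training text; words stand in for tokens (an assumption for illustration).
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
print(" ".join(generate(model, ["the"], 6, random.Random(0))))
```

Every word the loop emits is one that followed the previous word somewhere in the training text, which is why the output looks fluent without being checked against anything.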
Why it matters
LLMs are reshaping software, and "AI engineer" roles increasingly mean building on them. Even classical ML practitioners need to understand them. Knowing how they actually work, as prediction over tokens within a context window, dispels the magic and explains why they hallucinate (they generate plausible text, not checked facts), forget (the context window is finite), and cost what they do (usage is typically billed per token).
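The "forget" and "cost" points can be made concrete with a small sketch. It assumes words stand in for tokens, that a conversation is truncated by keeping only the most recent tokens, and that billing is linear in token counts. The function names and the prices are hypothetical, chosen only for illustration.

```python
def fit_to_window(tokens, window_size):
    """Keep only the most recent tokens that fit; older ones are dropped."""
    return tokens[-window_size:] if window_size > 0 else []

def estimate_cost(n_input_tokens, n_output_tokens, price_in_per_1k, price_out_per_1k):
    """Token-based billing: cost grows with tokens read plus tokens written."""
    return (n_input_tokens / 1000) * price_in_per_1k + (n_output_tokens / 1000) * price_out_per_1k

# Words stand in for tokens here (an assumption for illustration).
history = "my name is Ada . what is the capital of France ? and what is my name ?".split()
visible = fit_to_window(history, 8)
print(visible)
print("Ada" in visible)  # False: the name fell outside the window

# Hypothetical prices per 1,000 tokens, not taken from any real provider.
print(estimate_cost(2000, 500, price_in_per_1k=0.5, price_out_per_1k=1.5))
```

With an 8-token window the model never sees the name it is later asked about, so any answer it gives is a guess.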
What to learn
- Tokens and tokenization
- Next-token prediction as the core mechanism
- The context window and its limits
- Temperature and sampling
- Why models hallucinate
- The transformer at a high level
- Capabilities versus reliability
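For the temperature and sampling item, the following is a minimal sketch of temperature-scaled softmax sampling. The vocabulary and the raw scores (logits) are made up for illustration, and the function names are hypothetical. Real APIs often layer further options such as top-p or top-k on top of this.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; greedy decoding (argmax) is the limiting case")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(vocab, logits, temperature, rng):
    """Draw one token according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Made-up candidates and scores for the next token after "The capital of France is".
vocab = ["Paris", "Lyon", "London"]
logits = [4.0, 2.0, 1.0]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
print(sample_token(vocab, logits, 1.0, random.Random(0)))
```

At low temperature nearly all probability sits on the top-scoring token, so output is close to deterministic. At high temperature the distribution flattens and less likely tokens are sampled more often.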
Common pitfall
Treating an LLM as a database of facts. It generates plausible text, not verified truth, so it can state wrong things with total confidence, a failure known as hallucination. Use LLMs for language tasks and for reasoning over context you provide, verify any factual claim, and ground answers in real data (the RAG node) when accuracy matters.
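Grounding can be sketched as two steps: pick passages relevant to the question, then build a prompt that tells the model to answer only from them. This is a rough illustration under stated assumptions. The word-overlap ranking is a crude stand-in for real search, the documents are invented, and `retrieve` and `build_grounded_prompt` are hypothetical names rather than any library's API.

```python
def retrieve(question, documents, top_k=2):
    """Rank documents by how many question words they share (a crude stand-in for real search)."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_grounded_prompt(question, passages):
    """Put the retrieved passages in the prompt and restrict the answer to them."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

# Invented documents, for illustration only.
documents = [
    "The Acme widget was released in 2019.",
    "Bananas are rich in potassium.",
    "Acme is headquartered in Oslo.",
]
question = "When was the Acme widget released?"
passages = retrieve(question, documents)
print(build_grounded_prompt(question, passages))
```

The resulting string is what you would send to the model. The answer can then be checked against the numbered passages instead of being taken on trust.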
Resources
Primary (free):
- Andrej Karpathy — Intro to LLMs · video
- Hugging Face — LLM course · course
- Jay Alammar — The illustrated transformer · article
Practice
Use a tokenizer tool to see how a sentence breaks into tokens, then call an LLM API with the same prompt at two different temperatures (one low, one high) and compare the outputs. Then deliberately ask something the model is likely to get wrong and observe a confident hallucination. Done when you can explain tokens, context, and why the model hallucinated.
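Before trying a real tokenizer tool, it may help to see the idea in miniature. The sketch below is a simplification: it assumes a tiny hand-written vocabulary and splits text by greedy longest match, whereas production tokenizers learn their vocabularies from data (for example with byte-pair encoding). The `tokenize` function and the vocabulary are hypothetical.

```python
def tokenize(text, vocab):
    """Greedy longest-match: at each position take the longest vocabulary piece."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary piece starts here: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens

# Hand-written toy vocabulary (an assumption for illustration).
vocab = {"token", "ization", "un", "believ", "able", " ", "is"}
print(tokenize("tokenization is unbelievable", vocab))
# ['token', 'ization', ' ', 'is', ' ', 'un', 'believ', 'able']
```

Three words become eight tokens, which is why token counts, not word counts, determine how much fits in the context window.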
Outcomes
- Explain tokens, next-token prediction, and context windows.
- Adjust temperature and sampling for a task.
- Explain why LLMs hallucinate.
- Decide when an LLM is and is not the right tool.