Large language models (LLMs)
Models trained to predict the next token of text, which turns out to be a powerful way to read, write, and reason over language.
What it is
- An LLM is a neural network trained on large amounts of text to predict the next token (a word or word-piece) given what came before.
- That simple objective, at scale, produces models that can summarize, translate, answer questions, write code, and follow instructions.
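The next-token objective can be illustrated with a deliberately tiny stand-in. The sketch below is not how an LLM is built: it is a bigram counter that only looks one token back, and the corpus and function names are made up for illustration. It shows the shape of the task: given what came before, produce a probability for each possible next token.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training text."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Turn the follower counts for `prev` into a probability distribution."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # `prev` was never seen with a follower
    return {tok: n / total for tok, n in followers.items()}

def predict_next(counts, prev):
    """Greedy choice: the most likely next token, or None if unknown."""
    probs = next_token_probs(counts, prev)
    return max(probs, key=probs.get) if probs else None

# Hypothetical toy corpus, split on whitespace instead of a real tokenizer.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

print(next_token_probs(model, "the"))
print(predict_next(model, "the"))
```

An LLM replaces the count table with a neural network that conditions on the whole preceding context, but the output has the same form: a distribution over the next token.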
How it works
- Most modern LLMs are built on the Transformer architecture, whose attention mechanism weighs how strongly each token relates to the other tokens in the context.
- They are pretrained on broad text, then often fine-tuned and aligned (e.g. with human feedback) to follow instructions helpfully and safely.
- At inference time, the model is given a prompt and generates a response one token at a time, with each new token conditioned on the prompt and on the tokens generated so far.
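As a rough illustration of the attention step, here is a minimal sketch of scaled dot-product attention for a single query, in plain Python. The vectors are made-up numbers. Real Transformers add learned projections, multiple heads, and masking, all of which are omitted here.

```python
import math

def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Each key is scored against the query, the scores become weights via
    softmax, and the output is the weighted average of the value vectors.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Hypothetical 2-dimensional vectors for three context tokens.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

weights, output = attend(query, keys, values)
print(weights)  # tokens whose keys align with the query get more weight
print(output)
```

The second token's key is orthogonal to the query, so it receives the smallest weight and contributes least to the output.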
Trade-offs
- Strong general capability, but no built-in notion of truth: output can be fluent and still wrong.
- Bigger models are generally more capable but cost more to run; smaller models can be cheaper and faster for narrow tasks.
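One concrete piece of the cost side is the memory needed just to hold the weights, which is roughly the parameter count times the bytes per parameter. The sketch below works through that arithmetic. The model sizes are hypothetical round numbers, and the estimate ignores activations, caches, and runtime overhead, so treat it as a lower bound.

```python
# Bytes used to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(num_params, dtype="fp16"):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# Hypothetical model sizes, chosen only to show how the cost scales.
sizes = [
    ("small", 1_000_000_000),
    ("medium", 7_000_000_000),
    ("large", 70_000_000_000),
]

for name, params in sizes:
    print(f"{name}: about {weight_memory_gb(params):.0f} GB of weights in fp16")
```

Under these assumptions, a model with ten times the parameters needs ten times the weight memory, which is one reason a smaller model can be the more practical choice for a narrow task.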
When to use it
- Tasks involving understanding or generating natural language, code, or structured text.
- Cases where flexibility matters more than guaranteed correctness, provided the output is checked before it is relied on.
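A check around the output can be as simple as refusing anything that does not match the expected shape. The sketch below assumes the model was asked to reply with a JSON object. The schema (`title`, `priority`) and the function name are hypothetical, and the raw strings stand in for model responses.

```python
import json

REQUIRED_KEYS = {"title", "priority"}           # hypothetical schema
ALLOWED_PRIORITIES = {"low", "medium", "high"}  # hypothetical allowed values

def check_model_output(raw_text):
    """Return (parsed_dict, None) if the output passes, else (None, reason)."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as err:
        return None, f"not valid JSON: {err.msg}"
    if not isinstance(data, dict):
        return None, "expected a JSON object"
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        return None, f"missing keys: {sorted(missing)}"
    priority = data["priority"]
    if not isinstance(priority, str) or priority not in ALLOWED_PRIORITIES:
        return None, f"unexpected priority: {priority!r}"
    return data, None

# Stand-ins for model responses: one well-formed, one chatty and unusable.
good = '{"title": "Fix login bug", "priority": "high"}'
bad = "Sure! Here is the JSON you asked for."

print(check_model_output(good))
print(check_model_output(bad))
```

A check like this validates structure only. Whether the content is true still needs a separate verification step, such as comparing against a trusted source or having a person review it.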
Common pitfalls
- Treating output as fact without verification.
- Reaching for the biggest model by default when a smaller one would do.