LLM (large language model)
What is an LLM (large language model)?
Definition
A large language model (LLM) is a neural network trained on vast amounts of text to predict the next token in a sequence. By doing this at scale, it learns to generate, summarize, translate and reason over natural language — and it powers chatbots, AI agents and most modern AI features.
A large language model (LLM) is the engine behind today's AI boom — the technology inside ChatGPT, Claude and Gemini. It is a neural network with billions of parameters, trained to predict text, that turns out to be remarkably good at language tasks.
How an LLM works
An LLM is trained on a huge corpus of text with one deceptively simple objective: predict the next token (a word or word-piece) given everything before it. Repeated across trillions of tokens, this teaches the model grammar, facts, styles and patterns of reasoning. At inference time it generates text one token at a time, appending each predicted token to the input before predicting the next.
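The predict-then-append loop can be illustrated with a toy model. The sketch below is not how a real LLM is built: it swaps the neural network for simple bigram counts (which word most often follows the previous word) and uses whitespace splitting as a stand-in tokenizer. The names `train_bigram` and `generate` and the tiny corpus are made up for illustration. What it does share with an LLM is the shape of the generation loop.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each token, which tokens follow it in the training text."""
    tokens = text.split()  # toy tokenizer (assumption): whitespace-separated words
    follow = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follow[prev][nxt] += 1
    return follow

def generate(follow, start, max_new_tokens=5):
    """Greedy autoregressive loop: each predicted token becomes the next input."""
    out = [start]
    for _ in range(max_new_tokens):
        candidates = follow.get(out[-1])
        if not candidates:  # token never seen with a successor: stop
            break
        out.append(candidates.most_common(1)[0][0])  # pick the most frequent follower
    return " ".join(out)

# Hypothetical training text, chosen so that every prediction has a clear winner.
corpus = "the cat sat on the mat . the cat sat on the sofa ."
model = train_bigram(corpus)
print(generate(model, "the", max_new_tokens=4))
```

A real LLM conditions on the whole preceding context rather than only the last token, and usually samples from a probability distribution instead of always taking the top candidate.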
Tokens & context window
LLMs don't read characters or words directly. They read tokens: chunks that average roughly 3–4 characters of English text, with the exact split depending on the model's tokenizer. The context window is how many tokens the model can consider at once, counting both the prompt and the output it generates (its working memory). A bigger window lets it handle longer documents, but every token costs compute and money, which is why cost control matters in production.
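The budgeting arithmetic can be sketched in a few lines. Everything here is an assumption for illustration: the 4-characters-per-token ratio is only a rule of thumb for English, and the context window size and per-million-token prices are placeholder parameters, not the figures of any real model or provider. For exact counts, use the tokenizer that belongs to the model in question.

```python
import math

CHARS_PER_TOKEN = 4  # assumption: rough average for English; varies by tokenizer and language

def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (not a real tokenizer)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def fits_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Prompt and generated output share the same context window."""
    return prompt_tokens + max_output_tokens <= context_window

def estimate_cost(prompt_tokens: int, output_tokens: int,
                  usd_per_million_input: float, usd_per_million_output: float) -> float:
    """Cost in USD, given hypothetical per-million-token prices."""
    return (prompt_tokens * usd_per_million_input
            + output_tokens * usd_per_million_output) / 1_000_000

document = "word " * 2000  # 10,000 characters of placeholder text
prompt_tokens = estimate_tokens(document)
print(prompt_tokens, fits_context(prompt_tokens, 1000, 8000))
```

Running an estimate like this before each request is one simple way to decide whether a document must be truncated, chunked or summarized first.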
Limits & hallucinations
Because an LLM predicts plausible text rather than looking up facts, it can "hallucinate": produce confident but wrong answers. Its knowledge is also frozen at training time, so it knows nothing about later events or about your private data. Both limits can be mitigated, though not eliminated, by retrieval-augmented generation (RAG), which retrieves real, current data at query time and places it in the prompt to ground the model's answers.
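The retrieve-then-ground pattern might look roughly like the sketch below. It is a simplification under stated assumptions: retrieval here ranks documents by plain word overlap instead of the embedding similarity search a production RAG system would typically use, the documents are invented, and the final call to a model is left out. The function names `retrieve` and `build_prompt` are hypothetical.

```python
import re

def words(text):
    """Lowercased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, k=2):
    """Rank documents by word overlap with the question (stand-in for vector search)."""
    q = words(question)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents, k=2):
    """Put the retrieved passages in the prompt so the answer can be grounded in them."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you don't know.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

# Invented example documents.
docs = [
    "The support desk is open Monday to Friday, 9:00 to 17:00.",
    "Refunds are processed within 14 days of the return.",
    "The office cafeteria serves lunch at noon.",
]
print(build_prompt("How long do refunds take?", docs, k=1))
```

The resulting prompt would then be sent to the LLM. The instruction to admit when the context lacks the answer is one common way to discourage the model from falling back on guesses.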
Where LLMs are used
An LLM on its own answers prompts. Wrapped with tools and a goal it becomes an AI agent; connected to your data it powers search, drafting, classification and support. The model is the engine — the value comes from how you wire it into real workflows.
Summary
An LLM is a next-token predictor trained at massive scale. Understanding tokens, the context window and hallucination is the foundation for using it responsibly — and for building production systems that stay accurate and affordable.
Frequently asked questions
What is the difference between an LLM and AI?
AI is the broad field; an LLM is one specific kind of AI specialized in language. Most of what people call "AI" today — chatbots, writing assistants, AI agents — is built on top of LLMs.
Why do LLMs make mistakes?
An LLM generates statistically likely text rather than retrieving verified facts, so it can produce confident but incorrect answers. Grounding it with retrieval (RAG) and validating its output reduces the risk, though it does not remove it entirely.
More from the Wiki-Lexikon
What is an AI agent?
An AI agent is software that uses a language model to plan and act toward a goal — calling tools, making decisions and running multi-step tasks autonomously. Definition, how it works and examples.
What is RAG (retrieval-augmented generation)?
RAG (retrieval-augmented generation) feeds an LLM relevant, current data at query time so its answers are grounded in your facts — not just its training. Definition, how it works and why it matters.
What is prompt engineering?
Prompt engineering is the craft of writing instructions that get reliable, accurate results from an LLM. Definition, core techniques (context, examples, structure) and why it matters in production.
What is a vector database (and embeddings)?
A vector database stores text as embeddings — numeric vectors of meaning — so AI can search by similarity, not just keywords. Definition, how embeddings work and why vector search powers RAG.