AI & Tech · noun

Hallucination

/həˌluːsɪˈneɪʃən/

When an LLM outputs information that's plausible-sounding but wrong.

Definition

A hallucination is output from a large language model (LLM) that is fluent and confident-sounding but factually incorrect — invented citations, fabricated numbers, plausible-but-wrong claims, or misattributed quotes.

Hallucinations are a structural feature of how LLMs work, not a bug to be patched. Models predict the next plausible token, and plausibility correlates with truth without guaranteeing it. A model that confidently cites a non-existent court case isn't broken; it is doing exactly what it was trained to do, generating fluent text in the shape of a citation. Improvements in training, prompting, and tool use reduce hallucination but don't eliminate it.
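
A toy illustration of "plausible is not the same as true", not how any real model is implemented. The hypothetical table `NEXT_TOKEN_PROBS` stands in for a model's learned next-token distribution, and its probabilities are invented for this example. Greedy decoding simply picks the most probable continuation, and nothing in that step checks the answer against the world.

```python
# Toy "language model": for one context, a probability distribution over the
# next token. The numbers are invented for illustration only.
NEXT_TOKEN_PROBS = {
    ("the", "capital", "of", "australia", "is"): {
        "sydney": 0.55,     # assumed to co-occur with "australia" more often in text
        "canberra": 0.40,   # the factually correct answer
        "melbourne": 0.05,
    },
}

def greedy_next_token(context: tuple[str, ...]) -> str:
    """Return the single most probable next token for a known context."""
    distribution = NEXT_TOKEN_PROBS[context]
    return max(distribution, key=distribution.get)

prompt = ("the", "capital", "of", "australia", "is")
print(" ".join(prompt), greedy_next_token(prompt))
```

The decoder outputs "sydney": the most probable token under the (made-up) distribution, and wrong. The model is working as designed; the design just optimises for likelihood, not truth.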

The mitigation playbook is well understood. Retrieval-augmented generation (RAG) grounds answers in real documents. Tool use lets the model query authoritative sources instead of recalling facts from its training data. Confidence thresholds let the system abstain or escalate when the model is uncertain, and citation requirements make every claim checkable against a source. Production AI systems combine all three and still need human review for high-stakes output.
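
A minimal sketch of grounding plus a confidence threshold, under loud assumptions: retrieval here is plain keyword overlap standing in for a real embedding or vector search, the `Doc` type, `retrieve`, `build_grounded_prompt`, the 0.3 threshold, the prompt wording, and the X200 product docs are all hypothetical, and the actual model call is left out. The point is the shape: only retrieved sources go into the prompt, every claim must cite one, and when nothing relevant is retrieved the system abstains instead of letting the model guess.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Doc:
    doc_id: str
    text: str

def tokenize(text: str) -> set[str]:
    """Lowercased word set: a crude stand-in for embedding similarity."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs: list[Doc], k: int = 2) -> list[tuple[float, Doc]]:
    """Score each doc by the fraction of query words it contains; keep the top k."""
    query_words = tokenize(query)
    scored = [
        (len(query_words & tokenize(d.text)) / max(len(query_words), 1), d)
        for d in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def build_grounded_prompt(query: str, docs: list[Doc], min_score: float = 0.3) -> Optional[str]:
    """Build a prompt restricted to retrieved sources, or return None (abstain)
    when no source clears the relevance threshold."""
    hits = [d for score, d in retrieve(query, docs) if score >= min_score]
    if not hits:
        return None  # nothing relevant retrieved: escalate rather than guess
    sources = "\n".join(f"[{d.doc_id}] {d.text}" for d in hits)
    return (
        "Answer using ONLY the sources below. Cite the source id in brackets "
        "after every factual claim. If the sources do not contain the answer, "
        "reply exactly: I don't know.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )

# Hypothetical product docs for illustration.
docs = [
    Doc("spec-12", "The X200 router supports WiFi 6 and has four Ethernet ports."),
    Doc("faq-3", "Returns are accepted within 30 days of purchase."),
]
print(build_grounded_prompt("How many Ethernet ports does the X200 have?", docs))
print(build_grounded_prompt("What is the capital of France?", docs))  # None: abstains
```

The threshold check happens before any model call, so an off-topic question never reaches the model with irrelevant context it might improvise from.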

Origin

The term has been used in NLP research since the 2010s; mainstream usage exploded after ChatGPT's launch in November 2022. Andrej Karpathy and others have argued the term is misleading, since it implies a malfunction when the behaviour is actually working as designed, but the name has stuck.

How it works

  1. Architect for grounding: use RAG to give the model real source documents instead of relying on training data.
  2. Add tool use: let the model query a database or API rather than recall facts from memory.
  3. Force citations: require every factual claim to come with a source.
  4. Use confidence-aware prompting: instruct the model to say 'I don't know' when it lacks support for an answer, rather than guess.
  5. Verify high-stakes output with humans or independent checks.
  6. Monitor for hallucinations in production via spot-checks and user-feedback loops.
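
Steps 3 and 5 can be partly automated. The sketch below is a hypothetical post-generation check, assuming citations are written as `[source-id]` markers and that "I don't know." is the agreed abstention phrase; both are illustrative conventions, not a standard. It flags sentences with no citation and citations to sources the model was never given. It does not check that a cited source actually supports the claim, so it complements human review rather than replacing it.

```python
import re

# Citation markers look like [spec-12]: an assumed convention, not a standard.
CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")
ABSTAIN = "I don't know."  # assumed abstention phrase requested in the prompt

def check_citations(answer: str, allowed_ids: set[str]) -> list[str]:
    """Return a list of problems. An empty list means every sentence cites at
    least one source and every cited id was among the sources supplied."""
    text = answer.strip()
    if text == ABSTAIN:
        return []  # an explicit abstention needs no citation
    problems = []
    # Naive sentence split: '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        cited = CITATION.findall(sentence)
        if not cited:
            problems.append(f"uncited claim: {sentence!r}")
        for doc_id in cited:
            if doc_id not in allowed_ids:
                problems.append(f"unknown source [{doc_id}] in: {sentence!r}")
    return problems

sources = {"spec-12", "faq-3"}
good = "The X200 has four Ethernet ports [spec-12]. Returns are accepted for 30 days [faq-3]."
bad = "The X200 has four Ethernet ports [spec-12]. It also supports 5G [spec-99]. It ships in blue."
print(check_citations(good, sources))  # []
for problem in check_citations(bad, sources):
    print(problem)
```

In the `bad` answer, the second sentence cites a source that was never supplied and the third cites nothing; both are routed to a reviewer instead of the end user.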

When to use it

Use when

  • Any LLM-powered product where factual accuracy matters (customer support, legal, medical, financial).
  • Internal tools where users may take output at face value.
  • Integrating LLMs into existing knowledge-work workflows.

Skip when

  • Pure creative or brainstorming uses where invention is the goal.
  • Demos and prototypes where output isn't user-facing.

Examples

In practice at Makreate

Makreate's AI work, both for clients building AI products and for our own internal tooling, treats hallucinations as a design constraint, not an open question. We architect with grounding (RAG) by default for any factual use case, build citation requirements into prompts, and never ship LLM output to end users without a verification layer. A recent client was building an AI customer-support tool that hallucinated product specs 6% of the time. We refactored it to use RAG against their actual product database and required citation links to source docs; the hallucination rate dropped below 0.5%, and the support team trusted the output enough to actually use it.

Frequently asked

Can hallucinations be eliminated?

No. They're a structural feature of how LLMs generate text. They can be dramatically reduced (RAG, tool use, citations) but not eliminated.

Do bigger models hallucinate less?

On average, yes, but when they do hallucinate, they tend to sound even more confident. Mitigation depends on architecture (grounding, tool use), not just model size.

How do I detect hallucinations in production?

User-feedback loops, spot-checks, and verification against ground-truth sources. No fully automated detection method is reliable yet.
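
A minimal sketch of the spot-check loop, assuming responses have string ids and that human reviewers mark each sampled response as hallucinated or not. The function names, the sample size of 200, and the reviewer flags below are hypothetical illustration data, not real results. It draws a uniform random sample for review, then estimates the hallucination rate with a Wilson score interval, which stays within [0, 1] and behaves sensibly when flagged cases are rare.

```python
import math
import random

def sample_for_review(response_ids: list[str], k: int, seed: int = 0) -> list[str]:
    """Pick k responses uniformly at random (without replacement) for spot-checking."""
    rng = random.Random(seed)
    return rng.sample(response_ids, min(k, len(response_ids)))

def hallucination_rate(flags: list[bool], z: float = 1.96) -> tuple[float, float, float]:
    """Point estimate and approximate 95% Wilson score interval for the share
    of reviewed responses that reviewers flagged as hallucinated."""
    n = len(flags)
    if n == 0:
        raise ValueError("need at least one reviewed response")
    p = sum(flags) / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)

ids = [f"resp-{i}" for i in range(10_000)]
batch = sample_for_review(ids, k=200)
# Illustrative reviewer labels: pretend 3 of the 200 sampled responses were flagged.
flags = [True] * 3 + [False] * 197
rate, low, high = hallucination_rate(flags)
print(f"{rate:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The interval is the useful part: with a few hundred reviews and a rare failure mode, the range is wide, so a single spot-check batch cannot distinguish rates that differ by a fraction of a percent.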
