AI & Tech · noun

Prompt Engineering

/prɒmpt ˌendʒɪˈnɪərɪŋ/

Designing the inputs to an LLM so it produces the output you need at production quality.

Definition

Prompt engineering is the discipline of designing the inputs to a Large Language Model — instructions, context, examples, format requirements, constraints — so the model consistently produces outputs that meet production requirements for accuracy, format, and tone.

Prompt engineering is to LLM applications what query optimisation is to databases or compiler flags are to native code. The same model can produce very different outputs depending on how it is prompted. Vague prompts tend to get vague answers; structured prompts with clear instructions, relevant context, and worked examples (few-shot) are far more likely to yield production-quality outputs.

The craft is rapidly maturing. The early chaos of "prompt magic" is giving way to structured patterns: chain-of-thought reasoning, constitutional prompting, output schemas (JSON mode, structured generation), and prompt templating frameworks. Production prompts are versioned, evaluated, and tested like any other piece of code.
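
As a rough illustration of treating an output schema as something testable, the sketch below parses a model response as JSON and checks it against a small set of expected fields. The field names (`category`, `reply`) and the function name are hypothetical and chosen only for illustration; a real project might rely on a provider's structured-output feature or a schema library instead.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
REQUIRED_FIELDS = {"category": str, "reply": str}


def parse_model_output(raw: str) -> dict:
    """Parse a model response and check it against REQUIRED_FIELDS.

    Raises ValueError if the text is not a JSON object, or if a field
    is missing or has the wrong type.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field {name} should be {expected_type.__name__}")
    return data
```

A check like this can run on every response in production and on every case in an eval set, so format regressions surface as failures rather than as silent downstream errors.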

Origin

The discipline emerged with GPT-3's release (2020), when developers discovered the model's behaviour was highly sensitive to phrasing. Few-shot prompting was named in the GPT-3 paper itself (2020), and chain-of-thought prompting in an academic paper in 2022; the broader practice matured into a recognisable discipline through 2023–2024.

How it works

  1. Define the task precisely — what output, what format, what constraints.
  2. Write a structured prompt: role, instruction, context, format spec, examples.
  3. Include 2–5 worked examples (few-shot) when the task is complex or format-sensitive.
  4. Use chain-of-thought ("think step by step") for reasoning tasks.
  5. Constrain output (JSON schema, structured generation, regex) where format matters.
  6. Build an evaluation set; iterate prompt variations against it; ship the winner.
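
The steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a definitive implementation: `build_prompt`, `accuracy`, and `pick_best` are hypothetical helper names, the section labels in the prompt are one possible layout, and `model` stands for any callable that takes a prompt string and returns the model's text (the real API call is left out on purpose). Exact string match is used as the scoring rule only because it is the simplest one to show.

```python
from typing import Callable, Dict, List, Tuple


def build_prompt(role: str, instruction: str, context: str, format_spec: str,
                 examples: List[Tuple[str, str]], query: str) -> str:
    """Assemble a structured prompt: role, instruction, context, format, examples."""
    parts = [
        f"Role: {role}",
        f"Instruction: {instruction}",
        f"Context: {context}",
        f"Output format: {format_spec}",
    ]
    # Few-shot examples: each pair shows the model what a correct answer looks like.
    for example_input, example_output in examples:
        parts.append(f"Example input: {example_input}\nExample output: {example_output}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


def accuracy(make_prompt: Callable[[str], str], model: Callable[[str], str],
             eval_set: List[Tuple[str, str]]) -> float:
    """Fraction of eval cases where the model's answer exactly matches the expected one."""
    correct = sum(
        1 for query, expected in eval_set
        if model(make_prompt(query)).strip() == expected
    )
    return correct / len(eval_set)


def pick_best(variants: Dict[str, Callable[[str], str]], model: Callable[[str], str],
              eval_set: List[Tuple[str, str]]) -> Tuple[str, float]:
    """Score every prompt variant on the same eval set and return the best one."""
    scores = {name: accuracy(fn, model, eval_set) for name, fn in variants.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Each variant is a function from a query to a full prompt, so a zero-shot and a few-shot version of the same task can be compared on identical eval cases before one is shipped.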

When to use it

Use when

  • On every LLM-powered feature beyond throwaway demos.
  • When model outputs are inconsistent or low-quality.
  • When migrating between models — prompts often need adjustment per model.

Skip when

  • On problems where deterministic logic would do. Prompt engineering can't fix a fundamentally non-LLM problem.

Key metrics

Examples

In practice at Makreate

Every Makreate AI build invests in prompt engineering as a first-class discipline — versioned, evaluated, and tested like code. On a recent client engagement we built a customer-support copilot. Initial prompts hit 72% accuracy on our eval set. Six iterations later — adding role definition, structured output schema, four worked examples, and chain-of-thought reasoning — we hit 94%. Same model, same data, three weeks of prompt work.


Common mistakes

Frequently asked

Few-shot or zero-shot prompting?

Few-shot when format or domain is non-obvious — examples teach the model what "correct" looks like. Zero-shot when the task is well-known to the model and brevity matters.

Should I write prompts in English?

Yes, in production, almost always. English-language training data dominates major LLMs, and English prompts tend to produce more reliable behaviour. Specialised use cases may differ.

How do I version prompts?

Same as code: tracked in Git, reviewed in PRs, with an eval suite that runs in CI. PromptLayer, LangSmith, and similar tools add observability.
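
One way this could look in a repository is sketched below, assuming the prompt lives in a source file as a template string and CI runs a small accuracy gate against it. The template text, the `MIN_ACCURACY` threshold, and the helper names are hypothetical; `model` is again any callable from prompt string to response text.

```python
import hashlib
from typing import Callable, List, Tuple

# Hypothetical prompt template, tracked in version control like any other source file.
PROMPT_TEMPLATE = (
    "Classify the support ticket as 'billing' or 'other'.\n"
    "Ticket: {ticket}\n"
    "Label:"
)
MIN_ACCURACY = 0.9  # hypothetical threshold; pick one that suits the feature


def prompt_fingerprint(template: str) -> str:
    """Short, stable identifier for a prompt version, useful in logs and eval reports."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def passes_gate(model: Callable[[str], str], eval_set: List[Tuple[str, str]],
                template: str = PROMPT_TEMPLATE,
                min_accuracy: float = MIN_ACCURACY) -> bool:
    """Return True if the prompt meets the accuracy threshold on the eval set."""
    correct = sum(
        1 for ticket, expected in eval_set
        if model(template.format(ticket=ticket)).strip() == expected
    )
    return correct / len(eval_set) >= min_accuracy
```

A CI job could call `passes_gate` from a test and fail the build when it returns False, while the fingerprint ties each logged response back to the exact prompt version that produced it.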

