Fine-Tuning — Definition & Examples

Definition

Fine-tuning is the process of taking a pre-trained AI model and continuing to train it on a smaller, domain-specific dataset — adapting its behaviour, style, or knowledge to a particular use case without training a model from scratch.

Fine-tuning was the default customisation approach in the GPT-3 era. With the rise of much larger models and retrieval-augmented architectures (RAG), most production teams now reach for prompting and RAG before fine-tuning — they're cheaper, faster to iterate, and produce results that are easier to inspect and update.

Fine-tuning still has legitimate uses: enforcing very specific output formats, adapting model style to a brand voice, training on proprietary task data where prompt engineering hits a ceiling. But it's no longer the first move; it's a specialised tool.

Origin

Fine-tuning as a transfer-learning technique predates LLMs — it's been standard practice in deep learning since ~2014. The OpenAI fine-tuning API (2021) brought it to mainstream LLM use; the technique remains essential in computer vision and speech.

How it works

Determine whether fine-tuning is actually needed (try prompting and RAG first).
Build a high-quality dataset (typically 50-1,000 examples for instruction fine-tuning).
Choose the base model (GPT-4o-mini, Llama, Claude, etc. — vendor support varies).
Run the fine-tuning job (most major vendors offer managed fine-tuning).
Evaluate the fine-tuned model against the base model on a held-out test set.
Deploy with monitoring; budget for ongoing retraining as data drifts.

When to use it

Use when

When prompting and RAG can't achieve the required output quality or style.
For domain-specific task formats with proprietary data.
When latency or cost demands a smaller fine-tuned model.

Skip when

For general-purpose tasks — the base models are very strong.
When the dataset is small (under 50 examples).
Before exhausting prompt engineering and RAG.

Key metrics

Task accuracy on held-out test set
Latency vs. base model
Cost per request vs. base model
Time-to-update when behaviour needs to change

Examples

We fine-tuned a small model on customer-support ticket categorisation and cut cost 95% with no accuracy loss.
The fine-tune didn't beat the prompt — we'd over-indexed on the technique.
Fine-tuning on brand voice produced cleaner copy than any prompt we'd written.

In practice at Makreate

Makreate AI engagements treat fine-tuning as a specialised tool, not a default. We typically exhaust prompting and RAG before recommending fine-tuning, because the iteration speed of prompting is materially faster and the quality gap has narrowed substantially with modern models. When fine-tuning is genuinely the right tool — for very specific output formats or significant cost optimisation — we build the eval framework first and the fine-tune second.

AI Web App Development →

Common mistakes

Fine-tuning before exhausting prompting and RAG.
Fine-tuning on too small a dataset.
Not measuring against base-model performance.
Forgetting that fine-tuned models age — retrain as data drifts.
Locking yourself into a vendor's fine-tuning format.

Frequently asked

How many examples do I need to fine-tune?

Highly variable. Instruction fine-tuning often works with 50-500 examples. Classification tasks may need 1,000+. Style adaptation often needs surprisingly few (50-200).

Fine-tune or use RAG?

RAG for current/proprietary knowledge; fine-tuning for output style or format. They can also be combined.

Should I fine-tune the latest model or a smaller one?

For cost-sensitive workloads, fine-tune the smallest model that meets quality bars. For quality-sensitive workloads, prompt the largest available model first.