Definition
An embedding is a dense numeric vector (typically 256–3072 dimensions) representing a piece of content — text, image, audio — produced by an embedding model trained so that semantically similar items have geometrically close vectors.
Embeddings turn meaning into numbers. "Cancel my subscription" and "how do I unsubscribe" share little surface text but mean the same thing — and a good embedding model produces vectors close to each other for both. That property powers semantic search (find content by meaning, not keywords), clustering (group similar items), classification (compare to category vectors), and RAG (retrieve relevant context by query).
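As a rough illustration of "geometrically close", the sketch below computes cosine similarity between three hand-written 4-dimensional vectors. The vectors and their values are invented for this example and are not real model output; an actual embedding model would return hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors, written by hand for illustration only.
cancel_subscription = [0.9, 0.1, 0.3, 0.0]
how_to_unsubscribe = [0.8, 0.2, 0.4, 0.1]
pizza_recipe = [0.0, 0.9, 0.1, 0.8]

sim_related = cosine_similarity(cancel_subscription, how_to_unsubscribe)
sim_unrelated = cosine_similarity(cancel_subscription, pizza_recipe)
print(sim_related > sim_unrelated)  # True
```

With these made-up values, the two subscription phrases score close to 1 while the unrelated one scores close to 0, which is the behaviour a good embedding model aims for.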
Different embedding models produce different vector spaces, so vectors from one model cannot be meaningfully compared with vectors from another. The choice depends on use case (search vs. classification), language (English-only vs. multilingual), domain (general vs. code, biomedical, etc.), latency, cost, and dimension count. OpenAI's text-embedding-3-large, Cohere's embed-v3, Voyage's voyage-3 family, and open-source models (BGE, E5) are common choices in 2025.
Origin
Word embeddings (Word2Vec, 2013; GloVe, 2014) introduced the idea of representing words as vectors. Modern transformer-based sentence and document embeddings (Sentence-BERT, 2019; OpenAI's text-embedding-ada-002, 2022) extended the technique to longer text. Multimodal embeddings (CLIP, 2021) extended it to images by placing images and text in a shared vector space.
How it works
- Pick an embedding model that matches your domain and constraints.
- Chunk the source content (for text: 500–1500 tokens with overlap).
- Pass each chunk through the model; receive a fixed-dimension vector.
- Store vectors in a vector database (or pgvector inside Postgres).
- On query: embed the query with the same model; find the stored vectors with the smallest cosine distance to it (or use hybrid retrieval, which combines vector search with keyword search).
- Re-embed periodically as content changes; incremental indexing pipelines pay for themselves.
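The steps above can be sketched end to end in a few lines. This is a minimal sketch under loud assumptions: `make_toy_embedder` is a hypothetical stand-in for a real embedding model (it only counts known words), whitespace-separated words stand in for tokens, and a plain Python list stands in for a vector database. The helper names `chunk_words`, `build_index`, and `search` are also invented for this example.

```python
import math

def make_toy_embedder(vocabulary):
    """Stand-in for a real embedding model (an assumption for this sketch).

    Counts known words and L2-normalises, so it only captures word overlap,
    not meaning. A real pipeline would call an actual embedding model here.
    """
    position = {word: i for i, word in enumerate(sorted(vocabulary))}

    def embed(text):
        vec = [0.0] * len(position)
        for word in text.lower().split():
            if word in position:
                vec[position[word]] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    return embed

def chunk_words(text, size, overlap):
    """Split text into windows of `size` words sharing `overlap` words (size > overlap)."""
    words = text.split()
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]

def cosine_distance(a, b):
    # Both vectors are unit length, so cosine distance is 1 - dot product.
    return 1.0 - sum(x * y for x, y in zip(a, b))

def build_index(docs, embed, size=8, overlap=2):
    """Chunk every document and store (doc_id, chunk, vector) in a plain list."""
    return [
        (doc_id, chunk, embed(chunk))
        for doc_id, text in docs.items()
        for chunk in chunk_words(text, size, overlap)
    ]

def search(index, embed, query, k=1):
    """Embed the query with the same embedder and return the k nearest chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine_distance(q, item[2]))
    return [(doc_id, chunk) for doc_id, chunk, _ in ranked[:k]]

docs = {
    "billing": "To cancel your subscription open billing settings and choose cancel plan",
    "shipping": "Orders ship within two working days and include a tracking number",
}
vocabulary = {w for text in docs.values() for w in text.lower().split()}
embed = make_toy_embedder(vocabulary)
index = build_index(docs, embed)
top = search(index, embed, "how do I cancel my subscription")
print(top[0][0])  # billing
```

The same `embed` function is used for both indexing and querying, which mirrors the requirement that one model produces both sets of vectors. Swapping in a real model would mean replacing only `make_toy_embedder`; the chunk, store, and search steps keep the same shape.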
When to use it
Use when
- Semantic search across content (docs, products, knowledge base).
- RAG retrieval.
- Classification by similarity to category vectors.
- Clustering and de-duplication.
Skip when
- Exact keyword matching is what's needed (BM25 outperforms embeddings on some queries).
- The corpus is small enough that loading everything into context is simpler.
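Of the uses listed above, classification by similarity to category vectors is perhaps the easiest to sketch: embed one representative vector per category, then assign each new item to the category whose vector is nearest. The 3-dimensional vectors and the `classify` helper below are hypothetical and hand-written for illustration; in practice both the category vectors and the item vector would come from the same embedding model.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify(item_vector, category_vectors):
    """Return the label whose category vector is most similar to the item."""
    return max(
        category_vectors,
        key=lambda label: cosine_similarity(item_vector, category_vectors[label]),
    )

# Hypothetical category vectors (for example, the mean embedding of a few
# labelled examples per category). Values are made up for this sketch.
category_vectors = {
    "billing": [0.9, 0.1, 0.0],
    "technical": [0.1, 0.9, 0.2],
}
ticket_vector = [0.8, 0.3, 0.1]  # made-up embedding of an incoming ticket

label = classify(ticket_vector, category_vectors)
print(label)  # billing
```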
Key metrics
- Retrieval recall@K (was the right answer in the top K?).
- Mean Reciprocal Rank (MRR) on a labelled eval set.
- Embedding cost per item.
- Query latency (embedding + search).
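The first two metrics can be computed from a small labelled eval set. The sketch below assumes each query has exactly one correct document; the query ids, document ids, and function names are hypothetical.

```python
def recall_at_k(results, relevant, k):
    """Fraction of queries whose correct document appears in the top k results."""
    hits = sum(1 for qid, ranked in results.items() if relevant[qid] in ranked[:k])
    return hits / len(results)

def mean_reciprocal_rank(results, relevant):
    """Average of 1/rank of the correct document; a miss contributes 0."""
    total = 0.0
    for qid, ranked in results.items():
        if relevant[qid] in ranked:
            total += 1.0 / (ranked.index(relevant[qid]) + 1)
    return total / len(results)

# Ranked retrieval output per query (best match first), made up for this sketch.
results = {
    "q1": ["d3", "d1", "d7"],
    "q2": ["d2", "d9", "d4"],
    "q3": ["d5", "d6", "d8"],
    "q4": ["d1", "d4", "d2"],
}
# The single correct document for each query.
relevant = {"q1": "d1", "q2": "d2", "q3": "d0", "q4": "d2"}

print(recall_at_k(results, relevant, 1))   # 0.25
print(recall_at_k(results, relevant, 3))   # 0.75
print(round(mean_reciprocal_rank(results, relevant), 3))  # 0.458
```

Here the correct document is ranked 2nd, 1st, missing, and 3rd, so MRR is (1/2 + 1 + 0 + 1/3) / 4. Recall@K only asks whether the answer made the cut, while MRR also rewards placing it higher.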
Examples
- We embedded 4,000 support articles to power semantic search over them.
- Embedding distance is how machines measure 'similar in meaning' instead of 'same words'.
- Switching to a larger embedding model lifted retrieval recall from 78% to 94%.
In practice at Makreate
Makreate AI builds use embeddings to unlock semantic search and retrieval over your content — not keyword matching, real understanding. On a recent enterprise engagement we embedded 50,000 internal documents (policies, procedures, technical specs). The resulting semantic search reduced internal query response time from 14 minutes (search + ask + escalate) to under 2 minutes (semantic search returns the right doc immediately). Estimated time savings across the org: 11,000 hours per year.
Common mistakes
- Re-embedding the entire corpus on every change.
- Mixing embedding models. The same model must produce both indexed vectors and query vectors.
- Wrong chunk size. Too small loses context; too large dilutes signal.
- Skipping evaluation. "Looks like it's working" isn't a metric.
- Ignoring multilingual needs. English-only embeddings perform poorly on non-English content.
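One way to avoid the first mistake, re-embedding the entire corpus on every change, is to store a content hash next to each vector and only re-embed documents whose hash has changed. The sketch below is one possible approach, not a prescribed one; `plan_reembedding` and the document ids are hypothetical.

```python
import hashlib

def content_hash(text):
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_reembedding(documents, stored_hashes):
    """Compare current documents with hashes saved at last indexing.

    Returns (ids to embed or re-embed, ids to delete from the index).
    """
    to_embed = [
        doc_id
        for doc_id, text in documents.items()
        if stored_hashes.get(doc_id) != content_hash(text)
    ]
    to_delete = [doc_id for doc_id in stored_hashes if doc_id not in documents]
    return to_embed, to_delete

# Hashes recorded the last time the index was built (made-up content).
stored_hashes = {
    "faq": content_hash("How to reset a password"),
    "pricing": content_hash("Plans start at 10 per month"),
    "legacy": content_hash("Old page"),
}
# Current state of the corpus.
documents = {
    "faq": "How to reset a password",              # unchanged
    "pricing": "Plans start at 12 per month",      # edited
    "returns": "Returns accepted within 30 days",  # new
}

to_embed, to_delete = plan_reembedding(documents, stored_hashes)
print(to_embed)   # ['pricing', 'returns']
print(to_delete)  # ['legacy']
```

Only the edited and new documents are sent to the embedding model, and the removed one is dropped from the index. If the embedding model itself changes, every vector still has to be regenerated, since vectors from different models are not comparable.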
Frequently asked
Which embedding model is best?
There is no single best model; it depends on the use case. text-embedding-3-large (OpenAI) is a strong default for general English text. Voyage and Cohere also offer strong options. For specialised domains (biomedical, legal, code) consider domain-tuned models.
Cosine or dot-product similarity?
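Cosine similarity is the usual default. It ignores vector magnitude, so it works whether or not the embeddings are normalised. Dot product is cheaper to compute and gives identical scores when vectors are L2-normalised, but on unnormalised vectors it also rewards longer vectors. Many vector databases default to one or the other, so check which metric your index uses.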
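A small worked check of that relationship, using hand-picked 2-dimensional vectors (all values are illustrative):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalise(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [3.0, 4.0]
b = [1.0, 2.0]

cos_raw = cosine_similarity(a, b)
dot_raw = dot(a, b)                               # depends on vector length
dot_unit = dot(l2_normalise(a), l2_normalise(b))  # matches cosine
print(abs(cos_raw - dot_unit) < 1e-9)  # True

# On unnormalised vectors the two metrics can rank results differently.
query = [1.0, 0.0]
long_doc = [10.0, 10.0]   # large magnitude, 45 degrees from the query
close_doc = [1.0, 0.1]    # small magnitude, almost the same direction

best_by_dot = max([long_doc, close_doc], key=lambda d: dot(query, d))
best_by_cos = max([long_doc, close_doc], key=lambda d: cosine_similarity(query, d))
print(best_by_dot is long_doc, best_by_cos is close_doc)  # True True
```

After L2 normalisation the dot product equals the cosine, so either metric yields the same ranking. Without normalisation, the dot product favours the longer vector even though the other one points in nearly the same direction as the query.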
How big should the embedding be (dimensions)?
Larger embeddings carry more information but cost more to store and search. 1024–1536 dimensions is a common sweet spot in 2025.
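The storage side of that trade-off is simple arithmetic: raw vector storage is roughly the number of vectors times the dimension count times the bytes per value. The sketch below assumes 32-bit floats (4 bytes per value) and a hypothetical index of one million vectors, and ignores index structures and metadata, which add overhead on top.

```python
def index_size_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone, assuming float32 by default."""
    return num_vectors * dimensions * bytes_per_value

NUM_VECTORS = 1_000_000  # hypothetical corpus size for illustration

for dims in (256, 1024, 1536, 3072):
    gigabytes = index_size_bytes(NUM_VECTORS, dims) / 1e9
    print(f"{dims:>4} dims: {gigabytes:.2f} GB")
```

Under these assumptions, one million 1536-dimension vectors take about 6.14 GB and 3072-dimension vectors take twice that, so doubling the dimension count doubles the raw storage.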