
What Are Vector Databases

A practical guide to vector databases: what they actually do (vs. simple vector stores), how indexing and hybrid search work, and how to model, operate, and choose them for real-world AI apps.

Paweł Brzuszkiewicz

PractiqAI Team

Vector databases went from “weird research thing” to “every RAG diagram has one” in about eighteen months. If you’re building LLM apps, search, or recommendations, you’re now expected to have opinions about HNSW, hybrid search, and which vector DB to pick… ideally without spending your entire quarter reading papers.

This article is meant to be that missing mental model: not product marketing, but a practical map of what vector databases really are, how they differ from “vector store” libraries, and how to think about indexing, data modeling, reranking, freshness, multimodal vectors, and ops.

If you’ve read our piece on prompts, you know we treat a prompt as a specification for an AI model. A vector database is, in a similar spirit, the infrastructure that makes those model calls grounded in your data instead of the model’s imagination.

Let’s start with the shortest possible version.

The short answer

A vector database is a database that:

  • Stores objects (documents, products, code snippets…) plus vector embeddings for those objects,
  • Builds special indexes (HNSW, IVF, etc.) so it can quickly find nearest neighbors to a query vector,
  • Lets you combine that search with filters, metadata, and hybrid keyword+vector ranking,
  • And wraps it all in database things: durability, replication, backups, access control, observability.

Tools like Weaviate or Qdrant are examples of full-fledged vector databases: they persist both objects and vectors, expose query APIs, and integrate vector search with filters and hybrid search at scale.

By contrast, a “vector store” library (like a bare FAISS index) is usually just an in-memory (or on-disk) ANN index you embed inside your app. It’s great for experiments or small systems, but you have to bolt on persistence, metadata filtering, auth, monitoring, and backup yourself.

The rest of this article fills in the details so you can choose—and use—these tools without guessing.


What a vector database actually does

At its core, a vector database does three jobs:

  1. Turns your data into vectors (or accepts vectors you generate yourself).
  2. Indexes those vectors so nearest-neighbor queries are fast enough for production.
  3. Returns rich objects, not just IDs, and lets you filter and rank them in useful ways.

Most modern vector DBs store both:

  • A payload/object: JSON-like structure (e.g., { id, title, body, tags, created_at, ... }), and
  • One or more vector fields: fixed-length arrays of floats representing text, images, code, etc.

On query, you send either:

  • A query vector directly, or
  • A piece of text/image/code that the DB or your app converts to a vector.

The database then:

  1. Uses its index (HNSW, IVF, flat…) to find the closest vectors to your query in high-dimensional space.
  2. Applies filters (e.g., tenant_id = 'acme' AND language = 'en').
  3. Returns the objects with their metadata, sometimes with scores, highlights, and/or reranked orders.

That’s the main difference from a simple library like FAISS:

  • FAISS: “Give me the nearest 100 vector IDs to this vector.”
  • Vector DB: “Give me the nearest 10 customer FAQs in English for tenant X, created in the last 90 days, with their titles, content, and metadata, ready to send to the LLM.”

The libraries are the engine; the vector database is the car around that engine (chassis, steering, brakes, airbags, dashboard).
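To make the "engine" side concrete, here's roughly what the library-level experience looks like — a minimal sketch using FAISS, with random data standing in for real embeddings:

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim = 384
vectors = np.random.rand(10_000, dim).astype("float32")  # stand-in embeddings

index = faiss.IndexFlatL2(dim)  # exact, brute-force index
index.add(vectors)

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, k=100)
# That's it: you get back row IDs and distances. Mapping IDs to documents,
# filtering by tenant or language, persistence, auth — all on you.
```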

A good mental model

Think of a vector database as “Postgres for embeddings”—not in terms of SQL, but in terms of durability, scaling, and ergonomics. You could do everything by hand, but you probably don’t want to.


Index types: HNSW, IVF, filters, and metadata

If a vector DB is the car, the index is the transmission. It decides how you trade off speed, memory, and recall.

HNSW in one paragraph

HNSW (Hierarchical Navigable Small World) is a graph-based approximate nearest neighbor index. It builds a layered graph where each vector is a node; edges connect “nearby” vectors. To answer a query, you:

  1. Start at a coarse top layer with a small subset of nodes,
  2. Walk greedily toward neighbors that are closer to the query vector,
  3. Drop down layers as you converge, until you’re in the fine-grained bottom layer.

This gives very fast search with high recall, and supports incremental inserts reasonably well, which is why you see HNSW as a default index in many vector DBs.

You usually tune a few parameters (like M, ef_construction, and ef_search) to trade memory usage for accuracy and latency.
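If you want to feel those knobs directly, the standalone hnswlib library exposes the same parameters many vector DBs surface in their index config. A minimal sketch (dimensions and values are illustrative, not recommendations):

```python
import numpy as np
import hnswlib  # pip install hnswlib

dim, n = 384, 100_000
data = np.random.rand(n, dim).astype("float32")

index = hnswlib.Index(space="cosine", dim=dim)
# Build-time knobs: M = max edges per node, ef_construction = build-time beam width.
index.init_index(max_elements=n, M=16, ef_construction=200)
index.add_items(data, ids=np.arange(n))

# Query-time knob: higher ef = better recall, higher latency.
index.set_ef(64)
labels, distances = index.knn_query(data[:1], k=10)
```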

IVF in one paragraph

IVF (Inverted File Index) works differently:

  1. It clusters your vectors into centroids (think “buckets in vector space”).
  2. At query time, it finds the closest centroids to your query,
  3. Then only searches within those buckets instead of the entire dataset.

This is powerful when you have very large, mostly static datasets: you can skip 90–99% of the data at query time and stay fast, at the cost of a bit of recall. IVF variants (like IVF-Flat, IVF-PQ) mix clustering with compression.
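A minimal IVF sketch in FAISS shows the two-phase structure — train centroids once, then probe a few buckets per query. The sizes here are placeholders:

```python
import numpy as np
import faiss

dim, nlist = 384, 1024                 # nlist = number of centroids/buckets
xb = np.random.rand(500_000, dim).astype("float32")

quantizer = faiss.IndexFlatL2(dim)     # used to assign vectors to centroids
index = faiss.IndexIVFFlat(quantizer, dim, nlist)
index.train(xb)                        # clustering pass: learns the centroids
index.add(xb)

index.nprobe = 16                      # buckets visited per query: recall vs. speed
distances, ids = index.search(xb[:1], k=10)
```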

Flat indexes

Some systems also provide a “flat” (brute-force) index: scan all vectors, compute exact distances. It’s simple, always correct, and works surprisingly well for small-to-medium collections or for filtered subsets where only a few thousand vectors pass the filter.

Filters and metadata

Indexing would be easy if you never had filters. In real systems, you almost always have conditions like:

  • “Only show documents for this tenant.”
  • “Only content in English.”
  • “Only items with is_published = true and created_at > now – 30 days.”

Vector DBs attach metadata to each object (Weaviate calls them “properties” on an object; Qdrant calls it “payload”).

At query time, the database:

  • Applies filters first, narrows the candidate set, and then runs vector search within that subset (great recall, but tricky to optimize), or
  • Runs vector search first, then post-filters candidate IDs, backfilling with more candidates if too many get filtered out.

Modern systems do a mix of both and add tricks like bitset masks or specialized “filter-aware” indexing. Filtering is one of the hardest parts of vector search at scale; it’s not just a WHERE clause slapped on top.
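For a feel of how filtered vector search looks in practice, here's a sketch using the qdrant-client Python API. The collection and field names are placeholders, and query_embedding is assumed to come from your embedding model:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")

hits = client.search(
    collection_name="document_chunks",   # placeholder collection
    query_vector=query_embedding,        # list[float] from your embedding model
    query_filter=Filter(must=[
        FieldCondition(key="tenant_id", match=MatchValue(value="acme")),
        FieldCondition(key="language", match=MatchValue(value="en")),
    ]),
    limit=10,
)
for hit in hits:
    print(hit.score, hit.payload.get("title"))
```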


Hybrid search: sparse + dense

Vector search is not here to kill keyword search; it’s here to team up with it.

In retrieval land, you’ll see two big families:

  • Dense vectors: embeddings from a neural model (text, image, code). Great at capturing meaning.
  • Sparse vectors / lexical: traditional keyword-based signals (BM25, term frequency, inverted indexes). Great at respecting exact wording and rare tokens (IDs, error codes).

Hybrid search combines them.

A common pattern (used by Weaviate, Qdrant, Elastic, and others) is:

  1. Run a dense vector search for semantic matches.
  2. Run a sparse/keyword search (e.g., BM25) in parallel.
  3. Use a fusion algorithm—like Reciprocal Rank Fusion (RRF) or weighted scoring—to merge the two ranked lists into a single result set.

This gives you:

  • Flexibility for natural language queries (thanks to dense vectors),
  • Hard guarantees on keyword presence (thanks to sparse),
  • And better robustness to weird queries like “error E1234 in module zyx”.

Hybrid becomes especially important in enterprise settings where users mix jargon, IDs, and half-remembered phrases. Dense-only search often “hallucinates” plausible but irrelevant docs; sparse-only misses semantic matches. Hybrid search lets you cheat: you get both.
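Reciprocal Rank Fusion itself is small enough to sketch in a few lines of plain Python (k=60 is the constant commonly cited in the RRF literature):

```python
def rrf_fuse(dense_ids: list[str], sparse_ids: list[str], k: int = 60) -> list[str]:
    """Merge two ranked ID lists with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d7", "d2"]     # semantic matches from vector search
sparse = ["d7", "d4", "d3", "d9"]    # keyword matches from BM25
print(rrf_fuse(dense, sparse)[:5])   # docs ranked well by both lists float to the top
```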


Data modeling: objects and payloads

If you treat a vector DB as “just an index,” you will eventually suffer. You need to model your data as carefully as you would in a relational database.

Objects vs. payloads

Most vector DBs expose concepts like:

  • Collection / class – roughly a table (e.g., documents, products).
  • Point / object – a row or record, with:
      • an ID (string or UUID),
      • one or more vector fields,
      • a payload / property set (a JSON-ish map of scalar values and arrays).

Qdrant, for example, stores “points” with an ID, payload, and one or more vectors; collections group points and define indexing. Weaviate stores objects with a class schema and associated vectors.

Modeling for RAG (and similar)

In a typical RAG system, you might have:

  • id: stable ID for a chunk of text,
  • doc_id: ID of the original document,
  • chunk_index: position within the doc,
  • title, section_heading, body,
  • source (URL or path),
  • tags, tenant_id, created_at, language, etc.
  • vector: embedding of title + body or just the body.

Why this matters:

  • You can group results by doc_id when presenting to users or building context for the LLM.
  • You can de-duplicate near-identical chunks from the same document.
  • You can easily filter on tenant_id, language, or created_at without reinventing your schema later.

A tiny example

Pseudo-schema for a “document chunks” collection:

```json
{
  "id": "string",
  "doc_id": "string",
  "chunk_index": "number",
  "title": "string",
  "body": "string",
  "source": "string",
  "tenant_id": "string",
  "language": "string",
  "created_at": "string",
  "tags": ["string"],
  "vector": "float[1536]"
}
```

Model this once and your prompts (e.g., “retrieve 5 chunks per doc, most recent first”) become much simpler and more powerful.


Reranking and quality–cost tradeoffs

Even with good indexes and hybrid search, your first-stage retrieval is still approximate. That’s where reranking comes in.

The typical production pattern is two-stage retrieval:

  1. Stage 1: Cheap, approximate search
     • Use the vector DB (possibly hybrid) to retrieve, say, the top 50–200 candidates.
     • This is fast and usually limited by your index and network.
  2. Stage 2: Expensive, precise reranking
     • Feed those candidates (query + candidate text) into a cross-encoder or an LLM-based scorer.
     • Compute a more accurate similarity score, then reorder the list and keep the top N (e.g., 10).

Why bother?

  • Quality: Cross-encoders look at the full query–document pair jointly, not just compressed embeddings. They’re often significantly more accurate, especially on nuanced or long queries.
  • Cost: Running a cross-encoder over every document in the corpus is prohibitively expensive. Running it on ~100 candidates is often fine.

The tradeoffs

You tune:

  • How many candidates to retrieve from the vector DB (top-k).
  • Which reranker (small cross-encoder vs large LLM; open-source vs API).
  • How many reranked results you keep and pass downstream (e.g., into the prompt context).

The knobs interact:

  • Increasing k improves recall but raises reranking cost and latency.
  • A heavier reranker improves precision but might blow up your latency budget.
  • LLM-based reranking can double as safety or policy filters, but it’s the most expensive path.

In practice, teams often start with:

  • k ≈ 50–100 from the vector DB,
  • A small, dedicated reranker model (e.g., cross-encoder from the same family as your embeddings),
  • Then experiment with LLM-based reranking for high-value queries only.
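As a minimal sketch of stage 2, assuming the sentence-transformers library and that candidates already holds the top ~100 texts from stage 1:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

# candidates: list[str] of stage-1 results; query: the user's query string.
pairs = [(query, text) for text in candidates]
scores = reranker.predict(pairs)          # one joint relevance score per pair

reranked = [text for _, text in sorted(zip(scores, candidates), reverse=True)]
top_n = reranked[:10]                     # what you actually pass to the LLM
```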

Freshness strategies: upserts, TTL, rebuilds

In a perfect world, your vector index would always reflect the latest state of your data. In reality, you have updates, deletes, and rapidly changing domains like news or pricing.

There are three broad layers to keeping things fresh.

1. Upserts and soft deletes

Most vector DBs support upsert operations: if a point with a given ID exists, it’s updated; otherwise, it’s inserted. Under the hood, the index either:

  • Updates the existing node’s vector and metadata, or
  • Marks the old one as deleted and inserts a new one.

This works well for:

  • Corrections (fixing typos in documents),
  • Incremental content updates,
  • Soft-delete patterns (mark as is_deleted = true and filter out).

Over time, though, heavy churn can make your index fragmented. You might need periodic compaction or rebuilds, depending on the engine and workload.
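Here's a hedged upsert sketch with qdrant-client (names are placeholders; the key point is that writing the same ID twice updates rather than duplicates):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

client = QdrantClient(url="http://localhost:6333")

client.upsert(
    collection_name="document_chunks",
    points=[PointStruct(
        id=42,                # same ID on re-index = in-place update, not a duplicate
        vector=new_embedding, # list[float] from your embedding model
        payload={"doc_id": "doc-42", "chunk_index": 3, "is_deleted": False},
    )],
)
```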

2. Time-to-live (TTL) and retention

Some databases let you set a TTL per record; others require you to implement retention yourself. Either way, think about:

  • Which content should expire quickly (e.g., search logs, temporary results).
  • Which content is long-lived (knowledge base, docs).

A simple pattern:

  • Add an expires_at timestamp to the payload.
  • Apply a filter expires_at > now() on all queries.
  • Run a background job to hard-delete expired points and trigger compaction when needed.
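One way to express the "filter on every query" part, assuming expires_at is stored as a numeric UNIX timestamp (Qdrant's Range conditions work on numbers, not the string dates from the schema above):

```python
import time

from qdrant_client.models import FieldCondition, Filter, Range

not_expired = Filter(must=[
    FieldCondition(key="expires_at", range=Range(gt=time.time())),
])
# Pass not_expired as query_filter on every search; a background job
# hard-deletes points with expires_at <= now and compacts when needed.
```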

3. Full and partial rebuilds

Sometimes the underlying embedding distribution changes: you switch models, add new modalities, or drastically expand the corpus. Mix old and new vectors too much and your distances stop meaning what you think they mean.

In that case, you may need to:

  • Re-embed and re-index either:
      • the entire collection, or
      • at least a consistent subset (e.g., everything for a tenant or a time window).
  • Manage versioned collections:
      • docs_v1 (old embedding model),
      • docs_v2 (new model),
      • and gradually migrate queries and write paths from v1 to v2.

This avoids mixing incompatible embeddings and lets you roll back if needed.
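If your engine supports collection aliases (Qdrant does), the v1 → v2 cutover can be an atomic alias swap rather than a client-side config change. A sketch, assuming the alias docs is what application code actually queries:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    CreateAlias, CreateAliasOperation, DeleteAlias, DeleteAliasOperation,
)

client = QdrantClient(url="http://localhost:6333")

# Re-point the stable alias "docs" from docs_v1 to the freshly built docs_v2,
# in a single atomic request.
client.update_collection_aliases(
    change_aliases_operations=[
        DeleteAliasOperation(delete_alias=DeleteAlias(alias_name="docs")),
        CreateAliasOperation(
            create_alias=CreateAlias(collection_name="docs_v2", alias_name="docs")
        ),
    ]
)
```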


Multimodal vectors: text, images, code

“Vector database” doesn’t mean “text database.” Vectors can represent anything: images, audio, code, user behavior, product interactions, you name it.

Modern tools like Weaviate and Qdrant are designed to store vectors for many data types and often support multiple vector fields per object (sometimes called multivector representations).

Common multimodal patterns

  • Text + image search: Each product has:
      • vector_text – embedding of title + description,
      • vector_image – embedding of the product image.
    You can query either field depending on whether the user types text or uploads a photo.
  • Code search: Embed both function bodies and docstrings/comments. Store language, repo, and file path as metadata. You can later combine:
      • vector search for semantics,
      • filters for repo or language,
      • hybrid search for symbol names.
  • Behavioral + content embeddings: For recommendations, combine:
      • item content vectors (text/image),
      • user history embeddings (aggregations of past interactions).

Storage patterns

You can:

  • Store multiple named vectors per object (e.g., content_vector, image_vector, code_vector).
  • Or store separate collections for each modality, linked by a shared item_id.

The first option is easier to manage operationally; the second is sometimes simpler when different teams own different modalities.
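With Qdrant, for example, the first option looks roughly like this (vector sizes are placeholders for whatever your text and image encoders emit):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="products",
    vectors_config={
        "text":  VectorParams(size=768, distance=Distance.COSINE),
        "image": VectorParams(size=512, distance=Distance.COSINE),
    },
)

# Query a specific named vector: ("image", ...) searches only the image space.
hits = client.search(
    collection_name="products",
    query_vector=("image", image_embedding),  # embedding of the uploaded photo
    limit=10,
)
```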


Ops basics: scale, backups, privacy

Vector databases are still databases. They need the same boring-but-critical operational care as any other infrastructure component.

Scaling

Most modern vector DBs are:

  • Cloud-native and horizontally scalable, with concepts like collections, shards, and replicas.
  • Designed to use a mix of RAM + disk instead of keeping everything in memory.

Key scaling questions:

  • QPS and latency: How many queries per second, and at what p95/p99 latency?
  • Data size: How big is your corpus now, and in 12–24 months?
  • Update pattern: Mostly append-only (great), or heavy updates/deletes (harder)?
  • Multitenancy: Do you need to isolate tenants physically (separate clusters) or logically (filters, namespaces)?

Capacity planning is easier if you test with realistic embeddings and queries early rather than scaling from a tiny toy dataset.

Backups and disaster recovery

You absolutely want:

  • Periodic snapshots of collections and index files, plus any write-ahead logs.
  • A restore path tested end-to-end: “Can we spin up a new cluster and restore yesterday’s snapshot in under N minutes?”

Some managed offerings provide one-click snapshots; self-hosted setups may require file-system snapshots or database-native backup tools. In either case, treat backups exactly as seriously as you would for your transactional database.

Privacy and security

Vector DBs often store sensitive internal text: tickets, logs, emails, documentation, even code. You need a story for:

  • Network security: VPC peering, private endpoints, TLS, firewall rules.
  • Auth and authz: API keys, OAuth, per-tenant access control (e.g. tenant_id filters that cannot be bypassed).
  • Encryption at rest: Are vectors and payloads stored encrypted on disk?
  • Data residency: Where are your embeddings physically stored?

A subtle point: even if you strip names/emails, embeddings can sometimes leak information via membership inference attacks. For high-risk data, consider:

  • On-prem or VPC deployments,
  • Extra obfuscation or aggregation,
  • Limiting retention and access.

Common pitfalls: duplication, drift, and friends

Vector databases are easy to prototype and surprisingly easy to mess up in production. A few classic traps:

1. Duplication (silent bloat)

You re-index your content every night and keep inserting instead of upserting. Or you chunk your documents differently every time. You end up with:

  • Many slightly different chunks of the same text,
  • Lots of near-duplicate results,
  • Wasted storage and slower queries.

Mitigations:

  • Use stable IDs and upserts for chunks (e.g., doc_id + chunk_index).
  • Add a doc_id field and group results client-side.
  • Run periodic dedup jobs based on hash-of-text or similarity thresholds.
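Two small helpers illustrate the first and third mitigations (a sketch; the text normalization is deliberately naive):

```python
import hashlib
import uuid

def chunk_point_id(doc_id: str, chunk_index: int) -> str:
    """Deterministic UUID per (doc_id, chunk_index): re-indexing upserts, never duplicates."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{doc_id}:{chunk_index}"))

def text_fingerprint(body: str) -> str:
    """Hash of whitespace-normalized text, for periodic dedup jobs."""
    normalized = " ".join(body.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```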

2. Embedding drift

You switch from model_v1 to model_v2 for embeddings but don’t re-index old data. The database happily stores both, but the meaning of “distance” changes. Retrieval relevance degrades slowly, and nobody notices until metrics fall off a cliff.

Mitigations:

  • Store an embedding_version in payload.
  • Keep embeddings consistent within a collection, or version collections (e.g., docs_v1, docs_v2).
  • When rolling out a new model, A/B test on a subset and plan a rebuild window.

3. Schema and filter drift

You start with a simple schema, then add filters later (“only show to tenant X”, “only show language=EN”). Old records don’t have those fields, so they silently fail filters or behave oddly.

Mitigations:

  • Design for multitenancy and language from day one.
  • Backfill new fields for existing records.
  • Use default values (e.g., language = 'unknown') to avoid null weirdness.

4. Overshooting with vectors

Sometimes the best solution is… boring. Replacing a perfectly good keyword filter (“find order by ID”) with a vector search is wasted complexity and worse UX.

Mitigation:

  • Be honest: “Does this query need semantics, or just exact matching / filters / joins?”
  • Use vector search where users truly behave like humans: natural language, fuzzy phrasing, similarity, recommendation.

5. No evaluation framework

You build a vector DB, tune some knobs, and ship. Six months later, you’re not sure if retrieval is better or worse than before.

Mitigation:

  • Build a small evaluation set of (query, relevant documents) pairs.
  • Track metrics like nDCG@k, Recall@k, or even simple “percentage of queries where at least one relevant doc appears in top 10.”
  • Whenever you change embeddings, indexes, or hybrid weights, re-run the evaluation.
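Even the simplest of these metrics takes only a few lines. Here's a sketch of the "at least one relevant doc in the top k" number from the list above:

```python
def hit_rate_at_k(
    results: dict[str, list[str]],   # query -> ranked doc IDs from retrieval
    relevant: dict[str, set[str]],   # query -> doc IDs judged relevant
    k: int = 10,
) -> float:
    """Fraction of queries with at least one relevant doc in the top k."""
    hits = sum(
        1 for query, ranked in results.items()
        if set(ranked[:k]) & relevant.get(query, set())
    )
    return hits / len(results)
```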

Quick selection checklist

Okay, you’re convinced you need something that does vector search. Do you need a vector DB, or is a lighter vector store library enough? And which direction should you lean?

Use this as a starting checklist.

1. Scale and lifecycle

Ask yourself:

  • How much data?
      • < 100k vectors, mostly static → a library like FAISS or a simple pgvector setup might be enough.
      • Millions to billions of vectors, with dynamic updates → a dedicated vector DB is usually worth it.
  • How dynamic?
      • Append-only or occasional updates → IVF or HNSW with periodic maintenance is fine.
      • Heavy writes/updates/deletes → prefer engines and configurations optimized for online updates (e.g., HNSW with good compaction, log-based storage).

2. Query complexity

  • If you only ever do “top-k nearest neighbors, no filters,” a library is fine.
  • If you need hybrid search, complex filters, aggregations, or multi-tenant isolation, pick a vector database that supports those features natively (Weaviate, Qdrant, etc.).

3. Operational posture

  • Can your team own and operate another distributed system?

  • If yes, self-hosted Qdrant or Weaviate might be great.

  • If no, consider a managed vector DB or a cloud search product with vector support.

  • Do you have strict data residency or privacy requirements?

  • If yes, VPC / on-prem or self-host may be non-negotiable.

4. Multimodal and future-proofing

  • Will you likely add image, code, or audio search soon?
      • If so, pick a DB that supports multiple vectors per object and is friendly to multimodal setups.

5. Reranking + RAG

  • If you’re doing serious RAG, plan for two-stage retrieval (vector DB + reranker) from the start.
  • Make sure your chosen DB integrates well with your reranking model stack and doesn’t become the latency bottleneck.

If you want a concrete shortlist to explore, the Weaviate “What is a Vector Database?” guide and the Qdrant fundamentals & quickstart docs are solid, vendor-backed intros that cover concepts and give you copy-pastable examples.


Turning this into practice (and where PractiqAI fits)

Understanding vector databases conceptually is step one; using them well is really about practice:

  • Designing schemas for real content.
  • Deciding where to use hybrid search vs pure dense.
  • Tuning k, index parameters, and reranker budgets.
  • Debugging weird retrieval failures.

PractiqAI’s whole philosophy is to turn this kind of knowledge into hands-on tasks you can actually fail and then master. Think of exercises like:

  • “Design a Qdrant collection for a multi-tenant support knowledge base.”
  • “Implement hybrid search over FAQs with a simple reranking step.”
  • “Plan a migration from one embedding model to another without downtime.”

Each task forces you to translate the theory from this article into concrete prompts, queries, and data models—so vector databases stop being magic boxes and start being just another tool you’re fluent with.


If you take one idea away from this article, make it this:

A vector database is infrastructure for meaning. Treat it with the same rigor you treat your transactional database—schema design, indexing, operations—and it will quietly power your AI systems instead of quietly sabotaging them.
