PGVector
Quick Summary
PGVector is an open-source PostgreSQL extension that enables semantic search and similarity-based retrieval directly within PostgreSQL, making it a scalable, SQL-native solution for LLM applications and RAG pipelines. Learn more about PGVector here.
When building your PGVector retriever, you'll have to define hyperparameters like LIMIT and the embedding model used to encode your text chunks. DeepEval can help you optimize these parameters by evaluating how well your PGVector retriever performs under different hyperparameter combinations:
To get started, install the pgvector Python client, the psycopg2 PostgreSQL adapter, and sentence-transformers (which we'll use later to generate embeddings) using the following command:
pip install psycopg2 pgvector sentence-transformers
Note that the pgvector extension itself must also be installed on your PostgreSQL server before it can be enabled.
Setup PGVector
To interact with a PostgreSQL database from Python, we'll use the psycopg2 library, a low-level database adapter that speaks the PostgreSQL client-server protocol. The connection it opens lets us execute SQL queries, fetch results, and manage transactions.
import psycopg2
import os

# Connect to PostgreSQL database
conn = psycopg2.connect(
    dbname="your_database",
    user="your_user",
    password=os.getenv("PG_PASSWORD"),  # Set in environment variable
    host="localhost",
    port="5432"
)
cursor = conn.cursor()
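The pgvector Python package also ships a psycopg2 helper that registers the vector type on a connection, so vector columns come back as numpy arrays and numpy-array parameters are adapted for you. This is optional for the rest of this guide, but a minimal sketch looks like this:

from pgvector.psycopg2 import register_vector

# Optional: register the pgvector type with psycopg2
register_vector(conn)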
Next, you'll need to create a table to store text chunks along with their corresponding embedding vectors. To enable vector operations, you'll need to activate the pgvector extension.
# Enable the pgvector extension (only needed once)
cursor.execute("CREATE EXTENSION IF NOT EXISTS vector;")

# Define table schema for text and embeddings
cursor.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        text TEXT,
        embedding vector(384)  -- 384-dimension vector (the output size of all-MiniLM-L6-v2)
    );
""")
conn.commit()
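For larger tables, you can optionally add an approximate index so similarity search doesn't scan every row. The sketch below uses pgvector's IVFFlat index with cosine distance; it's best created after your data is loaded, and the lists value here is just an assumed starting point.

# Optional: approximate index for faster cosine-distance search (create after loading data)
cursor.execute("""
    CREATE INDEX IF NOT EXISTS documents_embedding_idx
    ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
""")
conn.commit()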
Finally, you'll need to convert your document chunks into vectors using an embedding model and store them in PostgreSQL. We'll use all-MiniLM-L6-v2 from sentence-transformers to generate embeddings and insert them into the documents table.
# Load an embedding model
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Example document chunks
document_chunks = [
    "PGVector brings vector search to PostgreSQL.",
    "RAG improves AI-generated responses with retrieved context.",
    "Vector search enables high-precision semantic retrieval.",
    ...
]

# Store chunks with embeddings in PGVector
for chunk in document_chunks:
    embedding = model.encode(chunk).tolist()  # Convert text to vector
    cursor.execute(
        "INSERT INTO documents (text, embedding) VALUES (%s, %s);",
        (chunk, embedding)
    )
conn.commit()
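Inserting one row at a time is fine for a handful of chunks, but for larger corpora you can batch the work. Here is a sketch using psycopg2's execute_values helper, with all chunks embedded in a single model.encode call:

from psycopg2.extras import execute_values

# Embed all chunks at once, then insert them in a single batched statement
embeddings = model.encode(document_chunks).tolist()
execute_values(
    cursor,
    "INSERT INTO documents (text, embedding) VALUES %s",
    list(zip(document_chunks, embeddings))
)
conn.commit()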
PGVector functions as the retrieval engine in your RAG pipeline, efficiently fetching relevant document chunks to provide your LLM generator with grounded context. The diagram below illustrates how PGVector integrates into your RAG pipeline.

Evaluating PGVector Retrieval
Evaluating your PGVector retriever involves two key steps. First, you need to generate a test case by preparing an input query along with the expected LLM response. This input is then processed through your RAG pipeline to produce an LLMTestCase, which includes the query, actual output, expected output, and retrieved context.
Once the test case is created, the next step is to assess retrieval performance using a selection of evaluation metrics designed to measure the precision, recall, and relevance of the retrieved context.
Preparing your Test Case
Since retrieving relevant retrieval_context from your PGVector table is the first step in generating a response from your RAG pipeline, you need to perform a similarity search based on the input query. The function below encodes the input query into an embedding and retrieves the top-K (or LIMIT) most similar document chunks using cosine similarity.
...
def search(query, top_k=3):
    query_embedding = model.encode(query).tolist()
    cursor.execute("""
        SELECT text FROM documents
        ORDER BY embedding <=> %s::vector  -- <=> computes cosine distance
        LIMIT %s;
    """, (query_embedding, top_k))
    return [row[0] for row in cursor.fetchall()]
query = "How does PGVector work?"
retrieval_context = search(query)
Next, we'll insert the retrieval_context retrieved from the vector database into our prompt template to generate an LLM response, referred to as actual_output. This step finalizes the required parameters needed to construct an LLMTestCase.
from deepeval.test_case import LLMTestCase
...

prompt = """
Answer the user question based on the supporting context

User Question:
{input}

Supporting Context:
{retrieval_context}
"""

# Fill the template with the query and the retrieved chunks
formatted_prompt = prompt.format(input=query, retrieval_context="\n".join(retrieval_context))

actual_output = generate(formatted_prompt)  # hypothetical function, replace with your own LLM
print(actual_output)
# PGVector enables efficient vector search within PostgreSQL for AI applications.

test_case = LLMTestCase(
    input=query,
    actual_output=actual_output,
    retrieval_context=retrieval_context,
    expected_output="PGVector is an extension that brings efficient vector search capabilities to PostgreSQL.",
)
Running Evaluations
Before evaluating the LLMTestCase, we need to define deepeval metrics that measure the effectiveness of the PGVector retriever. Key retrieval metrics include contextual recall, contextual precision, and contextual relevancy, which assess how well the retrieved retrieval_context supports the expected output and stays relevant to the input query.
You can learn more about these contextual metrics and why they're relevant to retriever evaluation in this guide.
from deepeval import evaluate
from deepeval.metrics import (
    ContextualRecallMetric,
    ContextualPrecisionMetric,
    ContextualRelevancyMetric,
)
...

contextual_recall = ContextualRecallMetric()
contextual_precision = ContextualPrecisionMetric()
contextual_relevancy = ContextualRelevancyMetric()

evaluate(
    [test_case],
    metrics=[contextual_recall, contextual_precision, contextual_relevancy]
)
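If you prefer running retriever evaluations as part of a test suite, deepeval also provides assert_test, which fails a test whenever a metric scores below its threshold. A minimal sketch, reusing the test case and metrics defined above:

from deepeval import assert_test

def test_pgvector_retrieval():
    # Fails the test if any metric falls below its threshold
    assert_test(test_case, [contextual_recall, contextual_precision, contextual_relevancy])

You can then execute the file with deepeval's test runner (deepeval test run).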
Improving PGVector Retrieval
After running multiple test cases, let's assume that the Contextual Precision score is lower than expected. This suggests that while our retriever is fetching relevant contexts, some of them may not be the best match for the query, introducing noise into the response.
Key Findings
| Query | Contextual Precision Score | Contextual Recall Score |
| --- | --- | --- |
| "How does PGVector store embeddings?" | 0.42 | 0.91 |
| "Explain PGVector’s similarity search." | 0.38 | 0.87 |
| "What makes PGVector efficient for RAG?" | 0.40 | 0.85 |
Addressing Low Precision
Since precision measures how well the retrieved contexts align with the query, a lower score often means that some retrieved results are not as relevant as they should be. Possible improvements include:
Using a More Domain-Specific Embedding Model
If your use case involves technical documentation, a general-purpose model like all-MiniLM-L6-v2 may not be ideal. Consider testing models such as the following; a short sketch of swapping models comes after this list.
- BAAI/bge-small-en for better retrieval ranking.
- sentence-transformers/msmarco-distilbert-base-v4 for dense passage retrieval.
- nomic-ai/nomic-embed-text-v1 for handling longer text chunks.
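If you swap models, keep in mind that the embedding column's dimension has to match the new model's output size; the vector(384) column defined earlier won't accept, say, 768-dimension vectors. A rough sketch of switching models and resizing the column (the model name is just an example, check its model card for the actual dimension):

# Sketch: switch embedding models; the column dimension must match the model's output size
new_model = SentenceTransformer("BAAI/bge-small-en")  # example model, verify its dimension
new_dim = new_model.get_sentence_embedding_dimension()

cursor.execute("TRUNCATE documents;")  # old embeddings aren't comparable across models
cursor.execute(f"ALTER TABLE documents ALTER COLUMN embedding TYPE vector({new_dim});")
conn.commit()

# Re-embed and re-insert document_chunks with new_model, as shown earlier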
Optimizing Retrieval Parameters
- Adjust LIMIT in your retrieval query to control the number of retrieved results.
- Adjust
Next Steps
After refining your retrieval strategy—whether by adjusting embedding models or tuning retrieval parameters—it's crucial to generate new test cases and reassess performance. Focus on Contextual Precision, as improvements here indicate a more accurate and relevant retrieval process.
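One way to do that re-assessment, sketched below on the assumption that you maintain a small set of (query, expected_output) pairs, is to sweep a few LIMIT (top_k) values, rebuild the test cases for each setting, and compare the contextual metric scores:

# Sketch: re-run the evaluation for several LIMIT (top_k) settings and compare scores
eval_queries = [
    ("How does PGVector work?",
     "PGVector is an extension that brings efficient vector search capabilities to PostgreSQL."),
    # add more (query, expected_output) pairs here
]

for top_k in [1, 3, 5]:
    test_cases = []
    for q, expected_output in eval_queries:
        retrieval_context = search(q, top_k=top_k)
        actual_output = generate(prompt.format(  # same hypothetical generator as before
            input=q, retrieval_context="\n".join(retrieval_context)
        ))
        test_cases.append(LLMTestCase(
            input=q,
            actual_output=actual_output,
            expected_output=expected_output,
            retrieval_context=retrieval_context,
        ))
    print(f"Results for top_k={top_k}")
    evaluate(test_cases, metrics=[contextual_recall, contextual_precision, contextual_relevancy])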
For systematic retrieval evaluation and embedding model comparisons, use Confident AI.