PGVector
Quick Summary
PGVector is an open-source PostgreSQL extension that enables semantic search and similarity-based retrieval directly within PostgreSQL, making it a scalable, SQL-native solution for LLM applications and RAG pipelines. Learn more about PGVector here.
When building your PGVector retriever, you'll have to define hyperparameters like LIMIT and the embedding model used to encode your text chunks. DeepEval can help you optimize these parameters by evaluating how well your PGVector retriever performs under different hyperparameter combinations:
To get started, install the pgvector Python client, the psycopg2 PostgreSQL adapter, and sentence-transformers (which we'll use later to generate embeddings) using the following command:
pip install psycopg2 pgvector sentence-transformers
Note that the pgvector extension itself must also be installed on your PostgreSQL server before it can be enabled.
Setup PGVector
To interact with a PostgreSQL database from Python, we'll use the psycopg2 library, a low-level database adapter that speaks the PostgreSQL client-server protocol. The connection it opens lets us execute SQL queries, fetch results, and manage transactions.
import psycopg2
import os

# Connect to PostgreSQL database
conn = psycopg2.connect(
    dbname="your_database",
    user="your_user",
    password=os.getenv("PG_PASSWORD"),  # Set in environment variable
    host="localhost",
    port="5432"
)
cursor = conn.cursor()
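The pgvector Python package also ships a psycopg2 helper that registers the vector type on a connection, so vector columns come back as numpy arrays and numpy-array parameters are adapted for you. This is optional for the rest of this guide, but a minimal sketch looks like this:

from pgvector.psycopg2 import register_vector

# Optional: register the pgvector type with psycopg2
register_vector(conn)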
Next, you'll need to create a table to store text chunks along with their corresponding embedding vectors. To enable vector operations, you'll need to activate the pgvector extension.
# Enable the pgvector extension (only needed once)
cursor.execute("CREATE EXTENSION IF NOT EXISTS vector;")

# Define table schema for text and embeddings
cursor.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        text TEXT,
        embedding vector(384)  -- 384-dimension vector (the output size of all-MiniLM-L6-v2)
    );
""")
conn.commit()
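For larger tables, you can optionally add an approximate index so similarity search doesn't scan every row. The sketch below uses pgvector's IVFFlat index with cosine distance; it's best created after your data is loaded, and the lists value here is just an assumed starting point.

# Optional: approximate index for faster cosine-distance search (create after loading data)
cursor.execute("""
    CREATE INDEX IF NOT EXISTS documents_embedding_idx
    ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
""")
conn.commit()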
Finally, you'll need to convert your document chunks into vectors using an embedding model and store them in PostgreSQL. We'll use all-MiniLM-L6-v2 from sentence-transformers to generate embeddings and insert them into the documents table.
# Load an embedding model
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Example document chunks
document_chunks = [
    "PGVector brings vector search to PostgreSQL.",
    "RAG improves AI-generated responses with retrieved context.",
    "Vector search enables high-precision semantic retrieval.",
    ...
]

# Store chunks with embeddings in PGVector
for chunk in document_chunks:
    embedding = model.encode(chunk).tolist()  # Convert text to vector
    cursor.execute(
        "INSERT INTO documents (text, embedding) VALUES (%s, %s);",
        (chunk, embedding)
    )
conn.commit()
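Inserting one row at a time is fine for a handful of chunks, but for larger corpora you can batch the work. Here is a sketch using psycopg2's execute_values helper, with all chunks embedded in a single model.encode call:

from psycopg2.extras import execute_values

# Embed all chunks at once, then insert them in a single batched statement
embeddings = model.encode(document_chunks).tolist()
execute_values(
    cursor,
    "INSERT INTO documents (text, embedding) VALUES %s",
    list(zip(document_chunks, embeddings))
)
conn.commit()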
PGVector functions as the retrieval engine in your RAG pipeline, efficiently fetching relevant document chunks to provide your LLM generator with grounded context. The diagram below illustrates how PGVector integrates into your RAG pipeline.

Evaluating PGVector Retrieval
Evaluating your PGVector retriever involves two key steps. First, you need to generate a test case by preparing an input query along with the expected LLM response. This input is then processed through your RAG pipeline to produce an LLMTestCase, which includes the query, actual output, expected output, and retrieved context.
Once the test case is created, the next step is to assess retrieval performance using a selection of evaluation metrics designed to measure the precision, recall, and relevance of the retrieved context.
Preparing your Test Case
Since retrieving relevant retrieval_context from your PGVector table is the first step in generating a response from your RAG pipeline, you need to perform a similarity search based on the input query. The function below encodes the input query into an embedding and retrieves the top-K (or LIMIT) most similar document chunks using cosine similarity.
...
def search(query, top_k=3):
    query_embedding = model.encode(query).tolist()
    cursor.execute("""
        SELECT text FROM documents
        ORDER BY embedding <=> %s::vector  -- <=> computes cosine distance
        LIMIT %s;
    """, (query_embedding, top_k))
    return [row[0] for row in cursor.fetchall()]
query = "How does PGVector work?"
retrieval_context = search(query)
Next, we'll insert the retrieval_context retrieved from the vector database into our prompt template to generate an LLM response, referred to as actual_output. This step finalizes the required parameters needed to construct an LLMTestCase.
from deepeval.test_case import LLMTestCase
...

prompt = """
Answer the user question based on the supporting context

User Question:
{input}

Supporting Context:
{retrieval_context}
"""

# Fill the template with the query and the retrieved chunks
formatted_prompt = prompt.format(input=query, retrieval_context="\n".join(retrieval_context))

actual_output = generate(formatted_prompt)  # hypothetical function, replace with your own LLM
print(actual_output)
# PGVector enables efficient vector search within PostgreSQL for AI applications.

test_case = LLMTestCase(
    input=query,
    actual_output=actual_output,
    retrieval_context=retrieval_context,
    expected_output="PGVector is an extension that brings efficient vector search capabilities to PostgreSQL.",
)
Running Evaluations
Before evaluating the LLMTestCase, we need to define deepeval metrics that measure the effectiveness of the PGVector retriever. Key retrieval metrics include contextual recall, contextual precision, and contextual relevancy, which assess how well the retrieved retrieval_context supports the expected output and stays relevant to the input query.
You can learn more about these contextual metrics and why they're relevant to retriever evaluation in this guide.
from deepeval import evaluate
from deepeval.metrics import (
    ContextualRecallMetric,
    ContextualPrecisionMetric,
    ContextualRelevancyMetric,
)
...

contextual_recall = ContextualRecallMetric()
contextual_precision = ContextualPrecisionMetric()
contextual_relevancy = ContextualRelevancyMetric()

evaluate(
    [test_case],
    metrics=[contextual_recall, contextual_precision, contextual_relevancy]
)
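If you prefer running retriever evaluations as part of a test suite, deepeval also provides assert_test, which fails a test whenever a metric scores below its threshold. A minimal sketch, reusing the test case and metrics defined above:

from deepeval import assert_test

def test_pgvector_retrieval():
    # Fails the test if any metric falls below its threshold
    assert_test(test_case, [contextual_recall, contextual_precision, contextual_relevancy])

You can then execute the file with deepeval's test runner (deepeval test run).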
Improving PGVector Retrieval
After running multiple test cases, let's assume that the Contextual Precision score is lower than expected. This suggests that while our retriever is fetching relevant contexts, some of them may not be the best match for the query, introducing noise into the response.
Key Findings
| Query | Contextual Precision Score | Contextual Recall Score |
| --- | --- | --- |
| "How does PGVector store embeddings?" | 0.42 | 0.91 |
| "Explain PGVector’s similarity search." | 0.38 | 0.87 |
| "What makes PGVector efficient for RAG?" | 0.40 | 0.85 |
Addressing Low Precision
Since precision measures how well the retrieved contexts align with the query, a lower score often means that some retrieved results are not as relevant as they should be. Possible improvements include:
Using a More Domain-Specific Embedding Model
If your use case involves technical documentation, a general-purpose model like all-MiniLM-L6-v2 may not be ideal. Consider testing models such as the following; a short sketch of swapping models comes after this list.
- BAAI/bge-small-en for better retrieval ranking.
- sentence-transformers/msmarco-distilbert-base-v4 for dense passage retrieval.
- nomic-ai/nomic-embed-text-v1 for handling longer text chunks.
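If you swap models, keep in mind that the embedding column's dimension has to match the new model's output size; the vector(384) column defined earlier won't accept, say, 768-dimension vectors. A rough sketch of switching models and resizing the column (the model name is just an example, check its model card for the actual dimension):

# Sketch: switch embedding models; the column dimension must match the model's output size
new_model = SentenceTransformer("BAAI/bge-small-en")  # example model, verify its dimension
new_dim = new_model.get_sentence_embedding_dimension()

cursor.execute("TRUNCATE documents;")  # old embeddings aren't comparable across models
cursor.execute(f"ALTER TABLE documents ALTER COLUMN embedding TYPE vector({new_dim});")
conn.commit()

# Re-embed and re-insert document_chunks with new_model, as shown earlier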
Optimizing Retrieval Parameters
- Adjust LIMIT in your retrieval query to control the number of retrieved results.
- Adjust
Next Steps
After refining your retrieval strategy—whether by adjusting embedding models or tuning retrieval parameters—it's crucial to generate new test cases and reassess performance. Focus on Contextual Precision, as improvements here indicate a more accurate and relevant retrieval process.
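One way to do that re-assessment, sketched below on the assumption that you maintain a small set of (query, expected_output) pairs, is to sweep a few LIMIT (top_k) values, rebuild the test cases for each setting, and compare the contextual metric scores:

# Sketch: re-run the evaluation for several LIMIT (top_k) settings and compare scores
eval_queries = [
    ("How does PGVector work?",
     "PGVector is an extension that brings efficient vector search capabilities to PostgreSQL."),
    # add more (query, expected_output) pairs here
]

for top_k in [1, 3, 5]:
    test_cases = []
    for q, expected_output in eval_queries:
        retrieval_context = search(q, top_k=top_k)
        actual_output = generate(prompt.format(  # same hypothetical generator as before
            input=q, retrieval_context="\n".join(retrieval_context)
        ))
        test_cases.append(LLMTestCase(
            input=q,
            actual_output=actual_output,
            expected_output=expected_output,
            retrieval_context=retrieval_context,
        ))
    print(f"Results for top_k={top_k}")
    evaluate(test_cases, metrics=[contextual_recall, contextual_precision, contextual_relevancy])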
For systematic retrieval evaluation and embedding model comparisons, use Confident AI.