Weaviate
Quick Summary
Weaviate is a cloud-native, open-source vector database that uses state-of-the-art ML models to embed data. It is fast, flexible, and designed for production readiness, capable of performing a 10-NN nearest-neighbor search over millions of objects in milliseconds.
To learn more about leveraging Weaviate as your retrieval engine, visit this page.

You can easily evaluate your Weaviate retriever with DeepEval to find the best hyperparameters for your Weaviate engine. These parameters include `with_limit` (top-K) and `vectorizer` (embedding model), among many others.
You can quickly get started with Weaviate by running the following command in your CLI:
```bash
pip install weaviate-client
```
Setup Weaviate
To start using Weaviate, establish a connection to your local or cloud-hosted instance by initializing a Weaviate client and configuring authentication with your API key.
```python
import os

import weaviate

# Connect using the v3-style client API
client = weaviate.Client(
    url="http://localhost:8080",  # Change this if using Weaviate Cloud
    auth_client_secret=weaviate.AuthApiKey(os.getenv("WEAVIATE_API_KEY")),  # Set your API key
)
```
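Before defining any schema, it's worth confirming that the instance is reachable. A quick check, assuming the v3 client shown above:
```python
# Confirm the Weaviate instance is up before proceeding
assert client.is_ready(), "Weaviate instance is not ready"
```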
To enable efficient similarity search, define a Weaviate schema that stores documents with a `text` property for raw content and an associated vector for embeddings. Since Weaviate supports both internal and external vectorization, this schema is configured to use an external embedding model.
```python
...

# Define the schema
class_name = "Document"

if not client.schema.exists(class_name):
    schema = {
        "classes": [
            {
                "class": class_name,
                "vectorizer": "none",  # Using an external embedding model
                "properties": [
                    {"name": "text", "dataType": ["text"]},  # Stores chunk text
                ],
            }
        ]
    }
    client.schema.create(schema)
```
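As a quick sanity check, you can fetch the class definition back from the server; a one-liner using the v3 schema API:
```python
# Inspect the newly created class definition
print(client.schema.get(class_name))
```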
Before adding documents to Weaviate, convert text into vector representations using an embedding model. We'll be using `all-MiniLM-L6-v2` from `sentence_transformers`.
Using an external embedding model gives you the flexibility to choose the most suitable representation for your data, which can be important if your Weaviate engine is struggling to score well on metrics like Contextual Precision.
```python
...

# Load an embedding model
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Example document chunks
document_chunks = [
    "Weaviate is a cloud-native vector database for scalable AI search.",
    "Weaviate enables fast semantic search across millions of vectors.",
    "It integrates with external embedding models for custom vectorization.",
    ...
]

# Store chunks with embeddings
with client.batch as batch:
    for chunk in document_chunks:
        embedding = model.encode(chunk).tolist()  # Convert text to vector
        batch.add_data_object(
            {"text": chunk}, class_name=class_name, vector=embedding
        )
```
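Because the batch is flushed when the context manager exits, you can sanity-check ingestion immediately afterwards. A minimal sketch using the v3 aggregate API:
```python
# Count the stored objects to verify the chunks were ingested
result = client.query.aggregate(class_name).with_meta_count().do()
count = result["data"]["Aggregate"][class_name][0]["meta"]["count"]
print(f"Stored {count} objects in '{class_name}'")
```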
Evaluating Weaviate Retrieval
Once the Weaviate retriever is set up, we can begin evaluating its effectiveness in returning relevant contexts. This involves:
- Constructing a Test Case: define an `input` query that represents a typical search scenario and prepare the expected output. Then generate the `actual_output` for the given input and extract the retrieved context during generation.
- Evaluating the Test Case: simply run deepeval's `evaluate` function on your populated test case and selection of retriever metrics.
Preparing your Test Case
The first step in generating the `actual_output` from your RAG pipeline is retrieving the relevant `retrieval_context` from your Weaviate class based on the input query. Below is a function that encodes the query, searches for the top 3 most relevant vectors in Weaviate, and extracts the corresponding text from the retrieved results.
```python
...

def search(query):
    query_embedding = model.encode(query).tolist()  # Encode the query text
    result = (
        client.query.get("Document", ["text"])
        .with_near_vector({"vector": query_embedding})
        .with_limit(3)  # Top-K retrieval
        .do()
    )
    documents = result["data"]["Get"]["Document"]
    return [hit["text"] for hit in documents] if documents else None

query = "How does Weaviate work?"
retrieval_context = search(query)
```
Next, incorporate the retrieved context into your LLM's prompt template to generate a response.
prompt = """
Answer the user question based on the supporting context.
User Question:
{input}
Supporting Context:
{retrieval_context}
"""
actual_output = generate(prompt) # Replace with your LLM function
print(actual_output)
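`generate` above is a placeholder for whatever LLM you use. As an illustration, here is a minimal sketch that fills that role with OpenAI's chat completions API; the `gpt-4o-mini` model name and the `openai_client` setup are assumptions, so swap in your own provider as needed:
```python
from openai import OpenAI

openai_client = OpenAI()  # Assumes OPENAI_API_KEY is set in your environment

def generate(prompt: str) -> str:
    # Send the fully formatted prompt to the model and return its reply
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",  # Example model; use any chat model you have access to
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```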
With both the `actual_output` and `retrieval_context` generated, we now have all the necessary parameters to construct our test case:
```python
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input=query,
    actual_output=actual_output,
    retrieval_context=retrieval_context,
    expected_output="Weaviate is a powerful vector database for AI applications, optimized for efficient semantic retrieval.",
)
```
Before proceeding with the evaluation, let's examine the generated `actual_output`.
```
Weaviate is a cloud-native vector database that enables fast semantic search using vector embeddings and hybrid retrieval.
```
Running Evaluations
To evaluate an `LLMTestCase`, define the relevant retrieval metrics and pass them into the `evaluate` function along with the test case.
```python
from deepeval.metrics import (
    ContextualRecallMetric,
    ContextualPrecisionMetric,
    ContextualRelevancyMetric,
)
from deepeval import evaluate

...

contextual_recall = ContextualRecallMetric()
contextual_precision = ContextualPrecisionMetric()
contextual_relevancy = ContextualRelevancyMetric()

evaluate(
    [test_case],
    metrics=[contextual_recall, contextual_precision, contextual_relevancy],
)
```
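Each metric can also be run standalone if you want to inspect its score and reasoning for a single test case; a minimal sketch using DeepEval's `measure` API:
```python
# Run one metric on its own to inspect its score and reasoning
contextual_precision.measure(test_case)
print(contextual_precision.score)   # Float between 0 and 1
print(contextual_precision.reason)  # Natural-language explanation of the score
```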
Improving Weaviate Retrieval
Once you've evaluated your Weaviate retriever, it's time to analyze the results and fine-tune your retrieval pipeline. Below are example evaluation results from more test cases.
| Query | Contextual Precision | Contextual Recall | Contextual Relevancy |
|---|---|---|---|
| "How does Weaviate store vector data?" | 0.62 | 0.95 | 0.50 |
| "Explain Weaviate's indexing method." | 0.55 | 0.89 | 0.47 |
| "What makes Weaviate efficient for search?" | 0.68 | 0.91 | 0.53 |
- Contextual Precision is suboptimal → Some retrieved contexts might be too generic or off-topic.
- Contextual Recall is strong → Weaviate is retrieving enough relevant documents.
- Contextual Relevancy is inconsistent → The quality of retrieved documents varies across queries.
Each metric is impacted by specific retrieval hyperparameters. To understand how these affect your results, refer to this RAG evaluation guide.
Improving Retrieval Quality
To enhance retrieval performance, experiment with the following Weaviate hyperparameters:
- Tuning `with_limit` (top-K retrieval)
  - If precision is low, reduce `with_limit` to retrieve fewer but more accurate results.
  - If recall is too high with irrelevant results, adjust `with_limit` to balance quantity and quality.
- Optimizing `vectorizer` (embedding model)
  - Test alternative embedding models for better domain-specific retrieval:
    - `BAAI/bge-small-en` for ranking improvements.
    - `nomic-ai/nomic-embed-text-v1` for retrieving longer-form documents.
    - `msmarco-distilbert-base-v4` for passage retrieval.
- Implementing Hybrid Retrieval (Vector + BM25)
  - If Weaviate's pure vector search isn't retrieving precise matches, combining vector search with BM25 keyword retrieval can help; see the sketch after this list.
- Applying Advanced Filtering (`nearText`, `where` constraints)
  - Leverage metadata-based filtering to refine search results and remove less relevant chunks.
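Both hybrid retrieval and `where` filtering can be expressed directly with the v3 client. A minimal sketch, where the `alpha` value and the `Like` filter are illustrative choices rather than recommendations (filtered hybrid search also assumes a reasonably recent Weaviate version):
```python
# Hybrid retrieval: blend BM25 keyword scoring with vector similarity.
# alpha=0.5 weights the two equally (illustrative; tune for your data).
result = (
    client.query.get("Document", ["text"])
    .with_hybrid(query="How does Weaviate work?", alpha=0.5)
    .with_where({
        "path": ["text"],
        "operator": "Like",
        "valueText": "*Weaviate*",  # Illustrative content filter
    })
    .with_limit(3)
    .do()
)
```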
Experimenting With Different Configurations
To systematically test variations in retrieval settings, run multiple test cases and compare contextual metric scores.
```python
# Example of running multiple test cases with different retrieval settings.
# Assumes `search` has been extended to accept an embedding model name,
# and that `llm.generate` is your own generation function.
for vectorizer in ["all-MiniLM-L6-v2", "bge-small-en", "nomic-embed-text-v1"]:
    retrieval_context = search(query, vectorizer)
    test_case = LLMTestCase(
        input=query,
        actual_output=llm.generate(query, retrieval_context),
        retrieval_context=retrieval_context,
        expected_output="Weaviate is an optimized vector database for AI applications.",
    )
    evaluate(
        [test_case],
        metrics=[contextual_recall, contextual_precision, contextual_relevancy],
    )
```
Tracking Improvements
After tuning your Weaviate retriever, monitor improvements in Contextual Precision, Contextual Recall, and Contextual Relevancy to determine the best hyperparameter combination.
For structured tracking of retrieval performance and hyperparameter comparisons, Confident AI provides real-time evaluation analysis.