Chroma
Quick Summary
Chroma is one of the most popular open-source AI application databases. It supports many retrieval features, such as embedding storage, vector search, document storage, metadata filtering, and multi-modal retrieval.
DeepEval allows you to easily evaluate and optimize your Chroma retriever by tuning hyperparameters like n_results (more commonly known as top-K) and the embedding model used in your Chroma retrieval pipeline.
To get started, install Chroma through the CLI using the following command:
pip install chromadb
To learn more about using Chroma for your RAG pipeline, visit this page. The diagram below illustrates how you can utilize Chroma as the entire retrieval pipeline for your LLM application.
Setup Chroma
To get started with Chroma, initialize a persistent client and create a collection to store your documents. The collection acts as a vector database for storing and retrieving embeddings, while the persistent client ensures data is retained across sessions.
import chromadb
# Initialize Chroma client
client = chromadb.PersistentClient(path="./chroma_db")
# Create or load a collection
collection = client.get_or_create_collection(name="rag_documents")
Next, define an embedding model (we'll use sentence_transformers) to convert your document chunks into vectors before adding them to your Chroma collection, storing the original chunk text as metadata.
...
# Load an embedding model
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
# Example document chunks
document_chunks = [
    "Chroma is an open-source vector database for efficient embedding retrieval.",
    "It enables fast semantic search using vector similarity.",
    "Chroma retrieves relevant data with cosine similarity.",
    ...
]

# Store chunks with embeddings in Chroma
for i, chunk in enumerate(document_chunks):
    embedding = model.encode(chunk).tolist()  # Convert text to vector
    collection.add(
        ids=[str(i)],                 # Unique ID for each document chunk
        embeddings=[embedding],       # Vector representation
        metadatas=[{"text": chunk}]   # Store original text as metadata
    )
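As a quick sanity check, you can confirm the chunks were indexed by counting the items in the collection (count() is part of Chroma's collection API):

print(collection.count())  # Should match the number of document chunks added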
You'll be querying this Chroma collection during generation to retrieve relevant contexts based on the user input, before passing them along with the input into your LLM's prompt template. Chroma ranks chunks by squared L2 distance by default; to use cosine similarity instead, set the collection's hnsw:space metadata when the collection is first created.
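For example, here's a minimal sketch of creating the collection with cosine similarity instead of the default distance function (this replaces the get_or_create_collection call above, since the distance function must be chosen when the collection is first created):

# Create a collection that ranks results by cosine similarity
collection = client.get_or_create_collection(
    name="rag_documents",
    metadata={"hnsw:space": "cosine"}  # "l2" (default), "ip", or "cosine"
)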
Evaluating Chroma Retrieval
To evaluate your Chroma retriever, you'll first need to prepare an input query and generate a response from your RAG pipeline in order to create an LLMTestCase. You'll also need to extract the contexts retrieved from your Chroma collection during generation and prepare the expected LLM response to complete the LLMTestCase.
By default, input and actual_output are required for all metrics. However, retrieval_context, context, and expected_output are optional, and different metrics may or may not require additional parameters. To check the specific requirements, visit the metrics section.
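For reference, here's a minimal sketch of an LLMTestCase containing only the required parameters, with the optional retrieval parameters shown commented out (the string values are placeholders):

from deepeval.test_case import LLMTestCase

# input and actual_output are required for every metric
minimal_test_case = LLMTestCase(
    input="How does Chroma work?",
    actual_output="Chroma stores embeddings and retrieves them by vector similarity.",
    # retrieval_context=["..."],  # optional: needed by the contextual (retriever) metrics
    # expected_output="...",      # optional: needed by contextual precision and recall
)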
After you've prepared your LLMTestCase, evaluating your Chroma retriever is as easy as passing the test case, along with your selection of metrics, into DeepEval's evaluate function.
Preparing your Test Case
To prepare our test case, we'll use "How does Chroma work?" as our input. Before generating a response from your RAG pipeline, you'll first need to retrieve the relevant context using a search function. The search function in the example below embeds the input query, then retrieves the three most relevant text chunks (n_results=3) from our Chroma collection.
...
def search(query):
    query_embedding = model.encode(query).tolist()
    res = collection.query(
        query_embeddings=[query_embedding],
        n_results=3  # Retrieve top-K matches
    )
    # Return the text of every retrieved chunk as a list of strings
    return [metadata["text"] for metadata in res["metadatas"][0]]

query = "How does Chroma work?"
retrieval_context = search(query)
Next, we'll pass the retrieved context from our Chroma collection into the LLM's prompt template to generate the final response.
...
prompt = """
Answer the user question based on the supporting context.

User Question:
{input}

Supporting Context:
{retrieval_context}
"""

# Fill in the template with the user input and the retrieved chunks
prompt = prompt.format(input=query, retrieval_context="\n".join(retrieval_context))

# The expected ideal answer we've prepared for this input
expected_output = "Chroma is an open-source vector database that enables fast retrieval using cosine similarity."

actual_output = generate(prompt)  # Replace with your LLM generation function
print(actual_output)
print(expected_output)
Printing the actual_output generated by our RAG pipeline yields the following example:
Chroma is a lightweight vector database designed for AI applications, enabling fast semantic retrieval.
Let's compare this to the expected_output we've prepared:
Chroma is an open-source vector database that enables fast retrieval using cosine similarity.
With all the elements ready, we'll create an LLMTestCase by providing the input and expected output, along with the actual output and retrieved context.
from deepeval.test_case import LLMTestCase
...
test_case = LLMTestCase(
    input=query,
    actual_output=actual_output,
    retrieval_context=retrieval_context,
    expected_output=expected_output
)
Running Evaluations
To begin running evaluations, we'll need to define metrics relevant to our Chroma retriever. These include ContextualRecallMetric, ContextualPrecisionMetric, and ContextualRelevancyMetric, which specifically evaluate RAG retrievers.
To learn more about how these metrics are calculated and why they're relevant to retrievers, visit the individual metric pages.
from deepeval.metrics import (
    ContextualPrecisionMetric,
    ContextualRecallMetric,
    ContextualRelevancyMetric,
)

contextual_precision = ContextualPrecisionMetric()
contextual_recall = ContextualRecallMetric()
contextual_relevancy = ContextualRelevancyMetric()
To run evaluations, simply pass the test case you've prepared into the evaluate function, along with the retriever metrics you defined.
from deepeval import evaluate
...
evaluate(
    [test_case],
    metrics=[contextual_recall, contextual_precision, contextual_relevancy]
)
Improving Chroma Retrieval
Hypothetically, we've run multiple inputs and prepared several test cases, consistently observing that the Contextual Relevancy score is below the required threshold.
| Input | Contextual Relevancy Score | Contextual Recall Score |
|---|---|---|
| "How does Chroma work?" | 0.45 | 0.85 |
| "What is the retrieval process in Chroma?" | 0.43 | 0.92 |
| "Explain Chroma's vector database." | 0.55 | 0.67 |
This suggests that you may need to adjust the length of each document chunk or tweak n_results to retrieve more relevant contexts from your Chroma collection, since Contextual Relevancy is affected by both the content of the retrieved text chunks and the top-K selection.
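As a minimal sketch of re-chunking, you could re-index your documents as smaller chunks before querying again (the split_into_chunks helper and its max_words value below are illustrative assumptions, not part of Chroma's API):

# Hypothetical raw documents you want to index
documents = [
    "Chroma is an open-source vector database for efficient embedding retrieval. ...",
]

def split_into_chunks(text, max_words=50):
    # Naive word-based splitter; swap in your preferred chunking strategy
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

# Re-index each document as smaller chunks so each retrieved context stays focused
for doc_id, doc in enumerate(documents):
    for chunk_id, chunk in enumerate(split_into_chunks(doc)):
        collection.add(
            ids=[f"{doc_id}-{chunk_id}"],
            embeddings=[model.encode(chunk).tolist()],
            metadatas=[{"text": chunk}],
        )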
If you're curious about which metrics evaluate which specific retrieval parameters, check out this guide.
Depending on the failing scores in your retriever, you'll want to experiment with different parameters (e.g., n_results, the embedding model, etc.) in your Chroma retrieval pipeline until you're satisfied with the results. This can be as simple as writing a for loop that runs evaluations multiple times:
...
def search(query, n_results):
    query_embedding = model.encode(query).tolist()
    res = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results  # Retrieve top-K matches
    )
    # Return the text of every retrieved chunk as a list of strings
    return [metadata["text"] for metadata in res["metadatas"][0]]
# Define input and expected output
...

# Iterate over different top-K values
for top_k in [3, 5, 7]:
    retrieval_context = search(input_query, top_k)

    # Define test case
    ...

    # Evaluate the retrieval quality
    evaluate(
        [test_case],
        metrics=[contextual_recall, contextual_precision, contextual_relevancy]
    )
If you need a systematic way to analyze your retriever and compare the effects of changing Chroma hyperparameters side by side, you'll want to log in to Confident AI.
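As a sketch, assuming you've connected via the deepeval login CLI command and your DeepEval version supports the hyperparameters argument of evaluate, you can tag each run with the hyperparameters you're experimenting with so they can be compared side by side on Confident AI (the dictionary keys below are arbitrary labels):

from deepeval import evaluate

evaluate(
    [test_case],
    metrics=[contextual_recall, contextual_precision, contextual_relevancy],
    hyperparameters={
        "n_results": top_k,                     # top-K value used for this run
        "embedding model": "all-MiniLM-L6-v2",  # embedding model used by the retriever
    },
)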