Vector Similarity#

Vectors (also called "embeddings") represent an AI model’s impression (or understanding) of a piece of unstructured data such as text, images, audio, or video. Vector Similarity Search (VSS) is the process of finding vectors in the vector database that are similar to a given query vector. Popular VSS uses include recommendation systems, image and video search, document retrieval, and question answering.
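As a small illustration of what "similar" means here, the sketch below computes the cosine similarity between two toy vectors in plain Python. This is only the underlying idea; in the examples that follow, Redis performs the comparison server-side using the distance metric configured on the index (COSINE in this section), and the reported score is a distance, so lower values mean closer matches.

import math

a = [0.1, 0.9, 0.0]   # toy "embedding" 1
b = [0.2, 0.8, 0.1]   # toy "embedding" 2

dot = sum(x * y for x, y in zip(a, b))
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(x * x for x in b))

cos_sim = dot / (norm_a * norm_b)   # 1.0 = same direction, 0.0 = orthogonal
cos_dist = 1.0 - cos_sim            # what a COSINE index reports as the score
print(cos_sim, cos_dist)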

Index Creation#

Before doing vector search, first define the schema and create an index.

[1]:
import redis
from redis.commands.search.field import TagField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)

INDEX_NAME = "index"                              # Vector Index Name
DOC_PREFIX = "doc:"                               # RediSearch Key Prefix for the Index

def create_index(vector_dimensions: int):
    try:
        # check to see if the index already exists
        r.ft(INDEX_NAME).info()
        print("Index already exists!")
    except redis.exceptions.ResponseError:
        # index does not exist yet; define the schema
        schema = (
            TagField("tag"),                       # Tag Field Name
            VectorField("vector",                  # Vector Field Name
                "FLAT", {                          # Vector Index Type: FLAT or HNSW
                    "TYPE": "FLOAT32",             # FLOAT32 or FLOAT64
                    "DIM": vector_dimensions,      # Number of Vector Dimensions
                    "DISTANCE_METRIC": "COSINE",   # Vector Search Distance Metric
                }
            ),
        )

        # index definition
        definition = IndexDefinition(prefix=[DOC_PREFIX], index_type=IndexType.HASH)

        # create the index
        r.ft(INDEX_NAME).create_index(fields=schema, definition=definition)
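The example above uses a FLAT index, which performs brute-force (exact) search and works well for small datasets. For larger datasets, the same schema can swap in an HNSW index for approximate nearest-neighbor search. A minimal sketch of that variation (same field name as above; M and EF_CONSTRUCTION values are illustrative, not tuned):

from redis.commands.search.field import VectorField

# HNSW variant of the vector field (approximate nearest-neighbor search)
hnsw_vector_field = VectorField("vector",
    "HNSW", {
        "TYPE": "FLOAT32",
        "DIM": 1536,                  # must match the embeddings being stored
        "DISTANCE_METRIC": "COSINE",
        "M": 16,                      # max graph connections per node
        "EF_CONSTRUCTION": 200,       # build-time accuracy/speed trade-off
    }
)

This field object would replace the FLAT VectorField in the schema tuple passed to create_index.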

We’ll start by working with vectors that have 1536 dimensions.

[2]:
# define vector dimensions
VECTOR_DIMENSIONS = 1536

# create the index
create_index(vector_dimensions=VECTOR_DIMENSIONS)

Adding Vectors to Redis#

Next, we add vectors (dummy data) to Redis using hset. The search index listens to keyspace notifications and automatically indexes any HASH written under a key that matches DOC_PREFIX; a quick check after the write (see the snippet following the cell below) confirms this.

[ ]:
%pip install numpy
[3]:
import numpy as np
[4]:
# instantiate a redis pipeline
pipe = r.pipeline()

# define some dummy data
objects = [
    {"name": "a", "tag": "foo"},
    {"name": "b", "tag": "foo"},
    {"name": "c", "tag": "bar"},
]

# write data
for obj in objects:
    # define key
    key = f"doc:{obj['name']}"
    # create a random "dummy" vector
    obj["vector"] = np.random.rand(VECTOR_DIMENSIONS).astype(np.float32).tobytes()
    # HSET
    pipe.hset(key, mapping=obj)

res = pipe.execute()
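To confirm the hashes written above were indexed, the index metadata can be inspected via FT.INFO. A minimal check, assuming the index and data from the cells above:

# FT.INFO reports index statistics, including the number of indexed documents
info = r.ft(INDEX_NAME).info()
print(info["num_docs"])   # expected: 3 for the three dummy docs written above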

Searching#

You can run VSS queries with the .ft(...).search(...) command. To use a VSS query, you must specify the query dialect with .dialect(2).

Redis supports two types of vector queries: KNN and Range. Hybrid queries work in either setting and combine traditional filters with VSS.

KNN Queries#

KNN queries find the top K vectors most similar to a given query vector.

[5]:
query = (
    Query("*=>[KNN 2 @vector $vec as score]")
     .sort_by("score")
     .return_fields("id", "score")
     .paging(0, 2)
     .dialect(2)
)

query_params = {
    "vec": np.random.rand(VECTOR_DIMENSIONS).astype(np.float32).tobytes()
}
r.ft(INDEX_NAME).search(query, query_params).docs
[5]:
[Document {'id': 'doc:b', 'payload': None, 'score': '0.2376562953'},
 Document {'id': 'doc:c', 'payload': None, 'score': '0.240063905716'}]
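Note that score here is a vector distance, not a similarity: with the COSINE metric configured above, lower values mean closer matches, which is why the results are sorted ascending. If a similarity-style number is easier to reason about, a small conversion sketch (reusing the query and params above) is:

# cosine distance is commonly defined as 1 - cosine similarity,
# so converting back is a one-liner per result
docs = r.ft(INDEX_NAME).search(query, query_params).docs
for doc in docs:
    print(doc.id, 1 - float(doc.score))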

Range Queries#

Range queries filter results by the distance between a stored vector field and the query vector, returning only matches within a pre-defined threshold (radius).

[6]:
query = (
    Query("@vector:[VECTOR_RANGE $radius $vec]=>{$YIELD_DISTANCE_AS: score}")
     .sort_by("score")
     .return_fields("id", "score")
     .paging(0, 3)
     .dialect(2)
)

# Find all vectors within 0.8 of the query vector
query_params = {
    "radius": 0.8,
    "vec": np.random.rand(VECTOR_DIMENSIONS).astype(np.float32).tobytes()
}
r.ft(INDEX_NAME).search(query, query_params).docs
[6]:
[Document {'id': 'doc:a', 'payload': None, 'score': '0.243115246296'},
 Document {'id': 'doc:c', 'payload': None, 'score': '0.24981123209'},
 Document {'id': 'doc:b', 'payload': None, 'score': '0.251443207264'}]

See additional Range Query examples in this Jupyter notebook.

Hybrid Queries#

Hybrid queries combine traditional filters (numeric, tag, text) and VSS in a single Redis command.

[7]:
query = (
    Query("(@tag:{ foo })=>[KNN 2 @vector $vec as score]")
     .sort_by("score")
     .return_fields("id", "tag", "score")
     .paging(0, 2)
     .dialect(2)
)

query_params = {
    "vec": np.random.rand(VECTOR_DIMENSIONS).astype(np.float32).tobytes()
}
r.ft(INDEX_NAME).search(query, query_params).docs
[7]:
[Document {'id': 'doc:b', 'payload': None, 'score': '0.24422544241', 'tag': 'foo'},
 Document {'id': 'doc:a', 'payload': None, 'score': '0.259926855564', 'tag': 'foo'}]

See additional Hybrid Query examples in this Jupyter notebook.

Vector Creation and Storage Examples#

The above examples use dummy data as vectors. In practice, most use cases rely on production-grade AI models to create embeddings. Below we take some sample text data, pass it to the OpenAI and Cohere APIs respectively, and write the resulting embeddings to Redis.

[8]:
texts = [
    "Today is a really great day!",
    "The dog next door barks really loudly.",
    "My cat escaped and got out before I could close the door.",
    "It's supposed to rain and thunder tomorrow."
]

OpenAI Embeddings#

Before working with OpenAI Embeddings, we clean up our existing search index and create a new one.

[9]:
# delete index
r.ft(INDEX_NAME).dropindex(delete_documents=True)

# make a new one
create_index(vector_dimensions=VECTOR_DIMENSIONS)
[ ]:
%pip install openai
[10]:
import openai

# set your OpenAI API key - get one at https://platform.openai.com
openai.api_key = "YOUR OPENAI API KEY"
[11]:
# Create Embeddings with OpenAI text-embedding-ada-002
# https://openai.com/blog/new-and-improved-embedding-model
response = openai.Embedding.create(input=texts, engine="text-embedding-ada-002")
embeddings = np.array([r["embedding"] for r in response["data"]], dtype=np.float32)

# Write to Redis
pipe = r.pipeline()
for i, embedding in enumerate(embeddings):
    pipe.hset(f"doc:{i}", mapping = {
        "vector": embedding.tobytes(),
        "content": texts[i],
        "tag": "openai"
    })
res = pipe.execute()
[12]:
embeddings
[12]:
array([[ 0.00509819,  0.0010873 , -0.00228475, ..., -0.00457579,
         0.01329307, -0.03167175],
       [-0.00357223, -0.00550784, -0.01314328, ..., -0.02915693,
         0.01470436, -0.01367203],
       [-0.01284631,  0.0034875 , -0.01719686, ..., -0.01537451,
         0.01953256, -0.05048691],
       [-0.01145045, -0.00785481,  0.00206323, ..., -0.02070181,
        -0.01629098, -0.00300795]], dtype=float32)
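A common pitfall when moving from dummy vectors to real embeddings is a mismatch between the stored bytes and the index definition: for FLOAT32, each serialized vector must be exactly DIM * 4 bytes. A quick sanity check on the array above, assuming VECTOR_DIMENSIONS is still 1536 at this point:

# each FLOAT32 vector must serialize to DIM * 4 bytes to match the index
assert embeddings.dtype == np.float32
assert embeddings.shape[1] == VECTOR_DIMENSIONS
assert len(embeddings[0].tobytes()) == VECTOR_DIMENSIONS * 4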

Search with OpenAI Embeddings#

Now that we’ve created embeddings with OpenAI, we can perform a search to find documents relevant to some input text.

[13]:
text = "animals"

# create query embedding
response = openai.Embedding.create(input=[text], engine="text-embedding-ada-002")
query_embedding = np.array([r["embedding"] for r in response["data"]], dtype=np.float32)[0]

query_embedding
[13]:
array([ 0.00062901, -0.0070723 , -0.00148926, ..., -0.01904645,
       -0.00436092, -0.01117944], dtype=float32)
[14]:
# query for similar documents that have the openai tag
query = (
    Query("(@tag:{ openai })=>[KNN 2 @vector $vec as score]")
     .sort_by("score")
     .return_fields("content", "tag", "score")
     .paging(0, 2)
     .dialect(2)
)

query_params = {"vec": query_embedding.tobytes()}
r.ft(INDEX_NAME).search(query, query_params).docs

# the two pieces of content related to animals are returned
[14]:
[Document {'id': 'doc:1', 'payload': None, 'score': '0.214349985123', 'content': 'The dog next door barks really loudly.', 'tag': 'openai'},
 Document {'id': 'doc:2', 'payload': None, 'score': '0.237052619457', 'content': 'My cat escaped and got out before I could close the door.', 'tag': 'openai'}]

Cohere Embeddings#

Before working with Cohere Embeddings, we clean up our existing search index and create a new one.

[15]:
# delete index
r.ft(INDEX_NAME).dropindex(delete_documents=True)

# make a new one for cohere embeddings (1024 dimensions)
VECTOR_DIMENSIONS = 1024
create_index(vector_dimensions=VECTOR_DIMENSIONS)
[ ]:
%pip install cohere
[16]:
import cohere

co = cohere.Client("YOUR COHERE API KEY")
[17]:
# Create Embeddings with Cohere
# https://docs.cohere.ai/docs/embeddings
response = co.embed(texts=texts, model="small")
embeddings = np.array(response.embeddings, dtype=np.float32)

# Write to Redis
for i, embedding in enumerate(embeddings):
    r.hset(f"doc:{i}", mapping = {
        "vector": embedding.tobytes(),
        "content": texts[i],
        "tag": "cohere"
    })
[18]:
embeddings
[18]:
array([[-0.3034668 , -0.71533203, -0.2836914 , ...,  0.81152344,
         1.0253906 , -0.8095703 ],
       [-0.02560425, -1.4912109 ,  0.24267578, ..., -0.89746094,
         0.15625   , -3.203125  ],
       [ 0.10125732,  0.7246094 , -0.29516602, ..., -1.9638672 ,
         1.6630859 , -0.23291016],
       [-2.09375   ,  0.8588867 , -0.23352051, ..., -0.01541138,
         0.17053223, -3.4042969 ]], dtype=float32)

Search with Cohere Embeddings#

Now that we’ve created embeddings with Cohere, we can perform a search to find documents relevant to some input text.

[19]:
text = "animals"

# create query embedding
response = co.embed(texts=[text], model="small")
query_embedding = np.array(response.embeddings[0], dtype=np.float32)

query_embedding
[19]:
array([-0.49682617,  1.7070312 ,  0.3466797 , ...,  0.58984375,
        0.1060791 , -2.9023438 ], dtype=float32)
[20]:
# query for similar documents that have the cohere tag
query = (
    Query("(@tag:{ cohere })=>[KNN 2 @vector $vec as score]")
     .sort_by("score")
     .return_fields("content", "tag", "score")
     .paging(0, 2)
     .dialect(2)
)

query_params = {"vec": query_embedding.tobytes()}
r.ft(INDEX_NAME).search(query, query_params).docs

# the two pieces of content related to animals are returned
[20]:
[Document {'id': 'doc:1', 'payload': None, 'score': '0.658673524857', 'content': 'The dog next door barks really loudly.', 'tag': 'cohere'},
 Document {'id': 'doc:2', 'payload': None, 'score': '0.662699103355', 'content': 'My cat escaped and got out before I could close the door.', 'tag': 'cohere'}]

Find more example apps, tutorials, and projects using Redis Vector Similarity Search in this GitHub organization.