Keyword & Hybrid search

In this lesson, you will perform keyword (BM25) and hybrid searches in practice.

We introduced keyword and hybrid searches in another course (Key Concepts & Architecture). To recap, a keyword search ranks objects by exact word matches with the query, while a hybrid search combines keyword matching with semantic similarity.

Here, you will see a practical example of how to perform keyword and hybrid searches using the Python client and the previously ingested "Movies" data.

Code

Run this to find entries in "Movies" with the highest keyword search scores for the term "history", and print out the title and release year of the top 5 matches.

import weaviate
from weaviate.classes.query import Filter, MetadataQuery
import os


# Instantiate your client (not shown). e.g.:
# client = weaviate.connect_to_weaviate_cloud(...) or
# client = weaviate.connect_to_local(...)

# Configure collection object
movies = client.collections.use("Movies")

# Perform query
response = movies.query.bm25(
query="history", limit=5, return_metadata=MetadataQuery(score=True)
)

# Inspect the response
for o in response.objects:
    # Print the title and release year (note the release date is a datetime object)
    print(o.properties["title"], o.properties["release_date"].year)
    # Print the BM25 score of the object from the query
    print(f"BM25 score: {o.metadata.score:.3f}\n")

client.close()

Explain the code

The results are ranked by a keyword search score computed with the BM25F algorithm.

The limit parameter here sets the maximum number of results to return.

The return_metadata parameter takes a MetadataQuery instance that specifies which metadata to return with each result. This query requests the BM25 score of each result.
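
To build intuition for where a keyword score comes from, here is a minimal sketch of a simplified, single-field BM25 scorer in plain Python. It is not Weaviate's BM25F implementation: the whitespace tokenizer, the sample documents, the function name bm25_scores, and the k1 and b values are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a simplified BM25.

    k1 and b are illustrative tuning values, not taken from the lesson.
    """
    # Naive tokenization: lowercase and split on whitespace (an assumption)
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term
            df = sum(1 for other in tokenized if term in other)
            if df == 0:
                continue
            # Rarer terms get a higher inverse document frequency
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Length normalization: long documents are penalized via b
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical mini-corpus, unrelated to the "Movies" collection
docs = [
    "a history of violence",
    "the history of the world and the history of art",
    "a quiet place",
]
scores = bm25_scores("history", docs)
for doc, score in sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True):
    print(f"{score:.3f}  {doc}")
```

A document that never contains the query term scores zero, which is why a keyword search only surfaces exact word matches.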

Example results
American History X 1998
BM25 score: 2.707

A Beautiful Mind 2001
BM25 score: 1.896

Legends of the Fall 1994
BM25 score: 1.663

Hacksaw Ridge 2016
BM25 score: 1.554

Night at the Museum 2006
BM25 score: 1.529

Code

Run this to find entries in "Movies" with the highest hybrid search scores for the term "history", and print out the title and release year of the top 5 matches.

import weaviate
from weaviate.classes.query import Filter, MetadataQuery
import os


# Instantiate your client (not shown). e.g.:
# client = weaviate.connect_to_weaviate_cloud(...) or
# client = weaviate.connect_to_local(...)

# Configure collection object
movies = client.collections.use("Movies")

# Perform query
response = movies.query.hybrid(
query="history", limit=5, return_metadata=MetadataQuery(score=True)
)

# Inspect the response
for o in response.objects:
    # Print the title and release year (note the release date is a datetime object)
    print(o.properties["title"], o.properties["release_date"].year)
    # Print the hybrid search score of the object from the query
    print(f"Hybrid score: {o.metadata.score:.3f}\n")

client.close()

Explain the code

The results are based on a hybrid search score. A hybrid search blends results of BM25 and semantic/vector searches.

The limit parameter here sets the maximum number of results to return.

The return_metadata parameter takes a MetadataQuery instance that specifies which metadata to return with each result. This query requests the hybrid score of each result.
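
One way to picture how two result sets can be blended is the sketch below, which rescales each set of scores to the 0..1 range and takes a weighted sum. This is an illustrative assumption, not a description of how Weaviate fuses results internally: the function names min_max and fuse, the alpha weight as used here, and the sample scores are all hypothetical.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the 0..1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every document as a top match
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(keyword_scores, vector_scores, alpha=0.5):
    """Blend two score sets; in this sketch alpha weights the vector side."""
    kw = min_max(keyword_scores)
    vec = min_max(vector_scores)
    fused = {}
    for doc_id in set(kw) | set(vec):
        # A document missing from one result set contributes 0 from that side
        fused[doc_id] = alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)


# Made-up scores for four hypothetical documents
keyword_scores = {"a": 3.0, "b": 2.0, "c": 1.0}
vector_scores = {"b": 0.9, "c": 0.8, "d": 0.5}

ranking = fuse(keyword_scores, vector_scores, alpha=0.5)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In this toy setup, a document that ranks reasonably well in both result sets can overtake one that tops only a single list, which matches the way the hybrid results above reorder the keyword results.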

Example results
Legends of the Fall 1994
Hybrid score: 0.016

Hacksaw Ridge 2016
Hybrid score: 0.016

A Beautiful Mind 2001
Hybrid score: 0.015

The Butterfly Effect 2004
Hybrid score: 0.015

Night at the Museum 2006
Hybrid score: 0.012

What's next?

Keyword searches reward exact matches, while hybrid searches combine the benefits of keyword and semantic searches. In the next module, you will learn how to combine any search type with filters to further refine search results.