Weaviate Academy

Comparing search types

Recall that Weaviate supports three types of searches: vector, keyword, and hybrid.

Keyword search uses a traditional text matching algorithm (BM25) for exact term matching.

Vector search (also called semantic search) identifies similar objects by their meaning, using vector similarity.

Hybrid search combines the two, merging the vector and keyword results into a single ranked list.
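The vector search results below report a distance for each object. The sketch below shows what that number means. It assumes the cosine metric (Weaviate's default distance metric) and uses made-up three-dimensional vectors in place of real embeddings, which have hundreds or thousands of dimensions. A lower distance means the two vectors point in more similar directions.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity (0 = same direction, 2 = opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; real ones come from the collection's vectorizer
query = [0.9, 0.1, 0.0]
close_match = [0.8, 0.2, 0.1]
unrelated = [0.0, 0.2, 0.9]

print(f"close match: {cosine_distance(query, close_match):.3f}")
print(f"unrelated:   {cosine_distance(query, unrelated):.3f}")
```

The vector search results in this lesson are sorted by this kind of distance, smallest first.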

Now that we have our data, let's compare how vector, keyword, and hybrid search handle the same queries to understand their different behaviors.

Instantiate the collection object

import weaviate
import os

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.getenv("WEAVIATE_URL"),  # Your Weaviate Cloud cluster URL
    auth_credentials=os.getenv("WEAVIATE_API_KEY"),  # Your Weaviate Cloud API key
)

movies = client.collections.use("Movies")


We'll use the collection object to perform the searches.

Define a function to compare search types

Here, we set up a function to compare all three search types.

from weaviate.collections import Collection
from weaviate.classes.query import MetadataQuery


def compare_search_types(collection: Collection, query: str):
    """
    Run the same query with vector, keyword, and hybrid search.
    Print results side-by-side for comparison.
    """

    # 1. Vector search (near_text)
    vector_results = collection.query.near_text(
        query=query,
        limit=5,
        target_vector="title",
        return_metadata=MetadataQuery(distance=True)
    )

    print(f"\nVector search results for: {query}")
    for o in vector_results.objects:
        print(o.properties["title"], f"distance: {o.metadata.distance:.3f}")

    # 2. Keyword search (BM25)
    keyword_results = collection.query.bm25(
        query=query,
        limit=5,
        query_properties=["title"],
        return_metadata=MetadataQuery(score=True)
    )

    print(f"\nKeyword search results for: {query}")
    for o in keyword_results.objects:
        print(o.properties["title"], f"score: {o.metadata.score:.3f}")

    # 3. Hybrid search
    hybrid_results = collection.query.hybrid(
        query=query,
        limit=5,
        target_vector="title",
        query_properties=["title"],
        return_metadata=MetadataQuery(score=True)
    )

    print(f"\nHybrid search results for: {query}")
    for o in hybrid_results.objects:
        print(o.properties["title"], f"score: {o.metadata.score:.3f}")


# Test with each query
compare_search_types(movies, "Captain America")
compare_search_types(movies, "action")
compare_search_types(movies, "adventur")
compare_search_types(movies, "액션")

client.close()


The function is then called with four different queries so we can observe how the results differ. Look through the results and consider how each search type performs.

Query 1: "Captain America" (exact title)

Vector search:

Captain America: The First Avenger    distance: 0.494
Captain America: Civil War            distance: 0.506
Captain America: The Winter Soldier   distance: 0.538
Captain Marvel                        distance: 0.592
The Avengers                          distance: 0.732

Keyword search:

Captain America: Civil War            score: 4.116
Captain America: The Winter Soldier   score: 3.675
Captain America: The First Avenger    score: 3.675
Captain Marvel                        score: 2.642

Hybrid search:

Captain America: Civil War            score: 0.974
Captain America: The First Avenger    score: 0.910
Captain America: The Winter Soldier   score: 0.820
Captain Marvel                        score: 0.500
The Avengers                          score: 0.215

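Key insight: All three methods find the right movies. Hybrid search combines both signals: "Civil War" ranks first because it has the highest keyword score and a vector distance (0.506) close to the best (0.494). Its keyword score is highest because BM25 normalizes for field length: all three Captain America titles match both query terms, but "Captain America: Civil War" has four words instead of five, so the same matches score higher.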
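The length effect can be reproduced with a textbook BM25 formula. This is a simplified sketch, not Weaviate's implementation: it assumes the common parameter defaults k1 = 1.2 and b = 0.75, a corpus of only five titles, and a simple lowercase word tokenizer, so the absolute scores will not match the table above. Only the ordering is meant to carry over.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters (rough stand-in for word tokenization)
    return re.findall(r"\w+", text.lower())


def bm25(query: str, doc: list[str], corpus: list[list[str]],
         k1: float = 1.2, b: float = 0.75) -> float:
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in tokenize(query):
        df = sum(1 for d in corpus if term in d)  # number of documents containing the term
        if df == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        tf = doc.count(term)
        # b controls how strongly fields longer than average are penalized
        length_norm = 1 - b + b * len(doc) / avg_len
        score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
    return score


titles = [
    "Captain America: The First Avenger",
    "Captain America: Civil War",
    "Captain America: The Winter Soldier",
    "Captain Marvel",
    "The Avengers",
]
corpus = [tokenize(t) for t in titles]

scores = {title: bm25("Captain America", doc, corpus) for title, doc in zip(titles, corpus)}
for title, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    if score > 0:  # keyword search only returns objects containing a query term
        print(f"{title:<38} {score:.3f}")
```

"Civil War" has the same matching terms as the other two Captain America titles but one fewer token, so its length normalization term is smaller and its score higher. Setting b = 0 in this sketch removes the length effect, and the three titles tie.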

Query 2: "action" (generic word)

Vector search:

Last Action Hero       distance: 0.705
Sister Act             distance: 0.782
Mission: Impossible    distance: 0.815
Inception              distance: 0.816
Mars Attacks!          distance: 0.824

Keyword search:

Last Action Hero       score: 2.781

Hybrid search:

Last Action Hero       score: 1.000
Sister Act             score: 0.398
Mission: Impossible    score: 0.263
Inception              score: 0.262
Mars Attacks!          score: 0.230

Key insight: Keyword search finds only one movie, the only title that contains the word "action". Vector search also returns titles that are semantically related to the query but don't contain the word, such as "Mission: Impossible". Hybrid search balances both: the exact match is boosted to the top, while the semantic matches are still included.
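One way to see why "Last Action Hero" scores exactly 1.000 is to sketch how hybrid search fuses the two result lists. The sketch below assumes relative score fusion (Weaviate's default fusion method in recent versions): each list's scores are min-max normalized to the 0 to 1 range, then combined as alpha × vector + (1 - alpha) × keyword. Two further assumptions: alpha = 0.7, which is consistent with the 0.700 top scores in the queries where keyword search returns nothing, and a single keyword result normalizing to 1.0. The other fused scores will not match the table exactly, since Weaviate's implementation details (for example, how many candidates it normalizes over) may differ from this simplified version.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to 0..1; a single (or uniform) score maps to 1.0 (assumption)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def relative_score_fusion(vector_distances: dict[str, float],
                          keyword_scores: dict[str, float],
                          alpha: float = 0.7) -> list[tuple[str, float]]:
    # Smaller distance is better, so negate distances before normalizing
    vector_norm = min_max({key: -d for key, d in vector_distances.items()})
    keyword_norm = min_max(keyword_scores)
    fused = {
        key: alpha * vector_norm.get(key, 0.0) + (1 - alpha) * keyword_norm.get(key, 0.0)
        for key in vector_norm.keys() | keyword_norm.keys()
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Query 2 ("action") results from above
vector_distances = {
    "Last Action Hero": 0.705,
    "Sister Act": 0.782,
    "Mission: Impossible": 0.815,
    "Inception": 0.816,
    "Mars Attacks!": 0.824,
}
keyword_scores = {"Last Action Hero": 2.781}

result = relative_score_fusion(vector_distances, keyword_scores)
for title, score in result:
    print(f"{title:<22} {score:.3f}")
```

Because "Last Action Hero" is the best result in both lists, both of its normalized scores are 1.0 and the fused score is 1.0. A title found only by vector search can reach at most alpha (here 0.7), which is why the exact keyword match is boosted above the purely semantic matches.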

Query 3: "adventur" (misspelling)

Vector search:

Avatar                                  distance: 0.806
The Avengers                            distance: 0.822
Monsters, Inc.                          distance: 0.836
Captain America: The First Avenger      distance: 0.842
Antz                                    distance: 0.843

Keyword search:

(no results)

Hybrid search:

Avatar                                  score: 0.700
The Avengers                            score: 0.609
Monsters, Inc.                          score: 0.532
Captain America: The First Avenger      score: 0.497
Antz                                    score: 0.495

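Key insight: Keyword search fails completely on typos, because no title contains the token "adventur". Vector search still returns results, because the embedding model places "adventur" close to "adventure" in vector space. This tolerance is critical for user-facing applications, where typos are common.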

Query 4: "액션" (Korean for "action")

Vector search:

Last Action Hero           distance: 0.688
Hot Shots!                 distance: 0.751
Kick-Ass                   distance: 0.752
Mars Attacks!              distance: 0.754
American History X         distance: 0.760

Keyword search:

(no results)

Hybrid search:

Last Action Hero           score: 0.700
Hot Shots!                 score: 0.376
Kick-Ass                   score: 0.369
Mars Attacks!              score: 0.355
American History X         score: 0.325

Key insight: Keyword search can't match across languages. Vector search works cross-lingually: the embedding model maps "액션" (Korean) and "action" (English) to nearby points in the same vector space. This is essential for multilingual applications.
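The keyword failures in Queries 2 to 4 follow from how keyword search looks up terms: a query token must exactly match a token stored in the index. The sketch below builds a toy inverted index over a few titles from the results above. The simple lowercase word tokenizer is an assumption standing in for Weaviate's own tokenization, which is configurable. The sketch only shows which titles share a token with the query, not how they are ranked.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters; Hangul counts as alphanumeric here
    return re.findall(r"\w+", text.lower())


titles = [
    "Last Action Hero",
    "Sister Act",
    "Mission: Impossible",
    "Avatar",
    "The Avengers",
    "Kick-Ass",
]

# Inverted index: token -> set of titles containing that exact token
index: dict[str, set[str]] = defaultdict(set)
for title in titles:
    for token in tokenize(title):
        index[token].add(title)


def keyword_matches(query: str) -> set[str]:
    """Return the titles that share at least one exact token with the query."""
    matches: set[str] = set()
    for token in tokenize(query):
        matches |= index.get(token, set())
    return matches


for query in ["action", "adventur", "액션"]:
    print(f"{query!r}: {sorted(keyword_matches(query)) or 'no results'}")
```

The lookup is all or nothing: "act" and "action" are different tokens, as are "adventur" and "adventure", so near misses and translations never match.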

Patterns and trade-offs

These examples illustrate that there is no single best search type. Each has its own strengths and weaknesses, meaning you must make trade-offs based on your specific use case.

Vector search

✅ Semantic/conceptual queries - "space action" finds sci-fi movies
✅ Typo tolerance - "adventur" still works
✅ Multi-lingual - Korean query finds English titles
✅ Multi-modal - image search, video search, audio search
✅ Broad matching - "action" finds action movies without that word
❌ Less predictable - "Sister Act" for "action" (semantically related, but unexpected)

Keyword search

✅ Exact matching - precise BM25 scoring for exact terms
✅ Predictable - you know why results match
✅ Transparent - easy to explain
❌ Sparse results - "action" returns only 1 movie
❌ No typo tolerance - "adventur" returns nothing
❌ No semantic, multi-modal or multi-lingual understanding

Hybrid search

✅ Balanced - combines semantic + exact match
✅ Better coverage - more results than keyword alone
✅ Typo tolerant - inherits from vector component
✅ Exact match boosting - still rewards precision
✅ Most versatile - handles all query types reasonably well
❌ Slower & higher load - requires both searches to be performed

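Trade-off: Vector search favors recall (finding related items), keyword search favors precision (exact matches), and hybrid search balances the two. You can tune this balance with the alpha parameter, where alpha = 1 is a pure vector search and alpha = 0 is a pure keyword search (more on this later).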

Default recommendation

For most user-facing search applications, start with hybrid search. It provides resilience to different query types, typos, and user intent while still rewarding exact matches.

What's next?

Now that you've seen how each search type behaves, let's look at each one in more detail.
