Skip to main content

Selection framework

Now that you've seen how search types differ, let's build a framework for selecting the right one for your use case.

Ask these questions to guide your choice:

1. What type of queries will be used?

  • Natural language descriptions → Vector or Hybrid
  • Controlled/validated input → Keyword or Hybrid
  • Exact names/titles/IDs → Keyword or Hybrid
  • Mixed/unpredictable (e.g. user-typed queries) → Hybrid

2. What's your data like?

  • Single modality (text only) → Any type works
  • Multi-modal (text + images) → Vector (required)
  • Multi-lingual → Vector or Hybrid
  • Domain-specific terminology → Keyword

3. What's your priority?

  • Must find exact matches → Keyword or Hybrid
  • Need semantic understanding → Vector or Hybrid
  • Balance both → Hybrid

Real-world scenarios

Context:

  • Users search with natural language ("red running shoes")
  • Some search by SKU or exact product name
  • Multi-lingual customer base
  • User typos common

Recommendation: Hybrid search (alpha=0.6-0.8)

Configuration:

products = client.collections.use("Products")

response = products.query.hybrid(
query=user_query,
alpha=0.75, # 75% vector, 25% keyword
query_properties=["title^2", "description", "sku^3"] # Boost exact SKU matches
)

Why:

  • Handles both natural language AND exact SKU searches
  • Tolerates typos while rewarding exact matches
  • Property boosting ensures exact matches rank high

Context:

  • Lawyers search for specific case citations
  • Exact legal terminology critical
  • Document language is formal and consistent
  • Predictable query patterns

Recommendation: Keyword search (BM25)

Configuration:

from weaviate.classes.config import Property, DataType

client.collections.create(
name="LegalDocs",
properties=[
Property(
name="citation",
data_type=DataType.TEXT,
),
Property(
name="summary",
data_type=DataType.TEXT,
),
# Additonal settings not shown
]
)

documents = client.collections.use("LegalDocs")

response = documents.query.bm25(
query="Diamond v. Diehr",
query_properties=["citation^3", "summary"]
)

Why:

  • Exact matching is critical in legal context
  • Keyword search is predictable and precise
  • Faster than vector search for exact lookups
  • Transparency in why results match

Scenario 3: Content recommendation

Context:

  • Find similar articles to what user is reading
  • No explicit user query
  • Content in multiple languages
  • Semantic similarity needed

Recommendation: Vector search (near_object or near_text)

Configuration:

articles = client.collections.use("Articles")

response = articles.query.near_object(
near_object=current_article_uuid,
limit=10
)

Why:

  • Object-based similarity (no query text)
  • Vector search is the only appropriate choice
  • Works across languages
  • Captures semantic relationships

Quick reference

Use CaseSearch TypeKey Reason
Multi-modal searchVectorOnly option for cross-modal
Domain-specific docs (legal, medical)KeywordPrecision critical
E-commerce / general searchHybridMixed query types
Content recommendationsVectorSimilarity-based
Multi-lingual contentVector/HybridCross-lingual capability
Controlled form inputsKeywordPredictable queries
User-facing search (typos likely)HybridTypo tolerance + flexibility

Practice scenarios

Consider these use cases and decide which search type to use:

  1. Customer support FAQ - Users search for help articles
  2. Academic paper search - Researchers search by topic or citation
  3. Code repository search - Developers search for functions and files
  4. Social media posts - Users search their own posts
  5. Music recommendation - Find songs similar to current track
Suggested answers
  1. Customer support FAQ: Hybrid (α=0.7) - Natural language queries, semantic understanding important
  2. Academic papers: Hybrid (α=0.5) - Citations need exact matching (keyword), topics need semantic (vector)
  3. Code repository: Keyword or Hybrid (α=0.3) - Exact function names matter, but some semantic helps
  4. Social media posts: Hybrid (α=0.6) - Mix of semantic and exact matching
  5. Music recommendation: Vector (near_object) - Similarity-based, no text query
Start with hybrid

When in doubt, start with hybrid search (alpha=0.5-0.7). It handles most query types well and you can tune the alpha parameter based on your specific needs.

What's next?

You now have a comprehensive understanding of search strategies in Weaviate.

But, keep in mind that there's no universal "best" configuration. The right search strategy depends on:

  • Your data characteristics
  • Your users' query patterns
  • Your quality vs. performance requirements

Use this course as a foundation, then experiment and iterate with your specific use case.

Ready to build?

Try implementing these strategies in a small test project first. Start with the basics (search type selection, default tokenization), get it working, then optimize based on actual results. Good luck! 🚀

Login to track your progress