Selection framework
Now that you've seen how search types differ, let's build a framework for selecting the right one for your use case.
Ask these questions to guide your choice:
1. What type of queries will be used?
- Natural language descriptions → Vector or Hybrid
- Controlled/validated input → Keyword or Hybrid
- Exact names/titles/IDs → Keyword or Hybrid
- Mixed/unpredictable (e.g. user-typed queries) → Hybrid
2. What's your data like?
- Single modality (text only) → Any type works
- Multi-modal (text + images) → Vector (required)
- Multi-lingual → Vector or Hybrid
- Domain-specific terminology → Keyword
3. What's your priority?
- Must find exact matches → Keyword or Hybrid
- Need semantic understanding → Vector or Hybrid
- Balance both → Hybrid
Real-world scenarios
Scenario 1: E-commerce product search
Context:
- Users search with natural language ("red running shoes")
- Some search by SKU or exact product name
- Multi-lingual customer base
- User typos common
Recommendation: Hybrid search (alpha=0.6-0.8)
Configuration:
products = client.collections.use("Products")
response = products.query.hybrid(
query=user_query,
alpha=0.75, # 75% vector, 25% keyword
query_properties=["title^2", "description", "sku^3"] # Boost exact SKU matches
)
Why:
- Handles both natural language AND exact SKU searches
- Tolerates typos while rewarding exact matches
- Property boosting ensures exact matches rank high
Scenario 2: Legal document search
Context:
- Lawyers search for specific case citations
- Exact legal terminology critical
- Document language is formal and consistent
- Predictable query patterns
Recommendation: Keyword search (BM25)
Configuration:
from weaviate.classes.config import Property, DataType
client.collections.create(
name="LegalDocs",
properties=[
Property(
name="citation",
data_type=DataType.TEXT,
),
Property(
name="summary",
data_type=DataType.TEXT,
),
# Additonal settings not shown
]
)
documents = client.collections.use("LegalDocs")
response = documents.query.bm25(
query="Diamond v. Diehr",
query_properties=["citation^3", "summary"]
)
Why:
- Exact matching is critical in legal context
- Keyword search is predictable and precise
- Faster than vector search for exact lookups
- Transparency in why results match
Scenario 3: Content recommendation
Context:
- Find similar articles to what user is reading
- No explicit user query
- Content in multiple languages
- Semantic similarity needed
Recommendation: Vector search (near_object or near_text)
Configuration:
articles = client.collections.use("Articles")
response = articles.query.near_object(
near_object=current_article_uuid,
limit=10
)
Why:
- Object-based similarity (no query text)
- Vector search is the only appropriate choice
- Works across languages
- Captures semantic relationships
Quick reference
| Use Case | Search Type | Key Reason |
|---|---|---|
| Multi-modal search | Vector | Only option for cross-modal |
| Domain-specific docs (legal, medical) | Keyword | Precision critical |
| E-commerce / general search | Hybrid | Mixed query types |
| Content recommendations | Vector | Similarity-based |
| Multi-lingual content | Vector/Hybrid | Cross-lingual capability |
| Controlled form inputs | Keyword | Predictable queries |
| User-facing search (typos likely) | Hybrid | Typo tolerance + flexibility |
Practice scenarios
Consider these use cases and decide which search type to use:
- Customer support FAQ - Users search for help articles
- Academic paper search - Researchers search by topic or citation
- Code repository search - Developers search for functions and files
- Social media posts - Users search their own posts
- Music recommendation - Find songs similar to current track
Suggested answers
- Customer support FAQ: Hybrid (α=0.7) - Natural language queries, semantic understanding important
- Academic papers: Hybrid (α=0.5) - Citations need exact matching (keyword), topics need semantic (vector)
- Code repository: Keyword or Hybrid (α=0.3) - Exact function names matter, but some semantic helps
- Social media posts: Hybrid (α=0.6) - Mix of semantic and exact matching
- Music recommendation: Vector (near_object) - Similarity-based, no text query
When in doubt, start with hybrid search (alpha=0.5-0.7). It handles most query types well and you can tune the alpha parameter based on your specific needs.
What's next?
You now have a comprehensive understanding of search strategies in Weaviate.
But, keep in mind that there's no universal "best" configuration. The right search strategy depends on:
- Your data characteristics
- Your users' query patterns
- Your quality vs. performance requirements
Use this course as a foundation, then experiment and iterate with your specific use case.
Try implementing these strategies in a small test project first. Start with the basics (search type selection, default tokenization), get it working, then optimize based on actual results. Good luck! 🚀