Vector configuration
Your embedding model choice affects search quality, speed, and costs. Changing models requires re-vectorizing your entire dataset, making this a critical early decision.
Model selection
If text-only search with balanced cost/performance → snowflake-arctic-embed-m-v1.5 (Weaviate Embeddings), text-embedding-3-small (OpenAI)
If multi-modal content (text + images) → Modern image retrieval models, e.g. embed-v4.0 (Cohere), jina-embeddings-v4 (JinaAI), or ColPali models
- Can embed both text descriptions and images in the same vector space
If API inference undesirable → consider local inference (e.g. Ollama, transformers)
If highest quality needed → consider snowflake-arctic-embed-l-v2.0 (Weaviate Embeddings), embed-v4.0 (Cohere), text-embedding-3-large (OpenAI)
- Fast-moving space; review the latest benchmarks (e.g. MTEB) & developments
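The decision rules above can be sketched as a small helper. This is only an illustration: the `Requirements` class, the `suggest_models` function, and the order in which the rules are checked are assumptions made for this sketch, and the model names are the ones listed above rather than a ranking.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    multimodal: bool = False       # text + images in the same collection
    avoid_api: bool = False        # API inference is undesirable
    highest_quality: bool = False  # quality matters more than cost


def suggest_models(req: Requirements) -> list[str]:
    """Return starting-point suggestions; the check order is an assumption."""
    if req.avoid_api:
        return ["local inference (e.g. Ollama, transformers)"]
    if req.multimodal:
        return ["embed-v4.0 (Cohere)", "jina-embeddings-v4 (JinaAI)", "ColPali models"]
    if req.highest_quality:
        return ["snowflake-arctic-embed-l-v2.0 (Weaviate Embeddings)", "embed-v4.0 (Cohere)"]
    # Default: text-only search with balanced cost/performance
    return ["snowflake-arctic-embed-m-v1.5 (Weaviate Embeddings)", "text-embedding-3-small (OpenAI)"]


for candidate in suggest_models(Requirements(multimodal=True)):
    print(candidate)
```

Treat the output as a shortlist to benchmark on your own data, not as a final choice.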
Single vs multiple vectors
Each object can have multiple independent vectors, each with its own index.
If different search strategies needed → Multiple vectors per object
- Enables specialized search on different aspects
- Example: A product with one vector for its category and product type, and another for its list of features
If general-purpose search across mixed content → Single combined vector
- Simpler setup, lower resource usage
- Example: Combine title + description for unified product search
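To make the trade-off concrete, the sketch below shows which text would feed each vector under the two options. It is plain Python rather than Weaviate client code: the `text_for` helper, the vector names, and the `features` property are hypothetical and exist only for this illustration.

```python
product = {
    "title": "Wireless Bluetooth Headphones",
    "description": "High-quality audio with noise cancellation",
    "category": "Electronics",
    "features": ["noise cancellation", "40-hour battery"],  # hypothetical property
}


def text_for(obj, properties):
    """Join the chosen properties into the text that one vector would embed."""
    parts = []
    for name in properties:
        value = obj[name]
        parts.append(", ".join(value) if isinstance(value, list) else str(value))
    return " | ".join(parts)


# Option A: a single combined vector for general-purpose search
single = {"default": text_for(product, ["title", "description"])}

# Option B: multiple vectors, each covering one aspect of the object
multiple = {
    "category": text_for(product, ["category"]),
    "features": text_for(product, ["features"]),
}

print(single)
print(multiple)
```

With option B, every entry would be embedded and indexed separately, which is where the extra resource usage comes from.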
Planning example
In an e-commerce product collection, an object might look like this:
{
  "title": "Wireless Bluetooth Headphones",
  "description": "High-quality audio with noise cancellation",
  "category": "Electronics",
  "price": 199.99,
  "images": [image_1, image_2, ...] // In base64 format
}
Each object has two vectors:
- A description vector, based on the description property, to allow semantic search on text
- A visual vector, based on the image(s), to allow search by similar image or text-based search on visual similarity
This allows users to:
- Search by text: "noise cancelling headphones"
- Search by image: Upload a photo of similar headphones
- Filter by: Category, price range, etc.
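The three access patterns above can be sketched with a toy in-memory search. This is not Weaviate client code: the three-dimensional vectors are made up, the second and third products are hypothetical, and `search` is a brute-force stand-in for a real vector index. It only shows how a filter narrows the candidates before one named vector is used for ranking.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Made-up 3-dimensional vectors; real embeddings have hundreds of dimensions.
products = [
    {"title": "Wireless Bluetooth Headphones", "category": "Electronics", "price": 199.99,
     "vectors": {"description": [0.9, 0.1, 0.0], "visual": [0.8, 0.2, 0.1]}},
    {"title": "Wired Studio Headphones", "category": "Electronics", "price": 349.00,
     "vectors": {"description": [0.7, 0.3, 0.1], "visual": [0.9, 0.1, 0.0]}},
    {"title": "Leather Wallet", "category": "Accessories", "price": 49.00,
     "vectors": {"description": [0.0, 0.2, 0.9], "visual": [0.1, 0.1, 0.9]}},
]


def search(query_vector, target_vector, category=None, max_price=None, limit=2):
    """Apply the filters first, then rank the remaining objects by one named vector."""
    candidates = [
        p for p in products
        if (category is None or p["category"] == category)
        and (max_price is None or p["price"] <= max_price)
    ]
    ranked = sorted(
        candidates,
        key=lambda p: cosine(query_vector, p["vectors"][target_vector]),
        reverse=True,
    )
    return [p["title"] for p in ranked[:limit]]


# Text search: the query vector stands in for an embedded text query
text_hits = search([1.0, 0.0, 0.0], "description", category="Electronics", max_price=250)

# Image search: the query vector stands in for an embedded photo
image_hits = search([0.9, 0.1, 0.0], "visual", category="Electronics")

print(text_hits)
print(image_hits)
```

The same filter logic applies to both searches; only the target vector changes, which is the point of keeping the description and visual vectors separate.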
Now that you understand vector configuration, let's explore vector index settings - the technical choices that determine search performance and resource usage.