Vector configuration

Your embedding model choice affects search quality, speed, and costs. Changing models requires re-vectorizing your entire dataset, making this a critical early decision.
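
To get a feel for why re-vectorization matters, the back-of-the-envelope sketch below estimates the API cost of one full embedding pass. The function name, dataset size, token count, and price are all hypothetical placeholders, not figures from any provider; substitute your own numbers and current pricing.

```python
def estimate_revectorization_cost(num_objects, avg_tokens_per_object, price_per_million_tokens):
    """Rough API cost (in currency units) of embedding every object once."""
    total_tokens = num_objects * avg_tokens_per_object
    return total_tokens / 1_000_000 * price_per_million_tokens


# Hypothetical figures for illustration only.
cost = estimate_revectorization_cost(
    num_objects=10_000_000,          # assumed dataset size
    avg_tokens_per_object=200,       # assumed average text length
    price_per_million_tokens=0.02,   # assumed price; check your provider
)
print(f"One full re-vectorization pass: ~{cost:,.2f}")
```

Switching models later means paying this again, plus the time needed to re-import and re-index the data.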

Model selection

If text-only search with balanced cost/performance → snowflake-arctic-embed-m-v1.5 (Weaviate Embeddings), text-embedding-3-small (OpenAI)

If multi-modal content (text + images) → modern image retrieval models, e.g. embed-v4.0 (Cohere), jina-embeddings-v4 (JinaAI), or ColPali models

  • Can embed both text descriptions and images in same vector space

If API inference undesirable → consider local inference (e.g. Ollama, transformers)

If highest quality needed → consider snowflake-arctic-embed-l-v2.0 (Weaviate Embeddings), embed-v4.0 (Cohere), text-embedding-3-large (OpenAI)

  • Fast-moving space; review the latest benchmarks (e.g. MTEB) & developments

Single vs multiple vectors

Each object can have multiple independent vectors, each with its own index.

If different search strategies needed → Multiple vectors per object

  • Enables specialized search on different aspects
  • Example: A product with one vector built from its category and product type, and another built from its list of features

If general-purpose search across mixed content → Single combined vector

  • Simpler setup, lower resource usage
  • Example: Combine title + description for unified product search
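
The two approaches differ mainly in which text is fed to the embedding model for each vector. The sketch below illustrates that with a hypothetical product record; the field names, vector names, and helper functions are assumptions made for illustration, and no real embedding model is called.

```python
# Hypothetical product record used only for illustration.
product = {
    "title": "Trail Running Shoes",
    "description": "Lightweight shoes with a grippy outsole",
    "category": "Footwear",
    "product_type": "Running shoes",
    "features": ["waterproof", "reflective strips", "wide fit"],
}


def single_vector_input(obj):
    """One combined text, so the object gets one general-purpose vector."""
    return {"default": f"{obj['title']}. {obj['description']}"}


def multi_vector_inputs(obj):
    """One text per aspect, so each aspect gets its own independent vector."""
    return {
        "taxonomy": f"{obj['category']}, {obj['product_type']}",
        "features": ", ".join(obj["features"]),
    }


single = single_vector_input(product)
multi = multi_vector_inputs(product)

# Each entry would be embedded (and indexed) separately.
for name, text in {**single, **multi}.items():
    print(f"{name}: {text}")
```

With multiple vectors, each entry is embedded and indexed on its own, which is where the extra resource usage comes from.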

Planning example

In an e-commerce product collection, an object might look like this:

{
  "title": "Wireless Bluetooth Headphones",
  "description": "High-quality audio with noise cancellation",
  "category": "Electronics",
  "price": 199.99,
  "images": [image_1, image_2, ...] // In base64 format
}

Each object could then have two vectors:

  • A description vector, based on the description property, to allow semantic search on text, and
  • A visual vector, based on the image(s), to allow search by a similar image, or text queries matched on visual similarity

This allows users to:

  • Search by text: "noise cancelling headphones"
  • Search by image: Upload a photo of similar headphones
  • Filter by: Category, price range, etc.
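
The toy sketch below shows how these pieces fit together: a query is compared against one chosen vector per object, and property filters narrow the candidates first. It is not the database's API. The hand-written three-dimensional vectors stand in for real model output, and the `search` helper and its parameters are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors; a real system would get these from the embedding models.
products = [
    {"title": "Wireless Bluetooth Headphones", "category": "Electronics", "price": 199.99,
     "vectors": {"description": [0.9, 0.1, 0.0], "visual": [0.8, 0.2, 0.1]}},
    {"title": "Wired Studio Headphones", "category": "Electronics", "price": 349.00,
     "vectors": {"description": [0.7, 0.3, 0.1], "visual": [0.9, 0.1, 0.0]}},
    {"title": "Running Shoes", "category": "Footwear", "price": 89.50,
     "vectors": {"description": [0.0, 0.2, 0.9], "visual": [0.1, 0.1, 0.9]}},
]


def search(query_vector, target_vector, category=None, max_price=None, limit=2):
    """Filter by properties, then rank by similarity on the chosen vector."""
    candidates = [
        p for p in products
        if (category is None or p["category"] == category)
        and (max_price is None or p["price"] <= max_price)
    ]
    ranked = sorted(
        candidates,
        key=lambda p: cosine_similarity(query_vector, p["vectors"][target_vector]),
        reverse=True,
    )
    return [p["title"] for p in ranked[:limit]]


query = [1.0, 0.0, 0.0]  # stands in for an embedded text or image query

text_hits = search(query, target_vector="description")
image_hits = search(query, target_vector="visual")
filtered_hits = search(query, target_vector="visual", category="Electronics", max_price=250)
print(text_hits, image_hits, filtered_hits, sep="\n")
```

The same query can rank objects differently depending on which vector is targeted, which is the point of keeping the description and visual vectors separate.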

What's next?

Now that you understand vector configuration, let's explore vector index settings - the technical choices that determine search performance and resource usage.
