Weaviate Academy

Each data object in Weaviate has a unique identifier in UUID (Universally Unique Identifier) format. A strategic use of UUIDs can provide a great deal of value, on top of identifying objects.

Let's explore how to put these strategies into practice.

UUID generation strategies

Auto-generated UUIDs (default)

If a UUID is not provided, Weaviate generates a new random UUID for each object:

from weaviate.classes.query import Filter

products = client.collections.use("Products")

# Each import creates new objects (different UUIDs)
product = {"sku": "SKU123", "name": "Mouse", "price": 29.99}

products.data.insert(product)  # Creates object with UUID A
products.data.insert(product)  # Creates object with UUID B (duplicate!)

response = products.query.fetch_objects(
    filters=Filter.by_property("sku").equal("SKU123")
)
print(f"Number of products with SKU123: {len(response.objects)}")  # Outputs 2

# This should show identical properties but different UUIDs
for obj in response.objects:
    print(f"Object UUID: {obj.uuid}, Name: {obj.properties['name']}, Price: {obj.properties['price']}")

API docs

Result: Every import creates new objects, even if data is identical.

Use when:

Objects are truly unique each time
No need to re-import same data
Example: Time-series events, user interactions, logs

Deterministic UUIDs

You can generate deterministic UUIDs using standards like UUIDv5. This allows the same input data to always produce the same UUID.

In the Weaviate Python client, use generate_uuid5() to create such UUIDs, using a unique identifier from your data, such as a SKU, email address, URL or file path.

from weaviate.util import generate_uuid5

product = {"sku": "SKU-123", "name": "Mouse", "price": 29.99}
uuid = generate_uuid5(product["sku"])

API docs

Then, use the generated UUID when adding objects.

Note that single inserts and batch inserts treat this differently. Single inserts don't allow this, but batch inserts do:

Single inserts
Batch inserts

Inserting an object with the same deterministic UUID results in an error.

from weaviate.util import generate_uuid5
from weaviate.exceptions import UnexpectedStatusCodeError

products = client.collections.use("Products")

product = {"sku": "SKU-123", "name": "Mouse", "price": 29.99}

# Same input = same UUID
uuid = generate_uuid5(product["sku"])

products.data.insert(product, uuid=uuid)  # Creates object

# Attempting to insert again with same UUID raises error
product = {"sku": "SKU-123", "name": "Mouse", "price": 39.99}  # Different price
try:
    products.data.insert(product, uuid=uuid)  # Throws error as object exists
except UnexpectedStatusCodeError as e:
    print(f"Error on duplicate insert: {e}")

API docs

Batch inserting an object with the same deterministic UUID updates the existing object.

from weaviate.util import generate_uuid5

products = client.collections.use("Products")

product = {"sku": "SKU-123", "name": "Mouse", "price": 29.99}

# Same input = same UUID
uuid = generate_uuid5(product["sku"])

products.data.insert(product, uuid=uuid)  # Creates object

# Re-inserting with same UUID updates existing object
product = {"sku": "SKU-123", "name": "Mouse", "price": 39.99}  # Different price
with products.batch.fixed_size(batch_size=100) as batch:
    batch.add_object(properties=product, uuid=uuid)  # Runs without error, updates object

print(products.query.fetch_object_by_id(uuid).properties["price"])  # Outputs 39.99 (updated price)
# END DeterministicUUIDSingle

client.collections.delete("DocChunks")  # Clean up if exists

client.collections.create(
    name="DocChunks",
    properties=[
        Property(name="document_path", data_type=DataType.TEXT),
        Property(name="chunk_index", data_type=DataType.INT),
    ],
    vector_config=Configure.Vectors.text2vec_openai()
)

chunk_objs = [
    {"document_path": "/docs/report.pdf", "chunk_index": 0},
    {"document_path": "/docs/report.pdf", "chunk_index": 1},
]

# START UUIDFromMultipleProps
from weaviate.util import generate_uuid5

chunks = client.collections.use("DocChunks")

# Combine multiple properties for uniqueness
with chunks.batch.fixed_size(batch_size=100) as batch:
    for chunk in chunk_objs:
        # Unique key from document path + chunk index
        unique_key = f"{chunk['document_path']}-{chunk['chunk_index']}"
        uuid = generate_uuid5(unique_key)

        batch.add_object(
            properties=chunk,
            uuid=uuid
        )
# END UUIDFromMultipleProps

for cname in ["Products", "Manuals"]:
    client.collections.delete(cname)  # Clean up if exists

    client.collections.create(
        name=cname,
        properties=[
            Property(name="sku", data_type=DataType.TEXT),
        ],
        vector_config=Configure.Vectors.text2vec_openai()
    )

product_objs = [
    {"sku": "SKU001"},
    {"sku": "SKU002"},
]

manual_objs = product_objs

# START UUIDWithNamespace
from weaviate.util import generate_uuid5

products = client.collections.use("Products")
manuals = client.collections.use("Manuals")

with products.batch.fixed_size(batch_size=100) as batch:
    for product in product_objs:
        batch.add_object(
            properties=product,
            # Use different namespaces to avoid collisions
            uuid=generate_uuid5(identifier=product["sku"], namespace="products")
        )

with manuals.batch.fixed_size(batch_size=100) as batch:
    for manual in manual_objs:
        batch.add_object(
            properties=manual,
            # Use different namespaces to avoid collisions
            uuid=generate_uuid5(identifier=manual["sku"], namespace="manuals")
        )
# END UUIDWithNamespace

products.data.delete_by_id(uuid=generate_uuid5("SKU001"))

# START ExampleDeduplication
products = client.collections.use("Products")

product_objs = [
    {"sku": "SKU001"},
    {"sku": "SKU001"},  # Duplicate
]

uuid = generate_uuid5("SKU001")
with products.batch.fixed_size(batch_size=100) as batch:
    for product in product_objs:
        batch.add_object(
            properties=product,
            uuid=uuid  # Deterministic UUID ensures no duplicates
        )

response = products.query.fetch_objects(
    filters=Filter.by_property("sku").equal("SKU001")
)
print(len(response.objects))  # Outputs 1, deduplicated
# END ExampleDeduplication

products.data.delete_by_id(uuid=generate_uuid5("SKU001"))

# START ExampleIdempotency
products = client.collections.use("Products")

with products.batch.fixed_size(batch_size=100) as batch:
    for product in product_objs:
        batch.add_object(
            properties=product,
            uuid=generate_uuid5("SKU001")
        )

# Re-running the same batch import does not create duplicates
with products.batch.fixed_size(batch_size=100) as batch:
    for product in product_objs:
        batch.add_object(
            properties=product,
            uuid=generate_uuid5("SKU001")
        )

response = products.query.fetch_objects(
    filters=Filter.by_property("sku").equal("SKU001")
)
print(len(response.objects))  # Outputs 1, idempotent
# END ExampleIdempotency

products.data.delete_by_id(uuid=generate_uuid5("SKU001"))

# START ExampleUpdates
products = client.collections.use("Products")

products.data.insert(
    properties={"sku": "SKU001", "price": 39.99},
    uuid=generate_uuid5("SKU001")
)

# Having a deterministic UUID allows us to easily update the object
products.data.update(
    properties={"sku": "SKU001", "price": 30.05},
    uuid=generate_uuid5("SKU001")
)

response = products.query.fetch_objects(
    filters=Filter.by_property("sku").equal("SKU001")
)
print(response.objects[0].properties["price"])  # Outputs 30.05, updated price
# END ExampleUpdates

products.data.delete_by_id(uuid=generate_uuid5("SKU001"))

# START ExampleDeletions
products = client.collections.use("Products")

products.data.insert(
    properties={"sku": "SKU001", "price": 39.99},
    uuid=generate_uuid5("SKU001")
)

# Having a deterministic UUID allows us to easily delete the object
products.data.delete_by_id(
    uuid=generate_uuid5("SKU001")
)

response = products.query.fetch_objects(
    filters=Filter.by_property("sku").equal("SKU001")
)
print(len(response.objects))  # Outputs 0, object deleted
# END ExampleDeletions

client.close()

API docs

Recommended UUID strategies

Weaviate is primarily designed to use UUIDv4 (random) or UUIDv5 (namespace name-based, deterministic) identifiers.

When generating deterministic UUIDs, you can use the weaviate.util.generate_uuid5() function with different input strategies:

Generally, the seed for UUIDv5 should be a natural unique identifier in your data that doesn't change. Take these examples:

Object Type	Good Identifier	Bad Identifier
E-commerce product	Product SKU	Product name or details
User account	User's email address	User's address, name, description
Document or file	URL or file path	File contents or metadata

From unique identifier

If a natural unique identifier exists, you can use it directly to generate a UUID.

from weaviate.util import generate_uuid5

product = {"sku": "SKU-123", "name": "Mouse", "price": 29.99}
uuid = generate_uuid5(product["sku"])

API docs

From multiple properties

In some cases, a combination of properties creates a unique identifier. Concatenate these properties to form the seed for UUID generation.

One example may be in a chunked document, where the combination of document_path and chunk_index uniquely identifies each chunk.

from weaviate.util import generate_uuid5

chunks = client.collections.use("DocChunks")

# Combine multiple properties for uniqueness
with chunks.batch.fixed_size(batch_size=100) as batch:
    for chunk in chunk_objs:
        # Unique key from document path + chunk index
        unique_key = f"{chunk['document_path']}-{chunk['chunk_index']}"
        uuid = generate_uuid5(unique_key)

        batch.add_object(
            properties=chunk,
            uuid=uuid
        )

API docs

With namespace

UUIDv5 supports namespaces to avoid collisions. If you import similar data into multiple collections, use a different namespace for each collection to ensure uniqueness.

from weaviate.util import generate_uuid5

products = client.collections.use("Products")
manuals = client.collections.use("Manuals")

with products.batch.fixed_size(batch_size=100) as batch:
    for product in product_objs:
        batch.add_object(
            properties=product,
            # Use different namespaces to avoid collisions
            uuid=generate_uuid5(identifier=product["sku"], namespace="products")
        )

with manuals.batch.fixed_size(batch_size=100) as batch:
    for manual in manual_objs:
        batch.add_object(
            properties=manual,
            # Use different namespaces to avoid collisions
            uuid=generate_uuid5(identifier=manual["sku"], namespace="manuals")
        )

API docs

Use cases for deterministic UUIDs

Once you have a strategy for generating deterministic UUIDs, you can leverage them for several use cases, such as Deduplication, Idempotency, Updates, and Deletions.

See below for examples of each.

Deduplication
Idempotency
Updates
Deletions

When importing data that may contain duplicates, using deterministic UUIDs ensures that identical objects are not added multiple times.

products = client.collections.use("Products")

product_objs = [
    {"sku": "SKU001"},
    {"sku": "SKU001"},  # Duplicate
]

uuid = generate_uuid5("SKU001")
with products.batch.fixed_size(batch_size=100) as batch:
    for product in product_objs:
        batch.add_object(
            properties=product,
            uuid=uuid  # Deterministic UUID ensures no duplicates
        )

response = products.query.fetch_objects(
    filters=Filter.by_property("sku").equal("SKU001")
)
print(len(response.objects))  # Outputs 1, deduplicated

API docs

Idempotent imports allow you to safely re-run the same import multiple times without creating duplicates. Using deterministic UUIDs ensures that re-imported objects update existing ones rather than creating new entries.

products = client.collections.use("Products")

with products.batch.fixed_size(batch_size=100) as batch:
    for product in product_objs:
        batch.add_object(
            properties=product,
            uuid=generate_uuid5("SKU001")
        )

# Re-running the same batch import does not create duplicates
with products.batch.fixed_size(batch_size=100) as batch:
    for product in product_objs:
        batch.add_object(
            properties=product,
            uuid=generate_uuid5("SKU001")
        )

response = products.query.fetch_objects(
    filters=Filter.by_property("sku").equal("SKU001")
)
print(len(response.objects))  # Outputs 1, idempotent

API docs

You can update existing objects by re-importing them with the same deterministic UUID but modified properties.

products = client.collections.use("Products")

products.data.insert(
    properties={"sku": "SKU001", "price": 39.99},
    uuid=generate_uuid5("SKU001")
)

# Having a deterministic UUID allows us to easily update the object
products.data.update(
    properties={"sku": "SKU001", "price": 30.05},
    uuid=generate_uuid5("SKU001")
)

response = products.query.fetch_objects(
    filters=Filter.by_property("sku").equal("SKU001")
)
print(response.objects[0].properties["price"])  # Outputs 30.05, updated price

API docs

You can remove specific objects by their deterministic UUIDs.

products = client.collections.use("Products")

products.data.insert(
    properties={"sku": "SKU001", "price": 39.99},
    uuid=generate_uuid5("SKU001")
)

# Having a deterministic UUID allows us to easily delete the object
products.data.delete_by_id(
    uuid=generate_uuid5("SKU001")
)

response = products.query.fetch_objects(
    filters=Filter.by_property("sku").equal("SKU001")
)
print(len(response.objects))  # Outputs 0, object deleted

API docs

When NOT to use deterministic UUIDs

In any scenario where each object is inherently unique, avoid using deterministic UUIDs. Examples include:

Time-series data (logs, events)
User interactions (clicks, views)
Versioned content (documents with multiple versions)

UUIDv7 (time-ordered UUIDs)

UUIDv7 is designed to be time-ordered, which means a timestamp of some sort can be encoded into the UUID itself, and the resulting UUIDs will sort in chronological order.

However, Weaviate is not optimized for UUIDv7 at this time. UUIDv7 will lead to performance degradation in async replication scenarios. Use it with caution and test thoroughly if you choose to do so.

UUID strategy decision tree

Generally, using a deterministic UUID with a robust seed is recommended. In summary, here is a decision tree to help choose the right UUID strategy:

Is each object inherently unique?
├─ No → Use auto-generated UUIDs
└─ Yes → Continue

Does your data have a unique identifier?
├─ Yes → Use generate_uuid5(unique_id)
└─ No → Continue

Is there a combination of fields that's unique?
├─ Yes → Use generate_uuid5(field1 + field2)
└─ No → Use generate_uuid5(entire_object)

Do you import the same data to multiple collections?
└─ Yes → Add namespace parameter

Best practices

Choose the right identifier: Use natural unique IDs when possible
Be consistent: Same strategy across all imports for a collection
Use namespaces: Prevent collisions across collections
Document your strategy: Team needs to understand the approach
Test deduplication: Verify re-imports update instead of duplicate

What's next?

Now that you have a UUID strategy, let's learn how to optimize batch import performance.

← Back to Lesson Overview

UUID strategies for production

UUID generation strategies

Auto-generated UUIDs (default)

Deterministic UUIDs

Recommended UUID strategies

From unique identifier

From multiple properties

With namespace

Use cases for deterministic UUIDs

When NOT to use deterministic UUIDs

UUID strategy decision tree

Best practices

← Back to Lesson Overview

UUID strategies for production

UUID generation strategies​

Auto-generated UUIDs (default)​

Deterministic UUIDs​

Recommended UUID strategies​

From unique identifier​

From multiple properties​

With namespace​

Use cases for deterministic UUIDs​

When NOT to use deterministic UUIDs​

UUID strategy decision tree​

Best practices​

UUID generation strategies

Auto-generated UUIDs (default)

Deterministic UUIDs

Recommended UUID strategies

From unique identifier

From multiple properties

With namespace

Use cases for deterministic UUIDs

When NOT to use deterministic UUIDs

UUID strategy decision tree

Best practices