Task: Ingest data into Weaviate

Before building the API, you need a dataset to work with. We've prepared a rich collection of movie data with pre-computed vector embeddings.

Understanding the Movie Dataset

The dataset includes:

  • Movie data: titles, descriptions, genres, release years, popularity scores
  • Multiple vectors per object: Separate embeddings for general content and genre-specific search, pre-computed using Weaviate Embeddings

Collection Schema

Your movie collection will have this structure:

Properties:

  • movie_id (INT): Unique identifier
  • title (TEXT): Movie title
  • overview (TEXT): Plot description
  • genres (TEXT_ARRAY): List of genre tags
  • year (INT): Release year
  • popularity (NUMBER): Popularity score

Vectors:

  • default: Embeddings from title + overview (for general search)
  • genres: Embeddings from genres (for genre-specific search)
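
To make the schema concrete, the sketch below shows what a single movie object could look like just before import, with a small check that its fields match the property and vector lists above. The `EXPECTED_TYPES` mapping, the `check_movie()` helper, and every sample value are illustrative assumptions rather than course code or real dataset entries, and the vectors are cut down to three numbers for readability.

```python
# Python types matching the property list above (an illustrative mapping).
EXPECTED_TYPES = {
    "movie_id": int,               # INT
    "title": str,                  # TEXT
    "overview": str,               # TEXT
    "genres": list,                # TEXT_ARRAY
    "year": int,                   # INT
    "popularity": (int, float),    # NUMBER
}
VECTOR_NAMES = ("default", "genres")


def check_movie(properties: dict, vectors: dict) -> list[str]:
    """Return a list of problems; an empty list means the object fits the schema."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in properties:
            problems.append(f"missing property: {name}")
        elif not isinstance(properties[name], expected):
            actual = type(properties[name]).__name__
            problems.append(f"{name} has wrong type: {actual}")
    # TEXT_ARRAY means every entry in the list should be a string.
    genres = properties.get("genres")
    if isinstance(genres, list) and not all(isinstance(g, str) for g in genres):
        problems.append("genres must contain only strings")
    # Each object carries one embedding per named vector.
    for name in VECTOR_NAMES:
        if not vectors.get(name):
            problems.append(f"missing vector: {name}")
    return problems


# Made-up sample object; real vectors have many more dimensions.
sample_properties = {
    "movie_id": 1,
    "title": "Example Movie",
    "overview": "A placeholder plot description.",
    "genres": ["Drama", "Comedy"],
    "year": 2000,
    "popularity": 12.5,
}
sample_vectors = {"default": [0.1, 0.2, 0.3], "genres": [0.4, 0.5, 0.6]}

print(check_movie(sample_properties, sample_vectors))  # prints []
```

A check like this is optional, since the provided parsing helper already produces well-formed objects, but it can help when debugging a failed import.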

Ingest data

populate.py

The populate.py script ingests your movie dataset into Weaviate.

It creates a collection, reads the data files, and imports the data. The provided file is partially complete, and you'll need to fill in the missing parts.

The missing parts are marked with Python comments such as # STUDENT TODO.

Helper functions and template code are provided to help you with this task. For example, parse_data_object() generates parsed movie data objects from the raw data, so you don't need to worry about the data parsing logic.

You will need to:

  • Create a collection (create_movies_collection())
    • Check whether the collection exists and, if not, create it with the configuration and data schema described above
  • Ingest the movie data (ingest_movies_data())
    • Get the movie collection from Weaviate
    • Using batch imports, ingest the data
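
The two steps above can be sketched roughly as follows. This is not the course solution: the function signatures, the collection name "Movies", the `collection_config` argument, and the shape of the `movies` iterable (pairs of properties and named vectors) are assumptions made for illustration. The calls made on `client` (`collections.exists`, `collections.create`, `collections.get`, `batch.fixed_size`, `add_object`, `failed_objects`) are meant to follow the Weaviate Python client v4; check them against the client version you have installed.

```python
COLLECTION_NAME = "Movies"  # assumed name; use the one defined in the template


def create_movies_collection(client, collection_config: dict):
    """Create the collection only if it does not exist yet, then return it."""
    if client.collections.exists(COLLECTION_NAME):
        return client.collections.get(COLLECTION_NAME)
    # collection_config is assumed to hold the properties and the
    # named-vector settings, built with the client's configuration classes.
    return client.collections.create(name=COLLECTION_NAME, **collection_config)


def ingest_movies_data(client, movies, batch_size: int = 200) -> int:
    """Batch-import (properties, vectors) pairs; return how many objects failed."""
    collection = client.collections.get(COLLECTION_NAME)
    # The context manager sends any remaining objects when the block exits.
    with collection.batch.fixed_size(batch_size=batch_size) as batch:
        for properties, vectors in movies:
            # vectors is a dict keyed by vector name, e.g. "default" and "genres"
            batch.add_object(properties=properties, vector=vectors)
    # Failed objects are collected rather than raised, so inspect them afterwards.
    return len(collection.batch.failed_objects)
```

Checking for existence first makes the script safe to re-run, and returning the failure count gives the caller a simple signal that some objects were rejected.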

delete_collection.py

If anything goes wrong, or you wish to restart the project, you may need to delete the existing movie collection and start over.

For this, we provide the delete_collection.py script for you to complete. It deletes the movie collection from Weaviate.

Since deleting a collection is a destructive action, the script asks for explicit confirmation before proceeding.
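
A confirmation step of this kind could look like the sketch below. The function name, the default collection name "Movies", the prompt wording, and the `ask` parameter (included so the prompt can be replaced in tests) are all assumptions, not the contents of the provided script. `client.collections.exists` and `client.collections.delete` are intended to match the Weaviate Python client v4.

```python
def delete_movies_collection(client, name: str = "Movies", ask=input) -> bool:
    """Delete the collection only after an explicit 'yes'; return True if deleted."""
    if not client.collections.exists(name):
        print(f"Collection '{name}' does not exist; nothing to delete.")
        return False
    answer = ask(f"Delete collection '{name}' and all its data? Type 'yes' to confirm: ")
    # Anything other than a literal "yes" is treated as a refusal.
    if answer.strip().lower() != "yes":
        print("Aborted; nothing was deleted.")
        return False
    client.collections.delete(name)
    print(f"Deleted collection '{name}'.")
    return True
```

Requiring the full word "yes" instead of accepting an empty input or a single keypress makes an accidental deletion less likely.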

What's next?

Ready to implement the data ingestion? Let's look at the complete solution and understand the key concepts behind collection creation and batch imports.
