Upgrade Metadata

Checklist

  • User Stories Documented
  • User Stories Reviewed
  • Design Reviewed
  • APIs reviewed
  • Release priorities assigned
  • Test cases reviewed
  • Blog post

Introduction 

Recent changes to metadata indexing added support for date and numeric metadata search. As a result, date and numeric search does not work on users’ existing entities that were indexed with the old pattern, so there needs to be a way to update the metadata of these outdated entities.

Goals

Detect outdated metadata and update its indexing. Ensure that no concurrency issues arise when a user and the upgrade method update the same metadata at the same time.

User Stories 

  • As a pipeline developer, I have many existing entities with numeric metadata values, and I would like to use the new Data Fusion feature to run numeric search queries on these entities.
  • As a pipeline developer, I have previously defined metadata properties with a date syntax that Data Fusion now supports, and I would like to run date searches over those properties.

Design

Update the indexing of outdated metadata entities. Outdated entities are detected by their metadata version number: an entity with metadataVersion < 2 is considered outdated. Because indexing information is stored in the Property class, upgrading an entity means creating a new MetadataDocument instance whose Property objects are built with the new Property constructor. Existing metadata must be read and replaced in one transaction, and conflicts must be handled for the case where a user and the upgrade method attempt to update the same metadata at the same time.
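The read-and-replace step can be illustrated with a small in-memory sketch. It is not the Elasticsearch-backed implementation: VersionedDoc, ReadReplaceSketch, and the String-valued properties field are hypothetical stand-ins, and a ConcurrentHashMap compare-and-replace plays the role of the storage layer's conflict detection. The point is only the shape of the loop: read, build the upgraded document, and write it back only if nothing else changed the entry in between.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for a stored metadata document.
final class VersionedDoc {
  final int metadataVersion;  // 1 = old indexing format, 2 = new indexing format
  final String properties;    // stand-in for the real list of Property objects

  VersionedDoc(int metadataVersion, String properties) {
    this.metadataVersion = metadataVersion;
    this.properties = properties;
  }
}

// Hypothetical in-memory store used only to show the retry-on-conflict pattern.
final class ReadReplaceSketch {
  static final int CURRENT_METADATA_VERSION = 2;

  private final ConcurrentHashMap<String, VersionedDoc> store = new ConcurrentHashMap<>();

  void put(String entity, VersionedDoc doc) {
    store.put(entity, doc);
  }

  VersionedDoc get(String entity) {
    return store.get(entity);
  }

  // Returns true if this call upgraded the entity, false if there was nothing to upgrade.
  boolean upgrade(String entity) {
    while (true) {
      VersionedDoc current = store.get(entity);
      if (current == null || current.metadataVersion >= CURRENT_METADATA_VERSION) {
        // Missing, or already written in the new format (possibly by a concurrent user update).
        return false;
      }
      VersionedDoc upgraded = new VersionedDoc(CURRENT_METADATA_VERSION, current.properties);
      // replace() succeeds only if the stored document is still the instance that was read.
      if (store.replace(entity, current, upgraded)) {
        return true;
      }
      // Conflict: another writer changed the document first, so re-read and try again.
    }
  }
}
```

If a user writes the same entity while the upgrade is in flight, the replace fails and the loop re-reads. In this sketch the user's write is assumed to already carry the new version, so the upgrade then has nothing left to do and the user's data is not overwritten.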


Approach

  • Change the value METADATA_VERSION in VersionInfo from 1 to 2.

    • Metadata entities to upgrade are those whose stored metadata version is less than 2.

    • This value can be changed again in the future for more upgrades.

  • Gather all outdated metadata entities into a list and pass them into a method [name TBD] to upgrade them.
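As a rough illustration of the gathering step, the version check could look like the sketch below. StoredEntity and OutdatedFilterSketch are hypothetical names, and the real entities would be read from the metadata store rather than from an in-memory list. Only the comparison against METADATA_VERSION comes from the approach above.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a stored entity together with its recorded metadata version.
final class StoredEntity {
  final String name;
  final int metadataVersion;

  StoredEntity(String name, int metadataVersion) {
    this.name = name;
    this.metadataVersion = metadataVersion;
  }
}

final class OutdatedFilterSketch {
  // Mirrors the proposed METADATA_VERSION value in VersionInfo after the change from 1 to 2.
  static final int METADATA_VERSION = 2;

  // Keeps only the entities whose stored version is below the current one.
  static List<StoredEntity> findOutdated(List<StoredEntity> all) {
    return all.stream()
        .filter(e -> e.metadataVersion < METADATA_VERSION)
        .collect(Collectors.toList());
  }
}
```

Because the filter compares against the constant rather than a literal, raising METADATA_VERSION again for a future upgrade would select the newly outdated entities without changing this method.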

Approach #1

  • Utilize: 

    • batch(List<? extends MetadataMutation> mutations, MutationOptions options) in ElasticsearchMetadataStorage

    • MetadataMutation is an abstract class, and Update is a class that extends it. An Update object has type = UPDATE and takes a MetadataEntity and a Metadata object to be updated.

  • Get MetadataEntity and Metadata pairs from MetadataRecords in order to construct the MetadataMutation / Update objects to pass into the batch() method.

  • In its implementation, batch() creates new MetadataDocument objects along with new Property objects. The newly implemented Property constructor will store information necessary for the new indexing format. Updating this information should update the indexing. 

  • batch() retries the metadata write until there are no conflicts (concurrency issues), so it is safe to use for the upgrade.

  • Something to think about:

    • batch() checks whether the input contains duplicate entities and, if it does, does not perform a batch update. Since we control the input and know that it will not contain duplicates, perhaps we can separate those two parts of the code.

    • Currently:

      • batch(): checks for duplicates, does batch

      • Call batch() on outdated entities

    • Alternative: 

      • batch(): checks for duplicates, doBatchMethod()

      • Call doBatchMethod() on outdated entities
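The alternative above can be sketched as follows. This is a simplified illustration, not the ElasticsearchMetadataStorage code: UpdateSketch, BatchSplitSketch, hasDuplicates(), and the written list are hypothetical, and doBatch() stands in for the doBatchMethod() placeholder. What the real batch() does when it finds duplicates is not modeled here. The sketch only reports that the batched path was skipped.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an Update mutation: one entity plus the metadata to write.
final class UpdateSketch {
  final String entity;
  final String metadata;

  UpdateSketch(String entity, String metadata) {
    this.entity = entity;
    this.metadata = metadata;
  }
}

final class BatchSplitSketch {
  // Records which entities went through the batched write path (illustration only).
  final List<String> written = new ArrayList<>();

  // General entry point: checks for duplicates, then delegates to doBatch().
  // Returns false if duplicates were found and the batched path was skipped.
  boolean batch(List<UpdateSketch> mutations) {
    if (hasDuplicates(mutations)) {
      return false;
    }
    doBatch(mutations);
    return true;
  }

  // Batched write without the duplicate check. The upgrade could call this directly,
  // on the assumption that its input contains each outdated entity exactly once.
  void doBatch(List<UpdateSketch> mutations) {
    for (UpdateSketch m : mutations) {
      written.add(m.entity);
    }
  }

  static boolean hasDuplicates(List<UpdateSketch> mutations) {
    Set<String> seen = new HashSet<>();
    for (UpdateSketch m : mutations) {
      if (!seen.add(m.entity)) {
        return true;  // add() returns false when the entity was already seen
      }
    }
    return false;
  }
}
```

The trade-off is that doBatch() no longer protects itself: any caller that bypasses batch() takes on responsibility for passing a duplicate-free list.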

Created in 2020 by Google Inc.