Lookup in Transforms

Requirements

  1. Operations

    1. Perform single and batch reads on one or more datasets from a script transform

    2. Perform single and batch reads on one or more files from a script transform

  2. Supported tables for lookup

    1. KeyValueTable dataset

    2. ObjectMappedTable dataset

    3. CSV files treated as a list of key-value pairs

  3. Optional caching with time-based expiration
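
As a rough illustration of requirement 2.3, the sketch below loads a CSV file as a list of key-value pairs. The class name `CsvKeyValueSketch` and the method `load` are hypothetical and not part of this design. The sketch assumes one `key,value` pair per line with no quoting, and splits each line at the first comma.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the proposed API.
final class CsvKeyValueSketch {

  // Reads "key,value" lines into a map, preserving file order.
  // Assumption: no quoting or escaping; the first comma separates key from value.
  static Map<String, String> load(Reader source) {
    Map<String, String> table = new LinkedHashMap<>();
    try (BufferedReader reader = new BufferedReader(source)) {
      String line;
      while ((line = reader.readLine()) != null) {
        int comma = line.indexOf(',');
        if (comma < 0) {
          continue; // skip blank lines and lines without a separator
        }
        String key = line.substring(0, comma).trim();
        String value = line.substring(comma + 1).trim();
        table.put(key, value); // a repeated key keeps the last value seen
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return table;
  }

  public static void main(String[] args) {
    Map<String, String> ip2geo = load(new StringReader("10.0.0.1,US\n10.0.0.2,DE\n"));
    System.out.println(ip2geo.get("10.0.0.1")); // prints "US"
  }
}
```

A real implementation would likely need a proper CSV parser if keys or values can contain commas or quotes.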

Design

  1. Lookup interface

    interface Lookup<T> {
      T lookup(String key);
      Map<String, T> lookup(String... keys);
      Map<String, T> lookup(Set<String> keys);
    }

  2. Implement Lookup in KeyValueTable and ObjectMappedTable

    1. KeyValueTable implements Lookup<String>

    2. ObjectMappedTable implements Lookup<StructuredRecord>

  3. DatasetConfigurer changes

    1. Add method: void useDataset(String datasetName);

  4. ScriptTransform changes

    1. Add a configuration property that declares the lookup tables to use, along with the properties for each table (e.g. dataset properties)

      "tables": [
        {
          "name": "purchases",
          "type": "dataset",
          "properties": {
            "dataset": "purchases",
            "properties": {.. dataset properties ..},
            "enableCache": "true",
            "cacheExpiry": 1234
          }
        },
        {
          "name": "ip2geo",
          "type": "file",
          "properties": {"file": "/data/ip2geo.csv"}
        }
      ]

    2. configure(): verify tables (datasets and files) exist by calling DatasetConfigurer.useDataset()

    3. transform(): execute lookup methods in a transaction and provide a Lookup instance to the script

      1. Options for lookup usage: 

        var result = context.getLookup("purchases").lookup(user);
      2. Options for batch lookup usage:

        var result = context.getLookup("purchases").lookup(["alice", "bob"]);
        // do something with result["alice"]
        // do something with result["bob"]
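
To make the optional caching with time-based expiration more concrete, here is a minimal sketch of a wrapper around the `Lookup<T>` interface from the design. The classes `CachingLookup` and `MapLookup`, the injected clock, and the choice of milliseconds for the expiry are all assumptions for illustration; the design above does not specify the unit of `cacheExpiry` or how the cache is implemented.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.LongSupplier;

// Mirrors the Lookup interface from the design.
interface Lookup<T> {
  T lookup(String key);
  Map<String, T> lookup(String... keys);
  Map<String, T> lookup(Set<String> keys);
}

// Hypothetical in-memory table standing in for a dataset or file.
final class MapLookup<T> implements Lookup<T> {
  private final Map<String, T> data;
  int reads; // counts calls that reached the backing table

  MapLookup(Map<String, T> data) { this.data = data; }

  @Override public T lookup(String key) { reads++; return data.get(key); }

  @Override public Map<String, T> lookup(String... keys) {
    return lookup(new LinkedHashSet<>(Arrays.asList(keys)));
  }

  @Override public Map<String, T> lookup(Set<String> keys) {
    reads++;
    Map<String, T> result = new LinkedHashMap<>();
    for (String key : keys) result.put(key, data.get(key));
    return result;
  }
}

// Hypothetical caching wrapper; expiry is assumed to be in milliseconds.
final class CachingLookup<T> implements Lookup<T> {
  private static final class Entry<V> {
    final V value;
    final long expiresAt;
    Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
  }

  private final Lookup<T> delegate;
  private final long expiryMillis;
  private final LongSupplier clock; // injected so expiry can be tested
  private final Map<String, Entry<T>> cache = new HashMap<>();

  CachingLookup(Lookup<T> delegate, long expiryMillis, LongSupplier clock) {
    this.delegate = delegate;
    this.expiryMillis = expiryMillis;
    this.clock = clock;
  }

  @Override public T lookup(String key) {
    long now = clock.getAsLong();
    Entry<T> cached = cache.get(key);
    if (cached != null && now < cached.expiresAt) return cached.value;
    T value = delegate.lookup(key); // miss or expired: read through
    cache.put(key, new Entry<>(value, now + expiryMillis));
    return value;
  }

  @Override public Map<String, T> lookup(String... keys) {
    return lookup(new LinkedHashSet<>(Arrays.asList(keys)));
  }

  @Override public Map<String, T> lookup(Set<String> keys) {
    long now = clock.getAsLong();
    Map<String, T> result = new LinkedHashMap<>();
    Set<String> missing = new LinkedHashSet<>();
    for (String key : keys) {
      Entry<T> cached = cache.get(key);
      if (cached != null && now < cached.expiresAt) result.put(key, cached.value);
      else missing.add(key);
    }
    if (!missing.isEmpty()) {
      // One batch read for every key that is absent or expired.
      Map<String, T> fetched = delegate.lookup(missing);
      for (String key : missing) {
        T value = fetched.get(key);
        cache.put(key, new Entry<>(value, now + expiryMillis));
        result.put(key, value);
      }
    }
    return result;
  }
}
```

In this sketch a missing key is cached as `null` until it expires, and the cache is not thread-safe; whether either behavior is acceptable depends on how transforms are executed. A wrapper could be built with `new CachingLookup<>(table, 1234L, System::currentTimeMillis)`.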

 

Created in 2020 by Google Inc.