...

Row Key	Column Key	Value	Note
MyNamespace:HRFile:<runidX-inverted-start-time>:runidX	Properties	inputDir=/data/2017/hr regex=*.csv failOnError=false	One Row per namespace per dataset per run
MyNamespace:PersonFile:<runidX-inverted-start-time>:runidX	Properties	inputDir=/data/2017/person regex=*.csv failOnError=false	One Row per namespace per dataset per run
MyNamespace:EmployeeData:<runidX-inverted-start-time>:runidX	Properties	rowid=ID /* Open question: should we store the schema too? What if it changes per run? */	One Row per namespace per dataset per run
MyNamespace:EmployeeData:AllFields:<runidX-inverted-start-time>:runidX	ID	/* We may not necessarily need to store any value */ created_time:12345678 updated_time:12345678 last_updated_by:runid_X	One Row per namespace per dataset per run
MyNamespace:EmployeeData:AllFields:<runidX-inverted-start-time>:runidX	Name
MyNamespace:EmployeeData:AllFields:<runidX-inverted-start-time>:runidX	Department
MyNamespace:EmployeeData:AllFields:<runidX-inverted-start-time>:runidX	ContactDetails
MyNamespace:EmployeeData:AllFields:<runidX-inverted-start-time>:runidX	JoiningDate
MyNamespace:EmployeeData:ID:<runidX-inverted-start-time>:runidX	Lineage	Please see the full JSON below.	One row per run if field is part of target
MyNamespace:EmployeeData:Name:<runidX-inverted-start-time>:runidX	Lineage	Similar JSON	One row per run if field is part of target
MyNamespace:EmployeeData:ContactDetails:<runidX-inverted-start-time>:runidX	Lineage	Similar JSON	One row per run if field is part of target
MyNamespace:EmployeeData:JoiningDate:<runidX-inverted-start-time>:runidX	Lineage	Similar JSON. JSON representation of the LineageGraph provided by app to the platform.	One row per run if field is part of target dataset
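
The row keys in the table above could be assembled along the following lines. This is only a sketch: the table does not define how the start time is inverted, so the code assumes the common convention of subtracting the run's start time in milliseconds from the largest signed 64-bit value, so that newer runs sort first in a lexicographic scan. The helper names `inverted_time` and `row_key`, and the zero-padding to 19 digits, are illustrative and not taken from the design.

```python
# Assumption: "inverted start time" = (max signed 64-bit value) - start time in ms.
JAVA_LONG_MAX = 2**63 - 1


def inverted_time(start_time_ms: int) -> int:
    """Return a value that decreases as the start time increases."""
    return JAVA_LONG_MAX - start_time_ms


def row_key(namespace, dataset, run_id, start_time_ms, field=None):
    """Build namespace:dataset[:field]:<inverted-start-time>:runid.

    The inverted time is zero-padded to 19 digits (hypothetical choice) so
    that string ordering of keys matches numeric ordering of the times.
    """
    parts = [namespace, dataset]
    if field is not None:
        parts.append(field)
    parts.append(f"{inverted_time(start_time_ms):019d}")
    parts.append(run_id)
    return ":".join(parts)


# Dataset-level "Properties" row and the per-run "AllFields" row.
properties_key = row_key("MyNamespace", "HRFile", "runidX", 1000)
all_fields_key = row_key("MyNamespace", "EmployeeData", "runidX", 1000, field="AllFields")
```

With this encoding, a prefix scan on `MyNamespace:EmployeeData:AllFields:` would return the most recent run first, which is the usual motivation for inverting the timestamp in the key.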

JSON stored for ID field:

{
  "sources": [
    {
      "name": "PersonFile",
      "properties": {
        "inputPath": "/data/2017/persons",
        "regex": "*.csv"
      }
    },
    {
      "name": "HRFile",
      "properties": {
        "inputPath": "/data/2017/hr",
        "regex": "*.csv"
      }
    }
  ],
  "targets": [
    {
      "name": "Employee Data"
    }
  ],
  "operations": [
    {
      "inputs": [
        {
          "name": "PersonRecord",
          "properties": {
            "source": "PersonFile"
          }
        }
      ],
      "outputs": [
        {
          "name": "body"
        }
      ],
      "name": "READ",
      "description": "Read Person file.",
      "properties": {
        "stage": "Person File Reader"
      }
    },
    {
      "inputs": [
        {
          "name": "body"
        }
      ],
      "outputs": [
        {
          "name": "SSN"
        }
      ],
      "name": "PARSE",
      "description": "Parse the body field",
      "properties": {
        "stage": "Person File Parser"
      }
    },
    {
      "inputs": [
        {
          "name": "HRRecord",
          "properties": {
            "source": "HRFile"
          }
        }
      ],
      "outputs": [
        {
          "name": "body"
        }
      ],
      "name": "READ",
      "description": "Read HR file.",
      "properties": {
        "stage": "HR File Reader"
      }
    },
    {
      "inputs": [
        {
          "name": "body"
        }
      ],
      "outputs": [
        {
          "name": "Employee_Name"
        },
        {
          "name": "Dept_Name"
        }
      ],
      "name": "PARSE",
      "description": "Parse the body field",
      "properties": {
        "stage": "HR File Parser"
      }
    },
    {
      "inputs": [
        {
          "name": "Employee_Name"
        },
        {
          "name": "Dept_Name"
        },
        {
          "name": "SSN"
        }
      ],
      "outputs": [
        {
          "name": "ID",
          "properties": {
            "target": "Employee Data"
          }
        }
      ],
      "name": "GenerateID",
      "description": "Generate unique Employee Id",
      "properties": {
        "stage": "Field Normalizer"
      }
    }
  ]
}
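
To illustrate how such a graph might be consumed, the sketch below walks the operations backwards from the ID field to find the sources that contribute to it. It is not the platform's implementation: the function name `contributing_sources` and the helpers `op` and `field` are hypothetical, the graph is a condensed copy of the operations above, and fields are matched by name only.

```python
from collections import deque


def field(name, **properties):
    """Build a field entry shaped like the inputs/outputs in the JSON above."""
    entry = {"name": name}
    if properties:
        entry["properties"] = properties
    return entry


def op(name, inputs, outputs):
    """Build an operation entry (description and stage omitted for brevity)."""
    return {"name": name, "inputs": inputs, "outputs": outputs}


# Condensed copy of the "operations" array from the JSON above.
graph = {
    "operations": [
        op("READ", [field("PersonRecord", source="PersonFile")], [field("body")]),
        op("PARSE", [field("body")], [field("SSN")]),
        op("READ", [field("HRRecord", source="HRFile")], [field("body")]),
        op("PARSE", [field("body")], [field("Employee_Name"), field("Dept_Name")]),
        op("GenerateID",
           [field("Employee_Name"), field("Dept_Name"), field("SSN")],
           [field("ID", target="Employee Data")]),
    ]
}


def contributing_sources(graph, target_field):
    """Return the names of all sources reachable backwards from target_field."""
    # Index operations by the names of the fields they output.
    producers = {}
    for operation in graph["operations"]:
        for out in operation["outputs"]:
            producers.setdefault(out["name"], []).append(operation)

    sources = set()
    seen = {target_field}
    queue = deque([target_field])
    while queue:
        current = queue.popleft()
        for operation in producers.get(current, []):
            for inp in operation["inputs"]:
                source = inp.get("properties", {}).get("source")
                if source is not None:
                    sources.add(source)       # reached a source dataset
                elif inp["name"] not in seen:
                    seen.add(inp["name"])     # keep walking upstream
                    queue.append(inp["name"])
    return sources


print(sorted(contributing_sources(graph, "ID")))
```

For the ID field this yields both PersonFile and HRFile. Because both READ operations emit a field called `body`, name-only matching cannot tell the two apart, so the same traversal started from SSN would also report HRFile. A real implementation would need some way to disambiguate same-named fields, for example by qualifying them with the stage that produced them.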

...
