
| Row Key | Column Key | Value | Note |
|---|---|---|---|
| MyNamespace:HRFile | Properties | inputDir=/data/2017/hr, regex=*.csv, failOnError=false | One row per namespace per dataset |
| MyNamespace:PersonFile | Properties | inputDir=/data/2017/person, regex=*.csv, failOnError=false | One row per namespace per dataset |
| MyNamespace:EmployeeData | Properties | rowid=ID /* Should we store the schema too? What if it changes per run? */ | One row per namespace per dataset |
| MyNamespace:EmployeeData:AllFields | ID | /* We may not necessarily need to store any value */ created_time:12345678, updated_time:12345678, last_updated_by:runid_X | One row per namespace per dataset |
| MyNamespace:EmployeeData:AllFields | Name | | |
| MyNamespace:EmployeeData:AllFields | Department | | |
| MyNamespace:EmployeeData:AllFields | ContactDetails | | |
| MyNamespace:EmployeeData:AllFields | JoiningDate | | |
| MyNamespace:EmployeeData:ID:<runidX-inverted-start-time>:runidX | Lineage | See the full JSON below. | One row per run if field is part of target |
| MyNamespace:EmployeeData:Name:<runidX-inverted-start-time>:runidX | Lineage | Similar JSON | One row per run if field is part of target |
| MyNamespace:EmployeeData:ContactDetails:<runidX-inverted-start-time>:runidX | Lineage | Similar JSON | One row per run if field is part of target |
| MyNamespace:EmployeeData:JoiningDate:<runidX-inverted-start-time>:runidX | Lineage | Similar JSON | One row per run if field is part of target |
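The row keys above could be assembled roughly as follows. This is only a sketch: the helper names are hypothetical, and it assumes that "inverted start time" means subtracting the run's start time from the largest signed 64-bit value, so that the most recent run of a field sorts first in a lexicographically ordered table. The zero padding is likewise an assumption, added so that string order matches numeric order.

```python
# Hypothetical helpers; the names and the inversion scheme are assumptions,
# not part of the design above.
MAX_LONG = 2**63 - 1  # largest signed 64-bit value


def inverted_time(start_time_ms):
    """Invert a timestamp so that newer runs get smaller values."""
    return MAX_LONG - start_time_ms


def properties_key(namespace, dataset):
    """Row holding the dataset-level properties (Column Key: Properties)."""
    return f"{namespace}:{dataset}"


def all_fields_key(namespace, dataset):
    """Row listing every known field of the dataset, one column per field."""
    return f"{namespace}:{dataset}:AllFields"


def lineage_key(namespace, dataset, field, start_time_ms, run_id):
    """Row holding the lineage JSON of one field for one run."""
    # Zero-pad to 19 digits so lexicographic order equals numeric order.
    inverted = f"{inverted_time(start_time_ms):019d}"
    return f"{namespace}:{dataset}:{field}:{inverted}:{run_id}"
```

With keys built this way, a prefix scan on `MyNamespace:EmployeeData:ID:` would return the newest run first, which is presumably the reason for inverting the start time.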

JSON stored for ID field:

{
  "sources": [
    {
      "name": "PersonFile",
      "properties": {
        "inputPath": "/data/2017/persons",
        "regex": "*.csv"
      }
    },
    {
      "name": "HRFile",
      "properties": {
        "inputPath": "/data/2017/hr",
        "regex": "*.csv"
      }
    }
  ],
  "targets": [
    {
      "name": "Employee Data"
    }
  ],
  "operations": [
    {
      "inputs": [
        {
          "name": "PersonRecord",
          "source": "PersonFile"
        }
      ],
      "outputs": [
        {
          "name": "PersonRecord.body"
        }
      ],
      "name": "READ",
      "description": "Read Person file."
    },
    {
      "inputs": [
        {
          "name": "PersonRecord.body"
        }
      ],
      "outputs": [
        {
          "name": "SSN"
        }
      ],
      "name": "PARSE",
      "description": "Parse the body field"
    },
    {
      "inputs": [
        {
          "name": "HRRecord",
          "source": "HRFile"
        }
      ],
      "outputs": [
        {
          "name": "HRRecord.body"
        }
      ],
      "name": "READ",
      "description": "Read HR file."
    },
    {
      "inputs": [
        {
          "name": "HRRecord.body"
        }
      ],
      "outputs": [
        {
          "name": "Employee_Name"
        },
        {
          "name": "Dept_Name"
        }
      ],
      "name": "PARSE",
      "description": "Parse the body field"
    },
    {
      "inputs": [
        {
          "name": "Employee_Name"
        },
        {
          "name": "Dept_Name"
        },
        {
          "name": "SSN"
        }
      ],
      "outputs": [
        {
          "name": "ID",
          "target": "Employee Data"
        }
      ],
      "name": "GenerateID",
      "description": "Generate unique Employee Id"
    }
  ]
}
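One way to read such a record is to walk the operations backwards from a target field until the source datasets are reached. The sketch below illustrates that idea on a trimmed-down copy of the record that keeps only the PersonFile chain; the function name `trace_field` and the traversal itself are assumptions made for illustration, not something the design prescribes.

```python
import json

# Trimmed-down copy of the record above: only the PersonFile chain is kept.
RECORD = json.loads("""
{
  "operations": [
    {"name": "READ",
     "inputs": [{"name": "PersonRecord", "source": "PersonFile"}],
     "outputs": [{"name": "PersonRecord.body"}]},
    {"name": "PARSE",
     "inputs": [{"name": "PersonRecord.body"}],
     "outputs": [{"name": "SSN"}]},
    {"name": "GenerateID",
     "inputs": [{"name": "SSN"}],
     "outputs": [{"name": "ID", "target": "Employee Data"}]}
  ]
}
""")


def trace_field(record, field):
    """Walk backwards from `field`; return (operation names, source names)."""
    operations = record["operations"]

    # Map each output field name to the indices of operations producing it.
    # Indices are used because operation names such as READ can repeat.
    producers = {}
    for index, op in enumerate(operations):
        for out in op["outputs"]:
            producers.setdefault(out["name"], []).append(index)

    visited_ops, sources, seen_fields = [], set(), set()
    pending = [field]
    while pending:
        name = pending.pop()
        if name in seen_fields:
            continue
        seen_fields.add(name)
        for index in producers.get(name, []):
            if index in visited_ops:
                continue
            visited_ops.append(index)
            for inp in operations[index]["inputs"]:
                if "source" in inp:
                    sources.add(inp["source"])
                pending.append(inp["name"])
    return [operations[i]["name"] for i in visited_ops], sources


ops, sources = trace_field(RECORD, "ID")
```

For the trimmed record, `ops` lists the operations from the target back to the source (GenerateID, PARSE, READ) and `sources` contains only PersonFile.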
