...
Row Key | Column Key | Value | Note
---|---|---|---
MyNamespace:HRFile:<runidX-inverted-start-time>:runidX | Properties | inputDir=/data/2017/hr regex=*.csv failOnError=false | One row per namespace per dataset per run
MyNamespace:PersonFile:<runidX-inverted-start-time>:runidX | Properties | inputDir=/data/2017/person regex=*.csv failOnError=false | One row per namespace per dataset per run
MyNamespace:EmployeeData:<runidX-inverted-start-time>:runidX | Properties | rowid=ID /* Should we store the schema too? What if it changes per run? */ | One row per namespace per dataset per run
MyNamespace:EmployeeData:AllFields:<runidX-inverted-start-time>:runidX | ID | /* We may not need to store any value */ created_time:12345678 updated_time:12345678 last_updated_by:runid_X | One row per namespace per dataset per run
MyNamespace:EmployeeData:AllFields:<runidX-inverted-start-time>:runidX | Name | |
MyNamespace:EmployeeData:AllFields:<runidX-inverted-start-time>:runidX | Department | |
MyNamespace:EmployeeData:AllFields:<runidX-inverted-start-time>:runidX | ContactDetails | |
MyNamespace:EmployeeData:AllFields:<runidX-inverted-start-time>:runidX | JoiningDate | |
MyNamespace:EmployeeData:ID:<runidX-inverted-start-time>:runidX | Lineage | Please see the full JSON below. | One row per run if field is part of target
MyNamespace:EmployeeData:Name:<runidX-inverted-start-time>:runidX | Lineage | Similar JSON | One row per run if field is part of target
MyNamespace:EmployeeData:ContactDetails:<runidX-inverted-start-time>:runidX | Lineage | Similar JSON | One row per run if field is part of target
MyNamespace:EmployeeData:JoiningDate:<runidX-inverted-start-time>:runidX | Lineage | Similar JSON | One row per run if field is part of target dataset

The Lineage value is the JSON representation of the LineageGraph provided by the app to the platform.
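
The row keys in the table can be composed as in the following minimal sketch. It assumes the inverted start time is computed by subtracting the run's start time (in milliseconds) from the signed 64-bit maximum, so that newer runs sort first in a lexicographic scan; the page does not state the exact formula. The function names `inverted_start_time` and `row_key` are hypothetical and used only for illustration.

```python
# Assumption: the start time is inverted against the signed 64-bit maximum.
MAX_LONG = 2**63 - 1


def inverted_start_time(start_time_ms: int) -> str:
    """Return the inverted start time, zero-padded to 19 digits so that
    lexicographic order matches numeric order."""
    return str(MAX_LONG - start_time_ms).zfill(19)


def row_key(namespace: str, dataset: str, run_id: str,
            start_time_ms: int, field: str = None) -> str:
    """Build <namespace>:<dataset>[:<field>]:<inverted-start-time>:<run-id>."""
    parts = [namespace, dataset]
    if field is not None:
        parts.append(field)  # e.g. "AllFields" or a single field name such as "ID"
    parts.append(inverted_start_time(start_time_ms))
    parts.append(run_id)
    return ":".join(parts)


older = row_key("MyNamespace", "HRFile", "run1", start_time_ms=1000)
newer = row_key("MyNamespace", "HRFile", "run2", start_time_ms=2000)
# The newer run has the smaller key, so a forward scan returns it first.
print(sorted([older, newer])[0] == newer)
```

With this layout, a prefix scan on `MyNamespace:HRFile:` would return the most recent run first, without a reverse scan.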
JSON stored for ID field:
```json
{
  "sources": [
    { "name": "PersonFile", "properties": { "inputPath": "/data/2017/persons", "regex": "*.csv" } },
    { "name": "HRFile", "properties": { "inputPath": "/data/2017/hr", "regex": "*.csv" } }
  ],
  "targets": [
    { "name": "Employee Data" }
  ],
  "operations": [
    {
      "inputs": [ { "name": "PersonRecord", "properties": { "source": "PersonFile" } } ],
      "outputs": [ { "name": "body" } ],
      "name": "READ",
      "description": "Read Person file.",
      "properties": { "stage": "Person File Reader" }
    },
    {
      "inputs": [ { "name": "body" } ],
      "outputs": [ { "name": "SSN" } ],
      "name": "PARSE",
      "description": "Parse the body field",
      "properties": { "stage": "Person File Parser" }
    },
    {
      "inputs": [ { "name": "HRRecord", "properties": { "source": "HRFile" } } ],
      "outputs": [ { "name": "body" } ],
      "name": "READ",
      "description": "Read HR file.",
      "properties": { "stage": "HR File Reader" }
    },
    {
      "inputs": [ { "name": "body" } ],
      "outputs": [ { "name": "Employee_Name" }, { "name": "Dept_Name" } ],
      "name": "PARSE",
      "description": "Parse the body field",
      "properties": { "stage": "HR File Parser" }
    },
    {
      "inputs": [ { "name": "Employee_Name" }, { "name": "Dept_Name" }, { "name": "SSN" } ],
      "outputs": [ { "name": "ID", "properties": { "target": "Employee Data" } } ],
      "name": "GenerateID",
      "description": "Generate unique Employee Id",
      "properties": { "stage": "Field Normalizer" }
    }
  ]
}
```
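
To illustrate how the stored graph could be read back, the sketch below walks the operations backwards from a target field and collects the source datasets that feed it. It is only one possible interpretation: it assumes the operations are listed in execution order and that an input field refers to the nearest earlier operation that outputs a field of that name (the field `body` is produced twice in this example). The graph is a trimmed copy of the JSON above that keeps only the keys the traversal needs; the helper names `field` and `source_datasets` are hypothetical.

```python
def field(name, **props):
    """Build a field entry shaped like the ones in the stored JSON."""
    entry = {"name": name}
    if props:
        entry["properties"] = props
    return entry


# Trimmed copy of the lineage graph stored for the ID field.
GRAPH = {"operations": [
    {"name": "READ", "inputs": [field("PersonRecord", source="PersonFile")],
     "outputs": [field("body")]},
    {"name": "PARSE", "inputs": [field("body")], "outputs": [field("SSN")]},
    {"name": "READ", "inputs": [field("HRRecord", source="HRFile")],
     "outputs": [field("body")]},
    {"name": "PARSE", "inputs": [field("body")],
     "outputs": [field("Employee_Name"), field("Dept_Name")]},
    {"name": "GenerateID",
     "inputs": [field("Employee_Name"), field("Dept_Name"), field("SSN")],
     "outputs": [field("ID", target="Employee Data")]},
]}


def source_datasets(graph, target_field):
    """Return the names of the source datasets that contribute to target_field."""
    ops = graph["operations"]

    def producer(name, before):
        # Assumption: the nearest earlier operation that outputs `name` produced it.
        for i in range(before - 1, -1, -1):
            if any(out["name"] == name for out in ops[i]["outputs"]):
                return i
        return None

    found, visited = set(), set()
    stack = [(target_field, len(ops))]
    while stack:
        name, before = stack.pop()
        idx = producer(name, before)
        if idx is None or idx in visited:
            continue
        visited.add(idx)
        for inp in ops[idx]["inputs"]:
            source = inp.get("properties", {}).get("source")
            if source:
                found.add(source)  # reached a field read directly from a source
            else:
                stack.append((inp["name"], idx))  # keep walking upstream
    return found


print(sorted(source_datasets(GRAPH, "ID")))
```

Matching fields by name alone would attribute `SSN` to both files, because both READ operations emit `body`; resolving each input against the nearest earlier producer is one way to avoid that, under the ordering assumption stated above.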
...