...

Along with the error record, the row number, the sheet name or number, and the Excel file name will be written to the error dataset.
The RecordReader implementation for ExcelInputReader will return a whole row; conditions such as extracting only certain columns will be implemented in the source plugin class.
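As a rough illustration of this split (whole rows from the RecordReader; column selection and error records handled in the plugin), the following sketch assumes that a row arrives as an array of cell values indexed from column A and that `columnList` uses Excel-style column letters. The helper names `columnLetterToIndex`, `extractColumns` and `buildErrorRecord`, and the error-record field names, are hypothetical and not part of CDAP or any Excel library.

```javascript
// Hypothetical helpers illustrating column selection in the plugin class.

// Convert an Excel-style column letter ("A", "Z", "AA") to a zero-based index.
function columnLetterToIndex(letters) {
  const s = letters.trim().toUpperCase();
  if (s.length === 0) {
    throw new Error("Empty column letter");
  }
  let index = 0;
  for (const ch of s) {
    const code = ch.charCodeAt(0);
    if (code < 65 || code > 90) {
      throw new Error(`Invalid column letter: ${letters}`);
    }
    // Base-26 with digits A=1 .. Z=26.
    index = index * 26 + (code - 64);
  }
  return index - 1;
}

// Pick only the columns named in columnList (e.g. "A,B,C") from a whole row.
// Cells past the end of the row are returned as null.
function extractColumns(row, columnList) {
  return columnList.split(",").map((col) => {
    const value = row[columnLetterToIndex(col)];
    return value === undefined ? null : value;
  });
}

// Assumed shape of an error-dataset entry: the failing row plus its location.
function buildErrorRecord(row, rowNumber, sheet, fileName) {
  return {
    record: row,
    rowNumber: rowNumber,
    sheet: sheet, // sheet name or sheet number, whichever was configured
    fileName: fileName,
  };
}
```

For example, `extractColumns(["x", "y", "z", "w"], "A,B,C")` returns `["x", "y", "z"]`, and a malformed row could be passed to `buildErrorRecord(row, 7, "Sheet1", "xyz.xls")` before being written to the error dataset.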

Input JSON format:

{
  "name": "ExcelInputReader",
  "type": "batchsource",
  "properties": {
    "filesPath": "file:///hadoop/hdfs/xyz.xls",
    "filesPattern": "*",
    "memoryTableName": "memory-table",
    "reprocess": "false",
    "sheetName": "memory-table",
    "sheetNo": "2",
    "columnList": "A,B,C",
    "skipFirstColumn": "false",
    "terminateIfEmptyRow": "false",
    "rowsLimit": "2000",
    "outputSchema": "column1:dataType1,column2:dataType2",
    "ifErrorRecord": "dataset",
    "errorDatasetName": "error-dataset"
  }
}
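All property values in the configuration above are strings, so the plugin has to convert and check them before use. The sketch below shows one way to do that, assuming booleans are written as "true"/"false", `rowsLimit` is a positive integer, `columnList` is comma-separated, and `outputSchema` is a comma-separated list of `name:type` pairs. It treats every property as required; `parseConfig` and its helpers are hypothetical names, not an existing CDAP API.

```javascript
// Hypothetical parser for the string-valued ExcelInputReader properties.

function parseBoolean(value, name) {
  if (value === "true") return true;
  if (value === "false") return false;
  throw new Error(`${name} must be "true" or "false", got "${value}"`);
}

// Parse "column1:dataType1,column2:dataType2" into [{ name, type }, ...].
function parseOutputSchema(schema) {
  return schema.split(",").map((pair) => {
    const parts = pair.split(":");
    if (parts.length !== 2 || !parts[0].trim() || !parts[1].trim()) {
      throw new Error(`Invalid outputSchema entry: "${pair}"`);
    }
    return { name: parts[0].trim(), type: parts[1].trim() };
  });
}

function parseConfig(properties) {
  const rowsLimit = Number(properties.rowsLimit);
  if (!Number.isInteger(rowsLimit) || rowsLimit <= 0) {
    throw new Error(`rowsLimit must be a positive integer, got "${properties.rowsLimit}"`);
  }
  return {
    filesPath: properties.filesPath,
    filesPattern: properties.filesPattern,
    memoryTableName: properties.memoryTableName,
    reprocess: parseBoolean(properties.reprocess, "reprocess"),
    sheetName: properties.sheetName,
    sheetNo: properties.sheetNo,
    columns: properties.columnList.split(",").map((c) => c.trim()),
    skipFirstColumn: parseBoolean(properties.skipFirstColumn, "skipFirstColumn"),
    terminateIfEmptyRow: parseBoolean(properties.terminateIfEmptyRow, "terminateIfEmptyRow"),
    rowsLimit: rowsLimit,
    outputSchema: parseOutputSchema(properties.outputSchema),
    ifErrorRecord: properties.ifErrorRecord,
    errorDatasetName: properties.errorDatasetName,
  };
}
```

Applied to the sample properties, this yields `rowsLimit` 2000, `reprocess` false, and an output schema of two fields. Failing at configuration time rather than per row means a typo such as `"rowsLimit": "2k"` is reported once, before any file is read.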

...

b. If the user-specified field names in the output schema differ from the Excel column names or mapped values, a validation error will be thrown.
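Rule (b) amounts to a set-membership check between the output-schema field names and the names available from the sheet. A minimal sketch, assuming the Excel column names (or mapped values) are already available as an array of strings and that matching is exact and case-sensitive; `validateOutputSchema` is a hypothetical name.

```javascript
// Hypothetical validation: every output-schema field must match an Excel column
// name or a mapped value. Matching is exact and case-sensitive (an assumption).
function validateOutputSchema(schemaFieldNames, availableNames) {
  const available = new Set(availableNames);
  const unknown = schemaFieldNames.filter((name) => !available.has(name));
  if (unknown.length > 0) {
    throw new Error(
      `Output schema fields not found in Excel columns or mapped values: ${unknown.join(", ")}`
    );
  }
}
```

For instance, `validateOutputSchema(["name", "age"], ["id", "name"])` throws and names `age` as the offending field, while `validateOutputSchema(["name"], ["id", "name"])` returns normally. Reporting all unknown fields at once saves the user from fixing them one at a time.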

E. Should we also give the user the ability to skip the first row in case the sheet has headers?

Assumptions

1. All specified Excel files/sheets should contain the required output columns; otherwise, the file will be treated as an error Excel file.

...