...

  • Along with the error record, the row number, the sheet name or number, and the Excel file name will be written to the error dataset.
  • The RecordReader implementation for ExcelInputReader will return a whole row; conditions such as the extraction of certain columns will be implemented in the source plugin class.
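
The split described above (the reader hands back a whole row, the source plugin picks columns and enriches error records) can be sketched as follows. This is only an illustrative, language-neutral sketch in Python, not the plugin's actual code: the function names and the error-record field names (`record`, `rowNumber`, `sheet`, `fileName`) are assumptions made for this example.

```python
def column_letter_to_index(letter):
    """Convert an Excel column letter such as 'A' or 'AB' to a zero-based index."""
    cleaned = letter.strip().upper()
    if not cleaned:
        raise ValueError("empty column letter")
    index = 0
    for ch in cleaned:
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column letter: {letter!r}")
        # Column letters behave like a base-26 number with digits 1..26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def extract_columns(row, column_list):
    """Pick the cells named in column_list (e.g. 'A,B,C') out of a whole row."""
    indices = [column_letter_to_index(c) for c in column_list.split(",")]
    # Cells beyond the end of a short row come back as None.
    return [row[i] if i < len(row) else None for i in indices]


def build_error_record(row, row_number, sheet, file_name):
    """Attach location metadata to a failed row (field names are hypothetical)."""
    return {
        "record": list(row),
        "rowNumber": row_number,
        "sheet": sheet,          # sheet name or sheet number
        "fileName": file_name,
    }
```

For example, `extract_columns(["id", "name", "age", "city"], "A,C")` selects the first and third cells of the row.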

Input JSON format:

{
  "name": "ExcelInputReader",
  "type": "batchsource",
  "properties": {
    "filesPath": "file:///hadoop/hdfs/xyz.xls",
    "filesPattern": "*",
    "memoryTableName": "memory-table",
    "reprocess": "false",
    "sheetName": "memory-table",
    "sheetNo": "2",
    "columnList": "A,B,C",
    "skipFirstColumn": "false",
    "terminateIfEmptyRow": "false",
    "rowsLimit": "2000",
    "outputSchema": "column1:dataType1,column2:dataType2",
    "ifErrorRecord": "dataset",
    "errorDatasetName": "error-dataset"
  }
}
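
Because every property in the configuration above is passed as a string, the plugin has to parse values such as `outputSchema` itself. A minimal sketch of that parsing, assuming the `name:type` pairs are comma-separated exactly as shown; the function name `parse_output_schema` is hypothetical and the real plugin may parse this differently:

```python
import json


def parse_output_schema(spec):
    """Split 'column1:dataType1,column2:dataType2' into (name, type) pairs."""
    fields = []
    for entry in spec.split(","):
        name, sep, data_type = entry.partition(":")
        name, data_type = name.strip(), data_type.strip()
        if not sep or not name or not data_type:
            raise ValueError(f"malformed outputSchema entry: {entry!r}")
        fields.append((name, data_type))
    return fields


# A trimmed copy of the configuration above, for illustration only.
config = json.loads("""
{
  "name": "ExcelInputReader",
  "type": "batchsource",
  "properties": {
    "rowsLimit": "2000",
    "outputSchema": "column1:dataType1,column2:dataType2"
  }
}
""")

properties = config["properties"]
schema = parse_output_schema(properties["outputSchema"])
rows_limit = int(properties["rowsLimit"])  # numeric properties arrive as strings
```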

...

       b. If the user-specified field names in the output schema differ from the Excel column names or mapped values, a validation error will be thrown.

    E. Should we also give the user the ability to skip the first row in case the sheet has headers?
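
The validation rule in item b, and the header-skipping option raised in item E, could look roughly like the sketch below. It is an illustration under stated assumptions, not a decided design: `validate_output_fields`, `data_rows` and the `skip_first_row` flag are hypothetical names, and whether such a flag should exist is exactly the open question in E.

```python
def validate_output_fields(schema_fields, available_names):
    """Raise if any output schema field is not a known column name or mapped value."""
    available = set(available_names)
    unknown = [name for name in schema_fields if name not in available]
    if unknown:
        raise ValueError(
            "output schema fields not found in Excel column names "
            "or mapped values: " + ", ".join(unknown)
        )


def data_rows(rows, skip_first_row):
    """Drop the first row when the sheet is known to start with a header row."""
    return rows[1:] if skip_first_row else rows
```

With this shape, `validate_output_fields(["name", "zip"], ["name", "age"])` raises because `zip` matches no column, while `data_rows(rows, True)` returns everything after the header.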

 

Assumptions

1. All specified Excel files/sheets should contain the required output columns; otherwise the file will be treated as an error Excel file.

...