...
- Along with the error record, the row number, the sheet name or number, and the Excel file name will be written to the error dataset.
- The RecordReader implementation for ExcelInputReader will return a whole row; conditions such as extracting only certain columns will be implemented in the source plugin class.
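The split between the RecordReader (whole row) and the source plugin (column selection) could look roughly like the sketch below. It is only an illustration under stated assumptions: the class and method names are hypothetical, a row is modeled as a map from column letter to cell value, and an empty `columnList` is assumed to mean "keep every column".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of the actual plugin): shows column
// filtering applied in the source after the reader returns a whole row.
final class ColumnFilterSketch {

    // Parses a comma-separated list such as "A,B,C" into upper-case column letters.
    static List<String> parseColumnList(String columnList) {
        List<String> columns = new ArrayList<>();
        if (columnList == null || columnList.trim().isEmpty()) {
            return columns; // assumption: an empty list means "keep all columns"
        }
        for (String column : columnList.split(",")) {
            String trimmed = column.trim();
            if (!trimmed.isEmpty()) {
                columns.add(trimmed.toUpperCase(Locale.ROOT));
            }
        }
        return columns;
    }

    // Keeps only the requested columns of a whole row, in the requested order.
    // Columns missing from the row are skipped rather than treated as errors.
    static Map<String, String> extractColumns(Map<String, String> wholeRow, List<String> columns) {
        if (columns.isEmpty()) {
            return new LinkedHashMap<>(wholeRow);
        }
        Map<String, String> selected = new LinkedHashMap<>();
        for (String column : columns) {
            if (wholeRow.containsKey(column)) {
                selected.put(column, wholeRow.get(column));
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // A whole row as the reader might hand it over (column letter -> cell value).
        Map<String, String> row = new LinkedHashMap<>();
        row.put("A", "id-1");
        row.put("B", "alice");
        row.put("C", "42");
        row.put("D", "unused");
        System.out.println(extractColumns(row, parseColumnList("A,B,C")));
    }
}
```

Keeping the reader unaware of `columnList` means the same whole row stays available when an error record has to be written.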
Input JSON format:

    {
      "name": "ExcelInputReader",
      "type": "batchsource",
      "properties": {
        "filesPath": "file:///hadoop/hdfs/xyz.xls",
        "filesPattern": "*",
        "memoryTableName": "memory-table",
        "reprocess": "false",
        "sheetName": "memory-table",
        "sheetNo": "2",
        "columnList": "A,B,C",
        "skipFirstColumn": "false",
        "terminateIfEmptyRow": "false",
        "rowsLimit": "2000",
        "outputSchema": "column1:dataType1,column2:dataType2",
        "ifErrorRecord": "dataset",
        "errorDatasetName": "error-dataset"
      }
    }
...
b. If the user-specified field names in the output schema differ from the Excel column names or mapped values, a validation error will be thrown.
E. Should we also give the user the ability to skip the first row in case the sheet has headers?
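The field-name validation described in item b could be sketched as follows. This is a minimal illustration, not the plugin's actual code: the class and method names are hypothetical, the `name:type` comma-separated format is taken from the `outputSchema` property in the input JSON, and the use of `IllegalArgumentException` for the validation error is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical validator (names are illustrative only).
final class OutputSchemaValidationSketch {

    // Extracts field names from a schema string such as "column1:dataType1,column2:dataType2".
    static List<String> parseFieldNames(String outputSchema) {
        List<String> names = new ArrayList<>();
        for (String entry : outputSchema.split(",")) {
            String[] parts = entry.trim().split(":");
            if (parts.length != 2 || parts[0].trim().isEmpty() || parts[1].trim().isEmpty()) {
                throw new IllegalArgumentException(
                    "Malformed outputSchema entry: '" + entry.trim() + "'");
            }
            names.add(parts[0].trim());
        }
        return names;
    }

    // Throws if any schema field is neither an Excel column name nor a mapped value.
    static void validate(String outputSchema, Set<String> knownNames) {
        List<String> unknown = new ArrayList<>();
        for (String name : parseFieldNames(outputSchema)) {
            if (!knownNames.contains(name)) {
                unknown.add(name);
            }
        }
        if (!unknown.isEmpty()) {
            throw new IllegalArgumentException(
                "Output schema fields not found among Excel column names or mapped values: " + unknown);
        }
    }
}
```

Collecting every unknown field before throwing lets the user fix the whole schema in one pass instead of one field per run.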
Assumptions
1. Every specified Excel file/sheet must contain the required output columns; otherwise, it is treated as an erroneous Excel file.
...