Goal
This source plugin allows users to read and process mainframe files whose layout is defined by a COBOL copybook. It is intended as a basic first implementation.
...
Users should be able to paste the COBOL copybook, or provide a file whose contents are loaded, into the copybook text section.
Users should be able to select which fields appear in the output schema, so they should be able to specify the fields.
...
Input Format implementation : here
Design
- Assumptions:
- The .cbl file will contain the schema as a data structure
- Both the data file and the .cbl file will reside on HDFS
...
copybookContents : Contents of the COBOL copybook file, which contains the data structure.
binaryFilePath : Complete path of the .bin file to be read. This will be a fixed-length binary format file that matches the copybook.
drop : Comma-separated list of fields to drop. For example: 'field1,field2,field3'.
maxSplitSize : Maximum split size for each mapper in the MapReduce job. Defaults to 128MB.
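As a rough illustration of how the drop and maxSplitSize properties above might be interpreted, here is a minimal Python sketch. It is not the plugin's implementation: the helper names, the whitespace trimming of the drop list, and the idea of rounding each split down to a whole number of fixed-length records are all assumptions made for illustration.

```python
from typing import List, Tuple

# 128 MB, the default stated for maxSplitSize.
DEFAULT_MAX_SPLIT_SIZE = 128 * 1024 * 1024


def parse_drop_list(drop: str) -> List[str]:
    """Split a comma-separated 'drop' value into trimmed field names.

    Trimming whitespace and skipping empty entries is an assumption.
    """
    return [name.strip() for name in drop.split(",") if name.strip()]


def output_fields(copybook_fields: List[str], drop: str) -> List[str]:
    """Return the copybook fields, in order, that are not listed in 'drop'."""
    dropped = set(parse_drop_list(drop))
    return [field for field in copybook_fields if field not in dropped]


def record_aligned_splits(
    file_size: int,
    record_length: int,
    max_split_size: int = DEFAULT_MAX_SPLIT_SIZE,
) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs where every offset is a record boundary.

    Hypothetical helper: assumes each split is rounded down to a whole
    number of fixed-length records so no record straddles two mappers.
    """
    if record_length <= 0 or record_length > max_split_size:
        raise ValueError("record_length must be in 1..max_split_size")
    split_length = (max_split_size // record_length) * record_length
    return [
        (offset, min(split_length, file_size - offset))
        for offset in range(0, file_size, split_length)
    ]
```

For instance, with the hypothetical field list ["FIELD-A", "DTAR020-DATE", "FIELD-B"] and a drop value of "DTAR020-DATE", output_fields keeps only FIELD-A and FIELD-B. The record length passed to record_aligned_splits would have to be derived from the copybook; any value used here is illustrative only.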
Example :
This example reads data from a local binary file "file:///home/cdap/DTAR020_FB.bin" and parses it using the schema given in the text area "COBOL Copybook".
It drops the field "DTAR020-DATE" and generates structured records with the schema specified in the text area.
{
"name": "CopybookReader",
"plugin": {
"name": "CopybookReader",
"type": "batchsource",
"properties": {
"drop" : "DTAR020-DATE",
"referenceName": "Copybook",
"copybookContents":
"000100* \n
...