Goal
This is a source plugin that will allow users to read and process mainframe files defined using a COBOL copybook. This is intended as a basic first implementation.
...
Input Format implementation: here
Design
- Assumptions:
- The .cbl file will contain the schema as a COBOL data structure
- Both the data file and the .cbl file will reside on HDFS
- For each AbstractLine read from the data file, if a field's binary flag (or the binaryFile flag) is true, the data will be Base64-encoded while reading:
for (ExternalField field : externalRecord.getRecordFields()) {
  AbstractFieldValue fieldValue = line.getFieldValue(field.getName());
  if (fieldValue.isBinary()) {
    // Binary fields are Base64-encoded so they can be carried as text in the output record.
    value.put(field.getName(), Base64.encodeBase64String(fieldValue.toString().getBytes()));
  } else {
    value.put(field.getName(), fieldValue.toString());
  }
}
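For context, the AbstractLine instances used above would come from a JRecord line reader opened against the copybook and the fixed-length binary file. The following is a minimal sketch only, assuming JRecord's CobolIoProvider API; file names are illustrative and exact signatures may vary between JRecord versions:

import net.sf.JRecord.Common.Constants;
import net.sf.JRecord.Details.AbstractLine;
import net.sf.JRecord.External.CopybookLoader;
import net.sf.JRecord.IO.AbstractLineReader;
import net.sf.JRecord.IO.CobolIoProvider;
import net.sf.JRecord.Numeric.Convert;

public class FixedLengthReadSketch {
  public static void main(String[] args) throws Exception {
    // Open a reader for a fixed-length mainframe file described by the copybook
    // (paths are illustrative, not part of the plugin configuration).
    AbstractLineReader reader = CobolIoProvider.getInstance().getLineReader(
        Constants.IO_FIXED_LENGTH, Convert.FMT_MAINFRAME, CopybookLoader.SPLIT_NONE,
        "DTAR020.cbl", "DTAR020_FB.bin");

    AbstractLine line;
    while ((line = reader.read()) != null) {
      // For each line, iterate over externalRecord.getRecordFields() as in the snippet above.
    }
    reader.close();
  }
}

Constants.IO_FIXED_LENGTH corresponds to the fixed-length flat-file structure that this first implementation supports.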
Examples
Properties:
cobolFile : .cbl file contents specifying the schema
binaryFilePath : HDFS path of the .bin data file to be read
isCompressed : whether the file is compressed. The user can also specify a native compression codec as input (see the codec-handling sketch after this property list).
outputSchema : list of fields in the output records
...
copybookContents : Contents of the COBOL copybook file, which contains the data structure
binaryFilePath : Complete path of the .bin file to be read. This will be a fixed-length binary file that matches the copybook.
fileStructure : Copybook file structure. For the current implementation, only fixed-length flat files will be read.
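The isCompressed / codec option could be handled by wrapping the HDFS input stream with a Hadoop compression codec. A minimal sketch, assuming Hadoop's CompressionCodecFactory; the method and variable names here are illustrative, not the plugin's final API:

import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class CompressedOpenSketch {
  public static InputStream open(Configuration conf, String binaryFilePath) throws Exception {
    Path path = new Path(binaryFilePath);
    FileSystem fs = path.getFileSystem(conf);
    // Infer a codec from the file extension (e.g. .gz, .bz2); null means the file is uncompressed.
    CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(path);
    InputStream in = fs.open(path);
    return codec == null ? in : codec.createInputStream(in);
  }
}

If the user supplies a native codec explicitly, the codec could instead be looked up by its class name rather than inferred from the file extension.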
Example:
This example reads data from a local binary file "file:///home/cdap/cdap/DTAR020_FB.bin" and parses it using the schema given in the text area "COBOL CopyBook contents".
It will generate structured records with either the output schema (if specified by the user) or with the default schema as specified in the text area.
{
  "name": "CopybookReader",
  "plugin": {
    "name": "CopybookReader",
    "type": "batchsource",
    "properties": {
      "cobolFilePath": "/data/sales/sales.cbl",
      "binaryFilePath": "/data/sales/sale.bin",
      "isCompressed": "true/false",
      "outputSchema": {},
      "uploadFileProperties": {}
    }
  }
}
...
"schema": "{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"DTAR020-KEYCODE-NO\",\"type\":\"int\"},{\"name\":\"DATE\",\"type\":[\"int\",\"null\"]},{\"name\":\"DTAR020-DEPT-NO\",\"type\":[\"int\",\"null\"]},{\"name\":\"DTAR020-QTY-SOLD\",\"type\":[\"int\",\"null\"]},{\"name\":\"DTAR020-SALE-PRICE\",\"type\":[\"double\",\"null\"]}]}",
"referenceName": "CopyBook",
"copybookContents": "000100* \n000200* DTAR020 IS THE OUTPUT FROM DTAB020 FROM THE IML \n000300* CENTRAL REPORTING SYSTEM \n000400* \n000500* CREATED BY BRUCE ARTHUR 19/12/90 \n000600* \n000700* RECORD LENGTH IS 27. \n000800* \n000900 03 DTAR020-KCODE-STORE-KEY. \n001000 05 DTAR020-KEYCODE-NO PIC X(08). \n001100 05 DTAR020-STORE-NO PIC S9(03) COMP-3. \n001200 03 DTAR020-DATE PIC S9(07) COMP-3. \n001300 03 DTAR020-DEPT-NO PIC S9(03) COMP-3. \n001400 03 DTAR020-QTY-SOLD PIC S9(9) COMP-3. \n001500 03 DTAR020-SALE-PRICE PIC S9(9)V99 COMP-3. ",
"binaryFilePath": "file:///home/cdap/cdap/DTAR020_FB.bin",
"fileStructure": ""
}
}
}
Sample .cbl file:
000600*
000700* RECORD LENGTH IS 27.
000800*
000900 03 DTAR020-KCODE-STORE-KEY.
001000 05 DTAR020-KEYCODE-NO PIC X(08).
001100 05 DTAR020-STORE-NO PIC S9(03) COMP-3.
001200 03 DTAR020-DATE PIC S9(07) COMP-3.
001300 03 DTAR020-DEPT-NO PIC S9(03) COMP-3.
001400 03 DTAR020-QTY-SOLD PIC S9(9) COMP-3.
001500 03 DTAR020-SALE-PRICE PIC S9(9)V99 COMP-3.
The source plugin will read the copybook above along with the data in the .bin file and generate output in which binary field values are Base64-encoded. The schema of the output will depend on the output schema defined by the user.
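As an illustration of that last step, a record matching the example output schema above could be assembled with CDAP's StructuredRecord builder. This is a sketch only, assuming the co.cask.cdap StructuredRecord and Schema APIs; toRecord and the value map are illustrative helpers, not part of the plugin:

import java.util.Map;
import co.cask.cdap.api.data.format.StructuredRecord;
import co.cask.cdap.api.data.schema.Schema;

public class OutputRecordSketch {
  // Output schema taken from the example configuration above.
  private static final Schema SCHEMA = Schema.recordOf(
      "etlSchemaBody",
      Schema.Field.of("DTAR020-KEYCODE-NO", Schema.of(Schema.Type.INT)),
      Schema.Field.of("DATE", Schema.nullableOf(Schema.of(Schema.Type.INT))),
      Schema.Field.of("DTAR020-DEPT-NO", Schema.nullableOf(Schema.of(Schema.Type.INT))),
      Schema.Field.of("DTAR020-QTY-SOLD", Schema.nullableOf(Schema.of(Schema.Type.INT))),
      Schema.Field.of("DTAR020-SALE-PRICE", Schema.nullableOf(Schema.of(Schema.Type.DOUBLE))));

  public static StructuredRecord toRecord(Map<String, Object> value) {
    StructuredRecord.Builder builder = StructuredRecord.builder(SCHEMA);
    // Copy each parsed field value (Base64-encoded if binary) into the structured record.
    for (Schema.Field field : SCHEMA.getFields()) {
      builder.set(field.getName(), value.get(field.getName()));
    }
    return builder.build();
  }
}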