
Goal 

This is a source plugin that allows users to read and process mainframe files defined using a COBOL copybook. This is a basic first implementation.

Checklist

  • User stories documented 
  • User stories reviewed 
  • Design documented 
  • Design reviewed 
  • Feature merged 
  • Examples and guides 
  • Integration tests 
  • Documentation for feature 
  • Short video demonstrating the feature

Use-case 

This plugin reads flat files or datasets generated on a z/OS IBM mainframe whose layout is defined by a fixed-length COBOL copybook. It will also work with files from AS/400 systems. So, if a customer has flat files on HDFS that can be parsed using a simple COBOL copybook, applying the copybook lets the plugin read the file and expose its fields.

Conditions

  • Supports only fixed-length binary files whose record layout matches the copybook

  • Binary field data will be converted to Base64 encoding

  • The first implementation will not handle complex nested COBOL copybook structures

  • It will also not handle REDEFINES or repeating (OCCURS) items in the structure

  • Supports compressed files (native compression codecs)
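The Base64 condition above can be illustrated with plain JDK calls; the byte values below are hypothetical stand-ins for a binary field read from the data file:

```java
import java.util.Base64;

public class Base64FieldDemo {
    public static void main(String[] args) {
        // Hypothetical raw bytes of a binary field read from the .bin data file
        byte[] raw = {0x01, (byte) 0xA5, 0x7F};

        // Encode for the output record, as the conditions above require
        String encoded = Base64.getEncoder().encodeToString(raw);
        System.out.println(encoded); // prints AaV/

        // Downstream consumers can recover the original bytes losslessly
        byte[] decoded = Base64.getDecoder().decode(encoded);
    }
}
```

Base64 keeps arbitrary binary content safe to carry in string-typed record fields, which is why the plugin applies it to binary fields only.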

Options

  • The user should be able to paste the COBOL copybook into a text section, or provide a file whose contents get loaded into it

  • The user should be able to select which fields to include in the output schema, i.e. specify the fields individually

References

Input Format implementation : here 

Design

  • Assumptions:
    1. The .cbl file will contain the schema as a COBOL data structure
    2. Both the data file and the .cbl file will reside on HDFS

  • For each AbstractLine read from the data file, if the field's binary or binaryFile flag is true, the data will be Base64-encoded while reading:
    for (ExternalField field : externalRecord.getRecordFields()) {
      AbstractFieldValue fieldValue = line.getFieldValue(field.getName());
      if (fieldValue.isBinary()) {
        // Encode the raw binary field contents to a Base64 string
        value.put(field.getName(), Base64.encodeBase64String(fieldValue.toString().getBytes()));
      } else {
        value.put(field.getName(), fieldValue.toString());
      }
    }

 

Examples

Properties

cobolFile : .cbl file contents specifying the schema
binaryFilePath : HDFS path of the .bin data file to be read
isCompressed : whether the input is a compressed file; the user can also specify a native compression codec as input
outputSchema : list of fields in the output schema

Example :

{
  "name": "CopybookReader",
  "plugin": {
    "name": "CopybookReader",
    "type": "batchsource",
    "properties": {
      "cobolFilePath": "/data/sales/sales.cbl",
      "binaryFilePath": "/data/sales/sale.bin",
      "isCompressed": "true/false",
      "outputSchema": {},
      "uploadFileProperties": {}
    }
  }
}
This source plugin will read the fixed-length flat file sale.bin stored at the HDFS location /data/sales/sale.bin, using the schema specified by the .cbl file in the text input field. The plugin will convert the input binary data to Base64 encoding. The output data will be emitted as per the schema defined by the user.

Sample .cbl file:

000600*
000700*   RECORD LENGTH IS 27.
000800*
000900     03  DTAR020-KCODE-STORE-KEY.
001000         05 DTAR020-KEYCODE-NO    PIC X(08).
001100         05 DTAR020-STORE-NO      PIC S9(03)   COMP-3.
001200     03  DTAR020-DATE             PIC S9(07)   COMP-3.
001300     03  DTAR020-DEPT-NO          PIC S9(03)   COMP-3.
001400     03  DTAR020-QTY-SOLD         PIC S9(9)    COMP-3.
001500     03  DTAR020-SALE-PRICE       PIC S9(9)V99 COMP-3.
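As a sanity check on the sample above, the "RECORD LENGTH IS 27" comment can be reproduced from the PIC clauses: PIC X(n) occupies n bytes, and a COMP-3 (packed-decimal) item with d digits occupies ceil((d + 1) / 2) bytes (two digits per byte plus a sign nibble). A small sketch (class and method names are illustrative, not part of the plugin):

```java
public class RecordLength {
    // COMP-3 storage: ceil((digits + 1) / 2) bytes, written with integer arithmetic
    static int comp3Bytes(int digits) {
        return (digits + 2) / 2;
    }

    public static void main(String[] args) {
        int length = 8                // DTAR020-KEYCODE-NO   PIC X(08)        -> 8
                   + comp3Bytes(3)    // DTAR020-STORE-NO     PIC S9(03) COMP-3 -> 2
                   + comp3Bytes(7)    // DTAR020-DATE         PIC S9(07) COMP-3 -> 4
                   + comp3Bytes(3)    // DTAR020-DEPT-NO      PIC S9(03) COMP-3 -> 2
                   + comp3Bytes(9)    // DTAR020-QTY-SOLD     PIC S9(9)  COMP-3 -> 5
                   + comp3Bytes(11);  // DTAR020-SALE-PRICE   PIC S9(9)V99 COMP-3 -> 6
        System.out.println(length);   // prints 27, matching the copybook comment
    }
}
```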

The source plugin will read the copybook above along with the data in the .bin file and generate Base64-encoded output for binary fields. The schema of the output will depend on the fields selected by the user.
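For reference, the COMP-3 fields in the sample are packed decimal: each byte holds two decimal digits, and the low nibble of the last byte holds the sign (0xD negative, 0xC or 0xF positive). JRecord performs this decoding for the plugin; the standalone sketch below (not plugin code) shows what that decoding amounts to:

```java
public class Comp3Decoder {
    // Decode a signed packed-decimal (COMP-3) field into a long.
    static long unpackComp3(byte[] bytes) {
        long value = 0;
        // All bytes except the last carry two decimal digits
        for (int i = 0; i < bytes.length - 1; i++) {
            value = value * 100 + ((bytes[i] >> 4) & 0x0F) * 10 + (bytes[i] & 0x0F);
        }
        byte last = bytes[bytes.length - 1];
        value = value * 10 + ((last >> 4) & 0x0F);     // final digit in the high nibble
        return (last & 0x0F) == 0x0D ? -value : value; // sign in the low nibble
    }

    public static void main(String[] args) {
        // 0x12 0x3C -> digits 1,2,3 with positive sign nibble 0xC
        System.out.println(unpackComp3(new byte[]{0x12, 0x3C})); // prints 123
        System.out.println(unpackComp3(new byte[]{0x12, 0x3D})); // prints -123
    }
}
```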

 
