Goal 

This is a source plugin that allows users to read and process mainframe files defined using a COBOL copybook. It is intended as a basic first implementation.

...

Input Format implementation : here 

Design

  • Assumptions:
    1. The .cbl file contains the schema as a data structure.
    2. Both the data file and the .cbl file reside on HDFS.

  • For each AbstractFieldValue read from the data file, if the field type is binary, the data is encoded to Base64 while reading:
    for (ExternalField field : externalRecord.getRecordFields()) {
      AbstractFieldValue fieldValue = line.getFieldValue(field.getName());
      if (fieldValue.isBinary()) {
        // int field: round-trip the value through Base64 and parse it back
        value.put(field.getName(), Integer.parseInt(new String(
          Base64.decodeBase64(Base64.encodeBase64(fieldValue.toString().getBytes())))));

        // or, for a BigInteger field:
        // value.put(field.getName(),
        //   Base64.decodeInteger(Base64.encodeInteger(fieldValue.asBigInteger())));
      } else {
        value.put(field.getName(), fieldValue.toString());
      }
    }

 

Which of the two forms is used depends on the field data type (int or BigInteger).
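As a rough illustration of the two round trips, here is a minimal, self-contained sketch. It uses the JDK's java.util.Base64 as a stand-in for the Base64 class in the design snippet, so the encoded bytes are not guaranteed to match what that class produces. The class name Base64RoundTrip and both method names are hypothetical and not part of the plugin.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration only; not part of the plugin.
final class Base64RoundTrip {

    // int-typed field: encode the decimal text of the value,
    // then decode it and parse it back to an int.
    static int roundTripInt(int fieldValue) {
        byte[] raw = Integer.toString(fieldValue).getBytes(StandardCharsets.US_ASCII);
        String encoded = Base64.getEncoder().encodeToString(raw);
        byte[] decoded = Base64.getDecoder().decode(encoded);
        return Integer.parseInt(new String(decoded, StandardCharsets.US_ASCII));
    }

    // BigInteger-typed field: encode the two's-complement bytes,
    // then rebuild the value from the decoded bytes.
    static BigInteger roundTripBigInteger(BigInteger fieldValue) {
        String encoded = Base64.getEncoder().encodeToString(fieldValue.toByteArray());
        return new BigInteger(Base64.getDecoder().decode(encoded));
    }
}
```

Both methods return the value they were given, which makes it easy to check that nothing is lost when a binary field passes through Base64.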

Examples

Properties : 

copybookContents : Contents of the COBOL copybook file, which contains the data structure.
binaryFilePath   : Complete path of the .bin file to be read. This is a fixed-length binary file that matches the copybook.
drop             : Comma-separated list of fields to drop. For example: 'field1,field2,field3'.
maxSplitSize     : Maximum split size for each mapper in the MapReduce job. Defaults to 128MB.
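One way the drop and maxSplitSize properties could be interpreted is sketched below. The class CopybookSourceConfig and its methods are hypothetical, not the plugin's actual configuration class, and the sketch assumes that maxSplitSize is given in megabytes, which the property description suggests but does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the properties listed above; not the plugin's real config class.
final class CopybookSourceConfig {

    // Default from the property description (128MB); the MB unit is an assumption.
    static final long DEFAULT_MAX_SPLIT_SIZE_MB = 128;

    // Splits the comma-separated "drop" property into trimmed field names.
    // A missing or empty property yields an empty list.
    static List<String> parseDrop(String drop) {
        List<String> fields = new ArrayList<>();
        if (drop == null) {
            return fields;
        }
        for (String name : drop.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                fields.add(trimmed);
            }
        }
        return fields;
    }

    // Returns the configured split size, falling back to the default when unset.
    static long maxSplitSizeMb(String maxSplitSize) {
        if (maxSplitSize == null || maxSplitSize.trim().isEmpty()) {
            return DEFAULT_MAX_SPLIT_SIZE_MB;
        }
        return Long.parseLong(maxSplitSize.trim());
    }
}
```

For example, parseDrop("field1,field2,field3") yields the three field names in order, and maxSplitSizeMb(null) falls back to 128.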

Example :

This example reads data from a local binary file "file:///home/cdap/cdap/DTAR020_FB.bin" and parses it using the schema given in the text area "COBOL CopyBook contents".

It drops the field "DTAR020-DATE" and generates structured records with either the output schema (if specified by the user) or the default schema as specified in the text area.

...

"name": "CopyBookReader",
"type": "batchsource",
"properties": {

"drop": "DTAR020-DATE",
"schema": "{
\"type\":\"record\",
\"name\":\"etlSchemaBody\",
\"fields\":[
{
...
},
{
\"name\":\"DTAR020-QTY-SOLD\",
\"type\":[\"int\",\"null\"]
},
{
\"name\":\"DTAR020-SALE-PRICE\",
\"type\":[\"double\",\"null\"]
}
]
}",

"referenceName": "CopyBook",

...

"binaryFilePath": "file:///home/cdap/cdap/DTAR020_FB.bin",

"maxSplitSize": "5"

}

}

}

Sample .cbl file:

000600* 
000700* RECORD LENGTH IS 27. 
000800* 
000900 03 DTAR020-KCODE-STORE-KEY. 
001000 05 DTAR020-KEYCODE-NO PIC X(08). 
001100 05 DTAR020-STORE-NO PIC S9(03) COMP-3. 
001200 03 DTAR020-DATE PIC S9(07) COMP-3. 
001300 03 DTAR020-DEPT-NO PIC S9(03) COMP-3. 
001400 03 DTAR020-QTY-SOLD PIC S9(9) COMP-3. 
001500 03 DTAR020-SALE-PRICE PIC S9(9)V99 COMP-3.
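
The COMP-3 (packed decimal) fields in this copybook store two digits per byte, with the sign in the last half-byte. The sketch below shows how such a field could be sized and decoded, and it can be used to check the stated record length of 27. The class Comp3 and its methods are hypothetical helpers for illustration. The sketch treats only a sign nibble of 0xD as negative, which is a simplification.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper for illustration only; not part of the plugin.
final class Comp3 {

    // Storage size of a COMP-3 field: two digits per byte,
    // with the final half-byte reserved for the sign.
    static int byteLength(int digits) {
        return digits / 2 + 1;
    }

    // Decodes packed-decimal bytes. "scale" is the number of digits
    // after the implied decimal point (the V in the PIC clause).
    static BigDecimal decode(byte[] packed, int scale) {
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < packed.length; i++) {
            int high = (packed[i] >> 4) & 0x0F;
            int low = packed[i] & 0x0F;
            digits.append(high);
            // The low nibble of the last byte is the sign, not a digit.
            if (i < packed.length - 1) {
                digits.append(low);
            }
        }
        int sign = packed[packed.length - 1] & 0x0F;
        BigInteger unscaled = new BigInteger(digits.toString());
        if (sign == 0x0D) {
            unscaled = unscaled.negate();
        }
        return new BigDecimal(unscaled, scale);
    }
}
```

With these sizes the record adds up as 8 (PIC X(08)) + 2 + 4 + 2 + 5 + 6 = 27 bytes, which agrees with the comment in the copybook. DTAR020-SALE-PRICE, declared as PIC S9(9)V99, would be decoded with a scale of 2.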