Goal 

This is a source plugin that allows users to read and process mainframe files defined by a COBOL copybook. It is intended as a basic first implementation.

Checklist

  • User stories documented 
  • User stories reviewed 
  • Design documented 
  • Design reviewed 
  • Feature merged 
  • Examples and guides 
  • Integration tests 
  • Documentation for feature 
  • Short video demonstrating the feature

Use-case 

The plugin reads a flat file or dataset generated on an IBM z/OS mainframe according to a fixed-length COBOL copybook. It will also work with files from AS/400 systems. For example, if a customer has flat files on HDFS that can be parsed using a simple COBOL copybook, applying the copybook lets them read the file and its fields.

Conditions

  • Supports only the fixed-length binary format that matches the copybook

  • Binary data will be converted to Base64 encoding

  • The first implementation will not handle complex nested structures in the COBOL copybook

  • It will also not handle REDEFINES or iterators in the structure

  • Supports compressed files - Native Compressed Codec
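
The fixed-length condition above can be illustrated with a minimal sketch that slices a byte array into equally sized records. The class `FixedLengthRecords` and its method `split` are hypothetical names used for illustration only; they are not part of the plugin, and the record length is assumed to be known from the copybook.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the plugin): slices raw file content
// into fixed-length records, as required by a fixed-length copybook.
class FixedLengthRecords {

  // Returns consecutive records of exactly recordLength bytes.
  // Rejects input whose size is not a whole number of records.
  static List<byte[]> split(byte[] data, int recordLength) {
    if (recordLength <= 0) {
      throw new IllegalArgumentException("recordLength must be positive");
    }
    if (data.length % recordLength != 0) {
      throw new IllegalArgumentException(
          "Data length " + data.length + " is not a multiple of " + recordLength);
    }
    List<byte[]> records = new ArrayList<>();
    for (int offset = 0; offset < data.length; offset += recordLength) {
      // copyOfRange copies the bytes in [offset, offset + recordLength)
      records.add(Arrays.copyOfRange(data, offset, offset + recordLength));
    }
    return records;
  }
}
```

For a copybook describing 27-byte records, `FixedLengthRecords.split(bytes, 27)` would return one array per record; a file with a trailing partial record is rejected rather than silently truncated, which is a design choice of this sketch.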

Options

  • The user should be able to copy and paste the COBOL copybook, or provide a file whose contents are loaded into the copybook text section

  • The user should be able to select the fields to include in the output schema, that is, to specify fields individually

References

Input Format implementation : here 

Design

  • Assumptions:
    1. The .cbl file will contain the schema as a data structure
    2. Both the data file and the .cbl file will reside on HDFS

  • For each "AbstractFieldValue" read from the data file, if the field type is binary, the data will be encoded to Base64 format:
    Integer.parseInt(new String(Base64.decodeBase64(Base64.encodeBase64(value.toString().getBytes()))));

    or

    Base64.decodeInteger(Base64.encodeInteger(value.asBigInteger()));

Which form is used depends on the field data type (int or BigInteger).
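
As a self-contained illustration of the round trip described above, the following sketch uses the JDK's `java.util.Base64` instead of the `Base64` class called in the design snippets. The class and method names are hypothetical. Note one assumption that differs from the snippets: the BigInteger variant here encodes the two's-complement bytes from `BigInteger.toByteArray()`, which is not necessarily the byte layout the design's `encodeInteger` call produces.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch (not the plugin's code): Base64 round trips for
// int-typed and BigInteger-typed field values using the JDK codec.
class Base64FieldSketch {

  // Encodes raw field bytes as a Base64 string.
  static String encode(byte[] raw) {
    return Base64.getEncoder().encodeToString(raw);
  }

  // Decodes a Base64 string back to raw bytes.
  static byte[] decode(String encoded) {
    return Base64.getDecoder().decode(encoded);
  }

  // int field: encode the decimal text of the value, then parse it back.
  static int roundTripInt(int value) {
    String encoded = encode(Integer.toString(value).getBytes(StandardCharsets.US_ASCII));
    return Integer.parseInt(new String(decode(encoded), StandardCharsets.US_ASCII));
  }

  // BigInteger field: encode the two's-complement bytes (an assumption of
  // this sketch), then rebuild the value from the decoded bytes.
  static BigInteger roundTripBigInteger(BigInteger value) {
    String encoded = encode(value.toByteArray());
    return new BigInteger(decode(encoded));
  }
}
```

Decoded bytes are turned back into text with `new String(bytes, charset)`; calling `toString()` on a `byte[]` returns an object identity string, not the content, so it cannot be parsed as a number.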

Examples

Properties : 

copybookContents : Contents of the COBOL copybook file, which contains the data structure.
binaryFilePath   : Complete path of the .bin file to be read. This is a fixed-length binary file that matches the copybook.
drop             : Comma-separated list of fields to drop. For example: 'field1,field2,field3'.
maxSplitSize     : Maximum split size for each mapper in the MapReduce job. Defaults to 128MB.
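
How the `drop` property might be applied can be sketched as follows. This is an illustrative sketch only: the class `DropFields` and its methods are hypothetical, and it assumes field names are matched exactly (case-sensitive) after trimming whitespace around each comma-separated entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of the plugin): applies a comma-separated
// "drop" list to the field names taken from a copybook.
class DropFields {

  // Parses 'field1,field2,field3' into a set of trimmed, non-empty names.
  static Set<String> parseDrop(String drop) {
    Set<String> names = new LinkedHashSet<>();
    if (drop == null) {
      return names;
    }
    for (String part : drop.split(",")) {
      String name = part.trim();
      if (!name.isEmpty()) {
        names.add(name);
      }
    }
    return names;
  }

  // Returns the copybook fields that are not listed in drop, in original order.
  static List<String> retain(List<String> copybookFields, String drop) {
    Set<String> dropped = parseDrop(drop);
    List<String> kept = new ArrayList<>();
    for (String field : copybookFields) {
      if (!dropped.contains(field)) {
        kept.add(field);
      }
    }
    return kept;
  }
}
```

The remaining names would then form the output schema; a null or empty `drop` value keeps every field.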

Example :

This example reads data from the local binary file "file:///home/cdap/DTAR020_FB.bin" and parses it using the schema given in the text area "COBOL CopyBook".

It drops the field "DTAR020-DATE" and generates structured records with the schema specified in the text area.

{
  "name": "CopyBookReader",
  "plugin": {
    "name": "CopyBookReader",
    "type": "batchsource",
    "properties": {
      "drop": "DTAR020-DATE",
      "referenceName": "CopyBook",
      "copybookContents":
        "000100* \n
        000200* DTAR020 IS THE OUTPUT FROM DTAB020 FROM THE IML \n
        000300* CENTRAL REPORTING SYSTEM \n
        000400* \n
        000500* CREATED BY BRUCE ARTHUR 19/12/90 \n
        000600* \n
        000700* RECORD LENGTH IS 27. \n
        000800* \n
        000900 03 DTAR020-KCODE-STORE-KEY. \n
        001000 05 DTAR020-KEYCODE-NO PIC X(08). \n
        001100 05 DTAR020-STORE-NO PIC S9(03) COMP-3. \n
        001200 03 DTAR020-DATE PIC S9(07) COMP-3. \n
        001300 03 DTAR020-DEPT-NO PIC S9(03) COMP-3. \n
        001400 03 DTAR020-QTY-SOLD PIC S9(9) COMP-3. \n
        001500 03 DTAR020-SALE-PRICE PIC S9(9)V99 COMP-3. ",
      "binaryFilePath": "file:///home/cdap/DTAR020_FB.bin",
      "maxSplitSize": "5"
    }
  }
}
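
Most fields in this copybook are COMP-3 (packed decimal), so a short sketch of how such a field could be decoded may help when checking parsed output. The class `PackedDecimal` is a hypothetical name and is not how the plugin itself necessarily reads these fields. It relies on the standard packed-decimal layout: two digits per byte, with the final half-byte holding the sign (hexadecimal D or B for negative, A, C, E, or F for positive or unsigned).

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch (not the plugin's code): decodes COMP-3 packed decimals.
class PackedDecimal {

  // Storage size in bytes of a COMP-3 field holding the given number of digits.
  static int byteLength(int digits) {
    return digits / 2 + 1;
  }

  // Decodes packed bytes; scale is the number of digits after the implied
  // decimal point (for example 2 for PIC S9(9)V99).
  static BigDecimal decode(byte[] packed, int scale) {
    if (packed.length == 0) {
      throw new IllegalArgumentException("empty packed decimal");
    }
    StringBuilder digits = new StringBuilder();
    for (int i = 0; i < packed.length; i++) {
      int high = (packed[i] >> 4) & 0x0F;
      int low = packed[i] & 0x0F;
      digits.append(digitChar(high));
      if (i < packed.length - 1) {
        digits.append(digitChar(low)); // the last low nibble is the sign
      }
    }
    int sign = packed[packed.length - 1] & 0x0F;
    if (sign < 0x0A) {
      throw new IllegalArgumentException("invalid sign nibble: " + sign);
    }
    boolean negative = sign == 0x0D || sign == 0x0B;
    BigInteger unscaled = new BigInteger(digits.toString());
    return new BigDecimal(negative ? unscaled.negate() : unscaled, scale);
  }

  // Converts a nibble in 0..9 to its character; rejects non-digit nibbles.
  private static char digitChar(int nibble) {
    if (nibble > 9) {
      throw new IllegalArgumentException("invalid digit nibble: " + nibble);
    }
    return (char) ('0' + nibble);
  }
}
```

With `byteLength`, the field sizes in the copybook add up to the stated record length: 8 bytes for PIC X(08), then 2, 4, 2, 5, and 6 bytes for the COMP-3 fields with 3, 7, 3, 9, and 11 digits, for a total of 27.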
