
Goal 

This is a source plugin that allows users to read and process mainframe files defined by a COBOL copybook. This should be a basic first implementation.
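Mainframe files described by a COBOL copybook commonly store numeric fields as packed decimal (`COMP-3`): two BCD digits per byte, with the sign in the low nibble of the last byte (`C`, `A`, `E` or `F` positive, `D` or `B` negative), and an implied decimal point given by `V` in the picture clause. The following is a minimal Java sketch of decoding one such field, assuming the field's offset, byte length and scale are already known from the copybook. The class and method names are hypothetical and are not part of the plugin.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical helper: decodes a COBOL COMP-3 (packed decimal) field. */
public final class PackedDecimal {

    private PackedDecimal() {}

    /**
     * @param data   raw record bytes
     * @param offset start of the field within the record
     * @param length field length in bytes
     * @param scale  number of digits after the implied decimal point (V)
     */
    public static BigDecimal unpack(byte[] data, int offset, int length, int scale) {
        StringBuilder digits = new StringBuilder();
        int sign = 0x0C;
        for (int i = offset; i < offset + length; i++) {
            int b = data[i] & 0xFF;          // treat the byte as unsigned
            int high = b >>> 4;
            int low = b & 0x0F;
            digits.append(checkDigit(high));
            if (i < offset + length - 1) {
                digits.append(checkDigit(low));
            } else {
                sign = low;                  // the last nibble holds the sign
            }
        }
        BigDecimal value = new BigDecimal(new BigInteger(digits.toString()), scale);
        if (sign == 0x0D || sign == 0x0B) {
            return value.negate();
        }
        if (sign == 0x0C || sign == 0x0F || sign == 0x0A || sign == 0x0E) {
            return value;
        }
        throw new IllegalArgumentException("Invalid sign nibble: " + sign);
    }

    /** Rejects nibbles that are not valid decimal digits. */
    private static int checkDigit(int nibble) {
        if (nibble > 9) {
            throw new IllegalArgumentException("Invalid BCD digit: " + nibble);
        }
        return nibble;
    }

    public static void main(String[] args) {
        byte[] record = {0x12, 0x34, 0x5C};          // +12345 in packed form
        System.out.println(unpack(record, 0, 3, 2)); // prints 123.45
    }
}
```

Using `BigDecimal` keeps the implied decimal places exact, which matters for monetary fields that a `double` would round.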

...

  • Users should be able to copy and paste a COBOL copybook, or provide a copybook file, whose contents get loaded into the copybook text section.

  • Users should be able to select which fields go into the output schema, i.e., they should be able to specify the fields.

...

Input Format implementation: here

Design

  • Assumptions:
    1. The .cbl file contains the schema as a COBOL data structure (record layout).
    2. Both the data file and the .cbl file reside on HDFS.
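Because the data file is fixed-length and its layout comes from the copybook, the record length is the sum of the storage sizes of the elementary items. The following is a minimal sketch of that size calculation for a small subset of picture clauses, assuming only the `X`, `A` and `9` symbols with optional `(n)` repeats, `S` and `V` occupying no storage, and `DISPLAY`, `COMP` or `COMP-3` usage. Level numbers, `OCCURS`, `REDEFINES` and `SIGN SEPARATE` are not handled, and the class name and the field layout in `main` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: storage size of simple COBOL elementary items. */
public final class PicSize {

    private PicSize() {}

    /** Counts character/digit positions in a PIC string such as "X(08)" or "S9(9)V99". */
    public static int positions(String pic) {
        String p = pic.toUpperCase();
        int count = 0;
        int i = 0;
        while (i < p.length()) {
            char symbol = p.charAt(i);
            int repeat = 1;
            if (i + 1 < p.length() && p.charAt(i + 1) == '(') {
                int close = p.indexOf(')', i);
                repeat = Integer.parseInt(p.substring(i + 2, close));
                i = close + 1;
            } else {
                i++;
            }
            // 'S' (embedded sign) and 'V' (implied decimal point) take no storage.
            if (symbol == 'X' || symbol == 'A' || symbol == '9') {
                count += repeat;
            }
        }
        return count;
    }

    /** Byte length of an elementary item for the given USAGE. */
    public static int byteLength(String pic, String usage) {
        int n = positions(pic);
        switch (usage) {
            case "DISPLAY": return n;                            // one byte per position
            case "COMP-3":  return n / 2 + 1;                    // two digits per byte plus a sign nibble
            case "COMP":    return n <= 4 ? 2 : n <= 9 ? 4 : 8;  // halfword, fullword or doubleword
            default: throw new IllegalArgumentException("Unsupported usage: " + usage);
        }
    }

    public static void main(String[] args) {
        // Hypothetical record layout: field name -> {PIC, USAGE}.
        Map<String, String[]> layout = new LinkedHashMap<>();
        layout.put("CUST-ID", new String[] {"X(08)", "DISPLAY"});
        layout.put("AMOUNT", new String[] {"S9(9)V99", "COMP-3"});
        int recordLength = 0;
        for (String[] field : layout.values()) {
            recordLength += byteLength(field[0], field[1]);
        }
        System.out.println("Record length: " + recordLength); // 8 + 6 = 14
    }
}
```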

...

copybookContents : Contents of the COBOL copybook file, which contains the data structure.
binaryFilePath   : Complete path of the .bin file to be read. This is a fixed-length binary file that matches the copybook.
drop             : Comma-separated list of fields to drop. For example: 'field1,field2,field3'.
maxSplitSize     : Maximum split size for each mapper in the MapReduce job. Defaults to 128MB.
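Since binaryFilePath points to a fixed-length file, each mapper's split should begin on a record boundary; otherwise a mapper would start decoding in the middle of a record. The following is a minimal sketch of how split boundaries could be derived from maxSplitSize and the record length, assuming that "128MB" means 128 × 1024 × 1024 bytes and that a file which is not a whole number of records is rejected. The class and method names are hypothetical, and this is not taken from the plugin's actual input format.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: record-aligned splits for a fixed-length binary file. */
public final class FixedLengthSplits {

    /** Assumed interpretation of the documented 128MB default. */
    public static final long DEFAULT_MAX_SPLIT_SIZE = 128L * 1024 * 1024;

    private FixedLengthSplits() {}

    /** Returns {start, length} pairs whose starts are multiples of recordLength. */
    public static List<long[]> computeSplits(long fileSize, int recordLength, long maxSplitSize) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        if (fileSize % recordLength != 0) {
            throw new IllegalArgumentException("File size is not a multiple of the record length");
        }
        // Round the split size down to whole records, but never below one record.
        long recordsPerSplit = Math.max(1, maxSplitSize / recordLength);
        long splitBytes = recordsPerSplit * recordLength;
        List<long[]> splits = new ArrayList<>();
        for (long start = 0; start < fileSize; start += splitBytes) {
            splits.add(new long[] {start, Math.min(splitBytes, fileSize - start)});
        }
        return splits;
    }

    public static void main(String[] args) {
        // 10 records of 27 bytes with a 100-byte limit: 3 records (81 bytes) per split.
        for (long[] split : computeSplits(270, 27, 100)) {
            System.out.println("start=" + split[0] + " length=" + split[1]);
        }
        // start=0 length=81, start=81 length=81, start=162 length=81, start=243 length=27
    }
}
```

Rounding down rather than up keeps every split at or below maxSplitSize, except when a single record is larger than the limit.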

Example:

This example reads data from the local binary file "file:///home/cdap/DTAR020_FB.bin" and parses it using the schema given in the "COBOL Copybook" text area.

It drops the field "DTAR020-DATE" and generates structured records with the schema specified in the text area.
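Because drop is a comma-separated list, applying it amounts to splitting on commas, trimming whitespace and removing the matching names from the copybook's field list before the output schema is built. The following is a minimal sketch, assuming exact, case-sensitive matching on copybook field names. The class and method names, and every field name other than DTAR020-DATE, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: applying the "drop" property to copybook field names. */
public final class DropFields {

    private DropFields() {}

    /** Parses "field1, field2,field3" into a set, ignoring empty entries. */
    public static Set<String> parseDropList(String drop) {
        Set<String> result = new LinkedHashSet<>();
        if (drop == null) {
            return result;
        }
        for (String part : drop.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    /** Keeps, in copybook order, the fields that are not listed in drop. */
    public static List<String> outputFields(List<String> copybookFields, String drop) {
        Set<String> dropped = parseDropList(drop);
        List<String> kept = new ArrayList<>();
        for (String field : copybookFields) {
            if (!dropped.contains(field)) {
                kept.add(field);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // FIELD-1 and FIELD-2 are placeholders, not the real DTAR020 layout.
        List<String> fields = Arrays.asList("FIELD-1", "DTAR020-DATE", "FIELD-2");
        System.out.println(outputFields(fields, "DTAR020-DATE")); // [FIELD-1, FIELD-2]
    }
}
```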

{
  "name": "CopybookReader",
  "plugin": {
    "name": "CopybookReader",
    "type": "batchsource",
    "properties": {
      "drop": "DTAR020-DATE",
      "referenceName": "Copybook",
      "copybookContents":
"000100* \n

...