Goal 

This source plugin would allow users to read and process mainframe files defined using a COBOL copybook. This should be a basic first implementation.

Checklist

  •  User stories documented 
  •  User stories reviewed 
  •  Design documented 
  •  Design reviewed 
  •  Feature merged 
  •  Examples and guides 
  •  Integration tests 
  •  Documentation for feature 
  •  Short video demonstrating the feature

Use-case 

The plugin is used to read a flat file or dataset that was generated on an IBM z/OS mainframe according to a fixed-length COBOL copybook. It will also work for files from AS/400 systems. For example, if a customer has flat files on HDFS that can be parsed using a simple COBOL copybook, applying the copybook makes it easy to read the file and its fields.

Conditions

  • Supports only fixed-length binary format that matches the copybook

  • Binary data should be Base64 encoded

  • First implementation will not be able to handle complex nested structures in the COBOL copybook

  • It will also not handle REDEFINES or iterators in the structure.

  • Supports compressed files - Native Compressed Codec
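
To make the first two conditions concrete, here is a minimal sketch of splitting Base64-encoded, fixed-length data into records. It is not the plugin's implementation. The layout (CUST-ID, CUST-NAME and their lengths) is hypothetical, and the EBCDIC code page cp037 is an assumption; real mainframe files may use a different code page.

```python
import base64

# Hypothetical layout: (field name, byte length) pairs. These names and
# lengths are illustrative only and do not come from a real copybook.
LAYOUT = [("CUST-ID", 5), ("CUST-NAME", 10)]


def decode_records(b64_payload, layout=LAYOUT, encoding="cp037"):
    """Split Base64-encoded fixed-length data into records of text fields.

    Assumes every field is character data in the given code page
    (cp037 is one common EBCDIC code page; this is an assumption).
    """
    raw = base64.b64decode(b64_payload)
    record_length = sum(length for _, length in layout)
    if len(raw) % record_length != 0:
        raise ValueError("data length is not a multiple of the record length")

    records = []
    for start in range(0, len(raw), record_length):
        record, offset = {}, start
        for name, length in layout:
            # Slice the field's bytes, decode them, and drop trailing padding.
            record[name] = raw[offset:offset + length].decode(encoding).rstrip()
            offset += length
        records.append(record)
    return records


if __name__ == "__main__":
    payload = base64.b64encode("00042JANE DOE  ".encode("cp037"))
    print(decode_records(payload))
    # prints [{'CUST-ID': '00042', 'CUST-NAME': 'JANE DOE'}]
```

The length check reflects the fixed-length condition: data whose size is not a whole multiple of the record length does not match the copybook and is rejected.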

Options

  • User should be able to copy and paste the COBOL copybook, or provide a file whose contents get loaded, into the copybook text section

  • User should be able to select which fields go into the output schema by specifying the field names.
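
The field-selection option could be sketched as a simple projection over a parsed record. The function name `project_fields` and the record shape (a dict keyed by field name) are assumptions made for illustration, not part of the plugin.

```python
def project_fields(record, selected):
    """Keep only the selected fields, in the order the user listed them.

    `record` is assumed to be a dict of field name -> value;
    `selected` is the user-supplied list of field names.
    """
    missing = [name for name in selected if name not in record]
    if missing:
        # Fail early so a typo in the field list is not silently ignored.
        raise KeyError(f"fields not found in copybook: {missing}")
    return {name: record[name] for name in selected}


if __name__ == "__main__":
    row = {"CUST-ID": "00042", "CUST-NAME": "JANE DOE", "CUST-CITY": "PARIS"}
    print(project_fields(row, ["CUST-CITY", "CUST-ID"]))
    # prints {'CUST-CITY': 'PARIS', 'CUST-ID': '00042'}
```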

References

Input Format implementation : here 

Design

Assumptions:

  1. The .cbl file will contain the schema as a COBOL data structure
  2. Both the data file and the .cbl file will reside on HDFS
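
As an illustration of the first assumption, the sketch below derives a field layout from a flat copybook. It is a simplification under stated assumptions: it recognises only elementary items of the form `PIC X(n)` or `PIC 9(n)` on a single line, and it ignores group items, sequence-number columns, REDEFINES and packed or binary usages. The sample copybook and the name `parse_copybook` are hypothetical.

```python
import re

# Matches flat elementary items such as "05 CUST-ID PIC 9(5)." only.
FIELD_PATTERN = re.compile(
    r"^\s*(\d{2})\s+([A-Z0-9-]+)\s+PIC\s+([X9])\((\d+)\)\s*\.\s*$",
    re.IGNORECASE,
)


def parse_copybook(text):
    """Return (name, kind, length) tuples for simple PIC X(n) / PIC 9(n) items.

    Lines that do not match (for example the 01-level group item) are skipped.
    """
    fields = []
    for line in text.splitlines():
        match = FIELD_PATTERN.match(line)
        if match:
            _level, name, kind, length = match.groups()
            fields.append((name.upper(), kind.upper(), int(length)))
    return fields


# Hypothetical copybook used only for this example.
SAMPLE = """
       01 CUSTOMER-RECORD.
          05 CUST-ID    PIC 9(5).
          05 CUST-NAME  PIC X(10).
"""

if __name__ == "__main__":
    print(parse_copybook(SAMPLE))
    # prints [('CUST-ID', '9', 5), ('CUST-NAME', 'X', 10)]
```

Summing the lengths gives the fixed record length (15 bytes for this sample), which is what a reader would use to split the data file into records.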

 

Examples

Properties

cobolFile :  .cbl file contents to specify schema
binaryFilePath : HDFS path of the .bin data file to be read
isCompressed : whether the data file is compressed. The user can also specify a Native Compressed Codec as input.
outputSchema : list of fields in the output file

Example :

{

"name": "CopybookReader",
"plugin": {

"name": "CopybookReader",

"type": "batchsource",

"properties": {

...