Goal
This source plugin allows users to read and process mainframe files whose layout is defined by a COBOL copybook. It is intended as a basic first implementation.
...
Input Format implementation : here
Design
- Assumptions:
- The .cbl file contains the schema as a COBOL data structure.
- Both the data file and the .cbl file reside on HDFS.
- For each AbstractLine read from the data file, if a field's binary or binaryFile flag is true, its data is encoded to Base64 while reading:
for (ExternalField field : externalRecord.getRecordFields()) {
    AbstractFieldValue fieldValue = line.getFieldValue(field.getName());
    if (fieldValue.isBinary()) {
        // Binary field: store the Base64 text of its bytes.
        value.put(field.getName(), Base64.encodeBase64String(
            fieldValue.toString().getBytes()));
    } else {
        // Non-binary field: store the value as plain text.
        value.put(field.getName(), fieldValue.toString());
    }
}
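The per-field rule can also be tried in isolation. The following is a minimal, self-contained sketch, not the plugin's code: the `Field` class and the field names are hypothetical stand-ins for the record-parsing types used above, and `java.util.Base64` is used in place of the Commons Codec call so that the example runs without extra libraries.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BinaryFieldEncodingSketch {

  /** Hypothetical stand-in for one parsed field (name, raw bytes, binary flag). */
  static final class Field {
    final String name;
    final byte[] raw;
    final boolean binary;

    Field(String name, byte[] raw, boolean binary) {
      this.name = name;
      this.raw = raw;
      this.binary = binary;
    }
  }

  /** Builds the output record: binary fields as Base64 text, others as plain text. */
  static Map<String, String> toRecord(List<Field> fields) {
    Map<String, String> value = new LinkedHashMap<>();
    for (Field field : fields) {
      if (field.binary) {
        value.put(field.name, Base64.getEncoder().encodeToString(field.raw));
      } else {
        // Assumption: text fields are already decoded to a Java-friendly charset.
        value.put(field.name, new String(field.raw, StandardCharsets.UTF_8));
      }
    }
    return value;
  }

  public static void main(String[] args) {
    List<Field> fields = Arrays.asList(
        new Field("NAME", "ABC".getBytes(StandardCharsets.UTF_8), false),
        new Field("AMOUNT", new byte[] {(byte) 0xF6, (byte) 0xF9, (byte) 0xF6}, true));
    System.out.println(toRecord(fields)); // {NAME=ABC, AMOUNT=9vn2}
  }
}
```

Encoding the raw bytes directly, rather than the bytes of a string rendering, avoids any dependence on the platform default charset.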
Examples
Properties :
cobolFile : contents of the .cbl file that specifies the schema
binaryFilePath : HDFS path of the .bin data file to be read
isCompressed : whether the data file is compressed. The user can also specify a native compression codec as input.
outputSchema : list of fields in the output file
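To make the four properties concrete, here is a hedged sketch of a plain Java holder that reads them from a string map. The class name, the choice of which properties are required, the boolean parsing of isCompressed, and the sample values in `main` are all assumptions made for illustration; the plugin's real configuration class may look different.

```java
import java.util.HashMap;
import java.util.Map;

final class CopybookSourceConfigSketch {
  final String cobolFile;       // copybook (.cbl) contents
  final String binaryFilePath;  // HDFS path of the .bin data file
  final boolean isCompressed;   // assumption: parsed as a simple true/false flag
  final String outputSchema;    // left as an opaque string here; may be null

  private CopybookSourceConfigSketch(String cobolFile, String binaryFilePath,
                                     boolean isCompressed, String outputSchema) {
    this.cobolFile = cobolFile;
    this.binaryFilePath = binaryFilePath;
    this.isCompressed = isCompressed;
    this.outputSchema = outputSchema;
  }

  /** Reads the properties; treating cobolFile and binaryFilePath as required is an assumption. */
  static CopybookSourceConfigSketch fromProperties(Map<String, String> props) {
    return new CopybookSourceConfigSketch(
        require(props, "cobolFile"),
        require(props, "binaryFilePath"),
        Boolean.parseBoolean(props.getOrDefault("isCompressed", "false")),
        props.get("outputSchema"));
  }

  private static String require(Map<String, String> props, String key) {
    String v = props.get(key);
    if (v == null || v.trim().isEmpty()) {
      throw new IllegalArgumentException("Missing required property: " + key);
    }
    return v;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("cobolFile", "01 REC. 05 ID PIC 9(4).");            // hypothetical copybook
    props.put("binaryFilePath", "hdfs://namenode/data/file.bin"); // hypothetical path
    CopybookSourceConfigSketch cfg = fromProperties(props);
    System.out.println(cfg.binaryFilePath + " compressed=" + cfg.isCompressed);
  }
}
```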
...
The source plugin reads the above file together with the data in the .bin file and generates Base64-encoded output. The schema of the output is the output schema defined by the user.
Sample Base64 encoded output:
9vn2+PT19fgCDABAEYwoDAAAAAAcAAAAAZAM9vn2+PT19fgCDABAEYwoDAAAAAAdAAAAAZAN9vn2+PT19fgCDABAEYwoDAAAAAAcAAAAAFAc9vn2+fTx9fg
CDABAEYwoDAAAAAAcAAAAAZAM9vn2+fTx9fgCDABAEYwoDAAAAAAdAAAAAZAN9vn2+fTx9fgCDABAEYwoDAAAAAAcAAAAAFAc9vP28PT48PgCDABAEYwXDA
AAAAAcAAAAAEh89vL2+PT29/ECDABAEYxoXAAAAAAcAAAABpmc9vL2+PT29/ECDABAEYxoXAAAAAAdAAAABpmd9vT28/T08vkCDABAEYyVfAAAAAAcAAAAA
Dmc9vb28vT09fgCDABAEYyVfAAAAAAcAAAAAAic9vP29/T49vECDABAEYyVfAAAAAEMAAAAACcM9vX29/T18/ICDABAEYySnAAAAAAcAAAAADWc9vT28fT0
8PEFnABAEYyVfAAAAAAcAAAAABmc9vT28fT08PEFnABAEYyVfAAAAAAcAAAAABmc9vH29vT38fMFnABAEYwzXAAAAAAcAAAAAXmc9vH29vT38fMFnABAEYw
zXAAAAAAdAAAAAXmd9vj28/T39fIFnABAEYxBDAAAAAAcAAAAAImc9vD28fT0+PcFnABAEYyHjAAAAAAcAAAAAFlc....
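A consumer can turn such output back into raw bytes with any Base64 decoder. The sketch below decodes only the first eight characters of the sample (`9vn2+PT1`) and prints them as hex. The `ebcdicDigits` helper rests on an assumption: that these leading bytes are EBCDIC digits, which occupy code points 0xF0 to 0xF9. The class and method names are hypothetical.

```java
import java.util.Base64;

final class SampleOutputDecodeSketch {

  /** Decodes Base64 text back to the raw record bytes. */
  static byte[] decode(String base64) {
    return Base64.getDecoder().decode(base64);
  }

  /** Renders bytes as lowercase hex, two characters per byte. */
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(String.format("%02x", b & 0xFF));
    }
    return sb.toString();
  }

  /** Maps EBCDIC digit bytes (0xF0..0xF9) to '0'..'9'; any other byte becomes '.'. */
  static String ebcdicDigits(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length);
    for (byte b : bytes) {
      int u = b & 0xFF;
      sb.append(u >= 0xF0 && u <= 0xF9 ? (char) ('0' + (u & 0x0F)) : '.');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] raw = decode("9vn2+PT1");       // first 8 characters of the sample
    System.out.println(toHex(raw));        // f6f9f6f8f4f5
    System.out.println(ebcdicDigits(raw)); // 696845
  }
}
```

Fields that are not plain EBCDIC digits (packed decimal, binary integers) need the copybook definition to interpret, which is why the helper prints `.` for them rather than guessing.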