Goal
This is a source plugin that allows users to read and process mainframe files whose layout is defined by a COBOL copybook. It is intended as a basic first implementation.
...
Input Format implementation : here
Design
- Assumptions:
  - The .cbl file contains the schema as a COBOL data structure.
  - Both the data file and the .cbl file reside on HDFS.
  - For each AbstractLine read from the data file, if a field's binary or binaryFile flag is true, its data is encoded to Base64 while reading:
for (ExternalField field : externalRecord.getRecordFields()) {
  AbstractFieldValue fieldValue = line.getFieldValue(field.getName());
  if (fieldValue.isBinary()) {
    // Binary fields are stored as Base64 text.
    value.put(field.getName(), Base64.encodeBase64String(fieldValue.toString().getBytes()));
  } else {
    // All other fields are stored as their plain string value.
    value.put(field.getName(), fieldValue.toString());
  }
}
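The Base64 step above can be illustrated in isolation. The following is only a minimal sketch: it uses the JDK's `java.util.Base64` instead of the `Base64` class in the snippet (which appears to be from Apache Commons Codec), it assumes the raw field bytes are available as a `byte[]`, and the class name `BinaryFieldEncoder`, its methods, and the sample bytes are hypothetical.

```java
import java.util.Base64;

public class BinaryFieldEncoder {

  // Hypothetical helper: turns raw field bytes into Base64 text so that
  // non-printable mainframe data can be carried in a string field.
  static String encodeBinary(byte[] raw) {
    return Base64.getEncoder().encodeToString(raw);
  }

  // Hypothetical helper: recovers the raw bytes from the Base64 text.
  static byte[] decodeBinary(String encoded) {
    return Base64.getDecoder().decode(encoded);
  }

  public static void main(String[] args) {
    // Sample bytes (an assumption for illustration): the packed-decimal
    // nibbles 0, 2, 0, C, i.e. +20 in a PIC S9(03) COMP-3 field.
    byte[] packed = {0x02, 0x0C};
    String encoded = encodeBinary(packed);
    System.out.println(encoded); // prints Agw=
  }
}
```

Working on the raw bytes rather than on `toString().getBytes()` avoids any dependence on the platform's default character set; whether the plugin can expose the raw bytes this way is not stated here.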
...
"name": "CopyBookReader",
"type": "batchsource",
"properties": {
"schema": "{
\"type\":\"record\",
\"name\":\"etlSchemaBody\",
\"fields\":[
{
\"name\":\"DTAR020-KEYCODE-NO\",
\"type\":\"int\"
},
...
{
\"name\":\"DTAR020-QTY-SOLD\",
\"type\":[\"int\",\"null\"]
},
{
\"name\":\"DTAR020-SALE-PRICE\",
\"type\":[\"double\",\"null\"]
}
]
}",
"referenceName": "CopyBook",
"copybookContents":
"000100* \n
000200* DTAR020 IS THE OUTPUT FROM DTAB020 FROM THE IML \n
000300* CENTRAL REPORTING SYSTEM \n
000400* \n
000500* CREATED BY BRUCE ARTHUR 19/12/90 \n
000600* \n
000700* RECORD LENGTH IS 27. \n
000800* \n
000900 03 DTAR020-KCODE-STORE-KEY. \n
001000 05 DTAR020-KEYCODE-NO PIC X(08). \n
001100 05 DTAR020-STORE-NO PIC S9(03) COMP-3. \n
001200 03 DTAR020-DATE PIC S9(07) COMP-3. \n
001300 03 DTAR020-DEPT-NO PIC S9(03) COMP-3. \n
001400 03 DTAR020-QTY-SOLD PIC S9(9) COMP-3. \n
001500 03 DTAR020-SALE-PRICE PIC S9(9)V99 COMP-3. ",
"binaryFilePath": "file:///home/cdap/cdap/DTAR020_FB.bin",
...