Decimal Type support in CDAP Schema

Decimal Type support in CDAP Schema

Introduction 

Most of the databases support decimal types with higher precision and scale which can not be represented by double(8-bytes) primitive type in java. Current CDAP schema does not support such decimal types. This document represents design to support decimal types with higher precision and scale in CDAP schema.

User Stories 

  • As a pipeline developer, I should be able read decimal type data with higher precision and scale from sources such as database or bigquery table. 

  • As a pipeline developer, I should be able write decimal type data with higher precision and scale to sinks such as database or bigquery table. 

  • As a plugin developer, I should be able to create structured records with decimal type data of higher precision and scale.

  • As a CDAP user, I should be able to explore CDAP datasets that have fields with decimal type data of higher precision and scale. 

Design

CDAP schema maps to Avro schema and Avro has a LogicalType - decimal - to support decimal data with higher precision and scale. Below is the avro schema to represent decimal logical type. 

Avro Type

Avro Schema Example

Description

Avro Type

Avro Schema Example

Description

Decimal

{
"type": "bytes",
"logicalType": "decimal",
"precision": 4,
"scale": 2
}

where the byte stores the decimal number.

Precision is a required field whereas scale is an optional field.

 

To support Decimal type in CDAP, we will have to map decimal avro logical type to corresponding java and hive types. Below table represents mapping of logical avro type to corresponding hive and database types.

Avro Type

Hive Type

Java Type

MySQL

Oracle

MS SQL

BigQuery

Avro Type

Hive Type

Java Type

MySQL

Oracle

MS SQL

BigQuery

Decimal

Decimal

java.math.BigDecimal

Decimal

Decimal, Numeric

Decimal

Numeric

New Programmatic APIs

CDAP Schema

Schema.java
public final class Schema implements Serializable { private int precision; private int scale; public enum LogicalType { DATE(Type.INT, "date"), TIMESTAMP_MILLIS(Type.LONG, "timestamp-millis"), TIMESTAMP_MICROS(Type.LONG, "timestamp-micros"), TIME_MILLIS(Type.INT, "time-millis"), TIME_MICROS(Type.LONG, "time-micros"), // New logical type to represent decimal in bytes DECIMAL(Type.BYTES, "decimal"); private final Type type; private final String token; private static final Map<String, LogicalType> LOOKUP_BY_TOKEN; static { Map<String, LogicalType> map = new HashMap<>(); for (LogicalType logicalType : values()) { map.put(logicalType.token, logicalType); } LOOKUP_BY_TOKEN = Collections.unmodifiableMap(map); } /** * Creates a {@link Schema} for Decimal logical type. * @param precision precision for decimal number * @return A {@link Schema} with the decimal logical type. */ public static Schema decimalOf(int precision) { return new Schema(Type.BYTES, LogicalType.DECIMAL, null, null, null, null, null, null, null, precision, 0); } /** * Creates a {@link Schema} for Decimal logical type. * @param precision precision for decimal number * @param scale scale for the decimal number * @return A {@link Schema} with the decimal logical type. */ public static Schema decimalOf(int precision, int scale) { return new Schema(Type.BYTES, LogicalType.DECIMAL, null, null, null, null, null, null, null, precision, scale); } /** * Returns precision of decimal schema. * @return */ public int getPrecision() { return precision; } /** * Returns scale of decimal schema. */ public int getScale() { return scale; } }

 

StructuredRecord

StructuredRecord.java
// get decimal type as java BigDecimal @Nullable public BigDecimal getDecimal(String fieldName) { // convert decimal bytes to java BigDecimal based on precision and scale provided in the field schema } // set decimal type as java BigDecimal public Builder setDecimal(String fieldName, @Nullable BigDecimal decimal) { }


Example usage in StructuredRecord: 

StructureRecordExample.java
Schema schema = Schema.recordOf("x", Schema.nullableOf(Schema.decimalOf(2,2)); BigInteger bigInteger = new BigInteger("111"); // big decimal representing 1.11 BigDecimal decimal = new BigDecimal(bigInteger, 2); StructuredRecord record = StructuredRecord.builder(schema).setDecimal("x", decimal).build();

 

Limitation

Avro Decimal logical type can also be presented as fixed complex avro type. However, CDAP schema currently does not support this type. Hence, decimal logical type with underlying type as 'fixed' will not be supported.

Plugins support

  • BigQuery Source/Sink

  • Wrangler

  • Dataprep BigQuery connection

Decimal type support in Data Prep Directives

  • Convert string to bigdecimal

  • Convert long to bigdecimal

  • Convert double to bigdecimal

  • Convert bigdecimal to string

UI Impact or Changes

For new Decimal type, the pipeline UI will display `Decimal`. Below is the mapping of backend types to UI types.

Backend schema type

UI type

Backend schema type

UI type

decimal

Decimal


Test Scenarios

Please note that the tests should be conducted for various databases and sinks such as bigquery. 

Test ID

Test Description

Expected Results

Test ID

Test Description

Expected Results

1.

Read decimal type from database or bigquery

should be able to read decimal type as bigdecimal

2.

Write decimal type to database or bigquery

should be able to write decimal type

Releases

Targeted release CDAP 6.0.0

Future work

Support other Avro types like Duration or fixed bytes.

Created in 2020 by Google Inc.