Decimal Type support in CDAP Schema
Introduction
Most databases support decimal types with precision and scale that cannot be represented by the 8-byte `double` primitive type in Java. The current CDAP schema does not support such decimal types. This document presents a design to support decimal types with higher precision and scale in the CDAP schema.
User Stories
- As a pipeline developer, I should be able to read decimal type data with higher precision and scale from sources such as a database or a BigQuery table.
- As a pipeline developer, I should be able to write decimal type data with higher precision and scale to sinks such as a database or a BigQuery table.
- As a plugin developer, I should be able to create structured records with decimal type data of higher precision and scale.
- As a CDAP user, I should be able to explore CDAP datasets that have fields with decimal type data of higher precision and scale.
Design
The CDAP schema maps to the Avro schema, and Avro has a logical type, `decimal`, to support decimal data with higher precision and scale. Below is the Avro schema used to represent the decimal logical type.
Avro Type | Avro Schema Example | Description |
---|---|---|
Decimal | `{ "type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2 }` | The bytes store the two's-complement representation of the unscaled decimal number. Precision is a required field, whereas scale is optional and defaults to 0. |
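To make the encoding concrete, the sketch below uses only the JDK to show what the decimal logical type stores: the two's-complement bytes of the unscaled value, with precision and scale carried by the schema rather than the payload. The class name is illustrative.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalBytesExample {
  public static void main(String[] args) {
    // 123.45 has unscaled value 12345 and scale 2
    BigDecimal value = new BigDecimal("123.45");

    // The decimal logical type stores the two's-complement bytes of the unscaled value;
    // precision and scale live in the schema, not in the payload.
    byte[] encoded = value.unscaledValue().toByteArray();

    // Decoding requires the scale declared in the schema (2 here).
    BigDecimal decoded = new BigDecimal(new BigInteger(encoded), 2);
    System.out.println(decoded); // prints 123.45
  }
}
```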
To support the decimal type in CDAP, the decimal Avro logical type has to be mapped to the corresponding Java and Hive types. The table below shows the mapping of the Avro logical type to the corresponding Hive, Java, and database types.
Avro Type | Hive Type | Java Type | MySQL | Oracle | MS SQL | BigQuery |
---|---|---|---|---|---|---|
Decimal | Decimal | java.math.BigDecimal | Decimal | Decimal, Numeric | Decimal | Numeric |
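As an illustration of the database column mapping, a source plugin could read a DECIMAL/NUMERIC column directly as a BigDecimal through JDBC. The sketch below is hypothetical plugin code, not an existing CDAP API; the table name, column name, and the reference to the proposed setDecimal API are assumptions.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical sketch of how a database source plugin might surface a DECIMAL column.
public class DecimalColumnReader {
  void readPrices(Connection connection) throws Exception {
    try (Statement stmt = connection.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT price FROM products")) {
      while (rs.next()) {
        // JDBC already exposes DECIMAL/NUMERIC columns as java.math.BigDecimal
        BigDecimal price = rs.getBigDecimal("price");
        // The value would then be passed to the proposed StructuredRecord.Builder#setDecimal
      }
    }
  }
}
```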
New Programmatic APIs
CDAP Schema
```java
public final class Schema implements Serializable {
  private int precision;
  private int scale;

  public enum LogicalType {
    DATE(Type.INT, "date"),
    TIMESTAMP_MILLIS(Type.LONG, "timestamp-millis"),
    TIMESTAMP_MICROS(Type.LONG, "timestamp-micros"),
    TIME_MILLIS(Type.INT, "time-millis"),
    TIME_MICROS(Type.LONG, "time-micros"),
    // New logical type to represent decimal in bytes
    DECIMAL(Type.BYTES, "decimal");

    private final Type type;
    private final String token;
    private static final Map<String, LogicalType> LOOKUP_BY_TOKEN;

    static {
      Map<String, LogicalType> map = new HashMap<>();
      for (LogicalType logicalType : values()) {
        map.put(logicalType.token, logicalType);
      }
      LOOKUP_BY_TOKEN = Collections.unmodifiableMap(map);
    }

    LogicalType(Type type, String token) {
      this.type = type;
      this.token = token;
    }
  }

  /**
   * Creates a {@link Schema} for the decimal logical type with a scale of 0.
   *
   * @param precision precision for the decimal number
   * @return A {@link Schema} with the decimal logical type.
   */
  public static Schema decimalOf(int precision) {
    return new Schema(Type.BYTES, LogicalType.DECIMAL, null, null, null, null, null, null, null, precision, 0);
  }

  /**
   * Creates a {@link Schema} for the decimal logical type.
   *
   * @param precision precision for the decimal number
   * @param scale scale for the decimal number
   * @return A {@link Schema} with the decimal logical type.
   */
  public static Schema decimalOf(int precision, int scale) {
    return new Schema(Type.BYTES, LogicalType.DECIMAL, null, null, null, null, null, null, null, precision, scale);
  }

  /**
   * Returns the precision of the decimal schema.
   */
  public int getPrecision() {
    return precision;
  }

  /**
   * Returns the scale of the decimal schema.
   */
  public int getScale() {
    return scale;
  }
}
```
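A minimal usage sketch of the proposed factory methods and getters, assuming the API above:

```java
// Sketch of how the proposed factory methods would be used.
Schema amount = Schema.decimalOf(10, 2);  // e.g. a currency column: precision 10, scale 2
Schema count = Schema.decimalOf(20);      // scale defaults to 0

int precision = amount.getPrecision();    // 10
int scale = amount.getScale();            // 2
```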
StructuredRecord
```java
// Get a decimal field as a java.math.BigDecimal
@Nullable
public BigDecimal getDecimal(String fieldName) {
  // Convert the decimal bytes to a BigDecimal based on the precision and scale
  // provided in the field schema
}

// Set a decimal field from a java.math.BigDecimal
public Builder setDecimal(String fieldName, @Nullable BigDecimal decimal) {
}
```
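The conversion hinted at in the comments above could look like the following sketch. The helper names and the storage format (unscaled two's-complement bytes, as in the Avro decimal logical type) are assumptions, not an existing CDAP API.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helpers illustrating the work getDecimal/setDecimal would do internally.
final class DecimalCodec {

  // Rebuild the BigDecimal from the unscaled two's-complement bytes using the scale from the field schema.
  static BigDecimal fromBytes(byte[] bytes, int scale) {
    return new BigDecimal(new BigInteger(bytes), scale);
  }

  // Align the value to the schema's scale, validate the precision, and store the unscaled bytes.
  static byte[] toBytes(BigDecimal decimal, int precision, int scale) {
    BigDecimal rescaled = decimal.setScale(scale); // throws ArithmeticException if rounding would be required
    if (rescaled.precision() > precision) {
      throw new IllegalArgumentException("Value exceeds the declared precision " + precision);
    }
    return rescaled.unscaledValue().toByteArray();
  }
}
```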
Example usage in StructuredRecord:
```java
Schema schema = Schema.recordOf("record", Schema.Field.of("x", Schema.nullableOf(Schema.decimalOf(3, 2))));
BigInteger bigInteger = new BigInteger("111");
// big decimal representing 1.11
BigDecimal decimal = new BigDecimal(bigInteger, 2);
StructuredRecord record = StructuredRecord.builder(schema).setDecimal("x", decimal).build();
```
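Reading the value back would mirror the setter, using the getDecimal API proposed above:

```java
// Returns the field as a BigDecimal, or null since the field is nullable and may be unset.
BigDecimal readBack = record.getDecimal("x"); // 1.11
```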
Limitation
The Avro decimal logical type can also be represented with the fixed complex Avro type as the underlying type. However, the CDAP schema currently does not support the fixed type. Hence, a decimal logical type with 'fixed' as the underlying type will not be supported.
Plugin Support
- BigQuery Source/Sink
- Wrangler
- Dataprep BigQuery connection
Decimal type support in Data Prep Directives
- Convert string to bigdecimal
- Convert long to bigdecimal
- Convert double to bigdecimal
- Convert bigdecimal to string
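These directives boil down to standard Java conversions. A minimal sketch (plain JDK; the directive syntax itself is not shown here) is:

```java
import java.math.BigDecimal;

public class DecimalConversions {
  public static void main(String[] args) {
    BigDecimal fromString = new BigDecimal("123.456");              // string -> bigdecimal
    BigDecimal fromLong = BigDecimal.valueOf(123L);                 // long -> bigdecimal
    // Converting via Double.toString avoids the binary-fraction artifacts of new BigDecimal(double)
    BigDecimal fromDouble = new BigDecimal(Double.toString(1.25));  // double -> bigdecimal
    String asString = fromString.toPlainString();                   // bigdecimal -> string

    System.out.println(fromString + " " + fromLong + " " + fromDouble + " " + asString);
  }
}
```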
UI Impact or Changes
For the new decimal type, the pipeline UI will display `Decimal`. Below is the mapping of the backend schema type to the UI type.
Backend schema type | UI type |
---|---|
decimal | Decimal |
Test Scenarios
Please note that the tests should be conducted against various databases as well as sinks such as BigQuery.
Test ID | Test Description | Expected Results |
---|---|---|
1. | Read decimal type data from a database or BigQuery | Should be able to read decimal type data as a BigDecimal |
2. | Write decimal type data to a database or BigQuery | Should be able to write decimal type data |
Releases
Targeted release: CDAP 6.0.0
Future work
Support other Avro types like Duration or fixed bytes.