/
Decimal Type support in CDAP Schema

Decimal Type support in CDAP Schema

Introduction 

Most of the databases support decimal types with higher precision and scale which can not be represented by double(8-bytes) primitive type in java. Current CDAP schema does not support such decimal types. This document represents design to support decimal types with higher precision and scale in CDAP schema.

User Stories 

  • As a pipeline developer, I should be able read decimal type data with higher precision and scale from sources such as database or bigquery table. 
  • As a pipeline developer, I should be able write decimal type data with higher precision and scale to sinks such as database or bigquery table. 

  • As a plugin developer, I should be able to create structured records with decimal type data of higher precision and scale.
  • As a CDAP user, I should be able to explore CDAP datasets that have fields with decimal type data of higher precision and scale. 

Design

CDAP schema maps to Avro schema and Avro has a LogicalType - decimal - to support decimal data with higher precision and scale. Below is the avro schema to represent decimal logical type. 

Avro TypeAvro Schema ExampleDescription
Decimal

{
"type": "bytes",
"logicalType": "decimal",
"precision": 4,
"scale": 2
}

where the byte stores the decimal number.

Precision is a required field whereas scale is an optional field.


To support Decimal type in CDAP, we will have to map decimal avro logical type to corresponding java and hive types. Below table represents mapping of logical avro type to corresponding hive and database types.

Avro TypeHive TypeJava TypeMySQLOracleMS SQLBigQuery
DecimalDecimaljava.math.BigDecimalDecimalDecimal, NumericDecimalNumeric

New Programmatic APIs

CDAP Schema

Schema.java
public final class Schema implements Serializable {
  private int precision;
  private int scale;


  public enum LogicalType {
    DATE(Type.INT, "date"),
    TIMESTAMP_MILLIS(Type.LONG, "timestamp-millis"),
    TIMESTAMP_MICROS(Type.LONG, "timestamp-micros"),
    TIME_MILLIS(Type.INT, "time-millis"),
    TIME_MICROS(Type.LONG, "time-micros"),
    // New logical type to represent decimal in bytes
    DECIMAL(Type.BYTES, "decimal");

    private final Type type;
    private final String token;

    private static final Map<String, LogicalType> LOOKUP_BY_TOKEN;
    static {
      Map<String, LogicalType> map = new HashMap<>();
      for (LogicalType logicalType : values()) {
        map.put(logicalType.token, logicalType);
      }
      LOOKUP_BY_TOKEN = Collections.unmodifiableMap(map);
  }

  /**
   * Creates a {@link Schema} for Decimal logical type.
   * @param precision precision for decimal number
   * @return A {@link Schema} with the decimal logical type.
   */
  public static Schema decimalOf(int precision) {
    return new Schema(Type.BYTES, LogicalType.DECIMAL, null, null, null, null, null, null, null, precision, 0);
  }

  /**
   * Creates a {@link Schema} for Decimal logical type.
   * @param precision precision for decimal number
   * @param scale scale for the decimal number
   * @return A {@link Schema} with the decimal logical type.
   */
  public static Schema decimalOf(int precision, int scale) {
    return new Schema(Type.BYTES, LogicalType.DECIMAL, null, null, null, null, null, null, null, precision, scale);
  }


  /**
   * Returns precision of decimal schema.
   * @return
   */
  public int getPrecision() {
    return precision;
  }

  /**
   * Returns scale of decimal schema.
   */
  public int getScale() {
    return scale;
  }
}


StructuredRecord

StructuredRecord.java
// get decimal type as java BigDecimal
@Nullable
public BigDecimal getDecimal(String fieldName) {
  // convert decimal bytes to java BigDecimal based on precision and scale provided in the field schema
}

// set decimal type as java BigDecimal
public Builder setDecimal(String fieldName, @Nullable BigDecimal decimal) {
  
}


Example usage in StructuredRecord: 

StructureRecordExample.java
Schema schema = Schema.recordOf("x", Schema.nullableOf(Schema.decimalOf(2,2));
BigInteger bigInteger = new BigInteger("111");
// big decimal representing 1.11
BigDecimal decimal = new BigDecimal(bigInteger, 2);
StructuredRecord record = StructuredRecord.builder(schema).setDecimal("x", decimal).build();


Limitation

Avro Decimal logical type can also be presented as fixed complex avro type. However, CDAP schema currently does not support this type. Hence, decimal logical type with underlying type as 'fixed' will not be supported.

Plugins support

  • BigQuery Source/Sink
  • Wrangler
  • Dataprep BigQuery connection

Decimal type support in Data Prep Directives

  • Convert string to bigdecimal
  • Convert long to bigdecimal
  • Convert double to bigdecimal
  • Convert bigdecimal to string

UI Impact or Changes

For new Decimal type, the pipeline UI will display `Decimal`. Below is the mapping of backend types to UI types.

Backend schema typeUI type
decimalDecimal


Test Scenarios

Please note that the tests should be conducted for various databases and sinks such as bigquery. 

Test IDTest DescriptionExpected Results

1.

Read decimal type from database or bigqueryshould be able to read decimal type as bigdecimal
2.Write decimal type to database or bigqueryshould be able to write decimal type

Releases

Targeted release CDAP 6.0.0

Future work

Support other Avro types like Duration or fixed bytes.

Related content

Created in 2020 by Google Inc.