...

Table is a core dataset. Unlike relational database tables, where every row has the same schema, every row of a Table can have a different set of columns. Although Tables do not require a schema, in practice they are often written with an implicit one: column names are usually strings, and a single data type is used for all values in the same column. If you use a Table in this way, you can set a schema as a Table property to enable exploration. The schema is applied at read time, which lets you run ad-hoc queries against the Table.

Requirements

To be explored, a Table must meet the following requirements:

  • Column names must be strings.

  • All column values for a specific column must be of the same type. For example, a value cannot be a string in one row and an integer in another.

  • Column values must be of a primitive type. A primitive type is one of boolean, int, long, float, double, bytes, or string.

  • Column names must be valid Hive column names. This means they cannot be reserved keywords such as drop. Please refer to the Hive language manual for more information about Hive.
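The requirements above can be checked against sample data before a schema is set. The following is a minimal sketch, not part of CDAP: the class ExplorableTableCheck, its check method, and the short reserved-keyword list are all hypothetical and for illustration only. Rows are modeled as maps from column name to value, so the "column names must be strings" requirement is enforced by the map's key type, and Java wrapper classes stand in for the primitive types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of CDAP: checks sample rows against the requirements above.
final class ExplorableTableCheck {

  // Illustrative subset only; the Hive language manual has the full list of reserved keywords.
  private static final Set<String> RESERVED_SAMPLE =
      Set.of("drop", "select", "table", "from", "where");

  // Java classes standing in for boolean, int, long, float, double, bytes, and string.
  private static final Set<Class<?>> PRIMITIVES = Set.of(
      Boolean.class, Integer.class, Long.class, Float.class,
      Double.class, byte[].class, String.class);

  /** Returns the violations found; an empty list means the rows pass these checks. */
  static List<String> check(List<Map<String, Object>> rows) {
    Set<String> problems = new LinkedHashSet<>();          // de-duplicates repeated findings
    Map<String, Class<?>> firstSeenType = new HashMap<>(); // column name -> first type seen
    for (Map<String, Object> row : rows) {
      for (Map.Entry<String, Object> column : row.entrySet()) {
        String name = column.getKey();
        Class<?> type = column.getValue().getClass();      // values are assumed non-null
        if (RESERVED_SAMPLE.contains(name.toLowerCase(Locale.ROOT))) {
          problems.add("column '" + name + "' is a reserved keyword");
        }
        if (!PRIMITIVES.contains(type)) {
          problems.add("column '" + name + "' has non-primitive type " + type.getSimpleName());
          continue;
        }
        Class<?> previous = firstSeenType.putIfAbsent(name, type);
        if (previous != null && previous != type) {
          problems.add("column '" + name + "' mixes " + previous.getSimpleName()
              + " and " + type.getSimpleName());
        }
      }
    }
    return new ArrayList<>(problems);
  }
}
```

Calling ExplorableTableCheck.check on two rows where the same column holds an integer in one row and a string in the other reports a single mixed-type violation for that column.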

Creating an Explorable Table

When creating a Table in your application, if you set the table's schema property, your Table will be enabled for exploration after it is created:

...

Note that the schema row field property is set along with the schema property. The schema row field property must be set if you want to explore the Table's row key along with its columns. In the example above, this property tells CDAP to read the id field from the Table row instead of from the Table columns.
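As a conceptual illustration of how the row field interacts with a schema applied at read time, consider the sketch below. It is a simplified model, not CDAP's implementation: the class SchemaOnReadSketch and the method toRecord are hypothetical, the schema is reduced to a list of field names, and returning null for a column that a row does not contain is an assumption made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of schema-on-read with a row field; not CDAP code.
final class SchemaOnReadSketch {

  /**
   * Builds one query record from a stored row. The field named by rowField is
   * taken from the row key; every other schema field is looked up in the columns.
   * Columns that are not listed in the schema are ignored.
   */
  static Map<String, Object> toRecord(String rowKey,
                                      Map<String, Object> columns,
                                      List<String> schemaFields,
                                      String rowField) {
    Map<String, Object> record = new LinkedHashMap<>();
    for (String field : schemaFields) {
      if (field.equals(rowField)) {
        record.put(field, rowKey);
      } else {
        // Assumption for this sketch: a missing column surfaces as null.
        record.put(field, columns.get(field));
      }
    }
    return record;
  }
}
```

With schema fields id, name, and age, and id as the row field, a row with key "r1" and columns name and extra would produce a record whose id is "r1", whose name comes from the column, and whose age is null; the extra column does not appear because the schema does not mention it.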

Setting a Schema on an Existing Table

Because the schema is applied at read time, you can set a schema on a Table after it has been created, and you can also change the schema of an existing Table. Dataset properties can be set using the Microservices. For example, the same schema set through the example code above can also be set through the Microservices (reformatted to fit):

...

CDAP schemas are adopted from the Avro Schema Declaration. Because dataset properties must be strings, the schema JSON has to be properly escaped when it is passed as a property value. Note also that the properties given in this request replace all existing properties: if you had set other properties for this table, such as time-to-live (dataset.table.ttl), you must include those properties in the update request as well.
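The two points above, escaping the schema and resending existing properties, can be sketched in plain Java. This is an illustration under stated assumptions rather than a CDAP client: the class TablePropertiesBody and its methods are hypothetical, the property keys "schema" and "schema.row.field" are assumed names for the schema and schema row field properties, and the request body is assumed to be a flat JSON object of string properties.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for building a properties update body; not part of CDAP.
final class TablePropertiesBody {

  /** Escapes text so it can be embedded inside a JSON string literal. */
  static String escape(String s) {
    StringBuilder out = new StringBuilder();
    for (char c : s.toCharArray()) {
      switch (c) {
        case '"':  out.append("\\\""); break;
        case '\\': out.append("\\\\"); break;
        case '\n': out.append("\\n");  break;
        case '\r': out.append("\\r");  break;
        case '\t': out.append("\\t");  break;
        default:
          if (c < 0x20) {
            out.append(String.format("\\u%04x", (int) c)); // other control characters
          } else {
            out.append(c);
          }
      }
    }
    return out.toString();
  }

  /** Copies the existing properties and adds the schema, so nothing is lost on update. */
  static Map<String, String> withSchema(Map<String, String> existing,
                                        String schemaJson,
                                        String rowField) {
    Map<String, String> merged = new LinkedHashMap<>(existing);
    merged.put("schema", schemaJson);         // assumed property key
    merged.put("schema.row.field", rowField); // assumed property key
    return merged;
  }

  /** Serializes string properties as a flat JSON object with escaped values. */
  static String toJson(Map<String, String> properties) {
    StringJoiner body = new StringJoiner(", ", "{ ", " }");
    for (Map.Entry<String, String> p : properties.entrySet()) {
      body.add("\"" + escape(p.getKey()) + "\": \"" + escape(p.getValue()) + "\"");
    }
    return body.toString();
  }
}
```

Starting from a map that already holds dataset.table.ttl, withSchema returns a map that keeps the time-to-live and adds the schema properties, and toJson turns every quote inside the schema into an escaped quote so that the whole schema travels as one string value.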

Formulating Queries

When creating your queries, keep these limitations in mind:

...