Overview
A Cube dataset is an implementation of an OLAP Cube that is pre-packaged with CDAP. Cube datasets store multidimensional facts and provide a querying interface for retrieving the data. Additionally, Cube datasets allow you to explore the data stored in the Cube.
Storing Data
A Cube dataset stores multidimensional CubeFacts that contain dimension values, measurements, and an associated timestamp:
...
Currently, two types of measurements are supported: gauge and counter. A gauge measurement is for an absolute metric (it overwrites) while a counter measurement is for an incremental metric.
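The difference between the two measurement types can be illustrated with a minimal, self-contained sketch. The `MeasureSketch` class and its methods are hypothetical stand-ins written for this illustration and are not part of the CDAP API; the sketch only models the "overwrite versus increment" behavior described above for a single stored value.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for illustration only; not the CDAP implementation. */
class MeasureSketch {
    enum Type { GAUGE, COUNTER }

    // One stored value per measurement name.
    private final Map<String, Long> values = new HashMap<>();

    /** Applies one measurement: a gauge overwrites, a counter increments. */
    void apply(String name, Type type, long value) {
        if (type == Type.GAUGE) {
            values.put(name, value);               // absolute metric: replace
        } else {
            values.merge(name, value, Long::sum);  // incremental metric: add
        }
    }

    /** Returns the stored value, or 0 if nothing was written for the name. */
    long get(String name) {
        return values.getOrDefault(name, 0L);
    }

    public static void main(String[] args) {
        MeasureSketch m = new MeasureSketch();
        m.apply("cpu.used", Type.GAUGE, 50);
        m.apply("cpu.used", Type.GAUGE, 55);
        m.apply("disk.reads", Type.COUNTER, 3);
        m.apply("disk.reads", Type.COUNTER, 4);
        System.out.println(m.get("cpu.used"));    // last gauge value wins
        System.out.println(m.get("disk.reads"));  // counter values are summed
    }
}
```

Writing the gauge twice leaves only the last value, while writing the counter twice leaves the sum of both values.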
Writing Data
The Cube Dataset API provides methods to write either a single fact or multiple facts at once:
```java
public interface Cube extends Dataset, BatchWritable<Object, CubeFact> {
  void add(CubeFact fact);
  void add(Collection<? extends CubeFact> facts);
  // ...
}
```
Cube Configuration
A Cube dataset allows for querying a pre-aggregated view of the data. That view needs to be configured before any data is written to the Cube. Currently, a view is configured with a list of dimensions and list of required dimensions using the Dataset Properties.
...
By default, if no dataset.cube.resolutions property is provided, a resolution of 1 second is used.
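The default described above can be sketched as a small property lookup. This is only an illustration: the `ResolutionSketch` class is hypothetical, and the comma-separated list of seconds used as the property value format is an assumption, not something confirmed by this page. Only the property name `dataset.cube.resolutions` and the 1-second default come from the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the CDAP API. */
class ResolutionSketch {
    static final String KEY = "dataset.cube.resolutions";

    /**
     * Returns the configured resolutions in seconds, or a single
     * 1-second resolution when the property is absent.
     * Assumption: the value is a comma-separated list of seconds.
     */
    static List<Integer> resolutions(Map<String, String> props) {
        List<Integer> result = new ArrayList<>();
        String raw = props.get(KEY);
        if (raw == null) {
            result.add(1);  // default: 1 second
            return result;
        }
        for (String part : raw.split(",")) {
            result.add(Integer.parseInt(part.trim()));
        }
        return result;
    }
}
```

For example, a properties map without the key yields a single resolution of 1, while a value such as `1,60,3600` (under the assumed format) yields second, minute, and hour resolutions.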
Querying Data
Querying is the main way to get value out of a Cube dataset: you can slice, dice, and drill down into the data of the Cube. Use these methods of the API to perform queries:
...
```java
public final class TimeSeries {
  private final String measureName;
  private final Map<String, String> dimensionValues;
  private final List<TimeValue> timeValues;
  // ...
}
```
Exploring Data
Often, in order to construct a useful query, you first have to explore and discover what data is available in the Cube. For that, Cube provides exploration APIs to search for the dimension values and measurements available in a specific selection of the Cube data:
...
This query defines the data selection as 1-minute resolution aggregations that have a rack dimension with the value rack1 and fall within the specified time range. It limits the number of results to 100.
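The shape of such a selection (a fixed dimension value, a time range, and a result limit) can be sketched over an in-memory list of facts. Everything in this sketch is hypothetical and written only for illustration: the `ExploreSketch` and `Fact` classes, the method name, and the choice of an inclusive start and exclusive end for the time range are assumptions, and the sketch ignores resolution entirely. It is not the CDAP exploration API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model for illustration only; not the CDAP API. */
class ExploreSketch {
    /** Minimal fact: a timestamp in seconds plus its dimension values. */
    static final class Fact {
        final long timestamp;
        final Map<String, String> dimensions;

        Fact(long timestamp, Map<String, String> dimensions) {
            this.timestamp = timestamp;
            this.dimensions = dimensions;
        }
    }

    /**
     * Lists distinct "name=value" pairs for dimensions that are not already
     * fixed by the selection, considering only facts that carry all fixed
     * dimension values and fall in [startTs, endTs). Stops at 'limit' results.
     */
    static List<String> nextDimensionValues(List<Fact> facts,
                                            Map<String, String> fixed,
                                            long startTs, long endTs, int limit) {
        Set<String> found = new LinkedHashSet<>();
        if (limit <= 0) {
            return new ArrayList<>(found);
        }
        for (Fact f : facts) {
            if (f.timestamp < startTs || f.timestamp >= endTs) {
                continue;  // outside the time range (assumed half-open)
            }
            if (!f.dimensions.entrySet().containsAll(fixed.entrySet())) {
                continue;  // does not match the fixed dimension values
            }
            for (Map.Entry<String, String> e : f.dimensions.entrySet()) {
                if (!fixed.containsKey(e.getKey())) {
                    found.add(e.getKey() + "=" + e.getValue());
                    if (found.size() >= limit) {
                        return new ArrayList<>(found);
                    }
                }
            }
        }
        return new ArrayList<>(found);
    }
}
```

Called with the fixed selection `rack=rack1`, a time range, and a limit of 100, the sketch returns at most 100 distinct values of the remaining dimensions (for example, the servers seen on that rack).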
AbstractCubeHttpHandler
CDAP comes with an AbstractCubeHttpHandler that can be used to quickly add a Service to your application that provides Microservices on top of your Cube dataset. It is an abstract class with a single method for its subclass to implement, which returns the Cube dataset to query:
...
```json
[
  {
    "measureName": "disk.reads",
    "dimensionValues": { "server": "server1" },
    "timeValues": [
      { "timestamp": 1423370200, "value": 969 },
      { "timestamp": 1423370260, "value": 360 }
    ]
  },
  {
    "measureName": "disk.reads",
    "dimensionValues": { "server": "server2" },
    "timeValues": [
      { "timestamp": 1423370200, "value": 23 },
      { "timestamp": 1423370260, "value": 444 }
    ]
  },
  {
    "measureName": "cpu.used",
    "dimensionValues": { "server": "server1" },
    "timeValues": [
      { "timestamp": 1423370200, "value": 50 },
      { "timestamp": 1423370260, "value": 55 }
    ]
  },
  {
    "measureName": "cpu.used",
    "dimensionValues": { "server": "server2" },
    "timeValues": [
      { "timestamp": 1423370200, "value": 12 },
      { "timestamp": 1423370260, "value": 56 }
    ]
  }
]
```
Examples of Using Cube Dataset
An example of using a Cube Dataset is included in the How To article Data Analysis with OLAP Cube (/wiki/spaces/KB/pages/482967564).