...

Each basic operation has a method that takes an operation-type object as a parameter, plus convenience methods for working directly with byte arrays. If your application code already deals with byte arrays, you can use the latter methods to save a conversion.
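
For example, a single-row read (covered under Read below) can be expressed either way. This is a minimal sketch, assuming a Table t and a hypothetical row key:

Code Block
// Read a row using an operation-type object
Row row = t.get(new Get("rowKey1"));
// Read the same row working directly with byte arrays, saving a conversion
Row sameRow = t.get(Bytes.toBytes("rowKey1"));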

Read

A get operation reads all columns, or a selection of columns, of a single row:

...

Each Row object in the returned list will contain the results for one of the requested row keys. When multiple rows must be retrieved together, this version of the get operation allows the storage provider to perform more efficient batching of the operations, if supported.
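
For example, a multi-row get might look like the following minimal sketch, assuming a Table t, hypothetical row keys, and java.util.Arrays:

Code Block
// Retrieve two rows in one call so the storage provider can batch them
List<Get> gets = Arrays.asList(new Get("rowKey1"), new Get("rowKey2"));
List<Row> rows = t.get(gets);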

Scan

A scan operation fetches a subset of rows, or all of the rows, of a Table:

...

Code Block
byte[] startRow = Bytes.toBytes("row100"); // hypothetical inclusive start key
byte[] stopRow = Bytes.toBytes("row200");  // hypothetical exclusive stop key
// Scan all rows of a table
Scanner allRows = t.scan(null, null);
// Scan all rows up to stopRow (exclusive)
Scanner headRows = t.scan(null, stopRow);
// Scan all rows starting from startRow (inclusive)
Scanner tailRows = t.scan(startRow, null);
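
A returned Scanner is then typically consumed by iterating until next() returns null, and closing the scanner afterwards; for example, continuing the sketch above:

Code Block
// Iterate over every row returned by the scan, then release the scanner
Row row;
while ((row = allRows.next()) != null) {
  // process the columns of the row here
}
allRows.close();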

Write

A put operation writes data into a row:

...

Note that the column value cannot be empty; that is, it must have a length of at least one.
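
For example, a put can be issued with a Put object or directly with byte arrays; the row, column, and value below are hypothetical:

Code Block
// Write one column value using a Put object
t.put(new Put("rowKey1").add("colName", "some value"));
// Write the same value working directly with byte arrays
t.put(Bytes.toBytes("rowKey1"), Bytes.toBytes("colName"), Bytes.toBytes("some value"));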

Compare and Swap

A swap operation compares the existing value of a column with an expected value, and if it matches, replaces it with a new value. The operation returns true if it succeeds and false otherwise:

...

Note that the column value cannot be empty; that is, it must have a length of at least one.
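
For example, a minimal sketch of a swap, with a hypothetical row, column, and values:

Code Block
// Atomically replace the value "old" with "new", but only if "old" is stored
boolean success = t.compareAndSwap(Bytes.toBytes("rowKey1"), Bytes.toBytes("colName"),
                                   Bytes.toBytes("old"), Bytes.toBytes("new"));
if (!success) {
  // another writer changed the column first; re-read and retry if needed
}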

Increment

An increment operation increments the long value of one or more columns by either 1L or a specified amount n. If a column does not exist, it is created with an assumed value of zero before the increment is applied:

...

  • incrementAndGet(...) operations increment the currently stored value and return the result; and

  • increment(...) operations increment the currently stored value without returning a value (see the sketch after this list).
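
A minimal sketch of both variants, assuming a Table t and a hypothetical row and column:

Code Block
// Increment the "count" column by 1 and read back the new value
long newCount = t.incrementAndGet(Bytes.toBytes("rowKey1"), Bytes.toBytes("count"), 1L);
// Increment by an amount n without returning a value
t.increment(Bytes.toBytes("rowKey1"), Bytes.toBytes("count"), 42L);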

Read-less Increments

By default, an increment operation will need to first perform a read operation to find the currently stored column value, apply the increment to the stored value, and then write the final result. For high write volume workloads, with only occasional reads, this can impose a great deal of unnecessary overhead for increments.
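
Read-less increments are typically enabled through a dataset property when the table is created; the following is a sketch only, assuming the Table.PROPERTY_READLESS_INCREMENT property and a hypothetical dataset name:

Code Block
// In the application's configure() method: create a table with
// read-less increments enabled (assumes Table.PROPERTY_READLESS_INCREMENT)
createDataset("myCounters", Table.class,
              DatasetProperties.builder()
                .add(Table.PROPERTY_READLESS_INCREMENT, "true")
                .build());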

...

Note: the current implementation of read-less increments uses an HBase coprocessor that marks the stored values of incremental updates with a special prefix. Since this prefix could occur naturally in other stored data values, it is highly recommended that increments be stored in a separate dataset and not mixed in with other types of values. This ensures that other data is not misidentified as a stored increment and prevents incorrect results.

Delete

A delete operation removes an entire row or a subset of its columns:

...

Note that specifying a set of columns makes the delete operation faster. If you want to delete all the columns of a row and you know all of their names, passing them explicitly will speed up the deletion. Deleting all the columns of a row also deletes the entire row, because the underlying implementation of a Table is a columnar store.
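
For example, with a hypothetical row and columns:

Code Block
// Delete an entire row
t.delete(Bytes.toBytes("rowKey1"));
// Delete only the given columns of a row; naming the columns speeds up the delete
t.delete(Bytes.toBytes("rowKey1"),
         new byte[][] { Bytes.toBytes("colA"), Bytes.toBytes("colB") });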

Writing from MapReduce

Table implements the BatchWritable interface, using byte[] as the key and Put as the value for each write. To write to a table from MapReduce, use these types as the output types of your Reducer (or of your Mapper, in the case of a map-only program). For example, the Reducer can be defined as follows:

...

Code Block
// The Put carries the row key itself, so null is passed as the output key
context.write(null, new Put(row).add("count", sum));
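
For context, a complete Reducer might look like the following sketch; the class name and word-count logic are hypothetical, and Hadoop's Text and IntWritable are assumed as the input types:

Code Block
// Hypothetical reducer: sums the counts for each word and emits one Put per key
public static class CountReducer extends Reducer<Text, IntWritable, byte[], Put> {
  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    long sum = 0;
    for (IntWritable value : values) {
      sum += value.get();
    }
    byte[] row = Bytes.toBytes(key.toString());
    // The Put carries the row key itself, so null is passed as the output key
    context.write(null, new Put(row).add("count", sum));
  }
}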

Pre-Splitting a Table into Multiple Regions

When the underlying storage for a Table Dataset (or any Dataset that uses a Table underneath, such as a KeyValueTable) is HBase, CDAP allows you to configure pre-splitting so that data operations are better distributed across regions after the tables are created. Depending on your use case, this can improve performance.

...