Hive Bulk Export Action

The Hive Bulk Export action plugin is available in the Hub.

Plugin version: 1.9.0-1.1.0

The Hive Bulk Export action takes a SELECT query as input, runs that query against a Hive table, and stores the results under the provided HDFS directory. The plugin converts the SELECT query into an INSERT OVERWRITE DIRECTORY Hive statement. When this statement is executed, Hive starts a MapReduce job that writes the results to the provided directory, so the output can consist of multiple files in that directory.
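For example, a minimal sketch of what the generated statement might look like, assuming a hypothetical employee table, an output directory of /tmp/export, and the default comma separator (the exact statement the plugin builds may differ):

  INSERT OVERWRITE DIRECTORY '/tmp/export'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  SELECT id, name, salary FROM employee;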

Important: The Hive Bulk Export action works with Hive 2.3.3.

If anything other than a valid SELECT query is provided, the pipeline will fail to publish. This is because CDAP uses Apache Calcite to parse the statement and verify that it is a SELECT query and not any other type of SQL statement.
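For example, assuming a hypothetical employee table, a statement such as SELECT name FROM employee WHERE id > 100 passes this validation, while statements such as DROP TABLE employee or INSERT INTO employee VALUES (...) are rejected and the pipeline cannot be published.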

If the Overwrite Output Directory property is set to no and the output directory already exists, publishing the pipeline will fail. In that case, either remove the directory or allow it to be overwritten by setting the Overwrite Output Directory property to yes.
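For example, an existing output directory can be removed on the cluster with a command such as hdfs dfs -rm -r /tmp/hive before publishing the pipeline.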

You might use the Hive Bulk Export action to execute a SELECT query on one or more Hive tables and write the results to a provided directory location in CSV format.

Configuration

Property: Hive Metastore Username
Macro Enabled: Yes
Description: User identity for connecting to the specified Hive database. Required for databases that need authentication; optional for databases that do not.

Property: Hive Metastore Password
Macro Enabled: Yes
Description: Password used to connect to the specified database. Required for databases that need authentication; optional for databases that do not.

Property: JDBC Connection String
Macro Enabled: Yes
Description: Required. JDBC connection string, including the database name. Use auth=delegationToken; the CDAP platform provides the appropriate delegation token while running the pipeline.

Property: Select Statement
Macro Enabled: Yes
Description: Required. SELECT statement used to select values from one or more Hive tables.

Property: Output Directory
Macro Enabled: Yes
Description: Required. HDFS directory path where the exported data is written. If the directory does not exist, it is created. If it already exists, it is either overwritten or causes a failure at publish time, depending on the Overwrite Output Directory property.

Property: Overwrite Output Directory
Macro Enabled: Yes
Description: If yes is selected and the HDFS path exists, it is overwritten. If no is selected and the HDFS path exists, pipeline deployment fails while publishing the pipeline. Default is yes.

Property: Column Separator
Description: Delimiter used in the exported file. Values in each column are separated by this delimiter when writing to the output file. Default is comma (,).

Example

This example connects to a Hive database using the specified JDBC Connection String, which connects to the ‘mydb’ database of a Hive instance running on ‘localhost’, and runs the SELECT query as an INSERT OVERWRITE DIRECTORY statement. It writes the data to files under the /tmp/hive directory using a comma delimiter.

Hive Metastore Username: username
Hive Metastore Password: password
JDBC Connection String: jdbc:hive2://localhost:10000/mydb;auth=delegationToken
Select Statement: SELECT * FROM employee JOIN salary ON (employee.id = salary.id)
Output Directory: /tmp/hive
Overwrite Output Directory: yes
Column Separator: ,
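With these values, the plugin runs a statement roughly equivalent to the following sketch (the exact statement it builds may differ) and writes the resulting files under /tmp/hive on HDFS:

  INSERT OVERWRITE DIRECTORY '/tmp/hive'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  SELECT * FROM employee JOIN salary ON (employee.id = salary.id);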
