The Vertica Bulk Import action plugin is available in the Hub.
The Vertica Bulk Import action plugin runs after a successful MapReduce or Spark job. It reads all the files in a given directory and bulk imports their contents into a Vertica table.
Configuration
Property | Macro Enabled? | Description |
---|---|---|
Usage Notes
The plugin can be configured to read a single file or multiple files from a configured HDFS directory and bulk load them into a Vertica table. The plugin relies on Vertica's own capabilities to load the data from HDFS. The load commands are issued through the Vertica JDBC driver. Vertica's Java API, VerticaCopyStream, is then used to write the contents of each file as a stdin stream to the Vertica table.
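The "single file or directory" behavior described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's source: it uses the local filesystem (java.nio) as a stand-in for HDFS, and the DirectoryStreamSketch class and its StreamSink interface are hypothetical names; in the plugin, the sink's role is played by VerticaCopyStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

/** Illustrative only: local files stand in for HDFS, StreamSink for the Vertica copy stream. */
class DirectoryStreamSketch {

    /** Hypothetical consumer of one file's contents. */
    interface StreamSink {
        void accept(InputStream in) throws IOException;
    }

    /** Resolves a path to itself if it is a file, else to the regular files directly inside it. */
    static List<Path> resolveInputs(Path path) throws IOException {
        if (Files.isRegularFile(path)) {
            return Collections.singletonList(path);
        }
        List<Path> files = new ArrayList<>();
        try (Stream<Path> children = Files.list(path)) {
            // Subdirectories are skipped; files are ordered by path for repeatable loads.
            children.filter(Files::isRegularFile).sorted().forEach(files::add);
        }
        return files;
    }

    /** Streams every resolved file to the sink and returns how many files were streamed. */
    static int streamAll(Path path, StreamSink sink) throws IOException {
        List<Path> inputs = resolveInputs(path);
        for (Path file : inputs) {
            try (InputStream in = Files.newInputStream(file)) {
                sink.accept(in);
            }
        }
        return inputs.size();
    }
}
```

Calling streamAll with a directory hands each contained file to the sink in turn, while calling it with a single file path streams just that file.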
For every load, the plugin starts a transaction, and the transaction is committed only when all the files have been successfully loaded into Vertica. If any failure occurs while loading, the transaction is aborted. Loading everything in one transaction increases load throughput, but any failure rolls back the complete file set. For that reason, the plugin also provides the option to commit the transaction after each file is loaded into Vertica.
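The trade-off between the two commit strategies can be sketched with plain JDBC. This is a hedged illustration under stated assumptions, not the plugin's implementation: BulkLoadTransactionSketch, FileLoader, and the commitPerFile flag are hypothetical names, and the actual per-file load is left to the caller.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;

/** Illustrative only: shows one commit for the whole file set versus one commit per file. */
class BulkLoadTransactionSketch {

    /** Hypothetical callback that loads a single file over the given connection. */
    interface FileLoader {
        void load(Connection conn, String path) throws SQLException;
    }

    /**
     * Loads every path and returns the number of files whose load was committed.
     * On failure, rolls back whatever is uncommitted and rethrows.
     */
    static int loadAll(Connection conn, List<String> paths,
                       boolean commitPerFile, FileLoader loader) throws SQLException {
        conn.setAutoCommit(false); // take explicit control of transaction boundaries
        int committedFiles = 0;
        try {
            for (String path : paths) {
                loader.load(conn, path);
                if (commitPerFile) {
                    conn.commit(); // earlier files survive a later failure
                    committedFiles++;
                }
            }
            if (!commitPerFile) {
                conn.commit(); // all-or-nothing: one commit for the whole file set
                committedFiles = paths.size();
            }
            return committedFiles;
        } catch (SQLException e) {
            conn.rollback(); // discards only work that has not been committed yet
            throw e;
        }
    }
}
```

With commitPerFile set to false, a failure on the second of three files leaves nothing loaded. With it set to true, the first file stays committed and only the failed file's work is rolled back.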
The plugin provides two ways to bulk load into Vertica. The simple option uses a standard approach for loading delimiter-separated files, while the advanced option allows you to specify the COPY query used to load the data. More information about the COPY command can be found in the Vertica documentation. Use the advanced option when you need advanced optimizations.
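As a rough illustration of the two modes, the sketch below builds a basic COPY ... FROM STDIN DELIMITER statement for the simple case and passes a user-supplied statement through unchanged for the advanced case. The class, method, and parameter names are hypothetical and do not reflect the plugin's actual configuration properties; the exact statement the plugin generates may differ.

```java
/** Illustrative only: chooses between a generated COPY statement and a user-supplied one. */
class CopyStatementSketch {

    /** Builds a simple COPY statement for a delimiter-separated stdin stream. */
    static String buildDelimitedCopy(String table, char delimiter) {
        // Accept only plain "table" or "schema.table" identifiers to avoid SQL injection.
        if (!table.matches("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)?")) {
            throw new IllegalArgumentException("Unexpected table name: " + table);
        }
        // Quote and backslash would need escaping inside the literal; this sketch rejects them.
        if (delimiter == '\'' || delimiter == '\\') {
            throw new IllegalArgumentException("Unsupported delimiter in this sketch");
        }
        return "COPY " + table + " FROM STDIN DELIMITER '" + delimiter + "'";
    }

    /** Advanced mode: a non-blank custom statement wins; otherwise fall back to the simple form. */
    static String resolveCopy(String table, char delimiter, String customCopy) {
        if (customCopy != null && !customCopy.trim().isEmpty()) {
            return customCopy.trim();
        }
        return buildDelimitedCopy(table, delimiter);
    }
}
```

For example, resolveCopy("public.sales", ',', null) yields COPY public.sales FROM STDIN DELIMITER ',', whereas a non-blank custom statement is returned as written, which is where any tuning options would go.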
This plugin emits the metrics num.of.rows.inserted for the number of rows successfully loaded and num.of.rows.rejected for the number of rows rejected by the Vertica bulk load.