Pentaho Data Integration

Pentaho Data Integration is an open-source business intelligence tool that runs transformations on data from a variety of sources. This section describes how to connect it to CDAP datasets using the CDAP JDBC driver.

  1. Before opening Pentaho Data Integration, copy the io.cdap.cdap.cdap-explore-jdbc-6.2.0.jar file to the lib directory, located at the root of the application's directory.

  2. Open Pentaho Data Integration.

  3. In the toolbar, select File > New > Database Connection....

  4. In the General section, enter a Connection Name, such as CDAP Sandbox. For the Connection Type, select Generic database. For the Access field, select Native (JDBC). In this example, which connects to a CDAP Sandbox, the Custom Connection URL is jdbc:cdap://localhost:11015. In the Custom Driver Class Name field, enter io.cdap.cdap.explore.jdbc.ExploreDriver.

  5. Click OK.

  6. To use this connection, navigate to the Design tab on the left of the main view. In the Input menu, double-click Table input. This creates a new transformation containing the input.

  7. In your transformation, right-click Table input and select Edit step. You can give the input a descriptive name, such as CDAP datasets query. Under Connection, select the newly created database connection. In this example, it’s CDAP Sandbox. Enter a valid SQL query in the main SQL field. This query defines the data available to your transformation.

  8. Click OK. Your input is now ready to be used in your transformation, and it will contain data coming from the results of the SQL query on the CDAP datasets.

  9. For more information on how to add components to a transformation and link them together, see the Pentaho Data Integration page.
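
If the connection does not work in Pentaho, it can help to test the same settings outside of it. The following is a minimal sketch, not part of the Pentaho setup itself, that uses plain JDBC with the driver class name and connection URL from step 4. The class name CdapJdbcCheck and the helpers buildUrl and driverAvailable are hypothetical names chosen for illustration, and the default SHOW TABLES query is an assumption; replace it with the SQL query you plan to use in step 7. The sketch assumes the CDAP JDBC driver jar is on the classpath and that a CDAP Sandbox is running locally.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class CdapJdbcCheck {
    // Driver class name as entered in the Custom Driver Class Name field (step 4).
    static final String DRIVER_CLASS = "io.cdap.cdap.explore.jdbc.ExploreDriver";

    // Hypothetical helper: builds a connection URL of the form used in step 4.
    static String buildUrl(String host, int port) {
        return "jdbc:cdap://" + host + ":" + port;
    }

    // Hypothetical helper: reports whether the driver class is on the classpath.
    static boolean driverAvailable() {
        try {
            Class.forName(DRIVER_CLASS);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String url = buildUrl("localhost", 11015);
        if (!driverAvailable()) {
            System.err.println("Driver class not found: " + DRIVER_CLASS);
            return;
        }
        // Assumption: SHOW TABLES is accepted by the server; pass your own query as an argument.
        String sql = args.length > 0 ? args[0] : "SHOW TABLES";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int columns = rs.getMetaData().getColumnCount();
            // Print each row as tab-separated column values.
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= columns; i++) {
                    if (i > 1) {
                        row.append('\t');
                    }
                    row.append(rs.getString(i));
                }
                System.out.println(row);
            }
        } catch (SQLException e) {
            System.err.println("Connection or query failed: " + e.getMessage());
        }
    }
}
```

If this sketch reports that the driver class is missing, the jar file is likely not where the application expects it. If the driver loads but the connection fails, check that the host and port in the URL match your CDAP instance.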

Created in 2020 by Google Inc.