Working with connections in Wrangler
A connection stores information, such as user credentials and host information, needed to connect to a data source.
When you create a pipeline from a connection in Wrangler, CDAP creates the pipeline with the corresponding source plugin and the Wrangler transformation. The source plugin is configured with all of the properties in the connection.
You can use a connection to browse and sample data in supported data sources. For example, if the CDAP Administrator creates a connection to BigQuery, you can use the connection to locate the dataset you want to perform data cleansing and data quality checks on.
If you don’t use a connection, you need to manually add the source plugin to the pipeline and configure it.
For more information about supported connections, see the Connection Reference.
If you encounter any issues using connections, see the Troubleshooting Guide.
Note: Wrangler must be able to connect to the source when you create a connection.
Adding a connection
Note: Before you add a connection for a database source, you must upload the JDBC driver to CDAP. You can upload the driver from the Hub or the Namespace Admin page.
To add a connection, follow these steps:
From the Wrangler home page, click Add Connection.
From the Add a connection page, click the type of connection you want to create.
Configure the connection properties.
Note: Connection names must be unique in a namespace and can only include letters, numbers, underscores, and hyphens.
As a best practice, click Test Connection to ensure the connection works.
Click Create.
The connection appears under the connection type and is available to all users in the namespace. You can use the connection in Wrangler and in the corresponding source plugin in the Pipeline Studio.
For more information about managing connections, see Managing Connections.
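The naming rule in the note above can be checked before you open the Add a connection page. The following is a minimal sketch, not CDAP's own validation: the function name `check_connection_name` is hypothetical, and the ASCII-only character set and case-sensitive uniqueness check are assumptions that the rule as stated does not confirm.

```python
import re

# Allowed characters per the note above: letters, numbers, underscores, hyphens.
# Assumption: ASCII only; the documentation does not say whether other letters are accepted.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def check_connection_name(name, existing_names):
    """Return a list of problems with a proposed connection name (empty if none)."""
    problems = []
    if not _NAME_PATTERN.fullmatch(name):
        problems.append(
            "name may only contain letters, numbers, underscores, and hyphens"
        )
    # Assumption: uniqueness is compared case-sensitively within one namespace.
    if name in existing_names:
        problems.append("name is already used in this namespace")
    return problems
```

For example, `check_connection_name("bq sales", set())` reports one problem because of the space, while `check_connection_name("bq_sales-2020", {"mysql_prod"})` reports none.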
Editing a connection
To edit connection properties, follow these steps:
Click the name of the connection you want to edit. The available connections are grouped by type on the left panel.
Click the three dots and click Edit.
You can view the connection properties and test the connection.
Click Save.
Exporting a connection
You might want to export connections to share them with other team members, add them to version control, or deploy them from a development environment to a test or production environment.
To export a connection, follow these steps:
Click the name of the connection you want to export. The available connections are grouped by type on the left panel.
Click the three dots and click Export.
CDAP exports the connection properties to a JSON file.
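Because a connection can store user credentials, you may want to review an exported file before you share it or add it to version control. The sketch below masks string values whose keys look credential-like. It makes no assumption about the layout of the exported JSON and simply walks whatever structure it finds. The hint list in `SENSITIVE_HINTS` and the function names are hypothetical; adjust them to the property names that actually appear in your export.

```python
import json

# Hypothetical substrings used to spot credential-like keys.
# These are illustrative guesses, not property names taken from CDAP.
SENSITIVE_HINTS = ("password", "secret", "token", "key", "credential")


def redact(value, placeholder="<REDACTED>"):
    """Return a copy of parsed JSON with credential-like string values masked."""
    if isinstance(value, dict):
        result = {}
        for k, v in value.items():
            looks_sensitive = any(hint in k.lower() for hint in SENSITIVE_HINTS)
            if looks_sensitive and isinstance(v, str):
                result[k] = placeholder
            else:
                # Recurse so nested objects and lists are also checked.
                result[k] = redact(v, placeholder)
        return result
    if isinstance(value, list):
        return [redact(item, placeholder) for item in value]
    return value


def redact_file(src_path, dst_path):
    """Read an exported connection file and write a masked copy."""
    with open(src_path, encoding="utf-8") as src:
        data = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        # Sorted keys keep diffs stable when the file is under version control.
        json.dump(redact(data), dst, indent=2, sort_keys=True)
```

A masked copy is useful for review and sharing, but it may no longer work as a usable connection definition, so keep the original export somewhere access-controlled.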
Duplicating a connection
To duplicate a connection, follow these steps:
Click the name of the connection you want to duplicate. The available connections are grouped by type on the left panel.
Click the three dots and click Duplicate.
CDAP creates a new connection with the properties from the original connection.
Enter a Name for the connection.
Note: Connection names must be unique in the namespace.
(Optional) Edit the connection properties.
Click Test Connection and resolve any errors.
Click Create.
The connection is available for use in Wrangler and in the corresponding source plugin in the Pipeline Studio.
Deleting a connection
To delete a connection, follow these steps:
Click the name of the connection you want to delete. The available connections are grouped by type on the left panel.
Click the three dots and click Delete.
CDAP deletes the connection, and it is no longer available for use.
Using a connection
To use a connection in Wrangler, follow these steps:
Click the name of the connection you want to use. The available connections are grouped by type on the left panel.
Locate the entity you want to wrangle.
Wrangler lists up to 1000 entities per page.
Note: Not all entities can be sampled. Keep browsing until you find an entity that you can sample.
Click the entity.
The entity is ready to wrangle.
After you finish performing transformations on the data, click Create Pipeline and click Batch pipeline. CDAP creates a pipeline and adds a batch source plugin and a Wrangler transformation to it. CDAP populates the source plugin properties with the properties set in the connection.
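When a source holds many entities, the limit of 1000 entities per page mentioned in the steps above determines how many pages you browse. The arithmetic can be sketched as follows. The function names are hypothetical, and the sketch assumes that entities keep a stable order across pages, which the steps above do not state.

```python
import math

PAGE_SIZE = 1000  # per-page limit stated in the steps above


def page_count(total_entities, page_size=PAGE_SIZE):
    """Number of pages needed to list total_entities entities."""
    return math.ceil(total_entities / page_size)


def page_of(entity_index, page_size=PAGE_SIZE):
    """1-based page that holds the entity at a 0-based position."""
    return entity_index // page_size + 1
```

For example, a dataset with 2500 tables spans `page_count(2500)` pages, which is 3, and the table at position 1000 (the 1001st) falls on page 2.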
Created in 2020 by Google Inc.