Wrangler overview

Wrangler lets you view, explore, and transform a small sample (10 MB) of your data in one place before running the same logic on the entire dataset in the Pipeline Studio. Because you work on a sample, you can apply transformations quickly and see how they will affect the entire dataset.
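The sample-first idea can be illustrated with a minimal Python sketch. This is not how Wrangler is implemented; the function names (`take_sample`, `preview`, `trim_and_uppercase`) are hypothetical, and only the 10 MB figure comes from the text above.

```python
SAMPLE_LIMIT_BYTES = 10 * 1024 * 1024  # mirrors the 10 MB sample size mentioned above


def take_sample(lines, limit_bytes=SAMPLE_LIMIT_BYTES):
    """Collect whole lines until adding the next one would exceed the byte budget."""
    sample, used = [], 0
    for line in lines:
        size = len(line.encode("utf-8"))
        if used + size > limit_bytes:
            break
        sample.append(line)
        used += size
    return sample


def trim_and_uppercase(line):
    """Hypothetical transformation: strip surrounding whitespace and uppercase."""
    return line.strip().upper()


def preview(lines, transform, limit_bytes=SAMPLE_LIMIT_BYTES):
    """Apply a transformation to the sample only, leaving the full data untouched."""
    return [transform(line) for line in take_sample(lines, limit_bytes)]
```

Once the preview looks right, the same `transform` function could be applied to every line of the full dataset, which is the role the pipeline plays.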

Wrangler lets you connect to your data wherever it resides and transform it using simple, point-and-click transformation steps.

You can create multiple transformations and add them to a recipe. When you are satisfied with the results of your recipe, you can create a data pipeline that includes the source and the Wrangler transformation. In the Pipeline Studio, you can add more plugins to continue transforming your data and add a sink to write the transformed data to a target location.
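Conceptually, a recipe is an ordered list of transformation steps applied to each record in turn. The following Python sketch models that idea under stated assumptions: the helpers `rename`, `uppercase`, `drop`, and `apply_recipe` are hypothetical stand-ins written for this example, not Wrangler's own directive syntax or API.

```python
def rename(old, new):
    """Return a step that renames column `old` to `new` when it is present."""
    def step(row):
        row = dict(row)  # copy so the input row is not mutated
        if old in row:
            row[new] = row.pop(old)
        return row
    return step


def uppercase(column):
    """Return a step that uppercases string values in `column`."""
    def step(row):
        row = dict(row)
        if isinstance(row.get(column), str):
            row[column] = row[column].upper()
        return row
    return step


def drop(column):
    """Return a step that removes `column` from the row."""
    def step(row):
        return {k: v for k, v in row.items() if k != column}
    return step


def apply_recipe(recipe, rows):
    """Run every step of the recipe, in order, on each row."""
    out = []
    for row in rows:
        for step in recipe:
            row = step(row)
        out.append(row)
    return out


# Order matters: the column must be renamed before it can be uppercased as "name".
recipe = [rename("nm", "name"), uppercase("name"), drop("tmp")]
sample = [{"nm": "ada", "tmp": 1}, {"nm": "grace", "tmp": 2}]
result = apply_recipe(recipe, sample)
```

Because the recipe is plain data (a list of steps), the same list can be run on a sample first and on the full dataset later, which is the property the text above relies on.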

Note: Wrangler supports CSV files without headers.

The overall development process for data pipelines with a Wrangler transformation:

  1. Create a connection to the data source.

  2. Double-click the dataset to start transforming it.

  3. For file connections, parse the dataset.

  4. Use the Insights tab to profile and discover data quality issues.

  5. Add transformations to the recipe.

  6. When the recipe is complete, click Create a Pipeline.
    CDAP creates a pipeline with the source plugin and the Wrangler transformation.

  7. In the Pipeline Studio, review the Source plugin and Wrangler transformation and edit as required.

  8. In the Pipeline Studio, continue adding plugins to the pipeline.

  9. When satisfied with the pipeline and transformations, deploy the pipeline and run it.
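Step 4 in the list above profiles the sample to surface data quality issues. As a rough illustration of what such a profile might compute, the Python sketch below reports the share of missing or empty values per column. The `profile` function is a hypothetical example and makes no claim about how the Insights tab works internally.

```python
def profile(rows):
    """Return, per column, the fraction of rows whose value is missing or empty."""
    # Collect column names in first-seen order across all rows.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    total = len(rows)
    report = {}
    for name in columns:
        # A value counts as missing if the key is absent, None, or an empty string.
        missing = sum(1 for row in rows if row.get(name) in (None, ""))
        report[name] = missing / total if total else 0.0
    return report


rows = [
    {"id": "1", "email": "a@x.io"},
    {"id": "2", "email": ""},
    {"id": "3"},
    {"id": "", "email": None},
]
report = profile(rows)
```

A column with a high missing share is a natural candidate for a cleanup step in the recipe before the pipeline is created.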

Created in 2020 by Google Inc.