Wrangler Command Line Reference

This guide explains how to use Power Mode (the Wrangler CLI) to add directives and functions to a recipe.

The Wrangler is an application for viewing, exploring, and transforming a small sample of your data (100 rows) before you run the same logic on the entire dataset as a MapReduce or Spark job in Pipelines. Because transformations are applied to the sample first, you can quickly see how they will affect the entire dataset.
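The sample-first workflow described above can be sketched in plain Python. This is only an illustration of the idea, not how the Wrangler is implemented: the `take_sample` and `apply_recipe` helpers, the `recipe` steps, and the generated `dataset` are all hypothetical names invented for this example.

```python
from itertools import islice

SAMPLE_SIZE = 100  # the sample size mentioned in the text above


def take_sample(rows, n=SAMPLE_SIZE):
    """Return the first n rows of an iterable without reading the rest."""
    return list(islice(rows, n))


def apply_recipe(rows, recipe):
    """Apply each transformation step, in order, to every row."""
    for row in rows:
        for step in recipe:
            row = step(row)
        yield row


# Hypothetical recipe: each step takes a row (a dict) and returns a new row.
recipe = [
    lambda row: {**row, "name": row["name"].strip()},   # trim whitespace
    lambda row: {**row, "name": row["name"].title()},   # normalize casing
]

# Hypothetical dataset, produced lazily so that only the sample is read.
dataset = ({"id": i, "name": f"  user {i} "} for i in range(10_000))

# Preview the recipe on the sample before committing to a full run.
sample = take_sample(dataset)
preview = list(apply_recipe(sample, recipe))
```

Because the recipe is just an ordered list of steps, the same list could later be applied to the full dataset by a batch job once the preview looks right.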

The Wrangler can connect to a variety of data sources, including relational databases (RDBMS), Kafka, .csv files, .json files, and more. It ingests and parses this data into an easily understood columnar format.

Once the data is loaded, the Wrangler lets you inspect it in a spreadsheet format. You can apply filters on columns to better understand the distribution of the data, inspect rows that contain null values, and more.
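As a rough illustration of the kind of inspection described above, the following Python sketch finds rows with a null value in a column and counts how the remaining values are distributed. The sample rows and the `country` column are made up for this example, and the code does not use any Wrangler API.

```python
from collections import Counter

# Hypothetical sample rows; None stands in for a null value.
rows = [
    {"id": 1, "country": "US"},
    {"id": 2, "country": None},
    {"id": 3, "country": "DE"},
    {"id": 4, "country": "US"},
]

# Filter: keep only the rows where the column is null.
null_rows = [row for row in rows if row["country"] is None]

# Distribution: count each non-null value in the column.
distribution = Counter(
    row["country"] for row in rows if row["country"] is not None
)
```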

The Wrangler also lets you transform your data. It is built for simple transformations that do not require complex programmatic logic: you can join columns, replace values, drop values conditionally, and much more. You can also reformat your data, for instance from .csv to .json.
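To make these transformation types concrete, here is a minimal Python sketch that joins two columns, replaces a value, drops rows conditionally, and converts .csv input to .json. It uses only the Python standard library, not Wrangler directives, and the column names and sample data are assumptions chosen for illustration.

```python
import csv
import io
import json

# Hypothetical .csv input with a header row.
raw_csv = (
    "first,last,status\n"
    "Ada,Lovelace,active\n"
    "Alan,Turing,N/A\n"
    "Grace,Hopper,inactive\n"
)

rows = list(csv.DictReader(io.StringIO(raw_csv)))

transformed = []
for row in rows:
    # Drop conditionally: skip rows whose status is "inactive".
    if row["status"] == "inactive":
        continue
    # Replace values: map the placeholder "N/A" to "unknown".
    status = "unknown" if row["status"] == "N/A" else row["status"]
    # Join columns: combine first and last name into one column.
    transformed.append(
        {"full_name": f"{row['first']} {row['last']}", "status": status}
    )

# Reformat: serialize the transformed rows as .json.
as_json = json.dumps(transformed, indent=2)
```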

Finally, once you have explored your data and applied the desired transformations, you can operationalize them by clicking Create Pipeline. This creates a Pipeline (as described in the next section) that applies your transformations to the entire dataset in a parallelized MapReduce, Spark, or Spark Streaming job.