This guide describes how to use Power Mode (the Wrangler CLI) to add directives and functions to a recipe.

The Wrangler is an application that allows you to view, explore, and transform a small subset of your data (100 rows) before running your logic on the entire dataset (using a MapReduce or Spark job in Pipelines). This means you can quickly apply transformations and see how they will affect the entire dataset.
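The preview-then-run idea can be illustrated with a small sketch. This is plain Python, not the Wrangler's API: the `apply_recipe` helper, the `upper_name` transform, and the dict-per-row representation are all hypothetical. It shows one recipe (an ordered list of transforms) applied first to a 100-row sample and then, unchanged, to the full dataset.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Transform = Callable[[Row], Row]

SAMPLE_SIZE = 100  # the Wrangler previews 100 rows, per the text above


def apply_recipe(rows: List[Row], recipe: List[Transform]) -> List[Row]:
    """Apply each transform, in order, to every row (hypothetical helper)."""
    out = []
    for row in rows:
        for step in recipe:
            row = step(row)
        out.append(row)
    return out


def upper_name(row: Row) -> Row:
    # Hypothetical transform: upper-case the 'name' column.
    return {**row, "name": row["name"].upper()}


dataset = [{"id": str(i), "name": f"user{i}"} for i in range(1000)]
recipe = [upper_name]

preview = apply_recipe(dataset[:SAMPLE_SIZE], recipe)  # quick interactive check
full = apply_recipe(dataset, recipe)                   # same recipe, whole dataset
```

Because the recipe is just data, the sample result is exactly the first 100 rows of the full result, which is what makes the preview a trustworthy guide.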

...

The Wrangler allows you to connect to a variety of data sources, including RDBMS, Kafka, .csv files, .json files, and more. You can easily ingest and parse this data into an easy-to-understand columnar format.
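As a rough illustration of "parsing into a columnar format", the sketch below turns raw CSV text into a mapping from column name to list of values. It uses only Python's standard `csv` module; the `csv_to_columns` name and the assumption that the first line is a header are illustrative, not how the Wrangler itself parses input.

```python
import csv
import io
from typing import Dict, List


def csv_to_columns(text: str) -> Dict[str, List[str]]:
    """Parse CSV text (assumed to have a header row) into column -> values."""
    reader = csv.DictReader(io.StringIO(text))
    # Reading fieldnames consumes the header line.
    columns: Dict[str, List[str]] = {name: [] for name in reader.fieldnames or []}
    for row in reader:
        for name in columns:
            columns[name].append(row[name])
    return columns


raw = "id,city\n1,Paris\n2,Oslo\n"
cols = csv_to_columns(raw)
```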

Once the data is loaded, the Wrangler allows you to inspect it in spreadsheet format. You can apply filters on columns to better understand the distribution of the data, inspect rows that contain null values, and more.
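The two inspections mentioned above, a column's value distribution and its null rows, can be sketched in plain Python. The helper names are hypothetical, and treating an empty string as null is an assumption made for this example; the Wrangler's own null handling may differ.

```python
from collections import Counter
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def value_distribution(rows: List[Row], column: str) -> Counter:
    """Count how often each value appears in a column (None is counted too)."""
    return Counter(row.get(column) for row in rows)


def null_rows(rows: List[Row], column: str) -> List[Row]:
    """Return rows whose value in `column` is missing, None, or '' (assumed null)."""
    return [row for row in rows if row.get(column) in (None, "")]


rows = [
    {"id": "1", "country": "US"},
    {"id": "2", "country": None},
    {"id": "3", "country": "US"},
    {"id": "4", "country": ""},
]

dist = value_distribution(rows, "country")
missing = null_rows(rows, "country")
```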

Further, the Wrangler enables you to transform your data. It is built for applying simple transforms that do not require complex programmatic logic. You can join columns, replace values, drop values conditionally, and much more. You can also re-format your data, for instance, from .csv to .json.
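To make the listed transforms concrete, here is a standard-library Python sketch that drops rows conditionally, joins two columns, replaces a value, and re-formats the result from CSV to JSON. It only mirrors the kinds of operations described; the column names, the `transform` function, and the `"USA"` to `"US"` rule are hypothetical and are not Wrangler directives.

```python
import csv
import io
import json
from typing import Dict, List


def transform(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    out = []
    for row in rows:
        # Drop conditionally: skip rows with an empty email (hypothetical rule).
        if not row["email"]:
            continue
        # Join columns: combine 'first' and 'last' into a new 'full_name' column.
        row = {**row, "full_name": f"{row['first']} {row['last']}"}
        # Replace values: normalize a country code (hypothetical mapping).
        if row["country"] == "USA":
            row["country"] = "US"
        out.append(row)
    return out


raw = "first,last,email,country\nAda,Lovelace,ada@example.com,USA\nBob,Smith,,UK\n"
rows = list(csv.DictReader(io.StringIO(raw)))
as_json = json.dumps(transform(rows))  # CSV in, JSON out
```

Bob's row is dropped because its email field is empty, so the JSON output holds a single record.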

Finally, once you have explored your data and applied the desired transformations, you can operationalize those transformations by clicking Create Pipeline. This creates a Pipeline (as described in the next section) that applies your transformations to the entirety of your data in a parallelized MapReduce, Spark, or Spark Streaming job.