Parsing a CSV file (6.6.0 and below)

This topic applies to CDAP 6.6.0 and below. The steps to parse a file changed in CDAP 6.7.0. For more information, see Parsing Files in Wrangler.

Comma-separated value (CSV) text files are among the most common sources of data for ETL applications, because many database systems export and import data in this format.

To parse a CSV file, follow these steps:

  1. From the home page, click Wrangler.


    The Wrangler Connection page appears.

  2. From the Wrangler Connection page, connect to Google Cloud Storage (GCS) and select the CSV file to upload.


    The CSV file appears on the Data page in Wrangler.

     

  3. Parse the raw CSV data so that you can view it in a spreadsheet format, split into rows and columns. To do this, select the drop-down menu next to the body column.

     

  4. Select Parse > CSV.

     

  5. Select the type of delimiter in the file, and then select Set first row as header.


    The dataset appears in a spreadsheet format. Wrangler adds the parse-as-csv directive to the recipe.

  6. You don’t need the body column to cleanse and analyze your data, so delete it. From the drop-down menu next to body, select Delete column.


    Wrangler adds the drop directive to the recipe.



    The dataset is ready to wrangle!
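For readers who want to see what the two recipe steps do to the data, here is a minimal Python sketch of the same transformation. It is an illustration only, not Wrangler's implementation. The function names `parse_as_csv` and `drop` are hypothetical helpers that borrow the directive names. The `body_1`, `body_2` fallback column names and the sample rows are assumptions made up for this example.

```python
import csv


def parse_as_csv(records, delimiter=",", first_row_as_header=True):
    """Split each raw 'body' line into columns, keeping the body column.

    Hypothetical helper that imitates the effect of steps 3 to 5.
    """
    bodies = [record["body"] for record in records]
    rows = list(csv.reader(bodies, delimiter=delimiter))
    if not rows:
        return []
    if first_row_as_header:
        # The first line supplies the column names and is not a data row.
        header = rows[0]
        bodies, rows = bodies[1:], rows[1:]
    else:
        # Assumed fallback naming scheme for this sketch: body_1, body_2, ...
        header = [f"body_{i + 1}" for i in range(len(rows[0]))]
    return [
        {"body": body, **dict(zip(header, row))}
        for body, row in zip(bodies, rows)
    ]


def drop(records, column):
    """Remove one column from every record (imitates step 6)."""
    return [
        {name: value for name, value in record.items() if name != column}
        for record in records
    ]


# Made-up sample: each record holds one raw line of the file in 'body'.
raw = [{"body": "id,name"}, {"body": "1,Ada"}, {"body": "2,Grace"}]

parsed = parse_as_csv(raw, delimiter=",", first_row_as_header=True)
cleaned = drop(parsed, "body")
print(cleaned)
# → [{'id': '1', 'name': 'Ada'}, {'id': '2', 'name': 'Grace'}]
```

After parsing, each record still carries the original `body` text alongside the new columns, which is why the last step removes it before further wrangling.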



 

Created in 2020 by Google Inc.