Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Filtering records that do not have a specified number of columns in a record can be achieved with send-to-error or filter-row. For example, send-to-error exp : { this.width() < 4 } will send all records that have less than 4 (0..3) to error. filter-row exp: { this.width() < 4 } true will filter records that are less than 4 from the main dataset.

  • To find rows that have issues, you can use filter-row with a minor change as follows: filter-row exp: { this.width() < 4 } false. This will show you only rows that are problematic.

Trailing Commas

Saving an Excel file as a CSV file can create extra commas at the end of each row. Trailing commas can result when columns are deleted or column headers are removed.

Code Block
school_id, student_id, last_name, first_name,,,
14J456,33445566,Potter,Harry,,,
14J456,44333433,Weasley,Ron,,,
14J456,65765566,Granger,Hermione,,,
14J456,13233121,Diggory,Cedric,,,
14J456,98786868,Weasley,George,,,
14J456,78977876,Weasley,Fred,,,

Take the following steps to resolve the issue using the Wrangler interface:

  • Open the file in Wrangler.

  • Parse CSV.

  • Use the directive keep to keep the columns that you need, for example, keep school_id, student_id, last_name, first_name.

In Wrangler, type in the directive:

  • keep school_id, student_id, last_name, first_name

Here’s the full recipe:

Code Block
parse-as-csv body , false
filter-rows-on regex-match body_1 ^school_id$
drop body
set columns school_id, student_id, last_name, first_name
keep school_id,student_id,last_name,first_name