...
Filtering records that do not have a specified number of columns in a record can be achieved with
send-to-error
orfilter-row
. For example,send-to-error exp : { this.width() < 4 }
will send all records that have less than 4 (0..3) to error.filter-row exp: { this.width() < 4 } true
will filter records that are less than 4 from the main dataset.To find rows that have issues, you can use
filter-row
with a minor change as follows:filter-row exp: { this.width() < 4 } false
. This will show you only rows that are problematic.
Trailing Commas
Saving an Excel file as a CSV file can create extra commas at the end of each row. Trailing commas can result when columns are deleted or column headers are removed.
Code Block |
---|
school_id, student_id, last_name, first_name,,,
14J456,33445566,Potter,Harry,,,
14J456,44333433,Weasley,Ron,,,
14J456,65765566,Granger,Hermione,,,
14J456,13233121,Diggory,Cedric,,,
14J456,98786868,Weasley,George,,,
14J456,78977876,Weasley,Fred,,, |
Take the following steps to resolve the issue using the Wrangler interface:
Open the file in Wrangler.
Parse CSV.
Use the directive
keep
to keep the columns that you need, for example,keep school_id, student_id, last_name, first_name
.
In Wrangler, type in the directive:
keep school_id, student_id, last_name, first_name
Here’s the full recipe:
Code Block |
---|
parse-as-csv body , false
filter-rows-on regex-match body_1 ^school_id$
drop body
set columns school_id, student_id, last_name, first_name
keep school_id,student_id,last_name,first_name |