Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

false at the end specifies that no auto-header detection is enabled. The columns would be named as body_1, body_2, … body_n. The name of the columns are based on the input column name, in this case it’s body.

...

Note: Data rows will also include the header, which need to be removed from processing.

Now that we have all rows of the CSV file parsed, we can use the filter-rows-on directive to filter and remove the header row. For this, you will have to identify a column name in the header that will never occur in the data. In this example, we will use school_id as the column name:

...

  • set columns school_id,student_id,first_name,last_name

Note: You don’t need to rename all the columns. Rename only the columns you are interested in. The column names are serially applied to the actual columns in the data rows. If there are more columns than what you have set, they will not be changed.

Here’s the full recipe:

Code Block
parse-as-csv :body ',' false
filter-rows-on regex-match body_1 ^school_id$
drop body
set columns school_id,student_id,first_name,last_name

...

Code Block
school_id, student_id, last_name, first_name
14J456,33445566,"Potter,Jr",Harry
14J456,44333433,Weasley,Ron

Leading Zeros

Numerical values (in fields like ‘school_id’, ‘student_id’, and ‘teacher_id’) often begin with “0”, but Excel removes these leading zeros, which changes the value listed for that field. For example, student_id “009844” becomes “9844”.

...