Adding Transformations to a Replication Job

Note: Replication transformations are in Preview for CDAP 6.7.0.

You can select tables to replicate and add transformations to columns.

Starting in CDAP 6.7.0, you can rename columns and mask characters in string columns when you configure a replication job. CDAP applies these transformations when you run the replication job, in the order you enter them on the Mappings, assessments, and transformations page. They are saved as transformations in the replication job JSON file, grouped by table and by column.

As you add transformations to columns, click the Refresh button to validate them. If a transformation is invalid, an error appears. To fix it, delete the invalid transformation, add a new one, and click Refresh again to validate it.

Because a transformation might depend on a previous transformation, when you delete a transformation, all transformations entered after it are also deleted.
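The cascading delete described above can be pictured as truncating an ordered list. The following is a minimal Python sketch for illustration only, not CDAP code. The function name delete_transformation and the placeholder labels t1, t2, and t3 are hypothetical.

```python
def delete_transformation(transformations: list[str], index: int) -> list[str]:
    """Return the transformations that remain after deleting the one at index.

    Later entries may depend on the deleted one, so they are dropped too.
    """
    if not 0 <= index < len(transformations):
        raise IndexError(f"no transformation at position {index}")
    # Keep only the entries that were entered before the deleted one.
    return transformations[:index]


# Placeholder labels standing in for transformations in the order entered.
entered = ["t1", "t2", "t3"]
remaining = delete_transformation(entered, 1)  # delete t2; t3 is removed as well
print(remaining)
```

In this sketch, deleting the second of three transformations leaves only the first, so any later transformation has to be entered again.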

Renaming a column

  1. On the Mappings, assessments, and transformations page (Step 4), to rename a column, click Transform > Rename and enter the new name for the column.

  2. Click Refresh to validate the name.

  3. Click Save.

Masking fields

When you configure a replication job, you can mask all the fields in a string column. You might want to mask sensitive data, such as credit card numbers or Social Security numbers (SSNs). CDAP applies fixed masking, which means the pattern is applied to a fixed-length string. Mask transformations have the following syntax:

mask <col_name> <direction> <masking_character> n

When you run the replication job, CDAP replaces all characters in each field with the masking character, except for n characters on the side given by the direction, before it writes the data to the target.

The masking direction can be left or right.
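The masking rule can be sketched in a few lines of Python. This is an illustration of the behavior described above, not CDAP's implementation. The function name mask_value is hypothetical, and the handling of values shorter than n (left unmasked here) is an assumption.

```python
def mask_value(value: str, direction: str, mask_char: str, n: int) -> str:
    """Mask every character of value except n characters on the given side.

    direction "left" keeps the first n characters visible;
    direction "right" keeps the last n characters visible.
    """
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    if len(mask_char) != 1:
        raise ValueError("mask_char must be a single character")
    if n < 0:
        raise ValueError("n must not be negative")

    # Assumption: a value shorter than n is left entirely unmasked.
    visible = min(n, len(value))
    hidden = len(value) - visible

    if direction == "left":
        return value[:visible] + mask_char * hidden
    # Slice from an explicit index so that visible == 0 yields an empty tail.
    return mask_char * hidden + value[len(value) - visible:]


print(mask_value("123456789", "right", "#", 2))  # #######89
print(mask_value("123456789", "left", "#", 2))   # 12#######
```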

Note: CDAP 6.7.0 supports masking string values only.

CDAP provides the following masking transformations:

  • Show last 2. Masks all characters with * except for the last two characters in the string.

  • Show last 4. Masks all characters with * except for the last four characters in the string.

  • Custom. Define the masking direction, the masking character, and the number of characters to leave unmasked. For example, right ? 3 masks all characters with ? except for the last three characters in the string, and left # 2 masks all characters with # except for the first two characters in the string.

The mask is saved as a mask transformation that is applied to the data when you run the replication job. For example, if you enter right # 2 as a custom mask in a column called SSN, the transformation looks like this:

mask ssn right # 2
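To make the parts of the saved transformation concrete, the following Python sketch splits a mask transformation into its fields and applies it to one row. It is an illustration under stated assumptions, not CDAP code: the names MaskDirective, parse_mask, and apply_mask are hypothetical, the row is made-up sample data, and the sketch assumes the column name and masking character contain no spaces.

```python
from typing import NamedTuple


class MaskDirective(NamedTuple):
    column: str
    direction: str   # "left" or "right"
    mask_char: str   # single masking character
    visible: int     # number of characters left unmasked


def parse_mask(directive: str) -> MaskDirective:
    """Split 'mask <col_name> <direction> <masking_character> n' into fields."""
    parts = directive.split()
    if len(parts) != 5 or parts[0] != "mask":
        raise ValueError(f"not a mask transformation: {directive!r}")
    _, column, direction, mask_char, count = parts
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    if len(mask_char) != 1:
        raise ValueError("masking character must be a single character")
    visible = int(count)
    if visible < 0:
        raise ValueError("n must not be negative")
    return MaskDirective(column, direction, mask_char, visible)


def apply_mask(row: dict, d: MaskDirective) -> dict:
    """Return a copy of row with the directive's column masked."""
    value = row[d.column]
    keep = min(d.visible, len(value))  # assumption: short values stay unmasked
    hidden = d.mask_char * (len(value) - keep)
    if d.direction == "left":
        masked = value[:keep] + hidden
    else:
        masked = hidden + value[len(value) - keep:]
    return {**row, d.column: masked}


directive = parse_mask("mask ssn right # 2")
print(apply_mask({"id": 1, "ssn": "123456789"}, directive))
# {'id': 1, 'ssn': '#######89'}
```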

To mask fields in a column:

  1. On the Mappings, assessments, and transformations page (Step 4), to mask the fields in a column, click Transform > Mask.

  2. Choose a masking transformation.

  3. Click Refresh to validate the masking transformation.

  4. Click Save.

 

Created in 2020 by Google Inc.