Masking data

In Wrangler, you can mask sensitive data, such as Social Security numbers. You can mask data in columns of every type except Boolean and Bytes.

Wrangler provides the following masking techniques:

  • Show the last 4 characters only

  • Show the last 2 characters only

  • Custom selection

  • By shuffling

To mask fields in a column, follow these steps:

  1. Click the drop-down button next to the column name.

  2. Click Mask, and then select the masking technique you want to apply.

Show the last 4 characters only

Selecting Show the last 4 characters only masks every character in each field except the last four. It adds the mask-number directive to the recipe as a transformation step.
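
To illustrate the effect, here is a minimal Python sketch of "show only the last N characters" masking. It is not Wrangler's implementation. The function name show_last_n and the mask character "x" are assumptions. This sketch also masks separators such as dashes, and the text above does not say whether Wrangler preserves them.

```python
def show_last_n(value: str, n: int = 4, mask_char: str = "x") -> str:
    """Hypothetical helper: hide every character except the last n.

    mask_char is an assumed placeholder, not necessarily Wrangler's.
    """
    # Number of leading characters to hide, clamped to [0, len(value)].
    hidden = max(len(value) - max(n, 0), 0)
    return mask_char * hidden + value[hidden:]


print(show_last_n("123-45-6789"))  # xxxxxxx6789
```

If a value has fewer than N characters, this sketch returns it unchanged.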

Show the last 2 characters only

Selecting Show the last 2 characters only masks every character in each field except the last two. It also adds the mask-number directive to the recipe as a transformation step.
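
Both options add the same directive, so it can help to think of them as one pattern that describes which positions to keep. The Python sketch below makes the following assumptions: in the pattern, "#" keeps the character at that position, "x" hides it, and any other character is copied literally. Positions beyond the end of the pattern are masked. These pattern rules and the helpers last_n_pattern and apply_pattern are assumptions made for illustration. They are not the documented mask-number syntax.

```python
def last_n_pattern(length: int, n: int) -> str:
    """Hypothetical helper: 'x' hides a position, '#' keeps it."""
    n = min(max(n, 0), length)
    return "x" * (length - n) + "#" * n


def apply_pattern(value: str, pattern: str) -> str:
    """Apply an assumed keep/mask pattern position by position."""
    out = []
    for i, ch in enumerate(value):
        # Assumption: positions past the end of the pattern are masked.
        p = pattern[i] if i < len(pattern) else "x"
        # '#' keeps the original character; anything else is emitted as-is.
        out.append(ch if p == "#" else p)
    return "".join(out)


value = "AB123456"
print(apply_pattern(value, last_n_pattern(len(value), 2)))  # xxxxxx56
```

With this representation, "last 4" and "last 2" differ only in how many "#" characters end the pattern.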

Custom selection

You can choose which character positions in the column to mask. For example, if you select the last 10 characters in a 20-character string, Wrangler masks those 10 characters in each field and leaves the first 10 characters visible.
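
The following Python sketch shows position-based masking for the 20-character example. It makes two assumptions: the selection is a fixed range of character offsets, and the range is clipped to the length of each value, so shorter values do not cause errors. The text above does not say how Wrangler handles values shorter than the selection. The helper mask_range is hypothetical.

```python
def mask_range(value: str, start: int, end: int, mask_char: str = "x") -> str:
    """Hypothetical helper: mask characters at positions [start, end)."""
    # Clip the selection to this value's length (assumed behavior).
    start = max(0, min(start, len(value)))
    end = max(start, min(end, len(value)))
    return value[:start] + mask_char * (end - start) + value[end:]


column = ["ABCDEFGHIJKLMNOPQRST", "0123456789ABCDEFGHIJ", "short"]
# Mask the last 10 of 20 characters in every field of the column.
print([mask_range(v, 10, 20) for v in column])
# ['ABCDEFGHIJxxxxxxxxxx', '0123456789xxxxxxxxxx', 'short']
```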

To select specific characters to mask:

  1. Click the drop-down button next to the column name.

  2. Click Mask, and then select Custom selection.
    The column appears with a blue background, indicating that you are in Mask Data mode.

  3. In any field, highlight the characters you want to mask and click Apply.

That portion of the value is masked for all fields in the column.

Custom selection adds the mask-number directive to the recipe. When you run the data pipeline, the transformation is applied to all values in the column.

By shuffling

When you select By shuffling, Wrangler applies a random masking pattern to each field in the column.

By shuffling adds the mask-shuffle directive as a transformation step to the recipe.
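
The text above states only that a random masking pattern is applied to each field. The Python sketch below is one possible interpretation: random substitution that preserves character class. Each digit is replaced with a random digit, each uppercase or lowercase letter with a random ASCII letter of the same case, and all other characters are kept. This is an assumption for illustration, not the algorithm that mask-shuffle actually uses. The function shuffle_mask is hypothetical.

```python
import random
import string


def shuffle_mask(value: str, rng: random.Random) -> str:
    """Hypothetical sketch: replace letters and digits with random ones of the same class."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            # Punctuation, spaces, and other characters are kept as-is.
            out.append(ch)
    return "".join(out)


# A seeded generator makes the output repeatable for testing.
rng = random.Random(42)
print(shuffle_mask("Jane-555-0100", rng))
```

Because this sketch keeps the length and separators of each value, downstream format checks still pass, but the original characters cannot be recovered from the masked output.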

Created in 2020 by Google Inc.