Google Cloud Data Loss Prevention (DLP) Redact Transformation

The Google Cloud Data Loss Prevention (DLP) Redact transformation is available in the Hub.

This plugin uses Google's Data Loss Prevention APIs, which charge the user based on the volume of data analyzed (not transformed). For details on costs, see the Cloud DLP pricing page.

Plugin version: 1.4.0

This plugin transforms sensitive records from the input stream. A record is deemed sensitive if it matches one or more pre-defined DLP filters or a custom user-defined template. See the DLP Filter Mapping section for details on the supported pre-defined filters. For more information about custom templates, see the Cloud DLP documentation on inspection templates.

This plugin currently supports the six most commonly used DLP transformations:

  • Date Shift: Apply a random shift to a date/timestamp value (supported types: date, timestamp)

  • Masking: Mask sensitive text by replacing characters with the Masking Character (supported types: string)

  • One-way Hash: Apply a one-way cryptographic hash function to the data (supported types: all)

  • Redact: Remove sensitive text from the record (supported types: string)

  • Replace with value: Replace sensitive text with a new value (supported types: string)

  • Format Preserving Encryption: Replace sensitive text with a format-preserving encrypted value. The value can be decrypted using the Decrypt plugin (supported types: string)
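To make the differences between these transformations concrete, the following is a local Python sketch of what each one does to a single finding (a matched piece of sensitive text). It does not call the DLP API and is not the plugin's implementation: the function names, the keyed HMAC-SHA-256 hash, and the date-shift bounds are illustrative assumptions, and the real service applies these transformations server-side with its own options. Format Preserving Encryption is omitted because it requires a proper FPE cipher and key management.

```python
import base64
import datetime
import hashlib
import hmac
import random


def mask(value: str, masking_char: str = "#") -> str:
    """Masking: replace every character of the finding with the masking character."""
    return masking_char * len(value)


def redact(text: str, finding: str) -> str:
    """Redact: remove every occurrence of the finding from the surrounding text."""
    return text.replace(finding, "")


def replace_with_value(text: str, finding: str, new_value: str) -> str:
    """Replace with value: substitute every occurrence of the finding."""
    return text.replace(finding, new_value)


def one_way_hash(value: str, key: bytes) -> str:
    """One-way hash: keyed SHA-256 (HMAC), base64-encoded; not reversible.

    Assumption: similar in spirit to DLP's crypto hash, but not guaranteed
    to produce the same output as the service.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def date_shift(value: datetime.date, lower_days: int, upper_days: int,
               rng: random.Random) -> datetime.date:
    """Date shift: move the date by a random whole number of days in [lower, upper].

    This sketch draws an independent shift on every call.
    """
    return value + datetime.timedelta(days=rng.randint(lower_days, upper_days))


if __name__ == "__main__":
    text = "Call Jane at 555-0100"
    print(mask("555-0100"))
    print(redact(text, "555-0100"))
    print(replace_with_value(text, "555-0100", "[PHONE_NUMBER]"))
    print(one_way_hash("555-0100", b"example-key"))
    print(date_shift(datetime.date(2020, 1, 15), -10, 10, random.Random(0)))
```

Note that masking and hashing keep a placeholder where the finding was, redaction removes it entirely, and only the hash is deterministic for a given key.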

Permissions

For this plugin to function, it needs permission to access the Data Loss Prevention APIs. These permissions are granted through the service account provided in the plugin configuration. If the service account path is set to auto-detect, the plugin uses the service account named service-<project-number>@gcp-sa-datafusion.iam.gserviceaccount.com.

The DLP Administrator role must be granted to the service account to allow this plugin to access the DLP APIs.

When using Deterministic Encryption with a KMS-wrapped key, the Cloud KMS CryptoKey Encrypter/Decrypter role must be granted to the Cloud Data Loss Prevention Service Agent.
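The role assignments described above can be summarized as (member, role) pairs. The Python sketch below builds them from a project number. The helper names are hypothetical; `roles/dlp.admin` and `roles/cloudkms.cryptoKeyEncrypterDecrypter` are the IAM role IDs for DLP Administrator and Cloud KMS CryptoKey Encrypter/Decrypter. The DLP service agent address format (`service-<project-number>@dlp-api.iam.gserviceaccount.com`) is not stated on this page and is an assumption to verify in the IAM console for your project.

```python
from typing import List, Tuple


def datafusion_service_agent(project_number: int) -> str:
    """Service account used when Service Account Path is auto-detect (per this page)."""
    return f"service-{project_number}@gcp-sa-datafusion.iam.gserviceaccount.com"


def dlp_service_agent(project_number: int) -> str:
    """Cloud DLP service agent (assumed address format; verify in IAM)."""
    return f"service-{project_number}@dlp-api.iam.gserviceaccount.com"


def required_bindings(project_number: int,
                      uses_kms_wrapped_key: bool) -> List[Tuple[str, str]]:
    """Return the (member, role) pairs this Permissions section calls for."""
    bindings = [
        # The plugin's service account needs DLP Administrator to call the DLP APIs.
        (f"serviceAccount:{datafusion_service_agent(project_number)}",
         "roles/dlp.admin"),
    ]
    if uses_kms_wrapped_key:
        # Deterministic Encryption with a KMS-wrapped key: the DLP service agent
        # must be able to unwrap the key.
        bindings.append(
            (f"serviceAccount:{dlp_service_agent(project_number)}",
             "roles/cloudkms.cryptoKeyEncrypterDecrypter"))
    return bindings


if __name__ == "__main__":
    for member, role in required_bindings(123456789012, uses_kms_wrapped_key=True):
        print(member, role)
```

Each pair can then be granted, for example, with `gcloud projects add-iam-policy-binding PROJECT_ID --member=MEMBER --role=ROLE`.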

DLP Filter Mapping

This plugin supports most pre-defined DLP filters, grouped into broader categories for ease of use. The contents of each group are as follows:

  • Everything: Applies the transformation to the entire field; no inspection or filtering is applied

  • Demographic: PERSON_NAME, AGE, DATE_OF_BIRTH, PHONE_NUMBER, ETHNIC_GROUP

  • Location: LOCATION, MAC_ADDRESS, MAC_ADDRESS_LOCAL

  • Tax IDs: AUSTRALIA_TAX_FILE_NUMBER, DENMARK_CPR_NUMBER, NORWAY_NI_NUMBER, PORTUGAL_CDC_NUMBER, US_ADOPTION_TAXPAYER_IDENTIFICATION_NUMBER, US_EMPLOYER_IDENTIFICATION_NUMBER, US_PREPARER_TAXPAYER_IDENTIFICATION_NUMBER

  • Credit Card Numbers: CREDIT_CARD_NUMBER

  • Passport Numbers: NETHERLANDS_PASSPORT

  • Health IDs: US_HEALTHCARE_NPI, CANADA_OHIP

  • National IDs: CHINA_RESIDENT_ID_NUMBER, DENMARK_CPR_NUMBER, FRANCE_CNI, FRANCE_NIR, FINLAND_NATIONAL_ID_NUMBER, JAPAN_INDIVIDUAL_NUMBER, NORWAY_NI_NUMBER, PARAGUAY_CIC_NUMBER, POLAND_PESEL_NUMBER, POLAND_NATIONAL_ID_NUMBER, PORTUGAL_CDC_NUMBER, SPAIN_NIE_NUMBER, SPAIN_NIF_NUMBER, SWEDEN_NATIONAL_ID_NUMBER, US_SOCIAL_SECURITY_NUMBER, URUGUAY_CDI_NUMBER, VENEZUELA_CDI_NUMBER

  • Driver License IDs: SPAIN_DRIVERS_LICENSE_NUMBER, US_DRIVERS_LICENSE_NUMBER
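Some groups overlap: DENMARK_CPR_NUMBER, NORWAY_NI_NUMBER, and PORTUGAL_CDC_NUMBER appear under both Tax IDs and National IDs. The Python sketch below mirrors the group list as a dictionary and expands a selection of groups into a de-duplicated list of DLP infoTypes. The `FILTER_GROUPS` structure and `expand_groups` helper are hypothetical illustrations, not the plugin's internal code; Everything is left out because it applies no inspection.

```python
from typing import Dict, Iterable, List

# Mirrors the DLP Filter Mapping list above (Everything is not an infoType group).
FILTER_GROUPS: Dict[str, List[str]] = {
    "Demographic": ["PERSON_NAME", "AGE", "DATE_OF_BIRTH", "PHONE_NUMBER", "ETHNIC_GROUP"],
    "Location": ["LOCATION", "MAC_ADDRESS", "MAC_ADDRESS_LOCAL"],
    "Tax IDs": [
        "AUSTRALIA_TAX_FILE_NUMBER", "DENMARK_CPR_NUMBER", "NORWAY_NI_NUMBER",
        "PORTUGAL_CDC_NUMBER", "US_ADOPTION_TAXPAYER_IDENTIFICATION_NUMBER",
        "US_EMPLOYER_IDENTIFICATION_NUMBER", "US_PREPARER_TAXPAYER_IDENTIFICATION_NUMBER",
    ],
    "Credit Card Numbers": ["CREDIT_CARD_NUMBER"],
    "Passport Numbers": ["NETHERLANDS_PASSPORT"],
    "Health IDs": ["US_HEALTHCARE_NPI", "CANADA_OHIP"],
    "National IDs": [
        "CHINA_RESIDENT_ID_NUMBER", "DENMARK_CPR_NUMBER", "FRANCE_CNI", "FRANCE_NIR",
        "FINLAND_NATIONAL_ID_NUMBER", "JAPAN_INDIVIDUAL_NUMBER", "NORWAY_NI_NUMBER",
        "PARAGUAY_CIC_NUMBER", "POLAND_PESEL_NUMBER", "POLAND_NATIONAL_ID_NUMBER",
        "PORTUGAL_CDC_NUMBER", "SPAIN_NIE_NUMBER", "SPAIN_NIF_NUMBER",
        "SWEDEN_NATIONAL_ID_NUMBER", "US_SOCIAL_SECURITY_NUMBER", "URUGUAY_CDI_NUMBER",
        "VENEZUELA_CDI_NUMBER",
    ],
    "Driver License IDs": ["SPAIN_DRIVERS_LICENSE_NUMBER", "US_DRIVERS_LICENSE_NUMBER"],
}


def expand_groups(groups: Iterable[str]) -> List[str]:
    """Return the unique infoTypes for the selected groups, in first-seen order."""
    # dict.fromkeys keeps insertion order while dropping duplicates.
    return list(dict.fromkeys(t for g in groups for t in FILTER_GROUPS[g]))


if __name__ == "__main__":
    print(expand_groups(["Tax IDs", "National IDs"]))
```

Selecting Tax IDs and National IDs together yields 21 distinct infoTypes rather than 24, because the three shared identifiers are listed once.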

Metrics

This plugin records three metrics:

  • dlp.requests.count: Total number of requests sent to the Data Loss Prevention API

  • dlp.requests.success: Number of requests that were successfully processed by the Data Loss Prevention API

  • dlp.requests.fail: Number of requests to the Data Loss Prevention API that failed
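One way to read these metrics together is sketched below in Python, under the assumption that every request is counted once and ends as either a success or a failure, so that count equals success plus fail. The `DlpRequestMetrics` class and its methods are hypothetical; the plugin's actual accounting (for example, how retries are counted) may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DlpRequestMetrics:
    """Hypothetical counters named after dlp.requests.count/success/fail."""
    count: int = 0
    success: int = 0
    fail: int = 0

    def record(self, call: Callable[[], object]) -> Optional[object]:
        """Run one request, updating the counters; returns None on failure."""
        self.count += 1
        try:
            result = call()
        except Exception:
            self.fail += 1
            return None
        self.success += 1
        return result

    def failure_rate(self) -> float:
        """Fraction of requests that failed (0.0 when nothing was sent)."""
        return self.fail / self.count if self.count else 0.0


if __name__ == "__main__":
    metrics = DlpRequestMetrics()
    metrics.record(lambda: "ok")   # a request that succeeds
    metrics.record(lambda: 1 / 0)  # a request that raises
    print(metrics, metrics.failure_rate())
```

Under this assumption, a rising dlp.requests.fail relative to dlp.requests.count is the signal to check quotas and permissions.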

Configuration

Each property below lists whether it is macro-enabled and, where applicable, the version in which it was introduced.

  • Use custom template (macro-enabled: No): Required. Enabling this option lets you define a custom DLP Inspection Template to use for matching during the transformation. Default is No.

  • Template ID (macro-enabled: Yes): Optional. ID of the Inspection Template in DLP.

  • Custom Template Path (macro-enabled: Yes; introduced in 6.7.0/1.3.0): Optional. Custom path of the DLP Inspection Template.

  • Resource Location (macro-enabled: Yes; introduced in 6.7.0/1.3.0): Optional. The resource location for the DLP service. See https://cloud.google.com/dlp/docs/specifying-location for more information. Default is global.

  • Fields to Transform (macro-enabled: Yes): Required. The rules that determine which fields are transformed, along with the configuration of each transform.

  • Service Account Path (macro-enabled: Yes): Optional. Path on the local file system of the service account key used for authorization. Can be set to auto-detect when running on a Dataproc cluster. When running on other clusters, the file must be present on every node in the cluster. Default is auto-detect.

  • Project Id (macro-enabled: Yes): Optional. Google Cloud project ID, which uniquely identifies a project. It can be found on the Dashboard in the Google Cloud Platform console. Default is auto-detect.
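Project Id, Resource Location, Template ID, and Custom Template Path all feed into DLP resource names. The Python sketch below builds names in the DLP API's `projects/<project>/locations/<location>` and `.../inspectTemplates/<template-id>` formats. The helper names and the precedence rule (a non-empty Custom Template Path is used as-is, otherwise the name is built from Template ID) are assumptions for illustration, not documented plugin behavior.

```python
from typing import Optional


def dlp_parent(project_id: str, location: str = "global") -> str:
    """Parent resource for DLP requests in the given project and location."""
    return f"projects/{project_id}/locations/{location}"


def inspect_template_name(project_id: str,
                          template_id: Optional[str],
                          location: str = "global",
                          custom_template_path: Optional[str] = None) -> str:
    """Resolve the Inspection Template resource name (hypothetical precedence)."""
    if custom_template_path:
        # Assumption: a custom path is already a full resource name and takes priority.
        return custom_template_path
    if not template_id:
        raise ValueError("Template ID is required when Use custom template is enabled")
    return f"{dlp_parent(project_id, location)}/inspectTemplates/{template_id}"


if __name__ == "__main__":
    print(dlp_parent("my-project"))
    print(inspect_template_name("my-project", "pii-template", "us-central1"))
```

Keeping Resource Location consistent with the template's location matters because a template created in one location is addressed under that location's path.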

Created in 2020 by Google Inc.