Google Cloud Data Loss Prevention (DLP) PII Filter Transformation

The Google Cloud Data Loss Prevention (DLP) PII Filter transformation is available in the Hub.

This plugin uses the Data Loss Prevention APIs, which charge the user according to the volume of data analyzed. More details on the exact costs can be found here.

Plugin version: 1.4.0

This plugin separates sensitive records from the input stream. A record is deemed sensitive if it matches a user-defined template. More info on creating templates can be found here.

The matching can be applied to the entire record or to a particular field. Matching on a field is recommended when the entire record is large, because DLP supports a maximum of 0.5 MB per record.
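
A record can be checked against the size limit before it is sent for inspection. The following is a minimal sketch, assuming that 0.5 MB is read as 512 * 1024 bytes of UTF-8 text. The constant `MAX_DLP_BYTES` and the helper `fits_dlp_limit` are illustrative names and are not part of the plugin.

```python
# Assumption: "0.5 MB" is interpreted as 512 KiB; the exact byte count
# used by the service is not stated in this document.
MAX_DLP_BYTES = 512 * 1024


def fits_dlp_limit(text: str, limit: int = MAX_DLP_BYTES) -> bool:
    """Return True if the UTF-8 encoding of text is within the limit."""
    return len(text.encode("utf-8")) <= limit
```

If whole records often exceed the limit, filtering on a single field keeps each request small.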

There are three options for error handling in this plugin:

  • Stop pipeline: Stops the pipeline as soon as an error is encountered

  • Skip record: The record that caused the error will be skipped and no error will be reported

  • Send to error: Send errors to the error port and continue running the pipeline
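
The three options can be pictured with a small sketch. It is not the plugin's implementation: the mode names, the `filter_records` function, and the `inspect` callable (which stands in for a DLP request and returns True for a sensitive record) are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical mode names; the plugin's internal values are not given here.
MODES = {"stop-pipeline", "skip-record", "send-to-error"}


def filter_records(
    records: Iterable[str],
    inspect: Callable[[str], bool],
    on_error: str = "stop-pipeline",
) -> Tuple[List[str], List[str], List[Tuple[str, str]]]:
    """Split records into (clean, sensitive, errors)."""
    if on_error not in MODES:
        raise ValueError(f"unknown on_error mode: {on_error}")
    clean, sensitive, errors = [], [], []
    for record in records:
        try:
            is_sensitive = inspect(record)
        except Exception as exc:
            if on_error == "stop-pipeline":
                raise  # Stop pipeline: fail on the first error
            if on_error == "send-to-error":
                errors.append((record, str(exc)))  # Send to error
            continue  # Skip record: drop it without reporting
        (sensitive if is_sensitive else clean).append(record)
    return clean, sensitive, errors
```

With Skip record the failing record disappears from every output, so Send to error is the safer choice when dropped records need to be audited.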

Permissions

To function, this plugin requires permission to access the Data Loss Prevention APIs. These permissions are granted through the service account that is provided in the plugin configuration. If the service account path is set to auto-detect, the plugin uses a service account with the name service-<project-number>@gcp-sa-datafusion.iam.gserviceaccount.com.

The DLP Administrator role must be granted to the service account to allow this plugin to access the DLP APIs.
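
As an illustration, the role grant for the auto-detected service account can be expressed as an IAM policy binding. This is a sketch only: `dlp_admin_binding` is a hypothetical helper, and it assumes that the DLP Administrator role corresponds to the role ID `roles/dlp.admin`.

```python
def dlp_admin_binding(project_number: int) -> dict:
    """Build an IAM policy binding for the auto-detected service account."""
    member = (
        f"serviceAccount:service-{project_number}"
        "@gcp-sa-datafusion.iam.gserviceaccount.com"
    )
    # Assumption: DLP Administrator maps to the role ID roles/dlp.admin.
    return {"role": "roles/dlp.admin", "members": [member]}
```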

Custom DLP Endpoint

By default, the plugin uses the Cloud DLP API endpoint. To use an alternate DLP API location, enable the Custom DLP endpoint switch under Advanced settings. This displays the custom endpoint settings, which by default point to a local DLP API address:

  • Host - dlp.local by default - default hostname for a locally-exposed DLP API

  • Port - 7332 by default - alternate port where DLP API is exposed

  • Send credentials - No by default - must be enabled if you are accessing the Cloud DLP API through a proxy; it is optional if you are using a local instance of DLP
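
The way these settings combine can be sketched as follows. The `EndpointSettings` class and `resolve_endpoint` function are hypothetical and only mirror the defaults listed above. The sketch also assumes that the public Cloud DLP endpoint is `dlp.googleapis.com:443` and that a valid port lies between 0 and 65535.

```python
from dataclasses import dataclass

# Assumption: public Cloud DLP API endpoint used when no custom endpoint is set.
CLOUD_DLP_ENDPOINT = "dlp.googleapis.com:443"


@dataclass
class EndpointSettings:
    use_custom_endpoint: bool = False
    host: str = "dlp.local"
    port: int = 7332
    send_credentials: bool = False


def resolve_endpoint(settings: EndpointSettings) -> str:
    """Return the host:port string that requests would be sent to."""
    if not settings.use_custom_endpoint:
        return CLOUD_DLP_ENDPOINT
    if not 0 <= settings.port <= 65535:
        raise ValueError(f"port out of range: {settings.port}")
    return f"{settings.host}:{settings.port}"
```

For example, `resolve_endpoint(EndpointSettings(use_custom_endpoint=True))` yields `dlp.local:7332`, the local defaults.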

Metrics

This plugin records three metrics:

  • dlp.requests.count: Total number of requests sent to the Data Loss Prevention API

  • dlp.requests.success: Number of requests that were successfully processed by the Data Loss Prevention API

  • dlp.requests.fail: Number of requests that failed
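
The relationship between the three counters can be sketched with a small wrapper. This is an illustration rather than the plugin's code: `tracked_request` and the `send` callable are hypothetical, and the sketch assumes every request is counted once and then recorded as either a success or a failure.

```python
from collections import Counter
from typing import Callable

metrics: Counter = Counter()


def tracked_request(send: Callable[[], object]) -> object:
    """Run one request and update the three counters."""
    metrics["dlp.requests.count"] += 1
    try:
        result = send()
    except Exception:
        metrics["dlp.requests.fail"] += 1
        raise
    metrics["dlp.requests.success"] += 1
    return result
```

Under this assumption, dlp.requests.count equals the sum of dlp.requests.success and dlp.requests.fail.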

Custom Template Path

Use this option to specify a custom template path when the template is located in a project other than the one specified in Project Id.
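
The exact format the plugin expects for this path is not stated here. As a hedged illustration, DLP inspection templates are addressed by resource names of the form `projects/<project>/inspectTemplates/<template>`, so a template in another project could be named as in the sketch below. The helper `inspect_template_name` is hypothetical.

```python
def inspect_template_name(project_id: str, template_id: str) -> str:
    """Build a DLP inspection template resource name for a given project."""
    # Assumption: the custom template path follows the DLP resource name format.
    return f"projects/{project_id}/inspectTemplates/{template_id}"
```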

Configuration

| Property | Macro Enabled? | Version Introduced | Description |
|---|---|---|---|
| Filter on | Yes | | Required. Checks the full record or a field. Default is Record. |
| Field | Yes | | Optional. Name of the field to be inspected. |
| Use custom template | No | 6.7.0/1.3.0 | Required. Enabling this option allows you to define a custom DLP inspection template to use for matching during the transformation. Default is No. |
| Template ID | Yes | | Required. ID of the inspection template found in DLP. |
| Custom Template Path | Yes | 6.7.0/1.3.0 | Optional. Custom template path of the DLP inspection template. |
| Error Handling: On error | Yes | | Required. Error handling of records. Default is Stop pipeline. |
| Service Account Path | Yes | | Optional. Path on the local file system of the service account key used for authorization. Can be set to auto-detect when running on a Dataproc cluster. When running on other clusters, the file must be present on every node in the cluster. Default is auto-detect. |
| Project Id | Yes | | Optional. Google Cloud project ID, which uniquely identifies a project. It can be found on the Dashboard in the Google Cloud Platform console. Default is auto-detect. |
| Custom DLP endpoint | Yes | | Use a custom DLP endpoint. Default is No. |
| Host | Yes | | Optional. DLP host, for example dlp.googleapis.com or dlp.local. Default is dlp.local. |
| Port | Yes | | Optional. DLP port number between 0 and 65535. Default is 7332. |
| Send Credentials | Yes | | Enable this option if you are accessing the Cloud DLP API through a proxy. It is optional if you are using a local instance of DLP. Default is No. |

Created in 2020 by Google Inc.