The Google Cloud Data Loss Prevention (DLP) PII Filter transformation is available in the Hub.

Note: This plugin uses the Data Loss Prevention APIs, which charge the user based on the volume of data analyzed. More details on the exact costs can be found here.

Plugin version: 1.4.0

This plugin separates sensitive records from the input stream. A record is deemed sensitive if it matches a user-defined template. More info on creating templates can be found here.

The matching can be applied to the entire record or to a particular field. Filtering on a field is recommended when the entire record is large, because DLP supports a maximum of 0.5 MB per record.
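
The size limit can be checked ahead of time to decide whether to filter on the whole record or on a single field. The following is a minimal sketch, not part of the plugin: the helper names are hypothetical, the record is assumed to be serialized as UTF-8 JSON, and 0.5 MB is assumed to mean 512 * 1024 bytes (the exact byte count DLP enforces is not stated on this page).

```python
import json

# Assumption: 0.5 MB is read as 512 KiB; the limit DLP actually enforces may differ.
MAX_DLP_RECORD_BYTES = 512 * 1024


def record_size_bytes(record: dict) -> int:
    """Return the size of the record when serialized as UTF-8 JSON (an assumed encoding)."""
    return len(json.dumps(record).encode("utf-8"))


def suggest_filter_on(record: dict) -> str:
    """Suggest 'Record' when the whole record fits under the limit, otherwise 'Field'."""
    if record_size_bytes(record) <= MAX_DLP_RECORD_BYTES:
        return "Record"
    return "Field"


small = {"name": "Ada", "email": "ada@example.com"}
large = {"name": "Ada", "notes": "x" * (600 * 1024)}

print(suggest_filter_on(small))
print(suggest_filter_on(large))
```

A check like this is only a rough guide, since the size the plugin sends to DLP depends on how it serializes the record.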

There are three options for error handling in this plugin:

Permissions

This plugin requires permission to access the Data Loss Prevention APIs. These permissions are granted through the service account that is provided in the plugin configuration. If the service account path is set to auto-detect, the plugin uses the service account named service-<project-number>@gcp-sa-datafusion.iam.gserviceaccount.com.

The DLP Administrator role must be granted to the service account to allow this plugin to access the DLP APIs.
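
As an illustration of the two points above, the sketch below builds the auto-detected service account name from a project number and the IAM binding that would grant it the DLP Administrator role (role ID roles/dlp.admin). The function names and the project number are hypothetical; the sketch only constructs the values and does not call any Google Cloud API.

```python
def default_service_account(project_number: str) -> str:
    """Build the service account name used when the path is set to auto-detect."""
    return f"service-{project_number}@gcp-sa-datafusion.iam.gserviceaccount.com"


def dlp_admin_binding(service_account_email: str) -> dict:
    """Return an IAM policy binding that grants the DLP Administrator role."""
    return {
        "role": "roles/dlp.admin",
        "members": [f"serviceAccount:{service_account_email}"],
    }


# Hypothetical project number, for illustration only.
email = default_service_account("123456789012")
binding = dlp_admin_binding(email)
print(email)
print(binding)
```

The resulting binding would still have to be added to the project's IAM policy, for example through the Google Cloud console.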

Custom DLP Endpoint

By default, the plugin uses the Cloud DLP API endpoint. To use an alternate DLP API location, enable the Custom DLP endpoint switch under Advanced settings. This displays the custom endpoint settings, which by default point to a local DLP API address (host dlp.local, port 7332).
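
A minimal sketch of how the custom endpoint settings combine into an address, using the default host and port listed in the Configuration table. The function name and the host:port string format are assumptions made for illustration; the plugin's internal handling may differ.

```python
# Defaults taken from the Configuration table on this page.
DEFAULT_CUSTOM_HOST = "dlp.local"
DEFAULT_CUSTOM_PORT = 7332


def custom_endpoint(host: str = DEFAULT_CUSTOM_HOST, port: int = DEFAULT_CUSTOM_PORT) -> str:
    """Validate the port range and return the endpoint as 'host:port' (assumed format)."""
    if not 0 <= port <= 65535:
        raise ValueError(f"Port must be between 0 and 65535, got {port}")
    return f"{host}:{port}"


print(custom_endpoint())
print(custom_endpoint("dlp.example.internal", 8443))  # hypothetical proxy address
```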

Metrics

This plugin records three metrics:

Custom Template Path

This option lets you use a custom template path that points to a template located in a project other than the one specified in Project Id.
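
For reference, Cloud DLP identifies inspection templates by resource names of the form projects/<project>/inspectTemplates/<template-id>, optionally with a locations/<location> segment. The sketch below builds such a name for a template in another project. Whether the plugin expects exactly this string in Custom Template Path is an assumption, and the helper name and sample IDs are hypothetical.

```python
from typing import Optional


def inspect_template_path(project_id: str, template_id: str, location: Optional[str] = None) -> str:
    """Build a DLP inspection template resource name, with an optional location segment."""
    if location:
        return f"projects/{project_id}/locations/{location}/inspectTemplates/{template_id}"
    return f"projects/{project_id}/inspectTemplates/{template_id}"


# Hypothetical project and template IDs, for illustration only.
print(inspect_template_path("other-project", "pii-template"))
print(inspect_template_path("other-project", "pii-template", location="us-central1"))
```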

Configuration

| Property | Macro Enabled? | Version Introduced | Description |
| --- | --- | --- | --- |
| Filter on | Yes | | Required. Checks the full record or a single field. Default is Record. |
| Field | Yes | | Optional. Name of the field to be inspected. |
| Use custom template | No | 6.7.0/1.3.0 | Required. Enabling this option allows you to define a custom DLP inspection template to use for matching during the transformation. Default is No. |
| Template ID | Yes | | Required. ID of the inspection template found in DLP. |
| Custom Template Path | Yes | 6.7.0/1.3.0 | Optional. Custom template path of the DLP inspection template. |
| Error Handling: On error | Yes | | Required. Error handling of records. Default is Stop pipeline. |
| Service Account Path | Yes | | Optional. Path on the local file system of the service account key used for authorization. Can be set to auto-detect when running on a Dataproc cluster. When running on other clusters, the file must be present on every node in the cluster. Default is auto-detect. |
| Project Id | Yes | | Optional. Google Cloud project ID, which uniquely identifies a project. It can be found on the Dashboard in the Google Cloud Platform console. Default is auto-detect. |
| Custom DLP endpoint | Yes | | Use a custom DLP endpoint. Default is No. |
| Host | Yes | | Optional. DLP host, for example dlp.googleapis.com or dlp.local. Default is dlp.local. |
| Port | Yes | | Optional. DLP port number between 0 and 65535. Default is 7332. |
| Send Credentials | Yes | | Enable Send Credentials if you access the Cloud DLP API through a proxy. It is optional if you use a local instance of DLP. Default is No. |
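
The defaults and required properties listed above can be summarized in a short sketch. The dictionary keys reuse the display names from this page, not the plugin's internal property names (which are not given here), and the resolve_config helper is hypothetical.

```python
# Defaults as documented on this page; keys are display names, not internal property names.
DEFAULTS = {
    "Filter on": "Record",
    "Use custom template": "No",
    "Error Handling: On error": "Stop pipeline",
    "Service Account Path": "auto-detect",
    "Project Id": "auto-detect",
    "Custom DLP endpoint": "No",
    "Host": "dlp.local",
    "Port": 7332,
    "Send Credentials": "No",
}

# Properties marked Required on this page.
REQUIRED = ("Filter on", "Use custom template", "Template ID", "Error Handling: On error")


def resolve_config(overrides: dict) -> dict:
    """Merge user settings over the defaults and check that required properties are set."""
    config = {**DEFAULTS, **overrides}
    missing = [name for name in REQUIRED if not config.get(name)]
    if missing:
        raise ValueError(f"Missing required properties: {', '.join(missing)}")
    return config


# Hypothetical template ID, for illustration only.
config = resolve_config({"Template ID": "pii-template"})
print(config["Error Handling: On error"])
print(config["Host"], config["Port"])
```

Only Template ID has no documented default among the required properties, so it is the one value a user must always supply in this sketch.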