Google Cloud Data Loss Prevention (DLP) PII Filter Transformation
The Google Cloud Data Loss Prevention (DLP) PII Filter transformation is available in the Hub.
This plugin uses the Data Loss Prevention APIs, which charge the user based on the volume of data analyzed. More details on the exact costs can be found in the Cloud DLP pricing documentation.
Plugin version: 1.4.0
This plugin separates sensitive records from the input stream. A record is deemed sensitive if it matches a user-defined template. More information on creating templates can be found in the Cloud DLP documentation.
Matching can be applied to the entire record or to a particular field. Filtering on a field is recommended when the entire record is large, because DLP supports a maximum of 0.5 MB per record.
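The size limit can be checked before choosing between Record and Field. The following is a minimal sketch, not part of the plugin: it assumes that 0.5 MB means 512 KiB and that the size of a record is approximated by its UTF-8 JSON encoding. The helper names are hypothetical.

```python
import json

# Assumption: "0.5 MB" is read as 512 KiB; the plugin docs give no byte count.
MAX_RECORD_BYTES = 512 * 1024


def record_size_bytes(record: dict) -> int:
    """Approximate a record's size as the length of its UTF-8 JSON encoding."""
    return len(json.dumps(record, ensure_ascii=False).encode("utf-8"))


def should_filter_on_field(record: dict) -> bool:
    """Return True when the whole record exceeds the assumed DLP size limit."""
    return record_size_bytes(record) > MAX_RECORD_BYTES


small = {"name": "Ada", "note": "short"}
large = {"name": "Ada", "note": "x" * (600 * 1024)}

print(should_filter_on_field(small))
print(should_filter_on_field(large))
```

If records regularly exceed the limit, setting Filter on to Field and naming the field that carries the sensitive text keeps each request small.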
There are three options for error handling in this plugin:
Stop pipeline: Stops the pipeline as soon as an error is encountered.
Skip record: Skips the record that caused the error and reports no error.
Send to error: Sends errors to the error port and continues running the pipeline.
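The three options can be illustrated with a small simulation. This is a sketch of the behavior described above, not the plugin's implementation: the mode strings, `run_with_error_handling`, and `fake_inspect` are hypothetical names introduced for illustration.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical mode names standing in for the three options.
MODES = {"stop-pipeline", "skip-record", "send-to-error"}


def run_with_error_handling(
    records: Iterable[dict],
    inspect: Callable[[dict], dict],
    on_error: str = "stop-pipeline",
) -> Tuple[List[dict], List[Tuple[dict, str]]]:
    """Apply `inspect` to each record and handle failures per `on_error`.

    Returns (processed records, records routed to the error port).
    """
    if on_error not in MODES:
        raise ValueError(f"unknown on_error mode: {on_error}")
    output: List[dict] = []
    errors: List[Tuple[dict, str]] = []
    for record in records:
        try:
            output.append(inspect(record))
        except Exception as exc:
            if on_error == "stop-pipeline":
                raise  # abort on the first error
            if on_error == "send-to-error":
                errors.append((record, str(exc)))  # emit and keep going
            # "skip-record": drop the record and report nothing
    return output, errors


def fake_inspect(record: dict) -> dict:
    # Hypothetical stand-in for a DLP call; fails when "body" is missing.
    if "body" not in record:
        raise KeyError("body")
    return record


rows = [{"body": "a"}, {"id": 2}, {"body": "c"}]
out, errs = run_with_error_handling(rows, fake_inspect, "send-to-error")
print(len(out), len(errs))
```

With `"send-to-error"` the failing record is collected separately and the remaining records still pass through; with `"skip-record"` it disappears without a trace, which is worth keeping in mind when auditing record counts.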
Permissions
For this plugin to function, it requires permission to access the Data Loss Prevention APIs. These permissions are granted through the service account that is provided in the plugin configuration. If the service account path is set to auto-detect, the plugin uses the service account named service-<project-number>@gcp-sa-datafusion.iam.gserviceaccount.com.
The DLP Administrator role must be granted to the service account to allow this plugin to access the DLP APIs.
Custom DLP Endpoint
By default, the plugin uses the Cloud DLP API endpoint. To use an alternate DLP API location, enable the Custom DLP endpoint switch under Advanced settings. This displays the custom endpoint settings, which by default point to a local DLP API address:
Host: dlp.local by default. The default hostname for a locally exposed DLP API.
Port: 7332 by default. The alternate port where the DLP API is exposed.
Send credentials: No by default. You must enable Send credentials if you are accessing the Cloud DLP API through a proxy; it is optional if you are using a local instance of DLP.
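The custom endpoint settings can be modeled as a small value object. This is an illustrative sketch only: `DlpEndpoint` and its `address` property are hypothetical and do not belong to the plugin. The defaults mirror the values listed above, and the port check uses the 0 to 65535 range documented for the Port property.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DlpEndpoint:
    """Hypothetical holder for the custom DLP endpoint settings."""

    host: str = "dlp.local"        # documented default host
    port: int = 7332               # documented default port
    send_credentials: bool = False  # documented default is No

    def __post_init__(self) -> None:
        if not self.host:
            raise ValueError("host must not be empty")
        if not 0 <= self.port <= 65535:
            raise ValueError(f"port out of range: {self.port}")

    @property
    def address(self) -> str:
        """Return the endpoint as host:port."""
        return f"{self.host}:{self.port}"


print(DlpEndpoint().address)  # dlp.local:7332
```

When the endpoint is a proxy in front of the Cloud DLP API, the equivalent of `send_credentials=True` is needed so that requests are still authenticated.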
Metrics
This plugin records three metrics:
dlp.requests.count: Total number of requests sent to the Data Loss Prevention API
dlp.requests.success: Number of requests that were successfully processed by the Data Loss Prevention API
dlp.requests.fail: Number of requests that failed
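The relationship between the three counters can be sketched as follows. This is not the plugin's code: `tracked_request` is a hypothetical wrapper, and the sketch assumes every request ends as either a success or a failure, so that count equals success plus fail.

```python
from collections import Counter
from typing import Callable

metrics: Counter = Counter()


def tracked_request(send: Callable[[], object]) -> object:
    """Run one request and update the three counters."""
    metrics["dlp.requests.count"] += 1
    try:
        result = send()
    except Exception:
        metrics["dlp.requests.fail"] += 1
        raise
    metrics["dlp.requests.success"] += 1
    return result


def failing_request() -> object:
    # Hypothetical request that always fails.
    raise RuntimeError("simulated DLP failure")


tracked_request(lambda: "ok")
try:
    tracked_request(failing_request)
except RuntimeError:
    pass

print(dict(metrics))
```

Under this assumption, a growing gap between `dlp.requests.count` and `dlp.requests.success` points directly at `dlp.requests.fail`.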
Custom Template Path
This option lets you specify a custom template path for an inspection template that is located in a project other than the one specified in Project Id.
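Cloud DLP identifies inspection templates by resource name, in the form `projects/<project>/inspectTemplates/<template-id>`, optionally with a `locations/<location>` segment. The sketch below builds such a name for a template in another project. Whether the plugin expects exactly this string in Custom Template Path is an assumption, and `inspect_template_name` is a hypothetical helper.

```python
from typing import Optional


def inspect_template_name(
    project_id: str, template_id: str, location: Optional[str] = None
) -> str:
    """Build a DLP inspection template resource name."""
    parent = f"projects/{project_id}"
    if location:
        parent += f"/locations/{location}"
    return f"{parent}/inspectTemplates/{template_id}"


# "other-project" and "pii-template" are placeholder values.
print(inspect_template_name("other-project", "pii-template"))
```

The service account used by the plugin still needs access to the DLP APIs in the project that owns the template.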
Configuration
Property | Macro Enabled? | Version Introduced | Description |
---|---|---|---|
Filter on | Yes | | Required. Checks the full record or a single field. Default is Record. |
Field | Yes | | Optional. Name of the field to be inspected. |
Use custom template | No | 6.7.0/1.3.0 | Required. Enabling this option allows you to define a custom DLP inspection template to use for matching during the transformation. Default is No. |
Template ID | Yes | | Required. ID of the inspection template found in DLP. |
Custom Template Path | Yes | 6.7.0/1.3.0 | Optional. Custom template path of the DLP inspection template. |
Error Handling: On error | Yes | | Required. Error handling of records. Default is Stop pipeline. |
Service Account Path | Yes | | Optional. Path on the local file system of the service account key used for authorization. Can be set to auto-detect when running on a Dataproc cluster. When running on other clusters, the file must be present on every node in the cluster. Default is auto-detect. |
Project Id | Yes | | Optional. Google Cloud project ID, which uniquely identifies a project. It can be found on the Dashboard in the Google Cloud Platform console. Default is auto-detect. |
Custom DLP endpoint | Yes | | Use a custom DLP endpoint. Default is No. |
Host | Yes | | Optional. DLP host, for example dlp.googleapis.com or dlp.local. Default is dlp.local. |
Port | Yes | | Optional. DLP port number between 0 and 65535. Default is 7332. |
Send Credentials | Yes | | Enable Send Credentials if you are accessing the Cloud DLP API through a proxy. It is optional if you are using a local instance of DLP. Default is No. |
Created in 2020 by Google Inc.