The Google Cloud Data Loss Prevention (DLP) Redact transformation is available in the Hub.

Note: This plugin uses Google’s Data Loss Prevention APIs, which charge based on the volume of data analyzed (not transformed). More details on the exact costs can be found here.


Plugin version: 1.4.0

This plugin transforms sensitive records from the input stream. A record is deemed sensitive if it matches one or more pre-defined DLP filters or a custom, user-defined template. See the DLP Filter Mapping section for details on the supported pre-defined filters. More information about custom templates can be found here.
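The match-then-transform flow described above can be illustrated locally. The sketch below is a stand-in only: it uses two hypothetical regular expressions in place of DLP's infoType detectors and applies simple character masking to a chosen list of fields. The plugin itself delegates detection and transformation to the DLP API; the names `PATTERNS` and `mask_record` are illustrative and not part of the plugin.

```python
import re
from typing import Dict, List

# Hypothetical regex stand-ins for two DLP infoTypes. Real DLP detectors are
# far more thorough; these exist only to keep the example self-contained.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US_SOCIAL_SECURITY_NUMBER": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_record(record: Dict[str, object], fields: List[str],
                mask_char: str = "#") -> Dict[str, object]:
    """Return a copy of record with sensitive matches in `fields` masked."""
    result = dict(record)
    for field in fields:
        value = result.get(field)
        if not isinstance(value, str):
            continue  # only string fields are inspected in this sketch
        for pattern in PATTERNS.values():
            # Replace each match with mask_char repeated to the match length.
            value = pattern.sub(lambda m: mask_char * len(m.group()), value)
        result[field] = value
    return result


if __name__ == "__main__":
    row = {"id": 7, "note": "Contact jane@example.com, SSN 123-45-6789"}
    print(mask_record(row, ["note"]))
```

Fields not listed in `fields` pass through untouched, mirroring how only the configured fields are transformed.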

This plugin currently supports the 5 most commonly used DLP transformations:

Permissions

In order for this plugin to function, it requires permissions to access the Data Loss Prevention APIs. These permissions are granted through the service account provided in the plugin configuration. If the service account path is set to auto-detect, the plugin uses the service account named service-<project-number>@gcp-sa-datafusion.iam.gserviceaccount.com.

The DLP Administrator role must be granted to the service account to allow this plugin to access the DLP APIs.
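As a sketch of how the auto-detected service account name is formed and how the DLP Administrator role (`roles/dlp.admin`) might be granted to it, the snippet below builds a `gcloud projects add-iam-policy-binding` command as an argument list without running it. The project ID and project number are hypothetical placeholders; substitute your own values and review the command before executing it.

```python
import shlex
from typing import List


def datafusion_service_account(project_number: str) -> str:
    """Service account used when Service Account Path is set to auto-detect."""
    return f"service-{project_number}@gcp-sa-datafusion.iam.gserviceaccount.com"


def grant_dlp_admin_command(project_id: str, project_number: str) -> List[str]:
    """Build (but do not run) a gcloud command granting DLP Administrator."""
    member = f"serviceAccount:{datafusion_service_account(project_number)}"
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member={member}",
        "--role=roles/dlp.admin",
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(shlex.join(grant_dlp_admin_command("my-project", "123456789012")))
```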

When using Deterministic Encryption with a KMS-wrapped key, the Cloud KMS CryptoKey Encrypter/Decrypter role must be granted to the Cloud Data Loss Prevention Service Agent.
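A minimal sketch of granting that role on a single key follows. It assumes the DLP service agent uses the email format service-<project-number>@dlp-api.iam.gserviceaccount.com (verify this on your project's IAM page) and builds a `gcloud kms keys add-iam-policy-binding` command for `roles/cloudkms.cryptoKeyEncrypterDecrypter` without running it. The key, key ring, location, and project number are hypothetical placeholders.

```python
import shlex
from typing import List


def dlp_service_agent(project_number: str) -> str:
    # Assumed email format of the Cloud DLP service agent; confirm in IAM.
    return f"service-{project_number}@dlp-api.iam.gserviceaccount.com"


def grant_kms_command(project_number: str, key: str, keyring: str,
                      location: str) -> List[str]:
    """Build (but do not run) a gcloud command granting Encrypter/Decrypter on one key."""
    return [
        "gcloud", "kms", "keys", "add-iam-policy-binding", key,
        f"--keyring={keyring}",
        f"--location={location}",
        f"--member=serviceAccount:{dlp_service_agent(project_number)}",
        "--role=roles/cloudkms.cryptoKeyEncrypterDecrypter",
    ]


if __name__ == "__main__":
    # Hypothetical key and project values for illustration only.
    print(shlex.join(grant_kms_command("123456789012", "dlp-key", "dlp-ring", "global")))
```

Granting the role on the individual key, rather than project-wide, keeps the service agent's access limited to the key used for wrapping.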

DLP Filter Mapping

This plugin supports most pre-defined DLP filters, which are grouped into broader categories for ease of use. The contents of each group are as follows:

Metrics

This plugin records three metrics:

Configuration

| Property | Macro Enabled? | Version Introduced | Description |
| --- | --- | --- | --- |
| Use custom template | No | | Required. Enabling this option allows you to define a custom DLP Inspection Template to use for matching during the transformation. Default is No. |
| Template ID | Yes | | Optional. ID of the Inspection Template found in DLP. |
| Custom Template Path | Yes | 6.7.0/1.3.0 | Optional. Custom path of the DLP Inspection Template. |
| Resource Location | Yes | 6.7.0/1.3.0 | Optional. Use this property to specify the resource location for the DLP Service. For more information, see https://cloud.google.com/dlp/docs/specifying-location. Default is global. |
| Fields to Transform | Yes | | Required. The rules that determine which fields are transformed, along with the configuration for each transform. |
| Service Account Path | Yes | | Optional. Path on the local file system of the service account key used for authorization. Can be set to auto-detect when running on a Dataproc cluster. When running on other clusters, the file must be present on every node in the cluster. Default is auto-detect. |
| Project Id | Yes | | Optional. Google Cloud project ID, which uniquely identifies a project. It can be found on the Dashboard in the Google Cloud Platform console. Default is auto-detect. |
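To see how Project Id, Resource Location, and Template ID relate, the sketch below builds DLP resource names in the documented `projects/{project}/locations/{location}/inspectTemplates/{template_id}` form. How the plugin itself combines these properties is an assumption here; in particular, the hypothetical `custom_path` argument models Custom Template Path as being used verbatim when set.

```python
from typing import Optional


def dlp_parent(project_id: str, location: str = "global") -> str:
    """Parent resource name for DLP requests in a given location."""
    return f"projects/{project_id}/locations/{location}"


def inspect_template_name(project_id: str, template_id: str,
                          location: str = "global",
                          custom_path: Optional[str] = None) -> str:
    """Full inspect-template name; a custom path (assumed) wins when provided."""
    if custom_path:
        return custom_path
    return f"{dlp_parent(project_id, location)}/inspectTemplates/{template_id}"


if __name__ == "__main__":
    # Hypothetical project and template IDs for illustration only.
    print(inspect_template_name("my-project", "pii-template", "us-central1"))
```

Keeping the location in the template name matters because a template created in one location cannot be referenced from requests sent to another.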