Google Cloud Data Loss Prevention (DLP) Redact Transformation
The Google Cloud Data Loss Prevention (DLP) Redact transformation is available in the Hub.
This plugin uses Google’s Data Loss Prevention APIs, which charge the user based on the volume of data analyzed (not the volume transformed). Details on the exact costs can be found on the DLP pricing page.
Plugin version: 1.4.0
This plugin transforms sensitive records from the input stream. A record is considered sensitive if it matches one or more of the pre-defined DLP filters or a custom user-defined template. See the DLP Filter Mapping section for details on the supported pre-defined filters. More information about custom templates can be found in the DLP documentation on inspection templates.
This plugin currently supports the following six commonly used DLP transformations:
Date Shift: Apply a random shift to a date/timestamp value (supported types: date, timestamp)
Masking: Mask sensitive text by replacing characters with the Masking Character (supported types: string)
One-way Hash: Apply a one-way cryptographic hash function to the data (supported types: all)
Redact: Remove sensitive text from the record (supported types: string)
Replace with value: Replace sensitive text with a new value (supported types: string)
Format Preserving Encryption: Replace sensitive text with a format-preserving encrypted value. The value can be decrypted using the Decrypt plugin (supported types: string)
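To make the differences between these transformations concrete, the following Python sketch reproduces their effect on plain values locally. It is an illustration under stated assumptions, not how the plugin or the DLP service implements them: the function names, the HMAC-SHA-256 stand-in for the one-way hash, the key, the `*` masking character, and the day range for the date shift are all hypothetical. Format Preserving Encryption is omitted because it depends on key-management setup that a local sketch cannot represent faithfully.

```python
import datetime
import hashlib
import hmac
import random

# Hypothetical local illustrations of each transformation's effect.
# None of these functions call the Data Loss Prevention API.


def mask(value: str, masking_char: str = "*") -> str:
    """Masking: replace every character of the value with the masking character."""
    return masking_char * len(value)


def redact(text: str, finding: str) -> str:
    """Redact: remove every occurrence of the sensitive substring."""
    return text.replace(finding, "")


def replace_with_value(text: str, finding: str, new_value: str) -> str:
    """Replace with value: substitute a fixed value for the sensitive substring."""
    return text.replace(finding, new_value)


def one_way_hash(value: str, key: bytes) -> str:
    """One-way hash: keyed SHA-256 (HMAC) hex digest; the input cannot be recovered."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def date_shift(value: datetime.date, lower_days: int, upper_days: int,
               rng: random.Random) -> datetime.date:
    """Date shift: move the date by a random whole number of days in [lower_days, upper_days]."""
    return value + datetime.timedelta(days=rng.randint(lower_days, upper_days))


if __name__ == "__main__":
    record = "Call Alice at 555-0100"
    print(mask("555-0100"))                                   # ********
    print(repr(redact(record, "555-0100")))                   # 'Call Alice at '
    print(replace_with_value(record, "555-0100", "[PHONE]"))  # Call Alice at [PHONE]
    print(one_way_hash("555-0100", key=b"hypothetical-key"))  # 64 hex characters
    print(date_shift(datetime.date(2020, 1, 15), -30, 30, random.Random(42)))
```

Note that the one-way hash is deterministic for a given key, so equal inputs still map to equal outputs, which keeps joins on hashed columns possible.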
Permissions
For this plugin to function, it needs permission to access the Data Loss Prevention APIs. These permissions are granted through the service account provided in the plugin configuration. If the service account path is set to auto-detect, the plugin uses the service account named service-<project-number>@gcp-sa-datafusion.iam.gserviceaccount.com.
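As a small illustration of the naming rule above, this Python sketch derives the auto-detected service account email from a project number. The helper name is hypothetical, and it only formats a string; it does not look up or verify that the account exists.

```python
def auto_detect_service_account(project_number: int) -> str:
    """Return the service account email used when the service account path is auto-detect."""
    # The project number is numeric and distinct from the project ID string.
    if project_number <= 0:
        raise ValueError("project_number must be a positive integer")
    return f"service-{project_number}@gcp-sa-datafusion.iam.gserviceaccount.com"


if __name__ == "__main__":
    print(auto_detect_service_account(123456789012))
    # service-123456789012@gcp-sa-datafusion.iam.gserviceaccount.com
```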
The DLP Administrator role must be granted to the service account to allow this plugin to access the DLP APIs.
When using Deterministic Encryption with a KMS Wrapped Key, the Cloud KMS CryptoKey Encrypter/Decrypter role must be granted to the Cloud Data Loss Prevention Service Agent.
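The role grants described above can be expressed as IAM policy bindings. This Python sketch builds them as plain dictionaries in the `{"role": ..., "members": [...]}` shape used by IAM policies, assuming the auto-detected service account. The role IDs `roles/dlp.admin` and `roles/cloudkms.cryptoKeyEncrypterDecrypter` correspond to DLP Administrator and Cloud KMS CryptoKey Encrypter/Decrypter. The DLP service agent address format `service-<project-number>@dlp-api.iam.gserviceaccount.com` is an assumption you should verify on your project's IAM page, and the function name is hypothetical. The sketch only builds data; it does not apply any policy.

```python
from typing import Dict, List


def required_bindings(project_number: int, uses_kms_wrapped_key: bool) -> List[Dict[str, object]]:
    """Build the IAM bindings this plugin needs (hypothetical helper; nothing is applied)."""
    plugin_sa = f"service-{project_number}@gcp-sa-datafusion.iam.gserviceaccount.com"
    bindings: List[Dict[str, object]] = [
        # DLP Administrator for the auto-detected service account the plugin runs as.
        {"role": "roles/dlp.admin", "members": [f"serviceAccount:{plugin_sa}"]},
    ]
    if uses_kms_wrapped_key:
        # ASSUMPTION: address format of the Cloud DLP service agent.
        dlp_agent = f"service-{project_number}@dlp-api.iam.gserviceaccount.com"
        bindings.append({
            "role": "roles/cloudkms.cryptoKeyEncrypterDecrypter",
            "members": [f"serviceAccount:{dlp_agent}"],
        })
    return bindings


if __name__ == "__main__":
    import json
    print(json.dumps(required_bindings(123456789012, uses_kms_wrapped_key=True), indent=2))
```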
DLP Filter Mapping
This plugin supports most pre-defined DLP filters, grouped into broader categories for ease of use. The contents of each group are as follows:
Everything: Applies the transformation to the entire field; no inspection or filtering is applied
Demographic: PERSON_NAME, AGE, DATE_OF_BIRTH, PHONE_NUMBER, ETHNIC_GROUP
Location: LOCATION, MAC_ADDRESS, MAC_ADDRESS_LOCAL
Tax IDs: AUSTRALIA_TAX_FILE_NUMBER, DENMARK_CPR_NUMBER, NORWAY_NI_NUMBER, PORTUGAL_CDC_NUMBER, US_ADOPTION_TAXPAYER_IDENTIFICATION_NUMBER, US_EMPLOYER_IDENTIFICATION_NUMBER, US_PREPARER_TAXPAYER_IDENTIFICATION_NUMBER
Credit Card Numbers: CREDIT_CARD_NUMBER
Passport Numbers: NETHERLANDS_PASSPORT
Health IDs: US_HEALTHCARE_NPI, CANADA_OHIP
National IDs: CHINA_RESIDENT_ID_NUMBER, DENMARK_CPR_NUMBER, FRANCE_CNI, FRANCE_NIR, FINLAND_NATIONAL_ID_NUMBER, JAPAN_INDIVIDUAL_NUMBER, NORWAY_NI_NUMBER, PARAGUAY_CIC_NUMBER, POLAND_PESEL_NUMBER, POLAND_NATIONAL_ID_NUMBER, PORTUGAL_CDC_NUMBER, SPAIN_NIE_NUMBER, SPAIN_NIF_NUMBER, SWEDEN_NATIONAL_ID_NUMBER, US_SOCIAL_SECURITY_NUMBER, URUGUAY_CDI_NUMBER, VENEZUELA_CDI_NUMBER
Driver License IDs: SPAIN_DRIVERS_LICENSE_NUMBER, US_DRIVERS_LICENSE_NUMBER
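Some infoTypes belong to more than one group (DENMARK_CPR_NUMBER, NORWAY_NI_NUMBER and PORTUGAL_CDC_NUMBER appear under both Tax IDs and National IDs), so selecting several groups can produce overlapping lists. The Python sketch below transcribes the mapping and merges the selected groups without duplicates. The dictionary and function names are hypothetical and do not reflect the plugin's internal code.

```python
from typing import Dict, Iterable, List, Optional

# Category -> DLP infoType names, transcribed from the DLP Filter Mapping list.
# "Everything" is handled separately because it applies no inspection at all.
FILTER_GROUPS: Dict[str, List[str]] = {
    "Demographic": ["PERSON_NAME", "AGE", "DATE_OF_BIRTH", "PHONE_NUMBER", "ETHNIC_GROUP"],
    "Location": ["LOCATION", "MAC_ADDRESS", "MAC_ADDRESS_LOCAL"],
    "Tax IDs": [
        "AUSTRALIA_TAX_FILE_NUMBER", "DENMARK_CPR_NUMBER", "NORWAY_NI_NUMBER",
        "PORTUGAL_CDC_NUMBER", "US_ADOPTION_TAXPAYER_IDENTIFICATION_NUMBER",
        "US_EMPLOYER_IDENTIFICATION_NUMBER", "US_PREPARER_TAXPAYER_IDENTIFICATION_NUMBER",
    ],
    "Credit Card Numbers": ["CREDIT_CARD_NUMBER"],
    "Passport Numbers": ["NETHERLANDS_PASSPORT"],
    "Health IDs": ["US_HEALTHCARE_NPI", "CANADA_OHIP"],
    "National IDs": [
        "CHINA_RESIDENT_ID_NUMBER", "DENMARK_CPR_NUMBER", "FRANCE_CNI", "FRANCE_NIR",
        "FINLAND_NATIONAL_ID_NUMBER", "JAPAN_INDIVIDUAL_NUMBER", "NORWAY_NI_NUMBER",
        "PARAGUAY_CIC_NUMBER", "POLAND_PESEL_NUMBER", "POLAND_NATIONAL_ID_NUMBER",
        "PORTUGAL_CDC_NUMBER", "SPAIN_NIE_NUMBER", "SPAIN_NIF_NUMBER",
        "SWEDEN_NATIONAL_ID_NUMBER", "US_SOCIAL_SECURITY_NUMBER", "URUGUAY_CDI_NUMBER",
        "VENEZUELA_CDI_NUMBER",
    ],
    "Driver License IDs": ["SPAIN_DRIVERS_LICENSE_NUMBER", "US_DRIVERS_LICENSE_NUMBER"],
}


def info_types_for(categories: Iterable[str]) -> Optional[List[str]]:
    """Return the de-duplicated infoTypes for the selected categories.

    Returns None when "Everything" is selected, meaning the whole field is
    transformed without inspection.
    """
    selected = list(categories)
    if "Everything" in selected:
        return None
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(t for c in selected for t in FILTER_GROUPS[c]))


if __name__ == "__main__":
    print(info_types_for(["Tax IDs", "National IDs"]))
```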
Metrics
This plugin records three metrics:
dlp.requests.count: Total number of requests sent to the Data Loss Prevention API
dlp.requests.success: Number of requests that were successfully processed by the Data Loss Prevention API
dlp.requests.fail: Number of requests that failed
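A minimal Python sketch of how the three counters relate, assuming each request either returns normally or raises an error. The class and method names are hypothetical, and the plugin's actual metric-reporting mechanism is not shown here. In this sketch, count always equals success plus fail; the description above does not state whether the plugin guarantees the same.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass
class DlpRequestMetrics:
    """Hypothetical in-memory stand-in for the three plugin metrics."""
    count: int = 0    # dlp.requests.count
    success: int = 0  # dlp.requests.success
    fail: int = 0     # dlp.requests.fail

    def record(self, send: Callable[[], T]) -> T:
        """Send one request via `send`, updating the counters around it."""
        self.count += 1
        try:
            result = send()
        except Exception:
            self.fail += 1
            raise
        self.success += 1
        return result


if __name__ == "__main__":
    metrics = DlpRequestMetrics()
    metrics.record(lambda: "ok")
    try:
        metrics.record(lambda: 1 / 0)  # simulated failing request
    except ZeroDivisionError:
        pass
    print(metrics)  # DlpRequestMetrics(count=2, success=1, fail=1)
```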
Configuration
Property | Macro Enabled? | Version Introduced | Description |
---|---|---|---|
Use custom template | No |  | Required. Enabling this option lets you define a custom DLP Inspection Template to use for matching during the transformation. Default is No. |
Template ID | Yes |  | Optional. ID of the Inspection Template found in DLP. |
Custom Template Path | Yes | 6.7.0/1.3.0 | Optional. Custom path of the DLP Inspection Template. |
Resource Location | Yes | 6.7.0/1.3.0 | Optional. Use this property to specify the resource location for the DLP Service. For more information, see https://cloud.google.com/dlp/docs/specifying-location. Default is global. |
Fields to Transform | Yes |  | Required. The rules specifying which fields should be transformed, along with the configuration for each transformation. |
Service Account Path | Yes |  | Optional. Path on the local file system of the service account key used for authorization. Can be set to auto-detect when running on a Dataproc cluster. When running on other clusters, the file must be present on every node in the cluster. Default is auto-detect. |
Project Id | Yes |  | Optional. Google Cloud project ID, which uniquely identifies a project. It can be found on the Dashboard in the Google Cloud Platform console. Default is auto-detect. |
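The template-related properties combine into a DLP inspect template resource name. The Python sketch below assumes that a non-empty Custom Template Path is used verbatim, and otherwise builds a name of the form `projects/<project>/locations/<location>/inspectTemplates/<template-id>`, with the location defaulting to global as the Resource Location row states. How the plugin actually resolves these properties is not documented here, and the function name is hypothetical.

```python
from typing import Optional


def inspect_template_name(project_id: str, template_id: str,
                          location: str = "global",
                          custom_template_path: Optional[str] = None) -> str:
    """Resolve the inspect template resource name (hypothetical helper)."""
    # ASSUMPTION: a custom path, when given, is already a full resource name.
    if custom_template_path:
        return custom_template_path
    if not project_id or not template_id:
        raise ValueError("project_id and template_id are required without a custom path")
    return f"projects/{project_id}/locations/{location}/inspectTemplates/{template_id}"


if __name__ == "__main__":
    print(inspect_template_name("my-project", "pii-template"))
    # projects/my-project/locations/global/inspectTemplates/pii-template
```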
Created in 2020 by Google Inc.