Add DLP replace with info type transformation examples, and update DLP README #1299

Merged 2 commits on Jan 10, 2019
49 changes: 26 additions & 23 deletions dlp/README.md
@@ -44,43 +44,46 @@ Note: image scanning is not currently supported on Google Cloud Storage.
 For more information, refer to the [API documentation](https://cloud.google.com/dlp/docs).
 Optional flags are explained in [this resource](https://cloud.google.com/dlp/docs/reference/rest/v2beta1/content/inspect#InspectConfig).
 ```
-Commands:
--s <string> Inspect a string using the Data Loss Prevention API.
--f <filepath> Inspects a local text, PNG, or JPEG file using the Data Loss Prevention API.
--gcs -bucketName <bucketName> -fileName <fileName> Inspects a text file stored on Google Cloud Storage using the Data Loss
-Prevention API.
--ds -projectId [projectId] -namespace [namespace] - kind <kind> Inspect a Datastore instance using the Data Loss Prevention API.
-
-Options:
---help Show help
--minLikelihood [string] [choices: "LIKELIHOOD_UNSPECIFIED", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
-[default: "LIKELIHOOD_UNSPECIFIED"]
-specifies the minimum reporting likelihood threshold.
--f, --maxFindings [number] [default: 0]
-maximum number of results to retrieve
--q, --includeQuote [boolean] [default: true] include matching string in results
--t, --infoTypes set of infoTypes to search for [eg. PHONE_NUMBER US_PASSPORT]
--customDictionaries set of comma-separated dictionary words to search for as customInfoTypes
--customRegexes set of regex patterns to search for as customInfoTypes
+usage: com.example.dlp.Inspect
+-bq,--Google BigQuery inspect BigQuery table
+-bucketName <arg>
+-customDictionaries <arg>
+-customRegexes <arg>
+-datasetId <arg>
+-ds,--Google Datastore inspect Datastore kind
+-f,--file path <arg> inspect input file path
+-fileName <arg>
+-gcs,--Google Cloud Storage inspect GCS file
+-includeQuote <arg>
+-infoTypes <arg>
+-kind <arg>
+-maxFindings <arg>
+-minLikelihood <arg>
+-namespace <arg>
+-projectId <arg>
+-s,--string <arg> inspect string
+-subscriptionId <arg>
+-tableId <arg>
+-topicId <arg>
 ```
 ### Examples
 - Inspect a string:
 ```
-java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -s "My phone number is (123) 456-7890 and my email address is [email protected]" --infoTypes PHONE_NUMBER EMAIL_ADDRESS
+java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -s "My phone number is (123) 456-7890 and my email address is [email protected]" -infoTypes PHONE_NUMBER EMAIL_ADDRESS
 java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -s "My phone number is (123) 456-7890 and my email address is [email protected]" -customDictionaries [email protected] -customRegexes "\(\d{3}\) \d{3}-\d{4}"
 ```
 - Inspect a local file (text / image):
 ```
-java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -f src/test/resources/test.txt --infoTypes PHONE_NUMBER EMAIL_ADDRESS
-java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -f src/test/resources/test.png --infoTypes PHONE_NUMBER EMAIL_ADDRESS
+java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -f src/test/resources/test.txt -infoTypes PHONE_NUMBER EMAIL_ADDRESS
+java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -f src/test/resources/test.png -infoTypes PHONE_NUMBER EMAIL_ADDRESS
 ```
 - Inspect a file on Google Cloud Storage:
 ```
-java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -gcs -bucketName my-bucket -fileName my-file.txt --infoTypes PHONE_NUMBER EMAIL_ADDRESS
+java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -gcs -bucketName my-bucket -fileName my-file.txt -infoTypes PHONE_NUMBER EMAIL_ADDRESS
 ```
 - Inspect a Google Cloud Datastore kind:
 ```
-java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -ds -kind my-kind --infoTypes PHONE_NUMBER EMAIL_ADDRESS
+java -cp dlp/target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -ds -kind my-kind -infoTypes PHONE_NUMBER EMAIL_ADDRESS
 ```

 ## Automatic redaction of sensitive data from images
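The README's `-customRegexes` example passes the pattern `\(\d{3}\) \d{3}-\d{4}`. As a rough local preview of what that pattern matches, independent of the DLP API, the sketch below runs it through `java.util.regex`. The class name `CustomRegexCheck` and the helper `findMatches` are hypothetical and not part of the samples; the service's own regex engine may differ from `java.util.regex` in edge cases, so this is only a sanity check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the DLP samples.
class CustomRegexCheck {

  // Returns every substring of text matched by the given regex.
  static List<String> findMatches(String regex, String text) {
    List<String> matches = new ArrayList<>();
    Matcher matcher = Pattern.compile(regex).matcher(text);
    while (matcher.find()) {
      matches.add(matcher.group());
    }
    return matches;
  }

  public static void main(String[] args) {
    // Same pattern as the README example; backslashes are doubled for Java string syntax.
    String phoneRegex = "\\(\\d{3}\\) \\d{3}-\\d{4}";
    System.out.println(findMatches(phoneRegex, "My phone number is (123) 456-7890"));
    // prints: [(123) 456-7890]
  }
}
```

On the command line the pattern is wrapped in double quotes so the shell passes the backslashes and parentheses through unchanged.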
76 changes: 75 additions & 1 deletion dlp/src/main/java/com/example/dlp/DeIdentification.java
@@ -42,6 +42,7 @@
 import com.google.privacy.dlp.v2.RecordTransformations;
 import com.google.privacy.dlp.v2.ReidentifyContentRequest;
 import com.google.privacy.dlp.v2.ReidentifyContentResponse;
+import com.google.privacy.dlp.v2.ReplaceWithInfoTypeConfig;
 import com.google.privacy.dlp.v2.Table;
 import com.google.privacy.dlp.v2.Value;
 import com.google.protobuf.ByteString;
@@ -71,6 +72,71 @@

 public class DeIdentification {

+  // [START dlp_deidentify_replace_with_info_type]
+  /**
+   * Deidentify a string by replacing sensitive information with its info type using the DLP API.
+   *
+   * @param string The string to deidentify.
+   * @param projectId ID of Google Cloud project to run the API under.
+   */
+  private static void deIdentifyReplaceWithInfoType(
+      String string,
+      List<InfoType> infoTypes,
+      String projectId) {
+
+    // instantiate a client
+    try (DlpServiceClient dlpServiceClient = DlpServiceClient.create()) {
+
+      ContentItem contentItem = ContentItem.newBuilder().setValue(string).build();
+
+      // Create the deidentification transformation configuration
+      PrimitiveTransformation primitiveTransformation =
+          PrimitiveTransformation.newBuilder()
+              .setReplaceWithInfoTypeConfig(ReplaceWithInfoTypeConfig.getDefaultInstance())
+              .build();
+
+      InfoTypeTransformation infoTypeTransformationObject =
+          InfoTypeTransformation.newBuilder()
+              .setPrimitiveTransformation(primitiveTransformation)
+              .build();
+
+      InfoTypeTransformations infoTypeTransformationArray =
+          InfoTypeTransformations.newBuilder()
+              .addTransformations(infoTypeTransformationObject)
+              .build();
+
+      InspectConfig inspectConfig =
+          InspectConfig.newBuilder()
+              .addAllInfoTypes(infoTypes)
+              .build();
+
+      DeidentifyConfig deidentifyConfig =
+          DeidentifyConfig.newBuilder()
+              .setInfoTypeTransformations(infoTypeTransformationArray)
+              .build();
+
+      // Create the deidentification request object
+      DeidentifyContentRequest request =
+          DeidentifyContentRequest.newBuilder()
+              .setParent(ProjectName.of(projectId).toString())
+              .setInspectConfig(inspectConfig)
+              .setDeidentifyConfig(deidentifyConfig)
+              .setItem(contentItem)
+              .build();
+
+      // Execute the deidentification request
+      DeidentifyContentResponse response = dlpServiceClient.deidentifyContent(request);
+
+      // Print the redacted input value
+      // e.g. "My SSN is 123456789" --> "My SSN is [US_SOCIAL_SECURITY_NUMBER]"
+      String result = response.getItem().getValue();
+      System.out.println(result);
+    } catch (Exception e) {
+      System.out.println("Error in deIdentifyReplaceWithInfoType: " + e.getMessage());
+    }
+  }
+  // [END dlp_deidentify_replace_with_info_type]
+
   // [START dlp_deidentify_masking]
   /**
    * Deidentify a string by masking sensitive information with a character using the DLP API.
@@ -512,6 +578,10 @@ public static void main(String[] args) throws Exception {
     OptionGroup optionsGroup = new OptionGroup();
     optionsGroup.setRequired(true);

+    Option deidentifyReplaceWithInfoTypeOption =
+        new Option("it", "info_type_replace", true, "Deidentify by replacing with info type.");
+    optionsGroup.addOption(deidentifyReplaceWithInfoTypeOption);
+
     Option deidentifyMaskingOption =
         new Option("m", "mask", true, "Deidentify with character masking.");
     optionsGroup.addOption(deidentifyMaskingOption);
@@ -606,7 +676,11 @@ public static void main(String[] args) throws Exception {
       }
     }

-    if (cmd.hasOption("m")) {
+    if (cmd.hasOption("it")) {
+      // replace with info type
+      String val = cmd.getOptionValue(deidentifyReplaceWithInfoTypeOption.getOpt());
+      deIdentifyReplaceWithInfoType(val, infoTypesList, projectId);
+    } else if (cmd.hasOption("m")) {
       // deidentification with character masking
       int numberToMask = Integer.parseInt(cmd.getOptionValue(numberToMaskOption.getOpt(), "0"));
       char maskingCharacter = cmd.getOptionValue(maskingCharacterOption.getOpt(), "*").charAt(0);
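The comment inside `deIdentifyReplaceWithInfoType` describes the effect of the transformation as `"My SSN is 123456789" --> "My SSN is [US_SOCIAL_SECURITY_NUMBER]"`. To illustrate that effect without calling the service, here is a minimal local sketch. It does not use `DlpServiceClient`; the class `ReplaceWithInfoTypeSketch`, its method, and the nine-digit regex that stands in for the `US_SOCIAL_SECURITY_NUMBER` detector are all assumptions made for illustration, and the real detector is considerably more selective.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical local illustration; the real work is done by the DLP service.
class ReplaceWithInfoTypeSketch {

  // Assumption: a bare nine-digit run stands in for the service's SSN detector.
  static final Pattern NINE_DIGITS = Pattern.compile("\\b\\d{9}\\b");

  // Replaces every match of detector in input with "[infoTypeName]".
  static String replaceWithInfoType(String input, String infoTypeName, Pattern detector) {
    // quoteReplacement guards against '$' or '\' in the placeholder text.
    String placeholder = Matcher.quoteReplacement("[" + infoTypeName + "]");
    return detector.matcher(input).replaceAll(placeholder);
  }

  public static void main(String[] args) {
    System.out.println(
        replaceWithInfoType("My SSN is 372819127", "US_SOCIAL_SECURITY_NUMBER", NINE_DIGITS));
    // prints: My SSN is [US_SOCIAL_SECURITY_NUMBER]
  }
}
```

In the actual sample the detection step is driven by the `InspectConfig` info types and the replacement by `ReplaceWithInfoTypeConfig`, so no pattern is written by hand.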
12 changes: 12 additions & 0 deletions dlp/src/test/java/com/example/dlp/DeIdentificationIT.java
@@ -57,6 +57,18 @@ public void setUp() {
     assertNotNull(System.getenv("DLP_DEID_KEY_NAME"));
   }

+  @Test
+  public void testDeidReplaceWithInfoType() throws Exception {
+    String text = "\"My SSN is 372819127\"";
+    DeIdentification.main(
+        new String[] {
+          "-it", text,
+          "-infoTypes", "US_SOCIAL_SECURITY_NUMBER"
+        });
+    String output = bout.toString();
+    assertThat(output, containsString("My SSN is [US_SOCIAL_SECURITY_NUMBER]"));
+  }
+
   @Test
   public void testDeidStringMasksCharacters() throws Exception {
     String text = "\"My SSN is 372819127\"";
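The new test asserts against `bout.toString()`, which suggests that standard output is redirected into a buffer during setup. The setup code is outside the lines shown in this diff, so that is an assumption. A minimal, standard-library-only sketch of the capture pattern follows; `StdoutCapture` and `capture` are hypothetical names, and a `Runnable` stands in for the call to `DeIdentification.main`.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical helper showing how printed output can be captured for assertions.
class StdoutCapture {

  // Runs the action with System.out redirected to a buffer and returns what was printed.
  static String capture(Runnable action) {
    PrintStream original = System.out;
    ByteArrayOutputStream bout = new ByteArrayOutputStream();
    System.setOut(new PrintStream(bout, true));
    try {
      action.run();
    } finally {
      // Flush the redirected stream, then restore the real one.
      System.out.flush();
      System.setOut(original);
    }
    return bout.toString();
  }

  public static void main(String[] args) {
    String output = capture(() -> System.out.println("My SSN is [US_SOCIAL_SECURITY_NUMBER]"));
    System.out.println(output.contains("My SSN is [US_SOCIAL_SECURITY_NUMBER]"));
    // prints: true
  }
}
```

A substring check of this kind still passes if the sample echoes the literal quotes included in the test input, since the asserted text sits inside them.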