ref(pii): Better explanation of selectors #1562

Merged 7 commits on Mar 20, 2020
@@ -62,17 +62,60 @@ Rules generally consist of three parts:
- _MAC Addresses_
- _Anything_: Matches any value. This is useful if you want to remove a certain JSON key by path using [_Selectors_](#selectors) regardless of the value.

{% capture __alert_content -%}

Sentry does not know if a local variable that looks like a credit card number actually is one. As such, you need to expect not only false positives but also false negatives. [_Selectors_](#selectors) can help you limit the scope in which your rule runs.

{%- endcapture -%}
{%- include components/alert.html
title="Sentry does not know what your code does"
content=__alert_content
level="warning"
%}


## Selectors

You can select a region of the event using JSON-path-like syntax. As an example, to delete a specific key in "Additional Data", you would configure:
Selectors allow you to restrict rules to certain parts of the event. This is useful to unconditionally remove certain data by variable/field name from the event, but can also be used to conservatively test rules on real data.

Data scrubbing always works on the raw event payload. Keep in mind that some fields in the UI may have different names in the JSON schema. When looking at an event, there should always be a link called "JSON" that lets you see what the data scrubber sees.
Contributor:

This is great. Thanks for pointing it out clearly. The example below is also very good.


For example, what is called "Additional Data" in the UI is called `extra` in the event payload. To remove a specific key called `foo`, you would write:

```
[Remove] [Anything] from [extra.foo]
Contributor:

My curiosity, does this apply equally to error/exception events, message events and transaction/spans events?

Member Author:

to all of them

Contributor:

Removing timestamps from spans would be catastrophic :)

Member Author:

We have some safety rails for this. Ideally they are so unobtrusive that we will never need to document them, but let's see :D

```

As another example, Sentry knows about two kinds of error messages: the exception message and the top-level log message. Here is what such an event payload, as sent by the SDK (and downloadable from the UI), looks like:

```json
{
  "logentry": {
    "formatted": "Failed to roll out the dinglebop"
  },
  "exception": {
    "values": [
      {
        "type": "ZeroDivisionError",
        "value": "integer division or modulo by zero"
      }
    ]
  }
}
```

Since the "error message" is taken from the `exception`'s `value`, and the "message" is taken from `logentry`, we would have to write the following to remove both from the event:

> `Remove` `Anything` from `extra.foo`
```
[Remove] [Anything] from [exception.value]
[Remove] [Anything] from [logentry.formatted]
```
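
To make the idea of removing a value by path concrete, here is a minimal Python sketch. It is not Sentry's implementation: `remove_path` is a hypothetical helper, it only follows plain object keys, and it deletes the key outright. How real selectors traverse arrays such as `exception.values` is deliberately left out.

```python
from typing import Any, Dict


def remove_path(event: Dict[str, Any], path: str) -> bool:
    """Delete the value at a dotted key path (hypothetical helper).

    Returns True if a value was removed, False if the path did not exist.
    Only nested objects are followed; arrays are not traversed.
    """
    *parents, last = path.split(".")
    node: Any = event
    for key in parents:
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    if isinstance(node, dict) and last in node:
        del node[last]
        return True
    return False


# Illustrative payload; the `extra` values are made up for this sketch.
event = {
    "logentry": {"formatted": "Failed to roll out the dinglebop"},
    "extra": {"foo": "secret", "bar": "kept"},
}

removed = [
    remove_path(event, path)
    for path in ("extra.foo", "logentry.formatted", "extra.missing")
]
```

After running this, `extra.bar` is still present while `extra.foo` and `logentry.formatted` are gone, which mirrors the intent of a `[Remove] [Anything]` rule scoped by a selector.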

### Boolean Logic

You can combine selectors using boolean logic.

* Prefix with `!` to invert the selector. `foo` matches the JSON key `foo`, while `(~foo)` matches everything but `foo`.
* Prefix with `!` to invert the selector. `foo` matches the JSON key `foo`, while `!foo` matches everything but `foo`.
* Build the conjunction (AND) using `&&`, such as: `foo && !extra.foo` to match the key `foo` except when inside of `extra`.
* Build the disjunction (OR) using `||`, such as: `foo || bar` to match `foo` or `bar`.
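
The boolean operators can be pictured as ordinary predicate combinators. The Python sketch below is only an illustration, not how Sentry evaluates selectors: the helper names (`key`, `exact`, `not_`, `and_`, `or_`) are hypothetical, and it assumes that a bare key such as `foo` matches wherever it appears as the last path segment, which is what the `foo && !extra.foo` example suggests.

```python
from typing import Callable, Tuple

Path = Tuple[str, ...]
Selector = Callable[[Path], bool]


def key(name: str) -> Selector:
    # Assumption: a bare key matches any path whose last segment is `name`.
    return lambda path: bool(path) and path[-1] == name


def exact(dotted: str) -> Selector:
    # Matches only the full dotted path, e.g. "extra.foo".
    target = tuple(dotted.split("."))
    return lambda path: path == target


def not_(selector: Selector) -> Selector:
    return lambda path: not selector(path)


def and_(left: Selector, right: Selector) -> Selector:
    return lambda path: left(path) and right(path)


def or_(left: Selector, right: Selector) -> Selector:
    return lambda path: left(path) or right(path)


# foo && !extra.foo
foo_outside_extra = and_(key("foo"), not_(exact("extra.foo")))
# foo || bar
foo_or_bar = or_(key("foo"), key("bar"))
```

With these definitions, `foo_outside_extra(("request", "foo"))` holds while `foo_outside_extra(("extra", "foo"))` does not.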

@@ -83,32 +126,33 @@ You can combine selectors using boolean logic.

### Value Types

Select subsections by JSON-type or semantic meaning using the following:

* `$string`
* `$number`
* `$boolean`
* `$datetime`
* `$array`
* `$object`
* `$event`
* `$exception`
* `$stacktrace`
* `$frame`
* `$request`
* `$user`
* `$logentry` (also applies to `event.message`)
* `$thread`
* `$breadcrumb`
* `$span`
* `$sdk`

Examples:
Select subsections by JSON-type using the following:

* `$string` matches any string value
* `$number` matches any integer or float value
* `$datetime` matches any field in the event that represents a timestamp
* `$array` matches any JSON array value
* `$object` matches any JSON object
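
As a rough illustration of the JSON-type selectors, the Python sketch below classifies parsed JSON values. The function `json_value_type` is hypothetical and not part of Sentry. `$datetime` is omitted because it depends on which field a value belongs to rather than on its JSON type, and booleans are assumed not to count as `$number`.

```python
import json
from typing import Any, Optional


def json_value_type(value: Any) -> Optional[str]:
    """Return the JSON-type selector a parsed value would fall under, if any."""
    if isinstance(value, str):
        return "$string"
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so rule it out before numbers.
        return None
    if isinstance(value, (int, float)):
        return "$number"
    if isinstance(value, list):
        return "$array"
    if isinstance(value, dict):
        return "$object"
    return None


# Made-up payload for this sketch.
payload = json.loads(
    '{"name": "x", "count": 3, "ratio": 0.5, "tags": [], "user": {}, "ok": true}'
)
types = {field: json_value_type(value) for field, value in payload.items()}
```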

Select known parts of the schema using the following:

* `$exception` matches a single exception instance in `{"exception": {"values": [...]}}`
* `$stacktrace` matches a stack trace instance
* `$frame` matches a frame
* `$request` matches the HTTP request context of an event
* `$user` matches the user context of an event
* `$logentry` matches both the `logentry` attribute of an event as well as the `message` attribute
* `$thread` matches a single thread instance in `{"threads": {"values": [...]}}`
* `$breadcrumb` matches a single breadcrumb in `{"breadcrumbs": {"values": [...]}}`
* `$span` matches a [trace span]({% link _documentation/performance/performance-glossary.md %}#span)
Contributor:

It's a pity that Jekyll doesn't support checking the reference within the target page, we indeed have to construct the link to #span the way it is here, and accept it may become broken without breaking the build (I admit, I don't even know if a bad {% link %} would break the build either...).

https://jekyllrb.com/docs/liquid/tags/#link

Member Author:

check out bin/link-check, unfortunately it is extremely slow as it goes over HTML output and is still Ruby

Contributor:

Interesting, thanks for sharing!

* `$sdk` matches the SDK context in `{"sdk": ...}`

#### Examples

* Delete `event.user`:

```
[Remove] [Anything] from [event.user]
[Remove] [Anything] from [$user]
```

* Delete all frame-local variables:
@@ -117,7 +161,7 @@ Examples:
[Remove] [Anything] from [$frame.vars]
```

### Escaping Specal Characters
### Escaping Special Characters
Contributor:

I hope you don't mind I took a chance to fix an old typo here :)


If the object key you want to match contains whitespace or special characters, you can use quotes to escape it:
