-
Notifications
You must be signed in to change notification settings - Fork 345
Emit events from the Contents Service #954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
"$id": https://events.jupyter.org/jupyter_server/contents_service/v1 | ||
version: 1 | ||
title: Contents Manager activities | ||
personal-data: true | ||
description: | | ||
Record actions on files via the ContentsManager. | ||
|
||
The notebook ContentsManager REST API is used by all frontends to retreive, | ||
save, list, delete and perform other actions on notebooks, directories, | ||
and other files through the UI. This is pluggable - the default acts on | ||
the file system, but can be replaced with a different ContentsManager | ||
implementation - to work on S3, Postgres, other object stores, etc. | ||
The events get recorded regardless of the ContentsManager implementation | ||
being used. | ||
|
||
Limitations: | ||
|
||
1. This does not record all filesystem access, just the ones that happen | ||
explicitly via the notebook server's REST API. Users can (and often do) | ||
trivially access the filesystem in many other ways (such as `open()` calls | ||
in their code), so this is usually never a complete record. | ||
2. As with all events recorded by the notebook server, users most likely | ||
have the ability to modify the code of the notebook server. Unless other | ||
security measures are in place, these events should be treated as user | ||
controlled and not used in high security areas. | ||
3. Events are only recorded when an action succeeds. | ||
type: object | ||
required: | ||
- action | ||
- path | ||
properties: | ||
action: | ||
enum: | ||
- get | ||
- create | ||
- save | ||
- upload | ||
- rename | ||
- copy | ||
- delete | ||
description: | | ||
Action performed by the ContentsManager API. | ||
|
||
This is a required field. | ||
|
||
Possible values: | ||
|
||
1. get | ||
Get contents of a particular file, or list contents of a directory. | ||
|
||
2. save | ||
Save a file at path with contents from the client | ||
|
||
3. rename | ||
Rename a file or directory from value in source_path to | ||
value in path. | ||
|
||
4. copy | ||
Copy a file or directory from value in source_path to | ||
value in path. | ||
|
||
5. delete | ||
Delete a file or empty directory at given path | ||
path: | ||
type: string | ||
description: | | ||
Logical path on which the operation was performed. | ||
|
||
This is a required field. | ||
source_path: | ||
type: string | ||
description: | | ||
Source path of an operation when action is 'copy' or 'rename' |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -395,6 +395,7 @@ def get(self, path, content=True, type=None, format=None): | |
if type == "directory": | ||
raise web.HTTPError(400, "%s is not a directory" % path, reason="bad type") | ||
model = self._file_model(path, content=content, format=format) | ||
self.emit(data={"action": "get", "path": path}) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have a vague performance concern about emitting an event on every get and save. These are by far the most common ContentsManager actions called by clients. In my opinion, we should only emit events that have demonstrated a developer need. For my File ID manager, I do not need these events. If there are no known uses for these events, I suggest we remove these events entirely rather than attempt to anticipate future uses at the cost of performance. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think this is too stringent of a condition. Events certainly have use-cases outside of just developers. Operators/admins look to use this event system to audit user activity—in fact, this was the original demand driving the jupyter events/telemetry efforts mentioned in this Jupyter Telemetry enhancement proposal (jupyter/enhancement-proposals#41). If this work is causing a noticeable performance degradation, we should address it rather than limit the usage of the event system for operators. Also, keep in mind, in scenarios where there are no handlers or listeners, the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, I think we actually are in agreement; I forgot Jupyter maintains a distinction between developers and operators, and I was using the word "developer" to mean both. Yes, we should definitely be conscious of performance when building this, though I'm sure it's quite difficult to measure.
I took a look at that logic, and I'm not sure if that's sufficient. That checks if there is are any handlers/listeners attached to the current event logger, meaning if I add a listener to another schema or another event type, then a Contents Manager event still performs a deep copy, validation, and building an event capsule only to do nothing with it. Of course, this is out of scope for the PR. I'll make an issue of this in Jupyter Events for further discussion. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think the main point is that Jupyter Server should reach a future state where it provides high volumes of fine-grained, structured event data from most areas across Jupyter Server, allowing operators to get detailed information about what's happening in a running server. In many deployment scenarios, it's required that operators keep a detailed log of everything happening in the application. If performance is the concern, the answer isn't to avoid adding events; it's to improve performance. I think we can do it! 😎
Yeah, that's right. I think we should update jupyter_events to check if the specific event is being watched by any listeners. This is a little trickier (not impossible) to do with handlers, since we don't keep a mapping of handlers to their specific events (consequence of using Python's logging libraries for the main EventLogger). All handlers listen to all events, unless someone adds a |
||
return model | ||
|
||
def _save_directory(self, os_path, model, path=""): | ||
|
@@ -459,7 +460,7 @@ def save(self, model, path=""): | |
model["message"] = validation_message | ||
|
||
self.run_post_save_hooks(model=model, os_path=os_path) | ||
|
||
self.emit(data={"action": "save", "path": path}) | ||
return model | ||
|
||
def delete_file(self, path): | ||
|
@@ -735,6 +736,7 @@ async def get(self, path, content=True, type=None, format=None): | |
if type == "directory": | ||
raise web.HTTPError(400, "%s is not a directory" % path, reason="bad type") | ||
model = await self._file_model(path, content=content, format=format) | ||
self.emit(data={"action": "get", "path": path}) | ||
return model | ||
|
||
async def _save_directory(self, os_path, model, path=""): | ||
|
@@ -795,7 +797,7 @@ async def save(self, model, path=""): | |
model["message"] = validation_message | ||
|
||
self.run_post_save_hooks(model=model, os_path=os_path) | ||
|
||
self.emit(data={"action": "save", "path": path}) | ||
return model | ||
|
||
async def delete_file(self, path): | ||
|
Uh oh!
There was an error while loading. Please reload this page.