-
Notifications
You must be signed in to change notification settings - Fork 73
Add keys() and remove() methods to SharedData. #203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'm proposing this as I would like to be able to asynchronously garbage collect the VM's shared data, and right now, there's no way to do that. If I use SharedData, it'll grow indefinitely in size. One alternative, is to persist a fixed cardinality of objects, and GC data within those objects, mutating them within each request -- but that is less performant overall -- each request is going to have to deserialize a large amount of unrelated data. If i've overlooked some APIs that suit my use case, please let me know -- but on the face of it, this seems like a useful addition. |
Signed-off-by: James Mulcahy <[email protected]>
3e7e07d
to
2ed596b
Compare
re: CLA, I've joined the appropriate Netflix google CLA group, but I guess that change hasn't propagated yet. If the check gets re-run shortly, I'd expect it to pass. |
/retest |
@jamesmulcahy it looks that the Could you add your GitHub login via https://cla.developers.google.com/clas? |
I'm covered by the corporate CLA so don't think that applies, but on re-reading the instructions, I was missing this step "The email used to register you as an authorized contributor must also be attached to your GitHub account." I've done that now, so hopefully it'll work on a re-run! |
Signed-off-by: James Mulcahy <[email protected]>
61bc158
to
cb1b36a
Compare
@PiotrSikora Tests all passing now, ready for a review! |
Any plans on propagating this to the SDKs? cc @unleashed |
@rahulanand16nov Yes, that'd be my hope. I don't know what the typical process is here, but it seemed like getting the API in at this level was a good first step. |
@PiotrSikora I'd welcome your input/guidance on next steps here |
I'm interested in this one. SharedData is more useful (to me) if there's a way for it not to grow indefinitely. |
@PiotrSikora I added an extra commit to expose this through the Context. I've for some foreign functions wired up for this now, and confirm it's working. It'd be great to get this merged. I'll polish up the foreign functions and post a gist somewhere so others can take a similar approach, until we can safely update the ABI. |
src/shared_data.cc
Outdated
@@ -56,6 +56,43 @@ WasmResult SharedData::get(std::string_view vm_id, const std::string_view key, | |||
return WasmResult::NotFound; | |||
} | |||
|
|||
WasmResult SharedData::keys(std::string_view vm_id, const std::string_view key_prefix, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key_prefix
is rather unusual for keys()
method. What's the reason for including it here parameter? Is this a workaround for a single KV store in current version of Proxy-Wasm or is there another reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since SharedData is shared by all plugins within a given VM ID, the guidance is to prefix your keys with a value specific to your plugin (I've seen this written somewhere, but I can't find it to reference right now). By offering this in the API, we simplify the job of the plugin writer, and encourage a best-practice of only working with their own data-set.
Without this, the first thing the plugins will have to do is filter out any keys that aren't relevant to them (or, worst case, they remove data belonging to another plugin).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but is there any reason to reuse the same VM ID other than sharing KV and Queues? (ignoring the fact that VM ID in Envoy defaults to an empty string, so it's an opt-out instead of opt-in from sharing - which wasn't fixed as part of envoyproxy/envoy-wasm#167)
Basically, I'm asking if this is going to be needed once there is support for multiple KV stores?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep memory usage as low as possible was our motivation. That said, looking at some graphs right now, it looks like we see significant additional virtual memory usage as the number of WASM VMs grow (~10G/VM !?), we're not actually seeing all that much real memory usage.
The docs don't give much detail on resource utilization/behavior -- so perhaps my assumption here was wrong.
What are your expectations for increase in memory utilization per VM? (We're using V8 right now, for what it's worth).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WasmVMs cannot load multiple Wasm modules (ignoring module linking), similarly to how you cannot load multiple binaries in the same process. As such, VM ID is only a hint, and multiple instances of the same Proxy-Wasm plugin refer to the same WasmVM only if they have the same VM ID and plugin configuration. Different Proxy-Wasm plugins are always loaded in different WasmVMs, even if they provide the same VM ID.
As such, my question about the key prefix remains - do you have any reason to reuse the same VM ID across different plugins? Right now, it seems like we're adding the key prefix as a workaround for the poor default (VM ID sharing), which could be solved by simply using different VM IDs for different plugins.
As for the memory usage, I didn't look at it in a while, but 10 GiB VSZ per WasmVM sounds about right with V8, and RSS should be around 1 MiB per WasmVM (+ fixed V8 overhead that should be around 10 MiB per process).
V8 uses virtual memory for 2 reasons:
- Guard pages, which allow using signal handlers instead of explicit bound checks, which results in significant performance gain (see: v8: use signal handlers to catch out of bounds memory access. #144).
- Pointer compression, which reduces overall memory consumption, and improves performance (although I'm not sure if and how much does it affect Wasm), see: https://v8.dev/blog/pointer-compression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't appreciated that the plugin configuration was used to determine whether a distinct VM was created. With that in mind, I don't think the key prefix is necessary, no.
I'll update the PR to remove it. Thanks for the info!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I misspoke. All instances of the same Proxy-Wasm plugin with the same VM ID and VM configuration will be attached to the same WasmVM, regardless of the plugin configuration.
Basically, WasmVM's key is derived from: plugin's bytecode, VM ID and VM configuration.
Let me know if that changes anything wrt key prefix.
Also, it looks that your last commit is missing |
Signed-off-by: James Mulcahy <[email protected]>
Signed-off-by: James Mulcahy <[email protected]>
004edbb
to
eba79a5
Compare
@PiotrSikora Thanks for the review; I addressed all your comments and pushed an extra commit. The DCO is fixed, too. |
src/shared_data.cc
Outdated
@@ -56,6 +56,43 @@ WasmResult SharedData::get(std::string_view vm_id, const std::string_view key, | |||
return WasmResult::NotFound; | |||
} | |||
|
|||
WasmResult SharedData::keys(std::string_view vm_id, const std::string_view key_prefix, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but is there any reason to reuse the same VM ID other than sharing KV and Queues? (ignoring the fact that VM ID in Envoy defaults to an empty string, so it's an opt-out instead of opt-in from sharing - which wasn't fixed as part of envoyproxy/envoy-wasm#167)
Basically, I'm asking if this is going to be needed once there is support for multiple KV stores?
Signed-off-by: James Mulcahy <[email protected]>
v8 macos failure is:
Which I don't think is related to this PR? |
@jamesmulcahy could you merge |
@PiotrSikora Done! |
Signed-off-by: James Mulcahy <[email protected]>
8c05d4c
to
dbf1234
Compare
@PiotrSikora PR updated to remove key_prefix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Hi @jamesmulcahy, nice work here! I've been waiting for a fix for this for a couple of months. Do you, by any chance, have an example or hint on how to use this functionality through FFI? Not sure how to integrate this with current SDKs without the exports. Thanks! |
@balbifm Sure. I've created an envoy extension which we compile into our build. I haven't yet updated the extension for the last round of changes made during review (specifically, removal of the key_prefix argument to keys(), and adding the result object to remove()), but this is a gist of the extension from before those changes. Once this is registered in your build, you can call these methods through the Foreign Function interface in whatever language SDK you're using. https://gist.github.com/jamesmulcahy/03a656b2a4e564993e6f044672698e2b |
@jamesmulcahy Thanks so much for implementing this. I'm trying to use the WASM filter in a memory-constrained environment, and this functionality is really helpful there. Since it's been a few months since this was merged, I wanted to see if the plan was still to add these new methods to exports so that the SDKs can easily wrap them. |
No description provided.