Skip to content

Add keys() and remove() methods to SharedData. #203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 1, 2022

Conversation

jamesmulcahy
Copy link
Contributor

No description provided.

@jamesmulcahy
Copy link
Contributor Author

jamesmulcahy commented Dec 2, 2021

I'm proposing this as I would like to be able to asynchronously garbage collect the VM's shared data, and right now, there's no way to do that. If I use SharedData, it'll grow indefinitely in size.

One alternative, is to persist a fixed cardinality of objects, and GC data within those objects, mutating them within each request -- but that is less performant overall -- each request is going to have to deserialize a large amount of unrelated data.

If i've overlooked some APIs that suit my use case, please let me know -- but on the face of it, this seems like a useful addition.

@jamesmulcahy
Copy link
Contributor Author

re: CLA, I've joined the appropriate Netflix google CLA group, but I guess that change hasn't propagated yet. If the check gets re-run shortly, I'd expect it to pass.

@jamesmulcahy
Copy link
Contributor Author

/retest

@PiotrSikora
Copy link
Member

@jamesmulcahy it looks that the cla/google check needs to have both your email and GitHub login on file (see: https://github.com/proxy-wasm/proxy-wasm-cpp-host/pull/203/checks?check_run_id=4466687349).

Could you add your GitHub login via https://cla.developers.google.com/clas?

@jamesmulcahy
Copy link
Contributor Author

I'm covered by the corporate CLA so don't think that applies, but on re-reading the instructions, I was missing this step

"The email used to register you as an authorized contributor must also be attached to your GitHub account."

I've done that now, so hopefully it'll work on a re-run!

@jamesmulcahy
Copy link
Contributor Author

@PiotrSikora Tests all passing now, ready for a review!

@rahulanand16nov
Copy link

Any plans on propagating this to the SDKs?

cc @unleashed

@jamesmulcahy
Copy link
Contributor Author

@rahulanand16nov Yes, that'd be my hope. I don't know what the typical process is here, but it seemed like getting the API in at this level was a good first step.

@jamesmulcahy
Copy link
Contributor Author

@PiotrSikora I'd welcome your input/guidance on next steps here

@danielpcox
Copy link

I'm interested in this one. SharedData is more useful (to me) if there's a way for it not to grow indefinitely.

@jamesmulcahy
Copy link
Contributor Author

@PiotrSikora I added an extra commit to expose this through the Context. I've for some foreign functions wired up for this now, and confirm it's working. It'd be great to get this merged.

I'll polish up the foreign functions and post a gist somewhere so others can take a similar approach, until we can safely update the ABI.

@@ -56,6 +56,43 @@ WasmResult SharedData::get(std::string_view vm_id, const std::string_view key,
return WasmResult::NotFound;
}

WasmResult SharedData::keys(std::string_view vm_id, const std::string_view key_prefix,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key_prefix is rather unusual for keys() method. What's the reason for including it here parameter? Is this a workaround for a single KV store in current version of Proxy-Wasm or is there another reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since SharedData is shared by all plugins within a given VM ID, the guidance is to prefix your keys with a value specific to your plugin (I've seen this written somewhere, but I can't find it to reference right now). By offering this in the API, we simplify the job of the plugin writer, and encourage a best-practice of only working with their own data-set.

Without this, the first thing the plugins will have to do is filter out any keys that aren't relevant to them (or, worst case, they remove data belonging to another plugin).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but is there any reason to reuse the same VM ID other than sharing KV and Queues? (ignoring the fact that VM ID in Envoy defaults to an empty string, so it's an opt-out instead of opt-in from sharing - which wasn't fixed as part of envoyproxy/envoy-wasm#167)

Basically, I'm asking if this is going to be needed once there is support for multiple KV stores?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep memory usage as low as possible was our motivation. That said, looking at some graphs right now, it looks like we see significant additional virtual memory usage as the number of WASM VMs grow (~10G/VM !?), we're not actually seeing all that much real memory usage.

The docs don't give much detail on resource utilization/behavior -- so perhaps my assumption here was wrong.

What are your expectations for increase in memory utilization per VM? (We're using V8 right now, for what it's worth).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WasmVMs cannot load multiple Wasm modules (ignoring module linking), similarly to how you cannot load multiple binaries in the same process. As such, VM ID is only a hint, and multiple instances of the same Proxy-Wasm plugin refer to the same WasmVM only if they have the same VM ID and plugin configuration. Different Proxy-Wasm plugins are always loaded in different WasmVMs, even if they provide the same VM ID.

As such, my question about the key prefix remains - do you have any reason to reuse the same VM ID across different plugins? Right now, it seems like we're adding the key prefix as a workaround for the poor default (VM ID sharing), which could be solved by simply using different VM IDs for different plugins.

As for the memory usage, I didn't look at it in a while, but 10 GiB VSZ per WasmVM sounds about right with V8, and RSS should be around 1 MiB per WasmVM (+ fixed V8 overhead that should be around 10 MiB per process).

V8 uses virtual memory for 2 reasons:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't appreciated that the plugin configuration was used to determine whether a distinct VM was created. With that in mind, I don't think the key prefix is necessary, no.

I'll update the PR to remove it. Thanks for the info!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I misspoke. All instances of the same Proxy-Wasm plugin with the same VM ID and VM configuration will be attached to the same WasmVM, regardless of the plugin configuration.

Basically, WasmVM's key is derived from: plugin's bytecode, VM ID and VM configuration.

Let me know if that changes anything wrt key prefix.

@PiotrSikora
Copy link
Member

Also, it looks that your last commit is missing Signed-off-by line, so DCO bot is broken.

@jamesmulcahy
Copy link
Contributor Author

@PiotrSikora Thanks for the review; I addressed all your comments and pushed an extra commit. The DCO is fixed, too.

@@ -56,6 +56,43 @@ WasmResult SharedData::get(std::string_view vm_id, const std::string_view key,
return WasmResult::NotFound;
}

WasmResult SharedData::keys(std::string_view vm_id, const std::string_view key_prefix,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but is there any reason to reuse the same VM ID other than sharing KV and Queues? (ignoring the fact that VM ID in Envoy defaults to an empty string, so it's an opt-out instead of opt-in from sharing - which wasn't fixed as part of envoyproxy/envoy-wasm#167)

Basically, I'm asking if this is going to be needed once there is support for multiple KV stores?

Signed-off-by: James Mulcahy <[email protected]>
@jamesmulcahy
Copy link
Contributor Author

v8 macos failure is:

Traceback (most recent call last):
	File "/private/var/tmp/_bazel_runner/f7b5b126cb65bf12475e292acf07553d/external/local_config_cc/cc_toolchain_config.bzl", line 481, column 61, in _impl
		flags = _deterministic_libtool_flags(ctx) + [
	File "/private/var/tmp/_bazel_runner/f7b5b126cb65bf12475e292acf07553d/external/local_config_cc/cc_toolchain_config.bzl", line 53, column 38, in _deterministic_libtool_flags
		if _can_use_deterministic_libtool(ctx):
	File "/private/var/tmp/_bazel_runner/f7b5b126cb65bf12475e292acf07553d/external/local_config_cc/cc_toolchain_config.bzl", line 45, column 25, in _can_use_deterministic_libtool
		if _compare_versions(xcode_version, _SUPPORTS_DETERMINISTIC_MODE) >= 0:
	File "/private/var/tmp/_bazel_runner/f7b5b126cb65bf12475e292acf07553d/external/local_config_cc/cc_toolchain_config.bzl", line 38, column 15, in _compare_versions
		return dv1.compare_to(apple_common.dotted_version(v2))
Error: 'NoneType' value has no field or method 'compare_to'
ERROR: /private/var/tmp/_bazel_runner/f7b5b126cb65bf12475e292acf07553d/external/local_config_cc/BUILD:84:24: Analysis of target '@local_config_cc//:ios_armv7' failed
ERROR: Analysis of target '//test:utility_lib' failed; build aborted: 

Which I don't think is related to this PR?

@PiotrSikora
Copy link
Member

@jamesmulcahy could you merge master branch?

@jamesmulcahy
Copy link
Contributor Author

@PiotrSikora Done!

@jamesmulcahy
Copy link
Contributor Author

@PiotrSikora PR updated to remove key_prefix

Copy link
Member

@PiotrSikora PiotrSikora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@PiotrSikora PiotrSikora changed the title Add keys() and remove() methods to SharedData Add keys() and remove() methods to SharedData. Feb 1, 2022
@PiotrSikora PiotrSikora merged commit 819dcc0 into proxy-wasm:master Feb 1, 2022
@balbifm
Copy link

balbifm commented Feb 13, 2022

@PiotrSikora I added an extra commit to expose this through the Context. I've for some foreign functions wired up for this now, and confirm it's working. It'd be great to get this merged.

I'll polish up the foreign functions and post a gist somewhere so others can take a similar approach, until we can safely update the ABI.

Hi @jamesmulcahy, nice work here! I've been waiting for a fix for this for a couple of months. Do you, by any chance, have an example or hint on how to use this functionality through FFI? Not sure how to integrate this with current SDKs without the exports. Thanks!

@jamesmulcahy
Copy link
Contributor Author

@balbifm Sure. I've created an envoy extension which we compile into our build. I haven't yet updated the extension for the last round of changes made during review (specifically, removal of the key_prefix argument to keys(), and adding the result object to remove()), but this is a gist of the extension from before those changes. Once this is registered in your build, you can call these methods through the Foreign Function interface in whatever language SDK you're using.

https://gist.github.com/jamesmulcahy/03a656b2a4e564993e6f044672698e2b

@while1malloc0
Copy link

@jamesmulcahy Thanks so much for implementing this. I'm trying to use the WASM filter in a memory-constrained environment, and this functionality is really helpful there.

Since it's been a few months since this was merged, I wanted to see if the plan was still to add these new methods to exports so that the SDKs can easily wrap them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants