Add keys() and remove() methods to SharedData. #203

jamesmulcahy · 2021-12-02T23:56:44Z

No description provided.

jamesmulcahy · 2021-12-02T23:58:57Z

I'm proposing this as I would like to be able to asynchronously garbage collect the VM's shared data, and right now, there's no way to do that. If I use SharedData, it'll grow indefinitely in size.

One alternative is to persist a fixed number of objects and GC data within those objects, mutating them on each request. But that is less performant overall: each request has to deserialize a large amount of unrelated data.

If I've overlooked some APIs that suit my use case, please let me know; but on the face of it, this seems like a useful addition.
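For illustration, the asynchronous GC pattern described above could look roughly like the following sketch. `MiniSharedData`, `Result`, and `sweepExpired` are hypothetical names for a simplified in-memory model, not the host's actual `SharedData` class or its signatures. The sketch assumes each value stores an expiry timestamp as a decimal string.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical status codes for this sketch (not the host's WasmResult).
enum class Result { Ok, NotFound, CasMismatch };

// Simplified, hypothetical model of a per-VM-ID key/value store with CAS tokens.
class MiniSharedData {
public:
  using Entry = std::pair<std::string, uint32_t>;  // (value, cas)

  Result set(const std::string &vm_id, const std::string &key, std::string value, uint32_t cas) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto &kv = data_[vm_id];
    auto it = kv.find(key);
    if (it != kv.end()) {
      // A non-zero CAS must match the current token, or the write is rejected.
      if (cas != 0 && cas != it->second.second) return Result::CasMismatch;
      it->second = {std::move(value), nextCas()};
    } else {
      kv.emplace(key, Entry{std::move(value), nextCas()});
    }
    return Result::Ok;
  }

  Result get(const std::string &vm_id, const std::string &key, Entry *out) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto vm = data_.find(vm_id);
    if (vm == data_.end()) return Result::NotFound;
    auto it = vm->second.find(key);
    if (it == vm->second.end()) return Result::NotFound;
    *out = it->second;
    return Result::Ok;
  }

  // Lists every key currently stored for vm_id.
  std::vector<std::string> keys(const std::string &vm_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    std::vector<std::string> result;
    auto vm = data_.find(vm_id);
    if (vm != data_.end()) {
      for (const auto &kv : vm->second) result.push_back(kv.first);
    }
    return result;
  }

  // Removes key; a non-zero CAS must match. Optionally hands back the removed entry.
  Result remove(const std::string &vm_id, const std::string &key, uint32_t cas,
                Entry *removed = nullptr) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto vm = data_.find(vm_id);
    if (vm == data_.end()) return Result::NotFound;
    auto it = vm->second.find(key);
    if (it == vm->second.end()) return Result::NotFound;
    if (cas != 0 && cas != it->second.second) return Result::CasMismatch;
    if (removed != nullptr) *removed = std::move(it->second);
    vm->second.erase(it);
    return Result::Ok;
  }

private:
  uint32_t nextCas() {
    if (++cas_ == 0) ++cas_;  // Skip 0, which means "no CAS check".
    return cas_;
  }

  std::mutex mutex_;
  uint32_t cas_ = 0;
  std::map<std::string, std::map<std::string, Entry>> data_;
};

// GC sweep: drops entries whose value (an expiry time) is earlier than `now`.
std::size_t sweepExpired(MiniSharedData &store, const std::string &vm_id, long long now) {
  std::size_t removed = 0;
  for (const auto &key : store.keys(vm_id)) {
    MiniSharedData::Entry entry;
    if (store.get(vm_id, key, &entry) != Result::Ok) continue;  // already gone
    // Passing the observed CAS means an entry refreshed after our read survives.
    if (std::stoll(entry.first) < now &&
        store.remove(vm_id, key, entry.second) == Result::Ok) {
      ++removed;
    }
  }
  return removed;
}
```

Passing the CAS token observed by `get()` to `remove()` is the design choice that makes an asynchronous sweep safe: a value that a request refreshed between the two calls is left in place rather than deleted.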

Signed-off-by: James Mulcahy <[email protected]>

jamesmulcahy · 2021-12-03T00:00:45Z

re: CLA, I've joined the appropriate Netflix Google CLA group, but I guess that change hasn't propagated yet. If the check is re-run shortly, I'd expect it to pass.

jamesmulcahy · 2021-12-09T00:46:06Z

/retest

PiotrSikora · 2021-12-09T05:39:29Z

@jamesmulcahy it looks like the cla/google check needs to have both your email and GitHub login on file (see: https://github.com/proxy-wasm/proxy-wasm-cpp-host/pull/203/checks?check_run_id=4466687349).

Could you add your GitHub login via https://cla.developers.google.com/clas?

jamesmulcahy · 2021-12-09T05:58:20Z

I'm covered by the corporate CLA, so I don't think that applies, but on re-reading the instructions, I realized I was missing this step:

"The email used to register you as an authorized contributor must also be attached to your GitHub account."

I've done that now, so hopefully it'll work on a re-run!


jamesmulcahy · 2021-12-10T22:59:17Z

@PiotrSikora Tests all passing now, ready for a review!

rahulanand16nov · 2021-12-13T14:20:03Z

Any plans on propagating this to the SDKs?

cc @unleashed

jamesmulcahy · 2021-12-15T17:00:45Z

@rahulanand16nov Yes, that'd be my hope. I don't know what the typical process is here, but it seemed like getting the API in at this level was a good first step.

jamesmulcahy · 2022-01-03T21:49:03Z

@PiotrSikora I'd welcome your input/guidance on next steps here

danielpcox · 2022-01-05T21:53:36Z

I'm interested in this one. SharedData is more useful (to me) if there's a way for it not to grow indefinitely.

jamesmulcahy · 2022-01-17T19:55:48Z

@PiotrSikora I added an extra commit to expose this through the Context. I've got some foreign functions wired up for this now and can confirm it's working. It'd be great to get this merged.

I'll polish up the foreign functions and post a gist somewhere so others can take a similar approach, until we can safely update the ABI.

src/shared_data.cc

PiotrSikora · 2022-01-24T08:03:48Z


@@ -56,6 +56,43 @@ WasmResult SharedData::get(std::string_view vm_id, const std::string_view key,
  return WasmResult::NotFound;
 }

+WasmResult SharedData::keys(std::string_view vm_id, const std::string_view key_prefix,


The key_prefix is rather unusual for a keys() method. What's the reason for including it as a parameter here? Is this a workaround for the single KV store in the current version of Proxy-Wasm, or is there another reason?

Since SharedData is shared by all plugins within a given VM ID, the guidance is to prefix your keys with a value specific to your plugin (I've seen this written somewhere, but I can't find it to reference right now). By offering this in the API, we simplify the plugin writer's job and encourage the best practice of only working with their own data set.

Without this, the first thing the plugins will have to do is filter out any keys that aren't relevant to them (or, worst case, they remove data belonging to another plugin).
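Without a key_prefix argument, the client-side filtering described above could look like this minimal sketch. The `ownKeys` helper and the `"ratelimit:"` prefix are hypothetical; the input is assumed to be the full key list returned for the VM ID.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Keeps only the keys that start with this plugin's (hypothetical) namespace
// prefix, so a GC pass never touches another plugin's data in a shared VM ID.
std::vector<std::string> ownKeys(const std::vector<std::string> &all_keys,
                                 std::string_view prefix) {
  std::vector<std::string> mine;
  for (const auto &key : all_keys) {
    // substr clamps the length, so keys shorter than the prefix simply fail the match.
    if (std::string_view(key).substr(0, prefix.size()) == prefix) {
      mine.push_back(key);
    }
  }
  return mine;
}
```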

Right, but is there any reason to reuse the same VM ID other than sharing KV and Queues? (ignoring the fact that VM ID in Envoy defaults to an empty string, so it's an opt-out instead of opt-in from sharing - which wasn't fixed as part of envoyproxy/envoy-wasm#167)

Basically, I'm asking if this is going to be needed once there is support for multiple KV stores?

Keeping memory usage as low as possible was our motivation. That said, looking at some graphs right now, although we see significant additional virtual memory usage as the number of WASM VMs grows (~10G/VM!?), we're not actually seeing much additional real memory usage.

The docs don't give much detail on resource utilization/behavior, so perhaps my assumption here was wrong.

What are your expectations for increase in memory utilization per VM? (We're using V8 right now, for what it's worth).

WasmVMs cannot load multiple Wasm modules (ignoring module linking), similarly to how you cannot load multiple binaries in the same process. As such, VM ID is only a hint, and multiple instances of the same Proxy-Wasm plugin refer to the same WasmVM only if they have the same VM ID and plugin configuration. Different Proxy-Wasm plugins are always loaded in different WasmVMs, even if they provide the same VM ID.

As such, my question about the key prefix remains - do you have any reason to reuse the same VM ID across different plugins? Right now, it seems like we're adding the key prefix as a workaround for the poor default (VM ID sharing), which could be solved by simply using different VM IDs for different plugins.

As for the memory usage, I didn't look at it in a while, but 10 GiB VSZ per WasmVM sounds about right with V8, and RSS should be around 1 MiB per WasmVM (+ fixed V8 overhead that should be around 10 MiB per process).
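One way to observe the VSZ/RSS split described above is to read the `VmSize` and `VmRSS` fields of `/proc/self/status`. This is a minimal sketch assuming a Linux host; `statusFieldKb` and `readProcStatus` are hypothetical helpers.

```cpp
#include <cstdint>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Reads the whole of /proc/self/status (Linux only); returns "" if unavailable.
std::string readProcStatus() {
  std::ifstream file("/proc/self/status");
  if (!file) return "";
  std::ostringstream buf;
  buf << file.rdbuf();
  return buf.str();
}

// Extracts a "Field:   <number> kB" entry, e.g. VmSize (virtual) or VmRSS (resident).
std::optional<uint64_t> statusFieldKb(const std::string &status, const std::string &field) {
  std::istringstream in(status);
  const std::string label = field + ":";
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, label.size(), label) == 0) {
      std::istringstream rest(line.substr(label.size()));
      uint64_t kb = 0;
      if (rest >> kb) return kb;  // leading whitespace is skipped by >>
      return std::nullopt;
    }
  }
  return std::nullopt;
}
```

For example, `statusFieldKb(readProcStatus(), "VmRSS")` yields the resident set size in kB, or `std::nullopt` if the field is absent.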

V8 uses virtual memory for 2 reasons:

1. Guard pages, which allow using signal handlers instead of explicit bounds checks, resulting in a significant performance gain (see: v8: use signal handlers to catch out of bounds memory access. #144).
2. Pointer compression, which reduces overall memory consumption and improves performance (although I'm not sure whether, or how much, it affects Wasm); see: https://v8.dev/blog/pointer-compression.
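To make the guard-page point concrete, here is a minimal sketch of the explicit per-access bounds check that an engine would otherwise have to perform on every linear-memory load. `loadU32Checked` is a hypothetical helper, not V8 code. With guard pages, the engine instead reserves a large region of virtual address space around linear memory, so an out-of-range access hits inaccessible pages and traps via a signal handler; that reservation is consistent with the multi-GiB VSZ mentioned above.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Explicitly bounds-checked 32-bit load from a Wasm-style linear memory.
// Returns std::nullopt where a real engine would trap.
std::optional<uint32_t> loadU32Checked(const std::vector<uint8_t> &memory, uint32_t addr,
                                       uint32_t offset) {
  // Widen before adding so addr + offset cannot wrap around 32 bits.
  const uint64_t effective = static_cast<uint64_t>(addr) + offset;
  if (effective + sizeof(uint32_t) > memory.size()) {
    return std::nullopt;
  }
  // Wasm linear memory is little-endian, so assemble the value byte by byte.
  uint32_t value = 0;
  for (uint64_t i = 0; i < sizeof(uint32_t); ++i) {
    value |= static_cast<uint32_t>(memory[effective + i]) << (8 * i);
  }
  return value;
}
```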

I hadn't appreciated that the plugin configuration was used to determine whether a distinct VM was created. With that in mind, I don't think the key prefix is necessary, no.

I'll update the PR to remove it. Thanks for the info!

Sorry, I misspoke. All instances of the same Proxy-Wasm plugin with the same VM ID and VM configuration will be attached to the same WasmVM, regardless of the plugin configuration.

Basically, a WasmVM's key is derived from the plugin's bytecode, VM ID, and VM configuration.
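A sketch of building such a key, assuming only the three inputs listed above participate (plugin configuration is deliberately absent). `vmKeyMaterial` is a hypothetical helper; a real host would presumably hash this material rather than use it directly.

```cpp
#include <initializer_list>
#include <string>
#include <string_view>

// Builds the canonical material a WasmVM cache key could be derived from.
// Plugin configuration is not an input, so plugins that differ only in
// plugin configuration map to the same WasmVM.
// Each field is length-prefixed so ("ab", "c") and ("a", "bc") cannot collide.
std::string vmKeyMaterial(std::string_view bytecode, std::string_view vm_id,
                          std::string_view vm_configuration) {
  std::string material;
  for (std::string_view field : {bytecode, vm_id, vm_configuration}) {
    material += std::to_string(field.size());
    material += ':';
    material.append(field.data(), field.size());
  }
  return material;
}
```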

Let me know if that changes anything wrt key prefix.

PiotrSikora · 2022-01-24T08:07:09Z

Also, it looks like your last commit is missing the Signed-off-by line, so the DCO check is failing.


jamesmulcahy · 2022-01-26T20:45:25Z

@PiotrSikora Thanks for the review; I addressed all your comments and pushed an extra commit. The DCO is fixed, too.


jamesmulcahy · 2022-01-31T06:06:42Z

The V8 macOS failure is:

Traceback (most recent call last):
	File "/private/var/tmp/_bazel_runner/f7b5b126cb65bf12475e292acf07553d/external/local_config_cc/cc_toolchain_config.bzl", line 481, column 61, in _impl
		flags = _deterministic_libtool_flags(ctx) + [
	File "/private/var/tmp/_bazel_runner/f7b5b126cb65bf12475e292acf07553d/external/local_config_cc/cc_toolchain_config.bzl", line 53, column 38, in _deterministic_libtool_flags
		if _can_use_deterministic_libtool(ctx):
	File "/private/var/tmp/_bazel_runner/f7b5b126cb65bf12475e292acf07553d/external/local_config_cc/cc_toolchain_config.bzl", line 45, column 25, in _can_use_deterministic_libtool
		if _compare_versions(xcode_version, _SUPPORTS_DETERMINISTIC_MODE) >= 0:
	File "/private/var/tmp/_bazel_runner/f7b5b126cb65bf12475e292acf07553d/external/local_config_cc/cc_toolchain_config.bzl", line 38, column 15, in _compare_versions
		return dv1.compare_to(apple_common.dotted_version(v2))
Error: 'NoneType' value has no field or method 'compare_to'
ERROR: /private/var/tmp/_bazel_runner/f7b5b126cb65bf12475e292acf07553d/external/local_config_cc/BUILD:84:24: Analysis of target '@local_config_cc//:ios_armv7' failed
ERROR: Analysis of target '//test:utility_lib' failed; build aborted:

I don't think that's related to this PR, is it?

PiotrSikora · 2022-01-31T06:23:32Z

@jamesmulcahy could you merge master branch?

jamesmulcahy · 2022-01-31T06:25:27Z

@PiotrSikora Done!


jamesmulcahy · 2022-02-01T07:01:22Z

@PiotrSikora PR updated to remove key_prefix

PiotrSikora

Thanks!

balbifm · 2022-02-13T17:40:38Z

> @PiotrSikora I added an extra commit to expose this through the Context. I've got some foreign functions wired up for this now and can confirm it's working. It'd be great to get this merged.
>
> I'll polish up the foreign functions and post a gist somewhere so others can take a similar approach, until we can safely update the ABI.

Hi @jamesmulcahy, nice work here! I've been waiting for a fix for this for a couple of months. Do you, by any chance, have an example or hint on how to use this functionality through FFI? Not sure how to integrate this with current SDKs without the exports. Thanks!

jamesmulcahy · 2022-02-14T04:55:05Z

@balbifm Sure. I've created an envoy extension which we compile into our build. I haven't yet updated the extension for the last round of changes made during review (specifically, removal of the key_prefix argument to keys(), and adding the result object to remove()), but this is a gist of the extension from before those changes. Once this is registered in your build, you can call these methods through the Foreign Function interface in whatever language SDK you're using.

https://gist.github.com/jamesmulcahy/03a656b2a4e564993e6f044672698e2b
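Because a foreign function hands its result back to the plugin as an opaque byte buffer, the host extension and the plugin have to agree on how a list of keys is encoded. The format below (a little-endian count followed by length-prefixed keys) and the `encodeKeys`/`decodeKeys` helpers are hypothetical, not the encoding used in the linked gist.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical wire format: [u32 count] then, per key, [u32 length][bytes].
// All integers are little-endian.
namespace {
void putU32(std::string &out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<char>((v >> (8 * i)) & 0xff));
}

bool getU32(const std::string &in, std::size_t &pos, uint32_t &v) {
  if (in.size() - pos < 4) return false;  // pos never exceeds in.size()
  v = 0;
  for (int i = 0; i < 4; ++i) {
    v |= static_cast<uint32_t>(static_cast<uint8_t>(in[pos + i])) << (8 * i);
  }
  pos += 4;
  return true;
}
}  // namespace

// Host side: serialize the key list into the foreign function's result buffer.
std::string encodeKeys(const std::vector<std::string> &keys) {
  std::string out;
  putU32(out, static_cast<uint32_t>(keys.size()));
  for (const auto &k : keys) {
    putU32(out, static_cast<uint32_t>(k.size()));
    out += k;
  }
  return out;
}

// Plugin side: parse the buffer, rejecting truncated or malformed input.
std::optional<std::vector<std::string>> decodeKeys(const std::string &in) {
  std::size_t pos = 0;
  uint32_t count = 0;
  if (!getU32(in, pos, count)) return std::nullopt;
  std::vector<std::string> keys;  // no reserve(count): count is untrusted input
  for (uint32_t i = 0; i < count; ++i) {
    uint32_t len = 0;
    if (!getU32(in, pos, len) || in.size() - pos < len) return std::nullopt;
    keys.push_back(in.substr(pos, len));
    pos += len;
  }
  return keys;
}
```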

while1malloc0 · 2022-08-25T12:21:13Z

@jamesmulcahy Thanks so much for implementing this. I'm trying to use the WASM filter in a memory-constrained environment, and this functionality is really helpful there.

Since it's been a few months since this was merged, I wanted to see if the plan was still to add these new methods to exports so that the SDKs can easily wrap them.

jamesmulcahy requested review from mathetake and PiotrSikora as code owners December 2, 2021 23:56

Add keys() and remove() methods to SharedData

2ed596b

Signed-off-by: James Mulcahy <[email protected]>

jamesmulcahy force-pushed the shared-data-keys branch from 3e7e07d to 2ed596b December 3, 2021 00:00

Fix formatting with clang-format

cb1b36a

Signed-off-by: James Mulcahy <[email protected]>

jamesmulcahy force-pushed the shared-data-keys branch from 61bc158 to cb1b36a December 9, 2021 06:03

PiotrSikora requested changes Jan 24, 2022


jamesmulcahy added 2 commits January 26, 2022 12:40

Expose SharedData keys & remove methods in Context

083f606

Signed-off-by: James Mulcahy <[email protected]>

Address review comments

eba79a5

Signed-off-by: James Mulcahy <[email protected]>

jamesmulcahy force-pushed the shared-data-keys branch from 004edbb to eba79a5 January 26, 2022 20:41

PiotrSikora reviewed Jan 28, 2022


Remove debug printf call

d9d3c9a

Signed-off-by: James Mulcahy <[email protected]>

Merge remote-tracking branch 'upstream/master' into shared-data-keys

799254e

Remove key_prefix from keys() call

dbf1234

Signed-off-by: James Mulcahy <[email protected]>

jamesmulcahy force-pushed the shared-data-keys branch from 8c05d4c to dbf1234 February 1, 2022 07:00

PiotrSikora approved these changes Feb 1, 2022


PiotrSikora changed the title ~~Add keys() and remove() methods to SharedData~~ Add keys() and remove() methods to SharedData. Feb 1, 2022

PiotrSikora merged commit 819dcc0 into proxy-wasm:master Feb 1, 2022

mathetake mentioned this pull request Aug 30, 2022

Add the host function for removing shared data proxy-wasm/spec#32

Open

PiotrSikora mentioned this pull request Jul 14, 2023

Is there any way to evict shared data? envoyproxy/envoy#28339

Closed
