Skip to content

Add keys() and remove() methods to SharedData. #203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 1, 2022
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions include/proxy-wasm/context.h
Original file line number Diff line number Diff line change
Expand Up @@ -353,6 +353,10 @@ class ContextBase : public RootInterface,
WasmResult getSharedData(std::string_view key,
std::pair<std::string, uint32_t /* cas */> *data) override;
WasmResult setSharedData(std::string_view key, std::string_view value, uint32_t cas) override;
WasmResult getSharedDataKeys(std::string_view key_prefix,
std::vector<std::string> *result) override;
WasmResult removeSharedDataKey(std::string_view key, uint32_t cas,
std::pair<std::string, uint32_t> *result) override;

// Shared Queue
WasmResult registerSharedQueue(std::string_view queue_name,
Expand Down
19 changes: 19 additions & 0 deletions include/proxy-wasm/context_interface.h
Original file line number Diff line number Diff line change
Expand Up @@ -621,6 +621,25 @@ struct SharedDataInterface {
* @param data is a location to store the returned value.
*/
virtual WasmResult setSharedData(std::string_view key, std::string_view value, uint32_t cas) = 0;

/**
* Return all the keys which match the given key_prefix
* @param key_prefix is used to restrict the results to keys matching the given prefix
* @param data is a location to store the returned value.
*/
virtual WasmResult getSharedDataKeys(std::string_view key_prefix,
std::vector<std::string> *result) = 0;

/**
* Removes the given key from the data shared between VMs.
* @param key is a proxy-wide key mapping to the shared data value.
* @param cas is a compare-and-swap value. If it is zero it is ignored, otherwise it must match
* @param cas is a location to store value, and cas number, associated with the removed key
* the cas associated with the value.
*/
virtual WasmResult
removeSharedDataKey(std::string_view key, uint32_t cas,
std::pair<std::string /* value */, uint32_t /* cas */> *result) = 0;
}; // namespace proxy_wasm

struct SharedQueueInterface {
Expand Down
10 changes: 10 additions & 0 deletions src/context.cc
Original file line number Diff line number Diff line change
Expand Up @@ -192,6 +192,16 @@ WasmResult ContextBase::setSharedData(std::string_view key, std::string_view val
return getGlobalSharedData().set(wasm_->vm_id(), key, value, cas);
}

WasmResult ContextBase::getSharedDataKeys(std::string_view key_prefix,
std::vector<std::string> *result) {
return getGlobalSharedData().keys(wasm_->vm_id(), key_prefix, result);
}

WasmResult ContextBase::removeSharedDataKey(std::string_view key, uint32_t cas,
std::pair<std::string, uint32_t> *result) {
return getGlobalSharedData().remove(wasm_->vm_id(), key, cas, result);
}

// Shared Queue

WasmResult ContextBase::registerSharedQueue(std::string_view queue_name,
Expand Down
45 changes: 45 additions & 0 deletions src/shared_data.cc
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,25 @@ WasmResult SharedData::get(std::string_view vm_id, const std::string_view key,
return WasmResult::NotFound;
}

WasmResult SharedData::keys(std::string_view vm_id, const std::string_view key_prefix,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key_prefix is rather unusual for keys() method. What's the reason for including it here parameter? Is this a workaround for a single KV store in current version of Proxy-Wasm or is there another reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since SharedData is shared by all plugins within a given VM ID, the guidance is to prefix your keys with a value specific to your plugin (I've seen this written somewhere, but I can't find it to reference right now). By offering this in the API, we simplify the job of the plugin writer, and encourage a best-practice of only working with their own data-set.

Without this, the first thing the plugins will have to do is filter out any keys that aren't relevant to them (or, worst case, they remove data belonging to another plugin).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but is there any reason to reuse the same VM ID other than sharing KV and Queues? (ignoring the fact that VM ID in Envoy defaults to an empty string, so it's an opt-out instead of opt-in from sharing - which wasn't fixed as part of envoyproxy/envoy-wasm#167)

Basically, I'm asking if this is going to be needed once there is support for multiple KV stores?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep memory usage as low as possible was our motivation. That said, looking at some graphs right now, it looks like we see significant additional virtual memory usage as the number of WASM VMs grow (~10G/VM !?), we're not actually seeing all that much real memory usage.

The docs don't give much detail on resource utilization/behavior -- so perhaps my assumption here was wrong.

What are your expectations for increase in memory utilization per VM? (We're using V8 right now, for what it's worth).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WasmVMs cannot load multiple Wasm modules (ignoring module linking), similarly to how you cannot load multiple binaries in the same process. As such, VM ID is only a hint, and multiple instances of the same Proxy-Wasm plugin refer to the same WasmVM only if they have the same VM ID and plugin configuration. Different Proxy-Wasm plugins are always loaded in different WasmVMs, even if they provide the same VM ID.

As such, my question about the key prefix remains - do you have any reason to reuse the same VM ID across different plugins? Right now, it seems like we're adding the key prefix as a workaround for the poor default (VM ID sharing), which could be solved by simply using different VM IDs for different plugins.

As for the memory usage, I didn't look at it in a while, but 10 GiB VSZ per WasmVM sounds about right with V8, and RSS should be around 1 MiB per WasmVM (+ fixed V8 overhead that should be around 10 MiB per process).

V8 uses virtual memory for 2 reasons:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't appreciated that the plugin configuration was used to determine whether a distinct VM was created. With that in mind, I don't think the key prefix is necessary, no.

I'll update the PR to remove it. Thanks for the info!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I misspoke. All instances of the same Proxy-Wasm plugin with the same VM ID and VM configuration will be attached to the same WasmVM, regardless of the plugin configuration.

Basically, WasmVM's key is derived from: plugin's bytecode, VM ID and VM configuration.

Let me know if that changes anything wrt key prefix.

std::vector<std::string> *result) {
result->clear();

std::lock_guard<std::mutex> lock(mutex_);
auto map = data_.find(std::string(vm_id));
if (map == data_.end()) {
return WasmResult::Ok;
}

for (auto kv : map->second) {
if (kv.first.rfind(key_prefix, 0) == 0) {
result->push_back(kv.first);
}
}

return WasmResult::Ok;
}

WasmResult SharedData::set(std::string_view vm_id, std::string_view key, std::string_view value,
uint32_t cas) {
std::lock_guard<std::mutex> lock(mutex_);
Expand All @@ -78,4 +97,30 @@ WasmResult SharedData::set(std::string_view vm_id, std::string_view key, std::st
return WasmResult::Ok;
}

WasmResult SharedData::remove(std::string_view vm_id, std::string_view key, uint32_t cas,
std::pair<std::string, uint32_t> *result) {
std::lock_guard<std::mutex> lock(mutex_);
std::unordered_map<std::string, std::pair<std::string, uint32_t>> *map;
auto map_it = data_.find(std::string(vm_id));
if (map_it == data_.end()) {
return WasmResult::NotFound;
} else {
map = &map_it->second;
}

auto it = map->find(std::string(key));
if (it != map->end()) {
if (cas && cas != it->second.second) {
return WasmResult::CasMismatch;
}
if (result != nullptr) {
// printf("Here w/ %s", it->second.first.c_str());
*result = it->second;
}
map->erase(it);
return WasmResult::Ok;
}
return WasmResult::NotFound;
}

} // namespace proxy_wasm
4 changes: 4 additions & 0 deletions src/shared_data.h
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,12 @@ class SharedData {
SharedData(bool register_vm_id_callback = true);
WasmResult get(std::string_view vm_id, const std::string_view key,
std::pair<std::string, uint32_t> *result);
WasmResult keys(std::string_view vm_id, const std::string_view key_prefix,
std::vector<std::string> *result);
WasmResult set(std::string_view vm_id, std::string_view key, std::string_view value,
uint32_t cas);
WasmResult remove(std::string_view vm_id, const std::string_view key, uint32_t cas,
std::pair<std::string, uint32_t> *result);
void deleteByVmId(std::string_view vm_id);

private:
Expand Down
47 changes: 46 additions & 1 deletion test/shared_data_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -24,10 +24,24 @@ namespace proxy_wasm {

TEST(SharedData, SingleThread) {
SharedData shared_data(false);
std::string_view vm_id = "id";

// Validate we get an 'Ok' response when fetching keys before anything
// is initialized.
std::vector<std::string> keys;
EXPECT_EQ(WasmResult::Ok, shared_data.keys(vm_id, "", &keys));
EXPECT_EQ(0, keys.size());

// Validate that we clear the result set
std::vector<std::string> nonEmptyKeys(2);
nonEmptyKeys[0] = "valueA";
nonEmptyKeys[1] = "valueB";
EXPECT_EQ(WasmResult::Ok, shared_data.keys(vm_id, "", &nonEmptyKeys));
EXPECT_EQ(0, nonEmptyKeys.size());

std::pair<std::string, uint32_t> result;
EXPECT_EQ(WasmResult::NotFound, shared_data.get("non-exist", "non-exists", &result));

std::string_view vm_id = "id";
std::string_view key = "key";
std::string_view value = "1";
EXPECT_EQ(WasmResult::Ok, shared_data.set(vm_id, key, value, 0));
Expand All @@ -44,6 +58,37 @@ TEST(SharedData, SingleThread) {
EXPECT_EQ(WasmResult::Ok, shared_data.get(vm_id, key, &result));
EXPECT_EQ(value, result.first);
EXPECT_EQ(result.second, 3);

EXPECT_EQ(WasmResult::Ok, shared_data.keys(vm_id, "unmatched-prefix", &keys));
EXPECT_EQ(0, keys.size());

EXPECT_EQ(WasmResult::Ok, shared_data.keys(vm_id, "keyyyyy", &keys));
EXPECT_EQ(0, keys.size());

EXPECT_EQ(WasmResult::Ok, shared_data.keys(vm_id, "ke", &keys));
EXPECT_EQ(1, keys.size());
EXPECT_EQ(key, keys[0]);

keys.clear();
EXPECT_EQ(WasmResult::CasMismatch, shared_data.remove(vm_id, key, 911, nullptr));
EXPECT_EQ(WasmResult::Ok, shared_data.keys(vm_id, "ke", &keys));
EXPECT_EQ(1, keys.size());

EXPECT_EQ(WasmResult::Ok, shared_data.remove(vm_id, key, 0, nullptr));
EXPECT_EQ(WasmResult::NotFound, shared_data.get(vm_id, key, &result));

EXPECT_EQ(WasmResult::NotFound, shared_data.remove(vm_id, "non-existent_key", 0, nullptr));

EXPECT_EQ(WasmResult::Ok, shared_data.set(vm_id, key, value, 0));
EXPECT_EQ(WasmResult::Ok, shared_data.set(vm_id, key, value, 0));
EXPECT_EQ(WasmResult::Ok, shared_data.get(vm_id, key, &result));

uint32_t expectedCasValue = result.second;

std::pair<std::string, uint32_t> removeResult;
EXPECT_EQ(WasmResult::Ok, shared_data.remove(vm_id, key, 0, &removeResult));
EXPECT_EQ(value, removeResult.first);
EXPECT_EQ(removeResult.second, expectedCasValue);
}

void incrementData(SharedData *shared_data, std::string_view vm_id, std::string_view key) {
Expand Down