-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Port FSTLevelDBKey to C++ #937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
39c4019
to
43c41d1
Compare
These operations universally operate on keys obtained from leveldb so it's actually unhelpful to force all the callers to make absl::string_views from them.
43c41d1
to
16a9bc7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with minor nits.
* Creates a key prefix that points just before the first key for the given | ||
* user_id. | ||
*/ | ||
static std::string KeyPrefix(const absl::string_view user_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: remove const
from declarations (I think applies to all usage of absl::string_view
as a parameter in this file). (I would guess these were const references at some point)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. They weren't const references before, but this was a style that Zhi was applying that I carried over.
* Decodes the given complete key, storing the decoded values as properties of | ||
* the receiver. | ||
* | ||
* @return true if the key successfully decoded, NO otherwise. If NO is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does "NO" come from an Objective-C comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry: BOOL, YES, and NO are all the Objective-C spellings of bool, true, and false. These files were copied from Objective-C and then massaged until they worked.
Fixed.
* | ||
* @return true if the key successfully decoded, NO otherwise. If NO is | ||
* returned, the properties of the receiver are in an undefined state until | ||
* the next call to -decode_key:. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"-decode_key:" looks a little weird. There are several similar "-foo:" names in comments in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
static std::string KeyPrefix(); | ||
|
||
/** | ||
* Creates a complete key that points to a specific document. The document_key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: there is mismatch between "document_key" here and the name of the argument which is just key
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. This should have been document_key
. This was the only instance of this I could find.
* Key()). | ||
* | ||
* @return true if the key successfully decoded, false otherwise. If false is | ||
* returned, the properties of the receiver are in an undefined state until |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: is "receiver" a carryover from Objective-C?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Fixed.
* BatchId is a locally assigned ID for a batch of mutations that have been | ||
* applied. | ||
*/ | ||
typedef int32_t BatchId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: consider using using
instead of typedef
(style guide recommends using
for new code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -48,6 +48,11 @@ std::string ImmediateSuccessor(absl::string_view s) { | |||
return out; | |||
} | |||
|
|||
bool StartsWith(const std::string &value, const std::string &prefix) { | |||
return prefix.size() <= value.size() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use absl::StartsWith.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* natively has concepts of tables and indexes. Where there's overlap, a comment | ||
* denotes the server value from the storage_format_internal.proto. | ||
*/ | ||
enum ComponentLabel { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: ToTW 86 recommends enum class
for new code. If you feel it's more verbose for little gain, I'm fine with plain enum as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I explicitly chose an old-school enum
specifically because it's weakly typed and implicitly converts to an integer. I don't see much value in forcing static_cast
in every caller.
bool LevelDbMutationKey::Decode(leveldb::Slice key) { | ||
user_id_.clear(); | ||
|
||
return ReadTableNameMatching(&key, kMutationsTable) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish some of this boilerplate could be reduced, but don't really have a good suggestion. The best I can come up with is having a helper method that represents each of the Key classes as a tuple, and then Encode/Decode methods could delegate to a variadic template function like void LevelDbMutationKey::Encode() { EncodeImpl(AsTuple(*this)); }
However, it would require too much reworking, and will also increase binary size somewhat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even with a template mechanism each of these function invocations becomes an instantiation of the placeholder in the tuple. At best I think it ends up removing the "Read", the "&key, " and changing the "&&" into "," which isn't really enough savings to justify this kind of reengineering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gain would be removing the duplication between Decode
and Key
(the common source for errors in serializing/deserializing is when those get out of sync) - both would delegate the tuple representation to a common helper. But yeah, it still won't outweigh the downsides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. That's a significant chunk of work and I'm still not entirely sure how I'd make partial keys work there.
// it's vital that partial segments in common not be prefixes. | ||
ASSERT_TRUE(util::StartsWith(foo_user_key, table_key)); | ||
|
||
// Here even though "" is a prefix of "foo" that prefix is within a segment so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: commas before "that" and before "so".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
static std::string KeyPrefix(const absl::string_view user_id); | ||
|
||
/** Creates a complete key that points to a specific user_id and batch_id. */ | ||
static std::string Key(const absl::string_view user_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, a little, but feel free to leave as is. My take is that these FooKey
classes are domain objects that are serialized to/from strings. But it's a small enough nitpick to just leave as is.
* returned, the properties of the receiver are in an undefined state until | ||
* the next call to -decode_key:. | ||
*/ | ||
bool Decode(leveldb::Slice key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just that in general, so far it seems that our code base leaves towards immutable classes, so an outlier stands out. Thanks for expanding the comments.
|
||
private: | ||
std::string user_id_; | ||
model::BatchId batch_id_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an uninitialized variable always has a potential of becoming an issue in the future in case code is refactored. There would be indeed a double initialization, but is this really so performance-critical that the cost of initializing an int is a concern (string
is probably more expensive to initialize anyway)? OTOH, since this is by design in current code, I'm okay if you leave it as is (perhaps add a short comment like "Deliberately uninitialized - will be deserialized").
* A key in the mutation_queues table. | ||
* | ||
* Note that where mutation_queues contains one row about each queue, mutations | ||
* contains the actual mutation batches themselves. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I missed that, thanks.
static std::string KeyPrefix(const model::ResourcePath& resource_path); | ||
|
||
/** | ||
* Decodes the contents of a remote document key into properties on this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about that. I like the new version.
* If the read is successful, returns true and contents will be updated to the | ||
* next unread byte. | ||
*/ | ||
bool ReadComponentLabelMatching(leveldb::Slice *contents, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are local to this file, I'm fine with leaving it be.
leveldb::Slice complete_segments = *contents; | ||
|
||
std::string segment; | ||
std::vector<std::string> path_segments{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just redundant in this case, but feel free to leave them in, it's no big deal.
Default- vs value-initialization matters for primitives and for classes with no default constructor containing primitives:
struct Pod1 { int x; };
Pod1 foo;
assert(foo.x == 0); // Undefined; x is uninitialized
Pod1 bar{};
assert(bar.x == 0); // True
struct Pod2 { int x = 42; };
Pod2 spam;
assert(spam.x == 42); // True
Pod2 eggs{};
assert(eggs.x == 42); // True
But both vector
and string
are well-behaved classes.
bool LevelDbMutationKey::Decode(leveldb::Slice key) { | ||
user_id_.clear(); | ||
|
||
return ReadTableNameMatching(&key, kMutationsTable) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gain would be removing the duplication between Decode
and Key
(the common source for errors in serializing/deserializing is when those get out of sync) - both would delegate the tuple representation to a common helper. But yeah, it still won't outweigh the downsides.
3b40fa7
to
e5cffe2
Compare
This is just the main class and tests but none of the usages. I'll migrate those separately.