-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[Serialization] Add a helper for serializing nodes in an object graph #26800
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I promise I'm almost done with these Serialization cleanup commits. @swift-ci Please test |
@swift-ci Please test compiler performance |
lib/Serialization/Serialization.h
Outdated
/// \tparam RecordCode_ The code for the offsets record in the Index block for | ||
/// referring to these entities. | ||
template <typename T, typename ID, unsigned RecordCode_> | ||
class Serialized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming is slightly misleading here (although the doc comment clarifies the intent) -- the class is called "Serialized" even though it keeps track of both things that have already been serialized and things that need to be serialized. I don't have a good alternative in mind though 🙁. Perhaps SerializationBookkeeper
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't love the name either. I'll go with yours if I can't think of anything better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What this class does—issuing ID numbers, looking up IDs for entities, keeping track of work to be done—seems clerical to me. Clerk
? Recorder
(in the sense of "county recorder") or RecordKeeper
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more potential name SerializationState
. I like Brent's RecordKeeper
suggestion too.
lib/Serialization/Serialization.h
Outdated
} | ||
|
||
/// Returns the next entity to be written. | ||
T peekNext() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential crash if someone forgot to check hasMoreToSerialize()
before calling peekNext()
-- instead we could combine the two functions and return optional<T>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, but I don't think it improves the clarity of code at these particular use sites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would make sense to document this precondition, though.
/// Records that the next entity will be written at \p offset, and returns | ||
/// it so it can be written. | ||
T popNext(BitOffset offset) { | ||
T result = EntitiesToWrite.front(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above -- potential crash if someone forgot to check hasMoreToSerialize()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see -- this function is only being called in a particular context where you check the condition so perhaps this is fine...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could make it an explicit assertion instead of relying on the STL to catch it. Would that help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning an optional (1) is safe, (2) let's you shorten the while loops (except the one case which uses peekNext()
) :
while (DeclsToSerialize.hasMoreToSerialize()) {
auto next = DeclsToSerialize.popNext(Out.GetCurrentBitNo());
writeDecl(next);
}
becomes
while (auto next = DeclsToSerialize.popNext(Out.getCurrentBitNo()))
writeDecl(next.getValue());
and (3) let's you put an assertion when forcibly popping, so
(void)GenericEnvironmentsToSerialize.popNext((genericSigID << 1) | 0x01);
becomes
assert(GenericEnvironmentsToSerialize.popNext((genericSigID << 1) | 0x01).hasValue() && "[add justification here]");
Seems like a win-win -- what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:-/ Take a closer look at writeNextGenericEnvironment. (I wouldn't have added peek
at all if not for that.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see, you don't propose to get rid of hasMoreToSerialize
. All right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactoring like this really clarifies the code! I have a couple of suggestions, but they're optional because they might verge on being too clever.
lib/Serialization/Serialization.h
Outdated
/// \tparam RecordCode_ The code for the offsets record in the Index block for | ||
/// referring to these entities. | ||
template <typename T, typename ID, unsigned RecordCode_> | ||
class Serialized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What this class does—issuing ID numbers, looking up IDs for entities, keeping track of work to be done—seems clerical to me. Clerk
? Recorder
(in the sense of "county recorder") or RecordKeeper
?
lib/Serialization/Serialization.h
Outdated
auto &entityID = IDs[entity]; | ||
if (entityID == ID()) { | ||
EntitiesToWrite.push(entity); | ||
entityID = ++LastID; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LastID
is always the same as IDs.size()
, isn't it? And it looks like all of the DenseMap
implementations keep the size in a member variable, so it shouldn't be slow to access. Dropping LastID
would save you four bytes and be one less invariant for someone to break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good point. This one I'll take. :-)
lib/Serialization/Serialization.cpp
Outdated
DeclsAndTypesToWrite.pop(); | ||
while (DeclsToSerialize.hasMoreToSerialize()) { | ||
auto next = DeclsToSerialize.popNext(Out.GetCurrentBitNo()); | ||
writeDecl(next); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to incorporate these repetitive loops into Serialized
somehow. For instance, maybe you could rename writeDecl
, writeType
, etc. to just writeEntity
, overloaded by parameter type, and then write:
/// Return value: True if this call did no work; false otherwise.
template <typename RecordWriter>
bool Serialized::writeMore(RecordWriter &writer, llvm::BitstreamWriter &Out) {
if (!hasMoreToSerialize())
return true;
do {
auto next = popNext(Out.GetCurrentBitNo());
writer.writeEntity(next);
} while (hasMoreToSerialize());
return false;
}
With that, the loop in writeAllDeclsAndTypes()
condenses to something like:
bool quiescent;
do {
quiescent = DeclsToSerialize.writeMore(*this, Out);
quiescent &= TypesToSerialize.writeMore(*this, Out);
quiescent &= LocalDeclContextsToSerialize.writeMore(*this, Out);
// ...and so on...
} while (!quiescent);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, clever. I'll look into it tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat idea :). Nitpick: if "writeMore" returns "true", I would expect that it means "yes, I wrote more stuff" but we're doing the opposite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I struggled a bit with the name for quiescent
, which is why the method returns the value it does. In any case, Jordan can figure it out. 😀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll come back to the full-on refactoring in a follow-up PR; it would mean eliminating the ambiguity between DeclContextsToSerialize and LocalDeclContextsToSerialize.
Compiler perf tests (still not done) |
The AST block in a compiled module represents an object graph, which is essentially serialized in four steps: - An entity (such as a Decl or Type) is encountered, given an ID, and added to a worklist. - The next entity is popped from the worklist and its offset in the output stream is recorded. - During the course of writing that entity, more entities will be referenced and added to the worklist. - Once the entire worklist is drained, the offsets get written to a table in the Index block. The implementation of this was duplicated for each kind of entity in the AST block; this commit factors that out into a reusable helper. No intended high-level functionality change, but the order in which Decls and Types get emitted might change a little now that they're not in the same queue.
@swift-ci Please smoke test |
Are we sure this bot is excluding XFAILs properly? Or is batch mode non-deterministic in how it chooses batches these days? |
The AST block in a compiled module represents an object graph, which is essentially serialized in four steps:
The implementation of this was duplicated for each kind of entity in the AST block; this commit factors that out into a reusable helper. No intended high-level functionality change, but the order in which Decls and Types get emitted might change a little now that they're not in the same queue.