-
Notifications
You must be signed in to change notification settings - Fork 10.5k
SerializeLoc: serialize basic decl source location information to .swiftsourceinfo file #27464
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a20e98f
to
d7eccd3
Compare
// SECOND: Inputs/def_comments.swift:28:13: Var/decl_var_2 RawComment=none BriefComment=none DocCommentAsXML=none | ||
// SECOND: Inputs/def_comments.swift:28:25: Var/decl_var_3 RawComment=none BriefComment=none DocCommentAsXML=none | ||
// SECOND: Inputs/def_comments.swift:28:25: Var/decl_var_3 RawComment=none BriefComment=none DocCommentAsXML=none | ||
// SECOND: NonExistingSource.swift:100000:13: Func/functionAfterPoundSourceLoc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rintaro as you have suggested, a test case for #sourceLocation
is added here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this include a test that it's made absolute? Even if that's not important behavior, we probably still want to know if it changes.
fd391a0
to
91f0804
Compare
@swift-ci please smoke test |
effa580
to
776cd02
Compare
@swift-ci please smoke test |
@swift-ci Please test compiler performance |
@swift-ci Please Build Toolchain macOS Platform |
include/swift/AST/RawComment.h
Outdated
struct BasicDeclLocs { | ||
StringRef SourceFilePath; | ||
Optional<LineColumn> Loc; | ||
Optional<LineColumn> NameLoc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have any decls where Loc and NameLoc are different. Is this just to account for decls without names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have separate APIs getLoc()
and getNameLoc()
for ValueDecl
, so they could potentially be different. IMO, we shouldn't hardcode that they are usually identical to the format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll flip it around: there are zero cases where getLocFromSource()
does not return getNameLoc()
if there is a getNameLoc()
, and we could put that in the AST verifier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A storm of comments on just the first commit, sorry.
include/swift/AST/USRGeneration.h
Outdated
@@ -50,6 +51,10 @@ bool printAccessorUSR(const AbstractStorageDecl *D, AccessorKind AccKind, | |||
/// \returns true if it failed, false on success. | |||
bool printExtensionUSR(const ExtensionDecl *ED, raw_ostream &OS); | |||
|
|||
/// Prints out the Decl USRs suitable for keys .swiftdoc and .swiftsourceinfo files. | |||
/// \returns true if it failed, false on success. | |||
bool printDeclUSRForModuleDoc(const Decl *D, raw_ostream &OS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: I don't think it makes sense to have doc stuff be the special case here. Can we rename printDeclUSR
to printValueDeclUSR
and have this just be printDeclUSR
, or printAnyDeclUSR
? (And filter out implicit decls on the caller side.)
include/swift/Frontend/Frontend.h
Outdated
std::pair<std::unique_ptr<llvm::MemoryBuffer>, | ||
std::unique_ptr<llvm::MemoryBuffer>> | ||
getInputBufferAndModuleDocBufferIfPresent(const InputFile &input); | ||
ModuleBuffers getInputBuffersIfPresent(const InputFile &input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you're here, may be worth making this Optional, rather than relying on a null ModuleBuffer as a sentinel.
lib/Frontend/Frontend.cpp
Outdated
return { | ||
std::move(*inputFileOrErr), | ||
moduleDocBuffer.hasValue() ? std::move(*moduleDocBuffer): nullptr, | ||
moduleSourceInfoBuffer.hasValue() ? std::move(*moduleSourceInfoBuffer): nullptr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: missing space before ternary operator colon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also simplify this with std::move(moduleDocBuffer).getValueOr(nullptr)
. (I think this is the correct nesting.)
lib/Frontend/Frontend.cpp
Outdated
std::string NonPrivatePath = moduleSourceInfoFilePath.str().str(); | ||
StringRef fileName = llvm::sys::path::filename(NonPrivatePath); | ||
llvm::sys::path::remove_filename(moduleSourceInfoFilePath); | ||
llvm::sys::path::append(moduleSourceInfoFilePath, "Private"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per discussion, still suggest changing this to "Project", to save "Private" for something closer in meaning to "private headers".
lib/Serialization/ModuleFile.cpp
Outdated
@@ -1222,13 +1223,272 @@ bool ModuleFile::readModuleDocIfPresent() { | |||
return true; | |||
} | |||
|
|||
class ModuleFile::BasicDeclLocTableInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Can you put all this in another file? I know the doc stuff is mixed up here but it probably shouldn't be.
lib/Serialization/SourceInfoFormat.h
Outdated
/// to make backwards-compatible changes using the LLVM bitcode format. | ||
/// | ||
/// \sa DECL_LOCS_BLOCK_ID | ||
namespace sourceinfo_block { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the DECL_LOCS block or the SOURCEINFO block?
lib/Serialization/SourceInfoFormat.h
Outdated
namespace sourceinfo_block { | ||
enum RecordKind { | ||
BASIC_DECL_LOCS = 1, | ||
DECL_USRS = 96, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the jump to 96?
lib/Serialization/SerializeDoc.cpp
Outdated
@@ -502,7 +502,390 @@ void serialization::writeDocToStream(raw_ostream &os, ModuleOrSourceFile DC, | |||
|
|||
S.writeToStream(os); | |||
} | |||
namespace { | |||
struct LineColoumn { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: Coloumn
lib/Serialization/SerializeDoc.cpp
Outdated
// EndLoc | ||
dataLength += LineColumnLen; | ||
endian::Writer writer(out, little); | ||
writer.write<uint32_t>(keyLength); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the length is constant you actually don't have to serialize it at all; returning it is sufficient.
lib/Serialization/SerializeDoc.cpp
Outdated
WRITE_LINE_COLUMN(Loc) | ||
WRITE_LINE_COLUMN(NameLoc); | ||
WRITE_LINE_COLUMN(StartLoc); | ||
WRITE_LINE_COLUMN(EndLoc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent about the semicolons here. (I liked the lambda better, honestly.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just looked over it quickly, but nothing jumped out at me to comment upon. (I'll miss @jrose-apple 's reviews!) Let me know if you want another pair of eyes to go over it in depth.
@jrose-apple The most recent commit refactored the data structure as we discussed to use only one hash-table. Could you take another look? |
073e397
to
5257bee
Compare
@swift-ci please smoke test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not going to pretend I had the attention span to read every line in detail, but it looked alright in my skim. If there's something specific you want me to examine more minutely, please let me know.
std::unique_ptr<llvm::MemoryBuffer> *ModuleBuffer, | ||
std::unique_ptr<llvm::MemoryBuffer> *ModuleDocBuffer) = 0; | ||
std::unique_ptr<llvm::MemoryBuffer> *ModuleDocBuffer, | ||
std::unique_ptr<llvm::MemoryBuffer> *ModuleSourceInfoBuffer) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These now-three files are passed together so often that I'm beginning to think we should have an abstraction to represent this grouping. ModuleFiles<StringRef> ModuleFilenames
, ModuleFiles<std::unique_ptr<llvm::MemoryBuffer> *> ModuleBuffers
.
6ab1c06
to
9ea9192
Compare
@swift-ci please smoke test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More comments on everything except SerializeDoc.cpp.
lib/Driver/Driver.cpp
Outdated
@@ -2793,7 +2793,7 @@ static void chooseModuleAuxiliaryOutputFilePath(Compilation &C, | |||
StringRef workingDirectory, | |||
CommandOutput *Output, | |||
file_types::ID fileID, | |||
bool isPrivate, | |||
bool isProject = false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: "isProject" doesn't feel as descriptive if you don't already know what this is doing. Maybe "isProjectOnlyContent" or "shouldUseProjectFolder"?
lib/Frontend/Frontend.cpp
Outdated
llvm::SmallString<128> moduleSourceInfoFilePath(input.file()); | ||
llvm::sys::path::replace_extension(moduleSourceInfoFilePath, | ||
file_types::getExtension(file_types::TY_SwiftSourceInfoFile)); | ||
std::string NonPrivatePath = moduleSourceInfoFilePath.str().str(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicks: this should be a SmallString too, and it should have a lowerCamelCase name, and I don't love that the name is described in terms of the other path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I don't think we need to do the "Project" thing for the partial modules.
lib/Serialization/ModuleFile.cpp
Outdated
@@ -1222,13 +1223,164 @@ bool ModuleFile::readModuleDocIfPresent() { | |||
return true; | |||
} | |||
|
|||
class ModuleFile::DeclUSRTableInfo { | |||
public: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: indentation
lib/Serialization/ModuleFile.cpp
Outdated
return llvm::djbHash(key, SWIFTSOURCEINFO_HASH_SEED); | ||
} | ||
|
||
static bool EqualKey(internal_key_type lhs, internal_key_type rhs) { return lhs == rhs; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: 80 cols
@@ -263,10 +263,43 @@ std::error_code SerializedModuleLoaderBase::openModuleDocFile( | |||
return std::error_code(); | |||
} | |||
|
|||
std::error_code | |||
SerializedModuleLoaderBase::openModuleSourceInfoFile(AccessPathElem ModuleID, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, here is fine, just generalize it.
lib/Serialization/SourceInfoFormat.h
Outdated
using llvm::BCVBR; | ||
|
||
/// Magic number for serialized source info files. | ||
const unsigned char SWIFTSOURCEINFO_SIGNATURE[] = { 0xD6, 0x9C, 0xB7, 0x23 }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this number come from? :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made these number up. Is there a recommended way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept: use the UTF8 bytes of 📒
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I end up using 🏎️as the magic number :)
lib/Serialization/SourceInfoFormat.h
Outdated
/// | ||
/// Increment this value when making a backwards-incompatible change, i.e. where | ||
/// an \e old compiler will \e not be able to read the new format. This should | ||
/// be rare. When incrementing this value, reset SWIFTDOC_VERSION_MINOR to 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to update the comment!
|
||
using BasicDeclLocsLayout = BCRecordLayout< | ||
BASIC_DECL_LOCS, // record ID | ||
BCBlob // an array of fixed size location data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not very extensible, but probably good enough.
…are importing a module from a Swift interface file
… in .swiftsourceinfo file For .swiftdoc file, we don't expose doc-comments for underscored symbols. But this seems to be an unnecessary constraint on .swiftsourceinfo file since we put these symbols in .swiftinterface files anyway.
After this change, we only use one single hash table for USR to USR id mapping. The basic source locations are an array of fixed length records that could be retrieved by using the USR id since each USR id is guaranteed to be associated with one basic location entry. The source file paths are refactored to a blob of 0-terminated strings. Decl locations use offset in this blob to refer to the source file path where the decl was defined.
Having both Loc and NameLoc fields seems to be redundant.
…esent to be Optional. NFC
…/Project This directory should be excluded during installation since the content is only used for local development. swiftsourceinfo file is currently emitted to this directory.
…x to access basic location records
2346edc
to
014f863
Compare
@swift-ci please smoke test |
After having this function, the serialization logic doesn't have to know where the source locations are coming from (.swift or .swiftmodule).
@swift-ci please smoke test |
@swift-ci Please smoke test compiler performance |
@swift-ci please test |
Build failed |
Build failed |
@swift-ci please test |
Talked with @jrose-apple about this, we plan to merge it for now to unblock progress. Future code review comments, if any, will be addressed by separate PRs. |
I think that this may have caused a regression on Windows: https://dev.azure.com/compnerd/windows-swift/_build/results?buildId=11097&view=logs&j=b9e62f99-1a98-5ed7-01d2-f4794231ed79&t=6116924f-95b5-5652-9734-6b3f0d96ecae&l=922 |
try to invoke a std::unique_ptr<>'s copy constructor, which is deleted. Either move the unique_ptr or pass along a nullptr. This should repair the Windows build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further comments, none of which are too serious.
openModuleSourceInfoFileIfPresent(AccessPathElem ModuleID, | ||
StringRef ModulePath, | ||
StringRef ModuleSourceInfoFileName, | ||
std::unique_ptr<llvm::MemoryBuffer> *ModuleSourceInfoBuffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Swift/LLVM style doesn't right-align parameters that are too long. Instead, give up on aligning to the open paren and double-indent from the left. (This is what clang-format will do, I believe.)
} | ||
} | ||
return true; | ||
} | ||
|
||
bool ide::printDeclUSR(const Decl *D, raw_ostream &OS) { | ||
if (D->isImplicit()) | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test doesn't really belong here; can you put it at the call site instead? We should be able to print USRs for implicit things in general.
failed = true; | ||
return None; | ||
} | ||
|
||
// FIXME: The fact that this test happens twice, for some cases, | ||
// suggests that setupInputs could use another round of refactoring. | ||
if (serialization::isSerializedAST(buffers.first->getBuffer())) { | ||
if (serialization::isSerializedAST(buffers->ModuleBuffer->getBuffer())) { | ||
PartialModules.push_back( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just PartialModules.push_back(std::move(buffers))
?
AccessPathElem ModuleID, | ||
StringRef ModulePath, | ||
StringRef ModuleSourceInfoFilename, | ||
std::unique_ptr<llvm::MemoryBuffer> *ModuleSourceInfoBuffer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about right-alignment here.
// SECOND: Inputs/def_comments.swift:28:13: Var/decl_var_2 RawComment=none BriefComment=none DocCommentAsXML=none | ||
// SECOND: Inputs/def_comments.swift:28:25: Var/decl_var_3 RawComment=none BriefComment=none DocCommentAsXML=none | ||
// SECOND: Inputs/def_comments.swift:28:25: Var/decl_var_3 RawComment=none BriefComment=none DocCommentAsXML=none | ||
// SECOND: NonExistingSource.swift:100000:13: Func/functionAfterPoundSourceLoc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this include a test that it's made absolute? Even if that's not important behavior, we probably still want to know if it changes.
<< ":" << LineAndColumn.first << ":" << LineAndColumn.second << ": "; | ||
} else { | ||
printSerializedLoc(D); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to always go through this path now.
Buffer.append(Text); | ||
Buffer.push_back('\0'); | ||
} | ||
return IndexMap[Text]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance nitpick: this will do 2-3 hash lookups of the text, and while that's not super important, you can do better with something like this:
auto iterAndIsNew = IndexMap.insert({Text, Buffer.size()});
if (iterAndIsNew.second) {
Buffer.append(Text);
Buffer.push_back('\0');
}
return (*IterAndIsNew.first).getValue();
std::unique_ptr<ModuleFile::SerializedDeclUSRTable> | ||
ModuleFile::readDeclUSRsTable(ArrayRef<uint64_t> fields, StringRef blobData) { | ||
if (fields.empty() || blobData.empty()) | ||
return nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Funky indentation here.
After setting up the .swiftsourceinfo file, this patch starts to actually serialize and de-serialize source locations for declaration. The binary format of .swiftsourceinfo currently contains these three records:
BasicDeclLocs: a hash table mapping from a USR ID to a list of basic source locations. The USR id could be retrieved from the following DeclUSRs record using an actual decl USR. The basic source locations include a file ID and the results from Decl::getLoc(), ValueDecl::getNameLoc(), Decl::getStartLoc() and Decl::getEndLoc(). The file ID could be used to retrieve the actual file name from the following SourceFilePaths record. Each location is encoded as a line:column pair.
DeclUSRS: a hash table mapping from USR to a USR ID used by location records.
SourceFilePaths: a hash table mapping from a file ID to actual file name.
BasicDeclLocs should be sufficient for most diagnostic cases. If additional source locations are needed, we could always add new source location records without breaking the backward compatibility. When de-serializing the source location from a module-imported decl, we calculate its USR, retrieve the USR ID from the DeclUSRS record, and use the USR ID to look up the basic location list in the BasicDeclLocs record.
For more details about .swiftsourceinfo file: https://forums.swift.org/t/proposal-emitting-source-information-file-during-compilation
rdar://56167685