
SymbolGraph ExtractAPI support for C and Objective-C in clang #4442

Merged

Conversation

daniel-grumberg

Cherry-pick commits from llvm.org's main branch that implement support for Symbol Graph generation using clang for C and Objective-C headers, as mentioned in https://forums.swift.org/t/extending-swift-docc-to-support-objective-c-documentation/53243. Bringing in these changes would benefit the Swift community and facilitate support in SwiftPM and similar tooling.

These changes don’t affect regular compilation and instead introduce a new action to the clang frontend and driver. Most of the changes are self-contained in the ExtractAPI library. An example clang invocation for generating a symbol graph would be:


clang -extract-api -o MyFramework.symbols.json first-header.h second-header.h

Changes that affect the core compiler infrastructure are constrained to the driver, the frontend actions infrastructure and the AST infrastructure for comment processing.


Changes to the driver consist of creating a new JobAction for the -extract-api option. The aim of this is to create a single CC1 invocation that takes all the header files provided on the command line as inputs for further processing by the frontend. These changes can be found in the following files:

  • clang/include/clang/Basic/DiagnosticDriverKinds.td
  • clang/include/clang/Driver/Action.h
  • clang/include/clang/Driver/Options.td
  • clang/include/clang/Driver/Types.def
  • clang/lib/Driver/Action.cpp
  • clang/lib/Driver/Driver.cpp
  • clang/lib/Driver/ToolChain.cpp
  • clang/lib/Driver/ToolChains/Clang.cpp
  • clang/lib/Driver/Types.cpp

In CC1 itself, the only changes that aren’t self-contained construct the appropriate new frontend action for the provided headers instead of performing a regular build. These changes can be found in the following files:

  • clang/lib/Frontend/CompilerInvocation.cpp
  • clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp
  • clang/include/clang/Frontend/FrontendActions.h
  • clang/include/clang/Frontend/FrontendOptions.h

In the AST comment processing infrastructure we introduced a new API for getting a list of comment lines with associated source locations to enable correct symbol graph generation. This new interface is only exercised by the ExtractAPI library, and is defined and implemented in the following files:

  • clang/lib/AST/RawCommentList.cpp
  • clang/include/clang/AST/RawCommentList.h


The rest of the changes are in the ExtractAPI library and implement the new frontend action for Symbol Graph generation. These are fully self-contained and will not affect the rest of the compiler.

zixu-w and others added 30 commits May 5, 2022 13:05
This is the initial commit for the clang-extract-api RFC
<https://lists.llvm.org/pipermail/cfe-dev/2021-September/068768.html>
Add a new driver option `-extract-api` and associate it with a dummy
(for now) frontend action to set up the initial structure for
incremental work.

Differential Revision: https://reviews.llvm.org/D117809
Fix a build failure where an unused private field in ExtractAPIVisitor
triggered a warning turned into error.
Add facilities for extract-api:
- Structs/classes to hold collected API information: `APIRecord`, `API`
- Structs/classes for API information:
  - `AvailabilityInfo`: aggregated availability information
  - `DeclarationFragments`: declaration fragments
    - `DeclarationFragmentsBuilder`: helper class to build declaration
      fragments for various types/declarations
  - `FunctionSignature`: function signature
- Serialization: `Serializer`
- Add output file for `ExtractAPIAction`
- Refactor `clang::RawComment::getFormattedText` to provide an
  additional `getFormattedLines` for a more detailed view of comment lines
  used for the SymbolGraph format

Add support for global records (global variables and functions)
- Add `GlobalRecord` based on `APIRecord` to store global records'
  information
- Implement `VisitVarDecl` and `VisitFunctionDecl` in `ExtractAPIVisitor` to
  collect information
- Implement serialization for global records
- Add test case for global records

Differential Revision: https://reviews.llvm.org/D119479
The clang/SymbolGraph/global_record.c test case explicitly diffs the
clang version in use, which causes failures. Fix the issue by normalizing
the `generator` field before checking the output.
Implements an APISet specific unique ptr type that has a custom deleter
that just calls the underlying APIRecord subclass destructor.
clang -extract-api should accept multiple headers and forward them to a
single CC1 instance. This change introduces a new ExtractAPIJobAction.
Currently API Extraction is done during the Precompile phase as this is
the current phase that matches the requirements the most. Adding a new
phase would need to change some logic in how phases are scheduled. If
the headers scheduled for API extraction are of different types the
driver emits a diagnostic.

Differential Revision: https://reviews.llvm.org/D121936
- The name SymbolGraph is inappropriate and confusing for the new library
  for clang-extract-api. Refactor and rename things to make it clear that
  ExtractAPI is the core functionality and SymbolGraph is one serializer
  for the API information.
- Add documentation comments to ExtractAPI classes and methods to improve
  readability and clarity of the ExtractAPI work.

Differential Revision: https://reviews.llvm.org/D122160
…uage

Change the Symbol Graph serializer for ExtractAPI to use `objective-c`
for the language name string for Objective-C, to align with clang
frontend standards.
Adds `--product-name=` flag to the clang driver. This gets forwarded to
cc1 only when we are performing an ExtractAPI action. This is used to
populate the `name` field of the module object in the generated SymbolGraph.

Differential Revision: https://reviews.llvm.org/D122141
Add support for enum records
- Add `EnumConstantRecord` and `EnumRecord` to store API information for
  enums
- Implement `VisitEnumDecl` in `ExtractAPIVisitor`
- Implement serialization for enum records and `MemberOf` relationship
- Add test case for enum records
- A few other improvements

Depends on D122160

Differential Revision: https://reviews.llvm.org/D121873
- Add `StructFieldRecord` and `StructRecord` to store API information
  for structs
- Implement `VisitRecordDecl` in `ExtractAPIVisitor`
- Implement Symbol Graph serialization for struct records.
- Add test case for struct records.

Depends on D121873

Differential Revision: https://reviews.llvm.org/D122202
Before actually executing the ExtractAPIAction, clear the
CompilationInstance's input list and replace it with a single
synthesized file that just includes (or imports in ObjC) all the inputs.

Depends on D122141

Differential Revision: https://reviews.llvm.org/D122175
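
The input-synthesis step described above can be sketched roughly as follows. This is a standalone illustration, not the code in `ExtractAPIAction`: the helper name `synthesizeInputBuffer` and its parameters are hypothetical, and the real implementation works on clang's frontend input types rather than plain strings.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of clang): builds the contents of a single
// synthesized input that pulls in every header passed on the command line.
std::string synthesizeInputBuffer(const std::vector<std::string> &Headers,
                                  bool IsObjC) {
  std::string Buffer;
  for (const std::string &Header : Headers) {
    // Objective-C inputs are imported; C inputs are included.
    Buffer += IsObjC ? "#import" : "#include";
    Buffer += " \"" + Header + "\"\n";
  }
  return Buffer;
}
```

Feeding one synthesized buffer to the frontend means all headers are parsed in a single translation unit, which matches the single CC1 invocation set up by the driver.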
Using a BumpPtrAllocator introduced memory leaks for APIRecords as they
contain a std::vector. This meant that we needed to always keep a
reference to the records in APISet and arrange for their destructor to
get called appropriately. This was further complicated by the need for
records to own sub-records as these subrecords would still need to be
allocated via the BumpPtrAllocator and the owning record would now need
to arrange for the destructor of its subrecords to be called
appropriately.

Since APIRecords contain a std::vector, adding elements to it incurs a
heap allocation regardless. Because performance isn't currently our main
priority, it makes sense to use a regular unique_ptr to keep track of
APIRecords; this way we don't need to arrange for destructors to get
called.

The BumpPtrAllocator is still used for strings such as USRs so that we
can easily de-duplicate them as necessary.

Differential Revision: https://reviews.llvm.org/D122331
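
A minimal sketch of the ownership model this commit message describes, using simplified stand-in types. The struct and member names below are illustrative and do not mirror the actual declarations in ExtractAPI/API.h; the point is only that `std::unique_ptr` runs each record's destructor, including those of sub-records, so the `std::vector` members are released without extra bookkeeping.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for an API record (illustrative only).
struct APIRecord {
  std::string Name;
  // Owns heap memory, so the record's destructor must run.
  std::vector<std::string> CommentLines;

  explicit APIRecord(std::string Name) : Name(std::move(Name)) {}
  virtual ~APIRecord() = default;
};

// A record that owns sub-records the same way.
struct EnumRecord : APIRecord {
  using APIRecord::APIRecord;
  std::vector<std::unique_ptr<APIRecord>> Constants;
};

// Stand-in for APISet: owning pointers ensure destructors are called.
class APISet {
  std::vector<std::unique_ptr<APIRecord>> Records;

public:
  template <typename RecordTy> RecordTy *add(std::string Name) {
    auto Record = std::make_unique<RecordTy>(std::move(Name));
    RecordTy *Raw = Record.get();
    Records.push_back(std::move(Record));
    return Raw;
  }

  std::size_t size() const { return Records.size(); }
};
```

With a bump allocator, by contrast, freeing the arena would release the records' storage without running their destructors, which is how the vectors inside them would leak.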
Add missing virtual method anchors for structs in ExtractAPI/API.h
Rename a local variable name to avoid potential ambiguity/conflict for
some compilers.
The current way of getting the `clang::Language` from `LangOptions` does
not handle Objective-C correctly because `clang::Language::ObjC` does
not correspond to any `LangStandard`. This patch passes the correct
`Language` from the frontend input information.

Differential Revision: https://reviews.llvm.org/D122495
Add support for Objective-C interface declarations in ExtractAPI.

Depends on D122495

Differential Revision: https://reviews.llvm.org/D122446
Add support for Objective-C protocol declarations in ExtractAPI.

Depends on D122446

Differential Revision: https://reviews.llvm.org/D122511
Make the API records a property of the action instead of the ASTVisitor
so that it can be accessed outside the AST visitation and push back
serialization to the end of the frontend action.

This will allow accessing and modifying the API records outside of the
ASTVisitor, which is a prerequisite for supporting macros.
To achieve this we hook into the preprocessor during the
ExtractAPIAction and record definitions for macros that don't get
undefined during preprocessing.
…POpts

This was triggering some build failures so removing this change for now.
Add struct level documentation for MacroDefinitionRecord.

Differential Revision: https://reviews.llvm.org/D122798
This fixes a crash that occurred when undefining a macro that had not
previously been defined. Before trying to remove a definition from
PendingMacros we first check whether the macro did indeed have a
previous definition.

Differential Revision: https://reviews.llvm.org/D123056
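
The guard described in this fix can be illustrated with a small standalone model. The `MacroTracker` class below is hypothetical and only mimics the bookkeeping; the real code hooks into clang's preprocessor callbacks and compares macro definitions rather than names.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model of the pending-macro bookkeeping (not the clang API).
class MacroTracker {
  std::vector<std::string> PendingMacros;

public:
  void macroDefined(const std::string &Name) { PendingMacros.push_back(Name); }

  void macroUndefined(const std::string &Name) {
    auto It = std::find(PendingMacros.begin(), PendingMacros.end(), Name);
    // The macro was never defined: nothing to remove, so bail out instead of
    // touching a non-existent entry.
    if (It == PendingMacros.end())
      return;
    PendingMacros.erase(It);
  }

  // Macros that were defined and never undefined; these are the ones that
  // would end up as macro definition records.
  const std::vector<std::string> &pending() const { return PendingMacros; }
};
```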
This includes:
- replacing "relationhips" with "relationships"
- emitting the "pathComponents" property on symbols
- emitting the "accessLevel" property on symbols

Differential Revision: https://reviews.llvm.org/D123045
Typedef records consist of the symbol associated with the underlying
TypedefDecl and a SymbolReference to the underlying type. Additionally
typedefs for anonymous TagTypes use the typedef'd name as the symbol
name in their respective records and USRs. As a result the declaration
fragments for the anonymous TagType are those for the associated
typedef. This means that when the user defines a typedef to a
typedef to an anonymous type, we use a reference to the anonymous TagType
itself and do not emit the typedef to the anonymous type in the
generated symbol graph, including in the type destination of further
typedef symbol records.

Differential Revision: https://reviews.llvm.org/D123019
Add (partial) support for Objective-C category records in ExtractAPI.
The current ExtractAPI collects everything for an Objective-C category,
but the category is not fully serialized in the SymbolGraphSerializer.
Categories extending external interfaces are discarded during
serialization, and categories extending known interfaces are merged (all
members surfaced) into the interfaces.

Differential Revision: https://reviews.llvm.org/D122774
There is a bug in `DeclarationFragments::appendSpace` where the space
character is added to a local copy of the last fragment.

Differential Revision: https://reviews.llvm.org/D123259
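
The class of bug described here, mutating a copy instead of the stored element, looks roughly like the sketch below. The types are simplified stand-ins and the two method names are made up for comparison; the real `DeclarationFragments::appendSpace` also considers the fragment kind.

```cpp
#include <string>
#include <vector>

// Simplified stand-ins for the ExtractAPI types (illustrative only).
struct Fragment {
  std::string Spelling;
};

struct DeclarationFragments {
  std::vector<Fragment> Fragments;

  // Buggy shape: `Last` is a copy, so the stored fragment never changes.
  void appendSpaceBuggy() {
    if (Fragments.empty())
      return;
    Fragment Last = Fragments.back();
    Last.Spelling.push_back(' ');
  }

  // Fixed shape: bind a reference so the space lands in the stored fragment.
  void appendSpaceFixed() {
    if (Fragments.empty())
      return;
    Fragment &Last = Fragments.back();
    Last.Spelling.push_back(' ');
  }
};
```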
daniel-grumberg and others added 10 commits May 5, 2022 15:32
- Split GlobalRecord into two distinct types to be able to introduce
has_function_signature type trait.
- Add has_function_signature type trait.
- Serialize function signatures as part of serializeAPIRecord for
records that are known to have a function signature.

Differential Revision: https://reviews.llvm.org/D123304
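
One way such a trait can drive serialization is sketched below. The record structs and the string output format are invented for illustration (the real serializer emits Symbol Graph JSON), and tag dispatch is just one possible mechanism; only the trait name `has_function_signature` comes from the commit message.

```cpp
#include <string>
#include <type_traits>

// Simplified stand-ins for the two record types split out of GlobalRecord.
struct GlobalVariableRecord {
  std::string Name;
};
struct GlobalFunctionRecord {
  std::string Name;
  std::string Signature;
};

// Records have no function signature unless they opt in by specialization.
template <typename RecordTy>
struct has_function_signature : std::false_type {};
template <>
struct has_function_signature<GlobalFunctionRecord> : std::true_type {};

// Tag dispatch: the std::false_type overload does nothing.
template <typename RecordTy>
void appendSignature(std::string &, const RecordTy &, std::false_type) {}
template <typename RecordTy>
void appendSignature(std::string &Out, const RecordTy &Record, std::true_type) {
  Out += ";functionSignature=" + Record.Signature;
}

// Shared serialization path; the signature is emitted only for record types
// known at compile time to carry one.
template <typename RecordTy>
std::string serializeRecord(const RecordTy &Record) {
  std::string Out = "name=" + Record.Name;
  appendSignature(Out, Record, has_function_signature<RecordTy>{});
  return Out;
}
```

Because the choice is made at compile time, `Record.Signature` is never referenced for a type that lacks the member.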
Fix path replacement in sed (properly this time) using lit
regex_replacement.

Differential Revision: https://reviews.llvm.org/D123526

Co-authored-by: Michele Scandale <[email protected]>
Co-authored-by: Zixu Wang <[email protected]>
Anonymous enums without a typedef should have a "(anonymous)" identifier.

Differential Revision: https://reviews.llvm.org/D123533
Fix one test (enum.c) in ExtractAPI to use %clang_cc1 and -verify
instead of calling the full driver and FileCheck. This is an example for
my comment from https://reviews.llvm.org/D121873.

Differential Revision: https://reviews.llvm.org/D124634
This patch transforms the given input headers to relative include names
using header search entries and some heuristics.
For example: `/Path/To/Header.h` will be included as `<Header.h>` with a
search path of `-I /Path/To/`; and
`/Path/To/Framework.framework/Headers/Header.h` will be included as
`<Framework/Header.h>`, given a search path of `-F /Path/To`.
Headermaps will also be queried in reverse to find a spelled name to
include headers.

Differential Revision: https://reviews.llvm.org/D123831
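
The two path examples in this commit message can be reproduced with a small string-based sketch. The helper `relativeIncludeName` is hypothetical and far simpler than the actual patch: it handles a single search directory, assumes `/` separators, and does not model headermap lookups.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the angled include spelling for HeaderPath
// relative to SearchDir, or an empty string if no spelling can be derived.
// IsFramework selects -F semantics (Name.framework/Headers/...) over -I.
std::string relativeIncludeName(const std::string &HeaderPath,
                                std::string SearchDir, bool IsFramework) {
  if (!SearchDir.empty() && SearchDir.back() != '/')
    SearchDir += '/';
  // The header must live under the search directory.
  if (HeaderPath.compare(0, SearchDir.size(), SearchDir) != 0)
    return "";
  const std::string Rest = HeaderPath.substr(SearchDir.size());
  if (!IsFramework)
    return "<" + Rest + ">";

  // Framework layout: <Name>.framework/Headers/<File>
  const std::string Marker = ".framework/Headers/";
  const std::size_t Pos = Rest.find(Marker);
  if (Pos == std::string::npos || Pos == 0)
    return "";
  return "<" + Rest.substr(0, Pos) + "/" + Rest.substr(Pos + Marker.size()) +
         ">";
}
```

For instance, `relativeIncludeName("/Path/To/Header.h", "/Path/To/", false)` yields `<Header.h>`, matching the `-I /Path/To/` case above.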
4c262fe accidentally added local
unfinished test case clang/test/Index/annotate-comments-enum-constant.c
This patch removes it.
This reverts commit 4c262fe.
Revert to fix Msan and Asan errors.
Reapply the change after fixing sanitizer errors.
The original problem was that `StringRef`s in `Matches` are pointing to
temporary local `std::string`s created by `path::convert_to_slash` in
the regex match call. This patch does the conversion up front in
container `FilePath`.

This reverts commit 2966f0f.

Differential Revision: https://reviews.llvm.org/D124964
@daniel-grumberg
Author

@swift-ci please test

@daniel-grumberg
Author

@swift-ci Please Build Toolchain macOS Platform

@zixu-w
zixu-w commented May 5, 2022

Failed test: https://github.com/apple/swift/blob/main/test/SourceKit/CursorInfo/cursor_symbol_graph_objc.swift#L243-L278
This is caused by the new clang::RawComment::getFormattedLines, which tries to fix the problem where an extra newline is added at the end of a block comment.

zixu-w added a commit to zixu-w/swift that referenced this pull request May 6, 2022
Pull request swiftlang/llvm-project#4442 brings in a change to
`RawComment::getFormattedText` that removes spurious new lines
and whitespaces at the end of block comments. It breaks the
`cursor_symbol_graph_objc` test which is assuming the old behavior.
Temporarily disable the relevant check lines in the test to merge the
llvm change, and then fix the test properly and switch to the new
`getFormattedLines` in SymbolGraphGen.
@bnbarham
bnbarham commented May 7, 2022

You'll want to grab zixu-w/swift@1ed123e on swift release/5.7 to avoid spurious errors there.

zixu-w added a commit to zixu-w/swift that referenced this pull request May 7, 2022
Pull request swiftlang/llvm-project#4442 brings in a change to
`RawComment::getFormattedText` that removes spurious new lines
and whitespaces at the end of block comments. It breaks the
`cursor_symbol_graph_objc` test which is assuming the old behavior.
Temporarily disable the relevant check lines in the test to merge the
llvm change, and then fix the test properly and switch to the new
`getFormattedLines` in SymbolGraphGen.

(cherry picked from commit 1ed123e)
@daniel-grumberg
Author

@swift-ci please test

@daniel-grumberg
Author

@swift-ci please test

@ributzka ributzka left a comment

This is an isolated change, so I am good with taking it.

@tkremenek tkremenek merged commit 9fde71b into swiftlang:swift/release/5.7 May 17, 2022
8 participants