Skip to content

[libSyntax] Add a unsafe but fast SyntaxDataRef version of SyntaxData #36250

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

ahoppen
Copy link
Member

@ahoppen ahoppen commented Mar 3, 2021

In contrast to SyntaxData, SyntaxDataRef is not memory-safe, but designed to be fast. In particular, the following guarantees from SyntaxData are being dropped:

  • SyntaxDataRef does not retain the SyntaxArena containing its RawSyntax. The user of SyntaxDataRef has to make sure the arena stays alive while the SyntaxDataRef is being used. However, that's usually pretty easily done by just retaining the SyntaxArena of the tree's root node.
  • The parent of a SyntaxDataRef must outlive the child node. This is the more tricky constraint, but if a tree is just walked top to bottom with nodes stored on the stack, this is given by the way the stack is being unrolled.

ahoppen added 2 commits March 3, 2021 08:48
Instead of having a heap-allocated RefCountedBox to store a SyntaxData's
parent, reference-count SyntaxData itself. This has a couple of
advantages:
 - When passing SyntaxData around, only a pointer needs to be passed
   instead of the entire struct contents. This is faster.
 - We can later introduce a SyntaxDataRef, which behaves similar to
   SyntaxData, but delegates the responsibility that the parent stays
   alive to the user. While sacrificing guaranteed memory safety, this
   means that SyntaxData can then be stack-allocated without any
   ref-counting overhead.
In contrast to SyntaxData, SyntaxDataRef is not memory-safe, but
designed to be fast. In particular, the following guarantees from
SyntaxData are being dropped:
 - SyntaxDataRef does not retain the SyntaxArena containing its
   RawSyntax. The user of SyntaxDataRef has to provide that guarantee.
   However, that's usually pretty easily done by just retaining the
   SyntaxArena of the tree's root node.
 - The parent of a SyntaxDataRef must outlive the child node. This is
   the more tricky constraint, but if a tree is just walked top to
   bottom with nodes stored on the stack, this is given by the way the
   stack is being unrolled.
@ahoppen
Copy link
Member Author

ahoppen commented Mar 3, 2021

@swift-ci Please smoke test

@rintaro
Copy link
Member

rintaro commented Mar 3, 2021

I feel this class hierarchy is safer. WDYT?

class SyntaxDataBase {
  const AbsoluteRawSyntax AbsoluteRaw;

protected:
  SyntaxDataBase(const AbsoluteRawSyntax & AbsoluteRaw)

public:
  // ... accessors to 'AbsoluteRaw'
}

class SyntaxDataRef: public SyntaxDataBase {
  const SyntaxDataRef *Parent;

  SyntaxDataRef(const AbsoluteRawSyntax &AbsoluteRaw, const SyntaxDataRef *Parent)

public:
  const SyntaxDataRef *getParent() const;
}

class SyntaxData: public llvm::ThreadSafeRefCountedBase<SyntaxData>, public SyntaxDataBase {
  RC<const SyntaxData> Parent;
  RC<SyntaxArena> Arena;

  SyntaxData(const AbsoluteRawSyntax &AbsoluteRaw, const RC<const SyntaxData> &Parent)
  SyntaxData(const AbsoluteRawSyntax &AbsoluteRaw)
public:
  RC<const SyntaxData> getParent() const;
}

@ahoppen
Copy link
Member Author

ahoppen commented Mar 3, 2021

The beauty of the the current design is that upcasting SyntaxData to SyntaxDataRef is zero-cost.

Foreshadowing to some future PRs, I’ll have a Syntax hierarchy (backed by SyntaxData) and a SyntaxRef hierarchy (backed by SyntaxDataRef). Demoting a Syntax node to a SyntaxRef node is essentially zero-cost in the current design. In you variant, we’d need to copy the SyntaxData (in particular its AbsoluteRawSyntax) into a SyntaxDataRef. At the moment, I’m not doing these kind of demotions very much (or even at all), but I think it’s a nice property to have.

I agree that your design is definitely safer, but would argue that the unsafe bits are pretty localized and I wouldn’t like to loose the property described above. Do you agree or would you argue in favor of safety?

@rintaro
Copy link
Member

rintaro commented Mar 4, 2021

Zero cost demotion is attractive. But I'm still not convinced that's worth it.
Could you explain your plan for getting children from SyntaxDataRef? How the APIs will look like?

@ahoppen
Copy link
Member Author

ahoppen commented Mar 4, 2021

Zero cost demotion is attractive. But I'm still not convinced that's worth it.

It’s a fair point. Especially since we’re not using it (and I don’t see that we need to for the parsing parsing). Maybe we should go with your safer approach for now and migrate to this if it turns out that SyntaxData -> SyntaxDataRef demotion is actually a performance concern.

Could you explain your plan for getting children from SyntaxDataRef? How the APIs will look like?
Copy-pasting from the next PR (which I haven’t posted yet).

/// If \c this node has a child at \p Index, write the child's \c
/// SyntaxDataRef to \p DataMem and return \c true. If no child exists at \p
/// Index, leave \p DataMem untouched and return \c false.
bool getChildRef(AbsoluteSyntaxPosition::IndexInParentType Index,
                 SyntaxDataRef *DataMem) const {
  auto AbsoluteRaw = getAbsoluteRaw().getChild(Index);
  if (AbsoluteRaw) {
    new (DataMem) SyntaxDataRef(*AbsoluteRaw,
                                /*Parent=*/const_cast<SyntaxDataRef *>(this));
    return true;
  } else {
    return false;
  }
}

@ahoppen
Copy link
Member Author

ahoppen commented Mar 4, 2021

Thinking about it some more, I think the current design has two issues (please correct me if you see more drawbacks @rintaro):

  1. We need to manually ref-count Parent
  2. All the issues you discovered around IsRef.

I’m personally not too fuzzed about the manual ref-counting of Parent. I’ve looked up where llvm::IntrusiveRefCntPtr issues retain/release calls and just copied them over, so we should be safe here.

And I think we can eliminate all the issues around IsRef by making isRef a virtual method (only available in assert builds) which returns true for SyntaxDataRef and false for SyntaxData.

With this, I think I would like to keep the zero-cost demotion around as an option and stick with the current design.

@ahoppen
Copy link
Member Author

ahoppen commented Mar 4, 2021

@swift-ci Please smoke test

Optional<SyntaxData> getPreviousNode() const;
#ifndef NDEBUG
virtual bool isRef() const { return true; }
virtual ~SyntaxDataRef() {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a dumb C++ question. Could you explain why this is in #ifndef?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it’s #ifndef NDEBUG everywhere else as well. I always assumed this was to make the default a debug build.
If you use #ifdef DEBUG, the default is a non-debug build unless you define DEBUG.

Copy link
Member

@rintaro rintaro Mar 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I wasn't clear. I meant why the destructor must be guarded by the #ifndef?

/// Get the child at \p Index if it exists. If the node does not have a child
/// at \p Index, return \c None. Asserts that \p Index < \c NumChildren
Optional<AbsoluteRawSyntax>
getChild(AbsoluteSyntaxPosition::IndexInParentType Index) const {
Copy link
Member

@rintaro rintaro Mar 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return isNull() AbsoluteRawSyntax instead of Optional? If not, what the purpose of isNull()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of the isNull() AbsoluteRawSyntax is so we can construct a null SyntaxDataRef (which has a AbsoluteRawSyntax member). We need this for the getChildRef method, I already described here, which is used as follows:

SyntaxRef result;
someNode.getChildRef(1, result);

Technically, result just needs to be uninitialized memory. If you know how to create uninitialized stack memory, we can remove isNull altogether.

I though about returning a null AbsoluteRawSyntax here, but decided against it, because I expect there will be a whole category of bugs where we would be accessing an optional child without check if it is null.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m looking into whether we can make Optional<AbsoluteRawSyntax> a zero-cost wrapper around AbsoluteRawSyntax whose null value is internally backed by isNull() and make isNull() private. Then we would have the compile time safety of Optional but the runtime behavior as if we returned an isNull() AbsoluteRawSyntax.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the isNull function and made Optional<AbsoluteRawSyntax> a zero-cost wrapper. Since the change wasn't trivial (~250 LOC), I created a follow-up PR for it here.

@ahoppen
Copy link
Member Author

ahoppen commented Mar 9, 2021

Merged as part of #36313.

@ahoppen ahoppen closed this Mar 9, 2021
@ahoppen ahoppen deleted the pr/syntaxdataref branch March 9, 2021 08:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants