Skip to content

GSB: New algorithm for identifying redundant non-same-type requirements #36109

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

slavapestov
Copy link
Contributor

@slavapestov slavapestov commented Feb 23, 2021

Generic signature minimization needs to diagnose and remove any redundant requirements, that is, requirements that can be proven from some subset of the remaining requirements.

For each requirement on an equivalence class, we record a set of RequirementSources; a RequirementSource is a "proof" that the equivalence class satisfies the requirement.

A RequirementSource is either "explicit" -- meaning it corresponds to a generic requirement written by the user -- or it is "derived", meaning it can be proven from some other explicit requirement.

The most naive formulation of the minimization problem is that we say that an explicit requirement is redundant if there is a derived source for the same requirement. However, this is not sufficient, for example:

protocol P {
  associatedtype A : P where A.A == Self
}

In the signature , the explicit requirement T : P also has a derived source T.A.A : P. However, this source is "self-derived", meaning that in order to obtain the witness table for T.A.A : P, we first have to obtain the witness table for T.A.

The GenericSignatureBuilder handled this kind of 'self-derived' requirement correctly, by removing it from the list of sources. This was implemented in the removeSelfDerived() function.

After removeSelfDerived() was called, any remaining derived requirement sources were assumed to obsolete any explicit source for the same requirement.

However, even this was not sufficient -- namely, it only handled the case where a explicit requirement would imply a derived source for itself, and not a cycle involving multiple explicit sources that would imply each other.

For example, the following generic signature would be misdiagnosed with both conformance requirements as redundant, resulting in an invalid generic signature:

protocol P {
  associatedtype T : P
}

func f<T : P, U : P>(_: T, _: U) where T.X == U, U.X == T {}

In the above example, T : P has an explicit requirement source, as well as a derived source (U : P)(U.X : P). Similarly, U : P has an explicit requirement source, as well as a derived source (T : P)(T.X : P). Since neither of the derived sources were "self-derived" according to our definition, we would diagnose both explicit sources as redundant. But of course, after dropping them, we are left with the following generic signature:

func f<T, U>(_: T, _: U) where T.X == U, U.X == T {}

This is no longer valid -- since neither T nor U have a conformance requirement, the nested types T.X and U.X referenced from our same-type requirements are no longer valid.

The new algorithm abandons the "self-derived" concept. Instead, we build a directed graph where the vertices are explicit requirements, and the edges are implications where one explicit requirement implies another. In the above example, each of the explicit conformance requirements implies the other. This means a correct minimization must pick exactly one of the two -- not zero, and not both.

The set of minimized requirements is formally defined as the minimum set of requirements whose transitive closure is the entire graph.

We compute this set by first building the graph of strongly connected components using Tarjan's algorithm. The graph of SCCs is a directed acyclic graph, which means we can compute the root set of the DAG. Finally, we pick a suitable representative requirement from each SCC in the root set, using the lexshort order on subject types.

Fixes rdar://problem/rdar62903491.

Next steps

This PR drops in the new algorithm and uses it in a couple of places without refactoring much else. The old removeSelfDerived() code is still part of some load-bearing hacks in computing conformance access paths and minimizing same-type requirements. There will be more code deleted in this area eventually.

@slavapestov slavapestov force-pushed the scc-redundant-requirements-algorithm branch 8 times, most recently from 5809f69 to 0d90f50 Compare March 1, 2021 05:13
@slavapestov slavapestov changed the title Scc redundant requirements algorithm GSB: New algorithm for identifying redundant requirements Mar 1, 2021
@slavapestov slavapestov marked this pull request as ready for review March 1, 2021 05:14
@slavapestov
Copy link
Contributor Author

Thank you to @ornata for suggesting the idea of using the strongly-connected-components DAG to compute the set of minimized requirements.

@swift-ci

This comment has been minimized.

@swift-ci

This comment has been minimized.

Generic signature minimization needs to diagnose and remove any
redundant requirements, that is, requirements that can be proven
from some subset of the remaining requirements.

For each requirement on an equivalence class, we record a set of
RequirementSources; a RequirementSource is a "proof" that the
equivalence class satisfies the requirement.

A RequirementSource is either "explicit" -- meaning it
corresponds to a generic requirement written by the user -- or it
is "derived", meaning it can be proven from some other explicit
requirement.

The most naive formulation of the minimization problem is that
we say that an explicit requirement is redundant if there is a
derived source for the same requirement. However, this is not
sufficient, for example:

protocol P {
  associatedtype A : P where A.A == Self
}

In the signature <T where T : P>, the explicit requirement
T : P also has a derived source T.A.A : P. However, this source
is "self-derived", meaning that in order to obtain the
witness table for T.A.A : P, we first have to obtain the witness
table for T.A.

The GenericSignatureBuilder handled this kind of 'self-derived'
requirement correctly, by removing it from the list of sources.
This was implemented in the removeSelfDerived() function.

After removeSelfDerived() was called, any remaining derived
requirement sources were assumed to obsolete any explicit
source for the same requirement.

However, even this was not sufficient -- namely, it only handled
the case where a explicit requirement would imply a derived
source for itself, and not a cycle involving multiple explicit
sources that would imply each other.

For example, the following generic signature would be misdiagnosed
with *both* conformance requirements as redundant, resulting in
an invalid generic signature:

protocol P {
  associatedtype T : P
}

func f<T : P, U : P>(_: T, _: U) where T.X == U, U.X == T {}

In the above example, T : P has an explicit requirement source,
as well as a derived source (U : P)(U.X : P). Similarly, U : P
has an explicit requirement source, as well as a derived source
(T : P)(T.X : P). Since neither of the derived sources were
"self-derived" according to our definition, we would diagnose
*both* explicit sources as redundant. But of course, after
dropping them, we are left with the following generic signature:

func f<T, U>(_: T, _: U) where T.X == U, U.X == T {}

This is no longer valid -- since neither T nor U have a conformance
requirement, the nested types T.X and U.X referenced from our
same-type requirements are no longer valid.

The new algorithm abandons the "self-derived" concept. Instead,
we build a directed graph where the vertices are explicit
requirements, and the edges are implications where one explicit
requirement implies another. In the above example, each of the
explicit conformance requirements implies the other. This means
a correct minimization must pick exactly one of the two -- not
zero, and not both.

The set of minimized requirements is formally defined as the
minimum set of requirements whose transitive closure is the
entire graph.

We compute this set by first building the graph of strongly
connected components using Tarjan's algorithm. The graph of SCCs
is a directed acyclic graph, which means we can compute the root
set of the DAG. Finally, we pick a suitable representative
requirement from each SCC in the root set, using the lexshort
order on subject types.

This commit implements the new algorithm, runs it on each generic
signature and asserts some properties about the results, but
doesn't actually use the algorithm for computing the minimized
signature or diagnosing redundancies; that will come in the next
commit.
@slavapestov slavapestov force-pushed the scc-redundant-requirements-algorithm branch 3 times, most recently from ca473dc to f30af84 Compare March 1, 2021 22:26
@slavapestov
Copy link
Contributor Author

Source compatibility already passed and the updated PR only adds some tests.

@slavapestov slavapestov changed the title GSB: New algorithm for identifying redundant requirements GSB: New algorithm for identifying redundant non-same-type requirements Mar 1, 2021
@slavapestov slavapestov force-pushed the scc-redundant-requirements-algorithm branch from f30af84 to 0be55c1 Compare March 1, 2021 22:50
@slavapestov
Copy link
Contributor Author

@swift-ci Please smoke test

@slavapestov slavapestov merged commit 85bbf70 into swiftlang:main Mar 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants