Search graph for quantization parameters #6690
Merged
(This change was merged but had to be reverted due to type checking errors, see #6452)
Generalizes the search for quantization parameters. The idea is to treat a graph like this as (part of) a valid quantized graph:
dq -> view -> transpose -> some_op
dq ------> expand -----------/^
For a subset of operations ('passable_op') that don't modify the values of their tensors, the search for qparams is allowed to "pass through" the op. If multiple qparams are encountered in one search, they are asserted to be equal.
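As a rough illustration of the search described above, here is a minimal sketch on a toy graph. It is not the ArmBackend implementation: the `Node` class, the `PASSABLE_OPS` set, and `search_qparams` are hypothetical names made up for this example, and qparams are simplified to a `(scale, zero_point)` tuple.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Simplified qparams for illustration: (scale, zero_point).
QParams = Tuple[float, int]

# Hypothetical set of ops that do not change tensor values.
PASSABLE_OPS = {"view", "transpose", "expand"}


@dataclass
class Node:
    """Toy graph node; a stand-in for a real graph node type."""
    op: str
    inputs: List["Node"] = field(default_factory=list)
    qparams: Optional[QParams] = None  # only set on dq nodes


def search_qparams(node: Node) -> Optional[QParams]:
    """Walk upstream from `node`, passing through passable ops, until a dq is found."""
    if node.op == "dq":
        return node.qparams
    if node.op not in PASSABLE_OPS:
        # Not a dq and not passable: the search stops without a result.
        return None
    found: Optional[QParams] = None
    for inp in node.inputs:
        q = search_qparams(inp)
        if q is None:
            continue
        # Multiple qparams found in one search must agree.
        assert found is None or found == q, "conflicting qparams in one search"
        found = q
    return found


# Rebuild the graph from the diagram above.
dq_a = Node("dq", qparams=(0.5, 0))
view = Node("view", [dq_a])
transpose = Node("transpose", [view])
dq_b = Node("dq", qparams=(0.25, 3))
expand = Node("expand", [dq_b])
some_op = Node("some_op", [transpose, expand])

# One search per input of some_op.
input_qparams = [search_qparams(inp) for inp in some_op.inputs]
```

In this sketch each input of `some_op` gets its own search, so the two inputs may carry different qparams. The equality assertion only fires when a single search reaches more than one dq.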
The reason for this is to unify what a quantized graph looks like. This removes the need for some ad hoc exceptions that were in place before (for example, checking whether a view comes before an addmm) and in general cleans up the handling of quantization in the ArmBackend. In particular, some decompositions in the to_edge step seem not to insert q/dq ops, which breaks the dq -> op -> q pattern.
With this change, Arm internal passes can also skip inserting quantize ops in some cases, which simplifies the passes.