Commit 3201e83

Update base for Update on "[ET-VK] Migrate ops to use `DynamicDispatchNode`"
## Changes
* Migrate operators that are used in the llama model to use `DynamicDispatchNode` instead of `DispatchNode`
## Motivation
`DynamicDispatchNode` is a subclass of `DispatchNode` that allows the compute shader and the global and local work group sizes to be selected dynamically each time the command buffer is encoded. This is critical for achieving optimal performance when input shapes are dynamic, since it lets operators select the best compute shader for the current input conditions and adjust the global work group size to launch the minimum number of work groups necessary.
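Conceptually, the difference can be sketched as follows. This is a minimal, self-contained illustration rather than the actual ET-VK API: the `TensorSizes`, `StaticDispatch`, `DynamicDispatch`, `pick_shader`, and `pick_global_wg` names are hypothetical stand-ins for the callbacks a `DynamicDispatchNode` resolves at encode time.

```cpp
// Illustrative sketch only; the types and callbacks below are
// hypothetical stand-ins, not the real ET-VK classes.
#include <array>
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>

using WorkgroupSize = std::array<uint32_t, 3>;

struct TensorSizes {
  uint32_t batch, seq_len, dim;  // current (not maximum) sizes
};

// A DispatchNode-like record: the shader and work group sizes are
// fixed at graph build time, based on maximum tensor sizes.
struct StaticDispatch {
  std::string shader;
  WorkgroupSize global_wg;
};

// A DynamicDispatchNode-like record: the callbacks run every time the
// command buffer is encoded, so they see the *current* tensor sizes.
struct DynamicDispatch {
  std::function<std::string(const TensorSizes&)> pick_shader;
  std::function<WorkgroupSize(const TensorSizes&)> pick_global_wg;

  void encode(const TensorSizes& sizes) const {
    const std::string shader = pick_shader(sizes);
    const WorkgroupSize wg = pick_global_wg(sizes);
    std::cout << "dispatch " << shader << " with global wg (" << wg[0]
              << ", " << wg[1] << ", " << wg[2] << ")\n";
  }
};

int main() {
  DynamicDispatch matmul{
      // Pick a specialized shader for single-token decode, and a
      // general one for prefill.
      [](const TensorSizes& s) {
        return s.seq_len == 1 ? std::string("matmul_gemv")
                              : std::string("matmul_gemm");
      },
      // Size the global work group from the current sizes, so only as
      // many invocations as the active elements require are launched.
      [](const TensorSizes& s) {
        return WorkgroupSize{s.dim, s.seq_len, s.batch};
      }};

  matmul.encode({1, 128, 2048});  // prefill: 128 tokens
  matmul.encode({1, 1, 2048});    // decode: 1 token
  return 0;
}
```

Because the selection happens at encode time rather than build time, a single node can switch shaders and shrink its dispatch as the sequence grows or shrinks, instead of always dispatching for the worst case.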
Without this change, performance of llama 3.2 1B with dynamic shapes enabled is terrible (< 1 tok/s), because global work group sizes are derived from maximum tensor sizes, which in turn are derived from the maximum sequence length. In practice, the sequence length dimension of tensors will not approach this maximum, even during the prefill phase, so each compute shader dispatch launches a large number of inactive threads.
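To make the waste concrete with assumed numbers: if the maximum sequence length is 2048, a 128-token prefill sized against the maximum launches 2048 / 128 = 16x the needed invocations along the sequence dimension (about 94% of threads idle), and a single-token decode step leaves 2047 / 2048, roughly 99.95%, of launched threads inactive.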
Differential Revision: [D75878398](https://our.internmc.facebook.com/intern/diff/D75878398/)
[ghstack-poisoned]
"1 parent a155072 commit 3201e83Copy full SHA for 3201e83
File tree
Expand file treeCollapse file tree
0 file changed
+0
-0
lines changedFilter options
Expand file treeCollapse file tree
0 file changed
+0
-0
lines changed
0 commit comments