Commit 32a40b0
Update base for Update on "Refactor attention v2"
Pull attention creation out of Transformer/TransformerBlock. Instead, pass the layers into Transformer.
The motivation is to customize the linear layers in attention for LoRA (e.g., make wq a LoraLinear instead of a regular Linear). In the next diff (D73517350), we pull wq, wk, wv, and wo out of the attention and pass those in as well.
This allows us to customize attention parameters without passing in ModelArgs and doing the customization deep inside attention.py.
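A hedged sketch of that pattern is below; the class names mirror the commit text (TransformerBlock, Transformer), but the constructor and forward signatures are illustrative assumptions, not the repository's actual API.

```python
import torch.nn as nn

# Sketch only: the attention module is built by the caller and injected,
# rather than constructed from ModelArgs inside the block.

class TransformerBlock(nn.Module):
    def __init__(self, attention: nn.Module, dim: int, hidden_dim: int):
        super().__init__()
        self.attention = attention  # injected, e.g. a LoRA-customized attention
        self.attention_norm = nn.RMSNorm(dim)
        self.ffn_norm = nn.RMSNorm(dim)
        self.feed_forward = nn.Sequential(
            nn.Linear(dim, hidden_dim, bias=False),
            nn.SiLU(),
            nn.Linear(hidden_dim, dim, bias=False),
        )

    def forward(self, x):
        x = x + self.attention(self.attention_norm(x))
        return x + self.feed_forward(self.ffn_norm(x))


class Transformer(nn.Module):
    def __init__(self, layers: nn.ModuleList, dim: int):
        super().__init__()
        # Fully constructed blocks are passed in rather than created from
        # ModelArgs inside Transformer.__init__.
        self.layers = layers
        self.norm = nn.RMSNorm(dim)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return self.norm(x)
```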
I think this modularizes our attention/transformer components, though it also means that users have to do more work to construct the attention layers and pass them to Transformer.
It follows the torchtune structure more closely, e.g. https://github.com/pytorch/torchtune/blob/main/torchtune/models/llama3_2/_component_builders.py#L221
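As an illustration of that builder style, here is a hedged sketch in which the caller swaps wq for a LoRA-wrapped linear. LoraLinear and build_lora_attention are hypothetical names, and an Attention constructor taking wq/wk/wv/wo reflects the follow-up diff described above rather than an existing signature.

```python
import torch
import torch.nn as nn

class LoraLinear(nn.Module):
    """Hypothetical LoRA wrapper: a frozen base Linear plus a trainable
    low-rank update, y = Wx + (alpha / rank) * B(A(x))."""
    def __init__(self, in_dim: int, out_dim: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim, bias=False)
        self.base.weight.requires_grad_(False)  # base weights stay frozen
        self.lora_a = nn.Linear(in_dim, rank, bias=False)
        self.lora_b = nn.Linear(rank, out_dim, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # start as a no-op update
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))


def build_lora_attention(attention_cls, dim, n_heads, n_kv_heads, head_dim):
    # The caller decides which projections get LoRA and which stay plain
    # Linear, then hands the finished attention module to TransformerBlock,
    # instead of attention.py making that choice from ModelArgs.
    wq = LoraLinear(dim, n_heads * head_dim)
    wk = nn.Linear(dim, n_kv_heads * head_dim, bias=False)
    wv = nn.Linear(dim, n_kv_heads * head_dim, bias=False)
    wo = nn.Linear(n_heads * head_dim, dim, bias=False)
    return attention_cls(wq=wq, wk=wk, wv=wv, wo=wo, n_heads=n_heads,
                         n_kv_heads=n_kv_heads, head_dim=head_dim)
```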
Differential Revision: [D73538697](https://our.internmc.facebook.com/intern/diff/D73538697/)
[ghstack-poisoned]