Commit bd22d18

Update base for Update on "[Executorch][llama] Make RoPE freq calculation broadcast for per head"
This is a workaround, and may not even be worth landing, to avoid broadcasting semantics in the mul op (and, for that matter, any binary op). The current implementation of the optimized ops doesn't handle broadcasting and falls back to the portable op implementation.

This diff also fixes an issue where (as seen in llama) the two tensors of a binary op are not actually broadcasting, but have a different number of dims, which results in invocation of the unoptimized path, e.g. a = [1, 1, 2048], b = [2048], out = [1, 1, 2048]. In the llama case this hits the optimized path when generating one token at a time, but not during pre-fill.

Making the optimized op handle broadcasting, with vectorization support, is not hard, but may take some time.

Differential Revision: [D54766067](https://our.internmc.facebook.com/intern/diff/D54766067/)

[ghstack-poisoned]
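For illustration, a minimal PyTorch sketch (not part of this diff) of the shape case described above: the two operands have different numbers of dims, but since the trailing dims match and the leading dims are all 1, no real broadcasting happens, and a flat elementwise kernel would suffice.

```python
import torch

# Shapes from the llama example above: same element count, different
# number of dims, no actual data replication needed to "broadcast".
a = torch.randn(1, 1, 2048)
b = torch.randn(2048)

out = a * b  # broadcasts b to [1, 1, 2048], result shape [1, 1, 2048]

# Because b's trailing dim matches a's and a's leading dims are 1s,
# the multiply is purely elementwise: it equals a flat 2048-element
# mul, which a non-broadcasting optimized kernel could handle directly.
assert torch.equal(out, (a.reshape(-1) * b).reshape(1, 1, 2048))
```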
1 parent: 9c38abd

File tree: 0 files changed (+0, -0)
