Commit bd22d18

Update base for Update on "[Executorch][llama] Make RoPE freq calculation broadcast for per head"
This is a workaround, and may not even be worth landing, to avoid broadcasting semantics in the mul op (and, for that matter, any binary op). The current implementation of the optimized ops doesn't handle broadcasting and falls back to the portable op implementation.

This diff also fixes an issue where (as seen in llama) the two tensors of a binary op are not actually broadcasting, but have a different number of dims, which results in invocation of the unoptimized path, e.g. a = [1, 1, 2048], b = [2048], out = [1, 1, 2048]. In the llama case this hits the optimized path when generating one token at a time, but not during pre-fill.

Making the optimized op handle broadcasting, with vectorization support, is not hard, but may take some time.

Differential Revision: [D54766067](https://our.internmc.facebook.com/intern/diff/D54766067/)

[ghstack-poisoned]
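For illustration, a minimal PyTorch sketch (not part of this diff) of the shape case described above: the two operands have different numbers of dims, but since the trailing dims match and the leading dims are all 1, no real broadcasting happens, and a flat elementwise kernel would suffice.

```python
import torch

# Shapes from the llama example above: same element count, different
# number of dims, no actual data replication needed to "broadcast".
a = torch.randn(1, 1, 2048)
b = torch.randn(2048)

out = a * b  # broadcasts b to [1, 1, 2048], result shape [1, 1, 2048]

# Because b's trailing dim matches a's and a's leading dims are 1s,
# the multiply is purely elementwise: it equals a flat 2048-element
# mul, which a non-broadcasting optimized kernel could handle directly.
assert torch.equal(out, (a.reshape(-1) * b).reshape(1, 1, 2048))
```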
1 parent: 9c38abd

File tree: 0 files changed (+0, -0)
