Update on "[ET][Memory planning] Improve greedy memory planning."
This diff replaces the old greedy algorithm, which produced plans about 35% worse than the theoretical optimum. This matters even more for long-context models, where the additional overhead can be a few hundred MB.

For example, the theoretical optimum for llama3_2 8B, a 4-bit quantized model with a context length of 2k, needs about 1 GB of memory. This theoretical peak can be observed by looking at the peaks in the memory profile. The current algorithm resulted in about 1.6 GB of planned memory; the new algorithm reduces that to about 1.1 GB.
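To make the numbers concrete, here is a minimal sketch of greedy lifetime-aware memory planning: each tensor is placed at the lowest offset in a shared arena that does not collide, in both address range and lifetime, with already-placed tensors. The `Spec` type, the size-descending order, and the lowest-offset placement policy are illustrative assumptions, not the actual ExecuTorch implementation.

```python
from dataclasses import dataclass

@dataclass
class Spec:
    size: int   # bytes
    start: int  # index of first op that uses the tensor
    end: int    # index of last op that uses the tensor

def greedy_plan(specs):
    """Place each tensor (largest first) at the lowest arena offset that
    does not overlap any previously placed tensor with an overlapping
    lifetime. Returns the total planned arena size."""
    placed = []  # list of (offset, spec)
    total = 0
    for spec in sorted(specs, key=lambda s: s.size, reverse=True):
        # Address ranges blocked by tensors alive at the same time.
        blocked = sorted(
            (off, off + p.size)
            for off, p in placed
            if not (p.end < spec.start or spec.end < p.start)
        )
        offset = 0
        for lo, hi in blocked:
            if offset + spec.size <= lo:
                break  # found a gap before this blocked range
            offset = max(offset, hi)
        placed.append((offset, spec))
        total = max(total, offset + spec.size)
    return total

# Two large tensors with disjoint lifetimes share one slot; only the
# small tensor that overlaps both needs extra space.
specs = [Spec(100, 0, 1), Spec(100, 2, 3), Spec(50, 1, 2)]
print(greedy_plan(specs))  # 150, vs 250 if every tensor got its own slot
```

Reusing address ranges across non-overlapping lifetimes is what closes the gap toward the theoretical peak, which is just the maximum total size of tensors alive at any single point in the graph.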
Differential Revision: [D68448332](https://our.internmc.facebook.com/intern/diff/D68448332/)
[ghstack-poisoned]