Offloading tensors based on total VRAM budget and offloading policy (ggml-org#6)
* deprecate ffn_b
* get tensor offloading levels
* wip: split tensor loading
* wip: framework of loading sparse model tensors
* save and flush GPU alloc buffer
* VRAM budget will fall back to remaining free memory
* minor: remove VRAM safety margin
* add options for VRAM budget; clean old env vars
* minor: bugfix