-
Notifications
You must be signed in to change notification settings - Fork 12.2k
Add hipGraph and VMM support to ROCM #11362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Since you are looking into this, have you considered enabling the VMM pool allocator in HIP? I believe that the APIs necessary are supported, it was just never enabled because nobody with the ability to test it tried it yet. It could reduce memory usage. |
Sure ill take a look |
Looks like there is a bug in rocm VMM when runing a non amdhsa kfd enabled kernel and the rocm runtime is falling back to the regular drm interface on the mainline kernel. As i suspect that using the mainline kernel is pretty common among ggml users i wont ammend this pr to also add vmm support while i report the issue to amd. |
6a4a87e
to
ed69500
Compare
@slaren i think avoiding the workaround on cuda makes the code more complicated for no real gain, but i have done so. I have also fixed the coding style to the best of my knowledge but unfortunately i cant use clang-format as https://github.com/ggerganov/llama.cpp/blob/master/.clang-format contains a style that conflicts with the one used in this file results in a huge amount of changes. |
Thanks, I know that the change is unlikely to affect CUDA negatively, but I would rather not risk it. Some tools allow formatting only a selection with |
Even then, clang-format produces very a different codeing style to what is used in this file. here is what it wants to do with the snippet in this pr:
i gues thats an issue report in of itself. |
* Add hipGraph support * Enable VMM on rocm
* Add hipGraph support * Enable VMM on rocm
* Add hipGraph support * Enable VMM on rocm
* Add hipGraph support * Enable VMM on rocm
This adds, disabled by default, hipGraph support. Essentially this just involves adding the relevant hip defines to ggml-cuda/vendors/hip.h
Currently is seams that hipGraph dosent improve performance at all. Looking at rocprof it seams that launching the kernels this way gains no decrease in overhead, while building the graph adds overhead. Presumably since this api was recently added to rocm and is still marked as beta (https://rocmdocs.amd.com/projects/HIP/en/latest/reference/hip_runtime_api/modules/graph_management.html) It has not been tuned for performance.
I still think its useful to have this since in the future this will likely change, and maybe on some hw configs it already helps right now.
This also enables VMM support on ROCM. There seams to be a slight bug in ROCR, that i added a workaround for, this should be harmless to cuda.