Add hipGraph and VMM support to ROCM #11362

Merged · 2 commits into ggml-org:master from hip_graph · Jan 24, 2025

Conversation

IMbackK (Collaborator) commented Jan 22, 2025

This adds hipGraph support, disabled by default. Essentially this just involves adding the relevant HIP defines to ggml-cuda/vendors/hip.h.
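
A minimal sketch of the kind of aliases this means, in the style of the existing ggml-cuda/vendors/hip.h defines (illustrative only; the exact set in the PR may differ):

// Map the CUDA graph API names used by ggml-cuda onto their HIP equivalents.
#define cudaGraph_t                  hipGraph_t
#define cudaGraphExec_t              hipGraphExec_t
#define cudaGraphNode_t              hipGraphNode_t
#define cudaStreamBeginCapture       hipStreamBeginCapture
#define cudaStreamEndCapture         hipStreamEndCapture
#define cudaGraphInstantiate         hipGraphInstantiate
#define cudaGraphLaunch              hipGraphLaunch
#define cudaGraphExecUpdate          hipGraphExecUpdate
#define cudaGraphExecDestroy         hipGraphExecDestroy
#define cudaGraphDestroy             hipGraphDestroy
#define cudaStreamCaptureModeRelaxed hipStreamCaptureModeRelaxed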

Currently it seems that hipGraph doesn't improve performance at all. Looking at rocprof, it seems that launching the kernels this way yields no decrease in overhead, while building the graph adds overhead. Presumably, since this API was only recently added to ROCm and is still marked as beta (https://rocmdocs.amd.com/projects/HIP/en/latest/reference/hip_runtime_api/modules/graph_management.html), it has not been tuned for performance.
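
For reference, "launching the kernels this way" means capturing the stream into a graph once and then replaying it. A generic capture/replay sketch (not the PR's code; launch_kernels is a hypothetical helper):

// Generic CUDA/HIP graph capture-and-replay sketch. With the hip.h
// aliases above, the same cuda* names compile for ROCm.
cudaGraph_t     graph;
cudaGraphExec_t graph_exec;

CUDA_CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeRelaxed));
launch_kernels(stream); // hypothetical: enqueue the usual kernel launches
CUDA_CHECK(cudaStreamEndCapture(stream, &graph));

// Building the executable graph is the part that adds overhead...
CUDA_CHECK(cudaGraphInstantiate(&graph_exec, graph, NULL, NULL, 0));

// ...which only pays off if replaying it beats individual kernel launches.
CUDA_CHECK(cudaGraphLaunch(graph_exec, stream));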

I still think it's useful to have this, since this will likely change in the future, and maybe on some hardware configurations it already helps right now.

This also enables VMM support on ROCm. There seems to be a slight bug in ROCR that I added a workaround for; this should be harmless to CUDA.

github-actions bot added the labels Nvidia GPU (issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) on Jan 22, 2025
slaren (Member) commented Jan 22, 2025

Since you are looking into this, have you considered enabling the VMM pool allocator in HIP? I believe that the necessary APIs are supported; it was just never enabled because nobody with the ability to test it has tried it yet. It could reduce memory usage.
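
For context, the VMM pool reserves a large virtual address range once and then backs it with physical memory only as the pool grows. A rough sketch of the growth path using the CUDA driver VMM API, which HIP mirrors with hipMem* equivalents (device, chunk_size, and the pool fields are assumed from the surrounding allocator, as in the snippet later in this thread; chunk_size must be a multiple of the allocation granularity):

// Rough sketch, not the PR's code: grow a VMM pool by one chunk.
CUmemAllocationProp prop = {};
prop.type          = CU_MEM_ALLOCATION_TYPE_PINNED;
prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
prop.location.id   = device;

// Reserve a large virtual address range once, up front.
if (pool_addr == 0) {
    CU_CHECK(cuMemAddressReserve(&pool_addr, CUDA_POOL_VMM_MAX_SIZE, 0, 0, 0));
}

// Back the next part of the range with physical memory on demand.
CUmemGenericAllocationHandle handle;
CU_CHECK(cuMemCreate(&handle, chunk_size, &prop, 0));
CU_CHECK(cuMemMap(pool_addr + pool_size, chunk_size, 0, handle, 0));
CU_CHECK(cuMemRelease(handle)); // the mapping keeps the allocation alive

// Make the newly mapped chunk accessible from the device.
CUmemAccessDesc access = {};
access.location = prop.location;
access.flags    = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
CU_CHECK(cuMemSetAccess(pool_addr + pool_size, chunk_size, &access, 1));

pool_size += chunk_size;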

IMbackK (Collaborator, Author) commented Jan 22, 2025

Sure, I'll take a look.

IMbackK (Collaborator, Author) commented Jan 23, 2025

Looks like there is a bug in ROCm VMM when running on a kernel without amdhsa KFD enabled, where the ROCm runtime falls back to the regular DRM interface, as on the mainline kernel.

As I suspect that using the mainline kernel is pretty common among ggml users, I won't amend this PR to also add VMM support while I report the issue to AMD.

IMbackK changed the title from "Add hipGraph support" to "Add hipGraph and VMM support to ROCM" on Jan 23, 2025
IMbackK (Collaborator, Author) commented Jan 23, 2025

@slaren OK, so I debugged the ROCR issue, and we can simply add a minor workaround for it, so this PR now also adds VMM support (enabled by default).

IMbackK force-pushed the hip_graph branch 2 times, most recently from 6a4a87e to ed69500, on January 23, 2025 at 22:51
IMbackK (Collaborator, Author) commented Jan 24, 2025

@slaren I think avoiding the workaround on CUDA makes the code more complicated for no real gain, but I have done so.

I have also fixed the coding style to the best of my knowledge, but unfortunately I can't use clang-format, as https://github.com/ggerganov/llama.cpp/blob/master/.clang-format contains a style that conflicts with the one used in this file, which results in a huge number of changes.

slaren (Member) commented Jan 24, 2025

Thanks, I know that the change is unlikely to affect CUDA negatively, but I would rather not risk it.

Some tools allow formatting only a selection with clang-format (e.g. the VS Code clangd extension); it should not be necessary to re-format the entire file.
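
For example, clang-format itself can be limited to a line range from the command line (the path and line numbers here are illustrative):

clang-format -i --style=file --lines=100:130 ggml/src/ggml-cuda/ggml-cuda.cu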

IMbackK (Collaborator, Author) commented Jan 24, 2025

Even then, clang-format produces a very different coding style from what is used in this file. Here is what it wants to do with the snippet in this PR:

struct ggml_cuda_pool_vmm : public ggml_cuda_pool {
    static const size_t CUDA_POOL_VMM_MAX_SIZE = 1ull << 35;  // 32 GB

    int         device;
    CUdeviceptr pool_addr = 0;
    size_t      pool_used = 0;
    size_t      pool_size = 0;
    size_t      granularity;
#    if defined(GGML_USE_HIP)
    std::vector<std::pair<CUdeviceptr, size_t>> mappings;
#    endif

    explicit ggml_cuda_pool_vmm(int device) :
        device(device),
        granularity(ggml_cuda_info().devices[device].vmm_granularity) {}

    ~ggml_cuda_pool_vmm() {
        if (pool_addr != 0) {
#    if defined(GGML_USE_HIP)
            // Workaround for https://github.com/ROCm/ROCR-Runtime/issues/285
            for (std::pair<CUdeviceptr, size_t> & mapping : mappings) {
                CU_CHECK(cuMemUnmap(mapping.first, mapping.second));
            }
#    else
            CU_CHECK(cuMemUnmap(pool_addr, pool_size));
#    endif
            CU_CHECK(cuMemAddressFree(pool_addr, CUDA_POOL_VMM_MAX_SIZE));
        }
    }

I guess that's an issue report in and of itself.

slaren merged commit 5f0db95 into ggml-org:master on Jan 24, 2025 (44 checks passed)