Skip to content

Commit 64e5f6e

Browse files
GZGavinZhaoatmouse-
authored andcommitted
ROCm: use native CMake HIP support
Supercedes ggml-org#4024 and ggml-org#4813. CMake's native HIP support has become the recommended way to add HIP code into a project (see [here](https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake)). This PR makes the following changes: 1. The environment variable `HIPCXX` or CMake option `CMAKE_HIP_COMPILER` should be used to specify the HIP compiler. Notably this shouldn't be `hipcc`, but ROCm's clang, which usually resides in `$ROCM_PATH/llvm/bin/clang`. Previously this was control by `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`. 2. CMake option `CMAKE_HIP_ARCHITECTURES` is used to control the GPU architectures to build for. Previously this was controled by `GPU_TARGETS`. 3. Updated the Nix recipe to account for these new changes. 4. The GPU targets to build against in the Nix recipe is now consistent with the supported GPU targets in nixpkgs. The most important part about this PR is the separation of the HIP compiler and the C/C++ compiler. This allows users to choose a different C/C++ compiler if desired, compared to the current situation where when building for ROCm support, everything must be compiled with ROCm's clang. ~~Makefile is unchanged. Please let me know if we want to be consistent on variables' naming because Makefile still uses `GPU_TARGETS` to control architectures to build for, but I feel like setting `CMAKE_HIP_ARCHITECTURES` is a bit awkward when you're calling `make`.~~ Makefile used `GPU_TARGETS` but the README says to use `AMDGPU_TARGETS`. For consistency with CMake, all usage of `GPU_TARGETS` in Makefile has been updated to `AMDGPU_TARGETS`. Thanks to the suggestion of @jin-eld, to maintain backwards compatibility (and not break too many downstream users' builds), if `CMAKE_CXX_COMPILER` ends with `hipcc`, then we still compile using the original behavior and emit a warning that recommends switching to the new HIP support. Similarly, if `AMDGPU_TARGETS` is set but `CMAKE_HIP_ARCHITECTURES` is not, then we forward `AMDGPU_TARGETS` to `CMAKE_HIP_ARCHITECTURES` to ease the transition to the new HIP support. Signed-off-by: Gavin Zhao <[email protected]>
1 parent 7593639 commit 64e5f6e

File tree

5 files changed

+108
-21
lines changed

5 files changed

+108
-21
lines changed

.devops/nix/package.nix

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -227,20 +227,20 @@ effectiveStdenv.mkDerivation (
227227
)
228228
]
229229
++ optionals useRocm [
230-
(cmakeFeature "CMAKE_C_COMPILER" "hipcc")
231-
(cmakeFeature "CMAKE_CXX_COMPILER" "hipcc")
232-
233-
# Build all targets supported by rocBLAS. When updating search for TARGET_LIST_ROCM
234-
# in https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/CMakeLists.txt
235-
# and select the line that matches the current nixpkgs version of rocBLAS.
236-
# Should likely use `rocmPackages.clr.gpuTargets`.
237-
"-DAMDGPU_TARGETS=gfx803;gfx900;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-;gfx940;gfx941;gfx942;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102"
230+
(cmakeFeature "CMAKE_HIP_COMPILER" "${rocmPackages.llvm.clang}/bin/clang")
231+
(cmakeFeature "CMAKE_HIP_ARCHITECTURES" (builtins.concatStringsSep ";" rocmPackages.clr.gpuTargets))
238232
]
239233
++ optionals useMetalKit [
240234
(lib.cmakeFeature "CMAKE_C_FLAGS" "-D__ARM_FEATURE_DOTPROD=1")
241235
(cmakeBool "LLAMA_METAL_EMBED_LIBRARY" (!precompileMetalShaders))
242236
];
243237

238+
# Environment variables needed for ROCm
239+
env = optionals useRocm {
240+
ROCM_PATH = "${rocmPackages.clr}";
241+
HIP_DEVICE_LIB_PATH = "${rocmPackages.rocm-device-libs}/amdgcn/bitcode";
242+
};
243+
244244
# TODO(SomeoneSerge): It's better to add proper install targets at the CMake level,
245245
# if they haven't been added yet.
246246
postInstall = ''

.github/workflows/build.yml

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -329,6 +329,33 @@ jobs:
329329
cmake -DLLAMA_VULKAN=ON ..
330330
cmake --build . --config Release -j $(nproc)
331331
332+
ubuntu-22-cmake-hip:
333+
runs-on: ubuntu-22.04
334+
container: rocm/dev-ubuntu-22.04:6.0.2
335+
336+
steps:
337+
- name: Clone
338+
id: checkout
339+
uses: actions/checkout@v3
340+
341+
- name: Dependencies
342+
id: depends
343+
run: |
344+
sudo apt-get update
345+
sudo apt-get install -y build-essential git cmake rocblas-dev hipblas-dev
346+
347+
- name: Build with native CMake HIP support
348+
id: cmake_build
349+
run: |
350+
cmake -B build -S . -DCMAKE_HIP_COMPILER="$(hipconfig -l)/clang" -DLLAMA_HIPBLAS=ON
351+
cmake --build build --config Release -j $(nproc)
352+
353+
- name: Build with legacy HIP support
354+
id: cmake_build_legacy_hip
355+
run: |
356+
cmake -B build2 -S . -DCMAKE_C_COMPILER=hipcc -DCMAKE_CXX_COMPILER=hipcc -DLLAMA_HIPBLAS=ON
357+
cmake --build build2 --config Release -j $(nproc)
358+
332359
ubuntu-22-cmake-sycl:
333360
runs-on: ubuntu-22.04
334361

@@ -851,6 +878,36 @@ jobs:
851878
path: llama-${{ steps.tag.outputs.name }}-bin-win-sycl-x64.zip
852879
name: llama-bin-win-sycl-x64.zip
853880

881+
windows-latest-cmake-hip:
882+
runs-on: windows
883+
884+
steps:
885+
- name: Clone
886+
id: checkout
887+
uses: actions/checkout@v3
888+
889+
- name: Install
890+
id: depends
891+
run: |
892+
$ErrorActionPreference = "Stop"
893+
write-host "Downloading AMD HIP SDK Installer"
894+
Invoke-WebRequest -Uri "https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-23.Q4-WinSvr2022-For-HIP.exe" -OutFile "${env:RUNNER_TEMP}\rocm-install.exe"
895+
write-host "Installing AMD HIP SDK"
896+
Start-Process "${env:RUNNER_TEMP}\rocm-install.exe" -ArgumentList '-install' -NoNewWindow -Wait
897+
write-host "Completed AMD HIP SDK installation"
898+
899+
- name: Verify ROCm
900+
id: verify
901+
run: |
902+
& 'C:\Program Files\AMD\ROCm\*\bin\clang.exe' --version
903+
904+
- name: Build
905+
id: cmake_build
906+
run: |
907+
$env:HIP_PATH=$(Resolve-Path 'C:\Program Files\AMD\ROCm\*\bin\clang.exe' | split-path | split-path)
908+
cmake -B build -S . -DCMAKE_C_COMPILER="$HIP_PATH\bin\clang.exe" -DCMAKE_CXX_COMPILER="$HIP_PATH\bin\clang++.exe" -DLLAMA_HIPBLAS=ON
909+
cmake --build build --config Release
910+
854911
ios-xcode-build:
855912
runs-on: macos-latest
856913

CMakeLists.txt

Lines changed: 24 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -522,16 +522,26 @@ if (LLAMA_VULKAN)
522522
endif()
523523

524524
if (LLAMA_HIPBLAS)
525-
list(APPEND CMAKE_PREFIX_PATH /opt/rocm)
526-
527-
if (NOT ${CMAKE_C_COMPILER_ID} MATCHES "Clang")
528-
message(WARNING "Only LLVM is supported for HIP, hint: CC=/opt/rocm/llvm/bin/clang")
525+
if ($ENV{ROCM_PATH})
526+
set(ROCM_PATH $ENV{ROCM_PATH})
527+
else()
528+
set(ROCM_PATH /opt/rocm)
529529
endif()
530+
list(APPEND CMAKE_PREFIX_PATH ${ROCM_PATH})
530531

531-
if (NOT ${CMAKE_CXX_COMPILER_ID} MATCHES "Clang")
532-
message(WARNING "Only LLVM is supported for HIP, hint: CXX=/opt/rocm/llvm/bin/clang++")
533-
endif()
532+
string(REGEX MATCH "hipcc(\.bat)?$" CXX_IS_HIPCC "${CMAKE_CXX_COMPILER}")
534533

534+
if (CXX_IS_HIPCC AND UNIX)
535+
message(WARNING "Setting hipcc as the C++ compiler is legacy behavior."
536+
" Prefer setting the HIP compiler directly. See README for details.")
537+
else()
538+
# Forward AMDGPU_TARGETS to CMAKE_HIP_ARCHITECTURES.
539+
if (AMDGPU_TARGETS AND NOT CMAKE_HIP_ARCHITECTURES)
540+
set(CMAKE_HIP_ARCHITECTURES ${AMDGPU_ARGETS})
541+
endif()
542+
cmake_minimum_required(VERSION 3.21)
543+
enable_language(HIP)
544+
endif()
535545
find_package(hip REQUIRED)
536546
find_package(hipblas REQUIRED)
537547
find_package(rocblas REQUIRED)
@@ -565,13 +575,18 @@ if (LLAMA_HIPBLAS)
565575
add_compile_definitions(GGML_CUDA_MMV_Y=${LLAMA_CUDA_MMV_Y})
566576
add_compile_definitions(K_QUANTS_PER_ITERATION=${LLAMA_CUDA_KQUANTS_ITER})
567577

568-
set_source_files_properties(${GGML_SOURCES_ROCM} PROPERTIES LANGUAGE CXX)
578+
if (CXX_IS_HIPCC)
579+
set_source_files_properties(${GGML_SOURCES_ROCM} PROPERTIES LANGUAGE CXX)
580+
set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} hip::device)
581+
else()
582+
set_source_files_properties(${GGML_SOURCES_ROCM} PROPERTIES LANGUAGE HIP)
583+
endif()
569584

570585
if (LLAMA_STATIC)
571586
message(FATAL_ERROR "Static linking not supported for HIP/ROCm")
572587
endif()
573588

574-
set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} hip::device PUBLIC hip::host roc::rocblas roc::hipblas)
589+
set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} PUBLIC hip::host roc::rocblas roc::hipblas)
575590
endif()
576591

577592
if (LLAMA_SYCL)

Makefile

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -529,10 +529,10 @@ endif # LLAMA_VULKAN
529529
ifdef LLAMA_HIPBLAS
530530
ifeq ($(wildcard /opt/rocm),)
531531
ROCM_PATH ?= /usr
532-
GPU_TARGETS ?= $(shell $(shell which amdgpu-arch))
532+
AMDGPU_TARGETS ?= $(shell $(shell which amdgpu-arch))
533533
else
534534
ROCM_PATH ?= /opt/rocm
535-
GPU_TARGETS ?= $(shell $(ROCM_PATH)/llvm/bin/amdgpu-arch)
535+
AMDGPU_TARGETS ?= $(shell $(ROCM_PATH)/llvm/bin/amdgpu-arch)
536536
endif
537537
HIPCC ?= $(CCACHE) $(ROCM_PATH)/bin/hipcc
538538
LLAMA_CUDA_DMMV_X ?= 32
@@ -544,7 +544,7 @@ ifdef LLAMA_HIP_UMA
544544
endif # LLAMA_HIP_UMA
545545
MK_LDFLAGS += -L$(ROCM_PATH)/lib -Wl,-rpath=$(ROCM_PATH)/lib
546546
MK_LDFLAGS += -lhipblas -lamdhip64 -lrocblas
547-
HIPFLAGS += $(addprefix --offload-arch=,$(GPU_TARGETS))
547+
HIPFLAGS += $(addprefix --offload-arch=,$(AMDGPU_TARGETS))
548548
HIPFLAGS += -DGGML_CUDA_DMMV_X=$(LLAMA_CUDA_DMMV_X)
549549
HIPFLAGS += -DGGML_CUDA_MMV_Y=$(LLAMA_CUDA_MMV_Y)
550550
HIPFLAGS += -DK_QUANTS_PER_ITERATION=$(LLAMA_CUDA_KQUANTS_ITER)

README.md

Lines changed: 16 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -507,13 +507,28 @@ Building the program with BLAS support may lead to some performance improvements
507507
```
508508
- Using `CMake` for Linux (assuming a gfx1030-compatible AMD GPU):
509509
```bash
510-
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ \
510+
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
511511
cmake -H. -Bbuild -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release \
512512
&& cmake --build build -- -j 16
513513
```
514514
On Linux it is also possible to use unified memory architecture (UMA) to share main memory between the CPU and integrated GPU by setting `-DLLAMA_HIP_UMA=ON"`.
515515
However, this hurts performance for non-integrated GPUs (but enables working with integrated GPUs).
516516
517+
Note that if you get the following error:
518+
```
519+
clang: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
520+
```
521+
Try searching for a directory under `HIP_PATH` that contains the file
522+
`oclc_abi_version_400.bc`. Then, add the following to the start of the
523+
command: `HIP_DEVICE_LIB_PATH=<directory-you-just-found>`, so something
524+
like:
525+
```bash
526+
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -p)" \
527+
HIP_DEVICE_LIB_PATH=<directory-you-just-found> \
528+
cmake -H. -Bbuild -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release \
529+
&& cmake --build build -- -j 16
530+
```
531+
517532
- Using `make` (example for target gfx1030, build with 16 CPU threads):
518533
```bash
519534
make -j16 LLAMA_HIPBLAS=1 LLAMA_HIP_UMA=1 AMDGPU_TARGETS=gfx1030

0 commit comments

Comments
 (0)