Commit 82ca83d

ROCm: use native CMake HIP support (#5966)
Supersedes #4024 and #4813. CMake's native HIP support has become the recommended way to add HIP code into a project (see [here](https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake)). This PR makes the following changes:

1. The environment variable `HIPCXX` or the CMake option `CMAKE_HIP_COMPILER` should be used to specify the HIP compiler. Notably, this shouldn't be `hipcc` but ROCm's clang, which usually resides in `$ROCM_PATH/llvm/bin/clang`. Previously this was controlled by `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`. Since native CMake HIP support is not yet available on Windows, on Windows we fall back to the old behavior.
2. The CMake option `CMAKE_HIP_ARCHITECTURES` is used to control the GPU architectures to build for. Previously this was controlled by `GPU_TARGETS`.
3. Updated the Nix recipe to account for these changes.
4. The GPU targets to build against in the Nix recipe are now consistent with the supported GPU targets in nixpkgs.
5. Added CI checks for HIP on both Linux and Windows. On Linux, we test both the new and old behavior.

The most important part of this PR is the separation of the HIP compiler and the C/C++ compiler. This allows users to choose a different C/C++ compiler if desired, whereas currently everything must be compiled with ROCm's clang when building with ROCm support.

~~Makefile is unchanged. Please let me know if we want to be consistent on variables' naming because Makefile still uses `GPU_TARGETS` to control architectures to build for, but I feel like setting `CMAKE_HIP_ARCHITECTURES` is a bit awkward when you're calling `make`.~~ Makefile used `GPU_TARGETS` but the README says to use `AMDGPU_TARGETS`. For consistency with CMake, all usage of `GPU_TARGETS` in Makefile has been updated to `AMDGPU_TARGETS`.

Thanks to the suggestion of @jin-eld, to maintain backwards compatibility (and not break too many downstream users' builds), if `CMAKE_CXX_COMPILER` ends with `hipcc`, then we still compile using the original behavior and emit a warning that recommends switching to the new HIP support. Similarly, if `AMDGPU_TARGETS` is set but `CMAKE_HIP_ARCHITECTURES` is not, then we forward `AMDGPU_TARGETS` to `CMAKE_HIP_ARCHITECTURES` to ease the transition to the new HIP support.

Signed-off-by: Gavin Zhao <[email protected]>
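
To make the compiler separation concrete, here is a small dry-run sketch. It only prints a configure command and never invokes CMake. The helper name `hip_configure_cmd`, the choice of `gcc`/`g++` for host code, and the `/opt/rocm` default prefix are assumptions for illustration; `CMAKE_HIP_COMPILER`, `CMAKE_HIP_ARCHITECTURES`, and `LLAMA_HIPBLAS` are the options named in this commit.

```shell
# Dry-run sketch: print a configure command that uses ROCm's clang for HIP
# sources and a different compiler for the C/C++ sources.
# hip_configure_cmd is a hypothetical helper, not part of the repository.
hip_configure_cmd() {
    local rocm_path="${1:-/opt/rocm}"   # assumed default install prefix
    local archs="${2:-gfx1030}"         # semicolon-separated GPU targets
    printf 'cmake -S . -B build -DLLAMA_HIPBLAS=ON'
    printf ' -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++'
    printf ' -DCMAKE_HIP_COMPILER=%s/llvm/bin/clang' "$rocm_path"
    # The architecture list is quoted because ';' would otherwise end the command.
    printf ' "-DCMAKE_HIP_ARCHITECTURES=%s"\n' "$archs"
}

hip_configure_cmd /opt/rocm "gfx1030;gfx1100"
```

The point of the sketch is that the host compiler and the HIP compiler are now independent arguments, so changing one does not force a change to the other.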
1 parent f4bd8b3 commit 82ca83d

5 files changed: +122 -25 lines changed


.devops/nix/package.nix

Lines changed: 8 additions & 8 deletions

@@ -227,20 +227,20 @@ effectiveStdenv.mkDerivation (
         )
       ]
       ++ optionals useRocm [
-        (cmakeFeature "CMAKE_C_COMPILER" "hipcc")
-        (cmakeFeature "CMAKE_CXX_COMPILER" "hipcc")
-
-        # Build all targets supported by rocBLAS. When updating search for TARGET_LIST_ROCM
-        # in https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/CMakeLists.txt
-        # and select the line that matches the current nixpkgs version of rocBLAS.
-        # Should likely use `rocmPackages.clr.gpuTargets`.
-        "-DAMDGPU_TARGETS=gfx803;gfx900;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-;gfx940;gfx941;gfx942;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102"
+        (cmakeFeature "CMAKE_HIP_COMPILER" "${rocmPackages.llvm.clang}/bin/clang")
+        (cmakeFeature "CMAKE_HIP_ARCHITECTURES" (builtins.concatStringsSep ";" rocmPackages.clr.gpuTargets))
       ]
       ++ optionals useMetalKit [
         (lib.cmakeFeature "CMAKE_C_FLAGS" "-D__ARM_FEATURE_DOTPROD=1")
         (cmakeBool "LLAMA_METAL_EMBED_LIBRARY" (!precompileMetalShaders))
       ];
 
+    # Environment variables needed for ROCm
+    env = optionals useRocm {
+      ROCM_PATH = "${rocmPackages.clr}";
+      HIP_DEVICE_LIB_PATH = "${rocmPackages.rocm-device-libs}/amdgcn/bitcode";
+    };
+
     # TODO(SomeoneSerge): It's better to add proper install targets at the CMake level,
     # if they haven't been added yet.
     postInstall = ''
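
`CMAKE_HIP_ARCHITECTURES` takes a semicolon-separated CMake list, which is why the recipe joins `rocmPackages.clr.gpuTargets` with `";"`. Outside Nix, the same joining step could be sketched in shell as follows; `join_targets` is a hypothetical helper and the target names are sample inputs only.

```shell
# join_targets: join all arguments with ';', the CMake list separator.
# Hypothetical helper for illustration; not part of the repository.
join_targets() {
    local IFS=';'          # "$*" joins arguments with the first character of IFS
    printf '%s\n' "$*"
}

join_targets gfx906 gfx1030 gfx1100
```

When the result is passed on a command line, it needs quoting (for example `"-DCMAKE_HIP_ARCHITECTURES=$(join_targets gfx906 gfx1030)"`), since an unquoted `;` terminates a shell command.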

.github/workflows/build.yml

Lines changed: 58 additions & 0 deletions

@@ -392,6 +392,33 @@ jobs:
           cmake -DLLAMA_VULKAN=ON ..
           cmake --build . --config Release -j $(nproc)
 
+  ubuntu-22-cmake-hip:
+    runs-on: ubuntu-22.04
+    container: rocm/dev-ubuntu-22.04:6.0.2
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v3
+
+      - name: Dependencies
+        id: depends
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y build-essential git cmake rocblas-dev hipblas-dev
+
+      - name: Build with native CMake HIP support
+        id: cmake_build
+        run: |
+          cmake -B build -S . -DCMAKE_HIP_COMPILER="$(hipconfig -l)/clang" -DLLAMA_HIPBLAS=ON
+          cmake --build build --config Release -j $(nproc)
+
+      - name: Build with legacy HIP support
+        id: cmake_build_legacy_hip
+        run: |
+          cmake -B build2 -S . -DCMAKE_C_COMPILER=hipcc -DCMAKE_CXX_COMPILER=hipcc -DLLAMA_HIPBLAS=ON
+          cmake --build build2 --config Release -j $(nproc)
+
   ubuntu-22-cmake-sycl:
     runs-on: ubuntu-22.04
 
@@ -989,6 +1016,37 @@ jobs:
           path: llama-${{ steps.tag.outputs.name }}-bin-win-sycl-x64.zip
           name: llama-bin-win-sycl-x64.zip
 
+  windows-latest-cmake-hip:
+    runs-on: windows-latest
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v3
+
+      - name: Install
+        id: depends
+        run: |
+          $ErrorActionPreference = "Stop"
+          write-host "Downloading AMD HIP SDK Installer"
+          Invoke-WebRequest -Uri "https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-23.Q4-WinSvr2022-For-HIP.exe" -OutFile "${env:RUNNER_TEMP}\rocm-install.exe"
+          write-host "Installing AMD HIP SDK"
+          Start-Process "${env:RUNNER_TEMP}\rocm-install.exe" -ArgumentList '-install' -NoNewWindow -Wait
+          write-host "Completed AMD HIP SDK installation"
+
+      - name: Verify ROCm
+        id: verify
+        run: |
+          & 'C:\Program Files\AMD\ROCm\*\bin\clang.exe' --version
+
+      - name: Build
+        id: cmake_build
+        run: |
+          $env:HIP_PATH=$(Resolve-Path 'C:\Program Files\AMD\ROCm\*\bin\clang.exe' | split-path | split-path)
+          $env:CMAKE_PREFIX_PATH="${env:HIP_PATH}"
+          cmake -G "Unix Makefiles" -B build -S . -DCMAKE_C_COMPILER="${env:HIP_PATH}\bin\clang.exe" -DCMAKE_CXX_COMPILER="${env:HIP_PATH}\bin\clang++.exe" -DLLAMA_HIPBLAS=ON
+          cmake --build build --config Release
+
   ios-xcode-build:
     runs-on: macos-latest
 
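
The Linux job locates ROCm's clang with `$(hipconfig -l)/clang`. For running the same step locally on a machine where `hipconfig` may not be on `PATH`, one possible sketch is the helper below. The name `hip_clang_path` and the fallback logic are assumptions for illustration; the fallback location `$ROCM_PATH/llvm/bin/clang` is the usual path mentioned in the commit message, and `/opt/rocm` is the default prefix used by `CMakeLists.txt`.

```shell
# hip_clang_path: print a candidate path for ROCm's clang.
# Hypothetical helper: prefers `hipconfig -l` (as the CI job does) and
# otherwise falls back to $ROCM_PATH/llvm/bin/clang, defaulting to /opt/rocm.
hip_clang_path() {
    if command -v hipconfig >/dev/null 2>&1; then
        printf '%s/clang\n' "$(hipconfig -l)"
    else
        printf '%s/llvm/bin/clang\n' "${ROCM_PATH:-/opt/rocm}"
    fi
}
```

A local configure step could then pass `-DCMAKE_HIP_COMPILER="$(hip_clang_path)"`; the helper does not check that the printed path actually exists.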

CMakeLists.txt

Lines changed: 34 additions & 8 deletions

@@ -555,16 +555,37 @@ if (LLAMA_VULKAN)
 endif()
 
 if (LLAMA_HIPBLAS)
-    list(APPEND CMAKE_PREFIX_PATH /opt/rocm)
-
-    if (NOT ${CMAKE_C_COMPILER_ID} MATCHES "Clang")
-        message(WARNING "Only LLVM is supported for HIP, hint: CC=/opt/rocm/llvm/bin/clang")
+    if (DEFINED ENV{ROCM_PATH})
+        set(ROCM_PATH $ENV{ROCM_PATH})
+    else()
+        set(ROCM_PATH /opt/rocm)
     endif()
+    list(APPEND CMAKE_PREFIX_PATH ${ROCM_PATH})
 
-    if (NOT ${CMAKE_CXX_COMPILER_ID} MATCHES "Clang")
-        message(WARNING "Only LLVM is supported for HIP, hint: CXX=/opt/rocm/llvm/bin/clang++")
+    # CMake on Windows doesn't support the HIP language yet
+    if(WIN32)
+        set(CXX_IS_HIPCC TRUE)
+    else()
+        string(REGEX MATCH "hipcc(\.bat)?$" CXX_IS_HIPCC "${CMAKE_CXX_COMPILER}")
     endif()
 
+    if(CXX_IS_HIPCC)
+        if(LINUX)
+            if (NOT ${CMAKE_CXX_COMPILER_ID} MATCHES "Clang")
+                message(WARNING "Only LLVM is supported for HIP, hint: CXX=/opt/rocm/llvm/bin/clang++")
+            endif()
+
+            message(WARNING "Setting hipcc as the C++ compiler is legacy behavior."
+                " Prefer setting the HIP compiler directly. See README for details.")
+        endif()
+    else()
+        # Forward AMDGPU_TARGETS to CMAKE_HIP_ARCHITECTURES.
+        if(AMDGPU_TARGETS AND NOT CMAKE_HIP_ARCHITECTURES)
+            set(CMAKE_HIP_ARCHITECTURES ${AMDGPU_TARGETS})
+        endif()
+        cmake_minimum_required(VERSION 3.21)
+        enable_language(HIP)
+    endif()
     find_package(hip REQUIRED)
     find_package(hipblas REQUIRED)
     find_package(rocblas REQUIRED)
@@ -598,13 +619,18 @@ if (LLAMA_HIPBLAS)
     add_compile_definitions(GGML_CUDA_MMV_Y=${LLAMA_CUDA_MMV_Y})
     add_compile_definitions(K_QUANTS_PER_ITERATION=${LLAMA_CUDA_KQUANTS_ITER})
 
-    set_source_files_properties(${GGML_SOURCES_ROCM} PROPERTIES LANGUAGE CXX)
+    if (CXX_IS_HIPCC)
+        set_source_files_properties(${GGML_SOURCES_ROCM} PROPERTIES LANGUAGE CXX)
+        set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} hip::device)
+    else()
+        set_source_files_properties(${GGML_SOURCES_ROCM} PROPERTIES LANGUAGE HIP)
+    endif()
 
     if (LLAMA_STATIC)
         message(FATAL_ERROR "Static linking not supported for HIP/ROCm")
     endif()
 
-    set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} hip::device PUBLIC hip::host roc::rocblas roc::hipblas)
+    set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} PUBLIC hip::host roc::rocblas roc::hipblas)
 endif()
 
 if (LLAMA_SYCL)
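
The backward-compatibility rules in `CMakeLists.txt` come down to two decisions: whether the C++ compiler path ends in `hipcc` (or `hipcc.bat`), and which variable supplies the GPU architectures. The shell sketch below mirrors those two decisions so they can be tried in isolation. The function names are hypothetical, and the sketch leaves out the Windows branch, where the legacy path is always taken.

```shell
# hip_build_mode: print "legacy" when the C++ compiler path ends in hipcc or
# hipcc.bat, otherwise "native". Hypothetical helper mirroring the intent of
# the CXX_IS_HIPCC check; the WIN32 special case is not modeled.
hip_build_mode() {
    case "$1" in
        *hipcc|*hipcc.bat) echo legacy ;;
        *)                 echo native ;;
    esac
}

# effective_hip_archs: CMAKE_HIP_ARCHITECTURES (first argument) wins when set;
# AMDGPU_TARGETS (second argument) is forwarded only when it is not.
effective_hip_archs() {
    local hip_archs="$1" amdgpu_targets="$2"
    if [ -z "$hip_archs" ] && [ -n "$amdgpu_targets" ]; then
        echo "$amdgpu_targets"
    else
        echo "$hip_archs"
    fi
}

hip_build_mode /opt/rocm/bin/hipcc            # legacy path, with a warning in CMake
hip_build_mode /opt/rocm/llvm/bin/clang++     # native HIP language support
effective_hip_archs "" gfx1030                # AMDGPU_TARGETS is forwarded
```

In the native branch the HIP sources are compiled as `LANGUAGE HIP`; in the legacy branch they stay `LANGUAGE CXX` and `hip::device` is linked explicitly.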

Makefile

Lines changed: 3 additions & 3 deletions

@@ -560,10 +560,10 @@ endif # LLAMA_VULKAN
 ifdef LLAMA_HIPBLAS
     ifeq ($(wildcard /opt/rocm),)
         ROCM_PATH ?= /usr
-        GPU_TARGETS ?= $(shell $(shell which amdgpu-arch))
+        AMDGPU_TARGETS ?= $(shell $(shell which amdgpu-arch))
     else
         ROCM_PATH ?= /opt/rocm
-        GPU_TARGETS ?= $(shell $(ROCM_PATH)/llvm/bin/amdgpu-arch)
+        AMDGPU_TARGETS ?= $(shell $(ROCM_PATH)/llvm/bin/amdgpu-arch)
     endif
     HIPCC ?= $(CCACHE) $(ROCM_PATH)/bin/hipcc
     LLAMA_CUDA_DMMV_X ?= 32
@@ -575,7 +575,7 @@ ifdef LLAMA_HIP_UMA
 endif # LLAMA_HIP_UMA
     MK_LDFLAGS += -L$(ROCM_PATH)/lib -Wl,-rpath=$(ROCM_PATH)/lib
     MK_LDFLAGS += -lhipblas -lamdhip64 -lrocblas
-    HIPFLAGS += $(addprefix --offload-arch=,$(GPU_TARGETS))
+    HIPFLAGS += $(addprefix --offload-arch=,$(AMDGPU_TARGETS))
     HIPFLAGS += -DGGML_CUDA_DMMV_X=$(LLAMA_CUDA_DMMV_X)
     HIPFLAGS += -DGGML_CUDA_MMV_Y=$(LLAMA_CUDA_MMV_Y)
     HIPFLAGS += -DK_QUANTS_PER_ITERATION=$(LLAMA_CUDA_KQUANTS_ITER)
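
In the Makefile, `$(addprefix --offload-arch=,$(AMDGPU_TARGETS))` turns a whitespace-separated target list into one compiler flag per target. A shell analogue, useful for checking what a given `AMDGPU_TARGETS` value expands to, might look like this; `offload_arch_flags` is a hypothetical helper, not something the Makefile defines.

```shell
# offload_arch_flags: shell analogue of
#   $(addprefix --offload-arch=,$(AMDGPU_TARGETS))
# Each argument becomes one --offload-arch=<target> flag.
offload_arch_flags() {
    local flags="" target
    for target in "$@"; do
        # ${flags:+$flags } adds a separating space only after the first flag
        flags="${flags:+$flags }--offload-arch=$target"
    done
    printf '%s\n' "$flags"
}

offload_arch_flags gfx1030 gfx1100
```

Note that the Makefile list is whitespace-separated, whereas the CMake variables take a semicolon-separated list.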

README.md

Lines changed: 19 additions & 6 deletions

@@ -528,13 +528,28 @@ Building the program with BLAS support may lead to some performance improvements
   ```
 - Using `CMake` for Linux (assuming a gfx1030-compatible AMD GPU):
   ```bash
-  CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ \
-      cmake -B build -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release \
+  HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
+      cmake -S . -B build -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release \
       && cmake --build build --config Release -- -j 16
   ```
   On Linux it is also possible to use unified memory architecture (UMA) to share main memory between the CPU and integrated GPU by setting `-DLLAMA_HIP_UMA=ON`.
   However, this hurts performance for non-integrated GPUs (but enables working with integrated GPUs).
 
+  Note that if you get the following error:
+  ```
+  clang: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
+  ```
+  Try searching for a directory under `HIP_PATH` that contains the file
+  `oclc_abi_version_400.bc`. Then, add the following to the start of the
+  command: `HIP_DEVICE_LIB_PATH=<directory-you-just-found>`, so something
+  like:
+  ```bash
+  HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -p)" \
+      HIP_DEVICE_LIB_PATH=<directory-you-just-found> \
+      cmake -S . -B build -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release \
+      && cmake --build build -- -j 16
+  ```
+
 - Using `make` (example for target gfx1030, build with 16 CPU threads):
   ```bash
   make -j16 LLAMA_HIPBLAS=1 LLAMA_HIP_UMA=1 AMDGPU_TARGETS=gfx1030
@@ -543,10 +558,8 @@ Building the program with BLAS support may lead to some performance improvements
 - Using `CMake` for Windows (using x64 Native Tools Command Prompt for VS, and assuming a gfx1100-compatible AMD GPU):
   ```bash
   set PATH=%HIP_PATH%\bin;%PATH%
-  mkdir build
-  cd build
-  cmake -G Ninja -DAMDGPU_TARGETS=gfx1100 -DLLAMA_HIPBLAS=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release ..
-  cmake --build .
+  cmake -S . -B build -G Ninja -DAMDGPU_TARGETS=gfx1100 -DLLAMA_HIPBLAS=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release
+  cmake --build build
   ```
   Make sure that `AMDGPU_TARGETS` is set to the GPU arch you want to compile for. The above example uses `gfx1100` that corresponds to Radeon RX 7900XTX/XT/GRE. You can find a list of targets [here](https://llvm.org/docs/AMDGPUUsage.html#processors)
   Find your gpu version string by matching the most significant version information from `rocminfo | grep gfx | head -1 | awk '{print $2}'` with the list of processors, e.g. `gfx1035` maps to `gfx1030`.
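
The README's troubleshooting note asks the reader to search under `HIP_PATH` for the directory containing `oclc_abi_version_400.bc`. One way to script that search is sketched below, assuming a POSIX `find`; `find_device_lib_dir` is a hypothetical helper and is not shipped with the project or with ROCm.

```shell
# find_device_lib_dir: print the first directory under the given root that
# contains oclc_abi_version_400.bc; return non-zero if none is found.
# Hypothetical helper for illustration only.
find_device_lib_dir() {
    local root="$1" hit
    hit="$(find "$root" -type f -name 'oclc_abi_version_400.bc' 2>/dev/null | head -n 1)"
    if [ -n "$hit" ]; then
        dirname "$hit"
    else
        return 1
    fi
}

# Possible usage (requires hipconfig, so it is left commented out here):
#   HIP_DEVICE_LIB_PATH="$(find_device_lib_dir "$(hipconfig -p)")"
```

If several ROCm versions are installed under the same root, the first match may not be the one that belongs to the selected compiler, so the printed directory is worth checking by eye.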
