Avoid using __fp16 on ARM with old nvcc, fixes #10555 #10616


Merged (1 commit) Dec 4, 2024

Conversation

frankier (Contributor) commented Dec 1, 2024

Fixes #10555

It appears that NVCC on CUDA <= 11 doesn't support __fp16, so as I understand it, this should fall back to a default, slower implementation.

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Dec 1, 2024
@frankier frankier force-pushed the fix-nvcc-cuda11-arm branch from 9a240e6 to 407fba8 on December 1, 2024 17:17
@frankier frankier force-pushed the fix-nvcc-cuda11-arm branch from 407fba8 to dca39b0 on December 2, 2024 08:39
@slaren slaren merged commit cd2f37b into ggml-org:master Dec 4, 2024
50 checks passed
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Dec 7, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024

Successfully merging this pull request may close these issues.

Compile bug: ggml-impl.h(314): error: identifier "__fp16" is undefined on Jetson AGX Xavier