Avoid using __fp16 on ARM with old nvcc, fixes #10555 #10616


Merged (1 commit) Dec 4, 2024

Conversation

frankier (Contributor) commented Dec 1, 2024

Fixes #10555

It appears that NVCC on CUDA <= 11 doesn't support __fp16, so as I understand it, this should fall back to a default, slower implementation.

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Dec 1, 2024
@frankier frankier force-pushed the fix-nvcc-cuda11-arm branch from 9a240e6 to 407fba8 on December 1, 2024 17:17
@frankier frankier force-pushed the fix-nvcc-cuda11-arm branch from 407fba8 to dca39b0 on December 2, 2024 08:39
@slaren slaren merged commit cd2f37b into ggml-org:master Dec 4, 2024
50 checks passed
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Dec 7, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024

Successfully merging this pull request may close these issues.

Compile bug: ggml-impl.h(314): error: identifier "__fp16" is undefined on Jetson AGX Xavier