
ggml : reading the runtime sve config of the cpu #8382


Closed · wants to merge 1 commit

Conversation

jdomke (Contributor) commented Jul 9, 2024

To go from SVE512 to SVE256 (or other configurations), one can use the following:

$ cat wrapper.c
#include <sys/prctl.h>
#include <unistd.h>

int main(int argc, char *argv[], char *envp[]) {
    // Request a 32-byte (256-bit) SVE vector length and let child
    // processes inherit it.
    prctl(PR_SVE_SET_VL, 32 | PR_SVE_VL_INHERIT);
    // Replace this process with the target binary, forwarding its arguments.
    execve(argv[1], &argv[1], envp);
    return 0;
}
$ gcc wrapper.c -o wrapper
$ ./wrapper ./bin/llama-cli ...

However, when ggml reads the "current" SVE width via svcntb(), it still gets the wrong value; only prctl(PR_SVE_GET_VL) returns the correct one.
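
For reference, here is a minimal sketch (not part of this PR) that prints both readings side by side so the discrepancy can be observed when running under the wrapper; it assumes an SVE-capable toolchain and kernel headers that define PR_SVE_GET_VL:

$ cat check_vl.c
#include <stdio.h>
#include <sys/prctl.h>
#include <arm_sve.h>

int main(void) {
    // Width the compiler/runtime reports; may be constant-folded to the
    // build-time or boot-time vector length, as described above.
    printf("svcntb()             : %llu bytes\n", (unsigned long long) svcntb());
    // Width the kernel reports for this thread; the low bits hold the
    // vector length in bytes.
    printf("prctl(PR_SVE_GET_VL) : %d bytes\n", prctl(PR_SVE_GET_VL) & PR_SVE_VL_LEN_MASK);
    return 0;
}
$ gcc -march=armv8-a+sve check_vl.c -o check_vl
$ ./wrapper ./check_vl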

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Jul 9, 2024
jdomke (Contributor, Author) commented Jul 11, 2024

The issue might actually be the result of a bug in our clang runtime causing svcntb to return the wrong value; we are confirming right now.

mofosyne added the Review Complexity : Low label (trivial changes to code that most beginner devs, or those who want a break, can tackle; e.g. a UI fix) on Jul 13, 2024
jdomke (Contributor, Author) commented Jul 26, 2024

> The issue might actually be the result of a bug in our clang runtime causing svcntb to return the wrong value; we are confirming right now.

According to Fujitsu's compiler developers, the compiler assumes that the vector length does not change during program execution; svcntb() is therefore treated as a constant and the call may be optimized away. Consequently, the only workaround that lets users (without root rights) lower the vector length via the wrapper is to fall back to PR_SVE_VL_LEN_MASK & prctl(PR_SVE_GET_VL) inside llama.cpp.
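
A minimal sketch of such a fallback (the function name and guards are illustrative, not the exact symbols used in this PR):

#include <sys/prctl.h>

// Ask the kernel for this thread's SVE vector length in bytes instead of
// trusting the (possibly constant-folded) svcntb() intrinsic.
static int sve_vl_bytes(void) {
#if defined(__ARM_FEATURE_SVE) && defined(PR_SVE_GET_VL)
    int ret = prctl(PR_SVE_GET_VL);
    if (ret >= 0) {
        return ret & PR_SVE_VL_LEN_MASK; // low 16 bits hold the VL in bytes
    }
#endif
    return 0; // SVE unavailable or prctl failed
}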

ggerganov (Member) left a comment


Also need to update the svcntb() calls in ggml-quants.c
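
For illustration only (the actual call sites in ggml-quants.c may look different), the idea is to make kernel selection depend on the prctl-derived width rather than on svcntb():

#include <stdio.h>
#include <sys/prctl.h>

int main(void) {
    // Hypothetical dispatch: pick an SVE kernel variant from the
    // kernel-reported vector length instead of svcntb().
    int ret = prctl(PR_SVE_GET_VL);
    int vl  = ret >= 0 ? (ret & PR_SVE_VL_LEN_MASK) : 0;
    if (vl == 64) {
        printf("use the 512-bit SVE kernels\n");
    } else if (vl == 32) {
        printf("use the 256-bit SVE kernels\n");
    } else {
        printf("fall back to the generic path (VL = %d bytes)\n", vl);
    }
    return 0;
}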
