Skip to content

avx on Core i5-2400 #562

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed

avx on Core i5-2400 #562

wants to merge 5 commits into from

Conversation

kaufmannr
Copy link

Good evening,

I needed to deactivate avx compile flag for my Core i5-2400 to get it compile on Linux. Somebody else did implement the missing fp16 functions as code, but I was unable to find the right place for it in the macro hell.
I think this would be even better than just deactivating avx but I cannot find this code/comment anymore.

Would be cool, if your code would run on my CPU.

Thanks + best regards, Rainer

@anzz1
Copy link
Contributor

anzz1 commented Mar 27, 2023

The spec sheet for i5-2400 says that AVX should be supported on that processor?

It's probably AVX2 what you want to disable, but keep AVX.
Try:

mkdir build
cd build
cmake -DLLAMA_AVX2=OFF ..
cmake --build . --config Release

Okay thats for cmake, it seems you are using make.

However , your processor supports AVX but does not support F16C.
You are doing it the wrong way around, checking for F16C first and then for AVX?
Is there a problem of checking them separately like it currently does?
You should be able to enable AVX and only disable F16C , to enable AVX features.

It doesn't work with AVX enabled and F16C disabled?

@slaren
Copy link
Member

slaren commented Mar 27, 2023

If there are any cases where F16C is used without checking for the flag, that should be fixed instead.

@slaren
Copy link
Member

slaren commented Mar 27, 2023

I think ggml.c:1131 may be the culprit:

#define GGML_F32Cx8_STORE(x, y) _mm_storeu_si128((__m128i *)(x), _mm256_cvtps_ph(y, 0))

If I am not mistaken _mm256_cvtps_ph is only available with F16C, but this code only checks for AVX.

@anzz1
Copy link
Contributor

anzz1 commented Mar 27, 2023

I think ggml.c:1131 may be the culprit:

#define GGML_F32Cx8_STORE(x, y) _mm_storeu_si128((__m128i *)(x), _mm256_cvtps_ph(y, 0))

If I am not mistaken _mm256_cvtps_ph is only available with F16C, but this code only checks for AVX.

You are correct.
And load too

#define GGML_F32Cx8_LOAD(x)     _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)(x)))

_mm256_cvtph_ps() = VCVTPH2PS
_mm256_cvtps_ph() = VCVTPS2PH

both are F16C/CVT16 instructions.
and should be guarded with __F16C__ check

using the SSE versions instead would fix that

#define GGML_F32Cx4_LOAD(x)     __sse_f16x4_load(x)
#define GGML_F32Cx4_STORE(x, y) __sse_f16x4_store(x, y)

@slaren
Copy link
Member

slaren commented Mar 27, 2023

Superseded by #563

@slaren slaren closed this Mar 27, 2023
AAbushady pushed a commit to AAbushady/llama.cpp that referenced this pull request Jan 27, 2024
* .sh script V1

* koboldcpp.sh polish

* koboldcpp.sh dist generator

* Include html's in dist

* RWKV in Linux Dist

* Lower dependency requirements

* Eliminate wget dependency

* More distinct binary name

I know its technically amd64, but I don't want to cause confusion among nvidia users.

* Use System OpenCL

Unsure how this will behave in the pyinstaller build, but pocl ended up CPU only. With a bit of luck the pyinstaller uses the one from the actual system if compiled in a system without opencl, while conda now includes it for that specific system.

* Add cblas dependency

Missing this causes compile failures on some system's

* ICD workaround

Ideally we find a better solution, but conda forces ICD and needs this for the successful compile. However, pyinstaller then embeds the ICD causing it to be limited to the system it was compiled for. By temporarily removing the ICD pyinstaller can't find it and everything remains functional. Ideally we do this on a pyinstaller level, but I could not find any good options to do so yet.

* Fix & Nocuda

---------

Co-authored-by: root <root@DESKTOP-DQ1QRAG>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants