'-Ofast' and '-march=native' provide significant speedup #252
See #251 for details.
I get:
I don't know why compilers can't standardise this stuff, but I guess the flags should be arch-conditional.
From the official GCC documentation (https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html):
So the compiler you tried this on, Luke, is probably a rather old version of GCC. In fact, when I try
Specs of my test: Arch Linux on a Celeron N5095, with GCC 12.2. The deprecation of -mcpu goes quite a way back, GCC-version-wise. As for the speed difference between
That was with GCC 11.3.0 for ppc64le. There is no
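The disagreement above comes down to the flag spelling differing per architecture: on x86, GCC uses `-march=native` (the old `-mcpu` spelling was deprecated there long ago), while GCC for POWER (ppc64le) takes `-mcpu=native` and has no `-march` option at all. A minimal sketch of an arch-conditional choice, keyed on the `uname -m` string the way a Makefile might branch; the helper name `native_cflag` and its mapping table are hypothetical, not whisper.cpp's actual build logic:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: map a `uname -m` machine string to the GCC flag that
 * targets the build host. Returns "" for unknown machines, so the build falls
 * back to the compiler defaults instead of failing on an unsupported flag. */
const char *native_cflag(const char *machine) {
    static const struct { const char *machine; const char *flag; } table[] = {
        { "x86_64",  "-march=native" },  /* x86: -march (old -mcpu deprecated) */
        { "i686",    "-march=native" },
        { "ppc64le", "-mcpu=native"  },  /* POWER: GCC has -mcpu, no -march */
        { "ppc64",   "-mcpu=native"  },
    };

    if (machine == NULL) return "";
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(machine, table[i].machine) == 0) return table[i].flag;
    }
    return "";  /* unknown architecture: add no tuning flag */
}
```

In a Makefile the same decision would typically be an `ifeq` on the output of `uname -m`; the table form just makes the per-architecture spelling explicit.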
Well, like all things, this is a balancing act... The usual autoconf/automake machinery can be used to have "./configure" emit a Makefile that uses whichever options suit the current machine best. I can do that for whisper.cpp, if @ggerganov is OK with the complexity involved. But as-is,
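Whether the flags come from a "./configure" probe or a hand-written Makefile, their effect is visible to the code through the compiler's predefined macros: GCC and Clang define `__SSE3__`, `__AVX__`, `__AVX2__`, `__FMA__` and so on only when the selected target (for example via `-march=native`) enables those extensions. A small sketch that reports them, useful for checking what a given build actually turned on; the function name `compiled_simd_features` and the particular list of macros checked are illustrative assumptions:

```c
#include <stddef.h>
#include <stdio.h>

/* Writes the SIMD extensions this translation unit was compiled for into buf
 * as a space-separated list and returns buf. It reflects the compiler flags
 * (e.g. -march=native), not the CPU the binary later runs on. */
const char *compiled_simd_features(char *buf, size_t len) {
    static const char *const feats[] = {
#ifdef __SSE2__
        "SSE2",
#endif
#ifdef __SSE3__
        "SSE3",
#endif
#ifdef __SSSE3__
        "SSSE3",
#endif
#ifdef __SSE4_1__
        "SSE4.1",
#endif
#ifdef __AVX__
        "AVX",
#endif
#ifdef __AVX2__
        "AVX2",
#endif
#ifdef __FMA__
        "FMA",
#endif
#ifdef __F16C__
        "F16C",
#endif
#ifdef __ARM_NEON
        "NEON",
#endif
        NULL  /* terminator, always present */
    };
    size_t used = 0;

    if (len == 0) return buf;
    buf[0] = '\0';
    for (size_t i = 0; feats[i] != NULL; i++) {
        int n = snprintf(buf + used, len - used, "%s%s", used ? " " : "", feats[i]);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';  /* out of space: drop the partial entry */
            break;
        }
        used += (size_t)n;
    }
    return buf;
}
```

Compiling the same file once with the default flags and once with `-march=native`, and printing the result from a small driver, shows exactly which extensions the extra flag enabled on that machine.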
So I'm not 100% sure what to do here. On my MacBook, building with stock
Using … Let's think about this some more. Maybe we can hear more points of view on this topic and get better insight.
Well, this is what autoconf/automake were built for: to pick the best compilation options possible for the specific target we are building on. IMHO it's a shame to leave a 2x speedup on the table... I could write the necessary
Just one more note: in GCC land, you can ask the compiler to emit instruction-set-specific versions of your functions and dispatch to the appropriate one at run-time, based on the machine you run on: https://github.com/ttsiodras/MandelbrotSSE/blob/master/src/xaos.cc#L31 I used that there for maximum flexibility, and it worked like a charm. I don't know if clang supports that, though.
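The run-time dispatch idea can be sketched in C with GCC's `target` function attribute plus the `__builtin_cpu_supports` builtin; Clang also accepts both on x86. This is a manual two-variant version, not necessarily what the linked xaos.cc does, and the AVX2/FMA choice and the function names are assumptions for illustration. (GCC's `target_clones` attribute can generate the clones and the dispatcher automatically.)

```c
#include <stddef.h>

typedef float (*dot_fn)(const float *, const float *, size_t);

/* Baseline version: built with the project's normal flags. */
static float dot_generic(const float *a, const float *b, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
/* Same source, but the compiler may use AVX2/FMA instructions in this copy
 * only. Whether the float reduction actually gets vectorized still depends on
 * the optimization flags: it needs reassociation (e.g. -ffast-math / -Ofast). */
__attribute__((target("avx2,fma")))
static float dot_avx2(const float *a, const float *b, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}
#endif

/* Pick an implementation based on the CPU we are actually running on. */
static dot_fn select_dot(void) {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        return dot_avx2;
#endif
    return dot_generic;
}

float dot(const float *a, const float *b, size_t n) {
    static dot_fn impl = NULL;  /* resolved on first call; not thread-safe */
    if (impl == NULL) impl = select_dot();
    return impl(a, b, n);
}
```

The rest of the program can then be compiled for a conservative baseline, so one binary runs everywhere while still using the wider instructions where the CPU has them.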
To give you more information for the autoconf/automake decision, I just pushed a few commits; you can try them out and decide for yourself.
To see for yourself: after you clone my version of the repo, launch … To modify the logic, edit …
+1 to autotools. That would also make it simpler to libtoolise the library and make the examples link to it.
@ttsiodras I did a few tests with and without
Lower
Given these results, I don't think it is crucial to have these flags. Sometimes they help, sometimes they don't. So I think for now, I will leave the existing Makefile as it is.
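To reproduce this kind of with/without comparison, one option is to compile the same micro-benchmark twice (say `-O3` versus `-Ofast -march=native`) and compare timings. A minimal sketch using standard C `clock()`; the kernel, the name `bench_dot`, and the sizes are arbitrary assumptions, and a real decision should rest on timing whisper.cpp's own workloads, since results clearly vary by machine:

```c
#include <stddef.h>
#include <time.h>

typedef float (*kernel_fn)(const float *, const float *, size_t);

/* Kernel under test: a plain loop the compiler may vectorize depending on the
 * flags (a float reduction needs -ffast-math, which -Ofast enables). */
static float dot_loop(const float *a, const float *b, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

/* Runs the kernel `reps` times and returns the elapsed CPU time in seconds.
 * The accumulated result goes to *sink so the work cannot be discarded. */
double bench_dot(const float *a, const float *b, size_t n, int reps, float *sink) {
    /* Calling through a volatile pointer stops the compiler from hoisting the
     * otherwise loop-invariant call out of the timing loop. */
    kernel_fn volatile kernel = dot_loop;
    float acc = 0.0f;

    clock_t start = clock();
    for (int r = 0; r < reps; r++) acc += kernel(a, b, n);
    clock_t end = clock();

    *sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

For meaningful numbers, use inputs of a few megabytes and enough repetitions that each build runs for at least a second or so.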
'-Ofast' and '-march=native' cause a 2x speedup on machines with SSE (but no AVX) instructions. Should help other platforms, too.
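For a sense of what vectorizing for SSE (without AVX) means, here is a hand-written SSE version of a dot-product loop: `_mm_mul_ps` and `_mm_add_ps` each process four floats per instruction. A compiler can only turn a plain scalar loop into something similar if it may reorder the float additions, which is one of the things `-ffast-math` (enabled by `-Ofast`) permits. This is an illustrative sketch under those assumptions, not a code path taken from whisper.cpp; without SSE it falls back to the scalar loop.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

float dot_sse(const float *a, const float *b, size_t n) {
    size_t i = 0;
    float s = 0.0f;
#if defined(__SSE__)
    /* Four independent partial sums, one per SSE lane. */
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    /* Horizontal sum of the four lanes. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    s = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
    /* Remaining tail elements (or the whole array without SSE). */
    for (; i < n; i++) s += a[i] * b[i];
    return s;
}
```

Because the four partial sums are combined in a different order than the scalar loop uses, results can differ in the last bits for general inputs; that reordering is exactly what `-ffast-math` licenses the compiler to do on its own.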