Skip to content

Add support for cpu_get_num_physical_cores() on Windows #8771

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 16, 2024

Conversation

Septa2112
Copy link
Contributor

@Septa2112 Septa2112 commented Jul 30, 2024

Add a function cpu_get_num_cores_win(bool print_physical_core_num) to detect the physical and logical cores on Windows. Now the program will successfully detect all sockets and CPUs and set the default number of threads more accurately on Windows.

Test Platform

  • OS
    • Windows 10
    • Windows 11
  • CPU
    • Intel Xeon CPU E5-2699 v4 (22 physical core per socket, 2 Sockets in total)
    • Intel i7-1185G7 (4 physical cores)
    • Intel Core Ultra5 125H (4 P-core + 8 E-core + 2 LPE-core)

Before

When I run llama-cli.exe -m <my_model> -p <prompt>, the output is like this:

Intel Xeon CPU E5-2699 v4 (2 Sockets)

system_info: n_threads = 22 / 44 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | 

Intel i7-1185G7

system_info: n_threads = 4 / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | 

Intel MTL Ultra5

system_info: n_threads = 9 / 18 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | 

After Modification

Intel Xeon CPU E5-2699 v4 (2 Sockets)

system_info: n_threads = 44 / 88 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | 

Intel i7-1185G7

system_info: n_threads = 4 / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | 

Intel MTL Ultra5

system_info: n_threads = 14 / 18 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | 

@mofosyne mofosyne added the Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix label Aug 1, 2024
@Septa2112 Septa2112 changed the title Add support for cpu_get_num_phsical_cores() on Windows Add support for cpu_get_num_physical_cores() on Windows Aug 5, 2024
@Septa2112
Copy link
Contributor Author

@mofosyne Could you have a look at my PR? Or anyone others? Thanks.

Copy link
Member

@ggerganov ggerganov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need some reports on Windows that this change works as expected

@mofosyne mofosyne added windows Issues specific to Windows performance Speed related topics labels Aug 6, 2024
@Septa2112
Copy link
Contributor Author

Septa2112 commented Aug 8, 2024

I noticed that the CI CI / windows-latest-cmake (msvc-arm64, -G "Ninja Multi-Config"... failed, indicating that the Windows API I used was not detected.
However, another CI about arm64 CI / windows-latest-cmake (llvm-arm64, -G "Ninja Multi-Config" is successful(tested myself using GitHub Actions).

I noticed that these two CIs were added due to #7191

To solve this, I attempted to skip the ARM64 platform with the following macros:

#elif defined(_WIN32) && (!defined(__ARM_NEON) || !defined(_M_ARM) || !defined(_M_ARM64) || !defined(__aarch64__) || !defined(__arm__) || !defined(__ARM_ARCH))

However, despite my repeated attempts, none of these macros related to the ARM architecture were detected in this CI environment.

I suspect there might be an issue with the toolchain, but unfortunately, I don't have any ARM64+Windows machine for testing. Could you kindly suggest any solutions or approaches to resolve this issue?

@Septa2112
Copy link
Contributor Author

Septa2112 commented Aug 8, 2024

I carefully reviewed the CI error and found that the issue seems to be with MinGW64+ARM64 on windows. Therefore, I made the following modification

#elif defined(_WIN32) && (_WIN32_WINNT >= 0x0601) && !defined(__MINGW64__) // windows 7 and later

to bypass the issue with the toolchain on ARM64+Windows by detecting __MINGW64__.

@Septa2112 Septa2112 requested a review from ggerganov August 8, 2024 11:14
@hmartinez82
Copy link

hmartinez82 commented Aug 15, 2024

@Septa2112 Once this is merged I'll take a look at why ARM64 MinGW64 is not building :-)
I have two Windows on ARM64 machines. Once with Snapdragon 8cx (ARMv8.2-A) and one with Snapdragon Elite X (ARMv8.7-A)

@Septa2112
Copy link
Contributor Author

@hmartinez82 That's great. Let's see when the PR will be merged. Perhaps need more test on windows.

@ggerganov ggerganov merged commit fb487bb into ggml-org:master Aug 16, 2024
53 checks passed
@hmartinez82
Copy link

hmartinez82 commented Aug 16, 2024

@Septa2112 Just as I suspected, after glancing over the PR, there's nothing there that would not work. I removed the && !defined(__MINGW64__) in both lines and tried building for ARM64 with both x-compiling from a x64 MSVC (17.10) and natively with MSYS2's Clang in a CLANGARM64 environment.

The build was successful in both cases.

@Septa2112
Copy link
Contributor Author

Septa2112 commented Aug 16, 2024

@hmartinez82 Thanks for you testing. But if I remove this macro, the corresponding CI checks will fail. Maybe something else is causing the issue?

This related CI was introduced in the issue #7191.

@hmartinez82
Copy link

Ahh ok, I thought that the issue was during build time. I'll try to replicate what the CI does. Thank you.

@Septa2112 Septa2112 deleted the pr-test branch August 21, 2024 02:55
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
…l-org#8771)

* Add support for cpu_get_num_phsical_cores() on Windows

* fix build bug on msys2-clang64 and ucrt64

* avoid adding new function

* add new macros to avoid windows+mingw64

* Add error checking to return default value
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
…l-org#8771)

* Add support for cpu_get_num_phsical_cores() on Windows

* fix build bug on msys2-clang64 and ucrt64

* avoid adding new function

* add new macros to avoid windows+mingw64

* Add error checking to return default value
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Feb 25, 2025
…l-org#8771)

* Add support for cpu_get_num_phsical_cores() on Windows

* fix build bug on msys2-clang64 and ucrt64

* avoid adding new function

* add new macros to avoid windows+mingw64

* Add error checking to return default value
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Speed related topics Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix windows Issues specific to Windows
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants