Remove allocsSpreadAcrossAllNumaNodes test case #620
Conversation
Please make sure this test is running, not skipping (on at least some configurations)... I'm afraid we will now skip it all the time (silently) and this test will never run... 😉
If that were the case, we would have been getting those failures all the time up until now, but it happened only sporadically.
I get that, but we may change something and unknowingly influence the available size, thereby making this test skip all the time... If you don't see any good option to ensure this test is running, we may just hope for the best (and probably audit the test logs from time to time).
In the loop, we allocate up to 516MB across all NUMA nodes. However, we previously checked each NUMA node for this free space separately, which I believe is the bug we are looking for.
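For illustration, here is a minimal sketch of that distinction, using libnuma's `numa_node_size64()`; the helper name and structure are hypothetical, not the actual test code:

```c
#include <numa.h>

/* Hypothetical helper: the loop's stop condition should compare the
 * free memory summed across all NUMA nodes against the threshold,
 * rather than requiring the threshold on every node separately. */
static long long total_free_across_nodes(void) {
    long long total_free = 0;
    for (int node = 0; node <= numa_max_node(); node++) {
        long long free_bytes = 0;
        /* numa_node_size64() reports the node's free memory in bytes */
        if (numa_node_size64(node, &free_bytes) < 0) {
            continue; /* node not available */
        }
        total_free += free_bytes;
    }
    return total_free;
}
```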
BTW, this test almost tests nothing and can be removed. It performs the following steps:
1. Checks if there is 516MB of free space on each NUMA node.
2. Allocates 4MB of memory until there is less than 516MB of free space across all NUMA nodes.
3. For the first page, checks that it was bound with MPOL_DEFAULT (a good check) and with an empty nodemask (you cannot bind with MPOL_DEFAULT without an empty nodemask, so this tests whether the OS works correctly).
4. Verifies that the page is allocated from an existing NUMA node, which again tests the OS rather than our code.
There is also a disabled check with a misleading comment: "Confirm that all the NUMA nodes (...) were exhausted." This check would verify that we allocated at least one page from each NUMA node (specifically, the first page of each 1024-page allocation), which again tests whether the kernel can allocate memory from each NUMA node; see the sketch below.
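A rough sketch of what that disabled check amounts to, assuming libnuma: `move_pages()` with a NULL `nodes` argument only queries page locations, it moves nothing; `pages` and `num_pages` are hypothetical stand-ins for the recorded first pages of each allocation.

```c
#include <numaif.h>
#include <numa.h>

/* Hypothetical illustration: ask the kernel which node each recorded
 * first page resides on, then verify every node was seen at least once. */
static int all_nodes_seen(void **pages, unsigned long num_pages) {
    int max_node = numa_max_node();
    int status[num_pages]; /* VLA for brevity; a real test would malloc */
    if (move_pages(0, num_pages, pages, NULL, status, 0) != 0) {
        return -1; /* query failed */
    }
    for (int node = 0; node <= max_node; node++) {
        int seen = 0;
        for (unsigned long i = 0; i < num_pages; i++) {
            if (status[i] == node) { seen = 1; break; }
        }
        if (!seen) return 0; /* this node was never used */
    }
    return 1;
}
```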
TL;DR: This test can be simplified to allocating one or more pages and checking whether they are bound with MPOL_DEFAULT. It could also be removed, as we have a similar test above that checks the same functionality in a different (and less explicit) way.
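For reference, a minimal sketch of that simplified variant as a standalone C program (using `get_mempolicy()` from libnuma's `numaif.h`, linked with `-lnuma`); the real test would allocate through the UMF provider under test rather than `aligned_alloc()`:

```c
#include <numaif.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

int main(void) {
    /* Stand-in for the allocation under test. */
    size_t page_size = 4096;
    char *page = aligned_alloc(page_size, page_size);
    assert(page != NULL);
    page[0] = 1; /* touch the page so it is actually mapped */

    int mode = -1;
    unsigned long nodemask[16];
    memset(nodemask, 0, sizeof(nodemask));

    /* Query the policy governing this address (MPOL_F_ADDR). */
    long ret = get_mempolicy(&mode, nodemask, sizeof(nodemask) * 8,
                             page, MPOL_F_ADDR);
    assert(ret == 0);
    assert(mode == MPOL_DEFAULT);

    /* MPOL_DEFAULT implies an empty nodemask. */
    for (size_t i = 0; i < sizeof(nodemask) / sizeof(nodemask[0]); i++) {
        assert(nodemask[i] == 0);
    }

    free(page);
    return 0;
}
```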
There were requirements for this test case to explicitly check whether all the NUMA nodes were really exhausted when allocating until OOM (due to some issues with the OOM killer, we changed the test to leave some memory free).
Again - we are testing the operating system, not UMF. Let's hypothetically assume that we successfully mbind memory with MPOL_DEFAULT, but we did not "exhaust all NUMA nodes" after allocating until OOM, so the test fails. That failure can mean only two things: there is a bug in the kernel, or the platform is wrongly configured. Testing the kernel maybe isn't a bad thing - at least we are ensuring that MPOL_DEFAULT does what we expect - but this test is fragile: we cannot even make it work reliably under QEMU. It might fail because a different process does allocations while this test is running, and I'm almost sure it can fail if a swap partition is available, etc. The value of this test is just very small compared to how fragile it is.
I agree with @lplewa, I always disliked assuming how much we can allocate on a NUMA node.
force-pushed from 3bdfbe4 to fa385a6
This test case is removed because it provides too little value compared to the cost of setting it up in an appropriate environment. It was the cause of sporadic failures on the UR CI.
force-pushed from fa385a6 to c70ab7d