forked from openstack/nova
Fix for bugs 1839560 and 1839674 #2
Merged
Conversation
markgoddard approved these changes on Sep 26, 2019
Looks fine; we'll need to install this in the nova-conductor image in addition to nova-compute. We probably want https://review.opendev.org/#/c/676280/ as well.
If ComputeNode.create() fails, the update_available_resource periodic will not try to create it again because it will be mapped in the compute_nodes dict and _init_compute_node will return early, but trying to save changes to that ComputeNode object later will fail because there is no id on the object, since we failed to create it in the DB.

This simply reverses the logic such that we only map the compute node if we successfully created it.

Some existing _init_compute_node testing had to be changed since it relied on the order of when the ComputeNode object is created and put into the compute_nodes dict in order to pass the object along to some much lower-level PCI tracker code, which was arguably mocking too deep for a unit test. That is changed to avoid the low-level mocking and assertions and just assert that _setup_pci_tracker is called as expected.

Conflicts:
    nova/compute/resource_tracker.py
    nova/tests/unit/compute/test_resource_tracker.py

NOTE(mriedem): The resource_tracker.py conflict is due to not having I14a310b20bd9892e7b34464e6baad49bf5928ece in Rocky. The test conflicts are due to not having change I37e8ad5b14262d801702411c2c87e73550adda70 in Rocky.

Change-Id: I9fa1d509a3de405d6246fb8670612c65c10cc93b
Closes-Bug: #1839674
(cherry picked from commit f578146)
(cherry picked from commit 648770b)
(cherry picked from commit 35273a8)
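For illustration only, a minimal sketch of the reordering described above, with simplified names and arguments; it assumes it sits inside nova/compute/resource_tracker.py where `objects`, `self._get_compute_node`, `self._copy_resources` and `self._setup_pci_tracker` are available, and is not the actual patched method:

```python
# Hedged sketch only -- simplified from the commit message, not the actual
# nova code. The key point: the node is cached in self.compute_nodes only
# *after* ComputeNode.create() succeeds, so a failed create is retried on
# the next periodic run instead of leaving an id-less object in the cache.
def _init_compute_node(self, context, resources):
    nodename = resources['hypervisor_hostname']

    # Already cached from a previous successful create/load: nothing to do.
    if nodename in self.compute_nodes:
        return

    # Try to load an existing record from the DB first.
    cn = self._get_compute_node(context, nodename)
    if cn:
        self.compute_nodes[nodename] = cn
        return

    # Build and create a new record; if create() raises, we deliberately do
    # NOT map the node, so the next periodic task will try again.
    cn = objects.ComputeNode(context)
    cn.host = self.host
    self._copy_resources(cn, resources)
    cn.create()
    self.compute_nodes[nodename] = cn   # only map after a successful create
    self._setup_pci_tracker(context, cn, resources)
```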
This adds a functional test which recreates bug 1839560 where the driver reports a node, then no longer reports it so the compute manager deletes it, and then the driver reports it again later (this can be common with ironic nodes as they undergo maintenance).

The issue is that since Ia69fabce8e7fd7de101e291fe133c6f5f5f7056a in Rocky, the ironic node uuid is re-used for the compute node uuid, but there is a unique constraint on the compute node uuid, so when trying to create the compute node once the ironic node is available again, the compute node create fails with a duplicate entry error due to the duplicate uuid.

To recreate this in the functional test, a new fake virt driver is added which provides a predictable uuid per node like the ironic driver. The test also shows that archiving the database is a way to work around the bug until it's properly fixed.

NOTE(mriedem): Since change Idaed39629095f86d24a54334c699a26c218c6593 is not in Rocky the PlacementFixture still comes from nova_fixtures.

Change-Id: If822509e906d5094f13a8700b2b9ed3c40580431
Related-Bug: #1839560
(cherry picked from commit 89dd74a)
(cherry picked from commit e7109d4)
(cherry picked from commit ecd1e04)
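As an illustration of the fake driver idea (the class name below is hypothetical and this is not the exact fixture added by the test), a driver whose node uuid is a deterministic function of the node name behaves like the ironic driver: the same node always comes back with the same uuid.

```python
# Illustrative sketch, not the actual test fixture: a fake virt driver that
# reports a stable, name-derived uuid per node, so a node that disappears and
# later returns reuses the same compute node uuid and hits the duplicate-uuid
# path described in bug 1839560.
import uuid

from nova.virt import fake


class PredictableUUIDFakeDriver(fake.SmallFakeDriver):
    def get_available_resource(self, nodename):
        resources = super().get_available_resource(nodename)
        # Same node name -> same uuid on every report, like an ironic node.
        resources['uuid'] = str(uuid.uuid5(uuid.NAMESPACE_DNS, nodename))
        return resources
```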
There is a unique index on the compute_nodes.uuid column which means we can't have more than one compute_nodes record in the same DB with the same UUID even if one is soft deleted, because the deleted column is not part of that unique index constraint.

This is a problem with ironic nodes where the node is 1:1 with the compute node record, and when a node is undergoing maintenance the driver doesn't return it from get_available_nodes(), so the ComputeManager.update_available_resource periodic task (soft) deletes the compute node record; but when the node is no longer under maintenance in ironic and the driver reports it, the ResourceTracker._init_compute_node code will fail to create the ComputeNode record again because of the duplicate uuid.

This change handles the DBDuplicateEntry error in compute_node_create by finding the soft-deleted compute node with the same uuid and simply updating it to no longer be (soft) deleted.

Closes-Bug: #1839560
Change-Id: Iafba419fe86446ffe636721f523fb619f8f787b3
(cherry picked from commit 8b00726)
(cherry picked from commit 1b02166)
(cherry picked from commit 9ce9484)
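Sketched below is the shape of that handling, written as if it lived in nova/db/sqlalchemy/api.py (so `model_query`, `models`, `_compute_node_create` and `compute_node_update` are assumed from that module); it is a simplified illustration, not the exact patch:

```python
# Hedged sketch of the compute_node_create handling: on a duplicate uuid,
# find the soft-deleted row with the same uuid and resurrect it with the new
# values instead of failing. Assumes the helpers of nova/db/sqlalchemy/api.py.
from oslo_db import exception as db_exc


def compute_node_create(context, values):
    try:
        return _compute_node_create(context, values)
    except db_exc.DBDuplicateEntry:
        return _compute_node_get_and_update_deleted(context, values)


def _compute_node_get_and_update_deleted(context, values):
    # read_deleted='yes' makes the soft-deleted row visible to the query.
    cn = model_query(
        context, models.ComputeNode, read_deleted='yes'
    ).filter_by(uuid=values['uuid']).first()
    # Clear the soft-delete markers and apply the new resource values.
    values['deleted'] = 0
    values['deleted_at'] = None
    return compute_node_update(context, cn.id, values)
```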
Branch updated from cad4fee to 0fbfc26
markgoddard approved these changes on Sep 27, 2019
jovial pushed a commit that referenced this pull request on Oct 22, 2019
This addresses review comments from the following changes:

  I61a3e8902a891bac36911812e4e7c080570e3850
  I48e6db9693e470b177bf4c75211d8b883c768433
  Ic70d2bb781b6a844849a5cf2fe4d271b5a81093d
  I5a956513f3485074023e027430cc52ee7a3f92e4
  Ica6152ccb97dce805969d964d6ed032bfe22a33f

Part of blueprint bandwidth-resource-provider

Change-Id: Idffaa6d206cda3f507e6be095356537f22302ad7
jovial pushed a commit that referenced this pull request on Oct 22, 2019
github-actions bot pushed a commit that referenced this pull request on Jan 25, 2023
In py310 unittest.mock does not allow mocking the same function twice, as the second mocking will fail to autospec the Mock object created by the first mocking. This patch manually fixes the double mocking.

Fixed cases:

1) One of the mocks was totally unnecessary, so it was removed.
2) The second mock specialized the behavior of the first generic mock. In this case the second mock is replaced with the configuration of the first mock.
3) A test case with two test steps mocked the same function for each step with overlapping mocks. Here the overlap was removed so that the two mocks exist independently.

The get_connection injection in the libvirt functional test needed a further tweak (yeah I know it has many already) to act like a single mock (basically case #2) instead of a temporary re-mocking. Still, the globalness of the get_connection mocking warrants the special set / reset logic there.

Conflicts:
    nova/tests/unit/api/openstack/compute/test_limits.py
    nova/tests/unit/virt/libvirt/volume/test_lightos.py
    nova/tests/unit/virt/test_block_device.py

Change-Id: I3998d0d49583806ac1c3ae64f1b1fe343cefd20d
(cherry picked from commit f8cf050)
(cherry picked from commit b40bd1b)
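To make case 2) concrete, here is a small self-contained example of the pattern (illustrative, not taken from the nova tree): rather than re-patching an attribute that is already a Mock, the mock created by the outer patch is configured directly.

```python
# Illustrative example of the double-mock problem and the "configure the
# first mock" fix (case #2 above); not code from the nova tree.
import unittest
from unittest import mock


class Client:
    def fetch(self, key):
        raise RuntimeError('real network call')


class TestDoubleMock(unittest.TestCase):
    @mock.patch.object(Client, 'fetch', autospec=True)
    def test_specialized_behavior(self, mock_fetch):
        # Broken on py310: re-patching the attribute that is already a Mock,
        # because autospec cannot introspect a Mock created by the outer patch:
        #
        #   with mock.patch.object(Client, 'fetch', autospec=True,
        #                          return_value='cached'):
        #       ...
        #
        # Fix: configure the mock created by the outer decorator instead.
        mock_fetch.return_value = 'cached'
        self.assertEqual('cached', Client().fetch('a'))
        mock_fetch.assert_called_once_with(mock.ANY, 'a')


if __name__ == '__main__':
    unittest.main()
```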
github-actions bot pushed a commit that referenced this pull request on Jan 25, 2023
In py310 unittest.mock does not allow mocking the same function twice, as the second mocking will fail to autospec the Mock object created by the first mocking. This patch manually fixes the double mocking.

Fixed cases:

1) One of the mocks was totally unnecessary, so it was removed.
2) The second mock specialized the behavior of the first generic mock. In this case the second mock is replaced with the configuration of the first mock.
3) A test case with two test steps mocked the same function for each step with overlapping mocks. Here the overlap was removed so that the two mocks exist independently.

The get_connection injection in the libvirt functional test needed a further tweak (yeah I know it has many already) to act like a single mock (basically case #2) instead of a temporary re-mocking. Still, the globalness of the get_connection mocking warrants the special set / reset logic there.

Conflicts:
    nova/tests/functional/regressions/test_bug_1781286.py
    nova/tests/unit/api/openstack/compute/test_shelve.py

Change-Id: I3998d0d49583806ac1c3ae64f1b1fe343cefd20d
(cherry picked from commit f8cf050)
github-actions bot pushed a commit that referenced this pull request on Dec 18, 2023
In py310 unittest.mock does not allow mocking the same function twice, as the second mocking will fail to autospec the Mock object created by the first mocking. This patch manually fixes the double mocking.

Fixed cases:

1) One of the mocks was totally unnecessary, so it was removed.
2) The second mock specialized the behavior of the first generic mock. In this case the second mock is replaced with the configuration of the first mock.
3) A test case with two test steps mocked the same function for each step with overlapping mocks. Here the overlap was removed so that the two mocks exist independently.

The get_connection injection in the libvirt functional test needed a further tweak (yeah I know it has many already) to act like a single mock (basically case #2) instead of a temporary re-mocking. Still, the globalness of the get_connection mocking warrants the special set / reset logic there.

Change-Id: I3998d0d49583806ac1c3ae64f1b1fe343cefd20d
github-actions bot pushed a commit that referenced this pull request on Dec 18, 2023
The PciPassthroughFilter logic checks each InstancePCIRequest individually against the available PCI pools of a given host and given boot request. So it is possible that the scheduler accepts a host that has a single PCI device available even if two devices are requested for a single instance via two separate PCI aliases. Then the PCI claim on the compute detects this, but it does not stop the boot; it just logs an ERROR. This results in the instance being booted without any PCI device.

This patch does two things:

1) Changes the PCI claim to fail with an exception and trigger a re-schedule instead of just logging an ERROR.
2) Changes PciDeviceStats.support_requests, which is called during scheduling, to not just filter pools for individual requests but also consume the request from the pool within the scope of a single boot request.

The fix in #2) would not be enough alone, as two parallel scheduling requests could race for a single device on the same host. #1) is the ultimate place where we consume devices under a compute-global lock, so we need the fix there too.

Closes-Bug: #1986838
Change-Id: Iea477be57ae4e95dfc03acc9368f31d4be895343
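A hedged sketch of the scheduling-side idea in 2), with simplified pool and request structures (the real logic lives in nova.pci.stats.PciDeviceStats): all PCI requests of a single boot request are checked against a copy of the host's pools, consuming from it as they are matched, so one free device can no longer satisfy two separate requests.

```python
# Illustrative sketch of point 2): check all PCI requests of one boot request
# against a *copy* of the host's pools, consuming as we go, so a single free
# device cannot be counted for two separate requests. Not the exact nova code.
import copy


def support_requests(pools, requests):
    """Return True if all requests of one boot request fit together."""
    remaining = copy.deepcopy(pools)  # don't mutate real inventory while scheduling
    return all(_consume_request(remaining, request) for request in requests)


def _consume_request(pools, request):
    needed = request['count']
    for pool in pools:
        if not _pool_matches(pool, request['spec']):
            continue
        take = min(pool['count'], needed)
        pool['count'] -= take
        needed -= take
        if needed == 0:
            return True
    return False


def _pool_matches(pool, spec):
    # A pool matches if every requested tag (vendor_id, product_id, ...) agrees.
    return all(pool.get(k) == v for k, v in spec.items())
```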
github-actions bot pushed a commit that referenced this pull request on Dec 18, 2023
Each PCI device and each PF is a separate RP in Placement and the scheduler allocates them specifically, so the PCI filtering and claiming also needs to handle these devices individually.

Nova pooled PCI devices together if they had the same device_spec, device type and numa node. Now this is changed to only pool VFs from the same parent PF. Fortunately nova already handled consuming devices for a single InstancePCIRequest from multiple PCI pools, so this change does not affect the device consumption code path.

The test_live_migrate_server_with_neutron test needed to be changed. Originally this test used a compute with the following config:

* PF 81.00.0
  * VFs 81.00.[1-4]
* PF 81.01.0
  * VFs 81.01.[1-4]
* PF 82.00.0

And booted a VM that needed one VF and one PF. This request has two widely different solutions:

1) Allocate the VF from under 81.00 and therefore consume 81.00.0, and allocate the 82.00.0 PF. This was what the test asserted to happen.
2) Allocate the VF from under 81.00 and therefore consume 81.00.0, and allocate the 81.00.0 PF and therefore consume all the VFs under it. This results in a different amount of free devices than #1).

AFAIK nova does not have any implemented preference for consuming PFs without VFs. The test just worked by chance (some internal device and pool ordering made it that way). However, when the PCI pools are split, nova started choosing solution #2), making the test fail. As both solutions are equally good from nova's scheduling contract perspective, I don't consider this a behavior change. Therefore the test is updated not to create a situation where two different scheduling solutions are possible.

blueprint: pci-device-tracking-in-placement
Change-Id: I4b67cca3807fbda9e9b07b220a28e331def57624
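As a simplified illustration of the pooling change (hypothetical helper, not the exact PciDeviceStats code), the parent PF address becomes part of the pool key for VFs, so VFs under different PFs no longer share a pool:

```python
# Illustrative only: adding the parent PF address to the pool key splits VF
# pools per parent PF. Device dicts and field names are simplified.
def pool_key(dev):
    """Devices with the same key end up in the same pool (simplified)."""
    key = {
        'vendor_id': dev['vendor_id'],
        'product_id': dev['product_id'],
        'numa_node': dev.get('numa_node'),
        'dev_type': dev['dev_type'],
    }
    if dev['dev_type'] == 'type-VF':
        # New behaviour: only VFs under the same parent PF pool together,
        # since each PF and each PCI device is its own RP in Placement.
        key['parent_addr'] = dev['parent_addr']
    return tuple(sorted(key.items()))


# Two VFs with identical vendor/product but different parent PFs now land
# in different pools:
vf_a = {'vendor_id': '8086', 'product_id': '1515', 'numa_node': 0,
        'dev_type': 'type-VF', 'parent_addr': '0000:81:00.0'}
vf_b = dict(vf_a, parent_addr='0000:81:01.0')
assert pool_key(vf_a) != pool_key(vf_b)
```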