There are two reasons for disabling BME on an RDMA-capable device on
panic. The first is Exadata's FNDD logic; the other is quite simple:
we do not want incoming RDMA Writes/Atomics to modify the memory
image while a vmcore is being generated.
Exadata has implemented Fast Node Death Detection (FNDD) by means of
posting RDMA operations between nodes. Now, since the RDMA responder
is handled by the remote HCA, the RDMA requests will be satisfied even
though the host OS is crashing, generating a vmcore, or the like. The
HCA will do so until the PCIe PERST# signal (PCI Reset) has been
asserted.
Exadata has tried to circumvent this situation by having a process at
the target regularly increment a variable in the MR used by FNDD, with
the remote node then performing an RDMA Read of the variable to make
sure it increments.
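
As an illustration only, the check made by the reading node can be
modelled in user space as below. This is a minimal sketch, not
Exadata's implementation: the struct, the function name and the
max_stale threshold are all hypothetical, and the counter value stands
in for the result of an RDMA Read of the target's MR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the reader side of the liveness check. */
struct fndd_probe {
	uint64_t last_seen;    /* counter value from the previous read */
	unsigned stale_reads;  /* consecutive reads without an increment */
	unsigned max_stale;    /* stale reads at which death is declared */
};

/*
 * Feed one counter value, standing in for the result of an RDMA Read.
 * Returns true while the target is still considered alive.
 */
bool fndd_probe_update(struct fndd_probe *p, uint64_t counter)
{
	assert(p != NULL);

	if (counter != p->last_seen) {
		/* The incrementing process has run since the last read. */
		p->last_seen = counter;
		p->stale_reads = 0;
		return true;
	}

	/* No progress: a dead host and a stalled process look the same. */
	p->stale_reads++;
	return p->stale_reads < p->max_stale;
}
```

Note that the reader only ever sees the counter, so in this model a
target whose incrementing process is merely not scheduled cannot be
told apart from a target whose OS has crashed.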
This mechanism has proven to give too many false negatives (a live
node being taken for dead), as the process incrementing the variable
may be suspended for several seconds, due to a very high number of
processes.
Hence, the idea is to revoke the HCA's ability to perform host memory
accesses, by simply clearing the Bus Master Enable (BME) bit, when
the host OS panics or reboots.
Here is an excerpt of the PCI Express specification:
<quote>
Bus Master Enable - Controls the ability of a PCI Express Endpoint to
issue Memory and I/O Read/Write Requests, and the ability of a Root or
Switch Port to forward Memory and I/O Read/Write Requests in the
Upstream direction
Endpoints:
When this bit is Set, the PCI Express Function is allowed to issue
Memory or I/O Requests.
When this bit is Clear, the PCI Express Function is not allowed to
issue any Memory or I/O Requests.
Note that as MSI/MSI-X interrupt Messages are in-band memory writes,
setting the Bus Master Enable bit to 0b disables MSI/MSI-X interrupt
Messages as well.
Requests other than Memory or I/O Requests are not controlled by this
bit.
Default value of this bit is 0b.
This bit is hardwired to 0b if a Function does not generate Memory or
I/O Requests.
</quote>
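
To make the bit semantics concrete, here is a small user-space model
of the Command register. It is a sketch under stated assumptions: the
sim_* names are hypothetical and only the Command register is
modelled. The one fact taken from the PCI specification is that Bus
Master Enable is bit 2 (mask 0x0004) of that register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_BUS_MASTER 0x0004u  /* bit 2 of the PCI Command register */

/* Hypothetical simulated function: only the Command register exists. */
struct sim_pci_fn {
	uint16_t command;
};

void sim_set_bme(struct sim_pci_fn *fn)
{
	assert(fn != NULL);
	fn->command |= CMD_BUS_MASTER;
}

void sim_clear_bme(struct sim_pci_fn *fn)
{
	assert(fn != NULL);
	/* Clear only bit 2; all other Command bits are left untouched. */
	fn->command &= (uint16_t)~CMD_BUS_MASTER;
}

/*
 * Memory requests, which include DMA as well as MSI/MSI-X writes,
 * may be issued only while BME is set.
 */
bool sim_may_issue_mem_request(const struct sim_pci_fn *fn)
{
	assert(fn != NULL);
	return (fn->command & CMD_BUS_MASTER) != 0;
}
```

In this model, once sim_clear_bme() has run, every inbound RDMA
operation that would require the function to read or write host
memory is refused at the source.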
To accommodate Exadata's requirement here, we install a panic notifier
that, when invoked, revokes BME for the function.
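
The pattern can be sketched in user space as follows. This is not the
code of this patch: every sim_* name is hypothetical, and the chain is
a simplified stand-in for a kernel notifier chain. In the kernel, the
corresponding building blocks would presumably be panic_notifier_list,
atomic_notifier_chain_register() and pci_clear_master().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_BUS_MASTER 0x0004u  /* bit 2 of the PCI Command register */

/* Hypothetical, simplified stand-in for a kernel notifier block. */
struct sim_notifier {
	int (*call)(struct sim_notifier *nb, unsigned long event, void *data);
	struct sim_notifier *next;
};

/* Hypothetical device: a Command register plus an embedded notifier. */
struct sim_dev {
	uint16_t command;
	struct sim_notifier panic_nb;
};

static struct sim_notifier *sim_panic_chain;  /* head of the chain */

void sim_chain_register(struct sim_notifier *nb)
{
	nb->next = sim_panic_chain;
	sim_panic_chain = nb;
}

/* Callback: recover the owning device and revoke its BME bit. */
int sim_dev_panic_event(struct sim_notifier *nb, unsigned long event,
			void *data)
{
	struct sim_dev *dev = (struct sim_dev *)
		((char *)nb - offsetof(struct sim_dev, panic_nb));

	(void)event;
	(void)data;
	dev->command &= (uint16_t)~CMD_BUS_MASTER;
	return 0;
}

void sim_dev_init(struct sim_dev *dev)
{
	assert(dev != NULL);
	dev->command = CMD_BUS_MASTER | 0x0002u;  /* BME and memory space */
	dev->panic_nb.call = sim_dev_panic_event;
	dev->panic_nb.next = NULL;
	sim_chain_register(&dev->panic_nb);
}

/* Stand-in for the panic path: run every registered callback once. */
void sim_panic(void)
{
	struct sim_notifier *nb;

	for (nb = sim_panic_chain; nb != NULL; nb = nb->next)
		nb->call(nb, 0, NULL);
}
```

Embedding the notifier in the device structure lets the callback find
its device without a global lookup, which matters on a panic path
where as little as possible should be assumed to work.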
Orabug: 31556128
UEK5 => UEK6
(cherry picked from commit 6baf337)
cherry-pick-repo=UEK/production/linux-uek.git
Conflicts:
drivers/infiniband/hw/mlx5/mlx5_ib.h
drivers/infiniband/hw/mlx5/main.c
* The conflict arose because in UEK5, the device struct
* contained an array of mlx5_roce structs, whereas in UEK6, the
* mlx5_roce struct is subordinate to the device's port
* array.
Signed-off-by: Håkon Bugge <[email protected]>
Reviewed-by: Sharath Srinivasan <[email protected]>
Signed-off-by: Aron Silverton <[email protected]>