rds: Put back pages on the CPU that allocated them
The RDBMS usage model for RDS dictates that an RDS message, which is
an SG list populated with enough order zero pages to hold the message,
is allocated on the CPU where the RDBMS process runs. These processes
can run on any CPU.
The Reliability aspect of RDS requires RDS to keep the message until
the peer has acknowledged it. This happens by an explicit or an
implicit ACK. When said ACKs are received, the pages in the SG list
are freed (put). However, this only happens on the NUMA node local to
the HCA used for communication, due to how cellirqbalance works.
The above facts lead to a surplus of order zero pages on the NUMA node
local to the HCA and, similarly, a deficit of order zero pages on the
other NUMA nodes. Even though the SLAB allocation system is supposedly
lock-free for order zero pages, locks are taken in the above scenario
in order to re-establish the balance.
This in turn leads to lock contention in the kernel, and reduced
RDBMS IOPS for certain workloads.
This is fixed by maintaining a per-cpu cache for pages. In
rds_copy_from_user(), we allocate from the per-cpu cache if
possible. Likewise, when purging RDS messages, we put each page back
on the per-cpu list of the CPU that allocated it, unless that list is
too long.
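The allocate/put scheme described above can be illustrated with a minimal
user-space sketch. This is not the code from the patch: the names
(cache_alloc, cache_put, struct page_hdr), the limits NR_CPUS and
CACHE_MAX_LEN, and the way the allocating CPU is recorded in the page are
all assumptions made for illustration. The sketch is single-threaded and
omits the per-cpu protection a kernel implementation would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS       4     /* hypothetical number of possible CPUs */
#define CACHE_MAX_LEN 2     /* hypothetical per-CPU list limit */
#define PAGE_SZ       4096  /* stand-in for an order zero page */

/* Header stored at the start of each simulated page (assumption). */
struct page_hdr {
	struct page_hdr *next;
	int alloc_cpu;          /* CPU whose cache the page belongs to */
};

struct cpu_cache {
	struct page_hdr *head;
	size_t len;
};

static struct cpu_cache caches[NR_CPUS];

/* Allocate on behalf of 'cpu': take from its cache if possible. */
static struct page_hdr *cache_alloc(int cpu)
{
	struct cpu_cache *c;
	struct page_hdr *p;

	assert(cpu >= 0 && cpu < NR_CPUS);
	c = &caches[cpu];
	p = c->head;
	if (p) {
		c->head = p->next;
		c->len--;
	} else {
		p = malloc(PAGE_SZ);    /* cache empty: fall back */
		if (!p)
			return NULL;
	}
	p->next = NULL;
	p->alloc_cpu = cpu;
	return p;
}

/*
 * Put the page back on the list of the CPU that allocated it,
 * regardless of which CPU runs the put. Returns 1 if the page was
 * cached, 0 if the list was too long and the page was freed instead.
 */
static int cache_put(struct page_hdr *p)
{
	struct cpu_cache *c = &caches[p->alloc_cpu];

	if (c->len >= CACHE_MAX_LEN) {
		free(p);
		return 0;
	}
	p->next = c->head;
	c->head = p;
	c->len++;
	return 1;
}
```

The point of the sketch is that cache_put() keys on the allocating CPU,
not the CPU handling the ACK, so pages flow back to where the next
allocation is likely to happen.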
To avoid pages being stuck forever in the per-cpu cache, we have a
garbage collector, which by default runs every second and cleans 10%
of the possible CPU caches per invocation.
When the RDS module is removed, we stop the worker thread and flush
the caches of all possible CPUs.
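The garbage collection and module-removal flush can be sketched in the
same user-space style. Again, this is an illustration under assumptions,
not the patch itself: the round-robin cursor, the rounding of the 10%
batch, the complete emptying of each visited cache, and all names
(gc_run, flush_all, cache_push, GC_PERCENT) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS    20   /* hypothetical number of possible CPUs */
#define GC_PERCENT 10   /* share of CPU caches cleaned per run */

struct node {
	struct node *next;
};

struct cpu_cache {
	struct node *head;
	size_t len;
};

static struct cpu_cache caches[NR_CPUS];
static int gc_next_cpu;         /* round-robin cursor (assumption) */

/* Helper for the sketch: park one freshly allocated entry on 'cpu'. */
static int cache_push(int cpu)
{
	struct node *n;

	assert(cpu >= 0 && cpu < NR_CPUS);
	n = malloc(sizeof(*n));
	if (!n)
		return 0;
	n->next = caches[cpu].head;
	caches[cpu].head = n;
	caches[cpu].len++;
	return 1;
}

/* Free everything cached for one CPU; returns the number freed. */
static size_t flush_cpu(int cpu)
{
	size_t freed = 0;

	while (caches[cpu].head) {
		struct node *n = caches[cpu].head;

		caches[cpu].head = n->next;
		free(n);
		freed++;
	}
	caches[cpu].len = 0;
	return freed;
}

/* One GC invocation: clean GC_PERCENT of the CPUs, rounded up. */
static size_t gc_run(void)
{
	int batch = (NR_CPUS * GC_PERCENT + 99) / 100;
	size_t freed = 0;
	int i;

	for (i = 0; i < batch; i++) {
		freed += flush_cpu(gc_next_cpu);
		gc_next_cpu = (gc_next_cpu + 1) % NR_CPUS;
	}
	return freed;
}

/* Module removal: flush the caches of all possible CPUs. */
static size_t flush_all(void)
{
	size_t freed = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		freed += flush_cpu(cpu);
	return freed;
}
```

With these assumed values, ten invocations of gc_run() visit every CPU
once, so at the default one-second interval an idle page would be
released within roughly ten seconds.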
This optimization provides an 11% improvement in update IOPS on an X10M
VM running Oracle RDBMS.
Orabug: 35768362
Suggested-by: Jane Chu <[email protected]>
Signed-off-by: Håkon Bugge <[email protected]>
Tested-by: Håkon Bugge <[email protected]>
Tested-by: Shih-Yu Huang <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Tested-by: Alexis Silva <[email protected]>
Orabug: 35768362
LUCI => v6.11
Conflicts:
net/rds/sysctl.c
- Merge conflict due to the missing ctl_table sentinel (Orabug 36936368)
Reviewed-by: Hans Westgaard Ry <[email protected]>
Signed-off-by: Håkon Bugge <[email protected]>