Commit 1295175

konradwilk authored and NipaLocal committed
rds: Expose feature parameters via sysfs (and ELF)
We would like to have a programmatic way for applications to query
which of the features defined in include/uapi/linux/rds.h are actually
implemented by the kernel.

The problem is that applications can be built against a newer (or older)
kernel than the one they run on, so the running kernel may or may not
implement a given feature. The absence of a feature file signifies that
the kernel does not support that feature; its presence signifies that it
does. This allows an application to query sysfs and figure out what is
supported on a running system.

This patch exposes these extra sysfs files:

/sys/kernel/rds/features/ioctl_get_tos
/sys/kernel/rds/features/ioctl_set_tos
/sys/kernel/rds/features/socket_cancel_sent_to
/sys/kernel/rds/features/socket_cong_monitor
/sys/kernel/rds/features/socket_free_mr
/sys/kernel/rds/features/socket_get_mr
/sys/kernel/rds/features/socket_get_mr_for_dest
/sys/kernel/rds/features/socket_recverr
/sys/kernel/rds/features/socket_so_rxpath_latency
/sys/kernel/rds/features/socket_so_transport

Each contains the value 'supported'. In the future this value could
change to say 'deprecated', take other values (for example different
versions), or be changed at runtime.

The choice to use sysfs in this particular way is modeled on how
filesystems expose their features. The alternative of exposing a single
file ('features') with each feature enumerated (which cgroup does) is
limited in that it provides no means of attaching extra content to each
feature in the future. For example, if one of the features had three
modes and one wanted to set a particular one at runtime, that does not
exist in cgroup (it could be implemented, but it would be quite hectic
with just one single attribute).

Another solution, using an ioctl to expose a bitmask, has the
disadvantage of being less flexible in the future: while it can carry a
supported/unsupported bit, it is not clear how one would change modes or
expose versions. It is most certainly feasible, but it can get seriously
complex fast.

As such, this mechanism offers the basic support we require now and
leaves flexibility for the future.

Lastly, we also use the ELF note macro to expose these features as ELF
notes, so that applications that have not yet initialized the RDS
transport can inspect the kernel module to see whether it has the
appropriate support and choose an alternative protocol if they so wish.

Reviewed-by: Allison Henderson <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
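An application could probe one of these files along the following lines. This is a minimal userspace sketch, not part of the patch: the helper name `rds_feature_supported` and the decision to treat a missing or unreadable file as "not supported" are assumptions made for illustration. Only the directory path and the 'supported' value come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Location of the feature files, as listed in the commit message. */
#define RDS_FEATURES_DIR "/sys/kernel/rds/features"

/*
 * Hypothetical helper: returns true if <dir>/<feature> exists and its
 * first line reads "supported". A missing or unreadable file is treated
 * as "not supported", matching the presence/absence rule described above.
 */
bool rds_feature_supported(const char *dir, const char *feature)
{
	char path[512];
	char value[64];
	bool ok = false;
	FILE *f;
	int n;

	assert(dir != NULL && feature != NULL);

	n = snprintf(path, sizeof(path), "%s/%s", dir, feature);
	if (n < 0 || (size_t)n >= sizeof(path))
		return false;	/* path would not fit in the buffer */

	f = fopen(path, "r");
	if (!f)
		return false;	/* no such file: feature not advertised */

	if (fgets(value, sizeof(value), f)) {
		value[strcspn(value, "\n")] = '\0';	/* strip trailing newline */
		ok = strcmp(value, "supported") == 0;
	}
	fclose(f);
	return ok;
}
```

A caller might use `rds_feature_supported(RDS_FEATURES_DIR, "socket_so_transport")` before configuring a socket and fall back to another protocol when it returns false. Because the commit message allows for other values such as 'deprecated' later on, a stricter or looser comparison may be more appropriate than the exact match used here.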
1 parent 874d7ce commit 1295175

2 files changed: +105 -1 lines changed

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+What:		/sys/kernel/rds/features/*
+Date:		June 2025
+KernelVersion:	6.17
+
+Description:	This directory contains the features that this kernel
+		has been built with and supports. They correspond
+		to the include/uapi/linux/rds.h features.
+
+		The intent is for applications compiled against rds.h
+		to be able to query and find out what features the
+		driver supports. The current expected value is 'supported'.
+
+What:		/sys/kernel/rds/features/ioctl_[get,set]_tos
+Description:	Allows the user to set on the socket a type of
+		service (tos) value associated forever.
+
+What:		/sys/kernel/rds/features/socket_cancel_sent_to
+Description:	Allows cancelling all pending messages to a given destination.
+
+What:		/sys/kernel/rds/features/socket_cong_monitor
+Description:	RDS provides explicit congestion monitoring for a socket using
+		a 64-bit mask. Each bit in the mask corresponds to a group of ports.
+
+		When a congestion update arrives, RDS checks the set of ports
+		that became uncongested against the bit mask.
+
+		If they overlap, a control message is enqueued on the socket,
+		and the application is woken up.
+
+What:		/sys/kernel/rds/features/socket_[get_mr,get_mr_for_dest,free_mr]
+Description:	RDS allows a process to register or release memory ranges for
+		RDMA.
+
+What:		/sys/kernel/rds/features/socket_recverr
+Description:	RDS will send RDMA notification messages to the application for
+		any RDMA operation that fails. By default this is off.
+
+What:		/sys/kernel/rds/features/socket_so_rxpath_latency
+Description:	Receive path latency in various stages of receive path.
+
+What:		/sys/kernel/rds/features/socket_so_transport
+Description:	Attach the socket to the underlying transport (TCP, RDMA
+		or loop) before invoking bind on the socket.
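The socket_cong_monitor entry above describes a 64-bit mask matched against the ports that became uncongested. The overlap test can be sketched in userspace as follows. The port-to-bit mapping (port modulo 64) is an assumption made for this illustration, and the names `cong_monitor_bit`, `cong_monitor_mask` and `cong_should_notify` are hypothetical; the ABI text itself only says that each bit corresponds to a group of ports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CONG_MONITOR_SIZE 64	/* number of bits in the monitor mask */

/*
 * Assumed mapping: a port lands in bit (port mod 64), so ports that
 * differ by a multiple of 64 fall into the same group.
 */
unsigned int cong_monitor_bit(uint16_t port)
{
	return port % CONG_MONITOR_SIZE;
}

/* Mask with only the bit for the given port's group set. */
uint64_t cong_monitor_mask(uint16_t port)
{
	unsigned int bit = cong_monitor_bit(port);

	assert(bit < CONG_MONITOR_SIZE);	/* keeps the shift well defined */
	return UINT64_C(1) << bit;
}

/*
 * True if any port group that just became uncongested is one the
 * socket asked to monitor, i.e. the two masks overlap.
 */
bool cong_should_notify(uint64_t monitored, uint64_t uncongested)
{
	return (monitored & uncongested) != 0;
}
```

Under this assumed mapping, a socket monitoring port 4000 would also be notified about port 4064, since both share a bit. That imprecision is the cost of summarizing many ports in a fixed-size mask.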

net/rds/af_rds.c

Lines changed: 62 additions & 1 deletion
@@ -32,7 +32,9 @@
  */
 #include <linux/module.h>
 #include <linux/errno.h>
+#include <linux/elfnote.h>
 #include <linux/kernel.h>
+#include <linux/kobject.h>
 #include <linux/gfp.h>
 #include <linux/in.h>
 #include <linux/ipv6.h>
@@ -871,6 +873,53 @@ static void rds6_sock_info(struct socket *sock, unsigned int len,
 }
 #endif
 
+static ssize_t supported_show(struct kobject *kobj, struct kobj_attribute *attr,
+			      char *buf)
+{
+	return sysfs_emit(buf, "supported\n");
+}
+
+#define SYSFS_ATTR(_name)					\
+ELFNOTE64("rds." #_name, 0, 1);					\
+static struct kobj_attribute rds_attr_##_name = {		\
+	.attr = {.name = __stringify(_name), .mode = 0444 },	\
+	.show = supported_show,					\
+}
+
+SYSFS_ATTR(ioctl_set_tos);
+SYSFS_ATTR(ioctl_get_tos);
+SYSFS_ATTR(socket_cancel_sent_to);
+SYSFS_ATTR(socket_cong_monitor);
+SYSFS_ATTR(socket_get_mr);
+SYSFS_ATTR(socket_get_mr_for_dest);
+SYSFS_ATTR(socket_free_mr);
+SYSFS_ATTR(socket_recverr);
+SYSFS_ATTR(socket_so_rxpath_latency);
+SYSFS_ATTR(socket_so_transport);
+
+#define ATTR_LIST(_name) &rds_attr_##_name.attr
+
+static struct attribute *rds_feat_attrs[] = {
+	ATTR_LIST(ioctl_set_tos),
+	ATTR_LIST(ioctl_get_tos),
+	ATTR_LIST(socket_cancel_sent_to),
+	ATTR_LIST(socket_cong_monitor),
+	ATTR_LIST(socket_get_mr),
+	ATTR_LIST(socket_get_mr_for_dest),
+	ATTR_LIST(socket_free_mr),
+	ATTR_LIST(socket_recverr),
+	ATTR_LIST(socket_so_rxpath_latency),
+	ATTR_LIST(socket_so_transport),
+	NULL,
+};
+
+static const struct attribute_group rds_feat_group = {
+	.attrs = rds_feat_attrs,
+	.name = "features",
+};
+
+static struct kobject *rds_sysfs_kobj;
+
 static void rds_exit(void)
 {
 	sock_unregister(rds_family_ops.family);
@@ -882,6 +931,8 @@ static void rds_exit(void)
 	rds_stats_exit();
 	rds_page_exit();
 	rds_bind_lock_destroy();
+	sysfs_remove_group(rds_sysfs_kobj, &rds_feat_group);
+	kobject_put(rds_sysfs_kobj);
 	rds_info_deregister_func(RDS_INFO_SOCKETS, rds_sock_info);
 	rds_info_deregister_func(RDS_INFO_RECV_MESSAGES, rds_sock_inc_info);
 #if IS_ENABLED(CONFIG_IPV6)
@@ -923,6 +974,15 @@ static int __init rds_init(void)
 	if (ret)
 		goto out_proto;
 
+	rds_sysfs_kobj = kobject_create_and_add("rds", kernel_kobj);
+	if (!rds_sysfs_kobj) {
+		ret = -ENOMEM;
+		goto out_proto;
+	}
+	ret = sysfs_create_group(rds_sysfs_kobj, &rds_feat_group);
+	if (ret)
+		goto out_kobject;
+
 	rds_info_register_func(RDS_INFO_SOCKETS, rds_sock_info);
 	rds_info_register_func(RDS_INFO_RECV_MESSAGES, rds_sock_inc_info);
 #if IS_ENABLED(CONFIG_IPV6)
@@ -931,7 +991,8 @@ static int __init rds_init(void)
 #endif
 
 	goto out;
-
+out_kobject:
+	kobject_put(rds_sysfs_kobj);
 out_proto:
 	proto_unregister(&rds_proto);
 out_stats:
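The SYSFS_ATTR and ATTR_LIST macros in this hunk rely on token pasting and stringification to build one attribute object per feature and then collect them in a NULL-terminated table. The userspace sketch below mirrors that pattern so it can be compiled outside the kernel. The struct definitions, `STRINGIFY`, `FEAT_ATTR`, `feat_attrs` and `feat_count` are stand-ins invented for illustration, not the kernel's types or the patch's code, and the ELF note part is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-ins for the kernel types; not the real definitions. */
struct attribute {
	const char *name;
	unsigned int mode;
};

struct feat_attribute {
	struct attribute attr;
	int (*show)(char *buf, size_t len);
};

/* Two-level expansion, in the spirit of the kernel's __stringify(). */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x) STRINGIFY_1(x)

/* Every feature reports the same fixed string, as in the patch. */
int supported_show(char *buf, size_t len)
{
	return snprintf(buf, len, "supported\n");
}

/* Defines one object named feat_attr_<name> whose .name is "<name>". */
#define FEAT_ATTR(_name)						\
	static struct feat_attribute feat_attr_##_name = {		\
		.attr = { .name = STRINGIFY(_name), .mode = 0444 },	\
		.show = supported_show,					\
	}

FEAT_ATTR(ioctl_set_tos);
FEAT_ATTR(ioctl_get_tos);
FEAT_ATTR(socket_recverr);

/* Expands to the address of the embedded struct attribute. */
#define ATTR_LIST(_name) &feat_attr_##_name.attr

/* NULL-terminated table, shaped like rds_feat_attrs in the patch. */
struct attribute *feat_attrs[] = {
	ATTR_LIST(ioctl_set_tos),
	ATTR_LIST(ioctl_get_tos),
	ATTR_LIST(socket_recverr),
	NULL,
};

/* Counts entries by walking to the NULL sentinel. */
size_t feat_count(struct attribute **attrs)
{
	size_t n = 0;

	assert(attrs != NULL);
	while (attrs[n])
		n++;
	return n;
}
```

One consequence of this layout is that adding a feature takes two lines, one `FEAT_ATTR(...)` and one `ATTR_LIST(...)`, and forgetting the second leaves the attribute defined but never exposed.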
