Commit 1001659

tomratbert authored and davem330 committed
kcm: Add description in Documentation
Add kcm.txt to describe KCM and interfaces.

Signed-off-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
1 parent 29152a3 commit 1001659

Documentation/networking/kcm.txt

Lines changed: 285 additions & 0 deletions
@@ -0,0 +1,285 @@
Kernel Connection Multiplexor
-----------------------------

Kernel Connection Multiplexor (KCM) is a mechanism that provides a message based
interface over TCP for generic application protocols. With KCM an application
can efficiently send and receive application protocol messages over TCP using
datagram sockets.

KCM implements an NxM multiplexor in the kernel as diagrammed below:

+------------+   +------------+   +------------+   +------------+
| KCM socket |   | KCM socket |   | KCM socket |   | KCM socket |
+------------+   +------------+   +------------+   +------------+
      |                |                |                |
      +-----------+    |                |     +----------+
                  |    |                |     |
               +----------------------------------+
               |           Multiplexor            |
               +----------------------------------+
                |   |            |             | |
      +---------+   |            |             | ------------+
      |             |            |             |             |
+----------+  +----------+  +----------+  +----------+  +----------+
|  Psock   |  |  Psock   |  |  Psock   |  |  Psock   |  |  Psock   |
+----------+  +----------+  +----------+  +----------+  +----------+
      |             |            |             |             |
+----------+  +----------+  +----------+  +----------+  +----------+
| TCP sock |  | TCP sock |  | TCP sock |  | TCP sock |  | TCP sock |
+----------+  +----------+  +----------+  +----------+  +----------+

KCM sockets
-----------

The KCM sockets provide the user interface to the multiplexor. All the KCM sockets
bound to a multiplexor are considered to have equivalent function, and I/O
operations in different sockets may be done in parallel without the need for
synchronization between threads in userspace.

Multiplexor
-----------

The multiplexor provides the message steering. In the transmit path, messages
written on a KCM socket are sent atomically on an appropriate TCP socket.
Similarly, in the receive path, messages are constructed on each TCP socket
(Psock) and complete messages are steered to a KCM socket.

TCP sockets & Psocks
--------------------

TCP sockets may be bound to a KCM multiplexor. A Psock structure is allocated
for each bound TCP socket. This structure holds the state for constructing
messages on receive as well as other connection specific information for KCM.

Connected mode semantics
------------------------

Each multiplexor assumes that all attached TCP connections are to the same
destination and can use the different connections for load balancing when
transmitting. The normal send and recv calls (including sendmmsg and recvmmsg)
can be used to send and receive messages from the KCM socket.

Socket types
------------

KCM supports SOCK_DGRAM and SOCK_SEQPACKET socket types.

Message delineation
-------------------

Messages are sent over a TCP stream with some application protocol message
format that typically includes a header which frames the messages. The length
of a received message can be deduced from the application protocol header
(often just a simple length field).

A TCP stream must be parsed to determine message boundaries. Berkeley Packet
Filter (BPF) is used for this. When attaching a TCP socket to a multiplexor, a
BPF program must be specified. The program is called at the start of receiving
a new message and is given an skbuff that contains the bytes received so far.
It parses the message header and returns the length of the message. Given this
information, KCM will construct the message of the stated length and deliver it
to a KCM socket.
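
As an illustration of the computation such a parser performs, the sketch
below does the same thing in userspace C. It assumes a framing format with a
4-byte big-endian payload length followed by the payload. The function names,
and the convention of returning 0 while the header is still incomplete, are
choices made for this example and are not defined by KCM.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed framing for this example: 4-byte big-endian payload length. */
#define HDR_LEN 4u

/*
 * Return the total length (header plus payload) of the message that
 * starts at buf[0], or 0 if fewer than HDR_LEN bytes are available.
 * Hypothetical helper, mirroring what a delineation program computes.
 */
static uint64_t frame_total_len(const uint8_t *buf, size_t avail)
{
    uint32_t payload;

    if (avail < HDR_LEN)
        return 0; /* header incomplete, more bytes are needed */

    payload = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
              ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];

    /* 64-bit result so that a huge length field cannot wrap around. */
    return (uint64_t)payload + HDR_LEN;
}

/* Count how many complete messages a received byte stream contains. */
static size_t count_complete_messages(const uint8_t *buf, size_t avail)
{
    size_t off = 0, n = 0;

    for (;;) {
        uint64_t len = frame_total_len(buf + off, avail - off);

        if (len == 0 || len > avail - off)
            break; /* partial header or partial message */
        off += (size_t)len;
        n++;
    }
    return n;
}
```

With KCM the second function is unnecessary, since the kernel performs this
reassembly and hands each complete message to a KCM socket.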

TCP socket management
---------------------

When a TCP socket is attached to a KCM multiplexor, data ready (POLLIN) and
write space available (POLLOUT) events are handled by the multiplexor. If there
is a state change (disconnection) or other error on a TCP socket, an error is
posted on the TCP socket so that a POLLERR event happens and KCM discontinues
using the socket. When the application gets the error notification for a
TCP socket, it should unattach the socket from KCM and then handle the error
condition (the typical response is to close the socket and create a new
connection if necessary).

KCM limits the maximum receive message size to be the size of the receive
socket buffer on the attached TCP socket (the socket buffer size can be set by
SO_RCVBUF). If the length of a new message reported by the BPF program is
greater than this limit, a corresponding error (EMSGSIZE) is posted on the TCP
socket. The BPF program may also enforce a maximum message size and report an
error when it is exceeded.

A timeout may be set for assembling messages on a receive socket. The timeout
value is taken from the receive timeout of the attached TCP socket (this is set
by SO_RCVTIMEO). If the timer expires before assembly is complete, an error
(ETIMEDOUT) is posted on the socket.
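
Both limits are ordinary socket options on the TCP socket, so they can be
set before the socket is attached. The following is a minimal sketch using
only standard socket calls. The helper name and its parameters are
illustrative and not part of the KCM interface.

```c
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Set the receive buffer size and receive timeout on a TCP socket that
 * is about to be attached to KCM. Hypothetical helper for illustration.
 * Returns 0 on success, -1 on failure with errno set by setsockopt().
 */
static int tcp_set_kcm_limits(int tcpfd, int max_msg_bytes, int assemble_secs)
{
    struct timeval tv = { .tv_sec = assemble_secs, .tv_usec = 0 };

    /* The receive buffer size bounds the largest message KCM accepts. */
    if (setsockopt(tcpfd, SOL_SOCKET, SO_RCVBUF,
                   &max_msg_bytes, sizeof(max_msg_bytes)) < 0)
        return -1;

    /* The receive timeout bounds how long message assembly may take. */
    if (setsockopt(tcpfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        return -1;

    return 0;
}
```

Note that Linux may adjust the requested SO_RCVBUF value (it is doubled to
allow for bookkeeping overhead and capped by a system-wide limit), so an
application that depends on an exact limit should read the value back with
getsockopt.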

User interface
==============

Creating a multiplexor
----------------------

A new multiplexor and initial KCM socket are created by a socket call:

  socket(AF_KCM, type, protocol)

  - type is either SOCK_DGRAM or SOCK_SEQPACKET
  - protocol is KCMPROTO_CONNECTED
120+
Cloning KCM sockets
121+
-------------------
122+
123+
After the first KCM socket is created using the socket call as described
124+
above, additional sockets for the multiplexor can be created by cloning
125+
a KCM socket. This is accomplished by an ioctl on a KCM socket:
126+
127+
/* From linux/kcm.h */
128+
struct kcm_clone {
129+
int fd;
130+
};
131+
132+
struct kcm_clone info;
133+
134+
memset(&info, 0, sizeof(info));
135+
136+
err = ioctl(kcmfd, SIOCKCMCLONE, &info);
137+
138+
if (!err)
139+
newkcmfd = info.fd;

Attach transport sockets
------------------------

Attaching of transport sockets to a multiplexor is performed by calling an
ioctl on a KCM socket for the multiplexor, e.g.:

  /* From linux/kcm.h */
  struct kcm_attach {
        int fd;
        int bpf_fd;
  };

  struct kcm_attach info;

  memset(&info, 0, sizeof(info));

  info.fd = tcpfd;
  info.bpf_fd = bpf_prog_fd;

  ioctl(kcmfd, SIOCKCMATTACH, &info);

The kcm_attach structure contains:
  fd: file descriptor for the TCP socket being attached
  bpf_fd: file descriptor for the compiled BPF program loaded into the kernel
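
The snippet above ignores the return value of ioctl. A sketch of the same
steps wrapped in a helper that reports failure is given below. The helper
name is an assumption for this example. linux/sockios.h is included on the
assumption that linux/kcm.h does not itself define SIOCPROTOPRIVATE, on which
the KCM ioctl numbers are based.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/sockios.h>  /* SIOCPROTOPRIVATE */
#include <linux/kcm.h>      /* struct kcm_attach, SIOCKCMATTACH */

/*
 * Attach a connected TCP socket to the multiplexor behind kcmfd, using
 * the loaded BPF program bpf_prog_fd for message delineation.
 * Hypothetical helper: returns 0 on success, -1 with errno set by ioctl().
 */
static int kcm_attach_tcp(int kcmfd, int tcpfd, int bpf_prog_fd)
{
    struct kcm_attach info;

    memset(&info, 0, sizeof(info));
    info.fd = tcpfd;
    info.bpf_fd = bpf_prog_fd;

    return ioctl(kcmfd, SIOCKCMATTACH, &info) < 0 ? -1 : 0;
}
```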

Unattach transport sockets
--------------------------

Unattaching a transport socket from a multiplexor is straightforward. An
"unattach" ioctl is done on a KCM socket with the kcm_unattach structure as
the argument:

  /* From linux/kcm.h */
  struct kcm_unattach {
        int fd;
  };

  struct kcm_unattach info;

  memset(&info, 0, sizeof(info));

  info.fd = tcpfd;

  ioctl(kcmfd, SIOCKCMUNATTACH, &info);

Disabling receive on KCM socket
-------------------------------

A setsockopt is used to disable or enable receiving on a KCM socket.
When receive is disabled, any pending messages in the socket's
receive buffer are moved to other sockets. This feature is useful
if an application thread knows that it will be doing a lot of
work on a request and won't be able to service new messages for a
while. Example use:

  int val = 1;

  setsockopt(kcmfd, SOL_KCM, KCM_RECV_DISABLE, &val, sizeof(val));
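
A sketch of a helper that toggles the option in both directions follows. It
assumes that a value of 0 enables receive again, as the description above
implies. The helper name is made up for this example, and the fallback
definition of SOL_KCM is only for C library headers that lack it.

```c
#include <sys/socket.h>
#include <linux/kcm.h>      /* KCM_RECV_DISABLE */

#ifndef SOL_KCM
#define SOL_KCM 281         /* fallback for older C library headers */
#endif

/*
 * Disable (disabled != 0) or enable (disabled == 0) receive on a KCM
 * socket. Hypothetical helper: returns 0 on success, -1 with errno set.
 */
static int kcm_set_recv_disabled(int kcmfd, int disabled)
{
    int val = disabled ? 1 : 0;

    return setsockopt(kcmfd, SOL_KCM, KCM_RECV_DISABLE, &val, sizeof(val));
}
```

A worker thread would typically call kcm_set_recv_disabled(kcmfd, 1) before
starting a long request and kcm_set_recv_disabled(kcmfd, 0) once it is ready
to accept messages again.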

BPF programs for message delineation
------------------------------------

BPF programs can be compiled using the BPF LLVM backend. For example,
the BPF program for parsing Thrift is:

  #include "bpf.h" /* for __sk_buff */
  #include "bpf_helpers.h" /* for load_word intrinsic */

  SEC("socket_kcm")
  int bpf_prog1(struct __sk_buff *skb)
  {
        return load_word(skb, 0) + 4;
  }

  char _license[] SEC("license") = "GPL";

Here load_word(skb, 0) reads the 32-bit length field at the start of the
message. Adding 4 makes the returned length cover the length field as well
as the payload.

Use in applications
===================

KCM accelerates application layer protocols. Specifically, it allows
applications to use a message based interface for sending and receiving
messages. The kernel provides necessary assurances that messages are sent
and received atomically. This relieves much of the burden applications have
in mapping a message based protocol onto the TCP stream. KCM also makes
application layer messages a unit of work in the kernel for the purposes of
steering and scheduling, which in turn allows a simpler networking model in
multithreaded applications.

Configurations
--------------

In an Nx1 configuration, KCM logically provides multiple socket handles
to the same TCP connection. This allows parallelism between I/O
operations on the TCP socket (for instance copyin and copyout of data is
parallelized). In an application, a KCM socket can be opened for each
processing thread and added to that thread's epoll set (similar to how
SO_REUSEPORT is used to allow multiple listener sockets on the same port).

In an MxN configuration, multiple connections are established to the
same destination. These are used for simple load balancing.

Message batching
----------------

The primary purpose of KCM is load balancing between KCM sockets and hence
threads in a nominal use case. Perfect load balancing, that is steering
each received message to a different KCM socket or steering each sent
message to a different TCP socket, can negatively impact performance
since this doesn't allow for affinities to be established. Balancing
based on groups, or batches of messages, can be beneficial for performance.

On transmit, there are three ways an application can batch (pipeline)
messages on a KCM socket:
  1) Send multiple messages in a single sendmmsg.
  2) Send a group of messages each with a sendmsg call, where all messages
     except the last have MSG_BATCH in the flags of the sendmsg call.
  3) Create a "super message" composed of multiple messages and send this
     with a single sendmsg.
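
The second method can be sketched as follows. The helper names are made up
for this example, each message is assumed to fit in a single iovec, and the
fallback definition of MSG_BATCH is only for C library headers that do not
provide it.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/uio.h>

#ifndef MSG_BATCH
#define MSG_BATCH 0x40000   /* fallback for older C library headers */
#endif

/* Flags for message number index out of count: all but the last batch. */
static int batch_flags(size_t index, size_t count)
{
    return (index + 1 < count) ? MSG_BATCH : 0;
}

/*
 * Send count messages on a KCM socket as one batch, one sendmsg call per
 * message. Hypothetical helper: returns 0 on success, -1 on the first
 * failed sendmsg (errno is left as set by sendmsg).
 */
static int kcm_send_batch(int kcmfd, struct iovec *msgs, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        struct msghdr mh = { .msg_iov = &msgs[i], .msg_iovlen = 1 };

        if (sendmsg(kcmfd, &mh, batch_flags(i, count)) < 0)
            return -1;
    }
    return 0;
}
```

Clearing MSG_BATCH on the final message is what tells the kernel that the
batch is complete.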

On receive, the KCM module attempts to queue messages received on the
same KCM socket during each TCP ready callback. The targeted KCM socket
changes at each receive ready callback on the KCM socket. The application
does not need to configure this.

Error handling
--------------

An application should include a thread to monitor errors raised on
the TCP connection. Normally, this will be done by placing each
TCP socket attached to a KCM multiplexor in an epoll set for the POLLERR
event. If an error occurs on an attached TCP socket, KCM sets an EPIPE
on the socket, thus waking up the application thread. When the application
sees the error (which may just be a disconnect) it should unattach the
socket from KCM and then close it. It is assumed that once an error is
posted on the TCP socket the data stream is unrecoverable (i.e. an error
may have occurred in the middle of receiving a message).
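
One possible shape for such a monitoring thread is sketched below. The
function names, the batch size of 16 events and the decision to treat a
hangup the same as an error are assumptions made for this example.
linux/sockios.h is included on the assumption that linux/kcm.h does not
itself define SIOCPROTOPRIVATE.

```c
#include <string.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/sockios.h>  /* SIOCPROTOPRIVATE */
#include <linux/kcm.h>      /* struct kcm_unattach, SIOCKCMUNATTACH */

/* Register an attached TCP socket for error notification only. */
static int watch_tcp_errors(int epfd, int tcpfd)
{
    struct epoll_event ev;

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLERR;   /* epoll always reports errors; listed for clarity */
    ev.data.fd = tcpfd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, tcpfd, &ev);
}

/* Unattach a failed TCP socket from KCM, then close it. */
static int drop_tcp_socket(int kcmfd, int tcpfd)
{
    struct kcm_unattach info;
    int rc;

    memset(&info, 0, sizeof(info));
    info.fd = tcpfd;
    rc = ioctl(kcmfd, SIOCKCMUNATTACH, &info);
    close(tcpfd);           /* closing also removes it from the epoll set */
    return rc < 0 ? -1 : 0;
}

/*
 * One pass of the monitor loop: wait up to timeout_ms and drop every
 * socket that reported an error or hangup. Returns the number of events
 * seen, or -1 if epoll_wait failed.
 */
static int monitor_once(int epfd, int kcmfd, int timeout_ms)
{
    struct epoll_event evs[16];
    int i, n = epoll_wait(epfd, evs, 16, timeout_ms);

    for (i = 0; i < n; i++)
        if (evs[i].events & (EPOLLERR | EPOLLHUP))
            drop_tcp_socket(kcmfd, evs[i].data.fd);
    return n;
}
```

The monitoring thread would call monitor_once in a loop with a timeout of -1
and, where needed, open a replacement connection and attach it after a
socket has been dropped.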

TCP connection monitoring
-------------------------

In KCM there is no means to correlate a message to the TCP socket that
was used to send or receive the message (except in the case there is
only one attached TCP socket). However, the application does retain
an open file descriptor to the socket so it will be able to get statistics
from the socket which can be used in detecting issues (such as high
retransmissions on the socket).
