Skip to content

Commit 3b84482

Browse files
committed
pipe: avoid unnecessary EPOLLET wakeups under normal loads
I had forgotten just how sensitive hackbench is to extra pipe wakeups, and commit 3a34b13 ("pipe: make pipe writes always wake up readers") ended up causing a quite noticeable regression on larger machines. Now, hackbench isn't necessarily a hugely meaningful benchmark, and it's not clear that this matters in real life all that much, but as Mel points out, it's used often enough when comparing kernels and so the performance regression shows up like a sore thumb. It's easy enough to fix at least for the common cases where pipes are used purely for data transfer, and you never have any exciting poll usage at all. So set a special 'poll_usage' flag when there is polling activity, and make the ugly "EPOLLET has crazy legacy expectations" semantics explicit to only that case. I would love to limit it to just the broken EPOLLET case, but the pipe code can't see the difference between epoll and regular select/poll, so any non-read/write waiting will trigger the extra wakeup behavior. That is sufficient for at least the hackbench case. Apart from making the odd extra wakeup cases more explicitly about EPOLLET, this also makes the extra wakeup be at the _end_ of the pipe write, not at the first write chunk. That is actually much saner semantics (as much as you can call any of the legacy edge-triggered expectations for EPOLLET "sane") since it means that you know the wakeup will happen once the write is done, rather than possibly in the middle of one. [ For stable people: I'm putting a "Fixes" tag on this, but I leave it up to you to decide whether you actually want to backport it or not. It likely has no impact outside of synthetic benchmarks - Linus ] Link: https://lore.kernel.org/lkml/20210802024945.GA8372@xsang-OptiPlex-9020/ Fixes: 3a34b13 ("pipe: make pipe writes always wake up readers") Reported-by: kernel test robot <[email protected]> Tested-by: Sandeep Patil <[email protected]> Tested-by: Mel Gorman <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
1 parent 614cb27 commit 3b84482

File tree

2 files changed

+11
-6
lines changed

2 files changed

+11
-6
lines changed

fs/pipe.c

Lines changed: 9 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -444,9 +444,6 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
444444
#endif
445445

446446
/*
447-
* Epoll nonsensically wants a wakeup whether the pipe
448-
* was already empty or not.
449-
*
450447
* If it wasn't empty we try to merge new data into
451448
* the last buffer.
452449
*
@@ -455,9 +452,9 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
455452
* spanning multiple pages.
456453
*/
457454
head = pipe->head;
458-
was_empty = true;
455+
was_empty = pipe_empty(head, pipe->tail);
459456
chars = total_len & (PAGE_SIZE-1);
460-
if (chars && !pipe_empty(head, pipe->tail)) {
457+
if (chars && !was_empty) {
461458
unsigned int mask = pipe->ring_size - 1;
462459
struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask];
463460
int offset = buf->offset + buf->len;
@@ -590,8 +587,11 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
590587
* This is particularly important for small writes, because of
591588
* how (for example) the GNU make jobserver uses small writes to
592589
* wake up pending jobs
590+
*
591+
* Epoll nonsensically wants a wakeup whether the pipe
592+
* was already empty or not.
593593
*/
594-
if (was_empty) {
594+
if (was_empty || pipe->poll_usage) {
595595
wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
596596
kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
597597
}
@@ -654,6 +654,9 @@ pipe_poll(struct file *filp, poll_table *wait)
654654
struct pipe_inode_info *pipe = filp->private_data;
655655
unsigned int head, tail;
656656

657+
/* Epoll has some historical nasty semantics, this enables them */
658+
pipe->poll_usage = 1;
659+
657660
/*
658661
* Reading pipe state only -- no need for acquiring the semaphore.
659662
*

include/linux/pipe_fs_i.h

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -48,6 +48,7 @@ struct pipe_buffer {
4848
* @files: number of struct file referring this pipe (protected by ->i_lock)
4949
* @r_counter: reader counter
5050
* @w_counter: writer counter
51+
* @poll_usage: is this pipe used for epoll, which has crazy wakeups?
5152
* @fasync_readers: reader side fasync
5253
* @fasync_writers: writer side fasync
5354
* @bufs: the circular array of pipe buffers
@@ -70,6 +71,7 @@ struct pipe_inode_info {
7071
unsigned int files;
7172
unsigned int r_counter;
7273
unsigned int w_counter;
74+
unsigned int poll_usage;
7375
struct page *tmp_page;
7476
struct fasync_struct *fasync_readers;
7577
struct fasync_struct *fasync_writers;

0 commit comments

Comments
 (0)