
Commit 80c0531

Author: Linus Torvalds
Commit message: Merge master.kernel.org:/pub/scm/linux/kernel/git/mingo/mutex-2.6
Parents: a457aa6, 11b751a

198 files changed: +2754 additions, -649 deletions (only a subset of the changed files is shown below)


Documentation/DocBook/kernel-locking.tmpl

Lines changed: 14 additions & 8 deletions
@@ -222,24 +222,30 @@
     <title>Two Main Types of Kernel Locks: Spinlocks and Semaphores</title>
 
     <para>
-      There are two main types of kernel locks. The fundamental type
+      There are three main types of kernel locks. The fundamental type
      is the spinlock
      (<filename class="headerfile">include/asm/spinlock.h</filename>),
      which is a very simple single-holder lock: if you can't get the
      spinlock, you keep trying (spinning) until you can. Spinlocks are
      very small and fast, and can be used anywhere.
     </para>
     <para>
-      The second type is a semaphore
+      The second type is a mutex
+      (<filename class="headerfile">include/linux/mutex.h</filename>): it
+      is like a spinlock, but you may block holding a mutex.
+      If you can't lock a mutex, your task will suspend itself, and be woken
+      up when the mutex is released. This means the CPU can do something
+      else while you are waiting. There are many cases when you simply
+      can't sleep (see <xref linkend="sleeping-things"/>), and so have to
+      use a spinlock instead.
+    </para>
+    <para>
+      The third type is a semaphore
      (<filename class="headerfile">include/asm/semaphore.h</filename>): it
      can have more than one holder at any time (the number decided at
      initialization time), although it is most commonly used as a
-      single-holder lock (a mutex). If you can't get a semaphore,
-      your task will put itself on the queue, and be woken up when the
-      semaphore is released. This means the CPU will do something
-      else while you are waiting, but there are many cases when you
-      simply can't sleep (see <xref linkend="sleeping-things"/>), and so
-      have to use a spinlock instead.
+      single-holder lock (a mutex). If you can't get a semaphore, your
+      task will be suspended and later on woken up - just like for mutexes.
     </para>
     <para>
      Neither type of lock is recursive: see
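
As a quick, hedged illustration of the distinction the updated text draws (a spinlock never sleeps and can be used anywhere, a mutex may sleep and so belongs in process context), here is a minimal usage sketch. It is not part of the patch; the structure and function names are invented, and only the locking calls themselves are real kernel API:

	#include <linux/spinlock.h>
	#include <linux/mutex.h>
	#include <linux/errno.h>
	#include <asm/uaccess.h>

	struct my_config { int value; };	/* invented example structure */

	static DEFINE_SPINLOCK(stats_lock);	/* holder must not sleep; usable in any context */
	static DEFINE_MUTEX(cfg_mutex);		/* holder may sleep; process context only */

	static unsigned long stats_counter;

	void stats_inc(void)			/* invented example function */
	{
		spin_lock(&stats_lock);		/* busy-waits (spins) until acquired */
		stats_counter++;
		spin_unlock(&stats_lock);
	}

	int cfg_update(struct my_config *dst, const struct my_config __user *src)
	{
		int ret = 0;

		mutex_lock(&cfg_mutex);		/* sleeps if another task holds the mutex */
		if (copy_from_user(dst, src, sizeof(*dst)))	/* may sleep: fine under a mutex */
			ret = -EFAULT;
		mutex_unlock(&cfg_mutex);
		return ret;
	}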

Documentation/mutex-design.txt

Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
Generic Mutex Subsystem

started by Ingo Molnar <[email protected]>

  "Why on earth do we need a new mutex subsystem, and what's wrong
   with semaphores?"

firstly, there's nothing wrong with semaphores. But if the simpler
mutex semantics are sufficient for your code, then there are a couple
of advantages of mutexes:

 - 'struct mutex' is smaller on most architectures: .e.g on x86,
   'struct semaphore' is 20 bytes, 'struct mutex' is 16 bytes.
   A smaller structure size means less RAM footprint, and better
   CPU-cache utilization.

 - tighter code. On x86 i get the following .text sizes when
   switching all mutex-alike semaphores in the kernel to the mutex
   subsystem:

        text    data     bss     dec     hex filename
     3280380  868188  396860 4545428  455b94 vmlinux-semaphore
     3255329  865296  396732 4517357  44eded vmlinux-mutex

   that's 25051 bytes of code saved, or a 0.76% win - off the hottest
   codepaths of the kernel. (The .data savings are 2892 bytes, or 0.33%)
   Smaller code means better icache footprint, which is one of the
   major optimization goals in the Linux kernel currently.

 - the mutex subsystem is slightly faster and has better scalability for
   contended workloads. On an 8-way x86 system, running a mutex-based
   kernel and testing creat+unlink+close (of separate, per-task files)
   in /tmp with 16 parallel tasks, the average number of ops/sec is:

    Semaphores:                       Mutexes:

    $ ./test-mutex V 16 10            $ ./test-mutex V 16 10
    8 CPUs, running 16 tasks.         8 CPUs, running 16 tasks.
    checking VFS performance.         checking VFS performance.
    avg loops/sec:      34713         avg loops/sec:      84153
    CPU utilization:      63%         CPU utilization:      22%

   i.e. in this workload, the mutex based kernel was 2.4 times faster
   than the semaphore based kernel, _and_ it also had 2.8 times less CPU
   utilization. (In terms of 'ops per CPU cycle', the semaphore kernel
   performed 551 ops/sec per 1% of CPU time used, while the mutex kernel
   performed 3825 ops/sec per 1% of CPU time used - it was 6.9 times
   more efficient.)

   the scalability difference is visible even on a 2-way P4 HT box:

    Semaphores:                       Mutexes:

    $ ./test-mutex V 16 10            $ ./test-mutex V 16 10
    4 CPUs, running 16 tasks.         8 CPUs, running 16 tasks.
    checking VFS performance.         checking VFS performance.
    avg loops/sec:      127659        avg loops/sec:      181082
    CPU utilization:       100%       CPU utilization:       34%

   (the straight performance advantage of mutexes is 41%, the per-cycle
    efficiency of mutexes is 4.1 times better.)

 - there are no fastpath tradeoffs, the mutex fastpath is just as tight
   as the semaphore fastpath. On x86, the locking fastpath is 2
   instructions:

    c0377ccb <mutex_lock>:
    c0377ccb:       f0 ff 08                lock decl (%eax)
    c0377cce:       78 0e                   js     c0377cde <.text.lock.mutex>
    c0377cd0:       c3                      ret

   the unlocking fastpath is equally tight:

    c0377cd1 <mutex_unlock>:
    c0377cd1:       f0 ff 00                lock incl (%eax)
    c0377cd4:       7e 0f                   jle    c0377ce5 <.text.lock.mutex+0x7>
    c0377cd6:       c3                      ret

 - 'struct mutex' semantics are well-defined and are enforced if
   CONFIG_DEBUG_MUTEXES is turned on. Semaphores on the other hand have
   virtually no debugging code or instrumentation. The mutex subsystem
   checks and enforces the following rules:

   * - only one task can hold the mutex at a time
   * - only the owner can unlock the mutex
   * - multiple unlocks are not permitted
   * - recursive locking is not permitted
   * - a mutex object must be initialized via the API
   * - a mutex object must not be initialized via memset or copying
   * - task may not exit with mutex held
   * - memory areas where held locks reside must not be freed
   * - held mutexes must not be reinitialized
   * - mutexes may not be used in irq contexts

   furthermore, there are also convenience features in the debugging
   code:

   * - uses symbolic names of mutexes, whenever they are printed in debug output
   * - point-of-acquire tracking, symbolic lookup of function names
   * - list of all locks held in the system, printout of them
   * - owner tracking
   * - detects self-recursing locks and prints out all relevant info
   * - detects multi-task circular deadlocks and prints out all affected
   *   locks and tasks (and only those tasks)

Disadvantages
-------------

The stricter mutex API means you cannot use mutexes the same way you
can use semaphores: e.g. they cannot be used from an interrupt context,
nor can they be unlocked from a different context that which acquired
it. [ I'm not aware of any other (e.g. performance) disadvantages from
using mutexes at the moment, please let me know if you find any. ]

Implementation of mutexes
-------------------------

'struct mutex' is the new mutex type, defined in include/linux/mutex.h
and implemented in kernel/mutex.c. It is a counter-based mutex with a
spinlock and a wait-list. The counter has 3 states: 1 for "unlocked",
0 for "locked" and negative numbers (usually -1) for "locked, potential
waiters queued".

the APIs of 'struct mutex' have been streamlined:

 DEFINE_MUTEX(name);

 mutex_init(mutex);

 void mutex_lock(struct mutex *lock);
 int  mutex_lock_interruptible(struct mutex *lock);
 int  mutex_trylock(struct mutex *lock);
 void mutex_unlock(struct mutex *lock);
 int  mutex_is_locked(struct mutex *lock);
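
To make the streamlined API above concrete, a short usage sketch follows. It is illustrative only and not part of the patch: 'struct my_dev' and the functions around it are invented names; only the mutex_* calls are the real interface.

	#include <linux/mutex.h>
	#include <linux/errno.h>

	struct my_dev {				/* invented example structure */
		struct mutex	lock;		/* protects 'count' */
		int		count;
	};

	static DEFINE_MUTEX(registry_mutex);	/* statically initialized mutex */

	static void my_dev_setup(struct my_dev *dev)
	{
		mutex_init(&dev->lock);		/* dynamic initialization */
		dev->count = 0;
	}

	static int my_dev_add(struct my_dev *dev, int n)
	{
		/* sleeps until acquired; returns nonzero if interrupted by a signal */
		if (mutex_lock_interruptible(&dev->lock))
			return -EINTR;
		dev->count += n;
		mutex_unlock(&dev->lock);	/* only the lock owner may do this */
		return 0;
	}

	static int my_dev_peek(struct my_dev *dev)
	{
		int val = -1;

		if (mutex_trylock(&dev->lock)) {	/* 1 on success, 0 if already locked */
			val = dev->count;
			mutex_unlock(&dev->lock);
		}
		return val;
	}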

arch/i386/mm/pageattr.c

Lines changed: 4 additions & 0 deletions
@@ -222,6 +222,10 @@ void kernel_map_pages(struct page *page, int numpages, int enable)
 {
 	if (PageHighMem(page))
 		return;
+	if (!enable)
+		mutex_debug_check_no_locks_freed(page_address(page),
+						page_address(page+numpages));
+
 	/* the return value is ignored - the calls cannot fail,
 	 * large pages are disabled at boot time.
 	 */
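
The hunk above hooks the new mutex debugging code into the page-mapping debug path, so that unmapping memory which still contains a held mutex gets flagged. A hedged sketch of the same check applied by a hypothetical driver before freeing memory (invented names, not from the patch; with CONFIG_DEBUG_MUTEXES disabled the call is expected to compile away to a no-op):

	#include <linux/mutex.h>
	#include <linux/slab.h>

	struct my_ctx {				/* invented example structure */
		struct mutex	lock;
		void		*data;
	};

	static void my_ctx_free(struct my_ctx *ctx)
	{
		/* complain (under CONFIG_DEBUG_MUTEXES) if a held mutex lives
		 * inside the memory range that is about to be freed */
		mutex_debug_check_no_locks_freed(ctx, (char *)ctx + sizeof(*ctx));
		kfree(ctx);
	}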

arch/powerpc/platforms/cell/spufs/inode.c

Lines changed: 6 additions & 6 deletions
@@ -137,7 +137,7 @@ spufs_delete_inode(struct inode *inode)
 static void spufs_prune_dir(struct dentry *dir)
 {
 	struct dentry *dentry, *tmp;
-	down(&dir->d_inode->i_sem);
+	mutex_lock(&dir->d_inode->i_mutex);
 	list_for_each_entry_safe(dentry, tmp, &dir->d_subdirs, d_child) {
 		spin_lock(&dcache_lock);
 		spin_lock(&dentry->d_lock);
@@ -154,23 +154,23 @@ static void spufs_prune_dir(struct dentry *dir)
 		}
 	}
 	shrink_dcache_parent(dir);
-	up(&dir->d_inode->i_sem);
+	mutex_unlock(&dir->d_inode->i_mutex);
 }
 
 static int spufs_rmdir(struct inode *root, struct dentry *dir_dentry)
 {
 	struct spu_context *ctx;
 
 	/* remove all entries */
-	down(&root->i_sem);
+	mutex_lock(&root->i_mutex);
 	spufs_prune_dir(dir_dentry);
-	up(&root->i_sem);
+	mutex_unlock(&root->i_mutex);
 
 	/* We have to give up the mm_struct */
 	ctx = SPUFS_I(dir_dentry->d_inode)->i_ctx;
 	spu_forget(ctx);
 
-	/* XXX Do we need to hold i_sem here ? */
+	/* XXX Do we need to hold i_mutex here ? */
 	return simple_rmdir(root, dir_dentry);
 }
 
@@ -330,7 +330,7 @@ long spufs_create_thread(struct nameidata *nd,
 out_dput:
 	dput(dentry);
 out_dir:
-	up(&nd->dentry->d_inode->i_sem);
+	mutex_unlock(&nd->dentry->d_inode->i_mutex);
 out:
 	return ret;
 }

drivers/block/loop.c

Lines changed: 14 additions & 17 deletions
@@ -215,7 +215,7 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec,
 	unsigned offset, bv_offs;
 	int len, ret;
 
-	down(&mapping->host->i_sem);
+	mutex_lock(&mapping->host->i_mutex);
 	index = pos >> PAGE_CACHE_SHIFT;
 	offset = pos & ((pgoff_t)PAGE_CACHE_SIZE - 1);
 	bv_offs = bvec->bv_offset;
@@ -278,7 +278,7 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec,
 	}
 	ret = 0;
 out:
-	up(&mapping->host->i_sem);
+	mutex_unlock(&mapping->host->i_mutex);
 	return ret;
 unlock:
 	unlock_page(page);
@@ -527,12 +527,12 @@ static int loop_make_request(request_queue_t *q, struct bio *old_bio)
 	lo->lo_pending++;
 	loop_add_bio(lo, old_bio);
 	spin_unlock_irq(&lo->lo_lock);
-	up(&lo->lo_bh_mutex);
+	complete(&lo->lo_bh_done);
 	return 0;
 
 out:
 	if (lo->lo_pending == 0)
-		up(&lo->lo_bh_mutex);
+		complete(&lo->lo_bh_done);
 	spin_unlock_irq(&lo->lo_lock);
 	bio_io_error(old_bio, old_bio->bi_size);
 	return 0;
@@ -593,23 +593,20 @@ static int loop_thread(void *data)
 	lo->lo_pending = 1;
 
 	/*
-	 * up sem, we are running
+	 * complete it, we are running
 	 */
-	up(&lo->lo_sem);
+	complete(&lo->lo_done);
 
 	for (;;) {
 		int pending;
 
-		/*
-		 * interruptible just to not contribute to load avg
-		 */
-		if (down_interruptible(&lo->lo_bh_mutex))
+		if (wait_for_completion_interruptible(&lo->lo_bh_done))
 			continue;
 
 		spin_lock_irq(&lo->lo_lock);
 
 		/*
-		 * could be upped because of tear-down, not pending work
+		 * could be completed because of tear-down, not pending work
 		 */
 		if (unlikely(!lo->lo_pending)) {
 			spin_unlock_irq(&lo->lo_lock);
@@ -632,7 +629,7 @@ static int loop_thread(void *data)
 		break;
 	}
 
-	up(&lo->lo_sem);
+	complete(&lo->lo_done);
 	return 0;
 }
 
@@ -843,7 +840,7 @@ static int loop_set_fd(struct loop_device *lo, struct file *lo_file,
 	set_blocksize(bdev, lo_blocksize);
 
 	kernel_thread(loop_thread, lo, CLONE_KERNEL);
-	down(&lo->lo_sem);
+	wait_for_completion(&lo->lo_done);
 	return 0;
 
 out_putf:
@@ -909,10 +906,10 @@ static int loop_clr_fd(struct loop_device *lo, struct block_device *bdev)
 	lo->lo_state = Lo_rundown;
 	lo->lo_pending--;
 	if (!lo->lo_pending)
-		up(&lo->lo_bh_mutex);
+		complete(&lo->lo_bh_done);
 	spin_unlock_irq(&lo->lo_lock);
 
-	down(&lo->lo_sem);
+	wait_for_completion(&lo->lo_done);
 
 	lo->lo_backing_file = NULL;
 
@@ -1289,8 +1286,8 @@ static int __init loop_init(void)
 	if (!lo->lo_queue)
 		goto out_mem4;
 	init_MUTEX(&lo->lo_ctl_mutex);
-	init_MUTEX_LOCKED(&lo->lo_sem);
-	init_MUTEX_LOCKED(&lo->lo_bh_mutex);
+	init_completion(&lo->lo_done);
+	init_completion(&lo->lo_bh_done);
 	lo->lo_number = i;
 	spin_lock_init(&lo->lo_lock);
 	disk->major = LOOP_MAJOR;
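
In loop.c (and in sx8.c below), semaphores that were initialized locked and used only to signal an event ("the thread is running", "work is available", "probing finished") are converted to completions, which express the wait-for-event intent directly. A hedged sketch of that conversion pattern, with invented names and not taken from the patch:

	#include <linux/completion.h>
	#include <linux/sched.h>

	/* Hypothetical example: signal "worker thread has started". */
	static DECLARE_COMPLETION(thread_started);

	static int my_worker(void *data)
	{
		complete(&thread_started);		/* was: up(&sem) */
		/* ... do the actual work ... */
		return 0;
	}

	static int my_start_worker(void)
	{
		kernel_thread(my_worker, NULL, CLONE_KERNEL);
		wait_for_completion(&thread_started);	/* was: down(&sem), with the
							 * semaphore created via
							 * init_MUTEX_LOCKED() */
		return 0;
	}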

drivers/block/sx8.c

Lines changed: 6 additions & 6 deletions
@@ -27,8 +27,8 @@
 #include <linux/time.h>
 #include <linux/hdreg.h>
 #include <linux/dma-mapping.h>
+#include <linux/completion.h>
 #include <asm/io.h>
-#include <asm/semaphore.h>
 #include <asm/uaccess.h>
 
 #if 0
@@ -303,7 +303,7 @@ struct carm_host {
 
 	struct work_struct		fsm_task;
 
-	struct semaphore		probe_sem;
+	struct completion		probe_comp;
 };
 
 struct carm_response {
@@ -1346,7 +1346,7 @@ static void carm_fsm_task (void *_data)
 	}
 
 	case HST_PROBE_FINISHED:
-		up(&host->probe_sem);
+		complete(&host->probe_comp);
 		break;
 
 	case HST_ERROR:
@@ -1622,7 +1622,7 @@ static int carm_init_one (struct pci_dev *pdev, const struct pci_device_id *ent)
 	host->flags = pci_dac ? FL_DAC : 0;
 	spin_lock_init(&host->lock);
 	INIT_WORK(&host->fsm_task, carm_fsm_task, host);
-	init_MUTEX_LOCKED(&host->probe_sem);
+	init_completion(&host->probe_comp);
 
 	for (i = 0; i < ARRAY_SIZE(host->req); i++)
 		host->req[i].tag = i;
@@ -1691,8 +1691,8 @@ static int carm_init_one (struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (rc)
 		goto err_out_free_irq;
 
-	DPRINTK("waiting for probe_sem\n");
-	down(&host->probe_sem);
+	DPRINTK("waiting for probe_comp\n");
+	wait_for_completion(&host->probe_comp);
 
 	printk(KERN_INFO "%s: pci %s, ports %d, io %lx, irq %u, major %d\n",
 	       host->name, pci_name(pdev), (int) CARM_MAX_PORTS,
