
Commit 606fec9

drm/i915: Prefer random replacement before eviction search
Performing an eviction search can be very, very slow, especially for a range-restricted replacement. For example, a workload like gem_concurrent_blit will populate the entire GTT and then cause aperture thrashing. Since the GTT is a mix of active and inactive tiny objects, we have to search through almost 400k objects before finding anything inside the mappable region, and as this search is required before every operation, performance falls off a cliff.

Instead of performing the full search, we do a trial replacement of the node at a random location fitting the specified restrictions. We lose the strict LRU property of the GTT in exchange for avoiding the slow search (several orders of magnitude runtime improvement for gem_concurrent_blit 4KiB-global-gtt, e.g. from 5000s to 20s). The loss of LRU replacement is (later) mitigated firstly by only doing replacement if we find no free space and secondly by execbuf doing a PIN_NONBLOCK search first before it starts thrashing (i.e. the random replacement will only occur from the already inactive set of objects).

v2: Ascii-art, and check preconditions
v3: Rephrase final sentence in comment to explain why we don't bother with if (i915_is_ggtt(vm)) for preferring random replacement.

Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Reviewed-by: Joonas Lahtinen <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
1 parent 625d988 commit 606fec9
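
The cost described in the commit message can be illustrated with a toy model. This is only a sketch and not driver code: `struct toy_obj` and `lru_first_in_range()` are hypothetical names, every object is reduced to a single offset, and the LRU is a plain array. When the list is ordered top-down and the target range sits at the bottom of the address space, the walk has to pass over nearly every entry before it finds a candidate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy object: one offset and a pinned flag. */
struct toy_obj {
	uint64_t offset;	/* where the object sits in the address space */
	int pinned;		/* non-zero if it must not be evicted */
};

/*
 * Walk an LRU-ordered array and return the index of the first unpinned
 * object whose offset lies in [start, end), or n if there is none.
 * The returned index is also the number of entries skipped on the way.
 */
static size_t lru_first_in_range(const struct toy_obj *lru, size_t n,
				 uint64_t start, uint64_t end)
{
	size_t i;

	assert(start < end);
	for (i = 0; i < n; i++) {
		if (!lru[i].pinned &&
		    lru[i].offset >= start && lru[i].offset < end)
			break;
	}
	return i;
}
```

With eight objects listed from the highest offset down and a target range covering only the two lowest offsets, the walk skips six entries before finding a match. A single random probe inside the range looks at one location instead, at the price of ignoring LRU order.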


1 file changed, 58 insertions(+), 1 deletion(-)

drivers/gpu/drm/i915/i915_gem_gtt.c

@@ -24,6 +24,7 @@
  */
 
 #include <linux/log2.h>
+#include <linux/random.h>
 #include <linux/seq_file.h>
 #include <linux/stop_machine.h>
 
@@ -3606,6 +3607,31 @@ int i915_gem_gtt_reserve(struct i915_address_space *vm,
 	return err;
 }
 
+static u64 random_offset(u64 start, u64 end, u64 len, u64 align)
+{
+	u64 range, addr;
+
+	GEM_BUG_ON(range_overflows(start, len, end));
+	GEM_BUG_ON(round_up(start, align) > round_down(end - len, align));
+
+	range = round_down(end - len, align) - round_up(start, align);
+	if (range) {
+		if (sizeof(unsigned long) == sizeof(u64)) {
+			addr = get_random_long();
+		} else {
+			addr = get_random_int();
+			if (range > U32_MAX) {
+				addr <<= 32;
+				addr |= get_random_int();
+			}
+		}
+		div64_u64_rem(addr, range, &addr);
+		start += addr;
+	}
+
+	return round_up(start, align);
+}
+
 /**
  * i915_gem_gtt_insert - insert a node into an address_space (GTT)
  * @vm - the &struct i915_address_space
@@ -3627,7 +3653,8 @@ int i915_gem_gtt_reserve(struct i915_address_space *vm,
  * its @size must then fit entirely within the [@start, @end] bounds. The
  * nodes on either side of the hole must match @color, or else a guard page
  * will be inserted between the two nodes (or the node evicted). If no
- * suitable hole is found, then the LRU list of objects within the GTT
+ * suitable hole is found, first a victim is randomly selected and tested
+ * for eviction, otherwise the LRU list of objects within the GTT
  * is scanned to find the first set of replacement nodes to create the hole.
  * Those old overlapping nodes are evicted from the GTT (and so must be
  * rebound before any future use). Any node that is currently pinned cannot
@@ -3645,6 +3672,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
 			u64 start, u64 end, unsigned int flags)
 {
 	u32 search_flag, alloc_flag;
+	u64 offset;
 	int err;
 
 	lockdep_assert_held(&vm->i915->drm.struct_mutex);
@@ -3687,6 +3715,35 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
 	if (err != -ENOSPC)
 		return err;
 
+	/* No free space, pick a slot at random.
+	 *
+	 * There is a pathological case here using a GTT shared between
+	 * mmap and GPU (i.e. ggtt/aliasing_ppgtt but not full-ppgtt):
+	 *
+	 *    |<-- 256 MiB aperture -->||<-- 1792 MiB unmappable -->|
+	 *         (64k objects)             (448k objects)
+	 *
+	 * Now imagine that the eviction LRU is ordered top-down (just because
+	 * pathology meets real life), and that we need to evict an object to
+	 * make room inside the aperture. The eviction scan then has to walk
+	 * the 448k list before it finds one within range. And now imagine that
+	 * it has to search for a new hole between every byte inside the memcpy,
+	 * for several simultaneous clients.
+	 *
+	 * On a full-ppgtt system, if we have run out of available space, there
+	 * will be lots and lots of objects in the eviction list! Again,
+	 * searching that LRU list may be slow if we are also applying any
+	 * range restrictions (e.g. restriction to low 4GiB) and so, for
+	 * simplicity and similarity between different GTT, try the single
+	 * random replacement first.
+	 */
+	offset = random_offset(start, end,
+			       size, alignment ?: I915_GTT_MIN_ALIGNMENT);
+	err = i915_gem_gtt_reserve(vm, node, size, offset, color, flags);
+	if (err != -ENOSPC)
+		return err;
+
+	/* Randomly selected placement is pinned, do a search */
 	err = i915_gem_evict_something(vm, size, alignment, color,
 				       start, end, flags);
 	if (err)
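
The arithmetic in random_offset() can be tried outside the kernel with a small userspace sketch. This is an approximation under stated assumptions, not the driver code: `toy_random_offset()`, `align_up()` and `align_down()` are hypothetical names, the random value is passed in by the caller (the patch draws it from get_random_long()/get_random_int()), plain `%` stands in for div64_u64_rem(), and `assert()` stands in for GEM_BUG_ON(). As with the kernel's round_up()/round_down(), the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round x down to a multiple of align (align must be a power of two). */
static uint64_t align_down(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Pick an aligned offset such that [offset, offset + len) fits inside
 * [start, end). 'rnd' is a caller-supplied random value (an assumption
 * made here so that the result is reproducible).
 */
static uint64_t toy_random_offset(uint64_t start, uint64_t end,
				  uint64_t len, uint64_t align,
				  uint64_t rnd)
{
	uint64_t lo, hi, range;

	/* Preconditions, mirroring the two GEM_BUG_ON() checks. */
	assert(align && (align & (align - 1)) == 0);
	assert(start <= end && len <= end - start);

	lo = align_up(start, align);		/* lowest valid offset */
	hi = align_down(end - len, align);	/* highest valid offset */
	assert(lo <= hi);

	range = hi - lo;
	if (range)
		start += rnd % range;	/* reduce the random value into the range */

	/* Rounding up never passes 'hi' because 'hi' is itself aligned. */
	return align_up(start, align);
}
```

For example, with a 64 KiB window, a 4 KiB object and 4 KiB alignment, a random value of 0x12345 reduces to 0x3345 and rounds up to offset 0x4000. The modulo reduction is slightly biased toward low offsets, which is acceptable for picking an eviction candidate but would not be for anything security sensitive.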
