
Commit ad765fa

ickle authored and rodrigovivi committed
drm/i915/gem: Look for waitboosting across the whole object prior to individual waits
We employ a "waitboost" heuristic to detect when userspace is stalled waiting for results from earlier execution. Under latency-sensitive work mixed between the GPU and CPU, the GPU is typically under-utilised, and so RPS sees that low utilisation as a reason to downclock the frequency, causing longer stalls and lower throughput. The user left waiting for the results is not impressed.

On applying commit 047a1b8 ("dma-buf & drm/amdgpu: remove dma_resv workaround"), it was observed that h264 deinterlacing performance on Haswell dropped by 2-5x. The reason is that the natural workload was not intense enough to trigger RPS (using HW evaluation intervals) to upclock, and so it depended on waitboosting for its throughput.

Commit 047a1b8 ("dma-buf & drm/amdgpu: remove dma_resv workaround") changed the composition of dma-resv from a single write fence plus multiple read fences to a single array of mixed write and read fences (at most one write/read fence pair per context). The iteration order also changed implicitly, from all read fences followed by the single write fence to write fences followed by read fences. It is that ordering change that exposed the fragility of waitboosting.

Currently, a waitboost is considered at the point of waiting on an outstanding fence. If the GPU is backlogged such that we have not yet started the request we need to wait on, we force the GPU to upclock until that request completes. By changing the order in which we waited upon requests, we ended up waiting on those requests in sequence, and as such saw that each request had already started and so was not a suitable candidate for waitboosting.

Instead of asking whether to boost each fence in turn, we can look at whether boosting is required for the dma-resv ensemble prior to waiting on any fence, making the heuristic robust to the order in which fences are stored in the dma-resv.

Reported-by: Thomas Voegtle <[email protected]>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
Fixes: 047a1b8 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Signed-off-by: Karolina Drobnik <[email protected]>
Tested-by: Thomas Voegtle <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Acked-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/07e05518d9f6620d20cc1101ec1849203fe973f9.1657289332.git.karolina.drobnik@intel.com
(cherry picked from commit 394e2b5)
Signed-off-by: Rodrigo Vivi <[email protected]>
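To make the ordering hazard concrete, here is a toy userspace C model, not i915 code: toy_fence, toy_wait() and the want_boost_*() helpers are invented purely for illustration. It simulates a backlogged GPU on which waiting for one request lets the next begin executing, and shows the per-wait check missing a boost that the upfront prescan catches:

/*
 * Toy model (not i915 code): requests queued on a backlogged GPU,
 * where waiting on one request lets its successor begin executing.
 */
#include <stdbool.h>
#include <stdio.h>

struct toy_fence {
	bool started;	/* has the GPU begun executing this request? */
};

/* Waiting on fence i completes it; the GPU then starts fence i + 1. */
static void toy_wait(struct toy_fence *f, int i, int n)
{
	if (i + 1 < n)
		f[i + 1].started = true;
}

/* Old scheme: test each fence only at the moment we wait on it. */
static bool want_boost_per_wait(struct toy_fence *f, int n)
{
	bool boost = false;
	int i;

	for (i = 0; i < n; i++) {
		if (!f[i].started)
			boost = true;
		toy_wait(f, i, n);
	}
	return boost;
}

/* New scheme: prescan the whole ensemble before the first wait. */
static bool want_boost_prescan(struct toy_fence *f, int n)
{
	bool boost = false;
	int i;

	for (i = 0; i < n; i++)
		if (!f[i].started)
			boost = true;
	for (i = 0; i < n; i++)
		toy_wait(f, i, n);
	return boost;
}

int main(void)
{
	/* Sequence 1:1 (already executing), then 1:2 (still queued). */
	struct toy_fence a[] = { { true }, { false } };
	struct toy_fence b[] = { { true }, { false } };

	printf("per-wait check boosts: %s\n",
	       want_boost_per_wait(a, 2) ? "yes" : "no");	/* no */
	printf("prescan check boosts:  %s\n",
	       want_boost_prescan(b, 2) ? "yes" : "no");	/* yes */
	return 0;
}

Built with any C compiler, the per-wait variant reports "no" and the prescan variant "yes" for the same 1:1, 1:2 sequence, mirroring the lost waitboost described above.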
1 parent a1c5a7b commit ad765fa

File tree

1 file changed: +34 −0 lines


drivers/gpu/drm/i915/gem/i915_gem_wait.c

Lines changed: 34 additions & 0 deletions
@@ -9,6 +9,7 @@
 #include <linux/jiffies.h>
 
 #include "gt/intel_engine.h"
+#include "gt/intel_rps.h"
 
 #include "i915_gem_ioctls.h"
 #include "i915_gem_object.h"
@@ -31,6 +32,37 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
 				      timeout);
 }
 
+static void
+i915_gem_object_boost(struct dma_resv *resv, unsigned int flags)
+{
+	struct dma_resv_iter cursor;
+	struct dma_fence *fence;
+
+	/*
+	 * Prescan all fences for potential boosting before we begin waiting.
+	 *
+	 * When we wait, we wait on outstanding fences serially. If the
+	 * dma-resv contains a sequence such as 1:1, 1:2 instead of a reduced
+	 * form 1:2, then as we look at each wait in turn we see that each
+	 * request is currently executing and not worthy of boosting. But if
+	 * we only happen to look at the final fence in the sequence (because
+	 * of request coalescing or splitting between read/write arrays by
+	 * the iterator), then we would boost. As such our decision to boost
+	 * or not is delicately balanced on the order we wait on fences.
+	 *
+	 * So instead of looking for boosts sequentially, look for all boosts
+	 * upfront and then wait on the outstanding fences.
+	 */
+
+	dma_resv_iter_begin(&cursor, resv,
+			    dma_resv_usage_rw(flags & I915_WAIT_ALL));
+	dma_resv_for_each_fence_unlocked(&cursor, fence)
+		if (dma_fence_is_i915(fence) &&
+		    !i915_request_started(to_request(fence)))
+			intel_rps_boost(to_request(fence));
+	dma_resv_iter_end(&cursor);
+}
+
 static long
 i915_gem_object_wait_reservation(struct dma_resv *resv,
 				 unsigned int flags,
@@ -40,6 +72,8 @@ i915_gem_object_wait_reservation(struct dma_resv *resv,
 	struct dma_fence *fence;
 	long ret = timeout ?: 1;
 
+	i915_gem_object_boost(resv, flags);
+
 	dma_resv_iter_begin(&cursor, resv,
 			    dma_resv_usage_rw(flags & I915_WAIT_ALL));
 	dma_resv_for_each_fence_unlocked(&cursor, fence) {
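As an aside, dma_resv_usage_rw(), used twice in the hunks above, selects which fences the iterator visits. A sketch of its semantics, paraphrased from include/linux/dma-resv.h in kernels of this vintage (the comments here are mine, not the kernel's verbatim text):

/*
 * Paraphrase of the dma-resv helper, not a verbatim copy. With
 * I915_WAIT_ALL set we behave as a writer and so must wait on (and,
 * above, prescan) every fence; otherwise only the writers matter.
 */
static inline enum dma_resv_usage dma_resv_usage_rw(bool write)
{
	/* A new writer must wait for all prior readers and writers,
	 * whereas a new reader only waits for the prior writers.
	 */
	return write ? DMA_RESV_USAGE_READ : DMA_RESV_USAGE_WRITE;
}

Because the usage levels are ordered (KERNEL, WRITE, READ, BOOKKEEP) and the iterator returns every fence at or below the requested level, asking for DMA_RESV_USAGE_READ visits both write and read fences, so with I915_WAIT_ALL the prescan considers the whole ensemble.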
