Commit 2c83a72

drm/etnaviv: bring back progress check in job timeout handler

When the hangcheck handler was replaced by the DRM scheduler timeout handling we dropped the forward progress check, as this might allow clients to hog the GPU for a long time with a big job.

It turns out that even reasonably well behaved clients like the Armada Xorg driver occasionally trip over the 500ms timeout. Bring back the forward progress check to get rid of the userspace regression. We would still like to fix userspace to submit smaller batches if possible, but that is for another day.

Cc: <[email protected]>
Fixes: 6d7a20c (drm/etnaviv: replace hangcheck with scheduler timeout)
Reported-by: Russell King <[email protected]>
Signed-off-by: Lucas Stach <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
1 parent bf6ba3a commit 2c83a72

2 files changed, 27 additions and 0 deletions

drivers/gpu/drm/etnaviv/etnaviv_gpu.h

Lines changed: 3 additions & 0 deletions
@@ -131,6 +131,9 @@ struct etnaviv_gpu {
 	struct work_struct sync_point_work;
 	int sync_point_event;

+	/* hang detection */
+	u32 hangcheck_dma_addr;
+
 	void __iomem *mmio;
 	int irq;

drivers/gpu/drm/etnaviv/etnaviv_sched.c

Lines changed: 24 additions & 0 deletions
@@ -10,6 +10,7 @@
 #include "etnaviv_gem.h"
 #include "etnaviv_gpu.h"
 #include "etnaviv_sched.h"
+#include "state.xml.h"

 static int etnaviv_job_hang_limit = 0;
 module_param_named(job_hang_limit, etnaviv_job_hang_limit, int , 0444);
@@ -85,6 +86,29 @@ static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
 {
 	struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
 	struct etnaviv_gpu *gpu = submit->gpu;
+	u32 dma_addr;
+	int change;
+
+	/*
+	 * If the GPU managed to complete this job's fence, the timeout is
+	 * spurious. Bail out.
+	 */
+	if (fence_completed(gpu, submit->out_fence->seqno))
+		return;
+
+	/*
+	 * If the GPU is still making forward progress on the front-end (which
+	 * should never loop) we shift out the timeout to give it a chance to
+	 * finish the job.
+	 */
+	dma_addr = gpu_read(gpu, VIVS_FE_DMA_ADDRESS);
+	change = dma_addr - gpu->hangcheck_dma_addr;
+	if (change < 0 || change > 16) {
+		gpu->hangcheck_dma_addr = dma_addr;
+		schedule_delayed_work(&sched_job->work_tdr,
+				      sched_job->sched->timeout);
+		return;
+	}

 	/* block scheduler */
 	kthread_park(gpu->sched.thread);
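
The forward progress test in the hunk above can be tried outside the kernel. The following is a minimal userspace sketch, not the driver code: `struct hangcheck_state` and `fe_made_progress()` are hypothetical names introduced here, and the `dma_addr` argument stands in for the value the driver reads from `VIVS_FE_DMA_ADDRESS`. It only mirrors the `change < 0 || change > 16` comparison and the update of the remembered address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hangcheck_dma_addr field in struct etnaviv_gpu. */
struct hangcheck_state {
	uint32_t last_dma_addr;	/* front-end address remembered from the last timeout */
};

/*
 * Hypothetical helper mirroring the check in the patch: the front-end counts
 * as having made progress if its address moved backwards or moved forwards
 * by more than 16 since the last timeout. Only in that case is the
 * remembered address updated, as in the patch.
 */
static bool fe_made_progress(struct hangcheck_state *state, uint32_t dma_addr)
{
	int32_t change;

	assert(state != NULL);

	/*
	 * Unsigned subtraction wraps; converting the result to a signed type
	 * yields a signed distance on the usual two's complement targets.
	 */
	change = (int32_t)(dma_addr - state->last_dma_addr);
	if (change < 0 || change > 16) {
		state->last_dma_addr = dma_addr;
		return true;
	}

	return false;
}
```

In this sketch, a caller would extend the job timeout when `fe_made_progress()` returns true and fall through to the recovery path otherwise. Note that a forward move of exactly 16 is still treated as no progress, because the comparison is strict.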
