You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
ata: libata-scsi: do not overwrite SCSI ML and status bytes
For SCSI ML byte:
In the case where a command is completed via libata EH:
irq -> ata_qc_complete() -> ata_qc_schedule_eh()
irq done
... -> ata_do_eh() -> ata_eh_link_autopsy() -> ata_eh_finish() ->
ata_eh_qc_complete() -> __ata_eh_qc_complete() -> __ata_qc_complete() ->
qc->complete_fn() (ata_scsi_qc_complete()) -> ata_qc_done() ->
qc->scsidone() (empty stub)
... -> scsi_eh_finish_cmd() -> scsi_eh_flush_done_q() ->
scsi_finish_command()
ata_eh_link_autopsy() will call ata_eh_analyze_tf(), which calls
scsi_check_sense(), which sets the SCSI ML byte.
Since ata_scsi_qc_complete() is called after scsi_check_sense() when
a command is completed via libata EH, we cannot simply overwrite the
SCSI ML byte that was set earlier in the call chain.
For SCSI status byte:
When a SCSI command is prepared using scsi_prepare_cmd(), it sets
cmd->result to 0. (SAM_STAT_GOOD is defined as 0x0).
Likewise, when a command is requeued from SCSI EH, scsi_queue_insert()
is called, which sets cmd->result to 0.
A SCSI command thus always has a GOOD status by default when being
sent to libata.
If libata fetches sense data from the device, it will call
ata_scsi_set_sense(), which will set the status byte to
SAM_STAT_CHECK_CONDITION, if the caller deems that the status should be
a check condition.
ata_scsi_qc_complete() should therefore never overwrite the existing
status byte, because if it is != GOOD, it was set by libata itself,
for a reason.
For the host byte:
When libata abort commands, because of a NCQ error, it will schedule
SCSI EH for all QCs using blk_abort_request(), which will all end up in
scsi_timeout(), which will call scsi_abort_command(). scsi_timeout()
sets DID_TIME_OUT regardless if a command was aborted or timed out.
If we don't clear the DID_TIME_OUT byte for the QC that caused the
NCQ error, that QC will be reported as a timed out command, instead
of being reported as a NCQ error.
For a command that actually timed out, DID_TIME_OUT would be fine to
keep, but libata has its own way of detecting that a command timed out
(see ata_scsi_cmd_error_handler()), and sets AC_ERR_TIMEOUT if that is
the case. libata will retry timed out commands.
We could clear DID_TIME_OUT only for the QC that caused the NCQ error,
but since libata has its own way of detecting timeouts, simply clear it
always.
Note that the existing ata_scsi_qc_complete() code does:
cmd->result = SAM_STAT_CHECK_CONDITION or cmd->result = SAM_STAT_GOOD.
This WILL clear the host byte. So us clearing the host byte
unconditionally is in line with the existing libata behavior.
Signed-off-by: Niklas Cassel <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
0 commit comments