mbox series

[00/88] esp: rework ESP emulation to use a SCSI phase-based state machine

Message ID 20240112125420.514425-1-mark.cave-ayland@ilande.co.uk
Headers show
Series esp: rework ESP emulation to use a SCSI phase-based state machine | expand

Message

Mark Cave-Ayland Jan. 12, 2024, 12:52 p.m. UTC
The ESP SCSI chip fundamentally consists of a FIFO for transferring data to/from
the SCSI bus along with a command sequencer which automates various processes such
as selection, message/command transfer and data transfer. What makes this chip
particularly interesting is that the FIFO is also used as a buffer for DMA transfers
which makes it possible to mix and match DMA and non-DMA transfers when sending and
receiving data to/from the SCSI bus.

Whilst the current ESP emulation works well for most cases, there are several
problems that I've encountered whilst trying to debug various guest issues:


1) Multiple code paths for non-DMA, DMA and pDMA

Consider a SCSI request being submitted by the guest: if it is a DMA SCSI
request then it takes one path through get_cmd(), and a different path through
get_cmd() for pDMA. This then leads to the DMA code path being called for DMA
and the pDMA codepath being called for each FIFO write. Finally once you add
in the non-DMA path you end up with a total of 5 potential codepaths for any
given SCSI request.

2) Manual handling of STAT_TC flag

According to the datasheet the STAT_TC flag in ESP_RSTAT is set when the DMA
transfer counter reaches zero. The current code handles this manually in
multiple places, including where it shouldn't be necessary.

3) Deferred interrupts are used only for reads and not writes

The ESP emulation currently makes use of a deferred interrupt when submitting
read commands i.e. the interrupt flags are not updated immediately but when
the QEMU SCSI layer indicates it is ready. This works well for reads, but isn't
implemented for writes which can cause problems when the host is heavily loaded
or a large SCSI write is requested.

4) Padding of pDMA TI transfers to a minimum of 16 bytes

In order to allow Classic MacOS to boot successfully there is a workaround that
pads all pDMA TI transfers to a minimum of 16 bytes (see commit 7aa6baee7c
"esp: add support for unaligned accesses"). This feature is not documented and it
turns out that this is a symptom of incorrect handling of transfer
overflow/underflow conditions.

5) Duplication of some non-DMA logic in esp_reg_read() and esp_reg_write()

When reading and writing to the FIFO there is some duplication of logic from
esp_do_nodma() in esp_reg_read() and esp_reg_write(). This should be eliminated
in favour of a single state machine for handling non-DMA FIFO accesses in a
single place.


The series here reworks the ESP emulation to use a SCSI phase-based state machine
with just two paths: one for DMA/pDMA requests, and another for non-DMA requests.
The pDMA callbacks are completely removed so that there is now only a single path
for DMA requests. As part of this work the manual STAT_TC handling is removed, and a
couple of bugs in the SCSI phase selection are fixed to allow proper transfer
underflow/overflow detection and recovery. 

I've tested the series with all of my available ESP images and compatibility is
greatly improved with these changes: I believe that this series should also fix a
few GitLab issues including https://gitlab.com/qemu-project/qemu/-/issues/1831,
https://gitlab.com/qemu-project/qemu/-/issues/611 and
https://gitlab.com/qemu-project/qemu/-/issues/1127. In addition it allows Aurelien's
really old SPARC Linux images to boot once again, and also the NeXTCube machine can
now boot, load its kernel from disk and start to execute it.

The series breaks down into the following approximate sections:

  - Patches 1-6 do some initial tidy-up for ESP select commands

  - Patches 7-16 implement consolidation of the STAT_TC flag handling

  - Patches 17-36 start to bring esp_do_dma() and do_pdma_dma() into line with each
    other, removing unneeded hacks and fixing up end-of-phase logic

  - Patches 37-56 move the MESSAGE OUT and COMMAND phase processing to the new SCSI
    phase-based state machine, allowing the removal of the pDMA-specific callbacks

  - Patches 57-62 tidy up the end of SCSI command logic and adds deferred interrupt
    handling for the DATA IN phase to complement the existing DATA OUT phase
    implementation

  - Patches 63-64 remove the ti_cmd field which is now no longer required

  - Patches 65-67 move the last bit of MESSAGE OUT and COMMAND phase processing to
    the new SCSI phase-based state machine which allows get_cmd() to be removed

  - Patches 68-69 move the STATUS and MESSAGE IN phase processing to the new SCSI
    phase-based state machine

  - Patches 70-72 fix the reset of the ESP_RINTR/ESP_STAT registers which allows the
    TC underflow detection to work correctly

  - Patch 73 removes the restriction on FIFO read access if the DMA routines are not
    defined (which also allows the NeXTCube to pass its self-test)

  - Patches 74-77 update the state machine to allow more combinations of mixed TI
    and FIFO reads/writes exposed by my test images

  - Patches 78-88 are final tidy-ups and improvements, along with adding an
    implementation of the "Transfer Pad" command

There is a migration break for the Q800 machine here as the series removes the pDMA
subsection: this seems the safest option as it isn't possible to map the old pDMA
callback enum directly to the new state machine. In theory there is a small window
whereby the normal DMA state may miss setting data_ready correctly, but I haven't
been able to recreate this during my tests.


Notes for reviewers:

This series is not a particularly easy one to review without having good in-depth
knowledge of the ESP device. The biggest issue I found is that due to the state
machines used by the client upon receiving an interrupt, a change to a ESP register
in one place can cause a completely unrelated error elsewhere.

A prime example of this is the consolidation of the STAT_TC flag: updating the
emulated device to behave as described in the datasheet caused other failures until
other fixes/updates could be made. This is why the work is done over 2 separate
commits with a gap in-between to keep the series bisectible, clearly indicating how
sensitive this series is to commit order.

In order to implement the changes I decided to focus on 3 main test images that should
pass for each commit to keep the series bisectible:

1) Modern Linux kernel with am53c974.c (hppa)
2) Modern Linux kernel with mac_esp.c (q800)
3) Classic MacOS 8.0

Other images may stop working during parts of the series, which sometimes is the
only option as otherwise implementing a compatibility workaround would be
prohibitively time-consuming or impossible at that point in the series. By the
end of the series all of my test images work correctly, including a few extra that
don't work with QEMU current git master.

Finally it is worth reiterating that in its current form this series fixes several
significant bugs and improves the compatibility of the ESP emulation, and these changes
make future fixes considerably easier compared with the existing implementation.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>


Mark Cave-Ayland (88):
  esp: don't clear cmdfifo when esp_select() fails in get_cmd()
  esp: move existing request cancel check into esp_select()
  esp.c: add FIFO wraparound support to esp_fifo_pop_buf()
  esp: remove FIFO clear from esp_select()
  esp: move esp_select() to ESP selection commands from get_cmd()
  esp: update esp_set_tc() to set STAT_TC flag
  esp: start removal of manual STAT_TC setting when transfer counter
    reaches zero
  esp: move command execution logic to new esp_run_cmd() function
  esp: update TC check logic in do_dma_pdma_cb() to check for TC == 0
  esp: move buffer and TC logic into separate to/from device paths in
    esp_do_dma()
  esp.c: remove unused case from esp_pdma_read()
  esp.c: don't accumulate directly into cmdfifo
  esp.c: decrement the TC during MESSAGE OUT and COMMAND phases
  esp.c: introduce esp_set_phase() helper function
  esp.c: remove another set of manual STAT_TC updates
  esp.c: remove MacOS TI workaround that pads FIFO transfers to
    ESP_FIFO_SZ
  esp.c: don't reset the TC and ESP_RSEQ state when executing a SCSI
    command
  esp.c: don't clear RFLAGS register when DMA is complete
  esp: remove zero transfer size check from esp_do_dma()
  esp.c: update condition for esp_dma_done() in esp_do_dma() from device
    path
  esp.c: update condition for esp_dma_done() in esp_do_dma() to device
    path
  esp.c: ensure that the PDMA callback is called for every device read
  esp.c: don't immediately raise INTR_BS if SCSI data needed in
    esp_do_dma()
  esp.c: remove TC adjustment in esp_do_dma() from device path
  esp.c: remove unaligned adjustment in do_dma_pdma_cb() to device path
  esp.c: remove unneeded if() check in esp_transfer_data()
  esp.c: update end of transfer logic at the end of esp_transfer_data()
  esp.c: consolidate async_len and TC == 0 checks in do_dma_pdma_cb()
    and esp_do_dma()
  esp.c: fix premature end of phase logic esp_command_complete
  esp.c: move TC and FIFO check logic into esp_dma_done()
  esp.c: rename esp_dma_done() to esp_dma_ti_check()
  esp.c: copy PDMA logic for transfers to device from do_dma_pdma_cb()
    to esp_do_dma()
  esp.c: copy logic for do_cmd transfers from do_dma_pdma_cb() to
    esp_do_dma()
  esp.c: update esp_do_dma() bypass if async_len is zero to include
    non-zero transfer check
  esp.c: move end of SCSI transfer check after TC adjustment in
    do_dma_pdma_cb()
  esp.c: remove s_without_satn_pdma_cb() PDMA callback
  esp.c: introduce esp_get_phase() function
  esp.c: convert esp_do_dma() to switch statement based upon SCSI phase
  esp.c: convert do_dma_pdma_db() to switch statement based upon SCSI
    phase
  esp.c: convert esp_do_nodma() to switch statement based upon SCSI
    phase
  esp.c: convert esp_do_dma() do_cmd path to check for SCSI phase
    instead
  esp.c: convert do_dma_pdma_cb() do_cmd path to check for SCSI phase
    instead
  esp.c: convert esp_do_nodma() do_cmd path to check for SCSI phase
    instead
  esp.c: convert esp_reg_write() do_cmd path to check for SCSI phase
    instead
  esp.c: remove do_cmd from ESPState
  esp.c: untangle MESSAGE OUT and COMMAND phase logic in esp_do_dma()
  esp.c: untangle MESSAGE OUT and COMMAND phase logic in
    do_dma_pdma_cb()
  esp.c: untangle MESSAGE OUT and COMMAND phase logic in esp_do_nodma()
  esp.c: move CMD_SELATN end of message phase detection to esp_do_dma()
    and do_dma_pdma_cb()
  esp.c: move CMD_TI end of message phase detection to esp_do_dma() and
    do_dma_pdma_cb()
  esp.c: don't use get_cmd() for CMD_SEL DMA commands
  esp.c: move CMD_SELATNS end of command logic to esp_do_dma() and
    do_dma_pdma_cb()
  esp.c: replace do_dma_pdma_cb() with esp_do_dma()
  esp.c: move CMD_ICCS command logic to esp_do_dma()
  esp.c: always use esp_do_dma() in pdma_cb()
  esp.c: remove unused PDMA callback implementation
  esp.c: rename data_in_ready to to data_ready
  esp.c: separate logic based upon ESP command in esp_command_complete()
  esp.c: separate logic based upon ESP command in esp_transfer_data()
  esp.c: use deferred interrupts for both DATA IN and DATA OUT phases
  esp.c: remove DATA IN phase logic when reading from FIFO
  esp.c: zero command register when TI command terminates due to phase
    change
  esp.c: remove unneeded ti_cmd field
  esp.c: don't raise INTR_BS interrupt in DATA IN phase until TI command
    issued
  esp.c: move non-DMA TI logic to separate esp_nodma_ti_dataout()
    function
  esp.c: process non-DMA FIFO writes in esp_do_nodma()
  esp.c: replace get_cmd() with esp_do_nodma()
  esp.c: move write_response() non-DMA logic to esp_do_nodma()
  esp.c: consolidate end of command sequence after ICCS command
  esp.c: ensure that STAT_INT is cleared when reading ESP_RINTR
  esp.c: don't clear the SCSI phase when reading ESP_RINTR
  esp.c: handle TC underflow for DMA SCSI requests
  esp.c: remove restriction on FIFO read access when DMA memory routines
    defined
  esp.c: handle non-DMA FIFO writes used to terminate DMA commands
  esp.c: improve ESP_RSEQ logic consolidation
  esp.c: only transfer non-DMA COMMAND phase data for specific commands
  esp.c: only transfer non-DMA MESSAGE OUT phase data for specific
    commands
  esp.c: consolidate DMA and PDMA logic in DATA OUT phase
  esp.c: consolidate DMA and PDMA logic in DATA IN phase
  esp.c: consolidate DMA and PDMA logic in MESSAGE OUT phase
  esp.c: remove redundant n variable in PDMA COMMAND phase
  esp.c: consolidate DMA and PDMA logic in STATUS and MESSAGE IN phases
  esp.c: replace n variable with len in esp_do_nodma()
  esp.c: implement DMA Transfer Pad command for DATA phases
  esp.c: rename irq_data IRQ to drq_irq
  esp.c: keep track of the DRQ state during DMA
  esp.c: switch TypeInfo registration to use DEFINE_TYPES() macro
  esp.c: add my copyright to the file

 hw/scsi/esp.c         | 1371 ++++++++++++++++++++++-------------------
 hw/scsi/trace-events  |    1 +
 include/hw/scsi/esp.h |   18 +-
 3 files changed, 727 insertions(+), 663 deletions(-)

Comments

Mark Cave-Ayland Jan. 22, 2024, 12:41 p.m. UTC | #1
On 12/01/2024 12:52, Mark Cave-Ayland wrote:

> The ESP SCSI chip fundamentally consists of a FIFO for transferring data to/from
> the SCSI bus along with a command sequencer which automates various processes such
> as selection, message/command transfer and data transfer. What makes this chip
> particularly interesting is that the FIFO is also used as a buffer for DMA transfers
> which makes it possible to mix and match DMA and non-DMA transfers when sending and
> receiving data to/from the SCSI bus.
> 
> Whilst the current ESP emulation works well for most cases, there are several
> problems that I've encountered whilst trying to debug various guest issues:
> 
> 
> 1) Multiple code paths for non-DMA, DMA and pDMA
> 
> Consider a SCSI request being submitted by the guest: if it is a DMA SCSI
> request then it takes one path through get_cmd(), and a different path through
> get_cmd() for pDMA. This then leads to the DMA code path being called for DMA
> and the pDMA codepath being called for each FIFO write. Finally once you add
> in the non-DMA path you end up with a total of 5 potential codepaths for any
> given SCSI request.
> 
> 2) Manual handling of STAT_TC flag
> 
> According to the datasheet the STAT_TC flag in ESP_RSTAT is set when the DMA
> transfer counter reaches zero. The current code handles this manually in
> multiple places, including where it shouldn't be necessary.
> 
> 3) Deferred interrupts are used only for reads and not writes
> 
> The ESP emulation currently makes use of a deferred interrupt when submitting
> read commands i.e. the interrupt flags are not updated immediately but when
> the QEMU SCSI layer indicates it is ready. This works well for reads, but isn't
> implemented for writes which can cause problems when the host is heavily loaded
> or a large SCSI write is requested.
> 
> 4) Padding of pDMA TI transfers to a minimum of 16 bytes
> 
> In order to allow Classic MacOS to boot successfully there is a workaround that
> pads all pDMA TI transfers to a minimum of 16 bytes (see commit 7aa6baee7c
> "esp: add support for unaligned accesses"). This feature is not documented and it
> turns out that this is a symptom of incorrect handling of transfer
> overflow/underflow conditions.
> 
> 5) Duplication of some non-DMA logic in esp_reg_read() and esp_reg_write()
> 
> When reading and writing to the FIFO there is some duplication of logic from
> esp_do_nodma() in esp_reg_read() and esp_reg_write(). This should be eliminated
> in favour of a single state machine for handling non-DMA FIFO accesses in a
> single place.
> 
> 
> The series here reworks the ESP emulation to use a SCSI phase-based state machine
> with just two paths: one for DMA/pDMA requests, and another for non-DMA requests.
> The pDMA callbacks are completely removed so that there is now only a single path
> for DMA requests. As part of this work the manual STAT_TC handling is removed, and a
> couple of bugs in the SCSI phase selection are fixed to allow proper transfer
> underflow/overflow detection and recovery.
> 
> I've tested the series with all of my available ESP images and compatibility is
> greatly improved with these changes: I believe that this series should also fix a
> few GitLab issues including https://gitlab.com/qemu-project/qemu/-/issues/1831,
> https://gitlab.com/qemu-project/qemu/-/issues/611 and
> https://gitlab.com/qemu-project/qemu/-/issues/1127. In addition it allows Aurelien's
> really old SPARC Linux images to boot once again, and also the NeXTCube machine can
> now boot, load its kernel from disk and start to execute it.
> 
> The series breaks down into the following approximate sections:
> 
>    - Patches 1-6 do some initial tidy-up for ESP select commands
> 
>    - Patches 7-16 implement consolidation of the STAT_TC flag handling
> 
>    - Patches 17-36 start to bring esp_do_dma() and do_pdma_dma() into line with each
>      other, removing unneeded hacks and fixing up end-of-phase logic
> 
>    - Patches 37-56 move the MESSAGE OUT and COMMAND phase processing to the new SCSI
>      phase-based state machine, allowing the removal of the pDMA-specific callbacks
> 
>    - Patches 57-62 tidy up the end of SCSI command logic and adds deferred interrupt
>      handling for the DATA IN phase to complement the existing DATA OUT phase
>      implementation
> 
>    - Patches 63-64 remove the ti_cmd field which is now no longer required
> 
>    - Patches 65-67 move the last bit of MESSAGE OUT and COMMAND phase processing to
>      the new SCSI phase-based state machine which allows get_cmd() to be removed
> 
>    - Patches 68-69 move the STATUS and MESSAGE IN phase processing to the new SCSI
>      phase-based state machine
> 
>    - Patches 70-72 fix the reset of the ESP_RINTR/ESP_STAT registers which allows the
>      TC underflow detection to work correctly
> 
>    - Patch 73 removes the restriction on FIFO read access if the DMA routines are not
>      defined (which also allows the NeXTCube to pass its self-test)
> 
>    - Patches 74-77 update the state machine to allow more combinations of mixed TI
>      and FIFO reads/writes exposed by my test images
> 
>    - Patches 78-88 are final tidy-ups and improvements, along with adding an
>      implementation of the "Transfer Pad" command
> 
> There is a migration break for the Q800 machine here as the series removes the pDMA
> subsection: this seems the safest option as it isn't possible to map the old pDMA
> callback enum directly to the new state machine. In theory there is a small window
> whereby the normal DMA state may miss setting data_ready correctly, but I haven't
> been able to recreate this during my tests.
> 
> 
> Notes for reviewers:
> 
> This series is not a particularly easy one to review without having good in-depth
> knowledge of the ESP device. The biggest issue I found is that due to the state
> machines used by the client upon receiving an interrupt, a change to a ESP register
> in one place can cause a completely unrelated error elsewhere.
> 
> A prime example of this is the consolidation of the STAT_TC flag: updating the
> emulated device to behave as described in the datasheet caused other failures until
> other fixes/updates could be made. This is why the work is done over 2 separate
> commits with a gap in-between to keep the series bisectible, clearly indicating how
> sensitive this series is to commit order.
> 
> In order to implement the changes I decided to focus on 3 main test images that should
> pass for each commit to keep the series bisectible:
> 
> 1) Modern Linux kernel with am53c974.c (hppa)
> 2) Modern Linux kernel with mac_esp.c (q800)
> 3) Classic MacOS 8.0
> 
> Other images may stop working during parts of the series, which sometimes is the
> only option as otherwise implementing a compatibility workaround would be
> prohibitively time-consuming or impossible at that point in the series. By the
> end of the series all of my test images work correctly, including a few extra that
> don't work with QEMU current git master.
> 
> Finally it is worth reiterating that in its current form this series fixes several
> significant bugs and improves the compatibility of the ESP emulation, and these changes
> make future fixes considerably easier compared with the existing implementation.
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> 
> 
> Mark Cave-Ayland (88):
>    esp: don't clear cmdfifo when esp_select() fails in get_cmd()
>    esp: move existing request cancel check into esp_select()
>    esp.c: add FIFO wraparound support to esp_fifo_pop_buf()
>    esp: remove FIFO clear from esp_select()
>    esp: move esp_select() to ESP selection commands from get_cmd()
>    esp: update esp_set_tc() to set STAT_TC flag
>    esp: start removal of manual STAT_TC setting when transfer counter
>      reaches zero
>    esp: move command execution logic to new esp_run_cmd() function
>    esp: update TC check logic in do_dma_pdma_cb() to check for TC == 0
>    esp: move buffer and TC logic into separate to/from device paths in
>      esp_do_dma()
>    esp.c: remove unused case from esp_pdma_read()
>    esp.c: don't accumulate directly into cmdfifo
>    esp.c: decrement the TC during MESSAGE OUT and COMMAND phases
>    esp.c: introduce esp_set_phase() helper function
>    esp.c: remove another set of manual STAT_TC updates
>    esp.c: remove MacOS TI workaround that pads FIFO transfers to
>      ESP_FIFO_SZ
>    esp.c: don't reset the TC and ESP_RSEQ state when executing a SCSI
>      command
>    esp.c: don't clear RFLAGS register when DMA is complete
>    esp: remove zero transfer size check from esp_do_dma()
>    esp.c: update condition for esp_dma_done() in esp_do_dma() from device
>      path
>    esp.c: update condition for esp_dma_done() in esp_do_dma() to device
>      path
>    esp.c: ensure that the PDMA callback is called for every device read
>    esp.c: don't immediately raise INTR_BS if SCSI data needed in
>      esp_do_dma()
>    esp.c: remove TC adjustment in esp_do_dma() from device path
>    esp.c: remove unaligned adjustment in do_dma_pdma_cb() to device path
>    esp.c: remove unneeded if() check in esp_transfer_data()
>    esp.c: update end of transfer logic at the end of esp_transfer_data()
>    esp.c: consolidate async_len and TC == 0 checks in do_dma_pdma_cb()
>      and esp_do_dma()
>    esp.c: fix premature end of phase logic esp_command_complete
>    esp.c: move TC and FIFO check logic into esp_dma_done()
>    esp.c: rename esp_dma_done() to esp_dma_ti_check()
>    esp.c: copy PDMA logic for transfers to device from do_dma_pdma_cb()
>      to esp_do_dma()
>    esp.c: copy logic for do_cmd transfers from do_dma_pdma_cb() to
>      esp_do_dma()
>    esp.c: update esp_do_dma() bypass if async_len is zero to include
>      non-zero transfer check
>    esp.c: move end of SCSI transfer check after TC adjustment in
>      do_dma_pdma_cb()
>    esp.c: remove s_without_satn_pdma_cb() PDMA callback
>    esp.c: introduce esp_get_phase() function
>    esp.c: convert esp_do_dma() to switch statement based upon SCSI phase
>    esp.c: convert do_dma_pdma_db() to switch statement based upon SCSI
>      phase
>    esp.c: convert esp_do_nodma() to switch statement based upon SCSI
>      phase
>    esp.c: convert esp_do_dma() do_cmd path to check for SCSI phase
>      instead
>    esp.c: convert do_dma_pdma_cb() do_cmd path to check for SCSI phase
>      instead
>    esp.c: convert esp_do_nodma() do_cmd path to check for SCSI phase
>      instead
>    esp.c: convert esp_reg_write() do_cmd path to check for SCSI phase
>      instead
>    esp.c: remove do_cmd from ESPState
>    esp.c: untangle MESSAGE OUT and COMMAND phase logic in esp_do_dma()
>    esp.c: untangle MESSAGE OUT and COMMAND phase logic in
>      do_dma_pdma_cb()
>    esp.c: untangle MESSAGE OUT and COMMAND phase logic in esp_do_nodma()
>    esp.c: move CMD_SELATN end of message phase detection to esp_do_dma()
>      and do_dma_pdma_cb()
>    esp.c: move CMD_TI end of message phase detection to esp_do_dma() and
>      do_dma_pdma_cb()
>    esp.c: don't use get_cmd() for CMD_SEL DMA commands
>    esp.c: move CMD_SELATNS end of command logic to esp_do_dma() and
>      do_dma_pdma_cb()
>    esp.c: replace do_dma_pdma_cb() with esp_do_dma()
>    esp.c: move CMD_ICCS command logic to esp_do_dma()
>    esp.c: always use esp_do_dma() in pdma_cb()
>    esp.c: remove unused PDMA callback implementation
>    esp.c: rename data_in_ready to to data_ready
>    esp.c: separate logic based upon ESP command in esp_command_complete()
>    esp.c: separate logic based upon ESP command in esp_transfer_data()
>    esp.c: use deferred interrupts for both DATA IN and DATA OUT phases
>    esp.c: remove DATA IN phase logic when reading from FIFO
>    esp.c: zero command register when TI command terminates due to phase
>      change
>    esp.c: remove unneeded ti_cmd field
>    esp.c: don't raise INTR_BS interrupt in DATA IN phase until TI command
>      issued
>    esp.c: move non-DMA TI logic to separate esp_nodma_ti_dataout()
>      function
>    esp.c: process non-DMA FIFO writes in esp_do_nodma()
>    esp.c: replace get_cmd() with esp_do_nodma()
>    esp.c: move write_response() non-DMA logic to esp_do_nodma()
>    esp.c: consolidate end of command sequence after ICCS command
>    esp.c: ensure that STAT_INT is cleared when reading ESP_RINTR
>    esp.c: don't clear the SCSI phase when reading ESP_RINTR
>    esp.c: handle TC underflow for DMA SCSI requests
>    esp.c: remove restriction on FIFO read access when DMA memory routines
>      defined
>    esp.c: handle non-DMA FIFO writes used to terminate DMA commands
>    esp.c: improve ESP_RSEQ logic consolidation
>    esp.c: only transfer non-DMA COMMAND phase data for specific commands
>    esp.c: only transfer non-DMA MESSAGE OUT phase data for specific
>      commands
>    esp.c: consolidate DMA and PDMA logic in DATA OUT phase
>    esp.c: consolidate DMA and PDMA logic in DATA IN phase
>    esp.c: consolidate DMA and PDMA logic in MESSAGE OUT phase
>    esp.c: remove redundant n variable in PDMA COMMAND phase
>    esp.c: consolidate DMA and PDMA logic in STATUS and MESSAGE IN phases
>    esp.c: replace n variable with len in esp_do_nodma()
>    esp.c: implement DMA Transfer Pad command for DATA phases
>    esp.c: rename irq_data IRQ to drq_irq
>    esp.c: keep track of the DRQ state during DMA
>    esp.c: switch TypeInfo registration to use DEFINE_TYPES() macro
>    esp.c: add my copyright to the file
> 
>   hw/scsi/esp.c         | 1371 ++++++++++++++++++++++-------------------
>   hw/scsi/trace-events  |    1 +
>   include/hw/scsi/esp.h |   18 +-
>   3 files changed, 727 insertions(+), 663 deletions(-)

Any other thoughts on this series, particularly from SCSI maintainers? I appreciate 
that it's not the easiest one to review, and I'm grateful for the T-B tags received 
so far.


ATB,

Mark.
Thomas Huth Jan. 25, 2024, 8:53 a.m. UTC | #2
On 12/01/2024 13.52, Mark Cave-Ayland wrote:
> The ESP SCSI chip fundamentally consists of a FIFO for transferring data to/from
> the SCSI bus along with a command sequencer which automates various processes such
> as selection, message/command transfer and data transfer. What makes this chip
> particularly interesting is that the FIFO is also used as a buffer for DMA transfers
> which makes it possible to mix and match DMA and non-DMA transfers when sending and
> receiving data to/from the SCSI bus.
> 
> Whilst the current ESP emulation works well for most cases, there are several
> problems that I've encountered whilst trying to debug various guest issues:
> 
> 
> 1) Multiple code paths for non-DMA, DMA and pDMA
> 
> Consider a SCSI request being submitted by the guest: if it is a DMA SCSI
> request then it takes one path through get_cmd(), and a different path through
> get_cmd() for pDMA. This then leads to the DMA code path being called for DMA
> and the pDMA codepath being called for each FIFO write. Finally once you add
> in the non-DMA path you end up with a total of 5 potential codepaths for any
> given SCSI request.
> 
> 2) Manual handling of STAT_TC flag
> 
> According to the datasheet the STAT_TC flag in ESP_RSTAT is set when the DMA
> transfer counter reaches zero. The current code handles this manually in
> multiple places, including where it shouldn't be necessary.
> 
> 3) Deferred interrupts are used only for reads and not writes
> 
> The ESP emulation currently makes use of a deferred interrupt when submitting
> read commands i.e. the interrupt flags are not updated immediately but when
> the QEMU SCSI layer indicates it is ready. This works well for reads, but isn't
> implemented for writes which can cause problems when the host is heavily loaded
> or a large SCSI write is requested.
> 
> 4) Padding of pDMA TI transfers to a minimum of 16 bytes
> 
> In order to allow Classic MacOS to boot successfully there is a workaround that
> pads all pDMA TI transfers to a minimum of 16 bytes (see commit 7aa6baee7c
> "esp: add support for unaligned accesses"). This feature is not documented and it
> turns out that this is a symptom of incorrect handling of transfer
> overflow/underflow conditions.
> 
> 5) Duplication of some non-DMA logic in esp_reg_read() and esp_reg_write()
> 
> When reading and writing to the FIFO there is some duplication of logic from
> esp_do_nodma() in esp_reg_read() and esp_reg_write(). This should be eliminated
> in favour of a single state machine for handling non-DMA FIFO accesses in a
> single place.
> 
> 
> The series here reworks the ESP emulation to use a SCSI phase-based state machine
> with just two paths: one for DMA/pDMA requests, and another for non-DMA requests.
> The pDMA callbacks are completely removed so that there is now only a single path
> for DMA requests. As part of this work the manual STAT_TC handling is removed, and a
> couple of bugs in the SCSI phase selection are fixed to allow proper transfer
> underflow/overflow detection and recovery.
> 
> I've tested the series with all of my available ESP images and compatibility is
> greatly improved with these changes: I believe that this series should also fix a
> few GitLab issues including https://gitlab.com/qemu-project/qemu/-/issues/1831,
> https://gitlab.com/qemu-project/qemu/-/issues/611 and
> https://gitlab.com/qemu-project/qemu/-/issues/1127. In addition it allows Aurelien's
> really old SPARC Linux images to boot once again, and also the NeXTCube machine can
> now boot, load its kernel from disk and start to execute it.

FWIW, I did some sanity checking by installing an old Fedora 12 onto a SCSI 
disk attached to the am53c974 and dc390 controllers, and everything still 
works fine with your patches. Thus feel free to add:

Tested-by: Thomas Huth <thuth@redhat.com>