diff mbox series

hw/usb/hcd-ohci: Fix ohci_service_td: accept valid TDs

Message ID 20240520232634.317988-1-dmamfmgm@gmail.com
State New
Headers show
Series hw/usb/hcd-ohci: Fix ohci_service_td: accept valid TDs | expand

Commit Message

Cord Amfmgm May 20, 2024, 11:26 p.m. UTC
From: Cord Amfmgm <dmamfmgm@gmail.com>

This changes the way the ohci emulation handles a Transfer Descriptor with
"Current Buffer Pointer" set to "Buffer End" + 1.

The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more than td.be
to signal the buffer has zero length. Currently qemu only accepts zero-length
Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI hardware
accepts both cases.

The qemu ohci emulation has a regression in ohci_service_td. Version 4.2
and earlier matched the spec. (I haven't taken the time to bisect exactly
where the logic was changed.)

With a tiny OS[1] that boots and executes a test, the issue can be seen:

* OS that sends USB requests to a USB mass storage device
  but sends td.cbp = td.be + 1
* qemu 4.2
* qemu HEAD (4e66a0854)
* Actual OHCI controller (hardware)

Command line:
qemu-system-x86_64 -m 20 \
 -device pci-ohci,id=ohci \
 -drive if=none,format=raw,id=d,file=testmbr.raw \
 -device usb-storage,bus=ohci.0,drive=d \
 --trace "usb_*" --trace "ohci_*" -D qemu.log

Results are:

 qemu 4.2   | qemu HEAD  | actual HW
------------+------------+------------
 works fine | ohci_die() | works fine

Tip: if the flags "-serial pty -serial stdio" are added to the command line
the test will output USB requests like this:

Testing qemu HEAD:

> Free mem 2M ohci port2 conn FS
> setup { 80 6 0 1 0 0 8 0 }
> ED info=80000 { mps=8 en=0 d=0 } tail=c20920
>   td0 c20880 nxt=c20960 f2000000 setup cbp=c20900 be=c20907
>   td1 c20960 nxt=c20980 f3140000    in cbp=c20908 be=c2090f
>   td2 c20980 nxt=c20920 f3080000   out cbp=c20910 be=c2090f ohci20 host err
> usb stopped

And in qemu.log:

usb_ohci_iso_td_bad_cc_overrun ISO_TD start_offset=0x00c20910 > next_offset=0x00c2090f

Testing qemu 4.2:

> Free mem 2M ohci port2 conn FS
> setup { 80 6 0 1 0 0 8 0 }
> ED info=80000 { mps=8 en=0 d=0 } tail=620920
>   td0 620880 nxt=620960 f2000000 setup cbp=620900 be=620907       cbp=0 be=620907
>   td1 620960 nxt=620980 f3140000    in cbp=620908 be=62090f       cbp=0 be=62090f
>   td2 620980 nxt=620920 f3080000   out cbp=620910 be=62090f       cbp=0 be=62090f
>    rx { 12 1 0 2 0 0 0 8 }
> setup { 0 5 1 0 0 0 0 0 } tx {}
> ED info=80000 { mps=8 en=0 d=0 } tail=620880
>   td0 620920 nxt=620960 f2000000 setup cbp=620900 be=620907       cbp=0 be=620907
>   td1 620960 nxt=620880 f3100000    in cbp=620908 be=620907       cbp=0 be=620907
> setup { 80 6 0 1 0 0 12 0 }
> ED info=80001 { mps=8 en=0 d=1 } tail=620960
>   td0 620880 nxt=6209c0 f2000000 setup cbp=620920 be=620927       cbp=0 be=620927
>   td1 6209c0 nxt=6209e0 f3140000    in cbp=620928 be=620939       cbp=0 be=620939
>   td2 6209e0 nxt=620960 f3080000   out cbp=62093a be=620939       cbp=0 be=620939
>    rx { 12 1 0 2 0 0 0 8 f4 46 1 0 0 0 1 2 3 1 }
> setup { 80 6 0 2 0 0 0 1 }
> ED info=80001 { mps=8 en=0 d=1 } tail=620880
>   td0 620960 nxt=6209a0 f2000000 setup cbp=620a20 be=620a27       cbp=0 be=620a27
>   td1 6209a0 nxt=6209c0 f3140004    in cbp=620a28 be=620b27       cbp=620a48 be=620b27
>   td2 6209c0 nxt=620880 f3080000   out cbp=620b28 be=620b27       cbp=0 be=620b27
>    rx { 9 2 20 0 1 1 4 c0 0 9 4 0 0 2 8 6 50 0 7 5 81 2 40 0 0 7 5 2 2 40 0 0 }
> setup { 0 9 1 0 0 0 0 0 } tx {}
> ED info=80001 { mps=8 en=0 d=1 } tail=620900
>   td0 620880 nxt=620940 f2000000 setup cbp=620a00 be=620a07       cbp=0 be=620a07
>   td1 620940 nxt=620900 f3100000    in cbp=620a08 be=620a07       cbp=0 be=620a07

[1] The OS disk image has been emailed to philmd@linaro.org, mjt@tls.msk.ru,
and kraxel@redhat.com:

* testCbpOffBy1.img.xz
* sha256: f87baddcb86de845de12f002c698670a426affb40946025cc32694f9daa3abed

Signed-off-by: Cord Amfmgm <dmamfmgm@gmail.com>
---
 hw/usb/hcd-ohci.c   | 4 ++--
 hw/usb/trace-events | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

Comments

Peter Maydell May 30, 2024, 3:05 p.m. UTC | #1
On Tue, 21 May 2024 at 00:26, David Hubbard <dmamfmgm@gmail.com> wrote:
>
> From: Cord Amfmgm <dmamfmgm@gmail.com>
>
> This changes the way the ohci emulation handles a Transfer Descriptor with
> "Current Buffer Pointer" set to "Buffer End" + 1.
>
> The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more than td.be
> to signal the buffer has zero length. Currently qemu only accepts zero-length
> Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI hardware
> accepts both cases.
>
> The qemu ohci emulation has a regression in ohci_service_td. Version 4.2
> and earlier matched the spec. (I haven't taken the time to bisect exactly
> where the logic was changed.)

Almost certainly this was commit 1328fe0c32d54 ("hw: usb: hcd-ohci:
check len and frame_number variables"), which added these bounds
checks. Prior to that we did no bounds checking at all, which
meant that we permitted cbp=be+1 to mean a zero length, but also
that we permitted the guest to overrun host-side buffers by
specifying completely bogus cbp and be values. The timeframe is
more or less right (2020), at least.

-- PMM
Alex Bennée May 30, 2024, 7:14 p.m. UTC | #2
David Hubbard <dmamfmgm@gmail.com> writes:

> From: Cord Amfmgm <dmamfmgm@gmail.com>
>
> This changes the way the ohci emulation handles a Transfer Descriptor with
> "Current Buffer Pointer" set to "Buffer End" + 1.
>
> The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more than td.be
> to signal the buffer has zero length. Currently qemu only accepts zero-length
> Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI hardware
> accepts both cases.

Which version of the OHCI spec is this? I can't find it in the one copy
Google throws up:

  http://download.microsoft.com/download/1/6/1/161ba512-40e2-4cc9-843a-923143f3456c/ohci_11.pdf

> The qemu ohci emulation has a regression in ohci_service_td. Version 4.2
> and earlier matched the spec. (I haven't taken the time to bisect exactly
> where the logic was changed.)
>
> With a tiny OS[1] that boots and executes a test, the issue can be seen:
>
> * OS that sends USB requests to a USB mass storage device
>   but sends td.cbp = td.be + 1
> * qemu 4.2
> * qemu HEAD (4e66a0854)
> * Actual OHCI controller (hardware)
<snip>
Cord Amfmgm May 30, 2024, 9:14 p.m. UTC | #3
On Thu, May 30, 2024 at 2:14 PM Alex Bennée <alex.bennee@linaro.org> wrote:

> David Hubbard <dmamfmgm@gmail.com> writes:
>
> > From: Cord Amfmgm <dmamfmgm@gmail.com>
> >
> > This changes the way the ohci emulation handles a Transfer Descriptor
> with
> > "Current Buffer Pointer" set to "Buffer End" + 1.
> >
> > The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more than
> td.be
> > to signal the buffer has zero length. Currently qemu only accepts
> zero-length
> > Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI
> hardware
> > accepts both cases.
>
> Which version of the OHCI spec is this? I can't find it in the one copy
> Google throws up:
>
>
> http://download.microsoft.com/download/1/6/1/161ba512-40e2-4cc9-843a-923143f3456c/ohci_11.pdf
>
>
Replace http with https in that URL and it downloads the PDF OK - it is for
IEEE-1394 Firewire though.

Try this link: https://www.cs.usfca.edu/~cruse/cs698s10/hcir1_0a.pdf - I am
on page 35/160 (the page is numbered "21" on the bottom) for the Table 4-2.
Alex Bennée May 31, 2024, 1:59 p.m. UTC | #4
David Hubbard <dmamfmgm@gmail.com> writes:

> From: Cord Amfmgm <dmamfmgm@gmail.com>
>
> This changes the way the ohci emulation handles a Transfer Descriptor with
> "Current Buffer Pointer" set to "Buffer End" + 1.
>
> The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more than td.be
> to signal the buffer has zero length. Currently qemu only accepts zero-length
> Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI hardware
> accepts both cases.
>
> The qemu ohci emulation has a regression in ohci_service_td. Version 4.2
> and earlier matched the spec. (I haven't taken the time to bisect exactly
> where the logic was changed.)

I find it hard to characterise this as a regression because we've
basically gone from no checks to actually doing bounds checks:

  1328fe0c32 (hw: usb: hcd-ohci: check len and frame_number variables)

The argument here seems to be that real hardware is laxer than the specs
in what it accepts.

> With a tiny OS[1] that boots and executes a test, the issue can be seen:
>
> * OS that sends USB requests to a USB mass storage device
>   but sends td.cbp = td.be + 1
> * qemu 4.2
> * qemu HEAD (4e66a0854)
> * Actual OHCI controller (hardware)
>
> Command line:
> qemu-system-x86_64 -m 20 \
>  -device pci-ohci,id=ohci \
>  -drive if=none,format=raw,id=d,file=testmbr.raw \
>  -device usb-storage,bus=ohci.0,drive=d \
>  --trace "usb_*" --trace "ohci_*" -D qemu.log
>
> Results are:
>
>  qemu 4.2   | qemu HEAD  | actual HW
> ------------+------------+------------
>  works fine | ohci_die() | works fine
>
> Tip: if the flags "-serial pty -serial stdio" are added to the command line
> the test will output USB requests like this:
>
> Testing qemu HEAD:
>
>> Free mem 2M ohci port2 conn FS
>> setup { 80 6 0 1 0 0 8 0 }
>> ED info=80000 { mps=8 en=0 d=0 } tail=c20920
>>   td0 c20880 nxt=c20960 f2000000 setup cbp=c20900 be=c20907
>>   td1 c20960 nxt=c20980 f3140000    in cbp=c20908 be=c2090f
>>   td2 c20980 nxt=c20920 f3080000   out cbp=c20910 be=c2090f ohci20 host err
>> usb stopped
>
> And in qemu.log:
>
> usb_ohci_iso_td_bad_cc_overrun ISO_TD start_offset=0x00c20910 > next_offset=0x00c2090f
>
> Testing qemu 4.2:
>
>> Free mem 2M ohci port2 conn FS
>> setup { 80 6 0 1 0 0 8 0 }
>> ED info=80000 { mps=8 en=0 d=0 } tail=620920
>>   td0 620880 nxt=620960 f2000000 setup cbp=620900 be=620907       cbp=0 be=620907
>>   td1 620960 nxt=620980 f3140000    in cbp=620908 be=62090f       cbp=0 be=62090f
>>   td2 620980 nxt=620920 f3080000   out cbp=620910 be=62090f       cbp=0 be=62090f
>>    rx { 12 1 0 2 0 0 0 8 }
>> setup { 0 5 1 0 0 0 0 0 } tx {}
>> ED info=80000 { mps=8 en=0 d=0 } tail=620880
>>   td0 620920 nxt=620960 f2000000 setup cbp=620900 be=620907       cbp=0 be=620907
>>   td1 620960 nxt=620880 f3100000    in cbp=620908 be=620907       cbp=0 be=620907
>> setup { 80 6 0 1 0 0 12 0 }
>> ED info=80001 { mps=8 en=0 d=1 } tail=620960
>>   td0 620880 nxt=6209c0 f2000000 setup cbp=620920 be=620927       cbp=0 be=620927
>>   td1 6209c0 nxt=6209e0 f3140000    in cbp=620928 be=620939       cbp=0 be=620939
>>   td2 6209e0 nxt=620960 f3080000   out cbp=62093a be=620939       cbp=0 be=620939
>>    rx { 12 1 0 2 0 0 0 8 f4 46 1 0 0 0 1 2 3 1 }
>> setup { 80 6 0 2 0 0 0 1 }
>> ED info=80001 { mps=8 en=0 d=1 } tail=620880
>>   td0 620960 nxt=6209a0 f2000000 setup cbp=620a20 be=620a27       cbp=0 be=620a27
>>   td1 6209a0 nxt=6209c0 f3140004    in cbp=620a28 be=620b27       cbp=620a48 be=620b27
>>   td2 6209c0 nxt=620880 f3080000   out cbp=620b28 be=620b27       cbp=0 be=620b27
>>    rx { 9 2 20 0 1 1 4 c0 0 9 4 0 0 2 8 6 50 0 7 5 81 2 40 0 0 7 5 2 2 40 0 0 }
>> setup { 0 9 1 0 0 0 0 0 } tx {}
>> ED info=80001 { mps=8 en=0 d=1 } tail=620900
>>   td0 620880 nxt=620940 f2000000 setup cbp=620a00 be=620a07       cbp=0 be=620a07
>>   td1 620940 nxt=620900 f3100000    in cbp=620a08 be=620a07       cbp=0 be=620a07
>
> [1] The OS disk image has been emailed to philmd@linaro.org, mjt@tls.msk.ru,
> and kraxel@redhat.com:
>
> * testCbpOffBy1.img.xz
> * sha256: f87baddcb86de845de12f002c698670a426affb40946025cc32694f9daa3abed
>
> Signed-off-by: Cord Amfmgm <dmamfmgm@gmail.com>
> ---
>  hw/usb/hcd-ohci.c   | 4 ++--
>  hw/usb/trace-events | 1 +
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
> index acd6016980..71b54914d3 100644
> --- a/hw/usb/hcd-ohci.c
> +++ b/hw/usb/hcd-ohci.c
> @@ -941,8 +941,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
>          if ((td.cbp & 0xfffff000) != (td.be & 0xfffff000)) {
>              len = (td.be & 0xfff) + 0x1001 - (td.cbp & 0xfff);
>          } else {
> -            if (td.cbp > td.be) {
> -                trace_usb_ohci_iso_td_bad_cc_overrun(td.cbp, td.be);
> +            if (td.cbp - 1 > td.be) {  /* rely on td.cbp != 0 */
> +                trace_usb_ohci_td_bad_buf(td.cbp, td.be);
>                  ohci_die(ohci);
>                  return 1;
>              }

With the updated commit message:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Peter Maydell May 31, 2024, 2:03 p.m. UTC | #5
On Tue, 21 May 2024 at 00:26, David Hubbard <dmamfmgm@gmail.com> wrote:
>
> From: Cord Amfmgm <dmamfmgm@gmail.com>
>
> This changes the way the ohci emulation handles a Transfer Descriptor with
> "Current Buffer Pointer" set to "Buffer End" + 1.
>
> The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more than td.be
> to signal the buffer has zero length. Currently qemu only accepts zero-length
> Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI hardware
> accepts both cases.
>
> The qemu ohci emulation has a regression in ohci_service_td. Version 4.2
> and earlier matched the spec. (I haven't taken the time to bisect exactly
> where the logic was changed.)
>
> With a tiny OS[1] that boots and executes a test, the issue can be seen:
>
> * OS that sends USB requests to a USB mass storage device
>   but sends td.cbp = td.be + 1
> * qemu 4.2
> * qemu HEAD (4e66a0854)
> * Actual OHCI controller (hardware)
>
> Command line:
> qemu-system-x86_64 -m 20 \
>  -device pci-ohci,id=ohci \
>  -drive if=none,format=raw,id=d,file=testmbr.raw \
>  -device usb-storage,bus=ohci.0,drive=d \
>  --trace "usb_*" --trace "ohci_*" -D qemu.log
>
> Results are:
>
>  qemu 4.2   | qemu HEAD  | actual HW
> ------------+------------+------------
>  works fine | ohci_die() | works fine
>
> Tip: if the flags "-serial pty -serial stdio" are added to the command line
> the test will output USB requests like this:
>
> Testing qemu HEAD:
>
> > Free mem 2M ohci port2 conn FS
> > setup { 80 6 0 1 0 0 8 0 }
> > ED info=80000 { mps=8 en=0 d=0 } tail=c20920
> >   td0 c20880 nxt=c20960 f2000000 setup cbp=c20900 be=c20907
> >   td1 c20960 nxt=c20980 f3140000    in cbp=c20908 be=c2090f
> >   td2 c20980 nxt=c20920 f3080000   out cbp=c20910 be=c2090f ohci20 host err
> > usb stopped
>
> And in qemu.log:
>
> usb_ohci_iso_td_bad_cc_overrun ISO_TD start_offset=0x00c20910 > next_offset=0x00c2090f
>
> Testing qemu 4.2:
>
> > Free mem 2M ohci port2 conn FS
> > setup { 80 6 0 1 0 0 8 0 }
> > ED info=80000 { mps=8 en=0 d=0 } tail=620920
> >   td0 620880 nxt=620960 f2000000 setup cbp=620900 be=620907       cbp=0 be=620907
> >   td1 620960 nxt=620980 f3140000    in cbp=620908 be=62090f       cbp=0 be=62090f
> >   td2 620980 nxt=620920 f3080000   out cbp=620910 be=62090f       cbp=0 be=62090f
> >    rx { 12 1 0 2 0 0 0 8 }
> > setup { 0 5 1 0 0 0 0 0 } tx {}
> > ED info=80000 { mps=8 en=0 d=0 } tail=620880
> >   td0 620920 nxt=620960 f2000000 setup cbp=620900 be=620907       cbp=0 be=620907
> >   td1 620960 nxt=620880 f3100000    in cbp=620908 be=620907       cbp=0 be=620907
> > setup { 80 6 0 1 0 0 12 0 }
> > ED info=80001 { mps=8 en=0 d=1 } tail=620960
> >   td0 620880 nxt=6209c0 f2000000 setup cbp=620920 be=620927       cbp=0 be=620927
> >   td1 6209c0 nxt=6209e0 f3140000    in cbp=620928 be=620939       cbp=0 be=620939
> >   td2 6209e0 nxt=620960 f3080000   out cbp=62093a be=620939       cbp=0 be=620939
> >    rx { 12 1 0 2 0 0 0 8 f4 46 1 0 0 0 1 2 3 1 }
> > setup { 80 6 0 2 0 0 0 1 }
> > ED info=80001 { mps=8 en=0 d=1 } tail=620880
> >   td0 620960 nxt=6209a0 f2000000 setup cbp=620a20 be=620a27       cbp=0 be=620a27
> >   td1 6209a0 nxt=6209c0 f3140004    in cbp=620a28 be=620b27       cbp=620a48 be=620b27
> >   td2 6209c0 nxt=620880 f3080000   out cbp=620b28 be=620b27       cbp=0 be=620b27
> >    rx { 9 2 20 0 1 1 4 c0 0 9 4 0 0 2 8 6 50 0 7 5 81 2 40 0 0 7 5 2 2 40 0 0 }
> > setup { 0 9 1 0 0 0 0 0 } tx {}
> > ED info=80001 { mps=8 en=0 d=1 } tail=620900
> >   td0 620880 nxt=620940 f2000000 setup cbp=620a00 be=620a07       cbp=0 be=620a07
> >   td1 620940 nxt=620900 f3100000    in cbp=620a08 be=620a07       cbp=0 be=620a07
>
> [1] The OS disk image has been emailed to philmd@linaro.org, mjt@tls.msk.ru,
> and kraxel@redhat.com:
>
> * testCbpOffBy1.img.xz
> * sha256: f87baddcb86de845de12f002c698670a426affb40946025cc32694f9daa3abed
>
> Signed-off-by: Cord Amfmgm <dmamfmgm@gmail.com>
> ---
>  hw/usb/hcd-ohci.c   | 4 ++--
>  hw/usb/trace-events | 1 +
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
> index acd6016980..71b54914d3 100644
> --- a/hw/usb/hcd-ohci.c
> +++ b/hw/usb/hcd-ohci.c
> @@ -941,8 +941,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
>          if ((td.cbp & 0xfffff000) != (td.be & 0xfffff000)) {
>              len = (td.be & 0xfff) + 0x1001 - (td.cbp & 0xfff);
>          } else {
> -            if (td.cbp > td.be) {
> -                trace_usb_ohci_iso_td_bad_cc_overrun(td.cbp, td.be);
> +            if (td.cbp - 1 > td.be) {  /* rely on td.cbp != 0 */
> +                trace_usb_ohci_td_bad_buf(td.cbp, td.be);
>                  ohci_die(ohci);
>                  return 1;
>              }

This patch seems to me to do what the commit message sets out to
do, and it looks unlikely to have any unintended side effects
because it turns a "USB controller flags an error" case into
a "treat as zero length packet" case, and I have trouble
imagining that any guest could be relying on looking for the
controller error. On that basis I'm inclined to accept it.

What I would like to see is what we could classify under
"rationale", which is to say "what prompted us to make this
change?". In my experience it's important to record this
(including in the commit message). There are of course
many cases in QEMU's git history where we failed to do that,
but in general I think it's a good standard to meet. (I
am also erring on the side of caution in reviewing this
particular patch, because I don't know the relevant standards
or this bit of the code very well.)

Why do we care about the motivation for a patch?

(1) In the present: it helps to raise confidence that the
proposed new behaviour is right. This is good because QEMU's
test suite is far from comprehensive, so in some sense any
change to the codebase is a risk.

For instance, if a change is being made because the QNX demo
floppy doesn't run, then the fact that the change fixes that
failure-to-run indicates that our interpretation of the
meaning of the standard, or of what should happen in the
grey areas that the documentation doesn't clearly describe,
matches what the QNX driver author (an unrelated third party)
thought and probably also what a lot of in-the-field hardware
does (since QNX was no doubt tested on a lot of different PCs
back in the day).

On the other hand, if a change is proposed because it fixes
booting with older Linux kernels prior to commit XYZ, and
kernel commit XYZ turns out to be "make this device driver
program the hardware according to the specification rather
than relying on an accident of timing", then we might want
to look at where we want to be in the tradeoff of "run older
kernels" versus "put workaround for a guest software issue
into QEMU". (Workarounds for guest software bugs are something
I'm very reluctant to put into QEMU, because my experience
is that once they're in the codebase we can essentially never
remove them, because we don't know what guest code might
be relying on them. But sometimes they're a necessary evil.)

(2) In the future: if in a year's time or more, somebody
reports that a particular commit has regressed some specific
guest workload they have, knowing why we made the change in
the first place is really useful in investigating the
regression.

If we need to change code that was initially added to solve
a problem when running FreeBSD, we know we need to re-test
with FreeBSD.

If the change went in to fix a buffer overrun, we know we
need to be careful and cross-check that we don't reintroduce
the overrun in the course of fixing a regression.

If a change is one that we made on the grounds of "reading
the spec and the code, this looked like it was clearly wrong,
but we don't have a definite repro case of it breaking a guest"
then that might put "revert the change, we were mistaken" on
the table as a response to a future regression report.
And so on.

thanks
-- PMM
Cord Amfmgm May 31, 2024, 6:16 p.m. UTC | #6
On Fri, May 31, 2024 at 9:03 AM Peter Maydell <peter.maydell@linaro.org>
wrote:

> On Tue, 21 May 2024 at 00:26, David Hubbard <dmamfmgm@gmail.com> wrote:
> >
> > From: Cord Amfmgm <dmamfmgm@gmail.com>
> >
> > This changes the way the ohci emulation handles a Transfer Descriptor
> with
> > "Current Buffer Pointer" set to "Buffer End" + 1.
> >
> > The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more than
> td.be
> > to signal the buffer has zero length. Currently qemu only accepts
> zero-length
> > Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI
> hardware
> > accepts both cases.
> >
> > The qemu ohci emulation has a regression in ohci_service_td. Version 4.2
> > and earlier matched the spec. (I haven't taken the time to bisect exactly
> > where the logic was changed.)
> >
> > With a tiny OS[1] that boots and executes a test, the issue can be seen:
> >
> > * OS that sends USB requests to a USB mass storage device
> >   but sends td.cbp = td.be + 1
> > * qemu 4.2
> > * qemu HEAD (4e66a0854)
> > * Actual OHCI controller (hardware)
> >
> > Command line:
> > qemu-system-x86_64 -m 20 \
> >  -device pci-ohci,id=ohci \
> >  -drive if=none,format=raw,id=d,file=testmbr.raw \
> >  -device usb-storage,bus=ohci.0,drive=d \
> >  --trace "usb_*" --trace "ohci_*" -D qemu.log
> >
> > Results are:
> >
> >  qemu 4.2   | qemu HEAD  | actual HW
> > ------------+------------+------------
> >  works fine | ohci_die() | works fine
> >
> > Tip: if the flags "-serial pty -serial stdio" are added to the command
> line
> > the test will output USB requests like this:
> >
> > Testing qemu HEAD:
> >
> > > Free mem 2M ohci port2 conn FS
> > > setup { 80 6 0 1 0 0 8 0 }
> > > ED info=80000 { mps=8 en=0 d=0 } tail=c20920
> > >   td0 c20880 nxt=c20960 f2000000 setup cbp=c20900 be=c20907
> > >   td1 c20960 nxt=c20980 f3140000    in cbp=c20908 be=c2090f
> > >   td2 c20980 nxt=c20920 f3080000   out cbp=c20910 be=c2090f ohci20
> host err
> > > usb stopped
> >
> > And in qemu.log:
> >
> > usb_ohci_iso_td_bad_cc_overrun ISO_TD start_offset=0x00c20910 >
> next_offset=0x00c2090f
> >
> > Testing qemu 4.2:
> >
> > > Free mem 2M ohci port2 conn FS
> > > setup { 80 6 0 1 0 0 8 0 }
> > > ED info=80000 { mps=8 en=0 d=0 } tail=620920
> > >   td0 620880 nxt=620960 f2000000 setup cbp=620900 be=620907
>  cbp=0 be=620907
> > >   td1 620960 nxt=620980 f3140000    in cbp=620908 be=62090f
>  cbp=0 be=62090f
> > >   td2 620980 nxt=620920 f3080000   out cbp=620910 be=62090f
>  cbp=0 be=62090f
> > >    rx { 12 1 0 2 0 0 0 8 }
> > > setup { 0 5 1 0 0 0 0 0 } tx {}
> > > ED info=80000 { mps=8 en=0 d=0 } tail=620880
> > >   td0 620920 nxt=620960 f2000000 setup cbp=620900 be=620907
>  cbp=0 be=620907
> > >   td1 620960 nxt=620880 f3100000    in cbp=620908 be=620907
>  cbp=0 be=620907
> > > setup { 80 6 0 1 0 0 12 0 }
> > > ED info=80001 { mps=8 en=0 d=1 } tail=620960
> > >   td0 620880 nxt=6209c0 f2000000 setup cbp=620920 be=620927
>  cbp=0 be=620927
> > >   td1 6209c0 nxt=6209e0 f3140000    in cbp=620928 be=620939
>  cbp=0 be=620939
> > >   td2 6209e0 nxt=620960 f3080000   out cbp=62093a be=620939
>  cbp=0 be=620939
> > >    rx { 12 1 0 2 0 0 0 8 f4 46 1 0 0 0 1 2 3 1 }
> > > setup { 80 6 0 2 0 0 0 1 }
> > > ED info=80001 { mps=8 en=0 d=1 } tail=620880
> > >   td0 620960 nxt=6209a0 f2000000 setup cbp=620a20 be=620a27
>  cbp=0 be=620a27
> > >   td1 6209a0 nxt=6209c0 f3140004    in cbp=620a28 be=620b27
>  cbp=620a48 be=620b27
> > >   td2 6209c0 nxt=620880 f3080000   out cbp=620b28 be=620b27
>  cbp=0 be=620b27
> > >    rx { 9 2 20 0 1 1 4 c0 0 9 4 0 0 2 8 6 50 0 7 5 81 2 40 0 0 7 5 2 2
> 40 0 0 }
> > > setup { 0 9 1 0 0 0 0 0 } tx {}
> > > ED info=80001 { mps=8 en=0 d=1 } tail=620900
> > >   td0 620880 nxt=620940 f2000000 setup cbp=620a00 be=620a07
>  cbp=0 be=620a07
> > >   td1 620940 nxt=620900 f3100000    in cbp=620a08 be=620a07
>  cbp=0 be=620a07
> >
> > [1] The OS disk image has been emailed to philmd@linaro.org,
> mjt@tls.msk.ru,
> > and kraxel@redhat.com:
> >
> > * testCbpOffBy1.img.xz
> > * sha256:
> f87baddcb86de845de12f002c698670a426affb40946025cc32694f9daa3abed
> >
> > Signed-off-by: Cord Amfmgm <dmamfmgm@gmail.com>
> > ---
> >  hw/usb/hcd-ohci.c   | 4 ++--
> >  hw/usb/trace-events | 1 +
> >  2 files changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
> > index acd6016980..71b54914d3 100644
> > --- a/hw/usb/hcd-ohci.c
> > +++ b/hw/usb/hcd-ohci.c
> > @@ -941,8 +941,8 @@ static int ohci_service_td(OHCIState *ohci, struct
> ohci_ed *ed)
> >          if ((td.cbp & 0xfffff000) != (td.be & 0xfffff000)) {
> >              len = (td.be & 0xfff) + 0x1001 - (td.cbp & 0xfff);
> >          } else {
> > -            if (td.cbp > td.be) {
> > -                trace_usb_ohci_iso_td_bad_cc_overrun(td.cbp, td.be);
> > +            if (td.cbp - 1 > td.be) {  /* rely on td.cbp != 0 */
> > +                trace_usb_ohci_td_bad_buf(td.cbp, td.be);
> >                  ohci_die(ohci);
> >                  return 1;
> >              }
>
> This patch seems to me to do what the commit message sets out to
> do, and it looks unlikely to have any unintended side effects
> because it turns a "USB controller flags an error" case into
> a "treat as zero length packet" case, and I have trouble
> imagining that any guest could be relying on looking for the
> controller error. On that basis I'm inclined to accept it.
>
> What I would like to see is what we could classify under
> "rationale", which is to say "what prompted us to make this
> change?". In my experience it's important to record this
> (including in the commit message). There are of course
> many cases in QEMU's git history where we failed to do that,
> but in general I think it's a good standard to meet. (I
> am also erring on the side of caution in reviewing this
> particular patch, because I don't know the relevant standards
> or this bit of the code very well.)
>
> Why do we care about the motivation for a patch?
>
> (1) In the present: it helps to raise confidence that the
> proposed new behaviour is right. This is good because QEMU's
> test suite is far from comprehensive, so in some sense any
> change to the codebase is a risk.
>
> For instance, if a change is being made because the QNX demo
> floppy doesn't run, then the fact that the change fixes that
> failure-to-run indicates that our interpretation of the
> meaning of the standard, or of what should happen in the
> grey areas that the documentation doesn't clearly describe,
> matches what the QNX driver author (an unrelated third party)
> thought and probably also what a lot of in-the-field hardware
> does (since QNX was no doubt tested on a lot of different PCs
> back in the day).
>
> On the other hand, if a change is proposed because it fixes
> booting with older Linux kernels prior to commit XYZ, and
> kernel commit XYZ turns out to be "make this device driver
> program the hardware according to the specification rather
> than relying on an accident of timing", then we might want
> to look at where we want to be in the tradeoff of "run older
> kernels" versus "put workaround for a guest software issue
> into QEMU". (Workarounds for guest software bugs are something
> I'm very reluctant to put into QEMU, because my experience
> is that once they're in the codebase we can essentially never
> remove them, because we don't know what guest code might
> be relying on them. But sometimes they're a necessary evil.)
>
> (2) In the future: if in a year's time or more, somebody
> reports that a particular commit has regressed some specific
> guest workload they have, knowing why we made the change in
> the first place is really useful in investigating the
> regression.
>
> If we need to change code that was initially added to solve
> a problem when running FreeBSD, we know we need to re-test
> with FreeBSD.
>
> If the change went in to fix a buffer overrun, we know we
> need to be careful and cross-check that we don't reintroduce
> the overrun in the course of fixing a regression.
>
> If a change is one that we made on the grounds of "reading
> the spec and the code, this looked like it was clearly wrong,
> but we don't have a definite repro case of it breaking a guest"
> then that might put "revert the change, we were mistaken" on
> the table as a response to a future regression report.
> And so on.
>
> thanks
> -- PMM
>

Thanks, in responding to that, I'm breaking down my responses into the
following answers:

Q1: (summarizing) What prompted us to make this change?

A1: I'm summarizing what I wrote in previous emails and the commit message -

* Bring qemu closer to actual hw with a neatly packaged test case to
demonstrate the behavior
* I interpret the spec (Table 4-2) as saying the "be = cbp - 1" is valid,
in addition to setting "cbp = 0"
* I interpret it that way due to a comment in an old linux kernel version
in the 2.x range, ohci-hcd.c file. It said (paraphrasing) some misbehaving
ohci controllers would fetch physical memory at 0 when cbp = 0 was in the
Transfer Descriptor

Q2: (summarizing) is the proposed new behaviour right?

A2: Like what Peter Maydell wrote, I believe this turns a "USB controller
flags an error" case into "USB controller treats it as a zero length
packet" case.

I came across this corner case as a result of that comment in an old 2.x
linux kernel. I am not an expert on computer history and old OSes that
might want this behavior.

Q3: (summarizing, take 1) if this patch is called up later and folks are
wondering if there are older OSes that need to be tested?

A3: I do not know of computer history and old OSes where this change is
needed.

qemu up until commit 1328fe0c32d54 ("hw: usb: hcd-ohci: check len and
frame_number variables") allowed this behavior, so it is possible that
older committers wanted "be = cbp - 1" to signify a zero length packet, or
knew of some guest OS that relied on this behavior.

I think there is a non trivial chance some guest OS would rely on this
behavior because:

* using code like "be = cbp + len - 1" is *required* in every Transfer
Descriptor
* ohci Transfer Descriptors can be fragmented by the controller (e.g. if
the Info word sets Max Packet Size = 8)
* ohci Transfer Descriptors have physical memory 4K-page boundary
requirements
* ohci Transfer Descriptors for a zero-length packet are required to
complete certain USB1 transfers (such as the status transfer after an OUT,
or the empty packet during an IN)

in other words, there are just enough simultaneous requirements that a
guest OS might choose to lay out a string of Transfer Descriptors where the
guest OS uses this "be = cbp - 1" for a zero-length transfer.

But I only have a test case I created myself, and am not an expert on
computer history here. I think "be liberal in what you accept, by
conservative in what you send" applies when I don't know which historical
OSes, if any, need this behavior. I think the behavior of actual hardware
weighs more heavily since that *is* available and testable. Are there
additional checks that would expand on what's known about actual ohci hw?

Q3: (summarizing, take 2) if this patch is called up later and folks are
wondering whether it is correct?

A3: In my understanding of the discussion, this thread contains the most
significant discussion of the spec, the implementation and all the details.

In typical internet fashion, we're probably writing what future folks will
look at to answer that question.

In the future, actual hardware will become harder and harder to obtain.

It is not inconceivable that qemu will become the primary source of truth
for "what was ohci behavior?" to future folks.

In my understanding, behaving like actual hardware is a great way to
support the future folks looking for such info.
Peter Maydell June 7, 2024, 1:23 p.m. UTC | #7
On Fri, 31 May 2024 at 19:16, Cord Amfmgm <dmamfmgm@gmail.com> wrote:
> On Fri, May 31, 2024 at 9:03 AM Peter Maydell <peter.maydell@linaro.org> wrote:
>> What I would like to see is what we could classify under
>> "rationale", which is to say "what prompted us to make this
>> change?". In my experience it's important to record this
>> (including in the commit message). There are of course
>> many cases in QEMU's git history where we failed to do that,
>> but in general I think it's a good standard to meet. (I
>> am also erring on the side of caution in reviewing this
>> particular patch, because I don't know the relevant standards
>> or this bit of the code very well.)

> Thanks, in responding to that, I'm breaking down my responses into the following answers:
>
> Q1: (summarizing) What prompted us to make this change?
>
> A1: I'm summarizing what I wrote in previous emails and the commit message -
>
> * Bring qemu closer to actual hw with a neatly packaged test case to demonstrate the behavior
> * I interpret the spec (Table 4-2) as saying the "be = cbp - 1" is valid, in addition to setting "cbp = 0"
> * I interpret it that way due to a comment in an old linux kernel version in the 2.x range, ohci-hcd.c file. It said (paraphrasing) some misbehaving ohci controllers would fetch physical memory at 0 when cbp = 0 was in the Transfer Descriptor

Interesting. Do you have a more specific version for that?
I had a look at various 2.x Linux OHCI drivers but they all seem to do
zero-length packets "by the spec" with CBP=BE=0. eg 2.6.39.4:
https://elixir.bootlin.com/linux/v2.6.39.4/source/drivers/usb/host/ohci-q.c#L539
and there's no hardware-quirk handling to do it differently on
some controllers. (The USB OHCI driver seems to have gone through
a couple of rewrites in the 2.x kernel timeframe; the earlier 2.3.51
version does the TD fill here:
https://elixir.bootlin.com/linux/2.3.51/source/drivers/usb/usb-ohci.c#L812
and again it handles zero length as BE=CBP=0.)

> But I only have a test case I created myself, and am not an expert on computer history here. I think "be liberal in what you accept, by conservative in what you send" applies when I don't know which historical OSes, if any, need this behavior. I think the behavior of actual hardware weighs more heavily since that *is* available and testable. Are there additional checks that would expand on what's known about actual ohci hw?

The other side of the argument is "if in doubt and you don't
know of any concrete problem being caused, don't change
working code". If there are any historical OSes that rely on
being able to send zero packets with be=cbp-1 for some nonzero
be, and anybody wants to run them on QEMU, then presumably they
can send us a bug report saying "XYZ's USB support doesn't work".
That nobody has ever does this is evidence on the side of
"there is no such OS out there, everybody writing an OHCI driver
read the spec and made their driver send zero length packets the
way the spec very clearly says you should". Please correct me
if I'm wrong, but my interpretation of your helpful explanation
above is that this is essentially a theoretical problem rather
than one that's caused you a problem you need to fix.

thanks
-- PMM
Cord Amfmgm June 8, 2024, 3:19 p.m. UTC | #8
On Fri, Jun 7, 2024 at 8:23 AM Peter Maydell <peter.maydell@linaro.org>
wrote:

> On Fri, 31 May 2024 at 19:16, Cord Amfmgm <dmamfmgm@gmail.com> wrote:
> > On Fri, May 31, 2024 at 9:03 AM Peter Maydell <peter.maydell@linaro.org>
> wrote:
> >> What I would like to see is what we could classify under
> >> "rationale", which is to say "what prompted us to make this
> >> change?". In my experience it's important to record this
> >> (including in the commit message). There are of course
> >> many cases in QEMU's git history where we failed to do that,
> >> but in general I think it's a good standard to meet. (I
> >> am also erring on the side of caution in reviewing this
> >> particular patch, because I don't know the relevant standards
> >> or this bit of the code very well.)
>
> > Thanks, in responding to that, I'm breaking down my responses into the
> following answers:
> >
> > Q1: (summarizing) What prompted us to make this change?
> >
> > A1: I'm summarizing what I wrote in previous emails and the commit
> message -
> >
> > * Bring qemu closer to actual hw with a neatly packaged test case to
> demonstrate the behavior
> > * I interpret the spec (Table 4-2) as saying the "be = cbp - 1" is
> valid, in addition to setting "cbp = 0"
> > * I interpret it that way due to a comment in an old linux kernel
> version in the 2.x range, ohci-hcd.c file. It said (paraphrasing) some
> misbehaving ohci controllers would fetch physical memory at 0 when cbp = 0
> was in the Transfer Descriptor
>
> Interesting. Do you have a more specific version for that?
> I had a look at various 2.x Linux OHCI drivers but they all seem to do
> zero-length packets "by the spec" with CBP=BE=0. eg 2.6.39.4:
>
> https://elixir.bootlin.com/linux/v2.6.39.4/source/drivers/usb/host/ohci-q.c#L539
> and there's no hardware-quirk handling to do it differently on
> some controllers. (The USB OHCI driver seems to have gone through
> a couple of rewrites in the 2.x kernel timeframe; the earlier 2.3.51
> version does the TD fill here:
> https://elixir.bootlin.com/linux/2.3.51/source/drivers/usb/usb-ohci.c#L812
> and again it handles zero length as BE=CBP=0.)
>
> > But I only have a test case I created myself, and am not an expert on
> computer history here. I think "be liberal in what you accept, by
> conservative in what you send" applies when I don't know which historical
> OSes, if any, need this behavior. I think the behavior of actual hardware
> weighs more heavily since that *is* available and testable. Are there
> additional checks that would expand on what's known about actual ohci hw?
>
> The other side of the argument is "if in doubt and you don't
> know of any concrete problem being caused, don't change
> working code". If there are any historical OSes that rely on
> being able to send zero packets with be=cbp-1 for some nonzero
> be, and anybody wants to run them on QEMU, then presumably they
> can send us a bug report saying "XYZ's USB support doesn't work".
> That nobody has ever does this is evidence on the side of
> "there is no such OS out there, everybody writing an OHCI driver
> read the spec and made their driver send zero length packets the
> way the spec very clearly says you should". Please correct me
> if I'm wrong, but my interpretation of your helpful explanation
> above is that this is essentially a theoretical problem rather
> than one that's caused you a problem you need to fix.
>

 I think that's fair.

I sent in the patch more out of a desire for qemu to have the greatest
possible ohci implementation, than due to knowledge of an actual OS that
couldn't work.

Up to you what to do from here.
Cord Amfmgm June 12, 2024, 5:17 p.m. UTC | #9
On Wed, Jun 12, 2024 at 9:21 AM Alex Bennée <alex.bennee@linaro.org> wrote:

> David Hubbard <dmamfmgm@gmail.com> writes:
>
> > From: Cord Amfmgm <dmamfmgm@gmail.com>
> >
> > This changes the way the ohci emulation handles a Transfer Descriptor
> with
> > "Current Buffer Pointer" set to "Buffer End" + 1.
> >
> > The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more than
> td.be
> > to signal the buffer has zero length. Currently qemu only accepts
> zero-length
> > Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI
> hardware
> > accepts both cases.
> >
> > The qemu ohci emulation has a regression in ohci_service_td. Version 4.2
> > and earlier matched the spec. (I haven't taken the time to bisect exactly
> > where the logic was changed.)
>
> I find it hard to characterise this as a regression because we've
> basically gone from no checks to actually doing bounds checks:
>
>   1328fe0c32 (hw: usb: hcd-ohci: check len and frame_number variables)
>
> The argument here seems to be that real hardware is laxer than the specs
> in what it accepts.
>
> > With a tiny OS[1] that boots and executes a test, the issue can be seen:
> >
> > * OS that sends USB requests to a USB mass storage device
> >   but sends td.cbp = td.be + 1
> > * qemu 4.2
> > * qemu HEAD (4e66a0854)
> > * Actual OHCI controller (hardware)
> >
> > Command line:
> > qemu-system-x86_64 -m 20 \
> >  -device pci-ohci,id=ohci \
> >  -drive if=none,format=raw,id=d,file=testmbr.raw \
> >  -device usb-storage,bus=ohci.0,drive=d \
> >  --trace "usb_*" --trace "ohci_*" -D qemu.log
> >
> > Results are:
> >
> >  qemu 4.2   | qemu HEAD  | actual HW
> > ------------+------------+------------
> >  works fine | ohci_die() | works fine
> >
> > Tip: if the flags "-serial pty -serial stdio" are added to the command
> line
> > the test will output USB requests like this:
> >
> > Testing qemu HEAD:
> >
> >> Free mem 2M ohci port2 conn FS
> >> setup { 80 6 0 1 0 0 8 0 }
> >> ED info=80000 { mps=8 en=0 d=0 } tail=c20920
> >>   td0 c20880 nxt=c20960 f2000000 setup cbp=c20900 be=c20907
> >>   td1 c20960 nxt=c20980 f3140000    in cbp=c20908 be=c2090f
> >>   td2 c20980 nxt=c20920 f3080000   out cbp=c20910 be=c2090f ohci20 host
> err
> >> usb stopped
> >
> > And in qemu.log:
> >
> > usb_ohci_iso_td_bad_cc_overrun ISO_TD start_offset=0x00c20910 >
> next_offset=0x00c2090f
> >
> > Testing qemu 4.2:
> >
> >> Free mem 2M ohci port2 conn FS
> >> setup { 80 6 0 1 0 0 8 0 }
> >> ED info=80000 { mps=8 en=0 d=0 } tail=620920
> >>   td0 620880 nxt=620960 f2000000 setup cbp=620900 be=620907       cbp=0
> be=620907
> >>   td1 620960 nxt=620980 f3140000    in cbp=620908 be=62090f       cbp=0
> be=62090f
> >>   td2 620980 nxt=620920 f3080000   out cbp=620910 be=62090f       cbp=0
> be=62090f
> >>    rx { 12 1 0 2 0 0 0 8 }
> >> setup { 0 5 1 0 0 0 0 0 } tx {}
> >> ED info=80000 { mps=8 en=0 d=0 } tail=620880
> >>   td0 620920 nxt=620960 f2000000 setup cbp=620900 be=620907       cbp=0
> be=620907
> >>   td1 620960 nxt=620880 f3100000    in cbp=620908 be=620907       cbp=0
> be=620907
> >> setup { 80 6 0 1 0 0 12 0 }
> >> ED info=80001 { mps=8 en=0 d=1 } tail=620960
> >>   td0 620880 nxt=6209c0 f2000000 setup cbp=620920 be=620927       cbp=0
> be=620927
> >>   td1 6209c0 nxt=6209e0 f3140000    in cbp=620928 be=620939       cbp=0
> be=620939
> >>   td2 6209e0 nxt=620960 f3080000   out cbp=62093a be=620939       cbp=0
> be=620939
> >>    rx { 12 1 0 2 0 0 0 8 f4 46 1 0 0 0 1 2 3 1 }
> >> setup { 80 6 0 2 0 0 0 1 }
> >> ED info=80001 { mps=8 en=0 d=1 } tail=620880
> >>   td0 620960 nxt=6209a0 f2000000 setup cbp=620a20 be=620a27       cbp=0
> be=620a27
> >>   td1 6209a0 nxt=6209c0 f3140004    in cbp=620a28 be=620b27
>  cbp=620a48 be=620b27
> >>   td2 6209c0 nxt=620880 f3080000   out cbp=620b28 be=620b27       cbp=0
> be=620b27
> >>    rx { 9 2 20 0 1 1 4 c0 0 9 4 0 0 2 8 6 50 0 7 5 81 2 40 0 0 7 5 2 2
> 40 0 0 }
> >> setup { 0 9 1 0 0 0 0 0 } tx {}
> >> ED info=80001 { mps=8 en=0 d=1 } tail=620900
> >>   td0 620880 nxt=620940 f2000000 setup cbp=620a00 be=620a07       cbp=0
> be=620a07
> >>   td1 620940 nxt=620900 f3100000    in cbp=620a08 be=620a07       cbp=0
> be=620a07
> >
> > [1] The OS disk image has been emailed to philmd@linaro.org,
> mjt@tls.msk.ru,
> > and kraxel@redhat.com:
> >
> > * testCbpOffBy1.img.xz
> > * sha256:
> f87baddcb86de845de12f002c698670a426affb40946025cc32694f9daa3abed
> >
> > Signed-off-by: Cord Amfmgm <dmamfmgm@gmail.com>
> > ---
> >  hw/usb/hcd-ohci.c   | 4 ++--
> >  hw/usb/trace-events | 1 +
> >  2 files changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
> > index acd6016980..71b54914d3 100644
> > --- a/hw/usb/hcd-ohci.c
> > +++ b/hw/usb/hcd-ohci.c
> > @@ -941,8 +941,8 @@ static int ohci_service_td(OHCIState *ohci, struct
> ohci_ed *ed)
> >          if ((td.cbp & 0xfffff000) != (td.be & 0xfffff000)) {
> >              len = (td.be & 0xfff) + 0x1001 - (td.cbp & 0xfff);
> >          } else {
> > -            if (td.cbp > td.be) {
> > -                trace_usb_ohci_iso_td_bad_cc_overrun(td.cbp, td.be);
> > +            if (td.cbp - 1 > td.be) {  /* rely on td.cbp != 0 */
> > +                trace_usb_ohci_td_bad_buf(td.cbp, td.be);
> >                  ohci_die(ohci);
> >                  return 1;
> >              }
>
> With the updated commit message:
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>

Please forgive my lack of experience on this mailing list. I don't see a
suggested commit message from Alex but in case that was what is being
considered, here is one. Feedback welcome, also if this is not what is
wanted, please just say it.

> From: Cord Amfmgm <dmamfmgm@gmail.com>
>
> This changes the way the ohci emulation handles a Transfer Descriptor with
> "Buffer End" set to "Current Buffer Pointer" - 1, specifically in the
case of a
> zero-length packet.
>
> The OHCI spec 4.3.1.2 Table 4-2 specifies td.cbp to be zero for a
zero-length
> packet. Peter Maydell tracked down commit
> 1328fe0c32 (hw: usb: hcd-ohci: check len and frame_number variables)
> where qemu started checking this according to the spec.
>
> What this patch does is loosen the qemu ohci implementation to allow a
> zero-length packet if td.be (Buffer End) is set to td.cbp - 1, and with a
> non-zero td.cbp value.
>
> Is this correct? It appears to not follow the spec, so no. But actual hw
> seems to be ok with it.
>
> Does any OS rely on this behavior? There have been no reports to
> qemu-devel of this problem.
>
> This is attempting to have qemu behave like actual hardware,
> but this is just a minor change.
>
> With a tiny OS[1] that boots and executes a test, the behavior is
> reproducible:
>
> * OS that sends USB requests to a USB mass storage device
>   but sends td.be = td.cbp - 1
> * qemu 4.2
> * qemu HEAD (4e66a0854)
> * Actual OHCI controller (hardware)
>
> Command line:
> qemu-system-x86_64 -m 20 \
>  -device pci-ohci,id=ohci \
>  -drive if=none,format=raw,id=d,file=testmbr.raw \
>  -device usb-storage,bus=ohci.0,drive=d \
>  --trace "usb_*" --trace "ohci_*" -D qemu.log
>
> Results are:
>
>  qemu 4.2   | qemu HEAD  | actual HW
> ------------+------------+------------
>  works fine | ohci_die() | works fine
>
> Tip: if the flags "-serial pty -serial stdio" are added to the command
line
> the test will output USB requests like this:
>
> Testing qemu HEAD:
>
>> Free mem 2M ohci port2 conn FS
>> setup { 80 6 0 1 0 0 8 0 }
>> ED info=80000 { mps=8 en=0 d=0 } tail=c20920
>>   td0 c20880 nxt=c20960 f2000000 setup cbp=c20900 be=c20907
>>   td1 c20960 nxt=c20980 f3140000    in cbp=c20908 be=c2090f
>>   td2 c20980 nxt=c20920 f3080000   out cbp=c20910 be=c2090f ohci20 host
err
>> usb stopped
>
> And in qemu.log:
>
> usb_ohci_iso_td_bad_cc_overrun ISO_TD start_offset=0x00c20910 >
next_offset=0x00c2090f
>
> Testing qemu 4.2:
>
>> Free mem 2M ohci port2 conn FS
>> setup { 80 6 0 1 0 0 8 0 }
>> ED info=80000 { mps=8 en=0 d=0 } tail=620920
>>   td0 620880 nxt=620960 f2000000 setup cbp=620900 be=620907       cbp=0
be=620907
>>   td1 620960 nxt=620980 f3140000    in cbp=620908 be=62090f       cbp=0
be=62090f
>>   td2 620980 nxt=620920 f3080000   out cbp=620910 be=62090f       cbp=0
be=62090f
>>    rx { 12 1 0 2 0 0 0 8 }
>> setup { 0 5 1 0 0 0 0 0 } tx {}
>> ED info=80000 { mps=8 en=0 d=0 } tail=620880
>>   td0 620920 nxt=620960 f2000000 setup cbp=620900 be=620907       cbp=0
be=620907
>>   td1 620960 nxt=620880 f3100000    in cbp=620908 be=620907       cbp=0
be=620907
>> setup { 80 6 0 1 0 0 12 0 }
>> ED info=80001 { mps=8 en=0 d=1 } tail=620960
>>   td0 620880 nxt=6209c0 f2000000 setup cbp=620920 be=620927       cbp=0
be=620927
>>   td1 6209c0 nxt=6209e0 f3140000    in cbp=620928 be=620939       cbp=0
be=620939
>>   td2 6209e0 nxt=620960 f3080000   out cbp=62093a be=620939       cbp=0
be=620939
>>    rx { 12 1 0 2 0 0 0 8 f4 46 1 0 0 0 1 2 3 1 }
>> setup { 80 6 0 2 0 0 0 1 }
>> ED info=80001 { mps=8 en=0 d=1 } tail=620880
>>   td0 620960 nxt=6209a0 f2000000 setup cbp=620a20 be=620a27       cbp=0
be=620a27
>>   td1 6209a0 nxt=6209c0 f3140004    in cbp=620a28 be=620b27
 cbp=620a48 be=620b27
>>   td2 6209c0 nxt=620880 f3080000   out cbp=620b28 be=620b27       cbp=0
be=620b27
>>    rx { 9 2 20 0 1 1 4 c0 0 9 4 0 0 2 8 6 50 0 7 5 81 2 40 0 0 7 5 2 2
40 0 0 }
>> setup { 0 9 1 0 0 0 0 0 } tx {}
>> ED info=80001 { mps=8 en=0 d=1 } tail=620900
>>   td0 620880 nxt=620940 f2000000 setup cbp=620a00 be=620a07       cbp=0
be=620a07
>>   td1 620940 nxt=620900 f3100000    in cbp=620a08 be=620a07       cbp=0
be=620a07
>
> [1] The OS disk image has been emailed to philmd@linaro.org,
mjt@tls.msk.ru,
> and kraxel@redhat.com:
>
> * testCbpOffBy1.img.xz
> * sha256: f87baddcb86de845de12f002c698670a426affb40946025cc32694f9daa3abed
>
> Signed-off-by: Cord Amfmgm <dmamfmgm@gmail.com>
Alex Bennée June 12, 2024, 7:36 p.m. UTC | #10
Cord Amfmgm <dmamfmgm@gmail.com> writes:

> On Wed, Jun 12, 2024 at 9:21 AM Alex Bennée <alex.bennee@linaro.org> wrote:
>
>  David Hubbard <dmamfmgm@gmail.com> writes:
>
>  > From: Cord Amfmgm <dmamfmgm@gmail.com>
>  >
>  > This changes the way the ohci emulation handles a Transfer Descriptor with
>  > "Current Buffer Pointer" set to "Buffer End" + 1.
>  >
>  > The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more than td.be
>  > to signal the buffer has zero length. Currently qemu only accepts zero-length
>  > Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI hardware
>  > accepts both cases.
>  >
>  > The qemu ohci emulation has a regression in ohci_service_td. Version 4.2
>  > and earlier matched the spec. (I haven't taken the time to bisect exactly
>  > where the logic was changed.)
>
>  I find it hard to characterise this as a regression because we've
>  basically gone from no checks to actually doing bounds checks:
>
>    1328fe0c32 (hw: usb: hcd-ohci: check len and frame_number variables)
>
>  The argument here seems to be that real hardware is laxer than the specs
>  in what it accepts.
>
<snip>
>
>  With the updated commit message:
>
>  Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>
> Please forgive my lack of experience on this mailing list. I don't see a suggested commit message from Alex but in case that
> was what is being considered, here is one. Feedback welcome, also if this is not what is wanted, please just say it.
>

Something along the lines of what you suggest here (where did this come
from?)

>> From: Cord Amfmgm <dmamfmgm@gmail.com>
>>
>> This changes the way the ohci emulation handles a Transfer Descriptor with
>> "Buffer End" set to "Current Buffer Pointer" - 1, specifically in the case of a
>> zero-length packet.
>>
>> The OHCI spec 4.3.1.2 Table 4-2 specifies td.cbp to be zero for a zero-length
>> packet. Peter Maydell tracked down commit
>> 1328fe0c32 (hw: usb: hcd-ohci: check len and frame_number variables)
>> where qemu started checking this according to the spec.
>>
>> What this patch does is loosen the qemu ohci implementation to allow a
>> zero-length packet if td.be (Buffer End) is set to td.cbp - 1, and with a
>> non-zero td.cbp value.
>>
>> Is this correct? It appears to not follow the spec, so no. But actual hw
>> seems to be ok with it.
>>
>> Does any OS rely on this behavior? There have been no reports to
>> qemu-devel of this problem.
>>
>> This is attempting to have qemu behave like actual hardware,
>> but this is just a minor change.
>>
>> With a tiny OS[1] that boots and executes a test, the behavior is
>> reproducible:
>>
>> * OS that sends USB requests to a USB mass storage device
>>   but sends td.be = td.cbp - 1
>> * qemu 4.2
>> * qemu HEAD (4e66a0854)
>> * Actual OHCI controller (hardware)
>>
>> Command line:
>> qemu-system-x86_64 -m 20 \
>>  -device pci-ohci,id=ohci \
>>  -drive if=none,format=raw,id=d,file=testmbr.raw \
>>  -device usb-storage,bus=ohci.0,drive=d \
>>  --trace "usb_*" --trace "ohci_*" -D qemu.log
>>
>> Results are:
>>
>>  qemu 4.2   | qemu HEAD  | actual HW
>> ------------+------------+------------
>>  works fine | ohci_die() | works fine
>>
>> Tip: if the flags "-serial pty -serial stdio" are added to the command line
>> the test will output USB requests like this:
>>
>> Testing qemu HEAD:
>>
>>> Free mem 2M ohci port2 conn FS
>>> setup { 80 6 0 1 0 0 8 0 }
>>> ED info=80000 { mps=8 en=0 d=0 } tail=c20920
>>>   td0 c20880 nxt=c20960 f2000000 setup cbp=c20900 be=c20907
>>>   td1 c20960 nxt=c20980 f3140000    in cbp=c20908 be=c2090f
>>>   td2 c20980 nxt=c20920 f3080000   out cbp=c20910 be=c2090f ohci20 host err
>>> usb stopped
>>
>> And in qemu.log:
>>
>> usb_ohci_iso_td_bad_cc_overrun ISO_TD start_offset=0x00c20910 > next_offset=0x00c2090f
>>
>> Testing qemu 4.2:
>>
>>> Free mem 2M ohci port2 conn FS
>>> setup { 80 6 0 1 0 0 8 0 }
>>> ED info=80000 { mps=8 en=0 d=0 } tail=620920
>>>   td0 620880 nxt=620960 f2000000 setup cbp=620900 be=620907       cbp=0 be=620907
>>>   td1 620960 nxt=620980 f3140000    in cbp=620908 be=62090f       cbp=0 be=62090f
>>>   td2 620980 nxt=620920 f3080000   out cbp=620910 be=62090f       cbp=0 be=62090f
>>>    rx { 12 1 0 2 0 0 0 8 }
>>> setup { 0 5 1 0 0 0 0 0 } tx {}
>>> ED info=80000 { mps=8 en=0 d=0 } tail=620880
>>>   td0 620920 nxt=620960 f2000000 setup cbp=620900 be=620907       cbp=0 be=620907
>>>   td1 620960 nxt=620880 f3100000    in cbp=620908 be=620907       cbp=0 be=620907
>>> setup { 80 6 0 1 0 0 12 0 }
>>> ED info=80001 { mps=8 en=0 d=1 } tail=620960
>>>   td0 620880 nxt=6209c0 f2000000 setup cbp=620920 be=620927       cbp=0 be=620927
>>>   td1 6209c0 nxt=6209e0 f3140000    in cbp=620928 be=620939       cbp=0 be=620939
>>>   td2 6209e0 nxt=620960 f3080000   out cbp=62093a be=620939       cbp=0 be=620939
>>>    rx { 12 1 0 2 0 0 0 8 f4 46 1 0 0 0 1 2 3 1 }
>>> setup { 80 6 0 2 0 0 0 1 }
>>> ED info=80001 { mps=8 en=0 d=1 } tail=620880
>>>   td0 620960 nxt=6209a0 f2000000 setup cbp=620a20 be=620a27       cbp=0 be=620a27
>>>   td1 6209a0 nxt=6209c0 f3140004    in cbp=620a28 be=620b27       cbp=620a48 be=620b27
>>>   td2 6209c0 nxt=620880 f3080000   out cbp=620b28 be=620b27       cbp=0 be=620b27
>>>    rx { 9 2 20 0 1 1 4 c0 0 9 4 0 0 2 8 6 50 0 7 5 81 2 40 0 0 7 5 2 2 40 0 0 }
>>> setup { 0 9 1 0 0 0 0 0 } tx {}
>>> ED info=80001 { mps=8 en=0 d=1 } tail=620900
>>>   td0 620880 nxt=620940 f2000000 setup cbp=620a00 be=620a07       cbp=0 be=620a07
>>>   td1 620940 nxt=620900 f3100000    in cbp=620a08 be=620a07       cbp=0 be=620a07
>>
>> [1] The OS disk image has been emailed to philmd@linaro.org, mjt@tls.msk.ru,
>> and kraxel@redhat.com:
>>
>> * testCbpOffBy1.img.xz
>> * sha256:
>> f87baddcb86de845de12f002c698670a426affb40946025cc32694f9daa3abed
>>
>> Signed-off-by: Cord Amfmgm <dmamfmgm@gmail.com>
Cord Amfmgm June 12, 2024, 7:48 p.m. UTC | #11
On Wed, Jun 12, 2024 at 2:36 PM Alex Bennée <alex.bennee@linaro.org> wrote:

> Cord Amfmgm <dmamfmgm@gmail.com> writes:
>
> > On Wed, Jun 12, 2024 at 9:21 AM Alex Bennée <alex.bennee@linaro.org>
> wrote:
> >
> >  David Hubbard <dmamfmgm@gmail.com> writes:
> >
> >  > From: Cord Amfmgm <dmamfmgm@gmail.com>
> >  >
> >  > This changes the way the ohci emulation handles a Transfer Descriptor
> with
> >  > "Current Buffer Pointer" set to "Buffer End" + 1.
> >  >
> >  > The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more
> than td.be
> >  > to signal the buffer has zero length. Currently qemu only accepts
> zero-length
> >  > Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI
> hardware
> >  > accepts both cases.
> >  >
> >  > The qemu ohci emulation has a regression in ohci_service_td. Version
> 4.2
> >  > and earlier matched the spec. (I haven't taken the time to bisect
> exactly
> >  > where the logic was changed.)
> >
> >  I find it hard to characterise this as a regression because we've
> >  basically gone from no checks to actually doing bounds checks:
> >
> >    1328fe0c32 (hw: usb: hcd-ohci: check len and frame_number variables)
> >
> >  The argument here seems to be that real hardware is laxer than the specs
> >  in what it accepts.
> >
> <snip>
> >
> >  With the updated commit message:
> >
> >  Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> >
> > Please forgive my lack of experience on this mailing list. I don't see a
> suggested commit message from Alex but in case that
> > was what is being considered, here is one. Feedback welcome, also if
> this is not what is wanted, please just say it.
> >
>
> Something along the lines of what you suggest here (where did this come
> from?)
>
> >> From: Cord Amfmgm <dmamfmgm@gmail.com>
> >>
> >> This changes the way the ohci emulation handles a Transfer Descriptor
> with
> >> "Buffer End" set to "Current Buffer Pointer" - 1, specifically in the
> case of a
> >> zero-length packet.
> >>
> >> The OHCI spec 4.3.1.2 Table 4-2 specifies td.cbp to be zero for a
> zero-length
> >> packet. Peter Maydell tracked down commit
> >> 1328fe0c32 (hw: usb: hcd-ohci: check len and frame_number variables)
> >> where qemu started checking this according to the spec.
> >>
> >> What this patch does is loosen the qemu ohci implementation to allow a
> >> zero-length packet if td.be (Buffer End) is set to td.cbp - 1, and
> with a
> >> non-zero td.cbp value.
> >>
> >> Is this correct? It appears to not follow the spec, so no. But actual hw
> >> seems to be ok with it.
> >>
> >> Does any OS rely on this behavior? There have been no reports to
> >> qemu-devel of this problem.
> >>
> >> This is attempting to have qemu behave like actual hardware,
> >> but this is just a minor change.
> >>
> >> With a tiny OS[1] that boots and executes a test, the behavior is
> >> reproducible:
> >>
> >> * OS that sends USB requests to a USB mass storage device
> >>   but sends td.be = td.cbp - 1
> >> * qemu 4.2
> >> * qemu HEAD (4e66a0854)
> >> * Actual OHCI controller (hardware)
> >>
> >> Command line:
> >> qemu-system-x86_64 -m 20 \
> >>  -device pci-ohci,id=ohci \
> >>  -drive if=none,format=raw,id=d,file=testmbr.raw \
> >>  -device usb-storage,bus=ohci.0,drive=d \
> >>  --trace "usb_*" --trace "ohci_*" -D qemu.log
> >>
> >> Results are:
> >>
> >>  qemu 4.2   | qemu HEAD  | actual HW
> >> ------------+------------+------------
> >>  works fine | ohci_die() | works fine
> >>
> >> Tip: if the flags "-serial pty -serial stdio" are added to the command
> line
> >> the test will output USB requests like this:
> >>
> >> Testing qemu HEAD:
> >>
> >>> Free mem 2M ohci port2 conn FS
> >>> setup { 80 6 0 1 0 0 8 0 }
> >>> ED info=80000 { mps=8 en=0 d=0 } tail=c20920
> >>>   td0 c20880 nxt=c20960 f2000000 setup cbp=c20900 be=c20907
> >>>   td1 c20960 nxt=c20980 f3140000    in cbp=c20908 be=c2090f
> >>>   td2 c20980 nxt=c20920 f3080000   out cbp=c20910 be=c2090f ohci20
> host err
> >>> usb stopped
> >>
> >> And in qemu.log:
> >>
> >> usb_ohci_iso_td_bad_cc_overrun ISO_TD start_offset=0x00c20910 >
> next_offset=0x00c2090f
> >>
> >> Testing qemu 4.2:
> >>
> >>> Free mem 2M ohci port2 conn FS
> >>> setup { 80 6 0 1 0 0 8 0 }
> >>> ED info=80000 { mps=8 en=0 d=0 } tail=620920
> >>>   td0 620880 nxt=620960 f2000000 setup cbp=620900 be=620907
>  cbp=0 be=620907
> >>>   td1 620960 nxt=620980 f3140000    in cbp=620908 be=62090f
>  cbp=0 be=62090f
> >>>   td2 620980 nxt=620920 f3080000   out cbp=620910 be=62090f
>  cbp=0 be=62090f
> >>>    rx { 12 1 0 2 0 0 0 8 }
> >>> setup { 0 5 1 0 0 0 0 0 } tx {}
> >>> ED info=80000 { mps=8 en=0 d=0 } tail=620880
> >>>   td0 620920 nxt=620960 f2000000 setup cbp=620900 be=620907
>  cbp=0 be=620907
> >>>   td1 620960 nxt=620880 f3100000    in cbp=620908 be=620907
>  cbp=0 be=620907
> >>> setup { 80 6 0 1 0 0 12 0 }
> >>> ED info=80001 { mps=8 en=0 d=1 } tail=620960
> >>>   td0 620880 nxt=6209c0 f2000000 setup cbp=620920 be=620927
>  cbp=0 be=620927
> >>>   td1 6209c0 nxt=6209e0 f3140000    in cbp=620928 be=620939
>  cbp=0 be=620939
> >>>   td2 6209e0 nxt=620960 f3080000   out cbp=62093a be=620939
>  cbp=0 be=620939
> >>>    rx { 12 1 0 2 0 0 0 8 f4 46 1 0 0 0 1 2 3 1 }
> >>> setup { 80 6 0 2 0 0 0 1 }
> >>> ED info=80001 { mps=8 en=0 d=1 } tail=620880
> >>>   td0 620960 nxt=6209a0 f2000000 setup cbp=620a20 be=620a27
>  cbp=0 be=620a27
> >>>   td1 6209a0 nxt=6209c0 f3140004    in cbp=620a28 be=620b27
>  cbp=620a48 be=620b27
> >>>   td2 6209c0 nxt=620880 f3080000   out cbp=620b28 be=620b27
>  cbp=0 be=620b27
> >>>    rx { 9 2 20 0 1 1 4 c0 0 9 4 0 0 2 8 6 50 0 7 5 81 2 40 0 0 7 5 2 2
> 40 0 0 }
> >>> setup { 0 9 1 0 0 0 0 0 } tx {}
> >>> ED info=80001 { mps=8 en=0 d=1 } tail=620900
> >>>   td0 620880 nxt=620940 f2000000 setup cbp=620a00 be=620a07
>  cbp=0 be=620a07
> >>>   td1 620940 nxt=620900 f3100000    in cbp=620a08 be=620a07
>  cbp=0 be=620a07
> >>
> >> [1] The OS disk image has been emailed to philmd@linaro.org,
> mjt@tls.msk.ru,
> >> and kraxel@redhat.com:
> >>
> >> * testCbpOffBy1.img.xz
> >> * sha256:
> >> f87baddcb86de845de12f002c698670a426affb40946025cc32694f9daa3abed
> >>
> >> Signed-off-by: Cord Amfmgm <dmamfmgm@gmail.com>
>
>
Oh, I just manually put it in blockquotes to clearly delineate what it is.

I authored this new message in the previous message.
Peter Maydell June 21, 2024, 3:20 p.m. UTC | #12
On Wed, 12 Jun 2024 at 20:36, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Cord Amfmgm <dmamfmgm@gmail.com> writes:
>
> > On Wed, Jun 12, 2024 at 9:21 AM Alex Bennée <alex.bennee@linaro.org> wrote:
> >
> >  David Hubbard <dmamfmgm@gmail.com> writes:
> >
> >  > From: Cord Amfmgm <dmamfmgm@gmail.com>
> >  >
> >  > This changes the way the ohci emulation handles a Transfer Descriptor with
> >  > "Current Buffer Pointer" set to "Buffer End" + 1.
> >  >
> >  > The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more than td.be
> >  > to signal the buffer has zero length. Currently qemu only accepts zero-length
> >  > Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI hardware
> >  > accepts both cases.
> >  >
> >  > The qemu ohci emulation has a regression in ohci_service_td. Version 4.2
> >  > and earlier matched the spec. (I haven't taken the time to bisect exactly
> >  > where the logic was changed.)
> >
> >  I find it hard to characterise this as a regression because we've
> >  basically gone from no checks to actually doing bounds checks:
> >
> >    1328fe0c32 (hw: usb: hcd-ohci: check len and frame_number variables)
> >
> >  The argument here seems to be that real hardware is laxer than the specs
> >  in what it accepts.
> >
> <snip>
> >
> >  With the updated commit message:
> >
> >  Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> >
> > Please forgive my lack of experience on this mailing list. I don't see a suggested commit message from Alex but in case that
> > was what is being considered, here is one. Feedback welcome, also if this is not what is wanted, please just say it.
> >
>
> Something along the lines of what you suggest here

Thanks; I've picked up this patch for target-arm.next (as with
your previous one for hcd-ohci, adjusting the Author and
Signed-off-by lines to both read David Hubbard).

I tweaked the commit message a little bit, so the middle part reads:

    What this patch does is loosen the qemu ohci implementation to allow a
    zero-length packet if td.be (Buffer End) is set to td.cbp - 1, and with a
    non-zero td.cbp value.

    The spec is unclear whether this is valid or not -- it is not the
    clearly documented way to send a zero length TD (which is CBP=BE=0),
    but it isn't specifically forbidden. Actual hw seems to be ok with it.

thanks
-- PMM
Cord Amfmgm June 21, 2024, 4:24 p.m. UTC | #13
On Fri, Jun 21, 2024 at 10:21 AM Peter Maydell <peter.maydell@linaro.org>
wrote:

> On Wed, 12 Jun 2024 at 20:36, Alex Bennée <alex.bennee@linaro.org> wrote:
> >
> > Cord Amfmgm <dmamfmgm@gmail.com> writes:
> >
> > > On Wed, Jun 12, 2024 at 9:21 AM Alex Bennée <alex.bennee@linaro.org>
> wrote:
> > >
> > >  David Hubbard <dmamfmgm@gmail.com> writes:
> > >
> > >  > From: Cord Amfmgm <dmamfmgm@gmail.com>
> > >  >
> > >  > This changes the way the ohci emulation handles a Transfer
> Descriptor with
> > >  > "Current Buffer Pointer" set to "Buffer End" + 1.
> > >  >
> > >  > The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more
> than td.be
> > >  > to signal the buffer has zero length. Currently qemu only accepts
> zero-length
> > >  > Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI
> hardware
> > >  > accepts both cases.
> > >  >
> > >  > The qemu ohci emulation has a regression in ohci_service_td.
> Version 4.2
> > >  > and earlier matched the spec. (I haven't taken the time to bisect
> exactly
> > >  > where the logic was changed.)
> > >
> > >  I find it hard to characterise this as a regression because we've
> > >  basically gone from no checks to actually doing bounds checks:
> > >
> > >    1328fe0c32 (hw: usb: hcd-ohci: check len and frame_number variables)
> > >
> > >  The argument here seems to be that real hardware is laxer than the
> specs
> > >  in what it accepts.
> > >
> > <snip>
> > >
> > >  With the updated commit message:
> > >
> > >  Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> > >
> > > Please forgive my lack of experience on this mailing list. I don't see
> a suggested commit message from Alex but in case that
> > > was what is being considered, here is one. Feedback welcome, also if
> this is not what is wanted, please just say it.
> > >
> >
> > Something along the lines of what you suggest here
>
> Thanks; I've picked up this patch for target-arm.next (as with
> your previous one for hcd-ohci, adjusting the Author and
> Signed-off-by lines to both read David Hubbard).
>
> I tweaked the commit message a little bit, so the middle part reads:
>
>     What this patch does is loosen the qemu ohci implementation to allow a
>     zero-length packet if td.be (Buffer End) is set to td.cbp - 1, and
> with a
>     non-zero td.cbp value.
>
>     The spec is unclear whether this is valid or not -- it is not the
>     clearly documented way to send a zero length TD (which is CBP=BE=0),
>     but it isn't specifically forbidden. Actual hw seems to be ok with it.
>
> thanks
> -- PMM
>

That tweak looks great.

Thank you for your patience working with me to get this patch into a good
shape.

This was my first attempt to contribute to qemu - really appreciate it.
Peter Maydell June 21, 2024, 5:16 p.m. UTC | #14
On Fri, 21 Jun 2024 at 17:24, Cord Amfmgm <dmamfmgm@gmail.com> wrote:
>
>
> On Fri, Jun 21, 2024 at 10:21 AM Peter Maydell <peter.maydell@linaro.org> wrote:
>> Thanks; I've picked up this patch for target-arm.next (as with
>> your previous one for hcd-ohci, adjusting the Author and
>> Signed-off-by lines to both read David Hubbard).
>>
>> I tweaked the commit message a little bit, so the middle part reads:
>>
>>     What this patch does is loosen the qemu ohci implementation to allow a
>>     zero-length packet if td.be (Buffer End) is set to td.cbp - 1, and with a
>>     non-zero td.cbp value.
>>
>>     The spec is unclear whether this is valid or not -- it is not the
>>     clearly documented way to send a zero length TD (which is CBP=BE=0),
>>     but it isn't specifically forbidden. Actual hw seems to be ok with it.
>
> That tweak looks great.
>
> Thank you for your patience working with me to get this patch into a good shape.
>
> This was my first attempt to contribute to qemu - really appreciate it.

You're welcome -- thanks for the effort you've put in on your end
working through our code review process.

-- PMM
diff mbox series

Patch

diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
index acd6016980..71b54914d3 100644
--- a/hw/usb/hcd-ohci.c
+++ b/hw/usb/hcd-ohci.c
@@ -941,8 +941,8 @@  static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
         if ((td.cbp & 0xfffff000) != (td.be & 0xfffff000)) {
             len = (td.be & 0xfff) + 0x1001 - (td.cbp & 0xfff);
         } else {
-            if (td.cbp > td.be) {
-                trace_usb_ohci_iso_td_bad_cc_overrun(td.cbp, td.be);
+            if (td.cbp - 1 > td.be) {  /* rely on td.cbp != 0 */
+                trace_usb_ohci_td_bad_buf(td.cbp, td.be);
                 ohci_die(ohci);
                 return 1;
             }
diff --git a/hw/usb/trace-events b/hw/usb/trace-events
index fd7b90d70c..fe282e7876 100644
--- a/hw/usb/trace-events
+++ b/hw/usb/trace-events
@@ -29,6 +29,7 @@  usb_ohci_iso_td_data_underrun(int ret) "DataUnderrun %d"
 usb_ohci_iso_td_nak(int ret) "got NAK/STALL %d"
 usb_ohci_iso_td_bad_response(int ret) "Bad device response %d"
 usb_ohci_td_bad_pid(const char *s, uint32_t edf, uint32_t tdf) "Bad pid %s: ed.flags 0x%x td.flags 0x%x"
+usb_ohci_td_bad_buf(uint32_t cbp, uint32_t be) "Bad cbp = 0x%x > be = 0x%x"
 usb_ohci_port_attach(int index) "port #%d"
 usb_ohci_port_detach(int index) "port #%d"
 usb_ohci_port_wakeup(int index) "port #%d"