Message ID | 1350297511-25437-7-git-send-email-hdegoede@redhat.com |
---|---|
State | New |
On 10/15/12 12:38, Hans de Goede wrote:
> Often the guest will queue up new packets in response to a packet, in the
> async schedule with its IOC flag set, completing. By speeding up the
> frame-timer, we notice these new packets earlier. This increases the
> speed (MB/s) of a Linux guest reading from a USB mass storage device by a
> factor of 1.15 on top of the "Improve latency of interrupt delivery"
> speed-ups, both with and without input pipelining enabled.

Why not just set async_stepdown to 0?

cheers,
  Gerd
Hi,

On 10/15/2012 01:17 PM, Gerd Hoffmann wrote:
> On 10/15/12 12:38, Hans de Goede wrote:
>> Often the guest will queue up new packets in response to a packet, in the
>> async schedule with its IOC flag set, completing. By speeding up the
>> frame-timer, we notice these new packets earlier. This increases the
>> speed (MB/s) of a Linux guest reading from a USB mass storage device by a
>> factor of 1.15 on top of the "Improve latency of interrupt delivery"
>> speed-ups, both with and without input pipelining enabled.
>
> Why not just set async_stepdown to 0?

We already do that whenever we run a packet completion (it gets set when
we move to the executing stage). What this patch does is request the
frame timer to run again in 500 usecs instead of after 1 ms, thus making
us see and process async transfers faster when they are queued up in
response to just completed packets (which we've told the guest about with
the int interrupt). This makes the USB-bus / device idle time between
any 2 of the 3 transfers involved in USB storage BOT shorter,
thereby speeding things up.

Regards,

Hans
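The timing difference described in the reply above can be checked with a little arithmetic. The following is a minimal sketch, not the QEMU code itself: `next_frame_delay_ns` is a hypothetical helper, `TICKS_PER_SEC` assumes a nanosecond clock behind `get_ticks_per_sec()`, and `FRAME_TIMER_FREQ` is taken as 1000 Hz, which matches the "stepdown=0 means 1000 Hz" baseline mentioned later in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: 1000 Hz base frame rate, nanosecond clock resolution. */
#define FRAME_TIMER_FREQ 1000
#define TICKS_PER_SEC    1000000000LL

/*
 * Hypothetical helper: delay (in ns) until the next frame-timer run.
 * fast_wakeup models "an async packet requested the interrupt and the
 * interrupt has been committed"; otherwise the usual stepdown scaling
 * applies.
 */
static int64_t next_frame_delay_ns(uint32_t async_stepdown, bool fast_wakeup)
{
    if (fast_wakeup) {
        /* Half a frame: 500 usecs with the assumed constants. */
        return TICKS_PER_SEC / (FRAME_TIMER_FREQ * 2);
    }
    /* Normal case: (stepdown + 1) frames, so 1 ms when stepdown is 0. */
    return TICKS_PER_SEC * (async_stepdown + 1) / FRAME_TIMER_FREQ;
}
```

With these assumed constants, `next_frame_delay_ns(0, false)` gives 1000000 ns (1 ms) and `next_frame_delay_ns(0, true)` gives 500000 ns (500 usecs), so setting `async_stepdown` to 0 alone cannot get below one frame.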
On 10/15/12 15:00, Hans de Goede wrote:
> Hi,
>
> On 10/15/2012 01:17 PM, Gerd Hoffmann wrote:
>> On 10/15/12 12:38, Hans de Goede wrote:
>>> Often the guest will queue up new packets in response to a packet, in the
>>> async schedule with its IOC flag set, completing. By speeding up the
>>> frame-timer, we notice these new packets earlier. This increases the
>>> speed (MB/s) of a Linux guest reading from a USB mass storage device by a
>>> factor of 1.15 on top of the "Improve latency of interrupt delivery"
>>> speed-ups, both with and without input pipelining enabled.
>>
>> Why not just set async_stepdown to 0?
>
> We already do that whenever we run a packet completion (it gets set when
> we move to the executing stage). What this patch does is request the
> frame timer to run again in 500 usecs instead of after 1 ms, thus making
> us see and process async transfers faster when they are queued up in
> response to just completed packets (which we've told the guest about with
> the int interrupt). This makes the USB-bus / device idle time between
> any 2 of the 3 transfers involved in USB storage BOT shorter,
> thereby speeding things up.

Don't feel like having two mechanisms for wakeup rate control. Can't we
integrate this with async_stepdown? Changing the baseline maybe, so
stepdown=0 doesn't mean 1000 Hz but 2000 Hz?

cheers,
  Gerd
Hi,

On 10/17/2012 01:01 PM, Gerd Hoffmann wrote:
> On 10/15/12 15:00, Hans de Goede wrote:
>> Hi,
>>
>> On 10/15/2012 01:17 PM, Gerd Hoffmann wrote:
>>> On 10/15/12 12:38, Hans de Goede wrote:
>>>> Often the guest will queue up new packets in response to a packet, in the
>>>> async schedule with its IOC flag set, completing. By speeding up the
>>>> frame-timer, we notice these new packets earlier. This increases the
>>>> speed (MB/s) of a Linux guest reading from a USB mass storage device by a
>>>> factor of 1.15 on top of the "Improve latency of interrupt delivery"
>>>> speed-ups, both with and without input pipelining enabled.
>>>
>>> Why not just set async_stepdown to 0?
>>
>> We already do that whenever we run a packet completion (it gets set when
>> we move to the executing stage). What this patch does is request the
>> frame timer to run again in 500 usecs instead of after 1 ms, thus making
>> us see and process async transfers faster when they are queued up in
>> response to just completed packets (which we've told the guest about with
>> the int interrupt). This makes the USB-bus / device idle time between
>> any 2 of the 3 transfers involved in USB storage BOT shorter,
>> thereby speeding things up.
>
> Don't feel like having two mechanisms for wakeup rate control. Can't we
> integrate this with async_stepdown? Changing the baseline maybe, so
> stepdown=0 doesn't mean 1000 Hz but 2000 Hz?

That is actually close to what I wanted to do at first (I wanted to use
stepdown=-1 for the faster wakeup case).
But there are 2 problems with this:

1) It causes migration issues when migrating to / from an old version
2) We don't want to change the wakeup rate when the interrupt flag gets set
   as pending, but when it actually gets committed, and we only want to change
   the wakeup rate when the int was requested by an async packet, not when it
   was requested by a periodic packet, so we will need the int_req_by_async
   flag anyways, at which point this seemed the cleanest way.

Regards,

Hans
Hi,

> 1) It causes migration issues when migrating to / from an old version

With -1 yes, shifting the scale shouldn't be that big an issue though, as
it is just an optimization.

> 2) We don't want to change the wakeup rate when the interrupt flag gets set
> as pending, but when it actually gets committed, and we only want to change
> the wakeup rate when the int was requested by an async packet, not when it
> was requested by a periodic packet, so we will need the int_req_by_async
> flag anyways, at which point this seemed the cleanest way.

Missed that little detail, ok then.

cheers,
  Gerd
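The two conditions in point 2 (interrupt committed rather than merely pending, and requested by an async rather than a periodic packet) can be illustrated with a reduced model. This is only a sketch under stated assumptions: `struct ehci_model` and both `model_*` functions are hypothetical stand-ins, not QEMU code, and "committed" is modelled simply as the `USBSTS_INT` bit being set in `usbsts`.

```c
#include <stdbool.h>
#include <stdint.h>

#define USBSTS_INT 0x1u  /* USBINT status bit (bit 0 of USBSTS) */

/* Hypothetical, heavily reduced stand-in for EHCIState. */
struct ehci_model {
    uint32_t usbsts;           /* committed status bits */
    bool     int_req_by_async; /* latched by an async IOC completion */
};

/* Called when a packet with IOC set completes; only the async
 * schedule latches the flag, periodic completions leave it alone. */
static void model_note_ioc_completion(struct ehci_model *s, bool from_async)
{
    if (from_async) {
        s->int_req_by_async = true;
    }
}

/* Called from the frame timer: returns true (and consumes the flag)
 * only if an async packet asked for the interrupt AND the interrupt
 * has actually been committed to usbsts. */
static bool model_take_fast_wakeup(struct ehci_model *s)
{
    if (s->int_req_by_async && (s->usbsts & USBSTS_INT)) {
        s->int_req_by_async = false;
        return true;
    }
    return false;
}
```

Because the flag is cleared when it is consumed, the shortened wakeup is a one-shot: the following timer run falls back to the normal `async_stepdown` based interval unless another async IOC completion has latched the flag again.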
diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index bbfa441..58e788b 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -443,6 +443,7 @@ struct EHCIState {
 
     uint64_t last_run_ns;
     uint32_t async_stepdown;
+    bool int_req_by_async;
 };
 
 #define SET_LAST_RUN_CLOCK(s) \
@@ -1529,6 +1530,9 @@ static void ehci_execute_complete(EHCIQueue *q)
 
     if (q->qh.token & QTD_TOKEN_IOC) {
         ehci_raise_irq(q->ehci, USBSTS_INT);
+        if (q->async) {
+            q->ehci->int_req_by_async = true;
+        }
     }
 }
 
@@ -2503,8 +2507,15 @@ static void ehci_frame_timer(void *opaque)
     }
 
     if (need_timer) {
-        expire_time = t_now + (get_ticks_per_sec()
+        /* If we've raised int, we speed up the timer, so that we quickly
+         * notice any new packets queued up in response */
+        if (ehci->int_req_by_async && (ehci->usbsts & USBSTS_INT)) {
+            expire_time = t_now + get_ticks_per_sec() / (FRAME_TIMER_FREQ * 2);
+            ehci->int_req_by_async = false;
+        } else {
+            expire_time = t_now + (get_ticks_per_sec()
                                * (ehci->async_stepdown+1) / FRAME_TIMER_FREQ);
+        }
         qemu_mod_timer(ehci->frame_timer, expire_time);
     }
 }
Often the guest will queue up new packets in response to a packet, in the
async schedule with its IOC flag set, completing. By speeding up the
frame-timer, we notice these new packets earlier. This increases the
speed (MB/s) of a Linux guest reading from a USB mass storage device by a
factor of 1.15 on top of the "Improve latency of interrupt delivery"
speed-ups, both with and without input pipelining enabled.

I've not tested the speed-up of this patch without the "Improve latency of
interrupt delivery" patch.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 hw/usb/hcd-ehci.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)