Message ID | 1327889855.3455.2.camel@nexus.oss.ntt.co.jp |
---|---|
State | New |
Headers | show |
2012/1/30 Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>: > Some drivers (Linux' 8139too among them) rely on the NIC injecting an interrupt > in the event of a receive buffer overflow and, accordingly, set the RxOverflow > bit in the interrupt mask. Unfortunately rtl8139's can_receive method ignores > the RxOverflow flag, which may lead to a situation where rtl8139 stops receiving > packets (can_receive returns 0) when the receive buffer becomes full. > > If the driver eventually read from the receive buffer or reset the card the > emulator could recover from this situation. However some implementations only > do this upon receiving an interrupt with either RxOK or RxOverflow set in the > ISR; interrupt that will never come because QEMU's flow control mechanisms would > prevent rtl8139 from receiving any packet. > > Letting packets go through when the overflow interrupt is enabled makes the > QEMU emulator compliant to the spec and solves the problem. > > This patch should fix a relatively common (in our experience) network stall > observed when running enterprise distros with rtl8139 as the NIC; in some cases > the 8139too device driver gets loaded and when under heavy load the network > eventually stops working. It would be great to see specific example to verify the issue. Otherwise the change looks great.
Hi Igor, Thank you for the prompt reply. I really appreciate it. (2012年01月31日 02:28), Igor Kovalenko wrote: > 2012/1/30 Fernando Luis Vázquez Cao<fernando@oss.ntt.co.jp>: >> Some drivers (Linux' 8139too among them) rely on the NIC injecting an interrupt >> in the event of a receive buffer overflow and, accordingly, set the RxOverflow >> bit in the interrupt mask. Unfortunately rtl8139's can_receive method ignores >> the RxOverflow flag, which may lead to a situation where rtl8139 stops receiving >> packets (can_receive returns 0) when the receive buffer becomes full. >> >> If the driver eventually read from the receive buffer or reset the card the >> emulator could recover from this situation. However some implementations only >> do this upon receiving an interrupt with either RxOK or RxOverflow set in the >> ISR; interrupt that will never come because QEMU's flow control mechanisms would >> prevent rtl8139 from receiving any packet. >> >> Letting packets go through when the overflow interrupt is enabled makes the >> QEMU emulator compliant to the spec and solves the problem. >> >> This patch should fix a relatively common (in our experience) network stall >> observed when running enterprise distros with rtl8139 as the NIC; in some cases >> the 8139too device driver gets loaded and when under heavy load the network >> eventually stops working. > It would be great to see specific example to verify the issue. Sure. In the gdb debug session included inline below you can see that after a while RxBufPtr-RxBufAddr becomes too small to accommodate a MTU sized packet and, because of the current can_receive logic, rtl8139 will stop receiving packets from other QEMU VLAN clients. > Otherwise the change looks great. Could I have your "Acked-by"? By the way, whose tree should this patch go through? ----- gdb debug session (gdb) p *(RTL8139State *) 0x144143a0 $3 = {phys = "TR\000bS:\000", mult = "\000\000\000\200\000\000\004@", TxStatus = {565346, 565346, 565346, 565346}, TxAddr = {899350528, 899352064, 899353600, 899355136}, RxBuf = 894697472, RxBufferSize = 32768, RxBufPtr = 0, RxBufAddr = 0, IntrStatus = 0, IntrMask = 49279, TxConfig = 2004878976, RxConfig = 63374, RxMissed = 0, CSCR = 832, Cfg9346 = 0 '\0', Config0 = 0 '\0', Config1 = 12 '\f', Config3 = 1 '\001', Config4 = 0 '\0', Config5 = 0 '\0', clock_enabled = 1 '\001', bChipCmdState = 12 '\f', MultiIntr = 0, BasicModeCtrl = 4096, BasicModeStatus = 30765, NWayAdvert = 1505, NWayLPAR = 1505, NWayExpansion = 1, CpCmd = 0, TxThresh = 0 '\0', pci_dev = 0x14414150, vc = 0x14414550, macaddr = "TR\000bS:", rtl8139_mmio_io_addr = 104, currTxDesc = 0, cplus_enabled = 0, currCPlusRxDesc = 0, currCPlusTxDesc = 0, RxRingAddrLO = 0, RxRingAddrHI = 0, eeprom = {contents = {33065, 4332, 33081, 0, 0, 0, 0, 21076, 25088, 14931, 0 <repeats 54 times>}, mode = 1, tick = 0, address = 9 '\t', input = 0, output = 0, eecs = 1 '\001', eesk = 0 '\0', eedi = 0 '\0', eedo = 1 '\001'}, TCTR = 0, TimerInt = 0, TCTR_base = 0, tally_counters = {TxOk = 0, RxOk = 0, TxERR = 0, RxERR = 10, MissPkt = 0, FAE = 0, Tx1Col = 0, TxMCol = 0, RxOkPhy = 1302, RxOkBrd = 2, RxOkMul = 10, TxAbt = 0, TxUndrn = 0}, cplus_txbuffer = 0x0, cplus_txbuffer_len = 0, cplus_txbuffer_offset = 0, timer = 0x0} (gdb) c Continuing. [Thread 0x42c3a940 (LWP 2739) exited] [New Thread 0x42c3a940 (LWP 2791)] Program received signal SIGINT, Interrupt. 0x0000003a3a4cced2 in select () from /lib64/libc.so.6 (gdb) d Delete all breakpoints? (y or n) y (gdb) (gdb) c Continuing. Program received signal SIGINT, Interrupt. 0x0000003a3a4cced2 in select () from /lib64/libc.so.6 (gdb) p *(RTL8139State *) 0x144143a0 $4 = {phys = "TR\000bS:\000", mult = "\000\000\000\200\000\000\004@", TxStatus = {565308, 565346, 565346, 565346}, TxAddr = {899350528, 899352064, 899353600, 899355136}, RxBuf = 894697472, RxBufferSize = 32768, RxBufPtr = 22032, RxBufAddr = 20544, IntrStatus = 0, IntrMask = 49279, TxConfig = 2004878976, RxConfig = 63374, RxMissed = 0, CSCR = 832, Cfg9346 = 0 '\0', Config0 = 0 '\0', Config1 = 12 '\f', Config3 = 1 '\001', Config4 = 0 '\0', Config5 = 0 '\0', clock_enabled = 1 '\001', bChipCmdState = 12 '\f', MultiIntr = 0, BasicModeCtrl = 4096, BasicModeStatus = 30765, NWayAdvert = 1505, NWayLPAR = 1505, NWayExpansion = 1, CpCmd = 0, TxThresh = 0 '\0', pci_dev = 0x14414150, vc = 0x14414550, macaddr = "TR\000bS:", rtl8139_mmio_io_addr = 104, currTxDesc = 1, cplus_enabled = 0, currCPlusRxDesc = 0, currCPlusTxDesc = 0, RxRingAddrLO = 0, RxRingAddrHI = 0, eeprom = {contents = {33065, 4332, 33081, 0, 0, 0, 0, 21076, 25088, 14931, 0 <repeats 54 times>}, mode = 1, tick = 0, address = 9 '\t', input = 0, output = 0, eecs = 1 '\001', eesk = 0 '\0', eedi = 0 '\0', eedo = 1 '\001'}, TCTR = 0, TimerInt = 0, TCTR_base = 0, tally_counters = {TxOk = 0, RxOk = 0, TxERR = 0, RxERR = 10, MissPkt = 0, FAE = 0, Tx1Col = 0, TxMCol = 0, RxOkPhy = 10277, RxOkBrd = 3, RxOkMul = 10, TxAbt = 0, TxUndrn = 0}, cplus_txbuffer = 0x0, cplus_txbuffer_len = 0, cplus_txbuffer_offset = 0, timer = 0x0} ----- Thanks, Fernando
2012/1/30 Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>: > Some drivers (Linux' 8139too among them) rely on the NIC injecting an interrupt > in the event of a receive buffer overflow and, accordingly, set the RxOverflow > bit in the interrupt mask. Unfortunately rtl8139's can_receive method ignores > the RxOverflow flag, which may lead to a situation where rtl8139 stops receiving > packets (can_receive returns 0) when the receive buffer becomes full. > > If the driver eventually read from the receive buffer or reset the card the > emulator could recover from this situation. However some implementations only > do this upon receiving an interrupt with either RxOK or RxOverflow set in the > ISR; interrupt that will never come because QEMU's flow control mechanisms would > prevent rtl8139 from receiving any packet. > > Letting packets go through when the overflow interrupt is enabled makes the > QEMU emulator compliant to the spec and solves the problem. > > This patch should fix a relatively common (in our experience) network stall > observed when running enterprise distros with rtl8139 as the NIC; in some cases > the 8139too device driver gets loaded and when under heavy load the network > eventually stops working. > > Reported-by: Hayato Kakuta <kakuta.hayato@oss.ntt.co.jp> > Tested-by: Hayato Kakuta <kakuta.hayato@oss.ntt.co.jp> > Signed-off-by: Fernando Luis Vazquez Cao<fernando@oss.ntt.co.jp> > --- > > diff -urNp qemu-kvm-orig/hw/rtl8139.c qemu-kvm/hw/rtl8139.c > --- qemu-kvm-orig/hw/rtl8139.c 2012-01-12 20:55:27.000000000 +0900 > +++ qemu-kvm/hw/rtl8139.c 2012-01-18 17:20:12.000000000 +0900 > @@ -824,7 +824,7 @@ static int rtl8139_can_receive(VLANClien > } else { > avail = MOD2(s->RxBufferSize + s->RxBufPtr - s->RxBufAddr, > s->RxBufferSize); > - return (avail == 0 || avail >= 1514); > + return (avail == 0 || avail >= 1514 || (s->IntrMask & RxOverflow)); > } > } > > > Acked-by: Igor Kovalenko <igor.v.kovalenko@gmail.com>
(2012年01月31日 13:12), Igor Kovalenko wrote: > 2012/1/30 Fernando Luis Vázquez Cao<fernando@oss.ntt.co.jp>: >> Some drivers (Linux' 8139too among them) rely on the NIC injecting an interrupt >> in the event of a receive buffer overflow and, accordingly, set the RxOverflow >> bit in the interrupt mask. Unfortunately rtl8139's can_receive method ignores >> the RxOverflow flag, which may lead to a situation where rtl8139 stops receiving >> packets (can_receive returns 0) when the receive buffer becomes full. >> >> If the driver eventually read from the receive buffer or reset the card the >> emulator could recover from this situation. However some implementations only >> do this upon receiving an interrupt with either RxOK or RxOverflow set in the >> ISR; interrupt that will never come because QEMU's flow control mechanisms would >> prevent rtl8139 from receiving any packet. >> >> Letting packets go through when the overflow interrupt is enabled makes the >> QEMU emulator compliant to the spec and solves the problem. >> >> This patch should fix a relatively common (in our experience) network stall >> observed when running enterprise distros with rtl8139 as the NIC; in some cases >> the 8139too device driver gets loaded and when under heavy load the network >> eventually stops working. >> >> Reported-by: Hayato Kakuta<kakuta.hayato@oss.ntt.co.jp> >> Tested-by: Hayato Kakuta<kakuta.hayato@oss.ntt.co.jp> >> Signed-off-by: Fernando Luis Vazquez Cao<fernando@oss.ntt.co.jp> >> --- >> >> diff -urNp qemu-kvm-orig/hw/rtl8139.c qemu-kvm/hw/rtl8139.c >> --- qemu-kvm-orig/hw/rtl8139.c 2012-01-12 20:55:27.000000000 +0900 >> +++ qemu-kvm/hw/rtl8139.c 2012-01-18 17:20:12.000000000 +0900 >> @@ -824,7 +824,7 @@ static int rtl8139_can_receive(VLANClien >> } else { >> avail = MOD2(s->RxBufferSize + s->RxBufPtr - s->RxBufAddr, >> s->RxBufferSize); >> - return (avail == 0 || avail>= 1514); >> + return (avail == 0 || avail>= 1514 || (s->IntrMask& RxOverflow)); >> } >> } >> >> >> > Acked-by: Igor Kovalenko<igor.v.kovalenko@gmail.com> Thank you for the review and the ack, Igor. Anthony, could you pick up this patch? Regards, Fernando
On 01/29/2012 08:17 PM, Fernando Luis Vázquez Cao wrote: > Some drivers (Linux' 8139too among them) rely on the NIC injecting an interrupt > in the event of a receive buffer overflow and, accordingly, set the RxOverflow > bit in the interrupt mask. Unfortunately rtl8139's can_receive method ignores > the RxOverflow flag, which may lead to a situation where rtl8139 stops receiving > packets (can_receive returns 0) when the receive buffer becomes full. > > If the driver eventually read from the receive buffer or reset the card the > emulator could recover from this situation. However some implementations only > do this upon receiving an interrupt with either RxOK or RxOverflow set in the > ISR; interrupt that will never come because QEMU's flow control mechanisms would > prevent rtl8139 from receiving any packet. > > Letting packets go through when the overflow interrupt is enabled makes the > QEMU emulator compliant to the spec and solves the problem. > > This patch should fix a relatively common (in our experience) network stall > observed when running enterprise distros with rtl8139 as the NIC; in some cases > the 8139too device driver gets loaded and when under heavy load the network > eventually stops working. > > Reported-by: Hayato Kakuta<kakuta.hayato@oss.ntt.co.jp> > Tested-by: Hayato Kakuta<kakuta.hayato@oss.ntt.co.jp> > Signed-off-by: Fernando Luis Vazquez Cao<fernando@oss.ntt.co.jp> Applied. Thanks. Regards, Anthony Liguori > --- > > diff -urNp qemu-kvm-orig/hw/rtl8139.c qemu-kvm/hw/rtl8139.c > --- qemu-kvm-orig/hw/rtl8139.c 2012-01-12 20:55:27.000000000 +0900 > +++ qemu-kvm/hw/rtl8139.c 2012-01-18 17:20:12.000000000 +0900 > @@ -824,7 +824,7 @@ static int rtl8139_can_receive(VLANClien > } else { > avail = MOD2(s->RxBufferSize + s->RxBufPtr - s->RxBufAddr, > s->RxBufferSize); > - return (avail == 0 || avail>= 1514); > + return (avail == 0 || avail>= 1514 || (s->IntrMask& RxOverflow)); > } > } > > > > >
On 02/02/2012 07:11 AM, Anthony Liguori wrote: > On 01/29/2012 08:17 PM, Fernando Luis Vázquez Cao wrote: >> Some drivers (Linux' 8139too among them) rely on the NIC injecting an >> interrupt >> in the event of a receive buffer overflow and, accordingly, set the >> RxOverflow >> bit in the interrupt mask. Unfortunately rtl8139's can_receive method >> ignores >> the RxOverflow flag, which may lead to a situation where rtl8139 >> stops receiving >> packets (can_receive returns 0) when the receive buffer becomes full. >> >> If the driver eventually read from the receive buffer or reset the >> card the >> emulator could recover from this situation. However some >> implementations only >> do this upon receiving an interrupt with either RxOK or RxOverflow >> set in the >> ISR; interrupt that will never come because QEMU's flow control >> mechanisms would >> prevent rtl8139 from receiving any packet. >> >> Letting packets go through when the overflow interrupt is enabled >> makes the >> QEMU emulator compliant to the spec and solves the problem. >> >> This patch should fix a relatively common (in our experience) network >> stall >> observed when running enterprise distros with rtl8139 as the NIC; in >> some cases >> the 8139too device driver gets loaded and when under heavy load the >> network >> eventually stops working. >> >> Reported-by: Hayato Kakuta<kakuta.hayato@oss.ntt.co.jp> >> Tested-by: Hayato Kakuta<kakuta.hayato@oss.ntt.co.jp> >> Signed-off-by: Fernando Luis Vazquez Cao<fernando@oss.ntt.co.jp> > > Applied. Thanks. Hi Anthony, It seems that this patch did not make it into the git tree. Did you find any merge conflicts? Thanks, Fernando
On 02/03/2012 11:16 AM, Fernando Luis Vázquez Cao wrote: > On 02/02/2012 07:11 AM, Anthony Liguori wrote: >> On 01/29/2012 08:17 PM, Fernando Luis Vázquez Cao wrote: >>> Some drivers (Linux' 8139too among them) rely on the NIC injecting >>> an interrupt >>> in the event of a receive buffer overflow and, accordingly, set the >>> RxOverflow >>> bit in the interrupt mask. Unfortunately rtl8139's can_receive >>> method ignores >>> the RxOverflow flag, which may lead to a situation where rtl8139 >>> stops receiving >>> packets (can_receive returns 0) when the receive buffer becomes full. >>> >>> If the driver eventually read from the receive buffer or reset the >>> card the >>> emulator could recover from this situation. However some >>> implementations only >>> do this upon receiving an interrupt with either RxOK or RxOverflow >>> set in the >>> ISR; interrupt that will never come because QEMU's flow control >>> mechanisms would >>> prevent rtl8139 from receiving any packet. >>> >>> Letting packets go through when the overflow interrupt is enabled >>> makes the >>> QEMU emulator compliant to the spec and solves the problem. >>> >>> This patch should fix a relatively common (in our experience) >>> network stall >>> observed when running enterprise distros with rtl8139 as the NIC; in >>> some cases >>> the 8139too device driver gets loaded and when under heavy load the >>> network >>> eventually stops working. >>> >>> Reported-by: Hayato Kakuta<kakuta.hayato@oss.ntt.co.jp> >>> Tested-by: Hayato Kakuta<kakuta.hayato@oss.ntt.co.jp> >>> Signed-off-by: Fernando Luis Vazquez Cao<fernando@oss.ntt.co.jp> >> >> Applied. Thanks. > > Hi Anthony, > > It seems that this patch did not make it into the git tree. > Did you find any merge conflicts? ping
diff -urNp qemu-kvm-orig/hw/rtl8139.c qemu-kvm/hw/rtl8139.c --- qemu-kvm-orig/hw/rtl8139.c 2012-01-12 20:55:27.000000000 +0900 +++ qemu-kvm/hw/rtl8139.c 2012-01-18 17:20:12.000000000 +0900 @@ -824,7 +824,7 @@ static int rtl8139_can_receive(VLANClien } else { avail = MOD2(s->RxBufferSize + s->RxBufPtr - s->RxBufAddr, s->RxBufferSize); - return (avail == 0 || avail >= 1514); + return (avail == 0 || avail >= 1514 || (s->IntrMask & RxOverflow)); } }