Message ID | 20130109105100.GA17137@redhat.com |
---|---|
State | New |
Headers | show |
On Wed, Jan 09, 2013 at 12:51:00PM +0200, Michael S. Tsirkin wrote: > Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968, > qemu started updating ICS register when interrupt > is sent, with the intent to match spec better > (guests do not actually read this register). > However, the function set_interrupt_cause where ICS > is updated is often called internally by > device emulation so reading it does not produce the last value > written by driver. Looking closer at the spec, > it documents ICS as write-only, so there's no need > to update it at all. I conclude that while harmless this line is useless > code so removing it is a bit cleaner than keeping it in. > > Tested with windows and linux guests. > > Cc: Bill Paul <wpaul@windriver.com> > Reported-by: Yan Vugenfirer <yan@daynix.com> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > --- > hw/e1000.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/hw/e1000.c b/hw/e1000.c > index 92fb00a..928d804 100644 > --- a/hw/e1000.c > +++ b/hw/e1000.c > @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val) > val |= E1000_ICR_INT_ASSERTED; > } > s->mac_reg[ICR] = val; > - s->mac_reg[ICS] = val; > qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0); > } Looks good according to datasheet but let's give Bill a chance to review this patch before merging. Stefan
"Michael S. Tsirkin" <mst@redhat.com> writes: > Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968, > qemu started updating ICS register when interrupt > is sent, with the intent to match spec better > (guests do not actually read this register). > However, the function set_interrupt_cause where ICS > is updated is often called internally by > device emulation so reading it does not produce the last value > written by driver. Looking closer at the spec, > it documents ICS as write-only, so there's no need > to update it at all. I conclude that while harmless this line is useless > code so removing it is a bit cleaner than keeping it in. > > Tested with windows and linux guests. > > Cc: Bill Paul <wpaul@windriver.com> > Reported-by: Yan Vugenfirer <yan@daynix.com> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Yeah, I'm a little confused as to why we would need to set the ICS register too. Unless Bill had a reason that I'm missing: Reviewed-by: Anthony Liguori <aliguori@us.ibm.com> Regards, Anthony Liguori > --- > hw/e1000.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/hw/e1000.c b/hw/e1000.c > index 92fb00a..928d804 100644 > --- a/hw/e1000.c > +++ b/hw/e1000.c > @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val) > val |= E1000_ICR_INT_ASSERTED; > } > s->mac_reg[ICR] = val; > - s->mac_reg[ICS] = val; > qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0); > } > > -- > MST
On 01/09/2013 06:51 PM, Michael S. Tsirkin wrote: > Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968, > qemu started updating ICS register when interrupt > is sent, with the intent to match spec better > (guests do not actually read this register). > However, the function set_interrupt_cause where ICS > is updated is often called internally by > device emulation so reading it does not produce the last value > written by driver. Looking closer at the spec, > it documents ICS as write-only, so there's no need > to update it at all. I conclude that while harmless this line is useless > code so removing it is a bit cleaner than keeping it in. > > Tested with windows and linux guests. > > Cc: Bill Paul <wpaul@windriver.com> > Reported-by: Yan Vugenfirer <yan@daynix.com> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > --- > hw/e1000.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/hw/e1000.c b/hw/e1000.c > index 92fb00a..928d804 100644 > --- a/hw/e1000.c > +++ b/hw/e1000.c > @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val) > val |= E1000_ICR_INT_ASSERTED; > } > s->mac_reg[ICR] = val; > - s->mac_reg[ICS] = val; > qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0); > } > If my memory is correct, though ICS is marked as read only in the spec, we do can read it when I'm examining a real e1000 card.
On Wed, Jan 09, 2013 at 11:28:29PM +0800, Jason Wang wrote: > On 01/09/2013 06:51 PM, Michael S. Tsirkin wrote: > > Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968, > > qemu started updating ICS register when interrupt > > is sent, with the intent to match spec better > > (guests do not actually read this register). > > However, the function set_interrupt_cause where ICS > > is updated is often called internally by > > device emulation so reading it does not produce the last value > > written by driver. Looking closer at the spec, > > it documents ICS as write-only, so there's no need > > to update it at all. I conclude that while harmless this line is useless > > code so removing it is a bit cleaner than keeping it in. > > > > Tested with windows and linux guests. > > > > Cc: Bill Paul <wpaul@windriver.com> > > Reported-by: Yan Vugenfirer <yan@daynix.com> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > --- > > hw/e1000.c | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/hw/e1000.c b/hw/e1000.c > > index 92fb00a..928d804 100644 > > --- a/hw/e1000.c > > +++ b/hw/e1000.c > > @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val) > > val |= E1000_ICR_INT_ASSERTED; > > } > > s->mac_reg[ICR] = val; > > - s->mac_reg[ICS] = val; > > qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0); > > } > > > > If my memory is correct, though ICS is marked as read only in the spec, > we do can read it when I'm examining a real e1000 card. Interesting, this was not Bill's motivation. I haven't seen any reads with linux or windows guests - which guest did trigger them for you? Also, what's the value one would expect?
On 01/09/2013 11:34 PM, Michael S. Tsirkin wrote: > On Wed, Jan 09, 2013 at 11:28:29PM +0800, Jason Wang wrote: >> On 01/09/2013 06:51 PM, Michael S. Tsirkin wrote: >>> Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968, >>> qemu started updating ICS register when interrupt >>> is sent, with the intent to match spec better >>> (guests do not actually read this register). >>> However, the function set_interrupt_cause where ICS >>> is updated is often called internally by >>> device emulation so reading it does not produce the last value >>> written by driver. Looking closer at the spec, >>> it documents ICS as write-only, so there's no need >>> to update it at all. I conclude that while harmless this line is useless >>> code so removing it is a bit cleaner than keeping it in. >>> >>> Tested with windows and linux guests. >>> >>> Cc: Bill Paul <wpaul@windriver.com> >>> Reported-by: Yan Vugenfirer <yan@daynix.com> >>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> >>> --- >>> hw/e1000.c | 1 - >>> 1 file changed, 1 deletion(-) >>> >>> diff --git a/hw/e1000.c b/hw/e1000.c >>> index 92fb00a..928d804 100644 >>> --- a/hw/e1000.c >>> +++ b/hw/e1000.c >>> @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val) >>> val |= E1000_ICR_INT_ASSERTED; >>> } >>> s->mac_reg[ICR] = val; >>> - s->mac_reg[ICS] = val; >>> qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0); >>> } >>> >> If my memory is correct, though ICS is marked as read only in the spec, >> we do can read it when I'm examining a real e1000 card. > Interesting, this was not Bill's motivation. > I haven't seen any reads with linux or windows guests - > which guest did trigger them for you? > Also, what's the value one would expect? > I also find this violation of spec in the past, so I just hack the linux driver to see the result. I'm not sure if there are some driver depends on this value. I forget the detail of what value it returns, may worth to try again to see.
On Wed, Jan 09, 2013 at 11:36:50PM +0800, Jason Wang wrote: > On 01/09/2013 11:34 PM, Michael S. Tsirkin wrote: > > On Wed, Jan 09, 2013 at 11:28:29PM +0800, Jason Wang wrote: > >> On 01/09/2013 06:51 PM, Michael S. Tsirkin wrote: > >>> Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968, > >>> qemu started updating ICS register when interrupt > >>> is sent, with the intent to match spec better > >>> (guests do not actually read this register). > >>> However, the function set_interrupt_cause where ICS > >>> is updated is often called internally by > >>> device emulation so reading it does not produce the last value > >>> written by driver. Looking closer at the spec, > >>> it documents ICS as write-only, so there's no need > >>> to update it at all. I conclude that while harmless this line is useless > >>> code so removing it is a bit cleaner than keeping it in. > >>> > >>> Tested with windows and linux guests. > >>> > >>> Cc: Bill Paul <wpaul@windriver.com> > >>> Reported-by: Yan Vugenfirer <yan@daynix.com> > >>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > >>> --- > >>> hw/e1000.c | 1 - > >>> 1 file changed, 1 deletion(-) > >>> > >>> diff --git a/hw/e1000.c b/hw/e1000.c > >>> index 92fb00a..928d804 100644 > >>> --- a/hw/e1000.c > >>> +++ b/hw/e1000.c > >>> @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val) > >>> val |= E1000_ICR_INT_ASSERTED; > >>> } > >>> s->mac_reg[ICR] = val; > >>> - s->mac_reg[ICS] = val; > >>> qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0); > >>> } > >>> > >> If my memory is correct, though ICS is marked as read only in the spec, > >> we do can read it when I'm examining a real e1000 card. > > Interesting, this was not Bill's motivation. > > I haven't seen any reads with linux or windows guests - > > which guest did trigger them for you? > > Also, what's the value one would expect? > > > > I also find this violation of spec in the past, so I just hack the linux > driver to see the result. I'm not sure if there are some driver depends > on this value. I forget the detail of what value it returns, may worth > to try again to see. What Bill wrote in the commit log is that he didn't see any guest reading this.
Of all the gin joints in all the towns in all the world, Michael S. Tsirkin had to walk into mine at 02:51:00 on Wednesday 09 January 2013 and say: > Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968, > qemu started updating ICS register when interrupt > is sent, with the intent to match spec better > (guests do not actually read this register). > However, the function set_interrupt_cause where ICS > is updated is often called internally by > device emulation so reading it does not produce the last value > written by driver. Looking closer at the spec, > it documents ICS as write-only, so there's no need > to update it at all. I conclude that while harmless this line is useless > code so removing it is a bit cleaner than keeping it in. You are wrong. I know what the spec says. The spec lies (or at the very least, it doesn't agree with reality). With actual PRO/1000 hardware, the ICS register is readable. It provides a way to obtain the currently pending interrupt bits without the auto-clear behavior of the ICR register (which in some cases is actually detrimental). The Intel 10GbE NICs (82598, 82599, x540) are the same way (they're very similar in design to the PRO/1000s, particularly the 82575, 82576, 82580 and later devices). The spes for the 10GbE devices _also_ say that ICS is read only. But if you look at the Intel drivers for those chips, you'll see they have actually implemented a workaround for a device errata (I think the for the 82598) that actually requires reading the ICS register. If you had implemented a PRO/10GbE virtual device based on the same chip and obeyed the spec the same way, I suspect Intel's driver would break. I actually have in my possession real PRO/1000 silicon going all the way back to the 82543 and have tested every single one of them, and they all work like this. I've also asked the Intel LAN access division people about it and they said that in spite of it not being documented as readable, there's nothing particularly wrong with doing this. The problem with using ICR is that the auto-clear behavior can sometimes result in some awkward interrupt handling code. You need to test ICR in interrupt context to see if there are events pending, and then schedule some other thread to handle them. But since reading ICR clears the bits, you need to save the events somewhere so that the other thread knows what events need servicing. Keeping the saved copy of pending events properly synchronized so that you don't lose any is actually big challenge. The VxWorks PRO/1000 driver used to have some very ugly code in it to deal with it, all of which became much simpler when using the ICS register instead. I know what the spec says. But this is a case where the spec only says that because Intel decided not to disclose what the hardware actually does. They also don't tell you about all the hidden debug registers in the hardware either, but that doesn't mean they don't exist. So pretty please, with sugar on top, leave this code alone. -Bill > Tested with windows and linux guests. > > Cc: Bill Paul <wpaul@windriver.com> > Reported-by: Yan Vugenfirer <yan@daynix.com> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > --- > hw/e1000.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/hw/e1000.c b/hw/e1000.c > index 92fb00a..928d804 100644 > --- a/hw/e1000.c > +++ b/hw/e1000.c > @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t > val) val |= E1000_ICR_INT_ASSERTED; > } > s->mac_reg[ICR] = val; > - s->mac_reg[ICS] = val; > qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0); > }
On Wed, Jan 09, 2013 at 09:30:43AM -0800, Bill Paul wrote: > Of all the gin joints in all the towns in all the world, Michael S. Tsirkin > had to walk into mine at 02:51:00 on Wednesday 09 January 2013 and say: > > > Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968, > > qemu started updating ICS register when interrupt > > is sent, with the intent to match spec better > > (guests do not actually read this register). > > However, the function set_interrupt_cause where ICS > > is updated is often called internally by > > device emulation so reading it does not produce the last value > > written by driver. Looking closer at the spec, > > it documents ICS as write-only, so there's no need > > to update it at all. I conclude that while harmless this line is useless > > code so removing it is a bit cleaner than keeping it in. > > You are wrong. > > I know what the spec says. The spec lies (or at the very least, it doesn't > agree with reality). With actual PRO/1000 hardware, the ICS register is > readable. It provides a way to obtain the currently pending interrupt bits > without the auto-clear behavior of the ICR register (which in some cases is > actually detrimental). > > The Intel 10GbE NICs (82598, 82599, x540) are the same way (they're very > similar in design to the PRO/1000s, particularly the 82575, 82576, 82580 and > later devices). The spes for the 10GbE devices _also_ say that ICS is read > only. But if you look at the Intel drivers for those chips, you'll see they > have actually implemented a workaround for a device errata (I think the for > the 82598) that actually requires reading the ICS register. If you had > implemented a PRO/10GbE virtual device based on the same chip and obeyed the > spec the same way, I suspect Intel's driver would break. > > I actually have in my possession real PRO/1000 silicon going all the way back > to the 82543 and have tested every single one of them, and they all work like > this. I've also asked the Intel LAN access division people about it and they > said that in spite of it not being documented as readable, there's nothing > particularly wrong with doing this. > > The problem with using ICR is that the auto-clear behavior can sometimes > result in some awkward interrupt handling code. You need to test ICR in > interrupt context to see if there are events pending, and then schedule some > other thread to handle them. But since reading ICR clears the bits, you need > to save the events somewhere so that the other thread knows what events need > servicing. Keeping the saved copy of pending events properly synchronized so > that you don't lose any is actually big challenge. The VxWorks PRO/1000 driver > used to have some very ugly code in it to deal with it, all of which became > much simpler when using the ICS register instead. > > I know what the spec says. But this is a case where the spec only says that > because Intel decided not to disclose what the hardware actually does. They > also don't tell you about all the hidden debug registers in the hardware > either, but that doesn't mean they don't exist. > > So pretty please, with sugar on top, leave this code alone. > > -Bill OKay, so it's an undocumented feature that some drivers use, instead of (as commit log implies) being a documented feature that drivers don't use. Thanks for the clarification. I will put info in code comments. > > Tested with windows and linux guests. > > > > Cc: Bill Paul <wpaul@windriver.com> > > Reported-by: Yan Vugenfirer <yan@daynix.com> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > --- > > hw/e1000.c | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/hw/e1000.c b/hw/e1000.c > > index 92fb00a..928d804 100644 > > --- a/hw/e1000.c > > +++ b/hw/e1000.c > > @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t > > val) val |= E1000_ICR_INT_ASSERTED; > > } > > s->mac_reg[ICR] = val; > > - s->mac_reg[ICS] = val; > > qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0); > > } > > -- > ============================================================================= > -Bill Paul (510) 749-2329 | Member of Technical Staff, > wpaul@windriver.com | Master of Unix-Fu - Wind River Systems > ============================================================================= > "I put a dollar in a change machine. Nothing changed." - George Carlin > =============================================================================
Bill Paul <wpaul@windriver.com> writes: > Of all the gin joints in all the towns in all the world, Michael S. Tsirkin > had to walk into mine at 02:51:00 on Wednesday 09 January 2013 and say: > >> Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968, >> qemu started updating ICS register when interrupt >> is sent, with the intent to match spec better >> (guests do not actually read this register). >> However, the function set_interrupt_cause where ICS >> is updated is often called internally by >> device emulation so reading it does not produce the last value >> written by driver. Looking closer at the spec, >> it documents ICS as write-only, so there's no need >> to update it at all. I conclude that while harmless this line is useless >> code so removing it is a bit cleaner than keeping it in. > > You are wrong. > > I know what the spec says. The spec lies (or at the very least, it doesn't > agree with reality). With actual PRO/1000 hardware, the ICS register is > readable. It provides a way to obtain the currently pending interrupt bits > without the auto-clear behavior of the ICR register (which in some cases is > actually detrimental). > > The Intel 10GbE NICs (82598, 82599, x540) are the same way (they're very > similar in design to the PRO/1000s, particularly the 82575, 82576, 82580 and > later devices). The spes for the 10GbE devices _also_ say that ICS is read > only. But if you look at the Intel drivers for those chips, you'll see they > have actually implemented a workaround for a device errata (I think the for > the 82598) that actually requires reading the ICS register. If you had > implemented a PRO/10GbE virtual device based on the same chip and obeyed the > spec the same way, I suspect Intel's driver would break. > > I actually have in my possession real PRO/1000 silicon going all the way back > to the 82543 and have tested every single one of them, and they all work like > this. I've also asked the Intel LAN access division people about it and they > said that in spite of it not being documented as readable, there's nothing > particularly wrong with doing this. > > The problem with using ICR is that the auto-clear behavior can sometimes > result in some awkward interrupt handling code. You need to test ICR in > interrupt context to see if there are events pending, and then schedule some > other thread to handle them. But since reading ICR clears the bits, you need > to save the events somewhere so that the other thread knows what events need > servicing. Keeping the saved copy of pending events properly synchronized so > that you don't lose any is actually big challenge. The VxWorks PRO/1000 driver > used to have some very ugly code in it to deal with it, all of which became > much simpler when using the ICS register instead. > > I know what the spec says. But this is a case where the spec only says that > because Intel decided not to disclose what the hardware actually does. They > also don't tell you about all the hidden debug registers in the hardware > either, but that doesn't mean they don't exist. > > So pretty please, with sugar on top, leave this code alone. I figured this would be the case but this is where a comment in the code or commit message would have helped a lot in avoiding future confusion. Regards, Anthony Liguori > > -Bill > >> Tested with windows and linux guests. >> >> Cc: Bill Paul <wpaul@windriver.com> >> Reported-by: Yan Vugenfirer <yan@daynix.com> >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> >> --- >> hw/e1000.c | 1 - >> 1 file changed, 1 deletion(-) >> >> diff --git a/hw/e1000.c b/hw/e1000.c >> index 92fb00a..928d804 100644 >> --- a/hw/e1000.c >> +++ b/hw/e1000.c >> @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t >> val) val |= E1000_ICR_INT_ASSERTED; >> } >> s->mac_reg[ICR] = val; >> - s->mac_reg[ICS] = val; >> qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0); >> } > > -- > ============================================================================= > -Bill Paul (510) 749-2329 | Member of Technical Staff, > wpaul@windriver.com | Master of Unix-Fu - Wind River Systems > ============================================================================= > "I put a dollar in a change machine. Nothing changed." - George Carlin > =============================================================================
On Wed, Jan 09, 2013 at 09:30:43AM -0800, Bill Paul wrote: > Of all the gin joints in all the towns in all the world, Michael S. Tsirkin > had to walk into mine at 02:51:00 on Wednesday 09 January 2013 and say: > > > Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968, > > qemu started updating ICS register when interrupt > > is sent, with the intent to match spec better > > (guests do not actually read this register). > > However, the function set_interrupt_cause where ICS > > is updated is often called internally by > > device emulation so reading it does not produce the last value > > written by driver. Looking closer at the spec, > > it documents ICS as write-only, so there's no need > > to update it at all. I conclude that while harmless this line is useless > > code so removing it is a bit cleaner than keeping it in. > > You are wrong. > > I know what the spec says. The spec lies (or at the very least, it doesn't > agree with reality). With actual PRO/1000 hardware, the ICS register is > readable. It provides a way to obtain the currently pending interrupt bits > without the auto-clear behavior of the ICR register (which in some cases is > actually detrimental). > > The Intel 10GbE NICs (82598, 82599, x540) are the same way (they're very > similar in design to the PRO/1000s, particularly the 82575, 82576, 82580 and > later devices). The spes for the 10GbE devices _also_ say that ICS is read > only. But if you look at the Intel drivers for those chips, you'll see they > have actually implemented a workaround for a device errata (I think the for > the 82598) that actually requires reading the ICS register. If you had > implemented a PRO/10GbE virtual device based on the same chip and obeyed the > spec the same way, I suspect Intel's driver would break. > > I actually have in my possession real PRO/1000 silicon going all the way back > to the 82543 and have tested every single one of them, and they all work like > this. I've also asked the Intel LAN access division people about it and they > said that in spite of it not being documented as readable, there's nothing > particularly wrong with doing this. > > The problem with using ICR is that the auto-clear behavior can sometimes > result in some awkward interrupt handling code. You need to test ICR in > interrupt context to see if there are events pending, and then schedule some > other thread to handle them. But since reading ICR clears the bits, you need > to save the events somewhere so that the other thread knows what events need > servicing. Keeping the saved copy of pending events properly synchronized so > that you don't lose any is actually big challenge. The VxWorks PRO/1000 driver > used to have some very ugly code in it to deal with it, all of which became > much simpler when using the ICS register instead. > > I know what the spec says. But this is a case where the spec only says that > because Intel decided not to disclose what the hardware actually does. They > also don't tell you about all the hidden debug registers in the hardware > either, but that doesn't mean they don't exist. > > So pretty please, with sugar on top, leave this code alone. > > -Bill OK now since there's no spec, I'd like to find out how does the register behave. Let's assume I read ICR. This clears ICR - does it also clear ICS? Thanks, MST > > Tested with windows and linux guests. > > > > Cc: Bill Paul <wpaul@windriver.com> > > Reported-by: Yan Vugenfirer <yan@daynix.com> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > --- > > hw/e1000.c | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/hw/e1000.c b/hw/e1000.c > > index 92fb00a..928d804 100644 > > --- a/hw/e1000.c > > +++ b/hw/e1000.c > > @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t > > val) val |= E1000_ICR_INT_ASSERTED; > > } > > s->mac_reg[ICR] = val; > > - s->mac_reg[ICS] = val; > > qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0); > > } > > -- > ============================================================================= > -Bill Paul (510) 749-2329 | Member of Technical Staff, > wpaul@windriver.com | Master of Unix-Fu - Wind River Systems > ============================================================================= > "I put a dollar in a change machine. Nothing changed." - George Carlin > =============================================================================
Of all the gin joints in all the towns in all the world, Michael S. Tsirkin had to walk into mine at 13:44:38 on Wednesday 09 January 2013 and say: > On Wed, Jan 09, 2013 at 09:30:43AM -0800, Bill Paul wrote: > > Of all the gin joints in all the towns in all the world, Michael S. > > Tsirkin > > > > had to walk into mine at 02:51:00 on Wednesday 09 January 2013 and say: > > > Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968, > > > qemu started updating ICS register when interrupt > > > is sent, with the intent to match spec better > > > (guests do not actually read this register). > > > However, the function set_interrupt_cause where ICS > > > is updated is often called internally by > > > device emulation so reading it does not produce the last value > > > written by driver. Looking closer at the spec, > > > it documents ICS as write-only, so there's no need > > > to update it at all. I conclude that while harmless this line is > > > useless code so removing it is a bit cleaner than keeping it in. > > > > You are wrong. > > > > I know what the spec says. The spec lies (or at the very least, it > > doesn't agree with reality). With actual PRO/1000 hardware, the ICS > > register is readable. It provides a way to obtain the currently pending > > interrupt bits without the auto-clear behavior of the ICR register > > (which in some cases is actually detrimental). > > > > The Intel 10GbE NICs (82598, 82599, x540) are the same way (they're very > > similar in design to the PRO/1000s, particularly the 82575, 82576, 82580 > > and later devices). The spes for the 10GbE devices _also_ say that ICS > > is read only. But if you look at the Intel drivers for those chips, > > you'll see they have actually implemented a workaround for a device > > errata (I think the for the 82598) that actually requires reading the > > ICS register. If you had implemented a PRO/10GbE virtual device based on > > the same chip and obeyed the spec the same way, I suspect Intel's driver > > would break. > > > > I actually have in my possession real PRO/1000 silicon going all the way > > back to the 82543 and have tested every single one of them, and they all > > work like this. I've also asked the Intel LAN access division people > > about it and they said that in spite of it not being documented as > > readable, there's nothing particularly wrong with doing this. > > > > The problem with using ICR is that the auto-clear behavior can sometimes > > result in some awkward interrupt handling code. You need to test ICR in > > interrupt context to see if there are events pending, and then schedule > > some other thread to handle them. But since reading ICR clears the bits, > > you need to save the events somewhere so that the other thread knows > > what events need servicing. Keeping the saved copy of pending events > > properly synchronized so that you don't lose any is actually big > > challenge. The VxWorks PRO/1000 driver used to have some very ugly code > > in it to deal with it, all of which became much simpler when using the > > ICS register instead. > > > > I know what the spec says. But this is a case where the spec only says > > that because Intel decided not to disclose what the hardware actually > > does. They also don't tell you about all the hidden debug registers in > > the hardware either, but that doesn't mean they don't exist. > > > > So pretty please, with sugar on top, leave this code alone. > > > > -Bill > > OK now since there's no spec, I'd like to find out how does the > register behave. Let's assume I read ICR. This clears ICR - does it > also clear ICS? Yes. ICS mirrors ICR: reading ICS lets you peek at the contents of ICR without auto-clearing them. If you read ICR, you're acknowledging any events that may be pending. Let's say the LSC (link state change) becomes set, because you unplugged the cable. The LSC bit becomes set in both ICR and in ICS. If you read ICS first, then you'll see the LSC bit is set, and it'll stay set (the event stays unacked). If you read ICR first, then you'll see the LSC bit is set, and you'll clear it (the event is acked). If you read ICS _after_ you read ICR, then LSC will be clear (because reading ICR first acked it). -Bill > Thanks, > MST > > > > Tested with windows and linux guests. > > > > > > Cc: Bill Paul <wpaul@windriver.com> > > > Reported-by: Yan Vugenfirer <yan@daynix.com> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > > --- > > > > > > hw/e1000.c | 1 - > > > 1 file changed, 1 deletion(-) > > > > > > diff --git a/hw/e1000.c b/hw/e1000.c > > > index 92fb00a..928d804 100644 > > > --- a/hw/e1000.c > > > +++ b/hw/e1000.c > > > @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, > > > uint32_t val) val |= E1000_ICR_INT_ASSERTED; > > > > > > } > > > s->mac_reg[ICR] = val; > > > > > > - s->mac_reg[ICS] = val; > > > > > > qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != > > > 0); > > > > > > }
On Wed, Jan 09, 2013 at 01:50:39PM -0600, Anthony Liguori wrote: > I figured this would be the case but this is where a comment in the code > or commit message would have helped a lot in avoiding future confusion. > > Regards, > > Anthony Liguori I just sent a documentation patch, please take a look. Thanks!
Of all the gin joints in all the towns in all the world, Michael S. Tsirkin had to walk into mine at 14:07:53 on Wednesday 09 January 2013 and say: > On Wed, Jan 09, 2013 at 01:50:39PM -0600, Anthony Liguori wrote: > > I figured this would be the case but this is where a comment in the code > > or commit message would have helped a lot in avoiding future confusion. > > > > Regards, > > > > Anthony Liguori > > I just sent a documentation patch, please take a look. > Thanks! Well, strictly speaking, VxWorks doesn't rely on the ICS behavior anymore, but it prefers it. I ran into problems with VMware and Simics too, as they also don't emulate the ICS behavior, and getting that fixed was more of an uphill battle. So as a compromise I added a check to the driver to see if the ICS register has 'real hardware behavior' or 'emulated behavior' and made it use an alternate interrupt handling scheme in the emulated case. The alternate scheme is a little less efficient than the ICS scheme in some cases (mainly when the PRO/1000 device's interrupt is shared), but it still handles interrupts reliably. That's a minor nit though. I'm fine with the comments as is. I just didn't want you do think VxWorks was completely brain damaged. :) Note that I don't think VMware emulates the 'flexible RX mode' descriptor mechanism for the PRO/100 either, because the non-NDAed PRO/100 manual doesn't bother to mention it exists. I think this is something that was fixed in QEMU but I haven't had a chance to test it in a while. There's a similar problem with simulated the AMD PCnet/PCI devices (they only support the 'configuration block' setup method -- for the older non-PCI LANCE chips that was the only way to configure them but PCI devices starting with the am97c970 can be configured just by setting up registers). Honestly I'm surprised I still have all my hair and that it's still the same color. -Bill
On Wed, Jan 09, 2013 at 02:21:50PM -0800, Bill Paul wrote: > Of all the gin joints in all the towns in all the world, Michael S. Tsirkin > had to walk into mine at 14:07:53 on Wednesday 09 January 2013 and say: > > > On Wed, Jan 09, 2013 at 01:50:39PM -0600, Anthony Liguori wrote: > > > I figured this would be the case but this is where a comment in the code > > > or commit message would have helped a lot in avoiding future confusion. > > > > > > Regards, > > > > > > Anthony Liguori > > > > I just sent a documentation patch, please take a look. > > Thanks! > > Well, strictly speaking, VxWorks doesn't rely on the ICS behavior anymore, but > it prefers it. I ran into problems with VMware and Simics too, as they also > don't emulate the ICS behavior, and getting that fixed was more of an uphill > battle. So as a compromise I added a check to the driver to see if the ICS > register has 'real hardware behavior' or 'emulated behavior' and made it use > an alternate interrupt handling scheme in the emulated case. The alternate > scheme is a little less efficient than the ICS scheme in some cases (mainly > when the PRO/1000 device's interrupt is shared), but it still handles > interrupts reliably. > > That's a minor nit though. I'm fine with the comments as is. I just didn't > want you do think VxWorks was completely brain damaged. :) I guess I'll change it to "The VxWorks PRO/1000 driver uses this behaviour." > Note that I don't think VMware emulates the 'flexible RX mode' descriptor > mechanism for the PRO/100 either, because the non-NDAed PRO/100 manual doesn't > bother to mention it exists. I think this is something that was fixed in QEMU > but I haven't had a chance to test it in a while. There's a similar problem > with simulated the AMD PCnet/PCI devices (they only support the 'configuration > block' setup method -- for the older non-PCI LANCE chips that was the only way > to configure them but PCI devices starting with the am97c970 can be configured > just by setting up registers). > > Honestly I'm surprised I still have all my hair and that it's still the same > color. > > -Bill Yes virtio is much easier. All bugs there are our own. > -- > ============================================================================= > -Bill Paul (510) 749-2329 | Member of Technical Staff, > wpaul@windriver.com | Master of Unix-Fu - Wind River Systems > ============================================================================= > "I put a dollar in a change machine. Nothing changed." - George Carlin > =============================================================================
diff --git a/hw/e1000.c b/hw/e1000.c index 92fb00a..928d804 100644 --- a/hw/e1000.c +++ b/hw/e1000.c @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val) val |= E1000_ICR_INT_ASSERTED; } s->mac_reg[ICR] = val; - s->mac_reg[ICS] = val; qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0); }
Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968, qemu started updating ICS register when interrupt is sent, with the intent to match spec better (guests do not actually read this register). However, the function set_interrupt_cause where ICS is updated is often called internally by device emulation so reading it does not produce the last value written by driver. Looking closer at the spec, it documents ICS as write-only, so there's no need to update it at all. I conclude that while harmless this line is useless code so removing it is a bit cleaner than keeping it in. Tested with windows and linux guests. Cc: Bill Paul <wpaul@windriver.com> Reported-by: Yan Vugenfirer <yan@daynix.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> --- hw/e1000.c | 1 - 1 file changed, 1 deletion(-)