diff mbox

Re: still having r8169 woes with XID 18000000

Message ID 4C09FF4B.4050601@iki.fi
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Timo Teras June 5, 2010, 7:39 a.m. UTC
On 06/04/2010 11:24 PM, Timo Teräs wrote:
> On 06/04/2010 08:31 PM, Timo Teräs wrote:
> r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> r8169 0000:00:09.0: PCI->APIC IRQ transform: INT A -> IRQ 18
> r8169 0000:00:09.0: no PCI Express capability
> eth0: RTL8169sc/8110sc at 0xf835c000, 00:30:18:a6:2b:6c, XID 18000000 IRQ 18
> r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> r8169 0000:00:0b.0: PCI->APIC IRQ transform: INT A -> IRQ 19
> r8169 0000:00:0b.0: no PCI Express capability
> eth1: RTL8169sc/8110sc at 0xf8360000, 00:30:18:a6:2b:6d, XID 18000000 IRQ 19
> r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> r8169 0000:00:0c.0: PCI->APIC IRQ transform: INT A -> IRQ 16
> r8169 0000:00:0c.0: no PCI Express capability
> eth2: RTL8169sc/8110sc at 0xf8364000, 00:30:18:a6:2b:6e, XID 18000000 IRQ 16
> r8169: mdio_write(f8364000, 0x00000003, 0000000a1) required 2000 cycles
> r8169: mdio_write(f8364000, 0x00000000, 000001000) required 2000 cycles
> r8169: mdio_write(f8364000, 0x00000000, 00000a0ff) required 2000 cycles
> r8169: mdio_write(f8364000, 0x00000014, 00000fb54) required 2000 cycles
> 
> And eth2 was not working. Reloading the module gave a lot of other
> mdio_write and mdio_read errors.
> 
> It seems to be pretty random when the errors occur, but that's the
> reason why the NIC stops working: mdio_write() fails (one or more times)
> at some crucial point of the board specific phy config code resulting in
> bad state.
> 
> Any ideas how to debug this further?

Ok, I compared Realtek's and the in-tree driver. The only essential
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller June 5, 2010, 9:02 a.m. UTC | #1
From: Timo Teräs <timo.teras@iki.fi>
Date: Sat, 05 Jun 2010 10:39:55 +0300

> Ok, I compared Realtek's and the in-tree driver. The only essential
> difference is that Realtek driver uses udelay(100) in mdio_write()'s
> busy polling loop where as the in-tree uses udelay(25). And that seems
> to be the magic difference! Using udelay(100) fixes this!
> 
> I'm guessing that the phy needs slight delay between consecutive
> mdio_write's even if it has advertised that the write has been
> completed. And yes, just adding a small delay in the end of mdio_write
> does seem to work too.
> 
> Francois, you think the below patch is ok? Should I send it as properly
> formatted commit?

Excellent detective work, Francois please review!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet June 5, 2010, 9:13 a.m. UTC | #2
Le samedi 05 juin 2010 à 10:39 +0300, Timo Teräs a écrit :

> Ok, I compared Realtek's and the in-tree driver. The only essential
> difference is that Realtek driver uses udelay(100) in mdio_write()'s
> busy polling loop where as the in-tree uses udelay(25). And that seems
> to be the magic difference! Using udelay(100) fixes this!
> 
> I'm guessing that the phy needs slight delay between consecutive
> mdio_write's even if it has advertised that the write has been
> completed. And yes, just adding a small delay in the end of mdio_write
> does seem to work too.
> 
> Francois, you think the below patch is ok? Should I send it as properly
> formatted commit?
> 
> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index 217e709..6db62bf 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -559,6 +559,7 @@ static void mdio_write(void __iomem *ioaddr,
>  			break;
>  		udelay(25);
>  	}
> +	udelay(25);
>  }
> 
>  static int mdio_read(void __iomem *ioaddr, int reg_addr)
> --

Sure this deserves an official patch with all prereqs, but please add a
comment in mdio_write() why this extra udelay(25) is needed, especially
since you say of udelay(100) 'fixes the bug'.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Timo Teras June 5, 2010, 9:21 a.m. UTC | #3
On 06/05/2010 12:13 PM, Eric Dumazet wrote:
> Le samedi 05 juin 2010 à 10:39 +0300, Timo Teräs a écrit :
>> Ok, I compared Realtek's and the in-tree driver. The only essential
>> difference is that Realtek driver uses udelay(100) in mdio_write()'s
>> busy polling loop where as the in-tree uses udelay(25). And that seems
>> to be the magic difference! Using udelay(100) fixes this!
>>
>> I'm guessing that the phy needs slight delay between consecutive
>> mdio_write's even if it has advertised that the write has been
>> completed. And yes, just adding a small delay in the end of mdio_write
>> does seem to work too.
>>
>> Francois, you think the below patch is ok? Should I send it as properly
>> formatted commit?
>>
>> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
>> index 217e709..6db62bf 100644
>> --- a/drivers/net/r8169.c
>> +++ b/drivers/net/r8169.c
>> @@ -559,6 +559,7 @@ static void mdio_write(void __iomem *ioaddr,
>>  			break;
>>  		udelay(25);
>>  	}
>> +	udelay(25);
>>  }
>>
>>  static int mdio_read(void __iomem *ioaddr, int reg_addr)
>> --
> 
> Sure this deserves an official patch with all prereqs, but please add a
> comment in mdio_write() why this extra udelay(25) is needed, especially
> since you say of udelay(100) 'fixes the bug'.

Uh, yes. The intention was to get feedback if this is the proper place
for the delay. I first thought that maybe we could just add the delay to
the phy config function writing the values in tight loop... but as it
appears even some other mdio_writes seem to fail, this seems to be the
logical and correct place.

Alternatively, I was thinking if someone had specs to check if they say
anything about delay needed between mdio writes.

Reason why adding new delay in the end of the function is better is
because using udelay(100) in the loop means that anything between
0..100us has elapsed after the "write complete" is acked; if very
unlucky the "write complete" happens just after our udelay has ended and
there would be no delay between next mdio write.

Adding the additional udelay(25) in the end guarantees small delay
between "write complete" ack and the next write. And yes, it needs a
comment.

I'll send a new patch shortly.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

difference is that Realtek driver uses udelay(100) in mdio_write()'s
busy polling loop where as the in-tree uses udelay(25). And that seems
to be the magic difference! Using udelay(100) fixes this!

I'm guessing that the phy needs slight delay between consecutive
mdio_write's even if it has advertised that the write has been
completed. And yes, just adding a small delay in the end of mdio_write
does seem to work too.

Francois, you think the below patch is ok? Should I send it as properly
formatted commit?

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 217e709..6db62bf 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -559,6 +559,7 @@  static void mdio_write(void __iomem *ioaddr,
 			break;
 		udelay(25);
 	}
+	udelay(25);
 }

 static int mdio_read(void __iomem *ioaddr, int reg_addr)