
[OpenWrt-Devel,1/2] ar71xx: fix ethernet packet loss issues on OM5P-AN

Message ID 2034204.MktLeOeYPv@bentobox
State Not Applicable

Commit Message

Sven Eckelmann April 10, 2015, 3:58 p.m. UTC
On Friday 10 April 2015 17:39:04 Daniel Golle wrote:
> Hi!
> 
> Sorry for hijacking this thread...
> I observed some recent ubiquiti gear suffering from similar issues,
> suspected the hardware being broken or some EE functionality missing in
> the driver as the issue usually occurs after some time of inactivity...
> My logs repeatedly get filled with stuff like that:
> Fri Apr 10 17:31:07 2015 kern.info kernel: [75029.020000] eth0: tx timeout
> Fri Apr 10 17:31:07 2015 kern.info kernel: [75029.020000] eth0: link down
> Fri Apr 10 17:31:08 2015 kern.info kernel: [75029.850000] eth0: link up (100Mbps/Full duplex)
> 
> Observed on ubnt nanostation loco m5 xw and recent bullet m5 with phy
> having phy_id 0x004dd041
> 
> Could that be related?

I don't know the nanostation loco m5 xw. But this is at least not what I see
here. The link would be stable (at least the PHY thinks that it is stable) but
no frames/not all frames are transferred from the device to its link-partner.

I had problems in the past with some optimizations from Felix which caused
situations like that. For some reason the device was not correctly reset on
errors. This looked like it was caused by his reset optimizations. If this
device uses the ag71xx driver, is an ar724x device and you suspect that this is
a partial reset problem then you may try something like this



Of course, this is only a test and may only be useful when you can reproduce
the problem. It could easily be something else entirely, but that is hard to
tell without knowing the hardware or having seen the problem before. :)

Kind regards,
	Sven

Comments

Daniel Golle April 10, 2015, 5:06 p.m. UTC | #1
On Fri, Apr 10, 2015 at 05:58:04PM +0200, Sven Eckelmann wrote:
> I don't know the nanostation loco m5 xw. But this is at least not what I see
> here. The link would be stable (at least the PHY thinks that it is stable) but
> no frames/not all frames are transferred from the device to its link-partner.
Ok, that's really a different thing then (-> changing the subject of
the thread)

> I had problems in the past with some optimizations from Felix which caused
> situations like that. For some reason the device was not correctly reset on
> errors. This looked like it was caused by his reset optimizations. If this
> device uses the ag71xx driver, is an ar724x device and you suspect that this is
> a partial reset problem then you may try something like this

I flashed the affected device (turns out to be a non-XW nanostation)
with your testing patch applied. I'll see how it goes, I'll let you
know in the next days if the ethernet link is more stable now.


Cheers


Daniel
Daniel Golle April 11, 2015, 10:25 a.m. UTC | #2
On Fri, Apr 10, 2015 at 07:06:24PM +0200, Daniel Golle wrote:
> On Fri, Apr 10, 2015 at 05:58:04PM +0200, Sven Eckelmann wrote:
> > I had problems in the past with some optimizations from Felix which caused
> > situations like that. For some reason the device was not correctly reset on
> > errors. This looked like it was caused by his reset optimizations. If this
> > device uses the ag71xx driver, is an ar724x device and you suspect that this is
> > a partial reset problem then you may try something like this
> 
> I flashed the affected device (turns out to be a non-XW nanostation)
> with your testing patch applied. I'll see how it goes, I'll let you
> know in the next days if the ethernet link is more stable now.

Result: the problem persists even with your patch applied...
Conor O'Gorman April 13, 2015, 8:15 a.m. UTC | #3
On 11/04/15 11:25, Daniel Golle wrote:
> On Fri, Apr 10, 2015 at 07:06:24PM +0200, Daniel Golle wrote:
>> On Fri, Apr 10, 2015 at 05:58:04PM +0200, Sven Eckelmann wrote:
>>> I had problems in the past with some optimizations from Felix which caused
>>> situations like that. For some reason the device was not correctly reset on
>>> errors. This looked like it was caused by his reset optimizations. If this
>>> device uses the ag71xx driver, is an ar724x device and you suspect that this is
>>> a partial reset problem then you may try something like this
>>
>> I flashed the affected device (turns out to be a non-XW nanostation)
>> with your testing patch applied. I'll see how it goes, I'll let you
>> know in the next days if the ethernet link is more stable now.
>
> Result: the problem persists even with your patch applied...
>
I am seeing trouble like this on an OM2P. I suspect it is due to pause 
frames.

Patch

--- a/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c
+++ b/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c
@@ -865,12 +865,6 @@  static void ag71xx_restart_work_func(struct work_struct *work)
 {
 	struct ag71xx *ag = container_of(work, struct ag71xx, restart_work);
 
-	if (ag71xx_get_pdata(ag)->is_ar724x) {
-		ag->link = 0;
-		ag71xx_link_adjust(ag);
-		return;
-	}
-
 	ag71xx_stop(ag->dev);
 	ag71xx_open(ag->dev);
 }
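
For readers who want to see the control-flow change in isolation, here is a minimal userspace sketch of the two restart paths. It is only a model under stated assumptions: `struct model_dev`, the `model_*` helpers, and the counters are hypothetical stand-ins invented for illustration, not the real ag71xx structures or functions, and nothing here touches hardware. It merely shows that the removed branch made ar724x devices skip the full stop/open cycle, while the test patch sends every device through it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are NOT the real ag71xx structures. */
struct model_dev {
	bool is_ar724x;   /* stands in for ag71xx_get_pdata(ag)->is_ar724x */
	int link;         /* stands in for ag->link */
	int link_adjusts; /* how often the link-adjust-only path ran */
	int stops;        /* how often the (modelled) stop step ran */
	int opens;        /* how often the (modelled) open step ran */
};

/* Counting stubs; the real functions program the MAC and DMA rings. */
static void model_link_adjust(struct model_dev *d) { d->link_adjusts++; }
static void model_stop(struct model_dev *d) { d->stops++; }
static void model_open(struct model_dev *d) { d->opens++; }

/* Control flow before the test patch: ar724x takes the shortcut. */
static void restart_before(struct model_dev *d)
{
	if (d->is_ar724x) {
		d->link = 0;
		model_link_adjust(d);
		return;
	}

	model_stop(d);
	model_open(d);
}

/* Control flow with the test patch: always a full stop/open cycle. */
static void restart_after(struct model_dev *d)
{
	model_stop(d);
	model_open(d);
}

/* Small usage example comparing both paths on a modelled ar724x device. */
static void model_demo(void)
{
	struct model_dev before = { .is_ar724x = true, .link = 1 };
	struct model_dev after = { .is_ar724x = true, .link = 1 };

	restart_before(&before);
	assert(before.link_adjusts == 1 && before.opens == 0);

	restart_after(&after);
	assert(after.stops == 1 && after.opens == 1);
}
```

Calling `model_demo()` exercises both variants. Whether the full cycle actually helps on a given board is a separate question; as reported in comment #2, it did not fix the tx timeouts on the nanostation.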