
tulip_rxtx_stop() on Cobalt Qube2

Message ID 20090531014036.GA23050@lackof.org
State RFC, archived
Delegated to: David Miller

Commit Message

Grant Grundler May 31, 2009, 1:40 a.m. UTC
Florian,
Summary: proposed patch below for you to test (I've not tested it yet)
and some explanation on what I think is happening.


On Fri, May 01, 2009 at 12:38:57PM +0200, Florian Fainelli wrote:
...
> Linux Tulip driver version 1.1.15-NAPI (Feb 27, 2007)
...
> PCI: Enabling device 0000:00:0c.0 (0005 -> 0007)
> tulip1: Old format EEPROM on 'Cobalt Microserver' board.  Using substitute 
> media control info.
> tulip1:  EEPROM default media type Autosense.
> tulip1:  Index #0 - Media MII (#11) described by a 21142 MII PHY (3) block.
> tulip1:  MII transceiver #1 config 1000 status 7809 advertising 01e1.
> eth1: Digital DS21142/43 Tulip rev 65 at MMIO 0x12082400, 00:10:e0:00:88:b9, 
> IRQ 20.
> [snip]
> 0000:00:07.0: tulip_stop_rxtx() failed (CSR5 0xf0660000 CSR6 0xb20e2202)

Looking up these bits in the publicly available manual:
    ftp://download.intel.com/design/network/manuals/27807401.pdf

"Rev 65" == 0x41 == 21143-PD or 21143-TD (page 3-7)

Operation Mode Register (CSR6, Offset 30H)
----
bit : Val
31  : 1 Special Capture Effect Enable
30  : 0 Receive All (not set)
13  : 1 ST - Start Transmission
9   : 1 Full Duplex
1   : 1 SR - Start Receive
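A small userspace helper makes it easy to re-check this decode against the raw CSR6 value from the log. This is only a sketch: the bit list is copied from the table above, `reg_bit()` and `decode_csr6()` are hypothetical helpers (not tulip driver functions), and nothing here touches real hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* CSR6 bits discussed in this mail; not a complete map of the register. */
struct csr6_bit {
	int bit;
	const char *name;
};

static const struct csr6_bit csr6_bits[] = {
	{ 31, "Special Capture Effect Enable" },
	{ 30, "Receive All" },
	{ 13, "ST - Start Transmission" },
	{ 9,  "Full Duplex" },
	{ 1,  "SR - Start Receive" },
};

/* Return 1 if @bit is set in @val, 0 otherwise. */
static int reg_bit(uint32_t val, int bit)
{
	assert(bit >= 0 && bit < 32);
	return (int)((val >> bit) & 1u);
}

/* Print each listed CSR6 bit together with its value in @csr6. */
static void decode_csr6(uint32_t csr6)
{
	size_t i;

	for (i = 0; i < sizeof(csr6_bits) / sizeof(csr6_bits[0]); i++)
		printf("%-3d : %d %s\n", csr6_bits[i].bit,
		       reg_bit(csr6, csr6_bits[i].bit), csr6_bits[i].name);
}
```

Fed 0xb20e2202, the shifts show bits 31, 13, 9 and 1 set and bit 30 (Receive All) clear, because the top nibble 0xb is binary 1011.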

Status Register (CSR5, Offset 28H)
----
28-31 : reserved
20-22 : 0x6 TX State = Suspended--Transmit FIFO underflow, or an unavailable transmit descriptor
17-19 : 0x3 RX State = Running--Waiting for receive packet
1     : 0x0 TX isn't stopped
0     : 0x0 No TX Interrupt pending
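The TX/RX state fields can be pulled out of the logged CSR5 value the same way. This sketch assumes the CSR5_TS (0x00700000) and CSR5_RS (0x000e0000) masks that tulip_stop_rxtx() uses in the driver's tulip.h, which put the transmit state in bits 22:20 and the receive state in bits 19:17; `reg_field()` and `csr5_rxtx_stopped()` are hypothetical helpers written for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: same values as CSR5_TS / CSR5_RS in the driver's tulip.h. */
#define CSR5_TS 0x00700000u	/* bits 22:20, transmit process state */
#define CSR5_RS 0x000e0000u	/* bits 19:17, receive process state */

/* Extract the field selected by @mask and shift it down to bit 0. */
static uint32_t reg_field(uint32_t val, uint32_t mask)
{
	assert(mask != 0);	/* an empty mask would loop forever */
	while (!(mask & 1u)) {
		mask >>= 1;
		val >>= 1;
	}
	return val & mask;
}

/* Nonzero when both the TX and RX processes report "stopped" (state 0),
 * which is the condition tulip_stop_rxtx() waits for. */
static int csr5_rxtx_stopped(uint32_t csr5)
{
	return (csr5 & (CSR5_TS | CSR5_RS)) == 0;
}
```

For 0xf0660000 this yields a TX state of 6 and an RX state of 3, matching the Suspended and Running states listed above, and `csr5_rxtx_stopped()` returns 0.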

The RX/TX engines are in a wedged state to begin with. :(
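Why a wedged chip surfaces as the "tulip_stop_rxtx() failed" message: after clearing ST/SR in CSR6, the driver polls CSR5 until both process-state fields read zero and gives up after a fixed number of short delays. Below is a userspace model of that loop, not driver code. The budget of roughly 1300/10 polls of 10 us each is an assumption based on tulip_stop_rxtx() (check your tree), udelay() is omitted, and `wedged()`/`healthy()` are hypothetical stand-ins for ioread32(ioaddr + CSR5).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CSR5_TS 0x00700000u	/* transmit process state, bits 22:20 */
#define CSR5_RS 0x000e0000u	/* receive process state, bits 19:17 */

/* ASSUMPTION: roughly the driver's budget of 1300 us in 10 us steps. */
#define STOP_POLL_BUDGET (1300u / 10u)

/* Hypothetical stand-in for ioread32(ioaddr + CSR5) on poll number @n. */
typedef uint32_t (*csr5_reader)(unsigned int n);

/* Model of the wait in tulip_stop_rxtx(): poll until both process states
 * read 0 (stopped). Returns 0 on success, -1 when the budget runs out,
 * which is the case where the driver logs "tulip_stop_rxtx() failed". */
static int wait_rxtx_stopped(csr5_reader read_csr5, unsigned int budget)
{
	unsigned int n;

	assert(read_csr5 != NULL);
	for (n = 0; n < budget; n++) {
		if (!(read_csr5(n) & (CSR5_TS | CSR5_RS)))
			return 0;
		/* the driver sleeps here with udelay(10); omitted in this model */
	}
	return -1;
}

/* Wedged chip: CSR5 stays at the value from the Qube2 log. */
static uint32_t wedged(unsigned int n)
{
	(void)n;
	return 0xf0660000u;
}

/* Healthy chip: both engines report "stopped" from poll 5 onward. */
static uint32_t healthy(unsigned int n)
{
	return n < 5 ? 0xf0660000u : 0xf0000000u;
}
```

With CSR5 stuck at 0xf0660000 the budget runs out and the model returns -1, the case that prints the failure message; the `healthy()` reader returns 0 after a few polls.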

The normal calling path here is: tulip_init() registers the driver
callbacks, and the PCI subsystem then immediately calls tulip_init_one().
tulip_up() gets called after netdev registration when someone ifconfig's
the device (ifconfig up). I don't know when else tulip_up() is called.

I have three ideas on how to fix this:
1) reset the RX/TX engines in tulip_init_one() before tulip_stop_rxtx().
2) reset the RX/TX engines in tulip_stop_rxtx() if they are "wedged".
3) remove tulip_stop_rxtx() call in tulip_init_one()


(1) seems like a reasonable thing to do at init time anyway.
(2) feels like a pretty big hammer and I don't know all the side effects.
(3) tulip_up() will reset the RX/TX engine and call tulip_stop_rxtx()
   when the NIC is opened/ifconfig'd. Calling pci_set_master()
   will allow the device to scribble into Host Memory...probably
   should move that call to tulip_up() as well. But I don't see anything
   in tulip_init_one() that requires DMA (and thus pci_set_master()).

Is there any reason for NOT implementing (1) and (3)?

I still need to work out where to call pci_set_master() in
tulip_up(). If that's feasible,

hrm. WTF. tulip_stop_rxtx() is called again from tulip_up() right before
CSR6 is written. Earlier in tulip_up() we already reset the chip, so I
removed that call.

Can you test the patch below?

I have not tested or even compiled this patch...will do so on parisc/ia64
machines once I get some feedback on this patch.

And I just noticed pci_clear_master() is not called *anywhere*. :(
Need to add such a call after tulip_stop_rxtx() some place (many places?).
This patch is just RFC and not suitable for merging upstream.

many thanks,
grant


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Florian Fainelli May 31, 2009, 9:02 p.m. UTC | #1
Hi Grant,

On Sunday 31 May 2009 03:40:36 Grant Grundler wrote:
> Florian,
> Summary: proposed patch below for you to test (I've not tested it yet)
> and some explanation on what I think is happening.
>
>
> On Fri, May 01, 2009 at 12:38:57PM +0200, Florian Fainelli wrote:
> ...
>
> > Linux Tulip driver version 1.1.15-NAPI (Feb 27, 2007)
>
> ...
>
> > PCI: Enabling device 0000:00:0c.0 (0005 -> 0007)
> > tulip1: Old format EEPROM on 'Cobalt Microserver' board.  Using
> > substitute media control info.
> > tulip1:  EEPROM default media type Autosense.
> > tulip1:  Index #0 - Media MII (#11) described by a 21142 MII PHY (3)
> > block.
> > tulip1:  MII transceiver #1 config 1000 status 7809 advertising 01e1.
> > eth1: Digital DS21142/43 Tulip rev 65 at MMIO 0x12082400,
> > 00:10:e0:00:88:b9, IRQ 20.
> > [snip]
> > 0000:00:07.0: tulip_stop_rxtx() failed (CSR5 0xf0660000 CSR6 0xb20e2202)
>
> Looking up these bits in the publicly available manual:
>     ftp://download.intel.com/design/network/manuals/27807401.pdf
>
> "Rev 65" == 0x41 == 21143-PD or 21143-TD (page 3-7)
>
> Operation Mode Register (CSR6, Offset 30H)
> ----
> bit : Val
> 31  : 1 Special Capture Effect Enable
> 30  : 1 Receive All
> 13  : 1 ST - Start Transmission
> 9   : 1 Full Duplex
> 1   : 1 SR - Start Receive
>
> Status Register (CSR5, Offset 28H)
> ----
> 28-31 : reserved
> 20-22 : 0x6 TX State = Suspended--Transmit FIFO underflow, or an
> unavailable transmit descriptor
> 17-19 : 0x3 RX State = Running--Waiting for receive packet
> 1     : 0x0 TX isn't stopped
> 0     : 0x0 No TX Interrupt pending
>
> The RX/TX engines are in a wedged state to begin with. :(

I suppose this is due to the bootloader, either CoLo or the original Cobalt
microserver bootloader.

>
> The normal calling path here is tulip_init() to register the driver
> callbacks and tulip_init_one() gets called immediately by pci subsystem.
> tulip_up() gets called after netdev registration when someone ifconfig's
> the device (ifconfig up). I don't know when else tulip_up() is called.
>
> I have three ideas on how to fix this:
> 1) reset the RX/TX engines in tulip_init_one() before tulip_stop_rxtx().
> 2) reset the RX/TX engines in tulip_stop_rxtx() if they are "wedged".
> 3) remove tulip_stop_rxtx() call in tulip_init_one()
>
>
> (1) seems like a reasonable thing to do at init time anyway.
> (2) feels like a pretty big hammer and I don't know all the side effects.
> (3) tulip_up() will reset the RX/TX engine and call tulip_stop_rxtx()
>    when the NIC is opened/ifconfig'd. Calling pci_set_master()
>    will allow the device to scribble into Host Memory...probably
>    should move that call to tulip_up() as well. But I don't see anything
>    in tulip_init_one() that requires DMA (and thus pci_set_master()).
>
> Is there any reason for NOT implementing (1) and (3)?
>
> I still need to work out where to call pci_set_master() in
> tulip_up(). If that's feasible,
>
> hrm. WTF. tulip_stop_rxtx() is called again from tulip_up() right before
> CSR6 is written. Earlier in tulip_up() we reset the chip.  Removed.
>
> Can you test the patch below?
>
> I have not tested or even compiled this patch...will do so on parisc/ia64
> machines once I get some feedback on this patch.
>
> And I just noticed pci_clear_master() is not called *anywhere*. :(
> Need to add such a call after tulip_stop_rxtx() some place (many places?).
> This patch is just RFC and not suitable for merging upstream.

The patch below does not help on my Qube2; I still see the same message.

>
> many thanks,
> grant
>
>
> diff --git a/drivers/net/tulip/tulip_core.c b/drivers/net/tulip/tulip_core.c
> index 2abb5d3..1aa058e 100644
> --- a/drivers/net/tulip/tulip_core.c
> +++ b/drivers/net/tulip/tulip_core.c
> @@ -470,11 +470,12 @@ media_picked:
>  		tulip_select_media(dev, 1);
>
>  	/* Start the chip's Tx to process setup frame. */
> -	tulip_stop_rxtx(tp);
>  	barrier();
>  	udelay(5);
>  	iowrite32(tp->csr6 | TxOn, ioaddr + CSR6);
>
> +	pci_set_master(pdev);	/* enabled DMA */
> +
>  	/* Enable interrupts by setting the interrupt mask. */
>  	iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR5);
>  	iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR7);
> @@ -1422,11 +1423,6 @@ static int __devinit tulip_init_one (struct pci_dev *pdev,
>  		tulip_mwi_config (pdev, dev);
>  #endif
>
> -	/* Stop the chip's Tx and Rx processes. */
> -	tulip_stop_rxtx(tp);
> -
> -	pci_set_master(pdev);
> -
>  #ifdef CONFIG_GSC
>  	if (pdev->subsystem_vendor == PCI_VENDOR_ID_HP) {
>  		switch (pdev->subsystem_device) {
Grant Grundler May 31, 2009, 11:43 p.m. UTC | #2
On Sun, May 31, 2009 at 11:02:22PM +0200, Florian Fainelli wrote:
> Hi Grant,
...
> > The RX/TX engines are in a wedged state to begin with. :(
> 
> I suppose this is due to the bootloader, either CoLo or the original Cobalt
> microserver bootloader.

Yeah - either bootloader or BIOS - whatever talked to the NIC most recently.

...
> > I have not tested or even compiled this patch...will do so on parisc/ia64
> > machines once I get some feedback on this patch.
> >
> > And I just noticed pci_clear_master() is not called *anywhere*. :(
> > Need to add such a call after tulip_stop_rxtx() some place (many places?).
> > This patch is just RFC and not suitable for merging upstream.
> 
> The patch below does not help on my Qube2; I still see the same message.

Are you sure?

I thought I removed all calls to tulip_stop_rxtx() in the initialization
code path and didn't think it would get called. Did I overlook one?
Can you add "dump_stack()" to tulip_stop_rxtx() failure case?

Can you also modify the driver version to make sure you are using
the correct/most recently built module?

And then please post the dmesg output from the driver again (plus 10
lines of output before and after).

thanks,
grant

Patch

diff --git a/drivers/net/tulip/tulip_core.c b/drivers/net/tulip/tulip_core.c
index 2abb5d3..1aa058e 100644
--- a/drivers/net/tulip/tulip_core.c
+++ b/drivers/net/tulip/tulip_core.c
@@ -470,11 +470,12 @@  media_picked:
 		tulip_select_media(dev, 1);
 
 	/* Start the chip's Tx to process setup frame. */
-	tulip_stop_rxtx(tp);
 	barrier();
 	udelay(5);
 	iowrite32(tp->csr6 | TxOn, ioaddr + CSR6);
 
+	pci_set_master(pdev);	/* enabled DMA */
+
 	/* Enable interrupts by setting the interrupt mask. */
 	iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR5);
 	iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR7);
@@ -1422,11 +1423,6 @@  static int __devinit tulip_init_one (struct pci_dev *pdev,
 		tulip_mwi_config (pdev, dev);
 #endif
 
-	/* Stop the chip's Tx and Rx processes. */
-	tulip_stop_rxtx(tp);
-
-	pci_set_master(pdev);
-
 #ifdef CONFIG_GSC
 	if (pdev->subsystem_vendor == PCI_VENDOR_ID_HP) {
 		switch (pdev->subsystem_device) {