diff mbox

Strange network timeouts w/ 2.6.30.5

Message ID m3skfmk39q.fsf@intrepid.localdomain
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Krzysztof Halasa Aug. 20, 2009, 1:45 p.m. UTC
Walt Holman <walt@holmansrus.com> writes:

> dmesg is attached. This box does have >2GB Ram (6GB total).  The dmesg
> will show e100 init'd 3 times since the first is the stock modprobe,
> 2nd was forced with use_io and the 3rd modprobe was after reverting
> the patch.

You most probably can't test without swiotlb (RAM has to be limited to
2 GB or so), can you? That would (dis)prove my theory. Alternatively (or
better), a test on IOMMU-equipped system would do.

Since swiotlb is x86-only thing (though other 64-bit archs may have
something similar), I think the correct work around is to enable the
"for_device" handoff on !X86.

Something like maybe:

Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>

Comments

David Miller Aug. 20, 2009, 7:28 p.m. UTC | #1
From: Krzysztof Halasa <khc@pm.waw.pl>
Date: Thu, 20 Aug 2009 15:45:21 +0200

> Something like maybe:
> 
> Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
> 
> diff --git a/drivers/net/e100.c b/drivers/net/e100.c
> index 014dfb6..b610088 100644
> --- a/drivers/net/e100.c
> +++ b/drivers/net/e100.c
> @@ -1762,9 +1762,12 @@ static int e100_rx_indicate(struct nic *nic, struct rx *rx,
>  
>  			if (ioread8(&nic->csr->scb.status) & rus_no_res)
>  				nic->ru_running = RU_SUSPENDED;
> +#ifndef CONFIG_X86
> +		/* FIXME interferes with swiotlb. */
>  		pci_dma_sync_single_for_device(nic->pdev, rx->dma_addr,
>  					       sizeof(struct rfd),
>  					       PCI_DMA_BIDIRECTIONAL);
> +#endif
>  		return -ENODATA;
>  	}
>  

I'm not willing to apply something like this.

swiotlb emulates what hardware does, so if it can go wrong with
swiotlb it can go wrong with hardware to.

Figure out what the exact bug is.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Krzysztof Halasa Aug. 20, 2009, 8:28 p.m. UTC | #2
David Miller <davem@davemloft.net> writes:

> swiotlb emulates what hardware does, so if it can go wrong with
> swiotlb it can go wrong with hardware to.
>
> Figure out what the exact bug is.

I think I already have.

The exact bug is using streaming allocations for the descriptor.
It can't work consistently on all platforms, period. Streaming
allocation can only have one owner (either CPU or device) at a time, and
e100 driver wants access (for examining desc status) simultaneously with
the hardware (which may alter desc status at any time).

On ARM with the previous patch applied it can work because the CPU cache
has the "dirty" bits (e100 driver only reads from the descriptors).
On x86 without swiotlb it can work because streaming allocations are
already coherent.
On x86 with swiotlb it can't really work reliably (and if does, it does
by pure luck) because (I guess) swiotlb has no "dirty" flag and can't
know when it doesn't need to flush.

There is no other fix than to convert the desc rings to coherent
allocs. I'm going to do precisely that in few days, but we're stuck with
the existing code in 2.6.31 (and 2.6.30.x etc).
Walt Holman Aug. 20, 2009, 10:21 p.m. UTC | #3
----- "Krzysztof Halasa" <khc@pm.waw.pl> wrote:

> Walt Holman <walt@holmansrus.com> writes:
> 
> > dmesg is attached. This box does have >2GB Ram (6GB total).  The
> dmesg
> > will show e100 init'd 3 times since the first is the stock
> modprobe,
> > 2nd was forced with use_io and the 3rd modprobe was after reverting
> > the patch.
> 
> You most probably can't test without swiotlb (RAM has to be limited
> to
> 2 GB or so), can you? That would (dis)prove my theory. Alternatively
> (or
> better), a test on IOMMU-equipped system would do.

Would something like passing a mem=xx cmdline on x86_64 be sufficient to test this?

-Walt



> 
> Since swiotlb is x86-only thing (though other 64-bit archs may have
> something similar), I think the correct work around is to enable the
> "for_device" handoff on !X86.
> 
> Something like maybe:
> 
> Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
> 
> diff --git a/drivers/net/e100.c b/drivers/net/e100.c
> index 014dfb6..b610088 100644
> --- a/drivers/net/e100.c
> +++ b/drivers/net/e100.c
> @@ -1762,9 +1762,12 @@ static int e100_rx_indicate(struct nic *nic,
> struct rx *rx,
>  
>  			if (ioread8(&nic->csr->scb.status) & rus_no_res)
>  				nic->ru_running = RU_SUSPENDED;
> +#ifndef CONFIG_X86
> +		/* FIXME interferes with swiotlb. */
>  		pci_dma_sync_single_for_device(nic->pdev, rx->dma_addr,
>  					       sizeof(struct rfd),
>  					       PCI_DMA_BIDIRECTIONAL);
> +#endif
>  		return -ENODATA;
>  	}
>  
> -- 
> Krzysztof Halasa
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Krzysztof Halasa Aug. 21, 2009, 9:44 a.m. UTC | #4
Walt Holman <walt@holmansrus.com> writes:

> Would something like passing a mem=xx cmdline on x86_64 be sufficient to test this?

I think so, though I admit I don't remember using this personally since
the introduction of e820 RAM mapping support.
Dmesg will show if the memory is limited. For the swiotlb to effectively
disable no RAM > 0x100000000 may be used.

But this test isn't IMHO terribly important at this point - the driver
makes invalid use of the DMA API and it has to change. The test could
only explain _how_ exactly does it fail, we already know _why_ it does.
Aviv Greenberg Aug. 23, 2009, 7:18 p.m. UTC | #5
On Thu, Aug 20, 2009 at 03:45:21PM +0200, Krzysztof Halasa wrote:
 
> Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
> 
> diff --git a/drivers/net/e100.c b/drivers/net/e100.c
> index 014dfb6..b610088 100644
> --- a/drivers/net/e100.c
> +++ b/drivers/net/e100.c
> @@ -1762,9 +1762,12 @@ static int e100_rx_indicate(struct nic *nic, struct rx *rx,
>  
>  			if (ioread8(&nic->csr->scb.status) & rus_no_res)
>  				nic->ru_running = RU_SUSPENDED;
> +#ifndef CONFIG_X86
> +		/* FIXME interferes with swiotlb. */
>  		pci_dma_sync_single_for_device(nic->pdev, rx->dma_addr,
>  					       sizeof(struct rfd),
>  					       PCI_DMA_BIDIRECTIONAL);
> +#endif
>  		return -ENODATA;
>  	}
>  
> -- 

Why do you use PCI_DMA_BIDIRECTIONAL? From what i get, the device writes and the
cpu reads - so the direction should be PCI_DMA_FROMDEVICE. Maybe just maybe the 
bi-directionality bites you in the swiotlb case.

> Krzysztof Halasa
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Krzysztof Halasa Aug. 23, 2009, 8:02 p.m. UTC | #6
Aviv Greenberg <avivgnet@gmail.com> writes:

> Why do you use PCI_DMA_BIDIRECTIONAL? From what i get, the device writes and the
> cpu reads - so the direction should be PCI_DMA_FROMDEVICE. Maybe just maybe the 
> bi-directionality bites you in the swiotlb case.

Yes, it's just been confirmed that FROMDEVICE at this point happens to
work.
diff mbox

Patch

diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index 014dfb6..b610088 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -1762,9 +1762,12 @@  static int e100_rx_indicate(struct nic *nic, struct rx *rx,
 
 			if (ioread8(&nic->csr->scb.status) & rus_no_res)
 				nic->ru_running = RU_SUSPENDED;
+#ifndef CONFIG_X86
+		/* FIXME interferes with swiotlb. */
 		pci_dma_sync_single_for_device(nic->pdev, rx->dma_addr,
 					       sizeof(struct rfd),
 					       PCI_DMA_BIDIRECTIONAL);
+#endif
 		return -ENODATA;
 	}