
[net-next] net: stmmac: remove superfluous wmb() memory barriers

Message ID 20180308103006.1197-1-niklas.cassel@axis.com
State Rejected, archived
Delegated to: David Miller
Series: [net-next] net: stmmac: remove superfluous wmb() memory barriers

Commit Message

Niklas Cassel March 8, 2018, 10:30 a.m. UTC
These wmb() memory barriers are performed after the last descriptor write,
and they are followed by enable_dma_transmission()/set_tx_tail_ptr(),
i.e. a writel() to MMIO register space.
Since writel() itself performs the equivalent of a wmb() before doing the
actual write, these barriers are superfluous, and removing them should
thus not change any existing behavior.

Ordering within the descriptor writes is already ensured with dma_wmb()
barriers inside prepare_tx_desc(first, ..)/prepare_tso_tx_desc(first, ..).

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 12 ------------
 1 file changed, 12 deletions(-)
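
For context, the ordering pattern this patch touches looks roughly like
this (a simplified sketch of the stmmac TX paths, not the exact driver
code; unrelated steps omitted):

	/* Fill the TX descriptors; a dma_wmb() inside prepare_tx_desc()
	 * already orders the OWN bit after the other descriptor fields.
	 */
	priv->hw->desc->prepare_tx_desc(first, 1, nopaged_len,
					csum_insertion, priv->mode, 1,
					last_segment, skb->len);

	/* Barrier under discussion: make the descriptor writes in
	 * coherent memory visible before the MMIO doorbell below.
	 */
	wmb();

	/* Grant the DMA engine ownership via an MMIO register write,
	 * which goes through writel() inside the callback.
	 */
	priv->hw->dma->enable_dma_transmission(priv->ioaddr);

The patch's claim is that the writel() inside the last step already
implies the wmb().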

Comments

David Miller March 9, 2018, 2:50 a.m. UTC | #1
From: Niklas Cassel <niklas.cassel@axis.com>
Date: Thu,  8 Mar 2018 11:30:05 +0100

> These wmb() memory barriers are performed after the last descriptor write,
> and they are followed by enable_dma_transmission()/set_tx_tail_ptr(),
> i.e. a writel() to MMIO register space.
> Since writel() itself performs the equivalent of a wmb() before doing the
> actual write, these barriers are superfluous, and removing them should
> thus not change any existing behavior.
> 
> Ordering within the descriptor writes is already ensured with dma_wmb()
> barriers inside prepare_tx_desc(first, ..)/prepare_tso_tx_desc(first, ..).
> 
> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>

Please allow me some time to consider this issue a little bit more
before applying this patch.

Thank you Niklas.
Jose Abreu March 9, 2018, 10:26 a.m. UTC | #2
Hi Niklas,

On 08-03-2018 10:30, Niklas Cassel wrote:
> These wmb() memory barriers are performed after the last descriptor write,
> and they are followed by enable_dma_transmission()/set_tx_tail_ptr(),
> i.e. a writel() to MMIO register space.
> Since writel() itself performs the equivalent of a wmb() 

Sorry but I know at least two architectures which don't do a
wmb() upon a writel() [1] [2]. This can be critical if we are
accessing the device through some slow or congested bus which
will delay accesses to the device IO. Notice that a writel()
followed by a readl() to the same address will force the CPU to
wait for writel() completion before the readl(), but in this
case we are doing DMA and then a writel(), so I think a wmb()
before the writel() is a safe measure.

Thanks and Best Regards,
Jose Miguel Abreu

[1]
https://elixir.bootlin.com/linux/latest/source/arch/arc/include/asm/io.h#L147,
with "CONFIG_ISA_ARCV2=n"
[2]
https://elixir.bootlin.com/linux/latest/source/arch/arm/include/asm/io.h#L314,
with "CONFIG_ARM_DMA_MEM_BUFFERABLE=n"

> before doing the
> actual write, these barriers are superfluous, and removing them should
> thus not change any existing behavior.
>
> Ordering within the descriptor writes is already ensured with dma_wmb()
> barriers inside prepare_tx_desc(first, ..)/prepare_tso_tx_desc(first, ..).
>
> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 12 ------------
>  1 file changed, 12 deletions(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index a9856a8bf8ad..005fb45ace30 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -2998,12 +2998,6 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
>  		priv->hw->desc->set_tx_owner(mss_desc);
>  	}
>  
> -	/* The own bit must be the latest setting done when prepare the
> -	 * descriptor and then barrier is needed to make sure that
> -	 * all is coherent before granting the DMA engine.
> -	 */
> -	wmb();
> -
>  	if (netif_msg_pktdata(priv)) {
>  		pr_info("%s: curr=%d dirty=%d f=%d, e=%d, f_p=%p, nfrags %d\n",
>  			__func__, tx_q->cur_tx, tx_q->dirty_tx, first_entry,
> @@ -3221,12 +3215,6 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
>  		priv->hw->desc->prepare_tx_desc(first, 1, nopaged_len,
>  						csum_insertion, priv->mode, 1,
>  						last_segment, skb->len);
> -
> -		/* The own bit must be the latest setting done when prepare the
> -		 * descriptor and then barrier is needed to make sure that
> -		 * all is coherent before granting the DMA engine.
> -		 */
> -		wmb();
>  	}
>  
>  	netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue), skb->len);
Pavel Machek March 9, 2018, 10:48 a.m. UTC | #3
On Fri 2018-03-09 10:26:11, Jose Abreu wrote:
> Hi Niklas,
> 
> On 08-03-2018 10:30, Niklas Cassel wrote:
> > These wmb() memory barriers are performed after the last descriptor write,
> > and they are followed by enable_dma_transmission()/set_tx_tail_ptr(),
> > i.e. a writel() to MMIO register space.
> > Since writel() itself performs the equivalent of a wmb() 
> 
> Sorry but I know at least two architectures which don't do a
> wmb() upon a writel() [1] [2]. This can be critical if we are
> accessing the device through some slow or congested bus which
> will delay accesses to the device IO. Notice that a writel()
> followed by a readl() to the same address will force the CPU to
> wait for writel() completion before the readl(), but in this
> case we are doing DMA and then a writel(), so I think a wmb()
> before the writel() is a safe measure.

This also matches documentation, as I tried to point out.
David Miller March 9, 2018, 3:15 p.m. UTC | #4
From: Jose Abreu <Jose.Abreu@synopsys.com>
Date: Fri, 9 Mar 2018 10:26:11 +0000

> Sorry but I know at least two architectures which don't do a
> wmb() upon a writel() [1] [2]. This can be critical if we are
> accessing the device through some slow or congested bus which
> will delay accesses to the device IO. Notice that a writel()
> followed by a readl() to the same address will force the CPU to
> wait for writel() completion before the readl(), but in this
> case we are doing DMA and then a writel(), so I think a wmb()
> before the writel() is a safe measure.

Wait a second.

This is not about whether there is an explicit memory barrier
instruction placed in the writel() implementation.

Are you saying that the cpu(s) in question will reorder stores in
their store buffers, even if they are to real memory vs. IOMEM?

That's really dangerous.
Niklas Cassel March 12, 2018, 8:55 a.m. UTC | #5
On Fri, Mar 09, 2018 at 10:15:20AM -0500, David Miller wrote:
> From: Jose Abreu <Jose.Abreu@synopsys.com>
> Date: Fri, 9 Mar 2018 10:26:11 +0000
> 
> > Sorry but I know at least two architectures which don't do a
> > wmb() upon a writel() [1] [2]. This can be critical if we are
> > accessing the device through some slow or congested bus which
> > will delay accesses to the device IO. Notice that a writel()
> > followed by a readl() to the same address will force the CPU to
> > wait for writel() completion before the readl(), but in this
> > case we are doing DMA and then a writel(), so I think a wmb()
> > before the writel() is a safe measure.
> 
> Wait a second.
> 
> This is not about whether there is an explicit memory barrier
> instruction placed in the writel() implementation.
> 
> Are you saying that the cpu(s) in question will reorder stores in
> their store buffers, even if they are to real memory vs. IOMEM?
> 
> That's really dangerous.

Hello David,

Jose is simply responding to the commit message description of this patch.

You explained that there is an implicit memory barrier between physical
memory writes and those to MMIO register space, as long as writel() is
used.

I assumed that you meant writel() vs writel_relaxed(), where the latter
does not do an implicit barrier.

I also found this from you:
https://lwn.net/Articles/198995/

If my assumption was incorrect, please correct me.

As you seem to possess knowledge regarding this, you are probably the
best-suited person to know whether this patch simply needs a commit
message rewrite, or whether it should be dropped completely.


Best regards,
Niklas
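
The writel() vs writel_relaxed() distinction assumed here, in sketch
form (assuming the barrier semantics Niklas describes; TX_TAIL_PTR is
a placeholder register offset, not the driver's actual macro):

	/* Portable pattern: explicit barrier, then a relaxed MMIO write. */
	wmb();
	writel_relaxed(tx_q->tx_tail_addr, priv->ioaddr + TX_TAIL_PTR);

	/* Pattern the patch relies on: writel() itself implying the
	 * barrier -- true on many architectures, but, per Jose's links,
	 * not on every configuration.
	 */
	writel(tx_q->tx_tail_addr, priv->ioaddr + TX_TAIL_PTR);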
David Miller March 13, 2018, 1:20 a.m. UTC | #6
From: Niklas Cassel <niklas.cassel@axis.com>
Date: Mon, 12 Mar 2018 09:55:42 +0100

> Jose is simply responding to the commit message description of this patch.
> 
> You explained that there is an implicit memory barrier between physical
> memory writes and those to MMIO register space, as long as writel() is
> used.
> 
> I assumed that you meant writel() vs writel_relaxed(), where the latter
> does not do an implicit barrier.
> 
> I also found this from you:
> https://lwn.net/Articles/198995/
> 
> If my assumption was incorrect, please correct me.
> 
> As you seem to possess knowledge regarding this, you are probably the
> best-suited person to know whether this patch simply needs a commit
> message rewrite, or whether it should be dropped completely.

Yes, I have always argued that the non-relaxed {read,write}{b,w,l}()
interfaces should imply barriers wrt. physical memory accesses.

Without that, drivers are harder to write.  Specifically, drivers that
work properly on all architectures will be very hard to write.

But looking at some drivers, probably this isn't fully the case right
now.  Which is unfortunate, but we must code to reality.

For example, looking at drivers/net/ethernet/broadcom/tg3.c, we have
tg3_start_xmit() going:

	write descriptors
	...
	/* Sync BD data before updating mailbox */
	wmb();
	...
		/* Packets are ready, update Tx producer idx on card. */
		tw32_tx_mbox(tnapi->prodmbox, entry);

so it really seems to be necessary.

So this stmmac revert is not valid.

Sorry for all the confusion.  I guess it's a lot of wishful thinking on
my part :-)
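
The pattern the thread converges on thus mirrors tg3: keep an explicit
wmb() between the last descriptor write and the MMIO doorbell (a
minimal sketch; the set_tx_tail_ptr() arguments are illustrative):

	/* The own bit must be the last descriptor field written. */
	priv->hw->desc->set_tx_owner(mss_desc);

	/* Make all descriptor writes visible before granting the DMA
	 * engine ownership via the tail pointer doorbell below.
	 */
	wmb();
	...
	priv->hw->dma->set_tx_tail_ptr(priv->ioaddr, tx_q->tx_tail_addr,
				       queue);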

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index a9856a8bf8ad..005fb45ace30 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2998,12 +2998,6 @@  static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 		priv->hw->desc->set_tx_owner(mss_desc);
 	}
 
-	/* The own bit must be the latest setting done when prepare the
-	 * descriptor and then barrier is needed to make sure that
-	 * all is coherent before granting the DMA engine.
-	 */
-	wmb();
-
 	if (netif_msg_pktdata(priv)) {
 		pr_info("%s: curr=%d dirty=%d f=%d, e=%d, f_p=%p, nfrags %d\n",
 			__func__, tx_q->cur_tx, tx_q->dirty_tx, first_entry,
@@ -3221,12 +3215,6 @@  static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 		priv->hw->desc->prepare_tx_desc(first, 1, nopaged_len,
 						csum_insertion, priv->mode, 1,
 						last_segment, skb->len);
-
-		/* The own bit must be the latest setting done when prepare the
-		 * descriptor and then barrier is needed to make sure that
-		 * all is coherent before granting the DMA engine.
-		 */
-		wmb();
 	}
 
 	netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue), skb->len);