[RFC] tulip: Support for byte queue limits

Message ID	20130712180116.21176.qmail@science.horizon.com
State	RFC, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> Date: 12 Jul 2013 14:01:16 -0400 Message-ID: <20130712180116.21176.qmail@science.horizon.com> From: "George Spelvin" <linux@horizon.com> To: grantgrundler@gmail.com, linux@horizon.com Subject: Re: [RFC] tulip: Support for byte queue limits Cc: eric.dumazet@gmail.com, grundler@parisc-linux.org, netdev@vger.kernel.org In-Reply-To: <CAP6odjhsL8QPKwurP_H88M3kAoBW-z_jGHtEhL0Tj7AfZNutHA@mail.gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk

George Spelvin July 12, 2013, 6:01 p.m. UTC

> Hi George,
> While you are right that functionally it doesn't matter, my preference
> would be to have nothing between the wmb() and iowrite() that kicks
> off the TX. This marginally helps kick off the TX process consistently
> slightly sooner. On modern HW, probably irrelevant, but not on the HW
> these chips are used on.

I'll revise it.  It just made sense to me to put it next to the other
bookkeeping line of tp->cur_tx++.  Should I move them both below the
iowrite()?  As in:


> Lastly, given I haven't powered up a system in two years which has
> tulip, any one want to take over maintainer for tulip driver?
> It's basically obsolete with a few rare patches like this one coming in.

I'm not up to it myself, sorry.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Grant Grundler July 12, 2013, 6:14 p.m. UTC | #1

On Fri, Jul 12, 2013 at 11:01 AM, George Spelvin <linux@horizon.com> wrote:
>> Hi George,
>> While you are right that functionally it doesn't matter, my preference
>> would be to have nothing between the wmb() and iowrite() that kicks
>> off the TX. This marginally helps kick off the TX process consistently
>> slightly sooner. On modern HW, probably irrelevant, but not on the HW
>> these chips are used on.
>
> I'll revise it.  It just made sense to me to put it next to the other
> bookkeeping line of tp->cur_tx++.  Should I move them both below the
> iowrite()?

I would prefer that. I agree it's better to keep those two lines of
code together.

Just keep in mind this is a nit and it's not critical to accepting your change.
So whatever you submit, I'll probably ACK.

>  As in:
>
> --- a/drivers/net/ethernet/dec/tulip/tulip_core.c
> +++ b/drivers/net/ethernet/dec/tulip/tulip_core.c
> @@ -702,11 +702,11 @@ tulip_start_xmit(struct sk_buff *skb, struct net_device *dev)
>         tp->tx_ring[entry].status = cpu_to_le32(DescOwned);
>         wmb();
>
> -       tp->cur_tx++;
> -
>         /* Trigger an immediate transmit demand. */
>         iowrite32(0, tp->base_addr + CSR1);
>
> +       tp->cur_tx++;
> +       netdev_sent_queue(dev, skb->len);
>         spin_unlock_irqrestore(&tp->lock, flags);
>
>         return NETDEV_TX_OK;

Yup - looks good.

>> Lastly, given I haven't powered up a system in two years which has
>> tulip, any one want to take over maintainer for tulip driver?
>> It's basically obsolete with a few rare patches like this one coming in.
>
> I'm not up to it myself, sorry.

No worries. Just wanted to advertise to anyone who bothers to read
about tulip patches. :)

cheers,
grant
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ben Hutchings July 15, 2013, 11:58 p.m. UTC | #2

On Fri, 2013-07-12 at 14:01 -0400, George Spelvin wrote:
> > Hi George,
> > While you are right that functionally it doesn't matter, my preference
> > would be to have nothing between the wmb() and iowrite() that kicks
> > off the TX. This marginally helps kick off the TX process consistently
> > slightly sooner. On modern HW, probably irrelevant, but not on the HW
> > these chips are used on.
> 
> I'll revise it.  It just made sense to me to put it next to the other
> bookkeeping line of tp->cur_tx++.  Should I move them both below the
> iowrite()?  As in:
> 
> --- a/drivers/net/ethernet/dec/tulip/tulip_core.c
> +++ b/drivers/net/ethernet/dec/tulip/tulip_core.c
> @@ -702,11 +702,11 @@ tulip_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	tp->tx_ring[entry].status = cpu_to_le32(DescOwned);
>  	wmb();
>  
> -	tp->cur_tx++;
> -
>  	/* Trigger an immediate transmit demand. */
>  	iowrite32(0, tp->base_addr + CSR1);
>  
> +	tp->cur_tx++;
> +	netdev_sent_queue(dev, skb->len);

This is not good practice, because once you start DMA you have
effectively passed ownership of the skb to the TX completion handler.

>  	spin_unlock_irqrestore(&tp->lock, flags);

Presumably the TX completion handler will hold this spinlock and
therefore cannot free the skb before you use skb->len above.  So this
will be safe now.  But one day someone may want to get rid of this lock,
so this is a trap waiting to spring.

Ben.

>  	return NETDEV_TX_OK;
> 
> > Lastly, given I haven't powered up a system in two years which has
> > tulip, any one want to take over maintainer for tulip driver?
> > It's basically obsolete with a few rare patches like this one coming in.
> 
> I'm not up to it myself, sorry.

Grant Grundler July 16, 2013, 1:17 a.m. UTC | #3

On Mon, Jul 15, 2013 at 4:58 PM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Fri, 2013-07-12 at 14:01 -0400, George Spelvin wrote:
>> > Hi George,
>> > While you are right that functionally it doesn't matter, my preference
>> > would be to have nothing between the wmb() and iowrite() that kicks
>> > off the TX. This marginally helps kick off the TX process consistently
>> > slightly sooner. On modern HW, probably irrelevant, but not on the HW
>> > these chips are used on.
>>
>> I'll revise it.  It just made sense to me to put it next to the other
>> bookkeeping line of tp->cur_tx++.  Should I move them both below the
>> iowrite()?  As in:
>>
>> --- a/drivers/net/ethernet/dec/tulip/tulip_core.c
>> +++ b/drivers/net/ethernet/dec/tulip/tulip_core.c
>> @@ -702,11 +702,11 @@ tulip_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>       tp->tx_ring[entry].status = cpu_to_le32(DescOwned);
>>       wmb();
>>
>> -     tp->cur_tx++;
>> -
>>       /* Trigger an immediate transmit demand. */
>>       iowrite32(0, tp->base_addr + CSR1);
>>
>> +     tp->cur_tx++;
>> +     netdev_sent_queue(dev, skb->len);
>
> This is not good practice, because once you start DMA you have
> effectively passed ownership of the skb to the TX completion handler.

Is the problem the reference to skb->len?
By passing ownership,  are you suggesting the device can change this value?

AFAIK, tulip device only knows about the contents of tx_ring[] and not skb's.
Would you like to see a comment added to that effect?

>>       spin_unlock_irqrestore(&tp->lock, flags);
>
> Presumably the TX completion handler will hold this spinlock and
> therefore cannot free the skb before you use skb->len above.  So this
> will be safe now.

Correct.

> But one day someone may want to get rid of this lock,
> so this is a trap waiting to spring.

Even for tulip driver?  Sorry, I just can't imagine anyone taking
enough interest in tulip driver to implement that.  I'm not even sure
it would be possible.

thanks!
grant
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Eric Dumazet July 16, 2013, 1:27 a.m. UTC | #4

On Mon, 2013-07-15 at 18:17 -0700, Grant Grundler wrote:

> Even for tulip driver?  Sorry, I just can't imagine anyone taking
> enough interest in tulip driver to implement that.  I'm not even sure
> it would be possible.
> 

I would not care. When I reviewed this patch, I was aware of this
problem (as this is the most common error done in BQL patches), and
decided patch was fine, because lock was held.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ben Hutchings July 16, 2013, 2:37 p.m. UTC | #5

On Mon, 2013-07-15 at 18:17 -0700, Grant Grundler wrote:
> On Mon, Jul 15, 2013 at 4:58 PM, Ben Hutchings
> <bhutchings@solarflare.com> wrote:
> > On Fri, 2013-07-12 at 14:01 -0400, George Spelvin wrote:
> >> > Hi George,
> >> > While you are right that functionally it doesn't matter, my preference
> >> > would be to have nothing between the wmb() and iowrite() that kicks
> >> > off the TX. This marginally helps kick off the TX process consistently
> >> > slightly sooner. On modern HW, probably irrelevant, but not on the HW
> >> > these chips are used on.
> >>
> >> I'll revise it.  It just made sense to me to put it next to the other
> >> bookkeeping line of tp->cur_tx++.  Should I move them both below the
> >> iowrite()?  As in:
> >>
> >> --- a/drivers/net/ethernet/dec/tulip/tulip_core.c
> >> +++ b/drivers/net/ethernet/dec/tulip/tulip_core.c
> >> @@ -702,11 +702,11 @@ tulip_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >>       tp->tx_ring[entry].status = cpu_to_le32(DescOwned);
> >>       wmb();
> >>
> >> -     tp->cur_tx++;
> >> -
> >>       /* Trigger an immediate transmit demand. */
> >>       iowrite32(0, tp->base_addr + CSR1);
> >>
> >> +     tp->cur_tx++;
> >> +     netdev_sent_queue(dev, skb->len);
> >
> > This is not good practice, because once you start DMA you have
> > effectively passed ownership of the skb to the TX completion handler.
> 
> Is the problem the reference to skb->len?
> By passing ownership,  are you suggesting the device can change this value?

No, the device can complete the descriptor and then the TX completion
handler will free the skb.

[...]
> > But one day someone may want to get rid of this lock,
> > so this is a trap waiting to spring.
> 
> Even for tulip driver?  Sorry, I just can't imagine anyone taking
> enough interest in tulip driver to implement that.  I'm not even sure
> it would be possible.

You're taking interest in it, aren't you?  But I accept this is a minor
issue.

Ben.

Grant Grundler July 16, 2013, 5:20 p.m. UTC | #6

On Tue, Jul 16, 2013 at 7:37 AM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Mon, 2013-07-15 at 18:17 -0700, Grant Grundler wrote:
...
>> Is the problem the reference to skb->len?
>> By passing ownership,  are you suggesting the device can change this value?
>
> No, the device can complete the descriptor and then the TX completion
> handler will free the skb.

Ah ok. That's what the spinlock_irqsave() is protecting against.

> [...]
>> > But one day someone may want to get rid of this lock,
>> > so this is a trap waiting to spring.
>>
>> Even for tulip driver?  Sorry, I just can't imagine anyone taking
>> enough interest in tulip driver to implement that.  I'm not even sure
>> it would be possible.
>
> You're taking interest in it, aren't you?

Let me clarify my interest: I will reject patches to change the
locking for this driver. Patch reviews for this driver are a
"community service project" since I know reasonably well how this chip
behaves. But I'm not doing anything else (e.g. bug fixes or testing).

If someone else wants to change the tulip driver locking, they need to
submit a patch to become the new maintainer first (I'll ACK that :)

>  But I accept this is a minor issue.

Ok.

cheers,
grant
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

George Spelvin July 17, 2013, 4:09 a.m. UTC | #7

>>  	wmb();
>>  
>> -	tp->cur_tx++;
>> -
>>  	/* Trigger an immediate transmit demand. */
>>  	iowrite32(0, tp->base_addr + CSR1);
>>  
>> +	tp->cur_tx++;
>> +	netdev_sent_queue(dev, skb->len);
>>  	spin_unlock_irqrestore(&tp->lock, flags);

> This is not good practice, because once you start DMA you have
> effectively passed ownership of the skb to the TX completion handler.

Thank you for this advice.  Just to be clear, is the only issue reading
skb->len from a potentially deallocated skb?  Or is it also going go
give the byte queue system fits if the transmit complete handler calls
netdev_completed_queue before the transmitter calls netdev_sent_queue?

I'd hope it can underflow safely, and only look at the net value after
the transmit handler returns.

> Presumably the TX completion handler will hold this spinlock and
> therefore cannot free the skb before you use skb->len above.  So this
> will be safe now.  But one day someone may want to get rid of this lock,
> so this is a trap waiting to spring.

Sounds like a fun project.  But I have to dig into the BQL code; if it's
getting and dropping locks inside netdev_*_queue, the win is limited.

(A lock-free version would have separate "sent" and "completed" counters,
and compute the difference when a snapshot is required.)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ben Hutchings July 17, 2013, 3:42 p.m. UTC | #8

On Wed, 2013-07-17 at 00:09 -0400, George Spelvin wrote:
> >>  	wmb();
> >>  
> >> -	tp->cur_tx++;
> >> -
> >>  	/* Trigger an immediate transmit demand. */
> >>  	iowrite32(0, tp->base_addr + CSR1);
> >>  
> >> +	tp->cur_tx++;
> >> +	netdev_sent_queue(dev, skb->len);
> >>  	spin_unlock_irqrestore(&tp->lock, flags);
> 
> > This is not good practice, because once you start DMA you have
> > effectively passed ownership of the skb to the TX completion handler.
> 
> Thank you for this advice.  Just to be clear, is the only issue reading
> skb->len from a potentially deallocated skb?

That's what I was thinking of.

> Or is it also going go
> give the byte queue system fits if the transmit complete handler calls
> netdev_completed_queue before the transmitter calls netdev_sent_queue?

I don't know.

> I'd hope it can underflow safely, and only look at the net value after
> the transmit handler returns.
> 
> > Presumably the TX completion handler will hold this spinlock and
> > therefore cannot free the skb before you use skb->len above.  So this
> > will be safe now.  But one day someone may want to get rid of this lock,
> > so this is a trap waiting to spring.
> 
> Sounds like a fun project.  But I have to dig into the BQL code; if it's
> getting and dropping locks inside netdev_*_queue, the win is limited.
> 
> (A lock-free version would have separate "sent" and "completed" counters,
> and compute the difference when a snapshot is required.)

Yes, it is lock-free with separate counters.  The implementation is in
lib/dynamic_queue_limits.c.  I think that the use of POSDIFF() protects
against races that could leave completed > sent.

Ben.

George Spelvin July 31, 2013, 12:34 p.m. UTC | #9

Grumble.  After a week of uptime with my tulip patches
(the ones I posted here plus some cleanup patches) my
kernel gave me this:

------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xd9/0x137()
NETDEV WATCHDOG: cable (tulip): transmit queue 0 timed out
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.2-00033-gca137af #47
Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7376/MS-7376, BIOS V1.7 01/13/2009
 0000000000000000 ffffffff81028aba ffff88021fc03e68 ffff880216b38000
 ffff88021fc03eb8 ffffffff8133643a ffffffff8133643a ffffffff81028b33
 ffffffff8159fe93 ffff880200000030 ffff88021fc03ec8 ffff88021fc03e88
Call Trace:
 <IRQ>  [<ffffffff81028aba>] ? warn_slowpath_common+0x56/0x6b
 [<ffffffff8133643a>] ? qdisc_rcu_free+0x19/0x19
 [<ffffffff8133643a>] ? qdisc_rcu_free+0x19/0x19
 [<ffffffff81028b33>] ? warn_slowpath_fmt+0x47/0x49
 [<ffffffff8133634e>] ? netif_tx_lock+0x47/0x72
 [<ffffffff81336513>] ? dev_watchdog+0xd9/0x137
 [<ffffffff81031f40>] ? call_timer_fn.isra.29+0x1c/0x6f
 [<ffffffff810321a2>] ? run_timer_softirq+0x19a/0x1c2
 [<ffffffff8102de09>] ? __do_softirq+0xb9/0x171
 [<ffffffff8102df7f>] ? irq_exit+0x3a/0x7a
 [<ffffffff8101967a>] ? smp_apic_timer_interrupt+0x72/0x7e
 [<ffffffff8142a60a>] ? apic_timer_interrupt+0x6a/0x70
 <EOI>  [<ffffffff81007e34>] ? amd_e400_idle+0xbf/0xc1
 [<ffffffff81007e2c>] ? amd_e400_idle+0xb7/0xc1
 [<ffffffff8104f74d>] ? cpu_startup_entry+0x9c/0xec
 [<ffffffff8164eb91>] ? start_kernel+0x2bd/0x2c8
 [<ffffffff8164e6f7>] ? repair_env_string+0x54/0x54
---[ end trace fa3269ab5c1a15ad ]---

Since it's on the tulip device, it's probably my fault, and
since it took a week to manifest itself, it's going to be
a complete PITA to reproduce.

Anyway, I haven't forgotten.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC] tulip: Support for byte queue limits

Commit Message

Comments

Patch