Message ID | 1251239734.3169.65.camel@w-sridhar.beaverton.ibm.com
---|---
State | RFC, archived
Delegated to: | David Miller
On Tue, 25 Aug 2009, Sridhar Samudrala wrote:

> Could you check if the UDP packet losses you are seeing are accounted for in
> qdisc drops with this patch. But i am not completely positive on this as this
> case happens only if qdisc is deactivated.

This does not work. qdisc drops are still not reported. They are reported
for IP and UDP. Test tool crashes on first TX overrun:

clameter@rd-strategy3-deb64:~$ ./mcast -n1 -r400000
Receiver: Listening to control channel 239.0.192.1
Receiver: Subscribing to 0 MC addresses 239.0.192-254.2-254 offset 0
origin 10.2.36.123
Sender: Sending 400000 msgs/ch/sec on 1 channels. Probe interval=0.001-1
sec.
sendto: No buffer space available
Socket Send error

netstat reports exactly one packet loss:

clameter@rd-strategy3-deb64:~$ netstat -su
IcmpMsg:
    InType3: 1
    OutType3: 1
Udp:
    298 packets received
    0 packets to unknown port received.
    0 packet receive errors
    7232136 packets sent
    SndbufErrors: 1

root@rd-strategy3-deb64:/home/clameter# tc -s qdisc show
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1
1 1 1 1
 Sent 6208 bytes 64 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0

SNMP reports one drop:

root@rd-strategy3-deb64:/home/clameter# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates
Ip: 2 64 1114 0 0 0 0 0 1114 7232754 1 0 0 0 0 0 0 0 0
Icmp: InMsgs InErrors InDestUnreachs InTimeExcds InParmProbs InSrcQuenchs InRedirects InEchos InEchoReps InTimestamps InTimestampReps InAddrMasks InAddrMaskReps OutMsgs OutErrors OutDestUnreachs OutTimeExcds OutParmProbs OutSrcQuenchs OutRedirects OutEchos OutEchoReps OutTimestamps OutTimestampReps OutAddrMasks OutAddrMaskReps
Icmp: 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0
IcmpMsg: InType3 OutType3
IcmpMsg: 1 1
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 1 200 120000 -1 26 4 0 0 2 774 595 0 0 0
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 308 0 0 7232146 0 1
UdpLite: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
UdpLite: 0 0 0 0 0 0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2009-08-26 at 12:29 -0400, Christoph Lameter wrote:
> On Tue, 25 Aug 2009, Sridhar Samudrala wrote:
>
> > Could you check if the UDP packet losses you are seeing are accounted for in
> > qdisc drops with this patch. But i am not completely positive on this as this
> > case happens only if qdisc is deactivated.
>
> This does not work. qdisc drops are still not reported.

OK. So the drops are not happening in dev_queue_xmit().

> They are reported for IP and UDP.

Not clear what you meant by this.

> Test tool crashes on first TX overrun:
>
> clameter@rd-strategy3-deb64:~$ ./mcast -n1 -r400000
> Receiver: Listening to control channel 239.0.192.1
> Receiver: Subscribing to 0 MC addresses 239.0.192-254.2-254 offset 0
> origin 10.2.36.123
> Sender: Sending 400000 msgs/ch/sec on 1 channels. Probe interval=0.001-1
> sec.
> sendto: No buffer space available
> Socket Send error
>
> netstat reports exactly one packet loss:
>
> clameter@rd-strategy3-deb64:~$ netstat -su
> IcmpMsg:
>     InType3: 1
>     OutType3: 1
> Udp:
>     298 packets received
>     0 packets to unknown port received.
>     0 packet receive errors
>     7232136 packets sent
>     SndbufErrors: 1
>
> root@rd-strategy3-deb64:/home/clameter# tc -s qdisc show
> qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1
> 1 1 1 1
> Sent 6208 bytes 64 pkt (dropped 0, overlimits 0 requeues 0)
> rate 0bit 0pps backlog 0b 0p requeues 0

Even the Sent count seems to be too low. Are you looking at the right
device?

So based on the current analysis, the packets are getting dropped after
the call to ip_local_out() in ip_push_pending_frames(). ip_local_out()
is failing with NET_XMIT_DROP. But we are not sure where they are
getting dropped. Is that right?

I think we need to figure out where they are getting dropped and then
decide on the appropriate counter to be incremented.

Thanks
Sridhar
On Wed, 26 Aug 2009, Sridhar Samudrala wrote:

> > They are reported for IP and UDP.
> Not clear what you meant by this.

The SNMP and UDP statistics show the loss. qdisc level does not show the
loss.

> > root@rd-strategy3-deb64:/home/clameter# tc -s qdisc show
> > qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1
> > 1 1 1 1
> > Sent 6208 bytes 64 pkt (dropped 0, overlimits 0 requeues 0)
> > rate 0bit 0pps backlog 0b 0p requeues 0
>
> Even the Sent count seems to be too low. Are you looking at the right
> device?

I would think that tc displays all queues? It says eth0 and eth0 is the
device that we sent the data out on.

> So based on the current analysis, the packets are getting dropped after
> the call to ip_local_out() in ip_push_pending_frames(). ip_local_out()
> is failing with NET_XMIT_DROP. But we are not sure where they are
> getting dropped. Is that right?

ip_local_out() is returning ENOBUFS. Something at the qdisc layer is
dropping the packet and not incrementing counters.

> I think we need to figure out where they are getting dropped and then
> decide on the appropriate counter to be incremented.

Right. Where in the qdisc layer do drops occur?
On Wed, 2009-08-26 at 15:09 -0400, Christoph Lameter wrote:
> On Wed, 26 Aug 2009, Sridhar Samudrala wrote:
>
> > > They are reported for IP and UDP.
> > Not clear what you meant by this.
>
> The SNMP and UDP statistics show the loss. qdisc level does not show the
> loss.
>
> > > root@rd-strategy3-deb64:/home/clameter# tc -s qdisc show
> > > qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1
> > > 1 1 1 1
> > > Sent 6208 bytes 64 pkt (dropped 0, overlimits 0 requeues 0)
> > > rate 0bit 0pps backlog 0b 0p requeues 0
> >
> > Even the Sent count seems to be too low. Are you looking at the right
> > device?
>
> I would think that tc displays all queues? It says eth0 and eth0 is the
> device that we sent the data out on.
>
> > So based on the current analysis, the packets are getting dropped after
> > the call to ip_local_out() in ip_push_pending_frames(). ip_local_out()
> > is failing with NET_XMIT_DROP. But we are not sure where they are
> > getting dropped. Is that right?
>
> ip_local_out() is returning ENOBUFS. Something at the qdisc layer is
> dropping the packet and not incrementing counters.

Is the ENOBUFS return with your/Eric's patch? I thought you were
seeing NET_XMIT_DROP without any patches.

> > I think we need to figure out where they are getting dropped and then
> > decide on the appropriate counter to be incremented.
>
> Right. Where in the qdisc layer do drops occur?

The normal path where packets are dropped when the tx qlen is exceeded
is pfifo_fast_enqueue() -> qdisc_drop(). In this path, drops are
counted. The other place is in dev_queue_xmit(), but you are not
hitting that case either.

So it looks like there is another place where they are getting dropped.

Thanks
Sridhar
On Wed, 26 Aug 2009, Sridhar Samudrala wrote:

> > ip_local_out() is returning ENOBUFS. Something at the qdisc layer is
> > dropping the packet and not incrementing counters.
>
> Is the ENOBUFS return with your/Eric's patch? I thought you were
> seeing NET_XMIT_DROP without any patches.

Both Eric's latest patch and your patch were applied.

> > > I think we need to figure out where they are getting dropped and then
> > > decide on the appropriate counter to be incremented.
> >
> > Right. Where in the qdisc layer do drops occur?
>
> The normal path where packets are dropped when the tx qlen is exceeded
> is pfifo_fast_enqueue() -> qdisc_drop(). In this path, drops are
> counted. The other place is in dev_queue_xmit(), but you are not
> hitting that case either.
>
> So it looks like there is another place where they are getting dropped.

Hmmm.. I need to find more time to dig into this. Anyways it seems that
Eric's latest patch is doing many good things for packet loss accounting.
diff --git a/net/core/dev.c b/net/core/dev.c
index 6a94475..8b6a075 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1864,8 +1864,7 @@ gso:
 	spin_lock(root_lock);
 
 	if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
-		kfree_skb(skb);
-		rc = NET_XMIT_DROP;
+		rc = qdisc_drop(skb, q);
 	} else {
 		rc = qdisc_enqueue_root(skb, q);
 		qdisc_run(q);