diff mbox series

[net-next] tcp: add SNMP counter for the number of packets pruned from ofo queue

Message ID 1532524010-11855-1-git-send-email-laoar.shao@gmail.com
State Changes Requested, archived
Delegated to: David Miller
Headers show
Series [net-next] tcp: add SNMP counter for the number of packets pruned from ofo queue | expand

Commit Message

Yafang Shao July 25, 2018, 1:06 p.m. UTC
LINUX_MIB_OFOPRUNED is used to count how many times ofo queue is pruned,
but sometimes we want to know how many packets are pruned from this queue,
that could help us to track the dropped packets.

As LINUX_MIB_OFOPRUNED is a useful event for us, so I introduce a new
SNMP counter LINUX_MIB_OFOPRUNEDROP, which could be showed in netstat as
OfoPruneDrop.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/uapi/linux/snmp.h | 1 +
 net/ipv4/proc.c           | 1 +
 net/ipv4/tcp_input.c      | 1 +
 3 files changed, 3 insertions(+)

Comments

Eric Dumazet July 25, 2018, 1:33 p.m. UTC | #1
On 07/25/2018 06:06 AM, Yafang Shao wrote:
> LINUX_MIB_OFOPRUNED is used to count how many times ofo queue is pruned,
> but sometimes we want to know how many packets are pruned from this queue,
> that could help us to track the dropped packets.
> 
> As LINUX_MIB_OFOPRUNED is a useful event for us, so I introduce a new
> SNMP counter LINUX_MIB_OFOPRUNEDROP, which could be showed in netstat as
> OfoPruneDrop.


Okay, but why tracking number of skbs that are removed ?

Skb can contain many segments (because of GRO and TCP coalescing)

So your claim of tracking dropped packets is ill defined.

Also I prefer having net tree being merged into net-next, since your patch would
add a merge conflict.
Yafang Shao July 25, 2018, 1:40 p.m. UTC | #2
On Wed, Jul 25, 2018 at 9:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On 07/25/2018 06:06 AM, Yafang Shao wrote:
>> LINUX_MIB_OFOPRUNED is used to count how many times ofo queue is pruned,
>> but sometimes we want to know how many packets are pruned from this queue,
>> that could help us to track the dropped packets.
>>
>> As LINUX_MIB_OFOPRUNED is a useful event for us, so I introduce a new
>> SNMP counter LINUX_MIB_OFOPRUNEDROP, which could be showed in netstat as
>> OfoPruneDrop.
>
>
> Okay, but why tracking number of skbs that are removed ?
>

Because we want to know why packets were dropped.
If that could be show in netstat, we could easily find that it is
dropped due to ofo prune.


> Skb can contain many segments (because of GRO and TCP coalescing)
>
> So your claim of tracking dropped packets is ill defined.
>
> Also I prefer having net tree being merged into net-next, since your patch would
> add a merge conflict.
>
>
Eric Dumazet July 25, 2018, 1:55 p.m. UTC | #3
On 07/25/2018 06:40 AM, Yafang Shao wrote:

> 
> Because we want to know why packets were dropped.
> If that could be show in netstat, we could easily find that it is
> dropped due to ofo prune.


We have a counter already for these events : LINUX_MIB_OFOPRUNED

You want to add another counter tracking number of _skbs_,
which has no precise value for user,
given each skb can contain a variable number of _packets_.
Yafang Shao July 25, 2018, 1:59 p.m. UTC | #4
On Wed, Jul 25, 2018 at 9:55 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On 07/25/2018 06:40 AM, Yafang Shao wrote:
>
>>
>> Because we want to know why packets were dropped.
>> If that could be show in netstat, we could easily find that it is
>> dropped due to ofo prune.
>
>
> We have a counter already for these events : LINUX_MIB_OFOPRUNED
>
> You want to add another counter tracking number of _skbs_,
> which has no precise value for user,
> given each skb can contain a variable number of _packets_.
>

Got it.

But with LINUX_MIB_OFOPRUNED, we only know  tcp_prune_ofo_queue is
called, but have no idea how many skbs are dropped.

Thanks
Yafang
Yafang Shao July 26, 2018, 1:42 a.m. UTC | #5
On Wed, Jul 25, 2018 at 9:59 PM, Yafang Shao <laoar.shao@gmail.com> wrote:
> On Wed, Jul 25, 2018 at 9:55 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>
>> On 07/25/2018 06:40 AM, Yafang Shao wrote:
>>
>>>
>>> Because we want to know why packets were dropped.
>>> If that could be show in netstat, we could easily find that it is
>>> dropped due to ofo prune.
>>
>>
>> We have a counter already for these events : LINUX_MIB_OFOPRUNED
>>
>> You want to add another counter tracking number of _skbs_,
>> which has no precise value for user,
>> given each skb can contain a variable number of _packets_.
>>
>
> Got it.
>
> But with LINUX_MIB_OFOPRUNED, we only know  tcp_prune_ofo_queue is
> called, but have no idea how many skbs are dropped.
>


Hi Eric,

LINUX_MIB_TCPOFOQUEUE, LINUX_MIB_TCPOFODROP and LINUX_MIB_TCPOFOMERGE
are all for the number of SKBs, but only  LINUX_MIB_OFOPRUNED is for
the event, that could lead misunderstading.
So I think introducing a counter for the number of SKB pruned could be
better, that could help us to track the whole behavior of ofo queue.
That is why I submit this patch.


Thanks
Yafang
Eric Dumazet July 26, 2018, 4:06 a.m. UTC | #6
On 07/25/2018 06:42 PM, Yafang Shao wrote:
 
> 
> Hi Eric,
> 
> LINUX_MIB_TCPOFOQUEUE, LINUX_MIB_TCPOFODROP and LINUX_MIB_TCPOFOMERGE
> are all for the number of SKBs, but only  LINUX_MIB_OFOPRUNED is for
> the event, that could lead misunderstading.
> So I think introducing a counter for the number of SKB pruned could be
> better, that could help us to track the whole behavior of ofo queue.
> That is why I submit this patch.

Sure, and I said your patch had issues.
You mixed 'packets' and 'skbs' but apparently you do not get my point.

I would rather not add another SNMP counter, and refine the current one,
trying to track something more meaningful.

The notion of 'skb' is internal to the kernel, and can not be mapped easily
to 'number of network segments' which probably is more useful for the user.

I will send this instead :

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d51fa358b2b196d0f9c258b24354813f2128a675..141a062abd0660c8f6d049de1dc7c7ecf7a7230d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5001,18 +5001,19 @@ static bool tcp_prune_ofo_queue(struct sock *sk)
 {
        struct tcp_sock *tp = tcp_sk(sk);
        struct rb_node *node, *prev;
+       unsigned int segs = 0;
        int goal;
 
        if (RB_EMPTY_ROOT(&tp->out_of_order_queue))
                return false;
 
-       NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
        goal = sk->sk_rcvbuf >> 3;
        node = &tp->ooo_last_skb->rbnode;
        do {
                prev = rb_prev(node);
                rb_erase(node, &tp->out_of_order_queue);
                goal -= rb_to_skb(node)->truesize;
+               segs += max_t(u16, 1, skb_shinfo(rb_to_skb(node))->gso_segs);
                tcp_drop(sk, rb_to_skb(node));
                if (!prev || goal <= 0) {
                        sk_mem_reclaim(sk);
@@ -5023,6 +5024,7 @@ static bool tcp_prune_ofo_queue(struct sock *sk)
                }
                node = prev;
        } while (node);
+       NET_ADD_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED, segs);
        tp->ooo_last_skb = rb_to_skb(prev);
 
        /* Reset SACK state.  A conforming SACK implementation will
Yafang Shao July 26, 2018, 4:31 a.m. UTC | #7
On Thu, Jul 26, 2018 at 12:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On 07/25/2018 06:42 PM, Yafang Shao wrote:
>
>>
>> Hi Eric,
>>
>> LINUX_MIB_TCPOFOQUEUE, LINUX_MIB_TCPOFODROP and LINUX_MIB_TCPOFOMERGE
>> are all for the number of SKBs, but only  LINUX_MIB_OFOPRUNED is for
>> the event, that could lead misunderstading.
>> So I think introducing a counter for the number of SKB pruned could be
>> better, that could help us to track the whole behavior of ofo queue.
>> That is why I submit this patch.
>
> Sure, and I said your patch had issues.
> You mixed 'packets' and 'skbs' but apparently you do not get my point.
>

I had noticed that I made this mistake.

> I would rather not add another SNMP counter, and refine the current one,
> trying to track something more meaningful.
>
> The notion of 'skb' is internal to the kernel, and can not be mapped easily
> to 'number of network segments' which probably is more useful for the user.
>
> I will send this instead :
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index d51fa358b2b196d0f9c258b24354813f2128a675..141a062abd0660c8f6d049de1dc7c7ecf7a7230d 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -5001,18 +5001,19 @@ static bool tcp_prune_ofo_queue(struct sock *sk)
>  {
>         struct tcp_sock *tp = tcp_sk(sk);
>         struct rb_node *node, *prev;
> +       unsigned int segs = 0;
>         int goal;
>
>         if (RB_EMPTY_ROOT(&tp->out_of_order_queue))
>                 return false;
>
> -       NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
>         goal = sk->sk_rcvbuf >> 3;
>         node = &tp->ooo_last_skb->rbnode;
>         do {
>                 prev = rb_prev(node);
>                 rb_erase(node, &tp->out_of_order_queue);
>                 goal -= rb_to_skb(node)->truesize;
> +               segs += max_t(u16, 1, skb_shinfo(rb_to_skb(node))->gso_segs);
>                 tcp_drop(sk, rb_to_skb(node));
>                 if (!prev || goal <= 0) {
>                         sk_mem_reclaim(sk);
> @@ -5023,6 +5024,7 @@ static bool tcp_prune_ofo_queue(struct sock *sk)
>                 }
>                 node = prev;
>         } while (node);
> +       NET_ADD_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED, segs);
>         tp->ooo_last_skb = rb_to_skb(prev);
>
>         /* Reset SACK state.  A conforming SACK implementation will
>

You patch will make it more meaningful.
How about the other ones?
I think changing all of them from the number of SKBs to the number of
network segments would be better.

Thanks
Yafang
Eric Dumazet July 26, 2018, 4:36 a.m. UTC | #8
On 07/25/2018 09:31 PM, Yafang Shao wrote:
> On Thu, Jul 26, 2018 at 12:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>
>> On 07/25/2018 06:42 PM, Yafang Shao wrote:
>>
>>>
>>> Hi Eric,
>>>
>>> LINUX_MIB_TCPOFOQUEUE, LINUX_MIB_TCPOFODROP and LINUX_MIB_TCPOFOMERGE
>>> are all for the number of SKBs, but only  LINUX_MIB_OFOPRUNED is for
>>> the event, that could lead misunderstading.
>>> So I think introducing a counter for the number of SKB pruned could be
>>> better, that could help us to track the whole behavior of ofo queue.
>>> That is why I submit this patch.
>>
>> Sure, and I said your patch had issues.
>> You mixed 'packets' and 'skbs' but apparently you do not get my point.
>>
> 
> I had noticed that I made this mistake.
> 
>> I would rather not add another SNMP counter, and refine the current one,
>> trying to track something more meaningful.
>>
>> The notion of 'skb' is internal to the kernel, and can not be mapped easily
>> to 'number of network segments' which probably is more useful for the user.
>>
>> I will send this instead :
>>
>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>> index d51fa358b2b196d0f9c258b24354813f2128a675..141a062abd0660c8f6d049de1dc7c7ecf7a7230d 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -5001,18 +5001,19 @@ static bool tcp_prune_ofo_queue(struct sock *sk)
>>  {
>>         struct tcp_sock *tp = tcp_sk(sk);
>>         struct rb_node *node, *prev;
>> +       unsigned int segs = 0;
>>         int goal;
>>
>>         if (RB_EMPTY_ROOT(&tp->out_of_order_queue))
>>                 return false;
>>
>> -       NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
>>         goal = sk->sk_rcvbuf >> 3;
>>         node = &tp->ooo_last_skb->rbnode;
>>         do {
>>                 prev = rb_prev(node);
>>                 rb_erase(node, &tp->out_of_order_queue);
>>                 goal -= rb_to_skb(node)->truesize;
>> +               segs += max_t(u16, 1, skb_shinfo(rb_to_skb(node))->gso_segs);
>>                 tcp_drop(sk, rb_to_skb(node));
>>                 if (!prev || goal <= 0) {
>>                         sk_mem_reclaim(sk);
>> @@ -5023,6 +5024,7 @@ static bool tcp_prune_ofo_queue(struct sock *sk)
>>                 }
>>                 node = prev;
>>         } while (node);
>> +       NET_ADD_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED, segs);
>>         tp->ooo_last_skb = rb_to_skb(prev);
>>
>>         /* Reset SACK state.  A conforming SACK implementation will
>>
> 
> You patch will make it more meaningful.
> How about the other ones?

Same thing really.

Note that tcp_collapse() does not currently propagate skb_shinfo(skb)->gso_segs
when rebuilding the skbs, so this will need a fix.
Yafang Shao July 26, 2018, 4:42 a.m. UTC | #9
On Thu, Jul 26, 2018 at 12:36 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On 07/25/2018 09:31 PM, Yafang Shao wrote:
>> On Thu, Jul 26, 2018 at 12:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>
>>>
>>> On 07/25/2018 06:42 PM, Yafang Shao wrote:
>>>
>>>>
>>>> Hi Eric,
>>>>
>>>> LINUX_MIB_TCPOFOQUEUE, LINUX_MIB_TCPOFODROP and LINUX_MIB_TCPOFOMERGE
>>>> are all for the number of SKBs, but only  LINUX_MIB_OFOPRUNED is for
>>>> the event, that could lead misunderstading.
>>>> So I think introducing a counter for the number of SKB pruned could be
>>>> better, that could help us to track the whole behavior of ofo queue.
>>>> That is why I submit this patch.
>>>
>>> Sure, and I said your patch had issues.
>>> You mixed 'packets' and 'skbs' but apparently you do not get my point.
>>>
>>
>> I had noticed that I made this mistake.
>>
>>> I would rather not add another SNMP counter, and refine the current one,
>>> trying to track something more meaningful.
>>>
>>> The notion of 'skb' is internal to the kernel, and can not be mapped easily
>>> to 'number of network segments' which probably is more useful for the user.
>>>
>>> I will send this instead :
>>>
>>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>>> index d51fa358b2b196d0f9c258b24354813f2128a675..141a062abd0660c8f6d049de1dc7c7ecf7a7230d 100644
>>> --- a/net/ipv4/tcp_input.c
>>> +++ b/net/ipv4/tcp_input.c
>>> @@ -5001,18 +5001,19 @@ static bool tcp_prune_ofo_queue(struct sock *sk)
>>>  {
>>>         struct tcp_sock *tp = tcp_sk(sk);
>>>         struct rb_node *node, *prev;
>>> +       unsigned int segs = 0;
>>>         int goal;
>>>
>>>         if (RB_EMPTY_ROOT(&tp->out_of_order_queue))
>>>                 return false;
>>>
>>> -       NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
>>>         goal = sk->sk_rcvbuf >> 3;
>>>         node = &tp->ooo_last_skb->rbnode;
>>>         do {
>>>                 prev = rb_prev(node);
>>>                 rb_erase(node, &tp->out_of_order_queue);
>>>                 goal -= rb_to_skb(node)->truesize;
>>> +               segs += max_t(u16, 1, skb_shinfo(rb_to_skb(node))->gso_segs);
>>>                 tcp_drop(sk, rb_to_skb(node));
>>>                 if (!prev || goal <= 0) {
>>>                         sk_mem_reclaim(sk);
>>> @@ -5023,6 +5024,7 @@ static bool tcp_prune_ofo_queue(struct sock *sk)
>>>                 }
>>>                 node = prev;
>>>         } while (node);
>>> +       NET_ADD_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED, segs);
>>>         tp->ooo_last_skb = rb_to_skb(prev);
>>>
>>>         /* Reset SACK state.  A conforming SACK implementation will
>>>
>>
>> You patch will make it more meaningful.
>> How about the other ones?
>
> Same thing really.
>
> Note that tcp_collapse() does not currently propagate skb_shinfo(skb)->gso_segs
> when rebuilding the skbs, so this will need a fix.
>

Got it.
Thank you very much for your comment.

Thanks
Yafang
diff mbox series

Patch

diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index e5ebc83..c996fba 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -172,6 +172,7 @@  enum
 	LINUX_MIB_PRUNECALLED,			/* PruneCalled */
 	LINUX_MIB_RCVPRUNED,			/* RcvPruned */
 	LINUX_MIB_OFOPRUNED,			/* OfoPruned */
+	LINUX_MIB_OFOPRUNEDROP,			/* OfoPruneDrop */
 	LINUX_MIB_OUTOFWINDOWICMPS,		/* OutOfWindowIcmps */
 	LINUX_MIB_LOCKDROPPEDICMPS,		/* LockDroppedIcmps */
 	LINUX_MIB_ARPFILTER,			/* ArpFilter */
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index b46e4cf..c718295 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -180,6 +180,7 @@  static int sockstat_seq_show(struct seq_file *seq, void *v)
 	SNMP_MIB_ITEM("PruneCalled", LINUX_MIB_PRUNECALLED),
 	SNMP_MIB_ITEM("RcvPruned", LINUX_MIB_RCVPRUNED),
 	SNMP_MIB_ITEM("OfoPruned", LINUX_MIB_OFOPRUNED),
+	SNMP_MIB_ITEM("OfoPruneDrop", LINUX_MIB_OFOPRUNEDROP),
 	SNMP_MIB_ITEM("OutOfWindowIcmps", LINUX_MIB_OUTOFWINDOWICMPS),
 	SNMP_MIB_ITEM("LockDroppedIcmps", LINUX_MIB_LOCKDROPPEDICMPS),
 	SNMP_MIB_ITEM("ArpFilter", LINUX_MIB_ARPFILTER),
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 91dbb9a..5267121 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4980,6 +4980,7 @@  static bool tcp_prune_ofo_queue(struct sock *sk)
 	do {
 		prev = rb_prev(node);
 		rb_erase(node, &tp->out_of_order_queue);
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNEDROP);
 		tcp_drop(sk, rb_to_skb(node));
 		sk_mem_reclaim(sk);
 		if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&