Message ID | 1430273488-8403-2-git-send-email-ast@plumgrid.com
---|---
State | RFC, archived
Delegated to | David Miller
On Tue, 2015-04-28 at 19:11 -0700, Alexei Starovoitov wrote:
> +	if (pkt_dev->flags & F_DO_RX) {
> +		local_bh_disable();
> +		atomic_add(burst, &pkt_dev->skb->users);
> +		do {
> +			ret = netif_receive_skb(pkt_dev->skb);
> +			if (ret == NET_RX_DROP)
> +				pkt_dev->errors++;
> +			pkt_dev->last_ok = 1;
> +			pkt_dev->sofar++;
> +			pkt_dev->seq_num++;
> +		} while (--burst > 0);
> +		local_bh_enable();
> +		goto out;
> +	}
> +

This looks buggy.

skb can be put on a queue, so skb->next and skb->prev cannot be reused,
or queues will be corrupted.

Note that on TX, it is possible to have the same issue if you use a
virtual device like bonding, and skb is queued on a slave qdisc.

(That's why we have this IFF_TX_SKB_SHARING flag)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 4/28/15 9:14 PM, Eric Dumazet wrote:
> On Tue, 2015-04-28 at 19:11 -0700, Alexei Starovoitov wrote:
>
>> +	if (pkt_dev->flags & F_DO_RX) {
>> +		local_bh_disable();
>> +		atomic_add(burst, &pkt_dev->skb->users);
>> +		do {
>> +			ret = netif_receive_skb(pkt_dev->skb);
>> +			if (ret == NET_RX_DROP)
>> +				pkt_dev->errors++;
>> +			pkt_dev->last_ok = 1;
>> +			pkt_dev->sofar++;
>> +			pkt_dev->seq_num++;
>> +		} while (--burst > 0);
>> +		local_bh_enable();
>> +		goto out;
>> +	}
>> +
>
> This looks buggy.
>
> skb can be put on a queue, so skb->next and skb->prev cannot be reused,
> or queues will be corrupted.

don't see the bug yet.
Any layer that wants to do such queueing should do skb_share_check
first. Just like ip_rcv does. So everything in IP world should
work fine, because it will be operating on clean cloned skb.

> Note that on TX, it is possible to have the same issue if you use a
> virtual device like bonding, and skb is queued on a slave qdisc.
>
> (That's why we have this IFF_TX_SKB_SHARING flag)

yep, that's why xmit into veth is not useful for benchmarking of rx,
since the hottest functions are alloc_skb and fill_packet, while
netif_receive_skb is not even seen in top 10.
On Wed, 2015-04-29 at 14:55 -0700, Alexei Starovoitov wrote:
> On 4/28/15 9:14 PM, Eric Dumazet wrote:
> > On Tue, 2015-04-28 at 19:11 -0700, Alexei Starovoitov wrote:
> >
> >> +	if (pkt_dev->flags & F_DO_RX) {
> >> +		local_bh_disable();
> >> +		atomic_add(burst, &pkt_dev->skb->users);
> >> +		do {
> >> +			ret = netif_receive_skb(pkt_dev->skb);
> >> +			if (ret == NET_RX_DROP)
> >> +				pkt_dev->errors++;
> >> +			pkt_dev->last_ok = 1;
> >> +			pkt_dev->sofar++;
> >> +			pkt_dev->seq_num++;
> >> +		} while (--burst > 0);
> >> +		local_bh_enable();
> >> +		goto out;
> >> +	}
> >> +
> >
> > This looks buggy.
> >
> > skb can be put on a queue, so skb->next and skb->prev cannot be reused,
> > or queues will be corrupted.
>
> don't see the bug yet.
> Any layer that wants to do such queueing should do skb_share_check
> first. Just like ip_rcv does. So everything in IP world should
> work fine, because it will be operating on clean cloned skb.

Really this is what _you_ think is needed, so that your patch can fly.

In the current state of the stack, the skb_share_check() is done where
we know that a packet _might_ be delivered to multiple end points
(deliver_skb() does atomic_inc(&skb->users)).

But RPS/RFS/GRO do not care about your new rule.

Yes, before reaching __netif_receive_skb_core(), packets are supposed to
belong to the stack. We are supposed to queue them, without adding a
check for skb->users being one or not, and eventually add an expensive
memory allocation/copy.

We are not going to add an extra check just to make pktgen rx fast.
pktgen will have to comply to existing rules.
On 4/29/15 3:19 PM, Eric Dumazet wrote:
> On Wed, 2015-04-29 at 14:55 -0700, Alexei Starovoitov wrote:
>> On 4/28/15 9:14 PM, Eric Dumazet wrote:
>>> On Tue, 2015-04-28 at 19:11 -0700, Alexei Starovoitov wrote:
>>>
>>> This looks buggy.
>>>
>>> skb can be put on a queue, so skb->next and skb->prev cannot be reused,
>>> or queues will be corrupted.
>>
>> don't see the bug yet.
>> Any layer that wants to do such queueing should do skb_share_check
>> first. Just like ip_rcv does. So everything in IP world should
>> work fine, because it will be operating on clean cloned skb.
>
> Really this is what _you_ think is needed, so that your patch can fly.
>
> In the current state of the stack, the skb_share_check() is done where
> we know that a packet _might_ be delivered to multiple end points
> (deliver_skb() does atomic_inc(&skb->users)).
>
> But RPS/RFS/GRO do not care about your new rule.
>
> Yes, before reaching __netif_receive_skb_core(), packets are supposed to
> belong to the stack. We are supposed to queue them, without adding a
> check for skb->users being one or not, and eventually add an expensive
> memory allocation/copy.
>
> We are not going to add an extra check just to make pktgen rx fast.
> pktgen will have to comply to existing rules.

I'm not making and not suggesting any new rules.
ip_rcv is doing this skb_share_check() not because of pktgen rx,
but because there can be taps and deliver_skb() as you said.
gro has a different interface and this pktgen cannot benchmark it.
rps/rfs is not benchmarkable by this approach either.
To me this is all fine. I'm not trying to build a universal
benchmarking tool. This one is dumb and simple and primarily
oriented to benchmark changes to netif_receive_skb and ingress
qdisc only. I'm not suggesting to use it everywhere.
I already mentioned in the cover letter:
"The profile dump looks as expected for RX of UDP packets
without local socket except presence of __skb_clone."

Clearly I'm not suggesting to use pktgen rx to optimize the IP stack,
and not suggesting at all that the stack should assume users != 1
when an skb hits netif_receive_skb. Today at the beginning of
netif_receive_skb we know that users == 1 without checking.
I'm not changing that assumption.
Just like the pktgen xmit path is cheating a little bit while
benchmarking TX, I'm cheating a little bit with users != 1 on RX.
On Wed, 2015-04-29 at 15:38 -0700, Alexei Starovoitov wrote:
> I'm not making and not suggesting any new rules.
> ip_rcv is doing this skb_share_check() not because of pktgen rx,
> but because there can be taps and deliver_skb() as you said.
> gro has a different interface and this pktgen cannot benchmark it.
> rps/rfs is not benchmarkable by this approach either.
> To me this is all fine. I'm not trying to build a universal
> benchmarking tool. This one is dumb and simple and primarily
> oriented to benchmark changes to netif_receive_skb and ingress
> qdisc only. I'm not suggesting to use it everywhere.
> I already mentioned in the cover letter:
> "The profile dump looks as expected for RX of UDP packets
> without local socket except presence of __skb_clone."
> Clearly I'm not suggesting to use pktgen rx to optimize the IP stack,
> and not suggesting at all that the stack should assume users != 1
> when an skb hits netif_receive_skb. Today at the beginning of
> netif_receive_skb we know that users == 1 without checking.
> I'm not changing that assumption.
> Just like the pktgen xmit path is cheating a little bit while
> benchmarking TX, I'm cheating a little bit with users != 1 on RX.

Look, you don't have to write all this, just fix the bug I mentioned
to you about RPS / RFS.
(details in Documentation/networking/scaling.txt)

Simply read the code in enqueue_to_backlog(); this obviously needs to
own skb->next/prev.

So pktgen in RX mode MUST deliver skb with skb->users = 1, there is no
way around it.
On 4/29/15 3:56 PM, Eric Dumazet wrote:
>
> So pktgen in RX mode MUST deliver skb with skb->users = 1, there is no
> way around it.

if I only knew how to do it...
The cost of continuously allocating skbs is way higher than
netif_receive_skb itself. Such a benchmarking tool would measure the
speed of skb alloc/free instead of the speed of netif_receive_skb.
Are you suggesting to pre-allocate 10s of millions of skbs and
then feed them in one go? The profile will be dominated by
cache misses in the first few lines of __netif_receive_skb_core()
where it accesses skb->dev, data, head. Doesn't sound too useful either.
Other thoughts?
On Wed, 2015-04-29 at 16:28 -0700, Alexei Starovoitov wrote:
> On 4/29/15 3:56 PM, Eric Dumazet wrote:
>>
>> So pktgen in RX mode MUST deliver skb with skb->users = 1, there is no
>> way around it.
>
> if I only knew how to do it...
> The cost of continuously allocating skbs is way higher than
> netif_receive_skb itself. Such a benchmarking tool would measure the
> speed of skb alloc/free instead of the speed of netif_receive_skb.
> Are you suggesting to pre-allocate 10s of millions of skbs and
> then feed them in one go? The profile will be dominated by
> cache misses in the first few lines of __netif_receive_skb_core()
> where it accesses skb->dev, data, head. Doesn't sound too useful either.
> Other thoughts?

The code I copy pasted from your patch is buggy. Not the whole thing.
You have to replace it by something smarter.

This should not be hard really. Zap the 'burst/clone' thing, this is
not going to work for RX. TX was okay, not RX.

You could for instance do:

	atomic_inc(&skb->users);
	netif_receive_skb(skb);
	if (atomic_read(&skb->users) != 1) {
		/* This is too bad, I can not recycle this skb
		 * because it is still used */
		consume_skb(skb);
		/* allocate a fresh new skb */
		skb = ...
	} else {
		/* Yeah! Cost of reusing this skb was one atomic op */
	}
On 4/29/15 4:39 PM, Eric Dumazet wrote:
>
> Zap the 'burst/clone' thing, this is not going to work for RX.
> TX was okay, not RX.
>
> You could for instance do:
>
>	atomic_inc(&skb->users);
>	netif_receive_skb(skb);
>	if (atomic_read(&skb->users) != 1) {
>		/* This is too bad, I can not recycle this skb
>		 * because it is still used */
>		consume_skb(skb);
>		/* allocate a fresh new skb */
>		skb = ...
>	} else {
>		/* Yeah! Cost of reusing this skb was one atomic op */
>	}

ahh, great! I think I'm starting to understand.
then the following should be ok as well?

	atomic_add(burst, &skb->users);
	do {
		netif_receive_skb(skb);
		if (atomic_read(&skb->users) != burst) {
			/* too bad, can not recycle */
			atomic_sub(burst - 1, &skb->users);
			consume_skb(skb);
			/* allocate a fresh new skb */
			skb = ...
			/* and get out of this loop */
		} else {
			/* Yeah! the cost of reusing this skb was one
			 * atomic op amortized over 'burst' iterations */
		}
	} while (--burst > 0);
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 508155b283dd..4f6c56bca550 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -203,6 +203,7 @@
 #define F_NODE          (1<<15)	/* Node memory alloc*/
 #define F_UDPCSUM       (1<<16)	/* Include UDP checksum */
 #define F_NO_TIMESTAMP  (1<<17)	/* Don't timestamp packets (default TS) */
+#define F_DO_RX         (1<<18)	/* generate packets for RX */
 
 /* Thread control flag bits */
 #define T_STOP        (1<<0)	/* Stop run */
@@ -1080,7 +1081,7 @@ static ssize_t pktgen_if_write(struct file *file,
 		len = num_arg(&user_buffer[i], 10, &value);
 		if (len < 0)
 			return len;
-		if ((value > 0) &&
+		if ((value > 0) && !(pkt_dev->flags & F_DO_RX) &&
 		    (!(pkt_dev->odev->priv_flags & IFF_TX_SKB_SHARING)))
 			return -ENOTSUPP;
 		i += len;
@@ -1089,6 +1090,12 @@ static ssize_t pktgen_if_write(struct file *file,
 		sprintf(pg_result, "OK: clone_skb=%d", pkt_dev->clone_skb);
 		return count;
 	}
+	if (!strcmp(name, "rx")) {
+		pkt_dev->flags |= F_DO_RX;
+
+		sprintf(pg_result, "OK: RX is ON");
+		return count;
+	}
 	if (!strcmp(name, "count")) {
 		len = num_arg(&user_buffer[i], 10, &value);
 		if (len < 0)
@@ -1134,7 +1141,7 @@ static ssize_t pktgen_if_write(struct file *file,
 			return len;
 		i += len;
-		if ((value > 1) &&
+		if ((value > 1) && !(pkt_dev->flags & F_DO_RX) &&
 		    (!(pkt_dev->odev->priv_flags & IFF_TX_SKB_SHARING)))
 			return -ENOTSUPP;
 		pkt_dev->burst = value < 1 ? 1 : value;
@@ -3349,11 +3356,29 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
 		pkt_dev->last_pkt_size = pkt_dev->skb->len;
 		pkt_dev->allocated_skbs++;
 		pkt_dev->clone_count = 0;	/* reset counter */
+		if (pkt_dev->flags & F_DO_RX)
+			pkt_dev->skb->protocol = eth_type_trans(pkt_dev->skb,
+								pkt_dev->skb->dev);
 	}
 
 	if (pkt_dev->delay && pkt_dev->last_ok)
 		spin(pkt_dev, pkt_dev->next_tx);
 
+	if (pkt_dev->flags & F_DO_RX) {
+		local_bh_disable();
+		atomic_add(burst, &pkt_dev->skb->users);
+		do {
+			ret = netif_receive_skb(pkt_dev->skb);
+			if (ret == NET_RX_DROP)
+				pkt_dev->errors++;
+			pkt_dev->last_ok = 1;
+			pkt_dev->sofar++;
+			pkt_dev->seq_num++;
+		} while (--burst > 0);
+		local_bh_enable();
+		goto out;
+	}
+
 	txq = skb_get_tx_queue(odev, pkt_dev->skb);
 
 	local_bh_disable();
@@ -3403,6 +3428,7 @@ unlock:
 
 	local_bh_enable();
 
+out:
 	/* If pkt_dev->count is zero, then run forever */
 	if ((pkt_dev->count != 0) && (pkt_dev->sofar >= pkt_dev->count)) {
 		pktgen_wait_for_skb(pkt_dev);
Introduce 'RX' mode for pktgen which generates the packets using familiar
pktgen commands, but feeds them into netif_receive_skb() instead of
ndo_start_xmit(). It can be used to benchmark different kernel RX paths.

Sample script 'pktgen.sh':

#!/bin/bash
function pgset() {
  local result

  echo $1 > $PGDEV
  result=`cat $PGDEV | fgrep "Result: OK:"`
  if [ "$result" = "" ]; then
    cat $PGDEV | fgrep Result:
  fi
}

ETH=$1

PGDEV=/proc/net/pktgen/kpktgend_0
pgset "rem_device_all"
pgset "add_device $ETH"

PGDEV=/proc/net/pktgen/$ETH
pgset "rx"
pgset "clone_skb 1000"
pgset "pkt_size 60"
pgset "dst 99.1.1.2"
pgset "dst_mac 90:e2:ba:6e:a8:e5"
pgset "count 10000000"
pgset "burst 32"

PGDEV=/proc/net/pktgen/pgctrl
echo "Running... ctrl^C to stop"
pgset "start"
echo "Done"
cat /proc/net/pktgen/$ETH

Usage:
$ sudo ./pktgen.sh eth2
...
Result: OK: 232376(c232372+d3) usec, 10000000 (60byte,0frags)
  43033682pps 20656Mb/sec (20656167360bps) errors: 10000000

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 net/core/pktgen.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)