
[RFC,net-next] pktgen: introduce 'rx' mode

Message ID 1430273488-8403-2-git-send-email-ast@plumgrid.com
State RFC, archived
Delegated to: David Miller

Commit Message

Alexei Starovoitov April 29, 2015, 2:11 a.m. UTC
Introduce an 'RX' mode for pktgen, which generates packets
using the familiar pktgen commands but feeds them into netif_receive_skb()
instead of ndo_start_xmit().
It can be used to benchmark different kernel RX paths.

Sample script 'pktgen.sh':
#!/bin/bash
function pgset() {
  local result

  echo $1 > $PGDEV

  result=`cat $PGDEV | fgrep "Result: OK:"`
  if [ "$result" = "" ]; then
    cat $PGDEV | fgrep Result:
  fi
}

ETH=$1

PGDEV=/proc/net/pktgen/kpktgend_0
pgset "rem_device_all"
pgset "add_device $ETH"

PGDEV=/proc/net/pktgen/$ETH
pgset "rx"
pgset "clone_skb 1000"
pgset "pkt_size 60"
pgset "dst 99.1.1.2"
pgset "dst_mac 90:e2:ba:6e:a8:e5"
pgset "count 10000000"
pgset "burst 32"

PGDEV=/proc/net/pktgen/pgctrl
echo "Running... ctrl^C to stop"
pgset "start"
echo "Done"
cat /proc/net/pktgen/$ETH

Usage:
$ sudo ./pktgen.sh eth2
...
Result: OK: 232376(c232372+d3) usec, 10000000 (60byte,0frags)
  43033682pps 20656Mb/sec (20656167360bps) errors: 10000000
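
(Note: the errors count equals the packet count here, presumably because
no local receiver exists for these UDP packets, so netif_receive_skb()
returns NET_RX_DROP for each one, and RX mode counts every NET_RX_DROP
as an error.)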

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 net/core/pktgen.c |   30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

Comments

Eric Dumazet April 29, 2015, 4:14 a.m. UTC | #1
On Tue, 2015-04-28 at 19:11 -0700, Alexei Starovoitov wrote:

> +	if (pkt_dev->flags & F_DO_RX) {
> +		local_bh_disable();
> +		atomic_add(burst, &pkt_dev->skb->users);
> +		do {
> +			ret = netif_receive_skb(pkt_dev->skb);
> +			if (ret == NET_RX_DROP)
> +				pkt_dev->errors++;
> +			pkt_dev->last_ok = 1;
> +			pkt_dev->sofar++;
> +			pkt_dev->seq_num++;
> +		} while (--burst > 0);
> +		local_bh_enable();
> +		goto out;
> +	}
> +

This looks buggy.

skb can be put on a queue, so skb->next and skb->prev cannot be reused,
or queues will be corrupted.

Note that on TX, it is possible to have the same issue if you use a
virtual device like bonding, and skb is queued on a slave qdisc.

(That's why we have this IFF_TX_SKB_SHARING flag)
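
For context, the skb list helpers link packets through their own
next/prev fields; a simplified sketch of __skb_insert() from
include/linux/skbuff.h (details vary by kernel version) shows why
re-injecting an skb that is still sitting on a queue corrupts the list:

static inline void __skb_insert(struct sk_buff *newsk,
				struct sk_buff *prev, struct sk_buff *next,
				struct sk_buff_head *list)
{
	/* clobbers whatever newsk->next/prev held before */
	newsk->next = next;
	newsk->prev = prev;
	next->prev  = prev->next = newsk;
	list->qlen++;
}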



Alexei Starovoitov April 29, 2015, 9:55 p.m. UTC | #2
On 4/28/15 9:14 PM, Eric Dumazet wrote:
> On Tue, 2015-04-28 at 19:11 -0700, Alexei Starovoitov wrote:
>
>> +	if (pkt_dev->flags & F_DO_RX) {
>> +		local_bh_disable();
>> +		atomic_add(burst, &pkt_dev->skb->users);
>> +		do {
>> +			ret = netif_receive_skb(pkt_dev->skb);
>> +			if (ret == NET_RX_DROP)
>> +				pkt_dev->errors++;
>> +			pkt_dev->last_ok = 1;
>> +			pkt_dev->sofar++;
>> +			pkt_dev->seq_num++;
>> +		} while (--burst > 0);
>> +		local_bh_enable();
>> +		goto out;
>> +	}
>> +
>
> This looks buggy.
>
> skb can be put on a queue, so skb->next and skb->prev cannot be reused,
> or queues will be corrupted.

I don't see the bug yet.
Any layer that wants to do such queueing should do skb_share_check()
first, just like ip_rcv() does. So everything in the IP world should
work fine, because it will be operating on a clean cloned skb.
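
Roughly the pattern ip_rcv() uses (a simplified sketch based on
net/ipv4/ip_input.c; details vary by kernel version):

	skb = skb_share_check(skb, GFP_ATOMIC);
	if (!skb)
		goto out;	/* clone failed, packet was dropped */
	/* from here on the skb is private: users == 1, safe to queue */

skb_share_check() clones the skb and drops one reference to the
original whenever skb->users > 1.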

> Note that on TX, it is possible to have the same issue if you use a
> virtual device like bonding, and skb is queued on a slave qdisc.
>
> (That's why we have this IFF_TX_SKB_SHARING flag)

Yep, that's why xmit into veth is not useful for benchmarking RX,
since the hottest functions are alloc_skb and fill_packet, while
netif_receive_skb is not even in the top 10.

Eric Dumazet April 29, 2015, 10:19 p.m. UTC | #3
On Wed, 2015-04-29 at 14:55 -0700, Alexei Starovoitov wrote:
> On 4/28/15 9:14 PM, Eric Dumazet wrote:
> > On Tue, 2015-04-28 at 19:11 -0700, Alexei Starovoitov wrote:
> >
> >> +	if (pkt_dev->flags & F_DO_RX) {
> >> +		local_bh_disable();
> >> +		atomic_add(burst, &pkt_dev->skb->users);
> >> +		do {
> >> +			ret = netif_receive_skb(pkt_dev->skb);
> >> +			if (ret == NET_RX_DROP)
> >> +				pkt_dev->errors++;
> >> +			pkt_dev->last_ok = 1;
> >> +			pkt_dev->sofar++;
> >> +			pkt_dev->seq_num++;
> >> +		} while (--burst > 0);
> >> +		local_bh_enable();
> >> +		goto out;
> >> +	}
> >> +
> >
> > This looks buggy.
> >
> > skb can be put on a queue, so skb->next and skb->prev cannot be reused,
> > or queues will be corrupted.
> 
> I don't see the bug yet.
> Any layer that wants to do such queueing should do skb_share_check()
> first, just like ip_rcv() does. So everything in the IP world should
> work fine, because it will be operating on a clean cloned skb.

Really this is what _you_ think is needed, so that your patch can fly.

In the current state of the stack, skb_share_check() is done where we
know that a packet _might_ be delivered to multiple endpoints
(deliver_skb() does atomic_inc(&skb->users)).
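
For reference, deliver_skb() in net/core/dev.c is essentially this
(simplified; details vary by kernel version):

static inline int deliver_skb(struct sk_buff *skb,
			      struct packet_type *pt_prev,
			      struct net_device *orig_dev)
{
	atomic_inc(&skb->users);	/* each tap gets its own reference */
	return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
}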

But RPS/RFS/GRO do not care about your new rule.

Yes, before reaching __netif_receive_skb_core(), packets are supposed to
belong to the stack. We are supposed to be able to queue them without
checking whether skb->users is one, and without eventually adding an
expensive memory allocation/copy.

We are not going to add an extra check just to make pktgen rx fast.
pktgen will have to comply with existing rules.



Alexei Starovoitov April 29, 2015, 10:38 p.m. UTC | #4
On 4/29/15 3:19 PM, Eric Dumazet wrote:
> On Wed, 2015-04-29 at 14:55 -0700, Alexei Starovoitov wrote:
>> On 4/28/15 9:14 PM, Eric Dumazet wrote:
>>> On Tue, 2015-04-28 at 19:11 -0700, Alexei Starovoitov wrote:
>>>
>>>
>>> This looks buggy.
>>>
>>> skb can be put on a queue, so skb->next and skb->prev cannot be reused,
>>> or queues will be corrupted.
>>
>> I don't see the bug yet.
>> Any layer that wants to do such queueing should do skb_share_check()
>> first, just like ip_rcv() does. So everything in the IP world should
>> work fine, because it will be operating on a clean cloned skb.
>
> Really this is what _you_ think is needed, so that your patch can fly.
>
> In the current state of the stack, skb_share_check() is done where we
> know that a packet _might_ be delivered to multiple endpoints
> (deliver_skb() does atomic_inc(&skb->users)).
>
> But RPS/RFS/GRO do not care about your new rule.
>
> Yes, before reaching __netif_receive_skb_core(), packets are supposed to
> belong to the stack. We are supposed to be able to queue them without
> checking whether skb->users is one, and without eventually adding an
> expensive memory allocation/copy.
>
> We are not going to add an extra check just to make pktgen rx fast.
> pktgen will have to comply with existing rules.

I'm not making or suggesting any new rules.
ip_rcv is doing this skb_share_check() not because of pktgen rx,
but because there can be taps and deliver_skb(), as you said.
gro has a different interface, and this pktgen cannot benchmark it.
rps/rfs is not benchmarkable by this approach either.
To me this is all fine. I'm not trying to build a universal
benchmarking tool. This one is dumb and simple and primarily
oriented to benchmarking changes to netif_receive_skb and the
ingress qdisc only. I'm not suggesting to use it everywhere.
I already mentioned this in the cover letter:
"The profile dump looks as expected for RX of UDP packets
without local socket except presence of __skb_clone."
Clearly I'm not suggesting to use pktgen rx to optimize the IP stack,
and not suggesting at all that the stack should assume users!=1
when an skb hits netif_receive_skb. Today, at the beginning of
netif_receive_skb, we know that users==1 without checking.
I'm not changing that assumption.
Just like the pktgen xmit path cheats a little bit while
benchmarking TX, I'm cheating a little bit with users!=1 on RX.

Eric Dumazet April 29, 2015, 10:56 p.m. UTC | #5
On Wed, 2015-04-29 at 15:38 -0700, Alexei Starovoitov wrote:

> I'm not making or suggesting any new rules.
> ip_rcv is doing this skb_share_check() not because of pktgen rx,
> but because there can be taps and deliver_skb(), as you said.
> gro has a different interface, and this pktgen cannot benchmark it.
> rps/rfs is not benchmarkable by this approach either.
> To me this is all fine. I'm not trying to build a universal
> benchmarking tool. This one is dumb and simple and primarily
> oriented to benchmarking changes to netif_receive_skb and the
> ingress qdisc only. I'm not suggesting to use it everywhere.
> I already mentioned this in the cover letter:
> "The profile dump looks as expected for RX of UDP packets
> without local socket except presence of __skb_clone."
> Clearly I'm not suggesting to use pktgen rx to optimize the IP stack,
> and not suggesting at all that the stack should assume users!=1
> when an skb hits netif_receive_skb. Today, at the beginning of
> netif_receive_skb, we know that users==1 without checking.
> I'm not changing that assumption.
> Just like the pktgen xmit path cheats a little bit while
> benchmarking TX, I'm cheating a little bit with users!=1 on RX.

Look, you don't have to write all this; just fix the bug I mentioned
to you about RPS/RFS (details in Documentation/networking/scaling.txt).

Simply read the code in enqueue_to_backlog(); it obviously needs to
own skb->next/prev.
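
The relevant part looks roughly like this (a sketch of
enqueue_to_backlog() in net/core/dev.c; details vary by kernel version):

	sd = &per_cpu(softnet_data, cpu);
	...
	__skb_queue_tail(&sd->input_pkt_queue, skb);	/* writes skb->next/prev */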

So pktgen in RX mode MUST deliver skb with skb->users = 1; there is no
way around it.


Alexei Starovoitov April 29, 2015, 11:28 p.m. UTC | #6
On 4/29/15 3:56 PM, Eric Dumazet wrote:
>
> So pktgen in RX mode MUST deliver skb with skb->users = 1; there is no
> way around it.

if I only knew how to do it...
The cost of continuously allocating skbs is way higher than
netif_receive_skb itself. Such a benchmarking tool would measure the
speed of skb alloc/free instead of the speed of netif_receive_skb.
Are you suggesting pre-allocating tens of millions of skbs and
then feeding them in one go? The profile will be dominated by
cache misses in the first few lines of __netif_receive_skb_core()
where it accesses skb->dev, data, and head. That doesn't sound too
useful either.
Other thoughts?
Eric Dumazet April 29, 2015, 11:39 p.m. UTC | #7
On Wed, 2015-04-29 at 16:28 -0700, Alexei Starovoitov wrote:
> On 4/29/15 3:56 PM, Eric Dumazet wrote:
> >
> > So pktgen in RX mode MUST deliver skb with skb->users = 1; there is no
> > way around it.
> 
> if I only knew how to do it...
> The cost of continuously allocating skbs is way higher than
> netif_receive_skb itself. Such a benchmarking tool would measure the
> speed of skb alloc/free instead of the speed of netif_receive_skb.
> Are you suggesting pre-allocating tens of millions of skbs and
> then feeding them in one go? The profile will be dominated by
> cache misses in the first few lines of __netif_receive_skb_core()
> where it accesses skb->dev, data, and head. That doesn't sound too
> useful either.
> Other thoughts?

The code I copy-pasted from your patch is buggy, not the whole thing.

You have to replace it with something smarter.

This should not be hard really.

Zap the 'burst/clone' thing; it is not going to work for RX.
TX was okay, not RX.

You could for instance do:

atomic_inc(&skb->users);
netif_receive_skb(skb);
if (atomic_read(&skb->users) != 1) {
	/* This is too bad, I cannot recycle this skb because it is still used */
	consume_skb(skb);
	/* allocate a fresh new skb */
	skb = ...
} else {
	/* Yeah! Let's celebrate, cost of reusing this skb was one atomic op */
}





Alexei Starovoitov April 29, 2015, 11:59 p.m. UTC | #8
On 4/29/15 4:39 PM, Eric Dumazet wrote:
>
> Zap the 'burst/clone' thing; it is not going to work for RX.
> TX was okay, not RX.
>
> You could for instance do:
>
> atomic_inc(&skb->users);
> netif_receive_skb(skb);
> if (atomic_read(&skb->users) != 1) {
> 	/* This is too bad, I cannot recycle this skb because it is still used */
> 	consume_skb(skb);
> 	/* allocate a fresh new skb */
> 	skb = ...
> } else {
> 	/* Yeah! Let's celebrate, cost of reusing this skb was one atomic op */
> }

Ahh, great! I think I'm starting to understand.
Then the following should be OK as well?

atomic_add(burst, &skb->users);
do {
	netif_receive_skb(skb);
	if (atomic_read(&skb->users) != burst) {
		/* too bad, cannot recycle */
		atomic_sub(burst - 1, &skb->users);
		consume_skb(skb);
		/* allocate a fresh new skb */
		skb = ...
		/* and get out of this loop */
	} else {
		/* Yeah! the cost of reusing this skb was
		 * one atomic op amortized over 'burst' iterations
		 */
	}
} while (--burst > 0);
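
(A quick trace of the arithmetic, assuming skb->users == 1 on entry and
burst == 2: atomic_add() gives users == 3; if the stack consumes the skb
on each pass, users drops to 2 and then 1, matching burst at each check,
and the loop exits with users == 1 still held by pktgen. If the stack
queues the skb instead, users sits one above burst, the != test fires,
and the atomic_sub() plus consume_skb() leave exactly one reference,
owned by the queue.)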


Patch

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 508155b283dd..4f6c56bca550 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -203,6 +203,7 @@ 
 #define F_NODE          (1<<15)	/* Node memory alloc*/
 #define F_UDPCSUM       (1<<16)	/* Include UDP checksum */
 #define F_NO_TIMESTAMP  (1<<17)	/* Don't timestamp packets (default TS) */
+#define F_DO_RX		(1<<18) /* generate packets for RX */
 
 /* Thread control flag bits */
 #define T_STOP        (1<<0)	/* Stop run */
@@ -1080,7 +1081,7 @@  static ssize_t pktgen_if_write(struct file *file,
 		len = num_arg(&user_buffer[i], 10, &value);
 		if (len < 0)
 			return len;
-		if ((value > 0) &&
+		if ((value > 0) && !(pkt_dev->flags & F_DO_RX) &&
 		    (!(pkt_dev->odev->priv_flags & IFF_TX_SKB_SHARING)))
 			return -ENOTSUPP;
 		i += len;
@@ -1089,6 +1090,12 @@  static ssize_t pktgen_if_write(struct file *file,
 		sprintf(pg_result, "OK: clone_skb=%d", pkt_dev->clone_skb);
 		return count;
 	}
+	if (!strcmp(name, "rx")) {
+		pkt_dev->flags |= F_DO_RX;
+
+		sprintf(pg_result, "OK: RX is ON");
+		return count;
+	}
 	if (!strcmp(name, "count")) {
 		len = num_arg(&user_buffer[i], 10, &value);
 		if (len < 0)
@@ -1134,7 +1141,7 @@  static ssize_t pktgen_if_write(struct file *file,
 			return len;
 
 		i += len;
-		if ((value > 1) &&
+		if ((value > 1) && !(pkt_dev->flags & F_DO_RX) &&
 		    (!(pkt_dev->odev->priv_flags & IFF_TX_SKB_SHARING)))
 			return -ENOTSUPP;
 		pkt_dev->burst = value < 1 ? 1 : value;
@@ -3349,11 +3356,29 @@  static void pktgen_xmit(struct pktgen_dev *pkt_dev)
 		pkt_dev->last_pkt_size = pkt_dev->skb->len;
 		pkt_dev->allocated_skbs++;
 		pkt_dev->clone_count = 0;	/* reset counter */
+		if (pkt_dev->flags & F_DO_RX)
+			pkt_dev->skb->protocol = eth_type_trans(pkt_dev->skb,
+								pkt_dev->skb->dev);
 	}
 
 	if (pkt_dev->delay && pkt_dev->last_ok)
 		spin(pkt_dev, pkt_dev->next_tx);
 
+	if (pkt_dev->flags & F_DO_RX) {
+		local_bh_disable();
+		atomic_add(burst, &pkt_dev->skb->users);
+		do {
+			ret = netif_receive_skb(pkt_dev->skb);
+			if (ret == NET_RX_DROP)
+				pkt_dev->errors++;
+			pkt_dev->last_ok = 1;
+			pkt_dev->sofar++;
+			pkt_dev->seq_num++;
+		} while (--burst > 0);
+		local_bh_enable();
+		goto out;
+	}
+
 	txq = skb_get_tx_queue(odev, pkt_dev->skb);
 
 	local_bh_disable();
@@ -3403,6 +3428,7 @@  unlock:
 
 	local_bh_enable();
 
+out:
 	/* If pkt_dev->count is zero, then run forever */
 	if ((pkt_dev->count != 0) && (pkt_dev->sofar >= pkt_dev->count)) {
 		pktgen_wait_for_skb(pkt_dev);