diff mbox

[Xen-devel] xen-netfront possibly rides the rocket too often

Message ID 20140516094842.GA18551@zion.uk.xensource.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Wei Liu May 16, 2014, 9:48 a.m. UTC
On Thu, May 15, 2014 at 02:14:00PM +0200, Stefan Bader wrote:
[...]
> > Wei.
> > 
> Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
> least now with compound pages) cannot be used in any way to derive the number of
> 4k slots a transfer will require.
> 
> Zoltan already commented on worst cases. Not sure it would get as bad as that or
> "just" 16*4k frags all in the middle of compound pages. That would then end in
> around 33 or 34 slots, depending on the header.
> 
> Zoltan wrote:
> > I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
> > which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
> them have a 4k
> > page in the middle of them, so, a 1+4096+1 byte buffer can span over 3 page.
> > That's 51 individual pages.
> 
> I cannot claim to really know what to expect worst case. Somewhat I was thinking
> of a
> worst case of (16+1)*2, which would be inconvenient enough.
> 
> So without knowing exactly how to do it, but as Ian said it sounds best to come
> up with some sort of exception coalescing in cases the slot count goes over 18
> and we know the data size is below 64K.
> 

I took a stab at it this morning and came up with this patch. Ran
redis-benchmark, it seemed to fix that for me -- only saw one "failed to
linearize skb" during

  redis-benchmark -h XXX -d 1000 -t lrange

And before this change, a lot of "rides rocket" were triggered.

Thought?

---8<---
From 743495a2b2d338fc6cfe9bfd4b6e840392b87f4a Mon Sep 17 00:00:00 2001
From: Wei Liu <wei.liu2@citrix.com>
Date: Fri, 16 May 2014 10:39:01 +0100
Subject: [PATCH] xen-netfront: linearize SKB if it occupies too many slots

Some workload, such as Redis can generate SKBs which make use of compound
pages. Netfront doesn't quite like that because it doesn't want to send
exessive slots to the backend as backend might deem it malicious. On the
flip side these packets are actually legit, the size check at the
beginning of xennet_start_xmit ensures that packet size is below 64K.

So we linearize SKB if it occupies too many slots. If the linearization
fails then the SKB is dropped.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netfront.c |   18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

Comments

Wei Liu May 16, 2014, 9:57 a.m. UTC | #1
On Fri, May 16, 2014 at 10:48:42AM +0100, Wei Liu wrote:
> On Thu, May 15, 2014 at 02:14:00PM +0200, Stefan Bader wrote:
> [...]
> > > Wei.
> > > 
> > Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
> > least now with compound pages) cannot be used in any way to derive the number of
> > 4k slots a transfer will require.
> > 
> > Zoltan already commented on worst cases. Not sure it would get as bad as that or
> > "just" 16*4k frags all in the middle of compound pages. That would then end in
> > around 33 or 34 slots, depending on the header.
> > 
> > Zoltan wrote:
> > > I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
> > > which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
> > them have a 4k
> > > page in the middle of them, so, a 1+4096+1 byte buffer can span over 3 page.
> > > That's 51 individual pages.
> > 
> > I cannot claim to really know what to expect worst case. Somewhat I was thinking
> > of a
> > worst case of (16+1)*2, which would be inconvenient enough.
> > 
> > So without knowing exactly how to do it, but as Ian said it sounds best to come
> > up with some sort of exception coalescing in cases the slot count goes over 18
> > and we know the data size is below 64K.
> > 
> 
> I took a stab at it this morning and came up with this patch. Ran
> redis-benchmark, it seemed to fix that for me -- only saw one "failed to
> linearize skb" during
> 
>   redis-benchmark -h XXX -d 1000 -t lrange
> 
> And before this change, a lot of "rides rocket" were triggered.
> 
> Thought?
> 

And don't take this as a formal patch because I haven't tested it
thoroughly. It's just an idea to coalesce SKBs. We can certainly code
up a loop to do so by ourself.

Wei.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight May 16, 2014, 10:05 a.m. UTC | #2
From: Wei Liu
> On Thu, May 15, 2014 at 02:14:00PM +0200, Stefan Bader wrote:
> [...]
> > > Wei.
> > >
> > Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
> > least now with compound pages) cannot be used in any way to derive the number of
> > 4k slots a transfer will require.
> >
> > Zoltan already commented on worst cases. Not sure it would get as bad as that or
> > "just" 16*4k frags all in the middle of compound pages. That would then end in
> > around 33 or 34 slots, depending on the header.
> >
> > Zoltan wrote:
> > > I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
> > > which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
> > them have a 4k
> > > page in the middle of them, so, a 1+4096+1 byte buffer can span over 3 page.
> > > That's 51 individual pages.
> >
> > I cannot claim to really know what to expect worst case. Somewhat I was thinking
> > of a worst case of (16+1)*2, which would be inconvenient enough.
> >
> > So without knowing exactly how to do it, but as Ian said it sounds best to come
> > up with some sort of exception coalescing in cases the slot count goes over 18
> > and we know the data size is below 64K.
> >
> 
> I took a stab at it this morning and came up with this patch. Ran
> redis-benchmark, it seemed to fix that for me -- only saw one "failed to
> linearize skb" during
> 
>   redis-benchmark -h XXX -d 1000 -t lrange
> 
> And before this change, a lot of "rides rocket" were triggered.
> 
> Thought?
> 
> ---8<---
> From 743495a2b2d338fc6cfe9bfd4b6e840392b87f4a Mon Sep 17 00:00:00 2001
> From: Wei Liu <wei.liu2@citrix.com>
> Date: Fri, 16 May 2014 10:39:01 +0100
> Subject: [PATCH] xen-netfront: linearize SKB if it occupies too many slots
> 
> Some workload, such as Redis can generate SKBs which make use of compound
> pages. Netfront doesn't quite like that because it doesn't want to send
> exessive slots to the backend as backend might deem it malicious. On the
> flip side these packets are actually legit, the size check at the
> beginning of xennet_start_xmit ensures that packet size is below 64K.
> 
> So we linearize SKB if it occupies too many slots. If the linearization
> fails then the SKB is dropped.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netfront.c |   18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 895355d..0361fc5 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -573,9 +573,21 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>  		xennet_count_skb_frag_slots(skb);
>  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> -		net_alert_ratelimited(
> -			"xennet: skb rides the rocket: %d slots\n", slots);
> -		goto drop;
> +		if (skb_linearize(skb)) {

You don't need to actually linearize the skb here.
One with multiple fragments is fine.
I'm not sure there is a standard function to 'copy and refragment'
the skb data though.

> +			net_alert_ratelimited(
> +				"xennet: failed to linearize skb, skb dropped\n");
> +			goto drop;
> +		}
> +		data = skb->data;
> +		offset = offset_in_page(data);
> +		len = skb_headlen(skb);
> +		slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> +			xennet_count_skb_frag_slots(skb);

IIRC If you have called skb_linearize then there shouldn't be any fragments.

> +		if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> +			net_alert_ratelimited(
> +				"xennet: still too many slots after linerization: %d", slots);
> +			goto drop;
> +		}
>  	}
> 
>  	spin_lock_irqsave(&np->tx_lock, flags);
> --
> 1.7.10.4

	David



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Bader May 16, 2014, 10:09 a.m. UTC | #3
On 16.05.2014 11:48, Wei Liu wrote:
> On Thu, May 15, 2014 at 02:14:00PM +0200, Stefan Bader wrote:
> [...]
>>> Wei.
>>>
>> Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
>> least now with compound pages) cannot be used in any way to derive the number of
>> 4k slots a transfer will require.
>>
>> Zoltan already commented on worst cases. Not sure it would get as bad as that or
>> "just" 16*4k frags all in the middle of compound pages. That would then end in
>> around 33 or 34 slots, depending on the header.
>>
>> Zoltan wrote:
>>> I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
>>> which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
>> them have a 4k
>>> page in the middle of them, so, a 1+4096+1 byte buffer can span over 3 page.
>>> That's 51 individual pages.
>>
>> I cannot claim to really know what to expect worst case. Somewhat I was thinking
>> of a
>> worst case of (16+1)*2, which would be inconvenient enough.
>>
>> So without knowing exactly how to do it, but as Ian said it sounds best to come
>> up with some sort of exception coalescing in cases the slot count goes over 18
>> and we know the data size is below 64K.
>>
> 
> I took a stab at it this morning and came up with this patch. Ran
> redis-benchmark, it seemed to fix that for me -- only saw one "failed to
> linearize skb" during
> 
>   redis-benchmark -h XXX -d 1000 -t lrange
> 
> And before this change, a lot of "rides rocket" were triggered.
> 
> Thought?

It appears at least to me as something that nicely makes use of existing code. I
was wondering about what could or could not be used. Trying to get ones head
around the whole thing is kind of a lot to look at.

The change at least looks straight forward enough.

-Stefan
> 
> ---8<---
> From 743495a2b2d338fc6cfe9bfd4b6e840392b87f4a Mon Sep 17 00:00:00 2001
> From: Wei Liu <wei.liu2@citrix.com>
> Date: Fri, 16 May 2014 10:39:01 +0100
> Subject: [PATCH] xen-netfront: linearize SKB if it occupies too many slots
> 
> Some workload, such as Redis can generate SKBs which make use of compound
> pages. Netfront doesn't quite like that because it doesn't want to send
> exessive slots to the backend as backend might deem it malicious. On the
> flip side these packets are actually legit, the size check at the
> beginning of xennet_start_xmit ensures that packet size is below 64K.
> 
> So we linearize SKB if it occupies too many slots. If the linearization
> fails then the SKB is dropped.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netfront.c |   18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 895355d..0361fc5 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -573,9 +573,21 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>  		xennet_count_skb_frag_slots(skb);
>  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> -		net_alert_ratelimited(
> -			"xennet: skb rides the rocket: %d slots\n", slots);
> -		goto drop;
> +		if (skb_linearize(skb)) {
> +			net_alert_ratelimited(
> +				"xennet: failed to linearize skb, skb dropped\n");
> +			goto drop;
> +		}
> +		data = skb->data;
> +		offset = offset_in_page(data);
> +		len = skb_headlen(skb);
> +		slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> +			xennet_count_skb_frag_slots(skb);
> +		if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> +			net_alert_ratelimited(
> +				"xennet: still too many slots after linerization: %d", slots);
> +			goto drop;
> +		}
>  	}
>  
>  	spin_lock_irqsave(&np->tx_lock, flags);
>
Stefan Bader May 16, 2014, 10:17 a.m. UTC | #4
On 16.05.2014 12:09, Stefan Bader wrote:
> On 16.05.2014 11:48, Wei Liu wrote:
>> On Thu, May 15, 2014 at 02:14:00PM +0200, Stefan Bader wrote:
>> [...]
>>>> Wei.
>>>>
>>> Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
>>> least now with compound pages) cannot be used in any way to derive the number of
>>> 4k slots a transfer will require.
>>>
>>> Zoltan already commented on worst cases. Not sure it would get as bad as that or
>>> "just" 16*4k frags all in the middle of compound pages. That would then end in
>>> around 33 or 34 slots, depending on the header.
>>>
>>> Zoltan wrote:
>>>> I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
>>>> which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
>>> them have a 4k
>>>> page in the middle of them, so, a 1+4096+1 byte buffer can span over 3 page.
>>>> That's 51 individual pages.
>>>
>>> I cannot claim to really know what to expect worst case. Somewhat I was thinking
>>> of a
>>> worst case of (16+1)*2, which would be inconvenient enough.
>>>
>>> So without knowing exactly how to do it, but as Ian said it sounds best to come
>>> up with some sort of exception coalescing in cases the slot count goes over 18
>>> and we know the data size is below 64K.
>>>
>>
>> I took a stab at it this morning and came up with this patch. Ran
>> redis-benchmark, it seemed to fix that for me -- only saw one "failed to
>> linearize skb" during
>>
>>   redis-benchmark -h XXX -d 1000 -t lrange
>>
>> And before this change, a lot of "rides rocket" were triggered.
>>
>> Thought?
> 
> It appears at least to me as something that nicely makes use of existing code. I
> was wondering about what could or could not be used. Trying to get ones head
> around the whole thing is kind of a lot to look at.
> 
> The change at least looks straight forward enough.

The only woe for me is that I am looking puzzled at the implementation of
skb_linearize(). Somehow the data_len element decides whether a skb can be
linearized and basically how much it tries to pull from the tail. It probably
makes sense ... just not to me with not deep experience here.

-Stefan
Wei Liu May 16, 2014, 10:22 a.m. UTC | #5
On Fri, May 16, 2014 at 10:05:46AM +0000, David Laight wrote:
[...]
> > ---8<---
> > From 743495a2b2d338fc6cfe9bfd4b6e840392b87f4a Mon Sep 17 00:00:00 2001
> > From: Wei Liu <wei.liu2@citrix.com>
> > Date: Fri, 16 May 2014 10:39:01 +0100
> > Subject: [PATCH] xen-netfront: linearize SKB if it occupies too many slots
> > 
> > Some workload, such as Redis can generate SKBs which make use of compound
> > pages. Netfront doesn't quite like that because it doesn't want to send
> > exessive slots to the backend as backend might deem it malicious. On the
> > flip side these packets are actually legit, the size check at the
> > beginning of xennet_start_xmit ensures that packet size is below 64K.
> > 
> > So we linearize SKB if it occupies too many slots. If the linearization
> > fails then the SKB is dropped.
> > 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > ---
> >  drivers/net/xen-netfront.c |   18 +++++++++++++++---
> >  1 file changed, 15 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > index 895355d..0361fc5 100644
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -573,9 +573,21 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> >  		xennet_count_skb_frag_slots(skb);
> >  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> > -		net_alert_ratelimited(
> > -			"xennet: skb rides the rocket: %d slots\n", slots);
> > -		goto drop;
> > +		if (skb_linearize(skb)) {
> 
> You don't need to actually linearize the skb here.
> One with multiple fragments is fine.

Yes I know; but my idea is to get a SKB that doesn't use up too many
slots -- a linearized SKB will not use too many slots.

> I'm not sure there is a standard function to 'copy and refragment'
> the skb data though.
> 

That can only help if we can control how the fragment page is allocated.

> > +			net_alert_ratelimited(
> > +				"xennet: failed to linearize skb, skb dropped\n");
> > +			goto drop;
> > +		}
> > +		data = skb->data;
> > +		offset = offset_in_page(data);
> > +		len = skb_headlen(skb);
> > +		slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> > +			xennet_count_skb_frag_slots(skb);
> 
> IIRC If you have called skb_linearize then there shouldn't be any fragments.
> 

Plain copy-n-paste from code above. I will pay more attention if it's a
formal patch. ;-)

Wei.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wei Liu May 16, 2014, 10:32 a.m. UTC | #6
On Fri, May 16, 2014 at 12:17:48PM +0200, Stefan Bader wrote:
> On 16.05.2014 12:09, Stefan Bader wrote:
> > On 16.05.2014 11:48, Wei Liu wrote:
> >> On Thu, May 15, 2014 at 02:14:00PM +0200, Stefan Bader wrote:
> >> [...]
> >>>> Wei.
> >>>>
> >>> Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
> >>> least now with compound pages) cannot be used in any way to derive the number of
> >>> 4k slots a transfer will require.
> >>>
> >>> Zoltan already commented on worst cases. Not sure it would get as bad as that or
> >>> "just" 16*4k frags all in the middle of compound pages. That would then end in
> >>> around 33 or 34 slots, depending on the header.
> >>>
> >>> Zoltan wrote:
> >>>> I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
> >>>> which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
> >>> them have a 4k
> >>>> page in the middle of them, so, a 1+4096+1 byte buffer can span over 3 page.
> >>>> That's 51 individual pages.
> >>>
> >>> I cannot claim to really know what to expect worst case. Somewhat I was thinking
> >>> of a
> >>> worst case of (16+1)*2, which would be inconvenient enough.
> >>>
> >>> So without knowing exactly how to do it, but as Ian said it sounds best to come
> >>> up with some sort of exception coalescing in cases the slot count goes over 18
> >>> and we know the data size is below 64K.
> >>>
> >>
> >> I took a stab at it this morning and came up with this patch. Ran
> >> redis-benchmark, it seemed to fix that for me -- only saw one "failed to
> >> linearize skb" during
> >>
> >>   redis-benchmark -h XXX -d 1000 -t lrange
> >>
> >> And before this change, a lot of "rides rocket" were triggered.
> >>
> >> Thought?
> > 
> > It appears at least to me as something that nicely makes use of existing code. I
> > was wondering about what could or could not be used. Trying to get ones head
> > around the whole thing is kind of a lot to look at.
> > 
> > The change at least looks straight forward enough.
> 
> The only woe for me is that I am looking puzzled at the implementation of
> skb_linearize(). Somehow the data_len element decides whether a skb can be
> linearized and basically how much it tries to pull from the tail. It probably
> makes sense ... just not to me with not deep experience here.
> 

Data_len is the size of paged data (that's the size of data in frags).
If it pulls everything from frags to linear area then SKB is linearized.

Wei.

> -Stefan
> 


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 895355d..0361fc5 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -573,9 +573,21 @@  static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
 		xennet_count_skb_frag_slots(skb);
 	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
-		net_alert_ratelimited(
-			"xennet: skb rides the rocket: %d slots\n", slots);
-		goto drop;
+		if (skb_linearize(skb)) {
+			net_alert_ratelimited(
+				"xennet: failed to linearize skb, skb dropped\n");
+			goto drop;
+		}
+		data = skb->data;
+		offset = offset_in_page(data);
+		len = skb_headlen(skb);
+		slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
+			xennet_count_skb_frag_slots(skb);
+		if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
+			net_alert_ratelimited(
+				"xennet: still too many slots after linerization: %d", slots);
+			goto drop;
+		}
 	}
 
 	spin_lock_irqsave(&np->tx_lock, flags);