Message ID   | 1400238496-2471-1-git-send-email-wei.liu2@citrix.com
State        | Superseded, archived
Delegated to | David Miller
On Fri, 2014-05-16 at 12:08 +0100, Wei Liu wrote:
> Some workloads, such as Redis, can generate SKBs which make use of
> compound pages. Netfront doesn't quite like that, because it doesn't
> want to send packets that occupy excessive slots to the backend, as the
> backend might deem them malicious. On the flip side these packets are
> actually legit; the size check at the beginning of xennet_start_xmit
> ensures that the packet size is below 64K.
>
> So we linearize the SKB if it occupies too many slots. If the
> linearization fails then the SKB is dropped.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> Cc: Konrad Wilk <konrad.wilk@oracle.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Stefan Bader <stefan.bader@canonical.com>
> Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
> ---
>  drivers/net/xen-netfront.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)

This is likely to fail on a typical host.

What about adding a smart helper trying to aggregate consecutive
smallest fragments into a single frag? This would be needed for bnx2x,
for example, as well.

--
To unsubscribe from this list: send the line "unsubscribe netdev"
in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, May 16, 2014 at 06:04:34AM -0700, Eric Dumazet wrote:
> On Fri, 2014-05-16 at 12:08 +0100, Wei Liu wrote:
> > Some workloads, such as Redis, can generate SKBs which make use of
> > compound pages. Netfront doesn't quite like that, because it doesn't
> > want to send packets that occupy excessive slots to the backend, as
> > the backend might deem them malicious. On the flip side these packets
> > are actually legit; the size check at the beginning of
> > xennet_start_xmit ensures that the packet size is below 64K.
> >
> > So we linearize the SKB if it occupies too many slots. If the
> > linearization fails then the SKB is dropped.
>
> This is likely to fail on a typical host.
>

It's not that common to trigger this, I only saw a few reports. In fact
Stefan's report is the first one that comes with a method to reproduce
it.

I tested with redis-benchmark on a guest with 256MB RAM and only saw a
few "failed to linearize", never saw a single one with a 1GB guest.

> What about adding a smart helper trying to aggregate consecutive
> smallest fragments into a single frag?
>

Ideally this is a better approach, but I'm afraid I won't be able to
look into this until early / mid June.

Wei.

> This would be needed for bnx2x for example as well.
On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:
> It's not that common to trigger this, I only saw a few reports. In fact
> Stefan's report is the first one that comes with a method to reproduce
> it.
>
> I tested with redis-benchmark on a guest with 256MB RAM and only saw a
> few "failed to linearize", never saw a single one with a 1GB guest.

Well, I am just saying. This is asking for order-5 allocations, and yes,
this is going to fail after a few days of uptime, no matter what you
try.
On Fri, May 16, 2014 at 07:21:08AM -0700, Eric Dumazet wrote:
> On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:
> > It's not that common to trigger this, I only saw a few reports. In
> > fact Stefan's report is the first one that comes with a method to
> > reproduce it.
> >
> > I tested with redis-benchmark on a guest with 256MB RAM and only saw
> > a few "failed to linearize", never saw a single one with a 1GB guest.
>
> Well, I am just saying. This is asking for order-5 allocations, and
> yes, this is going to fail after a few days of uptime, no matter what
> you try.
>

Hmm... I see what you mean -- memory fragmentation leads to allocation
failure. Thanks.
On Fri, 2014-05-16 at 15:36 +0100, Wei Liu wrote:
> On Fri, May 16, 2014 at 07:21:08AM -0700, Eric Dumazet wrote:
> > Well, I am just saying. This is asking for order-5 allocations, and
> > yes, this is going to fail after a few days of uptime, no matter what
> > you try.
>
> Hmm... I see what you mean -- memory fragmentation leads to allocation
> failure. Thanks.

In the meantime, have you tried to lower gso_max_size?

Setting it with netif_set_gso_max_size() to something like 56000 might
avoid the problem.

(Not sure if it is applicable in your case)
On Fri, May 16, 2014 at 08:22:19AM -0700, Eric Dumazet wrote:
> In the meantime, have you tried to lower gso_max_size?
>
> Setting it with netif_set_gso_max_size() to something like 56000 might
> avoid the problem.
>
> (Not sure if it is applicable in your case)
>

It works, at least in this Redis testcase.

Could you explain a bit where this 56000 magic number comes from? :-)
Presumably I can derive it from some constant in core network code?

Wei.
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 895355d..b378dcd 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -573,9 +573,20 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
 		xennet_count_skb_frag_slots(skb);
 	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
-		net_alert_ratelimited(
-			"xennet: skb rides the rocket: %d slots\n", slots);
-		goto drop;
+		if (skb_linearize(skb)) {
+			net_alert_ratelimited(
+				"xennet: failed to linearize skb, skb dropped\n");
+			goto drop;
+		}
+		data = skb->data;
+		offset = offset_in_page(data);
+		len = skb_headlen(skb);
+		slots = DIV_ROUND_UP(offset + len, PAGE_SIZE);
+		if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
+			net_alert_ratelimited(
+				"xennet: still too many slots after linearization: %d\n", slots);
+			goto drop;
+		}
 	}

 	spin_lock_irqsave(&np->tx_lock, flags);
Some workloads, such as Redis, can generate SKBs which make use of
compound pages. Netfront doesn't quite like that, because it doesn't
want to send packets that occupy excessive slots to the backend, as the
backend might deem them malicious. On the flip side these packets are
actually legit; the size check at the beginning of xennet_start_xmit
ensures that the packet size is below 64K.

So we linearize the SKB if it occupies too many slots. If the
linearization fails then the SKB is dropped.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netfront.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)