Message ID | 54CA8D33.6020006@oracle.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On Thu, 2015-01-29 at 14:42 -0500, David L Stevens wrote: > This patch fixes a bug where vnet_skb_shape() didn't set the already-selected > queue mapping when a packet copy was required. This results in using the > wrong queue index for stops/starts, hung tx queues and watchdog timeouts > under heavy load. It also transfers the destructor, truesize, and the owning > socket for flow control (code adapted from skb_segment). > > Signed-off-by: David L Stevens <david.stevens@oracle.com> > Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> > > -- > Changes since v1: > move truesize, destructor and socket per Eric Dumazet <eric.dumazet@gmail.com> > > --- > drivers/net/ethernet/sun/sunvnet.c | 10 ++++++++++ > 1 files changed, 10 insertions(+), 0 deletions(-) > > diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c > index 2b719cc..0a1d3eb 100644 > --- a/drivers/net/ethernet/sun/sunvnet.c > +++ b/drivers/net/ethernet/sun/sunvnet.c > @@ -1123,6 +1123,16 @@ static inline struct sk_buff *vnet_skb_shape(struct sk_buff *skb, int ncookies) > skb_shinfo(nskb)->gso_size = skb_shinfo(skb)->gso_size; > skb_shinfo(nskb)->gso_type = skb_shinfo(skb)->gso_type; > } > + nskb->queue_mapping = skb->queue_mapping; > + /* Following permits correct back-pressure, for protocols > + * using skb_set_owner_w(). > + * Idea is to transfer ownership from skb to nskb. > + */ > + if (skb->destructor == sock_wfree) { Sorry, but you should remove this test. (TCP uses another destructor, not sock_wfree()) All sent packets will support these swap() operations, regardless of destructor. > + swap(nskb->truesize, skb->truesize); > + swap(nskb->destructor, skb->destructor); > + swap(nskb->sk, skb->sk); > + } > dev_kfree_skb(skb); > skb = nskb; > } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/29/2015 03:39 PM, Eric Dumazet wrote: > Sorry, but you should remove this test. > > (TCP uses another destructor, not sock_wfree()) > > All sent packets will support these swap() operations, > regardless of destructor. In general the destructor should match the allocator, right? So it bothers me here that we'd be replacing it in an skb allocated with alloc_and_align_skb(), with an an arbitrary destructor from an skb NOT allocated with alloc_and_align_skb(). If it has a different destructor that does special handling related to the allocator, it is the original skb, not the new one, that needs the old destructor. This TCP accounting has less to do with the buffer destructor than with the freeing of the contents of the buffer, but that isn't necessarily true for all destructors. Checking for a known, specific destructor is less troubling, so I don't want to remove the test entirely. Since the concern here is specifically TCP flow control, do you think it's sufficient to substitute tcp_wfree for the sock_wfree here? +-DLS -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 29 Jan 2015 12:39:56 -0800 > On Thu, 2015-01-29 at 14:42 -0500, David L Stevens wrote: >> @@ -1123,6 +1123,16 @@ static inline struct sk_buff *vnet_skb_shape(struct sk_buff *skb, int ncookies) >> skb_shinfo(nskb)->gso_size = skb_shinfo(skb)->gso_size; >> skb_shinfo(nskb)->gso_type = skb_shinfo(skb)->gso_type; >> } >> + nskb->queue_mapping = skb->queue_mapping; >> + /* Following permits correct back-pressure, for protocols >> + * using skb_set_owner_w(). >> + * Idea is to transfer ownership from skb to nskb. >> + */ >> + if (skb->destructor == sock_wfree) { > > Sorry, but you should remove this test. > > (TCP uses another destructor, not sock_wfree()) > > All sent packets will support these swap() operations, > regardless of destructor. Then we need to fix skb_segment() too. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2015-01-29 at 16:36 -0500, David L Stevens wrote: > In general the destructor should match the allocator, right? So it bothers me > here that we'd be replacing it in an skb allocated with alloc_and_align_skb(), > with an an arbitrary destructor from an skb NOT allocated with alloc_and_align_skb(). > If it has a different destructor that does special handling related to the allocator, > it is the original skb, not the new one, that needs the old destructor. This TCP > accounting has less to do with the buffer destructor than with the freeing of the > contents of the buffer, but that isn't necessarily true for all destructors. > > Checking for a known, specific destructor is less troubling, so I don't want > to remove the test entirely. > > Since the concern here is specifically TCP flow control, do you think it's sufficient > to substitute tcp_wfree for the sock_wfree here? The concern is also for UDP. Right now a single UDP flow can flood your network, even if application or admin cared to set a low SO_SNDBUF. I guess you can extend the test to sock_wfree and tcp_wfree, but : You have to EXPORT_SYMBOL(tcp_wfree) Add an #ifdef CONFIG_INET (take a look at skb_orphan_partial()) Thanks -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2015-01-29 at 14:19 -0800, Eric Dumazet wrote: > I guess you can extend the test to sock_wfree and tcp_wfree, but : > > You have to EXPORT_SYMBOL(tcp_wfree) > > Add an #ifdef CONFIG_INET (take a look at skb_orphan_partial()) Or even better, add a new helper in the same spirit than skb_orphan_partial() : You would EXPORT_SYMBOL() it and leave tcp_wfree() as non exported. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2015-01-29 at 14:15 -0800, David Miller wrote:
> Then we need to fix skb_segment() too.
My concern was that skb_segment() could be used in rx paths, with quite
different destructor semantics.
While from ndo_start_xmit(), I believe sane destructors should not
depend on skb being invariant, sk 'refcount' must be using
skb->truesize.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c index 2b719cc..0a1d3eb 100644 --- a/drivers/net/ethernet/sun/sunvnet.c +++ b/drivers/net/ethernet/sun/sunvnet.c @@ -1123,6 +1123,16 @@ static inline struct sk_buff *vnet_skb_shape(struct sk_buff *skb, int ncookies) skb_shinfo(nskb)->gso_size = skb_shinfo(skb)->gso_size; skb_shinfo(nskb)->gso_type = skb_shinfo(skb)->gso_type; } + nskb->queue_mapping = skb->queue_mapping; + /* Following permits correct back-pressure, for protocols + * using skb_set_owner_w(). + * Idea is to transfer ownership from skb to nskb. + */ + if (skb->destructor == sock_wfree) { + swap(nskb->truesize, skb->truesize); + swap(nskb->destructor, skb->destructor); + swap(nskb->sk, skb->sk); + } dev_kfree_skb(skb); skb = nskb; }