Message ID | 145225444176.22215.2003378381077166898.stgit@zurg |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Fri, Jan 8, 2016 at 3:00 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote: > Skb_gso_segment() uses skb control block during segmentation. > This patch adds 32-bytes room for previous control block which > will be copied into all resulting segments. > > This patch fixes kernel crash during fragmenting forwarded packets. > Fragmentation requires valid IP CB in skb for clearing ip options. > Also patch removes custom save/restore in ovs code, now it's redundant. > > Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> > Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com This patch is alternative for [PATCH] net: prevent corruption of skb when using skb_gso_segment by Thadeu Lima de Souza Cascardo. This version have no stack allocations and no changes in code. It just shifts gso cb.
From: Of Konstantin Khlebnikov > Sent: 08 January 2016 12:01 > Skb_gso_segment() uses skb control block during segmentation. > This patch adds 32-bytes room for previous control block which > will be copied into all resulting segments. > > This patch fixes kernel crash during fragmenting forwarded packets. > Fragmentation requires valid IP CB in skb for clearing ip options. > Also patch removes custom save/restore in ovs code, now it's redundant. > ... > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index 4355129fff91..9147f9f34cbe 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -3446,7 +3446,8 @@ struct skb_gso_cb { > int encap_level; > __u16 csum_start; > }; > -#define SKB_GSO_CB(skb) ((struct skb_gso_cb *)(skb)->cb) > +#define SKB_SGO_CB_OFFSET 32 > +#define SKB_GSO_CB(skb) ((struct skb_gso_cb *)((skb)->cb + SKB_SGO_CB_OFFSET)) You could set SKB_SGO_CB_OFFSET to sizeof ((skb)->cb) - sizeof (struct skb_gso_cb) so that the end of 'cb' is always used. (Assuming the former is a multiple of 4.) It might be worth using an on-stack structure passed through as a separate parameter - it doesn't look as though it has to be queued with the skb. (Clearly a bigger change.) David
On Fri, Jan 08, 2016 at 12:13:49PM +0000, David Laight wrote: > From: Of Konstantin Khlebnikov > > Sent: 08 January 2016 12:01 > > Skb_gso_segment() uses skb control block during segmentation. > > This patch adds 32-bytes room for previous control block which > > will be copied into all resulting segments. > > > > This patch fixes kernel crash during fragmenting forwarded packets. > > Fragmentation requires valid IP CB in skb for clearing ip options. > > Also patch removes custom save/restore in ovs code, now it's redundant. > > > ... > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > > index 4355129fff91..9147f9f34cbe 100644 > > --- a/include/linux/skbuff.h > > +++ b/include/linux/skbuff.h > > @@ -3446,7 +3446,8 @@ struct skb_gso_cb { > > int encap_level; > > __u16 csum_start; > > }; > > -#define SKB_GSO_CB(skb) ((struct skb_gso_cb *)(skb)->cb) > > +#define SKB_SGO_CB_OFFSET 32 > > +#define SKB_GSO_CB(skb) ((struct skb_gso_cb *)((skb)->cb + SKB_SGO_CB_OFFSET)) > > You could set SKB_SGO_CB_OFFSET to sizeof ((skb)->cb) - sizeof (struct skb_gso_cb) > so that the end of 'cb' is always used. > (Assuming the former is a multiple of 4.) > > It might be worth using an on-stack structure passed through as a separate > parameter - it doesn't look as though it has to be queued with the skb. > (Clearly a bigger change.) > I considered that as an option. But the bigger change and the use of the extra stack for all users, plus the extra parameters indicated I should go the other way. In my opinion, at least in the IP fragmentation case, saving/restoring cb is not such a big problem since we are in slow path already. Cascardo. > David >
Hi Konstantin,
[auto build test ERROR on net/master]
[also build test ERROR on v4.4-rc8 next-20160108]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]
url: https://github.com/0day-ci/linux/commits/Konstantin-Khlebnikov/net-preserve-IP-control-block-during-GSO-segmentation/20160108-200335
config: xtensa-allyesconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=xtensa
All errors (new ones prefixed by >>):
net/openvswitch/datapath.c: In function 'queue_gso_packets':
>> net/openvswitch/datapath.c:360:18: error: 'ovs_cb' undeclared (first use in this function)
*OVS_CB(skb) = ovs_cb;
^
net/openvswitch/datapath.c:360:18: note: each undeclared identifier is reported only once for each function it appears in
vim +/ovs_cb +360 net/openvswitch/datapath.c
e8eedb85 Pravin B Shelar 2014-11-06 354 later_key.ip.frag = OVS_FRAG_TYPE_LATER;
e8eedb85 Pravin B Shelar 2014-11-06 355 }
e8eedb85 Pravin B Shelar 2014-11-06 356
ccb1352e Jesse Gross 2011-10-25 357 /* Queue all of the segments. */
ccb1352e Jesse Gross 2011-10-25 358 skb = segs;
ccb1352e Jesse Gross 2011-10-25 359 do {
e8eedb85 Pravin B Shelar 2014-11-06 @360 *OVS_CB(skb) = ovs_cb;
e8eedb85 Pravin B Shelar 2014-11-06 361 if (gso_type & SKB_GSO_UDP && skb != segs)
e8eedb85 Pravin B Shelar 2014-11-06 362 key = &later_key;
e8eedb85 Pravin B Shelar 2014-11-06 363
:::::: The code at line 360 was first introduced by commit
:::::: e8eedb85bd238613332570ac6ae683fee94fbe36 openvswitch: Remove redundant key ref from upcall_info.
:::::: TO: Pravin B Shelar <pshelar@nicira.com>
:::::: CC: Pravin B Shelar <pshelar@nicira.com>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
On 01/08/2016 08:13 PM, David Laight wrote: > You could set SKB_SGO_CB_OFFSET to sizeof ((skb)->cb) - sizeof (struct skb_gso_cb) > so that the end of 'cb' is always used. > (Assuming the former is a multiple of 4.) > > It might be worth using an on-stack structure passed through as a separate > parameter - it doesn't look as though it has to be queued with the skb. > (Clearly a bigger change.) I would definitely prefer the stack structure. As a kernel developer, sometime I can hardly figure out which struct current cb is without debug it, and the worst they are not documented anywhere. I can hardly know the life time of the cb types. If using a stack, things can be much easier. only an extra var to store the stack top: int cb_top; and several macro to manage the stack: SKB_CB_PUSH(skb, type) SKB_CB_POP(skb) SKB_CB_TOP(skb, type) and maybe a debug variable to store current cb type. All current cb macro can be replaced by SKB_CB_TOP. Although it is a big change, I think it worths, for both performance and maintainability
On Sun, Jan 10, 2016 at 11:45 PM, Zang MingJie <zealot0630@gmail.com> wrote: > On 01/08/2016 08:13 PM, David Laight wrote: >> >> You could set SKB_SGO_CB_OFFSET to sizeof ((skb)->cb) - sizeof (struct >> skb_gso_cb) >> so that the end of 'cb' is always used. >> (Assuming the former is a multiple of 4.) >> >> It might be worth using an on-stack structure passed through as a separate >> parameter - it doesn't look as though it has to be queued with the skb. >> (Clearly a bigger change.) > > > I would definitely prefer the stack structure. > > As a kernel developer, sometime I can hardly figure out which struct current > cb is without debug it, and the worst they are not documented anywhere. I > can hardly know the life time of the cb types. NAK. Skb control block was not designed to be used like a stack but like a union for different layers, just that people begin to mess it up.
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 4355129fff91..9147f9f34cbe 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3446,7 +3446,8 @@ struct skb_gso_cb { int encap_level; __u16 csum_start; }; -#define SKB_GSO_CB(skb) ((struct skb_gso_cb *)(skb)->cb) +#define SKB_SGO_CB_OFFSET 32 +#define SKB_GSO_CB(skb) ((struct skb_gso_cb *)((skb)->cb + SKB_SGO_CB_OFFSET)) static inline int skb_tnl_header_len(const struct sk_buff *inner_skb) { diff --git a/net/core/dev.c b/net/core/dev.c index ae00b894e675..7f00f2439770 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2542,6 +2542,8 @@ static inline bool skb_needs_check(struct sk_buff *skb, bool tx_path) * * It may return NULL if the skb requires no segmentation. This is * only possible when GSO is used for verifying header integrity. + * + * Segmentation preserves SKB_SGO_CB_OFFSET bytes of previous skb cb. */ struct sk_buff *__skb_gso_segment(struct sk_buff *skb, netdev_features_t features, bool tx_path) @@ -2556,6 +2558,9 @@ struct sk_buff *__skb_gso_segment(struct sk_buff *skb, return ERR_PTR(err); } + BUILD_BUG_ON(SKB_SGO_CB_OFFSET + + sizeof(*SKB_GSO_CB(skb)) > sizeof(skb->cb)); + SKB_GSO_CB(skb)->mac_offset = skb_headroom(skb); SKB_GSO_CB(skb)->encap_level = 0; diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 4233cbe47052..59ed4b89b67a 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -240,6 +240,7 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk, * from host network stack. */ features = netif_skb_features(skb); + BUILD_BUG_ON(sizeof(*IPCB(skb)) > SKB_SGO_CB_OFFSET); segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK); if (IS_ERR_OR_NULL(segs)) { kfree_skb(skb); diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c index 91a8b004dc51..b1b380ee667d 100644 --- a/net/openvswitch/datapath.c +++ b/net/openvswitch/datapath.c @@ -336,12 +336,10 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb, unsigned short gso_type = skb_shinfo(skb)->gso_type; struct sw_flow_key later_key; struct sk_buff *segs, *nskb; - struct ovs_skb_cb ovs_cb; int err; - ovs_cb = *OVS_CB(skb); + BUILD_BUG_ON(sizeof(*OVS_CB(skb)) > SKB_SGO_CB_OFFSET); segs = __skb_gso_segment(skb, NETIF_F_SG, false); - *OVS_CB(skb) = ovs_cb; if (IS_ERR(segs)) return PTR_ERR(segs); if (segs == NULL) diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c index cc3676eb6239..ff4a91fcab9f 100644 --- a/net/xfrm/xfrm_output.c +++ b/net/xfrm/xfrm_output.c @@ -167,6 +167,8 @@ static int xfrm_output_gso(struct net *net, struct sock *sk, struct sk_buff *skb { struct sk_buff *segs; + BUILD_BUG_ON(sizeof(*IPCB(skb)) > SKB_SGO_CB_OFFSET); + BUILD_BUG_ON(sizeof(*IP6CB(skb)) > SKB_SGO_CB_OFFSET); segs = skb_gso_segment(skb, 0); kfree_skb(skb); if (IS_ERR(segs))
Skb_gso_segment() uses skb control block during segmentation. This patch adds 32-bytes room for previous control block which will be copied into all resulting segments. This patch fixes kernel crash during fragmenting forwarded packets. Fragmentation requires valid IP CB in skb for clearing ip options. Also patch removes custom save/restore in ovs code, now it's redundant. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com --- include/linux/skbuff.h | 3 ++- net/core/dev.c | 5 +++++ net/ipv4/ip_output.c | 1 + net/openvswitch/datapath.c | 4 +--- net/xfrm/xfrm_output.c | 2 ++ 5 files changed, 11 insertions(+), 4 deletions(-)