Message ID | 20190222160330.34237-1-salyzyn@android.com |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
Headers | show |
Series | [stable,3.18,backport,v2] netlink: Trim skb to alloc size to avoid MSG_TRUNC | expand |
On Fri, Feb 22, 2019 at 08:03:28AM -0800, Mark Salyzyn wrote: > From: "Arad, Ronen" <ronen.arad@intel.com> > > Direct this upstream db65a3aaf29ecce2e34271d52e8d2336b97bd9fe sha to > stable 3.18. This patch addresses a race condition where a call to > > nlk->max_recvmsg_len = max(nlk->max_recvmsg_len, len); > nlk->max_recvmsg_len = min_t(size_t, nlk->max_recvmsg_len, > > one thread in-between another thread: > > skb = netlink_alloc_skb(sk, > > and > > skb_reserve(skb, skb_tailroom(skb) - > nlk->max_recvmsg_len); > > in netlink_dump. The result can be a negative value and will cause > a kernel panic ad BUG at net/core/skbuff.c because the negative value > turns into an extremely large positive value. > > Original commit: > > netlink_dump() allocates skb based on the calculated min_dump_alloc or > a per socket max_recvmsg_len. > min_alloc_size is maximum space required for any single netdev > attributes as calculated by rtnl_calcit(). > max_recvmsg_len tracks the user provided buffer to netlink_recvmsg. > It is capped at 16KiB. > The intention is to avoid small allocations and to minimize the number > of calls required to obtain dump information for all net devices. > > netlink_dump packs as many small messages as could fit within an skb > that was sized for the largest single netdev information. The actual > space available within an skb is larger than what is requested. It could > be much larger and up to near 2x with align to next power of 2 approach. > > Allowing netlink_dump to use all the space available within the > allocated skb increases the buffer size a user has to provide to avoid > truncaion (i.e. MSG_TRUNG flag set). > > It was observed that with many VLANs configured on at least one netdev, > a larger buffer of near 64KiB was necessary to avoid "Message truncated" > error in "ip link" or "bridge [-c[ompressvlans]] vlan show" when > min_alloc_size was only little over 32KiB. > > This patch trims skb to allocated size in order to allow the user to > avoid truncation with more reasonable buffer size. > > Signed-off-by: Ronen Arad <ronen.arad@intel.com> > Signed-off-by: David S. Miller <davem@davemloft.net> > > (cherry pick commit db65a3aaf29ecce2e34271d52e8d2336b97bd9fe) > Signed-off-by: Mark Salyzyn <salyzyn@android.com> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: Ronen Arad <ronen.arad@intel.com> > Cc: "David S . Miller" <davem@davemloft.net> > Cc: Dmitry Safonov <dima@arista.com> > Cc: David Ahern <dsahern@gmail.com> > Cc: Kirill Tkhai <ktkhai@virtuozzo.com> > Cc: Andrei Vagin <avagin@virtuozzo.com> > Cc: Li RongQing <lirongqing@baidu.com> > Cc: YU Bo <tsu.yubo@gmail.com> > Cc: Denys Vlasenko <dvlasenk@redhat.com> > Cc: netdev@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Cc: stable@vger.kernel.org # 3.18 > --- > net/netlink/af_netlink.c | 34 ++++++++++++++++++++++------------ > 1 file changed, 22 insertions(+), 12 deletions(-) Now queued up, thanks. greg k-h
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 50096e0edd8e..14295cef6b76 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -1977,6 +1977,7 @@ static int netlink_dump(struct sock *sk) struct nlmsghdr *nlh; struct module *module; int err = -ENOBUFS; + int alloc_min_size; int alloc_size; mutex_lock(nlk->cb_mutex); @@ -1985,9 +1986,6 @@ static int netlink_dump(struct sock *sk) goto errout_skb; } - cb = &nlk->cb; - alloc_size = max_t(int, cb->min_dump_alloc, NLMSG_GOODSIZE); - if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf) goto errout_skb; @@ -1996,22 +1994,34 @@ static int netlink_dump(struct sock *sk) * to reduce number of system calls on dump operations, if user * ever provided a big enough buffer. */ - if (alloc_size < nlk->max_recvmsg_len) { - skb = netlink_alloc_skb(sk, - nlk->max_recvmsg_len, - nlk->portid, + cb = &nlk->cb; + alloc_min_size = max_t(int, cb->min_dump_alloc, NLMSG_GOODSIZE); + + if (alloc_min_size < nlk->max_recvmsg_len) { + alloc_size = nlk->max_recvmsg_len; + skb = netlink_alloc_skb(sk, alloc_size, nlk->portid, (GFP_KERNEL & ~__GFP_WAIT) | __GFP_NOWARN | __GFP_NORETRY); - /* available room should be exact amount to avoid MSG_TRUNC */ - if (skb) - skb_reserve(skb, skb_tailroom(skb) - - nlk->max_recvmsg_len); } - if (!skb) + if (!skb) { + alloc_size = alloc_min_size; skb = netlink_alloc_skb(sk, alloc_size, nlk->portid, (GFP_KERNEL & ~__GFP_WAIT)); + } if (!skb) goto errout_skb; + + /* Trim skb to allocated size. User is expected to provide buffer as + * large as max(min_dump_alloc, 16KiB (mac_recvmsg_len capped at + * netlink_recvmsg())). dump will pack as many smaller messages as + * could fit within the allocated skb. skb is typically allocated + * with larger space than required (could be as much as near 2x the + * requested size with align to next power of 2 approach). Allowing + * dump to use the excess space makes it difficult for a user to have a + * reasonable static buffer based on the expected largest dump of a + * single netdev. The outcome is MSG_TRUNC error. + */ + skb_reserve(skb, skb_tailroom(skb) - alloc_size); netlink_skb_set_owner_r(skb, sk); if (nlk->dump_done_errno > 0)