From patchwork Tue Jan 19 15:57:54 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marc Dionne X-Patchwork-Id: 570023 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 90AB5140B8C for ; Wed, 20 Jan 2016 02:58:01 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b=AC8s3/4m; dkim-atps=neutral Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932134AbcASP55 (ORCPT ); Tue, 19 Jan 2016 10:57:57 -0500 Received: from mail-qg0-f48.google.com ([209.85.192.48]:34806 "EHLO mail-qg0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754602AbcASP5z (ORCPT ); Tue, 19 Jan 2016 10:57:55 -0500 Received: by mail-qg0-f48.google.com with SMTP id 6so503743038qgy.1; Tue, 19 Jan 2016 07:57:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:cc:content-type; bh=P7kbkXc8fB2a3eaNqmeU4uHuYyLmLOlP2MJ2V+HMRqM=; b=AC8s3/4m88YEzZQgi8EbDHI45UVJW5ZnrpHfNdIhls2d5e6wByRIpkp9EsPBkRvyzp 3P018w5I1ac4849oE+GxBGEZ9fKjJsyZwRwQyw/MV6dOQJ1LEQI9HN9x+WsgOQUBD08j k0+IqVTpr3gUh92k0yFqaWqlAi/OuAHuSmRi5KjmIuw0voMIWhgCs/4diPt0HgLh4wfq mWXK1BoPlySQgEftb2qeqHdeaX+shCd4zdIdO2BwFSsd11ykjpl3CQ1626lJ2aWMEvlE kkCf/b6mNHQBoifGm2J29ioNA3rb87+hHBDgSrBs55DA1+UAfw8MiQZ8jSthJCrPBHre HWTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:date:message-id:subject:from:to:cc :content-type; bh=P7kbkXc8fB2a3eaNqmeU4uHuYyLmLOlP2MJ2V+HMRqM=; b=Jl2I9Z9edDAr8khT4xfAPOm/AmpG7aqBHZa/D5L3yz+6oNP6FGtlJnPOwcobLgd8OY 1fHUpD8433QfEVMFVzcLixA3SD3Y7DjEREMscXi+mOjtZv+4CNMODJsYY7lRnsu0j9HJ 0qqJ80CudC31L7xfngRzYgUNlm/EoqxNpZR17mtrGf97oIhRVaomG9tincnohBbjqjPT vsWxjf1Vh2oP5xFiwhpI+gtM6xJrvUazqIitvpcF50UC2GsGFtiecflz3XcDs0jh+v8o otfQCq7zR4dZj53HFAW0kvsBdyxtmaIdAySL9B28x+cTiQO2e2ktzufzBmUBEfyoySXh t+PA== X-Gm-Message-State: ALoCoQnf9a5zcSCOJUe4/uovcfH9CHicMM15V2/8agmnHIAGpqXQVXNQtvwFdTqq9plfayjNHsYLFaz0QTYIxEoTvXS83jFHtA== MIME-Version: 1.0 X-Received: by 10.140.38.73 with SMTP id s67mr38896744qgs.82.1453219074306; Tue, 19 Jan 2016 07:57:54 -0800 (PST) Received: by 10.55.161.22 with HTTP; Tue, 19 Jan 2016 07:57:54 -0800 (PST) Date: Tue, 19 Jan 2016 11:57:54 -0400 Message-ID: Subject: Crash with SO_REUSEPORT and ef456144da8ef507c8cf504284b6042e9201a05c From: Marc Dionne To: netdev@vger.kernel.org, Linux Kernel Mailing List , Craig Gallek Cc: eric.dumazet@google.com Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org I shared this one with Craig but I thought I'd put it out to a wider audience. Trying to run the current kernel mainline on a test system I found that any attempt to run many of our executables would crash the system. The networking code in all of these opens and listens on multiple UDP sockets set with SO_REUSEPORT. We also like to bind the first socket before setting SO_REUSEPORT so we can catch some cases where the port is actually in use by someone else (for instance a previous incarnation of the same service). This is easily reproduced with this sequence: - create 2 sockets A and B - bind socket A to an address - set SO_REUSEPORT on socket A - set SO_REUSEPORT on socket B - bind socket B to the same address as A The sk_reuseport_cb structure is only allocated at bind time if SO_REUSEPORT is already set, so A doesn't have one. When we bind B, A is found as a match that has SO_REUSEPORT and reuseport_add_sock will try to use the NULL sk_reuseport_cb structure from A, causing a crash. Not sure what the best fix is, but seems like the structure could be either allocated (if not already done) when setting SO_REUSEPORT, or when we find it to be NULL in reuseport_add_sock (but locking may be an issue there). I was able to test that allocating sk_reuseport_cb when setting SO_REUSEPORT makes things behave normally again; see attached patch. That's surely not a correct/complete fix as B (in the scenario above) will have an unnecessary sk_reuseport_cb which will trigger a warning and should be dealt with. Thanks, Marc commit 518120f41aec30bdde39caf58818e1e5d2d6a4a4 Author: Marc Dionne Date: Fri Jan 15 16:02:32 2016 -0400 soreuseport: Allow bind before setting SO_REUSEPORT If a socket is bound before the SO_REUSEPORT option is set, it will have no sk_reuseport_cb structure allocated and attached to it. If there is then an attempt to bind a second socket to the same port, the first socket will be found as matching and reuseport_add_sock will attempt to use the NULL sk_reuseport_cb pointer, resulting in an oops. Allocate sk_reuseport_cb in setsockopt if not already set. Also, don't preclude reusing a port because the sk has sk_reuseport_cb set. Signed-off-by: Marc Dionne diff --git a/net/core/sock.c b/net/core/sock.c index 5127023..587de5b 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -724,6 +724,8 @@ int sock_setsockopt(struct socket *sock, int level, int optname, break; case SO_REUSEPORT: sk->sk_reuseport = valbool; + if (!rcu_access_pointer(sk->sk_reuseport_cb)) + reuseport_alloc(sk); break; case SO_TYPE: case SO_PROTOCOL: diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index dc45b53..13884b6 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -154,7 +154,6 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num, (!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if || sk2->sk_bound_dev_if == sk->sk_bound_dev_if) && (!sk2->sk_reuseport || !sk->sk_reuseport || - rcu_access_pointer(sk->sk_reuseport_cb) || !uid_eq(uid, sock_i_uid(sk2))) && saddr_comp(sk, sk2, true)) { if (!bitmap) @@ -190,7 +189,6 @@ static int udp_lib_lport_inuse2(struct net *net, __u16 num, (!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if || sk2->sk_bound_dev_if == sk->sk_bound_dev_if) && (!sk2->sk_reuseport || !sk->sk_reuseport || - rcu_access_pointer(sk->sk_reuseport_cb) || !uid_eq(uid, sock_i_uid(sk2))) && saddr_comp(sk, sk2, true)) { res = 1;