Message ID | 1450814710-17850-2-git-send-email-kraigatgoog@gmail.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
From: Craig Gallek <kraigatgoog@gmail.com> Date: Tue, 22 Dec 2015 15:05:07 -0500 > + for (i = 0; i < reuse->num_socks; i++) { > + if (reuse->socks[i] == sk) { > + reuse->socks[i] = reuse->socks[reuse->num_socks - 1]; > + reuse->num_socks--; > + if (reuse->num_socks == 0) > + kfree_rcu(reuse, rcu); > + break; > + } > + } Don't you need to memmove() the entire rest of the array down one slot when you hit the matching 'sk' in there? I can't see how it can work to only move one entry down. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Dec 22, 2015 at 4:40 PM, David Miller <davem@davemloft.net> wrote: > From: Craig Gallek <kraigatgoog@gmail.com> > Date: Tue, 22 Dec 2015 15:05:07 -0500 > >> + for (i = 0; i < reuse->num_socks; i++) { >> + if (reuse->socks[i] == sk) { >> + reuse->socks[i] = reuse->socks[reuse->num_socks - 1]; >> + reuse->num_socks--; >> + if (reuse->num_socks == 0) >> + kfree_rcu(reuse, rcu); >> + break; >> + } >> + } > > Don't you need to memmove() the entire rest of the array down one slot > when you hit the matching 'sk' in there? I can't see how it can work > to only move one entry down. It moves the last element in the list into the slot that was just emptied. You could argue that this may cause unexpected changes in the index -> socket mapping observed by the user, but I'm not sure making many socket indexes change (by sliding everything down one) when one is removed is a desirable behavior either. I don't have a strong opinion either way though... -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Craig Gallek <kraigatgoog@gmail.com> Date: Tue, 22 Dec 2015 16:58:11 -0500 > On Tue, Dec 22, 2015 at 4:40 PM, David Miller <davem@davemloft.net> wrote: >> From: Craig Gallek <kraigatgoog@gmail.com> >> Date: Tue, 22 Dec 2015 15:05:07 -0500 >> >>> + for (i = 0; i < reuse->num_socks; i++) { >>> + if (reuse->socks[i] == sk) { >>> + reuse->socks[i] = reuse->socks[reuse->num_socks - 1]; >>> + reuse->num_socks--; >>> + if (reuse->num_socks == 0) >>> + kfree_rcu(reuse, rcu); >>> + break; >>> + } >>> + } >> >> Don't you need to memmove() the entire rest of the array down one slot >> when you hit the matching 'sk' in there? I can't see how it can work >> to only move one entry down. > It moves the last element in the list into the slot that was just > emptied. You could argue that this may cause unexpected changes in > the index -> socket mapping observed by the user, but I'm not sure > making many socket indexes change (by sliding everything down one) > when one is removed is a desirable behavior either. I don't have a > strong opinion either way though... Thanks for explaining, I misered the code. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Craig, [auto build test ERROR on net-next/master] url: https://github.com/0day-ci/linux/commits/Craig-Gallek/Faster-SO_REUSEPORT/20151223-040911 config: arm-mvebu_v7_defconfig (attached as .config) reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=arm Note: the linux-review/Craig-Gallek/Faster-SO_REUSEPORT/20151223-040911 HEAD 660edb1fa7e1a2bf71ef9cbf4555cda4af95ea58 builds fine. It only hurts bisectibility. All errors (new ones prefixed by >>): In file included from include/linux/spinlock_types.h:13:0, from include/net/sock_reuseport.h:4, from net/core/sock_reuseport.c:7: >> arch/arm/include/asm/spinlock_types.h:12:3: error: unknown type name 'u32' u32 slock; ^ >> arch/arm/include/asm/spinlock_types.h:18:4: error: unknown type name 'u16' u16 owner; ^ arch/arm/include/asm/spinlock_types.h:19:4: error: unknown type name 'u16' u16 next; ^ arch/arm/include/asm/spinlock_types.h:28:2: error: unknown type name 'u32' u32 lock; ^ vim +/u32 +12 arch/arm/include/asm/spinlock_types.h fb1c8f93 include/asm-arm/spinlock_types.h Ingo Molnar 2005-09-10 6 #endif fb1c8f93 include/asm-arm/spinlock_types.h Ingo Molnar 2005-09-10 7 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 8 #define TICKET_SHIFT 16 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 9 fb1c8f93 include/asm-arm/spinlock_types.h Ingo Molnar 2005-09-10 10 typedef struct { 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 11 union { 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 @12 u32 slock; 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 13 struct __raw_tickets { 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 14 #ifdef __ARMEB__ 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 15 u16 next; 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 16 u16 owner; 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 17 #else 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 @18 u16 owner; 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 19 u16 next; 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 20 #endif 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 21 } tickets; :::::: The code at line 12 was first introduced by commit :::::: 546c2896a42202dbc7d02f7c6ec9948ac1bf511b ARM: 7446/1: spinlock: use ticket algorithm for ARMv6+ locking implementation :::::: TO: Will Deacon <will.deacon@arm.com> :::::: CC: Russell King <rmk+kernel@arm.linux.org.uk> --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
On Tue, Dec 22, 2015 at 5:11 PM, kbuild test robot <lkp@intel.com> wrote: > Hi Craig, > > [auto build test ERROR on net-next/master] > > url: https://github.com/0day-ci/linux/commits/Craig-Gallek/Faster-SO_REUSEPORT/20151223-040911 > config: arm-mvebu_v7_defconfig (attached as .config) > reproduce: > wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross > chmod +x ~/bin/make.cross > # save the attached .config to linux build tree > make.cross ARCH=arm > > Note: the linux-review/Craig-Gallek/Faster-SO_REUSEPORT/20151223-040911 HEAD 660edb1fa7e1a2bf71ef9cbf4555cda4af95ea58 builds fine. > It only hurts bisectibility. > > All errors (new ones prefixed by >>): > > In file included from include/linux/spinlock_types.h:13:0, > from include/net/sock_reuseport.h:4, > from net/core/sock_reuseport.c:7: >>> arch/arm/include/asm/spinlock_types.h:12:3: error: unknown type name 'u32' > u32 slock; > ^ >>> arch/arm/include/asm/spinlock_types.h:18:4: error: unknown type name 'u16' > u16 owner; > ^ > arch/arm/include/asm/spinlock_types.h:19:4: error: unknown type name 'u16' > u16 next; > ^ > arch/arm/include/asm/spinlock_types.h:28:2: error: unknown type name 'u32' > u32 lock; > ^ > > vim +/u32 +12 arch/arm/include/asm/spinlock_types.h > > fb1c8f93 include/asm-arm/spinlock_types.h Ingo Molnar 2005-09-10 6 #endif > fb1c8f93 include/asm-arm/spinlock_types.h Ingo Molnar 2005-09-10 7 > 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 8 #define TICKET_SHIFT 16 > 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 9 > fb1c8f93 include/asm-arm/spinlock_types.h Ingo Molnar 2005-09-10 10 typedef struct { > 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 11 union { > 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 @12 u32 slock; > 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 13 struct __raw_tickets { > 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 14 #ifdef __ARMEB__ > 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 15 u16 next; > 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 16 u16 owner; > 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 17 #else > 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 @18 u16 owner; > 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 19 u16 next; > 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 20 #endif > 546c2896 arch/arm/include/asm/spinlock_types.h Will Deacon 2012-07-06 21 } tickets; > > :::::: The code at line 12 was first introduced by commit > :::::: 546c2896a42202dbc7d02f7c6ec9948ac1bf511b ARM: 7446/1: spinlock: use ticket algorithm for ARMv6+ locking implementation > > :::::: TO: Will Deacon <will.deacon@arm.com> > :::::: CC: Russell King <rmk+kernel@arm.linux.org.uk> > > --- > 0-DAY kernel test infrastructure Open Source Technology Center > https://lists.01.org/pipermail/kbuild-all Intel Corporation ACK, I don't even need spinlock_types for this file anymore (though it may be worth fixing the arm version to include linux/types.h). Will remove for v2 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/net/sock.h b/include/net/sock.h index 3794cdd..e830c10 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -318,6 +318,7 @@ struct cg_proto; * @sk_error_report: callback to indicate errors (e.g. %MSG_ERRQUEUE) * @sk_backlog_rcv: callback to process the backlog * @sk_destruct: called at sock freeing time, i.e. when all refcnt == 0 + * @sk_reuseport_cb: reuseport group container */ struct sock { /* @@ -453,6 +454,7 @@ struct sock { int (*sk_backlog_rcv)(struct sock *sk, struct sk_buff *skb); void (*sk_destruct)(struct sock *sk); + struct sock_reuseport __rcu *sk_reuseport_cb; }; #define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data))) diff --git a/include/net/sock_reuseport.h b/include/net/sock_reuseport.h new file mode 100644 index 0000000..f17d190 --- /dev/null +++ b/include/net/sock_reuseport.h @@ -0,0 +1,21 @@ +#ifndef _SOCK_REUSEPORT_H +#define _SOCK_REUSEPORT_H + +#include <linux/spinlock_types.h> +#include <linux/types.h> +#include <net/sock.h> + +struct sock_reuseport { + struct rcu_head rcu; + + u16 max_socks; /* length of socks */ + u16 num_socks; /* elements in socks */ + struct sock *socks[0]; /* array of sock pointers */ +}; + +extern int reuseport_alloc(struct sock *sk); +extern int reuseport_add_sock(struct sock *sk, const struct sock *sk2); +extern void reuseport_detach_sock(struct sock *sk); +extern struct sock *reuseport_select_sock(struct sock *sk, u32 hash); + +#endif /* _SOCK_REUSEPORT_H */ diff --git a/net/core/Makefile b/net/core/Makefile index 086b01f..0b835de 100644 --- a/net/core/Makefile +++ b/net/core/Makefile @@ -9,7 +9,7 @@ obj-$(CONFIG_SYSCTL) += sysctl_net_core.o obj-y += dev.o ethtool.o dev_addr_lists.o dst.o netevent.o \ neighbour.o rtnetlink.o utils.o link_watch.o filter.o \ - sock_diag.o dev_ioctl.o tso.o + sock_diag.o dev_ioctl.o tso.o sock_reuseport.o obj-$(CONFIG_XFRM) += flow.o obj-y += net-sysfs.o diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c new file mode 100644 index 0000000..963c8d5 --- /dev/null +++ b/net/core/sock_reuseport.c @@ -0,0 +1,173 @@ +/* + * To speed up listener socket lookup, create an array to store all sockets + * listening on the same port. This allows a decision to be made after finding + * the first socket. + */ + +#include <net/sock_reuseport.h> +#include <linux/rcupdate.h> + +#define INIT_SOCKS 128 + +static DEFINE_SPINLOCK(reuseport_lock); + +static struct sock_reuseport *__reuseport_alloc(u16 max_socks) +{ + size_t size = sizeof(struct sock_reuseport) + + sizeof(struct sock *) * max_socks; + struct sock_reuseport *reuse = kzalloc(size, GFP_ATOMIC); + + if (!reuse) + return NULL; + + reuse->max_socks = max_socks; + + return reuse; +} + +int reuseport_alloc(struct sock *sk) +{ + struct sock_reuseport *reuse; + + /* bh lock used since this function call may precede hlist lock in + * soft irq of receive path or setsockopt from process context + */ + spin_lock_bh(&reuseport_lock); + WARN_ONCE(rcu_dereference_protected(sk->sk_reuseport_cb, + lockdep_is_held(&reuseport_lock)), + "multiple allocations for the same socket"); + reuse = __reuseport_alloc(INIT_SOCKS); + if (!reuse) { + spin_unlock_bh(&reuseport_lock); + return -ENOMEM; + } + + reuse->socks[0] = sk; + reuse->num_socks = 1; + rcu_assign_pointer(sk->sk_reuseport_cb, reuse); + + spin_unlock_bh(&reuseport_lock); + + return 0; +} +EXPORT_SYMBOL(reuseport_alloc); + +static struct sock_reuseport *reuseport_grow(struct sock_reuseport *reuse) +{ + struct sock_reuseport *more_reuse; + u32 more_socks_size, i; + + more_socks_size = reuse->max_socks * 2U; + if (more_socks_size > U16_MAX) + return NULL; + + more_reuse = __reuseport_alloc(more_socks_size); + if (!more_reuse) + return NULL; + + more_reuse->max_socks = more_socks_size; + more_reuse->num_socks = reuse->num_socks; + + memcpy(more_reuse->socks, reuse->socks, + reuse->num_socks * sizeof(struct sock *)); + + for (i = 0; i < reuse->num_socks; ++i) + rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb, + more_reuse); + + kfree_rcu(reuse, rcu); + return more_reuse; +} + +/** + * reuseport_add_sock - Add a socket to the reuseport group of another. + * @sk: New socket to add to the group. + * @sk2: Socket belonging to the existing reuseport group. + * May return ENOMEM and not add socket to group under memory pressure. + */ +int reuseport_add_sock(struct sock *sk, const struct sock *sk2) +{ + struct sock_reuseport *reuse; + + spin_lock_bh(&reuseport_lock); + reuse = rcu_dereference_protected(sk2->sk_reuseport_cb, + lockdep_is_held(&reuseport_lock)), + WARN_ONCE(rcu_dereference_protected(sk->sk_reuseport_cb, + lockdep_is_held(&reuseport_lock)), + "socket already in reuseport group"); + + if (reuse->num_socks == reuse->max_socks) { + reuse = reuseport_grow(reuse); + if (!reuse) { + spin_unlock_bh(&reuseport_lock); + return -ENOMEM; + } + } + + reuse->socks[reuse->num_socks] = sk; + /* paired with smp_rmb() in reuseport_select_sock() */ + smp_wmb(); + reuse->num_socks++; + rcu_assign_pointer(sk->sk_reuseport_cb, reuse); + + spin_unlock_bh(&reuseport_lock); + + return 0; +} +EXPORT_SYMBOL(reuseport_add_sock); + +void reuseport_detach_sock(struct sock *sk) +{ + struct sock_reuseport *reuse; + int i; + + spin_lock_bh(&reuseport_lock); + reuse = rcu_dereference_protected(sk->sk_reuseport_cb, + lockdep_is_held(&reuseport_lock)); + rcu_assign_pointer(sk->sk_reuseport_cb, NULL); + + for (i = 0; i < reuse->num_socks; i++) { + if (reuse->socks[i] == sk) { + reuse->socks[i] = reuse->socks[reuse->num_socks - 1]; + reuse->num_socks--; + if (reuse->num_socks == 0) + kfree_rcu(reuse, rcu); + break; + } + } + spin_unlock_bh(&reuseport_lock); +} +EXPORT_SYMBOL(reuseport_detach_sock); + +/** + * reuseport_select_sock - Select a socket from an SO_REUSEPORT group. + * @sk: First socket in the group. + * @hash: Use this hash to select. + * Returns a socket that should receive the packet (or NULL on error). + */ +struct sock *reuseport_select_sock(struct sock *sk, u32 hash) +{ + struct sock_reuseport *reuse; + struct sock *sk2 = NULL; + u16 socks; + + rcu_read_lock(); + reuse = rcu_dereference(sk->sk_reuseport_cb); + + /* if memory allocation failed or add call is not yet complete */ + if (!reuse) + goto out; + + socks = READ_ONCE(reuse->num_socks); + if (likely(socks)) { + /* paired with smp_wmb() in reuseport_add_sock() */ + smp_rmb(); + + sk2 = reuse->socks[reciprocal_scale(hash, socks)]; + } + +out: + rcu_read_unlock(); + return sk2; +} +EXPORT_SYMBOL(reuseport_select_sock);