Message ID:   1259879498-27860-4-git-send-email-opurdila@ixiacom.com
State:        Changes Requested, archived
Delegated to: David Miller
Octavian Purdila a écrit :
> This patch adds a per SAP device based hash table to solve the
> multicast delivery scalability issues for the case where a
> large number of interfaces and a large number of sockets (bound to the
> same SAP) are used.
>
> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
> ---
>  include/net/llc.h      |   20 ++++++++++++++++----
>  include/net/llc_conn.h |    1 +
>  net/llc/llc_conn.c     |   18 +++++++++++++++++-
>  net/llc/llc_core.c     |    3 +++
>  net/llc/llc_sap.c      |   11 ++++++-----
>  5 files changed, 43 insertions(+), 10 deletions(-)
>
> diff --git a/include/net/llc.h b/include/net/llc.h
> index 7940da1..31e9902 100644
> --- a/include/net/llc.h
> +++ b/include/net/llc.h
> @@ -31,6 +31,14 @@ struct llc_addr {
>  #define LLC_SAP_STATE_INACTIVE	1
>  #define LLC_SAP_STATE_ACTIVE	2
>
> +#define LLC_SK_DEV_HASH_BITS 10
> +#define LLC_SK_DEV_HASH_ENTRIES (1<<LLC_SK_DEV_HASH_BITS)
> +
> +struct llc_sk_list {
> +	rwlock_t lock;
> +	struct hlist_head list;
> +};
> +

This patch introduces a big hash table, and 1024 rwlocks, for IXIACOM need.

Is the problem both on lock contention and lookup ?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
On Fri, 4 Dec 2009 00:31:37 +0200
Octavian Purdila <opurdila@ixiacom.com> wrote:

> This patch adds a per SAP device based hash table to solve the
> multicast delivery scalability issues for the case where a
> large number of interfaces and a large number of sockets (bound to the
> same SAP) are used.

Rather than adding a hash table and rwlock, why not a hash list with RCU
and a single spin lock?
On Friday 04 December 2009 00:59:58 you wrote:
> Octavian Purdila a écrit :
> > This patch adds a per SAP device based hash table to solve the
> > multicast delivery scalability issues for the case where a
> > large number of interfaces and a large number of sockets (bound to the
> > same SAP) are used.
> >
> > Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
> > ---
> >  include/net/llc.h      |   20 ++++++++++++++++----
> >  include/net/llc_conn.h |    1 +
> >  net/llc/llc_conn.c     |   18 +++++++++++++++++-
> >  net/llc/llc_core.c     |    3 +++
> >  net/llc/llc_sap.c      |   11 ++++++-----
> >  5 files changed, 43 insertions(+), 10 deletions(-)
> >
> > diff --git a/include/net/llc.h b/include/net/llc.h
> > index 7940da1..31e9902 100644
> > --- a/include/net/llc.h
> > +++ b/include/net/llc.h
> > @@ -31,6 +31,14 @@ struct llc_addr {
> >  #define LLC_SAP_STATE_INACTIVE	1
> >  #define LLC_SAP_STATE_ACTIVE	2
> >
> > +#define LLC_SK_DEV_HASH_BITS 10
> > +#define LLC_SK_DEV_HASH_ENTRIES (1<<LLC_SK_DEV_HASH_BITS)
> > +
> > +struct llc_sk_list {
> > +	rwlock_t lock;
> > +	struct hlist_head list;
> > +};
> > +
>
> This patch introduces a big hash table, and 1024 rwlocks, for IXIACOM need.

Yes, that is probably not appropriate for upstream. What would be a good
value?

> Is the problem both on lock contention and lookup ?

Since at this point we are using UP ports, contention is not really an issue
for us. I've extrapolated this (lock per hash bucket) based on how locking is
done in other places, like UDP.
Octavian Purdila a écrit :
> Yes, that is probably not appropriate for upstream. What would be a good
> value?

A small one to begin with (say 64).

> Since at this point we are using UP ports, contention is not really an issue
> for us. I've extrapolated this (lock per hash bucket) based on how locking is
> done in other places, like UDP.

Yes, but you know we want to remove those locks per UDP hash bucket, since
we don't really need them anymore. ;)

If you remember, we had in the past one rwlock for the whole UDP table.
Then this was converted to one spinlock per hash slot (128 slots) + RCU
lookups for unicast RX. Then we dynamically sized the udp table at boot
(up to 65536 slots), added the multicast optimization (holding the lock for
a small duration + double hashing) and the bind optimization (thanks to
double hashing).

To be done:

1) multicast RX can be done without taking any lock, with RCU lookups
2) zap all locks and use one lock, or a small array of hashed spinlocks
On Friday 04 December 2009 01:25:13 you wrote:
> On Fri, 4 Dec 2009 00:31:37 +0200
> Octavian Purdila <opurdila@ixiacom.com> wrote:
> > This patch adds a per SAP device based hash table to solve the
> > multicast delivery scalability issues for the case where a
> > large number of interfaces and a large number of sockets (bound to the
> > same SAP) are used.
>
> Rather than adding a hash table and rwlock, why not a hash list with RCU
> and a single spin lock?

I have a partial version with RCU and a single spinlock, but then I ran into
a (Eric's, I think) patch which moved the UDP lock per bucket. And since RCU
can't help on the write side (in this instance, each time we bind or delete
the socket) it was not clear to me what the best approach is.

Should I go ahead with the RCU and single lock?
On Friday 04 December 2009 01:52:44 you wrote:
> Octavian Purdila a écrit :
> > Since at this point we are using UP ports, contention is not really an
> > issue for us. I've extrapolated this (lock per hash bucket) based on how
> > locking is done in other places, like UDP.
>
> Yes, but you know we want to remove those locks per UDP hash bucket, since
> we don't really need them anymore. ;)
>
> If you remember, we had in the past one rwlock for the whole UDP table.
>
> Then this was converted to one spinlock per hash slot (128 slots) + RCU
> lookups for unicast RX.
>
> Then we dynamically sized the udp table at boot (up to 65536 slots),
> multicast optimization (holding the lock for a small duration + double
> hashing), bind optimization (thanks to double hashing).
>
> To be done:
>
> 1) multicast RX can be done without taking any lock, with RCU lookups
> 2) zap all locks and use one lock, or a small array of hashed spinlocks

Thanks for the nice summary, Eric!

I still have one doubt related to this: we still need locking for creating
and destroying sockets, to insert/remove them into/from the hash; RCU can't
help us here, right?

In that case, wouldn't spinlock contention become an issue for short-lived
connections? Probably not for UDP (or LLC), but for TCP I can certainly
think of a few use cases for short-lived connections.
Octavian Purdila a écrit :
> On Friday 04 December 2009 01:52:44 you wrote:
>> Octavian Purdila a écrit :
>>> Since at this point we are using UP ports, contention is not really an
>>> issue for us. I've extrapolated this (lock per hash bucket) based on how
>>> locking is done in other places, like UDP.
>>
>> Yes, but you know we want to remove those locks per UDP hash bucket, since
>> we don't really need them anymore. ;)
>>
>> If you remember, we had in the past one rwlock for the whole UDP table.
>>
>> Then this was converted to one spinlock per hash slot (128 slots) + RCU
>> lookups for unicast RX.
>>
>> Then we dynamically sized the udp table at boot (up to 65536 slots),
>> multicast optimization (holding the lock for a small duration + double
>> hashing), bind optimization (thanks to double hashing).
>>
>> To be done:
>>
>> 1) multicast RX can be done without taking any lock, with RCU lookups
>> 2) zap all locks and use one lock, or a small array of hashed spinlocks
>
> Thanks for the nice summary, Eric!
>
> I still have one doubt related to this: we still need locking for creating
> and destroying sockets, to insert/remove them into/from the hash; RCU can't
> help us here, right?

Sure. RCU is used for readers only. We still need locks to protect writers
from each other.

> In that case, wouldn't spinlock contention become an issue for short-lived
> connections? Probably not for UDP (or LLC), but for TCP I can certainly
> think of a few use cases for short-lived connections.

Yes, this is why an array of hashed spinlocks would be good (as already done
with TCP, for example, or the IP route cache).

(Say you have a table of 65536 UDP slots on your high-performance server,
handling millions of udp sockets: you don't _need_ 65536 spinlocks, but some
number related to the number of cpus.)
On Fri, 4 Dec 2009 01:53:11 +0200
Octavian Purdila <opurdila@ixiacom.com> wrote:

> On Friday 04 December 2009 01:25:13 you wrote:
> > On Fri, 4 Dec 2009 00:31:37 +0200
> > Octavian Purdila <opurdila@ixiacom.com> wrote:
> > > This patch adds a per SAP device based hash table to solve the
> > > multicast delivery scalability issues for the case where a
> > > large number of interfaces and a large number of sockets (bound to the
> > > same SAP) are used.
> >
> > Rather than adding a hash table and rwlock, why not a hash list with RCU
> > and a single spin lock?
>
> I have a partial version with RCU and a single spinlock, but then I ran into
> a (Eric's, I think) patch which moved the UDP lock per bucket. And since RCU
> can't help on the write side (in this instance, each time we bind or delete
> the socket) it was not clear to me what the best approach is.

The lock is held for such a brief period on connection setup that a single
spinlock is probably ok.
diff --git a/include/net/llc.h b/include/net/llc.h
index 7940da1..31e9902 100644
--- a/include/net/llc.h
+++ b/include/net/llc.h
@@ -31,6 +31,14 @@ struct llc_addr {
 #define LLC_SAP_STATE_INACTIVE	1
 #define LLC_SAP_STATE_ACTIVE	2
 
+#define LLC_SK_DEV_HASH_BITS 10
+#define LLC_SK_DEV_HASH_ENTRIES (1<<LLC_SK_DEV_HASH_BITS)
+
+struct llc_sk_list {
+	rwlock_t lock;
+	struct hlist_head list;
+};
+
 /**
  * struct llc_sap - Defines the SAP component
  *
@@ -53,12 +61,16 @@ struct llc_sap {
 					struct net_device *orig_dev);
 	struct llc_addr laddr;
 	struct list_head node;
-	struct {
-		rwlock_t lock;
-		struct hlist_head list;
-	} sk_list;
+	struct llc_sk_list sk_list;
+	struct llc_sk_list sk_dev_hash[LLC_SK_DEV_HASH_ENTRIES];
 };
 
+static inline
+struct llc_sk_list *llc_sk_dev_hash(struct llc_sap *sap, int ifindex)
+{
+	return &sap->sk_dev_hash[ifindex % LLC_SK_DEV_HASH_ENTRIES];
+}
+
 #define LLC_DEST_INVALID	0	/* Invalid LLC PDU type */
 #define LLC_DEST_SAP		1	/* Type 1 goes here */
 #define LLC_DEST_CONN		2	/* Type 2 goes here */
diff --git a/include/net/llc_conn.h b/include/net/llc_conn.h
index fe982fd..2f97d8d 100644
--- a/include/net/llc_conn.h
+++ b/include/net/llc_conn.h
@@ -77,6 +77,7 @@ struct llc_sock {
 					   received and caused sending FRMR.
 					   Used for resending FRMR */
 	u32 cmsg_flags;
+	struct hlist_node dev_hash_node;
 };
 
 static inline struct llc_sock *llc_sk(const struct sock *sk)
diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c
index c6bab39..c3f47ec 100644
--- a/net/llc/llc_conn.c
+++ b/net/llc/llc_conn.c
@@ -651,11 +651,19 @@ static int llc_find_offset(int state, int ev_type)
  */
 void llc_sap_add_socket(struct llc_sap *sap, struct sock *sk)
 {
+	struct llc_sock *llc = llc_sk(sk);
+	struct llc_sk_list *dev_hb = llc_sk_dev_hash(sap, llc->dev->ifindex);
+
 	llc_sap_hold(sap);
+	llc->sap = sap;
+
 	write_lock_bh(&sap->sk_list.lock);
-	llc_sk(sk)->sap = sap;
 	sk_add_node(sk, &sap->sk_list.list);
 	write_unlock_bh(&sap->sk_list.lock);
+
+	write_lock_bh(&dev_hb->lock);
+	hlist_add_head(&llc->dev_hash_node, &dev_hb->list);
+	write_unlock_bh(&dev_hb->lock);
 }
 
 /**
@@ -668,9 +676,17 @@ void llc_sap_add_socket(struct llc_sap *sap, struct sock *sk)
  */
 void llc_sap_remove_socket(struct llc_sap *sap, struct sock *sk)
 {
+	struct llc_sock *llc = llc_sk(sk);
+	struct llc_sk_list *dev_hb = llc_sk_dev_hash(sap, llc->dev->ifindex);
+
 	write_lock_bh(&sap->sk_list.lock);
 	sk_del_node_init(sk);
 	write_unlock_bh(&sap->sk_list.lock);
+
+	write_lock_bh(&dev_hb->lock);
+	hlist_del(&llc->dev_hash_node);
+	write_unlock_bh(&dev_hb->lock);
+
 	llc_sap_put(sap);
 }
 
diff --git a/net/llc/llc_core.c b/net/llc/llc_core.c
index ff4c0ab..1d79cda 100644
--- a/net/llc/llc_core.c
+++ b/net/llc/llc_core.c
@@ -33,12 +33,15 @@ DEFINE_RWLOCK(llc_sap_list_lock);
 static struct llc_sap *llc_sap_alloc(void)
 {
 	struct llc_sap *sap = kzalloc(sizeof(*sap), GFP_ATOMIC);
+	int i;
 
 	if (sap) {
 		/* sap->laddr.mac - leave as a null, it's filled by bind */
 		sap->state = LLC_SAP_STATE_ACTIVE;
 		rwlock_init(&sap->sk_list.lock);
 		atomic_set(&sap->refcnt, 1);
+		for (i = 0; i < LLC_SK_DEV_HASH_ENTRIES; i++)
+			rwlock_init(&sap->sk_dev_hash[i].lock);
 	}
 	return sap;
 }
diff --git a/net/llc/llc_sap.c b/net/llc/llc_sap.c
index 008de1f..11bad55 100644
--- a/net/llc/llc_sap.c
+++ b/net/llc/llc_sap.c
@@ -340,12 +340,13 @@ static void llc_sap_mcast(struct llc_sap *sap,
 			  const struct llc_addr *laddr,
 			  struct sk_buff *skb)
 {
-	struct sock *sk;
+	struct llc_sock *llc;
 	struct hlist_node *node;
+	struct llc_sk_list *dev_hb = llc_sk_dev_hash(sap, skb->dev->ifindex);
 
-	read_lock_bh(&sap->sk_list.lock);
-	sk_for_each(sk, node, &sap->sk_list.list) {
-		struct llc_sock *llc = llc_sk(sk);
+	read_lock_bh(&dev_hb->lock);
+	hlist_for_each_entry(llc, node, &dev_hb->list, dev_hash_node) {
+		struct sock *sk = &llc->sk;
 		struct sk_buff *skb1;
 
 		if (sk->sk_type != SOCK_DGRAM)
@@ -365,7 +366,7 @@ static void llc_sap_mcast(struct llc_sap *sap,
 		llc_sap_rcv(sap, skb1, sk);
 		sock_put(sk);
 	}
-	read_unlock_bh(&sap->sk_list.lock);
+	read_unlock_bh(&dev_hb->lock);
 }
This patch adds a per SAP device based hash table to solve the multicast
delivery scalability issues for the case where a large number of interfaces
and a large number of sockets (bound to the same SAP) are used.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
---
 include/net/llc.h      |   20 ++++++++++++++++----
 include/net/llc_conn.h |    1 +
 net/llc/llc_conn.c     |   18 +++++++++++++++++-
 net/llc/llc_core.c     |    3 +++
 net/llc/llc_sap.c      |   11 ++++++-----
 5 files changed, 43 insertions(+), 10 deletions(-)