Message ID | 1425247789-21211-7-git-send-email-therbert@google.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Sun, 2015-03-01 at 14:09 -0800, Tom Herbert wrote: > Call qdisc_skb_get_hash instead of doing skb_flow_dissect and then > jhash by hand. This defeats one of the perturbation goal : If two flows hashes into same hash, then skb_get_hash_perturb(skb, q->perturbation) will also give same hash forever and map to same hash bucket. Ideally, we could add a 'u32 perturbation' salt to __skb_get_hash()/__flow_hash_from_keys()/__flow_hash_3words instead of using a net_get_random_once(&hashrnd, sizeof(hashrnd)); but I guess all these functions are becoming very fat. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Sun, 01 Mar 2015 16:58:52 -0800 > On Sun, 2015-03-01 at 14:09 -0800, Tom Herbert wrote: >> Call qdisc_skb_get_hash instead of doing skb_flow_dissect and then >> jhash by hand. > > This defeats one of the perturbation goal : > > If two flows hashes into same hash, then skb_get_hash_perturb(skb, > q->perturbation) will also give same hash forever and map to same hash > bucket. Yes, this needs to be resolved. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Mar 1, 2015 at 4:58 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Sun, 2015-03-01 at 14:09 -0800, Tom Herbert wrote: >> Call qdisc_skb_get_hash instead of doing skb_flow_dissect and then >> jhash by hand. > > This defeats one of the perturbation goal : > > If two flows hashes into same hash, then skb_get_hash_perturb(skb, > q->perturbation) will also give same hash forever and map to same hash > bucket. > As I already mentioned, the probability of skb_get_hash returning the same value for two flows is 1/2^32. I suspect it's greater probability to get a hash collision between two IPv6 flows with subtlety different addresses or simply match 4-tuples in different address spaces (like VLAN or with VNID). If hash collision is really a problem, we can periodically rekey skb->hash (either SW or HW) or maybe move to 64 bit hashes somehow. Anyway, we potentially have the same problem in RSS and other uses of the hash, if hashes collide then two flows share the same bucket forever unless there is an action to break that. > Ideally, we could add a 'u32 perturbation' salt to > __skb_get_hash()/__flow_hash_from_keys()/__flow_hash_3words instead of > using a net_get_random_once(&hashrnd, sizeof(hashrnd)); > > but I guess all these functions are becoming very fat. > Same thing as rekeying. > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 2, 2015 at 7:31 AM, Tom Herbert <therbert@google.com> wrote: > > On Sun, Mar 1, 2015 at 4:58 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > On Sun, 2015-03-01 at 14:09 -0800, Tom Herbert wrote: > >> Call qdisc_skb_get_hash instead of doing skb_flow_dissect and then > >> jhash by hand. > > > > This defeats one of the perturbation goal : > > > > If two flows hashes into same hash, then skb_get_hash_perturb(skb, > > q->perturbation) will also give same hash forever and map to same hash > > bucket. > > > As I already mentioned, the probability of skb_get_hash returning the > same value for two flows is 1/2^32. I suspect it's greater probability > to get a hash collision between two IPv6 flows with subtlety different > addresses or simply match 4-tuples in different address spaces (like > VLAN or with VNID). If hash collision is really a problem, we can > periodically rekey skb->hash (either SW or HW) or maybe move to 64 bit > hashes somehow. Anyway, we potentially have the same problem in RSS > and other uses of the hash, if hashes collide then two flows share the > same bucket forever unless there is an action to break that. > One other possibility is that with the txhash changes we can rehash individual connections that are sourced from the host. I was planning to do this on too many TCP retransmits to try to get a different ECMP path (type of source routing), but we could also do this periodically or based on request from qdisc. ooo_okay could be taken into account to avoid OOO packets. > > > Ideally, we could add a 'u32 perturbation' salt to > > __skb_get_hash()/__flow_hash_from_keys()/__flow_hash_3words instead of > > using a net_get_random_once(&hashrnd, sizeof(hashrnd)); > > > > but I guess all these functions are becoming very fat. > > > Same thing as rekeying. > > > > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2015-03-02 at 07:31 -0800, Tom Herbert wrote: > As I already mentioned, the probability of skb_get_hash returning the > same value for two flows is 1/2^32. I suspect it's greater probability > to get a hash collision between two IPv6 flows with subtlety different > addresses or simply match 4-tuples in different address spaces (like > VLAN or with VNID). If hash collision is really a problem, we can > periodically rekey skb->hash (either SW or HW) or maybe move to 64 bit > hashes somehow. Anyway, we potentially have the same problem in RSS > and other uses of the hash, if hashes collide then two flows share the > same bucket forever unless there is an action to break that. Main historical use of SFQ is for routers. We do not control skb->hash for forwarding workload, unless you reprogram RSS keys on the NIC. If you remember, we used to have a problem on some NIC using a well known RSS key. It is fortunate SFQ/fq_codel did not rely on skb->hash too much. If to target one flow, attackers need to attack your big boxes to flood one particular RX queue at NIC level, this has a huge cost for them. While targeting one hash bucket on a qdisc might be much easier, if they know even a hash perturbation has no effect against their attack. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2015-03-02 at 08:25 -0800, Tom Herbert wrote: > One other possibility is that with the txhash changes we can rehash > individual connections that are sourced from the host. I was planning > to do this on too many TCP retransmits to try to get a different ECMP > path (type of source routing), but we could also do this periodically > or based on request from qdisc. ooo_okay could be taken into account > to avoid OOO packets. About locally generated traffic, skb->txhash potential uses would be 1) Somewhat spray txhash on the wire (Hmmm...) Hard to do that with regular IPv4/tcp traffic, or quite hacky. (like VLAN spraying...) 2.1) Slave selection on bonding driver 2.2) TX queue selection on multiqueue NIC. In this model (way different than XPS), then skb->ooo_okay is of no use : In both cases, if you change skb->txhash after congestion event, you do not care of reorders anyway. They are going to be generated most probably because of different paths. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 2, 2015 at 11:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > On Mon, 2015-03-02 at 08:25 -0800, Tom Herbert wrote: > >> One other possibility is that with the txhash changes we can rehash >> individual connections that are sourced from the host. I was planning >> to do this on too many TCP retransmits to try to get a different ECMP >> path (type of source routing), but we could also do this periodically >> or based on request from qdisc. ooo_okay could be taken into account >> to avoid OOO packets. > > About locally generated traffic, skb->txhash potential uses would be > > 1) Somewhat spray txhash on the wire (Hmmm...) > Hard to do that with regular IPv4/tcp traffic, or quite hacky. > (like VLAN spraying...) > Spraying would be quite easy by modulating txhash from TCP and using either UDP encapsulation like GUE or IPv6 flow labels. Only problem is that this would constantly be changing rxhash at the receiver for the connection, although flow labels might 'just work' with IPv6 Toeplitz in NICs. > 2.1) Slave selection on bonding driver > 2.2) TX queue selection on multiqueue NIC. > > In this model (way different than XPS), then skb->ooo_okay is of > no use : > > In both cases, if you change skb->txhash after congestion event, > you do not care of reorders anyway. They are going to be generated > most probably because of different paths. > > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2015-03-02 at 11:34 -0800, Tom Herbert wrote: > Spraying would be quite easy by modulating txhash from TCP and using > either UDP encapsulation like GUE or IPv6 flow labels. Only problem > is that this would constantly be changing rxhash at the receiver for > the connection, although flow labels might 'just work' with IPv6 > Toeplitz in NICs. When I said 'regular IPv4/tcp traffic' this did not include any added encapsulation. Most NIC are not able to properly offload TCP (TSO) with such encapsulations. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 2, 2015 at 11:43 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Mon, 2015-03-02 at 11:34 -0800, Tom Herbert wrote: > >> Spraying would be quite easy by modulating txhash from TCP and using >> either UDP encapsulation like GUE or IPv6 flow labels. Only problem >> is that this would constantly be changing rxhash at the receiver for >> the connection, although flow labels might 'just work' with IPv6 >> Toeplitz in NICs. > > When I said 'regular IPv4/tcp traffic' this did not include any added > encapsulation. > > Most NIC are not able to properly offload TCP (TSO) with such > encapsulations. > IPv6 flow labels should work through TSO, they would just be replicated for each segment. You would need switches that can hash based on flow label though, and of course IPv6 deployed in the network :-). > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2015-03-02 at 11:51 -0800, Tom Herbert wrote: > On Mon, Mar 2, 2015 at 11:43 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > On Mon, 2015-03-02 at 11:34 -0800, Tom Herbert wrote: > > > >> Spraying would be quite easy by modulating txhash from TCP and using > >> either UDP encapsulation like GUE or IPv6 flow labels. Only problem > >> is that this would constantly be changing rxhash at the receiver for > >> the connection, although flow labels might 'just work' with IPv6 > >> Toeplitz in NICs. > > > > When I said 'regular IPv4/tcp traffic' this did not include any added > > encapsulation. > > > > Most NIC are not able to properly offload TCP (TSO) with such > > encapsulations. > > > IPv6 flow labels should work through TSO, they would just be > replicated for each segment. You would need switches that can hash > based on flow label though, and of course IPv6 deployed in the network > :-). Yeah, but I explicitly talked about native IPv4 + TCP ;) not IPv6 + TCP, or IPv6 + IPV4 + TCP or some encapsulation... Trend is to generalize random packet spraying, even within a flow. https://www.cs.purdue.edu/homes/kompella/publications/infocom13.pdf This forces us to build a resilient TCP stack. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 2, 2015 at 11:24 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Mon, 2015-03-02 at 07:31 -0800, Tom Herbert wrote: > >> As I already mentioned, the probability of skb_get_hash returning the >> same value for two flows is 1/2^32. I suspect it's greater probability >> to get a hash collision between two IPv6 flows with subtlety different >> addresses or simply match 4-tuples in different address spaces (like >> VLAN or with VNID). If hash collision is really a problem, we can >> periodically rekey skb->hash (either SW or HW) or maybe move to 64 bit >> hashes somehow. Anyway, we potentially have the same problem in RSS >> and other uses of the hash, if hashes collide then two flows share the >> same bucket forever unless there is an action to break that. > > Main historical use of SFQ is for routers. > > We do not control skb->hash for forwarding workload, unless you > reprogram RSS keys on the NIC. > > If you remember, we used to have a problem on some NIC using a well > known RSS key. It is fortunate SFQ/fq_codel did not rely on skb->hash > too much. > Okay, I'll audit the drivers returning skb->hash fro HW to make sure they are properly randomizing the initial hash key. Unless, there's other material objections, I'll formally post these patches. Consolidating the packet hash for different use cases should be more efficient and we'll be able focus efforts on strengthening it over time in one place. As for SW vs. HW hash, it still seems like getting a Toeplitz hash from the hardware is going to be stronger at least for TCP/IPv6 than what we can currently do with jhash as the input is the full 40 bytes of 4-tuple. I must admit I wish Microsoft would have continued to make advancements with Toeplitz hash, it would be nice if there was a real definition for use with UDP and maybe we had instructions for this in x86! Tom > If to target one flow, attackers need to attack your big boxes to flood > one particular RX queue at NIC level, this has a huge cost for them. > > While targeting one hash bucket on a qdisc might be much easier, if they > know even a hash perturbation has no effect against their attack. > > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index b877140..e7ed8a5 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -23,7 +23,6 @@ #include <linux/vmalloc.h> #include <net/netlink.h> #include <net/pkt_sched.h> -#include <net/flow_keys.h> #include <net/red.h> @@ -156,30 +155,10 @@ static inline struct sfq_head *sfq_dep_head(struct sfq_sched_data *q, sfq_index return &q->dep[val - SFQ_MAX_FLOWS]; } -/* - * In order to be able to quickly rehash our queue when timer changes - * q->perturbation, we store flow_keys in skb->cb[] - */ -struct sfq_skb_cb { - struct flow_keys keys; -}; - -static inline struct sfq_skb_cb *sfq_skb_cb(const struct sk_buff *skb) -{ - qdisc_cb_private_validate(skb, sizeof(struct sfq_skb_cb)); - return (struct sfq_skb_cb *)qdisc_skb_cb(skb)->data; -} - static unsigned int sfq_hash(const struct sfq_sched_data *q, - const struct sk_buff *skb) + struct sk_buff *skb) { - const struct flow_keys *keys = &sfq_skb_cb(skb)->keys; - unsigned int hash; - - hash = jhash_3words((__force u32)keys->dst, - (__force u32)keys->src ^ keys->ip_proto, - (__force u32)keys->ports, q->perturbation); - return hash & (q->divisor - 1); + return skb_get_hash_perturb(skb, q->perturbation) & (q->divisor - 1); } static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch, @@ -196,10 +175,8 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch, return TC_H_MIN(skb->priority); fl = rcu_dereference_bh(q->filter_list); - if (!fl) { - skb_flow_dissect(skb, &sfq_skb_cb(skb)->keys); + if (!fl) return sfq_hash(q, skb) + 1; - } *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; result = tc_classify(skb, fl, &res);
Call qdisc_skb_get_hash instead of doing skb_flow_dissect and then jhash by hand. Signed-off-by: Tom Herbert <therbert@google.com> --- net/sched/sch_sfq.c | 29 +++-------------------------- 1 file changed, 3 insertions(+), 26 deletions(-)