Message ID | 6d9ed6c448a5c855e05abf19c205f33a66b6ff40.1552557395.git.sd@queasysnail.net |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | [net] net: enforce xmit_recursion for devices with a queue | expand |
On 03/14/2019 03:15 AM, Sabrina Dubroca wrote: > Commit 745e20f1b626 ("net: add a recursion limit in xmit path") > introduced a recursion limit, but it only applies to devices without a > queue. Virtual devices with a queue (either because they don't have > the IFF_NO_QUEUE flag, or because the administrator added one) can > still cause an unbounded recursion, via __dev_queue_xmit -> > __dev_xmit_skb -> qdisc_run -> __qdisc_run -> qdisc_restart -> > sch_direct_xmit -> dev_hard_start_xmit . Jianlin reported this in a > setup with 16 gretap devices stacked on top of one another. > > This patch prevents the stack overflow by incrementing xmit_recursion in > code paths that can call dev_hard_start_xmit() (like commit 745e20f1b626 > did). If the recursion limit is exceeded, the packet is enqueued and the > qdisc is scheduled. > > Reported-by: Jianlin Shi <jishi@redhat.com> > Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> > Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Hi Sabrina, thanks for the patch. Can't we detect this in the control path instead ? Surely this makes no sense in real world to stack 16 devices like that.
2019-03-14, 05:58:03 -0700, Eric Dumazet wrote: > > > On 03/14/2019 03:15 AM, Sabrina Dubroca wrote: > > Commit 745e20f1b626 ("net: add a recursion limit in xmit path") > > introduced a recursion limit, but it only applies to devices without a > > queue. Virtual devices with a queue (either because they don't have > > the IFF_NO_QUEUE flag, or because the administrator added one) can > > still cause an unbounded recursion, via __dev_queue_xmit -> > > __dev_xmit_skb -> qdisc_run -> __qdisc_run -> qdisc_restart -> > > sch_direct_xmit -> dev_hard_start_xmit . Jianlin reported this in a > > setup with 16 gretap devices stacked on top of one another. > > > > This patch prevents the stack overflow by incrementing xmit_recursion in > > code paths that can call dev_hard_start_xmit() (like commit 745e20f1b626 > > did). If the recursion limit is exceeded, the packet is enqueued and the > > qdisc is scheduled. > > > > Reported-by: Jianlin Shi <jishi@redhat.com> > > Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> > > Reviewed-by: Stefano Brivio <sbrivio@redhat.com> > > Hi Sabrina, thanks for the patch. > > Can't we detect this in the control path instead ? I don't see how. You could have a perfectly reasonable set of gretap devices that trigger this situation from simply reshuffling the IP addresses: gretap$x remote 1.1.$((x-1)).{1,2} (all those addresses set on a single veth device) Then you move those addresses to the corresponding device (1.1.${x}.{1,2} on gretap$x), and your machine crashes.
On 03/14/2019 07:15 AM, Sabrina Dubroca wrote: > 2019-03-14, 05:58:03 -0700, Eric Dumazet wrote: >> >> >> On 03/14/2019 03:15 AM, Sabrina Dubroca wrote: >>> Commit 745e20f1b626 ("net: add a recursion limit in xmit path") >>> introduced a recursion limit, but it only applies to devices without a >>> queue. Virtual devices with a queue (either because they don't have >>> the IFF_NO_QUEUE flag, or because the administrator added one) can >>> still cause an unbounded recursion, via __dev_queue_xmit -> >>> __dev_xmit_skb -> qdisc_run -> __qdisc_run -> qdisc_restart -> >>> sch_direct_xmit -> dev_hard_start_xmit . Jianlin reported this in a >>> setup with 16 gretap devices stacked on top of one another. >>> >>> This patch prevents the stack overflow by incrementing xmit_recursion in >>> code paths that can call dev_hard_start_xmit() (like commit 745e20f1b626 >>> did). If the recursion limit is exceeded, the packet is enqueued and the >>> qdisc is scheduled. >>> >>> Reported-by: Jianlin Shi <jishi@redhat.com> >>> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> >>> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> >> >> Hi Sabrina, thanks for the patch. >> >> Can't we detect this in the control path instead ? > > I don't see how. You could have a perfectly reasonable set of gretap > devices that trigger this situation from simply reshuffling the IP > addresses: > > gretap$x remote 1.1.$((x-1)).{1,2} > (all those addresses set on a single veth device) > > Then you move those addresses to the corresponding device > (1.1.${x}.{1,2} on gretap$x), and your machine crashes. > If this only can be done with gretap, why gretap cant implement the protection, outside of the fast path ? Thanks.
2019-03-14, 07:56:10 -0700, Eric Dumazet wrote: > > > On 03/14/2019 07:15 AM, Sabrina Dubroca wrote: > > 2019-03-14, 05:58:03 -0700, Eric Dumazet wrote: > >> > >> > >> On 03/14/2019 03:15 AM, Sabrina Dubroca wrote: > >>> Commit 745e20f1b626 ("net: add a recursion limit in xmit path") > >>> introduced a recursion limit, but it only applies to devices without a > >>> queue. Virtual devices with a queue (either because they don't have > >>> the IFF_NO_QUEUE flag, or because the administrator added one) can > >>> still cause an unbounded recursion, via __dev_queue_xmit -> > >>> __dev_xmit_skb -> qdisc_run -> __qdisc_run -> qdisc_restart -> > >>> sch_direct_xmit -> dev_hard_start_xmit . Jianlin reported this in a > >>> setup with 16 gretap devices stacked on top of one another. > >>> > >>> This patch prevents the stack overflow by incrementing xmit_recursion in > >>> code paths that can call dev_hard_start_xmit() (like commit 745e20f1b626 > >>> did). If the recursion limit is exceeded, the packet is enqueued and the > >>> qdisc is scheduled. > >>> > >>> Reported-by: Jianlin Shi <jishi@redhat.com> > >>> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> > >>> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> > >> > >> Hi Sabrina, thanks for the patch. > >> > >> Can't we detect this in the control path instead ? > > > > I don't see how. You could have a perfectly reasonable set of gretap > > devices that trigger this situation from simply reshuffling the IP > > addresses: > > > > gretap$x remote 1.1.$((x-1)).{1,2} > > (all those addresses set on a single veth device) > > > > Then you move those addresses to the corresponding device > > (1.1.${x}.{1,2} on gretap$x), and your machine crashes. > > > > If this only can be done with gretap, why gretap cant implement the protection, > outside of the fast path ? It's not just gretap. VXLAN will do the same as long as you add a qdisc. I expect other types of tunnels to behave like that.
On 03/14/2019 10:40 AM, Sabrina Dubroca wrote: > 2019-03-14, 07:56:10 -0700, Eric Dumazet wrote: >> >> >> On 03/14/2019 07:15 AM, Sabrina Dubroca wrote: >>> 2019-03-14, 05:58:03 -0700, Eric Dumazet wrote: >>>> >>>> >>>> On 03/14/2019 03:15 AM, Sabrina Dubroca wrote: >>>>> Commit 745e20f1b626 ("net: add a recursion limit in xmit path") >>>>> introduced a recursion limit, but it only applies to devices without a >>>>> queue. Virtual devices with a queue (either because they don't have >>>>> the IFF_NO_QUEUE flag, or because the administrator added one) can >>>>> still cause an unbounded recursion, via __dev_queue_xmit -> >>>>> __dev_xmit_skb -> qdisc_run -> __qdisc_run -> qdisc_restart -> >>>>> sch_direct_xmit -> dev_hard_start_xmit . Jianlin reported this in a >>>>> setup with 16 gretap devices stacked on top of one another. >>>>> >>>>> This patch prevents the stack overflow by incrementing xmit_recursion in >>>>> code paths that can call dev_hard_start_xmit() (like commit 745e20f1b626 >>>>> did). If the recursion limit is exceeded, the packet is enqueued and the >>>>> qdisc is scheduled. >>>>> >>>>> Reported-by: Jianlin Shi <jishi@redhat.com> >>>>> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> >>>>> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> >>>> >>>> Hi Sabrina, thanks for the patch. >>>> >>>> Can't we detect this in the control path instead ? >>> >>> I don't see how. You could have a perfectly reasonable set of gretap >>> devices that trigger this situation from simply reshuffling the IP >>> addresses: >>> >>> gretap$x remote 1.1.$((x-1)).{1,2} >>> (all those addresses set on a single veth device) >>> >>> Then you move those addresses to the corresponding device >>> (1.1.${x}.{1,2} on gretap$x), and your machine crashes. >>> >> >> If this only can be done with gretap, why gretap cant implement the protection, >> outside of the fast path ? > > It's not just gretap. VXLAN will do the same as long as you add a > qdisc. I expect other types of tunnels to behave like that. > It might make sense to add a helper using dev_queue_xmit() for tunnel users. Then remove the xmit recursion stuff out of the dev_queue_xmit() Lets make the fast path fast again.
2019-03-14, 10:51:49 -0700, Eric Dumazet wrote: > > > On 03/14/2019 10:40 AM, Sabrina Dubroca wrote: > > 2019-03-14, 07:56:10 -0700, Eric Dumazet wrote: > >> > >> > >> On 03/14/2019 07:15 AM, Sabrina Dubroca wrote: > >>> 2019-03-14, 05:58:03 -0700, Eric Dumazet wrote: > >>>> > >>>> > >>>> On 03/14/2019 03:15 AM, Sabrina Dubroca wrote: > >>>>> Commit 745e20f1b626 ("net: add a recursion limit in xmit path") > >>>>> introduced a recursion limit, but it only applies to devices without a > >>>>> queue. Virtual devices with a queue (either because they don't have > >>>>> the IFF_NO_QUEUE flag, or because the administrator added one) can > >>>>> still cause an unbounded recursion, via __dev_queue_xmit -> > >>>>> __dev_xmit_skb -> qdisc_run -> __qdisc_run -> qdisc_restart -> > >>>>> sch_direct_xmit -> dev_hard_start_xmit . Jianlin reported this in a > >>>>> setup with 16 gretap devices stacked on top of one another. > >>>>> > >>>>> This patch prevents the stack overflow by incrementing xmit_recursion in > >>>>> code paths that can call dev_hard_start_xmit() (like commit 745e20f1b626 > >>>>> did). If the recursion limit is exceeded, the packet is enqueued and the > >>>>> qdisc is scheduled. > >>>>> > >>>>> Reported-by: Jianlin Shi <jishi@redhat.com> > >>>>> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> > >>>>> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> > >>>> > >>>> Hi Sabrina, thanks for the patch. > >>>> > >>>> Can't we detect this in the control path instead ? > >>> > >>> I don't see how. You could have a perfectly reasonable set of gretap > >>> devices that trigger this situation from simply reshuffling the IP > >>> addresses: > >>> > >>> gretap$x remote 1.1.$((x-1)).{1,2} > >>> (all those addresses set on a single veth device) > >>> > >>> Then you move those addresses to the corresponding device > >>> (1.1.${x}.{1,2} on gretap$x), and your machine crashes. > >>> > >> > >> If this only can be done with gretap, why gretap cant implement the protection, > >> outside of the fast path ? > > > > It's not just gretap. VXLAN will do the same as long as you add a > > qdisc. I expect other types of tunnels to behave like that. > > > > It might make sense to add a helper using dev_queue_xmit() > for tunnel users. > > Then remove the xmit recursion stuff out of the dev_queue_xmit() > > Lets make the fast path fast again. Ok. I'm traveling next week, so I'll work on this when I get back. Thanks for the comments.
Hi Eric, 2019-03-14, 10:51:49 -0700, Eric Dumazet wrote: > On 03/14/2019 10:40 AM, Sabrina Dubroca wrote: > > 2019-03-14, 07:56:10 -0700, Eric Dumazet wrote: > >> On 03/14/2019 07:15 AM, Sabrina Dubroca wrote: > >>> 2019-03-14, 05:58:03 -0700, Eric Dumazet wrote: > >>>> On 03/14/2019 03:15 AM, Sabrina Dubroca wrote: > >>>>> Commit 745e20f1b626 ("net: add a recursion limit in xmit path") > >>>>> introduced a recursion limit, but it only applies to devices without a > >>>>> queue. Virtual devices with a queue (either because they don't have > >>>>> the IFF_NO_QUEUE flag, or because the administrator added one) can > >>>>> still cause an unbounded recursion, via __dev_queue_xmit -> > >>>>> __dev_xmit_skb -> qdisc_run -> __qdisc_run -> qdisc_restart -> > >>>>> sch_direct_xmit -> dev_hard_start_xmit . Jianlin reported this in a > >>>>> setup with 16 gretap devices stacked on top of one another. > >>>>> > >>>>> This patch prevents the stack overflow by incrementing xmit_recursion in > >>>>> code paths that can call dev_hard_start_xmit() (like commit 745e20f1b626 > >>>>> did). If the recursion limit is exceeded, the packet is enqueued and the > >>>>> qdisc is scheduled. > >>>>> > >>>>> Reported-by: Jianlin Shi <jishi@redhat.com> > >>>>> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> > >>>>> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> > >>>> > >>>> Hi Sabrina, thanks for the patch. > >>>> > >>>> Can't we detect this in the control path instead ? > >>> > >>> I don't see how. You could have a perfectly reasonable set of gretap > >>> devices that trigger this situation from simply reshuffling the IP > >>> addresses: > >>> > >>> gretap$x remote 1.1.$((x-1)).{1,2} > >>> (all those addresses set on a single veth device) > >>> > >>> Then you move those addresses to the corresponding device > >>> (1.1.${x}.{1,2} on gretap$x), and your machine crashes. > >>> > >> > >> If this only can be done with gretap, why gretap cant implement the protection, > >> outside of the fast path ? > > > > It's not just gretap. VXLAN will do the same as long as you add a > > qdisc. I expect other types of tunnels to behave like that. > > > > It might make sense to add a helper using dev_queue_xmit() > for tunnel users. > > Then remove the xmit recursion stuff out of the dev_queue_xmit() > > Lets make the fast path fast again. I've been looking into this. Ignoring devices that link themselves to their upper/lower (which would be easy to handle at setup time), we're left with IP/UDP tunnels (also easy to handle, they all xmit via a few output functions), and then a couple of weird cases: - vti/xfrmi call dst_output - ppp can end up calling ip_local_out, ip_queue_xmit, or ip6_xmit - some devices call dev_queue_xmit. we'd need a recursion-limited wrapper for those, without putting the penalty for users that respect the rules (core, linked devices). I don't see a way to enforce that (ipv6 needs dev_queue_xmit to be EXPORT'ed), and human review is not perfect, so some devices could slip in using the wrong dev_queue_xmit. I'm not convinced this change is worth the risk.
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index a16fbe9a2a67..9384c1749bf3 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -113,6 +113,20 @@ bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q, struct net_device *dev, struct netdev_queue *txq, spinlock_t *root_lock, bool validate); +static inline bool sch_direct_xmit_rec(struct sk_buff *skb, struct Qdisc *q, + struct net_device *dev, + struct netdev_queue *txq, + spinlock_t *root_lock, bool validate) +{ + bool ret; + + __this_cpu_inc(xmit_recursion); + ret = sch_direct_xmit(skb, q, dev, txq, root_lock, validate); + __this_cpu_dec(xmit_recursion); + + return ret; +} + void __qdisc_run(struct Qdisc *q); static inline void qdisc_run(struct Qdisc *q) diff --git a/net/core/dev.c b/net/core/dev.c index 2b67f2aa59dd..74b77a773712 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3493,6 +3493,8 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q, __qdisc_drop(skb, &to_free); rc = NET_XMIT_DROP; } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) && + likely(__this_cpu_read(xmit_recursion) <= + XMIT_RECURSION_LIMIT) && qdisc_run_begin(q)) { /* * This is a work-conserving queue; there are no old skbs @@ -3502,7 +3504,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q, qdisc_bstats_update(q, skb); - if (sch_direct_xmit(skb, q, dev, txq, root_lock, true)) { + if (sch_direct_xmit_rec(skb, q, dev, txq, root_lock, true)) { if (unlikely(contended)) { spin_unlock(&q->busylock); contended = false; diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index a117d9260558..2565ef18d4cc 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -395,6 +395,12 @@ void __qdisc_run(struct Qdisc *q) int quota = dev_tx_weight; int packets; + if (unlikely(__this_cpu_read(xmit_recursion) > XMIT_RECURSION_LIMIT)) { + __netif_schedule(q); + return; + } + + __this_cpu_inc(xmit_recursion); while (qdisc_restart(q, &packets)) { /* * Ordered by possible occurrence: Postpone processing if @@ -407,6 +413,7 @@ void __qdisc_run(struct Qdisc *q) break; } } + __this_cpu_dec(xmit_recursion); } unsigned long dev_trans_start(struct net_device *dev)