| Message ID | 1579058620-26684-1-git-send-email-zhangshaokun@hisilicon.com |
|---|---|
| State | Changes Requested |
| Delegated to | David Miller |
| Series | net: optimize cmpxchg in ip_idents_reserve |
From: Shaokun Zhang <zhangshaokun@hisilicon.com>
Date: Wed, 15 Jan 2020 11:23:40 +0800

> From: Yuqi Jin <jinyuqi@huawei.com>
>
> Call atomic_try_cmpxchg() instead of atomic_cmpxchg(), which can reduce
> the number of accesses to the global variable @p_id in the loop. Let's
> optimize it for performance.
>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Yang Guo <guoyang2@huawei.com>
> Signed-off-by: Yuqi Jin <jinyuqi@huawei.com>
> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>

I doubt this makes any measurable improvement in performance.

If you can document a specific measurable improvement under a useful set
of circumstances for real usage, then put those details into the commit
message and resubmit.

Otherwise, I'm not applying this, sorry.
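[For readers unfamiliar with the pattern under discussion, here is a minimal standalone sketch of the two loop shapes in portable C11 -- an illustrative userspace model, not the kernel code; the helper and function names are made up. C11's atomic_compare_exchange_strong() writes the observed value back into its "expected" argument on failure, which matches the semantics of the kernel's atomic_try_cmpxchg().]

	#include <stdatomic.h>
	#include <stdio.h>

	static _Atomic unsigned int id;

	/* Models kernel cmpxchg(): returns the value observed at *p. */
	static unsigned int cmpxchg_u32(_Atomic unsigned int *p,
					unsigned int old, unsigned int new)
	{
		unsigned int seen = old;

		atomic_compare_exchange_strong(p, &seen, new);
		return seen; /* equals 'old' iff the swap succeeded */
	}

	/* Loop shape before the patch: one explicit load per retry. */
	static unsigned int reserve_cmpxchg(unsigned int delta, unsigned int segs)
	{
		unsigned int old, new;

		do {
			old = atomic_load(&id);
			new = old + delta + segs;
		} while (cmpxchg_u32(&id, old, new) != old);
		return new - segs;
	}

	/* Loop shape after the patch: a failed CAS refreshes 'old' itself. */
	static unsigned int reserve_try_cmpxchg(unsigned int delta, unsigned int segs)
	{
		unsigned int old = atomic_load(&id), new;

		do {
			new = old + delta + segs;
		} while (!atomic_compare_exchange_strong(&id, &old, new));
		return new - segs;
	}

	int main(void)
	{
		printf("%u\n", reserve_cmpxchg(3, 1));     /* prints 3 */
		printf("%u\n", reserve_try_cmpxchg(3, 1)); /* prints 7 */
		return 0;
	}

[The claimed saving: the failure path of a try_cmpxchg()-style loop already knows the current value, because the CAS returns it, so each retry avoids a second read of the contended cache line.]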
Hi David,

On 2020/1/16 20:27, David Miller wrote:
> From: Shaokun Zhang <zhangshaokun@hisilicon.com>
> Date: Wed, 15 Jan 2020 11:23:40 +0800
>
> [...]
>
> I doubt this makes any measurable improvement in performance.
>
> If you can document a specific measurable improvement under a useful
> set of circumstances for real usage, then put those details into the
> commit message and resubmit.

Ok, I will do it and resubmit.

Thanks for your reply,
Shaokun

> Otherwise, I'm not applying this, sorry.
On 1/16/20 4:27 AM, David Miller wrote:
> From: Shaokun Zhang <zhangshaokun@hisilicon.com>
> Date: Wed, 15 Jan 2020 11:23:40 +0800
>
> [...]
>
> Otherwise, I'm not applying this, sorry.

Real difference that could be made here is to only use this cmpxchg()
dance for CONFIG_UBSAN.

When CONFIG_UBSAN is not set, atomic_add_return() is just fine.

(Supposedly UBSAN should not warn about that either, but this depends on
compiler version.)
On 1/16/20 7:12 AM, Eric Dumazet wrote:
> Real difference that could be made here is to only use this cmpxchg()
> dance for CONFIG_UBSAN.
>
> When CONFIG_UBSAN is not set, atomic_add_return() is just fine.
>
> (Supposedly UBSAN should not warn about that either, but this depends
> on compiler version.)

I will test something like:

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2010888e68ca96ae880481973a6d808d6c5612c5..e2fa972f5c78f2aefc801db6a45b2a81141c3028 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -495,11 +495,15 @@ u32 ip_idents_reserve(u32 hash, int segs)
 	if (old != now && cmpxchg(p_tstamp, old, now) == old)
 		delta = prandom_u32_max(now - old);
 
-	/* Do not use atomic_add_return() as it makes UBSAN unhappy */
+#ifdef CONFIG_UBSAN
+	/* Do not use atomic_add_return() as it makes old UBSAN versions unhappy */
 	do {
 		old = (u32)atomic_read(p_id);
 		new = old + delta + segs;
 	} while (atomic_cmpxchg(p_id, old, new) != old);
+#else
+	new = atomic_add_return(segs + delta, p_id);
+#endif
 
 	return new - segs;
 }
+Cc Will Deacon,

On 2020/1/16 23:19, Eric Dumazet wrote:
> [...]
> I will test something like:
>
> -	/* Do not use atomic_add_return() as it makes UBSAN unhappy */
> +#ifdef CONFIG_UBSAN

I appreciate Eric's idea.

> +	/* Do not use atomic_add_return() as it makes old UBSAN versions unhappy */
> 	do {
> 		old = (u32)atomic_read(p_id);
> 		new = old + delta + segs;
> 	} while (atomic_cmpxchg(p_id, old, new) != old);

But about this, I still think we can try to use atomic_try_cmpxchg
instead of atomic_cmpxchg, like refcount_add_not_zero in
include/linux/refcount.h.

I abstracted the model as follows:

	while (get_cycles() <= (time_temp + t_cnt)) {
		do {
			old_temp = wk_in->num.counter;
			oldt = atomic64_cmpxchg(&wk_in->num, old_temp, old_temp + 1);
		} while (oldt != old_temp);
		myndelay(delay_o);
		w_cnt += 1;
	}

	val = atomic64_read(&wk_in->num);
	while (get_cycles() <= (time_temp + t_cnt)) {
		do {
			new_val = val + 1;
		} while (!atomic64_try_cmpxchg(&wk_in->num, &val, new_val));
		myndelay(delay_o);
		w_cnt += 1;
	}

If we take the access to the global variable out of the loop, the
performance becomes better on both x86 and arm64, as follows
(successful operations per second):

	Kunpeng920, 24 cores calling it:
	atomic64_read in loop | atomic64_read out of the loop
	              6291034 | 8751962

	Intel 6248, 20 cores calling it:
	atomic64_read in loop | atomic64_read out of the loop
	              8934792 | 9978998

So how about this? ;-)
	delta = prandom_u32_max(now - old);

+#ifdef CONFIG_UBSAN
 	/* Do not use atomic_add_return() as it makes UBSAN unhappy */
+	old = (u32)atomic_read(p_id);
 	do {
-		old = (u32)atomic_read(p_id);
 		new = old + delta + segs;
-	} while (atomic_cmpxchg(p_id, old, new) != old);
+	} while (!atomic_try_cmpxchg(p_id, &old, new));
+#else
+	new = atomic_add_return(segs + delta, p_id);
+#endif

Thanks,
Shaokun

> +#else
> +	new = atomic_add_return(segs + delta, p_id);
> +#endif
>
> 	return new - segs;
> }
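[A rough userspace approximation of the microbenchmark described above can be built with pthreads and C11 atomics. This is a sketch under stated assumptions: the kernel test's get_cycles()/myndelay() pacing is replaced by a plain one-second wall-clock budget, and the thread count is arbitrary. Build with `cc -O2 -pthread`.]

	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdio.h>
	#include <time.h>

	#define NTHREADS 8

	static _Atomic unsigned long counter;
	static unsigned long ops[NTHREADS];

	static long now_ns(void)
	{
		struct timespec ts;

		clock_gettime(CLOCK_MONOTONIC, &ts);
		return ts.tv_sec * 1000000000L + ts.tv_nsec;
	}

	static void *worker(void *arg)
	{
		long id = (long)arg, deadline = now_ns() + 1000000000L;
		unsigned long old, new, done = 0;

		/* try_cmpxchg shape: read once, let failed CASes refresh 'old' */
		old = atomic_load(&counter);
		while (now_ns() < deadline) {
			do {
				new = old + 1;
			} while (!atomic_compare_exchange_weak(&counter, &old, new));
			old = new; /* best local guess for the next round */
			done++;
		}
		ops[id] = done;
		return NULL;
	}

	int main(void)
	{
		pthread_t t[NTHREADS];
		unsigned long total = 0;

		for (long i = 0; i < NTHREADS; i++)
			pthread_create(&t[i], NULL, worker, (void *)i);
		for (long i = 0; i < NTHREADS; i++) {
			pthread_join(t[i], NULL);
			total += ops[i];
		}
		printf("successful increments in ~1s: %lu\n", total);
		return 0;
	}

[Moving the atomic_load() to the top of the inner do/while reproduces the "read in loop" variant, which is the comparison the reported numbers are based on.]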
On Fri, Jan 17, 2020 at 02:54:03PM +0800, Shaokun Zhang wrote:
> So how about this? ;-)
>
> 	delta = prandom_u32_max(now - old);
>
> +#ifdef CONFIG_UBSAN
> 	/* Do not use atomic_add_return() as it makes UBSAN unhappy */
> +	old = (u32)atomic_read(p_id);
> 	do {
> -		old = (u32)atomic_read(p_id);
> 		new = old + delta + segs;
> -	} while (atomic_cmpxchg(p_id, old, new) != old);
> +	} while (!atomic_try_cmpxchg(p_id, &old, new));
> +#else
> +	new = atomic_add_return(segs + delta, p_id);
> +#endif

That's crazy, just accept that UBSAN is taking bonghits and ignore it.
Use atomic_add_return() unconditionally.
On 1/17/20 4:32 AM, Peter Zijlstra wrote:
> That's crazy, just accept that UBSAN is taking bonghits and ignore it.
> Use atomic_add_return() unconditionally.

Yes, we might simply add a comment so that people do not bug us if their
compiler is too old.

	/* If UBSAN reports an error there, please make sure your compiler
	 * supports -fno-strict-overflow before reporting it.
	 */
	return atomic_add_return(segs + delta, p_id) - segs;
On Fri, Jan 17, 2020 at 08:35:07AM -0800, Eric Dumazet wrote:
> Yes, we might simply add a comment so that people do not bug us if
> their compiler is too old.
>
> 	/* If UBSAN reports an error there, please make sure your compiler
> 	 * supports -fno-strict-overflow before reporting it.
> 	 */
> 	return atomic_add_return(segs + delta, p_id) - segs;

Do we need that comment any more? The flag was apparently introduced in
gcc-4.2 and we only support 4.6+ anyway?
On 1/17/20 10:03 AM, Arvind Sankar wrote:
> [...]
> Do we need that comment any more? The flag was apparently introduced in
> gcc-4.2 and we only support 4.6+ anyway?

Wasn't it the case back in 2016 already for linux-4.8?

What will prevent someone from sending another report to netdev/lkml?

-fno-strict-overflow support is not a prereq for CONFIG_UBSAN.

The fact that we kept code for test_ubsan_add_overflow() and
test_ubsan_sub_overflow() in lib/ubsan.c and lib/test_ubsan.c is
disturbing.

commit adb03115f4590baa280ddc440a8eff08a6be0cb7
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Sep 20 18:06:17 2016 -0700

    net: get rid of an signed integer overflow in ip_idents_reserve()

    Jiri Pirko reported an UBSAN warning happening in ip_idents_reserve()

    [] UBSAN: Undefined behaviour in ./arch/x86/include/asm/atomic.h:156:11
    [] signed integer overflow:
    [] -2117905507 + -695755206 cannot be represented in type 'int'

    Since we do not have uatomic_add_return() yet, use atomic_cmpxchg()
    so that the arithmetics can be done using unsigned int.

    Fixes: 04ca6973f7c1 ("ip: make IP identifiers less predictable")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: Jiri Pirko <jiri@resnulli.us>
    Signed-off-by: David S. Miller <davem@davemloft.net>
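[The undefined behaviour in that splat is easy to reproduce standalone. A minimal sketch using the values from the quoted report; the assumption is that the code is built without -fwrapv/-fno-strict-overflow, in which case -fsanitize=undefined should emit essentially the same diagnostic for the signed addition.]

	#include <stdio.h>

	int main(void)
	{
		int a = -2117905507, b = -695755206;
		unsigned int ua = (unsigned int)a, ub = (unsigned int)b;

		/* Signed addition overflows below INT_MIN: undefined behaviour,
		 * which is what UBSAN flagged inside atomic_add_return().
		 */
		printf("%d\n", a + b);

		/* The same bits added as unsigned wrap modulo 2^32 with fully
		 * defined semantics -- hence the 2016 cmpxchg() rewrite doing
		 * the arithmetic in u32.
		 */
		printf("%u\n", ua + ub);
		return 0;
	}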
On Fri, Jan 17, 2020 at 10:16:45AM -0800, Eric Dumazet wrote:
> Wasn't it the case back in 2016 already for linux-4.8?
>
> What will prevent someone from sending another report to netdev/lkml?
>
> -fno-strict-overflow support is not a prereq for CONFIG_UBSAN.
>
> The fact that we kept code for test_ubsan_add_overflow() and
> test_ubsan_sub_overflow() in lib/ubsan.c and lib/test_ubsan.c is
> disturbing.

No, it was bumped in 2018 in commit cafa0010cd51 ("Raise the minimum
required gcc version to 4.6"). That raised it from 3.2 -> 4.6.
On 1/17/20 10:38 AM, Arvind Sankar wrote:
> [...]
> No, it was bumped in 2018 in commit cafa0010cd51 ("Raise the minimum
> required gcc version to 4.6"). That raised it from 3.2 -> 4.6.

This seems good to me, for gcc at least.

Maybe it is time to enforce -fno-strict-overflow in KBUILD_CFLAGS
instead of making it conditional.

Thanks.
Hi Peter,

On 2020/1/17 20:32, Peter Zijlstra wrote:
> On Fri, Jan 17, 2020 at 02:54:03PM +0800, Shaokun Zhang wrote:
>> So how about this? ;-)
>> [...]
>
> That's crazy, just accept that UBSAN is taking bonghits and ignore it.
> Use atomic_add_return() unconditionally.

We have used atomic_add_return [1], but it made UBSAN unhappy, as the
comment notes. It seems that Eric also agreed to do it if some comments
are added. I will do it later.

Thanks,
Shaokun

[1] https://lkml.org/lkml/2019/7/26/217
On Sat, Jan 18, 2020 at 7:47 PM Shaokun Zhang
<zhangshaokun@hisilicon.com> wrote:
> We have used atomic_add_return [1], but it made UBSAN unhappy, as the
> comment notes. It seems that Eric also agreed to do it if some
> comments are added. I will do it later.
>
> [1] https://lkml.org/lkml/2019/7/26/217

In case you have missed it, we needed a proper analysis. My feedback was
quite simple:

<quote>
Have you first checked that current UBSAN versions will not complain
anymore?
</quote>

You never did this work, never replied to my question, and months later
you come back with a convoluted patch while we can simply proceed with a
revert now that we are sure that linux kernels are compiled with the
proper option.

As mentioned yesterday, no need for a comment. Instead the changelog
should be explaining why the revert is now safe.
On Fri, Jan 17, 2020 at 10:48:19AM -0800, Eric Dumazet wrote:
> This seems good to me, for gcc at least.
>
> Maybe it is time to enforce -fno-strict-overflow in KBUILD_CFLAGS
> instead of making it conditional.

IIRC there was a bug in UBSAN vs -fwrapv/-fno-strict-overflow that was
only fixed in gcc-8 or 9 or so.

So while the -fwrapv/-fno-strict-overflow flag has been correctly
supported since like forever, UBSAN was buggy until quite recently when
used in conjunction with that flag.
Hi Eric,

On 2020/1/19 12:12, Eric Dumazet wrote:
> In case you have missed it, we needed a proper analysis. My feedback
> was quite simple:
>
> <quote>
> Have you first checked that current UBSAN versions will not complain
> anymore?
> </quote>
>
> You never did this work, never replied to my question, and months

Yeah, I was not sure how to deal with the UBSAN issue, and you said that
some of you would work on this.

> later you come back with a convoluted patch while we can simply
> proceed with a revert

After several months, we thought that we could do it like
refcount_add_not_zero, so we submitted this patch.

> now that we are sure that linux kernels are compiled with the proper
> option.
>
> As mentioned yesterday, no need for a comment. Instead the changelog
> should be explaining why the revert is now safe.

Ok, we really need to consider this.

Thanks,
Shaokun
On Sat, Jan 18, 2020 at 08:12:48PM -0800, Eric Dumazet wrote:
> Instead the changelog should be explaining why the revert is now safe.
FWIW, it was always safe, UBSAN was just buggy.
Hi Peter/Eric,

Shall we use atomic_add_return() unconditionally and add some comments?
Or have I missed something?

Thanks,
Shaokun

On 2020/1/20 16:18, Peter Zijlstra wrote:
> [...]
> IIRC there was a bug in UBSAN vs -fwrapv/-fno-strict-overflow that was
> only fixed in gcc-8 or 9 or so.
>
> So while the -fwrapv/-fno-strict-overflow flag has been correctly
> supported since like forever, UBSAN was buggy until quite recently
> when used in conjunction with that flag.
On Thu, May 7, 2020 at 2:12 AM Shaokun Zhang
<zhangshaokun@hisilicon.com> wrote:
> Hi Peter/Eric,
>
> Shall we use atomic_add_return() unconditionally and add some
> comments? Or have I missed something?

Yes. A big fat comment, because I do not want yet another bogus
complaint from someone playing with a buggy UBSAN.
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 87e979f2b74a..7e28c7121c20 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -496,10 +496,10 @@ u32 ip_idents_reserve(u32 hash, int segs)
 		delta = prandom_u32_max(now - old);
 
 	/* Do not use atomic_add_return() as it makes UBSAN unhappy */
+	old = (u32)atomic_read(p_id);
 	do {
-		old = (u32)atomic_read(p_id);
 		new = old + delta + segs;
-	} while (atomic_cmpxchg(p_id, old, new) != old);
+	} while (!atomic_try_cmpxchg(p_id, &old, new));
 
 	return new - segs;
 }
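[For reference, this is how the loop reads with the hunk applied; the surrounding lines are reconstructed from the context shown in the earlier diffs, with the elided parts marked. On failure, atomic_try_cmpxchg() stores the value it observed back into 'old', so the retry path never re-reads @p_id explicitly.]

	u32 ip_idents_reserve(u32 hash, int segs)
	{
		/* ... bucket lookup and delta computation elided; only the
		 * lines visible in the diffs above are reproduced here ...
		 */
		if (old != now && cmpxchg(p_tstamp, old, now) == old)
			delta = prandom_u32_max(now - old);

		/* Do not use atomic_add_return() as it makes UBSAN unhappy */
		old = (u32)atomic_read(p_id);
		do {
			new = old + delta + segs;
		} while (!atomic_try_cmpxchg(p_id, &old, new));

		return new - segs;
	}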