Message ID | 07707a797d6c3cd0bfe86f037d3d1eb329acbc86.1424099973.git.jslaby@suse.cz
---|---
State | Not Applicable, archived |
Delegated to: | David Miller |
> +static inline u32 from64to32(u64 x)
> +{
> +	/* add up 32-bit and 32-bit for 32+c bit */
> +	x = (x & 0xffffffff) + (x >> 32);
> +	/* add up carry.. */
> +	x = (x & 0xffffffff) + (x >> 32);
> +	return (u32)x;
> +}

As a matter of interest, does the compiler optimise away the
second (x & 0xffffffff) ?

The code could just be:
	x = (x & 0xffffffff) + (x >> 32);
	return x + (x >> 32);

	David
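David's point can be made concrete: after the first fold, x fits in 33 bits, so the second mask only strips a carry bit that the u32 return value would truncate anyway. Here is a minimal userspace check, not from the thread, with uint32_t/uint64_t standing in for the kernel's u32/u64:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Patch version: masks explicitly before the return. */
static uint32_t fold_masked(uint64_t x)
{
	x = (x & 0xffffffff) + (x >> 32);	/* result fits in 33 bits */
	x = (x & 0xffffffff) + (x >> 32);	/* fold the carry; fits in 32 */
	return (uint32_t)x;
}

/* David's variant: the u32 return truncates, making the mask redundant. */
static uint32_t fold_truncated(uint64_t x)
{
	x = (x & 0xffffffff) + (x >> 32);
	return (uint32_t)(x + (x >> 32));
}

int main(void)
{
	/* Boundary values, including the all-ones worst case. */
	uint64_t v[] = { 0, 1, 0xffffffffULL, 0x100000000ULL,
			 0x1ffffffffULL, 0xffffffffffffffffULL };
	for (unsigned i = 0; i < sizeof(v) / sizeof(v[0]); i++)
		assert(fold_masked(v[i]) == fold_truncated(v[i]));
	puts("folds agree");
	return 0;
}
```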
On Tue, Feb 17, 2015 at 12:04:22PM +0000, David Laight wrote:
> > +static inline u32 from64to32(u64 x)
> > +{
> > +	/* add up 32-bit and 32-bit for 32+c bit */
> > +	x = (x & 0xffffffff) + (x >> 32);
> > +	/* add up carry.. */
> > +	x = (x & 0xffffffff) + (x >> 32);
> > +	return (u32)x;
> > +}
>
> As a matter of interest, does the compiler optimise away the
> second (x & 0xffffffff) ?
> The code could just be:
> 	x = (x & 0xffffffff) + (x >> 32);
> 	return x + (x >> 32);
>

On my side, from what I've seen so far, your version results in better
assembly, esp. with clang, but my first version
http://article.gmane.org/gmane.linux.kernel/1875407:
	x += (x << 32) + (x >> 32);
	return (__force __wsum)(x >> 32);
resulted in even better assembly, I just verified with gcc/clang,
x86_64/ARM and -O1,2,3.

Karl
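Why Karl's first version is correct: writing x = H*2^32 + L, the single addition x + (x << 32) + (x >> 32) gives (H + L)*2^32 + (H + L) mod 2^64, and the carry out of the low word propagates into the high word, so the high word holds the complete end-around-carry fold in one add. A sketch checking this against the two-round fold (not from the thread, userspace types):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Reference: the patch's two explicit mask-and-add rounds. */
static uint32_t fold_ref(uint64_t x)
{
	x = (x & 0xffffffff) + (x >> 32);
	x = (x & 0xffffffff) + (x >> 32);
	return (uint32_t)x;
}

/*
 * Karl's variant: the low-word sum L + H generates exactly the carry
 * that the end-around fold needs, and the addition pushes it into the
 * high word, so x >> 32 is the folded checksum.
 */
static uint32_t fold_high(uint64_t x)
{
	x += (x << 32) + (x >> 32);
	return (uint32_t)(x >> 32);
}

int main(void)
{
	uint64_t v[] = { 0, 1, 0xffffffffULL, 0x100000000ULL,
			 0x1ffffffffULL, 0xffffffffffffffffULL };
	for (unsigned i = 0; i < sizeof(v) / sizeof(v[0]); i++)
		assert(fold_ref(v[i]) == fold_high(v[i]));
	puts("folds agree");
	return 0;
}
```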
From: Karl Beldan
> On Tue, Feb 17, 2015 at 12:04:22PM +0000, David Laight wrote:
> > > +static inline u32 from64to32(u64 x)
> > > +{
> > > +	/* add up 32-bit and 32-bit for 32+c bit */
> > > +	x = (x & 0xffffffff) + (x >> 32);
> > > +	/* add up carry.. */
> > > +	x = (x & 0xffffffff) + (x >> 32);
> > > +	return (u32)x;
> > > +}
> >
> > As a matter of interest, does the compiler optimise away the
> > second (x & 0xffffffff) ?
> > The code could just be:
> > 	x = (x & 0xffffffff) + (x >> 32);
> > 	return x + (x >> 32);
> >
>
> On my side, from what I've seen so far, your version results in better
> assembly, esp. with clang, but my first version
> http://article.gmane.org/gmane.linux.kernel/1875407:
> 	x += (x << 32) + (x >> 32);
> 	return (__force __wsum)(x >> 32);
> resulted in even better assembly, I just verified with gcc/clang,
> x86_64/ARM and -O1,2,3.

The latter looks to have a shorter dependency chain as well.
Although I'd definitely include a comment saying that it is equivalent
to the two lines in the current patch.

Does either compiler manage to use a rotate for the two shifts?
Using '(x << 32) | (x >> 32)' might convince it to do so.
That would reduce it to three 'real' instructions and a register rename.

	David
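The rotate form David suggests is safe here because (x << 32) and (x >> 32) occupy disjoint bit ranges, so '|' and '+' compute the same value: a 64-bit word with its halves swapped, which a compiler may recognise as a single rotate instruction (Karl confirms this for x86_64 below). A small sketch of the variant, not from the thread:

```c
#include <assert.h>
#include <stdint.h>

static uint64_t swap_halves(uint64_t x)
{
	/* Disjoint bit ranges: '|' and '+' yield the same rotation. */
	return (x << 32) | (x >> 32);
}

static uint32_t fold_rot(uint64_t x)
{
	x += swap_halves(x);	/* low-word carry lands in the high word */
	return (uint32_t)(x >> 32);
}

int main(void)
{
	assert(fold_rot(0x1ffffffffULL) == 1);	/* end-around carry */
	assert(fold_rot(0xffffffffffffffffULL) == 0xffffffff);
	return 0;
}
```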
On Wed, Feb 18, 2015 at 09:40:23AM +0000, David Laight wrote:
> From: Karl Beldan
> > On Tue, Feb 17, 2015 at 12:04:22PM +0000, David Laight wrote:
> > > > +static inline u32 from64to32(u64 x)
> > > > +{
> > > > +	/* add up 32-bit and 32-bit for 32+c bit */
> > > > +	x = (x & 0xffffffff) + (x >> 32);
> > > > +	/* add up carry.. */
> > > > +	x = (x & 0xffffffff) + (x >> 32);
> > > > +	return (u32)x;
> > > > +}
> > >
> > > As a matter of interest, does the compiler optimise away the
> > > second (x & 0xffffffff) ?
> > > The code could just be:
> > > 	x = (x & 0xffffffff) + (x >> 32);
> > > 	return x + (x >> 32);
> > >
> >
> > On my side, from what I've seen so far, your version results in better
> > assembly, esp. with clang, but my first version
> > http://article.gmane.org/gmane.linux.kernel/1875407:
> > 	x += (x << 32) + (x >> 32);
> > 	return (__force __wsum)(x >> 32);
> > resulted in even better assembly, I just verified with gcc/clang,
> > x86_64/ARM and -O1,2,3.
>
> The latter looks to have a shorter dependency chain as well.
> Although I'd definitely include a comment saying that it is equivalent
> to the two lines in the current patch.
>
> Does either compiler manage to use a rotate for the two shifts?
> Using '(x << 32) | (x >> 32)' might convince it to do so.
> That would reduce it to three 'real' instructions and a register rename.
>

gcc and clang rotate for tile (just checked gcc) and x86_64, not for arm
(and IMHO rightly so). Both '|' and '+' yielded the same asm for those
3 archs.

Karl
From: karl beldan <karl.beldan@gmail.com>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 150ae0e94634714b23919f0c333fee28a5b199d5 upstream.

The carry from the 64->32bits folding was dropped, e.g with:
saddr=0xFFFFFFFF daddr=0xFF0000FF len=0xFFFF proto=0 sum=1,
csum_tcpudp_nofold returned 0 instead of 1.

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 lib/checksum.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/checksum.c b/lib/checksum.c
index 129775eb6de6..fcf38943132c 100644
--- a/lib/checksum.c
+++ b/lib/checksum.c
@@ -47,6 +47,15 @@ static inline unsigned short from32to16(unsigned int x)
 	return x;
 }
 
+static inline u32 from64to32(u64 x)
+{
+	/* add up 32-bit and 32-bit for 32+c bit */
+	x = (x & 0xffffffff) + (x >> 32);
+	/* add up carry.. */
+	x = (x & 0xffffffff) + (x >> 32);
+	return (u32)x;
+}
+
 static unsigned int do_csum(const unsigned char *buff, int len)
 {
 	int odd;
@@ -195,8 +204,7 @@ __wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr,
 #else
 	s += (proto + len) << 8;
 #endif
-	s += (s >> 32);
-	return (__force __wsum)s;
+	return (__force __wsum)from64to32(s);
 }
 EXPORT_SYMBOL(csum_tcpudp_nofold);
 #endif
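To see the bug with the commit message's example values: on the little-endian branch the accumulator reaches 0x1FFFFFFFF, and the old single s += (s >> 32) loses the resulting carry when truncating to 32 bits. A userspace reproduction, not part of the patch, with plain integer types standing in for the kernel's __be32/__wsum:

```c
#include <stdint.h>
#include <stdio.h>

static uint32_t from64to32(uint64_t x)
{
	x = (x & 0xffffffff) + (x >> 32);
	x = (x & 0xffffffff) + (x >> 32);
	return (uint32_t)x;
}

/* Old code: one fold; the carry of the final addition is truncated away. */
static uint32_t nofold_buggy(uint32_t saddr, uint32_t daddr,
			     uint16_t len, uint16_t proto, uint32_t sum)
{
	uint64_t s = sum;

	s += saddr;
	s += daddr;
	s += (uint32_t)(proto + len) << 8;	/* little-endian branch */
	s += (s >> 32);
	return (uint32_t)s;
}

/* Patched code: full 64->32 fold keeps the carry. */
static uint32_t nofold_fixed(uint32_t saddr, uint32_t daddr,
			     uint16_t len, uint16_t proto, uint32_t sum)
{
	uint64_t s = sum;

	s += saddr;
	s += daddr;
	s += (uint32_t)(proto + len) << 8;
	return from64to32(s);
}

int main(void)
{
	/* The commit's example: s accumulates to 0x1FFFFFFFF. */
	printf("buggy: %u, fixed: %u\n",
	       nofold_buggy(0xFFFFFFFF, 0xFF0000FF, 0xFFFF, 0, 1),
	       nofold_fixed(0xFFFFFFFF, 0xFF0000FF, 0xFFFF, 0, 1));
	/* prints "buggy: 0, fixed: 1" */
	return 0;
}
```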