[3.12,065/122] lib/checksum.c: fix carry in csum_tcpudp_nofold

Message ID 07707a797d6c3cd0bfe86f037d3d1eb329acbc86.1424099973.git.jslaby@suse.cz
State Not Applicable, archived
Delegated to: David Miller

Commit Message

Jiri Slaby Feb. 17, 2015, 11:34 a.m. UTC
From: karl beldan <karl.beldan@gmail.com>

3.12-stable review patch.  If anyone has any objections, please let me know.

Comments

David Laight Feb. 17, 2015, 12:04 p.m. UTC | #1
> +static inline u32 from64to32(u64 x)
> +{
> +	/* add up 32-bit and 32-bit for 32+c bit */
> +	x = (x & 0xffffffff) + (x >> 32);
> +	/* add up carry.. */
> +	x = (x & 0xffffffff) + (x >> 32);
> +	return (u32)x;
> +}

As a matter of interest, does the compiler optimise away the
second (x & 0xffffffff) ?
The code could just be:
	x = (x & 0xffffffff) + (x >> 32);
	return x + (x >> 32);

	David
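
For illustration, a minimal userspace sketch of the two formulations (the names and plain integer types below are only for this example, not part of the patch). Dropping the second mask is safe because the first fold leaves at most a 33-bit value, so the added carry cannot overflow 32 bits and the truncation on return does the masking:

#include <stdint.h>

/* Patch version: fold the high word in twice so the carry produced by
 * the first fold is added back before truncating. */
static inline uint32_t from64to32(uint64_t x)
{
	x = (x & 0xffffffff) + (x >> 32);
	x = (x & 0xffffffff) + (x >> 32);
	return (uint32_t)x;
}

/* Shorter variant from the comment above: after the first fold x fits in
 * 33 bits, so (x >> 32) is 0 or 1, and when it is 1 the low word is at
 * most 0xfffffffe, so the final add cannot carry; the implicit
 * conversion to the 32-bit return type replaces the second mask. */
static inline uint32_t from64to32_short(uint64_t x)
{
	x = (x & 0xffffffff) + (x >> 32);
	return x + (x >> 32);
}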

	
Karl Beldan Feb. 17, 2015, 7:57 p.m. UTC | #2
On Tue, Feb 17, 2015 at 12:04:22PM +0000, David Laight wrote:
> > +static inline u32 from64to32(u64 x)
> > +{
> > +	/* add up 32-bit and 32-bit for 32+c bit */
> > +	x = (x & 0xffffffff) + (x >> 32);
> > +	/* add up carry.. */
> > +	x = (x & 0xffffffff) + (x >> 32);
> > +	return (u32)x;
> > +}
> 
> As a matter of interest, does the compiler optimise away the
> second (x & 0xffffffff) ?
> The code could just be:
> 	x = (x & 0xffffffff) + (x >> 32);
> 	return x + (x >> 32);
> 
On my side, from what I've seen so far, your version results in better
assembly, esp. with clang, but my first version
http://article.gmane.org/gmane.linux.kernel/1875407:
        x += (x << 32) + (x >> 32);
	return (__force __wsum)(x >> 32);
resulted in even better assembly, I just verified with gcc/clang,
x86_64/ARM and -O1,2,3.
 
Karl
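
A sketch of that earlier variant with the kernel annotations (__force, __wsum) dropped for a plain userspace example: writing x as hi:lo, the single add leaves ((hi + lo) & 0xffffffff) plus the carry of (hi + lo) in the upper 32 bits, which is the same value the two-step fold computes, so one final shift extracts the result.

#include <stdint.h>

/* One-add fold: bits 63..32 of the result hold the folded sum with its
 * end-around carry already added; the carry cannot overflow the upper
 * half because hi + lo is at most 0x1fffffffe. */
static inline uint32_t from64to32_oneadd(uint64_t x)
{
	x += (x << 32) + (x >> 32);
	return (uint32_t)(x >> 32);
}
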
David Laight Feb. 18, 2015, 9:40 a.m. UTC | #3
From: Karl Beldan
> On Tue, Feb 17, 2015 at 12:04:22PM +0000, David Laight wrote:
> > > +static inline u32 from64to32(u64 x)
> > > +{
> > > +	/* add up 32-bit and 32-bit for 32+c bit */
> > > +	x = (x & 0xffffffff) + (x >> 32);
> > > +	/* add up carry.. */
> > > +	x = (x & 0xffffffff) + (x >> 32);
> > > +	return (u32)x;
> > > +}
> >
> > As a matter of interest, does the compiler optimise away the
> > second (x & 0xffffffff) ?
> > The code could just be:
> > 	x = (x & 0xffffffff) + (x >> 32);
> > 	return x + (x >> 32);
> >
>
> On my side, from what I've seen so far, your version results in better
> assembly, esp. with clang, but my first version
> http://article.gmane.org/gmane.linux.kernel/1875407:
>         x += (x << 32) + (x >> 32);
> 	return (__force __wsum)(x >> 32);
> resulted in even better assembly, I just verified with gcc/clang,
> x86_64/ARM and -O1,2,3.

The latter looks to have a shorter dependency chain as well.
Although I'd definitely include a comment saying that it is equivalent
to the two lines in the current patch.

Does either compiler manage to use a rotate for the two shifts?
Using '(x << 32) | (x >> 32)' might convince it to do so.
That would reduce it to three 'real' instructions and a register rename.

	David
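
A sketch of that rotate form (again only a userspace example): the two shifted halves occupy disjoint bits, so '|' and '+' produce the same value, and the OR spelling may help the compiler recognise the pair of shifts as a single 64-bit rotate.

#include <stdint.h>

/* Same one-add fold, with (x << 32) + (x >> 32) spelled as a
 * rotate-by-32; the halves do not overlap, so OR equals ADD here. */
static inline uint32_t from64to32_ror(uint64_t x)
{
	x += (x << 32) | (x >> 32);
	return (uint32_t)(x >> 32);
}
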
Karl Beldan Feb. 19, 2015, 11:47 p.m. UTC | #4
On Wed, Feb 18, 2015 at 09:40:23AM +0000, David Laight wrote:
> From: Karl Beldan 
> > On Tue, Feb 17, 2015 at 12:04:22PM +0000, David Laight wrote:
> > > > +static inline u32 from64to32(u64 x)
> > > > +{
> > > > +	/* add up 32-bit and 32-bit for 32+c bit */
> > > > +	x = (x & 0xffffffff) + (x >> 32);
> > > > +	/* add up carry.. */
> > > > +	x = (x & 0xffffffff) + (x >> 32);
> > > > +	return (u32)x;
> > > > +}
> > >
> > > As a matter of interest, does the compiler optimise away the
> > > second (x & 0xffffffff) ?
> > > The code could just be:
> > > 	x = (x & 0xffffffff) + (x >> 32);
> > > 	return x + (x >> 32);
> > >
> >
> > On my side, from what I've seen so far, your version results in better
> > assembly, esp. with clang, but my first version
> > http://article.gmane.org/gmane.linux.kernel/1875407:
> >         x += (x << 32) + (x >> 32);
> > 	return (__force __wsum)(x >> 32);
> > resulted in even better assembly, I just verified with gcc/clang,
> > x86_64/ARM and -O1,2,3.
> 
> The latter looks to have a shorter dependency chain as well.
> Although I'd definitely include a comment saying that it is equivalent
> to the two lines in the current patch.
> 
> Does either compiler manage to use a rotate for the two shifts?
> Using '(x << 32) | (x >> 32)' might convince it to do so.
> That would reduce it to three 'real' instructions and a register rename.
> 
gcc and clang rotate for tile (just checked gcc) and x86_64, not for arm
(and IMHO rightly so).
Both '|' and '+' yielded the same asm for those 3 archs.
 
Karl

Patch

===============

commit 150ae0e94634714b23919f0c333fee28a5b199d5 upstream.

The carry from the 64->32bits folding was dropped, e.g with:
saddr=0xFFFFFFFF daddr=0xFF0000FF len=0xFFFF proto=0 sum=1,
csum_tcpudp_nofold returned 0 instead of 1.

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 lib/checksum.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/checksum.c b/lib/checksum.c
index 129775eb6de6..fcf38943132c 100644
--- a/lib/checksum.c
+++ b/lib/checksum.c
@@ -47,6 +47,15 @@  static inline unsigned short from32to16(unsigned int x)
 	return x;
 }
 
+static inline u32 from64to32(u64 x)
+{
+	/* add up 32-bit and 32-bit for 32+c bit */
+	x = (x & 0xffffffff) + (x >> 32);
+	/* add up carry.. */
+	x = (x & 0xffffffff) + (x >> 32);
+	return (u32)x;
+}
+
 static unsigned int do_csum(const unsigned char *buff, int len)
 {
 	int odd;
@@ -195,8 +204,7 @@  __wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr,
 #else
 	s += (proto + len) << 8;
 #endif
-	s += (s >> 32);
-	return (__force __wsum)s;
+	return (__force __wsum)from64to32(s);
 }
 EXPORT_SYMBOL(csum_tcpudp_nofold);
 #endif
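
To see the dropped carry described in the commit message, here is a small userspace reproduction, assuming the little-endian branch of csum_tcpudp_nofold and plain integer types (fold_old and fold_new are names invented for this demo):

#include <stdint.h>
#include <stdio.h>

/* Old folding: add the top word once; a carry out of the low 32 bits is
 * lost when the 64-bit sum is truncated on return. */
static uint32_t fold_old(uint64_t s)
{
	s += (s >> 32);
	return (uint32_t)s;
}

/* New folding, as in from64to32(): fold twice so the carry is re-added. */
static uint32_t fold_new(uint64_t s)
{
	s = (s & 0xffffffff) + (s >> 32);
	s = (s & 0xffffffff) + (s >> 32);
	return (uint32_t)s;
}

int main(void)
{
	/* Values from the commit message, accumulated as in the
	 * little-endian branch:
	 * saddr=0xFFFFFFFF daddr=0xFF0000FF len=0xFFFF proto=0 sum=1 */
	uint64_t s = 1;
	s += 0xFFFFFFFFull;		/* saddr */
	s += 0xFF0000FFull;		/* daddr */
	s += (0u + 0xFFFFu) << 8;	/* (proto + len) << 8 */

	/* s is 0x1FFFFFFFF here: the old fold prints 0, the new one 1 */
	printf("old=%u new=%u\n", fold_old(s), fold_new(s));
	return 0;
}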