Message ID: 49C94B6A.5020304@cosmosbay.com
State: Accepted, archived
Delegated to: David Miller
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Tue, 24 Mar 2009 22:06:50 +0100

> [PATCH] arp_tables: ifname_compare() can assume 16bit alignment
>
> Arches without efficient unaligned access can still perform a loop
> assuming 16bit alignment in ifname_compare()
>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

Looks great, applied, thanks Eric!

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tuesday 2009-03-24 22:06, Eric Dumazet wrote:
>>> +/*
>>> + * Unfortunatly, _b and _mask are not aligned to an int (or long int)
>>> + * Some arches dont care, unrolling the loop is a win on them.
>>> + */
>>> +static unsigned long ifname_compare(const char *_a, const char *_b, const char *_mask)
>>> +{
>>> +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>>> +	const unsigned long *a = (const unsigned long *)_a;
>>> +	const unsigned long *b = (const unsigned long *)_b;
>>
>> I think we can at least give some help for the platforms which
>> require alignment.
>>
>> We can, for example, assume 16-bit alignment and thus loop
>> over u16's
>
> Right. How about this incremental patch ?
>
> Thanks
>
> [PATCH] arp_tables: ifname_compare() can assume 16bit alignment
>
> Arches without efficient unaligned access can still perform a loop
> assuming 16bit alignment in ifname_compare()

Allow me some skepticism, but the code looks pretty much like a
standard memcmp.

> 	unsigned long ret = 0;
> +	const u16 *a = (const u16 *)_a;
> +	const u16 *b = (const u16 *)_b;
> +	const u16 *mask = (const u16 *)_mask;
> 	int i;
>
> -	for (i = 0; i < IFNAMSIZ; i++)
> -		ret |= (_a[i] ^ _b[i]) & _mask[i];
> +	for (i = 0; i < IFNAMSIZ/sizeof(u16); i++)
> +		ret |= (a[i] ^ b[i]) & mask[i];
> #endif
> 	return ret;
> }
From: Jan Engelhardt <jengelh@medozas.de>
Date: Tue, 24 Mar 2009 22:17:17 +0100 (CET)

> On Tuesday 2009-03-24 22:06, Eric Dumazet wrote:
> >>> +/*
> >>> + * Unfortunatly, _b and _mask are not aligned to an int (or long int)
> >>> + * Some arches dont care, unrolling the loop is a win on them.
> >>> + */
> >>> +static unsigned long ifname_compare(const char *_a, const char *_b, const char *_mask)
> >>> +{
> >>> +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> >>> +	const unsigned long *a = (const unsigned long *)_a;
> >>> +	const unsigned long *b = (const unsigned long *)_b;
> >>
> >> I think we can at least give some help for the platforms which
> >> require alignment.
> >>
> >> We can, for example, assume 16-bit alignment and thus loop
> >> over u16's
> >
> > Right. How about this incremental patch ?
> >
> > [PATCH] arp_tables: ifname_compare() can assume 16bit alignment
> >
> > Arches without efficient unaligned access can still perform a loop
> > assuming 16bit alignment in ifname_compare()
>
> Allow me some skepticism, but the code looks pretty much like a
> standard memcmp.

memcmp() can't make any assumptions about alignment.
Whereas we _know_ this thing is exactly 16-bit aligned.

All of the optimized memcmp() implementations look for
32-bit alignment and punt to byte at a time comparison
loops if things are not aligned enough.
On Tuesday 2009-03-24 22:18, David Miller wrote:
>> > Arches without efficient unaligned access can still perform a loop
>> > assuming 16bit alignment in ifname_compare()
>>
>> Allow me some skepticism, but the code looks pretty much like a
>> standard memcmp.
>
> memcmp() can't make any assumptions about alignment.
> Whereas we _know_ this thing is exactly 16-bit aligned.
>
> All of the optimized memcmp() implementations look for
> 32-bit alignment and punt to byte at a time comparison
> loops if things are not aligned enough.

Yes, I seem to remember glibc doing something like

	if ((addr & 0x03) != 0) {
		// process single bytes (increment addr as you go)
		// until addr & 0x03 == 0.
	}

	/* optimized loop here. also increases addr */

	if ((addr & 0x03) != 0)
		// still bytes left after loop - process on a per-byte basis

Is the cost of testing for non-4-divisibility expensive enough
to warrant not using memcmp?

Irrespective of all that, I think the interface comparison code should
be agglomerated into a function/header so that it is not replicated
across iptables, ip6tables, ebtables, arptables, etc.
From: Jan Engelhardt <jengelh@medozas.de>
Date: Tue, 24 Mar 2009 22:23:23 +0100 (CET)

> Is the cost of testing for non-4-divisibility expensive enough
> to warrant not using memcmp?

I think so.  This is the fast path of these things.

> Irrespective of all that, I think the interface comparison code should
> be agglomerated into a function/header so that it is not replicated
> across iptables, ip6tables, ebtables, arptables, etc.

Agreed.
Jan Engelhardt wrote:
> On Tuesday 2009-03-24 22:18, David Miller wrote:
>>>> Arches without efficient unaligned access can still perform a loop
>>>> assuming 16bit alignment in ifname_compare()
>>> Allow me some skepticism, but the code looks pretty much like a
>>> standard memcmp.
>> memcmp() can't make any assumptions about alignment.
>> Whereas we _know_ this thing is exactly 16-bit aligned.
>>
>> All of the optimized memcmp() implementations look for
>> 32-bit alignment and punt to byte at a time comparison
>> loops if things are not aligned enough.
>
> Yes, I seem to remember glibc doing something like
> [...]
> Is the cost of testing for non-4-divisibility expensive enough
> to warrant not using memcmp?
>
> Irrespective of all that, I think the interface comparison code should
> be agglomerated into a function/header so that it is not replicated
> across iptables, ip6tables, ebtables, arptables, etc.

memcmp() is fine, but how does it solve the masking problem we have?

Also, in the case of arp_tables, _a is long word aligned, while _b and
_mask are not. memcmp() in this case is slower (and doesn't handle the
mask at all).

If you look at the various ifname_compare() functions, we have two
different implementations. So yes, a factorization is possible for the
three files ip_tables.c, ip6_tables.c and xt_physdev.c.
On Tuesday 2009-03-24 22:39, Eric Dumazet wrote:
>>> memcmp() can't make any assumptions about alignment.
>>> Whereas we _know_ this thing is exactly 16-bit aligned.
>>>
>>> All of the optimized memcmp() implementations look for
>>> 32-bit alignment and punt to byte at a time comparison
>>> loops if things are not aligned enough.
>>
>> Yes, I seem to remember glibc doing something like
>> [...]
>> Is the cost of testing for non-4-divisibility expensive enough
>> to warrant not using memcmp?
>>
>> Irrespective of all that, I think the interface comparison code should
>> be agglomerated into a function/header so that it is not replicated
>> across iptables, ip6tables, ebtables, arptables, etc.
>
> memcmp() is fine, but how is it solving the masking problem we have ?

You are right; we would have to calculate the prefix length of the mask
first. (And I think we can assume that there will not be any 1s in the
mask after the first 0.) It would consume CPU indeed.

> Also in the case of arp_tables, _a is long word aligned, while _b and
> _mask are not.
>
> If you look at the various ifname_compare() functions, we have two
> different implementations.
>
> So yes, a factorization is possible for the three files ip_tables.c,
> ip6_tables.c and xt_physdev.c.

Very well.
David Miller <davem@davemloft.net> writes:
>
> memcmp() can't make any assumptions about alignment.

Newer gcc often[1] knows how to generate a specialized aligned memcmp()
based on the type alignment. However, you need to make sure the input
types are right. So in theory some casts might be enough, given a new
enough compiler.

[1] not always, unfortunately; sometimes it loses this information.

-Andi
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 64a7c6c..84b9c17 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -76,6 +76,7 @@ static inline int arp_devaddr_compare(const struct arpt_devaddr_info *ap,
 /*
  * Unfortunatly, _b and _mask are not aligned to an int (or long int)
  * Some arches dont care, unrolling the loop is a win on them.
+ * For other arches, we only have a 16bit alignment.
  */
 static unsigned long ifname_compare(const char *_a, const char *_b, const char *_mask)
 {
@@ -95,10 +96,13 @@ static unsigned long ifname_compare(const char *_a, const char *_b, const char *
 	BUILD_BUG_ON(IFNAMSIZ > 4 * sizeof(unsigned long));
 #else
 	unsigned long ret = 0;
+	const u16 *a = (const u16 *)_a;
+	const u16 *b = (const u16 *)_b;
+	const u16 *mask = (const u16 *)_mask;
 	int i;
 
-	for (i = 0; i < IFNAMSIZ; i++)
-		ret |= (_a[i] ^ _b[i]) & _mask[i];
+	for (i = 0; i < IFNAMSIZ/sizeof(u16); i++)
+		ret |= (a[i] ^ b[i]) & mask[i];
 #endif
 	return ret;
 }