Message ID | 20150430215348.1798.15509.stgit@ahduyck-vm-fedora22 |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
On Thu, 2015-04-30 at 14:53 -0700, Alexander Duyck wrote: > This change takes advantage of the fact that ETH_P_802_3_MIN is aligned to > 512 so as a result we can actually ignore the lower 8b when comparing the > Ethertype to ETH_P_802_3_MIN. This allows us to avoid a byte swap by simply > masking the value and comparing it to the byte swapped value for > ETH_P_802_3_MIN. > > Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> > --- > net/ethernet/eth.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c > index f3bad41d725f..60069318d5d1 100644 > --- a/net/ethernet/eth.c > +++ b/net/ethernet/eth.c > @@ -178,7 +178,7 @@ __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev) > if (unlikely(netdev_uses_dsa(dev))) > return htons(ETH_P_XDSA); > > - if (likely(ntohs(eth->h_proto) >= ETH_P_802_3_MIN)) > + if (likely((eth->h_proto & htons(0xFF00)) >= htons(ETH_P_802_3_MIN))) > return eth->h_proto; Then, a byte operation on x86 is shorter/faster than u16 one. You also could use if (likely(*(u8 *)ð->h_proto >= (ETH_P_802_3_MIN>>8))) return eth->h_proto; I would at least leave a comment here to explain the logic. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/30/2015 04:03 PM, Eric Dumazet wrote: > On Thu, 2015-04-30 at 14:53 -0700, Alexander Duyck wrote: >> This change takes advantage of the fact that ETH_P_802_3_MIN is aligned to >> 512 so as a result we can actually ignore the lower 8b when comparing the >> Ethertype to ETH_P_802_3_MIN. This allows us to avoid a byte swap by simply >> masking the value and comparing it to the byte swapped value for >> ETH_P_802_3_MIN. >> >> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> >> --- >> net/ethernet/eth.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c >> index f3bad41d725f..60069318d5d1 100644 >> --- a/net/ethernet/eth.c >> +++ b/net/ethernet/eth.c >> @@ -178,7 +178,7 @@ __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev) >> if (unlikely(netdev_uses_dsa(dev))) >> return htons(ETH_P_XDSA); >> >> - if (likely(ntohs(eth->h_proto) >= ETH_P_802_3_MIN)) >> + if (likely((eth->h_proto & htons(0xFF00)) >= htons(ETH_P_802_3_MIN))) >> return eth->h_proto; > Then, a byte operation on x86 is shorter/faster than u16 one. > > You also could use > > if (likely(*(u8 *)ð->h_proto >= (ETH_P_802_3_MIN>>8))) > return eth->h_proto; > > I would at least leave a comment here to explain the logic. Actually a byte operation itself is not faster. Note in the next line we are returning the value. So what you typically end up with by doing it that way would be 2 reads, one for the u8 and one for the u16 return value. That is actually what I am trying to address in the second patch in the set since we were doing a 8b test on the first byte of the address followed by a 64b read. The advantage with the way I wrote this is that the compiler itself should be able to sort out how it wants to test the value while accessing it in a 16b size. So at worst case it is a mask and compare, followed by a return of the value. From what I have seen the compiler seems to be smart enough on x86 anyway to just convert this into a one byte compare on AL and then return the result in AX. I would suspect that for bit-endian systems it would likely just perform the compare. - Alex -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2015-04-30 at 16:24 -0700, Alexander Duyck wrote: > Actually a byte operation itself is not faster. Note in the next line > we are returning the value. So what you typically end up with by doing > it that way would be 2 reads, one for the u8 and one for the u16 return > value. That is actually what I am trying to address in the second patch > in the set since we were doing a 8b test on the first byte of the > address followed by a 64b read. > > The advantage with the way I wrote this is that the compiler itself > should be able to sort out how it wants to test the value while > accessing it in a 16b size. So at worst case it is a mask and compare, > followed by a return of the value. From what I have seen the compiler > seems to be smart enough on x86 anyway to just convert this into a one > byte compare on AL and then return the result in AX. I would suspect > that for bit-endian systems it would likely just perform the compare. > My compiler (4.8.2 (Ubuntu 4.8.2-19ubuntu1)) does the following : 62d: 0f b7 42 0c movzwl 0xc(%rdx),%eax 631: 0f b6 d0 movzbl %al,%edx 634: 83 fa 05 cmp $0x5,%edx 637: 7e 02 jle 63b <eth_type_trans+0x8b> 639: c9 leaveq 63a: c3 retq Presumably this would be possible movzwl 0xc(%rdx),%eax cmp $0x5,%al jle 63b <eth_type_trans+0x8b> leaveq retq -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/30/2015 05:13 PM, Eric Dumazet wrote: > On Thu, 2015-04-30 at 16:24 -0700, Alexander Duyck wrote: > >> Actually a byte operation itself is not faster. Note in the next line >> we are returning the value. So what you typically end up with by doing >> it that way would be 2 reads, one for the u8 and one for the u16 return >> value. That is actually what I am trying to address in the second patch >> in the set since we were doing a 8b test on the first byte of the >> address followed by a 64b read. >> >> The advantage with the way I wrote this is that the compiler itself >> should be able to sort out how it wants to test the value while >> accessing it in a 16b size. So at worst case it is a mask and compare, >> followed by a return of the value. From what I have seen the compiler >> seems to be smart enough on x86 anyway to just convert this into a one >> byte compare on AL and then return the result in AX. I would suspect >> that for bit-endian systems it would likely just perform the compare. >> > > My compiler (4.8.2 (Ubuntu 4.8.2-19ubuntu1)) does the following : > > 62d: 0f b7 42 0c movzwl 0xc(%rdx),%eax > 631: 0f b6 d0 movzbl %al,%edx > 634: 83 fa 05 cmp $0x5,%edx > 637: 7e 02 jle 63b <eth_type_trans+0x8b> > 639: c9 leaveq > 63a: c3 retq > > Presumably this would be possible > > movzwl 0xc(%rdx),%eax > cmp $0x5,%al > jle 63b <eth_type_trans+0x8b> > leaveq > retq > > My compiler (5.0.1 (Red Hat 5.0.1-0.1)) does like what you have in the "would be possible" example. What I end up with is something like this: 648: 0f b7 42 0c movzwl 0xc(%rdx),%eax 64c: 3c 05 cmp $0x5,%al 64e: 76 40 jbe 690 <eth_type_trans+0xc0> The assembler before my patch was: 652: 0f b7 40 0c movzwl 0xc(%rax),%eax 656: 89 c2 mov %eax,%edx 658: 66 c1 c2 08 rol $0x8,%dx 65c: 0f b7 d2 movzwl %dx,%edx 65f: 81 fa ff 05 00 00 cmp $0x5ff,%edx 665: 7e 41 jle 6a8 <eth_type_trans+0xd8> The savings isn't meant to be anything huge for the patch, maybe a cycle or two. I suspect the before on your system is probably something similar to what I had so we are still probably dropping at least 2 instructions. - Alex -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c index f3bad41d725f..60069318d5d1 100644 --- a/net/ethernet/eth.c +++ b/net/ethernet/eth.c @@ -178,7 +178,7 @@ __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev) if (unlikely(netdev_uses_dsa(dev))) return htons(ETH_P_XDSA); - if (likely(ntohs(eth->h_proto) >= ETH_P_802_3_MIN)) + if (likely((eth->h_proto & htons(0xFF00)) >= htons(ETH_P_802_3_MIN))) return eth->h_proto; /*
This change takes advantage of the fact that ETH_P_802_3_MIN is aligned to 512 so as a result we can actually ignore the lower 8b when comparing the Ethertype to ETH_P_802_3_MIN. This allows us to avoid a byte swap by simply masking the value and comparing it to the byte swapped value for ETH_P_802_3_MIN. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> --- net/ethernet/eth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html