Message ID | 4C19E634.3030703@cn.fujitsu.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
Le jeudi 17 juin 2010 à 17:09 +0800, Shan Wei a écrit : > *Description of Problem* > When received an UDP packet, if the length parameter in UDP header is less than > the actual length of payload(including 8 bytes UDP header), and checksum parameter > is calculated including all payload, some NIC devices that supports hardware checksum > success to check checksum, and set ip_summed with CHECKSUM_UNNECESSARY flag. > But If we turn off rx-checksumming offload, UDP protocol failed to check the checksum. > > *Step to Reproduce* > We need to download netwib&netwox tools and then install them only on M1 node. > On M1 node, execute the below steps. > > M1 M2 > +---------------------------+ +---------------------------+ > | eth1 |<---------------> |eth0 | > |fe80::225:86ff:fe9d:3efa | |fe80::215:17ff:fe71:51f4 | > +---------------------------+ +---------------------------+ > > 1. netwox 149 -i fe80::215:17ff:fe71:51f4 -d eth1 -E 0:0:0:0:1:0 -e 0:15:17:71:51:f4 -I fe80::200:ff:fe00:100 -c 1 > This step is to create neighbor cache for spurious source address of fe80::200:ff:fe00:100. > > 2. netwox 141 -d eth1 -a 0:0:0:0:1:0 -b 0:15:17:71:51:f4 -f 17 -g 64 -h fe80::200:ff:fe00:100 -i fe80::215:17ff:fe71:51f4 \ > -o 3333 -p 7 -q 000000000000000000000000000000000000000000000000 -r 34525 -e 32 -s 16 -t 35126 > This step is to construct an UDPv6 packet that length field(16 bytes) less than total payload length(32 bytes). > > The readable format of this packet that netwox shows. > Ethernet________________________________________________________. > | 00:00:00:00:01:00->00:15:17:71:51:F4 type:0x86DD | > |_______________________________________________________________| > IP______________________________________________________________. > |version| traffic class | flow label | > |___6___|_______0_______|___________________0___________________| > | payload length | next header | hop limit | > |___________0x0020=32___________|____0x11=17____|______64_______| > | source | > |_____________________fe80::200:ff:fe00:100_____________________| > | destination | > |___________________fe80::215:17ff:fe71:51f4____________________| > UDP_____________________________________________________________. > | source port | destination port | > |__________0x0D05=3333__________|___________0x0007=7____________| > | length | checksum | > |___________0x0010=16___________|_________0x8936=35126__________| > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 # ................ > 00 00 00 00 00 00 00 00 # ........ > > > *Actual Results* > On M2 note, using ethtool to see the counter about rx_csum_offload. > #ethtool -S eth0 | grep csum > rx_csum_offload_good: 1 > rx_csum_offload_errors: 0 > > #cat /proc/net/snmp6 | grep Udp6 > Udp6InDatagrams 1 > Udp6InErrors 0 > > *Expected Results* > #ethtool -S eth0 | grep csum > rx_csum_offload_good: 0 > rx_csum_offload_errors: 1 > > #cat /proc/net/snmp6 | grep Udp6 > Udp6InDatagrams 0 > Udp6InErrors 1 > > *The Reason* > UDPv6 handles a received packet like this: > 1. Confirm length of data > If length parameter in UDPv6 header is greater than skb->len(actual data length added UDP header), > the packet will be dropped. If length parameter in UDPv6 header is lower than skb->len, the data > will be trimmed to be equal to length parameter. > > 2. Then UDPv6 calculates checksum with 40 bytes IPv6 pseudo-header,8 bytes UDPv6 header, 8 bytes > Payload Data. Note that checksum(35126) in UDPv6 header includes 16 bytes redundant data. > > NIC checks checksum with total data includes redundant data, So the checksum that hardware calculated > is different from that UDP did. > > > *The Solution* > We have reported the problem to Intel E1000e developer, the reply from Ronciak John is that > the driver code of e1000e is ok. > About the discuss, see http://comments.gmane.org/gmane.linux.drivers.e1000.devel/7077 > > For this case, UDP protocol should not trust the CHECKSUM_UNNECESSARY flag set by driver. > When UDP protocol received this kind of packet, if NIC hardware checked successfully, > we reset ip_summed with CHECKSUM_NONE, and UDP protocol checked checksum again. > > (This patch is not complete, it's just for my idea.) > diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c > index 1dd1aff..47f7e86 100644 > --- a/net/ipv6/udp.c > +++ b/net/ipv6/udp.c > @@ -723,6 +723,10 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable, > if (ulen < skb->len) { > if (pskb_trim_rcsum(skb, ulen)) > goto short_packet; > + > + if (skb_csum_unnecessary(skb)) > + skb->ip_summed = CHECKSUM_NONE; > + > saddr = &ipv6_hdr(skb)->saddr; > daddr = &ipv6_hdr(skb)->daddr; > uh = udp_hdr(skb); > I really dont know if this fix is the right one. pskb_trim_rcsum() already contains a check. Should this check be changed to include yours ? static inline int pskb_trim_rcsum(struct sk_buff *skb, unsigned int len) { if (likely(len >= skb->len)) return 0; if (skb->ip_summed == CHECKSUM_COMPLETE) skb->ip_summed = CHECKSUM_NONE; return __pskb_trim(skb, len); } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet wrote, at 06/26/2010 01:28 PM: >> (This patch is not complete, it's just for my idea.) >> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c >> index 1dd1aff..47f7e86 100644 >> --- a/net/ipv6/udp.c >> +++ b/net/ipv6/udp.c >> @@ -723,6 +723,10 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable, >> if (ulen < skb->len) { >> if (pskb_trim_rcsum(skb, ulen)) >> goto short_packet; >> + >> + if (skb_csum_unnecessary(skb)) >> + skb->ip_summed = CHECKSUM_NONE; >> + >> saddr = &ipv6_hdr(skb)->saddr; >> daddr = &ipv6_hdr(skb)->daddr; >> uh = udp_hdr(skb); >> > > I really dont know if this fix is the right one. > > pskb_trim_rcsum() already contains a check. Should this check be changed > to include yours ? Oh..... I don't think so. pskb_trim_rcsum() is also used when IPv4/IPv6 protocol receiving packets and reassembling fragments. IP protocol does the right check and should trust CHECKSUM_UNNECESSARY flag that drivers set, So we no need to change IP protocol. If we add the skb_csum_unnecessary(skb) check into pskb_trim_rcsum() and reset ip_summed with CHECKSUM_NONE, the checksum check that NIC hardward has done is wasted. Only for UDP protocol over IPv4/IPv6, and length parameter is lower than skb->len, We reset ip_summed with CHECKSUM_NONE.
Le lundi 28 juin 2010 à 18:49 +0800, Shan Wei a écrit : > Eric Dumazet wrote, at 06/26/2010 01:28 PM: > >> (This patch is not complete, it's just for my idea.) > >> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c > >> index 1dd1aff..47f7e86 100644 > >> --- a/net/ipv6/udp.c > >> +++ b/net/ipv6/udp.c > >> @@ -723,6 +723,10 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable, > >> if (ulen < skb->len) { > >> if (pskb_trim_rcsum(skb, ulen)) > >> goto short_packet; > >> + > >> + if (skb_csum_unnecessary(skb)) > >> + skb->ip_summed = CHECKSUM_NONE; > >> + > >> saddr = &ipv6_hdr(skb)->saddr; > >> daddr = &ipv6_hdr(skb)->daddr; > >> uh = udp_hdr(skb); > >> > > > > I really dont know if this fix is the right one. > > > > pskb_trim_rcsum() already contains a check. Should this check be changed > > to include yours ? > > Oh..... I don't think so. > pskb_trim_rcsum() is also used when IPv4/IPv6 protocol receiving packets > and reassembling fragments. IP protocol does the right check and should > trust CHECKSUM_UNNECESSARY flag that drivers set, So we no need to > change IP protocol. > If we add the skb_csum_unnecessary(skb) check into pskb_trim_rcsum() and > reset ip_summed with CHECKSUM_NONE, the checksum check that NIC hardward > has done is wasted. > > Only for UDP protocol over IPv4/IPv6, and length parameter is lower than > skb->len, We reset ip_summed with CHECKSUM_NONE. > > I read RFC 768 again, and cannot tell if your patch is ever needed. NIC validated the UDP checksum including the whole IP data length, not the udp length. Linux kernel computes checksum using udp.length, not ip.length, because only udp.length is delivered to application anyway. Extra padding is meaningless. RFC 768 author didnt asserted ip.length = udp.length + 8 Both implementations are RFC compliant, but may have different results with special hand crafted packets. You add a test for a non issue, my analysis is we should not care at all, unless you have another valid point (security ???) If a change is needed, I would vote for a change in NIC firmware, because RFC 768 gives following pseudo header : 0 7 8 15 16 23 24 31 +--------+--------+--------+--------+ | source address | +--------+--------+--------+--------+ | destination address | +--------+--------+--------+--------+ | zero |protocol| UDP length | +--------+--------+--------+--------+ Note it mentions UDP length, not IP length. If NIC reports UDP check sum was verified, it should have used UDP length as well. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet wrote, at 06/29/2010 04:38 AM:
> If a change is needed, I would vote for a change in NIC firmware,
Eric, thanks for your explanation very much.
My solution is ugly. Modify UDP protocol stack is just a way to circumvent.
Most in need of modification should be the NIC's fireware. However,
such as the e1000e network card for years, and no serious problems have been reported.
So, I think this thread can be closed now.
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 1dd1aff..47f7e86 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -723,6 +723,10 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable, if (ulen < skb->len) { if (pskb_trim_rcsum(skb, ulen)) goto short_packet; + + if (skb_csum_unnecessary(skb)) + skb->ip_summed = CHECKSUM_NONE; + saddr = &ipv6_hdr(skb)->saddr; daddr = &ipv6_hdr(skb)->daddr; uh = udp_hdr(skb);