Message ID | 1552865877-13401-1-git-send-email-bram-yvahk@mail.wizbit.be |
---|---|
Series | Fragmentation of IPv4 in VTI |
On Sun, Mar 17, 2019 at 11:37:55PM +0000, Bram Yvahk wrote:
> We've experienced an issue with VTI when the path-mtu is smaller than the size
> of the "client" packet.
>
> What happens: IPv4 packet from the client (i.e. another system in the LAN)
> attempts to transmit some data; IPv4 header shows that 'DF' bit is not set but
> still the client receives ICMPv4 "need-to-frag" message [which the client does
> not expect and ignores].
>
> Example: $ ping -s 1300 -M dont -c5 192.168.235.2
> PING 192.168.235.3 (192.168.235.3) 1300(1328) bytes of data.
> From 192.168.236.254 icmp_seq=1 Frag needed and DF set (mtu = 1214)
> From 192.168.236.254 icmp_seq=2 Frag needed and DF set (mtu = 1214)
> From 192.168.236.254 icmp_seq=3 Frag needed and DF set (mtu = 1214)
> From 192.168.236.254 icmp_seq=4 Frag needed and DF set (mtu = 1214)
> From 192.168.236.254 icmp_seq=5 Frag needed and DF set (mtu = 1214)
>
> --- 192.168.235.3 ping statistics ---
> 5 packets transmitted, 0 received, +5 errors, 100% packet loss, time 3999ms

Hm, this works here. Can you show how you set up the vti device?
Some tunnel configuration options (set ttl etc.) force to have
the DF bit set.

Steffen Klassert wrote:
> On Sun, Mar 17, 2019 at 11:37:55PM +0000, Bram Yvahk wrote:
>> We've experienced an issue with VTI when the path-mtu is smaller than the size
>> of the "client" packet.
>>
>> What happens: IPv4 packet from the client (i.e. another system in the LAN)
>> attempts to transmit some data; IPv4 header shows that 'DF' bit is not set but
>> still the client receives ICMPv4 "need-to-frag" message [which the client does
>> not expect and ignores].
>>
>> Example: $ ping -s 1300 -M dont -c5 192.168.235.2
>> PING 192.168.235.3 (192.168.235.3) 1300(1328) bytes of data.
>> From 192.168.236.254 icmp_seq=1 Frag needed and DF set (mtu = 1214)
>> From 192.168.236.254 icmp_seq=2 Frag needed and DF set (mtu = 1214)
>> From 192.168.236.254 icmp_seq=3 Frag needed and DF set (mtu = 1214)
>> From 192.168.236.254 icmp_seq=4 Frag needed and DF set (mtu = 1214)
>> From 192.168.236.254 icmp_seq=5 Frag needed and DF set (mtu = 1214)
>>
>> --- 192.168.235.3 ping statistics ---
>> 5 packets transmitted, 0 received, +5 errors, 100% packet loss, time 3999ms
>
> Hm, this works here. Can you show how you set up the vti device?
> Some tunnel configuration options (set ttl etc.) force to have
> the DF bit set.

I will provide these details tomorrow.
What I can say is that ttl was set to inherit.

When testing this there is one important bit - which in hindsight I
should've included in the previous message - the (IPsec) Gateway A
needs to know the path-mtu to (IPsec) Gateway B.

Some ways to accomplish this:
- transmit an ICMP with DF bit set and a larger packet size from
  Gateway A to Gateway B
- ensure the "nopmtudisc" option is *not* set in the xfrm state
  and then let client A transmit an ICMP *with* DF bit set to
  client B. [when "nopmtudisc" is set then all outgoing IPv4 ESP
  packets have the DF bit cleared, when "nopmtudisc" is not set then
  DF bit is copied from the client packet]

For testing purposes I recommend to do the ping from Gateway A to
Gateway B. (Otherwise tcpdumps/traffic get a bit more confusing.)

A more in-depth description of what happens:

Setup:
======

|----------|   |-----------|   |-------|   |-----------|   |----------|
| client A |---| Gateway A |---| Hop H |---| Gateway B |---| client B |
|----------|   |-----------|   |-------|   |-----------|   |----------|

- testing with linux 4.14.95 (setup with more recent kernel is WIP)
- link mtu between client A and Gateway A: 1500
- link mtu between Gateway A and Hop H: 1500
- link mtu between Hop H and Gateway B: 1280
- link mtu between Gateway B and client B: 1500
- path-mtu between Gateway A and Gateway B: 1280
- IPsec tunnel over *IPv4* between Gateway A and Gateway B
- tunneling IPv4 over the IPsec tunnel
- testing with VTI

Scenario:
==========

Before starting it's important to ensure that:
- Gateway A does *not* know the path-mtu to Gateway B
- Client A does *not* know the path-mtu to Gateway B

* Step 1: client A: $ ping -M dont -s 1300 ip_of_client_B
  - IPv4 ICMP packet of client A does not have DF bit set
  - IPv4 ESP packet of Gateway A does not have DF bit set
  - Hop H receives an IPv4 ESP packet that is too large for link-mtu
    between Hop H and Gateway B: it fragments the IPv4 ESP packet.
  - Gateway B receives 2 IPv4 fragmented packets
  - (Client B receives one IPv4 ICMP packet from client A)

* Step 2: Gateway A: $ ping -M do -s 1300 ip_of_gateway_B
  - IPv4 ICMP packet of Gateway A does have DF bit set
  - Gateway A receives a 'need to frag' ICMP from Hop H

* Step 3: client A: $ ping -M dont -s 1300 ip_of_client_B
  - IPv4 ICMP packet of client A does not have DF bit set
  - Gateway A: it processes this packet in the VTI module and detects that
    packet size > path-mtu and then sends a 'need to frag' ICMP to
    client A. [this is the code I patched]

=> the critical bit in the above is that Gateway A learns
   the path-mtu to Gateway B. If it doesn't then it keeps
   assuming path-mtu is 1500 and the check in VTI will not
   trigger (since path-mtu of 1500 > packet size)

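The difference between what is observed in steps 1 and 3 and what the
client expects can be modelled roughly as below. This is only an
illustrative sketch in Python, not the VTI code: the function names and
the returned strings are hypothetical, and it compares the packet size
against the path-mtu directly (as in the description above), ignoring ESP
overhead. The "expected" variant assumes conventional IPv4 forwarding,
i.e. fragment when DF is not set.

```python
ASSUMED_DEFAULT_MTU = 1500  # assumed while no path-mtu has been learned


def observed_action(packet_len, df_set, learned_pmtu=None):
    """Behaviour described in steps 1 and 3.

    df_set is deliberately unused: the reported problem is that the
    'need to frag' ICMP is sent even when DF is not set.
    """
    mtu = learned_pmtu if learned_pmtu is not None else ASSUMED_DEFAULT_MTU
    if packet_len > mtu:
        return "icmp-need-to-frag"
    return "forward"


def expected_action(packet_len, df_set, learned_pmtu=None):
    """What the client expects (assumption: conventional router behaviour)."""
    mtu = learned_pmtu if learned_pmtu is not None else ASSUMED_DEFAULT_MTU
    if packet_len <= mtu:
        return "forward"
    # Only a packet with DF set should trigger the ICMP error.
    return "icmp-need-to-frag" if df_set else "fragment"


if __name__ == "__main__":
    # Step 1: path-mtu unknown, 1328-byte packet, DF not set
    print(observed_action(1328, df_set=False))
    # Step 3: path-mtu learned as 1280, same packet
    print(observed_action(1328, df_set=False, learned_pmtu=1280))
    print(expected_action(1328, df_set=False, learned_pmtu=1280))
```

In this model the check only starts to matter once learned_pmtu is set,
which is why step 2 is needed to reproduce the behaviour.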
Bram Yvahk wrote:
> Steffen Klassert wrote:
>> On Sun, Mar 17, 2019 at 11:37:55PM +0000, Bram Yvahk wrote:
>>> We've experienced an issue with VTI when the path-mtu is smaller than the size
>>> of the "client" packet.
>>>
>>> What happens: IPv4 packet from the client (i.e. another system in the LAN)
>>> attempts to transmit some data; IPv4 header shows that 'DF' bit is not set but
>>> still the client receives ICMPv4 "need-to-frag" message [which the client does
>>> not expect and ignores].
>>>
>>> Example: $ ping -s 1300 -M dont -c5 192.168.235.2
>>> PING 192.168.235.3 (192.168.235.3) 1300(1328) bytes of data.
>>> From 192.168.236.254 icmp_seq=1 Frag needed and DF set (mtu = 1214)
>>> From 192.168.236.254 icmp_seq=2 Frag needed and DF set (mtu = 1214)
>>> From 192.168.236.254 icmp_seq=3 Frag needed and DF set (mtu = 1214)
>>> From 192.168.236.254 icmp_seq=4 Frag needed and DF set (mtu = 1214)
>>> From 192.168.236.254 icmp_seq=5 Frag needed and DF set (mtu = 1214)
>>>
>>> --- 192.168.235.3 ping statistics ---
>>> 5 packets transmitted, 0 received, +5 errors, 100% packet loss, time 3999ms
>>
>> Hm, this works here. Can you show how you set up the vti device?
>> Some tunnel configuration options (set ttl etc.) force to have
>> the DF bit set.
>
> I will provide these details tomorrow.
> What I can say is that ttl was set to inherit.
>

vti device is created (on Gateway A) using:
  $ ip tun add name vti0 mode vti ikey 1 okey 1 local <ip gateway A>

  $ ip link show dev vti0
  46: vti0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
      link/ipip <ip gateway A> brd 0.0.0.0

  $ ip tun show name vti0
  vti0: ip/ip remote any local <ip gateway A> ttl inherit key 1

[I've also done setup with mtu 1400 - all remains the same]

xfrm state:
  src <ip gateway B> dst <ip gateway A>
      proto esp spi 0xcd76a4a9 reqid 16389 mode tunnel
      replay-window 32 flag nopmtudisc af-unspec
      auth-trunc hmac(sha1) 0x08e1ce16b1f7f9039f9cc7421cf61010c029efc3 96
      enc cbc(aes) 0x22c7aacd9680a10a52b0c5670b7d850c35ba17f7c7dc6c963252cdc311b1f4d5
      anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
  src <ip gateway A> dst <ip gateway B>
      proto esp spi 0x8f2988c7 reqid 16389 mode tunnel
      replay-window 32 flag nopmtudisc af-unspec
      auth-trunc hmac(sha1) 0x229bbe490606ddcc6a68332babd498001591c6bf 96
      enc cbc(aes) 0xd598dba419bfc45232580e54d517aae6a77c3328a51ebb3321802b89cc51ae43
      anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

(same behaviour with/without nopmtudisc; nopmtudisc only makes a
difference for packets from 'client A' that *do* have the DF bit set)

>
> When testing this there is one important bit - which in hindsight I
> should've included in the previous message - the (IPsec) Gateway A
> needs to know the path-mtu to (IPsec) Gateway B.
>
> Some ways to accomplish this:
> - transmit an ICMP with DF bit set and a larger packet size from
>   Gateway A to Gateway B
> - ensure the "nopmtudisc" option is *not* set in the xfrm state
>   and then let client A transmit an ICMP *with* DF bit set to
>   client B. [when "nopmtudisc" is set then all outgoing IPv4 ESP
>   packets have the DF bit cleared, when "nopmtudisc" is not set then
>   DF bit is copied from the client packet]
>
> For testing purposes I recommend to do the ping from Gateway A to
> Gateway B. (Otherwise tcpdumps/traffic get a bit more confusing.)
>
> A more in-depth description of what happens:
>
> Setup:
> ======
>
> |----------|   |-----------|   |-------|   |-----------|   |----------|
> | client A |---| Gateway A |---| Hop H |---| Gateway B |---| client B |
> |----------|   |-----------|   |-------|   |-----------|   |----------|
>
> - testing with linux 4.14.95 (setup with more recent kernel is WIP)
> - link mtu between client A and Gateway A: 1500
> - link mtu between Gateway A and Hop H: 1500
> - link mtu between Hop H and Gateway B: 1280
> - link mtu between Gateway B and client B: 1500
> - path-mtu between Gateway A and Gateway B: 1280
> - IPsec tunnel over *IPv4* between Gateway A and Gateway B
> - tunneling IPv4 over the IPsec tunnel
> - testing with VTI
>
> Scenario:
> ==========
>
> Before starting it's important to ensure that:
> - Gateway A does *not* know the path-mtu to Gateway B
> - Client A does *not* know the path-mtu to Gateway B

On Gateway A:
  $ ip route get <ip of gateway B>
  <ip gateway B> via <hop H> dev eth1 src <ip gateway A> uid 0
      cache
  => no mtu shown --> path-mtu not yet known

>
> * Step 1: client A: $ ping -M dont -s 1300 ip_of_client_B
>   - IPv4 ICMP packet of client A does not have DF bit set
>   - IPv4 ESP packet of Gateway A does not have DF bit set
>   - Hop H receives an IPv4 ESP packet that is too large for link-mtu
>     between Hop H and Gateway B: it fragments the IPv4 ESP packet.
>   - Gateway B receives 2 IPv4 fragmented packets
>   - (Client B receives one IPv4 ICMP packet from client A)

tcpdump on Gateway A:
- from client A it receives:
    IP (tos 0x0, ttl 64, id 46797, offset 0, flags [none], proto ICMP (1), length 1328)
        client_A > client_B: ICMP echo request, id 6855, seq 1, length 1308
- it transmits (to Gateway B):
    IP (tos 0x0, ttl 64, id 10932, offset 0, flags [none], proto ESP (50), length 1400)
        gateway_A > gateway_B: ESP(spi=0x8f2988c7,seq=0x3), length 1380

tcpdump on Gateway B:
- it receives (from Gateway A):
    IP (tos 0x0, ttl 63, id 10932, offset 0, flags [+], proto ESP (50), length 1276)
        gateway_A > gateway_B: ESP(spi=0x8f2988c7,seq=0x3), length 1256
    IP (tos 0x0, ttl 63, id 10932, offset 1256, flags [none], proto ESP (50), length 144)
        gateway_A > gateway_B: ip-proto-50
- it transmits (to client B):
    IP (tos 0x0, ttl 62, id 46797, offset 0, flags [none], proto ICMP (1), length 1328)
        client_A > client_B: ICMP echo request, id 6855, seq 1, length 1308

=> Hop H fragmented the IPv4 packets. This is expected: DF bit is not
   set on ESP packets and Gateway A does not know path-mtu to Gateway B

>
> * Step 2: Gateway A: $ ping -M do -s 1300 ip_of_gateway_B
>   - IPv4 ICMP packet of Gateway A does have DF bit set
>   - Gateway A receives a 'need to frag' ICMP from Hop H

tcpdump on Gateway A:
- it transmits (local packet - to Gateway B):
    IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 1328)
        gateway_A > gateway_B: ICMP echo request, id 28176, seq 1, length 1308
- it receives (from Hop H):
    IP (tos 0xc0, ttl 64, id 52788, offset 0, flags [none], proto ICMP (1), length 576)
        hop_H > gateway_A: ICMP 1.1.235.254 unreachable - need to frag (mtu 1280), length 556
            IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 1328)
                gateway_A > gateway_B: ICMP echo request, id 28176, seq 1, length 1308

=> Hop H sends a need-to-frag with the mtu. This is expected: DF bit is
   set on ICMP packet so Hop H should not fragment.

on Gateway A:
  $ ip route get <ip of gateway B>
  <ip gateway B> via <hop H> dev eth1 src <ip gateway A> uid 0
      cache expires 17sec mtu 1280
  => path-mtu known to be 1280

>
> * Step 3: client A: $ ping -M dont -s 1300 ip_of_client_B
>   - IPv4 ICMP packet of client A does not have DF bit set
>   - Gateway A: it processes this packet in the VTI module and detects that
>     packet size > path-mtu and then sends a 'need to frag' ICMP to
>     client A. [this is the code I patched]

tcpdump on Gateway A:
- from client A it receives:
    IP (tos 0x0, ttl 64, id 46798, offset 0, flags [none], proto ICMP (1), length 1328)
        client_A > client_B: ICMP echo request, id 7063, seq 1, length 1308
- it transmits to client A:
    IP (tos 0xc0, ttl 64, id 59290, offset 0, flags [none], proto ICMP (1), length 576)
        gateway_A > client_A: ICMP client_B unreachable - need to frag (mtu 1214), length 556
            IP (tos 0x0, ttl 63, id 46798, offset 0, flags [none], proto ICMP (1), length 1328)
                client_A > client_B: ICMP echo request, id 7063, seq 1, length 1308

>
> => the critical bit in the above is that Gateway A learns
>    the path-mtu to Gateway B. If it doesn't then it keeps
>    assuming path-mtu is 1500 and the check in VTI will not
>    trigger (since path-mtu of 1500 > packet size)