Message ID | 20200509052235.150348-1-zenczykowski@gmail.com |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
Series | document danger of '-j REJECT'ing of '-m state INVALID' packets |
On Saturday 2020-05-09 07:22, Maciej Żenczykowski wrote:

>diff --git a/extensions/libip6t_REJECT.man b/extensions/libip6t_REJECT.man
>index 0030a51f..b6474811 100644
>--- a/extensions/libip6t_REJECT.man
>+++ b/extensions/libip6t_REJECT.man
>@@ -30,3 +30,18 @@ TCP RST packet to be sent back. This is mainly useful for blocking
> hosts (which won't accept your mail otherwise).
> \fBtcp\-reset\fP
> can only be used with kernel versions 2.6.14 or later.
>+.PP
>+\fIWarning:\fP if you are using connection tracking and \fBACCEPT\fP'ing
>+\fBESTABLISHED\fP (and possibly \fBRELATED\fP) state packets, do not
>+indiscriminately \fBREJECT\fP (especially with \fITCP RST\fP) \fBINVALID\fP
>+state packets. Sometimes naturally occuring packet reordering will result
>+in packets being considered \fBINVALID\fP and the generated \fITCP RST\fP
>+will abort an otherwise healthy connection.

I fail to understand the problem here.

1. Because ESTABLISHED and INVALID are mutually exclusive, there is no ordering dependency between two rules of the kind {EST=>ACCEPT, INV=>REJ}, and thus their order plays no role.

2. A packet sequence D,R (data, rst) leads to state(ct(D))=EST, state(ct(R))=EST in the normal case. When this gets reordered to R,D, we end up with state(ct(R))=EST, state(ct(D))=INV. Though the outcome of nfct changes, I do not think that will be of consequence, because in the absence of filtering, the tcp layer should be discarding/rejecting D.

3. Natural reordering of D1,D2 to D2,D1 should not cause nfct to drop the ct at reception of D1 and turn the state to INV. Reordering can happen at any time, and we'd be having more reports of problems if it did, wouldn't we...
So I've never tried to figure out how things break, just observed that they do. I first saw it many, many years ago (close to 15) between my wifi-connected laptop at home and my university server in the same city. I've kept an INVALID->DROP rule in all my firewalls since then and have not had problems. I vaguely recall seeing delayed packets when I debugged it back then.

See for example https://github.com/moby/libnetwork/issues/1090 for others running into this.

Now we've hit an issue at work where a network misconfiguration caused asymmetric one-way pathing, with the result that some packets were getting *massively* delayed, and it's been causing user firewalls to generate tcp resets for 'too old', 'already ack'ed' packets (i.e. dups).

While this is of course a misconfig, and it shouldn't happen, in practice it sometimes simply does. All it takes is for a packet to get into a long queue, and for the network path to shift (immediately after it) to a less congested path. Due to bufferbloat those long queues can take seconds to drain and exceed the path rtt by orders of magnitude.

I *think* what happens is this: a non-final tcp packet gets massively delayed. The packet after it makes it through to the receiver and triggers an ACK with SACK, which makes it back to the sender and triggers a retransmit, and the connection keeps making forward progress. Eventually the delayed packet arrives, is no longer considered valid, and triggers a tcp reset.

How much delay counts as 'massive' of course depends on the rtt and on retransmit aggressiveness.
Here's my attempt to demonstrate what I believe the problem to be:

(on a freshly booted clean/empty/idle fedora 31 vm)

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -m state --state INVALID -j DROP
modprobe ifb
ip link set dev ifb0 up
tc qdisc add dev ifb0 root netem reorder 99% 0% delay 10s
tc qdisc add dev eth0 clsact
tc filter add dev eth0 ingress u32 match u32 0 0 action mirred egress redirect dev ifb0
wget -O /dev/null https://git.kernel.org/torvalds/t/linux-5.7-rc4.tar.gz
iptables-save -c

...
/dev/null [ <=> ] 169.58M 2.93MB/s in 45s
2020-05-09 10:35:44 (3.81 MB/s) - ‘/dev/null’ saved [177819073]
...
[31750:181080717] -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
[244:1403178] -A INPUT -m state --state INVALID -j DROP

Now if I reboot, and run the same script, except instead of the INVALID/DROP rule I do

iptables -A INPUT -p tcp -j REJECT --reject-with tcp-reset

then the download never finishes (it hangs after 15MB @ 2MB/s and eventually times out).

[4170:16758894] -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
[37:147454] -A INPUT -p tcp -j REJECT --reject-with tcp-reset

(arguably since this is a VM, and thus NAT'ed by my host, and then again by the real ipv4 NAT, the setup isn't entirely clear, but I hope it makes my point: INVALID state needs to be dropped, not rejected)
Side note, it doesn't have to be nearly as aggressive as the above. With just:

tc qdisc replace dev ifb0 root netem reorder 99.9% 0% delay 1s

I still see 169.58M @ 7.02MB/s in 26s:

[24263:180667450] -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
[27:174654] -A INPUT -m state --state INVALID -j DROP
[0:0] -A INPUT -p tcp -j REJECT --reject-with tcp-reset

And the connection still freezes without the INVALID/DROP rule (after 43MiB this time)
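For readers comparing the counter lines quoted in these runs, here is a minimal sketch of a helper that reports what share of the counted packets hit the INVALID rule. The function name `invalid_share` is hypothetical and not part of the thread. It assumes input in the `[packets:bytes] -A ...` form that `iptables-save -c` prints, and it only counts the rules it is given, so it is a rough indicator rather than a precise measurement.

```shell
#!/bin/sh
# invalid_share: read iptables-save -c style rule lines on stdin and print
# "<invalid packets>/<total packets> (<percent>%)".
# Hypothetical helper for illustration; only lines that start with a
# [packets:bytes] counter are considered.
invalid_share() {
    awk '
        /^\[[0-9]+:[0-9]+\]/ {
            # $1 looks like "[244:1403178]"; strip the brackets, split on ":".
            split(substr($1, 2, length($1) - 2), c, ":")
            total += c[1]
            if ($0 ~ /--state INVALID/) invalid += c[1]
        }
        END {
            if (total > 0)
                printf "%d/%d (%.2f%%)\n", invalid, total, 100 * invalid / total
        }
    '
}
```

On a live system this could be run as `iptables-save -c | invalid_share`. Fed the two counter lines from the first run above, it reports that well under 1% of the counted packets were classified INVALID, which is consistent with the point that a small number of such packets is enough to matter once they trigger resets.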
diff --git a/extensions/libip6t_REJECT.man b/extensions/libip6t_REJECT.man
index 0030a51f..b6474811 100644
--- a/extensions/libip6t_REJECT.man
+++ b/extensions/libip6t_REJECT.man
@@ -30,3 +30,18 @@ TCP RST packet to be sent back. This is mainly useful for blocking
 hosts (which won't accept your mail otherwise).
 \fBtcp\-reset\fP
 can only be used with kernel versions 2.6.14 or later.
+.PP
+\fIWarning:\fP if you are using connection tracking and \fBACCEPT\fP'ing
+\fBESTABLISHED\fP (and possibly \fBRELATED\fP) state packets, do not
+indiscriminately \fBREJECT\fP (especially with \fITCP RST\fP) \fBINVALID\fP
+state packets. Sometimes naturally occuring packet reordering will result
+in packets being considered \fBINVALID\fP and the generated \fITCP RST\fP
+will abort an otherwise healthy connection.
+.P
+Suggested use:
+.br
+ -A INPUT -m state ESTABLISHED,RELATED -j ACCEPT
+.br
+ -A INPUT -m state INVALID -j DROP
+.br
+(and -j REJECT rules go here at the end)
diff --git a/extensions/libipt_REJECT.man b/extensions/libipt_REJECT.man
index 8a360ce7..d0f0f19b 100644
--- a/extensions/libipt_REJECT.man
+++ b/extensions/libipt_REJECT.man
@@ -30,3 +30,18 @@ TCP RST packet to be sent back. This is mainly useful for blocking
 hosts (which won't accept your mail otherwise).
 .IP (*)
 Using icmp\-admin\-prohibited with kernels that do not support it will result in a plain DROP instead of REJECT
+.PP
+\fIWarning:\fP if you are using connection tracking and \fBACCEPT\fP'ing
+\fBESTABLISHED\fP (and possibly \fBRELATED\fP) state packets, do not
+indiscriminately \fBREJECT\fP (especially with \fITCP RST\fP) \fBINVALID\fP
+state packets. Sometimes naturally occuring packet reordering will result
+in packets being considered \fBINVALID\fP and the generated \fITCP RST\fP
+will abort an otherwise healthy connection.
+.P
+Suggested use:
+.br
+ -A INPUT -m state ESTABLISHED,RELATED -j ACCEPT
+.br
+ -A INPUT -m state INVALID -j DROP
+.br
+(and -j REJECT rules go here at the end)
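The "Suggested use" lines in the patch are written without the `--state` keyword that the test commands earlier in the thread use. As a sketch of the same ordering with the option spelled out, the rules could be wrapped as follows. The `IPT` variable and the `apply_input_rules` function are hypothetical names added here for illustration, and the final tcp-reset rule is taken from the test setup in the thread rather than from the man page text.

```shell
#!/bin/sh
# IPT is a hypothetical override, not part of the patch: with IPT=echo the
# rules are printed instead of applied, so the ordering can be previewed
# without root privileges.
IPT=${IPT:-iptables}

apply_input_rules() {
    # Accept traffic that conntrack considers part of a known connection.
    "$IPT" -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    # Silently drop INVALID packets so that no TCP RST is generated for them.
    "$IPT" -A INPUT -m state --state INVALID -j DROP
    # REJECT rules go at the end, after the INVALID packets are gone.
    "$IPT" -A INPUT -p tcp -j REJECT --reject-with tcp-reset
}
```

Setting `IPT=echo` before calling `apply_input_rules` prints the three rules in order. For the ip6tables variant of the man page, `IPT=ip6tables` would presumably be the matching choice.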