Message ID | 20101111210341.31350.86916.stgit@paris.rdu.redhat.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
Le jeudi 11 novembre 2010 à 16:03 -0500, Eric Paris a écrit : > THIS PATCH IS VERY POSSIBLY WRONG! But if it is I want some feedback. > > Basically what I found was that if I added an iptables rule like so: > > iptables -A OUTPUT -p tcp --dport 80 -j DROP > > And then ran a web browser like links it would just hang on 'establishing > connection.' I expected that the application would immediately, or at least > very quickly, get notified that the connect failed. This waiting for timeout > would be expected if something else dropped the SYN or if we were dropping the > SYN/ACK packet coming back, but I figured if we knew we threw away the SYN we knew > right away that the connection was denied and we should be able to indicate > that to the application. Yes, I realize this is little different than if the > SYN was dropped in the first network device, but it is different because we > know what happened! We know that connect() call failed and that there isn't > anything coming back. > > What I discovered was that we actually had 2 problems in making it possible. > For userspace to quickly realize the connect failed. The first was a problem > in the netfilter code which wasn't passing errors back up the stack correctly, > due to what I believe to be a mistake in precedence rules. > > http://marc.info/?l=netfilter-devel&m=128950262021804&w=2 > > And the second was that tcp_connect() was just ignoring the return value from > tcp_transmit_skb(). Maybe this was intentional but I really wish we could > find out that connect failed long before the minutes long timeout. Once I > fixed both of those issues I find that links gets denied (with EPERM) > immediately when it calls connect(). Is this wrong? Is this bad to tell > userspace more quickly what happened? Does passing this error code back up > the stack here break something else? Why do some functions seem to pay > attention to tcp_transmit_skb() return codes and some functions just ignore > it? What do others think? > I think its an interesting idea, but a temporary memory shortage would abort the connect(). We could imagine some special handling of the first packet of a flow being DROPED for whatever reason (flow control...) So it needs some refinement I think. SYN packets should be allowed to be re-transmitted before saying a TCP connect() cannot succeed. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Yes, I realize this is little different than if the > SYN was dropped in the first network device, but it is different > because we know what happened! We know that connect() call failed > and that there isn't anything coming back. I would argue that -j DROP should behave exactly as the packet is dropped in the network, while -j REJECT should signal the failure to the application as soon as possible (which it doesn't seem to do). It does not only make sense, but also is a highly useful testing technique that we use -j DROP in OUTPUT to emulate network losses and see how the application behaves. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11.11.2010 22:58, Hua Zhong wrote: >> Yes, I realize this is little different than if the >> SYN was dropped in the first network device, but it is different >> because we know what happened! We know that connect() call failed >> and that there isn't anything coming back. > > I would argue that -j DROP should behave exactly as the packet is dropped in the network, while -j REJECT should signal the failure to the application as soon as possible (which it doesn't seem to do). It sends an ICMP error or TCP reset. Interpretation is up to TCP. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2010-11-11 at 13:58 -0800, Hua Zhong wrote: > > Yes, I realize this is little different than if the > > SYN was dropped in the first network device, but it is different > > because we know what happened! We know that connect() call failed > > and that there isn't anything coming back. > > I would argue that -j DROP should behave exactly as the packet is dropped in the network, while -j REJECT should signal the failure to the application as soon as possible (which it doesn't seem to do). > > It does not only make sense, but also is a highly useful testing technique that we use -j DROP in OUTPUT to emulate network losses and see how the application behaves. I guess I can be a bit more descriptive of my specific situation, although I'm not sure it matters. I don't actually plan to drop packets with -j REJECT or -j DROP, that's just a simple example everyone can see on their own machine. I plan to have the packets drop in the selinux netfilter hook. The SELinux hook uses NF_DROP/NF_ACCEPT just like any other netfilter hook. Maybe the answer is that I need to duplicate the -j REJECT type operations in the SELinux hook. -j REJECT doesn't do what I want today, but if that's the right way forward tell me and I'll look down that path. But the path I first started looking down rules in 2 distinct questions: 1) What should netfilter pass back up the stack. From my looking at this I see that nf_hook_slow() will convert NF_DROP into -EPERM and pass that back up the stack. Is this wrong? Should it more intelligently pass errors back up the stack? Maybe it needs an NF_REJECT as well as NF_DROP? NF_DROP returns 0 maybe and NF_REJECT return EPERM? 2) What should the generic TCP code (tcp_connect()) do if the skb failed to send. Should it return error codes back up the stack somehow or should they continue to be ignored? Obviously continuing to just ignore information we have doesn't make me happy (otherwise I wouldn't have started scratching this itch). But the point about ENOBUFS is well taken. Maybe I should make tcp_connect(), or the caller to tcp_connect() more intelligent about specific error codes? I'm looking for a path forward. If SELinux is rejecting the SYN packets on connect() I want to pass that info to userspace rather than just hanging. What's the best way to accomplish that? -Eric -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le vendredi 12 novembre 2010 à 11:08 -0500, Eric Paris a écrit : > 2) What should the generic TCP code (tcp_connect()) do if the skb failed > to send. Should it return error codes back up the stack somehow or > should they continue to be ignored? Obviously continuing to just ignore > information we have doesn't make me happy (otherwise I wouldn't have > started scratching this itch). But the point about ENOBUFS is well > taken. Maybe I should make tcp_connect(), or the caller to > tcp_connect() more intelligent about specific error codes? > > I'm looking for a path forward. If SELinux is rejecting the SYN packets > on connect() I want to pass that info to userspace rather than just > hanging. What's the best way to accomplish that? > Eric, if you can differentiate a permanent reject, instead of a temporary one (congestion, or rate limiting, or ENOBUF, or ...), then yes, you could make tcp_connect() report to user the permanent error, and ignore the temporary one. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Nov 12, 2010 at 05:15:32PM +0100, Eric Dumazet wrote: > Le vendredi 12 novembre 2010 à 11:08 -0500, Eric Paris a écrit : > > > 2) What should the generic TCP code (tcp_connect()) do if the skb failed > > to send. Should it return error codes back up the stack somehow or > > should they continue to be ignored? Obviously continuing to just ignore > > information we have doesn't make me happy (otherwise I wouldn't have > > started scratching this itch). But the point about ENOBUFS is well > > taken. Maybe I should make tcp_connect(), or the caller to > > tcp_connect() more intelligent about specific error codes? > > > > I'm looking for a path forward. If SELinux is rejecting the SYN packets > > on connect() I want to pass that info to userspace rather than just > > hanging. What's the best way to accomplish that? > > > > Eric, if you can differentiate a permanent reject, instead of a > temporary one (congestion, or rate limiting, or ENOBUF, or ...), then > yes, you could make tcp_connect() report to user the permanent error, > and ignore the temporary one. If the netfilter targets DROP/REJECT match the NF_DROP/NF_REJECT counterparts, which i guess they do but i didn't read the source ;), then SELinux should use NF_REJECT in my opinion. NF_DROP does exactly what the name says, it drops the packet aka basically puts it in /dev/null. As with writing to /dev/null, you don't get an error for that. Even more, if in the meantime the DROP rule does not match anymore, the 2nd or 3rd SYN from the connect() can come through and establish a connection (think of "-m statistic" & co.) This is very different from REJECT. If REJECT doesn't immediately get reported to the application, that *is* a bug, but last time i checked i got EPERM immediately. I would fix SELinux to use the same mechanism. -David -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2010-11-12 at 17:35 +0100, David Lamparter wrote: > On Fri, Nov 12, 2010 at 05:15:32PM +0100, Eric Dumazet wrote: > > Le vendredi 12 novembre 2010 à 11:08 -0500, Eric Paris a écrit : > > > > > 2) What should the generic TCP code (tcp_connect()) do if the skb failed > > > to send. Should it return error codes back up the stack somehow or > > > should they continue to be ignored? Obviously continuing to just ignore > > > information we have doesn't make me happy (otherwise I wouldn't have > > > started scratching this itch). But the point about ENOBUFS is well > > > taken. Maybe I should make tcp_connect(), or the caller to > > > tcp_connect() more intelligent about specific error codes? > > > > > > I'm looking for a path forward. If SELinux is rejecting the SYN packets > > > on connect() I want to pass that info to userspace rather than just > > > hanging. What's the best way to accomplish that? > > > > > > > Eric, if you can differentiate a permanent reject, instead of a > > temporary one (congestion, or rate limiting, or ENOBUF, or ...), then > > yes, you could make tcp_connect() report to user the permanent error, > > and ignore the temporary one. > > If the netfilter targets DROP/REJECT match the NF_DROP/NF_REJECT > counterparts, which i guess they do but i didn't read the source ;), > then SELinux should use NF_REJECT in my opinion. As it stands today there is no NF_REJECT. NF_DROP is the only (related) permitted return value from a netfilter hook. Maybe I need to change that fact though. > NF_DROP does exactly what the name says, it drops the packet aka > basically puts it in /dev/null. As with writing to /dev/null, you don't > get an error for that. Even more, if in the meantime the DROP rule does > not match anymore, the 2nd or 3rd SYN from the connect() can come > through and establish a connection (think of "-m statistic" & co.) > > This is very different from REJECT. > > If REJECT doesn't immediately get reported to the application, that *is* > a bug, but last time i checked i got EPERM immediately. I would fix > SELinux to use the same mechanism. I haven't looked at what -j REJECT does (or was intended to do) but it most certainly does not return an error to sys_connect(). Try it out. iptables -A OUTPUT -p tcp --dport 80 -j REJECT links www.google.com it just hangs on 'making connection' (exact same for -j DROP) If everyone agrees that's the wrong behavior (for -j REJECT) I'll work on fixing that (however is appropriate) and will change the SELinux code if needed after we've fixed the -j REJECT code. Obviously there's problems with my original way to fix the lack of error returns (namely that I would immediately EACCES for DROP as well as REJECT). I'm glad to hear that others seem to believe the current code is buggy and I'm not completely off my rocker to think that applications should be able to learn somehow that things fell down... -Eric -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Am 12.11.2010 17:35, schrieb David Lamparter: > On Fri, Nov 12, 2010 at 05:15:32PM +0100, Eric Dumazet wrote: >> Le vendredi 12 novembre 2010 à 11:08 -0500, Eric Paris a écrit : >> >>> 2) What should the generic TCP code (tcp_connect()) do if the skb failed >>> to send. Should it return error codes back up the stack somehow or >>> should they continue to be ignored? Obviously continuing to just ignore >>> information we have doesn't make me happy (otherwise I wouldn't have >>> started scratching this itch). But the point about ENOBUFS is well >>> taken. Maybe I should make tcp_connect(), or the caller to >>> tcp_connect() more intelligent about specific error codes? >>> >>> I'm looking for a path forward. If SELinux is rejecting the SYN packets >>> on connect() I want to pass that info to userspace rather than just >>> hanging. What's the best way to accomplish that? >>> >> >> Eric, if you can differentiate a permanent reject, instead of a >> temporary one (congestion, or rate limiting, or ENOBUF, or ...), then >> yes, you could make tcp_connect() report to user the permanent error, >> and ignore the temporary one. Indeed. We could even make the NF_DROP return value configurable by encoding it in the verdict. > If the netfilter targets DROP/REJECT match the NF_DROP/NF_REJECT > counterparts, which i guess they do but i didn't read the source ;), > then SELinux should use NF_REJECT in my opinion. There is no NF_REJECT. > NF_DROP does exactly what the name says, it drops the packet aka > basically puts it in /dev/null. As with writing to /dev/null, you don't > get an error for that. Even more, if in the meantime the DROP rule does > not match anymore, the 2nd or 3rd SYN from the connect() can come > through and establish a connection (think of "-m statistic" & co.) > > This is very different from REJECT. Returning NF_DROP results in -EPERM getting reported back. As Eric noticed, this is ignored for SYN packets. > If REJECT doesn't immediately get reported to the application, that *is* > a bug, but last time i checked i got EPERM immediately. I would fix > SELinux to use the same mechanism. NF_DROP returns -EPERM, the REJECT targets send packets to reject a connection. Whether this is reported immediately depends on the error and the protocol in question. Using a TCP reset immediately resets the connection. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello! On Thu, Nov 11, 2010 at 04:03:41PM -0500, Eric Paris wrote: > immediately when it calls connect(). Is this wrong? Is this bad to tell > userspace more quickly what happened? Does passing this error code back up > the stack here break something else? Why do some functions seem to pay > attention to tcp_transmit_skb() return codes and some functions just ignore > it? Essentially, return value of tcp_transmit_skb() is always ignored. It is used only for accounting and for some optimization of retransmission behaviour. Generally, tcp does not react on errors coming outside of tcp protocol. The only loophole is ICMP error in the same case as yours. In _violation_ of specs linux immediately aborts unestablished connect on an icmp error. IMHO that thing which you suggest is correct (of course, provided you filter out transient errors and react only to EPERM or something like this). It was not done because it was expected firewall rule prescribing immediate abort is configured with "--reject-with icmp-port-unreachable", otherwise the rule orders real blackhole. Alexey -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello! I looked at tcp_v4_err() and found something strange. Quite non-trivial operations are performed on unlocked sockets. It looks like at least this BUG_ON(): skb = tcp_write_queue_head(sk); BUG_ON(!skb); can be easily triggered. Do I miss something? Alexey -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le vendredi 12 novembre 2010 à 20:57 +0300, Alexey Kuznetsov a écrit : > Hello! > > I looked at tcp_v4_err() and found something strange. Quite non-trivial operations > are performed on unlocked sockets. It looks like at least this BUG_ON(): > > skb = tcp_write_queue_head(sk); > BUG_ON(!skb); > > can be easily triggered. > > Do I miss something? > Hi Alexey ! I see socket is locked around line 368, bh_lock_sock(sk); /* If too many ICMPs get dropped on busy * servers this needs to be solved differently. */ if (sock_owned_by_user(sk)) NET_INC_STATS_BH(net, LINUX_MIB_LOCKDROPPEDICMPS); Hmm, maybe some goto is missing ;) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le vendredi 12 novembre 2010 à 19:12 +0100, Eric Dumazet a écrit : > Le vendredi 12 novembre 2010 à 20:57 +0300, Alexey Kuznetsov a écrit : > > Hello! > > > > I looked at tcp_v4_err() and found something strange. Quite non-trivial operations > > are performed on unlocked sockets. It looks like at least this BUG_ON(): > > > > skb = tcp_write_queue_head(sk); > > BUG_ON(!skb); > > > > can be easily triggered. > > > > Do I miss something? > > > > Hi Alexey ! > > I see socket is locked around line 368, > > bh_lock_sock(sk); > /* If too many ICMPs get dropped on busy > * servers this needs to be solved differently. > */ > if (sock_owned_by_user(sk)) > NET_INC_STATS_BH(net, LINUX_MIB_LOCKDROPPEDICMPS); > > > Hmm, maybe some goto is missing ;) > Well, goto is not missing. Why do you think BUG_ON(!skb) can be triggered ? We test before : if (seq != tp->snd_una || !icsk->icsk_retransmits || !icsk->icsk_backoff) break; So a concurrent user only can add new skb(s) in the (non empty) queue ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le vendredi 12 novembre 2010 à 19:21 +0100, Eric Dumazet a écrit : > Le vendredi 12 novembre 2010 à 19:12 +0100, Eric Dumazet a écrit : > > Le vendredi 12 novembre 2010 à 20:57 +0300, Alexey Kuznetsov a écrit : > > > Hello! > > > > > > I looked at tcp_v4_err() and found something strange. Quite non-trivial operations > > > are performed on unlocked sockets. It looks like at least this BUG_ON(): > > > > > > skb = tcp_write_queue_head(sk); > > > BUG_ON(!skb); > > > > > > can be easily triggered. > > > > > > Do I miss something? > > > > > > > Hi Alexey ! > > > > I see socket is locked around line 368, > > > > bh_lock_sock(sk); > > /* If too many ICMPs get dropped on busy > > * servers this needs to be solved differently. > > */ > > if (sock_owned_by_user(sk)) > > NET_INC_STATS_BH(net, LINUX_MIB_LOCKDROPPEDICMPS); > > > > > > Hmm, maybe some goto is missing ;) > > > > Well, goto is not missing. > > Why do you think BUG_ON(!skb) can be triggered ? > > We test before : > > if (seq != tp->snd_una || !icsk->icsk_retransmits || > !icsk->icsk_backoff) > break; > > So a concurrent user only can add new skb(s) in the (non empty) queue ? > > Oh well, it seems you are right (backlog processing) Bug was introduced in commit f1ecd5d9e736660 (Revert Backoff [v3]: Revert RTO on ICMP destination unreachable) from Damian Lukowski -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello! On Fri, Nov 12, 2010 at 07:12:58PM +0100, Eric Dumazet wrote: > I see socket is locked around line 368, > > bh_lock_sock(sk); > /* If too many ICMPs get dropped on busy > * servers this needs to be solved differently. > */ > if (sock_owned_by_user(sk)) > NET_INC_STATS_BH(net, LINUX_MIB_LOCKDROPPEDICMPS); > > > Hmm, maybe some goto is missing ;) It is not missing, sock_owned_by_user() is checked later when some operation which cannot be done without lock is required. It was done to save error in sk_err_soft even when socket is locked. This code also _understands_ this: look at } else if (sock_owned_by_user(sk)) { /* RTO revert clocked out retransmission, * but socket is locked. Will defer. */ inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, HZ/20, TCP_RTO_MAX); but somehow it considers the manipulations with rto/backoff/write_queue as safe. Seems, they are not. Alexey -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello!
> Oh well, it seems you are right (backlog processing)
Exactly.
Alexey
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Le vendredi 12 novembre 2010 à 21:29 +0300, Alexey Kuznetsov a écrit : > Hello! > > On Fri, Nov 12, 2010 at 07:12:58PM +0100, Eric Dumazet wrote: > > I see socket is locked around line 368, > > > > bh_lock_sock(sk); > > /* If too many ICMPs get dropped on busy > > * servers this needs to be solved differently. > > */ > > if (sock_owned_by_user(sk)) > > NET_INC_STATS_BH(net, LINUX_MIB_LOCKDROPPEDICMPS); > > > > > > Hmm, maybe some goto is missing ;) > > It is not missing, sock_owned_by_user() is checked later when some operation which > cannot be done without lock is required. It was done to save error in sk_err_soft even > when socket is locked. > > This code also _understands_ this: look at > > } else if (sock_owned_by_user(sk)) { > /* RTO revert clocked out retransmission, > * but socket is locked. Will defer. */ > inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, > HZ/20, TCP_RTO_MAX); > > but somehow it considers the manipulations with rto/backoff/write_queue as safe. > Seems, they are not. Indeed, right you are, I came to same conclusion. I CC Damian Lukowski in my previous answer (and this one too) Thanks ! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Date: Fri, 12 Nov 2010 20:46:20 +0300 > The only loophole is ICMP error in the same case as yours. In > _violation_ of specs linux immediately aborts unestablished connect > on an icmp error. IMHO that thing which you suggest is correct (of > course, provided you filter out transient errors and react only to > EPERM or something like this). It was not done because it was > expected firewall rule prescribing immediate abort is configured > with "--reject-with icmp-port-unreachable", otherwise the rule > orders real blackhole. The idea to signal on -EPERM might be OK, but if that's also what things like "-m statistical" and friends end up reporting then we still cannot do it. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Nov 12, 2010 at 05:54:29PM +0100, Patrick McHardy wrote: > Am 12.11.2010 17:35, schrieb David Lamparter: > > On Fri, Nov 12, 2010 at 05:15:32PM +0100, Eric Dumazet wrote: > >> Le vendredi 12 novembre 2010 à 11:08 -0500, Eric Paris a écrit : > >> > >>> 2) What should the generic TCP code (tcp_connect()) do if the skb failed > >>> to send. Should it return error codes back up the stack somehow or > >>> should they continue to be ignored? Obviously continuing to just ignore > >>> information we have doesn't make me happy (otherwise I wouldn't have > >>> started scratching this itch). But the point about ENOBUFS is well > >>> taken. Maybe I should make tcp_connect(), or the caller to > >>> tcp_connect() more intelligent about specific error codes? > >>> > >>> I'm looking for a path forward. If SELinux is rejecting the SYN packets > >>> on connect() I want to pass that info to userspace rather than just > >>> hanging. What's the best way to accomplish that? > >>> > >> > >> Eric, if you can differentiate a permanent reject, instead of a > >> temporary one (congestion, or rate limiting, or ENOBUF, or ...), then > >> yes, you could make tcp_connect() report to user the permanent error, > >> and ignore the temporary one. > > Indeed. We could even make the NF_DROP return value configurable > by encoding it in the verdict. > There is no NF_REJECT. Ah, sorry, not at home in netfilter, coming from an user perspective here. > Returning NF_DROP results in -EPERM getting reported back. As Eric > noticed, this is ignored for SYN packets. Hrm. But how do you silently drop packets? This seems counterintuitive or even buggy to me; or at least the netfilter DROP target shouldn't use this kind of error-reporting drop. As food for thought I'd like to pose the following rule: iptables -A OUTPUT -m statistic --mode nth --every 5 -j DROP which should, to my understanding, still allow the connect to complete, even if the first SYN got (silently!...) dropped. Also, i'm *very* sure i was able to trigger a "permission denied" from either firewall or route rules; weirdly enough i can't get that on my 2.6.35.7 router... (poking older boxes to reproduce it right now) Just for reference some test results: (heavily cropped) TL;DR: only tcp-reset and route prohibit work immediately. + telnet 74.125.43.105 80 Connected to 74.125.43.105. + iptables -I OUTPUT -p tcp -d 74.125.43.105 --dport 80 -j REJECT # default w/o reject-with is icmp-port-unreachable + telnet 74.125.43.105 80 telnet: connect to address 74.125.43.105: Connection refused real 0m3.014s + iptables -I OUTPUT -p tcp -d 74.125.43.105 --dport 80 -j REJECT --reject-with tcp-reset + telnet 74.125.43.105 80 telnet: connect to address 74.125.43.105: Connection refused real 0m0.007s + iptables -I OUTPUT -p tcp -d 74.125.43.105 --dport 80 -j REJECT --reject-with host-prohib + telnet 74.125.43.105 80 telnet: connect to address 74.125.43.105: No route to host real 0m3.010s + iptables -I OUTPUT -p tcp -d 74.125.43.105 --dport 80 -j REJECT --reject-with admin-prohib + telnet 74.125.43.105 80 telnet: connect to address 74.125.43.105: No route to host real 0m3.009s + iptables -I OUTPUT -p tcp -d 74.125.43.105 --dport 80 -j REJECT --reject-with net-prohib + telnet 74.125.43.105 80 telnet: connect to address 74.125.43.105: Network is unreachable real 0m3.011s + iptables -F OUTPUT + ip route add prohibit 74.125.43.105 + ip route flush cache + telnet 74.125.43.105 80 telnet: connect to address 74.125.43.105: Network is unreachable real 0m0.007s -David -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: David Lamparter <equinox@diac24.net> Date: Fri, 12 Nov 2010 22:16:27 +0100 > As food for thought I'd like to pose the following rule: > iptables -A OUTPUT -m statistic --mode nth --every 5 -j DROP > which should, to my understanding, still allow the connect to complete, > even if the first SYN got (silently!...) dropped. Yes, I agree and this is pretty much the point I tried to make earlier. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On 11.11.2010 22:58, Hua Zhong wrote: > >> Yes, I realize this is little different than if the > >> SYN was dropped in the first network device, but it is different > >> because we know what happened! We know that connect() call failed > >> and that there isn't anything coming back. > > > > I would argue that -j DROP should behave exactly as the packet is > dropped in the network, while -j REJECT should signal the failure to > the application as soon as possible (which it doesn't seem to do). > > It sends an ICMP error or TCP reset. Interpretation is up to TCP. Huh? It's the OUTPUT chain we are talking about. There is no ICMP error or TCP reset. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 13.11.2010 00:14, Hua Zhong wrote: >> On 11.11.2010 22:58, Hua Zhong wrote: >>>> Yes, I realize this is little different than if the >>>> SYN was dropped in the first network device, but it is different >>>> because we know what happened! We know that connect() call failed >>>> and that there isn't anything coming back. >>> >>> I would argue that -j DROP should behave exactly as the packet is >> dropped in the network, while -j REJECT should signal the failure to >> the application as soon as possible (which it doesn't seem to do). >> >> It sends an ICMP error or TCP reset. Interpretation is up to TCP. > > Huh? It's the OUTPUT chain we are talking about. There is no ICMP error or > TCP reset. Of course there is. ICMP (default): iptables -A OUTPUT -p tcp -j REJECT TCP reset: iptables -A OUTPUT -p tcp -j REJECT --reject-with tcp-reset The second one will cause a hard error for the connection. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2010-11-15 at 11:32 +0100, Patrick McHardy wrote: > On 13.11.2010 00:14, Hua Zhong wrote: > >> On 11.11.2010 22:58, Hua Zhong wrote: > >>>> Yes, I realize this is little different than if the > >>>> SYN was dropped in the first network device, but it is different > >>>> because we know what happened! We know that connect() call failed > >>>> and that there isn't anything coming back. > >>> > >>> I would argue that -j DROP should behave exactly as the packet is > >> dropped in the network, while -j REJECT should signal the failure to > >> the application as soon as possible (which it doesn't seem to do). > >> > >> It sends an ICMP error or TCP reset. Interpretation is up to TCP. > > > > Huh? It's the OUTPUT chain we are talking about. There is no ICMP error or > > TCP reset. > > Of course there is. > > ICMP (default): > > iptables -A OUTPUT -p tcp -j REJECT > > TCP reset: > > iptables -A OUTPUT -p tcp -j REJECT --reject-with tcp-reset > > The second one will cause a hard error for the connection. Well I'm (I guess?) surprised that the --reject-with icmp doesn't do anything with a local outgoing connection but --reject-with tcp-reset does something like what I'm looking for. I notice the heavy lifting for this is done in net/ipv4/netfilter/ipt_REJECT.c::send_rest() (and something very similar for IPv6) I really don't want to duplicate that code into SELinux (for obvious reasons) and I'm wondering if anyone has objections to me making it available outside of netlink and/or suggestions on how to make that code available outside of netfilter (aka what header to expose it, and does it still make logical sense in ipt_REJECT.c or somewhere else?) -Eric -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 15.11.2010 16:47, Eric Paris wrote: > On Mon, 2010-11-15 at 11:32 +0100, Patrick McHardy wrote: >> On 13.11.2010 00:14, Hua Zhong wrote: >>>> On 11.11.2010 22:58, Hua Zhong wrote: >>>>>> Yes, I realize this is little different than if the >>>>>> SYN was dropped in the first network device, but it is different >>>>>> because we know what happened! We know that connect() call failed >>>>>> and that there isn't anything coming back. >>>>> >>>>> I would argue that -j DROP should behave exactly as the packet is >>>> dropped in the network, while -j REJECT should signal the failure to >>>> the application as soon as possible (which it doesn't seem to do). >>>> >>>> It sends an ICMP error or TCP reset. Interpretation is up to TCP. >>> >>> Huh? It's the OUTPUT chain we are talking about. There is no ICMP error or >>> TCP reset. >> >> Of course there is. >> >> ICMP (default): >> >> iptables -A OUTPUT -p tcp -j REJECT >> >> TCP reset: >> >> iptables -A OUTPUT -p tcp -j REJECT --reject-with tcp-reset >> >> The second one will cause a hard error for the connection. > > Well I'm (I guess?) surprised that the --reject-with icmp doesn't do > anything with a local outgoing connection but --reject-with tcp-reset > does something like what I'm looking for. > > I notice the heavy lifting for this is done in > net/ipv4/netfilter/ipt_REJECT.c::send_rest() > (and something very similar for IPv6) > > I really don't want to duplicate that code into SELinux (for obvious > reasons) and I'm wondering if anyone has objections to me making it > available outside of netlink and/or suggestions on how to make that code > available outside of netfilter (aka what header to expose it, and does > it still make logical sense in ipt_REJECT.c or somewhere else?) I don't think having SELinux sending packets to handle local connections is a very elegant design, its not a firewall after all. What's wrong with reacting only to specific errno codes in tcp_connect()? You could f.i. return -ECONNREFUSED from SELinux, that one is pretty much guaranteed not to occur in the network stack itself and can be returned directly. That would need minor changes to nf_hook_slow so we can encode errno values in the upper 16 bits of the verdict, as we already do with the queue number. The added benefit is that we don't have to return EPERM anymore when f.i. rerouting fails. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 15.11.2010 16:57, Patrick McHardy wrote: > On 15.11.2010 16:47, Eric Paris wrote: >>> iptables -A OUTPUT -p tcp -j REJECT --reject-with tcp-reset >>> >>> The second one will cause a hard error for the connection. >> >> Well I'm (I guess?) surprised that the --reject-with icmp doesn't do >> anything with a local outgoing connection but --reject-with tcp-reset >> does something like what I'm looking for. >> >> I notice the heavy lifting for this is done in >> net/ipv4/netfilter/ipt_REJECT.c::send_rest() >> (and something very similar for IPv6) >> >> I really don't want to duplicate that code into SELinux (for obvious >> reasons) and I'm wondering if anyone has objections to me making it >> available outside of netlink and/or suggestions on how to make that code >> available outside of netfilter (aka what header to expose it, and does >> it still make logical sense in ipt_REJECT.c or somewhere else?) > > I don't think having SELinux sending packets to handle local > connections is a very elegant design, its not a firewall after > all. What's wrong with reacting only to specific errno codes > in tcp_connect()? You could f.i. return -ECONNREFUSED from > SELinux, that one is pretty much guaranteed not to occur in > the network stack itself and can be returned directly. One more note: there is also the problem that the RST might never reach the socket, f.i. because netfilter drops it, or TC actions reroute it etc. With netfilter users are expected to make sure the entire combination of network features does what the expect, but that's probably not what you want for SELinux. > That would need minor changes to nf_hook_slow so we can > encode errno values in the upper 16 bits of the verdict, > as we already do with the queue number. The added benefit > is that we don't have to return EPERM anymore when f.i. > rerouting fails. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello! On Mon, Nov 15, 2010 at 10:47:46AM -0500, Eric Paris wrote: > Well I'm (I guess?) surprised that the --reject-with icmp doesn't do > anything with a local outgoing connection Yeah. This was "an obvious surpise": for local socket the first icmp error always arrives on locked socket and gets dropped. I even forgot about this (tcp-reset still works ok due to backlog) This makes the idea to respect error more attractive. Alexey -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index e961522..67b8535 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2592,6 +2592,7 @@ int tcp_connect(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *buff; + int ret; tcp_connect_init(sk); @@ -2614,7 +2615,7 @@ int tcp_connect(struct sock *sk) sk->sk_wmem_queued += buff->truesize; sk_mem_charge(sk, buff->truesize); tp->packets_out += tcp_skb_pcount(buff); - tcp_transmit_skb(sk, buff, 1, sk->sk_allocation); + ret = tcp_transmit_skb(sk, buff, 1, sk->sk_allocation); /* We change tp->snd_nxt after the tcp_transmit_skb() call * in order to make this packet get counted in tcpOutSegs. @@ -2626,7 +2627,7 @@ int tcp_connect(struct sock *sk) /* Timer for repeating the SYN until an answer. */ inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, inet_csk(sk)->icsk_rto, TCP_RTO_MAX); - return 0; + return ret; } EXPORT_SYMBOL(tcp_connect);