Message ID | 1352247335-10396-1-git-send-email-jwerner@chromium.org |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
On Tue, Nov 06, 2012 at 04:15:35PM -0800, Julius Werner wrote:
> tcp_recvmsg contains a sanity check that WARNs when there is a gap
> between the socket's copied_seq and the first buffer in the
> sk_receive_queue. In theory, the TCP stack makes sure that This Should
> Never Happen (TM)... however, practice shows that there are still a few
> bug reports from it out there (and one in my inbox).
>
> Unfortunately, when it does happen for whatever reason, the situation
> is not handled very well: the kernel logs a warning and breaks out of
> the loop that walks the receive queue. It proceeds to find nothing else
> to do on the socket and hits sk_wait_data, which cannot block because
> the receive queue is not empty. As no data was read, the outer while
> loop repeats (logging the same warning again) ad infinitum until the
> system's syslog exhausts all available hard drive capacity.
>
> This patch improves that behavior by going straight to a proper kernel
> crash. The cause of the error can be identified right away and the
> system's hard drive is not unnecessarily strained.
>
> Signed-off-by: Julius Werner <jwerner@chromium.org>
> ---
>  net/ipv4/tcp.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 197c000..fcb0927 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1628,7 +1628,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
>  				 "recvmsg bug: copied %X seq %X rcvnxt %X fl %X\n",
>  				 *seq, TCP_SKB_CB(skb)->seq, tp->rcv_nxt,
>  				 flags))
> -				break;
> +				BUG();
>
>  			offset = *seq - TCP_SKB_CB(skb)->seq;
>  			if (tcp_hdr(skb)->syn)

We've had reports of this WARN against the Fedora kernel for a while.
Had this been immediately followed by a BUG(), we'd have never seen those traces at all,
and just got "my machine just locked up" reports instead.

The proper fix here is to find out why we're getting into this state.

	Dave

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

> We've had reports of this WARN against the Fedora kernel for a while.
> Had this been immediately followed by a BUG(), we'd have never seen those traces at all,
> and just got "my machine just locked up" reports instead.
>
> The proper fix here is to find out why we're getting into this state.

Are you sure you don't mean the WARN below that ("recvmsg bug 2")
instead? I don't think this one can happen without eventually running
into the syslog overflow issue I described.

I agree that the underlying cause must be fixed too, but as we will
always have bugs in the kernel I think proper handling when it does
happen is also important (and filling the hard disk with junk is
obviously not the best approach). If you think a full panic is too
extreme, I have an alternative version of this patch that logs the
WARN once, closes the socket, and returns EBADFD from the syscall...
would you think that is more appropriate?

On Tue, 2012-11-06 at 20:39 -0500, Dave Jones wrote:
> We've had reports of this WARN against the Fedora kernel for a while.
> Had this been immediately followed by a BUG(), we'd have never seen those traces at all,
> and just got "my machine just locked up" reports instead.
>
> The proper fix here is to find out why we're getting into this state.

Yes, but there is no need to fill syslog over and over.

In fact, some drivers are buggy and can overwrite skbs.

That's also a security issue, as payload can be changed without notice
(unless SSL or application checksums are done, see commit
abf02cfc179bb4bd for an example).

Quite frankly BUG_ON() here is the only way we can fix bugs instead of
being lazy.

On Tue, Nov 06, 2012 at 05:51:19PM -0800, Julius Werner wrote:
> > We've had reports of this WARN against the Fedora kernel for a while.
> > Had this been immediately followed by a BUG(), we'd have never seen those traces at all,
> > and just got "my machine just locked up" reports instead.
> >
> > The proper fix here is to find out why we're getting into this state.
>
> Are you sure you don't mean the WARN below that ("recvmsg bug 2")
> instead? I don't think this one can happen without eventually running
> into the syslog overflow issue I described.

bug2 is more common (and usually is accompanied by mangled traces),
but we have reports of the first WARN too:

https://bugzilla.redhat.com/show_bug.cgi?id=841769
https://bugzilla.redhat.com/show_bug.cgi?id=845853
https://bugzilla.redhat.com/show_bug.cgi?id=846991
https://bugzilla.redhat.com/show_bug.cgi?id=860039

(I note that none of these reports mention "also, my hard disk is now full")

> I agree that the underlying cause must be fixed too, but as we will
> always have bugs in the kernel I think proper handling when it does
> happen is also important (and filling the hard disk with junk is
> obviously not the best approach). If you think a full panic is too
> extreme, I have an alternative version of this patch that logs the
> WARN once, closes the socket, and returns EBADFD from the syscall...
> would you think that is more appropriate?

It sounds more appropriate to me, instead of silently wedging the box.
At least with that approach we have a chance of finding out what happened.

	Dave

On Wed, 2012-11-07 at 10:54 -0500, Dave Jones wrote:
> It sounds more appropriate to me, instead of silently wedging the box.
> At least with that approach we have a chance of finding out what happened.

It's quite the opposite.

If the bug is still there 6 months after the commits that broke the
drivers (making an old bug visible), that means that people never
realized the bug was there.

I understand a distro maintainer has its own choices, but for the
upstream kernel we want to have early reports.

This bug is fatal and a security issue. BUG() is appropriate.

If the driver can't be fixed, it should be marked broken.

So I personally NACKed the patch to hide the bug, trying to be friendly
to the user.

On Wed, Nov 07, 2012 at 08:29:12AM -0800, Eric Dumazet wrote:
> On Wed, 2012-11-07 at 10:54 -0500, Dave Jones wrote:
>
> > It sounds more appropriate to me, instead of silently wedging the box.
> > At least with that approach we have a chance of finding out what happened.
>
> It's quite the opposite.
>
> If the bug is still there 6 months after the commits that broke the
> drivers (making an old bug visible), that means that people never
> realized the bug was there.

dude, look at the bug reports I just pointed you at.
People _are_ aware there are bugs there.

If you turn that into a BUG() those reports would never have been filed.
How is that increasing awareness? People are going to see wedged computers,
and hit the reset button. If we're lucky, we'll get photos of someone lucky
enough to have hit it while at the console, not in X. But this is a huge
step backwards for debuggability.

> I understand a distro maintainer has its own choices, but for the
> upstream kernel we want to have early reports.

I'm running out of ways to word this, but I'll try again.
You won't get those early reports if you turn this into a BUG().

> This bug is fatal and a security issue. BUG() is appropriate.

turning a bug into a remote DoS is also a security issue.

	Dave

On Wed, 2012-11-07 at 11:43 -0500, Dave Jones wrote:
> dude, look at the bug reports I just pointed you at.
> People _are_ aware there are bugs there.

If I remember well, I helped to fix some of them.

> If you turn that into a BUG() those reports would never have been filed.
> How is that increasing awareness? People are going to see wedged computers,
> and hit the reset button. If we're lucky, we'll get photos of someone lucky
> enough to have hit it while at the console, not in X. But this is a huge
> step backwards for debuggability.
>
> > I understand a distro maintainer has its own choices, but for upstream
> > kernel we want to have early reports.
>
> I'm running out of ways to word this, but I'll try again.
> You won't get those early reports if you turn this into a BUG().
>
> > This bug is fatal and a security issue. BUG() is appropriate.
>
> turning a bug into a remote DoS is also a security issue.

Apparently in some cases we can loop and fill the syslog, or else
Julius wouldn't have sent a patch.

So the proper fix is to emit this message only once, and to find
a way to alert the user that security is compromised.

So if BUG() isn't good, just use WARN_ON_ONCE().

I feel that WARN_ON_ONCE() won't be clear enough to the user, especially
if we recover from this by closing the tcp session, exactly as if we
received a proper FIN.

Really, if you object to a BUG() here, I can't understand why you didn't
shout about other BUG() uses in the kernel.

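The warn-once behaviour discussed in this message can be illustrated outside the kernel. What follows is only a minimal userspace sketch, not kernel code: `warn_once`, `warn_lines` and the caller-supplied flag are hypothetical names made up for the illustration. Like the kernel's WARN_ON_ONCE(), the helper reports a condition the first time it is true but still returns the condition on every call, so the caller can keep taking its recovery path without flooding the log.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counter of messages actually written, for illustration. */
static unsigned long warn_lines;

/*
 * Report `cond` at most once per call site.  The caller owns one
 * `already_warned` flag per call site.  The condition itself is always
 * returned so the caller can still react every time it is true.
 */
static bool warn_once(bool cond, bool *already_warned, const char *msg)
{
    if (cond && !*already_warned) {
        *already_warned = true;
        fprintf(stderr, "WARNING: %s\n", msg);
        warn_lines++;
    }
    return cond;
}
```

Calling `warn_once(true, &flag, "recvmsg bug")` repeatedly with the same flag writes a single line, which is the property that would keep the syslog from filling up. As noted in the message above, it does nothing by itself to tell the user that something is seriously wrong.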
On Wed, Nov 07, 2012 at 09:05:02AM -0800, Eric Dumazet wrote:
> On Wed, 2012-11-07 at 11:43 -0500, Dave Jones wrote:
>
> > dude, look at the bug reports I just pointed you at.
> > People _are_ aware there are bugs there.
> >
> If I remember well, I helped to fix some of them.

indeed, and I commend you for it. I want to help you fix more ;)

> > > I understand a distro maintainer has its own choices, but for upstream
> > > kernel we want to have early reports.
> >
> > I'm running out of ways to word this, but I'll try again.
> > You won't get those early reports if you turn this into a BUG().
> >
> > > This bug is fatal and a security issue. BUG() is appropriate.
> >
> > turning a bug into a remote DoS is also a security issue.
>
> Apparently in some cases we can loop and fill the syslog, or
> else Julius wouldn't have sent a patch.
>
> So the proper fix is to emit this message only once, and to find
> a way to alert the user that security is compromised.
>
> So if BUG() isn't good, just use WARN_ON_ONCE().
>
> I feel that WARN_ON_ONCE() won't be clear enough to the user, especially
> if we recover from this by closing the tcp session, exactly as if we
> received a proper FIN.

Judging by the mangled traces we've seen, further reports after the initial
one aren't too useful anyway.

Automated detectors like abrt should be able to pick up these traces from
the logs on the next reboot. (Which would probably be better than it trying
to file them immediately over the network when the tcp layer is so confused)

sidenote: If the integrity of the tcp layer is in question, maybe some kind of
localised version of BUG() that just shuts down that subsystem might be
something worth pursuing.

> Really, if you object to a BUG() here, I can't understand why you didn't
> shout about other BUG() uses in the kernel.

When I see them, I call them. But I am just one person, and usage of that
macro is like a disease.

	Dave

I tend to agree with Dave that it's not in the user's best interest to
have a full-on BUG() here, and that we can get our reports just as well
by fishing them from the log through abrt or something similar. I will
just submit my alternative patch too and let you decide which one you
prefer.

This version shuts down the socket, so the broken receive queue will not
be used again and will eventually be freed. Other sockets and the system
as a whole will stay usable and probably still work if the bug is a very
rare coincidence. Of course, the driver will still be buggy, but the same
would stay true after a reboot (which is what most people do after a
panic). The userland caller gets an unexpected error code, which is not
the same as receiving a proper FIN and is the only thing we can do to
communicate this.

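The alternative patch itself is not part of this page, so the recovery it describes (warn once, shut the socket down, return EBADFD) can only be illustrated with a hedged userspace model. Everything in the sketch below is hypothetical: `struct sketch_sock`, `sketch_recv` and the single-buffer "queue" are simplifications invented here, and the fallback definition of EBADFD (77, its value on Linux) is only for platforms whose errno.h lacks it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#ifndef EBADFD
#define EBADFD 77 /* Linux value; fallback for platforms that lack it */
#endif

/* Hypothetical, heavily simplified socket: one queued buffer only. */
struct sketch_sock {
    uint32_t copied_seq; /* next sequence number the reader expects */
    uint32_t head_seq;   /* first sequence number in the queued buffer */
    uint32_t head_len;   /* length of the queued buffer */
    bool shut_down;      /* set once the inconsistency was detected */
    unsigned warnings;   /* number of warnings logged for this socket */
};

/* Returns bytes consumed, or -EBADFD once the queue is found inconsistent. */
static long sketch_recv(struct sketch_sock *sk, uint32_t want)
{
    uint32_t offset, avail, n;

    if (sk->shut_down)
        return -EBADFD; /* never touch the broken queue again */

    if ((int32_t)(sk->copied_seq - sk->head_seq) < 0) {
        /* Gap before the first buffer: log once, then shut down. */
        sk->warnings++;
        fprintf(stderr, "recvmsg bug: copied %X seq %X\n",
                (unsigned)sk->copied_seq, (unsigned)sk->head_seq);
        sk->shut_down = true;
        return -EBADFD;
    }

    offset = sk->copied_seq - sk->head_seq;
    avail = offset < sk->head_len ? sk->head_len - offset : 0;
    n = want < avail ? want : avail;
    sk->copied_seq += n;
    return (long)n;
}
```

Because the socket is marked as shut down on the first detection, the warning appears once per socket and every later call fails immediately. The caller sees an error rather than a clean end of stream, which matches the distinction from "a proper FIN" drawn in the message above.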
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 197c000..fcb0927 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1628,7 +1628,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 				 "recvmsg bug: copied %X seq %X rcvnxt %X fl %X\n",
 				 *seq, TCP_SKB_CB(skb)->seq, tp->rcv_nxt,
 				 flags))
-				break;
+				BUG();
 
 			offset = *seq - TCP_SKB_CB(skb)->seq;
 			if (tcp_hdr(skb)->syn)

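The hunk only shows the tail of the sanity check and the `offset` computation that follows it. As a hedged illustration of what is being compared, here is a small userspace sketch of a wraparound-safe sequence check. `seq_before` and `recv_offset` are names invented for this sketch (the first is modelled on the kernel's `before()` helper, which compares 32-bit sequence numbers through a signed difference), and the cast assumes the usual two's-complement conversion.

```c
#include <stdint.h>

/*
 * Wraparound-safe "a comes before b" on 32-bit sequence numbers:
 * the unsigned difference is reinterpreted as a signed value.
 */
static int seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/*
 * Returns 1 and stores the offset into the buffer when copied_seq lies
 * at or after the start of the buffer.  Returns 0 when copied_seq is
 * before the buffer start, i.e. there is a gap (the case the WARN in
 * the patch reports).
 */
static int recv_offset(uint32_t copied_seq, uint32_t skb_seq, uint32_t *offset)
{
    if (seq_before(copied_seq, skb_seq))
        return 0;
    *offset = copied_seq - skb_seq;
    return 1;
}
```

The unsigned subtraction keeps the offset correct across the 2^32 wrap: a reader at sequence 3 and a buffer starting at 0xFFFFFFFE give an offset of 5, not a gap.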
tcp_recvmsg contains a sanity check that WARNs when there is a gap
between the socket's copied_seq and the first buffer in the
sk_receive_queue. In theory, the TCP stack makes sure that This Should
Never Happen (TM)... however, practice shows that there are still a few
bug reports from it out there (and one in my inbox).

Unfortunately, when it does happen for whatever reason, the situation
is not handled very well: the kernel logs a warning and breaks out of
the loop that walks the receive queue. It proceeds to find nothing else
to do on the socket and hits sk_wait_data, which cannot block because
the receive queue is not empty. As no data was read, the outer while
loop repeats (logging the same warning again) ad infinitum until the
system's syslog exhausts all available hard drive capacity.

This patch improves that behavior by going straight to a proper kernel
crash. The cause of the error can be identified right away and the
system's hard drive is not unnecessarily strained.

Signed-off-by: Julius Werner <jwerner@chromium.org>
---
 net/ipv4/tcp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
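
The loop described in the commit message can be modelled in userspace. The sketch below is an assumption-laden toy, not the kernel's tcp_recvmsg: `struct fake_skb`, `struct fake_sock`, `fake_recvmsg` and the `max_rounds` cap are all hypothetical, and blocking in sk_wait_data is reduced to "stop when the queue is empty, otherwise return immediately". It only shows why a gap at the head of a non-empty queue makes the outer loop spin and log on every pass.

```c
#include <stddef.h>
#include <stdint.h>

struct fake_skb {
    uint32_t seq; /* first sequence number carried by the buffer */
    uint32_t len; /* number of payload bytes */
};

struct fake_sock {
    uint32_t copied_seq;          /* next sequence number to hand out */
    const struct fake_skb *queue; /* receive queue, in order */
    size_t queue_len;
    unsigned long warnings;       /* how often the sanity check fired */
};

/* Returns bytes "copied"; gives up after max_rounds outer iterations. */
static uint32_t fake_recvmsg(struct fake_sock *sk, uint32_t want,
                             unsigned long max_rounds)
{
    uint32_t copied = 0;
    size_t head = 0;
    unsigned long round;

    for (round = 0; copied < want && round < max_rounds; round++) {
        while (head < sk->queue_len) {
            const struct fake_skb *skb = &sk->queue[head];
            uint32_t offset, n;

            if ((int32_t)(sk->copied_seq - skb->seq) < 0) {
                sk->warnings++; /* the WARN: logged on every pass */
                break;          /* pre-patch behaviour: leave the walk */
            }
            offset = sk->copied_seq - skb->seq;
            if (offset < skb->len) {
                n = skb->len - offset;
                if (n > want - copied)
                    n = want - copied;
                sk->copied_seq += n;
                copied += n;
                if (copied == want)
                    break;
            }
            if (sk->copied_seq - skb->seq >= skb->len)
                head++; /* buffer fully consumed */
        }
        if (head == sk->queue_len)
            break; /* empty queue: the real code could block here */
        /* Non-empty queue: the wait returns at once, so loop again. */
    }
    return copied;
}
```

With a healthy queue the function returns the requested byte count and never warns. With a single buffer that starts beyond `copied_seq`, it copies nothing and the warning counter equals `max_rounds`, which stands in for the unbounded repetition that fills the syslog.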