Message ID | Pine.LNX.4.64.0905180958150.3381@wrl-59.cs.helsinki.fi |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
Am 18.05.2009, 09:24 Uhr, schrieb Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>: > On Sun, 17 May 2009, David Miller wrote: > >> From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> >> Date: Mon, 11 May 2009 09:32:34 +0300 (EEST) >> >> > [PATCH v2] tcp: fix MSG_PEEK race check >> > >> > Commit 518a09ef11 (tcp: Fix recvmsg MSG_PEEK influence of >> > blocking behavior) lets the loop run longer than the race check >> > did previously expect, so we need to be more careful with this >> > check and consider the work we have been doing. >> > >> > I tried my best to deal with urg hole madness too which happens >> > here: >> > if (!sock_flag(sk, SOCK_URGINLINE)) { >> > ++*seq; >> > ... >> > by using additional offset by one but I certainly have very >> > little interest in testing that part. >> > >> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> >> > Tested-by: Frans Pop <elendil@planet.nl> >> >> Ok, now that I've looked at this, the urg_hole part of this change has >> to be removed. > > Thanks for taking a look... :-) The first patch btw was with RFC and > without urg bits btw but you put that into some discarded like(?) state > in patchwork... :-/ WRT the earlier patch, we have one more success report from one of the users who reported the problem, namely Ian Zimmermann: <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=513695#155>
From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> Date: Mon, 18 May 2009 10:24:23 +0300 (EEST) > Sorry, I'm not sure you thought this fully through here. What my patch > with urg_hole does is that it keeps peek_seq non-advancing using the > offsets. Now without the urg offsetting, the peek_seq side of the race > check advances, which means that we didn't catch the race when it happened > for real as copied_seq advanced too?!? I see now what the situation is, thanks for explaining. You're account for the "*seq" advance for URG that happens in tcp_recvmsg() rather than the one that happens in tcp_check_urg(). > From another perspective when the race didn't happen but potential for it > existed, the current check should catch that since peek_seq advanced and > copied_seq stayed where it was. But that certainly doesn't match to your > description above. Right. > I wonder why all this fuzz about this particular race as we will do our > best anyway to skip the hole on MSG_PEEK side (if I read the code right)? > Either we never see the hole in MSG_PEEK side, or we're happily past it > already, does it make any difference anymore in the latter case? <insert > some "not that I fully understand all of that multipage function" > disclaimer here though, I may think too simple scenarios here>. Not that > I'm too interested in "improving" urg or so anywa, I'm just curious... :-) A long long time ago we had an assertion here checking whether ->copied_seq moved without our knowledge. Alexey and I found this could trigger and investigation of that is what helped us find the tcp_check_urg()+MSG_PEEK case. That's when we added this test and log message. When the MSG_PEEK test existing in the !copied if() branch of tcp_recvmsg(), so many of these code paths we are dealing with in this fix could simply not be reached when MSG_PEEK. That ++*seq could never happen, because if "copied" was non-zero and MSG_PEEK was true we would leave the loop and return from tcp_recvmsg() immediately. Now, we have to accomodate those adjustments. Since I now understand your urg_hole code I'm going to apply your v2 patch which takes care of the URG stuff. I also buy the argument that perhaps we should get rid of the log message, but look at how it helped us find this kernel regression :-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 18 May 2009, David Miller wrote: > From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> > Date: Mon, 18 May 2009 10:24:23 +0300 (EEST) > > > Sorry, I'm not sure you thought this fully through here. What my patch > > with urg_hole does is that it keeps peek_seq non-advancing using the > > offsets. Now without the urg offsetting, the peek_seq side of the race > > check advances, which means that we didn't catch the race when it happened > > for real as copied_seq advanced too?!? > > I see now what the situation is, thanks for explaining. > > You're account for the "*seq" advance for URG that happens in > tcp_recvmsg() rather than the one that happens in tcp_check_urg(). Right. [I've moved the next fragment here from below...] > When the MSG_PEEK test existing in the !copied if() branch of > tcp_recvmsg(), so many of these code paths we are dealing with in this > fix could simply not be reached when MSG_PEEK. That ++*seq could > never happen, because if "copied" was non-zero and MSG_PEEK was true > we would leave the loop and return from tcp_recvmsg() immediately. > > Now, we have to accomodate those adjustments. ...I thought I wrote something along those lines in the log message (though I failed to make it so dead obvious as you do it here :-)). > > From another perspective when the race didn't happen but potential for it > > existed, the current check should catch that since peek_seq advanced and > > copied_seq stayed where it was. But that certainly doesn't match to your > > description above. > > Right. [...more moving of fragments here...] > Since I now understand your urg_hole code I'm going to apply your v2 > patch which takes care of the URG stuff. I wonder if you realized, that after my change the check _no longer_ catches this case where the potential exists but race didn't happen for real? (bleh, I realized while writing the previous mail that this might be misinterpreted but I was lazy enough to not fix that :-() If we want to "forbid" urgs during msg_peek (ie., unconditionally warn), we could just change the check to do || urg_hole and that would catch both cases, or even better, just clone it into if (MSG_PEEK) under urg hole branch. The latter would even allow us to distinguish between the cases => more intelligent message can then be given in the urg case but that of course needs first the decision that it's something we must check for at all. > > I wonder why all this fuzz about this particular race as we will do our > > best anyway to skip the hole on MSG_PEEK side (if I read the code right)? > > Either we never see the hole in MSG_PEEK side, or we're happily past it > > already, does it make any difference anymore in the latter case? <insert > > some "not that I fully understand all of that multipage function" > > disclaimer here though, I may think too simple scenarios here>. Not that > > I'm too interested in "improving" urg or so anywa, I'm just curious... :-) > > A long long time ago we had an assertion here checking whether > ->copied_seq moved without our knowledge. Alexey and I found this > could trigger and investigation of that is what helped us > find the tcp_check_urg()+MSG_PEEK case. That's when we added this > test and log message. Sure, the copied_seq is moving, but what I'm after is that does it really make any difference from tcp_recvmsg point of view? It certainly triggers the message but that won't work as a proof of the evilness for me. ...The above paragraph is assuming recvmsg is able to deal with that and doesn't choke because of the changing copied_seq. I'd have to audit it once again to verify that it's really ok but I don't see any particular reason why it couldn't be possible to make recvmsg to not care on the tcp_check_urg side copied_seq changes but I'd like to hear a clear confirmation on that from you too. > I also buy the argument that perhaps we should get rid of the log message, I'm not against the log message here. It's still useful for detecting real userspace races but the urg vs. MSG_PEEK case is what I'm not so sure is as evil in itself as claimed. I'm not sure if you meant the latter? > but look at how it helped us find this kernel regression :-) Heh, the regression was quite devastating yeah... ...limited to that check itself :-). However, I had to audit rest of the loop too to check for other similar problems, so it certainly had more potential that just spurious printout (luckily I didn't find any other problem like this).
From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> Date: Tue, 19 May 2009 07:33:02 +0300 (EEST) > Sure, the copied_seq is moving, but what I'm after is that does it really > make any difference from tcp_recvmsg point of view? It certainly triggers > the message but that won't work as a proof of the evilness for me. > > ...The above paragraph is assuming recvmsg is able to deal with that and > doesn't choke because of the changing copied_seq. I'd have to audit it > once again to verify that it's really ok but I don't see any particular > reason why it couldn't be possible to make recvmsg to not care on the > tcp_check_urg side copied_seq changes but I'd like to hear a clear > confirmation on that from you too. recvmsg can deal with it fine, because we reset the peek_seq when we print out that message. There is no way that an application writer has any clue about this interaction, where peeked bytes disappear and then reappear in the out-of-band URG byte. That's why the message is there. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller schrieb: > Since I now understand your urg_hole code I'm going to apply your v2 > patch which takes care of the URG stuff. I also buy the argument that > perhaps we should get rid of the log message, but look at how it > helped us find this kernel regression :-) Hi, ...and this discovery is a reason to leave it in, and perhaps make sure that the check code is properly linked (through comments for lack of better means) to the actual data transfer functions. Cheers MA -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 1d7f49c..ccbd69b 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1532,7 +1532,7 @@ do_prequeue: } } } - if ((flags & MSG_PEEK) && peek_seq != tp->copied_seq) { + if ((flags & MSG_PEEK) && (peek_seq - copied != tp->copied_seq)) { if (net_ratelimit()) printk(KERN_DEBUG "TCP(%s:%d): Application bug, race in MSG_PEEK.\n", current->comm, task_pid_nr(current));