Message ID | 4DD38B30.9090601@cn.fujitsu.com |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
Headers | show |
2011/5/18 Wei Yongjun <yjwei@cn.fujitsu.com>: > >> Le mercredi 18 mai 2011 à 10:06 +0200, Jacek Luczak a écrit : >>> 2011/5/18 Eric Dumazet <eric.dumazet@gmail.com>: >>>> If you're removing items from this list, you must be a writer here, with >>>> exclusive access. So rcu_read_lock()/rcu_read_unlock() is not necessary. >>> I could agree to some extend ... but strict RCU section IMO is needed here. >>> I can check this if the issue exists. >>> >> I can tell you for sure rcu_read_lock() is not needed here. Its only >> showing confusion from code's author. >> >> Please read Documentation/RCU/listRCU.txt for concise explanations, >> line 117. >> >> >>>> Therefore, I guess following code is better : >>>> >>>> list_for_each_entry(addr, &bp->address_list, list) { >>>> list_del_rcu(&addr->list); >>>> call_rcu(&addr->rcu, sctp_local_addr_free); >>>> SCTP_DBG_OBJCNT_DEC(addr); >>>> } >>>> >>>> Then, why dont you fix sctp_bind_addr_clean() instead ? >>>> >>>> if 'struct sctp_sockaddr_entry' is recu protected, then all frees should >>>> be protected as well. >>> The _clean() as claimed by Vlad is called many times from various places >>> in code and this could give a overhead. I guess Vlad would need to comment. >> I guess a full review of this code is needed. You'll have to prove >> sctp_bind_addr_clean() is always called after one RCU grace period if >> you want to leave it as is. >> >> You cant get RCU for free, some rules must be followed or you risk >> crashes. >> > > fix the race between sctp_bind_addr_free() and sctp_bind_addr_conflict(), maybe you just > need to remove the socket from the port hash before empty the bind address list. > some thing like this, not test. > > diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c > index c8cc24e..924d846 100644 > --- a/net/sctp/endpointola.c > +++ b/net/sctp/endpointola.c > @@ -268,12 +268,13 @@ static void sctp_endpoint_destroy(struct sctp_endpoint *ep) > > /* Cleanup. */ > sctp_inq_free(&ep->base.inqueue); > - sctp_bind_addr_free(&ep->base.bind_addr); > > /* Remove and free the port */ > if (sctp_sk(ep->base.sk)->bind_hash) > sctp_put_port(ep->base.sk); > > + sctp_bind_addr_free(&ep->base.bind_addr); > + > /* Give up our hold on the sock. */ > if (ep->base.sk) > sock_put(ep->base.sk); > Done tests with that one and looks like it survive the flood. How then this will be handled is up to you. Both ways fix the issue which makes me happy as damn hell. @Eric, if you will take a look into the code you might notice that there are few places where list operations could be optimised and the main question here is do we really care to have the data ,,safe'' so that things like that won't popup. The good example can be the set of _local_ functions. Ahhh... and I'm aware of how tricky can be abuse of RCU. -Jacek -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mercredi 18 mai 2011 à 13:01 +0200, Jacek Luczak a écrit : > @Eric, if you will take a look into the code you might notice > that there are few places where list operations could be > optimised and the main question here is do we really care > to have the data ,,safe'' so that things like that won't popup. > The good example can be the set of _local_ functions. > Ahhh... and I'm aware of how tricky can be abuse of RCU. I took a quick look at existing rcu_read_lock() uses in sctp and did not find other problems [or optimizations if you prefer this point of view]. Please elaborate. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2011/5/18 Eric Dumazet <eric.dumazet@gmail.com>: > Le mercredi 18 mai 2011 à 13:01 +0200, Jacek Luczak a écrit : > >> @Eric, if you will take a look into the code you might notice >> that there are few places where list operations could be >> optimised and the main question here is do we really care >> to have the data ,,safe'' so that things like that won't popup. >> The good example can be the set of _local_ functions. There should be a vertical space here ... >> Ahhh... and I'm aware of how tricky can be abuse of RCU. > > I took a quick look at existing rcu_read_lock() uses in sctp and did not > find other problems [or optimizations if you prefer this point of view]. > Please elaborate. ... so that the last line does not have any connection with what I've wrote above. I get your point Eric so you don't really have to prove it - if you still need. -jacek -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/18/2011 05:02 AM, Wei Yongjun wrote: > fix the race between sctp_bind_addr_free() and sctp_bind_addr_conflict(), maybe you just > need to remove the socket from the port hash before empty the bind address list. > some thing like this, not test. > > diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c > index c8cc24e..924d846 100644 > --- a/net/sctp/endpointola.c > +++ b/net/sctp/endpointola.c > @@ -268,12 +268,13 @@ static void sctp_endpoint_destroy(struct sctp_endpoint *ep) > > /* Cleanup. */ > sctp_inq_free(&ep->base.inqueue); > - sctp_bind_addr_free(&ep->base.bind_addr); > > /* Remove and free the port */ > if (sctp_sk(ep->base.sk)->bind_hash) > sctp_put_port(ep->base.sk); > > + sctp_bind_addr_free(&ep->base.bind_addr); > + > /* Give up our hold on the sock. */ > if (ep->base.sk) > sock_put(ep->base.sk); > > I am not sure that this will guarantee avoidance of this crash, even though it is the right thing to do in general. We simply make the race condition much smaller and much harder to hit. Potentially, one cpu may be doing lookup of the socket while the other is doing the destroy. If the lookup cpu finds the socket just as this code removes the socket from the hash, we still have this potential race condition. I agree with Eric, rcu_read_lock() is not strictly necessary, as what we are really after is call_rcu() based destruction. We need to delay memory destruction for the rcu grace period. Thinking a little more about how bind_addr_clean() is used, it would probably benefit from getting converted to call_rcu(). That function is used as local clean-up in case of malloc failure; however, that doesn't preclude a potential race. The fact that we do not have this race simply points out that we don't have any kind of parallel lookup that can hit it (and the code confirms it). This doesn't mean that we wouldn't have it in the future. -vlad -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c index c8cc24e..924d846 100644 --- a/net/sctp/endpointola.c +++ b/net/sctp/endpointola.c @@ -268,12 +268,13 @@ static void sctp_endpoint_destroy(struct sctp_endpoint *ep) /* Cleanup. */ sctp_inq_free(&ep->base.inqueue); - sctp_bind_addr_free(&ep->base.bind_addr); /* Remove and free the port */ if (sctp_sk(ep->base.sk)->bind_hash) sctp_put_port(ep->base.sk); + sctp_bind_addr_free(&ep->base.bind_addr); + /* Give up our hold on the sock. */ if (ep->base.sk) sock_put(ep->base.sk);