Message ID | 1270030626-16687-5-git-send-email-timo.teras@iki.fi |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
On Wed, Mar 31, 2010 at 01:17:05PM +0300, Timo Teras wrote: > All of the code considers ->dead as a hint that the cached policy > needs to get refreshed. The read side can just drop the read lock > without any side effects. > > The write side needs to make sure that it's written only exactly > once. Only possible race is at xfrm_policy_kill(). This is fixed > by checking result of __xfrm_policy_unlink() when needed. It will > always succeed if the policy object is looked up from the hash > list (so some checks are removed), but it needs to be checked if > we are trying to unlink policy via a reference (appropriate > checks added). > > Since policy->walk.dead is written exactly once, it no longer > needs to be protected with a write lock. > > Signed-off-by: Timo Teras <timo.teras@iki.fi> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> > @@ -794,10 +783,9 @@ int xfrm_policy_flush(struct net *net, u8 type, struct xfrm_audit *audit_info) > &net->xfrm.policy_inexact[dir], bydst) { > if (pol->type != type) > continue; > - dp = __xfrm_policy_unlink(pol, dir); > + __xfrm_policy_unlink(pol, dir); > write_unlock_bh(&xfrm_policy_lock); > - if (dp) > - cnt++; > + cnt++; > > xfrm_audit_policy_delete(pol, 1, audit_info->loginuid, > audit_info->sessionid, I was intrigued to find that my local source never had this dp stuff. It seems that this was added recently: commit 2f1eb65f366b81aa3c22c31e6e8db26168777ec5 Author: Jamal Hadi Salim <hadi@cyberus.ca> Date: Fri Feb 19 02:00:42 2010 +0000 xfrm: Flushing empty SPD generates false events To see the effect make sure you have an empty SPD. On window1 "ip xfrm mon" and on window2 issue "ip xfrm policy flush" You get prompt back in window2 and you see the flush event on window1. With this fix, you still get prompt on window1 but no event on window2. Thanks to Alexey Dobriyan for finding a bug in earlier version when using pfkey to do the flushing. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net> This seems to be bogus to me. Just because the DB was empty before the flush doesn't mean that the flush didn't happen. We should revert this patch in its entirety. Thanks,
On Wed, 2010-03-31 at 19:03 +0800, Herbert Xu wrote: > This seems to be bogus to me. Just because the DB was empty > before the flush doesn't mean that the flush didn't happen. Herbert, If a tree falls in a forest and no one is around to hear it, does it make a sound? ;-> A flush event is meant to be a signal to user space that what was once a non-empty table is now empty. This is a consistent definition of the semantics everywhere tables are flushed (not just in Linux).. What makes the SPD and SAD speacial? cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 31, 2010 at 09:06:13AM -0400, jamal wrote: > On Wed, 2010-03-31 at 19:03 +0800, Herbert Xu wrote: > > > This seems to be bogus to me. Just because the DB was empty > > before the flush doesn't mean that the flush didn't happen. > > Herbert, If a tree falls in a forest and no one is around to hear it, > does it make a sound? ;-> > A flush event is meant to be a signal to user space that what > was once a non-empty table is now empty. I disagree. A flush event is a signal that someone has sent a flush command. In any case we've had this semantics for years and I haven't heard a good reason why this should be changed. > This is a consistent definition of the semantics everywhere tables > are flushed (not just in Linux).. Please give specific examples in the kernel. Cheers,
On Wed, Mar 31, 2010 at 09:11:31PM +0800, Herbert Xu wrote: > > > This is a consistent definition of the semantics everywhere tables > > are flushed (not just in Linux).. > > Please give specific examples in the kernel. In fact the previous behaviour is also consistent with RFC2367: 3.1.9 SADB_FLUSH The SADB_FLUSH message causes the kernel to delete all entries in its key table for a certain sadb_msg_satype. Only the base header is required for a flush message. If sadb_msg_satype is filled in with a specific value, only associations of that type are deleted. If it is filled in with SADB_SATYPE_UNSPEC, ALL associations are deleted. The messaging behavior for SADB_FLUSH is: Send an SADB_FLUSH message from a user process to the kernel. <base> The kernel will return an SADB_FLUSH message to all listening sockets. <base> The reply message happens only after the actual flushing of security associations has been attempted. There is no special treatment for an empty DB. Please send a revert. Thanks,
On Wed, 2010-03-31 at 21:11 +0800, Herbert Xu wrote: > I disagree. A flush event is a signal that someone has sent a > flush command. Sorry - I respectfully disagree. > In any case we've had this semantics for years > and I haven't heard a good reason why this should be changed. It generates unnecessary noise and it is a deviation like i mentioned. > > This is a consistent definition of the semantics everywhere tables > > are flushed (not just in Linux).. > > Please give specific examples in the kernel. Something i can do safely right now without messing my connection; Issue iproute commands in one window, observe events in another -sudo ip route add 192.168.11.100 dev eth0 table 15 generates an event -sudo ip route flush table 15 generates an event -sudo ip route flush table 15 No event But pick anything else in the other netlink knowledgeable subsystem and youd see similar behavior. If there was an app depending on this behavior - thats a separate reason (but thats not the arguement you are making). cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2010-03-31 at 21:26 +0800, Herbert Xu wrote:
> In fact the previous behaviour is also consistent with RFC2367:
I did not touch pfkey. That behavior remains there.
cheers,
jamal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2010-03-31 at 09:32 -0400, jamal wrote:
> I did not touch pfkey. That behavior remains there.
Sorry - I lied. I did touch pfkey. Here was my reasoning.
RFC 2367 says flushing behavior should be:
1) user space -> kernel: flush
2) kernel: flush
3) kernel -> user space: flush event to ALL listeners
This is not realistic today in the presence of selinux policies
which may reject the flush etc. So we make the sequence become:
1) user space -> kernel: flush
2) kernel: flush
3) kernel -> user space: flush response to originater from #1
4) if there were no errors then:
kernel -> user space: flush event to ALL listeners
This was in the logs.
cheers,
jamal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2010-03-31 at 09:39 -0400, jamal wrote: > > On Wed, 2010-03-31 at 09:32 -0400, jamal wrote: > > > I did not touch pfkey. That behavior remains there. > > Sorry - I lied. I did touch pfkey. Here was my reasoning. And what I meant by not touching is that "the behavior there remains as it was before" cheers, jamal > RFC 2367 says flushing behavior should be: > 1) user space -> kernel: flush > 2) kernel: flush > 3) kernel -> user space: flush event to ALL listeners > > This is not realistic today in the presence of selinux policies > which may reject the flush etc. So we make the sequence become: > 1) user space -> kernel: flush > 2) kernel: flush > 3) kernel -> user space: flush response to originater from #1 > 4) if there were no errors then: > kernel -> user space: flush event to ALL listeners > > This was in the logs. > > cheers, > jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 31, 2010 at 09:28:12AM -0400, jamal wrote: > > -sudo ip route add 192.168.11.100 dev eth0 table 15 > generates an event > -sudo ip route flush table 15 > generates an event > -sudo ip route flush table 15 > No event That's completely different. We don't have a route flush event, instead we're sending route delete events. That's why when the table is empty you get no events. If we had a route flush event then it would behave exactly the same. BTW you've also made xfrm_state_flush inconsistent with respect to xfrm_policy_flush. Dave, please revert this patch. Thanks,
On Wed, Mar 31, 2010 at 09:39:55AM -0400, jamal wrote: > > This is not realistic today in the presence of selinux policies > which may reject the flush etc. So we make the sequence become: > 1) user space -> kernel: flush > 2) kernel: flush > 3) kernel -> user space: flush response to originater from #1 > 4) if there were no errors then: > kernel -> user space: flush event to ALL listeners Eliding the notification if SELinux says so is fine, but eliding it because the table is empty is wrong. The flush did not fail just because the table was empty to begin with. Cheers,
On Wed, Mar 31, 2010 at 09:41:23AM -0400, jamal wrote: > > On Wed, 2010-03-31 at 09:39 -0400, jamal wrote: > > > > On Wed, 2010-03-31 at 09:32 -0400, jamal wrote: > > > > > I did not touch pfkey. That behavior remains there. > > > > Sorry - I lied. I did touch pfkey. Here was my reasoning. > > And what I meant by not touching is that "the behavior there > remains as it was before" No you've changed it. PF_KEY will no longer notify if the policy table is empty. This is inconsistent with the behaviour of SADB flushes, and the spirit of RFC2367. Cheers,
On Wed, 2010-03-31 at 21:55 +0800, Herbert Xu wrote: > Eliding the notification if SELinux says so is fine, but eliding > it because the table is empty is wrong. > > The flush did not fail just because the table was empty to begin > with. Like i said i didnt touch the behavior except for the selinux case (which sounds very reasonable). I believe there maybe historical legacy reasons for that semantic in pfkey. Can you point to something in the kernel (or anywhere else) that behaves like this on table flushing? Actually if there was an app that depended on netlink flush being exposed on empty table - then i think theres reason for a revert. Other than that i will say again: i respectfully disagree. cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2010-03-31 at 21:56 +0800, Herbert Xu wrote: > > No you've changed it. PF_KEY will no longer notify if the policy > table is empty. > > This is inconsistent with the behaviour of SADB flushes, and the > spirit of RFC2367. I did not mean to change it for pfkey. I do believe there are apps that need it. I will run some tests and if it breaks - I will send a patch. cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 31, 2010 at 10:12:48AM -0400, jamal wrote: > On Wed, 2010-03-31 at 21:55 +0800, Herbert Xu wrote: > > > Eliding the notification if SELinux says so is fine, but eliding > > it because the table is empty is wrong. > > > > The flush did not fail just because the table was empty to begin > > with. > > Like i said i didnt touch the behavior except for the selinux case > (which sounds very reasonable). I believe there maybe historical legacy > reasons for that semantic in pfkey. > Can you point to something in the kernel (or anywhere else) that behaves > like this on table flushing? Actually if there was an app that depended > on netlink flush being exposed on empty table - then i think theres > reason for a revert. > Other than that i will say again: i respectfully disagree. OK I give up. Dave can keep or revert this as he likes. Cheers,
On Wed, 2010-03-31 at 22:15 +0800, Herbert Xu wrote: > > OK I give up. > > Dave can keep or revert this as he likes. Ok - I will await to see what Dave thinks before i bother doing whatever setup. What we have here is a philosophical difference. Herbert note: i realize this is unfair for me to say given you were likely very busy: I respect your opinion and knowledge and did CC you on all the many mails when i was RFCing this and when i sent the patches. cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 31, 2010 at 10:24:38AM -0400, jamal wrote: > On Wed, 2010-03-31 at 22:15 +0800, Herbert Xu wrote: > > > > > OK I give up. > > > > Dave can keep or revert this as he likes. > > Ok - I will await to see what Dave thinks before i bother doing whatever > setup. What we have here is a philosophical difference. > Herbert note: i realize this is unfair for me to say given you were > likely very busy: I respect your opinion and knowledge and did CC you on > all the many mails when i was RFCing this and when i sent the patches. Yes you did cc me. Unfortunately I was rather busy with some other stuff at the time. Had I read your patch back then I would've said something :) Cheers,
On Wed, 2010-03-31 at 22:29 +0800, Herbert Xu wrote: > Yes you did cc me. Unfortunately I was rather busy with some other > stuff at the time. Had I read your patch back then I would've said > something :) makes sense ;-> And it wouldnt have been like the first (or the last) time you and i have had this heated discussions;-> Lets just wait to hear what Dave says - I dont want make his life more difficult than it is; so no hard feeling if he decides to revert. cheers, jamal PS:- since you woke me up and i know you are awake, i am going to get rid of a jetlag by sending probably another controversial patch. Hmmm.. where to start. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Herbert Xu wrote: > On Wed, Mar 31, 2010 at 09:28:12AM -0400, jamal wrote: >> -sudo ip route add 192.168.11.100 dev eth0 table 15 >> generates an event >> -sudo ip route flush table 15 >> generates an event >> -sudo ip route flush table 15 >> No event > > That's completely different. We don't have a route flush event, > instead we're sending route delete events. That's why when the > table is empty you get no events. > > If we had a route flush event then it would behave exactly the > same. I was just about to say the exact same thing when I noticed your email. I agree with Herbert, the flush notification indicates that the table is now empty, independant of its previous state. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2010-03-31 at 18:41 +0200, Patrick McHardy wrote: > I agree with Herbert, the flush notification indicates that > the table is now empty, independant of its previous state. What purpose does it serve? As an example, if i delete an entry, the fact i deleted an entry is of interest to some manager in user space for the purpose of syncing. If i brought a link down, same thing. If i brought a link down that was already down - why would that be of interest to generate as an event? etc. cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Herbert Xu <herbert@gondor.apana.org.au> Date: Wed, 31 Mar 2010 21:53:31 +0800 > Dave, please revert this patch. Herbert I actually don't agree with you on this one, sorry :-) If no semantic change occurred to the policy or state databases, there is zero reason to report the flush event, it's noise. Like Jamal, I too am pretty frustrated with how noisy and unusable the netlink messages are to watch when trying to debug something and this behavior change should break no applications whatsoever. When you have evidence of a real existing case failing because of Jamal's changes (not some case you crafted specifically to show that an empty flush doesn't get reported), I'll consider a revert or ask Jamal to fix that case up. Until then, Jamal's change is staying in the tree. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Herbert Xu <herbert@gondor.apana.org.au> Date: Wed, 31 Mar 2010 21:56:29 +0800 > This is inconsistent with the behaviour of SADB flushes, and the > spirit of RFC2367. Want to have an even larger list of areas where we're not compliant with that RFC2367 that everyone has to extend and makes changes to in order to be useful? :-) Herbert, you have to show that something breaks because of this, because the new behavior helps in diagnosis and efficiency and the old behavior has been a serious pet peeve of mine just as it was for Jamal. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Herbert Xu <herbert@gondor.apana.org.au> Date: Wed, 31 Mar 2010 22:15:25 +0800 > Dave can keep or revert this as he likes. Herbert, if it doesn't break anything, I feel the gains are very worthwhile. When we have a breakage report, I will undo this or fix it immediately. Until then, we have to be cognizant of what positives we get out of this. Have you actually tried to monitor the IPSEC netlink socket events when a daemon is running? It's way too painful before Jamal's changes, and we already emit way too many semantically empty events in netlink. Minimization of the noise is a good thing, if it can legally be done, which I believe is the case here. And this applies to pfkey events too, because processing those is even more expensive on the application side. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 31, 2010 at 01:57:35PM -0700, David Miller wrote: > > Herbert, if it doesn't break anything, I feel the gains are very > worthwhile. OK, in that case please change xfrm_state_flush too. Right now it also notifies on empty. > Have you actually tried to monitor the IPSEC netlink socket events > when a daemon is running? It's way too painful before Jamal's > changes, and we already emit way too many semantically empty events in > netlink. Honestly I can't see how eliminating an empty flush event helps. Flushes are very rare. Cheers,
On Thu, 2010-04-01 at 08:22 +0800, Herbert Xu wrote: > OK, in that case please change xfrm_state_flush too. Right now > it also notifies on empty. It shouldnt. I will double check it tomorrow.. cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Ive tried it with net-next which is derived of 2.6.34-rc1 and it seems to work as expected i.e. only the flush on a non-empty SAD is event-ed. Probably older kernel you are running? cheers, jamal On Wed, 2010-03-31 at 22:19 -0400, jamal wrote: > On Thu, 2010-04-01 at 08:22 +0800, Herbert Xu wrote: > > > OK, in that case please change xfrm_state_flush too. Right now > > it also notifies on empty. > > It shouldnt. I will double check it tomorrow.. > > cheers, > jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
jamal wrote: > On Wed, 2010-03-31 at 18:41 +0200, Patrick McHardy wrote: > >> I agree with Herbert, the flush notification indicates that >> the table is now empty, independant of its previous state. > > What purpose does it serve? > > As an example, if i delete an entry, the fact i deleted an entry is > of interest to some manager in user space for the purpose of syncing. > If i brought a link down, same thing. If i brought a link down > that was already down - why would that be of interest to generate > as an event? etc. That's true. Since both pfkey and xfrm process messages synchronously, there shouldn't be any need for this. In fact I couldn't even find a single keying daemon that cares about this message. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 843e066..82789cf 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -156,7 +156,7 @@ static void xfrm_policy_timer(unsigned long data) read_lock(&xp->lock); - if (xp->walk.dead) + if (unlikely(xp->walk.dead)) goto out; dir = xfrm_policy_id2dir(xp->index); @@ -297,17 +297,7 @@ static DECLARE_WORK(xfrm_policy_gc_work, xfrm_policy_gc_task); static void xfrm_policy_kill(struct xfrm_policy *policy) { - int dead; - - write_lock_bh(&policy->lock); - dead = policy->walk.dead; policy->walk.dead = 1; - write_unlock_bh(&policy->lock); - - if (unlikely(dead)) { - WARN_ON(1); - return; - } spin_lock_bh(&xfrm_policy_gc_lock); hlist_add_head(&policy->bydst, &xfrm_policy_gc_list); @@ -776,7 +766,6 @@ xfrm_policy_flush_secctx_check(struct net *net, u8 type, struct xfrm_audit *audi int xfrm_policy_flush(struct net *net, u8 type, struct xfrm_audit *audit_info) { int dir, err = 0, cnt = 0; - struct xfrm_policy *dp; write_lock_bh(&xfrm_policy_lock); @@ -794,10 +783,9 @@ int xfrm_policy_flush(struct net *net, u8 type, struct xfrm_audit *audit_info) &net->xfrm.policy_inexact[dir], bydst) { if (pol->type != type) continue; - dp = __xfrm_policy_unlink(pol, dir); + __xfrm_policy_unlink(pol, dir); write_unlock_bh(&xfrm_policy_lock); - if (dp) - cnt++; + cnt++; xfrm_audit_policy_delete(pol, 1, audit_info->loginuid, audit_info->sessionid, @@ -816,10 +804,9 @@ int xfrm_policy_flush(struct net *net, u8 type, struct xfrm_audit *audit_info) bydst) { if (pol->type != type) continue; - dp = __xfrm_policy_unlink(pol, dir); + __xfrm_policy_unlink(pol, dir); write_unlock_bh(&xfrm_policy_lock); - if (dp) - cnt++; + cnt++; xfrm_audit_policy_delete(pol, 1, audit_info->loginuid, @@ -1132,6 +1119,9 @@ int xfrm_sk_policy_insert(struct sock *sk, int dir, struct xfrm_policy *pol) __xfrm_policy_link(pol, XFRM_POLICY_MAX+dir); } if (old_pol) + /* Unlinking succeeds always. This is the only function + * allowed to delete or replace socket policy. + */ __xfrm_policy_unlink(old_pol, XFRM_POLICY_MAX+dir); write_unlock_bh(&xfrm_policy_lock); @@ -1737,11 +1727,8 @@ restart: goto error; } - for (pi = 0; pi < npols; pi++) { - read_lock_bh(&pols[pi]->lock); + for (pi = 0; pi < npols; pi++) pol_dead |= pols[pi]->walk.dead; - read_unlock_bh(&pols[pi]->lock); - } write_lock_bh(&policy->lock); if (unlikely(pol_dead || stale_bundle(dst))) { diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index da5ba86..a267fbd 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -1770,13 +1770,9 @@ static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh, if (xp == NULL) return -ENOENT; - read_lock(&xp->lock); - if (xp->walk.dead) { - read_unlock(&xp->lock); + if (unlikely(xp->walk.dead)) goto out; - } - read_unlock(&xp->lock); err = 0; if (up->hard) { uid_t loginuid = NETLINK_CB(skb).loginuid;
All of the code considers ->dead as a hint that the cached policy needs to get refreshed. The read side can just drop the read lock without any side effects. The write side needs to make sure that it's written only exactly once. Only possible race is at xfrm_policy_kill(). This is fixed by checking result of __xfrm_policy_unlink() when needed. It will always succeed if the policy object is looked up from the hash list (so some checks are removed), but it needs to be checked if we are trying to unlink policy via a reference (appropriate checks added). Since policy->walk.dead is written exactly once, it no longer needs to be protected with a write lock. Signed-off-by: Timo Teras <timo.teras@iki.fi> --- net/xfrm/xfrm_policy.c | 31 +++++++++---------------------- net/xfrm/xfrm_user.c | 6 +----- 2 files changed, 10 insertions(+), 27 deletions(-)