diff mbox

regression due to "flush SAD/SPD generate false events"

Message ID 1266449529.17794.157.camel@bigi
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

jamal Feb. 17, 2010, 11:32 p.m. UTC
On Wed, 2010-02-17 at 18:07 -0500, jamal wrote:
> When you flush
> an empty table, the time goes up from about 1.5s
> to 3 secs with pfkey. You also see the EAGAIN for
> each flush ...
> I am going to dig a little more ..

Here is a fix. The speed is restored (actually looks a little better
now) - the only thing is if you try to flush an empty table we return
-ESRCH; this seems reasonable, no? So a script like following 

---
#!/usr/sbin/setkey -f
flush;
spdflush;
----

will get:
---
bigismall:~# time setkey -f ./setkey-sample
The result of line 2: No such process.
The result of line 3: No such process.

real	0m0.663s
user	0m0.080s
sys	0m0.128s
----

cheers,
jamal

Comments

Alexey Dobriyan Feb. 18, 2010, 6:55 p.m. UTC | #1
On Wed, Feb 17, 2010 at 06:32:09PM -0500, jamal wrote:
> On Wed, 2010-02-17 at 18:07 -0500, jamal wrote:
> > When you flush
> > an empty table, the time goes up from about 1.5s
> > to 3 secs with pfkey. You also see the EAGAIN for
> > each flush ...
> > I am going to dig a little more ..
> 
> Here is a fix. The speed is restored (actually looks a little better
> now) - the only thing is if you try to flush an empty table we return
> -ESRCH; this seems reasonable, no? So a script like following 
> 
> ---
> #!/usr/sbin/setkey -f
> flush;
> spdflush;
> ----
> 
> will get:
> ---
> bigismall:~# time setkey -f ./setkey-sample
> The result of line 2: No such process.
> The result of line 3: No such process.

I'd expect flushing empty SAD/SPD to be exact nops. :^)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal Feb. 18, 2010, 8:53 p.m. UTC | #2
On Thu, 2010-02-18 at 20:55 +0200, Alexey Dobriyan wrote:

> I'd expect flushing empty SAD/SPD to be exact nops. :^)

Agree - that is a better alternative.
I was hoping returning an error code of 0 will do that.
But that triggers the EAGAIN you saw. It doesnt trigger
an EAGAIN for the netlink side (and it is a nop there).
Let me look at it. Clearly the EAGAIN is causing the 
latency.

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal Feb. 18, 2010, 9:45 p.m. UTC | #3
On Thu, 2010-02-18 at 15:53 -0500, jamal wrote:

> Agree - that is a better alternative.
> I was hoping returning an error code of 0 will do that.
> But that triggers the EAGAIN you saw. It doesnt trigger
> an EAGAIN for the netlink side (and it is a nop there).
> Let me look at it. Clearly the EAGAIN is causing the 
> latency.

Ok, looking closely at the strace i think ive figured out
what pfkey is doing. It basically sets non-blocking receive
and expects to see the flush event whether real or bogus
(EAGAIN is because it tries to recvmsg and fails the first
time because we dont send the event when table is empty).

The two options are: 1) Restore old behavior just for pfkey
where bogus event is sent always or 2) return -ESRCH as I
have done in the last patch.
I think #2 is more accurate. But if that is going to break
existing scripts then #1 is the way to go.

Thoughts?

cheers,
jamal



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Feb. 18, 2010, 10:27 p.m. UTC | #4
From: jamal <hadi@cyberus.ca>
Date: Thu, 18 Feb 2010 16:45:58 -0500

> The two options are: 1) Restore old behavior just for pfkey
> where bogus event is sent always or 2) return -ESRCH as I
> have done in the last patch.
> I think #2 is more accurate. But if that is going to break
> existing scripts then #1 is the way to go.
> 
> Thoughts?

Let's see how #2 works out, if we get any more breakage
cases reported then we'll have to revert again and go with
#1.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal Feb. 18, 2010, 10:52 p.m. UTC | #5
On Thu, 2010-02-18 at 14:27 -0800, David Miller wrote:

> Let's see how #2 works out, if we get any more breakage
> cases reported then we'll have to revert again and go with
> #1.

Actually, i think i found the root cause. The sequence should be:

1) pfkey->kernel: flush
kernel flushes SA/SPD
2)kernel->pfkey: respond to flush message
3)kernel->all user space listeners: i just flushed the SA/SPD

#2 is always needed.
#3 is wrong if SA/SPD is empty.
kernel unfortunately sends a single broadcast which combines
#2 and #3. So the proper fix is to break #2 and #3 into
separate messages. And follow the rules above.

I dont have time to fix it right now - but i will send a patch
tommorow.

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Feb. 18, 2010, 11:01 p.m. UTC | #6
From: jamal <hadi@cyberus.ca>
Date: Thu, 18 Feb 2010 17:52:41 -0500

> I dont have time to fix it right now - but i will send a patch
> tommorow.

Ok, thanks for the update.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/key/af_key.c b/net/key/af_key.c
index 8b8e26a..79d2c0f 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -1751,7 +1751,7 @@  static int pfkey_flush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hd
 	audit_info.secid = 0;
 	err = xfrm_state_flush(net, proto, &audit_info);
 	if (err)
-		return 0;
+		return err;
 	c.data.proto = proto;
 	c.seq = hdr->sadb_msg_seq;
 	c.pid = hdr->sadb_msg_pid;
@@ -2713,7 +2713,7 @@  static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, struct sadb_msg
 	audit_info.secid = 0;
 	err = xfrm_policy_flush(net, XFRM_POLICY_TYPE_MAIN, &audit_info);
 	if (err)
-		return 0;
+		return err;
 	c.data.type = XFRM_POLICY_TYPE_MAIN;
 	c.event = XFRM_MSG_FLUSHPOLICY;
 	c.pid = hdr->sadb_msg_pid;