diff mbox

pppoe: Use workqueue to die properly when a PADT is received

Message ID 1424381068-22252-1-git-send-email-simon@farnz.org.uk
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Simon Farnsworth Feb. 19, 2015, 9:24 p.m. UTC
When a PADT frame is received, the socket may not be in a good state to
close down the PPP interface. The current implementation handles this by
simply blocking all further PPP traffic, and hoping that the lack of traffic
will trigger the user to investigate.

Use schedule_work to get to a process context from which we clear down the
PPP interface, in a fashion analogous to hangup on a TTY-based PPP
interface. This causes pppd to disconnect immediately, and allows tools to
take immediate corrective action.

Signed-off-by: Simon Farnsworth <simon@farnz.org.uk>
---
Note that I'm not subscribed to netdev; please cc me on any replies.

The patch falls out of https://bugzilla.gnome.org/show_bug.cgi?id=742939
I'm trying to get NetworkManager back to using kernel PPPoE partly because
it performs a little better, and mostly because kernel PPPoE copes with
larger MTUs than userspace PPPoE.

Dan Williams (cc'd) has tested a previous version of this patch; the
differences to this version are only cosmetic.

 drivers/net/ppp/pppoe.c  | 17 ++++++++++++++++-
 include/linux/if_pppox.h |  2 ++
 2 files changed, 18 insertions(+), 1 deletion(-)

Comments

Simon Farnsworth Feb. 20, 2015, 11:17 a.m. UTC | #1
Michal's e-mail as listed in MAINTAINERS is bouncing (he's listed as PPPoE
maintainer).

If anyone has alternate contact details for him, please let him know that
his kernel-related e-mail is bouncing.

The relevant chunk of MAINTAINERS is:

PPP OVER ETHERNET
M:      Michal Ostrowski <mostrows@earthlink.net>
S:      Maintained
F:      drivers/net/ppp/pppoe.c
F:      drivers/net/ppp/pppox.c

If I don't hear from him in the next couple of weeks, I'll send the trivial
patch to remove PPP OVER ETHERNET from maintained status.

On Thursday 19 February 2015 21:24:28 Simon Farnsworth wrote:
> When a PADT frame is received, the socket may not be in a good state to
> close down the PPP interface. The current implementation handles this by
> simply blocking all further PPP traffic, and hoping that the lack of traffic
> will trigger the user to investigate.
> 
> Use schedule_work to get to a process context from which we clear down the
> PPP interface, in a fashion analogous to hangup on a TTY-based PPP
> interface. This causes pppd to disconnect immediately, and allows tools to
> take immediate corrective action.
> 
> Signed-off-by: Simon Farnsworth <simon@farnz.org.uk>
> ---
> Note that I'm not subscribed to netdev; please cc me on any replies.
> 
> The patch falls out of https://bugzilla.gnome.org/show_bug.cgi?id=742939
> I'm trying to get NetworkManager back to using kernel PPPoE partly because
> it performs a little better, and mostly because kernel PPPoE copes with
> larger MTUs than userspace PPPoE.
> 
> Dan Williams (cc'd) has tested a previous version of this patch; the
> differences to this version are only cosmetic.
> 
>  drivers/net/ppp/pppoe.c  | 17 ++++++++++++++++-
>  include/linux/if_pppox.h |  2 ++
>  2 files changed, 18 insertions(+), 1 deletion(-)
> 
<snip patch>

--
Simon Farnsworth
Christoph Schulz Feb. 20, 2015, 1:25 p.m. UTC | #2
(Cc: linux-ppp@vger.kernel.org, mostrows@gmail.com)

Hello!

Simon Farnsworth schrieb am Fri, 20 Feb 2015 11:17:26 +0000:

> Michal's e-mail as listed in MAINTAINERS is bouncing (he's listed as PPPoE
> maintainer).

I once had the same problem. I finally used mostrows@gmail.com, found  
elsewhere (see e.g. http://patchwork.ozlabs.org/patch/36932/).


Regards,
Christoph Schulz Feb. 20, 2015, 4:10 p.m. UTC | #3
(Cc: linux-ppp@vger.kernel.org, mostrows@gmail.com)

Hello!

Simon Farnsworth schrieb am Thu, 19 Feb 2015 21:24:28 +0000:

> When a PADT frame is received, the socket may not be in a good state to
> close down the PPP interface. The current implementation handles this by
> simply blocking all further PPP traffic, and hoping that the lack of traffic
> will trigger the user to investigate.
>
> Use schedule_work to get to a process context from which we clear down the
> PPP interface, in a fashion analogous to hangup on a TTY-based PPP
> interface. This causes pppd to disconnect immediately, and allows tools to
> take immediate corrective action.

Tested-by: Christoph Schulz <develop@kristov.de>

I tested your patch von top of 3.14.33 and 3.18.7 together with a  
slightly modified pppd based on Debian's pppd-2.4.6-3 (which is almost  
identical to pppd 2.4.7). The client now indeed receives PADT feedback  
immediately. Closing a PPPoE connection on the server side

Feb 20 16:45:44 swing local2.notice pppoe-server-wrapper[20154]:  
circ4: pppoe-server died with exit code 0, cleaning up

leads to immediate hangup on the client side:

Feb 20 16:45:44 sandbox local2.notice pppd[539]: Modem hangup

Formerly, the LCP echo timeout had to be reached. On the server side:

Feb 20 14:56:53 swing local2.notice pppoe-server-wrapper[12744]:  
circ4: pppoe-server died with exit code 0, cleaning up

On the client side:

Feb 20 14:57:01 sandbox local2.info pppd[15575]: No response to 2  
echo-requests
Feb 20 14:57:01 sandbox local2.notice pppd[15575]: Serial link appears  
to be disconnected.

Note that the short detection of the terminated link (~ 8 s) stems  
from very aggressive LCP echo settings (lcp-echo-interval "3",  
lcp-echo-failure "2"). Typically, these timeouts are much larger to  
accomodate for bad links. Often the client recognizes a dead link only  
after two to three minutes (lcp-echo-interval "30", lcp-echo-failure  
"5").

However, note also that your patch causes pppd (or rather the rp-pppoe  
plugin) to wonder about the socket closed by the kernel:

Feb 20 16:45:44 sandbox local2.err pppd[539]: Failed to disconnect  
PPPoE socket: 114 Operation already in progress

I don't fully understand the code there; it seems that the plugin  
*connects* the PPPoE session socket in order to *disconnect* it:

static void
PPPOEDisconnectDevice(void)
{
     struct sockaddr_pppox sp;

     sp.sa_family = AF_PPPOX;
     sp.sa_protocol = PX_PROTO_OE;
     sp.sa_addr.pppoe.sid = 0;
     memcpy(sp.sa_addr.pppoe.dev, conn->ifName, IFNAMSIZ);
     memcpy(sp.sa_addr.pppoe.remote, conn->peerEth, ETH_ALEN);
     if (connect(conn->sessionSocket, (struct sockaddr *) &sp,
                 sizeof(struct sockaddr_pppox)) < 0)
         error("Failed to disconnect PPPoE socket: %d %m", errno);
     close(conn->sessionSocket);
     /* don't send PADT?? */
     if (conn->discoverySocket >= 0)
         close(conn->discoverySocket);
}


Regards,
Simon Farnsworth Feb. 20, 2015, 4:41 p.m. UTC | #4
On Friday 20 February 2015 17:10:14 Christoph Schulz wrote:
> (Cc: linux-ppp@vger.kernel.org, mostrows@gmail.com)
> 
> Hello!
> 
> Simon Farnsworth schrieb am Thu, 19 Feb 2015 21:24:28 +0000:
> 
> > When a PADT frame is received, the socket may not be in a good state to
> > close down the PPP interface. The current implementation handles this by
> > simply blocking all further PPP traffic, and hoping that the lack of traffic
> > will trigger the user to investigate.
> >
> > Use schedule_work to get to a process context from which we clear down the
> > PPP interface, in a fashion analogous to hangup on a TTY-based PPP
> > interface. This causes pppd to disconnect immediately, and allows tools to
> > take immediate corrective action.
> 
> Tested-by: Christoph Schulz <develop@kristov.de>
> 
<snip success report>
> However, note also that your patch causes pppd (or rather the rp-pppoe  
> plugin) to wonder about the socket closed by the kernel:
> 
> Feb 20 16:45:44 sandbox local2.err pppd[539]: Failed to disconnect  
> PPPoE socket: 114 Operation already in progress
>
I assume there's nothing else wrong here, other than pppd complaining? The
code doesn't suggest there will be issues if we fail to disconnect.

> I don't fully understand the code there; it seems that the plugin  
> *connects* the PPPoE session socket in order to *disconnect* it:
> 
> static void
> PPPOEDisconnectDevice(void)
> {
>      struct sockaddr_pppox sp;
> 
>      sp.sa_family = AF_PPPOX;
>      sp.sa_protocol = PX_PROTO_OE;
>      sp.sa_addr.pppoe.sid = 0;
>      memcpy(sp.sa_addr.pppoe.dev, conn->ifName, IFNAMSIZ);
>      memcpy(sp.sa_addr.pppoe.remote, conn->peerEth, ETH_ALEN);
>      if (connect(conn->sessionSocket, (struct sockaddr *) &sp,
>                  sizeof(struct sockaddr_pppox)) < 0)
>          error("Failed to disconnect PPPoE socket: %d %m", errno);
>      close(conn->sessionSocket);
>      /* don't send PADT?? */
>      if (conn->discoverySocket >= 0)
>          close(conn->discoverySocket);
> }

The code is trying to disconnect the session by connecting to session 0
(which is invalid) in order to stop data flow. I'll have another look at the
kernel code tonight to see if that does anything that
close(conn->sessionSocket) won't do - I can't see a good reason for it,
though.

I suspect this is a straight bug in the rp-pppoe.so plugin.
--
Simon Farnsworth
Dan Williams Feb. 20, 2015, 5:05 p.m. UTC | #5
On Thu, 2015-02-19 at 21:24 +0000, Simon Farnsworth wrote:
> When a PADT frame is received, the socket may not be in a good state to
> close down the PPP interface. The current implementation handles this by
> simply blocking all further PPP traffic, and hoping that the lack of traffic
> will trigger the user to investigate.
> 
> Use schedule_work to get to a process context from which we clear down the
> PPP interface, in a fashion analogous to hangup on a TTY-based PPP
> interface. This causes pppd to disconnect immediately, and allows tools to
> take immediate corrective action.
> 
> Signed-off-by: Simon Farnsworth <simon@farnz.org.uk>

Adding my tested-by since I tested the patch and confirmed it fixes the
issue before Simon posted to netdev.

Tested-by: Dan Williams <dcbw@redhat.com>

> ---
> Note that I'm not subscribed to netdev; please cc me on any replies.
> 
> The patch falls out of https://bugzilla.gnome.org/show_bug.cgi?id=742939
> I'm trying to get NetworkManager back to using kernel PPPoE partly because
> it performs a little better, and mostly because kernel PPPoE copes with
> larger MTUs than userspace PPPoE.
> 
> Dan Williams (cc'd) has tested a previous version of this patch; the
> differences to this version are only cosmetic.
> 
>  drivers/net/ppp/pppoe.c  | 17 ++++++++++++++++-
>  include/linux/if_pppox.h |  2 ++
>  2 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
> index d2408a5..9c97e9b 100644
> --- a/drivers/net/ppp/pppoe.c
> +++ b/drivers/net/ppp/pppoe.c
> @@ -455,6 +455,18 @@ out:
>  	return NET_RX_DROP;
>  }
>  
> +static void pppoe_unbind_sock_work(struct work_struct *work)
> +{
> +	struct pppox_sock *po = container_of(work, struct pppox_sock,
> +					     proto.pppoe.padt_work);
> +	struct sock *sk = sk_pppox(po);
> +
> +	lock_sock(sk);
> +	pppox_unbind_sock(sk);
> +	release_sock(sk);
> +	sock_put(sk);
> +}
> +
>  /************************************************************************
>   *
>   * Receive a PPPoE Discovery frame.
> @@ -500,7 +512,8 @@ static int pppoe_disc_rcv(struct sk_buff *skb, struct net_device *dev,
>  		}
>  
>  		bh_unlock_sock(sk);
> -		sock_put(sk);
> +		if (!schedule_work(&po->proto.pppoe.padt_work))
> +			sock_put(sk);
>  	}
>  
>  abort:
> @@ -613,6 +626,8 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
>  
>  	lock_sock(sk);
>  
> +	INIT_WORK(&po->proto.pppoe.padt_work, pppoe_unbind_sock_work);
> +
>  	error = -EINVAL;
>  	if (sp->sa_protocol != PX_PROTO_OE)
>  		goto end;
> diff --git a/include/linux/if_pppox.h b/include/linux/if_pppox.h
> index aff7ad8..66a7d76 100644
> --- a/include/linux/if_pppox.h
> +++ b/include/linux/if_pppox.h
> @@ -19,6 +19,7 @@
>  #include <linux/netdevice.h>
>  #include <linux/ppp_channel.h>
>  #include <linux/skbuff.h>
> +#include <linux/workqueue.h>
>  #include <uapi/linux/if_pppox.h>
>  
>  static inline struct pppoe_hdr *pppoe_hdr(const struct sk_buff *skb)
> @@ -32,6 +33,7 @@ struct pppoe_opt {
>  	struct pppoe_addr	pa;	  /* what this socket is bound to*/
>  	struct sockaddr_pppox	relay;	  /* what socket data will be
>  					     relayed to (PPPoE relaying) */
> +	struct work_struct      padt_work;/* Work item for handling PADT */
>  };
>  
>  struct pptp_opt {


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Schulz Feb. 20, 2015, 7:49 p.m. UTC | #6
Hello!

Simon Farnsworth schrieb am Fri, 20 Feb 2015 16:41:17 +0000:

>> Feb 20 16:45:44 sandbox local2.err pppd[539]: Failed to disconnect
>> PPPoE socket: 114 Operation already in progress
>>
> I assume there's nothing else wrong here, other than pppd complaining? The
> code doesn't suggest there will be issues if we fail to disconnect.

Yes, there are no further problems beside the message. Nevertheless I  
wanted to mention it because any error message containing something  
like "Failed to disconnect" may upset the user. Of course, most PPPoE  
users nowadays use a flat rate for accessing the Internet and need not  
worry much about a "failed" disconnection, but you never know for  
sure...


Regards,

Christoph Schulz

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simon Farnsworth Feb. 20, 2015, 9:04 p.m. UTC | #7
On Friday 20 February 2015 20:49:23 Christoph Schulz wrote:
> Hello!
> 
> Simon Farnsworth schrieb am Fri, 20 Feb 2015 16:41:17 +0000:
> >> Feb 20 16:45:44 sandbox local2.err pppd[539]: Failed to disconnect
> >> PPPoE socket: 114 Operation already in progress
> > 
> > I assume there's nothing else wrong here, other than pppd complaining? The
> > code doesn't suggest there will be issues if we fail to disconnect.
> 
> Yes, there are no further problems beside the message. Nevertheless I
> wanted to mention it because any error message containing something
> like "Failed to disconnect" may upset the user. Of course, most PPPoE
> users nowadays use a flat rate for accessing the Internet and need not
> worry much about a "failed" disconnection, but you never know for
> sure...
> 
I've now looked at the kernel code; the message is harmless, as closing the 
socket will also get rid of the associated session.

However, I think I should probably do a v2 patch with a clear commit message, 
so that if someone does bisect down to this commit as a cause of the "error" 
they're seeing, they'll understand why it's not a problem.

I'll also put together a pppd patch to not change session ID during shutdown. 
Won't be tonight - but that's no bad thing as it leaves time for more 
reviewers to comment on my kernel patch.
David Miller Feb. 22, 2015, 2:58 a.m. UTC | #8
From: Simon Farnsworth <simon@farnz.org.uk>
Date: Fri, 20 Feb 2015 21:04:53 +0000

> However, I think I should probably do a v2 patch with a clear commit message, 
> so that if someone does bisect down to this commit as a cause of the "error" 
> they're seeing, they'll understand why it's not a problem.

Ok then, I'll wait for that v2.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index d2408a5..9c97e9b 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -455,6 +455,18 @@  out:
 	return NET_RX_DROP;
 }
 
+static void pppoe_unbind_sock_work(struct work_struct *work)
+{
+	struct pppox_sock *po = container_of(work, struct pppox_sock,
+					     proto.pppoe.padt_work);
+	struct sock *sk = sk_pppox(po);
+
+	lock_sock(sk);
+	pppox_unbind_sock(sk);
+	release_sock(sk);
+	sock_put(sk);
+}
+
 /************************************************************************
  *
  * Receive a PPPoE Discovery frame.
@@ -500,7 +512,8 @@  static int pppoe_disc_rcv(struct sk_buff *skb, struct net_device *dev,
 		}
 
 		bh_unlock_sock(sk);
-		sock_put(sk);
+		if (!schedule_work(&po->proto.pppoe.padt_work))
+			sock_put(sk);
 	}
 
 abort:
@@ -613,6 +626,8 @@  static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
 
 	lock_sock(sk);
 
+	INIT_WORK(&po->proto.pppoe.padt_work, pppoe_unbind_sock_work);
+
 	error = -EINVAL;
 	if (sp->sa_protocol != PX_PROTO_OE)
 		goto end;
diff --git a/include/linux/if_pppox.h b/include/linux/if_pppox.h
index aff7ad8..66a7d76 100644
--- a/include/linux/if_pppox.h
+++ b/include/linux/if_pppox.h
@@ -19,6 +19,7 @@ 
 #include <linux/netdevice.h>
 #include <linux/ppp_channel.h>
 #include <linux/skbuff.h>
+#include <linux/workqueue.h>
 #include <uapi/linux/if_pppox.h>
 
 static inline struct pppoe_hdr *pppoe_hdr(const struct sk_buff *skb)
@@ -32,6 +33,7 @@  struct pppoe_opt {
 	struct pppoe_addr	pa;	  /* what this socket is bound to*/
 	struct sockaddr_pppox	relay;	  /* what socket data will be
 					     relayed to (PPPoE relaying) */
+	struct work_struct      padt_work;/* Work item for handling PADT */
 };
 
 struct pptp_opt {