[net,V2] xen-netback: disable rogue vif in kthread context

Message ID 1395750051-15932-1-git-send-email-wei.liu2@citrix.com
State Deferred, archived
Delegated to: David Miller

Commit Message

Wei Liu March 25, 2014, 12:20 p.m. UTC
When netback discovers that a frontend is sending malformed packets, it
disables the interface which serves that frontend.

However, disabling a network interface involves taking a mutex, which
cannot be done in softirq context, so we need to defer this process to
kthread context.

This patch does the following:
1. introduce a flag to indicate the interface is disabled.
2. check that flag in TX path, don't do any work if it's true.
3. check that flag in RX path, turn off that interface if it's true.

The interface is disabled in the RX path because RX runs in a kthread.
After this change the behavior of netback is still consistent -- it won't
do any TX work for a rogue frontend, and the interface will eventually be
turned off.
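
The three steps above can be sketched as a small single-threaded userspace model. This is only an illustration of the deferral pattern, not the kernel code: `struct vif_model` and every `vif_model_*` function are hypothetical names, and the two contexts (softirq and kthread) are represented by plain function calls rather than real concurrency.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the vif; not the kernel's struct xenvif. */
struct vif_model {
	bool disabled;      /* set once the frontend is found to be rogue */
	bool carrier_ok;    /* stands in for the carrier state of the netdev */
	bool thread_kicked; /* stands in for waking the RX kthread */
};

void vif_model_init(struct vif_model *vif)
{
	vif->disabled = false;
	vif->carrier_ok = true;
	vif->thread_kicked = false;
}

/* "Softirq" side: it must not sleep, so it only records the fact
 * and asks the thread to run. The carrier is left untouched here.
 */
void vif_model_fatal_tx_err(struct vif_model *vif)
{
	vif->disabled = true;
	vif->thread_kicked = true;
}

/* TX poll: returns the amount of work done. A disabled vif does none. */
int vif_model_poll(struct vif_model *vif, int budget)
{
	assert(budget >= 0);
	if (vif->disabled)
		return 0;
	return budget; /* pretend the whole budget was consumed */
}

/* One pass of the "kthread" side: this context may sleep, so the
 * interface is actually turned off here.
 */
void vif_model_kthread_iteration(struct vif_model *vif)
{
	vif->thread_kicked = false;
	if (vif->disabled && vif->carrier_ok)
		vif->carrier_ok = false;
}
```

In this model, calling `vif_model_fatal_tx_err()` leaves the carrier up but makes `vif_model_poll()` return 0; the carrier only drops on the next `vif_model_kthread_iteration()`. That window is what "eventually turned off" refers to.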

Also change a "continue" to "break" after xenvif_fatal_tx_err, as it
doesn't make sense to continue processing packets if frontend is rogue.
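
The effect of leaving the loop on a fatal error can be shown with a minimal sketch. The request type and function below are hypothetical and much simpler than the real ring handling; they only show that nothing after the offending request is processed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request descriptor; the real ring entries are richer. */
struct req_model {
	bool malformed;
};

/* Walks the requests and returns how many were processed.
 * On the first malformed request it flags a fatal error and leaves
 * the loop, so later requests from the same frontend are ignored.
 */
size_t tx_build_model(const struct req_model *reqs, size_t n, bool *fatal)
{
	size_t processed = 0;

	*fatal = false;
	for (size_t i = 0; i < n; i++) {
		if (reqs[i].malformed) {
			*fatal = true;
			break; /* with "continue" the loop would keep going */
		}
		processed++;
	}
	return processed;
}
```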

This is a fix for XSA-90.

Reported-by: Török Edwin <edwin@etorok.net>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/common.h    |    5 +++++
 drivers/net/xen-netback/interface.c |   15 ++++++++++++++-
 drivers/net/xen-netback/netback.c   |   15 +++++++++++++--
 3 files changed, 32 insertions(+), 3 deletions(-)

Comments

David Vrabel March 25, 2014, 1:04 p.m. UTC | #1
On 25/03/14 12:20, Wei Liu wrote:
> When netback discovers that a frontend is sending malformed packets, it
> disables the interface which serves that frontend.
> 
> However, disabling a network interface involves taking a mutex, which
> cannot be done in softirq context, so we need to defer this process to
> kthread context.
> 
> This patch does the following:
> 1. introduce a flag to indicate the interface is disabled.
> 2. check that flag in TX path, don't do any work if it's true.
> 3. check that flag in RX path, turn off that interface if it's true.
> 
> The reason to disable it in RX path is because RX uses kthread. After
> this change the behavior of netback is still consistent -- it won't do
> any TX work for a rogue frontend, and the interface will be eventually
> turned off.
[...]
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -61,12 +61,23 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
>  {
>  	struct xenvif *vif = container_of(napi, struct xenvif, napi);
>  	int work_done;
> +	unsigned long flags;
> +
> +	/* This vif is rogue, we pretend there is nothing to do
> +	 * for this vif to deschedule it from NAPI. But this interface
> +	 * will be turned off in thread context later.
> +	 */
> +	if (unlikely(vif->disabled)) {
> +		local_irq_save(flags);
> +		__napi_complete(napi);
> +		local_irq_restore(flags);

Why isn't this napi_complete(napi) (which uses local_irq_save/restore()
internally)?

David
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wei Liu March 25, 2014, 1:38 p.m. UTC | #2
On Tue, Mar 25, 2014 at 01:04:10PM +0000, David Vrabel wrote:
> On 25/03/14 12:20, Wei Liu wrote:
> > When netback discovers that a frontend is sending malformed packets, it
> > disables the interface which serves that frontend.
> > 
> > However, disabling a network interface involves taking a mutex, which
> > cannot be done in softirq context, so we need to defer this process to
> > kthread context.
> > 
> > This patch does the following:
> > 1. introduce a flag to indicate the interface is disabled.
> > 2. check that flag in TX path, don't do any work if it's true.
> > 3. check that flag in RX path, turn off that interface if it's true.
> > 
> > The reason to disable it in RX path is because RX uses kthread. After
> > this change the behavior of netback is still consistent -- it won't do
> > any TX work for a rogue frontend, and the interface will be eventually
> > turned off.
> [...]
> > --- a/drivers/net/xen-netback/interface.c
> > +++ b/drivers/net/xen-netback/interface.c
> > @@ -61,12 +61,23 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
> >  {
> >  	struct xenvif *vif = container_of(napi, struct xenvif, napi);
> >  	int work_done;
> > +	unsigned long flags;
> > +
> > +	/* This vif is rogue, we pretend there is nothing to do
> > +	 * for this vif to deschedule it from NAPI. But this interface
> > +	 * will be turned off in thread context later.
> > +	 */
> > +	if (unlikely(vif->disabled)) {
> > +		local_irq_save(flags);
> > +		__napi_complete(napi);
> > +		local_irq_restore(flags);
> 
> Why isn't this napi_complete(napi) (which uses local_irq_save/restore()
> internally)?
> 

The difference between napi_complete and __napi_complete is that
napi_complete will not deschedule this instance if it is being serviced
on another CPU.
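
For readers following along, the difference can be modelled in userspace as below. This is a sketch from memory of how kernels of this period behaved (an early return while netpoll is servicing the instance, then a GRO flush, then the unlocked completion under disabled interrupts) and should be checked against net/core/dev.c in the tree in question. `struct napi_model` and both functions are hypothetical stand-ins, not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of a NAPI instance. */
struct napi_model {
	bool scheduled;       /* instance is on the poll list */
	bool netpoll_service; /* models "being serviced elsewhere by netpoll" */
	int gro_flushes;      /* counts modelled GRO flushes */
};

/* Models __napi_complete(): deschedules unconditionally.
 * The real function expects the caller to have disabled interrupts.
 */
void napi_model_complete_unlocked(struct napi_model *n)
{
	n->scheduled = false;
}

/* Models napi_complete(): may decline to deschedule. */
void napi_model_complete(struct napi_model *n)
{
	/* Do not dequeue while the instance is being serviced elsewhere. */
	if (n->netpoll_service)
		return;

	n->gro_flushes++; /* the real code flushes pending GRO packets here */

	/* The real code wraps this call in local_irq_save()/restore(). */
	napi_model_complete_unlocked(n);
}
```

Under this model the wrapper can leave a rogue vif scheduled, whereas the unlocked variant always removes it, at the cost of the caller handling interrupt state itself.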

Wei.

> David
Zoltan Kiss March 25, 2014, 1:40 p.m. UTC | #3
On 25/03/14 13:04, David Vrabel wrote:
> On 25/03/14 12:20, Wei Liu wrote:
>> When netback discovers that a frontend is sending malformed packets, it
>> disables the interface which serves that frontend.
>>
>> However, disabling a network interface involves taking a mutex, which
>> cannot be done in softirq context, so we need to defer this process to
>> kthread context.
>>
>> This patch does the following:
>> 1. introduce a flag to indicate the interface is disabled.
>> 2. check that flag in TX path, don't do any work if it's true.
>> 3. check that flag in RX path, turn off that interface if it's true.
>>
>> The reason to disable it in RX path is because RX uses kthread. After
>> this change the behavior of netback is still consistent -- it won't do
>> any TX work for a rogue frontend, and the interface will be eventually
>> turned off.
> [...]
>> --- a/drivers/net/xen-netback/interface.c
>> +++ b/drivers/net/xen-netback/interface.c
>> @@ -61,12 +61,23 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
>>   {
>>   	struct xenvif *vif = container_of(napi, struct xenvif, napi);
>>   	int work_done;
>> +	unsigned long flags;
>> +
>> +	/* This vif is rogue, we pretend there is nothing to do
>> +	 * for this vif to deschedule it from NAPI. But this interface
>> +	 * will be turned off in thread context later.
>> +	 */
>> +	if (unlikely(vif->disabled)) {
>> +		local_irq_save(flags);
>> +		__napi_complete(napi);
>> +		local_irq_restore(flags);
>
> Why isn't this napi_complete(napi) (which uses local_irq_save/restore()
> internally)?

I guess we don't need napi_gro_flush, so this spares a few cycles. I
don't know whether we need to check for netpoll; I'm not sure it's
sensible to run a debug console from a guest towards the backend (or
am I misunderstanding the purpose here?)

>
> David
>

David Miller March 26, 2014, 8:44 p.m. UTC | #4
From: Wei Liu <wei.liu2@citrix.com>
Date: Tue, 25 Mar 2014 12:20:51 +0000

> When netback discovers that a frontend is sending malformed packets, it
> disables the interface which serves that frontend.
> 
> However, disabling a network interface involves taking a mutex, which
> cannot be done in softirq context, so we need to defer this process to
> kthread context.
> 
> This patch does the following:
> 1. introduce a flag to indicate the interface is disabled.
> 2. check that flag in TX path, don't do any work if it's true.
> 3. check that flag in RX path, turn off that interface if it's true.
> 
> The reason to disable it in RX path is because RX uses kthread. After
> this change the behavior of netback is still consistent -- it won't do
> any TX work for a rogue frontend, and the interface will be eventually
> turned off.
> 
> Also change a "continue" to "break" after xenvif_fatal_tx_err, as it
> doesn't make sense to continue processing packets if frontend is rogue.
> 
> This is a fix for XSA-90.
> 
> Reported-by: Török Edwin <edwin@etorok.net>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

What is the status of this patch?
David Miller March 28, 2014, 6:37 p.m. UTC | #5
From: David Miller <davem@davemloft.net>
Date: Wed, 26 Mar 2014 16:44:47 -0400 (EDT)

> What is the status of this patch?

Since nobody is following up, I'm marking this patch as "deferred"
in patchwork.

Resubmit this change once things start progressing again.

Thanks.
Ian Campbell March 31, 2014, 10:30 a.m. UTC | #6
On Fri, 2014-03-28 at 14:37 -0400, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Wed, 26 Mar 2014 16:44:47 -0400 (EDT)
> 
> > What is the status of this patch?
> 
> Since nobody is following up, I'm marking this patch as "deferred"
> in patchwork.

Sorry, I hadn't realised Wei was away for a few days. I think he's back
today or tomorrow so it may as well wait for him to get back.

> Resubmit this change once things start progressing again.

Ack.

Ian.


Patch

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index ae413a2..4bf5b33 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -113,6 +113,11 @@  struct xenvif {
 	domid_t          domid;
 	unsigned int     handle;
 
+	/* Is this interface disabled? True when backend discovers
+	 * frontend is rogue.
+	 */
+	bool disabled;
+
 	/* Use NAPI for guest TX */
 	struct napi_struct napi;
 	/* When feature-split-event-channels = 0, tx_irq = rx_irq. */
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 301cc03..8c921de 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -61,12 +61,23 @@  static int xenvif_poll(struct napi_struct *napi, int budget)
 {
 	struct xenvif *vif = container_of(napi, struct xenvif, napi);
 	int work_done;
+	unsigned long flags;
+
+	/* This vif is rogue, we pretend there is nothing to do
+	 * for this vif to deschedule it from NAPI. But this interface
+	 * will be turned off in thread context later.
+	 */
+	if (unlikely(vif->disabled)) {
+		local_irq_save(flags);
+		__napi_complete(napi);
+		local_irq_restore(flags);
+		return 0;
+	}
 
 	work_done = xenvif_tx_action(vif, budget);
 
 	if (work_done < budget) {
 		int more_to_do = 0;
-		unsigned long flags;
 
 		/* It is necessary to disable IRQ before calling
 		 * RING_HAS_UNCONSUMED_REQUESTS. Otherwise we might
@@ -321,6 +332,8 @@  struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->ip_csum = 1;
 	vif->dev = dev;
 
+	vif->disabled = false;
+
 	vif->credit_bytes = vif->remaining_credit = ~0UL;
 	vif->credit_usec  = 0UL;
 	init_timer(&vif->credit_timeout);
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 438d0c0..17633dd 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -655,7 +655,8 @@  static void xenvif_tx_err(struct xenvif *vif,
 static void xenvif_fatal_tx_err(struct xenvif *vif)
 {
 	netdev_err(vif->dev, "fatal error; disabling device\n");
-	xenvif_carrier_off(vif);
+	vif->disabled = true;
+	xenvif_kick_thread(vif);
 }
 
 static int xenvif_count_requests(struct xenvif *vif,
@@ -1126,7 +1127,7 @@  static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 				   vif->tx.sring->req_prod, vif->tx.req_cons,
 				   XEN_NETIF_TX_RING_SIZE);
 			xenvif_fatal_tx_err(vif);
-			continue;
+			break;
 		}
 
 		work_to_do = RING_HAS_UNCONSUMED_REQUESTS(&vif->tx);
@@ -1549,6 +1550,16 @@  int xenvif_kthread(void *data)
 		wait_event_interruptible(vif->wq,
 					 rx_work_todo(vif) ||
 					 kthread_should_stop());
+
+		/* This frontend is found to be rogue, disable it in
+		 * kthread context. Currently this is only set when
+		 * netback finds out frontend sends malformed packet,
+		 * but we cannot disable the interface in softirq
+		 * context so we defer it here.
+		 */
+		if (unlikely(vif->disabled && netif_carrier_ok(vif->dev)))
+			xenvif_carrier_off(vif);
+
 		if (kthread_should_stop())
 			break;