
xen-netfront: Delete rx_refill_timer in xennet_disconnect_backend()

Message ID: 1485798346-4425-1-git-send-email-boris.ostrovsky@oracle.com
State: Changes Requested, archived
Delegated to: David Miller

Commit Message

Boris Ostrovsky Jan. 30, 2017, 5:45 p.m. UTC
rx_refill_timer should be deleted as soon as we disconnect from the
backend since otherwise it is possible for the timer to go off before
we get to xennet_destroy_queues(). If this happens we may dereference
queue->rx.sring which is set to NULL in xennet_disconnect_backend().

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: stable@vger.kernel.org
---
 drivers/net/xen-netfront.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
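
For context, the timer whose deletion is being moved is the RX refill timer; an abbreviated sketch of the driver's rx_refill_timeout() callback (details elided) shows why a late firing is dangerous:

/* The refill timer only reschedules NAPI polling.  If it fires after
 * xennet_disconnect_backend() has set queue->rx.sring to NULL but before
 * xennet_destroy_queues() has run del_timer_sync(), the poll and refill
 * path it triggers ends up dereferencing that NULL ring.
 */
static void rx_refill_timeout(unsigned long data)
{
	struct netfront_queue *queue = (struct netfront_queue *)data;

	napi_schedule(&queue->napi);
}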

Comments

Eric Dumazet Jan. 30, 2017, 6:07 p.m. UTC | #1
On Mon, 2017-01-30 at 12:45 -0500, Boris Ostrovsky wrote:
> rx_refill_timer should be deleted as soon as we disconnect from the
> backend since otherwise it is possible for the timer to go off before
> we get to xennet_destroy_queues(). If this happens we may dereference
> queue->rx.sring which is set to NULL in xennet_disconnect_backend().
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> CC: stable@vger.kernel.org
> ---
>  drivers/net/xen-netfront.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 8315fe7..722fe9f 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -1379,6 +1379,8 @@ static void xennet_disconnect_backend(struct netfront_info *info)
>  	for (i = 0; i < num_queues && info->queues; ++i) {
>  		struct netfront_queue *queue = &info->queues[i];
>  
> +		del_timer_sync(&queue->rx_refill_timer);
> +

If napi_disable() was not called before this del_timer_sync(), another
RX might come here and rearm rx_refill_timer.

>  		if (queue->tx_irq && (queue->tx_irq == queue->rx_irq))
>  			unbind_from_irqhandler(queue->tx_irq, queue);
>  		if (queue->tx_irq && (queue->tx_irq != queue->rx_irq)) {
> @@ -1733,7 +1735,6 @@ static void xennet_destroy_queues(struct netfront_info *info)
>  
>  		if (netif_running(info->netdev))
>  			napi_disable(&queue->napi);
> -		del_timer_sync(&queue->rx_refill_timer);
>  		netif_napi_del(&queue->napi);
>  	}
>
Boris Ostrovsky Jan. 30, 2017, 6:23 p.m. UTC | #2
On 01/30/2017 01:07 PM, Eric Dumazet wrote:
> On Mon, 2017-01-30 at 12:45 -0500, Boris Ostrovsky wrote:
>> rx_refill_timer should be deleted as soon as we disconnect from the
>> backend since otherwise it is possible for the timer to go off before
>> we get to xennet_destroy_queues(). If this happens we may dereference
>> queue->rx.sring which is set to NULL in xennet_disconnect_backend().
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> CC: stable@vger.kernel.org
>> ---
>>  drivers/net/xen-netfront.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index 8315fe7..722fe9f 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -1379,6 +1379,8 @@ static void xennet_disconnect_backend(struct netfront_info *info)
>>  	for (i = 0; i < num_queues && info->queues; ++i) {
>>  		struct netfront_queue *queue = &info->queues[i];
>>  
>> +		del_timer_sync(&queue->rx_refill_timer);
>> +
> If napi_disable() was not called before this del_timer_sync(), another
> RX might come here and rearm rx_refill_timer.

We do netif_carrier_off() first thing in xennet_disconnect_backend() and
the only place where the timer is rearmed is xennet_alloc_rx_buffers(),
which is guarded by a netif_carrier_ok() check.

-boris


>
>>  		if (queue->tx_irq && (queue->tx_irq == queue->rx_irq))
>>  			unbind_from_irqhandler(queue->tx_irq, queue);
>>  		if (queue->tx_irq && (queue->tx_irq != queue->rx_irq)) {
>> @@ -1733,7 +1735,6 @@ static void xennet_destroy_queues(struct netfront_info *info)
>>  
>>  		if (netif_running(info->netdev))
>>  			napi_disable(&queue->napi);
>> -		del_timer_sync(&queue->rx_refill_timer);
>>  		netif_napi_del(&queue->napi);
>>  	}
>>  
>
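
The guard being referred to sits at the top of the refill path. Roughly, and with the allocation loop elided, xennet_alloc_rx_buffers() has the following shape (an approximate sketch, not the verbatim driver code; constant and helper names follow the driver):

static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
{
	RING_IDX req_prod = queue->rx.req_prod_pvt;
	int notify;

	/* netif_carrier_off() is the first thing xennet_disconnect_backend()
	 * does, so once disconnect has started this check prevents both the
	 * timer rearm and the ring push below.
	 */
	if (unlikely(!netif_carrier_ok(queue->info->netdev)))
		return;

	/* ... allocate skbs and post them as RX requests, advancing
	 * req_prod ... (elided) */

	queue->rx.req_prod_pvt = req_prod;

	/* Still short of requests?  Retry later -- the only place where
	 * rx_refill_timer is rearmed.
	 */
	if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN) {
		mod_timer(&queue->rx_refill_timer, jiffies + (HZ/10));
		return;
	}

	/* This push dereferences queue->rx.sring, the pointer that
	 * xennet_disconnect_backend() sets to NULL.
	 */
	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&queue->rx, notify);
	if (notify)
		notify_remote_via_irq(queue->rx_irq);
}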
Eric Dumazet Jan. 30, 2017, 7:06 p.m. UTC | #3
On Mon, 2017-01-30 at 13:23 -0500, Boris Ostrovsky wrote:

> We do netif_carrier_off() first thing in xennet_disconnect_backend() and
> the only place where the timer is rearmed is xennet_alloc_rx_buffers(),
> which is guarded by netif_carrier_ok() check.

Oh well, testing netif_carrier_ok() in packet processing fast path looks
unusual and a waste of cpu cycles. I've never seen that pattern before.

If one day, we remove this netif_carrier_ok() test during a cleanup,
then the race window will open again.
Boris Ostrovsky Jan. 30, 2017, 7:31 p.m. UTC | #4
On 01/30/2017 02:06 PM, Eric Dumazet wrote:
> On Mon, 2017-01-30 at 13:23 -0500, Boris Ostrovsky wrote:
>
>> We do netif_carrier_off() first thing in xennet_disconnect_backend() and
>> the only place where the timer is rearmed is xennet_alloc_rx_buffers(),
>> which is guarded by netif_carrier_ok() check.
> Oh well, testing netif_carrier_ok() in packet processing fast path looks
> unusual and a waste of cpu cycles. I've never seen that pattern before.
>
> If one day, we remove this netif_carrier_ok() test during a cleanup,
> then the race window will open again.


I don't know much about napi but I wonder whether I can indeed disable
it in xennet_disconnect_backend(). I don't see how anything can happen
after disconnect since it unmaps the rings. And then napi is re-enabled
during reconnection in xennet_create_queues(). In which case I am not sure
there is any need for xennet_destroy_queues() as everything there could
be folded into xennet_disconnect_backend().

-boris
Boris Ostrovsky Jan. 31, 2017, 5:47 p.m. UTC | #5
On 01/30/2017 02:31 PM, Boris Ostrovsky wrote:
> On 01/30/2017 02:06 PM, Eric Dumazet wrote:
>> On Mon, 2017-01-30 at 13:23 -0500, Boris Ostrovsky wrote:
>>
>>> We do netif_carrier_off() first thing in xennet_disconnect_backend() and
>>> the only place where the timer is rearmed is xennet_alloc_rx_buffers(),
>>> which is guarded by netif_carrier_ok() check.
>> Oh well, testing netif_carrier_ok() in packet processing fast path looks
>> unusual and a waste of cpu cycles. I've never seen that pattern before.
>>
>> If one day, we remove this netif_carrier_ok() test during a cleanup,
>> then the race window will open again.
>
> I don't know much about napi but I wonder whether I can indeed disable
> it in xennet_disconnect_backend(). I don't see how anything can happen
> after disconnect since it unmaps the rings. And then napi is re-enabled
> during reconnection in xennet_create_queues(). In which case am not sure
> there is any need for xennet_destroy_queues() as everything there could
> be folded into xennet_disconnect_backend().

While this does work, there was a reason why napi_disable() was not
called in xennet_disconnect_backend() and it is explained in commit
ce58725fec6e --- napi_disable() may sleep and that's why it is called in
xennet_destroy_queues().

OTOH, there is a napi_synchronize() call in xennet_destroy_queues().
Will destroying the timer after it guarantee that all preceding RX processing
has completed? The RX interrupt is disabled prior to napi_synchronize(), so
presumably nothing new can be received.


-boris
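
Spelled out, the ordering being asked about is roughly the following (a hypothetical sketch of the teardown sequence under discussion, not something this patch changes):

	/* Hypothetical teardown ordering, per the question above: */
	unbind_from_irqhandler(queue->rx_irq, queue);	/* no new RX interrupts */
	napi_synchronize(&queue->napi);			/* wait out any in-flight poll */
	del_timer_sync(&queue->rx_refill_timer);	/* nothing left to rearm it */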
Boris Ostrovsky Feb. 1, 2017, 11:29 p.m. UTC | #6
On 01/31/2017 12:47 PM, Boris Ostrovsky wrote:
> On 01/30/2017 02:31 PM, Boris Ostrovsky wrote:
>> On 01/30/2017 02:06 PM, Eric Dumazet wrote:
>>> On Mon, 2017-01-30 at 13:23 -0500, Boris Ostrovsky wrote:
>>>
>>>> We do netif_carrier_off() first thing in xennet_disconnect_backend() and
>>>> the only place where the timer is rearmed is xennet_alloc_rx_buffers(),
>>>> which is guarded by netif_carrier_ok() check.
>>> Oh well, testing netif_carrier_ok() in packet processing fast path looks
>>> unusual and a waste of cpu cycles. I've never seen that pattern before.
>>>
>>> If one day, we remove this netif_carrier_ok() test during a cleanup,
>>> then the race window will open again.
>> I don't know much about napi but I wonder whether I can indeed disable
>> it in xennet_disconnect_backend(). I don't see how anything can happen
>> after disconnect since it unmaps the rings. And then napi is re-enabled
>> during reconnection in xennet_create_queues(). In which case am not sure
>> there is any need for xennet_destroy_queues() as everything there could
>> be folded into xennet_disconnect_backend().
> While this does work, there was a reason why napi_disable() was not
> called in xennet_disconnect_backend() and it is explained in commit
> ce58725fec6e --- napi_disable() may sleep and that's why it is called in
> xennet_destroy_queues().
>
> OTOH, there is a napi_synchronize() call in xennet_destroy_queues().
> Will destroying the timer after it guarantee that all preceding RX have
> been completed? RX interrupt is disabled prior to napi_synchronize() so
> presumably nothing new can be received.


I could not convince myself that napi_synchronize() is sufficient here
(mostly because I am not familiar with napi flow). At the same time I
would rather not make changes in anticipation of possible disappearance
of netif_carrier_ok() in the future so I'd like this patch to go in as is.

Unless there are other problems with the patch or if Eric (or others)
feel strongly about usage of netif_carrier_ok() here.


-boris
Eric Dumazet Feb. 2, 2017, 12:01 a.m. UTC | #7
On Wed, 2017-02-01 at 18:29 -0500, Boris Ostrovsky wrote:

> 
> I could not convince myself that napi_synchronize() is sufficient here
> (mostly because I am not familiar with napi flow). At the same time I
> would rather not make changes in anticipation of possible disappearance
> of netif_carrier_ok() in the future so I'd like this patch to go in as is.
> 
> Unless there are other problems with the patch or if Eric (or others)
> feel strongly about usage of netif_carrier_ok() here.
> 

No strong feelings from me.
We probably have more serious issues to fix anyway.
Jürgen Groß Feb. 3, 2017, 9:38 a.m. UTC | #8
On 30/01/17 18:45, Boris Ostrovsky wrote:
> rx_refill_timer should be deleted as soon as we disconnect from the
> backend since otherwise it is possible for the timer to go off before
> we get to xennet_destroy_queues(). If this happens we may dereference
> queue->rx.sring which is set to NULL in xennet_disconnect_backend().
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> CC: stable@vger.kernel.org

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen
Boris Ostrovsky Feb. 9, 2017, 1:42 p.m. UTC | #9
On 02/03/2017 04:38 AM, Juergen Gross wrote:
> On 30/01/17 18:45, Boris Ostrovsky wrote:
>> rx_refill_timer should be deleted as soon as we disconnect from the
>> backend since otherwise it is possible for the timer to go off before
>> we get to xennet_destroy_queues(). If this happens we may dereference
>> queue->rx.sring which is set to NULL in xennet_disconnect_backend().
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> CC: stable@vger.kernel.org
>
> Reviewed-by: Juergen Gross <jgross@suse.com>



David,

Are you going to take this to your tree or would you rather it goes via 
Xen tree?

And the same question for

https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00625.html

and

https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00754.html


(And while I am at it, I took

https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg00833.html

to Xen tree, hope you don't mind)


-boris
David Miller Feb. 10, 2017, 6:46 p.m. UTC | #10
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Date: Thu, 9 Feb 2017 08:42:59 -0500

> Are you going to take this to your tree or would you rather it goes
> via Xen tree?

Ok, I just did.

> And the same question for
> 
> https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00625.html

As I stated in the thread, I applied this one.

> https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00754.html

Likewise.

In the future, if you use netdev patchwork URLs, two things will
happen.  First, you will see immediately in the discussion log and the
patch state whether I applied it or not.  And second, I will be able to
reference and act on the patch that much more quickly and easily.

Thank you.

Patch

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 8315fe7..722fe9f 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1379,6 +1379,8 @@  static void xennet_disconnect_backend(struct netfront_info *info)
 	for (i = 0; i < num_queues && info->queues; ++i) {
 		struct netfront_queue *queue = &info->queues[i];
 
+		del_timer_sync(&queue->rx_refill_timer);
+
 		if (queue->tx_irq && (queue->tx_irq == queue->rx_irq))
 			unbind_from_irqhandler(queue->tx_irq, queue);
 		if (queue->tx_irq && (queue->tx_irq != queue->rx_irq)) {
@@ -1733,7 +1735,6 @@  static void xennet_destroy_queues(struct netfront_info *info)
 
 		if (netif_running(info->netdev))
 			napi_disable(&queue->napi);
-		del_timer_sync(&queue->rx_refill_timer);
 		netif_napi_del(&queue->napi);
 	}