[net,3/3] hinic: fix bug of send pkts while setting channels

Message ID 20200902094145.12216-4-luobin9@huawei.com
State Changes Requested
Delegated to: David Miller
Series hinic: BugFixes

Commit Message

Luo bin Sept. 2, 2020, 9:41 a.m. UTC
When hinic_set_channels calls hinic_close, netif_carrier_off and
netif_tx_disable are executed, and the TX host resources are freed
afterwards. The core may still call hinic_xmit_frame shortly after
netif_tx_disable, so check whether the carrier is on before sending
a packet; otherwise resources that hinic_close has already freed
may be accessed.

Fixes: 2eed5a8b614b ("hinic: add set_channels ethtool_ops support")
Signed-off-by: Luo bin <luobin9@huawei.com>
---
 drivers/net/ethernet/huawei/hinic/hinic_tx.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Eric Dumazet Sept. 2, 2020, 10:16 a.m. UTC | #1
On 9/2/20 2:41 AM, Luo bin wrote:
> When calling hinic_close in hinic_set_channels, netif_carrier_off
> and netif_tx_disable are executed, and TX host resources are freed
> after that. Core may call hinic_xmit_frame to send pkt after
> netif_tx_disable within a short time, so we should judge whether
> carrier is on before sending pkt otherwise the resources that
> have already been freed in hinic_close may be accessed.
> 
> Fixes: 2eed5a8b614b ("hinic: add set_channels ethtool_ops support")
> Signed-off-by: Luo bin <luobin9@huawei.com>
> ---
>  drivers/net/ethernet/huawei/hinic/hinic_tx.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_tx.c b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
> index a97498ee6914..a0662552a39c 100644
> --- a/drivers/net/ethernet/huawei/hinic/hinic_tx.c
> +++ b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
> @@ -531,6 +531,11 @@ netdev_tx_t hinic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
>  	struct hinic_txq *txq;
>  	struct hinic_qp *qp;
>  
> +	if (unlikely(!netif_carrier_ok(netdev))) {
> +		dev_kfree_skb_any(skb);
> +		return NETDEV_TX_OK;
> +	}
> +
>  	txq = &nic_dev->txqs[q_id];
>  	qp = container_of(txq->sq, struct hinic_qp, sq);
>  
> 

Adding this kind of test in the fast path seems like a big hammer to me.

See https://marc.info/?l=linux-netdev&m=159903844423389&w=2 for a similar problem.

Normally, after the hinic_close() operation, no packet should be sent by the core networking stack.

Trying to work around some core networking issue in each driver is a dead end.
David Miller Sept. 2, 2020, 7:52 p.m. UTC | #2
From: Luo bin <luobin9@huawei.com>
Date: Wed, 2 Sep 2020 17:41:45 +0800

> @@ -531,6 +531,11 @@ netdev_tx_t hinic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
>  	struct hinic_txq *txq;
>  	struct hinic_qp *qp;
>  
> +	if (unlikely(!netif_carrier_ok(netdev))) {
> +		dev_kfree_skb_any(skb);
> +		return NETDEV_TX_OK;
> +	}

As Eric said, these kinds of tests should not be placed in the fast path
of the driver.

If you invoke close and the core networking still sends packets to the
driver, that's a bug that needs to be fixed in the core networking.
Luo bin Sept. 3, 2020, 2:18 p.m. UTC | #3
On 2020/9/2 18:16, Eric Dumazet wrote:
> 
> 
> On 9/2/20 2:41 AM, Luo bin wrote:
>> When calling hinic_close in hinic_set_channels, netif_carrier_off
>> and netif_tx_disable are executed, and TX host resources are freed
>> after that. Core may call hinic_xmit_frame to send pkt after
>> netif_tx_disable within a short time, so we should judge whether
>> carrier is on before sending pkt otherwise the resources that
>> have already been freed in hinic_close may be accessed.
>>
>> Fixes: 2eed5a8b614b ("hinic: add set_channels ethtool_ops support")
>> Signed-off-by: Luo bin <luobin9@huawei.com>
>> ---
>>  drivers/net/ethernet/huawei/hinic/hinic_tx.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_tx.c b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
>> index a97498ee6914..a0662552a39c 100644
>> --- a/drivers/net/ethernet/huawei/hinic/hinic_tx.c
>> +++ b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
>> @@ -531,6 +531,11 @@ netdev_tx_t hinic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
>>  	struct hinic_txq *txq;
>>  	struct hinic_qp *qp;
>>  
>> +	if (unlikely(!netif_carrier_ok(netdev))) {
>> +		dev_kfree_skb_any(skb);
>> +		return NETDEV_TX_OK;
>> +	}
>> +
>>  	txq = &nic_dev->txqs[q_id];
>>  	qp = container_of(txq->sq, struct hinic_qp, sq);
>>  
>>
> 
> Adding this kind of tests in fast path seems a big hammer to me.
> 
> See https://marc.info/?l=linux-netdev&m=159903844423389&w=2   for a similar problem.
> 
> Normally, after hinic_close() operation, no packet should be sent by core networking stack.
> 
> Trying to work around some core networking issue in each driver is a dead end.
Thanks for your review. I agree with what you said. In theory, the core cannot call ndo_start_xmit
to send a packet after hinic_close has called netif_tx_disable, because the __QUEUE_STATE_DRV_XOFF bit
is set and that bit is protected by __netif_tx_lock. However, my debug messages show that
hinic_xmit_frame is still called after netif_tx_disable. I'll try to figure out why and fix it.
It seems that the patch from
https://marc.info/?l=linux-netdev&m=159903844423389&w=2 can't fix this problem.
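
To make the expected guarantee concrete, here is a minimal userspace sketch
(not kernel code) of the protocol described above: the "disable" side sets a
stop bit while holding the queue lock, and the "core" side tests that bit
under the same lock before handing a packet to the driver. Every model_*
name is hypothetical; a pthread mutex stands in for __netif_tx_lock and
MODEL_STATE_DRV_XOFF stands in for __QUEUE_STATE_DRV_XOFF. The sketch only
shows why no driver call is expected once the disable step has returned. It
does not reproduce or explain the behaviour seen in the debug messages.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's __QUEUE_STATE_DRV_XOFF bit. */
#define MODEL_STATE_DRV_XOFF (1UL << 0)

/* Hypothetical model of one TX queue: a lock plus a state word. */
struct model_txq {
	pthread_mutex_t lock;      /* plays the role of __netif_tx_lock */
	unsigned long state;       /* holds MODEL_STATE_DRV_XOFF */
	unsigned int driver_calls; /* how often the "driver" xmit ran */
};

void model_txq_init(struct model_txq *txq)
{
	pthread_mutex_init(&txq->lock, NULL);
	txq->state = 0;
	txq->driver_calls = 0;
}

/* Models netif_tx_disable() for one queue: set the stop bit under the lock. */
void model_tx_disable(struct model_txq *txq)
{
	pthread_mutex_lock(&txq->lock);
	txq->state |= MODEL_STATE_DRV_XOFF;
	pthread_mutex_unlock(&txq->lock);
}

/*
 * Models the core transmit path: take the queue lock, test the stop bit,
 * and call into the driver only if the queue is not stopped.
 * Returns true if the packet was handed to the driver.
 */
bool model_core_xmit(struct model_txq *txq)
{
	bool sent = false;

	pthread_mutex_lock(&txq->lock);
	if (!(txq->state & MODEL_STATE_DRV_XOFF)) {
		txq->driver_calls++; /* stands in for ndo_start_xmit */
		sent = true;
	}
	pthread_mutex_unlock(&txq->lock);

	return sent;
}

/* Small self-check: one packet before disable is sent, one after is not. */
void model_demo(void)
{
	struct model_txq txq;

	model_txq_init(&txq);
	assert(model_core_xmit(&txq));  /* queue running: driver is called */
	model_tx_disable(&txq);
	assert(!model_core_xmit(&txq)); /* queue stopped: driver is skipped */
	assert(txq.driver_calls == 1);
}
```

Because both sides serialize on the same lock, a transmitter either finishes
its driver call before the disable step returns or sees the stop bit and
backs off. A driver call observed after the disable step therefore suggests
that some path bypasses this check or clears the bit again.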
Luo bin Sept. 3, 2020, 2:27 p.m. UTC | #4
On 2020/9/3 3:52, David Miller wrote:
> From: Luo bin <luobin9@huawei.com>
> Date: Wed, 2 Sep 2020 17:41:45 +0800
> 
>> @@ -531,6 +531,11 @@ netdev_tx_t hinic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
>>  	struct hinic_txq *txq;
>>  	struct hinic_qp *qp;
>>  
>> +	if (unlikely(!netif_carrier_ok(netdev))) {
>> +		dev_kfree_skb_any(skb);
>> +		return NETDEV_TX_OK;
>> +	}
> 
> As Eric said, these kinds of tests should not be placed in the fast path
> of the driver.
> 
> If you invoke close and the core networking still sends packets to the
> driver, that's a bug that needs to be fixed in the core networking.
> 
Okay, I'll try to figure out why the core networking can still call ndo_start_xmit
after netif_tx_disable and fix the problem at its root. I'll withdraw this patch
for now.

Patch

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_tx.c b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
index a97498ee6914..a0662552a39c 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_tx.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
@@ -531,6 +531,11 @@  netdev_tx_t hinic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 	struct hinic_txq *txq;
 	struct hinic_qp *qp;
 
+	if (unlikely(!netif_carrier_ok(netdev))) {
+		dev_kfree_skb_any(skb);
+		return NETDEV_TX_OK;
+	}
+
 	txq = &nic_dev->txqs[q_id];
 	qp = container_of(txq->sq, struct hinic_qp, sq);