diff mbox series

[iwl-net,v1] i40e: fix "Error I40E_AQ_RC_ENOSPC adding RX filters on VF" issue

Message ID 20241015070450.1572415-1-aleksandr.loktionov@intel.com
State Changes Requested
Headers show
Series [iwl-net,v1] i40e: fix "Error I40E_AQ_RC_ENOSPC adding RX filters on VF" issue | expand

Commit Message

Loktionov, Aleksandr Oct. 15, 2024, 7:04 a.m. UTC
Fix a race condition in the i40e driver that leads to MAC/VLAN filters
becoming corrupted and leaking. Address the issue that occurs under
heavy load when multiple threads are concurrently modifying MAC/VLAN
filters by setting mac and port VLAN.

1. Thread T0 allocates a filter in i40e_add_filter() within
        i40e_ndo_set_vf_port_vlan().
2. Thread T1 concurrently frees the filter in __i40e_del_filter() within
        i40e_ndo_set_vf_mac().
3. Subsequently, i40e_service_task() calls i40e_sync_vsi_filters(), which
        refers to the already freed filter memory, causing corruption.

Reproduction steps:
1. Spawn multiple VFs.
2. Apply a concurrent heavy load by running parallel operations to change
        MAC addresses on the VFs and change port VLANs on the host.
3. Observe errors in dmesg:
"Error I40E_AQ_RC_ENOSPC adding RX filters on VF XX,
 please set promiscuous on manually for VF XX".

The fix involves implementing a new intermediate filter state,
I40E_FILTER_NEW_SYNC, for the time when a filter is on a tmp_add_list.
These filters cannot be deleted from the hash list directly but
must be removed using the full process.

Fixes: 278e7d0b9d68 ("i40e: store MAC/VLAN filters in a hash with the MAC Address as key")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h         |  2 ++
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c |  2 ++
 drivers/net/ethernet/intel/i40e/i40e_main.c    | 12 ++++++++++--
 3 files changed, 14 insertions(+), 2 deletions(-)

Comments

Paul Menzel Oct. 15, 2024, 8:24 a.m. UTC | #1
Dear Aleksandr,


Thank you for your patch. For the summary I’d make it more about the 
action of the patch like “Add intermediate filter to …”.


Am 15.10.24 um 09:04 schrieb Aleksandr Loktionov:
> Fix a race condition in the i40e driver that leads to MAC/VLAN filters
> becoming corrupted and leaking. Address the issue that occurs under
> heavy load when multiple threads are concurrently modifying MAC/VLAN
> filters by setting mac and port VLAN.
> 
> 1. Thread T0 allocates a filter in i40e_add_filter() within
>          i40e_ndo_set_vf_port_vlan().
> 2. Thread T1 concurrently frees the filter in __i40e_del_filter() within
>          i40e_ndo_set_vf_mac().
> 3. Subsequently, i40e_service_task() calls i40e_sync_vsi_filters(), which
>          refers to the already freed filter memory, causing corruption.
> 
> Reproduction steps:
> 1. Spawn multiple VFs.
> 2. Apply a concurrent heavy load by running parallel operations to change
>          MAC addresses on the VFs and change port VLANs on the host.

It’d be great if you shared your commands.

> 3. Observe errors in dmesg:
> "Error I40E_AQ_RC_ENOSPC adding RX filters on VF XX,
>   please set promiscuous on manually for VF XX".

I’d indent it by eight spaces and put it in one line.

> The fix involves implementing a new intermediate filter state,
> I40E_FILTER_NEW_SYNC, for the time when a filter is on a tmp_add_list.
> These filters cannot be deleted from the hash list directly but
> must be removed using the full process.

Please excuse my ignorance. Where is that done in the diff? For me it 
looks like you only replace `I40E_FILTER_NEW` by `I40E_FILTER_NEW_SYNC` 
in certain places, but no new condition for this case.

> Fixes: 278e7d0b9d68 ("i40e: store MAC/VLAN filters in a hash with the MAC Address as key")
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
>   drivers/net/ethernet/intel/i40e/i40e.h         |  2 ++
>   drivers/net/ethernet/intel/i40e/i40e_debugfs.c |  2 ++
>   drivers/net/ethernet/intel/i40e/i40e_main.c    | 12 ++++++++++--
>   3 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
> index 2089a0e..a1842dd 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> @@ -755,6 +755,8 @@ enum i40e_filter_state {
>   	I40E_FILTER_ACTIVE,		/* Added to switch by FW */
>   	I40E_FILTER_FAILED,		/* Rejected by FW */
>   	I40E_FILTER_REMOVE,		/* To be removed */
> +	/* RESERVED */

Why the reserved comment? Please elaborate in the commit message.

> +	I40E_FILTER_NEW_SYNC = 6,	/* New, not sent, in sync task */
>   /* There is no 'removed' state; the filter struct is freed */
>   };
>   struct i40e_mac_filter {
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
> index abf624d..1c439b1 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
> @@ -89,6 +89,8 @@ static char *i40e_filter_state_string[] = {
>   	"ACTIVE",
>   	"FAILED",
>   	"REMOVE",
> +	"<RESERVED>",
> +	"NEW_SYNC",
>   };
>   
>   /**
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 25295ae..55fb362 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -1255,6 +1255,7 @@ int i40e_count_filters(struct i40e_vsi *vsi)
>   
>   	hash_for_each_safe(vsi->mac_filter_hash, bkt, h, f, hlist) {
>   		if (f->state == I40E_FILTER_NEW ||
> +		    f->state == I40E_FILTER_NEW_SYNC ||
>   		    f->state == I40E_FILTER_ACTIVE)
>   			++cnt;
>   	}
> @@ -1441,6 +1442,8 @@ static int i40e_correct_mac_vlan_filters(struct i40e_vsi *vsi,
>   
>   			new->f = add_head;
>   			new->state = add_head->state;
> +			if (add_head->state == I40E_FILTER_NEW)
> +				add_head->state = I40E_FILTER_NEW_SYNC;
>   
>   			/* Add the new filter to the tmp list */
>   			hlist_add_head(&new->hlist, tmp_add_list);
> @@ -1550,6 +1553,8 @@ static int i40e_correct_vf_mac_vlan_filters(struct i40e_vsi *vsi,
>   				return -ENOMEM;
>   			new_mac->f = add_head;
>   			new_mac->state = add_head->state;
> +			if (add_head->state == I40E_FILTER_NEW)
> +				add_head->state = I40E_FILTER_NEW_SYNC;
>   
>   			/* Add the new filter to the tmp list */
>   			hlist_add_head(&new_mac->hlist, tmp_add_list);
> @@ -2437,7 +2442,8 @@ static int
>   i40e_aqc_broadcast_filter(struct i40e_vsi *vsi, const char *vsi_name,
>   			  struct i40e_mac_filter *f)
>   {
> -	bool enable = f->state == I40E_FILTER_NEW;
> +	bool enable = f->state == I40E_FILTER_NEW ||
> +		      f->state == I40E_FILTER_NEW_SYNC;
>   	struct i40e_hw *hw = &vsi->back->hw;
>   	int aq_ret;
>   
> @@ -2611,6 +2617,7 @@ int i40e_sync_vsi_filters(struct i40e_vsi *vsi)
>   
>   				/* Add it to the hash list */
>   				hlist_add_head(&new->hlist, &tmp_add_list);
> +				f->state = I40E_FILTER_NEW_SYNC;
>   			}
>   
>   			/* Count the number of active (current and new) VLAN
> @@ -2762,7 +2769,8 @@ int i40e_sync_vsi_filters(struct i40e_vsi *vsi)
>   		spin_lock_bh(&vsi->mac_filter_hash_lock);
>   		hlist_for_each_entry_safe(new, h, &tmp_add_list, hlist) {
>   			/* Only update the state if we're still NEW */
> -			if (new->f->state == I40E_FILTER_NEW)
> +			if (new->f->state == I40E_FILTER_NEW ||
> +			    new->f->state == I40E_FILTER_NEW_SYNC)
>   				new->f->state = new->state;
>   			hlist_del(&new->hlist);
>   			netdev_hw_addr_refcnt(new->f, vsi->netdev, -1);


Kind nregards,

Paul
Loktionov, Aleksandr Oct. 15, 2024, 9:45 a.m. UTC | #2
> -----Original Message-----
> From: Paul Menzel <pmenzel@molgen.mpg.de>
> Sent: Tuesday, October 15, 2024 10:24 AM
> To: Loktionov, Aleksandr <aleksandr.loktionov@intel.com>
> Cc: intel-wired-lan@lists.osuosl.org; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; netdev@vger.kernel.org
> Subject: Re: [Intel-wired-lan] [PATCH iwl-net v1] i40e: fix "Error
> I40E_AQ_RC_ENOSPC adding RX filters on VF" issue
> 
> Dear Aleksandr,
> 
> 
> Thank you for your patch. For the summary I’d make it more about the
> action of the patch like “Add intermediate filter to …”.
> 
Good day Paul
Sorry, I don't get your point. This patch fixes bug that can stop vfs to receive any traffic making them useless.
The first and most visible effect of the bug is a lot of "Error I40E_AQ_RC_ENOSPC adding RX filters on VF XX,..." errors from F/W complaining to add MAC/VLAN filter. So I've mentioned it in the title to be easy found.
I don't add any filter in the driver, we have to add one more intermediate state of the filter to avoid the race condition.

> 
> Am 15.10.24 um 09:04 schrieb Aleksandr Loktionov:
> > Fix a race condition in the i40e driver that leads to MAC/VLAN
> filters
> > becoming corrupted and leaking. Address the issue that occurs under
> > heavy load when multiple threads are concurrently modifying MAC/VLAN
> > filters by setting mac and port VLAN.
> >
> > 1. Thread T0 allocates a filter in i40e_add_filter() within
> >          i40e_ndo_set_vf_port_vlan().
> > 2. Thread T1 concurrently frees the filter in __i40e_del_filter()
> within
> >          i40e_ndo_set_vf_mac().
> > 3. Subsequently, i40e_service_task() calls i40e_sync_vsi_filters(),
> which
> >          refers to the already freed filter memory, causing
> corruption.
> >
> > Reproduction steps:
> > 1. Spawn multiple VFs.
> > 2. Apply a concurrent heavy load by running parallel operations to
> change
> >          MAC addresses on the VFs and change port VLANs on the host.
> 
> It’d be great if you shared your commands.
Sorry, I'm pretty sure it's quite impossible to reproduce the issue with bash scripts /*I've tried really hard*/. Reproduction is related to user-space/kernel code which might be not open-sourced. So as I've explained in the commit title the race condition possibility that was introduced from the very beginning.

> 
> > 3. Observe errors in dmesg:
> > "Error I40E_AQ_RC_ENOSPC adding RX filters on VF XX,
> >   please set promiscuous on manually for VF XX".
> 
> I’d indent it by eight spaces and put it in one line.
Ok, I'll fix it in v2

> > The fix involves implementing a new intermediate filter state,
> > I40E_FILTER_NEW_SYNC, for the time when a filter is on a
> tmp_add_list.
> > These filters cannot be deleted from the hash list directly but
> > must be removed using the full process.
> 
> Please excuse my ignorance. Where is that done in the diff? For me it
> looks like you only replace `I40E_FILTER_NEW` by
> `I40E_FILTER_NEW_SYNC`
> in certain places, but no new condition for this case.
> 
Here are below the code which adds new I40E_FILTER_NEW_SYNC enum.
And additional conditions for this I40E_FILTER_NEW_SYNC state.
All other places in the driver just tract I40E_FILTER_NEW_SYNC as not just I40E_FILTER_NEW by default. 

> > Fixes: 278e7d0b9d68 ("i40e: store MAC/VLAN filters in a hash with
> the MAC Address as key")
> > Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> > ---
> >   drivers/net/ethernet/intel/i40e/i40e.h         |  2 ++
> >   drivers/net/ethernet/intel/i40e/i40e_debugfs.c |  2 ++
> >   drivers/net/ethernet/intel/i40e/i40e_main.c    | 12 ++++++++++--
> >   3 files changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e.h
> b/drivers/net/ethernet/intel/i40e/i40e.h
> > index 2089a0e..a1842dd 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e.h
> > +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> > @@ -755,6 +755,8 @@ enum i40e_filter_state {
> >   	I40E_FILTER_ACTIVE,		/* Added to switch by FW */
> >   	I40E_FILTER_FAILED,		/* Rejected by FW */
> >   	I40E_FILTER_REMOVE,		/* To be removed */
> > +	/* RESERVED */
> 
> Why the reserved comment? Please elaborate in the commit message.
This is for not breaking compatibility with different driver versions.
Between OOT, net-next and plain old net. Isn't it obvious from the comment, it's "RESERVERD"?
Can you provide me example commit message what I should follow?

> > +	I40E_FILTER_NEW_SYNC = 6,	/* New, not sent, in sync task */
> >   /* There is no 'removed' state; the filter struct is freed */
> >   };
> >   struct i40e_mac_filter {
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
> b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
> > index abf624d..1c439b1 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
> > @@ -89,6 +89,8 @@ static char *i40e_filter_state_string[] = {
> >   	"ACTIVE",
> >   	"FAILED",
> >   	"REMOVE",
> > +	"<RESERVED>",
> > +	"NEW_SYNC",
> >   };
> >
> >   /**
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> > index 25295ae..55fb362 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> > @@ -1255,6 +1255,7 @@ int i40e_count_filters(struct i40e_vsi *vsi)
> >
> >   	hash_for_each_safe(vsi->mac_filter_hash, bkt, h, f, hlist) {
> >   		if (f->state == I40E_FILTER_NEW ||
> > +		    f->state == I40E_FILTER_NEW_SYNC ||
> >   		    f->state == I40E_FILTER_ACTIVE)
> >   			++cnt;
> >   	}
> > @@ -1441,6 +1442,8 @@ static int
> i40e_correct_mac_vlan_filters(struct i40e_vsi *vsi,
> >
> >   			new->f = add_head;
> >   			new->state = add_head->state;
> > +			if (add_head->state == I40E_FILTER_NEW)
> > +				add_head->state = I40E_FILTER_NEW_SYNC;
> >
> >   			/* Add the new filter to the tmp list */
> >   			hlist_add_head(&new->hlist, tmp_add_list);
> > @@ -1550,6 +1553,8 @@ static int
> i40e_correct_vf_mac_vlan_filters(struct i40e_vsi *vsi,
> >   				return -ENOMEM;
> >   			new_mac->f = add_head;
> >   			new_mac->state = add_head->state;
> > +			if (add_head->state == I40E_FILTER_NEW)
> > +				add_head->state = I40E_FILTER_NEW_SYNC;
> >
> >   			/* Add the new filter to the tmp list */
> >   			hlist_add_head(&new_mac->hlist, tmp_add_list);
> > @@ -2437,7 +2442,8 @@ static int
> >   i40e_aqc_broadcast_filter(struct i40e_vsi *vsi, const char
> *vsi_name,
> >   			  struct i40e_mac_filter *f)
> >   {
> > -	bool enable = f->state == I40E_FILTER_NEW;
> > +	bool enable = f->state == I40E_FILTER_NEW ||
> > +		      f->state == I40E_FILTER_NEW_SYNC;
> >   	struct i40e_hw *hw = &vsi->back->hw;
> >   	int aq_ret;
> >
> > @@ -2611,6 +2617,7 @@ int i40e_sync_vsi_filters(struct i40e_vsi
> *vsi)
> >
> >   				/* Add it to the hash list */
> >   				hlist_add_head(&new->hlist, &tmp_add_list);
> > +				f->state = I40E_FILTER_NEW_SYNC;
> >   			}
> >
> >   			/* Count the number of active (current and new)
> VLAN
> > @@ -2762,7 +2769,8 @@ int i40e_sync_vsi_filters(struct i40e_vsi
> *vsi)
> >   		spin_lock_bh(&vsi->mac_filter_hash_lock);
> >   		hlist_for_each_entry_safe(new, h, &tmp_add_list, hlist) {
> >   			/* Only update the state if we're still NEW */
> > -			if (new->f->state == I40E_FILTER_NEW)
> > +			if (new->f->state == I40E_FILTER_NEW ||
> > +			    new->f->state == I40E_FILTER_NEW_SYNC)
> >   				new->f->state = new->state;
> >   			hlist_del(&new->hlist);
> >   			netdev_hw_addr_refcnt(new->f, vsi->netdev, -1);
> 
> 
> Kind nregards,
> 
> Paul
Paul Menzel Oct. 15, 2024, 10:32 a.m. UTC | #3
Dear Aleksandr,


Thank you for your reply. Just a note at the beginning, that your mailer 
seems to wrap lines without preserving the right citation level. It’d be 
great if you used a mailer following standards. (I know, it’s hard in a 
corporate environment, but other Intel developers seem to have found 
solutions for this.)


Am 15.10.24 um 11:45 schrieb Loktionov, Aleksandr:

>> -----Original Message-----
>> From: Paul Menzel <pmenzel@molgen.mpg.de>
>> Sent: Tuesday, October 15, 2024 10:24 AM

>> Thank you for your patch. For the summary I’d make it more about the
>> action of the patch like “Add intermediate filter to …”.
> 
> Sorry, I don't get your point. This patch fixes bug that can stop
> vfs to receive any traffic making them useless. The first and most
> visible effect of the bug is a lot of "Error I40E_AQ_RC_ENOSPC
> adding RX filters on VF XX,..." errors from F/W complaining to add
> MAC/VLAN filter. So I've mentioned it in the title to be easy found. 
> I don't add any filter in the driver, we have to add one more
> intermediate state of the filter to avoid the race condition.

In my opinion, having the log in the body is good enough for search 
engines and the summary should be optimized for `git log --oneline` 
consumption. I am sorry about the confusion with my example. Maybe:

Add intermediate sync state to fix race condition

>> Am 15.10.24 um 09:04 schrieb Aleksandr Loktionov:
>>> Fix a race condition in the i40e driver that leads to MAC/VLAN filters
>>> becoming corrupted and leaking. Address the issue that occurs under
>>> heavy load when multiple threads are concurrently modifying MAC/VLAN
>>> filters by setting mac and port VLAN.
>>>
>>> 1. Thread T0 allocates a filter in i40e_add_filter() within
>>>           i40e_ndo_set_vf_port_vlan().
>>> 2. Thread T1 concurrently frees the filter in __i40e_del_filter() within
>>>           i40e_ndo_set_vf_mac().
>>> 3. Subsequently, i40e_service_task() calls i40e_sync_vsi_filters(), which
>>>           refers to the already freed filter memory, causing corruption.
>>>
>>> Reproduction steps:
>>> 1. Spawn multiple VFs.
>>> 2. Apply a concurrent heavy load by running parallel operations to change
>>>           MAC addresses on the VFs and change port VLANs on the host.
>>
>> It’d be great if you shared your commands.

> Sorry, I'm pretty sure it's quite impossible to reproduce the issue
> with bash scripts /*I've tried really hard*/. Reproduction is
> related to user-space/kernel code which might be not open-sourced.
> So as I've explained in the commit title the race condition
> possibility that was introduced from the very beginning.

Could you please ask to get clearance to publish it. My naive view 
wonders, why legal(?) should oppose publication.

>>> 3. Observe errors in dmesg:
>>> "Error I40E_AQ_RC_ENOSPC adding RX filters on VF XX,
>>>    please set promiscuous on manually for VF XX".
>>
>> I’d indent it by eight spaces and put it in one line.
> Ok, I'll fix it in v2
> 
>>> The fix involves implementing a new intermediate filter state,
>>> I40E_FILTER_NEW_SYNC, for the time when a filter is on a tmp_add_list.
>>> These filters cannot be deleted from the hash list directly but
>>> must be removed using the full process.
>>
>> Please excuse my ignorance. Where is that done in the diff? For me it
>> looks like you only replace `I40E_FILTER_NEW` by `I40E_FILTER_NEW_SYNC`
>> in certain places, but no new condition for this case.
>
> Here are below the code which adds new I40E_FILTER_NEW_SYNC enum. 
> And additional conditions for this I40E_FILTER_NEW_SYNC state. All
> other places in the driver just tract I40E_FILTER_NEW_SYNC as not
> just I40E_FILTER_NEW by default.
Thank you. For me it’s not so obvious from the diff, and indeed, it’s 
done in `i40e_sync_vsi_filters()`. Thank you again.

>>> Fixes: 278e7d0b9d68 ("i40e: store MAC/VLAN filters in a hash with the MAC Address as key")
>>> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
>>> ---
>>>    drivers/net/ethernet/intel/i40e/i40e.h         |  2 ++
>>>    drivers/net/ethernet/intel/i40e/i40e_debugfs.c |  2 ++
>>>    drivers/net/ethernet/intel/i40e/i40e_main.c    | 12 ++++++++++--
>>>    3 files changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h
>> b/drivers/net/ethernet/intel/i40e/i40e.h
>>> index 2089a0e..a1842dd 100644
>>> --- a/drivers/net/ethernet/intel/i40e/i40e.h
>>> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
>>> @@ -755,6 +755,8 @@ enum i40e_filter_state {
>>>    	I40E_FILTER_ACTIVE,		/* Added to switch by FW */
>>>    	I40E_FILTER_FAILED,		/* Rejected by FW */
>>>    	I40E_FILTER_REMOVE,		/* To be removed */
>>> +	/* RESERVED */
>>
>> Why the reserved comment? Please elaborate in the commit message.
> 
> This is for not breaking compatibility with different driver
> versions. Between OOT, net-next and plain old net. Isn't it obvious
> from the comment, it's "RESERVERD"?

Apparently not, otherwise I wouldn’t ask. ;-)

> Can you provide me example commit message what I should follow?

There are people reading the code not familiar with the ecosystem, that 
there is an out of tree driver fore example. So the code or the commit 
message should have an explanation why `I40E_FILTER_NEW_SYNC = 6` and 
what the reserved is used for.

>>> +	I40E_FILTER_NEW_SYNC = 6,	/* New, not sent, in sync task */

Also mention the hash list in the comment?

>>>    /* There is no 'removed' state; the filter struct is freed */
>>>    };
>>>    struct i40e_mac_filter {
>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
>>> index abf624d..1c439b1 100644
>>> --- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
>>> @@ -89,6 +89,8 @@ static char *i40e_filter_state_string[] = {
>>>    	"ACTIVE",
>>>    	"FAILED",
>>>    	"REMOVE",
>>> +	"<RESERVED>",
>>> +	"NEW_SYNC",
>>>    };
>>>
>>>    /**
>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
>>> index 25295ae..55fb362 100644
>>> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
>>> @@ -1255,6 +1255,7 @@ int i40e_count_filters(struct i40e_vsi *vsi)
>>>
>>>    	hash_for_each_safe(vsi->mac_filter_hash, bkt, h, f, hlist) {
>>>    		if (f->state == I40E_FILTER_NEW ||
>>> +		    f->state == I40E_FILTER_NEW_SYNC ||
>>>    		    f->state == I40E_FILTER_ACTIVE)
>>>    			++cnt;
>>>    	}
>>> @@ -1441,6 +1442,8 @@ static int i40e_correct_mac_vlan_filters(struct i40e_vsi *vsi,
>>>
>>>    			new->f = add_head;
>>>    			new->state = add_head->state;
>>> +			if (add_head->state == I40E_FILTER_NEW)
>>> +				add_head->state = I40E_FILTER_NEW_SYNC;
>>>
>>>    			/* Add the new filter to the tmp list */
>>>    			hlist_add_head(&new->hlist, tmp_add_list);
>>> @@ -1550,6 +1553,8 @@ static int i40e_correct_vf_mac_vlan_filters(struct i40e_vsi *vsi,
>>>    				return -ENOMEM;
>>>    			new_mac->f = add_head;
>>>    			new_mac->state = add_head->state;
>>> +			if (add_head->state == I40E_FILTER_NEW)
>>> +				add_head->state = I40E_FILTER_NEW_SYNC;
>>>
>>>    			/* Add the new filter to the tmp list */
>>>    			hlist_add_head(&new_mac->hlist, tmp_add_list);
>>> @@ -2437,7 +2442,8 @@ static int
>>>    i40e_aqc_broadcast_filter(struct i40e_vsi *vsi, const char *vsi_name,
>>>    			  struct i40e_mac_filter *f)
>>>    {
>>> -	bool enable = f->state == I40E_FILTER_NEW;
>>> +	bool enable = f->state == I40E_FILTER_NEW ||
>>> +		      f->state == I40E_FILTER_NEW_SYNC;
>>>    	struct i40e_hw *hw = &vsi->back->hw;
>>>    	int aq_ret;
>>>
>>> @@ -2611,6 +2617,7 @@ int i40e_sync_vsi_filters(struct i40e_vsi *vsi)
>>>
>>>    				/* Add it to the hash list */
>>>    				hlist_add_head(&new->hlist, &tmp_add_list);
>>> +				f->state = I40E_FILTER_NEW_SYNC;
>>>    			}
>>>
>>>    			/* Count the number of active (current and new) VLAN
>>> @@ -2762,7 +2769,8 @@ int i40e_sync_vsi_filters(struct i40e_vsi *vsi)
>>>    		spin_lock_bh(&vsi->mac_filter_hash_lock);
>>>    		hlist_for_each_entry_safe(new, h, &tmp_add_list, hlist) {
>>>    			/* Only update the state if we're still NEW */
>>> -			if (new->f->state == I40E_FILTER_NEW)
>>> +			if (new->f->state == I40E_FILTER_NEW ||
>>> +			    new->f->state == I40E_FILTER_NEW_SYNC)
>>>    				new->f->state = new->state;
>>>    			hlist_del(&new->hlist);
>>>    			netdev_hw_addr_refcnt(new->f, vsi->netdev, -1);
Thank you again for your work and explanations.


Kind regards,

Paul
Loktionov, Aleksandr Oct. 15, 2024, 11:05 a.m. UTC | #4
> -----Original Message-----
> From: Paul Menzel <pmenzel@molgen.mpg.de>
> Sent: Tuesday, October 15, 2024 12:33 PM
> To: Loktionov, Aleksandr <aleksandr.loktionov@intel.com>
> Cc: intel-wired-lan@lists.osuosl.org; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; netdev@vger.kernel.org
> Subject: Re: [Intel-wired-lan] [PATCH iwl-net v1] i40e: fix "Error
> I40E_AQ_RC_ENOSPC adding RX filters on VF" issue
> 
> Dear Aleksandr,
> 
> 
> Thank you for your reply. Just a note at the beginning, that your
> mailer
> seems to wrap lines without preserving the right citation level. It’d
> be
> great if you used a mailer following standards. (I know, it’s hard in
> a
> corporate environment, but other Intel developers seem to have found
> solutions for this.)
> 
> 
> Am 15.10.24 um 11:45 schrieb Loktionov, Aleksandr:
> 
> >> -----Original Message-----
> >> From: Paul Menzel <pmenzel@molgen.mpg.de>
> >> Sent: Tuesday, October 15, 2024 10:24 AM
> 
> >> Thank you for your patch. For the summary I’d make it more about
> the
> >> action of the patch like “Add intermediate filter to …”.
> >
> > Sorry, I don't get your point. This patch fixes bug that can stop
> > vfs to receive any traffic making them useless. The first and most
> > visible effect of the bug is a lot of "Error I40E_AQ_RC_ENOSPC
> > adding RX filters on VF XX,..." errors from F/W complaining to add
> > MAC/VLAN filter. So I've mentioned it in the title to be easy found.
> > I don't add any filter in the driver, we have to add one more
> > intermediate state of the filter to avoid the race condition.
> 
> In my opinion, having the log in the body is good enough for search
> engines and the summary should be optimized for `git log --oneline`
> consumption. I am sorry about the confusion with my example. Maybe:
> 
> Add intermediate sync state to fix race condition
> 
I just wonder why do you insist on "ADD" which usually means implementing a new feature.
When this patch FIXes the bug? If I'd want to add a new feature I'd rather send my patch to net-next, isn't it?
For me 'fix race condition by adding filter's intermediate sync state '
Can you explain your strong argument for the Add? 

> >> Am 15.10.24 um 09:04 schrieb Aleksandr Loktionov:
> >>> Fix a race condition in the i40e driver that leads to MAC/VLAN
> filters
> >>> becoming corrupted and leaking. Address the issue that occurs
> under
> >>> heavy load when multiple threads are concurrently modifying
> MAC/VLAN
> >>> filters by setting mac and port VLAN.
> >>>
> >>> 1. Thread T0 allocates a filter in i40e_add_filter() within
> >>>           i40e_ndo_set_vf_port_vlan().
> >>> 2. Thread T1 concurrently frees the filter in __i40e_del_filter()
> within
> >>>           i40e_ndo_set_vf_mac().
> >>> 3. Subsequently, i40e_service_task() calls
> i40e_sync_vsi_filters(), which
> >>>           refers to the already freed filter memory, causing
> corruption.
> >>>
> >>> Reproduction steps:
> >>> 1. Spawn multiple VFs.
> >>> 2. Apply a concurrent heavy load by running parallel operations to
> change
> >>>           MAC addresses on the VFs and change port VLANs on the
> host.
> >>
> >> It’d be great if you shared your commands.
> 
> > Sorry, I'm pretty sure it's quite impossible to reproduce the issue
> > with bash scripts /*I've tried really hard*/. Reproduction is
> > related to user-space/kernel code which might be not open-sourced.
> > So as I've explained in the commit title the race condition
> > possibility that was introduced from the very beginning.
> 
> Could you please ask to get clearance to publish it. My naive view
> wonders, why legal(?) should oppose publication.
Simply the defect can come from development tools or from external customer.
And being tract as a commercial secret, which even doesn't belong to Intel sometimes.

> 
> >>> 3. Observe errors in dmesg:
> >>> "Error I40E_AQ_RC_ENOSPC adding RX filters on VF XX,
> >>>    please set promiscuous on manually for VF XX".
> >>
> >> I’d indent it by eight spaces and put it in one line.
> > Ok, I'll fix it in v2
> >
> >>> The fix involves implementing a new intermediate filter state,
> >>> I40E_FILTER_NEW_SYNC, for the time when a filter is on a
> tmp_add_list.
> >>> These filters cannot be deleted from the hash list directly but
> >>> must be removed using the full process.
> >>
> >> Please excuse my ignorance. Where is that done in the diff? For me
> it
> >> looks like you only replace `I40E_FILTER_NEW` by
> `I40E_FILTER_NEW_SYNC`
> >> in certain places, but no new condition for this case.
> >
> > Here are below the code which adds new I40E_FILTER_NEW_SYNC enum.
> > And additional conditions for this I40E_FILTER_NEW_SYNC state. All
> > other places in the driver just tract I40E_FILTER_NEW_SYNC as not
> > just I40E_FILTER_NEW by default.
> Thank you. For me it’s not so obvious from the diff, and indeed, it’s
> done in `i40e_sync_vsi_filters()`. Thank you again.
> 
> >>> Fixes: 278e7d0b9d68 ("i40e: store MAC/VLAN filters in a hash with
> the MAC Address as key")
> >>> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> >>> ---
> >>>    drivers/net/ethernet/intel/i40e/i40e.h         |  2 ++
> >>>    drivers/net/ethernet/intel/i40e/i40e_debugfs.c |  2 ++
> >>>    drivers/net/ethernet/intel/i40e/i40e_main.c    | 12 ++++++++++-
> -
> >>>    3 files changed, 14 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h
> >> b/drivers/net/ethernet/intel/i40e/i40e.h
> >>> index 2089a0e..a1842dd 100644
> >>> --- a/drivers/net/ethernet/intel/i40e/i40e.h
> >>> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> >>> @@ -755,6 +755,8 @@ enum i40e_filter_state {
> >>>    	I40E_FILTER_ACTIVE,		/* Added to switch by FW
> */
> >>>    	I40E_FILTER_FAILED,		/* Rejected by FW */
> >>>    	I40E_FILTER_REMOVE,		/* To be removed */
> >>> +	/* RESERVED */
> >>
> >> Why the reserved comment? Please elaborate in the commit message.
> >
> > This is for not breaking compatibility with different driver
> > versions. Between OOT, net-next and plain old net. Isn't it obvious
> > from the comment, it's "RESERVERD"?
> 
> Apparently not, otherwise I wouldn’t ask. ;-)
> 
> > Can you provide me example commit message what I should follow?
> 
> There are people reading the code not familiar with the ecosystem,
> that
> there is an out of tree driver fore example. So the code or the commit
> message should have an explanation why `I40E_FILTER_NEW_SYNC = 6` and
> what the reserved is used for.
> 
> >>> +	I40E_FILTER_NEW_SYNC = 6,	/* New, not sent, in sync task */
> 
> Also mention the hash list in the comment?
> 
> >>>    /* There is no 'removed' state; the filter struct is freed */
> >>>    };
> >>>    struct i40e_mac_filter {
> >>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
> b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
> >>> index abf624d..1c439b1 100644
> >>> --- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
> >>> +++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
> >>> @@ -89,6 +89,8 @@ static char *i40e_filter_state_string[] = {
> >>>    	"ACTIVE",
> >>>    	"FAILED",
> >>>    	"REMOVE",
> >>> +	"<RESERVED>",
> >>> +	"NEW_SYNC",
> >>>    };
> >>>
> >>>    /**
> >>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> >>> index 25295ae..55fb362 100644
> >>> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> >>> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> >>> @@ -1255,6 +1255,7 @@ int i40e_count_filters(struct i40e_vsi *vsi)
> >>>
> >>>    	hash_for_each_safe(vsi->mac_filter_hash, bkt, h, f, hlist)
> {
> >>>    		if (f->state == I40E_FILTER_NEW ||
> >>> +		    f->state == I40E_FILTER_NEW_SYNC ||
> >>>    		    f->state == I40E_FILTER_ACTIVE)
> >>>    			++cnt;
> >>>    	}
> >>> @@ -1441,6 +1442,8 @@ static int
> i40e_correct_mac_vlan_filters(struct i40e_vsi *vsi,
> >>>
> >>>    			new->f = add_head;
> >>>    			new->state = add_head->state;
> >>> +			if (add_head->state == I40E_FILTER_NEW)
> >>> +				add_head->state = I40E_FILTER_NEW_SYNC;
> >>>
> >>>    			/* Add the new filter to the tmp list */
> >>>    			hlist_add_head(&new->hlist, tmp_add_list);
> >>> @@ -1550,6 +1553,8 @@ static int
> i40e_correct_vf_mac_vlan_filters(struct i40e_vsi *vsi,
> >>>    				return -ENOMEM;
> >>>    			new_mac->f = add_head;
> >>>    			new_mac->state = add_head->state;
> >>> +			if (add_head->state == I40E_FILTER_NEW)
> >>> +				add_head->state = I40E_FILTER_NEW_SYNC;
> >>>
> >>>    			/* Add the new filter to the tmp list */
> >>>    			hlist_add_head(&new_mac->hlist,
> tmp_add_list);
> >>> @@ -2437,7 +2442,8 @@ static int
> >>>    i40e_aqc_broadcast_filter(struct i40e_vsi *vsi, const char
> *vsi_name,
> >>>    			  struct i40e_mac_filter *f)
> >>>    {
> >>> -	bool enable = f->state == I40E_FILTER_NEW;
> >>> +	bool enable = f->state == I40E_FILTER_NEW ||
> >>> +		      f->state == I40E_FILTER_NEW_SYNC;
> >>>    	struct i40e_hw *hw = &vsi->back->hw;
> >>>    	int aq_ret;
> >>>
> >>> @@ -2611,6 +2617,7 @@ int i40e_sync_vsi_filters(struct i40e_vsi
> *vsi)
> >>>
> >>>    				/* Add it to the hash list */
> >>>    				hlist_add_head(&new->hlist,
> &tmp_add_list);
> >>> +				f->state = I40E_FILTER_NEW_SYNC;
> >>>    			}
> >>>
> >>>    			/* Count the number of active (current and
> new) VLAN
> >>> @@ -2762,7 +2769,8 @@ int i40e_sync_vsi_filters(struct i40e_vsi
> *vsi)
> >>>    		spin_lock_bh(&vsi->mac_filter_hash_lock);
> >>>    		hlist_for_each_entry_safe(new, h, &tmp_add_list,
> hlist) {
> >>>    			/* Only update the state if we're still NEW
> */
> >>> -			if (new->f->state == I40E_FILTER_NEW)
> >>> +			if (new->f->state == I40E_FILTER_NEW ||
> >>> +			    new->f->state == I40E_FILTER_NEW_SYNC)
> >>>    				new->f->state = new->state;
> >>>    			hlist_del(&new->hlist);
> >>>    			netdev_hw_addr_refcnt(new->f, vsi->netdev, -
> 1);
> Thank you again for your work and explanations.
> 
> 
> Kind regards,
> 
> Paul
Paul Menzel Oct. 16, 2024, 5:28 a.m. UTC | #5
Dear Aleksandr,


Am 15.10.24 um 13:05 schrieb Aleksandr Loktionov:

>> -----Original Message-----
>> From: Paul Menzel <pmenzel@molgen.mpg.de>
>> Sent: Tuesday, October 15, 2024 12:33 PM

[…]

>> Am 15.10.24 um 11:45 schrieb Loktionov, Aleksandr:
>>
>>>> -----Original Message-----
>>>> From: Paul Menzel <pmenzel@molgen.mpg.de>
>>>> Sent: Tuesday, October 15, 2024 10:24 AM
>>
>>>> Thank you for your patch. For the summary I’d make it more about the
>>>> action of the patch like “Add intermediate filter to …”.
>>>
>>> Sorry, I don't get your point. This patch fixes bug that can stop
>>> vfs to receive any traffic making them useless. The first and most
>>> visible effect of the bug is a lot of "Error I40E_AQ_RC_ENOSPC
>>> adding RX filters on VF XX,..." errors from F/W complaining to add
>>> MAC/VLAN filter. So I've mentioned it in the title to be easy found.
>>> I don't add any filter in the driver, we have to add one more
>>> intermediate state of the filter to avoid the race condition.
>>
>> In my opinion, having the log in the body is good enough for search
>> engines and the summary should be optimized for `git log --oneline`
>> consumption. I am sorry about the confusion with my example. Maybe:
>>
>> Add intermediate sync state to fix race condition
>
> I just wonder why do you insist on "ADD" which usually means
> implementing a new feature. When this patch FIXes the bug?

Some projects use this interpretation/structure, but to my knowledge not 
the Linux kernel.

> If I'd want to add a new feature I'd rather send my patch to net-next,
> isn't it?

*Add*, how I used it, does not imply a new feature addition.

> For me 'fix race condition by adding filter's intermediate sync
> state'
> Can you explain your strong argument for the Add?
I am fine with your suggestion, and do not oppose to start with *Fix*. 
Also, thank you for your elaborate answer, so I could understand the 
point of view you came from.
>>>> Am 15.10.24 um 09:04 schrieb Aleksandr Loktionov:
>>>>> Fix a race condition in the i40e driver that leads to MAC/VLAN filters
>>>>> becoming corrupted and leaking. Address the issue that occurs under
>>>>> heavy load when multiple threads are concurrently modifying MAC/VLAN
>>>>> filters by setting mac and port VLAN.
>>>>>
>>>>> 1. Thread T0 allocates a filter in i40e_add_filter() within
>>>>>            i40e_ndo_set_vf_port_vlan().
>>>>> 2. Thread T1 concurrently frees the filter in __i40e_del_filter() within
>>>>>            i40e_ndo_set_vf_mac().
>>>>> 3. Subsequently, i40e_service_task() calls i40e_sync_vsi_filters(), which
>>>>>            refers to the already freed filter memory, causing corruption.
>>>>>
>>>>> Reproduction steps:
>>>>> 1. Spawn multiple VFs.
>>>>> 2. Apply a concurrent heavy load by running parallel operations to change
>>>>>            MAC addresses on the VFs and change port VLANs on the host.
>>>>
>>>> It’d be great if you shared your commands.
>>
>>> Sorry, I'm pretty sure it's quite impossible to reproduce the issue
>>> with bash scripts /*I've tried really hard*/. Reproduction is
>>> related to user-space/kernel code which might be not open-sourced.
>>> So as I've explained in the commit title the race condition
>>> possibility that was introduced from the very beginning.
>>
>> Could you please ask to get clearance to publish it. My naive view
>> wonders, why legal(?) should oppose publication.
> 
> Simply the defect can come from development tools or from external
> customer. And being tract as a commercial secret, which even doesn't
> belong to Intel sometimes.
I know, that these are general problems. But in this specific case you 
should know the circumstances, and be able to document it in the commit 
message, why a test case cannot be published.


Kind regards,

Paul
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 2089a0e..a1842dd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -755,6 +755,8 @@  enum i40e_filter_state {
 	I40E_FILTER_ACTIVE,		/* Added to switch by FW */
 	I40E_FILTER_FAILED,		/* Rejected by FW */
 	I40E_FILTER_REMOVE,		/* To be removed */
+	/* RESERVED */
+	I40E_FILTER_NEW_SYNC = 6,	/* New, not sent, in sync task */
 /* There is no 'removed' state; the filter struct is freed */
 };
 struct i40e_mac_filter {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index abf624d..1c439b1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -89,6 +89,8 @@  static char *i40e_filter_state_string[] = {
 	"ACTIVE",
 	"FAILED",
 	"REMOVE",
+	"<RESERVED>",
+	"NEW_SYNC",
 };
 
 /**
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 25295ae..55fb362 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -1255,6 +1255,7 @@  int i40e_count_filters(struct i40e_vsi *vsi)
 
 	hash_for_each_safe(vsi->mac_filter_hash, bkt, h, f, hlist) {
 		if (f->state == I40E_FILTER_NEW ||
+		    f->state == I40E_FILTER_NEW_SYNC ||
 		    f->state == I40E_FILTER_ACTIVE)
 			++cnt;
 	}
@@ -1441,6 +1442,8 @@  static int i40e_correct_mac_vlan_filters(struct i40e_vsi *vsi,
 
 			new->f = add_head;
 			new->state = add_head->state;
+			if (add_head->state == I40E_FILTER_NEW)
+				add_head->state = I40E_FILTER_NEW_SYNC;
 
 			/* Add the new filter to the tmp list */
 			hlist_add_head(&new->hlist, tmp_add_list);
@@ -1550,6 +1553,8 @@  static int i40e_correct_vf_mac_vlan_filters(struct i40e_vsi *vsi,
 				return -ENOMEM;
 			new_mac->f = add_head;
 			new_mac->state = add_head->state;
+			if (add_head->state == I40E_FILTER_NEW)
+				add_head->state = I40E_FILTER_NEW_SYNC;
 
 			/* Add the new filter to the tmp list */
 			hlist_add_head(&new_mac->hlist, tmp_add_list);
@@ -2437,7 +2442,8 @@  static int
 i40e_aqc_broadcast_filter(struct i40e_vsi *vsi, const char *vsi_name,
 			  struct i40e_mac_filter *f)
 {
-	bool enable = f->state == I40E_FILTER_NEW;
+	bool enable = f->state == I40E_FILTER_NEW ||
+		      f->state == I40E_FILTER_NEW_SYNC;
 	struct i40e_hw *hw = &vsi->back->hw;
 	int aq_ret;
 
@@ -2611,6 +2617,7 @@  int i40e_sync_vsi_filters(struct i40e_vsi *vsi)
 
 				/* Add it to the hash list */
 				hlist_add_head(&new->hlist, &tmp_add_list);
+				f->state = I40E_FILTER_NEW_SYNC;
 			}
 
 			/* Count the number of active (current and new) VLAN
@@ -2762,7 +2769,8 @@  int i40e_sync_vsi_filters(struct i40e_vsi *vsi)
 		spin_lock_bh(&vsi->mac_filter_hash_lock);
 		hlist_for_each_entry_safe(new, h, &tmp_add_list, hlist) {
 			/* Only update the state if we're still NEW */
-			if (new->f->state == I40E_FILTER_NEW)
+			if (new->f->state == I40E_FILTER_NEW ||
+			    new->f->state == I40E_FILTER_NEW_SYNC)
 				new->f->state = new->state;
 			hlist_del(&new->hlist);
 			netdev_hw_addr_refcnt(new->f, vsi->netdev, -1);