
[net] netfilter: flowtable: Add pending bit for offload work

Message ID 1588764279-12166-1-git-send-email-paulb@mellanox.com
State Awaiting Upstream
Delegated to: David Miller

Commit Message

Paul Blakey May 6, 2020, 11:24 a.m. UTC
The gc step can queue offloaded flow del work or stats work.
Those work items can race each other and a flow could be freed
before the stats work executes and queries it.
To avoid that, add a pending bit: if a work item already exists
for a flow, don't queue another one for it.
This also avoids queueing multiple stats works in case a stats
work didn't complete before the gc step ran again.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
---
 include/net/netfilter/nf_flow_table.h | 1 +
 net/netfilter/nf_flow_table_offload.c | 8 +++++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

Comments

Pablo Neira Ayuso May 10, 2020, 10:14 p.m. UTC | #1
Hi,

On Wed, May 06, 2020 at 02:24:39PM +0300, Paul Blakey wrote:
> The gc step can queue offloaded flow del work or stats work.
> Those work items can race each other and a flow could be freed
> before the stats work executes and queries it.
> To avoid that, add a pending bit: if a work item already exists
> for a flow, don't queue another one for it.
> This also avoids queueing multiple stats works in case a stats
> work didn't complete before the gc step ran again.

This is happening since the mutex has been removed, right?

Another question below.

> Signed-off-by: Paul Blakey <paulb@mellanox.com>
> Reviewed-by: Roi Dayan <roid@mellanox.com>
> ---
>  include/net/netfilter/nf_flow_table.h | 1 +
>  net/netfilter/nf_flow_table_offload.c | 8 +++++++-
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
> index 6bf6965..c54a7f7 100644
> --- a/include/net/netfilter/nf_flow_table.h
> +++ b/include/net/netfilter/nf_flow_table.h
> @@ -127,6 +127,7 @@ enum nf_flow_flags {
>  	NF_FLOW_HW_DYING,
>  	NF_FLOW_HW_DEAD,
>  	NF_FLOW_HW_REFRESH,
> +	NF_FLOW_HW_PENDING,
>  };
>  
>  enum flow_offload_type {
> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
> index b9d5ecc..731d738 100644
> --- a/net/netfilter/nf_flow_table_offload.c
> +++ b/net/netfilter/nf_flow_table_offload.c
> @@ -817,6 +817,7 @@ static void flow_offload_work_handler(struct work_struct *work)
>  			WARN_ON_ONCE(1);
>  	}
>  
> +	clear_bit(NF_FLOW_HW_PENDING, &offload->flow->flags);
>  	kfree(offload);
>  }
>  
> @@ -831,9 +832,14 @@ static void flow_offload_queue_work(struct flow_offload_work *offload)
>  {
>  	struct flow_offload_work *offload;
>  
> +	if (test_and_set_bit(NF_FLOW_HW_PENDING, &flow->flags))
> +		return NULL;

In case of stats, it's fine to lose work.

But how does this work for the deletion case? Does this fall back to
the timeout deletion?

Thanks.
Paul Blakey May 11, 2020, 8:32 a.m. UTC | #2
On 5/11/2020 1:14 AM, Pablo Neira Ayuso wrote:
> Hi,
>
> On Wed, May 06, 2020 at 02:24:39PM +0300, Paul Blakey wrote:
>> The gc step can queue offloaded flow del work or stats work.
>> Those work items can race each other and a flow could be freed
>> before the stats work executes and queries it.
>> To avoid that, add a pending bit: if a work item already exists
>> for a flow, don't queue another one for it.
>> This also avoids queueing multiple stats works in case a stats
>> work didn't complete before the gc step ran again.
> This is happening since the mutex has been removed, right?
>
> Another question below.
It's from using the new workqueue with one work object per event, which
allows a flow's work items to run in parallel.


>
>> Signed-off-by: Paul Blakey <paulb@mellanox.com>
>> Reviewed-by: Roi Dayan <roid@mellanox.com>
>> ---
>>  include/net/netfilter/nf_flow_table.h | 1 +
>>  net/netfilter/nf_flow_table_offload.c | 8 +++++++-
>>  2 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
>> index 6bf6965..c54a7f7 100644
>> --- a/include/net/netfilter/nf_flow_table.h
>> +++ b/include/net/netfilter/nf_flow_table.h
>> @@ -127,6 +127,7 @@ enum nf_flow_flags {
>>  	NF_FLOW_HW_DYING,
>>  	NF_FLOW_HW_DEAD,
>>  	NF_FLOW_HW_REFRESH,
>> +	NF_FLOW_HW_PENDING,
>>  };
>>  
>>  enum flow_offload_type {
>> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
>> index b9d5ecc..731d738 100644
>> --- a/net/netfilter/nf_flow_table_offload.c
>> +++ b/net/netfilter/nf_flow_table_offload.c
>> @@ -817,6 +817,7 @@ static void flow_offload_work_handler(struct work_struct *work)
>>  			WARN_ON_ONCE(1);
>>  	}
>>  
>> +	clear_bit(NF_FLOW_HW_PENDING, &offload->flow->flags);
>>  	kfree(offload);
>>  }
>>  
>> @@ -831,9 +832,14 @@ static void flow_offload_queue_work(struct flow_offload_work *offload)
>>  {
>>  	struct flow_offload_work *offload;
>>  
>> +	if (test_and_set_bit(NF_FLOW_HW_PENDING, &flow->flags))
>> +		return NULL;
> In case of stats, it's fine to lose work.
>
> But how does this work for the deletion case? Does this fall back to
> the timeout deletion?

We get to nf_flow_table_offload_del (delete) in these cases:

	if (nf_flow_has_expired(flow) || nf_ct_is_dying(flow->ct) ||
	    test_bit(NF_FLOW_TEARDOWN, &flow->flags)) {
		...
		nf_flow_offload_del(flow_table, flow);

All of these conditions are persistent once set, except nf_flow_has_expired(flow).
So we will retry the delete again and again until the pending flag is cleared,
or the flow is 'saved' by the already queued stats work updating the timeout.
A pending stats update can't save the flow once it's marked for teardown or
flow->ct is dying; it can only delay it.

We didn't mention flush, like in table free. I guess we need to flush the
hardware workqueue of any pending stats work, then queue the deletion, and
flush again:

Adding nf_flow_table_offload_flush(flow_table) after
cancel_delayed_work_sync(&flow_table->gc_work);



>
> Thanks.
Pablo Neira Ayuso May 11, 2020, 11:59 a.m. UTC | #3
On Mon, May 11, 2020 at 11:32:36AM +0300, Paul Blakey wrote:
> On 5/11/2020 1:14 AM, Pablo Neira Ayuso wrote:
[...]
> >> @@ -831,9 +832,14 @@ static void flow_offload_queue_work(struct flow_offload_work *offload)
> >>  {
> >>  	struct flow_offload_work *offload;
> >>  
> >> +	if (test_and_set_bit(NF_FLOW_HW_PENDING, &flow->flags))
> >> +		return NULL;
> > In case of stats, it's fine to lose work.
> >
> > But how does this work for the deletion case? Does this fall back to
> > the timeout deletion?
> 
> We get to nf_flow_table_offload_del (delete) in these cases:
> 
> 	if (nf_flow_has_expired(flow) || nf_ct_is_dying(flow->ct) ||
> 	    test_bit(NF_FLOW_TEARDOWN, &flow->flags)) {
> 		...
> 		nf_flow_offload_del(flow_table, flow);
> 
> All of these conditions are persistent once set, except nf_flow_has_expired(flow).
> So we will retry the delete again and again until the pending flag is cleared,
> or the flow is 'saved' by the already queued stats work updating the timeout.
> A pending stats update can't save the flow once it's marked for teardown or
> flow->ct is dying; it can only delay it.

Thanks for explaining.

> We didn't mention flush, like in table free. I guess we need to flush the
> hardware workqueue of any pending stats work, then queue the deletion, and
> flush again: adding nf_flow_table_offload_flush(flow_table) after
> cancel_delayed_work_sync(&flow_table->gc_work);

The "flush" makes sure that stats work runs before the deletion, to
ensure no races happen for in-transit work objects, right?

We might use alloc_ordered_workqueue() and let the workqueue handle
this problem?
Roi Dayan May 11, 2020, 1:57 p.m. UTC | #4
On 2020-05-11 2:59 PM, Pablo Neira Ayuso wrote:
> On Mon, May 11, 2020 at 11:32:36AM +0300, Paul Blakey wrote:
>> On 5/11/2020 1:14 AM, Pablo Neira Ayuso wrote:
> [...]
>>>> @@ -831,9 +832,14 @@ static void flow_offload_queue_work(struct flow_offload_work *offload)
>>>>  {
>>>>  	struct flow_offload_work *offload;
>>>>  
>>>> +	if (test_and_set_bit(NF_FLOW_HW_PENDING, &flow->flags))
>>>> +		return NULL;
>>> In case of stats, it's fine to lose work.
>>>
>>> But how does this work for the deletion case? Does this fall back to
>>> the timeout deletion?
>>
>> We get to nf_flow_table_offload_del (delete) in these cases:
>>
>> 	if (nf_flow_has_expired(flow) || nf_ct_is_dying(flow->ct) ||
>> 	    test_bit(NF_FLOW_TEARDOWN, &flow->flags)) {
>> 		...
>> 		nf_flow_offload_del(flow_table, flow);
>>
>> All of these conditions are persistent once set, except nf_flow_has_expired(flow).
>> So we will retry the delete again and again until the pending flag is cleared,
>> or the flow is 'saved' by the already queued stats work updating the timeout.
>> A pending stats update can't save the flow once it's marked for teardown or
>> flow->ct is dying; it can only delay it.
> 
> Thanks for explaining.
> 
>> We didn't mention flush, like in table free. I guess we need to flush the
>> hardware workqueue of any pending stats work, then queue the deletion, and
>> flush again: adding nf_flow_table_offload_flush(flow_table) after
>> cancel_delayed_work_sync(&flow_table->gc_work);
> 
> The "flush" makes sure that stats work runs before the deletion, to
> ensure no races happen for in-transit work objects, right?
> 
> We might use alloc_ordered_workqueue() and let the workqueue handle
> this problem?
> 

An ordered workqueue executes one work at a time.
Pablo Neira Ayuso May 11, 2020, 2:37 p.m. UTC | #6
On Wed, May 06, 2020 at 02:24:39PM +0300, Paul Blakey wrote:
> The gc step can queue offloaded flow del work or stats work.
> Those work items can race each other and a flow could be freed
> before the stats work executes and queries it.
> To avoid that, add a pending bit: if a work item already exists
> for a flow, don't queue another one for it.
> This also avoids queueing multiple stats works in case a stats
> work didn't complete before the gc step ran again.

Applied to nf, thanks.

Patch

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 6bf6965..c54a7f7 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -127,6 +127,7 @@  enum nf_flow_flags {
 	NF_FLOW_HW_DYING,
 	NF_FLOW_HW_DEAD,
 	NF_FLOW_HW_REFRESH,
+	NF_FLOW_HW_PENDING,
 };
 
 enum flow_offload_type {
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index b9d5ecc..731d738 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -817,6 +817,7 @@  static void flow_offload_work_handler(struct work_struct *work)
 			WARN_ON_ONCE(1);
 	}
 
+	clear_bit(NF_FLOW_HW_PENDING, &offload->flow->flags);
 	kfree(offload);
 }
 
@@ -831,9 +832,14 @@  static void flow_offload_queue_work(struct flow_offload_work *offload)
 {
 	struct flow_offload_work *offload;
 
+	if (test_and_set_bit(NF_FLOW_HW_PENDING, &flow->flags))
+		return NULL;
+
 	offload = kmalloc(sizeof(struct flow_offload_work), GFP_ATOMIC);
-	if (!offload)
+	if (!offload) {
+		clear_bit(NF_FLOW_HW_PENDING, &flow->flags);
 		return NULL;
+	}
 
 	offload->cmd = cmd;
 	offload->flow = flow;