Message ID: 1588764279-12166-1-git-send-email-paulb@mellanox.com
State: Awaiting Upstream
Delegated to: David Miller
Series: [net] netfilter: flowtable: Add pending bit for offload work
Hi,

On Wed, May 06, 2020 at 02:24:39PM +0300, Paul Blakey wrote:
> Gc step can queue offloaded flow del work or stats work.
> Those work items can race each other and a flow could be freed
> before the stats work is executed and querying it.
> To avoid that, add a pending bit that if a work exists for a flow
> don't queue another work for it.
> This will also avoid adding multiple stats works in case stats work
> didn't complete but gc step started again.

This is happening since the mutex has been removed, right?

Another question below.

> Signed-off-by: Paul Blakey <paulb@mellanox.com>
> Reviewed-by: Roi Dayan <roid@mellanox.com>
> ---
>  include/net/netfilter/nf_flow_table.h | 1 +
>  net/netfilter/nf_flow_table_offload.c | 8 +++++++-
>  2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
> index 6bf6965..c54a7f7 100644
> --- a/include/net/netfilter/nf_flow_table.h
> +++ b/include/net/netfilter/nf_flow_table.h
> @@ -127,6 +127,7 @@ enum nf_flow_flags {
>  	NF_FLOW_HW_DYING,
>  	NF_FLOW_HW_DEAD,
>  	NF_FLOW_HW_REFRESH,
> +	NF_FLOW_HW_PENDING,
>  };
>
>  enum flow_offload_type {
> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
> index b9d5ecc..731d738 100644
> --- a/net/netfilter/nf_flow_table_offload.c
> +++ b/net/netfilter/nf_flow_table_offload.c
> @@ -817,6 +817,7 @@ static void flow_offload_work_handler(struct work_struct *work)
>  		WARN_ON_ONCE(1);
>  	}
>
> +	clear_bit(NF_FLOW_HW_PENDING, &offload->flow->flags);
>  	kfree(offload);
>  }
>
> @@ -831,9 +832,14 @@ static void flow_offload_queue_work(struct flow_offload_work *offload)
>  {
>  	struct flow_offload_work *offload;
>
> +	if (test_and_set_bit(NF_FLOW_HW_PENDING, &flow->flags))
> +		return NULL;

In case of stats, it's fine to lose work.

But how does this work for the deletion case? Does this fall back to
the timeout deletion?

Thanks.
On 5/11/2020 1:14 AM, Pablo Neira Ayuso wrote:
> Hi,
>
> On Wed, May 06, 2020 at 02:24:39PM +0300, Paul Blakey wrote:
>> Gc step can queue offloaded flow del work or stats work.
>> Those work items can race each other and a flow could be freed
>> before the stats work is executed and querying it.
>> To avoid that, add a pending bit that if a work exists for a flow
>> don't queue another work for it.
>> This will also avoid adding multiple stats works in case stats work
>> didn't complete but gc step started again.
> This is happening since the mutex has been removed, right?
>
> Another question below.

It's from using a new workqueue with one work object per item, allowing
parallelization between a flow's work items.

>> Signed-off-by: Paul Blakey <paulb@mellanox.com>
>> Reviewed-by: Roi Dayan <roid@mellanox.com>
>> ---
>>  include/net/netfilter/nf_flow_table.h | 1 +
>>  net/netfilter/nf_flow_table_offload.c | 8 +++++++-
>>  2 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
>> index 6bf6965..c54a7f7 100644
>> --- a/include/net/netfilter/nf_flow_table.h
>> +++ b/include/net/netfilter/nf_flow_table.h
>> @@ -127,6 +127,7 @@ enum nf_flow_flags {
>>  	NF_FLOW_HW_DYING,
>>  	NF_FLOW_HW_DEAD,
>>  	NF_FLOW_HW_REFRESH,
>> +	NF_FLOW_HW_PENDING,
>>  };
>>
>>  enum flow_offload_type {
>> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
>> index b9d5ecc..731d738 100644
>> --- a/net/netfilter/nf_flow_table_offload.c
>> +++ b/net/netfilter/nf_flow_table_offload.c
>> @@ -817,6 +817,7 @@ static void flow_offload_work_handler(struct work_struct *work)
>>  		WARN_ON_ONCE(1);
>>  	}
>>
>> +	clear_bit(NF_FLOW_HW_PENDING, &offload->flow->flags);
>>  	kfree(offload);
>>  }
>>
>> @@ -831,9 +832,14 @@ static void flow_offload_queue_work(struct flow_offload_work *offload)
>>  {
>>  	struct flow_offload_work *offload;
>>
>> +	if (test_and_set_bit(NF_FLOW_HW_PENDING, &flow->flags))
>> +		return NULL;
> In case of stats, it's fine to lose work.
>
> But how does this work for the deletion case? Does this fall back to
> the timeout deletion?

We get to nf_flow_table_offload_del (delete) in these cases:

	if (nf_flow_has_expired(flow) || nf_ct_is_dying(flow->ct) ||
	    test_bit(NF_FLOW_TEARDOWN, &flow->flags)) {
		....
		nf_flow_offload_del(flow_table, flow);

These conditions are all persistent once set, except for
nf_flow_has_expired(flow). So we will try the delete again and again
until the pending flag is unset, or until the flow is 'saved' by the
already queued stats work updating the timeout. A pending stats update
can't save the flow once it's marked for teardown (or flow->ct is
dying), only delay it.

We didn't mention flush, like in table free. I guess we need to flush
the hardware workqueue of any pending stats work, then queue the
deletion, and flush again: adding nf_flow_table_offload_flush(flow_table)
after cancel_delayed_work_sync(&flow_table->gc_work);

> Thanks.
On Mon, May 11, 2020 at 11:32:36AM +0300, Paul Blakey wrote:
> On 5/11/2020 1:14 AM, Pablo Neira Ayuso wrote:
[...]
> >> @@ -831,9 +832,14 @@ static void flow_offload_queue_work(struct flow_offload_work *offload)
> >>  {
> >>  	struct flow_offload_work *offload;
> >>
> >> +	if (test_and_set_bit(NF_FLOW_HW_PENDING, &flow->flags))
> >> +		return NULL;
> > In case of stats, it's fine to lose work.
> >
> > But how does this work for the deletion case? Does this fall back to
> > the timeout deletion?
>
> We get to nf_flow_table_offload_del (delete) in these cases:
>
> 	if (nf_flow_has_expired(flow) || nf_ct_is_dying(flow->ct) ||
> 	    test_bit(NF_FLOW_TEARDOWN, &flow->flags)) {
> 		....
> 		nf_flow_offload_del(flow_table, flow);
>
> Which are all persistent once set, except for nf_flow_has_expired(flow).
> So we will try the delete again and again until the pending flag is
> unset, or the flow is 'saved' by the already queued stats work updating
> the timeout. A pending stats update can't save the flow once it's
> marked for teardown (or flow->ct is dying), only delay it.

Thanks for explaining.

> We didn't mention flush, like in table free. I guess we need to flush
> the hardware workqueue of any pending stats work, then queue the
> deletion, and flush again: adding nf_flow_table_offload_flush(flow_table)
> after cancel_delayed_work_sync(&flow_table->gc_work);

The "flush" makes sure that stats work runs before the deletion, to
ensure no races happen for in-transit work objects, right?

We might use alloc_ordered_workqueue() and let the workqueue handle
this problem?
On 2020-05-11 2:59 PM, Pablo Neira Ayuso wrote:
> On Mon, May 11, 2020 at 11:32:36AM +0300, Paul Blakey wrote:
>> On 5/11/2020 1:14 AM, Pablo Neira Ayuso wrote:
> [...]
>>>> @@ -831,9 +832,14 @@ static void flow_offload_queue_work(struct flow_offload_work *offload)
>>>>  {
>>>>  	struct flow_offload_work *offload;
>>>>
>>>> +	if (test_and_set_bit(NF_FLOW_HW_PENDING, &flow->flags))
>>>> +		return NULL;
>>> In case of stats, it's fine to lose work.
>>>
>>> But how does this work for the deletion case? Does this fall back to
>>> the timeout deletion?
>>
>> We get to nf_flow_table_offload_del (delete) in these cases:
>>
>> 	if (nf_flow_has_expired(flow) || nf_ct_is_dying(flow->ct) ||
>> 	    test_bit(NF_FLOW_TEARDOWN, &flow->flags)) {
>> 		....
>> 		nf_flow_offload_del(flow_table, flow);
>>
>> Which are all persistent once set, except for nf_flow_has_expired(flow).
>> So we will try the delete again and again until the pending flag is
>> unset, or the flow is 'saved' by the already queued stats work updating
>> the timeout. A pending stats update can't save the flow once it's
>> marked for teardown (or flow->ct is dying), only delay it.
>
> Thanks for explaining.
>
>> We didn't mention flush, like in table free. I guess we need to flush
>> the hardware workqueue of any pending stats work, then queue the
>> deletion, and flush again: adding nf_flow_table_offload_flush(flow_table)
>> after cancel_delayed_work_sync(&flow_table->gc_work);
>
> The "flush" makes sure that stats work runs before the deletion, to
> ensure no races happen for in-transit work objects, right?
>
> We might use alloc_ordered_workqueue() and let the workqueue handle
> this problem?

An ordered workqueue executes one work item at a time, so it would lose
the parallelization between flows' work items.
On Wed, May 06, 2020 at 02:24:39PM +0300, Paul Blakey wrote:
> Gc step can queue offloaded flow del work or stats work.
> Those work items can race each other and a flow could be freed
> before the stats work is executed and querying it.
> To avoid that, add a pending bit that if a work exists for a flow
> don't queue another work for it.
> This will also avoid adding multiple stats works in case stats work
> didn't complete but gc step started again.

Applied to nf, thanks.
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 6bf6965..c54a7f7 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -127,6 +127,7 @@ enum nf_flow_flags {
 	NF_FLOW_HW_DYING,
 	NF_FLOW_HW_DEAD,
 	NF_FLOW_HW_REFRESH,
+	NF_FLOW_HW_PENDING,
 };
 
 enum flow_offload_type {
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index b9d5ecc..731d738 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -817,6 +817,7 @@ static void flow_offload_work_handler(struct work_struct *work)
 		WARN_ON_ONCE(1);
 	}
 
+	clear_bit(NF_FLOW_HW_PENDING, &offload->flow->flags);
 	kfree(offload);
 }
 
@@ -831,9 +832,14 @@ static void flow_offload_queue_work(struct flow_offload_work *offload)
 {
 	struct flow_offload_work *offload;
 
+	if (test_and_set_bit(NF_FLOW_HW_PENDING, &flow->flags))
+		return NULL;
+
 	offload = kmalloc(sizeof(struct flow_offload_work), GFP_ATOMIC);
-	if (!offload)
+	if (!offload) {
+		clear_bit(NF_FLOW_HW_PENDING, &flow->flags);
 		return NULL;
+	}
 
 	offload->cmd = cmd;
 	offload->flow = flow;