Message ID | 1588764449-12706-1-git-send-email-paulb@mellanox.com |
---|---|
State | Awaiting Upstream |
Delegated to: | David Miller |
Series | [net] netfilter: flowtable: Fix expired flow not being deleted from software |
On Wed, May 06, 2020 at 02:27:29PM +0300, Paul Blakey wrote:
> Once a flow is considered expired, it is marked as DYING, and
> scheduled a delete from hardware. The flow will be deleted from
> software, in the next gc_step after hardware deletes the flow
> (and flow is marked DEAD). Till that happens, the flow's timeout
> might be updated from a previous scheduled stats, or software packets
> (refresh). This will cause the gc_step to no longer consider the flow
> expired, and it will not be deleted from software.
>
> Fix that by looking at the DYING flag as in deciding
> a flow should be deleted from software.

Would this work for you?

The idea is to skip the refresh if this has already expired.

Thanks.

On 5/11/2020 1:26 AM, Pablo Neira Ayuso wrote:
> On Wed, May 06, 2020 at 02:27:29PM +0300, Paul Blakey wrote:
>> Once a flow is considered expired, it is marked as DYING, and
>> scheduled a delete from hardware. The flow will be deleted from
>> software, in the next gc_step after hardware deletes the flow
>> (and flow is marked DEAD). Till that happens, the flow's timeout
>> might be updated from a previous scheduled stats, or software packets
>> (refresh). This will cause the gc_step to no longer consider the flow
>> expired, and it will not be deleted from software.
>>
>> Fix that by looking at the DYING flag as in deciding
>> a flow should be deleted from software.
> Would this work for you?
>
> The idea is to skip the refresh if this has already expired.
>
> Thanks.

The idea is ok, but timeout check + update isn't atomic (need atomic_inc_unlesss
or something like that), and there is also
the hardware stats which if comes too late (after gc finds it expired) might
bring a flow back to life.
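
The check-then-update race raised above can be closed with a compare-and-swap loop. What follows is a minimal user-space sketch in C11, not kernel code: `refresh_unless_expired`, the `int64_t` tick type and the parameter names are assumptions made for illustration, and kernel code would use the kernel's own atomic primitives rather than `<stdatomic.h>`. It covers only the check and update race, not the late hardware stats case.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Extend *timeout to now + extend unless the flow has already expired
 * (timeout - now <= 0). The compare-and-swap retries if another thread
 * changed the timeout between the load and the store, so the expiry
 * check and the update behave as one atomic step.
 * Hypothetical helper, not a kernel API.
 */
static bool refresh_unless_expired(_Atomic int64_t *timeout,
                                   int64_t now, int64_t extend)
{
    int64_t old = atomic_load(timeout);

    do {
        if (old - now <= 0)
            return false;   /* expired: leave the timeout untouched */
    } while (!atomic_compare_exchange_weak(timeout, &old, now + extend));

    return true;
}

/* Usage: a live flow is extended, an expired one is not. */
static void refresh_demo(void)
{
    _Atomic int64_t timeout = 100;

    assert(refresh_unless_expired(&timeout, 50, 30));   /* 50 < 100: live */
    assert(atomic_load(&timeout) == 80);
    assert(!refresh_unless_expired(&timeout, 80, 30));  /* 80 >= 80: expired */
    assert(atomic_load(&timeout) == 80);
}
```

On a failed exchange, `atomic_compare_exchange_weak` reloads `old` with the current value, so the expiry test is repeated against fresh data before every retry.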

On Mon, May 11, 2020 at 10:24:44AM +0300, Paul Blakey wrote:
>
> On 5/11/2020 1:26 AM, Pablo Neira Ayuso wrote:
> > On Wed, May 06, 2020 at 02:27:29PM +0300, Paul Blakey wrote:
> >> Once a flow is considered expired, it is marked as DYING, and
> >> scheduled a delete from hardware. The flow will be deleted from
> >> software, in the next gc_step after hardware deletes the flow
> >> (and flow is marked DEAD). Till that happens, the flow's timeout
> >> might be updated from a previous scheduled stats, or software packets
> >> (refresh). This will cause the gc_step to no longer consider the flow
> >> expired, and it will not be deleted from software.
> >>
> >> Fix that by looking at the DYING flag as in deciding
> >> a flow should be deleted from software.
> > Would this work for you?
> >
> > The idea is to skip the refresh if this has already expired.
> >
> > Thanks.
>
> The idea is ok, but timeout check + update isn't atomic (need atomic_inc_unlesss
> or something like that), and there is also
> the hardware stats which if comes too late (after gc finds it expired) might
> bring a flow back to life.

Right. Once the entry has expired, there should not be a way turning
back.

I'm attaching a new sketch, it's basically using the teardown state to
specify that the gc already made the decision to remove this entry.

Thanks.
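
The attached sketch is not part of this page, so the following is only an illustration of the idea as described: the gc records its decision in a sticky teardown flag, and refresh paths refuse to touch a flow once that flag is set. It is a user-space C11 model; `struct model_flow`, `MODEL_FLOW_TEARDOWN` and the three functions are hypothetical names, not the kernel's definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

enum { MODEL_FLOW_TEARDOWN = 1u << 0 };   /* hypothetical flag bit */

struct model_flow {
    atomic_uint flags;
    _Atomic int64_t timeout;
};

/* gc: once expiry is seen, record the decision; it is never cleared. */
static bool model_gc_should_remove(struct model_flow *f, int64_t now)
{
    if (atomic_load(&f->timeout) - now <= 0)
        atomic_fetch_or(&f->flags, MODEL_FLOW_TEARDOWN);
    return atomic_load(&f->flags) & MODEL_FLOW_TEARDOWN;
}

/* Refresh path (software packets or late hardware stats). */
static bool model_refresh(struct model_flow *f, int64_t now, int64_t extend)
{
    if (atomic_load(&f->flags) & MODEL_FLOW_TEARDOWN)
        return false;                     /* gc already decided */
    atomic_store(&f->timeout, now + extend);
    return true;
}

/* Usage: after the gc decision, a refresh can no longer revive the flow. */
static void model_teardown_demo(void)
{
    struct model_flow f;

    atomic_init(&f.flags, 0);
    atomic_init(&f.timeout, 10);
    assert(!model_gc_should_remove(&f, 5));   /* still live */
    assert(model_refresh(&f, 5, 10));         /* timeout becomes 15 */
    assert(model_gc_should_remove(&f, 20));   /* expired, flag set */
    assert(!model_refresh(&f, 20, 10));       /* refresh refused */
}
```

A refresh that passes the flag test just before the gc sets the flag can still write the timeout in this model, but that write is harmless because the gc decision depends on the flag rather than on the timeout from then on.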

On 5/11/2020 11:42 AM, Pablo Neira Ayuso wrote:
> On Mon, May 11, 2020 at 10:24:44AM +0300, Paul Blakey wrote:
>>
>> On 5/11/2020 1:26 AM, Pablo Neira Ayuso wrote:
>>> On Wed, May 06, 2020 at 02:27:29PM +0300, Paul Blakey wrote:
>>>> Once a flow is considered expired, it is marked as DYING, and
>>>> scheduled a delete from hardware. The flow will be deleted from
>>>> software, in the next gc_step after hardware deletes the flow
>>>> (and flow is marked DEAD). Till that happens, the flow's timeout
>>>> might be updated from a previous scheduled stats, or software packets
>>>> (refresh). This will cause the gc_step to no longer consider the flow
>>>> expired, and it will not be deleted from software.
>>>>
>>>> Fix that by looking at the DYING flag as in deciding
>>>> a flow should be deleted from software.
>>> Would this work for you?
>>>
>>> The idea is to skip the refresh if this has already expired.
>>>
>>> Thanks.
>> The idea is ok, but timeout check + update isn't atomic (need atomic_inc_unlesss
>> or something like that), and there is also
>> the hardware stats which if comes too late (after gc finds it expired) might
>> bring a flow back to life.
> Right. Once the entry has expired, there should not be a way turning
> back.
>
> I'm attaching a new sketch, it's basically using the teardown state to
> specify that the gc already made the decision to remove this entry.
>
> Thanks.

Looks fine to me, are you submitting that instead?

diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index c0cb7949..b0e9f7a 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -362,7 +362,8 @@ static void nf_flow_offload_gc_step(struct flow_offload *flow, void *data)
 	struct nf_flowtable *flow_table = data;
 
 	if (nf_flow_has_expired(flow) || nf_ct_is_dying(flow->ct) ||
-	    test_bit(NF_FLOW_TEARDOWN, &flow->flags)) {
+	    test_bit(NF_FLOW_TEARDOWN, &flow->flags) ||
+	    test_bit(NF_FLOW_HW_DYING, &flow->flags)) {
 		if (test_bit(NF_FLOW_HW, &flow->flags)) {
 			if (!test_bit(NF_FLOW_HW_DYING, &flow->flags))
 				nf_flow_offload_del(flow_table, flow);
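
The effect of the extra `NF_FLOW_HW_DYING` test can be replayed in a small user-space model. This is a sketch under stated assumptions, not the kernel function: the `M_*` flags, `struct m_flow`, `m_gc_step` and `m_replay` are made-up names, the conntrack check is left out, hardware deletion is reduced to setting flag bits, and the DEAD branch is modelled from the commit message because the hunk above does not show it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits of the model; names follow the thread, values are arbitrary. */
enum {
    M_TEARDOWN = 1 << 0,
    M_HW       = 1 << 1,
    M_HW_DYING = 1 << 2,
    M_HW_DEAD  = 1 << 3,
    M_FREED    = 1 << 4,   /* model only: flow removed from software */
};

struct m_flow {
    unsigned flags;
    int64_t timeout;
};

/* check_dying selects the patched condition (true) or the old one (false). */
static void m_gc_step(struct m_flow *f, int64_t now, bool check_dying)
{
    bool del = f->timeout - now <= 0 || (f->flags & M_TEARDOWN);

    if (check_dying)
        del = del || (f->flags & M_HW_DYING);
    if (!del)
        return;

    if (f->flags & M_HW) {
        if (!(f->flags & M_HW_DYING))
            f->flags |= M_HW_DYING;   /* stands in for nf_flow_offload_del() */
        else if (f->flags & M_HW_DEAD)
            f->flags |= M_FREED;      /* hardware is done: delete in software */
    } else {
        f->flags |= M_FREED;
    }
}

/* Replay the commit message scenario; returns true if the flow was freed. */
static bool m_replay(bool check_dying)
{
    struct m_flow f = { .flags = M_HW, .timeout = 10 };

    m_gc_step(&f, 10, check_dying);   /* expired: hardware delete scheduled */
    f.timeout = 40;                   /* late refresh extends the timeout */
    f.flags |= M_HW_DEAD;             /* hardware finished deleting */
    m_gc_step(&f, 20, check_dying);   /* next gc pass */
    return f.flags & M_FREED;
}

static void m_demo(void)
{
    assert(!m_replay(false));   /* old check: flow is never freed */
    assert(m_replay(true));     /* patched check: flow is freed */
}
```

With the old condition the second gc pass sees an unexpired timeout and returns early, so the flow stays in software even though hardware already dropped it. With the DYING test the pass reaches the DEAD branch and frees it.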