mbox series

[SRU,F:linux-bluefield,0/9] CT offload fixes

Message ID 20210406175209.1714809-1-roid@nvidia.com
Headers show
Series CT offload fixes | expand

Message

Roi Dayan April 6, 2021, 5:52 p.m. UTC
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922682
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922678
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922672


SRU Justification:

1. The first 2 patches are fixing a race, potentially crashing the kernel.
2. The next 2 patches are fixing a possible memory hog and aging active
   ct conns.
3. The last patches are adding offload support for ct_state invalid and
   ct_state reply.

* Explain the bug(s)

1. The kernel crash can happen on stress tcp traffic opening and closing
   the conns fast.

2. The memory hog and aging active ct conns can happen from any stress test
   as we have a single workqueue for handling the ct offload conns
   for add/del/stats.

* brief explanation of fixes

The fix for #1 is setting the offload timeout early and not relying on gc.

The fix for #2 is splitting add/del/stats for diff workqueue and also
we set a limit for add work entries.

* How to test

Testing #1 was done with stress http traffic opening conns, short data, close conns.
different 5-tuple each time.

Testing #2 was done with just stress traffic with lots of conns different 5-tuple.

* What it could break.

Issue #1 could potentially crash the kernel.

Issue #2 can take a lot of memory for a long time and also causing active conns to
age out when not necessary.


 include/linux/skbuff.h                |  5 ++-
 include/net/flow_offload.h            |  1 +
 include/net/netfilter/nf_conntrack.h  | 12 ++++++
 include/net/sch_generic.h             |  1 +
 include/uapi/linux/pkt_cls.h          |  2 +
 net/core/dev.c                        |  2 +
 net/core/flow_dissector.c             | 13 +++++--
 net/netfilter/nf_conntrack_core.c     | 12 ------
 net/netfilter/nf_flow_table_core.c    |  2 +
 net/netfilter/nf_flow_table_offload.c | 56 +++++++++++++++++++++++----
 net/openvswitch/conntrack.c           |  8 ++--
 net/openvswitch/conntrack.h           |  6 ++-
 net/openvswitch/flow.c                |  4 +-
 net/sched/act_ct.c                    |  6 ++-
 net/sched/cls_api.c                   |  1 +
 net/sched/cls_flower.c                | 10 +++--
 16 files changed, 105 insertions(+), 36 deletions(-)

Comments

Stefan Bader April 9, 2021, 1:05 p.m. UTC | #1
On 06.04.21 19:52, Roi Dayan wrote:
> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922682
> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922678
> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922672
> 
> 
Hi Roi,

since the 3 issues appear to be independent of each other, this should be 3 
individual submissions to the mailing list. For multiple reasons: They can then 
be treated as units of their own. That makes each piece smaller (to look at) and 
also allows to potentially handle in parallel. But also prevents one part that 
maybe rises questions blocking the other parts.

The SRU justification has to be in each bug report (best in the description 
because that place is prominently at the top and also can be adjusted / 
corrected at any time). The content of each section is best not too technical. 
It is basically the place where program managers, developers, and also testers 
look at. The "regression potential" part (here "what (it) could break" is 
something that should be written with user / tester experience in mind. So which 
areas or activities might be affected. The current statements actually look to 
be good that way. I just mention it again since that is something that is 
misunderstood often.

With the SRU justification in the bug report, the cover email does not 
necessarily have to repeat that (but it can). The cover email can be used to 
pass on helpful information for developers looking at the patches. That part is 
usually read only by technical people.

A final note on the BugLink URLs: We prefer the short version of

https://bugs.launchpad.net/bugs/<bugnumber>

because this is the form that will always be valid, even if the project/package 
name changes. Not that this is likely for this kernel but then at least things 
look the same across all projects. I heard that this might be a mail servier 
config issue. So if that is the only thing that is not quite right, we would 
adjust that part when applying the patches. But we would appreciate not to have 
to for sure. :)

So once the bug reports are updated, please send the individual pieces again.
Thanks!

-Stefan
> SRU Justification:
> 
> 1. The first 2 patches are fixing a race, potentially crashing the kernel.
> 2. The next 2 patches are fixing a possible memory hog and aging active
>     ct conns.
> 3. The last patches are adding offload support for ct_state invalid and
>     ct_state reply.
> 
> * Explain the bug(s)
> 
> 1. The kernel crash can happen on stress tcp traffic opening and closing
>     the conns fast.
> 
> 2. The memory hog and aging active ct conns can happen from any stress test
>     as we have a single workqueue for handling the ct offload conns
>     for add/del/stats.
> 
> * brief explanation of fixes
> 
> The fix for #1 is setting the offload timeout early and not relying on gc.
> 
> The fix for #2 is splitting add/del/stats for diff workqueue and also
> we set a limit for add work entries.
> 
> * How to test
> 
> Testing #1 was done with stress http traffic opening conns, short data, close conns.
> different 5-tuple each time.
> 
> Testing #2 was done with just stress traffic with lots of conns different 5-tuple.
> 
> * What it could break.
> 
> Issue #1 could potentially crash the kernel.
> 
> Issue #2 can take a lot of memory for a long time and also causing active conns to
> age out when not necessary.
> 
> 
>   include/linux/skbuff.h                |  5 ++-
>   include/net/flow_offload.h            |  1 +
>   include/net/netfilter/nf_conntrack.h  | 12 ++++++
>   include/net/sch_generic.h             |  1 +
>   include/uapi/linux/pkt_cls.h          |  2 +
>   net/core/dev.c                        |  2 +
>   net/core/flow_dissector.c             | 13 +++++--
>   net/netfilter/nf_conntrack_core.c     | 12 ------
>   net/netfilter/nf_flow_table_core.c    |  2 +
>   net/netfilter/nf_flow_table_offload.c | 56 +++++++++++++++++++++++----
>   net/openvswitch/conntrack.c           |  8 ++--
>   net/openvswitch/conntrack.h           |  6 ++-
>   net/openvswitch/flow.c                |  4 +-
>   net/sched/act_ct.c                    |  6 ++-
>   net/sched/cls_api.c                   |  1 +
>   net/sched/cls_flower.c                | 10 +++--
>   16 files changed, 105 insertions(+), 36 deletions(-)
>
Roi Dayan April 11, 2021, 10:20 a.m. UTC | #2
On 2021-04-09 4:05 PM, Stefan Bader wrote:
> On 06.04.21 19:52, Roi Dayan wrote:
>> BugLink: 
>> https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922682
>> BugLink: 
>> https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922678
>> BugLink: 
>> https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922672
>>
>>
> Hi Roi,
> 
> since the 3 issues appear to be independent of each other, this should 
> be 3 individual submissions to the mailing list. For multiple reasons: 
> They can then be treated as units of their own. That makes each piece 
> smaller (to look at) and also allows to potentially handle in parallel. 
> But also prevents one part that maybe rises questions blocking the other 
> parts.
> 
> The SRU justification has to be in each bug report (best in the 
> description because that place is prominently at the top and also can be 
> adjusted / corrected at any time). The content of each section is best 
> not too technical. It is basically the place where program managers, 
> developers, and also testers look at. The "regression potential" part 
> (here "what (it) could break" is something that should be written with 
> user / tester experience in mind. So which areas or activities might be 
> affected. The current statements actually look to be good that way. I 
> just mention it again since that is something that is misunderstood often.
> 
> With the SRU justification in the bug report, the cover email does not 
> necessarily have to repeat that (but it can). The cover email can be 
> used to pass on helpful information for developers looking at the 
> patches. That part is usually read only by technical people.
> 
> A final note on the BugLink URLs: We prefer the short version of
> 
> https://bugs.launchpad.net/bugs/<bugnumber>
> 
> because this is the form that will always be valid, even if the 
> project/package name changes. Not that this is likely for this kernel 
> but then at least things look the same across all projects. I heard that 
> this might be a mail servier config issue. So if that is the only thing 
> that is not quite right, we would adjust that part when applying the 
> patches. But we would appreciate not to have to for sure. :)
> 
> So once the bug reports are updated, please send the individual pieces 
> again.
> Thanks!
> 
> -Stefan

great. thanks for the comments.  i'll resubmit.
is a cover letter still needed if its a single patch?
could the SRU justification comments be inside the commit msg after the
patch separator? (---) ?


>> SRU Justification:
>>
>> 1. The first 2 patches are fixing a race, potentially crashing the 
>> kernel.
>> 2. The next 2 patches are fixing a possible memory hog and aging active
>>     ct conns.
>> 3. The last patches are adding offload support for ct_state invalid and
>>     ct_state reply.
>>
>> * Explain the bug(s)
>>
>> 1. The kernel crash can happen on stress tcp traffic opening and closing
>>     the conns fast.
>>
>> 2. The memory hog and aging active ct conns can happen from any stress 
>> test
>>     as we have a single workqueue for handling the ct offload conns
>>     for add/del/stats.
>>
>> * brief explanation of fixes
>>
>> The fix for #1 is setting the offload timeout early and not relying on 
>> gc.
>>
>> The fix for #2 is splitting add/del/stats for diff workqueue and also
>> we set a limit for add work entries.
>>
>> * How to test
>>
>> Testing #1 was done with stress http traffic opening conns, short 
>> data, close conns.
>> different 5-tuple each time.
>>
>> Testing #2 was done with just stress traffic with lots of conns 
>> different 5-tuple.
>>
>> * What it could break.
>>
>> Issue #1 could potentially crash the kernel.
>>
>> Issue #2 can take a lot of memory for a long time and also causing 
>> active conns to
>> age out when not necessary.
>>
>>
>>   include/linux/skbuff.h                |  5 ++-
>>   include/net/flow_offload.h            |  1 +
>>   include/net/netfilter/nf_conntrack.h  | 12 ++++++
>>   include/net/sch_generic.h             |  1 +
>>   include/uapi/linux/pkt_cls.h          |  2 +
>>   net/core/dev.c                        |  2 +
>>   net/core/flow_dissector.c             | 13 +++++--
>>   net/netfilter/nf_conntrack_core.c     | 12 ------
>>   net/netfilter/nf_flow_table_core.c    |  2 +
>>   net/netfilter/nf_flow_table_offload.c | 56 +++++++++++++++++++++++----
>>   net/openvswitch/conntrack.c           |  8 ++--
>>   net/openvswitch/conntrack.h           |  6 ++-
>>   net/openvswitch/flow.c                |  4 +-
>>   net/sched/act_ct.c                    |  6 ++-
>>   net/sched/cls_api.c                   |  1 +
>>   net/sched/cls_flower.c                | 10 +++--
>>   16 files changed, 105 insertions(+), 36 deletions(-)
>>
> 
>
Stefan Bader April 12, 2021, 7:28 a.m. UTC | #3
On 11.04.21 12:20, Roi Dayan wrote:
> 
> 
> On 2021-04-09 4:05 PM, Stefan Bader wrote:
>> On 06.04.21 19:52, Roi Dayan wrote:
>>> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922682
>>> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922678
>>> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922672
>>>
>>>
>> Hi Roi,
>>
>> since the 3 issues appear to be independent of each other, this should be 3 
>> individual submissions to the mailing list. For multiple reasons: They can 
>> then be treated as units of their own. That makes each piece smaller (to look 
>> at) and also allows to potentially handle in parallel. But also prevents one 
>> part that maybe rises questions blocking the other parts.
>>
>> The SRU justification has to be in each bug report (best in the description 
>> because that place is prominently at the top and also can be adjusted / 
>> corrected at any time). The content of each section is best not too technical. 
>> It is basically the place where program managers, developers, and also testers 
>> look at. The "regression potential" part (here "what (it) could break" is 
>> something that should be written with user / tester experience in mind. So 
>> which areas or activities might be affected. The current statements actually 
>> look to be good that way. I just mention it again since that is something that 
>> is misunderstood often.
>>
>> With the SRU justification in the bug report, the cover email does not 
>> necessarily have to repeat that (but it can). The cover email can be used to 
>> pass on helpful information for developers looking at the patches. That part 
>> is usually read only by technical people.
>>
>> A final note on the BugLink URLs: We prefer the short version of
>>
>> https://bugs.launchpad.net/bugs/<bugnumber>
>>
>> because this is the form that will always be valid, even if the 
>> project/package name changes. Not that this is likely for this kernel but then 
>> at least things look the same across all projects. I heard that this might be 
>> a mail servier config issue. So if that is the only thing that is not quite 
>> right, we would adjust that part when applying the patches. But we would 
>> appreciate not to have to for sure. :)
>>
>> So once the bug reports are updated, please send the individual pieces again.
>> Thanks!
>>
>> -Stefan
> 
> great. thanks for the comments.  i'll resubmit.
> is a cover letter still needed if its a single patch?
> could the SRU justification comments be inside the commit msg after the
> patch separator? (---) ?

For single patches it would be ok the put additional info between separator and 
the patch as that goes away when applying. Funnily it seems by splitting the 
submission into 3, the patches magically multiplied in numbers...

-Stefan

> 
> 
>>> SRU Justification:
>>>
>>> 1. The first 2 patches are fixing a race, potentially crashing the kernel.
>>> 2. The next 2 patches are fixing a possible memory hog and aging active
>>>     ct conns.
>>> 3. The last patches are adding offload support for ct_state invalid and
>>>     ct_state reply.
>>>
>>> * Explain the bug(s)
>>>
>>> 1. The kernel crash can happen on stress tcp traffic opening and closing
>>>     the conns fast.
>>>
>>> 2. The memory hog and aging active ct conns can happen from any stress test
>>>     as we have a single workqueue for handling the ct offload conns
>>>     for add/del/stats.
>>>
>>> * brief explanation of fixes
>>>
>>> The fix for #1 is setting the offload timeout early and not relying on gc.
>>>
>>> The fix for #2 is splitting add/del/stats for diff workqueue and also
>>> we set a limit for add work entries.
>>>
>>> * How to test
>>>
>>> Testing #1 was done with stress http traffic opening conns, short data, close 
>>> conns.
>>> different 5-tuple each time.
>>>
>>> Testing #2 was done with just stress traffic with lots of conns different 
>>> 5-tuple.
>>>
>>> * What it could break.
>>>
>>> Issue #1 could potentially crash the kernel.
>>>
>>> Issue #2 can take a lot of memory for a long time and also causing active 
>>> conns to
>>> age out when not necessary.
>>>
>>>
>>>   include/linux/skbuff.h                |  5 ++-
>>>   include/net/flow_offload.h            |  1 +
>>>   include/net/netfilter/nf_conntrack.h  | 12 ++++++
>>>   include/net/sch_generic.h             |  1 +
>>>   include/uapi/linux/pkt_cls.h          |  2 +
>>>   net/core/dev.c                        |  2 +
>>>   net/core/flow_dissector.c             | 13 +++++--
>>>   net/netfilter/nf_conntrack_core.c     | 12 ------
>>>   net/netfilter/nf_flow_table_core.c    |  2 +
>>>   net/netfilter/nf_flow_table_offload.c | 56 +++++++++++++++++++++++----
>>>   net/openvswitch/conntrack.c           |  8 ++--
>>>   net/openvswitch/conntrack.h           |  6 ++-
>>>   net/openvswitch/flow.c                |  4 +-
>>>   net/sched/act_ct.c                    |  6 ++-
>>>   net/sched/cls_api.c                   |  1 +
>>>   net/sched/cls_flower.c                | 10 +++--
>>>   16 files changed, 105 insertions(+), 36 deletions(-)
>>>
>>
>>
Roi Dayan April 12, 2021, 8:16 a.m. UTC | #4
On 2021-04-12 10:28 AM, Stefan Bader wrote:
> On 11.04.21 12:20, Roi Dayan wrote:
>>
>>
>> On 2021-04-09 4:05 PM, Stefan Bader wrote:
>>> On 06.04.21 19:52, Roi Dayan wrote:
>>>> BugLink: 
>>>> https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922682
>>>> BugLink: 
>>>> https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922678
>>>> BugLink: 
>>>> https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922672
>>>>
>>>>
>>> Hi Roi,
>>>
>>> since the 3 issues appear to be independent of each other, this 
>>> should be 3 individual submissions to the mailing list. For multiple 
>>> reasons: They can then be treated as units of their own. That makes 
>>> each piece smaller (to look at) and also allows to potentially handle 
>>> in parallel. But also prevents one part that maybe rises questions 
>>> blocking the other parts.
>>>
>>> The SRU justification has to be in each bug report (best in the 
>>> description because that place is prominently at the top and also can 
>>> be adjusted / corrected at any time). The content of each section is 
>>> best not too technical. It is basically the place where program 
>>> managers, developers, and also testers look at. The "regression 
>>> potential" part (here "what (it) could break" is something that 
>>> should be written with user / tester experience in mind. So which 
>>> areas or activities might be affected. The current statements 
>>> actually look to be good that way. I just mention it again since that 
>>> is something that is misunderstood often.
>>>
>>> With the SRU justification in the bug report, the cover email does 
>>> not necessarily have to repeat that (but it can). The cover email can 
>>> be used to pass on helpful information for developers looking at the 
>>> patches. That part is usually read only by technical people.
>>>
>>> A final note on the BugLink URLs: We prefer the short version of
>>>
>>> https://bugs.launchpad.net/bugs/<bugnumber>
>>>
>>> because this is the form that will always be valid, even if the 
>>> project/package name changes. Not that this is likely for this kernel 
>>> but then at least things look the same across all projects. I heard 
>>> that this might be a mail servier config issue. So if that is the 
>>> only thing that is not quite right, we would adjust that part when 
>>> applying the patches. But we would appreciate not to have to for 
>>> sure. :)
>>>
>>> So once the bug reports are updated, please send the individual 
>>> pieces again.
>>> Thanks!
>>>
>>> -Stefan
>>
>> great. thanks for the comments.  i'll resubmit.
>> is a cover letter still needed if its a single patch?
>> could the SRU justification comments be inside the commit msg after the
>> patch separator? (---) ?
> 
> For single patches it would be ok the put additional info between 
> separator and the patch as that goes away when applying. Funnily it 
> seems by splitting the submission into 3, the patches magically 
> multiplied in numbers...
> 
> -Stefan
> 

what do you mean magically multiplied? just 2 more cover letters.
the single series had 9 patches + a cover letter.
now series1 is 2 patches, series2 is 2 patches, series3 is 5 patches.
it's still the original 9 patches.

>>
>>
>>>> SRU Justification:
>>>>
>>>> 1. The first 2 patches are fixing a race, potentially crashing the 
>>>> kernel.
>>>> 2. The next 2 patches are fixing a possible memory hog and aging active
>>>>     ct conns.
>>>> 3. The last patches are adding offload support for ct_state invalid and
>>>>     ct_state reply.
>>>>
>>>> * Explain the bug(s)
>>>>
>>>> 1. The kernel crash can happen on stress tcp traffic opening and 
>>>> closing
>>>>     the conns fast.
>>>>
>>>> 2. The memory hog and aging active ct conns can happen from any 
>>>> stress test
>>>>     as we have a single workqueue for handling the ct offload conns
>>>>     for add/del/stats.
>>>>
>>>> * brief explanation of fixes
>>>>
>>>> The fix for #1 is setting the offload timeout early and not relying 
>>>> on gc.
>>>>
>>>> The fix for #2 is splitting add/del/stats for diff workqueue and also
>>>> we set a limit for add work entries.
>>>>
>>>> * How to test
>>>>
>>>> Testing #1 was done with stress http traffic opening conns, short 
>>>> data, close conns.
>>>> different 5-tuple each time.
>>>>
>>>> Testing #2 was done with just stress traffic with lots of conns 
>>>> different 5-tuple.
>>>>
>>>> * What it could break.
>>>>
>>>> Issue #1 could potentially crash the kernel.
>>>>
>>>> Issue #2 can take a lot of memory for a long time and also causing 
>>>> active conns to
>>>> age out when not necessary.
>>>>
>>>>
>>>>   include/linux/skbuff.h                |  5 ++-
>>>>   include/net/flow_offload.h            |  1 +
>>>>   include/net/netfilter/nf_conntrack.h  | 12 ++++++
>>>>   include/net/sch_generic.h             |  1 +
>>>>   include/uapi/linux/pkt_cls.h          |  2 +
>>>>   net/core/dev.c                        |  2 +
>>>>   net/core/flow_dissector.c             | 13 +++++--
>>>>   net/netfilter/nf_conntrack_core.c     | 12 ------
>>>>   net/netfilter/nf_flow_table_core.c    |  2 +
>>>>   net/netfilter/nf_flow_table_offload.c | 56 
>>>> +++++++++++++++++++++++----
>>>>   net/openvswitch/conntrack.c           |  8 ++--
>>>>   net/openvswitch/conntrack.h           |  6 ++-
>>>>   net/openvswitch/flow.c                |  4 +-
>>>>   net/sched/act_ct.c                    |  6 ++-
>>>>   net/sched/cls_api.c                   |  1 +
>>>>   net/sched/cls_flower.c                | 10 +++--
>>>>   16 files changed, 105 insertions(+), 36 deletions(-)
>>>>
>>>
>>>
> 
>
Stefan Bader April 12, 2021, 9:19 a.m. UTC | #5
On 12.04.21 10:16, Roi Dayan wrote:
> 
> 
> On 2021-04-12 10:28 AM, Stefan Bader wrote:
>> On 11.04.21 12:20, Roi Dayan wrote:
>>>
>>>
>>> On 2021-04-09 4:05 PM, Stefan Bader wrote:
>>>> On 06.04.21 19:52, Roi Dayan wrote:
>>>>> BugLink: 
>>>>> https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922682
>>>>> BugLink: 
>>>>> https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922678
>>>>> BugLink: 
>>>>> https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922672
>>>>>
>>>>>
>>>> Hi Roi,
>>>>
>>>> since the 3 issues appear to be independent of each other, this should be 3 
>>>> individual submissions to the mailing list. For multiple reasons: They can 
>>>> then be treated as units of their own. That makes each piece smaller (to 
>>>> look at) and also allows to potentially handle in parallel. But also 
>>>> prevents one part that maybe rises questions blocking the other parts.
>>>>
>>>> The SRU justification has to be in each bug report (best in the description 
>>>> because that place is prominently at the top and also can be adjusted / 
>>>> corrected at any time). The content of each section is best not too 
>>>> technical. It is basically the place where program managers, developers, and 
>>>> also testers look at. The "regression potential" part (here "what (it) could 
>>>> break" is something that should be written with user / tester experience in 
>>>> mind. So which areas or activities might be affected. The current statements 
>>>> actually look to be good that way. I just mention it again since that is 
>>>> something that is misunderstood often.
>>>>
>>>> With the SRU justification in the bug report, the cover email does not 
>>>> necessarily have to repeat that (but it can). The cover email can be used to 
>>>> pass on helpful information for developers looking at the patches. That part 
>>>> is usually read only by technical people.
>>>>
>>>> A final note on the BugLink URLs: We prefer the short version of
>>>>
>>>> https://bugs.launchpad.net/bugs/<bugnumber>
>>>>
>>>> because this is the form that will always be valid, even if the 
>>>> project/package name changes. Not that this is likely for this kernel but 
>>>> then at least things look the same across all projects. I heard that this 
>>>> might be a mail servier config issue. So if that is the only thing that is 
>>>> not quite right, we would adjust that part when applying the patches. But we 
>>>> would appreciate not to have to for sure. :)
>>>>
>>>> So once the bug reports are updated, please send the individual pieces again.
>>>> Thanks!
>>>>
>>>> -Stefan
>>>
>>> great. thanks for the comments.  i'll resubmit.
>>> is a cover letter still needed if its a single patch?
>>> could the SRU justification comments be inside the commit msg after the
>>> patch separator? (---) ?
>>
>> For single patches it would be ok the put additional info between separator 
>> and the patch as that goes away when applying. Funnily it seems by splitting 
>> the submission into 3, the patches magically multiplied in numbers...
>>
>> -Stefan
>>
> 
> what do you mean magically multiplied? just 2 more cover letters.
> the single series had 9 patches + a cover letter.
> now series1 is 2 patches, series2 is 2 patches, series3 is 5 patches.
> it's still the original 9 patches.

Sorry apparently I cannot look right this morning and my memory is distorted. 
For some reason my memory had 2 reports with 1 patch each and another with 2. 
But reality was different. I think I confused it with another submission and did 
not double check.

-Stefan

> 
>>>
>>>
>>>>> SRU Justification:
>>>>>
>>>>> 1. The first 2 patches are fixing a race, potentially crashing the kernel.
>>>>> 2. The next 2 patches are fixing a possible memory hog and aging active
>>>>>     ct conns.
>>>>> 3. The last patches are adding offload support for ct_state invalid and
>>>>>     ct_state reply.
>>>>>
>>>>> * Explain the bug(s)
>>>>>
>>>>> 1. The kernel crash can happen on stress tcp traffic opening and closing
>>>>>     the conns fast.
>>>>>
>>>>> 2. The memory hog and aging active ct conns can happen from any stress test
>>>>>     as we have a single workqueue for handling the ct offload conns
>>>>>     for add/del/stats.
>>>>>
>>>>> * brief explanation of fixes
>>>>>
>>>>> The fix for #1 is setting the offload timeout early and not relying on gc.
>>>>>
>>>>> The fix for #2 is splitting add/del/stats for diff workqueue and also
>>>>> we set a limit for add work entries.
>>>>>
>>>>> * How to test
>>>>>
>>>>> Testing #1 was done with stress http traffic opening conns, short data, 
>>>>> close conns.
>>>>> different 5-tuple each time.
>>>>>
>>>>> Testing #2 was done with just stress traffic with lots of conns different 
>>>>> 5-tuple.
>>>>>
>>>>> * What it could break.
>>>>>
>>>>> Issue #1 could potentially crash the kernel.
>>>>>
>>>>> Issue #2 can take a lot of memory for a long time and also causing active 
>>>>> conns to
>>>>> age out when not necessary.
>>>>>
>>>>>
>>>>>   include/linux/skbuff.h                |  5 ++-
>>>>>   include/net/flow_offload.h            |  1 +
>>>>>   include/net/netfilter/nf_conntrack.h  | 12 ++++++
>>>>>   include/net/sch_generic.h             |  1 +
>>>>>   include/uapi/linux/pkt_cls.h          |  2 +
>>>>>   net/core/dev.c                        |  2 +
>>>>>   net/core/flow_dissector.c             | 13 +++++--
>>>>>   net/netfilter/nf_conntrack_core.c     | 12 ------
>>>>>   net/netfilter/nf_flow_table_core.c    |  2 +
>>>>>   net/netfilter/nf_flow_table_offload.c | 56 +++++++++++++++++++++++----
>>>>>   net/openvswitch/conntrack.c           |  8 ++--
>>>>>   net/openvswitch/conntrack.h           |  6 ++-
>>>>>   net/openvswitch/flow.c                |  4 +-
>>>>>   net/sched/act_ct.c                    |  6 ++-
>>>>>   net/sched/cls_api.c                   |  1 +
>>>>>   net/sched/cls_flower.c                | 10 +++--
>>>>>   16 files changed, 105 insertions(+), 36 deletions(-)
>>>>>
>>>>
>>>>
>>
>>