Message ID | 20120304110151.GA22404@1984 |
---|---|
State | Accepted |
Headers | show |
Hi Pablo, On 04/03/2012 11:01, Pablo Neira Ayuso wrote: > Hi Kerin, > > On Sat, Mar 03, 2012 at 06:47:27PM +0000, Kerin Millar wrote: >> Hi, >> >> On 03/03/2012 13:30, Pablo Neira Ayuso wrote: >>> I just posted another patch to the ML that is a relative fix to >>> Jozsef's patch. You have to apply that as well. >> >> I've now tested 3.3-rc5 with the addition of the above mentioned >> follow-on patch. The behaviour during conntrackd -c execution is >> clearly much improved - in so far as it doesn't generate much noise >> - but the crash that follows remains. Here's a netconsole capture:- >> >> http://paste.pocoo.org/raw/560439/ > > Great to know :-). I apologize but I think I may have led you astray on the nf_nat issue. At the time of submitting my original report, I now believe that the nf_nat module wasn't loaded prior to starting conntrackd, although it was definitely available. For all tests that followed, however, I am entirely certain the the nf_nat module was loaded in advance. The upshot is that my claim that things had improved may have been premature; I need to specifically test under both circumstances to be sure that things are improving. That is, both with and without the module loaded in advance. Following my own advice then, I first tried going through my test case *without* loading nf_nat in advance. Alas, conntrackd -c triggered hard lockups and didn't return to prompt. Here are the results:- http://paste.pocoo.org/raw/561350/ In case it matters, the existing ssh session continued to respond to input but I was no longer able to initiate any new sessions. > > Regarding your previous email, I'm sorry, by reading your email I > thought you were using 2.6.32 which was not the case, your > configuration is perfectly reasonable. > > It seems we still have problems regarding early_drop, but this time > with reliable event delivery enabled (15 seconds is the time that > is required to retry sending the destroy event). > > If you can test the following patch, I'll appreciate. Gladly. I applied the patch to my 3.3-rc5 tree, which is still carrying the two patches discussed earlier in the thread. I then went through my test case under normal circumstances i.e. all firewall rules in place, nf_nat confirmed present before conntrackd etc. Again, conntrackd -c did not return to prompt. Here are the results:- http://paste.pocoo.org/raw/561354/ Well, at least there was no oops this time. I should also add that the patch was present for both of the tests mentioned in this email. --- Incidentally, I found out why the internal cache on the master was filling up to capacity. It was apparently due to the use of "iptables -I PREROUTING -t raw -j CT --ctevents assured". Perhaps I'm missing something but doesn't this stop events such as new and destroy from being propagated? An inspection with conntrack -E suggests so. Once I removed the above rule, I could see destroy events being propagated and the number of active connections in the cache no longer exceeded my chosen limit of 2097152 ... # conntrack -S | head -n1; conntrackd -s | head -n2 entries 725826 cache internal: current active connections: 1409472 Whatever the case, I'm quite happy to go without this rule as these systems are coping fine with the load incurred by conntrackd. Cheers, --Kerin -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Kerin, On Mon, Mar 05, 2012 at 05:19:49PM +0000, Kerin Millar wrote: > Hi Pablo, > > On 04/03/2012 11:01, Pablo Neira Ayuso wrote: > >Hi Kerin, > > > >On Sat, Mar 03, 2012 at 06:47:27PM +0000, Kerin Millar wrote: > >>Hi, > >> > >>On 03/03/2012 13:30, Pablo Neira Ayuso wrote: > >>>I just posted another patch to the ML that is a relative fix to > >>>Jozsef's patch. You have to apply that as well. > >> > >>I've now tested 3.3-rc5 with the addition of the above mentioned > >>follow-on patch. The behaviour during conntrackd -c execution is > >>clearly much improved - in so far as it doesn't generate much noise > >>- but the crash that follows remains. Here's a netconsole capture:- > >> > >>http://paste.pocoo.org/raw/560439/ > > > >Great to know :-). > > I apologize but I think I may have led you astray on the nf_nat > issue. At the time of submitting my original report, I now believe > that the nf_nat module wasn't loaded prior to starting conntrackd, > although it was definitely available. For all tests that followed, > however, I am entirely certain the the nf_nat module was loaded in > advance. The upshot is that my claim that things had improved may > have been premature; I need to specifically test under both > circumstances to be sure that things are improving. That is, both > with and without the module loaded in advance. > > Following my own advice then, I first tried going through my test > case *without* loading nf_nat in advance. Alas, conntrackd -c > triggered hard lockups and didn't return to prompt. Here are the > results:- > > http://paste.pocoo.org/raw/561350/ > > In case it matters, the existing ssh session continued to respond to > input but I was no longer able to initiate any new sessions. > > > > >Regarding your previous email, I'm sorry, by reading your email I > >thought you were using 2.6.32 which was not the case, your > >configuration is perfectly reasonable. > > > >It seems we still have problems regarding early_drop, but this time > >with reliable event delivery enabled (15 seconds is the time that > >is required to retry sending the destroy event). > > > >If you can test the following patch, I'll appreciate. > > Gladly. I applied the patch to my 3.3-rc5 tree, which is still > carrying the two patches discussed earlier in the thread. I then > went through my test case under normal circumstances i.e. all > firewall rules in place, nf_nat confirmed present before conntrackd > etc. Again, conntrackd -c did not return to prompt. Here are the > results:- > > http://paste.pocoo.org/raw/561354/ > > Well, at least there was no oops this time. I should also add that > the patch was present for both of the tests mentioned in this email. Previous patch that I sent you was not OK, sorry. I have committed the following to my git tree: http://1984.lsi.us.es/git/net/commit/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897 I've been using the following tools that you can find enclosed to this email, they are much more simple than conntrackd but, they do the same in essence: * conntrack_stress.c * conntrack_events.c gcc -lnetfilter_conntrack conntrack_stress.c -o ct_stress gcc -lnetfilter_conntrack conntrack_events.c -o ct_events Then, to listen to events with reliable event delivery enabled: # ./ct_events & And to create loads of flow entries in ASSURED state: # ./ct_stress 65535 # that's my ct table size in my laptop You'll hit ENOMEM errors at some point, that's fine, but no oops or lockups happen here. I have pushed this tools to the qa/ directory under libnetfilter_conntrack: commit 94e75add9867fb6f0e05e73b23f723f139da829e Author: Pablo Neira Ayuso <pablo@netfilter.org> Date: Tue Mar 6 12:10:55 2012 +0100 qa: add some stress tools to test conntrack via ctnetlink (BTW, ct_stress may disrupt your network connection since the table gets filled. You can use conntrack -F to get the ct table empty again). > --- > Incidentally, I found out why the internal cache on the master was > filling up to capacity. It was apparently due to the use of > "iptables -I PREROUTING -t raw -j CT --ctevents assured". Perhaps > I'm missing something but doesn't this stop events such as new and > destroy from being propagated? An inspection with conntrack -E > suggests so. Once I removed the above rule, I could see destroy > events being propagated and the number of active connections in the > cache no longer exceeded my chosen limit of 2097152 ... Yes, that line was wrong, I have fixed in the documentation, the correct one must be: iptables -I PREROUTING -t raw -j CT --ctevents assured,destroy Thus, destroy events are delivered to user-space. > # conntrack -S | head -n1; conntrackd -s | head -n2 > entries 725826 > cache internal: > current active connections: 1409472 > > Whatever the case, I'm quite happy to go without this rule as these > systems are coping fine with the load incurred by conntrackd. I want to get things fixed, please, don't give up on using that rule yet :-). Regarding the hardlockups. I'd be happy if you can re-do the tests, both with conntrackd and the tools that I send you. Make sure you have these three patches, note that the last one has changed. http://1984.lsi.us.es/git/net/commit/?id=7d367e06688dc7a2cc98c2ace04e1296e1d987e2 http://1984.lsi.us.es/git/net/commit/?id=a8f341e98a46f579061fabfe6ea50be3d0eb2c60 http://1984.lsi.us.es/git/net/commit/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897 Thanks!
Hi Pablo, On 06/03/2012 11:14, Pablo Neira Ayuso wrote: <snip> >> Gladly. I applied the patch to my 3.3-rc5 tree, which is still >> carrying the two patches discussed earlier in the thread. I then >> went through my test case under normal circumstances i.e. all >> firewall rules in place, nf_nat confirmed present before conntrackd >> etc. Again, conntrackd -c did not return to prompt. Here are the >> results:- >> >> http://paste.pocoo.org/raw/561354/ >> >> Well, at least there was no oops this time. I should also add that >> the patch was present for both of the tests mentioned in this email. > > Previous patch that I sent you was not OK, sorry. I have committed the > following to my git tree: > > http://1984.lsi.us.es/git/net/commit/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897 Noted. > > I've been using the following tools that you can find enclosed to this > email, they are much more simple than conntrackd but, they do the same > in essence: > > * conntrack_stress.c > * conntrack_events.c > > gcc -lnetfilter_conntrack conntrack_stress.c -o ct_stress > gcc -lnetfilter_conntrack conntrack_events.c -o ct_events > > Then, to listen to events with reliable event delivery enabled: > > # ./ct_events& > > And to create loads of flow entries in ASSURED state: > > # ./ct_stress 65535 # that's my ct table size in my laptop > > You'll hit ENOMEM errors at some point, that's fine, but no oops or > lockups happen here. > > I have pushed this tools to the qa/ directory under > libnetfilter_conntrack: > > commit 94e75add9867fb6f0e05e73b23f723f139da829e > Author: Pablo Neira Ayuso<pablo@netfilter.org> > Date: Tue Mar 6 12:10:55 2012 +0100 > > qa: add some stress tools to test conntrack via ctnetlink > > (BTW, ct_stress may disrupt your network connection since the table > gets filled. You can use conntrack -F to get the ct table empty again). > Sorry if this is a silly question but should conntrackd be running while I conduct this stress test? If so, is there any danger of the master becoming unstable? I must ask because, if the stability of the master is compromised, I will be in big trouble ;) <snip> > Yes, that line was wrong, I have fixed in the documentation, the > correct one must be: > > iptables -I PREROUTING -t raw -j CT --ctevents assured,destroy > > Thus, destroy events are delivered to user-space. > >> # conntrack -S | head -n1; conntrackd -s | head -n2 >> entries 725826 >> cache internal: >> current active connections: 1409472 >> >> Whatever the case, I'm quite happy to go without this rule as these >> systems are coping fine with the load incurred by conntrackd. > > I want to get things fixed, please, don't give up on using that rule > yet :-). Sure. I've re-instated the rule as requested. With the addition of destroy events, cache usage remains under control. > > Regarding the hardlockups. I'd be happy if you can re-do the tests, > both with conntrackd and the tools that I send you. > > Make sure you have these three patches, note that the last one has > changed. > > http://1984.lsi.us.es/git/net/commit/?id=7d367e06688dc7a2cc98c2ace04e1296e1d987e2 > http://1984.lsi.us.es/git/net/commit/?id=a8f341e98a46f579061fabfe6ea50be3d0eb2c60 > http://1984.lsi.us.es/git/net/commit/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897 > Duly applied to a fresh 3.3-rc5 tree. Cheers, --Kerin -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 06, 2012 at 04:42:02PM +0000, Kerin Millar wrote: > Hi Pablo, > > On 06/03/2012 11:14, Pablo Neira Ayuso wrote: > > <snip> > > >>Gladly. I applied the patch to my 3.3-rc5 tree, which is still > >>carrying the two patches discussed earlier in the thread. I then > >>went through my test case under normal circumstances i.e. all > >>firewall rules in place, nf_nat confirmed present before conntrackd > >>etc. Again, conntrackd -c did not return to prompt. Here are the > >>results:- > >> > >>http://paste.pocoo.org/raw/561354/ > >> > >>Well, at least there was no oops this time. I should also add that > >>the patch was present for both of the tests mentioned in this email. > > > >Previous patch that I sent you was not OK, sorry. I have committed the > >following to my git tree: > > > >http://1984.lsi.us.es/git/net/commit/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897 > > Noted. > > > > >I've been using the following tools that you can find enclosed to this > >email, they are much more simple than conntrackd but, they do the same > >in essence: > > > >* conntrack_stress.c > >* conntrack_events.c > > > >gcc -lnetfilter_conntrack conntrack_stress.c -o ct_stress > >gcc -lnetfilter_conntrack conntrack_events.c -o ct_events > > > >Then, to listen to events with reliable event delivery enabled: > > > ># ./ct_events& > > > >And to create loads of flow entries in ASSURED state: > > > ># ./ct_stress 65535 # that's my ct table size in my laptop > > > >You'll hit ENOMEM errors at some point, that's fine, but no oops or > >lockups happen here. > > > >I have pushed this tools to the qa/ directory under > >libnetfilter_conntrack: > > > >commit 94e75add9867fb6f0e05e73b23f723f139da829e > >Author: Pablo Neira Ayuso<pablo@netfilter.org> > >Date: Tue Mar 6 12:10:55 2012 +0100 > > > > qa: add some stress tools to test conntrack via ctnetlink > > > >(BTW, ct_stress may disrupt your network connection since the table > >gets filled. You can use conntrack -F to get the ct table empty again). > > > > Sorry if this is a silly question but should conntrackd be running > while I conduct this stress test? If so, is there any danger of the > master becoming unstable? I must ask because, if the stability of > the master is compromised, I will be in big trouble ;) If you run this in the backup, conntrackd will spam the master with lots of new flows in the external cache. That shouldn't be a problem (just a bit of extra load invested in the replication). But if you run this in the master, my test will fill the ct table with lots of assured flows. Thus, packets that belong new flows will be likely dropped in that node. > >Yes, that line was wrong, I have fixed in the documentation, the > >correct one must be: > > > >iptables -I PREROUTING -t raw -j CT --ctevents assured,destroy > > > >Thus, destroy events are delivered to user-space. > > > >># conntrack -S | head -n1; conntrackd -s | head -n2 > >>entries 725826 > >>cache internal: > >>current active connections: 1409472 > >> > >>Whatever the case, I'm quite happy to go without this rule as these > >>systems are coping fine with the load incurred by conntrackd. > > > >I want to get things fixed, please, don't give up on using that rule > >yet :-). > > Sure. I've re-instated the rule as requested. With the addition of > destroy events, cache usage remains under control. > > > > >Regarding the hardlockups. I'd be happy if you can re-do the tests, > >both with conntrackd and the tools that I send you. > > > >Make sure you have these three patches, note that the last one has > >changed. > > > >http://1984.lsi.us.es/git/net/commit/?id=7d367e06688dc7a2cc98c2ace04e1296e1d987e2 > >http://1984.lsi.us.es/git/net/commit/?id=a8f341e98a46f579061fabfe6ea50be3d0eb2c60 > >http://1984.lsi.us.es/git/net/commit/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897 > > > > Duly applied to a fresh 3.3-rc5 tree. > > Cheers, > > --Kerin > > -- > To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Pablo, On 06/03/2012 17:23, Pablo Neira Ayuso wrote: <snip> >>> I've been using the following tools that you can find enclosed to this >>> email, they are much more simple than conntrackd but, they do the same >>> in essence: >>> >>> * conntrack_stress.c >>> * conntrack_events.c >>> >>> gcc -lnetfilter_conntrack conntrack_stress.c -o ct_stress >>> gcc -lnetfilter_conntrack conntrack_events.c -o ct_events >>> >>> Then, to listen to events with reliable event delivery enabled: >>> >>> # ./ct_events& >>> >>> And to create loads of flow entries in ASSURED state: >>> >>> # ./ct_stress 65535 # that's my ct table size in my laptop >>> >>> You'll hit ENOMEM errors at some point, that's fine, but no oops or >>> lockups happen here. >>> >>> I have pushed this tools to the qa/ directory under >>> libnetfilter_conntrack: >>> >>> commit 94e75add9867fb6f0e05e73b23f723f139da829e >>> Author: Pablo Neira Ayuso<pablo@netfilter.org> >>> Date: Tue Mar 6 12:10:55 2012 +0100 >>> >>> qa: add some stress tools to test conntrack via ctnetlink >>> >>> (BTW, ct_stress may disrupt your network connection since the table >>> gets filled. You can use conntrack -F to get the ct table empty again). >>> >> >> Sorry if this is a silly question but should conntrackd be running >> while I conduct this stress test? If so, is there any danger of the >> master becoming unstable? I must ask because, if the stability of >> the master is compromised, I will be in big trouble ;) > > If you run this in the backup, conntrackd will spam the master with > lots of new flows in the external cache. That shouldn't be a problem > (just a bit of extra load invested in the replication). > > But if you run this in the master, my test will fill the ct table > with lots of assured flows. Thus, packets that belong new flows will > be likely dropped in that node. That makes sense. So, I rebooted the backup with the latest kernel build, ran my iptables script then started conntrackd. I was not able to destabilize the system through the use of your stress tool. The sequence of commands used to invoke the ct_stress tool was as follows:- 1) ct_stress 2097152 2) ct_stress 2097152 3) ct_stress 1048576 There were indeed a lot of ENOMEM errors, and messages warning that the conntrack table was full with packets being dropped. Nothing surprising. I then tried my test case again. The exact sequence of commands was as follows:- 4) conntrackd -n 5) conntrackd -c 6) conntrackd -f internal 7) conntrackd -F 8) conntrackd -n 9) conntrackd -c It didn't crash after the 5th step (to my amazement) but it did after the 9th. Here's a netconsole log covering all of the above: http://paste.pocoo.org/raw/562136/ The invalid opcode error was also present in the log that I provided with my first post in this thread. For some reason, I couldn't capture stdout from your ct_events tool but here's as much as I was able to copy and paste before it stopped responding completely. 2100000 events received (2 new, 1048702 destroy) 2110000 events received (2 new, 1048706 destroy) 2120000 events received (2 new, 1048713 destroy) 2130000 events received (2 new, 1048722 destroy) 2140000 events received (2 new, 1048735 destroy) 2150000 events received (2 new, 1048748 destroy) 2160000 events received (2 new, 1048776 destroy) 2170000 events received (2 new, 1048797 destroy) 2180000 events received (2 new, 1048830 destroy) 2190000 events received (2 new, 1048872 destroy) 2200000 events received (2 new, 1048909 destroy) 2210000 events received (2 new, 1048945 destroy) 2220000 events received (2 new, 1048985 destroy) 2230000 events received (2 new, 1049039 destroy) 2240000 events received (2 new, 1049102 destroy) 2250000 events received (2 new, 1049170 destroy) 2260000 events received (2 new, 1049238 destroy) 2270000 events received (2 new, 1049292 destroy) 2280000 events received (2 new, 1049347 destroy) 2290000 events received (2 new, 1049423 destroy) 2300000 events received (2 new, 1049490 destroy) 2310000 events received (2 new, 1049563 destroy) 2320000 events received (2 new, 1049646 destroy) 2330000 events received (2 new, 1049739 destroy) 2340000 events received (2 new, 1049819 destroy) 2350000 events received (2 new, 1049932 destroy) 2360000 events received (2 new, 1050040 destroy) 2370000 events received (2 new, 1050153 destroy) 2380000 events received (2 new, 1050293 destroy) 2390000 events received (2 new, 1050405 destroy) 2400000 events received (2 new, 1050535 destroy) 2410000 events received (2 new, 1050661 destroy) 2420000 events received (2 new, 1050786 destroy) 2430000 events received (2 new, 1050937 destroy) 2440000 events received (2 new, 1051085 destroy) 2450000 events received (2 new, 1051226 destroy) 2460000 events received (2 new, 1051378 destroy) 2470000 events received (2 new, 1051542 destroy) 2480000 events received (2 new, 1051693 destroy) 2490000 events received (2 new, 1051852 destroy) 2500000 events received (2 new, 1052008 destroy) 2510000 events received (2 new, 1052185 destroy) 2520000 events received (2 new, 1052373 destroy) 2530000 events received (2 new, 1052569 destroy) 2540000 events received (2 new, 1052770 destroy) 2550000 events received (2 new, 1052978 destroy) Cheers, --Kerin -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Pablo, To follow up briefly (at the end of this message) ... On 06/03/2012 22:37, Kerin Millar wrote: > Hi Pablo, > > On 06/03/2012 17:23, Pablo Neira Ayuso wrote: > > <snip> > >>>> I've been using the following tools that you can find enclosed to this >>>> email, they are much more simple than conntrackd but, they do the same >>>> in essence: >>>> >>>> * conntrack_stress.c >>>> * conntrack_events.c >>>> >>>> gcc -lnetfilter_conntrack conntrack_stress.c -o ct_stress >>>> gcc -lnetfilter_conntrack conntrack_events.c -o ct_events >>>> >>>> Then, to listen to events with reliable event delivery enabled: >>>> >>>> # ./ct_events& >>>> >>>> And to create loads of flow entries in ASSURED state: >>>> >>>> # ./ct_stress 65535 # that's my ct table size in my laptop >>>> >>>> You'll hit ENOMEM errors at some point, that's fine, but no oops or >>>> lockups happen here. >>>> >>>> I have pushed this tools to the qa/ directory under >>>> libnetfilter_conntrack: >>>> >>>> commit 94e75add9867fb6f0e05e73b23f723f139da829e >>>> Author: Pablo Neira Ayuso<pablo@netfilter.org> >>>> Date: Tue Mar 6 12:10:55 2012 +0100 >>>> >>>> qa: add some stress tools to test conntrack via ctnetlink >>>> >>>> (BTW, ct_stress may disrupt your network connection since the table >>>> gets filled. You can use conntrack -F to get the ct table empty again). >>>> >>> >>> Sorry if this is a silly question but should conntrackd be running >>> while I conduct this stress test? If so, is there any danger of the >>> master becoming unstable? I must ask because, if the stability of >>> the master is compromised, I will be in big trouble ;) >> >> If you run this in the backup, conntrackd will spam the master with >> lots of new flows in the external cache. That shouldn't be a problem >> (just a bit of extra load invested in the replication). >> >> But if you run this in the master, my test will fill the ct table >> with lots of assured flows. Thus, packets that belong new flows will >> be likely dropped in that node. > > That makes sense. So, I rebooted the backup with the latest kernel > build, ran my iptables script then started conntrackd. I was not able to > destabilize the system through the use of your stress tool. The sequence > of commands used to invoke the ct_stress tool was as follows:- > > 1) ct_stress 2097152 > 2) ct_stress 2097152 > 3) ct_stress 1048576 > > There were indeed a lot of ENOMEM errors, and messages warning that the > conntrack table was full with packets being dropped. Nothing surprising. > > I then tried my test case again. The exact sequence of commands was as > follows:- > > 4) conntrackd -n > 5) conntrackd -c > 6) conntrackd -f internal > 7) conntrackd -F > 8) conntrackd -n > 9) conntrackd -c > > It didn't crash after the 5th step (to my amazement) but it did after > the 9th. Here's a netconsole log covering all of the above: > > http://paste.pocoo.org/raw/562136/ > > The invalid opcode error was also present in the log that I provided > with my first post in this thread. > > For some reason, I couldn't capture stdout from your ct_events tool but > here's as much as I was able to copy and paste before it stopped > responding completely. > > 2100000 events received (2 new, 1048702 destroy) > 2110000 events received (2 new, 1048706 destroy) > 2120000 events received (2 new, 1048713 destroy) > 2130000 events received (2 new, 1048722 destroy) > 2140000 events received (2 new, 1048735 destroy) > 2150000 events received (2 new, 1048748 destroy) > 2160000 events received (2 new, 1048776 destroy) > 2170000 events received (2 new, 1048797 destroy) > 2180000 events received (2 new, 1048830 destroy) > 2190000 events received (2 new, 1048872 destroy) > 2200000 events received (2 new, 1048909 destroy) > 2210000 events received (2 new, 1048945 destroy) > 2220000 events received (2 new, 1048985 destroy) > 2230000 events received (2 new, 1049039 destroy) > 2240000 events received (2 new, 1049102 destroy) > 2250000 events received (2 new, 1049170 destroy) > 2260000 events received (2 new, 1049238 destroy) > 2270000 events received (2 new, 1049292 destroy) > 2280000 events received (2 new, 1049347 destroy) > 2290000 events received (2 new, 1049423 destroy) > 2300000 events received (2 new, 1049490 destroy) > 2310000 events received (2 new, 1049563 destroy) > 2320000 events received (2 new, 1049646 destroy) > 2330000 events received (2 new, 1049739 destroy) > 2340000 events received (2 new, 1049819 destroy) > 2350000 events received (2 new, 1049932 destroy) > 2360000 events received (2 new, 1050040 destroy) > 2370000 events received (2 new, 1050153 destroy) > 2380000 events received (2 new, 1050293 destroy) > 2390000 events received (2 new, 1050405 destroy) > 2400000 events received (2 new, 1050535 destroy) > 2410000 events received (2 new, 1050661 destroy) > 2420000 events received (2 new, 1050786 destroy) > 2430000 events received (2 new, 1050937 destroy) > 2440000 events received (2 new, 1051085 destroy) > 2450000 events received (2 new, 1051226 destroy) > 2460000 events received (2 new, 1051378 destroy) > 2470000 events received (2 new, 1051542 destroy) > 2480000 events received (2 new, 1051693 destroy) > 2490000 events received (2 new, 1051852 destroy) > 2500000 events received (2 new, 1052008 destroy) > 2510000 events received (2 new, 1052185 destroy) > 2520000 events received (2 new, 1052373 destroy) > 2530000 events received (2 new, 1052569 destroy) > 2540000 events received (2 new, 1052770 destroy) > 2550000 events received (2 new, 1052978 destroy) Just to add that I ran a more extensive stress test on the backup, like so ... for x in $(seq 1 100); do ct_stress 1048576; sleep $(( $RANDOM % 60 )); done It remained stable throughout. I notice that there's an option to dump the cache in XML format. I wonder if it be useful if I were to provide such a dump, having synced with the master? Assuming that there's a way to inject the contents, perhaps you could reproduce the issue also. Cheers, --Kerin -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 07, 2012 at 02:41:02PM +0000, Kerin Millar wrote: > Hi Pablo, > > To follow up briefly (at the end of this message) ... > > On 06/03/2012 22:37, Kerin Millar wrote: > >Hi Pablo, > > > >On 06/03/2012 17:23, Pablo Neira Ayuso wrote: > > > ><snip> > > > >>>>I've been using the following tools that you can find enclosed to this > >>>>email, they are much more simple than conntrackd but, they do the same > >>>>in essence: > >>>> > >>>>* conntrack_stress.c > >>>>* conntrack_events.c > >>>> > >>>>gcc -lnetfilter_conntrack conntrack_stress.c -o ct_stress > >>>>gcc -lnetfilter_conntrack conntrack_events.c -o ct_events > >>>> > >>>>Then, to listen to events with reliable event delivery enabled: > >>>> > >>>># ./ct_events& > >>>> > >>>>And to create loads of flow entries in ASSURED state: > >>>> > >>>># ./ct_stress 65535 # that's my ct table size in my laptop > >>>> > >>>>You'll hit ENOMEM errors at some point, that's fine, but no oops or > >>>>lockups happen here. > >>>> > >>>>I have pushed this tools to the qa/ directory under > >>>>libnetfilter_conntrack: > >>>> > >>>>commit 94e75add9867fb6f0e05e73b23f723f139da829e > >>>>Author: Pablo Neira Ayuso<pablo@netfilter.org> > >>>>Date: Tue Mar 6 12:10:55 2012 +0100 > >>>> > >>>>qa: add some stress tools to test conntrack via ctnetlink > >>>> > >>>>(BTW, ct_stress may disrupt your network connection since the table > >>>>gets filled. You can use conntrack -F to get the ct table empty again). > >>>> > >>> > >>>Sorry if this is a silly question but should conntrackd be running > >>>while I conduct this stress test? If so, is there any danger of the > >>>master becoming unstable? I must ask because, if the stability of > >>>the master is compromised, I will be in big trouble ;) > >> > >>If you run this in the backup, conntrackd will spam the master with > >>lots of new flows in the external cache. That shouldn't be a problem > >>(just a bit of extra load invested in the replication). > >> > >>But if you run this in the master, my test will fill the ct table > >>with lots of assured flows. Thus, packets that belong new flows will > >>be likely dropped in that node. > > > >That makes sense. So, I rebooted the backup with the latest kernel > >build, ran my iptables script then started conntrackd. I was not able to > >destabilize the system through the use of your stress tool. The sequence > >of commands used to invoke the ct_stress tool was as follows:- > > > >1) ct_stress 2097152 > >2) ct_stress 2097152 > >3) ct_stress 1048576 > > > >There were indeed a lot of ENOMEM errors, and messages warning that the > >conntrack table was full with packets being dropped. Nothing surprising. > > > >I then tried my test case again. The exact sequence of commands was as > >follows:- > > > >4) conntrackd -n > >5) conntrackd -c > >6) conntrackd -f internal > >7) conntrackd -F > >8) conntrackd -n > >9) conntrackd -c > > > >It didn't crash after the 5th step (to my amazement) but it did after > >the 9th. Here's a netconsole log covering all of the above: > > > >http://paste.pocoo.org/raw/562136/ > > > >The invalid opcode error was also present in the log that I provided > >with my first post in this thread. > > > >For some reason, I couldn't capture stdout from your ct_events tool but > >here's as much as I was able to copy and paste before it stopped > >responding completely. > > > >2100000 events received (2 new, 1048702 destroy) > >2110000 events received (2 new, 1048706 destroy) > >2120000 events received (2 new, 1048713 destroy) > >2130000 events received (2 new, 1048722 destroy) > >2140000 events received (2 new, 1048735 destroy) > >2150000 events received (2 new, 1048748 destroy) > >2160000 events received (2 new, 1048776 destroy) > >2170000 events received (2 new, 1048797 destroy) > >2180000 events received (2 new, 1048830 destroy) > >2190000 events received (2 new, 1048872 destroy) > >2200000 events received (2 new, 1048909 destroy) > >2210000 events received (2 new, 1048945 destroy) > >2220000 events received (2 new, 1048985 destroy) > >2230000 events received (2 new, 1049039 destroy) > >2240000 events received (2 new, 1049102 destroy) > >2250000 events received (2 new, 1049170 destroy) > >2260000 events received (2 new, 1049238 destroy) > >2270000 events received (2 new, 1049292 destroy) > >2280000 events received (2 new, 1049347 destroy) > >2290000 events received (2 new, 1049423 destroy) > >2300000 events received (2 new, 1049490 destroy) > >2310000 events received (2 new, 1049563 destroy) > >2320000 events received (2 new, 1049646 destroy) > >2330000 events received (2 new, 1049739 destroy) > >2340000 events received (2 new, 1049819 destroy) > >2350000 events received (2 new, 1049932 destroy) > >2360000 events received (2 new, 1050040 destroy) > >2370000 events received (2 new, 1050153 destroy) > >2380000 events received (2 new, 1050293 destroy) > >2390000 events received (2 new, 1050405 destroy) > >2400000 events received (2 new, 1050535 destroy) > >2410000 events received (2 new, 1050661 destroy) > >2420000 events received (2 new, 1050786 destroy) > >2430000 events received (2 new, 1050937 destroy) > >2440000 events received (2 new, 1051085 destroy) > >2450000 events received (2 new, 1051226 destroy) > >2460000 events received (2 new, 1051378 destroy) > >2470000 events received (2 new, 1051542 destroy) > >2480000 events received (2 new, 1051693 destroy) > >2490000 events received (2 new, 1051852 destroy) > >2500000 events received (2 new, 1052008 destroy) > >2510000 events received (2 new, 1052185 destroy) > >2520000 events received (2 new, 1052373 destroy) > >2530000 events received (2 new, 1052569 destroy) > >2540000 events received (2 new, 1052770 destroy) > >2550000 events received (2 new, 1052978 destroy) > > Just to add that I ran a more extensive stress test on the backup, > like so ... > > for x in $(seq 1 100); do ct_stress 1048576; sleep $(( $RANDOM % 60 )); done I guess you're running ct_events_reliable as well. Lauching several ct_stress at the same time is also interesting. > It remained stable throughout. I notice that there's an option to > dump the cache in XML format. I wonder if it be useful if I were to > provide such a dump, having synced with the master? Assuming that > there's a way to inject the contents, perhaps you could reproduce > the issue also. I've been launching the user-space stress tests but I was not abled to reproduce the problem that you reported so far. I'd need to know if the problem that you reported is easy to reproduce in your setup or, it looks more like a race condition. Moreover, I need to know if there was some traffic circulating through the backup or no traffic at all. Let me know. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Pablo, On 08/03/2012 01:33, Pablo Neira Ayuso wrote: > On Wed, Mar 07, 2012 at 02:41:02PM +0000, Kerin Millar wrote: <snip> >>> That makes sense. So, I rebooted the backup with the latest kernel >>> build, ran my iptables script then started conntrackd. I was not able to >>> destabilize the system through the use of your stress tool. The sequence >>> of commands used to invoke the ct_stress tool was as follows:- >>> >>> 1) ct_stress 2097152 >>> 2) ct_stress 2097152 >>> 3) ct_stress 1048576 >>> >>> There were indeed a lot of ENOMEM errors, and messages warning that the >>> conntrack table was full with packets being dropped. Nothing surprising. >>> >>> I then tried my test case again. The exact sequence of commands was as >>> follows:- >>> >>> 4) conntrackd -n >>> 5) conntrackd -c >>> 6) conntrackd -f internal >>> 7) conntrackd -F >>> 8) conntrackd -n >>> 9) conntrackd -c >>> >>> It didn't crash after the 5th step (to my amazement) but it did after >>> the 9th. Here's a netconsole log covering all of the above: >>> >>> http://paste.pocoo.org/raw/562136/ >>> >>> The invalid opcode error was also present in the log that I provided >>> with my first post in this thread. >>> >>> For some reason, I couldn't capture stdout from your ct_events tool but >>> here's as much as I was able to copy and paste before it stopped >>> responding completely. >>> >>> 2100000 events received (2 new, 1048702 destroy) >>> 2110000 events received (2 new, 1048706 destroy) >>> 2120000 events received (2 new, 1048713 destroy) >>> 2130000 events received (2 new, 1048722 destroy) >>> 2140000 events received (2 new, 1048735 destroy) >>> 2150000 events received (2 new, 1048748 destroy) >>> 2160000 events received (2 new, 1048776 destroy) >>> 2170000 events received (2 new, 1048797 destroy) >>> 2180000 events received (2 new, 1048830 destroy) >>> 2190000 events received (2 new, 1048872 destroy) >>> 2200000 events received (2 new, 1048909 destroy) >>> 2210000 events received (2 new, 1048945 destroy) >>> 2220000 events received (2 new, 1048985 destroy) >>> 2230000 events received (2 new, 1049039 destroy) >>> 2240000 events received (2 new, 1049102 destroy) >>> 2250000 events received (2 new, 1049170 destroy) >>> 2260000 events received (2 new, 1049238 destroy) >>> 2270000 events received (2 new, 1049292 destroy) >>> 2280000 events received (2 new, 1049347 destroy) >>> 2290000 events received (2 new, 1049423 destroy) >>> 2300000 events received (2 new, 1049490 destroy) >>> 2310000 events received (2 new, 1049563 destroy) >>> 2320000 events received (2 new, 1049646 destroy) >>> 2330000 events received (2 new, 1049739 destroy) >>> 2340000 events received (2 new, 1049819 destroy) >>> 2350000 events received (2 new, 1049932 destroy) >>> 2360000 events received (2 new, 1050040 destroy) >>> 2370000 events received (2 new, 1050153 destroy) >>> 2380000 events received (2 new, 1050293 destroy) >>> 2390000 events received (2 new, 1050405 destroy) >>> 2400000 events received (2 new, 1050535 destroy) >>> 2410000 events received (2 new, 1050661 destroy) >>> 2420000 events received (2 new, 1050786 destroy) >>> 2430000 events received (2 new, 1050937 destroy) >>> 2440000 events received (2 new, 1051085 destroy) >>> 2450000 events received (2 new, 1051226 destroy) >>> 2460000 events received (2 new, 1051378 destroy) >>> 2470000 events received (2 new, 1051542 destroy) >>> 2480000 events received (2 new, 1051693 destroy) >>> 2490000 events received (2 new, 1051852 destroy) >>> 2500000 events received (2 new, 1052008 destroy) >>> 2510000 events received (2 new, 1052185 destroy) >>> 2520000 events received (2 new, 1052373 destroy) >>> 2530000 events received (2 new, 1052569 destroy) >>> 2540000 events received (2 new, 1052770 destroy) >>> 2550000 events received (2 new, 1052978 destroy) >> >> Just to add that I ran a more extensive stress test on the backup, >> like so ... >> >> for x in $(seq 1 100); do ct_stress 1048576; sleep $(( $RANDOM % 60 )); done > > I guess you're running ct_events_reliable as well. Lauching several > ct_stress at the same time is also interesting. The ct_events (conntrack_events.c) program was running throughout and "NetlinkEventsReliable On" remains defined in conntrackd.conf. I will try running ct_stress concurrently. > >> It remained stable throughout. I notice that there's an option to >> dump the cache in XML format. I wonder if it be useful if I were to >> provide such a dump, having synced with the master? Assuming that >> there's a way to inject the contents, perhaps you could reproduce >> the issue also. > > I've been launching the user-space stress tests but I was not abled to > reproduce the problem that you reported so far. > > I'd need to know if the problem that you reported is easy to > reproduce in your setup or, it looks more like a race condition. > Moreover, I need to know if there was some traffic circulating > through the backup or no traffic at all. Indeed, I can reproduce it so easily and consistently that I've now lost track of the amount of times I've had to hard reboot this machine. To recap: I boot the slave with 3.3-rc5 (now featuring the three patches you asked me to apply). My iptables ruleset is loaded and conntrackd is started. At that point, the stats look like this:- # conntrackd -s cache internal: current active connections: 109 connections created: 175 failed: 0 connections updated: 0 failed: 0 connections destroyed: 66 failed: 0 cache external: current active connections: 3676 connections created: 3688 failed: 0 connections updated: 4 failed: 0 connections destroyed: 12 failed: 0 traffic processed: 0 Bytes 0 Pckts UDP traffic (active device=eth2): 4360 Bytes sent 570188 Bytes recv 91 Pckts sent 6681 Pckts recv 0 Error send 0 Error recv message tracking: 0 Malformed msgs 0 Lost msgs NOTE: if left alone, external cache usage grows steadily - as expected due to the master handling new connections to our busy load balance/appserver farm. Next, rather than spend the time waiting for the backup to catch up, I explicitly synchronize with the master. # conntrackd -n # conntrackd -s cache internal: current active connections: 5 connections created: 179 failed: 0 connections updated: 0 failed: 0 connections destroyed: 174 failed: 0 cache external: current active connections: 1295640 connections created: 1351838 failed: 0 connections updated: 26 failed: 0 connections destroyed: 56198 failed: 0 traffic processed: 0 Bytes 0 Pckts UDP traffic (active device=eth2): 139384 Bytes sent 112306776 Bytes recv 1054 Pckts sent 135073 Pckts recv 0 Error send 0 Error recv message tracking: 0 Malformed msgs 175170 Lost msgs The number of state entries in the external cache is now in line with the master. Finally, I commit. # conntrackd -c What happens next is one of two things ... a) a seemingly never-ending series of hard lockups occur b) it panics with "not syncing: Fatal exception in interrupt" Scenario (a) seems to be more frequent but, either way, it happens virtually every single time. I think I've had no more than *two* occasions where conntrackd -c returned to prompt in dozens upon dozens of tests and, even then, the system didn't survive a second invocation of conntrackd -c. In all cases, I have to hard reboot the machine. Netconsole logs for both of these outcomes have been provided in previous posts. Ergo, it's entirely reproducible here. There are absolutely no signs of instability with the system except for when state is being committed. Indeed, I first became aware of this situation upon simulating a genuine failover scenario. That is, I shut down the master, ucarp migrated the VIPs and instructed conntrackd to commit state. Then it died. Since that first time, I deactivated ucarp entirely and have easily reproduced the issue by running conntrackd -c manually. Cheers, --Kerin -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/03/2012 01:33, Pablo Neira Ayuso wrote: > Moreover, I need to know if there was some traffic circulating > through the backup or no traffic at all. Sorry, I didn't address this point in my previous email. The backup does indeed handle some traffic. Both systems run BIND and, as such, the backup is also our secondary public-facing nameserver. The load generated by this is not significant though. At any given moment, the number of state entries hovers at between 100-150 and almost all of these are from UDP entries on account of DNS queries. The rest can be accounted for by ntpd (about 4 active entries for upstream ntp servers), ssh (1 connection only), ICMP echo-request handling and a few other sundries. In my test case, I am running conntrackd -c under circumstances where conntrackd on the master is still pushing events across. But, I have also simulated a realistic failover scenario on at least two occasions by shutting down the master (at which point, conntrackd terminates and is obviously no longer pushing events to the backup). Regardless, the backup still crashes upon conntrackd -c. In summary: * Both nodes are handling DNS traffic (but it's packet forwarding which really generates a heavy load) * conntrackd -c has been run under circumstances where conntrack daemon is and isn't continuing to receive traffic from other node. It crashes anyway. Cheers, --Kerin -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From 1320c099d618a278fa17715127d6fecca2786a36 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso <pablo@netfilter.org> Date: Sun, 4 Mar 2012 11:34:06 +0100 Subject: [PATCH] netfilter: nf_conntrack: fix early_drop with reliable event delivery With reliable event delivery is enabled, if we fail to deliver the destroy event in early_drop, we put out one entry that is still in the dying list. Reported-by: Kerin Millar <kerframil@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> --- net/netfilter/nf_conntrack_core.c | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index ed86a3b..7d2d641 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -635,6 +635,11 @@ static noinline int early_drop(struct net *net, unsigned int hash) if (del_timer(&ct->timeout)) { death_by_timeout((unsigned long)ct); + /* Check if we indeed killed this entry. Reliable event + delivery may insert this into the dying list. */ + if (!test_bit(IPS_DYING_BIT, &ct->status)) + return dropped; + dropped = 1; NF_CT_STAT_INC_ATOMIC(net, early_drop); } -- 1.7.7.3