Message ID | 4A0105A8.3060707@cosmosbay.com
---|---
State | RFC, archived
Delegated to | David Miller
On Wed, May 06, 2009 at 05:36:08AM +0200, Eric Dumazet wrote:

> > Is there any way at least to balance individual NICs on a per-core basis?
>
> Problem of this setup is you have four NICs, but two logical devices (bond0
> & bond1) and a central HTB thing. This essentially makes flows go through
> the same locks (some rwlocks guarding the bonding driver, and others
> guarding HTB structures).
>
> Also, when a cpu receives a frame on ethX, it has to forward it on ethY, and
> another lock guards access to the TX queue of the ethY device. If another
> cpu receives a frame on ethZ and wants to forward it to ethY, this other cpu
> will need the same locks and everything slows down.
>
> I am pretty sure you could get good results choosing two cpus sharing the
> same L2 cache. L2 on your cpu is 6MB. Another point would be to carefully
> choose the size of the RX rings on the ethX devices. You could try to
> *reduce* them so that the number of in-flight skbs is small enough that
> everything fits in this 6MB cache.
>
> Problem is not really CPU power, but RAM bandwidth. Having two cores instead
> of one attached to one central memory bank won't increase RAM bandwidth, but
> reduce it.

Thanks for the detailed explanation.

On the particular server I reported, I worked around the problem by getting
rid of classes and switching to ingress policers.

However, I have one central box doing HTB, a small number of classes, but 850
mbps of traffic. The CPU is a dual-core 5160 @ 3 GHz. With 2.6.29 + bond I'm
experiencing strange problems with HTB: under high load, borrowing doesn't
seem to work properly. This box has two BNX2 and two E1000 NICs, and for some
reason I cannot force BNX2 to sit on a single IRQ - even though I put only one
CPU into smp_affinity, it keeps balancing on both. So I cannot figure out if
it's related to IRQ balancing or not.

[root@tshape3 tshaper]# cat /proc/irq/63/smp_affinity
01
[root@tshape3 tshaper]# cat /proc/interrupts | grep eth0
 63:   44610754   95469129   PCI-MSI-edge   eth0
[root@tshape3 tshaper]# cat /proc/interrupts | grep eth0
 63:   44614125   95472512   PCI-MSI-edge   eth0

lspci -v:

03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
        Subsystem: Hewlett-Packard Company NC373i Integrated Multifunction Gigabit Server Adapter
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 63
        Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
        [virtual] Expansion ROM at 88200000 [disabled] [size=2K]
        Capabilities: [40] PCI-X non-bridge device
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data <?>
        Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
        Kernel driver in use: bnx2
        Kernel modules: bnx2

Any ideas on how to force it on a single CPU?

Thanks for the new patch, I will try it and let you know.
Vladimir Ivashchenko wrote:
> On Wed, May 06, 2009 at 05:36:08AM +0200, Eric Dumazet wrote:
>
>>> Is there any way at least to balance individual NICs on a per-core basis?
>>>
>> Problem of this setup is you have four NICs, but two logical devices (bond0
>> & bond1) and a central HTB thing. This essentially makes flows go through
>> the same locks (some rwlocks guarding the bonding driver, and others
>> guarding HTB structures).
>>
>> Also, when a cpu receives a frame on ethX, it has to forward it on ethY,
>> and another lock guards access to the TX queue of the ethY device. If
>> another cpu receives a frame on ethZ and wants to forward it to ethY, this
>> other cpu will need the same locks and everything slows down.
>>
>> I am pretty sure you could get good results choosing two cpus sharing the
>> same L2 cache. L2 on your cpu is 6MB. Another point would be to carefully
>> choose the size of the RX rings on the ethX devices. You could try to
>> *reduce* them so that the number of in-flight skbs is small enough that
>> everything fits in this 6MB cache.
>>
>> Problem is not really CPU power, but RAM bandwidth. Having two cores
>> instead of one attached to one central memory bank won't increase RAM
>> bandwidth, but reduce it.
>
> Thanks for the detailed explanation.
>
> On the particular server I reported, I worked around the problem by getting
> rid of classes and switching to ingress policers.
>
> However, I have one central box doing HTB, a small number of classes, but
> 850 mbps of traffic. The CPU is a dual-core 5160 @ 3 GHz. With 2.6.29 + bond
> I'm experiencing strange problems with HTB: under high load, borrowing
> doesn't seem to work properly. This box has two BNX2 and two E1000 NICs, and
> for some reason I cannot force BNX2 to sit on a single IRQ - even though I
> put only one CPU into smp_affinity, it keeps balancing on both. So I cannot
> figure out if it's related to IRQ balancing or not.
>
> [root@tshape3 tshaper]# cat /proc/irq/63/smp_affinity
> 01
> [root@tshape3 tshaper]# cat /proc/interrupts | grep eth0
>  63:   44610754   95469129   PCI-MSI-edge   eth0
> [root@tshape3 tshaper]# cat /proc/interrupts | grep eth0
>  63:   44614125   95472512   PCI-MSI-edge   eth0
>
> lspci -v:
>
> 03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
>         Subsystem: Hewlett-Packard Company NC373i Integrated Multifunction Gigabit Server Adapter
>         Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 63
>         Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
>         [virtual] Expansion ROM at 88200000 [disabled] [size=2K]
>         Capabilities: [40] PCI-X non-bridge device
>         Capabilities: [48] Power Management version 2
>         Capabilities: [50] Vital Product Data <?>
>         Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
>         Kernel driver in use: bnx2
>         Kernel modules: bnx2
>
> Any ideas on how to force it on a single CPU?
>
> Thanks for the new patch, I will try it and let you know.

Yes, it's doable but tricky with bnx2; this is a known problem on recent
kernels as well.

You must do for example (to bind on CPU 0):

echo 1 >/proc/irq/default_smp_affinity

ifconfig eth1 down
# IRQ of eth1 handled by CPU0 only
echo 1 >/proc/irq/34/smp_affinity
ifconfig eth1 up

ifconfig eth0 down
# IRQ of eth0 handled by CPU0 only
echo 1 >/proc/irq/36/smp_affinity
ifconfig eth0 up

One thing to consider too is the BIOS option you might have, labeled
"Adjacent Sector Prefetch". This basically tells your cpu to use 128-byte
cache lines instead of 64. In your forwarding workload, I believe this extra
prefetch can slow down your machine.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
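The value Eric writes with `echo 1 >/proc/irq/NN/smp_affinity` is a
hexadecimal CPU bitmask: bit N set means CPU N may service the interrupt. As a
small illustrative sketch (not from the thread; the `cpu_mask` helper name is
made up), the mask for an arbitrary set of CPUs can be computed like this:

```shell
#!/bin/sh
# Build the hex smp_affinity mask for a list of CPU numbers.
# Illustrative helper, assuming bit N of the mask corresponds to CPU N.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "$mask"
}

cpu_mask 0      # CPU0 only   -> 1
cpu_mask 3      # CPU3 only   -> 8
cpu_mask 0 1    # CPU0 + CPU1 -> 3
```

So `echo "$(cpu_mask 0)" >/proc/irq/34/smp_affinity` reproduces the `echo 1`
in Eric's example (IRQ 34 being the IRQ number from his setup, not a general
constant).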
On Wednesday 06 May 2009 13:41:25 Eric Dumazet wrote:
> You must do for example (to bind on CPU 0)
>
> echo 1 >/proc/irq/default_smp_affinity
>
> ifconfig eth1 down
> # IRQ of eth1 handled by CPU0 only
> echo 1 >/proc/irq/34/smp_affinity
> ifconfig eth1 up
>
> ifconfig eth0 down
> # IRQ of eth0 handled by CPU0 only
> echo 1 >/proc/irq/36/smp_affinity
> ifconfig eth0 up

I think it is better to use some method via ethtool that will cause a reset.
When you do "down" you will lose the default route, beware of that.

> One thing to consider too is the BIOS option you might have, labeled
> "Adjacent Sector Prefetch"
>
> This basically tells your cpu to use 128-byte cache lines instead of 64.
>
> In your forwarding workload, I believe this extra prefetch can slow down
> your machine.
On Wed, 2009-05-06 at 05:36 +0200, Eric Dumazet wrote:
> Ah, I forgot about one patch that could help your setup too (if using more
> than one cpu on NIC irqs of course), queued for 2.6.31

I have tried the patch. Didn't make a noticeable difference. Under 850 mbps
HTB+sfq load, 2.6.29.1, four NICs / two bond ifaces, IRQ balancing, the
dual-core server has only 25% idle on each CPU.

What's interesting, the same 850mbps load, identical machine, but with only
two NICs and no bond, HTB+esfq, kernel 2.6.21.2 => 60% CPU idle.
2.5x overhead.

> (commit 6a321cb370ad3db4ba6e405e638b3a42c41089b0)
>
> You could post oprofile results to help us finding other hot spots.
>
> [PATCH] net: netif_tx_queue_stopped too expensive
>
> netif_tx_queue_stopped(txq) is most of the time false.
>
> Yet its cost is very expensive on SMP.
>
> static inline int netif_tx_queue_stopped(const struct netdev_queue *dev_queue)
> {
>         return test_bit(__QUEUE_STATE_XOFF, &dev_queue->state);
> }
>
> I saw this on oprofile hunting and bnx2 driver bnx2_tx_int().
>
> We probably should split "struct netdev_queue" in two parts, one
> being read mostly.
>
> __netif_tx_lock() touches _xmit_lock & xmit_lock_owner, these
> deserve a separate cache line.
>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 2e7783f..1caaebb 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -447,12 +447,18 @@ enum netdev_queue_state_t
>  };
>
>  struct netdev_queue {
> +/*
> + * read mostly part
> + */
>         struct net_device       *dev;
>         struct Qdisc            *qdisc;
>         unsigned long           state;
> -       spinlock_t              _xmit_lock;
> -       int                     xmit_lock_owner;
>         struct Qdisc            *qdisc_sleeping;
> +/*
> + * write mostly part
> + */
> +       spinlock_t              _xmit_lock ____cacheline_aligned_in_smp;
> +       int                     xmit_lock_owner;
>  } ____cacheline_aligned_in_smp;
On Wednesday 06 May 2009 21:45:18 Vladimir Ivashchenko wrote:
> On Wed, 2009-05-06 at 05:36 +0200, Eric Dumazet wrote:
> > Ah, I forgot about one patch that could help your setup too (if using
> > more than one cpu on NIC irqs of course), queued for 2.6.31
>
> I have tried the patch. Didn't make a noticeable difference. Under 850
> mbps HTB+sfq load, 2.6.29.1, four NICs / two bond ifaces, IRQ balancing,
> the dual-core server has only 25% idle on each CPU.
>
> What's interesting, the same 850mbps load, identical machine, but with
> only two NICs and no bond, HTB+esfq, kernel 2.6.21.2 => 60% CPU idle.
> 2.5x overhead.

Probably oprofile can shed some light on this.
In my own experience IRQ balancing hurt performance a lot, because of cache
misses.
On Wed, May 06, 2009 at 10:30:04PM +0300, Denys Fedoryschenko wrote:
> > What's interesting, the same 850mbps load, identical machine, but with
> > only two NICs and no bond, HTB+esfq, kernel 2.6.21.2 => 60% CPU idle.
> > 2.5x overhead.
>
> Probably oprofile can shed some light on this.
> In my own experience IRQ balancing hurt performance a lot, because of cache
> misses.

This is a dual-core machine; isn't the cache shared between the cores?

Without IRQ balancing, one of the cores goes to around 10% idle and HTB
doesn't do its job properly. Actually, in my experience HTB stops working
properly after idle goes below 35%.

I'll try gathering some stats using oprofile.
On Wednesday 06 May 2009 23:47:59 Vladimir Ivashchenko wrote:
> On Wed, May 06, 2009 at 10:30:04PM +0300, Denys Fedoryschenko wrote:
> > > What's interesting, the same 850mbps load, identical machine, but with
> > > only two NICs and no bond, HTB+esfq, kernel 2.6.21.2 => 60% CPU idle.
> > > 2.5x overhead.
> >
> > Probably oprofile can shed some light on this.
> > In my own experience IRQ balancing hurt performance a lot, because of
> > cache misses.
>
> This is a dual-core machine; isn't the cache shared between the cores?
>
> Without IRQ balancing, one of the cores goes to around 10% idle and HTB
> doesn't do its job properly. Actually, in my experience HTB stops working
> properly after idle goes below 35%.

It seems it should be. No idea, more experienced guys should know more.

Can you show me, please:

cat /proc/net/psched

If highres is working, try to add, as the first line of the HTB script,

HZ=1000

to set the environment variable. Because if the clock resolution is high, the
burst calculation goes crazy at high speeds. Maybe it will help.

Also, without irq balance, did you try to assign the interface to a cpu by
smp_affinity? (/proc/irq/NN/smp_affinity)

And still I think the best thing is oprofile. It can show the "hot" places in
the code, who is spending cpu cycles.

> I'll try gathering some stats using oprofile.
> > Without IRQ balancing, one of the cores goes around 10% idle and HTB
> > doesn't do its job properly. Actually, in my experience HTB stops working
> > properly after idle goes below 35%.
>
> It seems they should. No idea, more experienced guys should know more.
>
> Can you show me please
> cat /proc/net/psched
> If it is highres working, try to add in HTB script, first line
>
> HZ=1000
> to set environment variable. Because if clock resolution high, burst
> calculation going crazy on high speeds.
> Maybe it will help.

Wow, instead of a 98425b burst, it's calculating 970203b.

Exporting HZ=1000 doesn't help. However, even if I recompile the kernel to
1000 Hz and the burst is calculated correctly, for some reason HTB on 2.6.29
is still worse at rate control than 2.6.21.

With 2.6.21, ceil of 775 mbits, burst 99425b -> actual rate 825 mbits.
With 2.6.29, same ceil/burst -> actual rate 890 mbits.

Moreover, after I stop the traffic *COMPLETELY* on 2.6.29, the actual rate
reported by htb goes ballistic and stays at 1100mbits. Then it drops back to
the expected value after a minute or so.

> Also without irq balance, did you try to assign interface to cpu by
> smp_affinity? (/proc/irq/NN/smp_affinity)

Yes, I did, it didn't make any difference.

> And still i think best thing is oprofile. It can show "hot" places in code,
> who is spending cpu cycles.

For some reason I get a hard freeze when I start the oprofile daemon, even
without traffic. I've never used oprofile before, so I'm not sure if I'm doing
something wrong ... I'm starting it just with the --vmlinux parameter and
nothing else. I use vanilla 2.6.29 and oprofile from FC8.
On Friday 08 May 2009 23:46:11 Vladimir Ivashchenko wrote:
> > > Without IRQ balancing, one of the cores goes around 10% idle and HTB
> > > doesn't do its job properly. Actually, in my experience HTB stops
> > > working properly after idle goes below 35%.
> >
> > It seems they should. No idea, more experienced guys should know more.
> >
> > Can you show me please
> > cat /proc/net/psched
> > If it is highres working, try to add in HTB script, first line
> >
> > HZ=1000
> > to set environment variable. Because if clock resolution high, burst
> > calculation going crazy on high speeds.
> > Maybe it will help.
>
> Wow, instead of 98425b burst, it's calculating 970203b.

Kind of a strange burst, something is wrong there. For 1000HZ and 1 Gbit it
should be 126375b. Your value is for 8 Gbit/s.
What version of iproute2 are you using (tc -V)?

> Exporting HZ=1000 doesn't help. However, even if I recompile the kernel
> to 1000 Hz and the burst is calculated correctly, for some reason HTB on
> 2.6.29 is still worse at rate control than 2.6.21.
>
> With 2.6.21, ceil of 775 mbits, burst 99425b -> actual rate 825 mbits.
> With 2.6.29, same ceil/burst -> actual rate 890 mbits.

It also depends on whether there are child classes, what bursts are set for
them, and what ceil/burst is set for them.

> Moreover, after I stop the traffic *COMPLETELY* on 2.6.29, actual rate
> reported by htb goes ballistic and stays at 1100mbits. Then it drops
> back to expected value after a minute or so.

It is the average bandwidth over some period, it is not a realtime value.

> > Also without irq balance, did you try to assign interface to cpu by
> > smp_affinity? (/proc/irq/NN/smp_affinity)
>
> Yes, I did, didn't make any difference.

What is the clock source?

cat /sys/devices/system/clocksource/clocksource0/current_clocksource

Timer resolution?

cat /proc/net/psched

> > And still i think best thing is oprofile. It can show "hot" places in
> > code, who is spending cpu cycles.
>
> For some reason I get a hard freeze when I start oprofile daemon, even
> without traffic. Never used oprofile before, so I'm not sure if I'm
> doing something wrong ... I'm starting it just with --vmlinux parameter
> and nothing else. I use vanilla 2.6.29 and oprofile from FC8.
> > Wow, instead of 98425b burst, its calculating 970203b.
>
> Kind of strange burst, something wrong there. For 1000HZ and 1 Gbit it
> should be 126375b. You value is for 8Gbit/s.
> What version of iproute2 you are using ( tc -V )?

That was iproute2-ss080725; I think it is confused by tickless mode.
With iproute2-ss090324 I'm getting the opposite: 1589b :)

> > With 2.6.21, ceil of 775 mbits, burst 99425b -> actual rate 825 mbits.
> > With 2.6.29, same ceil/burst -> actual rate 890 mbits.
>
> It depends also if there is child classes, what is bursts set for them, and
> what is ceil/burst set for them.

All child classes have smaller bursts than the parent. However, there are two
sub-classes which have ceil at 70% of the parent, e.g. ~500mbit each. I don't
know HTB internals; perhaps these two classes make the parent class
overstretch itself.

By the way, I experience the same "overstretching" with hfsc. In any case, I
prefer HTB because it reports statistics of parent classes, unlike hfsc.

> > Moreover, after I stop the traffic *COMPLETELY* on 2.6.29, actual rate
> > reported by htb goes ballistic and stays at 1100mbits. Then it drops
> > back to expected value after a minute or so.
>
> It is average bandwidth for some period, it is not realtime value.

But why would it jump from 850mbits to 1200mbits *AFTER* I remove all the
traffic?

> > Yes, I did, didn't make any difference.
>
> What is a clock source?
> cat /sys/devices/system/clocksource/clocksource0/current_clocksource

tsc

> Timer resolution?
> cat /proc/net/psched

With a tickless kernel:

000003e8 00000400 000f4240 3b9aca00
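The four /proc/net/psched fields above are hexadecimal. Decoded, they read
1000, 1024, 1000000, 1000000000; the last value being 10^9 is commonly read as
the kernel advertising nanosecond timer resolution (my interpretation - the
exact field meanings vary by kernel version, and the `decode_psched` helper
below is only an illustrative sketch):

```shell
# Decode the four hex fields of /proc/net/psched into decimal.
# Illustrative helper; field semantics are kernel-version-dependent.
decode_psched() {
    # Intentionally unquoted: split the single argument into four fields.
    set -- $1
    printf '%d %d %d %d\n' "0x$1" "0x$2" "0x$3" "0x$4"
}

decode_psched "$(cat /proc/net/psched 2>/dev/null || echo '000003e8 00000400 000f4240 3b9aca00')"
```

On the output quoted above this prints `1000 1024 1000000 1000000000`.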
On Saturday 09 May 2009 01:07:27 Vladimir Ivashchenko wrote:
> > > Wow, instead of 98425b burst, its calculating 970203b.
> >
> > Kind of strange burst, something wrong there. For 1000HZ and 1 Gbit it
> > should be 126375b. You value is for 8Gbit/s.
> > What version of iproute2 you are using ( tc -V )?
>
> That was iproute2-ss080725, I think it is confused by tickless mode.
> With iproute2-ss090324 I'm getting an opposite: 1589b :)

And that is too low. That's why I set HZ=1000.

> All child classes have smaller bursts than the parent. However, there are
> two sub-classes which have ceil at 70% of parent, e.g. ~500mbit each. I
> don't know HTB internals, perhaps these two classes make the parent class
> overstretch itself.

As I remember, it is important to keep the sum of the child rates lower than
or equal to the parent rate. And the ceil of the children must not exceed the
ceil of the parent.
Sometimes I got in a mess when I tried to play with the quantum value. After
all that I switched to HFSC, which works for me flawlessly. Maybe we should
give more attention to the HTB problem at high speeds and help the kernel
developers spot the problem, if there is any.

> By the way, I experience the same "overstretching" with hfsc. In any case,
> I prefer HTB because it reports statistics of parent classes, unlike hfsc.

Sometimes that happens when some offloading is enabled on the devices. Check:

ethtool -k device

I think everything except rx/tx checksumming has to be off, at least for the
test.

Disable them with "ethtool -K device tso off", for example.

> But why would it jump from 850mbits to 1200mbits *AFTER* I remove all
> the traffic?

Well, I don't know how it is doing the averaging, maybe even over 1 minute. I
don't like it at all, and that's why I prefer HFSC. But HTB works very well in
some setups.
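The 126375b figure quoted above comes from tc's minimum-burst heuristic:
roughly, the burst must cover at least one timer tick's worth of bytes at the
configured rate, plus about one MTU. The sketch below is my approximation of
that rule of thumb, not a reproduction of iproute2's exact arithmetic (the
`min_burst` helper and the 1600-byte MTU allowance are illustrative):

```shell
# Rough lower bound for an HTB burst at a given rate and timer HZ, in bytes.
# Approximation only: burst ~= bytes-per-second / HZ + one MTU's worth.
min_burst() {
    rate_bits=$1 hz=$2 mtu=$3
    echo $(( rate_bits / 8 / hz + mtu ))
}

min_burst 1000000000 1000 1600   # ~1 Gbit at HZ=1000
```

This prints 126600 for 1 Gbit at HZ=1000, in the same ballpark as the 126375b
Denys quotes; at HZ=100 the same rate needs roughly ten times the burst, which
is why the HZ assumed by tc matters so much here.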
> > All child classes have smaller bursts than the parent. However, there are
> > two sub-classes which have ceil at 70% of parent, e.g. ~500mbit each. I
> > don't know HTB internals, perhaps these two classes make the parent class
> > overstretch itself.
>
> As i remember important to keep sum of child rates lower or equal parent
> rate. Sure ceil of childs must not exceed ceil of parent.
> Sometimes i had mess, when i tried to play with quantum value. After all
> that i switched to HFSC which works for me flawlessly. Maybe we should give
> more attention to HTB problem with high speeds and help kernel developers
> spot problem, if there is any.

In case of HFSC my problem is even worse. With a 775mbit ceiling configured,
it is passing over 900mbit in reality. Moreover, not having statistics for
parent classes makes it difficult to troubleshoot :( I'm 100% sure that it is
900 mbps, I see this on the switch.

Attached is "tc -s -d class show dev bond0" output.

To calculate the total traffic rate:

$ cat hfsc-stat.txt | grep rate | grep Kbit | sed 's/Kbit//' | awk '{ a=a+$2; } END { print a; }'
906955

Did I misconfigure something? ... How can hfsc go above 775mbit when
everything goes via class 1:2 with a 775mbit rate & ul?

> > By the way, I experience the same "overstretching" with hfsc. In any
> > case, I prefer HTB because it reports statistics of parent classes,
> > unlike hfsc.
>
> Sometimes it happen when some offloading enabled on devices.
> Check ethtool -k device

Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off

> I think everything except rx/tx checksumming have to be off, at least for
> test.
>
> Disable them by "ethtool -K device tso off " for example.

Doesn't help.
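The grep/sed/awk pipeline above can be collapsed into a single awk pass. The
sketch below makes the same assumption the original does - that the saved tc
dump contains rate figures as fields ending in "Kbit" - and the `sum_kbit`
helper name is mine, not a real tool:

```shell
# Sum all "...Kbit" figures on rate lines of a saved `tc -s -d class show`
# dump, in one awk pass. Assumes fields like "500000Kbit" (as in the thread).
sum_kbit() {
    awk '/rate/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /Kbit$/) { sub(/Kbit$/, "", $i); total += $i }
    } END { print total + 0 }' "$1"
}

sum_kbit hfsc-stat.txt
```

Note this sums every Kbit-suffixed field on a rate line, so if a line also
carries a ceil figure in Kbit it gets counted too; the original $2-based
pipeline has a similar sensitivity to tc's exact field layout.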
On 17-05-2009 20:46, Vladimir Ivashchenko wrote:
>>> All child classes have smaller bursts than the parent. However, there are
>>> two sub-classes which have ceil at 70% of parent, e.g. ~500mbit each. I
>>> don't know HTB internals, perhaps these two classes make the parent class
>>> overstretch itself.
>> As i remember important to keep sum of child rates lower or equal parent
>> rate. Sure ceil of childs must not exceed ceil of parent.
>> Sometimes i had mess, when i tried to play with quantum value. After all
>> that i switched to HFSC which works for me flawlessly. Maybe we should
>> give more attention to HTB problem with high speeds and help kernel
>> developers spot problem, if there is any.
>
> In case of HFSC my problem is even worse. With 775mbit ceiling
> configured it is passing over 900mbit in reality. Moreover not having
> statistics for parent classes makes it difficult to troubleshoot :( I'm
> 100% sure that it is 900 mbps, I see this on the switch.
>
> Attached is "tc -s -d class show dev bond0" output.
>
> To calculate total traffic rate:
>
> $ cat hfsc-stat.txt | grep rate | grep Kbit | sed 's/Kbit//' | awk
> '{ a=a+$2; } END { print a; }'
> 906955
>
> Did I misconfigure something ?... How can hfsc go above 775mbit when
> everything goes via class 1:2 with 775mbit rate & ul ?

Maybe... It's a lot of checking - it seems simpler test cases could show the
real problem. Anyway, it looks like the sum of the m2 values of 1:2's children
is more than 775Mbit.

>>> By the way, I experience the same "overstretching" with hfsc. In any
>>> case, I prefer HTB because it reports statistics of parent classes,
>>> unlike hfsc.
>> Sometimes it happen when some offloading enabled on devices.
>> Check ethtool -k device
>
> Offload parameters for eth0:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp segmentation offload: off
> udp fragmentation offload: off

Current versions of ethtool should show "generic segmentation offload" too.

I hope you've read the nearby thread "HTB accuracy for high speed", which
explains at least partially some problems/bugs, and maybe you'll try some
patches too (at least one of them addresses the problem you've reported).
Anyway, if you don't find hfsc better for you, I'd be more interested in
tracking this down on htb test cases.

Thanks,
Jarek P.
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2e7783f..1caaebb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -447,12 +447,18 @@ enum netdev_queue_state_t
 };
 
 struct netdev_queue {
+/*
+ * read mostly part
+ */
        struct net_device       *dev;
        struct Qdisc            *qdisc;
        unsigned long           state;
-       spinlock_t              _xmit_lock;
-       int                     xmit_lock_owner;
        struct Qdisc            *qdisc_sleeping;
+/*
+ * write mostly part
+ */
+       spinlock_t              _xmit_lock ____cacheline_aligned_in_smp;
+       int                     xmit_lock_owner;
 } ____cacheline_aligned_in_smp;