Message ID:   49F767FD.2040205@cosmosbay.com
State:        Not Applicable, archived
Delegated to: David Miller
On Tue, 28 Apr 2009, Eric Dumazet wrote:

> The one that did improve your udpping 'bench' :)
>
> http://git2.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff;h=bf368e4e70cd4e0f880923c44e95a4273d725ab4

Well yes that is git2p1. The measurements that we took showed not much
of an effect as you see.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 28 Apr 2009, Eric Dumazet wrote:

> BNX2 interrupts first handle TX completions, then RX events.
> So If by the

Guess there is more to come?
On Tue, 28 Apr 2009, Eric Dumazet wrote:

> Sorry, hit wrong key.
>
> So if by t

You lost 2 more characters..... Keyboard wonky?
Christoph Lameter wrote:

> On Tue, 28 Apr 2009, Eric Dumazet wrote:
>
>> The one that did improve your udpping 'bench' :)
>>
>> http://git2.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff;h=bf368e4e70cd4e0f880923c44e95a4273d725ab4
>
> Well yes that is git2p1. The measurements that we took showed not much of
> an effect as you see.

It depends on the coalescing parameters of the NIC.

BNX2 interrupts first handle TX completions, then RX events.
So If by the
Eric Dumazet wrote:

> Christoph Lameter wrote:
>> On Tue, 28 Apr 2009, Eric Dumazet wrote:
>> [...]
>> Well yes that is git2p1. The measurements that we took showed not much of
>> an effect as you see.
>
> It depends on the coalescing parameters of the NIC.
>
> BNX2 interrupts first handle TX completions, then RX events.
> So If by the

Sorry, hit wrong key.

So if by t
Eric Dumazet wrote:

> Christoph Lameter wrote:
>> [...]
>
> It depends on the coalescing parameters of the NIC.
>
> BNX2 interrupts first handle TX completions, then RX events.
> So If by the

Sorry for the previous message...

If by the time the interrupt comes to the host, TX was handled right
before the RX event, the extra wakeup is not a problem, because the
incoming frame will be delivered into the socket queue right before the
awakened thread tries to pull it.

On real workloads (many incoming/outgoing frames), avoiding extra
wakeups is a win, regardless of coalescing parameters and cpu
affinities...

On udpping, I had prior to the patch about 49000 wakeups per second,
and after patch about 26000 wakeups per second (matches number of
incoming udp messages per second)
* Eric Dumazet <dada1@cosmosbay.com> wrote:

> On udpping, I had prior to the patch about 49000 wakeups per
> second, and after patch about 26000 wakeups per second (matches
> number of incoming udp messages per second)

very nice. It might not show up as a real performance difference if
the CPUs are not fully saturated during the test - but it could show
up as a decrease in CPU utilization.

Also, if you run the test via 'perf stat -a ./test.sh' you should
see a reduction in instructions executed:

aldebaran:~/linux/linux> perf stat -a sleep 1

 Performance counter stats for 'sleep':

   16128.045994  task clock ticks     (msecs)
          12876  context switches     (events)
            219  CPU migrations       (events)
         186144  pagefaults           (events)
    20911802763  CPU cycles           (events)
    19309416815  instructions         (events)
      199608554  cache references     (events)
       19990754  cache misses         (events)

 Wall-clock time elapsed:  1008.882282 msecs

With -a it's measured system-wide, from start of test to end of test
- the results will be a lot more stable (and relevant) statistically
than wall-clock time or CPU usage measurements. (both of which are
rather imprecise in general)

	Ingo
Ingo Molnar wrote:
> * Eric Dumazet <dada1@cosmosbay.com> wrote:
>
>> On udpping, I had prior to the patch about 49000 wakeups per
>> second, and after patch about 26000 wakeups per second (matches
>> number of incoming udp messages per second)
>
> very nice. It might not show up as a real performance difference if
> the CPUs are not fully saturated during the test - but it could show
> up as a decrease in CPU utilization.
>
> Also, if you run the test via 'perf stat -a ./test.sh' you should
> see a reduction in instructions executed:
>
> [...]
>
> With -a it's measured system-wide, from start of test to end of test
> - the results will be a lot more stable (and relevant) statistically
> than wall-clock time or CPU usage measurements. (both of which are
> rather imprecise in general)

I tried this perf stuff and got strange results on a cpu burning
bench, saturating my 8 cpus with a "while (1) ;" loop

# perf stat -a sleep 10

 Performance counter stats for 'sleep':

   80334.709038  task clock ticks     (msecs)
          80638  context switches     (events)
              4  CPU migrations       (events)
            468  pagefaults           (events)
   160694681969  CPU cycles           (events)
   160127154810  instructions         (events)
         686393  cache references     (events)
         230117  cache misses         (events)

 Wall-clock time elapsed: 10041.531644 msecs

So it's about 16069468196 cycles per second for 8 cpus.
Divide by 8 to get 2008683524 cycles per second per cpu,
which is not 3000000000 (E5450 @ 3.00GHz).

It seems strange a "jmp myself" uses one unhalted cycle per
instruction and 0.5 halted cycle ...

Also, after using "perf stat", tbench results are 1778 MB/s
instead of 2610 MB/s. Even if no perf stat is running.
* Eric Dumazet <dada1@cosmosbay.com> wrote:

> Ingo Molnar wrote:
> [...]
>
> I tried this perf stuff and got strange results on a cpu burning
> bench, saturating my 8 cpus with a "while (1) ;" loop
>
> # perf stat -a sleep 10
> [...]
>
> So it's about 16069468196 cycles per second for 8 cpus.
> Divide by 8 to get 2008683524 cycles per second per cpu,
> which is not 3000000000 (E5450 @ 3.00GHz)

What does "perf stat -l -a sleep 10" show? I suspect your counters
are scaled by about 67%, due to counter over-commit. -l will show
the scaling factor (and will scale up the results).

If so then i think this behavior is confusing, and i'll make -l
default-enabled. (in fact i just committed this change to latest
-tip and pushed it out)

To get only instructions and cycles, do:

  perf stat -e instructions -e cycles

> It seems strange a "jmp myself" uses one unhalted cycle per
> instruction and 0.5 halted cycle ...
>
> Also, after using "perf stat", tbench results are 1778 MB/s
> instead of 2610 MB/s. Even if no perf stat running.

Hm, that would be a bug. Could you send the dmesg output of:

  echo p > /proc/sysrq-trigger
  echo p > /proc/sysrq-trigger

with counters running it will show something like:

[  868.105712] SysRq : Show Regs
[  868.106544]
[  868.106544] CPU#1: ctrl:       ffffffffffffffff
[  868.106544] CPU#1: status:     0000000000000000
[  868.106544] CPU#1: overflow:   0000000000000000
[  868.106544] CPU#1: fixed:      0000000000000000
[  868.106544] CPU#1: used:       0000000000000000
[  868.106544] CPU#1: gen-PMC0 ctrl:  00000000001300c0
[  868.106544] CPU#1: gen-PMC0 count: 000000ffee889194
[  868.106544] CPU#1: gen-PMC0 left:  0000000011e1791a
[  868.106544] CPU#1: gen-PMC1 ctrl:  000000000013003c
[  868.106544] CPU#1: gen-PMC1 count: 000000ffd2542438
[  868.106544] CPU#1: gen-PMC1 left:  000000002dd17a8e

the counts should stay put (i.e. all counters should be disabled).
If they move around - despite there being no 'perf stat -a' session
running, that would be a bug.

Also, the overhead might be profile-able, via:

  perf record -m 1024 sleep 10

(this records the profile into output.perf.)

followed by:

  ./perf-report | tail -20

to display a histogram, with kernel-space and user-space symbols
mixed into a single profile.

(Pick up latest -tip to get perf-report built by default.)

	Ingo
Ingo Molnar wrote:
> * Eric Dumazet <dada1@cosmosbay.com> wrote:
> [...]
>
> What does "perf stat -l -a sleep 10" show? I suspect your counters
> are scaled by about 67%, due to counter over-commit. -l will show
> the scaling factor (and will scale up the results).

Only difference I see with '-l' is cache misses not counted.

(tbench 8 running, so not one instruction per cycle)

# perf stat -l -a sleep 10

 Performance counter stats for 'sleep':

   80007.128844  task clock ticks     (msecs)
        6754642  context switches     (events)
              2  CPU migrations       (events)
            474  pagefaults           (events)
   160925719143  CPU cycles           (events)
   108482003620  instructions         (events)
     7584035056  cache references     (events)
  <not counted>  cache misses

 Wall-clock time elapsed: 10000.595448 msecs

# perf stat -a sleep 10

 Performance counter stats for 'sleep':

   80702.908287  task clock ticks     (msecs)
        6792588  context switches     (events)
             24  CPU migrations       (events)
           4867  pagefaults           (events)
   161957342744  CPU cycles           (events)
   109147553984  instructions         (events)
     7633190481  cache references     (events)
       22996234  cache misses         (events)

 Wall-clock time elapsed: 10087.502391 msecs

> If so then i think this behavior is confusing, and i'll make -l
> default-enabled. (in fact i just committed this change to latest
> -tip and pushed it out)
>
> To get only instructions and cycles, do:
>
>   perf stat -e instructions -e cycles

# perf stat -e instructions -e cycles -a sleep 10

 Performance counter stats for 'sleep':

   109469842392  instructions         (events)
   162012922122  CPU cycles           (events)

 Wall-clock time elapsed: 10124.943544 msecs

I am wondering if cpus are not running at 2 GHz ;)

> Hm, that would be a bug. Could you send the dmesg output of:
>
>   echo p > /proc/sysrq-trigger
>   echo p > /proc/sysrq-trigger
>
> with counters running it will show something like:
> [...]

They stay fixed (but only CPU#0 is displayed)

Is perf able to display per cpu counters, and not aggregated values?

[ 7894.426787] CPU#0: ctrl:       ffffffffffffffff
[ 7894.426788] CPU#0: status:     0000000000000000
[ 7894.426790] CPU#0: overflow:   0000000000000000
[ 7894.426792] CPU#0: fixed:      0000000000000000
[ 7894.426793] CPU#0: used:       0000000000000000
[ 7894.426796] CPU#0: gen-PMC0 ctrl:  0000000000134f2e
[ 7894.426798] CPU#0: gen-PMC0 count: 000000ffb91e31e1
[ 7894.426799] CPU#0: gen-PMC0 left:  000000007fffffff
[ 7894.426802] CPU#0: gen-PMC1 ctrl:  000000000013412e
[ 7894.426804] CPU#0: gen-PMC1 count: 000000ff80312b23
[ 7894.426805] CPU#0: gen-PMC1 left:  000000007fffffff
[ 7894.426807] CPU#0: fixed-PMC0 count: 000000ffacf54a68
[ 7894.426809] CPU#0: fixed-PMC1 count: 000000ffb71cfe02
[ 7894.426811] CPU#0: fixed-PMC2 count: 0000000000000000
[ 7905.522262] SysRq : Show Regs
[ 7905.522277]
[ 7905.522279] CPU#0: ctrl:       ffffffffffffffff
[ 7905.522281] CPU#0: status:     0000000000000000
[ 7905.522283] CPU#0: overflow:   0000000000000000
[ 7905.522284] CPU#0: fixed:      0000000000000000
[ 7905.522286] CPU#0: used:       0000000000000000
[ 7905.522288] CPU#0: gen-PMC0 ctrl:  0000000000134f2e
[ 7905.522290] CPU#0: gen-PMC0 count: 000000ffb91e31e1
[ 7905.522292] CPU#0: gen-PMC0 left:  000000007fffffff
[ 7905.522294] CPU#0: gen-PMC1 ctrl:  000000000013412e
[ 7905.522296] CPU#0: gen-PMC1 count: 000000ff80312b23
[ 7905.522298] CPU#0: gen-PMC1 left:  000000007fffffff
[ 7905.522299] CPU#0: fixed-PMC0 count: 000000ffacf54a68
[ 7905.522301] CPU#0: fixed-PMC1 count: 000000ffb71cfe02
[ 7905.522303] CPU#0: fixed-PMC2 count: 0000000000000000

> the counts should stay put (i.e. all counters should be disabled).
> If they move around - despite there being no 'perf stat -a' session
> running, that would be a bug.

I rebooted my machine then got good results

# perf stat -e instructions -e cycles -a sleep 10

 Performance counter stats for 'sleep':

   240021659058  instructions  (events)
   240997984836  CPU cycles    (events)   << OK >>

 Wall-clock time elapsed: 10041.499326 msecs

But if I use plain "perf stat -a sleep 10", it seems I get wrong
values again (16 G cycles/sec) for all next perf sessions

# perf stat -a sleep 10

 Performance counter stats for 'sleep':

   80332.718661  task clock ticks   (msecs)
          80602  context switches   (events)
              4  CPU migrations     (events)
            473  pagefaults         (events)
   160665397757  CPU cycles         (events)   << bad >>
   160079277974  instructions       (events)
         793665  cache references   (events)
         226829  cache misses       (events)

 Wall-clock time elapsed: 10041.302969 msecs

# perf stat -e cycles -a sleep 10

 Performance counter stats for 'sleep':

   160665176575  CPU cycles  (events)   << bad >>

 Wall-clock time elapsed: 10041.491503 msecs

> Also, the overhead might be profile-able, via:
>
>   perf record -m 1024 sleep 10
> [...]
>
> (Pick up latest -tip to get perf-report built by default.)

Thanks, this is what I use currently
Eric Dumazet wrote:

> But if I use plain "perf stat -a sleep 10"
> it seems I get wrong values again (16 G cycles/sec) for all next perf sessions

Well, I confirm all my cpus switched from 3GHz to 2GHz, after

  "perf stat -a sleep 10"

(but "perf stat -e instructions -e cycles -a sleep 10" doesn't trigger
this problem)

Nothing logged, and /proc/cpuinfo still reports 3 GHz frequencies

# cat unit.c
main() {
        int i;
        for (i = 0 ; i < 10000000; i++)
                getppid();
}
# time ./unit

real    0m0.818s
user    0m0.289s
sys     0m0.529s
# perf stat -a sleep 10 2>/dev/null
# time ./unit

real    0m1.122s
user    0m0.482s
sys     0m0.640s

# tail -n 27 /proc/cpuinfo
processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU E5450 @ 3.00GHz
stepping        : 6
cpu MHz         : 3000.102
cache size      : 6144 KB
physical id     : 1
siblings        : 1
core id         : 3
cpu cores       : 4
apicid          : 7
initial apicid  : 7
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm tpr_shadow vnmi flexpriority
bogomips        : 6000.01
clflush size    : 64
power management:

# grep CPU_FREQ .config
# CONFIG_CPU_FREQ is not set

perf_counter seems promising, but still... needs some bug hunting :)

Thank you
On Thu, Apr 30, 2009 at 04:08:48PM +0200, Eric Dumazet wrote:
> Ingo Molnar wrote:
> [...]
>
> I rebooted my machine then got good results
>
> # perf stat -e instructions -e cycles -a sleep 10
>
>  Performance counter stats for 'sleep':
>
>    240021659058  instructions  (events)
>    240997984836  CPU cycles    (events)   << OK >>
>
>  Wall-clock time elapsed: 10041.499326 msecs
>
> But if I use plain "perf stat -a sleep 10"
> it seems I get wrong values again (16 G cycles/sec) for all next perf sessions

I have to ask... Is it possible that the machine runs at 3GHz
initially, but slows down to 2GHz for cooling reasons?

One thing to try would be to run powertop, which displays the
frequencies. I get the following if mostly idle:

PowerTOP version 1.8      (C) 2007 Intel Corporation

Cn                Avg residency       P-states (frequencies)
C0 (cpu running)        (14.1%)         2.17 Ghz     4.3%
C1                0.0ms ( 0.0%)         1.67 Ghz     0.0%
C2                0.5ms (16.2%)         1333 Mhz     0.0%
C3                0.5ms (69.8%)         1000 Mhz    95.7%

And the following with an infinite loop running:

PowerTOP version 1.8      (C) 2007 Intel Corporation

Cn                Avg residency       P-states (frequencies)
C0 (cpu running)        (54.3%)         2.17 Ghz   100.0%
C1                0.0ms ( 0.0%)         1.67 Ghz     0.0%
C2                1.2ms ( 1.7%)         1333 Mhz     0.0%
C3                1.3ms (44.0%)         1000 Mhz     0.0%

But I am probably missing the point here...

						Thanx, Paul
Eric Dumazet wrote:

> Eric Dumazet wrote:
>
>> But if I use plain "perf stat -a sleep 10"
>> it seems I get wrong values again (16 G cycles/sec) for all next perf sessions
>
> Well, I confirm all my cpus switched from 3GHz to 2GHz, after
>
>   "perf stat -a sleep 10"
>
> (but "perf stat -e instructions -e cycles -a sleep 10" doesn't trigger
> this problem)
> [...]
>
> perf_counter seems promising, but still... needs some bug hunting :)

Update: Mike Galbraith suggested I try various things, and finally, I
discovered this frequency change was probably a BIOS problem on my
HP BL460c G1

System Options -> Power regulator for Proliant
  [*] HP Dynamic Power Savings Mode
  [ ] HP Static Low Power Mode
  [ ] HP Static High Performance Mode
  [ ] OS Control Mode

I switched it to 'OS Control Mode'

Then acpi-cpufreq could load, and no more frequency changes on a
"perf -a sleep 10" session, using or not cpufreq.

(Supported cpufreq speeds on these cpus : 1999 & 2999 MHz)

So it was a BIOS issue

# perf stat -a sleep 10

 Performance counter stats for 'sleep':

   80005.418223  task clock ticks   (msecs)
          80266  context switches   (events)
              3  CPU migrations     (events)
            486  pagefaults         (events)
   240013851624  CPU cycles         (events)   << good >>
   239076501419  instructions       (events)
         679464  cache references   (events)
  <not counted>  cache misses

 Wall-clock time elapsed: 10000.468808 msecs

Thank you
* Eric Dumazet <dada1@cosmosbay.com> wrote:

> [quoted test program and /proc/cpuinfo dump trimmed]
>
> Update:
>
> Mike Galbraith suggested that I try various things, and I finally
> discovered that this frequency change was probably a BIOS problem
> on my HP BL460c G1:
>
> System Options -> Power regulator for Proliant
>
>   [*] HP Dynamic Power Savings Mode
>   [ ] HP Static Low Power Mode
>   [ ] HP Static High Performance Mode
>   [ ] OS Control Mode
>
> I switched it to 'OS Control Mode'.
>
> Then acpi-cpufreq could load, and there are no more frequency changes
> during a "perf -a sleep 10" session, whether or not cpufreq is used.
> (Supported cpufreq speeds on these cpus: 1999 & 2999 MHz)
>
> So it was a BIOS issue.

ah! That makes quite a bit of sense. The BIOS interfering with an OS
feature ... Was that the default setting in the BIOS?

> # perf stat -a sleep 10
>
>  Performance counter stats for 'sleep':
>
>    80005.418223  task clock ticks  (msecs)
>           80266  context switches  (events)
>               3  CPU migrations    (events)
>             486  pagefaults        (events)
>    240013851624  CPU cycles        (events) << good >>
>    239076501419  instructions      (events)
>          679464  cache references  (events)
>   <not counted>  cache misses
>
>  Wall-clock time elapsed: 10000.468808 msecs

That looks perfect now.

It would also be really nice to have a sysrq-p dump of your PMU
state before you've done any profiling. Is there any trace of the
BIOS meddling with them, that we could detect (and warn about)
during bootup?

Thanks,

	Ingo
Ingo Molnar wrote:
> * Eric Dumazet <dada1@cosmosbay.com> wrote:
>
>> [...]
>>
>> I switched it to 'OS Control Mode'.
>>
>> Then acpi-cpufreq could load, and there are no more frequency changes
>> during a "perf -a sleep 10" session, whether or not cpufreq is used.
>> (Supported cpufreq speeds on these cpus: 1999 & 2999 MHz)
>>
>> So it was a BIOS issue.
>
> ah! That makes quite a bit of sense. The BIOS interfering with an OS
> feature ... Was that the default setting in the BIOS?

This was the default setting in the BIOS, yes.

>> [perf stat output trimmed]
>
> That looks perfect now.
>
> It would also be really nice to have a sysrq-p dump of your PMU
> state before you've done any profiling. Is there any trace of the
> BIOS meddling with them, that we could detect (and warn about)
> during bootup?

The difference is that with the BIOS set to 'OS Control Mode' I see one more entry in the ACPI list:

[    0.000000] ACPI: SSDT cfe5b000 004C9 (v01     HP   SSDTP 00000001 INTL 20030228)
...

and these 8 additional lines after it (one per cpu):

[    0.706697] ACPI: SSDT cfe5c000 002DA (v01     HP   SSDT0 00000001 INTL 20030228)
[    0.707250] ACPI: SSDT cfe5c300 002DA (v01     HP   SSDT1 00000001 INTL 20030228)
[    0.707768] ACPI: SSDT cfe5c600 002DA (v01     HP   SSDT2 00000001 INTL 20030228)
[    0.708376] ACPI: SSDT cfe5c900 002DF (v01     HP   SSDT3 00000001 INTL 20030228)
[    0.708964] ACPI: SSDT cfe5cc00 002DA (v01     HP   SSDT4 00000001 INTL 20030228)
[    0.709567] ACPI: SSDT cfe5cf00 002DA (v01     HP   SSDT5 00000001 INTL 20030228)
[    0.710122] ACPI: SSDT cfe5d200 002DA (v01     HP   SSDT6 00000001 INTL 20030228)
[    0.710713] ACPI: SSDT cfe5d500 002DA (v01     HP   SSDT7 00000001 INTL 20030228)

Also, if this option is left at the default (HP Dynamic Power Savings Mode) I get:

# modprobe acpi-cpufreq
FATAL: Error inserting acpi_cpufreq (/lib/modules/2.6.30-rc4-tip-01560-gdd5fa92/kernel/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.ko): No such device

but no kernel message is logged.

It might be possible to add some kind of warning, yes; I can test a patch if you want.
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 5d631c1..bc02463 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -440,13 +440,15 @@ void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait,
 int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
 int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
 
-#define DEFINE_WAIT(name)						\
+#define DEFINE_WAIT_FUNC(name, function)				\
 	wait_queue_t name = {						\
 		.private	= current,				\
-		.func		= autoremove_wake_function,		\
+		.func		= function,				\
 		.task_list	= LIST_HEAD_INIT((name).task_list),	\
 	}
 
+#define DEFINE_WAIT(name) DEFINE_WAIT_FUNC(name, autoremove_wake_function)
+
 #define DEFINE_WAIT_BIT(name, word, bit)				\
 	struct wait_bit_queue name = {					\
 		.key = __WAIT_BIT_KEY_INITIALIZER(word, bit),		\
diff --git a/net/core/datagram.c b/net/core/datagram.c
index d0de644..b7960a3 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -64,13 +64,25 @@ static inline int connection_based(struct sock *sk)
 	return sk->sk_type == SOCK_SEQPACKET || sk->sk_type == SOCK_STREAM;
 }
 
+static int receiver_wake_function(wait_queue_t *wait, unsigned mode, int sync,
+				  void *key)
+{
+	unsigned long bits = (unsigned long)key;
+
+	/*
+	 * Avoid a wakeup if event not interesting for us
+	 */
+	if (bits && !(bits & (POLLIN | POLLERR)))
+		return 0;
+	return autoremove_wake_function(wait, mode, sync, key);
+}
 /*
  * Wait for a packet..
  */
 static int wait_for_packet(struct sock *sk, int *err, long *timeo_p)
 {
 	int error;
-	DEFINE_WAIT(wait);
+	DEFINE_WAIT_FUNC(wait, receiver_wake_function);
 
 	prepare_to_wait_exclusive(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);