Message ID | 20140605123701.GF9145@spoyarek.pnq.redhat.com |
---|---|
State | New |
Headers | show |
On Thu, 2014-06-05 at 18:07 +0530, Siddhesh Poyarekar wrote: > -bench-pthread := pthread_once > +bench-pthread := pthread_once pthread_rwlock_test Could we rename this to pthread_rwlock_uncontended or pthread_rwlock_1thread to signal that this measures just the single-thread overheads on an uncontended rwlock?
On Thu, Jun 05, 2014 at 06:07:01PM +0530, Siddhesh Poyarekar wrote: > On Sat, May 03, 2014 at 12:48:31PM +0200, Ondřej Bílka wrote: > > Its nice to have but orthogonal on proposed change, so it should not > > block this patch > > This looks like something that could easily use the $bench-inputs > format since you're really just looking for mean times. I haven't > checked to see if a failed branch would be expensive compared to the > lock/unlock cycle we're testing, but I'm guessing that it should get > cached and hence not have an impact since we don't interleave the > rdlock and wrlock calls. > Anyway I realized that microbenchmarks may be wrong way here. How it is likely that you have lock in L1 cache? Andy any insigth? I could try to simulate it by using many locks and choosing one at random, how that would work. Also Siddhesh, benchmarks are useless unless you use their result so what did you measured with your benchmark and what do you thing about Andi's patch.
On 5 June 2014 18:27, Ondřej Bílka <neleai@seznam.cz> wrote: > Also Siddhesh, benchmarks are useless unless you use their result so > what did you measured with your benchmark and what do you thing about > Andi's patch. I haven't looked at Andi's patch and I haven't done any comparative testing; I merely proposed a test similar to your test in the standard format to measure uncontended rdlock/wrlock+unlock performance because such an example of framework usage is not present in the current tests. Torvald, I'm not very picky about the name if we find such a test useful to include in the microbenchmark. I haven't actually looked at the original problem statement yet to make any comments about that. Siddhesh
On Thu, 2014-06-05 at 14:57 +0200, Ondřej Bílka wrote: > On Thu, Jun 05, 2014 at 06:07:01PM +0530, Siddhesh Poyarekar wrote: > > On Sat, May 03, 2014 at 12:48:31PM +0200, Ondřej Bílka wrote: > > > Its nice to have but orthogonal on proposed change, so it should not > > > block this patch > > > > This looks like something that could easily use the $bench-inputs > > format since you're really just looking for mean times. I haven't > > checked to see if a failed branch would be expensive compared to the > > lock/unlock cycle we're testing, but I'm guessing that it should get > > cached and hence not have an impact since we don't interleave the > > rdlock and wrlock calls. > > > Anyway I realized that microbenchmarks may be wrong way here. > > How it is likely that you have lock in L1 cache? That can indeed be likely, depending on the way the program uses the locks. Especially for a rdlock this can be likely. > Andy any insigth? > I could try to simulate it by using many locks and choosing one at > random, how that would work. I don't think that's necessary. The purpose of this benchmark is to measure the single-thread overheads, and whether they differ between a C implementation and an assembly implementation. Assuming that does implement the same algorithm, they should roughly make the same memory accesses. > Also Siddhesh, benchmarks are useless unless you use their result so > what did you measured with your benchmark and what do you thing about > Andi's patch. The fact that Siddhesh reviewed it doesn't imply that he has to measure, nor that the benchmark would be useless.
On Thu, Jun 05, 2014 at 03:06:24PM +0200, Torvald Riegel wrote: > On Thu, 2014-06-05 at 14:57 +0200, Ondřej Bílka wrote: > > On Thu, Jun 05, 2014 at 06:07:01PM +0530, Siddhesh Poyarekar wrote: > > > On Sat, May 03, 2014 at 12:48:31PM +0200, Ondřej Bílka wrote: > > > > Its nice to have but orthogonal on proposed change, so it should not > > > > block this patch > > > > > > This looks like something that could easily use the $bench-inputs > > > format since you're really just looking for mean times. I haven't > > > checked to see if a failed branch would be expensive compared to the > > > lock/unlock cycle we're testing, but I'm guessing that it should get > > > cached and hence not have an impact since we don't interleave the > > > rdlock and wrlock calls. > > > > > Anyway I realized that microbenchmarks may be wrong way here. > > > > How it is likely that you have lock in L1 cache? > > That can indeed be likely, depending on the way the program uses the > locks. Especially for a rdlock this can be likely. > That I wanted to hear. > > Andy any insigth? > > I could try to simulate it by using many locks and choosing one at > > random, how that would work. > > I don't think that's necessary. The purpose of this benchmark is to > measure the single-thread overheads, and whether they differ between a C > implementation and an assembly implementation. Assuming that does > implement the same algorithm, they should roughly make the same memory > accesses. > That could depend on scheduling but rwlock code looks like taking gcc -S output anyway. > > Also Siddhesh, benchmarks are useless unless you use their result so > > what did you measured with your benchmark and what do you thing about > > Andi's patch. > > The fact that Siddhesh reviewed it doesn't imply that he has to measure, > nor that the benchmark would be useless. For benchmark review you should definitely run it to check if result make sense. It helps to prevent oops like what happened to me when I checked pthread_once it took always few cycles and problem turned out to be not linking with pthread. Siddhesh send a alternate benchmark so somebody needs to stand before it. Who will do it? I wont as I am convinced by previous benchmark. Will one of you stop bikesheding and say that there is really no difference between assembly and c rwlock implementation or will you keep arguing about name of benchmark for next month?
On Thu, 2014-06-05 at 15:51 +0200, Ondřej Bílka wrote: > Will one of you stop bikesheding and say that there is really no difference > between assembly and c rwlock implementation or will you keep arguing > about name of benchmark for next month? I find the tone of this message inappropriate. YMMV -- but don't be surprised if that doesn't make your stuff bounce up on my priority list.
On 5 June 2014 19:21, Ondřej Bílka <neleai@seznam.cz> wrote: > For benchmark review you should definitely run it to check if result > make sense. It helps to prevent oops like what happened to me when I > checked pthread_once it took always few cycles and problem turned out to > be not linking with pthread. If sanity is what you're looking for, i.e. the requisite calls not being optimized out, then yes, I did verify that and even ran it once against current master: "pthread_rwlock_test": { "rwlock": { "duration": 2.87958e+09, "iterations": 2.7688e+07, "max": 172.944, "min": 97.516, "mean": 104.001 }, "rdlock": { "duration": 2.88252e+09, "iterations": 2.7022e+07, "max": 252.66, "min": 101.541, "mean": 106.673 } } > Siddhesh send a alternate benchmark so somebody needs to stand before > it. Who will do it? I wont as I am convinced by previous benchmark. The benchmark you proposed is equivalent in terms of stuff we're measuring and I was only trying to help you get your benchmark into the standard format so that you don't have to redo the json output or leave that for someone else to do. If uncontended performance is important then this benchmark is useful regardless of whether it shows that the assembly implementation is better than the C one or not. In fact you argued in your patch proposal that the benchmark itself is orthogonal to Andi's patch. Did you change your mind about that? > Will one of you stop bikesheding and say that there is really no difference > between assembly and c rwlock implementation or will you keep arguing > about name of benchmark for next month? I like my benchmark name in italics :) Siddhesh
On Thu, Jun 05, 2014 at 09:23:04PM +0530, Siddhesh Poyarekar wrote: > On 5 June 2014 19:21, Ondřej Bílka <neleai@seznam.cz> wrote: > > For benchmark review you should definitely run it to check if result > > make sense. It helps to prevent oops like what happened to me when I > > checked pthread_once it took always few cycles and problem turned out to > > be not linking with pthread. > > If sanity is what you're looking for, i.e. the requisite calls not > being optimized out, then yes, I did verify that and even ran it once > against current master: > > "pthread_rwlock_test": { > "rwlock": { > "duration": 2.87958e+09, > "iterations": 2.7688e+07, > "max": 172.944, > "min": 97.516, > "mean": 104.001 > }, > "rdlock": { > "duration": 2.88252e+09, > "iterations": 2.7022e+07, > "max": 252.66, > "min": 101.541, > "mean": 106.673 > } > } > It is that you must always look for ways that could make benchmark invalid. Also I noticed that this takes long to run, did you also noticed that? > > Siddhesh send a alternate benchmark so somebody needs to stand before > > it. Who will do it? I wont as I am convinced by previous benchmark. > > The benchmark you proposed is equivalent in terms of stuff we're > measuring and I was only trying to help you get your benchmark into > the standard format so that you don't have to redo the json output or > leave that for someone else to do. I am ok also with this benchmark but its your benchmark so I treated it like that. > If uncontended performance is > important then this benchmark is useful regardless of whether it shows > that the assembly implementation is better than the C one or not. That is true to some extend, but lets assume that we check this patch in other way and commit your benchmark. Do you thing that next time when there comes a patch touching rdlocks somebody will do a comparison if nobody did it first time? > In fact you argued in your patch proposal that the benchmark itself is > orthogonal to Andi's patch. Did you change your mind about that? > Its orthogonal what benchmark we use. However as Carlos said that he wants a benchtest before it could be commited we should settle for something. > > Will one of you stop bikesheding and say that there is really no difference > > between assembly and c rwlock implementation or will you keep arguing > > about name of benchmark for next month? > > I like my benchmark name in italics :) > > Siddhesh > -- > http://siddhesh.in
On Thu, Jun 05, 2014 at 06:07:01PM +0530, Siddhesh Poyarekar wrote: > On Sat, May 03, 2014 at 12:48:31PM +0200, Ondřej Bílka wrote: > > Its nice to have but orthogonal on proposed change, so it should not > > block this patch > > This looks like something that could easily use the $bench-inputs > format since you're really just looking for mean times. I haven't > checked to see if a failed branch would be expensive compared to the > lock/unlock cycle we're testing, but I'm guessing that it should get > cached and hence not have an impact since we don't interleave the > rdlock and wrlock calls. > I assume that you asked about branch misprediction, that should not be problem here, there are two ways how that it could happen. First one is that it was not cache but then this is in cold path so it is not that interesting. Second possibility is that we took different path previous time, this is also not that interesting because previous call went to slow path which was order of magnitude slower so its contribution is tiny. A third case where misprediction can happen is that there are parallel readers. That would need a different benchmark.
On 6 June 2014 01:24, Ondřej Bílka <neleai@seznam.cz> wrote: >> "pthread_rwlock_test": { >> "rwlock": { >> "duration": 2.87958e+09, >> "iterations": 2.7688e+07, >> "max": 172.944, >> "min": 97.516, >> "mean": 104.001 >> }, >> "rdlock": { >> "duration": 2.88252e+09, >> "iterations": 2.7022e+07, >> "max": 252.66, >> "min": 101.541, >> "mean": 106.673 >> } >> } >> > It is that you must always look for ways that could make benchmark > invalid. > Also I noticed that this takes long to run, did you also noticed that? The standard benchmarks run only for BENCH_DURATION time and the above was for 1 second, so I'm not sure how I should be concluding whether it 'takes long' or not. > That is true to some extend, but lets assume that we check this patch in > other way and commit your benchmark. Do you thing that next time when > there comes a patch touching rdlocks somebody will do a comparison if > nobody did it first time? Why not? That is the point of the microbenchmarks - if there is a significant change in a function and there's a microbenchmark for it, the submitter should post the before and after results of the change. Since we don't have any disagreement over the approach and the generated code also is not a problem (as you pointed out in your other response) there should be no problem with getting the benchmark in and letting Andi use it to compare the before and after results. I don't even mind waiting for Andi to revert with results before committing the benchmark if that's what you want. Siddhesh
diff --git a/benchtests/Makefile b/benchtests/Makefile index 63a5a7f..d74de80 100644 --- a/benchtests/Makefile +++ b/benchtests/Makefile @@ -25,7 +25,7 @@ include ../Makeconfig bench-math := acos acosh asin asinh atan atanh cos cosh exp exp2 ffs ffsll \ log log2 modf pow rint sin sincos sinh sqrt tan tanh -bench-pthread := pthread_once +bench-pthread := pthread_once pthread_rwlock_test bench := $(bench-math) $(bench-pthread) diff --git a/benchtests/pthread_rwlock_test-inputs b/benchtests/pthread_rwlock_test-inputs new file mode 100644 index 0000000..9122afb --- /dev/null +++ b/benchtests/pthread_rwlock_test-inputs @@ -0,0 +1,8 @@ +## args: int +## includes: pthread.h +## include-sources: pthread_rwlock_test-source.c + +##name: rdlock +0 +##name: rwlock +1 diff --git a/benchtests/pthread_rwlock_test-source.c b/benchtests/pthread_rwlock_test-source.c new file mode 100644 index 0000000..c679f7b --- /dev/null +++ b/benchtests/pthread_rwlock_test-source.c @@ -0,0 +1,13 @@ + +static pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER; + +static void +pthread_rwlock_test (int write) +{ + if (write) + pthread_rwlock_wrlock (&rwlock); + else + pthread_rwlock_rdlock (&rwlock); + + pthread_rwlock_unlock (&rwlock); +}