Message ID | 20201116144426.8415-1-anton.ivanov@cambridgegreys.com
---|---
State | Superseded
Series | um: borrow bitops from the x86 tree
Hi Anton,

So I thought I'd test your performance patches here, and applied
(hopefully the latest versions of them) on top of 5.9:

  um: allow the use of glibc functions instead of builtins
  um: Fetch registers only for signals which need them
  um: enable the use of optimized xor routines in UML
  um: add a UML specific futex implementation
  um: Remove use of asprinf in umid.c
  um: "borrow" atomics from x86 architecture
  um: "borrow" cmpxchg from x86 tree in UML
  um: borrow bitops from the x86 tree

With the patches (compiled with glibc functions), one of my trivial
virtual lab tests gets:

  Time (mean ± σ):   15.918 s ± 0.833 s  [User: 10.977 s, System: 5.600 s]
  Range (min … max): 15.371 s … 17.986 s  10 runs

It's not a large improvement, but it seems noticeable; without the
patches I get:

  Time (mean ± σ):   16.525 s ± 0.884 s  [User: 11.355 s, System: 5.648 s]
  Range (min … max): 15.682 s … 18.088 s  10 runs

johannes
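[Editor's aside: the output above is in the format printed by a benchmarking tool such as hyperfine. For readers without one, a rough stand-in that reports the min/max wall-clock range over N runs can be sketched as below; this is an invented helper, not something from the thread, and it assumes GNU date for nanosecond timestamps.]

```shell
# Hypothetical helper, not part of the thread: time a command N times and
# report the fastest/slowest wall-clock runs, roughly what the
# "Range (min ... max)" line above summarizes.
# Assumes GNU date (for %N nanosecond resolution).
time_range() {
    runs="$1"; shift
    min=""; max=""
    i=1
    while [ "$i" -le "$runs" ]; do
        s=$(date +%s%N)
        "$@" > /dev/null 2>&1
        e=$(date +%s%N)
        d=$((e - s))
        if [ -z "$min" ] || [ "$d" -lt "$min" ]; then min=$d; fi
        if [ -z "$max" ] || [ "$d" -gt "$max" ]; then max=$d; fi
        i=$((i + 1))
    done
    echo "min ${min}ns max ${max}ns over ${runs} runs"
}

# Example: time a no-op three times.
time_range 3 true
```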
On 17/11/2020 11:05, Johannes Berg wrote:
> Hi Anton,
>
> So I thought I'd test your performance patches here, and applied
> (hopefully the latest versions of them) on top of 5.9:
>
>   um: allow the use of glibc functions instead of builtins
>   um: Fetch registers only for signals which need them
>   um: enable the use of optimized xor routines in UML
>   um: add a UML specific futex implementation
>   um: Remove use of asprinf in umid.c
>   um: "borrow" atomics from x86 architecture
>   um: "borrow" cmpxchg from x86 tree in UML
>   um: borrow bitops from the x86 tree
>
> With the patches (compiled with glibc functions), one of my trivial
> virtual lab tests gets:
>
>   Time (mean ± σ):   15.918 s ± 0.833 s  [User: 10.977 s, System: 5.600 s]
>   Range (min … max): 15.371 s … 17.986 s  10 runs
>
> It's not a large improvement, but it seems noticeable; without the
> patches I get:
>
>   Time (mean ± σ):   16.525 s ± 0.884 s  [User: 11.355 s, System: 5.648 s]
>   Range (min … max): 15.682 s … 18.088 s  10 runs
>
> johannes

This is similar to what I get.

My usual test is:

  time busybox find /usr/lib/ -type f -exec cat {} > /dev/null \;

I discard the first run and use only runs served from the fs cache.

With stock I get:

  real 34.0 - 36.0
  user 29.6 - 29.9
  sys   3.4 -  3.6

With the patch set I get:

  real 32.0 - 34.0
  user 28.2 - 29.2
  sys   3.0 -  3.4

dd if=/dev/zero of=/dev/null bs=1M on the whole UBD device, for the 2nd
run and later, gives 2.0 GB/s - 2.1 GB/s without the patches and
2.2 GB/s - 2.3 GB/s with them.

It is not a lot, but it is something - 2-5% on average, depending on the
actual test. The real gain will come from figuring out how to optimize
the memory mapper. It is the "handbrake" which slows down everything
else.
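[Editor's aside: the warm-cache methodology described above (one discarded cold run, then timing only cached runs) can be sketched as a small shell helper. This is an illustrative stand-in, not Anton's actual script: it uses plain find rather than busybox, GNU date for timestamps, and placeholder directory/run-count arguments.]

```shell
# Illustrative sketch of the warm-cache methodology from the thread,
# not the actual test script. The first pass populates the fs cache and
# is discarded; only subsequent runs are timed.
# Assumes plain find/cat and GNU date with %N support.
bench_cached() {
    dir="$1"; runs="${2:-3}"
    find "$dir" -type f -exec cat {} + > /dev/null 2>&1  # warm-up, discarded
    i=1
    while [ "$i" -le "$runs" ]; do
        s=$(date +%s%N)
        find "$dir" -type f -exec cat {} + > /dev/null
        e=$(date +%s%N)
        echo "run $i: $((e - s)) ns"
        i=$((i + 1))
    done
}

# Example on a throwaway directory:
d=$(mktemp -d)
echo test > "$d/file"
bench_cached "$d" 2
rm -rf "$d"
```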
On Tue, 2020-11-17 at 11:46 +0000, Anton Ivanov wrote:
>
> My usual test is:
>
>   time busybox find /usr/lib/ -type f -exec cat {} > /dev/null \;
>
> I discard the first run and use only runs from fs cache.

Oh. I didn't even run the timing inside. I ran it *outside*, something
like:

  time ./linux args... init=/path/to/test-script.sh

johannes
On 17/11/2020 12:11, Johannes Berg wrote:
> On Tue, 2020-11-17 at 11:46 +0000, Anton Ivanov wrote:
>>
>> My usual test is:
>>
>>   time busybox find /usr/lib/ -type f -exec cat {} > /dev/null \;
>>
>> I discard the first run and use only runs from fs cache.
>
> Oh. I didn't even run the timing inside. I ran it *outside*, something
> like
>
>   time ./linux args... init=/path/to/test-script.sh

I usually do a full set of tests on fs access, device IO access and
netperf after each patch.

Based on them, it looks like it is worth it.

The more interesting question is: is this the right organization?

We have stuff in multiple places now - arch/x86/um, arch/um, etc.

IMHO, we should probably look at getting it organized so that all
sub-arches are under the um tree at some point.

> johannes
>
> _______________________________________________
> linux-um mailing list
> linux-um@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-um
On 17/11/2020 12:53, Anton Ivanov wrote:
> On 17/11/2020 12:11, Johannes Berg wrote:
>> On Tue, 2020-11-17 at 11:46 +0000, Anton Ivanov wrote:
>>>
>>> My usual test is:
>>>
>>>   time busybox find /usr/lib/ -type f -exec cat {} > /dev/null \;
>>>
>>> I discard the first run and use only runs from fs cache.
>>
>> Oh. I didn't even run the timing inside. I ran it *outside*, something
>> like
>>
>>   time ./linux args... init=/path/to/test-script.sh
>
> I usually do a full set of tests on fs access, device IO access and a
> netperf after each patch.
>
> Based on them it looks like it is worth it.
>
> The more interesting question is - is this the right organization?
>
> We have stuff in multiple places now - arch/x86/um, arch/um, etc.
>
> IMHO, we should probably look at getting it organized so that all
> sub-arches are under the um tree at some point.

In the meantime, a backport of these patch sets (string, atomic, bitops,
xor, futex, etc.) to OpenWrt/UML has clocked 14 days as my main CPE. I
have not observed any stability issues, and there is some visible
improvement in CPU usage.
diff --git a/arch/um/include/asm/bitops-x86.h b/arch/um/include/asm/bitops-x86.h
new file mode 120000
index 000000000000..15a96ff554b2
--- /dev/null
+++ b/arch/um/include/asm/bitops-x86.h
@@ -0,0 +1 @@
+../../../x86/include/asm/bitops.h
\ No newline at end of file
diff --git a/arch/um/include/asm/bitops.h b/arch/um/include/asm/bitops.h
new file mode 100644
index 000000000000..e578c628a6d5
--- /dev/null
+++ b/arch/um/include/asm/bitops.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_UM_BITOPS_H
+#define _ASM_UM_BITOPS_H
+
+#ifdef CONFIG_64BIT
+
+#undef CONFIG_X86_32
+
+#ifndef CONFIG_X86_64
+#define CONFIG_X86_64
+#endif
+
+#else
+#define CONFIG_X86_32
+#endif
+
+#include <asm/bitops-x86.h>
+
+
+#endif