[0/2] LoongArch: Add ifunc support for strchr{nul},

Message ID 20230815094336.1722160-1-dengjianbo@loongson.cn

dengjianbo Aug. 15, 2023, 9:43 a.m. UTC
Although our implementations of strchr, strchrnul, memcpy and memmove
are slower than the generic versions in a few cases, the overall
performance gains are significant.

See:
https://github.com/jiadengx/glibc_test/blob/main/bench/strchr_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/strchrnul_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/memcpy_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/memmove_compare.out

In the data, a positive value in parentheses means that our
implementation took less time (a performance improvement); a negative
value means that it took more time (a performance regression).
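
To make the sign convention concrete, here is a minimal sketch of how
such a percentage could be derived from two timings. The function name
percent_saved and the sample timings are hypothetical, and the exact
formula used by compare_strings.py is not shown in this message, so the
sketch assumes the value is the time saved relative to the baseline.

```python
def percent_saved(baseline_time, impl_time):
    """Return the runtime saved relative to the baseline, in percent.

    Positive: the implementation took less time than the baseline.
    Negative: the implementation took more time than the baseline.
    """
    if baseline_time <= 0:
        raise ValueError("baseline_time must be positive")
    return (baseline_time - impl_time) / baseline_time * 100.0


# Hypothetical timings, for illustration only (not benchmark data).
print(f"{percent_saved(100.0, 40.0):.1f}%")   # implementation is faster
print(f"{percent_saved(100.0, 125.0):.1f}%")  # implementation is slower
```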

strchr-lasx       reduces the runtime by about 50%-83%
strchr-lsx        reduces the runtime by about 30%-67%
strchr-aligned    reduces the runtime by about 10%-20%

strchrnul-lasx    reduces the runtime by about 50%-83%
strchrnul-lsx     reduces the runtime by about 36%-65%
strchrnul-aligned reduces the runtime by about 6%-10%

memcpy-lasx       reduces the runtime by about 8%-76%
memcpy-lsx        reduces the runtime by about 8%-72%
memcpy-unaligned  reduces the runtime of unaligned data
                  copying by up to 40%
memcpy-aligned    reduces the runtime of unaligned data
                  copying by up to 25%

memmove-lasx      reduces the runtime by about 20%-73%
memmove-lsx       reduces the runtime by about 50%
memmove-unaligned reduces the runtime of unaligned data
                  moving by up to 40%
memmove-aligned   reduces the runtime of unaligned data
                  moving by up to 25%

Comparison commands:
python benchtests/scripts/compare_strings.py \
-i build/benchtests/bench-strchr.out \
-f generic_strchr,__strchr_lasx,__strchr_lsx,__strchr_aligned \
-a timings -a length,pos,alignment \
-s build/benchtests/bench-strchr.out \
-b generic_strchr > strchr_compare.out

python benchtests/scripts/compare_strings.py \
-i build/benchtests/bench-strchrnul.out \
-f generic_strchrnul,__strchrnul_lasx,__strchrnul_lsx,__strchrnul_aligned \
-a timings -a length,pos,alignment \
-s build/benchtests/bench-strchrnul.out \
-b generic_strchrnul > strchrnul_compare.out

python benchtests/scripts/compare_strings.py \
-i ./build/benchtests/bench-memcpy.out \
-f generic_memcpy,__memcpy_lasx,__memcpy_lsx,__memcpy_unaligned,__memcpy_aligned \
-a timings -a length,align1,align2,"dst > src" \
-s ./build/benchtests/bench-memcpy.out \
-b generic_memcpy > memcpy_compare.out

python benchtests/scripts/compare_strings.py \
-i ./build/benchtests/bench-memmove.out \
-f generic_memmove,__memmove_lasx,__memmove_lsx,__memmove_unaligned,__memmove_aligned \
-a timings -a length,align1,align2 \
-s ./build/benchtests/bench-memmove.out \
-b generic_memmove > memmove_compare.out
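
The four invocations above differ only in the function name, the list
of variants and the attribute list, so they can be generated from a
small table. The following is only a sketch: the BENCHES table and the
compare_command helper are hypothetical names, the paths are normalized
to build/benchtests (the memcpy and memmove commands above use the
equivalent ./build/benchtests), and the flags are copied verbatim from
the commands above rather than checked against compare_strings.py.

```python
import shlex

# name -> (variant suffixes, attribute list), copied from the commands above.
BENCHES = {
    "strchr": (["lasx", "lsx", "aligned"], "length,pos,alignment"),
    "strchrnul": (["lasx", "lsx", "aligned"], "length,pos,alignment"),
    "memcpy": (["lasx", "lsx", "unaligned", "aligned"],
               "length,align1,align2,dst > src"),
    "memmove": (["lasx", "lsx", "unaligned", "aligned"],
                "length,align1,align2"),
}


def compare_command(name, bench_dir="build/benchtests"):
    """Build the argument list for one compare_strings.py run."""
    variants, attributes = BENCHES[name]
    base = f"generic_{name}"
    ifuncs = [base] + [f"__{name}_{v}" for v in variants]
    bench_out = f"{bench_dir}/bench-{name}.out"
    return [
        "python", "benchtests/scripts/compare_strings.py",
        "-i", bench_out,
        "-f", ",".join(ifuncs),
        "-a", "timings", "-a", attributes,
        "-s", bench_out,
        "-b", base,
    ]


for name in BENCHES:
    # Print only; nothing is executed. The redirection target matches
    # the <name>_compare.out files used above.
    print(shlex.join(compare_command(name)), ">", f"{name}_compare.out")
```

The script only prints the shell commands, so it can be reviewed before
anything is run against the benchmark output.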

dengjianbo (2):
  Loongarch: Add ifunc support for strchr{aligned, lsx, lasx} and
    strchrnul{aligned, lsx, lasx}
  Loongarch: Add ifunc support for memcpy{aligned, unaligned, lsx, lasx}
    and memmove{aligned, unaligned, lsx, lasx}

 sysdeps/loongarch/lp64/multiarch/Makefile     |  11 +
 .../lp64/multiarch/ifunc-impl-list.c          |  34 +
 sysdeps/loongarch/lp64/multiarch/ifunc-lasx.h |  45 +
 .../loongarch/lp64/multiarch/ifunc-strchr.h   |  41 +
 .../lp64/multiarch/ifunc-strchrnul.h          |  41 +
 .../loongarch/lp64/multiarch/memcpy-aligned.S | 783 ++++++++++++++++++
 .../loongarch/lp64/multiarch/memcpy-lasx.S    |  20 +
 sysdeps/loongarch/lp64/multiarch/memcpy-lsx.S |  20 +
 .../lp64/multiarch/memcpy-unaligned.S         | 247 ++++++
 sysdeps/loongarch/lp64/multiarch/memcpy.c     |  37 +
 .../lp64/multiarch/memmove-aligned.S          |  20 +
 .../loongarch/lp64/multiarch/memmove-lasx.S   | 287 +++++++
 .../loongarch/lp64/multiarch/memmove-lsx.S    | 534 ++++++++++++
 .../lp64/multiarch/memmove-unaligned.S        | 380 +++++++++
 sysdeps/loongarch/lp64/multiarch/memmove.c    |  38 +
 .../loongarch/lp64/multiarch/strchr-aligned.S |  99 +++
 .../loongarch/lp64/multiarch/strchr-lasx.S    |  91 ++
 sysdeps/loongarch/lp64/multiarch/strchr-lsx.S |  73 ++
 sysdeps/loongarch/lp64/multiarch/strchr.c     |  36 +
 .../lp64/multiarch/strchrnul-aligned.S        |  95 +++
 .../loongarch/lp64/multiarch/strchrnul-lasx.S |  22 +
 .../loongarch/lp64/multiarch/strchrnul-lsx.S  |  22 +
 sysdeps/loongarch/lp64/multiarch/strchrnul.c  |  39 +
 23 files changed, 3015 insertions(+)
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-lasx.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-strchr.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-strchrnul.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memcpy-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memcpy-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memcpy-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memcpy-unaligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memcpy.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memmove-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memmove-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memmove-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memmove-unaligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memmove.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strchr-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strchr-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strchr-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strchr.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strchrnul-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strchrnul-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strchrnul-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strchrnul.c