Message ID | 53834CE6.2080802@linux.vnet.ibm.com |
---|---|
State | New |
> This patch replaces the insrdi by insrwi in powerpc32 assembly. Although they
> are not wrong, since all POWER chips supported in 32-bits are 64-bits and the chips
> do not thrown an illegal exception when running these instructions, valgrind
> fails accusing an invalid one.

This code is CPU-specific; as you say, those CPUs can use rldimi just
fine. The reason the code uses rldimi instead of rlwimi is because
it is faster (at least on power4, power5). Fix valgrind instead?


Segher
On 26-05-2014 15:12, Segher Boessenkool wrote:
>> This patch replaces the insrdi by insrwi in powerpc32 assembly. Although they
>> are not wrong, since all POWER chips supported in 32-bits are 64-bits and the chips
>> do not thrown an illegal exception when running these instructions, valgrind
>> fails accusing an invalid one.
> This code is CPU-specific; as you say, those CPUs can use rldimi just
> fine. The reason the code uses rldimi instead of rlwimi is because
> it is faster (at least on power4, power5). Fix valgrind instead?
>
>
> Segher
>

Well, using http://pastebin.com/CttashRQ on a POWER5 (1.9 GHz) I get:

> ./test
rldimi: min: 7 | max: 9
rlwimi: min: 7 | max: 10

And by issuing 16 instructions per test function I get:

> ./test
rldimi: min: 7 | max: 9
rlwimi: min: 7 | max: 13

A newer processor (POWER7) also shows the same behavior. And the instructions
are not in the hot path of the code (they are only executed once), so I hardly
consider this a performance regression.

Anyway, I would prefer to stay consistent and use only 32-bit instructions in
32-bit assembly code, to avoid such issues with external tools (valgrind is only
an example) and to allow possible future implementations on different chips that
do not implement the 64-bit instructions to use the powerN code.
> >> This patch replaces the insrdi by insrwi in powerpc32 assembly. Although they
> >> are not wrong, since all POWER chips supported in 32-bits are 64-bits and the chips
> >> do not thrown an illegal exception when running these instructions, valgrind
> >> fails accusing an invalid one.
> > This code is CPU-specific; as you say, those CPUs can use rldimi just
> > fine. The reason the code uses rldimi instead of rlwimi is because
> > it is faster (at least on power4, power5). Fix valgrind instead?
> >
> >
> > Segher
> >
> Well, using http://pastebin.com/CttashRQ on a POWER5 (1.9 GHz) I get:
>
> > ./test
> rldimi: min: 7 | max: 9
> rlwimi: min: 7 | max: 10
>
> And by issuing 16 instruction per test function I get:
>
> > ./test
> rldimi: min: 7 | max: 9
> rlwimi: min: 7 | max: 13
>
> Newer processor (POWER7) also shows the same behavior.

On a POWER7 I get that rlwimi is almost twice as slow as rldimi,
just as expected. The way you constructed your test with a blr
immediately after a single rl*imi you get only one per group no
matter what.

> And the instructions
> and not in hot path in the code (it is only called once), so I hardly consider
> this a performance regression.

That might well be. But see
http://sourceware.org/ml/libc-alpha/2013-08/msg00101.html
where (part of) this code was added.

> Anyway, I would prefer to keep consistent and using only 32-bits in 32-bits
> assembly code to avoid such issues with external tools (valgrind is only an
> example) and to allow possible future implementation in different chips that
> do not implement the 64-bits instructions to use powerN code.

I find this not a convincing argument at all. But it's not my call ;-)


Segher
On 26-05-2014 18:04, Segher Boessenkool wrote:
>>>> This patch replaces the insrdi by insrwi in powerpc32 assembly. Although they
>>>> are not wrong, since all POWER chips supported in 32-bits are 64-bits and the chips
>>>> do not thrown an illegal exception when running these instructions, valgrind
>>>> fails accusing an invalid one.
>>> This code is CPU-specific; as you say, those CPUs can use rldimi just
>>> fine. The reason the code uses rldimi instead of rlwimi is because
>>> it is faster (at least on power4, power5). Fix valgrind instead?
>>>
>>>
>>> Segher
>>>
>> Well, using http://pastebin.com/CttashRQ on a POWER5 (1.9 GHz) I get:
>>
>>> ./test
>> rldimi: min: 7 | max: 9
>> rlwimi: min: 7 | max: 10
>>
>> And by issuing 16 instruction per test function I get:
>>
>>> ./test
>> rldimi: min: 7 | max: 9
>> rlwimi: min: 7 | max: 13
>>
>> Newer processor (POWER7) also shows the same behavior.
> On a POWER7 I get that rlwimi is almost twice as slow as rldimi,
> just as expected. The way you constructed your test with a blr
> immediately after a single rl*imi you get only one per group no
> matter what.

That's why I also constructed an example with 16 instructions (which I didn't
include in the link). A more comprehensive example is at
http://pastebin.com/KD7xSqkJ; running it on a POWER7 (3.5 GHz) gives:

$ taskset -c 24 ./test
rldimi1: min: 12 | max: 19
rldimi4: min: 12 | max: 20
rldimi8: min: 12 | max: 20
rldimi16: min: 12 | max: 23
rlwimi1: min: 12 | max: 23
rlwimi4: min: 12 | max: 23
rlwimi8: min: 12 | max: 24
rlwimi16: min: 12 | max: 29

>
>> And the instructions
>> and not in hot path in the code (it is only called once), so I hardly consider
>> this a performance regression.
> That might well be. But see http://sourceware.org/ml/libc-alpha/2013-08/msg00101.html
> where (part of) this code was added.

Yep, I'm aware; I recall this thread. Anyway, attached I'm sending you the
strchr benchtest output on a POWER7 with and without the modification.
Since the code path is taken only once, I would expect some latency difference
for short lengths; however, the results do not really show any noticeable
slowdown. I will check later on a POWER5 machine, but I don't really expect
much difference there either.

>
>> Anyway, I would prefer to keep consistent and using only 32-bits in 32-bits
>> assembly code to avoid such issues with external tools (valgrind is only an
>> example) and to allow possible future implementation in different chips that
>> do not implement the 64-bits instructions to use powerN code.
> I find this not a convincing argument at all. But it's not my call ;-)

POWER4 ifunc call selection, for instance, uses the hwcap bit flags to select
the best implementation. If a future chip wants to reuse the same code through
ifunc as well, it will require a new way to distinguish it from
PPC_FEATURE_POWER4 (which is bit 0x00080000). That will add more complexity to
this code. It is the same for external tools: they will need to add more logic
to handle different chips. Also, AFAIK GCC does not generate 64-bit
instructions for -m32 (I might be wrong on this one).

Anyway, as I said, I just changed this because 1. it fixes the valgrind checks,
2. I didn't see compelling performance reasons, and 3. I do see it as more
consistent to use 32-bit instructions in 32-bit code. However, 1. can be fixed
in valgrind if 2. is false (and 3. can then just be ignored).
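To make the hwcap argument above concrete, here is a minimal C sketch of a selector keyed on the PPC_FEATURE_POWER4 bit. It is not glibc's actual ifunc resolver: the names `SKETCH_PPC_FEATURE_POWER4`, `strchr_generic`, `strchr_power4_like` and `select_strchr` are hypothetical, both "implementations" simply forward to the C library's `strchr`, and the hwcap word is passed in as a plain parameter. Only the bit value 0x00080000 comes from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bit value quoted in the thread for PPC_FEATURE_POWER4; the macro name
   is local to this sketch.  */
#define SKETCH_PPC_FEATURE_POWER4 0x00080000UL

typedef char *(*strchr_fn) (const char *, int);

/* Hypothetical stand-ins for a generic and a CPU-specific implementation.
   In this sketch both just forward to the C library.  */
static char *strchr_generic (const char *s, int c)
{
  return strchr (s, c);
}

static char *strchr_power4_like (const char *s, int c)
{
  return strchr (s, c);
}

/* Pick an implementation from a hwcap word.  A chip that sets the POWER4
   bit but lacks the 64-bit instructions would need an extra test here,
   which is the added complexity the message refers to.  */
static strchr_fn select_strchr (unsigned long hwcap)
{
  if (hwcap & SKETCH_PPC_FEATURE_POWER4)
    return strchr_power4_like;
  return strchr_generic;
}
```

A caller would obtain the hwcap word from the platform (on Linux, for example, through `getauxval (AT_HWCAP)`) and call the returned pointer like an ordinary `strchr`.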
> > > Segher > simple_STRCHR stupid_STRCHR __strchr_power7 __strchr_ppc Length 32, alignment in bytes 0: 23 201.656 9.75 11.4531 Length 32, alignment in bytes 1: 21.5312 196.188 7.5 10.7812 Length 64, alignment in bytes 0: 37.9531 221.5 11.75 19.8438 Length 64, alignment in bytes 2: 37.8594 220.828 11.875 19.875 Length 128, alignment in bytes 0: 70.4688 272.984 23.4844 32.4375 Length 128, alignment in bytes 3: 70.2969 273.281 22.7656 32.2969 Length 256, alignment in bytes 0: 135.188 374.203 37.3438 57.4844 Length 256, alignment in bytes 4: 185.766 523.484 37.3594 57.25 Length 512, alignment in bytes 0: 527.656 627.25 66.4531 107.781 Length 512, alignment in bytes 5: 265.359 545.375 65.375 107.562 Length 1024, alignment in bytes 0: 525.516 998.297 124.25 207.594 Length 1024, alignment in bytes 6: 525.469 913.25 123.375 207.672 Length 2048, alignment in bytes 0: 1045.94 1777.61 240.75 411.406 Length 2048, alignment in bytes 7: 1045.58 1657.86 240.062 406.5 Length 64, alignment in bytes 1: 37.7812 85.6562 11.7656 19.8438 Length 64, alignment in bytes 1: 37.7812 85.2188 11.3906 19.9219 Length 64, alignment in bytes 2: 37.9531 85.3281 11.2344 19.9219 Length 64, alignment in bytes 2: 37.6562 85.6719 11.25 19.875 Length 64, alignment in bytes 3: 37.7188 84.3281 11.2969 19.75 Length 64, alignment in bytes 3: 37.5781 84.5781 11.2031 19.7812 Length 64, alignment in bytes 4: 37.875 82.4219 11.4531 20 Length 64, alignment in bytes 4: 37.5 81.3594 10.8281 19.7969 Length 64, alignment in bytes 5: 37.7344 82.1719 10.6406 19.875 Length 64, alignment in bytes 5: 37.5312 81.5156 10.6562 19.9062 Length 64, alignment in bytes 6: 37.7656 82.1406 10.6094 19.75 Length 64, alignment in bytes 6: 37.7188 81.1875 10.7344 19.9219 Length 64, alignment in bytes 7: 37.625 81.3125 10.5938 19.75 Length 64, alignment in bytes 7: 37.7031 81.125 10.6562 20 Length 0, alignment in bytes 0: 3.65625 7.09375 4 4.75 Length 0, alignment in bytes 0: 2.03125 6.5625 3.71875 4.28125 Length 1, alignment in bytes 
0: 2.6875 7.17188 3.45312 4.25 Length 1, alignment in bytes 0: 2.5 7 3.46875 4.26562 Length 2, alignment in bytes 0: 6.54688 8.125 3.35938 4.34375 Length 2, alignment in bytes 0: 8.51562 7.65625 3.53125 4.375 Length 3, alignment in bytes 0: 6.82812 8.71875 3.5625 4.92188 Length 3, alignment in bytes 0: 7.14062 7.96875 3.5 4.73438 Length 4, alignment in bytes 0: 7.42188 9 4.51562 5.51562 Length 4, alignment in bytes 0: 7.28125 8.5625 4.1875 4.76562 Length 5, alignment in bytes 0: 7.45312 9.40625 3.8125 4.79688 Length 5, alignment in bytes 0: 7.75 9.04688 3.92188 4.76562 Length 6, alignment in bytes 0: 8.28125 10.25 3.70312 4.71875 Length 6, alignment in bytes 0: 8.28125 9.71875 3.89062 4.76562 Length 7, alignment in bytes 0: 8.84375 12.375 3.85938 5.3125 Length 7, alignment in bytes 0: 8.73438 11.4531 3.84375 5.10938 Length 8, alignment in bytes 0: 9.21875 12.3125 5.21875 6.0625 Length 8, alignment in bytes 0: 9.3125 12 5.03125 5.76562 Length 9, alignment in bytes 0: 9.67188 13.2812 4.71875 5.73438 Length 9, alignment in bytes 0: 9.67188 13.0469 4.67188 5.73438 Length 10, alignment in bytes 0: 10.3281 13.7969 4.73438 5.53125 Length 10, alignment in bytes 0: 10.4375 13.5 4.625 5.6875 Length 11, alignment in bytes 0: 10.8125 14 4.71875 6.25 Length 11, alignment in bytes 0: 10.6406 13.8594 4.57812 5.67188 Length 12, alignment in bytes 0: 11.1562 15 5.28125 6.84375 Length 12, alignment in bytes 0: 11.25 14.75 5.125 6.40625 Length 13, alignment in bytes 0: 11.7344 15.2188 4.95312 6.35938 Length 13, alignment in bytes 0: 11.8906 14.9688 5 6.4375 Length 14, alignment in bytes 0: 12.3594 15.7812 5 6.29688 Length 14, alignment in bytes 0: 12.25 15.1094 5.04688 6.3125 Length 15, alignment in bytes 0: 12.7344 19.0938 4.92188 6.84375 Length 15, alignment in bytes 0: 12.8594 18.1406 5 6.54688 Length 16, alignment in bytes 0: 13.3906 19.7031 6.45312 7.96875 Length 16, alignment in bytes 0: 13.0625 19.3125 5.71875 7.17188 Length 17, alignment in bytes 0: 13.9062 21.7812 5.6875 7.5 
Length 17, alignment in bytes 0: 13.8281 20.4219 5.6875 7.25 Length 18, alignment in bytes 0: 14.3125 24.7656 5.625 7.40625 Length 18, alignment in bytes 0: 14.3125 24.7031 5.64062 7.14062 Length 19, alignment in bytes 0: 14.6875 25.5312 5.54688 7.78125 Length 19, alignment in bytes 0: 14.6094 24.9844 5.76562 7.35938 Length 20, alignment in bytes 0: 15.2969 25.2031 6.28125 8.45312 Length 20, alignment in bytes 0: 15.2812 25.4062 5.9375 8.32812 Length 21, alignment in bytes 0: 15.6719 26.0156 5.75 8.09375 Length 21, alignment in bytes 0: 15.7812 25.875 5.76562 8.25 Length 22, alignment in bytes 0: 16.4375 26.8594 5.79688 8.1875 Length 22, alignment in bytes 0: 16.4219 27 5.78125 8.25 Length 23, alignment in bytes 0: 16.9062 31.5625 5.79688 8.75 Length 23, alignment in bytes 0: 16.75 30.5 5.73438 8.34375 Length 24, alignment in bytes 0: 17.4844 30.1719 7.1875 9.3125 Length 24, alignment in bytes 0: 17.4219 30.7969 6.60938 8.75 Length 25, alignment in bytes 0: 17.9219 31.2188 6.51562 8.82812 Length 25, alignment in bytes 0: 17.9375 30.3125 6.34375 8.84375 Length 26, alignment in bytes 0: 18.25 31.2656 6.45312 8.82812 Length 26, alignment in bytes 0: 18.2656 31.4688 6.34375 8.75 Length 27, alignment in bytes 0: 18.8594 31.9062 6.35938 9.17188 Length 27, alignment in bytes 0: 18.9062 31.2969 6.42188 8.85938 Length 28, alignment in bytes 0: 19.4062 32.2188 7.5 9.73438 Length 28, alignment in bytes 0: 19.5 32 6.92188 9.46875 Length 29, alignment in bytes 0: 19.9688 32.6406 6.75 9.4375 Length 29, alignment in bytes 0: 19.8125 32.2812 6.78125 9.5 Length 30, alignment in bytes 0: 20.3594 34.625 6.85938 9.5 Length 30, alignment in bytes 0: 20.3281 34.0781 6.75 9.35938 Length 31, alignment in bytes 0: 21 37.6562 6.92188 9.85938 Length 31, alignment in bytes 0: 20.7188 36.7344 6.76562 9.57812 Length 32, alignment in bytes 0: 21.6094 36.7344 6.46875 10.3594 Length 32, alignment in bytes 1: 21.4844 36.5938 6.0625 10.25 Length 64, alignment in bytes 0: 37.7031 64.0469 9.20312 
19.75 Length 64, alignment in bytes 2: 37.75 63.4688 8.40625 19.5 Length 128, alignment in bytes 0: 70.0938 118.703 13.6562 32.0781 Length 128, alignment in bytes 3: 70.0156 118.25 13.0156 32 Length 256, alignment in bytes 0: 135.047 238.5 28.0312 56.8281 Length 256, alignment in bytes 4: 135.234 221.203 27.5156 57.0312 Length 512, alignment in bytes 0: 265.281 457.75 47.125 106.766 Length 512, alignment in bytes 5: 265.375 426.359 46.5 107.188 Length 1024, alignment in bytes 0: 525.391 912.156 85.2344 206.922 Length 1024, alignment in bytes 6: 525.5 842 85.4688 207 Length 2048, alignment in bytes 0: 1045.5 1778.64 163.734 409.719 Length 2048, alignment in bytes 7: 1045.47 1656.89 163.266 407.641 Length 64, alignment in bytes 1: 37.6094 64.5 8.70312 19.7031 Length 64, alignment in bytes 1: 37.75 58.9219 8.39062 19.4844 Length 64, alignment in bytes 2: 37.9062 63.1094 8.1875 19.5938 Length 64, alignment in bytes 2: 37.5469 63.5156 8.14062 19.5938 Length 64, alignment in bytes 3: 37.6719 64.4688 8.14062 19.75 Length 64, alignment in bytes 3: 37.75 63.5781 8.01562 19.7188 Length 64, alignment in bytes 4: 37.8438 64.3281 8.39062 19.5469 Length 64, alignment in bytes 4: 37.875 64.3906 8 19.7344 Length 64, alignment in bytes 5: 37.6406 64.1406 7.6875 19.625 Length 64, alignment in bytes 5: 37.8906 64.125 7.78125 19.6094 Length 64, alignment in bytes 6: 37.8594 64.6406 7.70312 19.6406 Length 64, alignment in bytes 6: 37.6719 64.4219 7.89062 19.5 Length 64, alignment in bytes 7: 37.8281 64 7.82812 19.5938 Length 64, alignment in bytes 7: 37.75 64.7031 7.78125 19.6875 Length 0, alignment in bytes 0: 2.32812 6.92188 3.45312 4.70312 Length 0, alignment in bytes 0: 1.96875 6.5625 3.04688 4.34375 Length 1, alignment in bytes 0: 2.6875 7.03125 3.20312 4.25 Length 1, alignment in bytes 0: 2.46875 6.64062 3.04688 4.5 Length 2, alignment in bytes 0: 5.85938 8 2.95312 4.10938 Length 2, alignment in bytes 0: 6 7.53125 3.04688 4.20312 Length 3, alignment in bytes 0: 6.84375 8.35938 3 
4.20312 Length 3, alignment in bytes 0: 6.65625 7.89062 2.98438 4.14062 Length 4, alignment in bytes 0: 7.4375 9.15625 3.82812 5.29688 Length 4, alignment in bytes 0: 7.28125 8.92188 3.48438 4.92188 Length 5, alignment in bytes 0: 7.79688 9.3125 3 4.75 Length 5, alignment in bytes 0: 7.59375 9.67188 3 4.89062 Length 6, alignment in bytes 0: 8.20312 9.92188 3.07812 4.70312 Length 6, alignment in bytes 0: 8.32812 10.125 3.125 4.67188 Length 7, alignment in bytes 0: 8.71875 10.8594 3.35938 4.82812 Length 7, alignment in bytes 0: 8.71875 10.0625 3.28125 4.76562 Length 8, alignment in bytes 0: 9.40625 13 4.75 5.73438 Length 8, alignment in bytes 0: 9.07812 12.0156 4 5.71875 Length 9, alignment in bytes 0: 9.64062 13.4844 3.78125 5.65625 Length 9, alignment in bytes 0: 9.67188 13.1406 3.92188 5.67188 Length 10, alignment in bytes 0: 10.1875 13.6875 3.875 5.57812 Length 10, alignment in bytes 0: 10.3281 13.6562 3.89062 5.625 Length 11, alignment in bytes 0: 10.9375 14.7188 3.82812 5.57812 Length 11, alignment in bytes 0: 10.8125 14.4219 3.79688 5.59375 Length 12, alignment in bytes 0: 11.4688 15.25 4.5 6.84375 Length 12, alignment in bytes 0: 11.1719 14.8438 3.95312 6.48438 Length 13, alignment in bytes 0: 11.8125 15.5156 3.65625 6.4375 Length 13, alignment in bytes 0: 11.8281 15.0312 3.67188 6.39062 Length 14, alignment in bytes 0: 12.2969 16.5625 3.75 6.32812 Length 14, alignment in bytes 0: 12.3281 15.0312 3.76562 6.375 Length 15, alignment in bytes 0: 12.6094 17.0469 3.67188 6.34375 Length 15, alignment in bytes 0: 12.7344 16.4375 3.98438 6.35938 Length 16, alignment in bytes 0: 13.1875 19.9219 10.2031 7.75 Length 16, alignment in bytes 0: 13.1562 18.9531 9.875 7.25 Length 17, alignment in bytes 0: 13.6719 20.4219 9.73438 7.14062 Length 17, alignment in bytes 0: 13.8281 20.3594 9.76562 7.23438 Length 18, alignment in bytes 0: 14.3438 24.75 9.85938 7.26562 Length 18, alignment in bytes 0: 14.4219 24.7344 9.65625 7.20312 Length 19, alignment in bytes 0: 14.75 26.2188 
9.65625 7.17188 Length 19, alignment in bytes 0: 14.8594 26 9.6875 7.32812 Length 20, alignment in bytes 0: 15.3594 25.5156 4.625 8.48438 Length 20, alignment in bytes 0: 15.1094 27.0469 4.25 8.20312 Length 21, alignment in bytes 0: 15.8438 27.9219 4.07812 8.25 Length 21, alignment in bytes 0: 15.7812 28.8281 4.20312 8.28125 Length 22, alignment in bytes 0: 16.4375 29.2188 4.17188 8.25 Length 22, alignment in bytes 0: 16.4531 29.5938 4.125 8.14062 Length 23, alignment in bytes 0: 16.8281 27.1094 4 8.10938 Length 23, alignment in bytes 0: 16.8125 27.1875 4.20312 8.10938 Length 24, alignment in bytes 0: 17.5 31.4531 6 9.125 Length 24, alignment in bytes 0: 17.5 30.625 5.40625 8.84375 Length 25, alignment in bytes 0: 17.8281 31.0625 5.26562 8.75 Length 25, alignment in bytes 0: 17.9219 29.125 5.5 8.85938 Length 26, alignment in bytes 0: 18.4844 31.7188 5.10938 8.95312 Length 26, alignment in bytes 0: 18.3594 31.8906 5.10938 8.9375 Length 27, alignment in bytes 0: 18.9375 31.7031 5.1875 8.82812 Length 27, alignment in bytes 0: 18.8906 31.7344 5.1875 8.82812 Length 28, alignment in bytes 0: 19.375 33.25 9.98438 9.92188 Length 28, alignment in bytes 0: 19.25 33.6094 9.6875 9.73438 Length 29, alignment in bytes 0: 19.9531 33.9844 11 9.65625 Length 29, alignment in bytes 0: 19.9688 33.7656 9.46875 9.53125 Length 30, alignment in bytes 0: 20.3438 34.4062 9.53125 9.5 Length 30, alignment in bytes 0: 20.5 34.0156 10.5938 9.5625 Length 31, alignment in bytes 0: 20.9375 34.9062 11.4531 9.625 Length 31, alignment in bytes 0: 20.875 36.3906 11.1094 9.625 simple_STRCHR stupid_STRCHR __strchr_power7 __strchr_ppc Length 32, alignment in bytes 0: 23.5625 209.562 10.25 13.3125 Length 32, alignment in bytes 1: 21.5156 196.406 7.53125 10.7969 Length 64, alignment in bytes 0: 37.9219 221.297 12.2812 19.8438 Length 64, alignment in bytes 2: 37.7188 222 11.7188 19.8438 Length 128, alignment in bytes 0: 70.25 272.047 22.9062 32.5 Length 128, alignment in bytes 3: 70.3906 273.375 22.8438 
32.5 Length 256, alignment in bytes 0: 135.172 369.406 37.3125 57.6562 Length 256, alignment in bytes 4: 135.484 360.969 37.5156 57.7031 Length 512, alignment in bytes 0: 265.266 572.812 66.3906 108.25 Length 512, alignment in bytes 5: 265.25 542.641 65.7812 107.828 Length 1024, alignment in bytes 0: 525.484 995.219 124.188 209.141 Length 1024, alignment in bytes 6: 525.5 919.625 123.922 209.031 Length 2048, alignment in bytes 0: 1045.92 1807.25 241.641 411.703 Length 2048, alignment in bytes 7: 1045.73 1643.98 240.531 411.078 Length 64, alignment in bytes 1: 37.9844 83.6406 12 20.0156 Length 64, alignment in bytes 1: 37.6719 85.0312 11.7344 19.9375 Length 64, alignment in bytes 2: 37.75 85.6562 11.6875 19.8594 Length 64, alignment in bytes 2: 37.6406 85.125 11.3438 20.0781 Length 64, alignment in bytes 3: 37.6094 86 11.375 19.8125 Length 64, alignment in bytes 3: 37.625 85.5 11.4219 19.8906 Length 64, alignment in bytes 4: 37.8906 82.8281 11.4219 19.8906 Length 64, alignment in bytes 4: 37.6562 82.3281 10.9062 20 Length 64, alignment in bytes 5: 37.75 82.1719 10.6719 19.8438 Length 64, alignment in bytes 5: 37.75 82.0781 10.6406 19.9844 Length 64, alignment in bytes 6: 37.6875 82.4844 10.5 19.8125 Length 64, alignment in bytes 6: 37.75 82.2969 10.7344 20 Length 64, alignment in bytes 7: 37.6875 82.4531 10.7344 19.9062 Length 64, alignment in bytes 7: 37.8594 82.2031 10.7031 20.0156 Length 0, alignment in bytes 0: 3.67188 7.07812 4.25 4.89062 Length 0, alignment in bytes 0: 2.10938 6.75 3.75 4.39062 Length 1, alignment in bytes 0: 3.15625 7.40625 3.45312 4.26562 Length 1, alignment in bytes 0: 2.89062 7.35938 3.5 4.14062 Length 2, alignment in bytes 0: 8.82812 8.375 3.40625 4.40625 Length 2, alignment in bytes 0: 8.6875 7.75 3.54688 4.34375 Length 3, alignment in bytes 0: 6.9375 8.64062 3.51562 4.60938 Length 3, alignment in bytes 0: 6.79688 8 3.54688 4.6875 Length 4, alignment in bytes 0: 7.28125 9.42188 4.53125 5.54688 Length 4, alignment in bytes 0: 7.4375 8.75 
4.10938 4.84375 Length 5, alignment in bytes 0: 7.57812 10.3281 3.95312 5.01562 Length 5, alignment in bytes 0: 7.67188 9.01562 3.79688 4.82812 Length 6, alignment in bytes 0: 8.34375 10.5 3.85938 4.75 Length 6, alignment in bytes 0: 8.21875 10.4062 3.8125 4.85938 Length 7, alignment in bytes 0: 8.65625 12.8125 3.92188 5.51562 Length 7, alignment in bytes 0: 8.71875 11.2812 3.84375 5.29688 Length 8, alignment in bytes 0: 9.20312 12.8438 5.17188 6.3125 Length 8, alignment in bytes 0: 9.32812 11.9688 4.90625 5.8125 Length 9, alignment in bytes 0: 9.625 13.2812 4.76562 5.5625 Length 9, alignment in bytes 0: 9.85938 13.2188 4.67188 5.6875 Length 10, alignment in bytes 0: 10.3438 13.75 4.6875 5.5 Length 10, alignment in bytes 0: 10.25 13.7656 4.71875 5.67188 Length 11, alignment in bytes 0: 10.9062 14.5 4.73438 6.20312 Length 11, alignment in bytes 0: 10.8906 13.5625 4.73438 5.75 Length 12, alignment in bytes 0: 11.2344 14.8281 5.23438 7.29688 Length 12, alignment in bytes 0: 11.2188 14.6094 5.25 6.48438 Length 13, alignment in bytes 0: 11.9844 15.5156 5.0625 6.59375 Length 13, alignment in bytes 0: 11.9062 15.0938 4.95312 6.34375 Length 14, alignment in bytes 0: 12.3594 16.25 5 6.45312 Length 14, alignment in bytes 0: 12.3281 15.0781 5.01562 6.4375 Length 15, alignment in bytes 0: 12.75 18.6562 4.92188 7.04688 Length 15, alignment in bytes 0: 12.75 19.0156 4.98438 6.71875 Length 16, alignment in bytes 0: 13.3594 19.0312 6.20312 7.67188 Length 16, alignment in bytes 0: 13.1562 18.6719 5.82812 7.10938 Length 17, alignment in bytes 0: 13.9219 21.9219 5.71875 7.25 Length 17, alignment in bytes 0: 13.7812 20.3906 5.65625 7.21875 Length 18, alignment in bytes 0: 14.1094 24.7812 5.78125 7.25 Length 18, alignment in bytes 0: 14.2188 24.8281 5.73438 7.23438 Length 19, alignment in bytes 0: 14.6562 26.4688 5.60938 7.98438 Length 19, alignment in bytes 0: 14.7031 24.9219 5.57812 7.35938 Length 20, alignment in bytes 0: 15.2031 25.3438 6.40625 8.4375 Length 20, alignment in bytes 
0: 15.1719 26.75 5.92188 8.20312 Length 21, alignment in bytes 0: 15.9219 26.1406 5.84375 8.34375 Length 21, alignment in bytes 0: 15.75 26.2969 5.8125 8.03125 Length 22, alignment in bytes 0: 16.2812 27 5.78125 8.20312 Length 22, alignment in bytes 0: 16.2188 27.5 5.85938 8.15625 Length 23, alignment in bytes 0: 16.75 31.7031 5.84375 8.75 Length 23, alignment in bytes 0: 16.7656 30.6562 5.79688 8.375 Length 24, alignment in bytes 0: 17.1719 30.5469 7.46875 9.09375 Length 24, alignment in bytes 0: 17.1875 30 6.75 8.75 Length 25, alignment in bytes 0: 17.6562 32.5469 6.4375 8.84375 Length 25, alignment in bytes 0: 17.7344 30.0312 6.45312 8.95312 Length 26, alignment in bytes 0: 18.375 31.3438 6.46875 8.73438 Length 26, alignment in bytes 0: 18.25 31.3594 6.46875 8.85938 Length 27, alignment in bytes 0: 18.6406 32.2344 6.45312 9.57812 Length 27, alignment in bytes 0: 18.6562 31.75 6.51562 8.96875 Length 28, alignment in bytes 0: 19.3594 32.5625 7.28125 9.67188 Length 28, alignment in bytes 0: 19.4531 32.1406 6.95312 9.60938 Length 29, alignment in bytes 0: 19.9688 33.2344 6.98438 9.46875 Length 29, alignment in bytes 0: 19.7969 32.8594 6.89062 9.51562 Length 30, alignment in bytes 0: 20.3594 34.0938 6.76562 9.57812 Length 30, alignment in bytes 0: 20.2812 33.7812 6.89062 9.65625 Length 31, alignment in bytes 0: 20.8594 38.4062 6.73438 9.84375 Length 31, alignment in bytes 0: 20.9219 37.2031 6.78125 9.70312 Length 32, alignment in bytes 0: 21.5 36.9844 6.46875 10.4375 Length 32, alignment in bytes 1: 21.4375 36.75 6.10938 10.5 Length 64, alignment in bytes 0: 37.7031 64.1562 8.6875 19.8906 Length 64, alignment in bytes 2: 37.75 63.3594 8.40625 19.5938 Length 128, alignment in bytes 0: 70.2031 116.984 13.9219 32.1719 Length 128, alignment in bytes 3: 70.0781 118.141 13.3438 32.1562 Length 256, alignment in bytes 0: 135.078 239.953 27.6094 57 Length 256, alignment in bytes 4: 135.25 221.781 27.8594 57.5 Length 512, alignment in bytes 0: 265.281 460.062 47.2344 107.562 
Length 512, alignment in bytes 5: 265.25 427.344 46.4531 107.875 Length 1024, alignment in bytes 0: 525.344 912.016 86.0469 208.438 Length 1024, alignment in bytes 6: 525.391 832.109 85.75 208.656 Length 2048, alignment in bytes 0: 1045.69 1819.2 163.281 411.562 Length 2048, alignment in bytes 7: 1045.53 1680.5 282.812 731.234 Length 64, alignment in bytes 1: 50.4531 72.625 10.9531 25.7812 Length 64, alignment in bytes 1: 50.5 70.1406 8.29688 17.8906 Length 64, alignment in bytes 2: 37.6562 63.7188 8.15625 17.9219 Length 64, alignment in bytes 2: 37.75 62.5 8.14062 17.9844 Length 64, alignment in bytes 3: 37.9375 60.9062 8.09375 17.9375 Length 64, alignment in bytes 3: 37.5 62 8.28125 18 Length 64, alignment in bytes 4: 37.7969 65 8.07812 18 Length 64, alignment in bytes 4: 37.875 64.5469 7.73438 18.0469 Length 64, alignment in bytes 5: 37.6094 64.3125 7.51562 17.9531 Length 64, alignment in bytes 5: 37.7969 64.8594 7.54688 17.9062 Length 64, alignment in bytes 6: 37.75 64.2969 7.51562 17.9375 Length 64, alignment in bytes 6: 37.5781 64.3594 7.60938 17.9375 Length 64, alignment in bytes 7: 37.6875 64.5156 7.39062 17.9219 Length 64, alignment in bytes 7: 37.6875 64.4531 7.5625 18.0156 Length 0, alignment in bytes 0: 2.21875 6.39062 3.65625 4.03125 Length 0, alignment in bytes 0: 2.01562 6.32812 3.35938 3.875 Length 1, alignment in bytes 0: 2.57812 7.21875 3.125 3.9375 Length 1, alignment in bytes 0: 2.64062 6.82812 3.10938 3.78125 Length 2, alignment in bytes 0: 5.625 7.75 3 3.75 Length 2, alignment in bytes 0: 5.4375 7.5 3.04688 3.75 Length 3, alignment in bytes 0: 6.73438 8.29688 2.92188 3.73438 Length 3, alignment in bytes 0: 9.625 8.125 3 3.67188 Length 4, alignment in bytes 0: 7.64062 8.9375 3.5625 4.9375 Length 4, alignment in bytes 0: 7.75 8.79688 3.28125 4.20312 Length 5, alignment in bytes 0: 7.6875 9.85938 2.98438 4.20312 Length 5, alignment in bytes 0: 7.60938 9.04688 3.04688 4.1875 Length 6, alignment in bytes 0: 8.3125 9.76562 3.17188 4.20312 Length 6, 
alignment in bytes 0: 8.26562 9.64062 3.15625 4.09375 Length 7, alignment in bytes 0: 8.82812 10.9219 3.125 4.0625 Length 7, alignment in bytes 0: 8.53125 10.3125 3.23438 4.10938 Length 8, alignment in bytes 0: 9.53125 12.75 4.375 5.20312 Length 8, alignment in bytes 0: 9.28125 11.7812 4 4.75 Length 9, alignment in bytes 0: 9.78125 12.8594 3.71875 4.90625 Length 9, alignment in bytes 0: 9.71875 12.7656 3.70312 4.76562 Length 10, alignment in bytes 0: 10.0469 13.25 3.70312 4.75 Length 10, alignment in bytes 0: 10.0781 13.25 3.57812 4.57812 Length 11, alignment in bytes 0: 10.7656 14 3.5625 4.76562 Length 11, alignment in bytes 0: 10.6875 13.7969 3.5 4.67188 Length 12, alignment in bytes 0: 11.1719 14.6719 4.6875 6 Length 12, alignment in bytes 0: 11.0938 14.125 4 5.5 Length 13, alignment in bytes 0: 11.8594 15.5312 3.84375 5.40625 Length 13, alignment in bytes 0: 11.75 14.9219 3.67188 5.45312 Length 14, alignment in bytes 0: 12.1094 16.25 3.92188 5.29688 Length 14, alignment in bytes 0: 12.0312 16.0781 3.875 5.5 Length 15, alignment in bytes 0: 12.75 16.9062 3.75 5.42188 Length 15, alignment in bytes 0: 12.75 16.3438 3.70312 5.46875 Length 16, alignment in bytes 0: 13.1406 19.0938 10.0469 6.67188 Length 16, alignment in bytes 0: 13.3594 19.0781 10.1094 6.14062 Length 17, alignment in bytes 0: 13.6094 19.0469 9.60938 6 Length 17, alignment in bytes 0: 13.7031 19.625 9.6875 6.0625 Length 18, alignment in bytes 0: 14.2188 23.8594 9.625 5.96875 Length 18, alignment in bytes 0: 14.2969 23.9531 9.5 5.98438 Length 19, alignment in bytes 0: 14.7031 24.5312 9.60938 5.96875 Length 19, alignment in bytes 0: 14.7656 24.25 9.5625 5.98438 Length 20, alignment in bytes 0: 15.1406 25.6875 4.5625 7.40625 Length 20, alignment in bytes 0: 15.3281 25.7812 4.28125 7.17188 Length 21, alignment in bytes 0: 15.6562 27.2656 4.0625 7.125 Length 21, alignment in bytes 0: 15.8438 27.0938 3.84375 7.03125 Length 22, alignment in bytes 0: 16.1719 27.0938 3.96875 7.14062 Length 22, alignment in 
bytes 0: 16.2344 27 4.01562 7.09375 Length 23, alignment in bytes 0: 16.6562 29.3281 3.98438 7.40625 Length 23, alignment in bytes 0: 16.7344 29.2812 4.0625 7.03125 Length 24, alignment in bytes 0: 17.3438 29.3906 5.75 7.89062 Length 24, alignment in bytes 0: 17.1719 29.2344 5.39062 7.46875 Length 25, alignment in bytes 0: 17.6719 29.5 5.29688 7.375 Length 25, alignment in bytes 0: 17.75 29.5625 5 7.375 Length 26, alignment in bytes 0: 18.2812 30.3594 4.96875 7.34375 Length 26, alignment in bytes 0: 18.1562 30.4062 5.0625 7.5 Length 27, alignment in bytes 0: 18.9062 31.1406 5.09375 7.5 Length 27, alignment in bytes 0: 18.6875 31.0625 5.10938 7.65625 Length 28, alignment in bytes 0: 19.2031 31.7812 9.73438 8.54688 Length 28, alignment in bytes 0: 19.1875 31.75 9.59375 8.25 Length 29, alignment in bytes 0: 19.9375 33.0469 9.64062 8.21875 Length 29, alignment in bytes 0: 19.75 33.25 9.40625 8.125 Length 30, alignment in bytes 0: 20.3125 34.2656 9.34375 8.07812 Length 30, alignment in bytes 0: 20.2969 33.7969 9.20312 8.15625 Length 31, alignment in bytes 0: 20.7812 34.5 10.4375 8.01562 Length 31, alignment in bytes 0: 20.8438 34.5 9.6875 8.04688
diff --git a/sysdeps/powerpc/powerpc32/power4/memset.S b/sysdeps/powerpc/powerpc32/power4/memset.S
index 88110e3..8b746a6 100644
--- a/sysdeps/powerpc/powerpc32/power4/memset.S
+++ b/sysdeps/powerpc/powerpc32/power4/memset.S
@@ -50,7 +50,7 @@ L(_memset):
 /* Align to word boundary. */
 	cmplwi	cr5, rLEN, 31
-	insrdi	rCHR, rCHR, 8, 48	/* Replicate byte to halfword. */
+	insrwi	rCHR, rCHR, 8, 16	/* Replicate byte to halfword. */
 	beq+	L(aligned)
 	mtcrf	0x01, rMEMP0
 	subfic	rALIGN, rALIGN, 4
@@ -65,7 +65,7 @@ L(g0):
 /* Handle the case of size < 31. */
 L(aligned):
 	mtcrf	0x01, rLEN
-	insrdi	rCHR, rCHR, 16, 32	/* Replicate halfword to word. */
+	insrwi	rCHR, rCHR, 16, 0	/* Replicate halfword to word. */
 	ble	cr5, L(medium)
 /* Align to 32-byte boundary. */
 	andi.	rALIGN, rMEMP, 0x1C
diff --git a/sysdeps/powerpc/powerpc32/power6/memset.S b/sysdeps/powerpc/powerpc32/power6/memset.S
index 4b18fa7..445fa44 100644
--- a/sysdeps/powerpc/powerpc32/power6/memset.S
+++ b/sysdeps/powerpc/powerpc32/power6/memset.S
@@ -48,7 +48,7 @@ L(_memset):
 	ble-	cr1, L(small)
 /* Align to word boundary. */
 	cmplwi	cr5, rLEN, 31
-	insrdi	rCHR, rCHR, 8, 48	/* Replicate byte to halfword. */
+	insrwi	rCHR, rCHR, 8, 16	/* Replicate byte to halfword. */
 	beq+	L(aligned)
 	mtcrf	0x01, rMEMP0
 	subfic	rALIGN, rALIGN, 4
@@ -64,7 +64,7 @@ L(g0):
 /* Handle the case of size < 31. */
 L(aligned):
 	mtcrf	0x01, rLEN
-	insrdi	rCHR, rCHR, 16, 32	/* Replicate halfword to word. */
+	insrwi	rCHR, rCHR, 16, 0	/* Replicate halfword to word. */
 	ble	cr5, L(medium)
 /* Align to 32-byte boundary. */
 	andi.	rALIGN, rMEMP, 0x1C
diff --git a/sysdeps/powerpc/powerpc32/power7/memchr.S b/sysdeps/powerpc/powerpc32/power7/memchr.S
index 1d6a0d6..ccdd7cf 100644
--- a/sysdeps/powerpc/powerpc32/power7/memchr.S
+++ b/sysdeps/powerpc/powerpc32/power7/memchr.S
@@ -25,9 +25,9 @@ ENTRY (__memchr)
 	CALL_MCOUNT
 	dcbt	0,r3
 	clrrwi	r8,r3,2
-	insrdi	r4,r4,8,48
+	insrwi	r4,r4,8,16	/* Replicate byte to word. */
 	add	r7,r3,r5	/* Calculate the last acceptable address. */
-	insrdi	r4,r4,16,32
+	insrwi	r4,r4,16,0
 	cmplwi	r5,16
 	li	r9, -1
 	rlwinm	r6,r3,3,27,28	/* Calculate padding. */
diff --git a/sysdeps/powerpc/powerpc32/power7/memrchr.S b/sysdeps/powerpc/powerpc32/power7/memrchr.S
index ebfd540..b05bf32 100644
--- a/sysdeps/powerpc/powerpc32/power7/memrchr.S
+++ b/sysdeps/powerpc/powerpc32/power7/memrchr.S
@@ -32,8 +32,8 @@ ENTRY (__memrchr)
 	dcbt	r9,r6,16	/* Stream hint, decreasing addresses. */
 	/* Replicate BYTE to word. */
-	rldimi	r4,r4,8,48
-	rldimi	r4,r4,16,32
+	insrwi	r4,r4,8,16
+	insrwi	r4,r4,16,0
 	li	r6,-4
 	li	r9,-1
 	rlwinm	r0,r0,3,27,28	/* Calculate padding. */
diff --git a/sysdeps/powerpc/powerpc32/power7/memset.S b/sysdeps/powerpc/powerpc32/power7/memset.S
index ae18761..34fc1ad 100644
--- a/sysdeps/powerpc/powerpc32/power7/memset.S
+++ b/sysdeps/powerpc/powerpc32/power7/memset.S
@@ -35,8 +35,8 @@ L(_memset):
 	cfi_offset(31,-8)
 	/* Replicate byte to word. */
-	insrdi	4,4,8,48
-	insrdi	4,4,16,32
+	insrwi	4,4,8,16
+	insrwi	4,4,16,0
 	ble	cr6,L(small)	/* If length <= 8, use short copy code. */
diff --git a/sysdeps/powerpc/powerpc32/power7/rawmemchr.S b/sysdeps/powerpc/powerpc32/power7/rawmemchr.S
index dec4db0..8ccf186 100644
--- a/sysdeps/powerpc/powerpc32/power7/rawmemchr.S
+++ b/sysdeps/powerpc/powerpc32/power7/rawmemchr.S
@@ -27,8 +27,8 @@ ENTRY (__rawmemchr)
 	clrrwi	r8,r3,2	/* Align the address to word boundary. */
 	/* Replicate byte to word. */
-	rldimi	r4,r4,8,48
-	rldimi	r4,r4,16,32
+	insrwi	r4,r4,8,16
+	insrwi	r4,r4,16,0
 	/* Now r4 has a word of c bytes. */
diff --git a/sysdeps/powerpc/powerpc32/power7/strchr.S b/sysdeps/powerpc/powerpc32/power7/strchr.S
index f7ecb72..d795833 100644
--- a/sysdeps/powerpc/powerpc32/power7/strchr.S
+++ b/sysdeps/powerpc/powerpc32/power7/strchr.S
@@ -35,8 +35,8 @@ ENTRY (strchr)
 	beq	cr7,L(null_match)
 	/* Replicate byte to word. */
-	insrdi	r4,r4,8,48
-	insrdi	r4,r4,16,32
+	insrwi	r4,r4,8,16
+	insrwi	r4,r4,16,0
 	/* Now r4 has a word of c bytes and r0 has a word of null bytes. */
diff --git a/sysdeps/powerpc/powerpc32/power7/strchrnul.S b/sysdeps/powerpc/powerpc32/power7/strchrnul.S
index ece8237..dcc7620 100644
--- a/sysdeps/powerpc/powerpc32/power7/strchrnul.S
+++ b/sysdeps/powerpc/powerpc32/power7/strchrnul.S
@@ -27,8 +27,8 @@ ENTRY (__strchrnul)
 	clrrwi	r8,r3,2	/* Align the address to word boundary. */
 	/* Replicate byte to word. */
-	insrdi	r4,r4,8,48
-	insrdi	r4,r4,16,32
+	insrwi	r4,r4,8,16
+	insrwi	r4,r4,16,0
 	rlwinm	r6,r3,3,27,28	/* Calculate padding. */
 	lwz	r12,0(r8)	/* Load word from memory. */
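
For readers who want to check that the replacements in the patch above are value-preserving, the following is a small C model of the two extended mnemonics. It is only a sketch: the helpers `insrdi`, `insrwi`, `replicate_with_insrdi` and `replicate_with_insrwi` are illustrative functions written for this note, not glibc code, and they model just the insert-from-right behaviour (big-endian bit numbering, low n bits of the source inserted at bit b), not instruction timing or dispatch grouping.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "insrdi rA,rS,n,b": insert the low n bits of rS into rA,
   starting at big-endian bit b of a 64-bit register.  */
static uint64_t insrdi (uint64_t ra, uint64_t rs, unsigned n, unsigned b)
{
  assert (n >= 1 && n < 64 && b + n <= 64);
  unsigned shift = 64 - (b + n);
  uint64_t mask = ((1ULL << n) - 1) << shift;
  return (ra & ~mask) | ((rs << shift) & mask);
}

/* Model of "insrwi rA,rS,n,b": the same operation on a 32-bit word,
   with bit 0 being the most significant bit of that word.  */
static uint32_t insrwi (uint32_t ra, uint32_t rs, unsigned n, unsigned b)
{
  assert (n >= 1 && n < 32 && b + n <= 32);
  unsigned shift = 32 - (b + n);
  uint32_t mask = (uint32_t) (((1ULL << n) - 1) << shift);
  return (ra & ~mask) | ((rs << shift) & mask);
}

/* Old sequence: insrdi r,r,8,48 followed by insrdi r,r,16,32.  */
static uint32_t replicate_with_insrdi (uint8_t c)
{
  uint64_t r = c;
  r = insrdi (r, r, 8, 48);
  r = insrdi (r, r, 16, 32);
  return (uint32_t) r;	/* Only the low word matters in 32-bit code.  */
}

/* New sequence: insrwi r,r,8,16 followed by insrwi r,r,16,0.  */
static uint32_t replicate_with_insrwi (uint8_t c)
{
  uint32_t r = c;
  r = insrwi (r, r, 8, 16);
  r = insrwi (r, r, 16, 0);
  return r;
}
```

Under this model both sequences turn a byte such as 0x41 into the word 0x41414141, so the patch changes which instruction is issued but not the value left in the low 32 bits of the register.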