[0/9] rs6000: Rework rs6000_emit_vector_compare

Message ID	20221124091557.514727-1-linkw@linux.ibm.com
Headers	To: gcc-patches@gcc.gnu.org; Cc: Kewen Lin <linkw@linux.ibm.com>, segher@kernel.crashing.org, dje.gcc@gmail.com, bergner@linux.ibm.com, meissner@linux.ibm.com; Subject: [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare; Date: Thu, 24 Nov 2022 03:15:48 -0600; From: Kewen Lin via Gcc-patches <gcc-patches@gcc.gnu.org>; Reply-To: Kewen Lin <linkw@linux.ibm.com>
Series	rs6000: Rework rs6000_emit_vector_compare

Kewen.Lin Nov. 24, 2022, 9:15 a.m. UTC

Hi,

Following Segher's suggestion, this patch series reworks the
function rs6000_emit_vector_compare for vector float and vector
int in multiple steps.  It is based on the previous attempts [1][2].
As mentioned in [1], the float rework is needed to give vector
float comparison handling one centralized place, instead of
supporting it in a scattered way by swapping operands, reversing
the code, etc.  It also prepares for a subsequent patch that
handles comparison operators with or without trapping math
(PR105480).  With the vector float handling reworked, we can
further simplify the vector int handling, as shown.

To address Segher's concern about whether this rework causes any
assembly change, I previously constructed two test cases, one for
vector float [3] and one for vector int [4].  They showed that most
of the generated code is unchanged, except for LE and UNGT.  That
difference is an improvement, since the new code uses GE instead
of GT ior EQ.  The associated test case in patch 3/9 is a good
example.
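
To make the LE/UNGT point concrete, here is a minimal per-lane
sketch in plain C.  It is only a scalar model of the identities
involved, not the RTL expansion in rs6000.cc, and all function
names in it are hypothetical.  It assumes IEEE semantics (so no
-ffast-math) and checks that LE built from "GT ior EQ" matches a
single GE with swapped operands, and that UNGT is the complement
of that GE.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Two-compare form: a <= b as (b > a) ior (a == b).  */
bool le_via_gt_ior_eq(float a, float b) { return (b > a) || (a == b); }

/* One-compare form: a <= b as b >= a (GE with swapped operands).  */
bool le_via_ge(float a, float b) { return b >= a; }

/* UNGT (unordered or greater than) is the reverse of LE, so it is
   the complement of the same single GE.  */
bool ungt_via_ge(float a, float b) { return !(b >= a); }

/* Return 1 if every form agrees with the reference semantics on a
   small set of inputs that includes infinity and NaN, else 0.  */
int forms_agree(void)
{
  const float vals[] = { -1.0f, 0.0f, 1.0f, INFINITY, NAN };
  const size_t n = sizeof vals / sizeof vals[0];

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      {
        float a = vals[i], b = vals[j];
        bool le_ref = a <= b;
        bool ungt_ref = isunordered(a, b) || a > b;

        if (le_via_gt_ior_eq(a, b) != le_ref)
          return 0;
        if (le_via_ge(a, b) != le_ref)
          return 0;
        if (ungt_via_ge(a, b) != ungt_ref)
          return 0;
      }
  return 1;
}

/* Convenience wrapper that aborts if the forms ever disagree.  */
void check_forms(void) { assert(forms_agree()); }
```

Calling check_forms() from a small driver is enough to exercise
it.  The single-GE forms need one compare where the older form
needs two compares plus an ior, which is where the insn count
difference comes from.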

Besides, I built the whole of SPEC2017 with and without the
patch series, at -O3 and at -Ofast separately, and compared the
object assembly.  Most object files are unchanged, except
for:

  * at -O3, 521.wrf_r has 9 object files and 526.blender_r has
    9 object files with differences.

  * at -Ofast, 521.wrf_r has 12 object files, 526.blender_r has
    one and 527.cam4_r has 4 object files with differences.

Looking into these differences, all the significant ones are
caused by the known improvement mentioned above, transforming
GT ior EQ to GE, which can also affect unrolling decisions due
to the changed insn count.  The remaining differences are
trivial: branch target offsets, nops for alignment, VSX
register numbers, etc.

I also evaluated the runtime performance of these changed
benchmarks; the result is neutral.

These patches are bootstrapped and regress-tested
incrementally on powerpc64-linux-gnu P7 & P8, and
powerpc64le-linux-gnu P9 & P10.

Is it ok for trunk?

BR,
Kewen
-----
[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606375.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606376.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606504.html
[4] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606506.html

Kewen Lin (9):
  rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1
  rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p2
  rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p3
  rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p4
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p1
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p2
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p3
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p4
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p5

 gcc/config/rs6000/rs6000.cc                 | 180 ++++++--------------
 gcc/testsuite/gcc.target/powerpc/vcond-fp.c |  25 +++
 2 files changed, 74 insertions(+), 131 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vcond-fp.c

Kewen.Lin Dec. 14, 2022, 11:23 a.m. UTC | #1

Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen


Kewen.Lin May 17, 2023, 6:26 a.m. UTC | #2

Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen


Kewen.Lin June 15, 2023, 6:38 a.m. UTC | #3

Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen


Michael Meissner July 6, 2023, 9:54 p.m. UTC | #4

I get the following warning which prevents gcc from bootstrapping due to
-Werror:

/home/meissner/fsf-src/work124-sfsplat/gcc/config/rs6000/rs6000-p10sfopt.cc: In function ‘void {anonymous}::process_chain_from_load(gimple*)’:
/home/meissner/fsf-src/work124-sfsplat/gcc/config/rs6000/rs6000-p10sfopt.cc:505:30: warning: zero-length gcc_dump_printf format string [-Wformat-zero-length]
  505 |       dump_printf (MSG_NOTE, "");
      |                              ^~

I just commented out the dump_printf call.

Kewen.Lin Aug. 7, 2023, 10:05 a.m. UTC | #5

Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen


Kewen.Lin Oct. 25, 2023, 2:47 a.m. UTC | #6

Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen


Kewen.Lin Nov. 8, 2023, 2:50 a.m. UTC | #7

Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen


Kewen.Lin Dec. 4, 2023, 9:50 a.m. UTC | #8

Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen


Kewen.Lin Dec. 12, 2023, 6:08 a.m. UTC | #9

Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen

>>>>>>> on 2022/11/24 17:15, Kewen Lin wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Following Segher's suggestion, this patch series is to rework
>>>>>>>> function rs6000_emit_vector_compare for vector float and int
>>>>>>>> in multiple steps, it's based on the previous attempts [1][2].
>>>>>>>> As mentioned in [1], the need to rework this for float is to
>>>>>>>> make a centralized place for vector float comparison handlings
>>>>>>>> instead of supporting with swapping ops and reversing code etc.
>>>>>>>> dispersedly.  It's also for a subsequent patch to handle
>>>>>>>> comparison operators with or without trapping math (PR105480).
>>>>>>>> With the handling on vector float reworked, we can further make
>>>>>>>> the handling on vector int simplified as shown.
>>>>>>>>
>>>>>>>> For Segher's concern about whether this rework causes any
>>>>>>>> assembly change, I constructed two testcases for vector float[3]
>>>>>>>> and int[4] respectively before; they showed that most are fine
>>>>>>>> except for the difference on LE and UNGT, which is demonstrated
>>>>>>>> as an improvement since it uses GE instead of GT ior EQ.  The
>>>>>>>> associated test case in patch 3/9 is a good example.
>>>>>>>>
>>>>>>>> Besides, w/ and w/o the whole patch series, I built the whole
>>>>>>>> SPEC2017 at options -O3 and -Ofast separately, checked the
>>>>>>>> differences on object assembly.  The result showed that the
>>>>>>>> most are unchanged, except for:
>>>>>>>>
>>>>>>>>   * at -O3, 521.wrf_r has 9 object files and 526.blender_r has
>>>>>>>>     9 object files with differences.
>>>>>>>>
>>>>>>>>   * at -Ofast, 521.wrf_r has 12 object files, 526.blender_r has
>>>>>>>>     one and 527.cam4_r has 4 object files with differences.
>>>>>>>>
>>>>>>>> By looking into these differences, all significant differences
>>>>>>>> are caused by the known improvement mentioned above transforming
>>>>>>>> GT ior EQ to GE, which can also affect unrolling decision due
>>>>>>>> to insn count.  Some other trivial differences are branch
>>>>>>>> target offset difference, nop difference for alignment, vsx
>>>>>>>> register number differences etc.
>>>>>>>>
>>>>>>>> I also evaluated the runtime performance for these changed
>>>>>>>> benchmarks, the result is neutral.
>>>>>>>>
>>>>>>>> These patches are bootstrapped and regress-tested
>>>>>>>> incrementally on powerpc64-linux-gnu P7 & P8, and
>>>>>>>> powerpc64le-linux-gnu P9 & P10.
>>>>>>>>
>>>>>>>> Is it ok for trunk?
>>>>>>>>
>>>>>>>> BR,
>>>>>>>> Kewen
>>>>>>>> -----
>>>>>>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606375.html
>>>>>>>> [2] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606376.html
>>>>>>>> [3] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606504.html
>>>>>>>> [4] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606506.html
>>>>>>>>
>>>>>>>> Kewen Lin (9):
>>>>>>>>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1
>>>>>>>>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p2
>>>>>>>>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p3
>>>>>>>>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p4
>>>>>>>>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p1
>>>>>>>>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p2
>>>>>>>>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p3
>>>>>>>>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p4
>>>>>>>>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p5
>>>>>>>>
>>>>>>>>  gcc/config/rs6000/rs6000.cc                 | 180 ++++++--------------
>>>>>>>>  gcc/testsuite/gcc.target/powerpc/vcond-fp.c |  25 +++
>>>>>>>>  2 files changed, 74 insertions(+), 131 deletions(-)
>>>>>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vcond-fp.c
>>>>>>>>
>>
