Message ID | CAAgBjMnEDgMYgjg7vXvQTWNfG6nn-wEbOuR3iwMD1PwmazBmDQ@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | [ARM] PR66791: Replace builtins in vshl_n | expand |
On 22/07/2021 08:45, Prathamesh Kulkarni via Gcc-patches wrote: > Hi, > The attached patch removes calls to builtins from vshl_n intrinsics, > and replacing them > with left shift operator. The patch passes bootstrap+test on > arm-linux-gnueabihf. > > Altho, I noticed, that the patch causes 3 extra registers to spill > using << instead > of the builtin for vshl_n.c. Could that be perhaps due to inlining of > intrinsics ? > Before patch, the shift operation was performed by call to > __builtin_neon_vshl<type> (__a, __b) > and now it's inlined to __a << __b, which might result in increased > register pressure ? > > Thanks, > Prathamesh > You're missing a ChangeLog for the patch. However, I'm not sure about this. The register shift form of VSHL performs a right shift if the value is negative, which is UB if you write `<<` instead. Have I missed something here? R.
On Thu, 22 Jul 2021 at 16:03, Richard Earnshaw <Richard.Earnshaw@foss.arm.com> wrote: > > > > On 22/07/2021 08:45, Prathamesh Kulkarni via Gcc-patches wrote: > > Hi, > > The attached patch removes calls to builtins from vshl_n intrinsics, > > and replacing them > > with left shift operator. The patch passes bootstrap+test on > > arm-linux-gnueabihf. > > > > Altho, I noticed, that the patch causes 3 extra registers to spill > > using << instead > > of the builtin for vshl_n.c. Could that be perhaps due to inlining of > > intrinsics ? > > Before patch, the shift operation was performed by call to > > __builtin_neon_vshl<type> (__a, __b) > > and now it's inlined to __a << __b, which might result in increased > > register pressure ? > > > > Thanks, > > Prathamesh > > > > > You're missing a ChangeLog for the patch. Sorry, updated in this patch. > > However, I'm not sure about this. The register shift form of VSHL > performs a right shift if the value is negative, which is UB if you > write `<<` instead. > > Have I missed something here? Hi Richard, According to this article: https://developer.arm.com/documentation/den0018/a/NEON-Intrinsics-Reference/Shift/VSHL-N For vshl_n, the shift amount is always in the non-negative range for all types. I tried using vshl_n_s32 (a, -1), and the compiler emitted following diagnostic: foo.c: In function ‘main’: foo.c:17:1: error: constant -1 out of range 0 - 31 17 | } | ^ So, is the attached patch OK ? Thanks, Prathamesh > > R. 2021-07-22 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> PR target/66791 * config/arm/arm_neon.h (vshl_n_s8): Replace call to builtin with left shift operator. (vshl_n_s16): Likewise. (vshl_n_s32): Likewise. (vshl_n_s64): Likewise. (vshl_n_u8): Likewise. (vshl_n_u16): Likewise. (vshl_n_u32): Likewise. (vshl_n_u64): Likewise. (vshlq_n_s8): Likewise. (vshlq_n_s16): Likewise. (vshlq_n_s32): Likewise. (vshlq_n_s64): Likewise. (vshlq_n_u8): Likewise. (vshlq_n_u16): Likewise. (vshlq_n_u32): Likewise. 
(vshlq_n_u64): Likewise. * config/arm/arm_neon_builtins.def (vshl_n): Remove entry. diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 41b596b5fc6..f5c85eb43e7 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -4887,112 +4887,112 @@ __extension__ extern __inline int8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_s8 (int8x8_t __a, const int __b) { - return (int8x8_t)__builtin_neon_vshl_nv8qi (__a, __b); + return __a << __b; } __extension__ extern __inline int16x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_s16 (int16x4_t __a, const int __b) { - return (int16x4_t)__builtin_neon_vshl_nv4hi (__a, __b); + return __a << __b; } __extension__ extern __inline int32x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_s32 (int32x2_t __a, const int __b) { - return (int32x2_t)__builtin_neon_vshl_nv2si (__a, __b); + return __a << __b; } __extension__ extern __inline int64x1_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_s64 (int64x1_t __a, const int __b) { - return (int64x1_t)__builtin_neon_vshl_ndi (__a, __b); + return __a << __b; } __extension__ extern __inline uint8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_u8 (uint8x8_t __a, const int __b) { - return (uint8x8_t)__builtin_neon_vshl_nv8qi ((int8x8_t) __a, __b); + return __a << __b; } __extension__ extern __inline uint16x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_u16 (uint16x4_t __a, const int __b) { - return (uint16x4_t)__builtin_neon_vshl_nv4hi ((int16x4_t) __a, __b); + return __a << __b; } __extension__ extern __inline uint32x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_u32 (uint32x2_t __a, const int __b) { - return (uint32x2_t)__builtin_neon_vshl_nv2si ((int32x2_t) __a, __b); + return __a << __b; } __extension__ extern __inline uint64x1_t __attribute__ 
((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_u64 (uint64x1_t __a, const int __b) { - return (uint64x1_t)__builtin_neon_vshl_ndi ((int64x1_t) __a, __b); + return __a << __b; } __extension__ extern __inline int8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshlq_n_s8 (int8x16_t __a, const int __b) { - return (int8x16_t)__builtin_neon_vshl_nv16qi (__a, __b); + return __a << __b; } __extension__ extern __inline int16x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshlq_n_s16 (int16x8_t __a, const int __b) { - return (int16x8_t)__builtin_neon_vshl_nv8hi (__a, __b); + return __a << __b; } __extension__ extern __inline int32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshlq_n_s32 (int32x4_t __a, const int __b) { - return (int32x4_t)__builtin_neon_vshl_nv4si (__a, __b); + return __a << __b; } __extension__ extern __inline int64x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshlq_n_s64 (int64x2_t __a, const int __b) { - return (int64x2_t)__builtin_neon_vshl_nv2di (__a, __b); + return __a << __b; } __extension__ extern __inline uint8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshlq_n_u8 (uint8x16_t __a, const int __b) { - return (uint8x16_t)__builtin_neon_vshl_nv16qi ((int8x16_t) __a, __b); + return __a << __b; } __extension__ extern __inline uint16x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshlq_n_u16 (uint16x8_t __a, const int __b) { - return (uint16x8_t)__builtin_neon_vshl_nv8hi ((int16x8_t) __a, __b); + return __a << __b; } __extension__ extern __inline uint32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshlq_n_u32 (uint32x4_t __a, const int __b) { - return (uint32x4_t)__builtin_neon_vshl_nv4si ((int32x4_t) __a, __b); + return __a << __b; } __extension__ extern __inline uint64x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshlq_n_u64 
(uint64x2_t __a, const int __b) { - return (uint64x2_t)__builtin_neon_vshl_nv2di ((int64x2_t) __a, __b); + return __a << __b; } __extension__ extern __inline int8x8_t diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def index 70438ac1848..ea6bd43a035 100644 --- a/gcc/config/arm/arm_neon_builtins.def +++ b/gcc/config/arm/arm_neon_builtins.def @@ -103,7 +103,6 @@ VAR3 (BINOP_IMM, vqrshrns_n, v8hi, v4si, v2di) VAR3 (BINOP_IMM, vqrshrnu_n, v8hi, v4si, v2di) VAR3 (BINOP_IMM, vqshrun_n, v8hi, v4si, v2di) VAR3 (BINOP_IMM, vqrshrun_n, v8hi, v4si, v2di) -VAR8 (BINOP_IMM, vshl_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di) VAR8 (BINOP_IMM, vqshl_s_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di) VAR8 (BINOP_IMM, vqshl_u_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di) VAR8 (BINOP_IMM, vqshlu_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
On 22/07/2021 12:32, Prathamesh Kulkarni wrote: > On Thu, 22 Jul 2021 at 16:03, Richard Earnshaw > <Richard.Earnshaw@foss.arm.com> wrote: >> >> >> >> On 22/07/2021 08:45, Prathamesh Kulkarni via Gcc-patches wrote: >>> Hi, >>> The attached patch removes calls to builtins from vshl_n intrinsics, >>> and replacing them >>> with left shift operator. The patch passes bootstrap+test on >>> arm-linux-gnueabihf. >>> >>> Altho, I noticed, that the patch causes 3 extra registers to spill >>> using << instead >>> of the builtin for vshl_n.c. Could that be perhaps due to inlining of >>> intrinsics ? >>> Before patch, the shift operation was performed by call to >>> __builtin_neon_vshl<type> (__a, __b) >>> and now it's inlined to __a << __b, which might result in increased >>> register pressure ? >>> >>> Thanks, >>> Prathamesh >>> >> >> >> You're missing a ChangeLog for the patch. > Sorry, updated in this patch. >> >> However, I'm not sure about this. The register shift form of VSHL >> performs a right shift if the value is negative, which is UB if you >> write `<<` instead. >> >> Have I missed something here? > Hi Richard, > According to this article: > https://developer.arm.com/documentation/den0018/a/NEON-Intrinsics-Reference/Shift/VSHL-N > For vshl_n, the shift amount is always in the non-negative range for all types. > > I tried using vshl_n_s32 (a, -1), and the compiler emitted following diagnostic: > foo.c: In function ‘main’: > foo.c:17:1: error: constant -1 out of range 0 - 31 > 17 | } > | ^ > It does do that now, but that's because the intrinsic expansion does some bounds checking; when you remove the call into the back-end intrinsic that will no-longer happen. I think with this change various things are likely: - We'll no-longer reject non-immediate values, so users will be able to write int b = 5; vshl_n_s32 (a, b); which will expand to a vdup followed by the register form. 
- we'll rely on the front-end diagnosing out-of range shifts - code of the form int b = -1; vshl_n_s32 (a, b); will probably now go through without any errors, especially at low optimization levels. It may end up doing what the user wanted, but it's definitely a change in behaviour - and perhaps worse, the compiler might diagnose the above as UB and silently throw some stuff away. It might be that we need to insert some form of static assertion that the second argument is a __builtin_constant_p(). R. > So, is the attached patch OK ? > > Thanks, > Prathamesh >> >> R.
On Thu, 22 Jul 2021 at 17:28, Richard Earnshaw <Richard.Earnshaw@foss.arm.com> wrote: > > > > On 22/07/2021 12:32, Prathamesh Kulkarni wrote: > > On Thu, 22 Jul 2021 at 16:03, Richard Earnshaw > > <Richard.Earnshaw@foss.arm.com> wrote: > >> > >> > >> > >> On 22/07/2021 08:45, Prathamesh Kulkarni via Gcc-patches wrote: > >>> Hi, > >>> The attached patch removes calls to builtins from vshl_n intrinsics, > >>> and replacing them > >>> with left shift operator. The patch passes bootstrap+test on > >>> arm-linux-gnueabihf. > >>> > >>> Altho, I noticed, that the patch causes 3 extra registers to spill > >>> using << instead > >>> of the builtin for vshl_n.c. Could that be perhaps due to inlining of > >>> intrinsics ? > >>> Before patch, the shift operation was performed by call to > >>> __builtin_neon_vshl<type> (__a, __b) > >>> and now it's inlined to __a << __b, which might result in increased > >>> register pressure ? > >>> > >>> Thanks, > >>> Prathamesh > >>> > >> > >> > >> You're missing a ChangeLog for the patch. > > Sorry, updated in this patch. > >> > >> However, I'm not sure about this. The register shift form of VSHL > >> performs a right shift if the value is negative, which is UB if you > >> write `<<` instead. > >> > >> Have I missed something here? > > Hi Richard, > > According to this article: > > https://developer.arm.com/documentation/den0018/a/NEON-Intrinsics-Reference/Shift/VSHL-N > > For vshl_n, the shift amount is always in the non-negative range for all types. > > > > I tried using vshl_n_s32 (a, -1), and the compiler emitted following diagnostic: > > foo.c: In function ‘main’: > > foo.c:17:1: error: constant -1 out of range 0 - 31 > > 17 | } > > | ^ > > > > It does do that now, but that's because the intrinsic expansion does > some bounds checking; when you remove the call into the back-end > intrinsic that will no-longer happen. 
> > I think with this change various things are likely: > > - We'll no-longer reject non-immediate values, so users will be able to > write > > int b = 5; > vshl_n_s32 (a, b); > > which will expand to a vdup followed by the register form. > > - we'll rely on the front-end diagnosing out-of range shifts > > - code of the form > > int b = -1; > vshl_n_s32 (a, b); > > will probably now go through without any errors, especially at low > optimization levels. It may end up doing what the user wanted, but it's > definitely a change in behaviour - and perhaps worse, the compiler might > diagnose the above as UB and silently throw some stuff away. > > It might be that we need to insert some form of static assertion that > the second argument is a __builtin_constant_p(). Ah right, thanks for the suggestions! I tried the above example: int b = -1; vshl_n_s32 (a, b); and it compiled without any errors with -O0 after patch. Would it be OK to use _Static_assert (__builtin_constant_p (b)) to guard against non-immediate values ? With the following change: __extension__ extern __inline int32x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_s32 (int32x2_t __a, const int __b) { _Static_assert (__builtin_constant_p (__b)); return __a << __b; } the above example fails at -O0: ../armhf-build/gcc/include/arm_neon.h: In function ‘vshl_n_s32’: ../armhf-build/gcc/include/arm_neon.h:4904:3: error: static assertion failed 4904 | _Static_assert (__builtin_constant_p (__b)); | ^~~~~~~~~~~~~~ Thanks, Prathamesh > > R. > > > So, is the attached patch OK ? > > > > > Thanks, > > Prathamesh > >> > >> R.
On 22/07/2021 14:47, Prathamesh Kulkarni via Gcc-patches wrote: > On Thu, 22 Jul 2021 at 17:28, Richard Earnshaw > <Richard.Earnshaw@foss.arm.com> wrote: >> >> >> >> On 22/07/2021 12:32, Prathamesh Kulkarni wrote: >>> On Thu, 22 Jul 2021 at 16:03, Richard Earnshaw >>> <Richard.Earnshaw@foss.arm.com> wrote: >>>> >>>> >>>> >>>> On 22/07/2021 08:45, Prathamesh Kulkarni via Gcc-patches wrote: >>>>> Hi, >>>>> The attached patch removes calls to builtins from vshl_n intrinsics, >>>>> and replacing them >>>>> with left shift operator. The patch passes bootstrap+test on >>>>> arm-linux-gnueabihf. >>>>> >>>>> Altho, I noticed, that the patch causes 3 extra registers to spill >>>>> using << instead >>>>> of the builtin for vshl_n.c. Could that be perhaps due to inlining of >>>>> intrinsics ? >>>>> Before patch, the shift operation was performed by call to >>>>> __builtin_neon_vshl<type> (__a, __b) >>>>> and now it's inlined to __a << __b, which might result in increased >>>>> register pressure ? >>>>> >>>>> Thanks, >>>>> Prathamesh >>>>> >>>> >>>> >>>> You're missing a ChangeLog for the patch. >>> Sorry, updated in this patch. >>>> >>>> However, I'm not sure about this. The register shift form of VSHL >>>> performs a right shift if the value is negative, which is UB if you >>>> write `<<` instead. >>>> >>>> Have I missed something here? >>> Hi Richard, >>> According to this article: >>> https://developer.arm.com/documentation/den0018/a/NEON-Intrinsics-Reference/Shift/VSHL-N >>> For vshl_n, the shift amount is always in the non-negative range for all types. >>> >>> I tried using vshl_n_s32 (a, -1), and the compiler emitted following diagnostic: >>> foo.c: In function ‘main’: >>> foo.c:17:1: error: constant -1 out of range 0 - 31 >>> 17 | } >>> | ^ >>> >> >> It does do that now, but that's because the intrinsic expansion does >> some bounds checking; when you remove the call into the back-end >> intrinsic that will no-longer happen. 
>> >> I think with this change various things are likely: >> >> - We'll no-longer reject non-immediate values, so users will be able to >> write >> >> int b = 5; >> vshl_n_s32 (a, b); >> >> which will expand to a vdup followed by the register form. >> >> - we'll rely on the front-end diagnosing out-of range shifts >> >> - code of the form >> >> int b = -1; >> vshl_n_s32 (a, b); >> >> will probably now go through without any errors, especially at low >> optimization levels. It may end up doing what the user wanted, but it's >> definitely a change in behaviour - and perhaps worse, the compiler might >> diagnose the above as UB and silently throw some stuff away. >> >> It might be that we need to insert some form of static assertion that >> the second argument is a __builtin_constant_p(). > Ah right, thanks for the suggestions! > I tried the above example: > int b = -1; > vshl_n_s32 (a, b); > and it compiled without any errors with -O0 after patch. > > Would it be OK to use _Static_assert (__builtin_constant_p (b)) to > guard against non-immediate values ? > > With the following change: > __extension__ extern __inline int32x2_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vshl_n_s32 (int32x2_t __a, const int __b) > { > _Static_assert (__builtin_constant_p (__b)); > return __a << __b; > } > > the above example fails at -O0: > ../armhf-build/gcc/include/arm_neon.h: In function ‘vshl_n_s32’: > ../armhf-build/gcc/include/arm_neon.h:4904:3: error: static assertion failed > 4904 | _Static_assert (__builtin_constant_p (__b)); > | ^~~~~~~~~~~~~~ I've been playing with that but unfortunately it doesn't seem to work in the way we want it to. 
For a complete test: typedef __simd64_int32_t int32x2_t; __extension__ extern __inline int32x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_s32 (int32x2_t __a, const int __b) { _Static_assert (__builtin_constant_p (__b), "Second argument must be a litteral constant"); return __a << __b; } int32x2_t f (int32x2_t x, const int b) { return vshl_n_s32 (x, 1); } At -O0 I get: test.c: In function ‘vshl_n_s32’: test.c:7:3: error: static assertion failed: "Second argument must be a litteral constant" 7 | _Static_assert (__builtin_constant_p (__b), "Second argument must be a litteral constant"); | ^~~~~~~~~~~~~~ While at -O1 and above I get: test.c: In function ‘vshl_n_s32’: test.c:7:19: error: expression in static assertion is not constant 7 | _Static_assert (__builtin_constant_p (__b), "Second argument must be a litteral constant"); | ^~~~~~~~~~~~~~~~~~~~~~~~~~ Which indicates that it doesn't consider __builtin_constant_p() to be a constant expression :( So either I'm writing the static assertion incorrectly, or something weird is going on. The most likely issue is that the static assertion is being processed too early, before the function is inlined. R. > > Thanks, > Prathamesh >> >> R. >> >>> So, is the attached patch OK ? >> >>> >>> Thanks, >>> Prathamesh >>>> >>>> R.
On Thu, 22 Jul 2021 at 20:29, Richard Earnshaw <Richard.Earnshaw@foss.arm.com> wrote: > > > > On 22/07/2021 14:47, Prathamesh Kulkarni via Gcc-patches wrote: > > On Thu, 22 Jul 2021 at 17:28, Richard Earnshaw > > <Richard.Earnshaw@foss.arm.com> wrote: > >> > >> > >> > >> On 22/07/2021 12:32, Prathamesh Kulkarni wrote: > >>> On Thu, 22 Jul 2021 at 16:03, Richard Earnshaw > >>> <Richard.Earnshaw@foss.arm.com> wrote: > >>>> > >>>> > >>>> > >>>> On 22/07/2021 08:45, Prathamesh Kulkarni via Gcc-patches wrote: > >>>>> Hi, > >>>>> The attached patch removes calls to builtins from vshl_n intrinsics, > >>>>> and replacing them > >>>>> with left shift operator. The patch passes bootstrap+test on > >>>>> arm-linux-gnueabihf. > >>>>> > >>>>> Altho, I noticed, that the patch causes 3 extra registers to spill > >>>>> using << instead > >>>>> of the builtin for vshl_n.c. Could that be perhaps due to inlining of > >>>>> intrinsics ? > >>>>> Before patch, the shift operation was performed by call to > >>>>> __builtin_neon_vshl<type> (__a, __b) > >>>>> and now it's inlined to __a << __b, which might result in increased > >>>>> register pressure ? > >>>>> > >>>>> Thanks, > >>>>> Prathamesh > >>>>> > >>>> > >>>> > >>>> You're missing a ChangeLog for the patch. > >>> Sorry, updated in this patch. > >>>> > >>>> However, I'm not sure about this. The register shift form of VSHL > >>>> performs a right shift if the value is negative, which is UB if you > >>>> write `<<` instead. > >>>> > >>>> Have I missed something here? > >>> Hi Richard, > >>> According to this article: > >>> https://developer.arm.com/documentation/den0018/a/NEON-Intrinsics-Reference/Shift/VSHL-N > >>> For vshl_n, the shift amount is always in the non-negative range for all types. 
> >>> > >>> I tried using vshl_n_s32 (a, -1), and the compiler emitted following diagnostic: > >>> foo.c: In function ‘main’: > >>> foo.c:17:1: error: constant -1 out of range 0 - 31 > >>> 17 | } > >>> | ^ > >>> > >> > >> It does do that now, but that's because the intrinsic expansion does > >> some bounds checking; when you remove the call into the back-end > >> intrinsic that will no-longer happen. > >> > >> I think with this change various things are likely: > >> > >> - We'll no-longer reject non-immediate values, so users will be able to > >> write > >> > >> int b = 5; > >> vshl_n_s32 (a, b); > >> > >> which will expand to a vdup followed by the register form. > >> > >> - we'll rely on the front-end diagnosing out-of range shifts > >> > >> - code of the form > >> > >> int b = -1; > >> vshl_n_s32 (a, b); > >> > >> will probably now go through without any errors, especially at low > >> optimization levels. It may end up doing what the user wanted, but it's > >> definitely a change in behaviour - and perhaps worse, the compiler might > >> diagnose the above as UB and silently throw some stuff away. > >> > >> It might be that we need to insert some form of static assertion that > >> the second argument is a __builtin_constant_p(). > > Ah right, thanks for the suggestions! > > I tried the above example: > > int b = -1; > > vshl_n_s32 (a, b); > > and it compiled without any errors with -O0 after patch. > > > > Would it be OK to use _Static_assert (__builtin_constant_p (b)) to > > guard against non-immediate values ? 
> > > > With the following change: > > __extension__ extern __inline int32x2_t > > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > > vshl_n_s32 (int32x2_t __a, const int __b) > > { > > _Static_assert (__builtin_constant_p (__b)); > > return __a << __b; > > } > > > > the above example fails at -O0: > > ../armhf-build/gcc/include/arm_neon.h: In function ‘vshl_n_s32’: > > ../armhf-build/gcc/include/arm_neon.h:4904:3: error: static assertion failed > > 4904 | _Static_assert (__builtin_constant_p (__b)); > > | ^~~~~~~~~~~~~~ > > I've been playing with that but unfortunately it doesn't seem to work in > the way we want it to. For a complete test: > > > > typedef __simd64_int32_t int32x2_t; > > __extension__ extern __inline int32x2_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vshl_n_s32 (int32x2_t __a, const int __b) > { > _Static_assert (__builtin_constant_p (__b), "Second argument must be > a litteral constant"); > return __a << __b; > } > > int32x2_t f (int32x2_t x, const int b) > { > return vshl_n_s32 (x, 1); > } > > At -O0 I get: > > test.c: In function ‘vshl_n_s32’: > test.c:7:3: error: static assertion failed: "Second argument must be a > litteral constant" > 7 | _Static_assert (__builtin_constant_p (__b), "Second argument > must be a litteral constant"); > | ^~~~~~~~~~~~~~ > > While at -O1 and above I get: > > > test.c: In function ‘vshl_n_s32’: > test.c:7:19: error: expression in static assertion is not constant > 7 | _Static_assert (__builtin_constant_p (__b), "Second argument > must be a litteral constant"); > | ^~~~~~~~~~~~~~~~~~~~~~~~~~ > > Which indicates that it doesn't consider __builtin_constant_p() to be a > constant expression :( > > So either I'm writing the static assertion incorrectly, or something > weird is going on. The most likely issue is that the static assertion > is being processed too early, before the function is inlined. Ah indeed. 
I wonder if we should add an attribute to parameter that it should be constant, and emit an error if the caller passes non-constant value ? sth like: void foo(int x __attribute__((runtime_constant))); and the front-end can then diagnose if the argument is __builtin_constant_p while type-checking call to foo. Thanks, Prathamesh > > R. > > > > > Thanks, > > Prathamesh > >> > >> R. > >> > >>> So, is the attached patch OK ? > >> > >>> > >>> Thanks, > >>> Prathamesh > >>>> > >>>> R.
On 23/07/2021 08:04, Prathamesh Kulkarni via Gcc-patches wrote: > On Thu, 22 Jul 2021 at 20:29, Richard Earnshaw > <Richard.Earnshaw@foss.arm.com> wrote: >> >> >> >> On 22/07/2021 14:47, Prathamesh Kulkarni via Gcc-patches wrote: >>> On Thu, 22 Jul 2021 at 17:28, Richard Earnshaw >>> <Richard.Earnshaw@foss.arm.com> wrote: >>>> >>>> >>>> >>>> On 22/07/2021 12:32, Prathamesh Kulkarni wrote: >>>>> On Thu, 22 Jul 2021 at 16:03, Richard Earnshaw >>>>> <Richard.Earnshaw@foss.arm.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 22/07/2021 08:45, Prathamesh Kulkarni via Gcc-patches wrote: >>>>>>> Hi, >>>>>>> The attached patch removes calls to builtins from vshl_n intrinsics, >>>>>>> and replacing them >>>>>>> with left shift operator. The patch passes bootstrap+test on >>>>>>> arm-linux-gnueabihf. >>>>>>> >>>>>>> Altho, I noticed, that the patch causes 3 extra registers to spill >>>>>>> using << instead >>>>>>> of the builtin for vshl_n.c. Could that be perhaps due to inlining of >>>>>>> intrinsics ? >>>>>>> Before patch, the shift operation was performed by call to >>>>>>> __builtin_neon_vshl<type> (__a, __b) >>>>>>> and now it's inlined to __a << __b, which might result in increased >>>>>>> register pressure ? >>>>>>> >>>>>>> Thanks, >>>>>>> Prathamesh >>>>>>> >>>>>> >>>>>> >>>>>> You're missing a ChangeLog for the patch. >>>>> Sorry, updated in this patch. >>>>>> >>>>>> However, I'm not sure about this. The register shift form of VSHL >>>>>> performs a right shift if the value is negative, which is UB if you >>>>>> write `<<` instead. >>>>>> >>>>>> Have I missed something here? >>>>> Hi Richard, >>>>> According to this article: >>>>> https://developer.arm.com/documentation/den0018/a/NEON-Intrinsics-Reference/Shift/VSHL-N >>>>> For vshl_n, the shift amount is always in the non-negative range for all types. 
>>>>> >>>>> I tried using vshl_n_s32 (a, -1), and the compiler emitted following diagnostic: >>>>> foo.c: In function ‘main’: >>>>> foo.c:17:1: error: constant -1 out of range 0 - 31 >>>>> 17 | } >>>>> | ^ >>>>> >>>> >>>> It does do that now, but that's because the intrinsic expansion does >>>> some bounds checking; when you remove the call into the back-end >>>> intrinsic that will no-longer happen. >>>> >>>> I think with this change various things are likely: >>>> >>>> - We'll no-longer reject non-immediate values, so users will be able to >>>> write >>>> >>>> int b = 5; >>>> vshl_n_s32 (a, b); >>>> >>>> which will expand to a vdup followed by the register form. >>>> >>>> - we'll rely on the front-end diagnosing out-of range shifts >>>> >>>> - code of the form >>>> >>>> int b = -1; >>>> vshl_n_s32 (a, b); >>>> >>>> will probably now go through without any errors, especially at low >>>> optimization levels. It may end up doing what the user wanted, but it's >>>> definitely a change in behaviour - and perhaps worse, the compiler might >>>> diagnose the above as UB and silently throw some stuff away. >>>> >>>> It might be that we need to insert some form of static assertion that >>>> the second argument is a __builtin_constant_p(). >>> Ah right, thanks for the suggestions! >>> I tried the above example: >>> int b = -1; >>> vshl_n_s32 (a, b); >>> and it compiled without any errors with -O0 after patch. >>> >>> Would it be OK to use _Static_assert (__builtin_constant_p (b)) to >>> guard against non-immediate values ? 
>>> >>> With the following change: >>> __extension__ extern __inline int32x2_t >>> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) >>> vshl_n_s32 (int32x2_t __a, const int __b) >>> { >>> _Static_assert (__builtin_constant_p (__b)); >>> return __a << __b; >>> } >>> >>> the above example fails at -O0: >>> ../armhf-build/gcc/include/arm_neon.h: In function ‘vshl_n_s32’: >>> ../armhf-build/gcc/include/arm_neon.h:4904:3: error: static assertion failed >>> 4904 | _Static_assert (__builtin_constant_p (__b)); >>> | ^~~~~~~~~~~~~~ >> >> I've been playing with that but unfortunately it doesn't seem to work in >> the way we want it to. For a complete test: >> >> >> >> typedef __simd64_int32_t int32x2_t; >> >> __extension__ extern __inline int32x2_t >> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) >> vshl_n_s32 (int32x2_t __a, const int __b) >> { >> _Static_assert (__builtin_constant_p (__b), "Second argument must be >> a litteral constant"); >> return __a << __b; >> } >> >> int32x2_t f (int32x2_t x, const int b) >> { >> return vshl_n_s32 (x, 1); >> } >> >> At -O0 I get: >> >> test.c: In function ‘vshl_n_s32’: >> test.c:7:3: error: static assertion failed: "Second argument must be a >> litteral constant" >> 7 | _Static_assert (__builtin_constant_p (__b), "Second argument >> must be a litteral constant"); >> | ^~~~~~~~~~~~~~ >> >> While at -O1 and above I get: >> >> >> test.c: In function ‘vshl_n_s32’: >> test.c:7:19: error: expression in static assertion is not constant >> 7 | _Static_assert (__builtin_constant_p (__b), "Second argument >> must be a litteral constant"); >> | ^~~~~~~~~~~~~~~~~~~~~~~~~~ >> >> Which indicates that it doesn't consider __builtin_constant_p() to be a >> constant expression :( >> >> So either I'm writing the static assertion incorrectly, or something >> weird is going on. The most likely issue is that the static assertion >> is being processed too early, before the function is inlined. > Ah indeed. 
I wonder if we should add an attribute to parameter that it > should be constant, > and emit an error if the caller passes non-constant value ? > sth like: > void foo(int x __attribute__((runtime_constant))); > and the front-end can then diagnose if the argument is > __builtin_constant_p while type-checking call to foo. It's an interesting idea, it would have to be on the prototype, not on the function declaration (except where that serves both purposes). We might also want an optional range check on the value as well. I think a better name for the immediate would be literal_constant, which is more in keeping with the semantics of the language. So: void foo(int x __attribute__((literal_constant (min_val, max_val))); R. > > Thanks, > Prathamesh >> >> R. >> >>> >>> Thanks, >>> Prathamesh >>>> >>>> R. >>>> >>>>> So, is the attached patch OK ? >>>> >>>>> >>>>> Thanks, >>>>> Prathamesh >>>>>> >>>>>> R.
On Fri, 23 Jul 2021 at 15:02, Richard Earnshaw <Richard.Earnshaw@foss.arm.com> wrote: > > On 23/07/2021 08:04, Prathamesh Kulkarni via Gcc-patches wrote: > > On Thu, 22 Jul 2021 at 20:29, Richard Earnshaw > > <Richard.Earnshaw@foss.arm.com> wrote: > >> > >> > >> > >> On 22/07/2021 14:47, Prathamesh Kulkarni via Gcc-patches wrote: > >>> On Thu, 22 Jul 2021 at 17:28, Richard Earnshaw > >>> <Richard.Earnshaw@foss.arm.com> wrote: > >>>> > >>>> > >>>> > >>>> On 22/07/2021 12:32, Prathamesh Kulkarni wrote: > >>>>> On Thu, 22 Jul 2021 at 16:03, Richard Earnshaw > >>>>> <Richard.Earnshaw@foss.arm.com> wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> On 22/07/2021 08:45, Prathamesh Kulkarni via Gcc-patches wrote: > >>>>>>> Hi, > >>>>>>> The attached patch removes calls to builtins from vshl_n intrinsics, > >>>>>>> and replacing them > >>>>>>> with left shift operator. The patch passes bootstrap+test on > >>>>>>> arm-linux-gnueabihf. > >>>>>>> > >>>>>>> Altho, I noticed, that the patch causes 3 extra registers to spill > >>>>>>> using << instead > >>>>>>> of the builtin for vshl_n.c. Could that be perhaps due to inlining of > >>>>>>> intrinsics ? > >>>>>>> Before patch, the shift operation was performed by call to > >>>>>>> __builtin_neon_vshl<type> (__a, __b) > >>>>>>> and now it's inlined to __a << __b, which might result in increased > >>>>>>> register pressure ? > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Prathamesh > >>>>>>> > >>>>>> > >>>>>> > >>>>>> You're missing a ChangeLog for the patch. > >>>>> Sorry, updated in this patch. > >>>>>> > >>>>>> However, I'm not sure about this. The register shift form of VSHL > >>>>>> performs a right shift if the value is negative, which is UB if you > >>>>>> write `<<` instead. > >>>>>> > >>>>>> Have I missed something here? 
> >>>>> Hi Richard, > >>>>> According to this article: > >>>>> https://developer.arm.com/documentation/den0018/a/NEON-Intrinsics-Reference/Shift/VSHL-N > >>>>> For vshl_n, the shift amount is always in the non-negative range for all types. > >>>>> > >>>>> I tried using vshl_n_s32 (a, -1), and the compiler emitted following diagnostic: > >>>>> foo.c: In function ‘main’: > >>>>> foo.c:17:1: error: constant -1 out of range 0 - 31 > >>>>> 17 | } > >>>>> | ^ > >>>>> > >>>> > >>>> It does do that now, but that's because the intrinsic expansion does > >>>> some bounds checking; when you remove the call into the back-end > >>>> intrinsic that will no-longer happen. > >>>> > >>>> I think with this change various things are likely: > >>>> > >>>> - We'll no-longer reject non-immediate values, so users will be able to > >>>> write > >>>> > >>>> int b = 5; > >>>> vshl_n_s32 (a, b); > >>>> > >>>> which will expand to a vdup followed by the register form. > >>>> > >>>> - we'll rely on the front-end diagnosing out-of range shifts > >>>> > >>>> - code of the form > >>>> > >>>> int b = -1; > >>>> vshl_n_s32 (a, b); > >>>> > >>>> will probably now go through without any errors, especially at low > >>>> optimization levels. It may end up doing what the user wanted, but it's > >>>> definitely a change in behaviour - and perhaps worse, the compiler might > >>>> diagnose the above as UB and silently throw some stuff away. > >>>> > >>>> It might be that we need to insert some form of static assertion that > >>>> the second argument is a __builtin_constant_p(). > >>> Ah right, thanks for the suggestions! > >>> I tried the above example: > >>> int b = -1; > >>> vshl_n_s32 (a, b); > >>> and it compiled without any errors with -O0 after patch. > >>> > >>> Would it be OK to use _Static_assert (__builtin_constant_p (b)) to > >>> guard against non-immediate values ? 
> >>> > >>> With the following change: > >>> __extension__ extern __inline int32x2_t > >>> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > >>> vshl_n_s32 (int32x2_t __a, const int __b) > >>> { > >>> _Static_assert (__builtin_constant_p (__b)); > >>> return __a << __b; > >>> } > >>> > >>> the above example fails at -O0: > >>> ../armhf-build/gcc/include/arm_neon.h: In function ‘vshl_n_s32’: > >>> ../armhf-build/gcc/include/arm_neon.h:4904:3: error: static assertion failed > >>> 4904 | _Static_assert (__builtin_constant_p (__b)); > >>> | ^~~~~~~~~~~~~~ > >> > >> I've been playing with that but unfortunately it doesn't seem to work in > >> the way we want it to. For a complete test: > >> > >> > >> > >> typedef __simd64_int32_t int32x2_t; > >> > >> __extension__ extern __inline int32x2_t > >> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > >> vshl_n_s32 (int32x2_t __a, const int __b) > >> { > >> _Static_assert (__builtin_constant_p (__b), "Second argument must be > >> a litteral constant"); > >> return __a << __b; > >> } > >> > >> int32x2_t f (int32x2_t x, const int b) > >> { > >> return vshl_n_s32 (x, 1); > >> } > >> > >> At -O0 I get: > >> > >> test.c: In function ‘vshl_n_s32’: > >> test.c:7:3: error: static assertion failed: "Second argument must be a > >> litteral constant" > >> 7 | _Static_assert (__builtin_constant_p (__b), "Second argument > >> must be a litteral constant"); > >> | ^~~~~~~~~~~~~~ > >> > >> While at -O1 and above I get: > >> > >> > >> test.c: In function ‘vshl_n_s32’: > >> test.c:7:19: error: expression in static assertion is not constant > >> 7 | _Static_assert (__builtin_constant_p (__b), "Second argument > >> must be a litteral constant"); > >> | ^~~~~~~~~~~~~~~~~~~~~~~~~~ > >> > >> Which indicates that it doesn't consider __builtin_constant_p() to be a > >> constant expression :( > >> > >> So either I'm writing the static assertion incorrectly, or something > >> weird is going on. 
The most likely issue is that the static assertion > >> is being processed too early, before the function is inlined. > > Ah indeed. I wonder if we should add an attribute to parameter that it > > should be constant, > > and emit an error if the caller passes non-constant value ? > > sth like: > > void foo(int x __attribute__((runtime_constant))); > > and the front-end can then diagnose if the argument is > > __builtin_constant_p while type-checking call to foo. > > It's an interesting idea, it would have to be on the prototype, not on > the function declaration (except where that serves both purposes). We > might also want an optional range check on the value as well. > > I think a better name for the immediate would be literal_constant, which > is more in keeping with the semantics of the language. So: > > void foo(int x __attribute__((literal_constant (min_val, max_val))); Thanks for the suggestions! I will raise a RFC on gcc@ for literal_constant attribute. Digging a bit into discrepancy in warnings: assertion failed at -O0 vs expression not constant at -O1+: The errors come from following hunk in c-parser.c:c_parser_static_assert_declaration_no_semi: if (TREE_CODE (value) != INTEGER_CST) { error_at (value_loc, "expression in static assertion is not constant"); return; } constant_expression_warning (value); if (integer_zerop (value)) { if (string) error_at (assert_loc, "static assertion failed: %E", string); else error_at (assert_loc, "static assertion failed"); } So at -O0, "value" is literal constant 0, while at -O1+, "value" is CALL_EXPR, which is why it seems to give different warnings at -O0 and -O1+. Thanks, Prathamesh > > R. > > > > > Thanks, > > Prathamesh > >> > >> R. > >> > >>> > >>> Thanks, > >>> Prathamesh > >>>> > >>>> R. > >>>> > >>>>> So, is the attached patch OK ? > >>>> > >>>>> > >>>>> Thanks, > >>>>> Prathamesh > >>>>>> > >>>>>> R. >
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 41b596b5fc6..f5c85eb43e7 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -4887,112 +4887,112 @@ __extension__ extern __inline int8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_s8 (int8x8_t __a, const int __b) { - return (int8x8_t)__builtin_neon_vshl_nv8qi (__a, __b); + return __a << __b; } __extension__ extern __inline int16x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_s16 (int16x4_t __a, const int __b) { - return (int16x4_t)__builtin_neon_vshl_nv4hi (__a, __b); + return __a << __b; } __extension__ extern __inline int32x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_s32 (int32x2_t __a, const int __b) { - return (int32x2_t)__builtin_neon_vshl_nv2si (__a, __b); + return __a << __b; } __extension__ extern __inline int64x1_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_s64 (int64x1_t __a, const int __b) { - return (int64x1_t)__builtin_neon_vshl_ndi (__a, __b); + return __a << __b; } __extension__ extern __inline uint8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_u8 (uint8x8_t __a, const int __b) { - return (uint8x8_t)__builtin_neon_vshl_nv8qi ((int8x8_t) __a, __b); + return __a << __b; } __extension__ extern __inline uint16x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_u16 (uint16x4_t __a, const int __b) { - return (uint16x4_t)__builtin_neon_vshl_nv4hi ((int16x4_t) __a, __b); + return __a << __b; } __extension__ extern __inline uint32x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_u32 (uint32x2_t __a, const int __b) { - return (uint32x2_t)__builtin_neon_vshl_nv2si ((int32x2_t) __a, __b); + return __a << __b; } __extension__ extern __inline uint64x1_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_u64 (uint64x1_t __a, 
const int __b) { - return (uint64x1_t)__builtin_neon_vshl_ndi ((int64x1_t) __a, __b); + return __a << __b; } __extension__ extern __inline int8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshlq_n_s8 (int8x16_t __a, const int __b) { - return (int8x16_t)__builtin_neon_vshl_nv16qi (__a, __b); + return __a << __b; } __extension__ extern __inline int16x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshlq_n_s16 (int16x8_t __a, const int __b) { - return (int16x8_t)__builtin_neon_vshl_nv8hi (__a, __b); + return __a << __b; } __extension__ extern __inline int32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshlq_n_s32 (int32x4_t __a, const int __b) { - return (int32x4_t)__builtin_neon_vshl_nv4si (__a, __b); + return __a << __b; } __extension__ extern __inline int64x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshlq_n_s64 (int64x2_t __a, const int __b) { - return (int64x2_t)__builtin_neon_vshl_nv2di (__a, __b); + return __a << __b; } __extension__ extern __inline uint8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshlq_n_u8 (uint8x16_t __a, const int __b) { - return (uint8x16_t)__builtin_neon_vshl_nv16qi ((int8x16_t) __a, __b); + return __a << __b; } __extension__ extern __inline uint16x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshlq_n_u16 (uint16x8_t __a, const int __b) { - return (uint16x8_t)__builtin_neon_vshl_nv8hi ((int16x8_t) __a, __b); + return __a << __b; } __extension__ extern __inline uint32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshlq_n_u32 (uint32x4_t __a, const int __b) { - return (uint32x4_t)__builtin_neon_vshl_nv4si ((int32x4_t) __a, __b); + return __a << __b; } __extension__ extern __inline uint64x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshlq_n_u64 (uint64x2_t __a, const int __b) { - return (uint64x2_t)__builtin_neon_vshl_nv2di 
((int64x2_t) __a, __b); + return __a << __b; } __extension__ extern __inline int8x8_t diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def index 70438ac1848..ea6bd43a035 100644 --- a/gcc/config/arm/arm_neon_builtins.def +++ b/gcc/config/arm/arm_neon_builtins.def @@ -103,7 +103,6 @@ VAR3 (BINOP_IMM, vqrshrns_n, v8hi, v4si, v2di) VAR3 (BINOP_IMM, vqrshrnu_n, v8hi, v4si, v2di) VAR3 (BINOP_IMM, vqshrun_n, v8hi, v4si, v2di) VAR3 (BINOP_IMM, vqrshrun_n, v8hi, v4si, v2di) -VAR8 (BINOP_IMM, vshl_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di) VAR8 (BINOP_IMM, vqshl_s_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di) VAR8 (BINOP_IMM, vqshl_u_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di) VAR8 (BINOP_IMM, vqshlu_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)