mbox series

[0/2] RISC-V: Define not broken prefetch builtins

Message ID cover.1695366672.git.research_trasio@irq.a4lg.com
Headers show
Series RISC-V: Define not broken prefetch builtins | expand

Message

Tsukasa OI Sept. 22, 2023, 7:11 a.m. UTC
Hello,

As I explained earlier:
<https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626916.html>,
the builtin function for RISC-V "__builtin_riscv_zicbop_cbo_prefetchi" is
completely broken.  Instead, this patch set (in PATCH 1/2) creates three
new, working builtin intrinsics.

void __builtin_riscv_prefetch_i(void *addr, [intptr_t offset,] ...);
void __builtin_riscv_prefetch_r(void *addr, [intptr_t offset,] ...);
void __builtin_riscv_prefetch_w(void *addr, [intptr_t offset,] ...);


For consistency with "prefetch.i" and the reason I describe later (which
requires native instructions for "prefetch.r" and "prefetch.w"), I decided
to make builtin functions for "prefetch.[rw]" as well.

Optional second argument (named "offset" here) defaults to zero and must be
a compile-time integral constant.  Also, it must be a valid offset for a
"prefetch.[irw]" HINT instruction (x % 32 == 0 && x >= -2048 && x < 2048).

They are defined if the 'Zicbop' extension is supported and expands to:

> prefetch.i offset(addr_reg)  ; __builtin_riscv_prefetch_i
> prefetch.r offset(addr_reg)  ; __builtin_riscv_prefetch_r
> prefetch.w offset(addr_reg)  ; __builtin_riscv_prefetch_w


The hardest part of this patch set was to support builtin function with
variable argument (making "offset" optional).  It required:

1.  Support for variable argument function prototype for RISC-V builtins
    (corresponding "..." on C-based languages)
2.  Support for (non-vector) RISC-V builtins with custom expansion
    (on RVV intrinsics, custom expansion is already implemented)


... and PATCH 2/2 fixes an ICE while I'm investigating regular prefetch
builtin (__builtin_prefetch).  If the 'Zicbop' extension is enabled,
__builtin_prefetch with the first argument NULL or (not all but) some
fixed addresses (like ((void*)0x20)) can cause an ICE.  This is because
the "r" constraint is not checked and a constant can be a first argument
of target-specific "prefetch" RTL instruction.

PATCH 2/2 fixes this issue by:

1.  Making "prefetch" not an instruction but instead an expansion
    (this is not rare; e.g. on i386) and
2.  Coercing the address argument into a register in the expansion

It requires separate instructions for "prefetch.[rw]" and I decided to make
those prefetch instructions very similar to "prefetch.i".  That's one of the
reasons I created builtins corresponding those.


Sincerely,
Tsukasa




Tsukasa OI (2):
  RISC-V: Define not broken prefetch builtins
  RISC-V: Fix ICE by expansion and register coercion

 gcc/config/riscv/riscv-builtins.cc            | 112 +++++++++++++++++-
 gcc/config/riscv/riscv-cmo.def                |   8 +-
 gcc/config/riscv/riscv-ftypes.def             |   1 +
 gcc/config/riscv/riscv.md                     |  67 ++++++++---
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c |  41 ++++---
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c |  33 ++----
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-3.c |  29 +++++
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-4.c |  14 +++
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-5.c |  14 +++
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-6.c |  38 ++++++
 .../gcc.target/riscv/cmo-zicbop-by-common-1.c |  17 +++
 .../gcc.target/riscv/cmo-zicbop-by-common-2.c |   7 ++
 .../gcc.target/riscv/cmo-zicbop-by-common-3.c |  13 ++
 .../riscv/cmo-zicbop-by-common-ice-1.c        |  13 ++
 .../riscv/cmo-zicbop-by-common-ice-2.c        |   7 ++
 15 files changed, 350 insertions(+), 64 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-2.c


base-commit: 40ac613627205dd4d24ae136917e48b357fee758

Comments

Jeff Law Sept. 26, 2023, 9:38 p.m. UTC | #1
On 9/22/23 01:11, Tsukasa OI wrote:
> Hello,
> 
> As I explained earlier:
> <https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626916.html>,
> the builtin function for RISC-V "__builtin_riscv_zicbop_cbo_prefetchi" is
> completely broken.  Instead, this patch set (in PATCH 1/2) creates three
> new, working builtin intrinsics.
> 
> void __builtin_riscv_prefetch_i(void *addr, [intptr_t offset,] ...);
> void __builtin_riscv_prefetch_r(void *addr, [intptr_t offset,] ...);
> void __builtin_riscv_prefetch_w(void *addr, [intptr_t offset,] ...);
> 
> 
> For consistency with "prefetch.i" and the reason I describe later (which
> requires native instructions for "prefetch.r" and "prefetch.w"), I decided
> to make builtin functions for "prefetch.[rw]" as well.
> 
> Optional second argument (named "offset" here) defaults to zero and must be
> a compile-time integral constant.  Also, it must be a valid offset for a
> "prefetch.[irw]" HINT instruction (x % 32 == 0 && x >= -2048 && x < 2048).
> 
> They are defined if the 'Zicbop' extension is supported and expands to:
> 
>> prefetch.i offset(addr_reg)  ; __builtin_riscv_prefetch_i
>> prefetch.r offset(addr_reg)  ; __builtin_riscv_prefetch_r
>> prefetch.w offset(addr_reg)  ; __builtin_riscv_prefetch_w
> 
> 
> The hardest part of this patch set was to support builtin function with
> variable argument (making "offset" optional).  It required:
> 
> 1.  Support for variable argument function prototype for RISC-V builtins
>      (corresponding "..." on C-based languages)
> 2.  Support for (non-vector) RISC-V builtins with custom expansion
>      (on RVV intrinsics, custom expansion is already implemented)
> 
> 
> ... and PATCH 2/2 fixes an ICE while I'm investigating regular prefetch
> builtin (__builtin_prefetch).  If the 'Zicbop' extension is enabled,
> __builtin_prefetch with the first argument NULL or (not all but) some
> fixed addresses (like ((void*)0x20)) can cause an ICE.  This is because
> the "r" constraint is not checked and a constant can be a first argument
> of target-specific "prefetch" RTL instruction.
> 
> PATCH 2/2 fixes this issue by:
> 
> 1.  Making "prefetch" not an instruction but instead an expansion
>      (this is not rare; e.g. on i386) and
> 2.  Coercing the address argument into a register in the expansion
> 
> It requires separate instructions for "prefetch.[rw]" and I decided to make
> those prefetch instructions very similar to "prefetch.i".  That's one of the
> reasons I created builtins corresponding those.
What I still don't understand is why we're dealing with a decomposed 
address in the builtin, define_expand and/or define_insn.

Have the builtin accept an address, any address.  Then use force_reg to 
force the address into a register in the expander.  My understanding is 
register indirect is always valid.

Create an operand predicate that accepts reg and reg+d for the limited 
displacements allowed.  Use that for the address operand in the 
associated define_insn.


It seems like you're making this more complex than it needs to be.  Or 
I'm missing something critically important.

jeff
Tsukasa OI Oct. 23, 2023, 3:55 a.m. UTC | #2
On 2023/09/27 6:38, Jeff Law wrote:
> 
> 
> On 9/22/23 01:11, Tsukasa OI wrote:
>> Hello,
>>
>> As I explained earlier:
>> <https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626916.html>,
>> the builtin function for RISC-V "__builtin_riscv_zicbop_cbo_prefetchi" is
>> completely broken.  Instead, this patch set (in PATCH 1/2) creates three
>> new, working builtin intrinsics.
>>
>> void __builtin_riscv_prefetch_i(void *addr, [intptr_t offset,] ...);
>> void __builtin_riscv_prefetch_r(void *addr, [intptr_t offset,] ...);
>> void __builtin_riscv_prefetch_w(void *addr, [intptr_t offset,] ...);
>>
>>
>> For consistency with "prefetch.i" and the reason I describe later (which
>> requires native instructions for "prefetch.r" and "prefetch.w"), I
>> decided
>> to make builtin functions for "prefetch.[rw]" as well.
>>
>> Optional second argument (named "offset" here) defaults to zero and
>> must be
>> a compile-time integral constant.  Also, it must be a valid offset for a
>> "prefetch.[irw]" HINT instruction (x % 32 == 0 && x >= -2048 && x <
>> 2048).
>>
>> They are defined if the 'Zicbop' extension is supported and expands to:
>>
>>> prefetch.i offset(addr_reg)  ; __builtin_riscv_prefetch_i
>>> prefetch.r offset(addr_reg)  ; __builtin_riscv_prefetch_r
>>> prefetch.w offset(addr_reg)  ; __builtin_riscv_prefetch_w
>>
>>
>> The hardest part of this patch set was to support builtin function with
>> variable argument (making "offset" optional).  It required:
>>
>> 1.  Support for variable argument function prototype for RISC-V builtins
>>      (corresponding "..." on C-based languages)
>> 2.  Support for (non-vector) RISC-V builtins with custom expansion
>>      (on RVV intrinsics, custom expansion is already implemented)
>>
>>
>> ... and PATCH 2/2 fixes an ICE while I'm investigating regular prefetch
>> builtin (__builtin_prefetch).  If the 'Zicbop' extension is enabled,
>> __builtin_prefetch with the first argument NULL or (not all but) some
>> fixed addresses (like ((void*)0x20)) can cause an ICE.  This is because
>> the "r" constraint is not checked and a constant can be a first argument
>> of target-specific "prefetch" RTL instruction.
>>
>> PATCH 2/2 fixes this issue by:
>>
>> 1.  Making "prefetch" not an instruction but instead an expansion
>>      (this is not rare; e.g. on i386) and
>> 2.  Coercing the address argument into a register in the expansion
>>
>> It requires separate instructions for "prefetch.[rw]" and I decided to
>> make
>> those prefetch instructions very similar to "prefetch.i".  That's one
>> of the
>> reasons I created builtins corresponding those.
> What I still don't understand is why we're dealing with a decomposed
> address in the builtin, define_expand and/or define_insn.

Sorry, I misunderstood your intent (quite badly) possibly because I was
not familiar with the concept of "predicates" in GCC.

On 2023/08/29 6:20, Jeff Law wrote:
> What I would suggest is making a new predicate that accepts either a 
> register or a register+offset where the offset fits in a signed 12 bit 
> immediate.  Use that for operand 0's predicate and I think this will 
> "just work" and cover all the cases supported by the prefetch.i instruction.

I misunderstood that as "just" adding the offset field to the
instructions and that's the reason I veered off the path so much.  So
instead, I'll answer your original question.

register+offset seems a problem for prefetch instructions because signed
12 bit immediate values need to be also a multiple of 32.  There's no
proper relocation type for this kind and I considered we have "very"
limited cases where making such predicate (as you suggested) will
*efficiently* work.

My opinion is, if we need very fine-grained control with prefetch
instructions, we'd better to use inline assembly.

I'll continue testing the possibilities of register+offset predicate
(including whether it works efficiently) and I'll temporarily withdraw
new built-in functions to focus on major issues before GCC 14:

1.  Remove completely broken __builtin_riscv_zicbop_prefetch_i and
2.  Fix an ICE when __builtin_prefetch is used with some constants.

I'll submit minimized patches only to fix those issues.  They will not
contain "register+offset" you suggested because of the difficulties
above but should be sufficient to fix imminent issues.

Thanks,
Tsukasa

> 
> Have the builtin accept an address, any address.  Then use force_reg to
> force the address into a register in the expander.  My understanding is
> register indirect is always valid.
> 
> Create an operand predicate that accepts reg and reg+d for the limited
> displacements allowed.  Use that for the address operand in the
> associated define_insn.
> 
> 
> It seems like you're making this more complex than it needs to be.  Or
> I'm missing something critically important.
> 
> jeff
>
Jeff Law Oct. 30, 2023, 9:38 p.m. UTC | #3
On 10/22/23 21:55, Tsukasa OI wrote:

>> What I still don't understand is why we're dealing with a decomposed
>> address in the builtin, define_expand and/or define_insn.

> 
> Sorry, I misunderstood your intent (quite badly) possibly because I was
> not familiar with the concept of "predicates" in GCC.
OK.  So you might want to read the machine description part of the GCC 
manual.  It describes operand predicates, operand constraints, insn 
conditions, the difference between define_insn vs define_expand and much 
more.


> 
> On 2023/08/29 6:20, Jeff Law wrote:
>> What I would suggest is making a new predicate that accepts either a
>> register or a register+offset where the offset fits in a signed 12 bit
>> immediate.  Use that for operand 0's predicate and I think this will
>> "just work" and cover all the cases supported by the prefetch.i instruction.
> 
> I misunderstood that as "just" adding the offset field to the
> instructions and that's the reason I veered off the path so much.  So
> instead, I'll answer your original question.
> 
> register+offset seems a problem for prefetch instructions because signed
> 12 bit immediate values need to be also a multiple of 32.  There's no
> proper relocation type for this kind and I considered we have "very"
> limited cases where making such predicate (as you suggested) will
> *efficiently* work.
> 
> My opinion is, if we need very fine-grained control with prefetch
> instructions, we'd better to use inline assembly.
> 
> I'll continue testing the possibilities of register+offset predicate
> (including whether it works efficiently) and I'll temporarily withdraw
> new built-in functions to focus on major issues before GCC 14:
> 
> 1.  Remove completely broken __builtin_riscv_zicbop_prefetch_i and
> 2.  Fix an ICE when __builtin_prefetch is used with some constants.
> 
> I'll submit minimized patches only to fix those issues.  They will not
> contain "register+offset" you suggested because of the difficulties
> above but should be sufficient to fix imminent issues.
We should be able to describe this need quite easily.

Each operand has a predicate which the compiler tests to see if a 
particular RTL expression matches.  Some are very generic like 
"register_operand".  Others are target specific.  If you look in 
predicates.md you'll see a list of the predicates already defined for 
risc-v.  I'm pretty sure none of them will work for this case, but we 
can add a new one easily.

The operand in question is going to be a MEM with restrictions on its 
addressing mode.  Either REG or REG + aligned offset.

(define_predicate "prefetch_memory_operand"
   (match_code "mem")
{
   op = XEXP (op, 0);
   return (REG_P (op)
           || (GET_CODE (op) == PLUS
               && REG_P (XEXP (op, 0))
               && CONST_INT_P (XEXP (op, 1))
               && (INTVAL (XEXP (op, 1)) % 32) == 0);
}

[ Note that we did not declare "op".  It's provided by the generator and 
corresponds to the operand we're testing. ]

So you're going to want a define_expand for the basic prefetch

(define_expand "riscv_prefetch_r_<mode>"
   [(unspec_volatile:X [(match_operand:X 0 "memory_operand")]
                        UNSPEC_PREFETCH_R)]
   "TARGET_ZICBOP"
{
   if (!prefetch_memory_operand (Pmode, operands[0])
     XEXP (operands[0], 0) = force_reg (Pmode, XEXP (operands[0], 0);
}

The thing to know about a define_expand is that it's sole purpose is for 
RTL generation purposes.   We can use it as a place to adjust operands 
(as is done in this case), or emit additional RTL such as we do for 
SImode max on rv64 where we have to extend the incoming operands.

In our case we see if the memory address matches 
prefetch_memory_operand, and if not it'll force that address into a new 
register to create a (mem (reg)) object.



(define_insn "*riscv_prefetch_r_<mode>"
   [(unspec_volatile:X [(match_operand:X 0 "prefetch_memory_operand")]
                        UNSPEC_PREFETCH_R)]
   "TARGET_ZICBOP"
   "prefetch.r\t%0"
   [(set_attr "type" "cbo")])

The define_insn construct maps an RTL template to assembly code with 
provisions for testing operands and such.

Anyway, hopefully that makes things clearer.


Jeff