
[repost,0/5] Add PowerPC Dense Math Support for future cpus

Message ID ZzlGqCyepXFcTbqq@cowardly-lion.the-meissners.org
Series: Add PowerPC Dense Math Support for future cpus

Message

Michael Meissner Nov. 17, 2024, 1:28 a.m. UTC
I have posted this patch several times over the years.  I am reposting it in
case the last time I posted it got lost.

The last time I posted this patch was on October 28th:
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666662.html

This patch was first posted about a year ago, during the GCC 14 development
cycle, and was reposted on October 28th.  I'm posting it again in the hope of
getting it into GCC 15.  In the GCC 14 time frame, 1,024-bit registers were not
supported because of bit-length limits in GCC's internal structures.  In GCC 15,
1,024-bit registers are now supported.

Note that these patches are for a potential future PowerPC.  They are not
targeted at a specific CPU, and they may change if/when a PowerPC with this
instruction set is released.

The main motivation is to get support for the 1,024-bit dense math registers
into the current GCC.  In current power10 hardware, the eight 512-bit
accumulator registers overlap with VSX registers 0..31.

If dense math register support is added in a future machine, these registers
will become separate registers.  The current instructions will continue to
work, operating on these new registers.  When existing code is compiled for such
a machine, the VSX registers that currently overlap with the accumulators will
not be used; the separate dense math registers will be used instead.

One of the important changes in these patches is a new constraint ('wD').
When code is compiled for power10, 'wD' matches VSX registers 0..31 (i.e. the
traditional floating point registers).  When code is compiled for the potential
future machine, 'wD' matches the new separate dense math registers.  Thus,
__asm__ code that uses the accumulator registers should change its 'd'
constraints to 'wD'.

The intention is that user code using extended asm can be modified to run on
both MMA without dense math and MMA with dense math:

    1)  If possible, don't use extended asm, but instead use the MMA built-in
        functions;

    2)  If you do need to write extended asm, change the 'd' constraints that
        target accumulators to 'wD' when using GCC 15 or later;

    3)  Only use the built-in zero, assemble, and disassemble functions to
        create and move data between vector quad types and dense math
        accumulators.  I.e., do not use the xxmfacc, xxmtacc, and xxsetaccz
        instructions directly in extended asm code.  These instructions assume
        there is a 1-to-1 correspondence between 4 adjacent FPR registers and
        the accumulator that overlaps with those registers.  With accumulators
        now being separate registers, there is no longer a 1-to-1
        correspondence.

Note that the first patch of the previous patch set, which enables the memory
move optimizations to use the load/store vector pair instructions for
-mcpu=future, has been moved into the -mcpu=future support patches.

This patch assumes the previous patches submitted on November 16th have been
applied:

Add more user friendly TARGET_names for PowerPC
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669067.html

Add support for -mcpu=future in the PowerPC
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669099.html

Logically, these patches may not need the following patches, but I haven't
tried the combination:

Do not allow -mvsx to boost the cpu to power7
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669106.html

Separate PowerPC ISA bits from architecture bits set by -mcpu=<xxx>
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669108.html

The other bug fixes posted are independent of this patch.

Comments

Michael Meissner Dec. 4, 2024, 8:36 a.m. UTC | #1
I have posted a new version of this patch.  The only difference is that patches
#3 and #4 of this patch set have been dropped.  Those patches changed the
mnemonics of the MMA instructions from the versions used on power10/power11 to
the new versions for dense math registers when -mcpu=future is used.  The GNU
assembler (GAS) generates the same bit pattern for either the old or the new
mnemonic.

The new patches are at:
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/670789.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/670790.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/670791.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/670792.html