
[0/6] PowerPC Future support (Dense Math Registers)

Message ID Zx_nX5chpY5pZC7R@cowardly-lion.the-meissners.org

Michael Meissner Oct. 28, 2024, 7:34 p.m. UTC
This patch series was posted a year or so ago during the GCC 14 development
cycle, and I'm posting it again in the hope that I can get it into GCC 15.  In
the GCC 14 time frame, 1,024-bit registers could not be supported because of
bit-length limits in GCC's internal structures.  In GCC 15, 1,024-bit
registers are now supported.

Note, these patches are for a potential future PowerPC.  They are not targeted
towards a specific CPU, and they may change if/when a PowerPC with this
instruction set is released.

The main motivation is to get support for the 1,024-bit dense math registers
into the current GCC.  In the current power10 hardware, the eight 512-bit
accumulator registers overlap with the VSX registers 0..31.

If dense math register support is added in a future machine, these registers
will become separate registers.  The current instructions will still work,
using these new registers.  Existing code will no longer use the VSX registers
that currently overlap with the accumulators; the separate dense math
registers will be used instead.
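To make the overlap concrete, here is a small toy model in C.  The names are
hypothetical and this is not GCC code; it assumes the power10 layout in which
accumulator N shares storage with VSX registers 4*N through 4*N+3, so each
512-bit accumulator covers four 128-bit VSX registers.  The separate_dmr flag
stands in for a future machine with separate dense math registers, where
writing an accumulator no longer clobbers any VSX register.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not GCC source: on power10, accumulator N (0..7)
   shares storage with VSX registers vs(4*N) .. vs(4*N+3).  */
enum { NUM_ACCS = 8, VSRS_PER_ACC = 4 };

/* The 8 accumulators together cover exactly VSX registers 0..31.  */
static_assert(NUM_ACCS * VSRS_PER_ACC == 32, "accumulators overlay VSRs 0..31");

/* First VSX register aliased by accumulator ACC on power10.  */
static int acc_first_vsr(int acc)
{
    return acc * VSRS_PER_ACC;
}

/* True if writing accumulator ACC also overwrites VSX register VSR.
   SEPARATE_DMR models a machine with separate dense math registers.  */
static bool acc_clobbers_vsr(int acc, int vsr, bool separate_dmr)
{
    if (separate_dmr)
        return false;  /* dense math registers are their own register file */
    return vsr >= acc_first_vsr(acc)
           && vsr < acc_first_vsr(acc) + VSRS_PER_ACC;
}
```

For example, acc_clobbers_vsr(1, 5, false) is true (accumulator 1 covers
vs4..vs7 on power10), while acc_clobbers_vsr(1, 5, true) is false.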

One of the important changes in these patches is a new constraint ('wD').
When code is compiled for power10, 'wD' matches the VSX registers 0..31
(i.e. the traditional floating point registers).  When code is compiled for
the potential future machine, 'wD' matches the new separate dense math
registers.  Thus, __asm__ code that uses the accumulator registers should
change its 'd' constraints to 'wD'.

The intention is that user code using extended asm can be modified to run on
both MMA without dense math and MMA with dense math:

    1)  If possible, don't use extended asm, but instead use the MMA built-in
        functions;

    2)  If you do need to write extended asm, change the 'd' constraints that
        target accumulators to 'wD' when using GCC 15 or later;

    3)  Only use the built-in zero, assemble, and disassemble functions to
        create or move data between vector quad types and dense math
        accumulators.  I.e., do not use the xxmfacc, xxmtacc, and xxsetaccz
        instructions directly in the extended asm code.  The reason is that
        these instructions assume there is a 1-to-1 correspondence between 4
        adjacent FPR registers and the accumulator that overlaps with those
        registers.  With accumulators now being separate registers, there is
        no longer a 1-to-1 correspondence.
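For code that must build both with older compilers and with compilers that
accept 'wD', one option is to choose the constraint at preprocessing time.
This is only a sketch: ACC_CONSTRAINT is a hypothetical macro name, and it
assumes 'wD' is available starting with GCC 15 as this series intends (clang
is conservatively left on 'd').

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: pick the accumulator constraint for extended asm.
   Assumes 'wD' exists starting with GCC 15, as this series proposes;
   older compilers only know the FPR constraint 'd'.  */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 15
#  define ACC_CONSTRAINT "wD"
#else
#  define ACC_CONSTRAINT "d"
#endif

/* Guard against the macro accidentally being defined as "".  */
static_assert(sizeof ACC_CONSTRAINT > 1, "constraint must not be empty");

/* Constraints are ordinary string literals, so modifiers such as the
   early-clobber output "=&" can be prepended by literal concatenation.  */
static const char acc_output_constraint[] = "=&" ACC_CONSTRAINT;
```

Inside an asm statement the operand would then be written as
"=&" ACC_CONSTRAINT (acc), and the compiler concatenates the pieces before
parsing the constraint.  Following item 1, using the MMA built-in functions
avoids the question entirely.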

This patch series assumes that the 11 patches posted on October 25th, which
separate the ISA flag bits from the architecture bits and add the
-mcpu=future option, have been applied.  If those patches are rejected, I
would need to modify these patches to add an undocumented '-mfuture' option
that would be set for dense math generation.
 * https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666529.html

There are 6 patches in this patch set:

Patch #1 enables using the vector pair load and store instructions when
generating memory copy operations.
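As a rough illustration of why vector pairs help, the following portable C
sketch models a block copy that moves 32 bytes per step (one vector pair, i.e.
two 16-byte VSX registers, as moved by lxvp/stxvp) instead of 16.  The memcpy
calls only stand in for the hardware loads and stores, and the 16-byte and
byte-sized tail handling is an assumption of this sketch, not a description of
GCC's actual expansion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Portable model (not GCC's expansion): copy 32 bytes per step, the width
   of one vector pair, then at most one 16-byte step, then the byte tail.  */
enum { VEC_BYTES = 16, PAIR_BYTES = 2 * VEC_BYTES };

/* Copies N bytes and returns the number of load/store steps modeled.  */
static size_t copy_by_pairs(unsigned char *dst, const unsigned char *src,
                            size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    size_t steps = 0;
    size_t i = 0;
    for (; i + PAIR_BYTES <= n; i += PAIR_BYTES, steps++)
        memcpy(dst + i, src + i, PAIR_BYTES);  /* stands in for lxvp + stxvp */
    for (; i + VEC_BYTES <= n; i += VEC_BYTES, steps++)
        memcpy(dst + i, src + i, VEC_BYTES);   /* stands in for lxv + stxv */
    if (i < n) {
        memcpy(dst + i, src + i, n - i);       /* remaining bytes */
        steps++;
    }
    return steps;
}
```

Copying 100 bytes this way takes 4 steps (three 32-byte pairs plus a 4-byte
tail), where a loop using only 16-byte vectors would need 7.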

Patch #2 adds the 'wD' constraint, and modifies the MMA code to use 'wD'
instead of 'd' or 'f'.

Patch #3 adds support for separate dense math registers if -mcpu=future is
used.  This support keeps the register size at 512 bits, and only issues the
instructions that are common between the power10 MMA instruction set and the
future dense math instruction set.

Patch #4 changes the assembler instruction names from the original MMA
mnemonics to the newer dense math mnemonics when -mcpu=future is used.  GAS
generates the same bit pattern for the old name and the new name.

Patch #5 adds a test for dense math support.

Patch #6 adds support for the dense math instructions that use 1,024-bit
registers.  This patch adds a new keyword ('__dmr') for the 1,024-bit dense
math registers, and a new mode (TDOmode) for 1,024-bit registers.  Only the
register support is added in this patch.  Assuming these 6 patches go in,
future patches will provide new built-in functions to issue the new
instructions.
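The size relationships behind TDOmode are simple arithmetic: a 1,024-bit
dense math register is 128 bytes, the same as eight 128-bit VSX registers, or
twice a 512-bit accumulator (the size of __vector_quad).  The structs below
are hypothetical layout models used only to check that arithmetic at compile
time; they are not the actual __dmr or __vector_quad types.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical layout models, not GCC types.  */
typedef struct { uint8_t b[16]; } vsr_model;       /* one 128-bit VSX register */
typedef struct { vsr_model v[4]; } acc512_model;   /* 512-bit accumulator */
typedef struct { vsr_model v[8]; } dmr1024_model;  /* 1,024-bit dense math reg */

static_assert(sizeof(vsr_model) * CHAR_BIT == 128, "VSX register is 128 bits");
static_assert(sizeof(acc512_model) * CHAR_BIT == 512, "accumulator is 512 bits");
static_assert(sizeof(dmr1024_model) * CHAR_BIT == 1024, "DMR is 1,024 bits");
static_assert(sizeof(dmr1024_model) == 2 * sizeof(acc512_model),
              "a DMR is twice the size of an accumulator");

/* Number of 128-bit VSX registers whose combined size is at least BITS.  */
static unsigned vsrs_for_bits(unsigned bits)
{
    return (bits + 127u) / 128u;
}
```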