[0/1] AArch64: LUTI2/LUTI4 ACLE for SVE2

Message ID 20240710133414.741793-1-vladimir.miloserdov@arm.com

Vladimir Miloserdov July 10, 2024, 1:34 p.m. UTC
From: Vladimir Miloserdov <Vladimir.Miloserdov@arm.com>

Hi All,

This patch introduces support for LUTI2/LUTI4 ACLE for SVE2.

LUTI instructions perform efficient table lookups using packed 2-bit or 4-bit
indices. LUTI2 reads indexed 8-bit or 16-bit elements from the low 128 bits of
the table vector using packed 2-bit indices, while LUTI4 uses packed 4-bit
indices to read from the low 128 or 256 bits of the table vector, or from two
table vectors. Each instruction fills the destination vector by copying the
table elements selected by the indices held in one segment of the source
vector; the vector segment index chooses which segment is used.
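
For readers unfamiliar with the lookup, here is a rough scalar model of the
8-bit LUTI2 case. It is only an illustrative sketch and not code from the
patch: it assumes a fixed 128-bit vector length, assumes that indices are
packed starting from the least significant bits of each byte, and the names
VL_BYTES and luti2_u8_model are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed vector length for this sketch: 128 bits = 16 bytes.  */
enum { VL_BYTES = 16 };

/* Hypothetical scalar model of an 8-bit LUTI2 lookup.

   TABLE   - table vector; 2-bit indices can reach only its first four
	     8-bit elements.
   INDICES - source vector holding packed 2-bit indices.
   SEGMENT - vector segment index.  Each destination byte consumes
	     2 bits, so one segment is a quarter of the vector and
	     SEGMENT must be in the range 0..3.  */
static void
luti2_u8_model (uint8_t dst[VL_BYTES], const uint8_t table[VL_BYTES],
		const uint8_t indices[VL_BYTES], unsigned segment)
{
  assert (segment < 4);

  /* VL_BYTES elements * 2 bits each = VL_BYTES / 4 bytes per segment.  */
  const size_t seg_bytes = VL_BYTES / 4;
  const uint8_t *seg = indices + segment * seg_bytes;

  for (size_t i = 0; i < VL_BYTES; i++)
    {
      /* Assumption: element i's index sits at bit offset 2 * i within
	 the segment, counting from the least significant bit.  */
      size_t bitpos = 2 * i;
      unsigned idx = (seg[bitpos / 8] >> (bitpos % 8)) & 0x3;
      dst[i] = table[idx];
    }
}
```

In this model, selecting segment 1 makes the lookup read its indices from
bytes 4..7 of the source vector, and every destination byte receives one of
the first four table bytes.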

The changes add a new AArch64 option extension "lut", the __ARM_FEATURE_LUT
preprocessor macro, definitions for the new LUTI instruction shapes, and
implementations of the svluti2 and svluti4 builtins.

Bootstrapped and regression-tested on aarch64-none-linux-gnu with no issues.

This depends on the "Extend aarch64_feature_flags to 128 bits" work, which
will soon be submitted upstream, as we have run out of 64-bit flags.

I don't have commit rights, so the patch will need to be committed for me.

OK for master once the prerequisites are committed?

BR,
- Vladimir

gcc/ChangeLog:

	* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): 
	Add support for __ARM_FEATURE_LUT preprocessor macro.
	* config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION): 
	Add "lut" option extension.
	* config/aarch64/aarch64-sve-builtins-shapes.cc (struct luti_base): 
	Define new LUTI ACLE shapes.
	(SHAPE): Define shapes for luti2 and luti4.
	* config/aarch64/aarch64-sve-builtins-shapes.h: Add declarations 
	for luti2 and luti4.
	* config/aarch64/aarch64-sve-builtins-sve2.cc (class svluti_lane_impl): 
	Implement support for LUTI instructions.
	(FUNCTION): Register svluti2 and svluti4 functions.
	* config/aarch64/aarch64-sve-builtins-sve2.def (svluti2): 
	Define svluti2 function.
	(svluti4): Define svluti4 function.
	* config/aarch64/aarch64-sve-builtins-sve2.h: Add declarations 
	for svluti2 and svluti4.
	* config/aarch64/aarch64-sve2.md (@aarch64_sve_luti<LUTI_BITS><mode>): 
	Define machine description patterns for LUTI.
	* config/aarch64/aarch64.h (AARCH64_ISA_LUT): Define macro for LUTI.
	(TARGET_LUT): Likewise.
	* config/aarch64/iterators.md: Define mode iterators 
	for LUTI MD patterns.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Add macro for 
	SVE ACLE to enable LUTI tests.
	* lib/target-supports.exp: Update to include check for the LUT feature.
	* gcc.target/aarch64/sve2/acle/asm/luti2_bf16.c: New test.
	* gcc.target/aarch64/sve2/acle/asm/luti2_f16.c: New test.
	* gcc.target/aarch64/sve2/acle/asm/luti2_s16.c: New test.
	* gcc.target/aarch64/sve2/acle/asm/luti2_s8.c: New test.
	* gcc.target/aarch64/sve2/acle/asm/luti2_u16.c: New test.
	* gcc.target/aarch64/sve2/acle/asm/luti2_u8.c: New test.
	* gcc.target/aarch64/sve2/acle/asm/luti4_bf16.c: New test.
	* gcc.target/aarch64/sve2/acle/asm/luti4_bf16_vg1x2.c: New test.
	* gcc.target/aarch64/sve2/acle/asm/luti4_f16.c: New test.
	* gcc.target/aarch64/sve2/acle/asm/luti4_f16_vg1x2.c: New test.
	* gcc.target/aarch64/sve2/acle/asm/luti4_s16.c: New test.
	* gcc.target/aarch64/sve2/acle/asm/luti4_s16_vg1x2.c: New test.
	* gcc.target/aarch64/sve2/acle/asm/luti4_s8.c: New test.
	* gcc.target/aarch64/sve2/acle/asm/luti4_u16.c: New test.
	* gcc.target/aarch64/sve2/acle/asm/luti4_u16_vg1x2.c: New test.
	* gcc.target/aarch64/sve2/acle/asm/luti4_u8.c: New test.