Message ID | 20240322155449.747518-1-ams@baylibre.com |
---|---|
State | New |
Series | [committed] amdgcn: Prefer V32 on RDNA devices |
Hi Andrew!

On 2024-03-22T15:54:48+0000, Andrew Stubbs <ams@baylibre.com> wrote:
> This patch alters the default (preferred) vector size to 32 on RDNA devices to
> better match the actual hardware. 64-lane vectors will continue to be
> used where they are hard-coded (such as function prologues).
>
> We run these devices in wavefrontsize64 for compatibility, but they actually
> only have 32-lane vectors, natively. If the upper part of a V64 is masked
> off (as it is in V32) then RDNA devices will skip execution of the upper part
> for most operations, so this adjustment shouldn't leave too much performance on
> the table. One exception is memory instructions, so full wavefrontsize32
> support would be better.
>
> The advantage is that we avoid the missing V64 operations (such as permute and
> vec_extract).
>
> Committed to mainline.

In my GCN target '-march=gfx1100' testing, this commit
"amdgcn: Prefer V32 on RDNA devices" does resolve (or, make latent?) a
number of execution test FAILs (that is, regressions compared to earlier
'-march=gfx90a' etc. testing).

This commit also resolves (for my '-march=gfx1100' testing) one
pre-existing FAIL (that is, already seen in '-march=gfx90a' earlier etc.
testing):

    PASS: gcc.dg/tree-ssa/scev-14.c (test for excess errors)
    [-FAIL:-]{+PASS:+} gcc.dg/tree-ssa/scev-14.c scan-tree-dump ivopts "Overflowness wrto loop niter:\tNo-overflow"

That means, this test case specifically (or, just its 'scan-tree-dump'?)
needs to be adjusted for GCN V64 testing?

This commit, as you'd also mentioned elsewhere, however also causes a
number of regressions in 'gcc.target/gcn/gcn.exp', see list below.

Those can be "fixed" with 'dg-additional-options -march=gfx90a' (or
similar) in the affected test cases (let me know if you'd like me to
'git push' that), but I suppose something more elaborate may be in
order? (Conditionalize those on 'target { ! gcn_rdna }', and add
respective scanning for 'target gcn_rdna'? I can help with
effective-target 'gcn_rdna' (or similar), if you'd like me to.) And/or,
have a '-mpreferred-simd-mode=v64' (or similar) to be used for such test
cases, to override 'if (TARGET_RDNA2_PLUS)' etc. in
'gcn_vectorize_preferred_simd_mode'? Best, probably, both these things,
to properly test both V32 and V64?

    PASS: gcc.target/gcn/cond_fmaxnm_1.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_1.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_1.c scan-assembler-times smaxv64df3_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_1.c scan-assembler-times smaxv64sf3_exec 3
    PASS: gcc.target/gcn/cond_fmaxnm_1_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_fmaxnm_1_run.c execution test
    PASS: gcc.target/gcn/cond_fmaxnm_2.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_2.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_2.c scan-assembler-times smaxv64df3_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_2.c scan-assembler-times smaxv64sf3_exec 3
    PASS: gcc.target/gcn/cond_fmaxnm_2_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_fmaxnm_2_run.c execution test
    PASS: gcc.target/gcn/cond_fmaxnm_3.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times movv64df_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times movv64sf_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times smaxv64sf3 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times smaxv64sf3 3
    PASS: gcc.target/gcn/cond_fmaxnm_3_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_fmaxnm_3_run.c execution test
    PASS: gcc.target/gcn/cond_fmaxnm_4.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times movv64df_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times movv64sf_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times smaxv64sf3 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times smaxv64sf3 3
    PASS: gcc.target/gcn/cond_fmaxnm_4_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_fmaxnm_4_run.c execution test
    PASS: gcc.target/gcn/cond_fmaxnm_5.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_5.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_5.c scan-assembler-times smaxv64df3_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_5.c scan-assembler-times smaxv64sf3_exec 3
    PASS: gcc.target/gcn/cond_fmaxnm_5_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_fmaxnm_5_run.c execution test
    PASS: gcc.target/gcn/cond_fmaxnm_6.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_6.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_6.c scan-assembler-times smaxv64df3_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_6.c scan-assembler-times smaxv64sf3_exec 3
    PASS: gcc.target/gcn/cond_fmaxnm_6_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_fmaxnm_6_run.c execution test
    PASS: gcc.target/gcn/cond_fmaxnm_7.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_7.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_7.c scan-assembler-times smaxv64df3_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_7.c scan-assembler-times smaxv64sf3_exec 3
    PASS: gcc.target/gcn/cond_fmaxnm_7_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_fmaxnm_7_run.c execution test
    PASS: gcc.target/gcn/cond_fmaxnm_8.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_8.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_8.c scan-assembler-times smaxv64df3_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_8.c scan-assembler-times smaxv64sf3_exec 3
    PASS: gcc.target/gcn/cond_fmaxnm_8_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_fmaxnm_8_run.c execution test
    PASS: gcc.target/gcn/cond_fminnm_1.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_1.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_1.c scan-assembler-times sminv64df3_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_1.c scan-assembler-times sminv64sf3_exec 3
    PASS: gcc.target/gcn/cond_fminnm_1_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_fminnm_1_run.c execution test
    PASS: gcc.target/gcn/cond_fminnm_2.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_2.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_2.c scan-assembler-times sminv64df3_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_2.c scan-assembler-times sminv64sf3_exec 3
    PASS: gcc.target/gcn/cond_fminnm_2_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_fminnm_2_run.c execution test
    PASS: gcc.target/gcn/cond_fminnm_3.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_3.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_3.c scan-assembler-times movv64df_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_3.c scan-assembler-times movv64sf_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_3.c scan-assembler-times sminv64sf3 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_3.c scan-assembler-times sminv64sf3 3
    PASS: gcc.target/gcn/cond_fminnm_3_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_fminnm_3_run.c execution test
    PASS: gcc.target/gcn/cond_fminnm_4.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_4.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_4.c scan-assembler-times movv64df_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_4.c scan-assembler-times movv64sf_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_4.c scan-assembler-times sminv64sf3 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_4.c scan-assembler-times sminv64sf3 3
    PASS: gcc.target/gcn/cond_fminnm_4_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_fminnm_4_run.c execution test
    PASS: gcc.target/gcn/cond_fminnm_5.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_5.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_5.c scan-assembler-times sminv64df3_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_5.c scan-assembler-times sminv64sf3_exec 3
    PASS: gcc.target/gcn/cond_fminnm_5_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_fminnm_5_run.c execution test
    PASS: gcc.target/gcn/cond_fminnm_6.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_6.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_6.c scan-assembler-times sminv64df3_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_6.c scan-assembler-times sminv64sf3_exec 3
    PASS: gcc.target/gcn/cond_fminnm_6_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_fminnm_6_run.c execution test
    PASS: gcc.target/gcn/cond_fminnm_7.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_7.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_7.c scan-assembler-times sminv64df3_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_7.c scan-assembler-times sminv64sf3_exec 3
    PASS: gcc.target/gcn/cond_fminnm_7_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_fminnm_7_run.c execution test
    PASS: gcc.target/gcn/cond_fminnm_8.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_8.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_8.c scan-assembler-times sminv64df3_exec 3
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_8.c scan-assembler-times sminv64sf3_exec 3
    PASS: gcc.target/gcn/cond_fminnm_8_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_fminnm_8_run.c execution test
    @@ -124634,12 +124634,12 @@ PASS: gcc.target/gcn/cond_shift_3.c scan-assembler-not movv64di_exec/2
    PASS: gcc.target/gcn/cond_shift_3.c scan-assembler-not v_cndmask_b32
    PASS: gcc.target/gcn/cond_shift_3.c scan-assembler-times \\tv_ashrrev_i32\\tv[0-9]+, 3, v[0-9]+ 1
    PASS: gcc.target/gcn/cond_shift_3.c scan-assembler-times \\tv_lshlrev_b32\\tv[0-9]+, 3, v[0-9]+ 10
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_3.c scan-assembler-times vashlv64di3_exec 2
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_3.c scan-assembler-times vashlv64si3_exec 18
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_3.c scan-assembler-times vashrv64di3_exec 1
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_3.c scan-assembler-times vashrv64si3_exec 1
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_3.c scan-assembler-times vlshrv64di3_exec 1
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_3.c scan-assembler-times vlshrv64si3_exec 1
    PASS: gcc.target/gcn/cond_shift_3_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_shift_3_run.c execution test
    PASS: gcc.target/gcn/cond_shift_4.c (test for excess errors)
    @@ -124647,77 +124647,77 @@ PASS: gcc.target/gcn/cond_shift_4.c scan-assembler-not movv64di_exec/2
    PASS: gcc.target/gcn/cond_shift_4.c scan-assembler-not v_cndmask_b32
    PASS: gcc.target/gcn/cond_shift_4.c scan-assembler-times \\tv_ashrrev_i32\\tv[0-9]+, 3, v[0-9]+ 1
    PASS: gcc.target/gcn/cond_shift_4.c scan-assembler-times \\tv_lshlrev_b32\\tv[0-9]+, 3, v[0-9]+ 10
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_4.c scan-assembler-times vashlv64di3_exec 2
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_4.c scan-assembler-times vashlv64si3_exec 18
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_4.c scan-assembler-times vashrv64di3_exec 1
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_4.c scan-assembler-times vashrv64si3_exec 1
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_4.c scan-assembler-times vlshrv64di3_exec 1
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_4.c scan-assembler-times vlshrv64si3_exec 1
    PASS: gcc.target/gcn/cond_shift_4_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_shift_4_run.c execution test
    PASS: gcc.target/gcn/cond_shift_8.c (test for excess errors)
    PASS: gcc.target/gcn/cond_shift_8.c scan-assembler-not movv64di_exec/0
    PASS: gcc.target/gcn/cond_shift_8.c scan-assembler-not movv64si_exec/0
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_8.c scan-assembler-times vashlv64di3_exec 2
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_8.c scan-assembler-times vashlv64si3_exec 18
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_8.c scan-assembler-times vashrv64di3_exec 1
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_8.c scan-assembler-times vashrv64si3_exec 1
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_8.c scan-assembler-times vlshrv64di3_exec 1
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_8.c scan-assembler-times vlshrv64si3_exec 1
    PASS: gcc.target/gcn/cond_shift_8_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_shift_8_run.c execution test
    PASS: gcc.target/gcn/cond_shift_9.c (test for excess errors)
    PASS: gcc.target/gcn/cond_shift_9.c scan-assembler-not movv64di_exec/1
    PASS: gcc.target/gcn/cond_shift_9.c scan-assembler-not movv64si_exec/2
    PASS: gcc.target/gcn/cond_shift_9.c scan-assembler-not v_cndmask_b32
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_9.c scan-assembler-times vashlv64di3_exec 2
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_9.c scan-assembler-times vashlv64si3_exec 18
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_9.c scan-assembler-times vashrv64di3_exec 1
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_9.c scan-assembler-times vashrv64si3_exec 1
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_9.c scan-assembler-times vlshrv64di3_exec 1
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_9.c scan-assembler-times vlshrv64si3_exec 1
    PASS: gcc.target/gcn/cond_shift_9_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_shift_9_run.c execution test
    PASS: gcc.target/gcn/cond_smax_1.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_smax_1.c scan-assembler-not \\ts_cmpk_lg_u32\\tvcc_lo, 0
    PASS: gcc.target/gcn/cond_smax_1.c scan-assembler-not \\tv_cmpx_gt_i32\\tvcc, s[0-9]+, v[0-9]+
    PASS: gcc.target/gcn/cond_smax_1.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_??, 0
    PASS: gcc.target/gcn/cond_smax_1.c scan-assembler-not smaxv64si3/0
    PASS: gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
    PASS: gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 10
    PASS: gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_ne_u64\\ts\\[[0-9]+:[0-9]+\\], v\\[[0-9]+:[0-9]+\\], -1 10
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_smax_1.c scan-assembler-times smaxv64si3_exec 30
    PASS: gcc.target/gcn/cond_smax_1_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_smax_1_run.c execution test
    PASS: gcc.target/gcn/cond_smin_1.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_smin_1.c scan-assembler-not \\ts_cmpk_lg_u32\\tvcc_lo, 0
    PASS: gcc.target/gcn/cond_smin_1.c scan-assembler-not \\tv_cmpx_gt_i32\\tvcc, s[0-9]+, v[0-9]+
    PASS: gcc.target/gcn/cond_smin_1.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_??, 0
    PASS: gcc.target/gcn/cond_smin_1.c scan-assembler-not sminv64si3/0
    PASS: gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
    PASS: gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_lt_i64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 10
    PASS: gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_ne_u64\\ts\\[[0-9]+:[0-9]+\\], v\\[[0-9]+:[0-9]+\\], -1 10
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_smin_1.c scan-assembler-times sminv64si3_exec 30
    PASS: gcc.target/gcn/cond_smin_1_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_smin_1_run.c execution test
    PASS: gcc.target/gcn/cond_umax_1.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_umax_1.c scan-assembler-not \\ts_cmpk_lg_u32\\tvcc_lo, 0
    PASS: gcc.target/gcn/cond_umax_1.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_??, 0
    PASS: gcc.target/gcn/cond_umax_1.c scan-assembler-not umaxv64si3/0
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
    PASS: gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_u64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 8
    PASS: gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_ne_u64\\ts\\[[0-9]+:[0-9]+\\], v\\[[0-9]+:[0-9]+\\], 1 8
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_umax_1.c scan-assembler-times umaxv64si3_exec 20
    PASS: gcc.target/gcn/cond_umax_1_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_umax_1_run.c execution test
    PASS: gcc.target/gcn/cond_umin_1.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_umin_1.c scan-assembler-not \\ts_cmpk_lg_u32\\tvcc_lo, 0
    PASS: gcc.target/gcn/cond_umin_1.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_??, 0
    PASS: gcc.target/gcn/cond_umin_1.c scan-assembler-not uminv64si3/0
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
    PASS: gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_lt_u64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 8
    PASS: gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_ne_u64\\ts\\[[0-9]+:[0-9]+\\], v\\[[0-9]+:[0-9]+\\], 1 8
    [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_umin_1.c scan-assembler-times uminv64si3_exec 20
    PASS: gcc.target/gcn/cond_umin_1_run.c (test for excess errors)
    PASS: gcc.target/gcn/cond_umin_1_run.c execution test
    PASS: gcc.target/gcn/simd-math-1.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_acos"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_acosh"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_asin"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_asinh"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_atan"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_atan2"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_atanh"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_copysign"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_cos"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_cosh"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_erf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_exp"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_exp2"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_fmod"
    XFAIL: gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_gamma"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_hypot"
    XFAIL: gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_lgamma"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_log"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_log10"
    XFAIL: gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_log2"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_pow"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_remainder"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_rint"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_scalb"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_significand"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_sin"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_sinh"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_sqrt"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_tan"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_tanh"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_tgamma"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_acosf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_acoshf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_asinf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_asinhf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_atan2f"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_atanf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_atanhf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_copysignf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_cosf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_coshf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_erff"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_exp2f"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_expf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_fmodf"
    XFAIL: gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_gammaf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_hypotf"
    XFAIL: gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_lgammaf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_log10f"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_log2f"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_logf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_powf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_remainderf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_rintf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_scalbf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_significandf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_sinf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_sinhf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_sqrtf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_tanf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_tanhf"
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_tgammaf"
    @@ -125130,7 +125130,7 @@ PASS: gcc.target/gcn/simd-math-5-char-run.c (test for excess errors)
    PASS: gcc.target/gcn/simd-math-5-char-run.c execution test
    PASS: gcc.target/gcn/simd-math-5-char.c (test for excess errors)
    XFAIL: gcc.target/gcn/simd-math-5-char.c scan-assembler-times __divmodv64si4@rel32@lo 1
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5-char.c scan-assembler-times __divv64hi3@rel32@lo 1
    PASS: gcc.target/gcn/simd-math-5-char.c scan-assembler-times __divv64qi3@rel32@lo 0
    FAIL: gcc.target/gcn/simd-math-5-char.c scan-assembler-times __modv64qi3@rel32@lo 1
    PASS: gcc.target/gcn/simd-math-5-char.c scan-assembler-times __udivv64qi3@rel32@lo 0
    @@ -125171,8 +125171,8 @@ PASS: gcc.target/gcn/simd-math-5-long-run.c (test for excess errors)
    PASS: gcc.target/gcn/simd-math-5-long-run.c execution test
    PASS: gcc.target/gcn/simd-math-5-long.c (test for excess errors)
    XFAIL: gcc.target/gcn/simd-math-5-long.c scan-assembler-times __divmodv64di4@rel32@lo 1
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5-long.c scan-assembler-times __divv64di3@rel32@lo 1
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5-long.c scan-assembler-times __modv64di3@rel32@lo 1
    PASS: gcc.target/gcn/simd-math-5-long.c scan-assembler-times __udivv64di3@rel32@lo 0
    PASS: gcc.target/gcn/simd-math-5-long.c scan-assembler-times __umodv64di3@rel32@lo 0
    PASS: gcc.target/gcn/simd-math-5-short.c (test for excess errors)
    XFAIL: gcc.target/gcn/simd-math-5-short.c scan-assembler-times __divmodv64si4@rel32@lo 1
    PASS: gcc.target/gcn/simd-math-5-short.c scan-assembler-times __divv64hi3@rel32@lo 0
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5-short.c scan-assembler-times __divv64si3@rel32@lo 1
    FAIL: gcc.target/gcn/simd-math-5-short.c scan-assembler-times __modv64hi3@rel32@lo 1
    PASS: gcc.target/gcn/simd-math-5-short.c scan-assembler-times __udivv64hi3@rel32@lo 0
    PASS: gcc.target/gcn/simd-math-5-short.c scan-assembler-times __umodv64hi3@rel32@lo 0
    PASS: gcc.target/gcn/simd-math-5.c (test for excess errors)
    XFAIL: gcc.target/gcn/simd-math-5.c scan-assembler-times __divmodv64si4@rel32@lo 1
    PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times __divsi3@rel32@lo 1
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5.c scan-assembler-times __divv64si3@rel32@lo 1
    [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5.c scan-assembler-times __modv64si3@rel32@lo 1
    PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times __udivmodv64si4@rel32@lo 0
    PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times __udivsi3@rel32@lo 0
    PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times __udivv64si3@rel32@lo 0
    @@ -125242,13 +125242,13 @@ PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times __umodv64si3@rel32@lo 0
    PASS: gcc.target/gcn/smax_1.c (test for excess errors)
    PASS: gcc.target/gcn/smax_1.c scan-assembler-times \\tv_cmp_gt_i64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 10
    FAIL: gcc.target/gcn/smax_1.c scan-assembler-times \\tv_cmpx_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
    [-PASS:-]{+FAIL:+} gcc.target/gcn/smax_1.c scan-assembler-times vec_cmpv64didi 10
    PASS: gcc.target/gcn/smax_1_run.c (test for excess errors)
    PASS: gcc.target/gcn/smax_1_run.c execution test
    PASS: gcc.target/gcn/smin_1.c (test for excess errors)
    PASS: gcc.target/gcn/smin_1.c scan-assembler-times \\tv_cmp_lt_i64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 10
    FAIL: gcc.target/gcn/smin_1.c scan-assembler-times \\tv_cmpx_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
    [-PASS:-]{+FAIL:+} gcc.target/gcn/smin_1.c scan-assembler-times vec_cmpv64didi 10
    PASS: gcc.target/gcn/smin_1_run.c (test for excess errors)
    PASS: gcc.target/gcn/smin_1_run.c execution test
    PASS: gcc.target/gcn/sram-ecc-3.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-3.c scan-assembler (\\*zero_extendv64qiv64si_sdwa|\\*zero_extendv64qiv64si_shift)
    PASS: gcc.target/gcn/sram-ecc-4.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-4.c scan-assembler (\\*zero_extendv64hiv64si_sdwa|\\*zero_extendv64hiv64si_shift)
    PASS: gcc.target/gcn/sram-ecc-7.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-7.c scan-assembler (\\*zero_extendv64qiv64si_sdwa|\\*zero_extendv64qiv64si_shift)
    PASS: gcc.target/gcn/sram-ecc-8.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-8.c scan-assembler (\\*zero_extendv64hiv64si_sdwa|\\*zero_extendv64hiv64si_shift)
    PASS: gcc.target/gcn/umax_1.c (test for excess errors)
    PASS: gcc.target/gcn/umax_1.c scan-assembler-times \\tv_cmp_gt_u64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 8
    FAIL: gcc.target/gcn/umax_1.c scan-assembler-times \\tv_cmpx_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
    [-PASS:-]{+FAIL:+} gcc.target/gcn/umax_1.c scan-assembler-times vec_cmpv64didi 8
    PASS: gcc.target/gcn/umax_1_run.c (test for excess errors)
    PASS: gcc.target/gcn/umax_1_run.c execution test
    PASS: gcc.target/gcn/umin_1.c (test for excess errors)
    PASS: gcc.target/gcn/umin_1.c scan-assembler-times \\tv_cmp_lt_u64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 8
    FAIL: gcc.target/gcn/umin_1.c scan-assembler-times \\tv_cmpx_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
    [-PASS:-]{+FAIL:+} gcc.target/gcn/umin_1.c scan-assembler-times vec_cmpv64didi 8
    PASS: gcc.target/gcn/umin_1_run.c (test for excess errors)
    PASS: gcc.target/gcn/umin_1_run.c execution test


Grüße
 Thomas


> gcc/ChangeLog:
>
>     * config/gcn/gcn.cc (gcn_vectorize_preferred_simd_mode): Prefer V32 on
>     RDNA devices.
> ---
>  gcc/config/gcn/gcn.cc | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
> index 498146dcde9..efb73af50c4 100644
> --- a/gcc/config/gcn/gcn.cc
> +++ b/gcc/config/gcn/gcn.cc
> @@ -5226,6 +5226,32 @@ gcn_vector_mode_supported_p (machine_mode mode)
>  static machine_mode
>  gcn_vectorize_preferred_simd_mode (scalar_mode mode)
>  {
> +  /* RDNA devices have 32-lane vectors with limited support for 64-bit vectors
> +     (in particular, permute operations are only available for cases that don't
> +     span the 32-lane boundary).
> +
> +     From the RDNA3 manual: "Hardware may choose to skip either half if the
> +     EXEC mask for that half is all zeros...". This means that preferring
> +     32-lanes is a good stop-gap until we have proper wave32 support. */
> +  if (TARGET_RDNA2_PLUS)
> +    switch (mode)
> +      {
> +      case E_QImode:
> +        return V32QImode;
> +      case E_HImode:
> +        return V32HImode;
> +      case E_SImode:
> +        return V32SImode;
> +      case E_DImode:
> +        return V32DImode;
> +      case E_SFmode:
> +        return V32SFmode;
> +      case E_DFmode:
> +        return V32DFmode;
> +      default:
> +        return word_mode;
> +      }
> +
>    switch (mode)
>      {
>      case E_QImode:
> --
> 2.41.0
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 498146dcde9..efb73af50c4 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5226,6 +5226,32 @@ gcn_vector_mode_supported_p (machine_mode mode)
 static machine_mode
 gcn_vectorize_preferred_simd_mode (scalar_mode mode)
 {
+  /* RDNA devices have 32-lane vectors with limited support for 64-bit vectors
+     (in particular, permute operations are only available for cases that don't
+     span the 32-lane boundary).
+
+     From the RDNA3 manual: "Hardware may choose to skip either half if the
+     EXEC mask for that half is all zeros...". This means that preferring
+     32-lanes is a good stop-gap until we have proper wave32 support. */
+  if (TARGET_RDNA2_PLUS)
+    switch (mode)
+      {
+      case E_QImode:
+        return V32QImode;
+      case E_HImode:
+        return V32HImode;
+      case E_SImode:
+        return V32SImode;
+      case E_DImode:
+        return V32DImode;
+      case E_SFmode:
+        return V32SFmode;
+      case E_DFmode:
+        return V32DFmode;
+      default:
+        return word_mode;
+      }
+
   switch (mode)
     {
     case E_QImode:
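For readers who want to experiment with the selection logic outside of GCC, here is a minimal standalone sketch of what the patch does, plus the override idea floated in the reply above. It is not GCC code: 'Scalar', 'VecMode' and 'preferred_simd_mode' are hypothetical stand-ins for 'scalar_mode', 'machine_mode' and 'gcn_vectorize_preferred_simd_mode'; 'is_rdna' models 'TARGET_RDNA2_PLUS'; and 'lanes_override' models the merely suggested (not existing) '-mpreferred-simd-mode' option. The non-RDNA path is assumed to mirror the RDNA one with 64 lanes.

```cpp
#include <cassert>

// Hypothetical stand-ins for GCC's scalar mode enumerators (E_QImode etc.).
enum class Scalar { QI, HI, SI, DI, SF, DF, Other };

// Hypothetical stand-in for a vector machine_mode: element type plus lanes.
// lanes == 1 is used here as a placeholder for "fall back to word_mode".
struct VecMode {
  Scalar elem;
  int lanes;
};

// Sketch of the preferred-mode choice: 32 lanes on RDNA, 64 lanes otherwise.
// A non-zero lanes_override wins over the device default; this models the
// override option that was only proposed in the thread, not a real flag.
constexpr VecMode preferred_simd_mode(Scalar mode, bool is_rdna,
                                      int lanes_override = 0) {
  if (mode == Scalar::Other)
    return {Scalar::Other, 1};  // placeholder for word_mode
  int lanes = lanes_override != 0 ? lanes_override : (is_rdna ? 32 : 64);
  return {mode, lanes};
}

// Compile-time checks of the two defaults described in the patch.
static_assert(preferred_simd_mode(Scalar::SF, true).lanes == 32,
              "RDNA prefers 32-lane vectors");
static_assert(preferred_simd_mode(Scalar::SF, false).lanes == 64,
              "other devices keep 64-lane vectors");
```

With such an override, the existing V64 scan tests could keep their expectations on RDNA by requesting 64 lanes explicitly, while new scans cover the V32 default; whether that is the right interface is exactly the open question in this thread.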