Message ID | CAMe9rOrQjcYbqY-rtnWszpca=LT2gdXm8ao_d5HqtvrRsXOqnA@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | [v2] x86: Check AVX512 without mask instructions | expand |
On Fri, Jun 25, 2021 at 5:39 AM H.J. Lu <hjl.tools@gmail.com> wrote: > > On Fri, Jun 25, 2021 at 12:50 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > On Fri, Jun 25, 2021 at 4:51 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > > > On Fri, Jun 25, 2021 at 12:13 AM Uros Bizjak via Gcc-patches > > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > > > On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu <hjl.tools@gmail.com> wrote: > > > > > > > > > > CPUID functions are used to detect CPU features. If vector ISAs > > > > > are enabled, compiler is free to use them in these functions. Add > > > > > __attribute__ ((target("general-regs-only"))) to CPUID functions > > > > > to avoid vector instructions. > > > > > > > > These functions are intended to be inlined, so how does target > > > > attribute affect inlining? > > > I guess w/ -O0. they may not be inlined, that's why H.J adds those > > > attributes to those functions. > > > > The problem is not with these functions, but with surrounding checks > > for cpuid features. These checks are implemented with logic > > instructions, and nothing prevents RA from allocating mask registers, > > and consequently mask insn is emitted. Regarding mentioned functions, > > cpuid insn pattern has four GPR single-reg constraints, so mask > > registers can't be allocated here. > > > > > pr96814.dump: > > > 0804aa40 <main>: > > > 804aa40: 8d 4c 24 04 lea 0x4(%esp),%ecx > > > ... > > > 804aa63: 6a 07 push $0x7 > > > 804aa65: e8 e0 e7 ff ff call 804924a <__get_cpuid_count> > > > > > > Also we need to add a target attribute to avx512f_os_support (), and > > > that would be enough to fix the AVX512 part. > > > > > > Moreover, all check functions in below files may also need to deal with: > > > adx-check.h > > > aes-avx-check.h > > > aes-check.h > > > amx-check.h > > > attr-nocf-check-1a.c > > > attr-nocf-check-3a.c > > > avx2-check.h > > > avx2-vpop-check.h > > > avx512bw-check.h > > > avx512-check.h > > > avx512dq-check.h > > > avx512er-check.h > > > avx512f-check.h > > > avx512vl-check.h > > > avx-check.h > > > bmi2-check.h > > > bmi-check.h > > > cf_check-1.c > > > cf_check-2.c > > > cf_check-3.c > > > cf_check-4.c > > > cf_check-5.c > > > f16c-check.h > > > fma4-check.h > > > fma-check.h > > > isa-check.h > > > lzcnt-check.h > > > m128-check.h > > > m256-check.h > > > m512-check.h > > > mmx-3dnow-check.h > > > mmx-check.h > > > pclmul-avx-check.h > > > pclmul-check.h > > > pr39315-check.c > > > rtm-check.h > > > sha-check.h > > > spellcheck-options-1.c > > > spellcheck-options-2.c > > > spellcheck-options-3.c > > > spellcheck-options-4.c > > > spellcheck-options-5.c > > > sse2-check.h > > > sse3-check.h > > > sse4_1-check.h > > > sse4_2-check.h > > > sse4a-check.h > > > sse-check.h > > > ssse3-check.h > > > stack-check-11.c > > > stack-check-12.c > > > stack-check-17.c > > > stack-check-18.c > > > stack-check-19.c > > > xop-check.h > > > > True, but this would just paper over the real problem. Now, it is > > expected that the user decorates the function that checks CPUID > > features with the target attribute. I'm not sure if this is OK. > > > > Uros. > > CPUID functions are used to detect CPU features. If mask instructions > are enabled, compiler is free to use them in these functions. Disable > AVX512F in AVX512 check with target pragma to avoid mask instructions. > > OK for master? > PING: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573717.html
On Wed, Jul 14, 2021 at 8:27 PM H.J. Lu <hjl.tools@gmail.com> wrote: > > On Fri, Jun 25, 2021 at 5:39 AM H.J. Lu <hjl.tools@gmail.com> wrote: > > > > On Fri, Jun 25, 2021 at 12:50 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > On Fri, Jun 25, 2021 at 4:51 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > > > > > On Fri, Jun 25, 2021 at 12:13 AM Uros Bizjak via Gcc-patches > > > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > > > > > On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu <hjl.tools@gmail.com> wrote: > > > > > > > > > > > > CPUID functions are used to detect CPU features. If vector ISAs > > > > > > are enabled, compiler is free to use them in these functions. Add > > > > > > __attribute__ ((target("general-regs-only"))) to CPUID functions > > > > > > to avoid vector instructions. > > > > > > > > > > These functions are intended to be inlined, so how does target > > > > > attribute affect inlining? > > > > I guess w/ -O0. they may not be inlined, that's why H.J adds those > > > > attributes to those functions. > > > > > > The problem is not with these functions, but with surrounding checks > > > for cpuid features. These checks are implemented with logic > > > instructions, and nothing prevents RA from allocating mask registers, > > > and consequently mask insn is emitted. Regarding mentioned functions, > > > cpuid insn pattern has four GPR single-reg constraints, so mask > > > registers can't be allocated here. > > > > > > > pr96814.dump: > > > > 0804aa40 <main>: > > > > 804aa40: 8d 4c 24 04 lea 0x4(%esp),%ecx > > > > ... > > > > 804aa63: 6a 07 push $0x7 > > > > 804aa65: e8 e0 e7 ff ff call 804924a <__get_cpuid_count> > > > > > > > > Also we need to add a target attribute to avx512f_os_support (), and > > > > that would be enough to fix the AVX512 part. > > > > > > > > Moreover, all check functions in below files may also need to deal with: > > > > adx-check.h > > > > aes-avx-check.h > > > > aes-check.h > > > > amx-check.h > > > > attr-nocf-check-1a.c > > > > attr-nocf-check-3a.c > > > > avx2-check.h > > > > avx2-vpop-check.h > > > > avx512bw-check.h > > > > avx512-check.h > > > > avx512dq-check.h > > > > avx512er-check.h > > > > avx512f-check.h > > > > avx512vl-check.h > > > > avx-check.h > > > > bmi2-check.h > > > > bmi-check.h > > > > cf_check-1.c > > > > cf_check-2.c > > > > cf_check-3.c > > > > cf_check-4.c > > > > cf_check-5.c > > > > f16c-check.h > > > > fma4-check.h > > > > fma-check.h > > > > isa-check.h > > > > lzcnt-check.h > > > > m128-check.h > > > > m256-check.h > > > > m512-check.h > > > > mmx-3dnow-check.h > > > > mmx-check.h > > > > pclmul-avx-check.h > > > > pclmul-check.h > > > > pr39315-check.c > > > > rtm-check.h > > > > sha-check.h > > > > spellcheck-options-1.c > > > > spellcheck-options-2.c > > > > spellcheck-options-3.c > > > > spellcheck-options-4.c > > > > spellcheck-options-5.c > > > > sse2-check.h > > > > sse3-check.h > > > > sse4_1-check.h > > > > sse4_2-check.h > > > > sse4a-check.h > > > > sse-check.h > > > > ssse3-check.h > > > > stack-check-11.c > > > > stack-check-12.c > > > > stack-check-17.c > > > > stack-check-18.c > > > > stack-check-19.c > > > > xop-check.h > > > > > > True, but this would just paper over the real problem. Now, it is > > > expected that the user decorates the function that checks CPUID > > > features with the target attribute. I'm not sure if this is OK. vmovw is enabled by AVX512FP16, and compile cpuid check function w/ avx512fp16 may result in SIGILL on non-avx512fp16 target(though, we didn't get a testcase yet). Would that be a sufficient reason to disable avx512 for cpuid check? > > > > > > Uros. > > > > CPUID functions are used to detect CPU features. If mask instructions > > are enabled, compiler is free to use them in these functions. Disable > > AVX512F in AVX512 check with target pragma to avoid mask instructions. > > > > OK for master? > > > > PING: > > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573717.html > > > -- > H.J.
On Mon, Jul 26, 2021 at 5:33 AM Hongtao Liu <crazylht@gmail.com> wrote: > > On Wed, Jul 14, 2021 at 8:27 PM H.J. Lu <hjl.tools@gmail.com> wrote: > > > > On Fri, Jun 25, 2021 at 5:39 AM H.J. Lu <hjl.tools@gmail.com> wrote: > > > > > > On Fri, Jun 25, 2021 at 12:50 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > > > On Fri, Jun 25, 2021 at 4:51 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > > > > > > > On Fri, Jun 25, 2021 at 12:13 AM Uros Bizjak via Gcc-patches > > > > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > > > > > > > On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu <hjl.tools@gmail.com> wrote: > > > > > > > > > > > > > > CPUID functions are used to detect CPU features. If vector ISAs > > > > > > > are enabled, compiler is free to use them in these functions. Add > > > > > > > __attribute__ ((target("general-regs-only"))) to CPUID functions > > > > > > > to avoid vector instructions. > > > > > > > > > > > > These functions are intended to be inlined, so how does target > > > > > > attribute affect inlining? > > > > > I guess w/ -O0. they may not be inlined, that's why H.J adds those > > > > > attributes to those functions. > > > > > > > > The problem is not with these functions, but with surrounding checks > > > > for cpuid features. These checks are implemented with logic > > > > instructions, and nothing prevents RA from allocating mask registers, > > > > and consequently mask insn is emitted. Regarding mentioned functions, > > > > cpuid insn pattern has four GPR single-reg constraints, so mask > > > > registers can't be allocated here. > > > > > > > > > pr96814.dump: > > > > > 0804aa40 <main>: > > > > > 804aa40: 8d 4c 24 04 lea 0x4(%esp),%ecx > > > > > ... > > > > > 804aa63: 6a 07 push $0x7 > > > > > 804aa65: e8 e0 e7 ff ff call 804924a <__get_cpuid_count> > > > > > > > > > > Also we need to add a target attribute to avx512f_os_support (), and > > > > > that would be enough to fix the AVX512 part. > > > > > > > > > > Moreover, all check functions in below files may also need to deal with: > > > > > adx-check.h > > > > > aes-avx-check.h > > > > > aes-check.h > > > > > amx-check.h > > > > > attr-nocf-check-1a.c > > > > > attr-nocf-check-3a.c > > > > > avx2-check.h > > > > > avx2-vpop-check.h > > > > > avx512bw-check.h > > > > > avx512-check.h > > > > > avx512dq-check.h > > > > > avx512er-check.h > > > > > avx512f-check.h > > > > > avx512vl-check.h > > > > > avx-check.h > > > > > bmi2-check.h > > > > > bmi-check.h > > > > > cf_check-1.c > > > > > cf_check-2.c > > > > > cf_check-3.c > > > > > cf_check-4.c > > > > > cf_check-5.c > > > > > f16c-check.h > > > > > fma4-check.h > > > > > fma-check.h > > > > > isa-check.h > > > > > lzcnt-check.h > > > > > m128-check.h > > > > > m256-check.h > > > > > m512-check.h > > > > > mmx-3dnow-check.h > > > > > mmx-check.h > > > > > pclmul-avx-check.h > > > > > pclmul-check.h > > > > > pr39315-check.c > > > > > rtm-check.h > > > > > sha-check.h > > > > > spellcheck-options-1.c > > > > > spellcheck-options-2.c > > > > > spellcheck-options-3.c > > > > > spellcheck-options-4.c > > > > > spellcheck-options-5.c > > > > > sse2-check.h > > > > > sse3-check.h > > > > > sse4_1-check.h > > > > > sse4_2-check.h > > > > > sse4a-check.h > > > > > sse-check.h > > > > > ssse3-check.h > > > > > stack-check-11.c > > > > > stack-check-12.c > > > > > stack-check-17.c > > > > > stack-check-18.c > > > > > stack-check-19.c > > > > > xop-check.h > > > > > > > > True, but this would just paper over the real problem. Now, it is > > > > expected that the user decorates the function that checks CPUID > > > > features with the target attribute. I'm not sure if this is OK. > vmovw is enabled by AVX512FP16, and compile cpuid check function w/ > avx512fp16 may result in SIGILL on non-avx512fp16 target(though, we > didn't get a testcase yet). In struct processor_costs (i386.h) we have: const int sse_to_integer; /* cost of moving SSE register to integer. */ const int integer_to_sse; /* cost of moving integer register to SSE. */ const int mask_to_integer; /* cost of moving mask register to integer. */ const int integer_to_mask; /* cost of moving integer register to mask. */ These are currently set sufficiently high, so we won't get vmovw for the same reason we don't get vmovd and vmovq. > Would that be a sufficient reason to disable avx512 for cpuid check? We would like to avoid inter-unit moves, and keep values in their respective register set as much as possible. This is the reason for relatively high values for the above costs and special passes were introduced (STV) to avoid excessive moves between register sets. Without this approach, register allocator is free to generate e.g. instructions with mask registers instead of integer registers (especially under register pressure), trading spills with inter-unit moves. We tried to spill to SSE registers, and the experiment ended with a nice list of PRs. See ix86_spill_class in i386.c. Decorating the function with general_regs_only would just paper over the above problem. Regarding mask registers, some sort of STV-like target pass is needed to control precisely how values are moved between register sets, loaded to mask registers, handled there and stored. Uros.
From 7961c445f27de3e813d7332afc14e9844c0831f7 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" <hjl.tools@gmail.com> Date: Thu, 24 Jun 2021 04:43:41 -0700 Subject: [PATCH v2] x86: Check AVX512 without mask instructions CPUID functions are used to detect CPU features. If mask instructions are enabled, compiler is free to use them in these functions. Disable AVX512F in AVX512 check with target pragma to avoid mask instructions. PR target/101185 * gcc.target/i386/avx512-check.h: Disable AVX512F in AVX512 check with target pragma. --- gcc/testsuite/gcc.target/i386/avx512-check.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h index 0a377dba1d5..7ccf730c4f1 100644 --- a/gcc/testsuite/gcc.target/i386/avx512-check.h +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h @@ -1,5 +1,8 @@ #include <stdlib.h> +#pragma GCC push_options +#pragma GCC target ("no-mmx,no-sse") #include "cpuid.h" +#pragma GCC pop_options #include "m512-check.h" #include "avx512f-os-support.h" @@ -25,6 +28,9 @@ do_test (void) } #endif +#pragma GCC push_options +#pragma GCC target ("no-mmx,no-sse") + static int check_osxsave (void) { @@ -110,3 +116,4 @@ main () #endif return 0; } +#pragma GCC pop_options -- 2.31.1