From patchwork Tue Aug 8 16:59:58 2023
X-Patchwork-Id: 1818734
From: Uros Bizjak
Date: Tue, 8 Aug 2023 18:59:58 +0200
Subject: [committed] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]
To: "gcc-patches@gcc.gnu.org"
Cc: Richard Biener, Hongtao Liu, Jan Hubicka
Also introduce the -m[no-]partial-vector-fp-math option, which disables the
trapping V2SF named patterns in order to avoid generating partial-vector
V4SFmode trapping instructions.

The new option is enabled by default because, even with sanitization, a
small but consistent speedup of 2 to 3% on the Polyhedron capacita
benchmark can be achieved vs. scalar code.

Using -fno-trapping-math improves Polyhedron capacita runtime by 8 to 9%
vs. scalar code.  This is what clang does by default, as it defaults to
-fno-trapping-math.

	PR target/110832

gcc/ChangeLog:

	* config/i386/i386.opt (mpartial-vector-fp-math): New option.
	* config/i386/mmx.md (movq__to_sse): Do not sanitize
	upper part of V2SFmode register with -fno-trapping-math.
	(v2sf3): Enable for ix86_partial_vec_fp_math.
	(divv2sf3): Ditto.
	(v2sf3): Ditto.
	(sqrtv2sf2): Ditto.
	(*mmx_haddv2sf3_low): Ditto.
	(*mmx_hsubv2sf3_low): Ditto.
	(vec_addsubv2sf3): Ditto.
	(vec_cmpv2sfv2si): Ditto.
	(vcondv2sf): Ditto.
	(fmav2sf4): Ditto.
	(fmsv2sf4): Ditto.
	(fnmav2sf4): Ditto.
	(fnmsv2sf4): Ditto.
	(fix_truncv2sfv2si2): Ditto.
	(fixuns_truncv2sfv2si2): Ditto.
	(floatv2siv2sf2): Ditto.
	(floatunsv2siv2sf2): Ditto.
	(nearbyintv2sf2): Ditto.
	(rintv2sf2): Ditto.
	(lrintv2sfv2si2): Ditto.
	(ceilv2sf2): Ditto.
	(lceilv2sfv2si2): Ditto.
	(floorv2sf2): Ditto.
	(lfloorv2sfv2si2): Ditto.
	(btruncv2sf2): Ditto.
	(roundv2sf2): Ditto.
	(lroundv2sfv2si2): Ditto.
	* doc/invoke.texi (x86 Options): Document
	-mpartial-vector-fp-math option.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr110832-1.c: New test.
	* gcc.target/i386/pr110832-2.c: New test.
	* gcc.target/i386/pr110832-3.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
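To make "sanitizing the upper part" concrete, here is a rough C-level model (not GCC's actual code) of what the V2SF expanders in mmx.md do by default: each two-lane operand is widened to four lanes with the upper half zeroed, one full-width SSE operation runs, and only the low two lanes are kept as the result. It uses the GCC/Clang vector extensions from the new pr110832 tests; `widen_sanitized` and `add_v2sf_via_v4sf` are hypothetical names chosen for illustration.

```c
/* GCC/Clang vector-extension types, spelled as in the pr110832 tests.  */
typedef float v2sf __attribute__ ((vector_size (8)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* Widen X to four lanes with zeroed upper lanes: the C analogue of
   sanitizing the upper half of the XMM register.  */
static v4sf widen_sanitized (v2sf x)
{
  v4sf r = { x[0], x[1], 0.0f, 0.0f };
  return r;
}

/* Emulate a two-lane add with one four-lane add (ADDPS on x86),
   then keep only the low two lanes as the V2SF result.  */
static v2sf add_v2sf_via_v4sf (v2sf a, v2sf b)
{
  v4sf wide = widen_sanitized (a) + widen_sanitized (b);
  v2sf r = { wide[0], wide[1] };
  return r;
}
```

With -fno-trapping-math, the patch instead moves the V2SF operand into the wider register as a lowpart subreg, skipping the zeroing step; the model above only covers the sanitizing path.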
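Why the zeroing matters under -ftrapping-math: a full-width ADDPS updates the sticky MXCSR status flags from all four lanes, so leftover data in the upper half can raise a flag that the two-lane source operation never would. The x86-only sketch below assumes an SSE-capable x86-64 target and the MXCSR helpers from `<xmmintrin.h>`; `addps_raises_overflow` is a hypothetical name. It adds a vector to itself with either huge values or zeros in the upper lanes and checks the overflow flag.

```c
#include <xmmintrin.h>  /* __m128, _mm_add_ps, MXCSR exception-state helpers */

/* Add a four-lane vector to itself with one ADDPS and report whether the
   sticky overflow flag in MXCSR was raised.  Lanes 0 and 1 hold the
   "real" two-lane data; lanes 2 and 3 are both set to HI to stand in for
   whatever happens to occupy the upper half of the register.  */
static int addps_raises_overflow (float lo0, float lo1, float hi)
{
  volatile float in[4] = { lo0, lo1, hi, hi };
  volatile __m128 sink;

  _MM_SET_EXCEPTION_STATE (0);    /* clear all MXCSR status flags */

  /* Loading through volatile after the reset keeps the compiler from
     folding the add at compile time or hoisting it above the reset.  */
  __m128 a = _mm_set_ps (in[3], in[2], in[1], in[0]);
  sink = _mm_add_ps (a, a);       /* full-width add: all four lanes */

  unsigned int flags = _MM_GET_EXCEPTION_STATE ();
  return (flags & _MM_EXCEPT_OVERFLOW) != 0;
}
```

Calling it as `addps_raises_overflow (1.0f, 2.0f, 3.4e38f)` reports an overflow, while `addps_raises_overflow (1.0f, 2.0f, 0.0f)` does not, even though lanes 0 and 1 are identical in both calls.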
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 1cc8563477a..2feabc1bf32 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -632,6 +632,10 @@ Enum(prefer_vector_width) String(256) Value(PVW_AVX256)
 EnumValue
 Enum(prefer_vector_width) String(512) Value(PVW_AVX512)
 
+mpartial-vector-fp-math
+Target Var(ix86_partial_vec_fp_math) Init(1)
+Enable floating-point status flags setting SSE vector operations on partial vectors
+
 mmove-max=
 Target RejectNegative Joined Var(ix86_move_max) Enum(prefer_vector_width) Init(PVW_NONE) Save
 Maximum number of bits that can be moved from memory to memory efficiently.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index b49554e9b8f..d51b3b9dc71 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -595,7 +595,18 @@ (define_expand "movq__to_sse"
           (match_operand:V2FI_V4HF 1 "nonimmediate_operand")
           (match_dup 2)))]
   "TARGET_SSE2"
-  "operands[2] = CONST0_RTX (mode);")
+{
+  if (mode == V2SFmode
+      && !flag_trapping_math)
+    {
+      rtx op1 = force_reg (mode, operands[1]);
+      emit_move_insn (operands[0], lowpart_subreg (mode,
+                                                   op1, mode));
+      DONE;
+    }
+
+  operands[2] = CONST0_RTX (mode);
+})
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
@@ -648,7 +659,7 @@ (define_expand "v2sf3"
         (plusminusmult:V2SF
           (match_operand:V2SF 1 "nonimmediate_operand")
           (match_operand:V2SF 2 "nonimmediate_operand")))]
-  "TARGET_MMX_WITH_SSE"
+  "TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op2 = gen_reg_rtx (V4SFmode);
   rtx op1 = gen_reg_rtx (V4SFmode);
@@ -726,7 +737,7 @@ (define_expand "divv2sf3"
   [(set (match_operand:V2SF 0 "register_operand")
         (div:V2SF (match_operand:V2SF 1 "register_operand")
                   (match_operand:V2SF 2 "register_operand")))]
-  "TARGET_MMX_WITH_SSE"
+  "TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op2 = gen_reg_rtx (V4SFmode);
   rtx op1 = gen_reg_rtx (V4SFmode);
@@ -748,7 +759,7 @@ (define_expand "v2sf3"
         (smaxmin:V2SF
           (match_operand:V2SF 1 "register_operand")
           (match_operand:V2SF 2 "register_operand")))]
-  "TARGET_MMX_WITH_SSE"
+  "TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op2 = gen_reg_rtx (V4SFmode);
   rtx op1 = gen_reg_rtx (V4SFmode);
@@ -850,7 +861,7 @@ (define_insn "mmx_rcpit2v2sf3"
 (define_expand "sqrtv2sf2"
   [(set (match_operand:V2SF 0 "register_operand")
         (sqrt:V2SF (match_operand:V2SF 1 "nonimmediate_operand")))]
-  "TARGET_MMX_WITH_SSE"
+  "TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op1 = gen_reg_rtx (V4SFmode);
   rtx op0 = gen_reg_rtx (V4SFmode);
@@ -931,7 +942,7 @@ (define_insn_and_split "*mmx_haddv2sf3_low"
           (vec_select:SF
             (match_dup 1)
             (parallel [(match_operand:SI 3 "const_0_to_1_operand")]))))]
-  "TARGET_SSE3 && TARGET_MMX_WITH_SSE
+  "TARGET_SSE3 && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math
    && INTVAL (operands[2]) != INTVAL (operands[3])
    && ix86_pre_reload_split ()"
   "#"
@@ -977,7 +988,7 @@ (define_insn_and_split "*mmx_hsubv2sf3_low"
           (vec_select:SF
             (match_dup 1)
             (parallel [(const_int 1)]))))]
-  "TARGET_SSE3 && TARGET_MMX_WITH_SSE
+  "TARGET_SSE3 && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
@@ -1039,7 +1050,7 @@ (define_expand "vec_addsubv2sf3"
             (match_operand:V2SF 2 "nonimmediate_operand"))
           (plus:V2SF (match_dup 1) (match_dup 2))
           (const_int 1)))]
-  "TARGET_SSE3 && TARGET_MMX_WITH_SSE"
+  "TARGET_SSE3 && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op2 = gen_reg_rtx (V4SFmode);
   rtx op1 = gen_reg_rtx (V4SFmode);
@@ -1102,7 +1113,7 @@ (define_expand "vec_cmpv2sfv2si"
         (match_operator:V2SI 1 ""
           [(match_operand:V2SF 2 "nonimmediate_operand")
            (match_operand:V2SF 3 "nonimmediate_operand")]))]
-  "TARGET_MMX_WITH_SSE"
+  "TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx ops[4];
   ops[3] = gen_reg_rtx (V4SFmode);
@@ -1128,7 +1139,7 @@ (define_expand "vcondv2sf"
              (match_operand:V2SF 5 "nonimmediate_operand")])
           (match_operand:V2FI 1 "general_operand")
           (match_operand:V2FI 2 "general_operand")))]
-  "TARGET_MMX_WITH_SSE"
+  "TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx ops[6];
   ops[5] = gen_reg_rtx (V4SFmode);
@@ -1318,7 +1329,7 @@ (define_expand "fmav2sf4"
           (match_operand:V2SF 2 "nonimmediate_operand")
           (match_operand:V2SF 3 "nonimmediate_operand")))]
   "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL)
-   && TARGET_MMX_WITH_SSE"
+   && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op3 = gen_reg_rtx (V4SFmode);
   rtx op2 = gen_reg_rtx (V4SFmode);
@@ -1343,7 +1354,7 @@ (define_expand "fmsv2sf4"
           (neg:V2SF
             (match_operand:V2SF 3 "nonimmediate_operand"))))]
   "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL)
-   && TARGET_MMX_WITH_SSE"
+   && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op3 = gen_reg_rtx (V4SFmode);
   rtx op2 = gen_reg_rtx (V4SFmode);
@@ -1368,7 +1379,7 @@ (define_expand "fnmav2sf4"
           (match_operand:V2SF 2 "nonimmediate_operand")
           (match_operand:V2SF 3 "nonimmediate_operand")))]
   "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL)
-   && TARGET_MMX_WITH_SSE"
+   && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op3 = gen_reg_rtx (V4SFmode);
   rtx op2 = gen_reg_rtx (V4SFmode);
@@ -1394,7 +1405,7 @@ (define_expand "fnmsv2sf4"
           (neg:V2SF
             (match_operand:V2SF 3 "nonimmediate_operand"))))]
   "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL)
-   && TARGET_MMX_WITH_SSE"
+   && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op3 = gen_reg_rtx (V4SFmode);
   rtx op2 = gen_reg_rtx (V4SFmode);
@@ -1420,7 +1431,7 @@ (define_expand "fnmsv2sf4"
 (define_expand "fix_truncv2sfv2si2"
   [(set (match_operand:V2SI 0 "register_operand")
         (fix:V2SI (match_operand:V2SF 1 "nonimmediate_operand")))]
-  "TARGET_MMX_WITH_SSE"
+  "TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op1 = gen_reg_rtx (V4SFmode);
   rtx op0 = gen_reg_rtx (V4SImode);
@@ -1436,7 +1447,7 @@ (define_expand "fix_truncv2sfv2si2"
 (define_expand "fixuns_truncv2sfv2si2"
   [(set (match_operand:V2SI 0 "register_operand")
         (unsigned_fix:V2SI (match_operand:V2SF 1 "nonimmediate_operand")))]
-  "TARGET_AVX512VL && TARGET_MMX_WITH_SSE"
+  "TARGET_AVX512VL && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op1 = gen_reg_rtx (V4SFmode);
   rtx op0 = gen_reg_rtx (V4SImode);
@@ -1461,7 +1472,7 @@ (define_insn "mmx_fix_truncv2sfv2si2"
 (define_expand "floatv2siv2sf2"
   [(set (match_operand:V2SF 0 "register_operand")
         (float:V2SF (match_operand:V2SI 1 "nonimmediate_operand")))]
-  "TARGET_MMX_WITH_SSE"
+  "TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op1 = gen_reg_rtx (V4SImode);
   rtx op0 = gen_reg_rtx (V4SFmode);
@@ -1477,7 +1488,7 @@ (define_expand "floatv2siv2sf2"
 (define_expand "floatunsv2siv2sf2"
   [(set (match_operand:V2SF 0 "register_operand")
         (unsigned_float:V2SF (match_operand:V2SI 1 "nonimmediate_operand")))]
-  "TARGET_AVX512VL && TARGET_MMX_WITH_SSE"
+  "TARGET_AVX512VL && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op1 = gen_reg_rtx (V4SImode);
   rtx op0 = gen_reg_rtx (V4SFmode);
@@ -1754,7 +1765,7 @@ (define_expand "vec_initv2sfsf"
 (define_expand "nearbyintv2sf2"
   [(match_operand:V2SF 0 "register_operand")
    (match_operand:V2SF 1 "nonimmediate_operand")]
-  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
+  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op1 = gen_reg_rtx (V4SFmode);
   rtx op0 = gen_reg_rtx (V4SFmode);
@@ -1770,7 +1781,7 @@ (define_expand "nearbyintv2sf2"
 (define_expand "rintv2sf2"
   [(match_operand:V2SF 0 "register_operand")
    (match_operand:V2SF 1 "nonimmediate_operand")]
-  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
+  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op1 = gen_reg_rtx (V4SFmode);
   rtx op0 = gen_reg_rtx (V4SFmode);
@@ -1786,8 +1797,8 @@ (define_expand "rintv2sf2"
 (define_expand "lrintv2sfv2si2"
   [(match_operand:V2SI 0 "register_operand")
    (match_operand:V2SF 1 "nonimmediate_operand")]
-  "TARGET_SSE4_1 && !flag_trapping_math
-   && TARGET_MMX_WITH_SSE"
+  "TARGET_SSE4_1 && !flag_trapping_math
+   && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op1 = gen_reg_rtx (V4SFmode);
   rtx op0 = gen_reg_rtx (V4SImode);
@@ -1804,7 +1815,7 @@ (define_expand "ceilv2sf2"
   [(match_operand:V2SF 0 "register_operand")
    (match_operand:V2SF 1 "nonimmediate_operand")]
   "TARGET_SSE4_1 && !flag_trapping_math
-   && TARGET_MMX_WITH_SSE"
+   && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op1 = gen_reg_rtx (V4SFmode);
   rtx op0 = gen_reg_rtx (V4SFmode);
@@ -1820,8 +1831,8 @@ (define_expand "ceilv2sf2"
 (define_expand "lceilv2sfv2si2"
   [(match_operand:V2SI 0 "register_operand")
    (match_operand:V2SF 1 "nonimmediate_operand")]
-  "TARGET_SSE4_1 && !flag_trapping_math
-   && TARGET_MMX_WITH_SSE"
+  "TARGET_SSE4_1 && !flag_trapping_math
+   && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op1 = gen_reg_rtx (V4SFmode);
   rtx op0 = gen_reg_rtx (V4SImode);
@@ -1838,7 +1849,7 @@ (define_expand "floorv2sf2"
   [(match_operand:V2SF 0 "register_operand")
    (match_operand:V2SF 1 "nonimmediate_operand")]
   "TARGET_SSE4_1 && !flag_trapping_math
-   && TARGET_MMX_WITH_SSE"
+   && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op1 = gen_reg_rtx (V4SFmode);
   rtx op0 = gen_reg_rtx (V4SFmode);
@@ -1854,8 +1865,8 @@ (define_expand "floorv2sf2"
 (define_expand "lfloorv2sfv2si2"
   [(match_operand:V2SI 0 "register_operand")
    (match_operand:V2SF 1 "nonimmediate_operand")]
-  "TARGET_SSE4_1 && !flag_trapping_math
-   && TARGET_MMX_WITH_SSE"
+  "TARGET_SSE4_1 && !flag_trapping_math
+   && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op1 = gen_reg_rtx (V4SFmode);
   rtx op0 = gen_reg_rtx (V4SImode);
@@ -1872,7 +1883,7 @@ (define_expand "btruncv2sf2"
   [(match_operand:V2SF 0 "register_operand")
    (match_operand:V2SF 1 "nonimmediate_operand")]
   "TARGET_SSE4_1 && !flag_trapping_math
-   && TARGET_MMX_WITH_SSE"
+   && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op1 = gen_reg_rtx (V4SFmode);
   rtx op0 = gen_reg_rtx (V4SFmode);
@@ -1889,7 +1900,7 @@ (define_expand "roundv2sf2"
   [(match_operand:V2SF 0 "register_operand")
    (match_operand:V2SF 1 "nonimmediate_operand")]
   "TARGET_SSE4_1 && !flag_trapping_math
-   && TARGET_MMX_WITH_SSE"
+   && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op1 = gen_reg_rtx (V4SFmode);
   rtx op0 = gen_reg_rtx (V4SFmode);
@@ -1905,8 +1916,8 @@ (define_expand "roundv2sf2"
 (define_expand "lroundv2sfv2si2"
   [(match_operand:V2SI 0 "register_operand")
    (match_operand:V2SF 1 "nonimmediate_operand")]
-  "TARGET_SSE4_1 && !flag_trapping_math
-   && TARGET_MMX_WITH_SSE"
+  "TARGET_SSE4_1 && !flag_trapping_math
+   && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
 {
   rtx op1 = gen_reg_rtx (V4SFmode);
   rtx op0 = gen_reg_rtx (V4SImode);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 674f956f4b8..38c9b4e2fb7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1419,6 +1419,7 @@ See RS/6000 and PowerPC Options.
 -mcld -mcx16 -msahf -mmovbe -mcrc32 -mmwait
 -mrecip -mrecip=@var{opt}
 -mvzeroupper -mprefer-avx128 -mprefer-vector-width=@var{opt}
+-mpartial-vector-fp-math
 -mmove-max=@var{bits} -mstore-max=@var{bits}
 -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx
 -mavx2 -mavx512f -mavx512pf -mavx512er -mavx512cd -mavx512vl
@@ -33754,6 +33755,23 @@ This option instructs GCC to use 128-bit AVX instructions instead of
 This option instructs GCC to use @var{opt}-bit vector width in instructions
 instead of default on the selected platform.
 
+@opindex mpartial-vector-fp-math
+@item -mpartial-vector-fp-math
+This option enables GCC to generate floating-point operations that might
+affect the set of floating-point status flags on partial vectors, where
+vector elements reside in the low part of the 128-bit SSE register.  Unless
+@option{-fno-trapping-math} is specified, the compiler guarantees correct
+behavior by sanitizing all input operands to have zeroes in the unused
+upper part of the vector register.  Note that by using built-in functions
+or inline assembly with partial vector arguments, NaNs, denormal or invalid
+values can leak into the upper part of the vector, causing possible
+performance issues when @option{-fno-trapping-math} is in effect.  These
+issues can be mitigated by manually sanitizing the upper part of the partial
+vector argument register or by using @option{-mdaz-ftz} to set
+denormals-are-zero (DAZ) flag in the MXCSR register.
+
+This option is enabled by default.
+
 @opindex mmove-max
 @item -mmove-max=@var{bits}
 This option instructs GCC to set the maximum number of bits can be
diff --git a/gcc/testsuite/gcc.target/i386/pr110832-1.c b/gcc/testsuite/gcc.target/i386/pr110832-1.c
new file mode 100644
index 00000000000..f473e40e23a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110832-1.c
@@ -0,0 +1,12 @@
+/* PR target/110832 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -msse2 -mno-partial-vector-fp-math" } */
+
+typedef float __attribute__((vector_size(8))) v2sf;
+
+v2sf test (v2sf a, v2sf b)
+{
+  return a + b;
+}
+
+/* { dg-final { scan-assembler-not "addps" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr110832-2.c b/gcc/testsuite/gcc.target/i386/pr110832-2.c
new file mode 100644
index 00000000000..59cf8f9c666
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110832-2.c
@@ -0,0 +1,13 @@
+/* PR target/110832 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -ftrapping-math -msse2 -mpartial-vector-fp-math -dp" } */
+
+typedef float __attribute__((vector_size(8))) v2sf;
+
+v2sf test (v2sf a, v2sf b)
+{
+  return a + b;
+}
+
+/* { dg-final { scan-assembler "addps" } } */
+/* { dg-final { scan-assembler-times "\\*vec_concatv4sf_0" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr110832-3.c b/gcc/testsuite/gcc.target/i386/pr110832-3.c
new file mode 100644
index 00000000000..19e219e1a11
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110832-3.c
@@ -0,0 +1,13 @@
+/* PR target/110832 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -fno-trapping-math -msse2 -mpartial-vector-fp-math -dp" } */
+
+typedef float __attribute__((vector_size(8))) v2sf;
+
+v2sf test (v2sf a, v2sf b)
+{
+  return a + b;
+}
+
+/* { dg-final { scan-assembler "addps" } } */
+/* { dg-final { scan-assembler-not "\\*vec_concatv4sf_0" } } */
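The -mpartial-vector-fp-math documentation in this patch suggests manually sanitizing the upper part of a partial vector when its value comes from built-in functions or inline assembly. One possible way to do that with SSE2 intrinsics is sketched below; `zero_upper_half` is a hypothetical helper, and the choice of `_mm_move_epi64` (which maps to MOVQ xmm, xmm) is an assumption of this sketch, not something the patch prescribes.

```c
#include <emmintrin.h>  /* SSE2: _mm_move_epi64 and the cast intrinsics */

/* Clear the upper 64 bits (lanes 2 and 3) of X, keeping the two low
   floats that make up the partial vector.  _mm_move_epi64 copies the
   low quadword and zeroes the upper one; the casts only reinterpret
   the register, they emit no instructions.  */
static inline __m128 zero_upper_half (__m128 x)
{
  return _mm_castsi128_ps (_mm_move_epi64 (_mm_castps_si128 (x)));
}
```

Applied to a value built with `_mm_set_ps (4.0f, 3.0f, 2.0f, 1.0f)`, lanes 0 and 1 keep 1.0f and 2.0f while lanes 2 and 3 become 0.0f.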
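The other mitigation the documentation names is the denormals-are-zero (DAZ) flag. The sketch below sets the DAZ bit (bit 6 of MXCSR) by hand with `_mm_setcsr` rather than through -mdaz-ftz, purely to show the flag's effect: a denormal input is read as zero, so the add yields 0.0f instead of a denormal result. `addss_with_daz` and the two bit-mask macros are hypothetical names; the function also clears FTZ (bit 15) so that only DAZ differs between runs, and it restores the caller's MXCSR before returning.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, scalar SSE intrinsics */

#define MXCSR_DAZ 0x0040u  /* denormals-are-zero: MXCSR bit 6 */
#define MXCSR_FTZ 0x8000u  /* flush-to-zero: MXCSR bit 15 */

/* Compute X + X with ADDSS, with DAZ either set or clear (FTZ always
   clear), then restore the caller's MXCSR.  */
static float addss_with_daz (float x, int daz)
{
  unsigned int saved = _mm_getcsr ();
  unsigned int base = saved & ~(MXCSR_DAZ | MXCSR_FTZ);
  volatile float in = x;    /* volatile: no compile-time folding */
  volatile float out;       /* volatile: result stored before restore */

  _mm_setcsr (daz ? (base | MXCSR_DAZ) : base);
  __m128 v = _mm_set_ss (in);
  out = _mm_cvtss_f32 (_mm_add_ss (v, v));
  _mm_setcsr (saved);       /* restore the caller's control/status word */
  return out;
}
```

With a denormal such as 1e-40f, `addss_with_daz (1e-40f, 1)` returns 0.0f, while `addss_with_daz (1e-40f, 0)` returns a small positive denormal.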