From patchwork Fri Apr 23 15:28:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Uros Bizjak X-Patchwork-Id: 1469652 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=Adlm4cop; dkim-atps=neutral Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FRdVr3QsQz9sSs for ; Sat, 24 Apr 2021 01:29:07 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 50219393D02B; Fri, 23 Apr 2021 15:29:05 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 50219393D02B DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1619191745; bh=wsLRWJCILkuAwXO3HD8KpKmJVm82O3YOAxVFT2FLmP0=; h=Date:Subject:To:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=Adlm4copqqxWQiCvrhfFfz1JyvUsDAftCwgvwyBAsyRdKQi0HzzWeniDe3mziY/D7 S1P1cHNVOHrv2sOIGsuh0/bOgUY9z61QvRemptY8RxkGssDLeA3VI/InEAzRTticBq K56kL1oaMJoapgC11pFDcV5VaE1T7CvnE1Gr+E2M= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-qt1-x835.google.com (mail-qt1-x835.google.com [IPv6:2607:f8b0:4864:20::835]) by sourceware.org (Postfix) with ESMTPS id 94A36393D006 for ; Fri, 23 Apr 2021 15:29:00 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 94A36393D006 Received: by mail-qt1-x835.google.com with SMTP id c6so36678767qtc.1 for ; Fri, 23 Apr 2021 08:29:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to:cc; bh=wsLRWJCILkuAwXO3HD8KpKmJVm82O3YOAxVFT2FLmP0=; b=ZqbGI07YNLKOi/SDtMK8+nBOw82AMg+NazYR0z295T989blXPjQV8hMA6FanEiQ25S YsgLfoUakVhssRR7N65eTNwWs2ZlRir77ep1qEXFvu6FxFPGbgnhqKv8IZihozTxMlFC PoHv1/t+lvbxquaMqE5WjsPfvFTOh0M/3dtj1ff32MEWykZ3xnWTMs39nAjU4LsWppG4 10jRlZbl1JwZh+6bKH1w2FxRmhXhUgW9s3PzYoCa5C8zLhM1wdYOfZoangpLmofDc9eH p/KjY4k9AxZapLLmnADMM4gal4HUl3mmAK4v07IPwMDBv/0PZSmaFWyMNk7UC16fHLXX 5wsA== X-Gm-Message-State: AOAM530IMpW9ZV1y5VRBlruEG+YccJenvgI5YzP3y7Nu8Er8bJoBFKVp j+qSHRjQgtl07mh+fTS0azd756C5e3NBn6NDjNxLwvRdgI9HGg== X-Google-Smtp-Source: ABdhPJz9wzqLfy8bBpGPCERSUvDdsooAOeoI4rpSogId1n6MmZHyneeV0/GojbO+MRexI2ZUsoypjSUaHr9RfIO0BBg= X-Received: by 2002:a05:622a:243:: with SMTP id c3mr4325329qtx.225.1619191739810; Fri, 23 Apr 2021 08:28:59 -0700 (PDT) MIME-Version: 1.0 Date: Fri, 23 Apr 2021 17:28:48 +0200 Message-ID: Subject: i386: Fix atomic FP peepholes [PR100182] To: "gcc-patches@gcc.gnu.org" X-Spam-Status: No, score=-9.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Uros Bizjak via Gcc-patches From: Uros Bizjak Reply-To: Uros Bizjak Cc: Jakub Jelinek Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" 64bit loads to/stores from x87 and SSE registers are atomic also on 32-bit targets, so there is no need for additional atomic moves to a temporary register. Introduced load peephole2 patterns assume that there won't be any additional loads from the load location outside the peepholed sequence and wrongly removed the source location initialization. OTOH, introduced store peephole2 patterns assume there won't be any additional loads from the stored location outside the peepholed sequence and wrongly removed the destination location initialization. Note that we can't use plain x87 FST instruction to initialize destination location because FST converts the value to the double-precision format, changing bits during move. The patch restores removed initializations in load and store patterns. Additionally, plain x87 FST in store peephole2 patterns is prevented by limiting the store operand source to SSE registers. 2021-04-23 Uroš Bizjak gcc/ PR target/100182 * config/i386/sync.md (FILD_ATOMIC/FIST_ATOMIC FP load peephole2): Copy operand 3 to operand 4. Use sse_reg_operand as operand 3 predicate. (FILD_ATOMIC/FIST_ATOMIC FP load peephole2 with mem blockage): Ditto. (LDX_ATOMIC/STX_ATOMIC FP load peephole2): Ditto. (LDX_ATOMIC/LDX_ATOMIC FP load peephole2 with mem blockage): Ditto. (FILD_ATOMIC/FIST_ATOMIC FP store peephole2): Copy operand 1 to operand 0. (FILD_ATOMIC/FIST_ATOMIC FP store peephole2 with mem blockage): Ditto. (LDX_ATOMIC/STX_ATOMIC FP store peephole2): Ditto. (LDX_ATOMIC/LDX_ATOMIC FP store peephole2 with mem blockage): Ditto. gcc/testsuite/ PR target/100182 * gcc.target/i386/pr100182.c: New test. * gcc.target/i386/pr71245-1.c (dg-final): Xfail scan-assembler-not. * gcc.target/i386/pr71245-2.c (dg-final): Ditto. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Pushed to mainline, will be pushed to other release branches once gcc-11 opens. Uros. diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md index c7c508c8de8..7913b918796 100644 --- a/gcc/config/i386/sync.md +++ b/gcc/config/i386/sync.md @@ -226,12 +226,13 @@ (set (match_operand:DI 2 "memory_operand") (unspec:DI [(match_dup 0)] UNSPEC_FIST_ATOMIC)) - (set (match_operand:DF 3 "any_fp_register_operand") + (set (match_operand:DF 3 "sse_reg_operand") (match_operand:DF 4 "memory_operand"))] "!TARGET_64BIT && peep2_reg_dead_p (2, operands[0]) && rtx_equal_p (XEXP (operands[4], 0), XEXP (operands[2], 0))" - [(set (match_dup 3) (match_dup 5))] + [(set (match_dup 3) (match_dup 5)) + (set (match_dup 4) (match_dup 3))] "operands[5] = gen_lowpart (DFmode, operands[1]);") (define_peephole2 @@ -243,7 +244,7 @@ UNSPEC_FIST_ATOMIC)) (set (mem:BLK (scratch:SI)) (unspec:BLK [(mem:BLK (scratch:SI))] UNSPEC_MEMORY_BLOCKAGE)) - (set (match_operand:DF 3 "any_fp_register_operand") + (set (match_operand:DF 3 "sse_reg_operand") (match_operand:DF 4 "memory_operand"))] "!TARGET_64BIT && peep2_reg_dead_p (2, operands[0]) @@ -251,6 +252,7 @@ [(const_int 0)] { emit_move_insn (operands[3], gen_lowpart (DFmode, operands[1])); + emit_move_insn (operands[4], operands[3]); emit_insn (gen_memory_blockage ()); DONE; }) @@ -262,12 +264,13 @@ (set (match_operand:DI 2 "memory_operand") (unspec:DI [(match_dup 0)] UNSPEC_STX_ATOMIC)) - (set (match_operand:DF 3 "any_fp_register_operand") + (set (match_operand:DF 3 "sse_reg_operand") (match_operand:DF 4 "memory_operand"))] "!TARGET_64BIT && peep2_reg_dead_p (2, operands[0]) && rtx_equal_p (XEXP (operands[4], 0), XEXP (operands[2], 0))" - [(set (match_dup 3) (match_dup 5))] + [(set (match_dup 3) (match_dup 5)) + (set (match_dup 4) (match_dup 3))] "operands[5] = gen_lowpart (DFmode, operands[1]);") (define_peephole2 @@ -279,7 +282,7 @@ UNSPEC_STX_ATOMIC)) (set (mem:BLK (scratch:SI)) (unspec:BLK [(mem:BLK (scratch:SI))] UNSPEC_MEMORY_BLOCKAGE)) - (set (match_operand:DF 3 "any_fp_register_operand") + (set (match_operand:DF 3 "sse_reg_operand") (match_operand:DF 4 "memory_operand"))] "!TARGET_64BIT && peep2_reg_dead_p (2, operands[0]) @@ -287,6 +290,7 @@ [(const_int 0)] { emit_move_insn (operands[3], gen_lowpart (DFmode, operands[1])); + emit_move_insn (operands[4], operands[3]); emit_insn (gen_memory_blockage ()); DONE; }) @@ -392,7 +396,8 @@ "!TARGET_64BIT && peep2_reg_dead_p (3, operands[2]) && rtx_equal_p (XEXP (operands[0], 0), XEXP (operands[3], 0))" - [(set (match_dup 5) (match_dup 1))] + [(set (match_dup 0) (match_dup 1)) + (set (match_dup 5) (match_dup 1))] "operands[5] = gen_lowpart (DFmode, operands[4]);") (define_peephole2 @@ -411,6 +416,7 @@ && rtx_equal_p (XEXP (operands[0], 0), XEXP (operands[3], 0))" [(const_int 0)] { + emit_move_insn (operands[0], operands[1]); emit_insn (gen_memory_blockage ()); emit_move_insn (gen_lowpart (DFmode, operands[4]), operands[1]); DONE; @@ -428,7 +434,8 @@ "!TARGET_64BIT && peep2_reg_dead_p (3, operands[2]) && rtx_equal_p (XEXP (operands[0], 0), XEXP (operands[3], 0))" - [(set (match_dup 5) (match_dup 1))] + [(set (match_dup 0) (match_dup 1)) + (set (match_dup 5) (match_dup 1))] "operands[5] = gen_lowpart (DFmode, operands[4]);") (define_peephole2 @@ -447,6 +454,7 @@ && rtx_equal_p (XEXP (operands[0], 0), XEXP (operands[3], 0))" [(const_int 0)] { + emit_move_insn (operands[0], operands[1]); emit_insn (gen_memory_blockage ()); emit_move_insn (gen_lowpart (DFmode, operands[4]), operands[1]); DONE; diff --git a/gcc/testsuite/gcc.target/i386/pr100182.c b/gcc/testsuite/gcc.target/i386/pr100182.c new file mode 100644 index 00000000000..2f92a04db73 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100182.c @@ -0,0 +1,30 @@ +/* { dg-do run { target ia32 } } */ +/* { dg-options "-O2 -march=i686" } */ + +struct S { double _M_fp; }; +union U { double d; unsigned long long int l; }; + +void +__attribute__((noipa)) +foo (void) +{ + struct S a0, a1; + union U u; + double d0, d1; + a0._M_fp = 0.0; + a1._M_fp = 1.0; + __atomic_store_8 (&a0._M_fp, __atomic_load_8 (&a1._M_fp, __ATOMIC_SEQ_CST), __ATOMIC_SEQ_CST); + u.l = __atomic_load_8 (&a0._M_fp, __ATOMIC_SEQ_CST); + d0 = u.d; + u.l = __atomic_load_8 (&a1._M_fp, __ATOMIC_SEQ_CST); + d1 = u.d; + if (d0 != d1) + __builtin_abort (); +} + +int +main () +{ + foo (); + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr71245-1.c b/gcc/testsuite/gcc.target/i386/pr71245-1.c index be0b7602a8c..02c0dcb80b6 100644 --- a/gcc/testsuite/gcc.target/i386/pr71245-1.c +++ b/gcc/testsuite/gcc.target/i386/pr71245-1.c @@ -19,4 +19,4 @@ void foo_d (void) __atomic_store_n (&d.ll, tmp.ll, __ATOMIC_SEQ_CST); } -/* { dg-final { scan-assembler-not "(fistp|fild)" } } */ +/* { dg-final { scan-assembler-not "(fistp|fild)" { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr71245-2.c b/gcc/testsuite/gcc.target/i386/pr71245-2.c index 65c139849d5..bf37a8cbb71 100644 --- a/gcc/testsuite/gcc.target/i386/pr71245-2.c +++ b/gcc/testsuite/gcc.target/i386/pr71245-2.c @@ -19,4 +19,4 @@ void foo_d (void) __atomic_store_n (&d.ll, tmp.ll, __ATOMIC_SEQ_CST); } -/* { dg-final { scan-assembler-not "movlps" } } */ +/* { dg-final { scan-assembler-not "movlps" { xfail *-*-* } } } */