From patchwork Thu Sep 22 13:41:50 2016
To: "gcc-patches@gcc.gnu.org", Kyrill Tkachov, Ramana Radhakrishnan, Richard Earnshaw
From: Thomas Preudhomme
Subject: [PATCH, ARM 2/7] Adapt atomic and exclusive load and store to ARMv8-M Baseline
Message-ID: <83cc150f-31dc-5b35-7eec-e672a9c3943d@foss.arm.com>
Date: Thu, 22 Sep 2016 14:41:50 +0100

Hi,

This patch is part of a patch series adding support for atomic operations on ARMv8-M Baseline targets to GCC. This specific patch adapts the atomic and exclusive load and store patterns to the constraints of ARMv8-M Baseline. It consists of two sets of changes:

- adding non-predicated output templates, because ARMv8-M Baseline does not have the IT instruction;
- using low registers for ldr/str.

Together, these changes require two new alternatives for atomic_load and atomic_store: (i) one for memory models matching the new Pf constraint (every model except relaxed, consume and release), where lda/stl are used and any register is allowed, and (ii) another one for the remaining memory models, restricted to low registers because ldr/str may be used. Both are separate from the alternative for 32-bit targets, whose output templates expect predication.
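The selection rule for the two new Baseline alternatives can be modelled in a few lines of C. This is a hypothetical sketch for illustration, not GCC source. The function names are invented, memory models are assumed to be GCC's plain __ATOMIC_* constants, and only the two Baseline alternatives of atomic_load are modelled. The register allocator's behaviour is reduced to a single question: does an alternative accept a given register?

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the ARMv8-M Baseline alternatives of atomic_load
   ("=r"/"Pf" and "=l"/"n"); not GCC source.  Memory models are assumed to
   be GCC's predefined __ATOMIC_* constants. */

/* Same test as the match_test of the new "Pf" constraint: true for every
   memory model except relaxed, consume and release. */
bool pf_matches(int model)
{
    return model != __ATOMIC_RELAXED
        && model != __ATOMIC_CONSUME
        && model != __ATOMIC_RELEASE;
}

/* The "l" constraint: Thumb-1 low registers r0-r7. */
bool is_low_reg(int regno)
{
    return regno >= 0 && regno <= 7;
}

/* Mnemonic emitted for a load with MODEL into register REGNO, or NULL when
   no Baseline alternative accepts that register (the allocator would then
   have to pick a low register instead). */
const char *v8mb_atomic_load_insn(int model, int regno)
{
    if (pf_matches(model))
        return "lda";   /* "=r" / "Pf" alternative: any core register.  */
    if (is_low_reg(regno))
        return "ldr";   /* "=l" / "n" alternative: low registers only.  */
    return NULL;
}
```

The last case shows why the split exists. A relaxed load into a high register such as r9 has no matching Baseline alternative, so a low register must be used. A load whose model matches Pf can target any core register, because it is emitted as lda.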
ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2016-07-05  Thomas Preud'homme

        * config/arm/constraints.md (Q constraint): Document its use for
        Thumb-1.
        (Pf constraint): New constraint for memory models other than
        relaxed, consume or release.
        * config/arm/sync.md (atomic_load): Add new ARMv8-M Baseline only
        alternatives to allow any register when memory model matches Pf and
        thus lda is used, but only low registers otherwise.  Use unpredicated
        output template for Thumb-1 targets.
        (atomic_store): Likewise for stl.
        (arm_load_exclusive): Add new ARMv8-M Baseline only alternative whose
        output template does not have predication.
        (arm_load_acquire_exclusive): Likewise.
        (arm_load_exclusivesi): Likewise.
        (arm_load_acquire_exclusivesi): Likewise.
        (arm_store_release_exclusive): Likewise.
        (arm_store_exclusive): Use unpredicated output template for Thumb-1
        targets.

Testing:

There is no code generation difference for ARMv7-A, ARMv7VE and ARMv8-A on any of the atomic and synchronization testcases in the testsuite [2]. The patchset was also bootstrapped with --enable-itm --enable-gomp on ARMv8-A in ARM and Thumb mode at optimization levels -O1 and above [1], without any regression in the testsuite and with no code generation difference in libitm and libgomp.

Code generation for ARMv8-M Baseline has been manually examined and compared against ARMv8-A Thumb-2 for the following configurations, without finding any issue:

gcc.dg/atomic-op-2.c at -Os
gcc.dg/atomic-compare-exchange-2.c at -Os
gcc.dg/atomic-compare-exchange-3.c at -O3

Is this ok for trunk?
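A minimal C file like the following could be used to repeat the manual code-generation check on the paths this patch touches. It is a hypothetical sketch: the function names are invented, and it uses only GCC's documented __atomic builtins. When compiled for ARMv8-M Baseline (for example with -march=armv8-m.base -mthumb), the output templates above suggest the following mapping. Relaxed accesses would become ldr/str, acquire, release and seq_cst accesses would become lda/stl, and the compare-and-swap would become an exclusive-access loop. That expectation is not verified here and should be checked against the generated assembly. Built natively, the functions behave as ordinary atomics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical test functions: one per memory-model path of the
   atomic_load / atomic_store patterns, plus one user of the exclusive
   load/store patterns. */
static int shared;

int load_relaxed(void) { return __atomic_load_n(&shared, __ATOMIC_RELAXED); }
int load_acquire(void) { return __atomic_load_n(&shared, __ATOMIC_ACQUIRE); }
int load_seq_cst(void) { return __atomic_load_n(&shared, __ATOMIC_SEQ_CST); }

void store_relaxed(int v) { __atomic_store_n(&shared, v, __ATOMIC_RELAXED); }
void store_release(int v) { __atomic_store_n(&shared, v, __ATOMIC_RELEASE); }
void store_seq_cst(int v) { __atomic_store_n(&shared, v, __ATOMIC_SEQ_CST); }

/* Strong compare-and-swap; on ARM this is expected (not verified here) to
   expand to an ldrex/strex or ldaex/stlex retry loop.  Returns true if
   SHARED held EXPECTED and was replaced by DESIRED. */
bool cas_seq_cst(int expected, int desired)
{
    return __atomic_compare_exchange_n(&shared, &expected, desired,
                                       false, __ATOMIC_SEQ_CST,
                                       __ATOMIC_SEQ_CST);
}
```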
Best regards,

Thomas

[1] CFLAGS_FOR_TARGET and CXXFLAGS_FOR_TARGET were set to "-O1 -g", "-O3 -g" and undefined ("-O2 -g")
[2] The exact list is:

gcc/testsuite/gcc.dg/atomic-compare-exchange-1.c
gcc/testsuite/gcc.dg/atomic-compare-exchange-2.c
gcc/testsuite/gcc.dg/atomic-compare-exchange-3.c
gcc/testsuite/gcc.dg/atomic-exchange-1.c
gcc/testsuite/gcc.dg/atomic-exchange-2.c
gcc/testsuite/gcc.dg/atomic-exchange-3.c
gcc/testsuite/gcc.dg/atomic-fence.c
gcc/testsuite/gcc.dg/atomic-flag.c
gcc/testsuite/gcc.dg/atomic-generic.c
gcc/testsuite/gcc.dg/atomic-generic-aux.c
gcc/testsuite/gcc.dg/atomic-invalid-2.c
gcc/testsuite/gcc.dg/atomic-load-1.c
gcc/testsuite/gcc.dg/atomic-load-2.c
gcc/testsuite/gcc.dg/atomic-load-3.c
gcc/testsuite/gcc.dg/atomic-lockfree.c
gcc/testsuite/gcc.dg/atomic-lockfree-aux.c
gcc/testsuite/gcc.dg/atomic-noinline.c
gcc/testsuite/gcc.dg/atomic-noinline-aux.c
gcc/testsuite/gcc.dg/atomic-op-1.c
gcc/testsuite/gcc.dg/atomic-op-2.c
gcc/testsuite/gcc.dg/atomic-op-3.c
gcc/testsuite/gcc.dg/atomic-op-6.c
gcc/testsuite/gcc.dg/atomic-store-1.c
gcc/testsuite/gcc.dg/atomic-store-2.c
gcc/testsuite/gcc.dg/atomic-store-3.c
gcc/testsuite/g++.dg/ext/atomic-1.C
gcc/testsuite/g++.dg/ext/atomic-2.C
gcc/testsuite/gcc.target/arm/atomic-comp-swap-release-acquire.c
gcc/testsuite/gcc.target/arm/atomic-op-acq_rel.c
gcc/testsuite/gcc.target/arm/atomic-op-acquire.c
gcc/testsuite/gcc.target/arm/atomic-op-char.c
gcc/testsuite/gcc.target/arm/atomic-op-consume.c
gcc/testsuite/gcc.target/arm/atomic-op-int.c
gcc/testsuite/gcc.target/arm/atomic-op-relaxed.c
gcc/testsuite/gcc.target/arm/atomic-op-release.c
gcc/testsuite/gcc.target/arm/atomic-op-seq_cst.c
gcc/testsuite/gcc.target/arm/atomic-op-short.c
gcc/testsuite/gcc.target/arm/atomic_loaddi_1.c
gcc/testsuite/gcc.target/arm/atomic_loaddi_2.c
gcc/testsuite/gcc.target/arm/atomic_loaddi_3.c
gcc/testsuite/gcc.target/arm/atomic_loaddi_4.c
gcc/testsuite/gcc.target/arm/atomic_loaddi_5.c
gcc/testsuite/gcc.target/arm/atomic_loaddi_6.c
gcc/testsuite/gcc.target/arm/atomic_loaddi_7.c
gcc/testsuite/gcc.target/arm/atomic_loaddi_8.c
gcc/testsuite/gcc.target/arm/atomic_loaddi_9.c
gcc/testsuite/gcc.target/arm/sync-1.c
gcc/testsuite/gcc.target/arm/synchronize.c
gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
libstdc++-v3/testsuite/29_atomics/atomic/60658.cc
libstdc++-v3/testsuite/29_atomics/atomic/62259.cc
libstdc++-v3/testsuite/29_atomics/atomic/64658.cc
libstdc++-v3/testsuite/29_atomics/atomic/65147.cc
libstdc++-v3/testsuite/29_atomics/atomic/65913.cc
libstdc++-v3/testsuite/29_atomics/atomic/70766.cc
libstdc++-v3/testsuite/29_atomics/atomic/cons/49445.cc
libstdc++-v3/testsuite/29_atomics/atomic/cons/constexpr.cc
libstdc++-v3/testsuite/29_atomics/atomic/cons/copy_list.cc
libstdc++-v3/testsuite/29_atomics/atomic/cons/default.cc
libstdc++-v3/testsuite/29_atomics/atomic/cons/direct_list.cc
libstdc++-v3/testsuite/29_atomics/atomic/cons/single_value.cc
libstdc++-v3/testsuite/29_atomics/atomic/cons/user_pod.cc
libstdc++-v3/testsuite/29_atomics/atomic/operators/51811.cc
libstdc++-v3/testsuite/29_atomics/atomic/operators/56011.cc
libstdc++-v3/testsuite/29_atomics/atomic/operators/integral_assignment.cc
libstdc++-v3/testsuite/29_atomics/atomic/operators/integral_conversion.cc
libstdc++-v3/testsuite/29_atomics/atomic/operators/pointer_partial_void.cc
libstdc++-v3/testsuite/29_atomics/atomic/requirements/base_classes.cc
libstdc++-v3/testsuite/29_atomics/atomic/requirements/compare_exchange_lowering.cc
libstdc++-v3/testsuite/29_atomics/atomic/requirements/explicit_instantiation/1.cc
libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc
libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/1.cc
libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/56012.cc
libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/aggregate.cc
libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/default.cc
libstdc++-v3/testsuite/29_atomics/atomic_flag/requirements/standard_layout.cc
libstdc++-v3/testsuite/29_atomics/atomic_flag/requirements/trivial.cc
libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc
libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/60940.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/65147.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/constexpr.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/copy_list.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/default.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/direct_list.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/single_value.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/bitwise.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/decrement.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/increment.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/integral_assignment.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/integral_conversion.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/requirements/standard_layout.cc
libstdc++-v3/testsuite/29_atomics/atomic_integral/requirements/trivial.cc
libstdc++-v3/testsuite/29_atomics/headers/atomic/functions_std_c++0x.cc
libstdc++-v3/testsuite/29_atomics/headers/atomic/macros.cc
libstdc++-v3/testsuite/29_atomics/headers/atomic/types_std_c++0x.cc

diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 4ece5f013c92adee04157b5c909e1d47c894c994..65098ceeb1a66174b345bcfb0688152f9f137150 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -34,11 +34,13 @@
 ;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, Dl, DL, Do, Dv, Dy, Di, Dt, Dp, Dz
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py
+;; in all states: Pf
 
 ;; The following memory constraints have been used:
-;; in ARM/Thumb-2 state: Q, Uh, Ut, Uv, Uy, Un, Um, Us
+;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us
 ;; in ARM state: Uq
 ;; in Thumb state: Uu, Uw
+;; in all states: Q
 
 
 (define_register_constraint "t" "TARGET_32BIT ? VFP_LO_REGS : NO_REGS"
@@ -180,6 +182,13 @@
   (and (match_code "const_int")
        (match_test "TARGET_THUMB1 && ival >= 256 && ival <= 510")))
 
+(define_constraint "Pf"
+  "Memory models except relaxed, consume or release ones."
+  (and (match_code "const_int")
+       (match_test "!is_mm_relaxed (memmodel_from_int (ival))
+                    && !is_mm_consume (memmodel_from_int (ival))
+                    && !is_mm_release (memmodel_from_int (ival))")))
+
 (define_constraint "Ps"
   "@internal In Thumb-2 state a constant in the range -255 to +255"
   (and (match_code "const_int")
@@ -407,7 +416,7 @@
 
 (define_memory_constraint "Q"
  "@internal
-  In ARM/Thumb-2 state an address that is a single base register."
+  An address that is a single base register."
  (and (match_code "mem")
       (match_test "REG_P (XEXP (op, 0))")))
 
diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
index d10ede4175f94e627a23bf32d19d2b5f3de76771..d36c24f76f670d7602f766d7172286504faa7af5 100644
--- a/gcc/config/arm/sync.md
+++ b/gcc/config/arm/sync.md
@@ -63,37 +63,59 @@
    (set_attr "predicable" "no")])
 
 (define_insn "atomic_load"
-  [(set (match_operand:QHSI 0 "register_operand" "=r")
+  [(set (match_operand:QHSI 0 "register_operand" "=r,r,l")
     (unspec_volatile:QHSI
-      [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q")
-       (match_operand:SI 2 "const_int_operand")]          ;; model
+      [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q,Q,Q")
+       (match_operand:SI 2 "const_int_operand" "n,Pf,n")] ;; model
       VUNSPEC_LDA))]
   "TARGET_HAVE_LDACQ"
   {
     enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
     if (is_mm_relaxed (model) || is_mm_consume (model) || is_mm_release (model))
-      return \"ldr%?\\t%0, %1\";
+      {
+        if (TARGET_THUMB1)
+          return \"ldr\\t%0, %1\";
+        else
+          return \"ldr%?\\t%0, %1\";
+      }
     else
-      return \"lda%?\\t%0, %1\";
+      {
+        if (TARGET_THUMB1)
+          return \"lda\\t%0, %1\";
+        else
+          return \"lda%?\\t%0, %1\";
+      }
   }
-  [(set_attr "predicable" "yes")
+  [(set_attr "arch" "32,v8mb,any")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
 
 (define_insn "atomic_store"
-  [(set (match_operand:QHSI 0 "memory_operand" "=Q")
+  [(set (match_operand:QHSI 0 "memory_operand" "=Q,Q,Q")
     (unspec_volatile:QHSI
-      [(match_operand:QHSI 1 "general_operand" "r")
-       (match_operand:SI 2 "const_int_operand")]          ;; model
+      [(match_operand:QHSI 1 "general_operand" "r,r,l")
+       (match_operand:SI 2 "const_int_operand" "n,Pf,n")] ;; model
       VUNSPEC_STL))]
   "TARGET_HAVE_LDACQ"
   {
     enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
     if (is_mm_relaxed (model) || is_mm_consume (model) || is_mm_acquire (model))
-      return \"str%?\t%1, %0\";
+      {
+        if (TARGET_THUMB1)
+          return \"str\t%1, %0\";
+        else
+          return \"str%?\t%1, %0\";
+      }
     else
-      return \"stl%?\t%1, %0\";
+      {
+        if (TARGET_THUMB1)
+          return \"stl\t%1, %0\";
+        else
+          return \"stl%?\t%1, %0\";
+      }
   }
-  [(set_attr "predicable" "yes")
+  [(set_attr "arch" "32,v8mb,any")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
 
 ;; An LDRD instruction usable by the atomic_loaddi expander on LPAE targets
@@ -380,45 +402,57 @@
   })
 
 (define_insn "arm_load_exclusive"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
     (zero_extend:SI
       (unspec_volatile:NARROW
-        [(match_operand:NARROW 1 "mem_noofs_operand" "Ua")]
+        [(match_operand:NARROW 1 "mem_noofs_operand" "Ua,Ua")]
         VUNSPEC_LL)))]
   "TARGET_HAVE_LDREXBH"
-  "ldrex%?\t%0, %C1"
-  [(set_attr "predicable" "yes")
+  "@
+   ldrex%?\t%0, %C1
+   ldrex\t%0, %C1"
+  [(set_attr "arch" "32,v8mb")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
 
 (define_insn "arm_load_acquire_exclusive"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
     (zero_extend:SI
       (unspec_volatile:NARROW
-        [(match_operand:NARROW 1 "mem_noofs_operand" "Ua")]
+        [(match_operand:NARROW 1 "mem_noofs_operand" "Ua,Ua")]
         VUNSPEC_LAX)))]
   "TARGET_HAVE_LDACQ"
-  "ldaex%?\\t%0, %C1"
-  [(set_attr "predicable" "yes")
+  "@
+   ldaex%?\\t%0, %C1
+   ldaex\\t%0, %C1"
+  [(set_attr "arch" "32,v8mb")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
 
 (define_insn "arm_load_exclusivesi"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
     (unspec_volatile:SI
-      [(match_operand:SI 1 "mem_noofs_operand" "Ua")]
+      [(match_operand:SI 1 "mem_noofs_operand" "Ua,Ua")]
       VUNSPEC_LL))]
   "TARGET_HAVE_LDREX"
-  "ldrex%?\t%0, %C1"
-  [(set_attr "predicable" "yes")
+  "@
+   ldrex%?\t%0, %C1
+   ldrex\t%0, %C1"
+  [(set_attr "arch" "32,v8mb")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
 
 (define_insn "arm_load_acquire_exclusivesi"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
     (unspec_volatile:SI
-      [(match_operand:SI 1 "mem_noofs_operand" "Ua")]
+      [(match_operand:SI 1 "mem_noofs_operand" "Ua,Ua")]
       VUNSPEC_LAX))]
   "TARGET_HAVE_LDACQ"
-  "ldaex%?\t%0, %C1"
-  [(set_attr "predicable" "yes")
+  "@
+   ldaex%?\t%0, %C1
+   ldaex\t%0, %C1"
+  [(set_attr "arch" "32,v8mb")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
 
 (define_insn "arm_load_exclusivedi"
@@ -460,7 +494,10 @@
         gcc_assert ((REGNO (operands[2]) & 1) == 0 || TARGET_THUMB2);
         return "strexd%?\t%0, %2, %H2, %C1";
       }
-    return "strex%?\t%0, %2, %C1";
+    if (TARGET_THUMB1)
+      return "strex\t%0, %2, %C1";
+    else
+      return "strex%?\t%0, %2, %C1";
   }
   [(set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
@@ -482,13 +519,16 @@
    (set_attr "predicable_short_it" "no")])
 
 (define_insn "arm_store_release_exclusive"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
+  [(set (match_operand:SI 0 "s_register_operand" "=&r,&r")
     (unspec_volatile:SI [(const_int 0)] VUNSPEC_SLX))
-   (set (match_operand:QHSI 1 "mem_noofs_operand" "=Ua")
+   (set (match_operand:QHSI 1 "mem_noofs_operand" "=Ua,Ua")
     (unspec_volatile:QHSI
-      [(match_operand:QHSI 2 "s_register_operand" "r")]
+      [(match_operand:QHSI 2 "s_register_operand" "r,r")]
       VUNSPEC_SLX))]
   "TARGET_HAVE_LDACQ"
-  "stlex%?\t%0, %2, %C1"
-  [(set_attr "predicable" "yes")
+  "@
+   stlex%?\t%0, %2, %C1
+   stlex\t%0, %2, %C1"
+  [(set_attr "arch" "32,v8mb")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])