From patchwork Mon Jul 24 20:35:36 2023
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 1812059
From: "H.J. Lu"
To: gcc-patches@gcc.gnu.org
Subject: [PATCH v3] x86: Properly find the maximum stack slot alignment
Date: Mon, 24 Jul 2023 13:35:36 -0700
Message-ID: <20230724203536.40091-1-hjl.tools@gmail.com>
X-Mailer: git-send-email 2.41.0

Don't assume that stack slots can only be accessed by stack or frame
registers.  We first find all registers defined by stack or frame
registers.  Then check memory accesses by such registers, including
stack and frame registers.

gcc/

	PR target/109780
	* config/i386/i386.cc (ix86_update_stack_alignment): New.
	(ix86_find_all_reg_use): Likewise.
	(ix86_find_max_used_stack_alignment): Also check memory accesses
	from registers defined by stack or frame registers.

gcc/testsuite/

	PR target/109780
	* g++.target/i386/pr109780-1.C: New test.
	* gcc.target/i386/pr109780-1.c: Likewise.
	* gcc.target/i386/pr109780-2.c: Likewise.
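
The two steps described above (collect the registers derived from the
stack or frame pointer, then look at memory accesses through them) can
be illustrated with a small standalone sketch.  This is not the code in
i386.cc: ToyInsn, derived_registers, max_stack_alignment and
example_max_alignment are hypothetical names, and the instruction model
is a deliberate simplification of RTL that only assumes each insn writes
at most one register and touches at most one memory location.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy instruction, NOT GCC's RTL.  An insn may write DEST from
// the registers in SRCS and/or access memory at an address based on MEM_BASE.
struct ToyInsn {
  int dest;               // register written, or -1 if none
  std::vector<int> srcs;  // registers read directly to compute DEST
  int mem_base;           // register the memory address is based on, or -1
  unsigned mem_align;     // alignment of the memory access in bits, 0 if none
};

// Step 1: worklist closure of all registers computed from the seed registers.
// Values loaded from memory have empty SRCS here, so they never propagate.
std::set<int> derived_registers(const std::vector<ToyInsn> &insns,
                                const std::vector<int> &seeds) {
  std::set<int> reached(seeds.begin(), seeds.end());
  std::vector<int> worklist(seeds.begin(), seeds.end());
  while (!worklist.empty()) {
    int reg = worklist.back();
    worklist.pop_back();
    for (const ToyInsn &insn : insns) {
      if (insn.dest < 0)
        continue;
      bool uses_reg = std::find(insn.srcs.begin(), insn.srcs.end(), reg)
                      != insn.srcs.end();
      // insert().second is true only the first time DEST is reached.
      if (uses_reg && reached.insert(insn.dest).second)
        worklist.push_back(insn.dest);
    }
  }
  return reached;
}

// Step 2: largest alignment of any memory access whose address is based on a
// reached register, never going below FLOOR_BITS.
unsigned max_stack_alignment(const std::vector<ToyInsn> &insns,
                             const std::set<int> &reached,
                             unsigned floor_bits) {
  unsigned align = floor_bits;
  for (const ToyInsn &insn : insns)
    if (insn.mem_base >= 0 && reached.count(insn.mem_base))
      align = std::max(align, insn.mem_align);
  return align;
}

// Small example: register 7 plays the role of the stack pointer.
unsigned example_max_alignment() {
  const int SP = 7;
  std::vector<ToyInsn> insns = {
      {1, {SP}, -1, 0},   // r1 = sp + constant
      {2, {1}, -1, 0},    // r2 = r1
      {-1, {}, 2, 256},   // store to [r2], 256-bit aligned
      {3, {}, SP, 64},    // r3 = load from [sp], 64-bit aligned
      {-1, {}, 3, 512},   // store to [r3]; r3 holds a loaded value
  };
  std::set<int> reached = derived_registers(insns, {SP});
  assert(reached.count(2) == 1 && reached.count(3) == 0);
  return max_stack_alignment(insns, reached, 128);
}
```

In this toy input a scan that only looks at accesses naming the stack
pointer directly would see just the 64-bit load, while the closure also
picks up the 256-bit store through r2.  The store through r3 is ignored
because r3 was loaded from memory rather than computed from the stack
pointer.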
---
 gcc/config/i386/i386.cc                    | 128 +++++++++++++++++----
 gcc/testsuite/g++.target/i386/pr109780-1.C |  72 ++++++++++++
 gcc/testsuite/gcc.target/i386/pr109780-1.c |  14 +++
 gcc/testsuite/gcc.target/i386/pr109780-2.c |  21 ++++
 4 files changed, 214 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr109780-1.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-2.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index caca74d6dec..b71fd9401ef 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -8084,6 +8084,65 @@ output_probe_stack_range (rtx reg, rtx end)
   return "";
 }
 
+/* Update the maximum stack slot alignment from memory alignment in
+   PAT.  */
+
+static void
+ix86_update_stack_alignment (rtx, const_rtx pat, void *data)
+{
+  /* This insn may reference stack slot.  Update the maximum stack slot
+     alignment.  */
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, pat, ALL)
+    if (MEM_P (*iter))
+      {
+        unsigned int alignment = MEM_ALIGN (*iter);
+        unsigned int *stack_alignment
+          = (unsigned int *) data;
+        if (alignment > *stack_alignment)
+          *stack_alignment = alignment;
+        break;
+      }
+}
+
+/* Find all registers defined with REG.  */
+
+static void
+ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access,
+                       unsigned int reg, auto_bitmap &worklist)
+{
+  for (df_ref ref = DF_REG_USE_CHAIN (reg);
+       ref != NULL;
+       ref = DF_REF_NEXT_REG (ref))
+    {
+      if (DF_REF_IS_ARTIFICIAL (ref))
+        continue;
+
+      rtx_insn *insn = DF_REF_INSN (ref);
+      if (!NONDEBUG_INSN_P (insn))
+        continue;
+
+      rtx set = single_set (insn);
+      if (!set)
+        continue;
+
+      rtx src = SET_SRC (set);
+      if (MEM_P (src))
+        continue;
+
+      rtx dest = SET_DEST (set);
+      if (!REG_P (dest))
+        continue;
+
+      if (TEST_HARD_REG_BIT (stack_slot_access, REGNO (dest)))
+        continue;
+
+      /* Add this register to stack_slot_access.  */
+      add_to_hard_reg_set (&stack_slot_access, Pmode, REGNO (dest));
+      bitmap_set_bit (worklist, REGNO (dest));
+    }
+}
+
 /* Set stack_frame_required to false if stack frame isn't required.
    Update STACK_ALIGNMENT to the largest alignment, in bits, of stack
    slot used if stack frame is required and CHECK_STACK_SLOT is true.  */
@@ -8102,10 +8161,6 @@ ix86_find_max_used_stack_alignment (unsigned int &stack_alignment,
   add_to_hard_reg_set (&set_up_by_prologue, Pmode,
                        HARD_FRAME_POINTER_REGNUM);
 
-  /* The preferred stack alignment is the minimum stack alignment.  */
-  if (stack_alignment > crtl->preferred_stack_boundary)
-    stack_alignment = crtl->preferred_stack_boundary;
-
   bool require_stack_frame = false;
 
   FOR_EACH_BB_FN (bb, cfun)
@@ -8117,27 +8172,58 @@ ix86_find_max_used_stack_alignment (unsigned int &stack_alignment,
                                        set_up_by_prologue))
           {
             require_stack_frame = true;
-
-            if (check_stack_slot)
-              {
-                /* Find the maximum stack alignment.  */
-                subrtx_iterator::array_type array;
-                FOR_EACH_SUBRTX (iter, array, PATTERN (insn), ALL)
-                  if (MEM_P (*iter)
-                      && (reg_mentioned_p (stack_pointer_rtx,
-                                           *iter)
-                          || reg_mentioned_p (frame_pointer_rtx,
-                                              *iter)))
-                    {
-                      unsigned int alignment = MEM_ALIGN (*iter);
-                      if (alignment > stack_alignment)
-                        stack_alignment = alignment;
-                    }
-              }
+            break;
           }
     }
 
   cfun->machine->stack_frame_required = require_stack_frame;
+
+  /* Stop if we don't need to check stack slot.  */
+  if (!check_stack_slot)
+    return;
+
+  /* The preferred stack alignment is the minimum stack alignment.  */
+  if (stack_alignment > crtl->preferred_stack_boundary)
+    stack_alignment = crtl->preferred_stack_boundary;
+
+  HARD_REG_SET stack_slot_access;
+  CLEAR_HARD_REG_SET (stack_slot_access);
+
+  /* Stack slot can be accessed by stack pointer, frame pointer or
+     registers defined by stack pointer or frame pointer.  */
+  auto_bitmap worklist;
+  add_to_hard_reg_set (&stack_slot_access, Pmode,
+                       STACK_POINTER_REGNUM);
+  bitmap_set_bit (worklist, STACK_POINTER_REGNUM);
+  if (frame_pointer_needed)
+    {
+      add_to_hard_reg_set (&stack_slot_access, Pmode,
+                           HARD_FRAME_POINTER_REGNUM);
+      bitmap_set_bit (worklist, HARD_FRAME_POINTER_REGNUM);
+    }
+  unsigned int reg;
+  do
+    {
+      reg = bitmap_clear_first_set_bit (worklist);
+      ix86_find_all_reg_use (stack_slot_access, reg, worklist);
+    }
+  while (!bitmap_empty_p (worklist));
+
+  hard_reg_set_iterator hrsi;
+  EXECUTE_IF_SET_IN_HARD_REG_SET (stack_slot_access, 0, reg, hrsi)
+    for (df_ref ref = DF_REG_USE_CHAIN (reg);
+         ref != NULL;
+         ref = DF_REF_NEXT_REG (ref))
+      {
+        if (DF_REF_IS_ARTIFICIAL (ref))
+          continue;
+
+        rtx_insn *insn = DF_REF_INSN (ref);
+        if (!NONDEBUG_INSN_P (insn))
+          continue;
+        note_stores (insn, ix86_update_stack_alignment,
+                     &stack_alignment);
+      }
 }
 
 /* Finalize stack_realign_needed and frame_pointer_needed flags, which
diff --git a/gcc/testsuite/g++.target/i386/pr109780-1.C b/gcc/testsuite/g++.target/i386/pr109780-1.C
new file mode 100644
index 00000000000..7e3eabdec94
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr109780-1.C
@@ -0,0 +1,72 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target c++17 } */
+/* { dg-options "-O2 -mavx2 -mtune=haswell" } */
+
+template struct remove_reference {
+  using type = __remove_reference(_Tp);
+};
+template struct MaybeStorageBase {
+  T val;
+  struct Union {
+    ~Union();
+  } mStorage;
+};
+template struct MaybeStorage : MaybeStorageBase {
+  char mIsSome;
+};
+template ::type>
+constexpr MaybeStorage Some(T &&);
+template constexpr MaybeStorage Some(T &&aValue) {
+  return {aValue};
+}
+template struct Span {
+  int operator[](long idx) {
+    int *__trans_tmp_4;
+    if (__builtin_expect(idx, 0))
+      *(int *)__null = false;
+    __trans_tmp_4 = storage_.data();
+    return __trans_tmp_4[idx];
+  }
+  struct {
+    int *data() { return data_; }
+    int *data_;
+  } storage_;
+};
+struct Variant {
+  template Variant(RefT) {}
+};
+long from_i, from___trans_tmp_9;
+namespace js::intl {
+struct DecimalNumber {
+  Variant string_;
+  unsigned long significandStart_;
+  unsigned long significandEnd_;
+  bool zero_ = false;
+  bool negative_;
+  template DecimalNumber(CharT string) : string_(string) {}
+  template
+  static MaybeStorage from(Span);
+  void from();
+};
+} // namespace js::intl
+void js::intl::DecimalNumber::from() {
+  Span __trans_tmp_3;
+  from(__trans_tmp_3);
+}
+template
+MaybeStorage
+js::intl::DecimalNumber::from(Span chars) {
+  DecimalNumber number(chars);
+  if (auto ch = chars[from_i]) {
+    from_i++;
+    number.negative_ = ch == '-';
+  }
+  while (from___trans_tmp_9 && chars[from_i])
+    ;
+  if (chars[from_i])
+    while (chars[from_i - 1])
+      number.zero_ = true;
+  return Some(number);
+}
+
+/* { dg-final { scan-assembler-not "and\[lq\]?\[^\\n\]*-32,\[^\\n\]*sp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr109780-1.c b/gcc/testsuite/gcc.target/i386/pr109780-1.c
new file mode 100644
index 00000000000..6b06947f2a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr109780-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=skylake" } */
+
+char perm[64];
+
+void
+__attribute__((noipa))
+foo (int n)
+{
+  for (int i = 0; i < n; ++i)
+    perm[i] = i;
+}
+
+/* { dg-final { scan-assembler-not "and\[lq\]?\[^\\n\]*-32,\[^\\n\]*sp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr109780-2.c b/gcc/testsuite/gcc.target/i386/pr109780-2.c
new file mode 100644
index 00000000000..152da06c6ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr109780-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=skylake" } */
+
+#define N 9
+
+void
+f (double x, double y, double *res)
+{
+  y = -y;
+  for (int i = 0; i < N; ++i)
+    {
+      double tmp = y;
+      y = x;
+      x = tmp;
+      res[i] = i;
+    }
+  res[N] = y * y;
+  res[N + 1] = x;
+}
+
+/* { dg-final { scan-assembler-not "and\[lq\]?\[^\\n\]*-32,\[^\\n\]*sp" } } */