From patchwork Wed Jan 3 18:22:54 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Law X-Patchwork-Id: 855141 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-470092-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="BaO1c0q1"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zBfSD5hwYz9s7G for ; Thu, 4 Jan 2018 05:23:07 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:message-id:date:mime-version:content-type; q=dns; s= default; b=CUOJfpnK7GGSznA5XIO4kqkfNIhsrs2/H8wyhQQiF2AvROgY3K5Td PTTMuUg6nzQULQErfVeHeBWoPuXdIoxfkbTPisMkXyBnR0ri46fuIAXP7IKKgVFL d9yWBIVtD0eSDPDLuoZ9DSuGuREbNo/YehIfIBnekJpIH01n6acuhs= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:message-id:date:mime-version:content-type; s= default; bh=ch2b2Lq4IaNsuYMA7L6cOb6mhco=; b=BaO1c0q1mgtdjdN8QRMa 56MU2sh+dE70WhaDvypwa0K8txCuRWql6sRvfC2gJnIwIk+Jh+c94iL+hT3qe/QU ywUlDzPgwEL8cJX+jM1uIHRCxl/ijez8mcT4C14Qj4dQ3eQQm7sx4MlU3bJPiTVz 22HfiJHks+/9qWRla/kJG+Q= Received: (qmail 24663 invoked by alias); 3 Jan 2018 18:22:59 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 24654 invoked by uid 89); 3 Jan 2018 18:22:58 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, SPF_HELO_PASS, T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=probed, entrance, gen_rtx_PLUS, gen_rtx_plus X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 03 Jan 2018 18:22:57 +0000 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 10101C058EB3 for ; Wed, 3 Jan 2018 18:22:56 +0000 (UTC) Received: from localhost.localdomain (ovpn-112-21.rdu2.redhat.com [10.10.112.21]) by smtp.corp.redhat.com (Postfix) with ESMTP id 50199619F0 for ; Wed, 3 Jan 2018 18:22:55 +0000 (UTC) From: Jeff Law To: gcc-patches Subject: [PATCH][committed][ PR middle-end/83654] Fix bogus probe in dynamic space when there are no residuals Message-ID: Date: Wed, 3 Jan 2018 11:22:54 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0 MIME-Version: 1.0 X-IsSubscribed: yes There's a of sense of deja-vu on this BZ. There's a lot of similarities with an issue that came up earlier on aarch64 and its special handling of the outgoing argument area. Basically the residual of dynamic allocation may be a compile time constant or a runtime variant and anything in between. In the case where it is not a compile time constant, but has the value zero at runtime we must not probe the non-existent residual. So the way to address that is to emit the compare/branch around the probe when the residual is not a compile time constant (just like we do for the outgoing argument area on aarch64). I took the opportunity to make the residual handling and aarch64 outgoing argument space handling have the same core style. There's two tests. One is the original from Florian. It's a bit interesting in that at expansion time we do not see the allocation size or residual as a constant. So we emit a probing loop and residual handling during expansion. We check for that. Later optimizers come along and prove the residual handling isn't needed which we verify as well by checking the assembly output. In theory we should realize the loop iterates precisely once and remove the looping structure, but don't (I haven't investigated that missed-optimization). I've added a derived test where the size is not compile-time determinable. We verify there'll be loop and residual probing during expansion. We then verify the existence of both probe instructions *and* verify there's 3 conditionals (one guarding entrance to the probing loop, one for the probing loop backedge and one guarding the residual probe) in the assembly output. Bootstrapped and regression tested on x86_64. Verified the test works on both x86 and x86_64. Installing on the trunk. Jeff diff --git a/gcc/ChangeLog b/gcc/ChangeLog index d95c297e96a..f570eb4e606 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,11 @@ +2017-01-03 Jeff Law + + PR middle-end/83654 + * explow.c (anti_adjust_stack_and_probe_stack_clash): Test a + non-constant residual for zero at runtime and avoid probing in + that case. Reorganize code for trailing problem to mirror handling + of the residual. + 2018-01-03 Prathamesh Kulkarni PR tree-optimization/83501 diff --git a/gcc/explow.c b/gcc/explow.c index b6c56602152..042e71904ec 100644 --- a/gcc/explow.c +++ b/gcc/explow.c @@ -1997,11 +1997,27 @@ anti_adjust_stack_and_probe_stack_clash (rtx size) if (residual != CONST0_RTX (Pmode)) { + rtx label = NULL_RTX; + /* RESIDUAL could be zero at runtime and in that case *sp could + hold live data. Furthermore, we do not want to probe into the + red zone. + + Go ahead and just guard the probe at *sp on RESIDUAL != 0 at + runtime if RESIDUAL is not a compile time constant. */ + if (!CONST_INT_P (residual)) + { + label = gen_label_rtx (); + emit_cmp_and_jump_insns (residual, CONST0_RTX (GET_MODE (residual)), + EQ, NULL_RTX, Pmode, 1, label); + } + rtx x = force_reg (Pmode, plus_constant (Pmode, residual, -GET_MODE_SIZE (word_mode))); anti_adjust_stack (residual); emit_stack_probe (gen_rtx_PLUS (Pmode, stack_pointer_rtx, x)); emit_insn (gen_blockage ()); + if (!CONST_INT_P (residual)) + emit_label (label); } /* Some targets make optimistic assumptions in their prologues about @@ -2014,28 +2030,20 @@ anti_adjust_stack_and_probe_stack_clash (rtx size) live data. Furthermore, we don't want to probe into the red zone. - Go ahead and just guard a probe at *sp on SIZE != 0 at runtime + Go ahead and just guard the probe at *sp on SIZE != 0 at runtime if SIZE is not a compile time constant. */ - - /* Ideally we would just probe at *sp. However, if SIZE is not - a compile-time constant, but is zero at runtime, then *sp - might hold live data. So probe at *sp if we know that - an allocation was made, otherwise probe into the red zone - which is obviously undesirable. */ - if (CONST_INT_P (size)) - { - emit_stack_probe (stack_pointer_rtx); - emit_insn (gen_blockage ()); - } - else + rtx label = NULL_RTX; + if (!CONST_INT_P (size)) { - rtx label = gen_label_rtx (); + label = gen_label_rtx (); emit_cmp_and_jump_insns (size, CONST0_RTX (GET_MODE (size)), EQ, NULL_RTX, Pmode, 1, label); - emit_stack_probe (stack_pointer_rtx); - emit_insn (gen_blockage ()); - emit_label (label); } + + emit_stack_probe (stack_pointer_rtx); + emit_insn (gen_blockage ()); + if (!CONST_INT_P (size)) + emit_label (label); } } diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 1c21902be97..7777ee5e478 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,9 @@ +2018-01-03 Jeff Law + + PR middle-end/83654 + * gcc.target/i386/stack-check-18.c: New test. + * gcc.target/i386/stack-check-19.c: New test. + 2018-01-03 Martin Sebor PR tree-optimization/83501 diff --git a/gcc/testsuite/gcc.target/i386/stack-check-18.c b/gcc/testsuite/gcc.target/i386/stack-check-18.c new file mode 100644 index 00000000000..6dbff4402da --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/stack-check-18.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fstack-clash-protection -mtune=generic -fdump-rtl-expand" } */ +/* { dg-require-effective-target supports_stack_clash_protection } */ + +int f1 (char *); + +int +f2 (void) +{ + const int size = 4096; + char buffer[size]; + return f1 (buffer); +} + +/* So we want to verify that at expand time that we probed the main + VLA allocation as well as the residuals. Then we want to verify + there was only one probe in the final assembly (implying the + residual probe was optimized away). */ +/* { dg-final { scan-rtl-dump-times "allocation and probing in loop" 1 "expand" } } */ +/* { dg-final { scan-rtl-dump-times "allocation and probing residuals" 1 "expand" } } */ + +/* { dg-final { scan-assembler-times "or\[ql\]" 1 } } */ + diff --git a/gcc/testsuite/gcc.target/i386/stack-check-19.c b/gcc/testsuite/gcc.target/i386/stack-check-19.c new file mode 100644 index 00000000000..b92c126d57f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/stack-check-19.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fstack-clash-protection -mtune=generic -fdump-rtl-expand" } */ +/* { dg-require-effective-target supports_stack_clash_protection } */ + +int f1 (char *); + +int +f2 (const int size) +{ + char buffer[size]; + return f1 (buffer); +} + +/* So we want to verify that at expand time that we probed the main + VLA allocation as well as the residuals. Then we want to verify + there are two probes in the final assembly code. */ +/* { dg-final { scan-rtl-dump-times "allocation and probing in loop" 1 "expand" } } */ +/* { dg-final { scan-rtl-dump-times "allocation and probing residuals" 1 "expand" } } */ +/* { dg-final { scan-assembler-times "or\[ql\]" 2 } } */ + +/* We also want to verify (indirectly) that the residual probe is + guarded. We do that by checking the number of conditional + branches. There should be 3. One that bypasses the probe loop, one + in the probe loop and one that bypasses the residual probe. + + These will all be equality tests. */ +/* { dg-final { scan-assembler-times "(\?:je|jne)" 3 } } */ + +