From patchwork Wed Jan 27 12:40:21 2021
X-Patchwork-Submitter: Alexandre Oliva <oliva@adacore.com>
X-Patchwork-Id: 1432194
From: Alexandre Oliva <oliva@adacore.com>
To: gcc-patches@gcc.gnu.org
Cc: Zdenek Dvorak
Subject: [RFC] test builtin ratio for loop distribution
Organization: Free thinker, does not speak for AdaCore
Date: Wed, 27 Jan 2021 09:40:21 -0300

This patch attempts to fix a libgcc codegen regression introduced in
gcc-10, as -ftree-loop-distribute-patterns was enabled at -O2.  The
ldist pass turns even very short loops into memset calls.  E.g., the
TFmode emulation calls end with a loop of up to 3 iterations, to zero
out trailing words, and the loop distribution pass turns them into
calls of the memset builtin.
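For concreteness, the loop in question has roughly the following
shape.  This is only a minimal sketch, not the actual libgcc soft-fp
code; the element type, the names, and the bound of 4 words are
illustrative:

/* Clear the trailing words of an aligned buffer; the trip count is
   small but not constant (here at most 3 iterations).  ldist turns
   this loop into memset (r + n, 0, (4 - n) * sizeof *r), losing what
   is known about the destination's alignment and about the length
   being a whole number of words.  */
void
clear_tail (unsigned long r[4], unsigned int n)  /* 1 <= n <= 4 */
{
  for (unsigned int i = n; i < 4; i++)
    r[i] = 0;
}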
Though short constant-length memsets are usually dealt with
efficiently, for non-constant-length ones the options are a setmemM
expansion or a function call.  RISC-V doesn't have any setmemM
pattern, so the loops above end up "optimized" into memset calls,
incurring not only the overhead of an explicit call, but also
discarding the information the compiler has about the alignment of
the destination, and the fact that the length is a multiple of the
word alignment.

This patch adds to the loop distribution pass some cost analysis
based on the preexisting *_RATIO macros, so that we won't transform
loops whose trip counts are as low as the ratios below which we'd
rather expand the operation inline.

This patch is not finished; it needs adjustments to the testsuite, to
make up for the behavior changes it brings about.  Specifically, on an
x86_64-linux-gnu regstrap, it regresses:

> FAIL: gcc.dg/pr53265.c  (test for warnings, line 40)
> FAIL: gcc.dg/pr53265.c  (test for warnings, line 42)
> FAIL: gcc.dg/tree-ssa/ldist-38.c scan-tree-dump ldist "split to 0 loops and 1 library calls"
> FAIL: g++.dg/tree-ssa/pr78847.C  -std=gnu++14  scan-tree-dump ldist "split to 0 loops and 1 library calls"
> FAIL: g++.dg/tree-ssa/pr78847.C  -std=gnu++17  scan-tree-dump ldist "split to 0 loops and 1 library calls"
> FAIL: g++.dg/tree-ssa/pr78847.C  -std=gnu++2a  scan-tree-dump ldist "split to 0 loops and 1 library calls"

I suppose just lengthening the loops will take care of ldist-38 and
pr78847, but the loss of the warnings in pr53265 is more concerning
and will require investigation.  Nevertheless, I seek feedback on
whether this is an acceptable approach, or whether we should use
alternate tuning parameters for ldist, or something entirely
different.

Thanks in advance,


for  gcc/ChangeLog

	* tree-loop-distribution.c (maybe_normalize_partition): New.
	(loop_distribution::distribute_loop): Call it.

[requires testsuite adjustments and investigation of a warning regression]
---
 gcc/tree-loop-distribution.c | 54 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)
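In essence, the check implemented by the patch below keeps a
partition as a loop when the loop's iteration count is known to be
bounded below the relevant inline-expansion ratio.  The following
standalone sketch restates that rule outside of GCC's internal APIs;
all names here are illustrative, whereas the actual patch uses
number_of_latch_executions, determine_value_range and the
CLEAR_RATIO/SET_RATIO/MOVE_RATIO macros:

#include <stdbool.h>

/* Decision rule sketch: prefer keeping the loop (no memset/memcpy
   call) when we have an upper bound on the iteration count and that
   bound is below the target's ratio for inline expansion.  */
static bool
keep_as_loop_p (bool max_iters_known, unsigned long max_iters,
                unsigned long inline_ratio)
{
  /* Without a bound on the iteration count, the library call stands.  */
  if (!max_iters_known)
    return false;

  /* Few enough iterations that inline expansion is preferable: revert
     the partition to a normal loop instead of emitting a builtin call.  */
  return max_iters < inline_ratio;
}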
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index bb15fd3723fb6..b5198652817ee 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -2848,6 +2848,52 @@ fuse_memset_builtins (vec<struct partition *> *partitions)
     }
 }
 
+/* Return false if it's profitable to turn the LOOP PARTITION into a builtin
+   call, and true if it isn't, changing the PARTITION to PKIND_NORMAL.  */
+
+static bool
+maybe_normalize_partition (class loop *loop, struct partition *partition)
+{
+  unsigned HOST_WIDE_INT ratio;
+
+  switch (partition->kind)
+    {
+    case PKIND_NORMAL:
+    case PKIND_PARTIAL_MEMSET:
+      return false;
+
+    case PKIND_MEMSET:
+      if (integer_zerop (gimple_assign_rhs1 (DR_STMT
+					     (partition->builtin->dst_dr))))
+	ratio = CLEAR_RATIO (optimize_loop_for_speed_p (loop));
+      else
+	ratio = SET_RATIO (optimize_loop_for_speed_p (loop));
+      break;
+
+    case PKIND_MEMCPY:
+    case PKIND_MEMMOVE:
+      ratio = MOVE_RATIO (optimize_loop_for_speed_p (loop));
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  tree niters = number_of_latch_executions (loop);
+  if (niters == NULL_TREE || niters == chrec_dont_know)
+    return false;
+
+  wide_int minit, maxit;
+  value_range_kind vrk = determine_value_range (niters, &minit, &maxit);
+  if (vrk == VR_RANGE && wi::ltu_p (maxit, ratio))
+    {
+      partition->kind = PKIND_NORMAL;
+      return true;
+    }
+
+  return false;
+}
+
 void
 loop_distribution::finalize_partitions (class loop *loop,
					 vec<struct partition *> *partitions,
@@ -3087,6 +3133,14 @@ loop_distribution::distribute_loop (class loop *loop, vec<gimple *> stmts,
     }
 
   finalize_partitions (loop, &partitions, &alias_ddrs);
+  {
+    bool any_changes_p = false;
+    for (i = 0; partitions.iterate (i, &partition); ++i)
+      if (maybe_normalize_partition (loop, partition))
+	any_changes_p = true;
+    if (any_changes_p)
+      finalize_partitions (loop, &partitions, &alias_ddrs);
+  }
 
   /* If there is a reduction in all partitions make sure the last one
      is not classified for builtin code generation.  */
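As for the testsuite fallout mentioned above, a purely illustrative
adjustment for tests like ldist-38 and pr78847 (not part of this
patch, and not the actual testsuite sources) would be to lengthen the
loops so that their trip counts exceed the target ratios, keeping the
expected "split to 0 loops and 1 library calls" dump; the bound of 64
below is an arbitrary assumption:

/* A clearing loop long enough that the new trip-count check should
   not block the memset transform (assuming 64 exceeds the target's
   CLEAR_RATIO).  */
void
clear_many (unsigned long *p)
{
  for (int i = 0; i < 64; i++)
    p[i] = 0;
}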