From patchwork Sat Dec 9 07:04:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandre Oliva X-Patchwork-Id: 1874021 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; secure) header.d=adacore.com header.i=@adacore.com header.a=rsa-sha256 header.s=google header.b=d+U4qQMs; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SnJtG5fWcz23n0 for ; Sat, 9 Dec 2023 18:05:07 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6020F385842E for ; Sat, 9 Dec 2023 07:05:04 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-pf1-x435.google.com (mail-pf1-x435.google.com [IPv6:2607:f8b0:4864:20::435]) by sourceware.org (Postfix) with ESMTPS id 7AE523858CD1 for ; Sat, 9 Dec 2023 07:04:45 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 7AE523858CD1 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=adacore.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=adacore.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 7AE523858CD1 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::435 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702105494; cv=none; b=ceMRJlaMNudfLshL+X452wGZBXcDNrSv/Jfa678lSfqH9RwhtUtpwjRY7/906O3h3xEHoEGpYxpXvLj6jNIkMSOI1pIP/lufvn/ChcJbc/SFnVp/+/boNP+PUvi69HWHNUNipm7BS1SM3RpzUNlcaXF4WlhGrdvh/a9aWdkncug= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702105494; c=relaxed/simple; bh=sGPwtxOJDKferafGE82WlYk3fOYqG8UQCfuXl9rzQ/Q=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=bFz4BlUMvSc0NJ6e0Wv6XJ8SwCVBsPlBE3Du5xRuhkWk5g+kD1hITPprggaPvPvHFa4cwk4YJPbu8aenOnXnkz7YenLqCX1FtHrx/Ia6zvb8kQFCOfTJ24bf4HvhRmSvLnKQ8o/dA2YLzqxZlbF2FJpnLAN7pZL9uyYO2134YGE= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-pf1-x435.google.com with SMTP id d2e1a72fcca58-6ce939ecfc2so2246589b3a.2 for ; Fri, 08 Dec 2023 23:04:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=adacore.com; s=google; t=1702105484; x=1702710284; darn=gcc.gnu.org; h=mime-version:user-agent:message-id:in-reply-to:date:references :organization:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=Phy69NAKm7iIccZpCwEbkL5AWK+Qak7In4iTIrvnvuU=; b=d+U4qQMsUyIHCkfjcsXvG2c78Oz6oySbFbjCMX3vaOvfY0IC5Uwxwl2BEsvTYIoTR8 eW6zeQtzloO/4MGghJF0zOJ6+OlceXPclWJYYXhbXRby7PXqx8kX/fe5zPyHLng0B1ye eH0GATRs4fY7bhgWmI3Kr6D+vq0ZvWTwVeQ9huFs/m/+kzFN441QM2HfcTT/gjfLFDZa GyUo9w7RyBdVMKntHZXa5xG9r2Pm9Nzj1acveZWLUizdKVp9Rqu2p+p3S+6o/t9De/SI E3XGX9heeXq0kWd0qpE7eIsuv6a3LsqVXOBj/jnc5KV4vIY/6MJv8Oy/PVwyH7RRNuoM IbUQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702105484; x=1702710284; h=mime-version:user-agent:message-id:in-reply-to:date:references :organization:subject:to:from:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=Phy69NAKm7iIccZpCwEbkL5AWK+Qak7In4iTIrvnvuU=; b=wKx3E9Ve8+/e0vcEUT1L5YaLBDD/xdaa+2mpTNIZ/h98HjdpEabUEtTu/Qux8ykU7o W9109B6cyRvbD6aWkCQx8RLtmvWo3gUPDgiIFuq3YVi2rj0Ibm2e80H5z+fX5FGgeVcP 2Do32BbPdXXqSGp95kszszdl4XlhI38hD/+gQX3XBPfbkEM4DB5xs+ZCvBg/gnF0WWtc XagejPtSNCbv73IlfCBZbwzD3P3ee4UMPI2j5L86cY5y+TnQZYi+LNSuhQ8xgsIOo1hn VINTb7w+2s5hREq316RijkFZZQsPQndpE2leKIIy2w0tU0CAVswNGWB8DH7CVj1NKrU5 7nZQ== X-Gm-Message-State: AOJu0Yw6QQfczK5cCSvc4ZdsfMHYRUr7b4nfwfvSn2+6bMVdxGB/xqWf 7CwAOEADLcRKjS+ktz9OKN1ERk9DcjNDujq3/3Ul9Q== X-Google-Smtp-Source: AGHT+IEdT4FjEWzhP/9MUwqcq+mOPw9o/L76G6+VlMc3Jg3oJmjzRdHVRBFbbx+NgLxOxZkB0k0g9Q== X-Received: by 2002:a05:6a00:1303:b0:6ce:7656:5cba with SMTP id j3-20020a056a00130300b006ce76565cbamr1363015pfu.22.1702105484325; Fri, 08 Dec 2023 23:04:44 -0800 (PST) Received: from free.home ([2804:14c:4d1:44a5:f4d9:b7a4:4fb8:376f]) by smtp.gmail.com with ESMTPSA id y24-20020aa78558000000b006cee23e7677sm1992837pfn.210.2023.12.08.23.04.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 08 Dec 2023 23:04:43 -0800 (PST) Received: from livre (livre.home [172.31.160.2]) by free.home (8.15.2/8.15.2) with ESMTPS id 3B974YHe322027 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Sat, 9 Dec 2023 04:04:35 -0300 From: Alexandre Oliva To: gcc-patches@gcc.gnu.org Subject: [PATCH v2] -finline-stringops: check base blksize for memset [PR112778] Organization: Free thinker, does not speak for AdaCore References: Date: Sat, 09 Dec 2023 04:04:34 -0300 In-Reply-To: (Alexandre Oliva's message of "Fri, 08 Dec 2023 23:45:29 -0300") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE, WEIRD_QUOTING autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Scratch the previous one, the "slightly different version" I had before it was not entirely broken due to unnecessary, suboptimal and incorrect use of ctz. Here I have yet another implementation of that loop that should perform better and even work correctly ;-) This one has so far regstrapped on x86_64-linux-gnu (v1 failed in regression testing, sorry), and bootstrapped with -finline-stringops on ppc64le-linux-gnu (still ongoing on x86-64-linux-gnu and aarch64-linux-gnu). Ok to install? The recently-added logic for -finline-stringops=memset introduced an assumption that doesn't necessarily hold, namely, that can_store_by_pieces of a larger size implies can_store_by_pieces by smaller sizes. Checks for all sizes the by-multiple-pieces machinery might use before committing to an expansion pattern. for gcc/ChangeLog PR target/112778 * builtins.cc (can_store_by_multiple_pieces): New. (try_store_by_multiple_pieces): Call it. for gcc/testsuite/ChangeLog PR target/112778 * gcc.dg/inline-mem-cmp-pr112778.c: New. --- gcc/builtins.cc | 57 ++++++++++++++++++++---- gcc/testsuite/gcc.dg/inline-mem-cmp-pr112778.c | 10 ++++ 2 files changed, 58 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/inline-mem-cmp-pr112778.c diff --git a/gcc/builtins.cc b/gcc/builtins.cc index 12a535d313f12..f6c96498f0783 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -4284,6 +4284,40 @@ expand_builtin_memset (tree exp, rtx target, machine_mode mode) return expand_builtin_memset_args (dest, val, len, target, mode, exp); } +/* Check that store_by_pieces allows BITS + LEN (so that we don't + expand something too unreasonably long), and every power of 2 in + BITS. It is assumed that LEN has already been tested by + itself. */ +static bool +can_store_by_multiple_pieces (unsigned HOST_WIDE_INT bits, + by_pieces_constfn constfun, + void *constfundata, unsigned int align, + bool memsetp, + unsigned HOST_WIDE_INT len) +{ + if (bits + && !can_store_by_pieces (bits + len, constfun, constfundata, + align, memsetp)) + return false; + + /* BITS set are expected to be generally in the low range and + contiguous. We do NOT want to repeat the test above in case BITS + has a single bit set, so we terminate the loop when BITS == BIT. + In the unlikely case that BITS has the MSB set, also terminate in + case BIT gets shifted out. */ + for (unsigned HOST_WIDE_INT bit = 1; bit < bits && bit; bit <<= 1) + { + if ((bits & bit) == 0) + continue; + + if (!can_store_by_pieces (bit, constfun, constfundata, + align, memsetp)) + return false; + } + + return true; +} + /* Try to store VAL (or, if NULL_RTX, VALC) in LEN bytes starting at TO. Return TRUE if successful, FALSE otherwise. TO is assumed to be aligned at an ALIGN-bits boundary. LEN must be a multiple of @@ -4341,7 +4375,11 @@ try_store_by_multiple_pieces (rtx to, rtx len, unsigned int ctz_len, else /* Huh, max_len < min_len? Punt. See pr100843.c. */ return false; - if (min_len >= blksize) + if (min_len >= blksize + /* ??? Maybe try smaller fixed-prefix blksizes before + punting? */ + && can_store_by_pieces (blksize, builtin_memset_read_str, + &valc, align, true)) { min_len -= blksize; min_bits = floor_log2 (min_len); @@ -4367,8 +4405,9 @@ try_store_by_multiple_pieces (rtx to, rtx len, unsigned int ctz_len, happen because of the way max_bits and blksize are related, but it doesn't hurt to test. */ if (blksize > xlenest - || !can_store_by_pieces (xlenest, builtin_memset_read_str, - &valc, align, true)) + || !can_store_by_multiple_pieces (xlenest - blksize, + builtin_memset_read_str, + &valc, align, true, blksize)) { if (!(flag_inline_stringops & ILSOP_MEMSET)) return false; @@ -4386,17 +4425,17 @@ try_store_by_multiple_pieces (rtx to, rtx len, unsigned int ctz_len, of overflow. */ if (max_bits < orig_max_bits && xlenest + blksize >= xlenest - && can_store_by_pieces (xlenest + blksize, - builtin_memset_read_str, - &valc, align, true)) + && can_store_by_multiple_pieces (xlenest, + builtin_memset_read_str, + &valc, align, true, blksize)) { max_loop = true; break; } if (blksize - && can_store_by_pieces (xlenest, - builtin_memset_read_str, - &valc, align, true)) + && can_store_by_multiple_pieces (xlenest, + builtin_memset_read_str, + &valc, align, true, 0)) { max_len += blksize; min_len += blksize; diff --git a/gcc/testsuite/gcc.dg/inline-mem-cmp-pr112778.c b/gcc/testsuite/gcc.dg/inline-mem-cmp-pr112778.c new file mode 100644 index 0000000000000..fdfc5b6f28c8e --- /dev/null +++ b/gcc/testsuite/gcc.dg/inline-mem-cmp-pr112778.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-finline-stringops" } */ + +char buf[3]; + +int +f () +{ + __builtin_memset (buf, 'v', 3); +}