From patchwork Fri Aug 16 04:04:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandre Oliva X-Patchwork-Id: 1973049 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; secure) header.d=adacore.com header.i=@adacore.com header.a=rsa-sha256 header.s=google header.b=k06xMclc; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WlT1F6N1Tz1yXZ for ; Fri, 16 Aug 2024 14:05:37 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D6D40385700F for ; Fri, 16 Aug 2024 04:05:35 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-pg1-x529.google.com (mail-pg1-x529.google.com [IPv6:2607:f8b0:4864:20::529]) by sourceware.org (Postfix) with ESMTPS id F06033858D28 for ; Fri, 16 Aug 2024 04:05:12 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F06033858D28 Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=adacore.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=adacore.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org F06033858D28 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::529 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1723781115; cv=none; b=bSGIFasLM6u21hiW+GYwp5minuZ2XX8iUJGPbFbHB4xwD1xXJAaV4iaG2bffBN4C6ueqGnAF665IrO006A01BQ5Ejam7r4/IAmn/jqi6WhH9e3G0EbhAORTjkmeF4bd2Ru1uDOmdBIcT36h9a+oXrZdLxAITpY8HzGKdOEuh32g= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1723781115; c=relaxed/simple; bh=cFGx+RhD/dUmtgCAxI4YdXZaKuY0wJpOGqOEdv5oTXQ=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=T7ONVI6xMSwbe+xRMfD4EhxvLd6qwTZLU9teIs3lFSBtC5ZarZqzhvdOSoCTIMnKE7dp/p8Bb+ZIZY/waL2Xly1hf0j3aUyLgz02x5GThsKlCHvxQRDUQUMK7e9v0tad9ga1YR144TODLmWHIBoQJ/wTYd0DS5+2Xl1Jt2PpsVo= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-pg1-x529.google.com with SMTP id 41be03b00d2f7-7c3e1081804so1255950a12.3 for ; Thu, 15 Aug 2024 21:05:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=adacore.com; s=google; t=1723781112; x=1724385912; darn=gcc.gnu.org; h=mime-version:user-agent:message-id:in-reply-to:date:references :organization:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=AaUwDrhbak5MxPtC6TiclNwo9oeL7xY1L2dRbc9I5Oc=; b=k06xMclcyjpfkU9fX8IYAshTWMnhVOtWEta95/8IW5zcDxvRmD/7dswKeP+uDDHGhk ZzK1xKmZtZGRRKKADFCj7fQT+jqBwbCGT7+t8IIyz/r/QzxBaPfhKlAeVriqMvZh+weQ /N9ssKiYFY3ZBpczSYxA/xG32uKC++nsdEDA1W2G4zkU+UhuZ09yGdBnF0Xlnn9TtArh 0I9b1fWaBuFYRiabVQLxirajXx4bbBI9KzlEXtSa4x3dQQ9IRpALXA6S7j0oLR2BKHA+ h1mHcX9KRAjFr+3KwL7TkyfeLpQFfl+SPl27I1Qbd5zwVE7N2xx41JOd94owwkOvp8KR x6QA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723781112; x=1724385912; h=mime-version:user-agent:message-id:in-reply-to:date:references :organization:subject:to:from:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=AaUwDrhbak5MxPtC6TiclNwo9oeL7xY1L2dRbc9I5Oc=; b=Uh4p0T1lY9ZKxf8chbjZYalOwy7N3wYvugBsTRdGXvFGRY9MwFqaAsKWpBDyzmOTxg iv88tZKyB0lmmq9xP567h/ShwXlrLz8O3gBNgoNlsb0MwXhE9avmb6Yg9l7koh0fDm+8 LV0TzKwhwgZ93GToSoMnBgXVQ62gih9Tz0mEGbDlJP4ioaTpWPgU4U+cEjiwohkRUZGy HlcMH5rLxxCfOPfpFOYzScFNsEEYVy9EY12mSOnfavSgQG9ddDbF0hFQWX3CHav2XeQm 9Qz+5HIIlUl0N2GJTIS+4lbjZ+tEx2yNCpUaXvpV3yS2BNT3ROKe/Yl+0jK9AkyDxxRW rycg== X-Gm-Message-State: AOJu0YxSQgyyDW+uYneezWegrymiXCycGL9jnj43lYEs3CeA6i8aURCP CJIgpwnoaOmHcVg6EHUOlYW92uYa5WW1uk3rEXp4n2E4d0wwhuOJmYprW8qH94Q3tBpZY6I1iYm gEQ== X-Google-Smtp-Source: AGHT+IHfRiXj/iVlrRz5K8WaHBCwGXEMTV8BWq/sg4aF6fgGX3ECAOgVa6rKea8Xf328TzUBO/k8Dg== X-Received: by 2002:a05:6a21:3983:b0:1c6:f859:542d with SMTP id adf61e73a8af0-1c904f96579mr2252527637.16.1723781111749; Thu, 15 Aug 2024 21:05:11 -0700 (PDT) Received: from free.home ([2804:7f1:218b:fb3:deed:d05e:1601:303a]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-7127add7cbcsm1823199b3a.14.2024.08.15.21.05.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Aug 2024 21:05:11 -0700 (PDT) Received: from livre (livre.home [172.31.160.2]) by free.home (8.15.2/8.15.2) with ESMTPS id 47G44x4u1696563 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Fri, 16 Aug 2024 01:04:59 -0300 From: Alexandre Oliva To: gcc-patches@gcc.gnu.org Subject: [PATCH v2] Optimize initialization of small padded objects Organization: Free thinker, does not speak for AdaCore References: Date: Fri, 16 Aug 2024 01:04:59 -0300 In-Reply-To: (Alexandre Oliva's message of "Thu, 15 Aug 2024 01:32:26 -0300") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE, WEIRD_QUOTING autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org On Aug 15, 2024, Alexandre Oliva wrote: > I can't quite envision what to check for in a target-independent test. Got it. Also dropped some occurrences of CONST_CAST_TREE that I added, then changed function signatures but failed to remove them. Retested on x86_64-linux-gnu. Ok to install? When small objects containing padding bits (or bytes) are fully initialized, we will often store them in registers, and setting bitfields and other small fields will attempt to preserve the uninitialized padding bits, which tends to be expensive. Zero-initializing registers, OTOH, tends to be cheap. So, if we're optimizing, zero-initialize such small padded objects even if that's not needed for correctness. We can't zero-initialize all such padding objects, though: if there's no padding whatsoever, and all fields are initialized with nonzero, the zero initialization would be flagged as dead. That's why we introduce machinery to detect whether objects have padding bits. I considered distinguishing between bitfields, units and larger padding elements, but I didn't pursue that distinction. Since the object's zero-initialization subsumes fields' zero-initialization, the empty string test in builtin-snprintf-6.c's test_assign_aggregate would regress without the addition of native_encode_constructor. for gcc/ChangeLog * expr.cc (categorize_ctor_elements_1): Change p_complete to int, to distinguish complete initialization in presence or absence of uninitialized padding bits. (categorize_ctor_elements): Likewise. Adjust all callers... * expr.h (categorize_ctor_elements): ... and declaration. (type_has_padding_at_level_p): New. * gimple-fold.cc (type_has_padding_at_level_p): New. * fold-const.cc (native_encode_constructor): New. (native_encode_expr): Call it. * gimplify.cc (gimplify_init_constructor): Clear small non-addressable non-volatile objects with padding or other uninitialized fields as an optimization. for gcc/testsuite/ChangeLog * gcc.dg/init-pad-1.c: New. --- gcc/expr.cc | 20 ++++++++++----- gcc/expr.h | 3 +- gcc/fold-const.cc | 33 ++++++++++++++++++++++++ gcc/gimple-fold.cc | 50 +++++++++++++++++++++++++++++++++++++ gcc/gimplify.cc | 14 ++++++++++ gcc/testsuite/gcc.dg/init-pad-1.c | 18 +++++++++++++ 6 files changed, 129 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/init-pad-1.c diff --git a/gcc/expr.cc b/gcc/expr.cc index 2089c2b86a98a..a701c67b3485d 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -7096,7 +7096,7 @@ count_type_elements (const_tree type, bool for_ctor_p) static bool categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts, HOST_WIDE_INT *p_unique_nz_elts, - HOST_WIDE_INT *p_init_elts, bool *p_complete) + HOST_WIDE_INT *p_init_elts, int *p_complete) { unsigned HOST_WIDE_INT idx; HOST_WIDE_INT nz_elts, unique_nz_elts, init_elts, num_fields; @@ -7218,7 +7218,10 @@ categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts, if (*p_complete && !complete_ctor_at_level_p (TREE_TYPE (ctor), num_fields, elt_type)) - *p_complete = false; + *p_complete = 0; + else if (*p_complete > 0 + && type_has_padding_at_level_p (TREE_TYPE (ctor))) + *p_complete = -1; *p_nz_elts += nz_elts; *p_unique_nz_elts += unique_nz_elts; @@ -7239,7 +7242,10 @@ categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts, and place it in *P_ELT_COUNT. * whether the constructor is complete -- in the sense that every meaningful byte is explicitly given a value -- - and place it in *P_COMPLETE. + and place it in *P_COMPLETE: + - 0 if any field is missing + - 1 if all fields are initialized, and there's no padding + - -1 if all fields are initialized, but there's padding Return whether or not CTOR is a valid static constant initializer, the same as "initializer_constant_valid_p (CTOR, TREE_TYPE (CTOR)) != 0". */ @@ -7247,12 +7253,12 @@ categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts, bool categorize_ctor_elements (const_tree ctor, HOST_WIDE_INT *p_nz_elts, HOST_WIDE_INT *p_unique_nz_elts, - HOST_WIDE_INT *p_init_elts, bool *p_complete) + HOST_WIDE_INT *p_init_elts, int *p_complete) { *p_nz_elts = 0; *p_unique_nz_elts = 0; *p_init_elts = 0; - *p_complete = true; + *p_complete = 1; return categorize_ctor_elements_1 (ctor, p_nz_elts, p_unique_nz_elts, p_init_elts, p_complete); @@ -7313,7 +7319,7 @@ mostly_zeros_p (const_tree exp) if (TREE_CODE (exp) == CONSTRUCTOR) { HOST_WIDE_INT nz_elts, unz_elts, init_elts; - bool complete_p; + int complete_p; categorize_ctor_elements (exp, &nz_elts, &unz_elts, &init_elts, &complete_p); @@ -7331,7 +7337,7 @@ all_zeros_p (const_tree exp) if (TREE_CODE (exp) == CONSTRUCTOR) { HOST_WIDE_INT nz_elts, unz_elts, init_elts; - bool complete_p; + int complete_p; categorize_ctor_elements (exp, &nz_elts, &unz_elts, &init_elts, &complete_p); diff --git a/gcc/expr.h b/gcc/expr.h index 533ae0af3871a..04782b15f192c 100644 --- a/gcc/expr.h +++ b/gcc/expr.h @@ -361,7 +361,8 @@ extern unsigned HOST_WIDE_INT highest_pow2_factor (const_tree); extern bool categorize_ctor_elements (const_tree, HOST_WIDE_INT *, HOST_WIDE_INT *, HOST_WIDE_INT *, - bool *); + int *); +extern bool type_has_padding_at_level_p (tree); extern bool immediate_const_ctor_p (const_tree, unsigned int words = 1); extern void store_constructor (tree, rtx, int, poly_int64, bool); extern HOST_WIDE_INT int_expr_size (const_tree exp); diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index 8908e7381e72c..ff2c8f35303b8 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -8193,6 +8193,36 @@ native_encode_string (const_tree expr, unsigned char *ptr, int len, int off) return len; } +/* Subroutine of native_encode_expr. Encode the CONSTRUCTOR + specified by EXPR into the buffer PTR of length LEN bytes. + Return the number of bytes placed in the buffer, or zero + upon failure. */ + +static int +native_encode_constructor (const_tree expr, unsigned char *ptr, int len, int off) +{ + /* We are only concerned with zero-initialization constructors here. */ + if (CONSTRUCTOR_NELTS (expr)) + return 0; + + /* Wide-char strings are encoded in target byte-order so native + encoding them is trivial. */ + if (BITS_PER_UNIT != CHAR_BIT + || !tree_fits_shwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr)))) + return 0; + + HOST_WIDE_INT total_bytes = tree_to_shwi (TYPE_SIZE_UNIT (TREE_TYPE (expr))); + if ((off == -1 && total_bytes > len) || off >= total_bytes) + return 0; + if (off == -1) + off = 0; + len = MIN (total_bytes - off, len); + if (ptr == NULL) + /* Dry run. */; + else + memset (ptr, 0, len); + return len; +} /* Subroutine of fold_view_convert_expr. Encode the INTEGER_CST, REAL_CST, FIXED_CST, COMPLEX_CST, STRING_CST, or VECTOR_CST specified by EXPR into @@ -8229,6 +8259,9 @@ native_encode_expr (const_tree expr, unsigned char *ptr, int len, int off) case STRING_CST: return native_encode_string (expr, ptr, len, off); + case CONSTRUCTOR: + return native_encode_constructor (expr, ptr, len, off); + default: return 0; } diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc index 18d7a6b176db7..f955a006ebe4e 100644 --- a/gcc/gimple-fold.cc +++ b/gcc/gimple-fold.cc @@ -4661,6 +4661,56 @@ clear_padding_type_may_have_padding_p (tree type) } } +/* Return true if TYPE has padding bits aside from those in fields, + elements, etc. */ + +bool +type_has_padding_at_level_p (tree type) +{ + switch (TREE_CODE (type)) + { + case RECORD_TYPE: + { + tree bitpos = size_zero_node; + /* Expect fields to be sorted by bit position. */ + for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f)) + if (TREE_CODE (f) == FIELD_DECL) + { + if (DECL_PADDING_P (f)) + return true; + tree pos = bit_position (f); + if (simple_cst_equal (bitpos, pos) != 1) + return true; + if (!DECL_SIZE (f)) + return true; + bitpos = int_const_binop (PLUS_EXPR, pos, DECL_SIZE (f)); + } + if (simple_cst_equal (bitpos, TYPE_SIZE (type)) != 1) + return true; + return false; + } + case UNION_TYPE: + /* If any of the fields is smaller than the whole, there is padding. */ + for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f)) + if (TREE_CODE (f) == FIELD_DECL) + if (simple_cst_equal (TYPE_SIZE (TREE_TYPE (f)), + TREE_TYPE (type)) != 1) + return true; + return false; + case ARRAY_TYPE: + case COMPLEX_TYPE: + case VECTOR_TYPE: + /* No recursing here, no padding at this level. */ + return false; + case REAL_TYPE: + return clear_padding_real_needs_padding_p (type); + case BITINT_TYPE: + return clear_padding_bitint_needs_padding_p (type); + default: + return false; + } +} + /* Emit a runtime loop: for (; buf.base != end; buf.base += sz) __builtin_clear_padding (buf.base); */ diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index 26a216e151d6f..354a6f53a7216 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -5564,7 +5564,8 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, struct gimplify_init_ctor_preeval_data preeval_data; HOST_WIDE_INT num_ctor_elements, num_nonzero_elements; HOST_WIDE_INT num_unique_nonzero_elements; - bool complete_p, valid_const_initializer; + int complete_p; + bool valid_const_initializer; /* Aggregate types must lower constructors to initialization of individual elements. The exception is that a CONSTRUCTOR node @@ -5668,6 +5669,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, /* If a single access to the target must be ensured and all elements are zero, then it's optimal to clear whatever their number. */ cleared = true; + /* If the object is small enough to go in registers, and it's + not required to be constructed in memory, clear it first. + That will avoid wasting cycles preserving any padding bits + that might be there, and if there aren't any, the compiler + is smart enough to optimize the clearing out. */ + else if (complete_p <= 0 + && !TREE_ADDRESSABLE (ctor) && !TREE_THIS_VOLATILE (object) + && (TYPE_MODE (type) != BLKmode || TYPE_NO_FORCE_BLK (type)) + && (opt_for_fn (cfun->decl, optimize) + || opt_for_fn (cfun->decl, optimize_size))) + cleared = true; else cleared = false; diff --git a/gcc/testsuite/gcc.dg/init-pad-1.c b/gcc/testsuite/gcc.dg/init-pad-1.c new file mode 100644 index 0000000000000..801f93813e3ad --- /dev/null +++ b/gcc/testsuite/gcc.dg/init-pad-1.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-Og -fdump-tree-gimple" } */ + +struct s { + short a : 3; + short b : 3; + char c; +}; + +extern void g(struct s *); + +void f() { + struct s x = { 0, 0, 1 }; + g (&x); +} + +/* { dg-final { scan-tree-dump-times "= {};" 1 "gimple" } } */ +/* { dg-final { scan-tree-dump-not "= 0;" "gimple" } } */