diff mbox series

[v2] Optimize initialization of small padded objects

Message ID orikw19kpg.fsf@lxoliva.fsfla.org
State New
Headers show
Series [v2] Optimize initialization of small padded objects | expand

Commit Message

Alexandre Oliva Aug. 16, 2024, 4:04 a.m. UTC
On Aug 15, 2024, Alexandre Oliva <oliva@adacore.com> wrote:

> I can't quite envision what to check for in a target-independent test.

Got it.  Also dropped some occurrences of CONST_CAST_TREE that I added,
then changed function signatures but failed to remove them.

Retested on x86_64-linux-gnu.  Ok to install?


When small objects containing padding bits (or bytes) are fully
initialized, we will often store them in registers, and setting
bitfields and other small fields will attempt to preserve the
uninitialized padding bits, which tends to be expensive.
Zero-initializing registers, OTOH, tends to be cheap.

So, if we're optimizing, zero-initialize such small padded objects
even if that's not needed for correctness.  We can't zero-initialize
all such padding objects, though: if there's no padding whatsoever,
and all fields are initialized with nonzero, the zero initialization
would be flagged as dead.  That's why we introduce machinery to detect
whether objects have padding bits.  I considered distinguishing
between bitfields, units and larger padding elements, but I didn't
pursue that distinction.

Since the object's zero-initialization subsumes fields'
zero-initialization, the empty string test in builtin-snprintf-6.c's
test_assign_aggregate would regress without the addition of
native_encode_constructor.


for  gcc/ChangeLog

	* expr.cc (categorize_ctor_elements_1): Change p_complete to
	int, to distinguish complete initialization in presence or
	absence of uninitialized padding bits.
	(categorize_ctor_elements): Likewise.  Adjust all callers...
	* expr.h (categorize_ctor_elements): ... and declaration.
	(type_has_padding_at_level_p): New.
	* gimple-fold.cc (type_has_padding_at_level_p): New.
	* fold-const.cc (native_encode_constructor): New.
	(native_encode_expr): Call it.
	* gimplify.cc (gimplify_init_constructor): Clear small
	non-addressable non-volatile objects with padding or
	other uninitialized fields as an optimization.

for  gcc/testsuite/ChangeLog

	* gcc.dg/init-pad-1.c: New.
---
 gcc/expr.cc                       |   20 ++++++++++-----
 gcc/expr.h                        |    3 +-
 gcc/fold-const.cc                 |   33 ++++++++++++++++++++++++
 gcc/gimple-fold.cc                |   50 +++++++++++++++++++++++++++++++++++++
 gcc/gimplify.cc                   |   14 ++++++++++
 gcc/testsuite/gcc.dg/init-pad-1.c |   18 +++++++++++++
 6 files changed, 129 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/init-pad-1.c

Comments

Richard Biener Aug. 22, 2024, 9:41 a.m. UTC | #1
On Fri, Aug 16, 2024 at 6:05 AM Alexandre Oliva <oliva@adacore.com> wrote:
>
> On Aug 15, 2024, Alexandre Oliva <oliva@adacore.com> wrote:
>
> > I can't quite envision what to check for in a target-independent test.
>
> Got it.  Also dropped some occurrences of CONST_CAST_TREE that I added,
> then changed function signatures but failed to remove them.
>
> Retested on x86_64-linux-gnu.  Ok to install?
>
>
> When small objects containing padding bits (or bytes) are fully
> initialized, we will often store them in registers, and setting
> bitfields and other small fields will attempt to preserve the
> uninitialized padding bits, which tends to be expensive.
> Zero-initializing registers, OTOH, tends to be cheap.
>
> So, if we're optimizing, zero-initialize such small padded objects
> even if that's not needed for correctness.  We can't zero-initialize
> all such padding objects, though: if there's no padding whatsoever,
> and all fields are initialized with nonzero, the zero initialization
> would be flagged as dead.  That's why we introduce machinery to detect
> whether objects have padding bits.  I considered distinguishing
> between bitfields, units and larger padding elements, but I didn't
> pursue that distinction.
>
> Since the object's zero-initialization subsumes fields'
> zero-initialization, the empty string test in builtin-snprintf-6.c's
> test_assign_aggregate would regress without the addition of
> native_encode_constructor.

I hope Jakub can look at the has_padding part, I note that we
have native_encode_initializer which handles CONSTRUCTORs.
For GIMPLE you'd only ever see empty ones, so I think handling
those in native_encode_expr is fine - a comment that one shouldn't
try to improve handling there would be nice.

Otherwise looks OK to me, one question and two minor adjustments
below.

Richard.

>
> for  gcc/ChangeLog
>
>         * expr.cc (categorize_ctor_elements_1): Change p_complete to
>         int, to distinguish complete initialization in presence or
>         absence of uninitialized padding bits.
>         (categorize_ctor_elements): Likewise.  Adjust all callers...
>         * expr.h (categorize_ctor_elements): ... and declaration.
>         (type_has_padding_at_level_p): New.
>         * gimple-fold.cc (type_has_padding_at_level_p): New.
>         * fold-const.cc (native_encode_constructor): New.
>         (native_encode_expr): Call it.
>         * gimplify.cc (gimplify_init_constructor): Clear small
>         non-addressable non-volatile objects with padding or
>         other uninitialized fields as an optimization.
>
> for  gcc/testsuite/ChangeLog
>
>         * gcc.dg/init-pad-1.c: New.
> ---
>  gcc/expr.cc                       |   20 ++++++++++-----
>  gcc/expr.h                        |    3 +-
>  gcc/fold-const.cc                 |   33 ++++++++++++++++++++++++
>  gcc/gimple-fold.cc                |   50 +++++++++++++++++++++++++++++++++++++
>  gcc/gimplify.cc                   |   14 ++++++++++
>  gcc/testsuite/gcc.dg/init-pad-1.c |   18 +++++++++++++
>  6 files changed, 129 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/init-pad-1.c
>
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 2089c2b86a98a..a701c67b3485d 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -7096,7 +7096,7 @@ count_type_elements (const_tree type, bool for_ctor_p)
>  static bool
>  categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
>                             HOST_WIDE_INT *p_unique_nz_elts,
> -                           HOST_WIDE_INT *p_init_elts, bool *p_complete)
> +                           HOST_WIDE_INT *p_init_elts, int *p_complete)
>  {
>    unsigned HOST_WIDE_INT idx;
>    HOST_WIDE_INT nz_elts, unique_nz_elts, init_elts, num_fields;
> @@ -7218,7 +7218,10 @@ categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
>
>    if (*p_complete && !complete_ctor_at_level_p (TREE_TYPE (ctor),
>                                                 num_fields, elt_type))
> -    *p_complete = false;
> +    *p_complete = 0;
> +  else if (*p_complete > 0
> +          && type_has_padding_at_level_p (TREE_TYPE (ctor)))
> +    *p_complete = -1;
>
>    *p_nz_elts += nz_elts;
>    *p_unique_nz_elts += unique_nz_elts;
> @@ -7239,7 +7242,10 @@ categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
>       and place it in *P_ELT_COUNT.
>     * whether the constructor is complete -- in the sense that every
>       meaningful byte is explicitly given a value --
> -     and place it in *P_COMPLETE.
> +     and place it in *P_COMPLETE:
> +     -  0 if any field is missing
> +     -  1 if all fields are initialized, and there's no padding
> +     - -1 if all fields are initialized, but there's padding
>
>     Return whether or not CTOR is a valid static constant initializer, the same
>     as "initializer_constant_valid_p (CTOR, TREE_TYPE (CTOR)) != 0".  */
> @@ -7247,12 +7253,12 @@ categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
>  bool
>  categorize_ctor_elements (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
>                           HOST_WIDE_INT *p_unique_nz_elts,
> -                         HOST_WIDE_INT *p_init_elts, bool *p_complete)
> +                         HOST_WIDE_INT *p_init_elts, int *p_complete)
>  {
>    *p_nz_elts = 0;
>    *p_unique_nz_elts = 0;
>    *p_init_elts = 0;
> -  *p_complete = true;
> +  *p_complete = 1;
>
>    return categorize_ctor_elements_1 (ctor, p_nz_elts, p_unique_nz_elts,
>                                      p_init_elts, p_complete);
> @@ -7313,7 +7319,7 @@ mostly_zeros_p (const_tree exp)
>    if (TREE_CODE (exp) == CONSTRUCTOR)
>      {
>        HOST_WIDE_INT nz_elts, unz_elts, init_elts;
> -      bool complete_p;
> +      int complete_p;
>
>        categorize_ctor_elements (exp, &nz_elts, &unz_elts, &init_elts,
>                                 &complete_p);
> @@ -7331,7 +7337,7 @@ all_zeros_p (const_tree exp)
>    if (TREE_CODE (exp) == CONSTRUCTOR)
>      {
>        HOST_WIDE_INT nz_elts, unz_elts, init_elts;
> -      bool complete_p;
> +      int complete_p;
>
>        categorize_ctor_elements (exp, &nz_elts, &unz_elts, &init_elts,
>                                 &complete_p);
> diff --git a/gcc/expr.h b/gcc/expr.h
> index 533ae0af3871a..04782b15f192c 100644
> --- a/gcc/expr.h
> +++ b/gcc/expr.h
> @@ -361,7 +361,8 @@ extern unsigned HOST_WIDE_INT highest_pow2_factor (const_tree);
>
>  extern bool categorize_ctor_elements (const_tree, HOST_WIDE_INT *,
>                                       HOST_WIDE_INT *, HOST_WIDE_INT *,
> -                                     bool *);
> +                                     int *);
> +extern bool type_has_padding_at_level_p (tree);
>  extern bool immediate_const_ctor_p (const_tree, unsigned int words = 1);
>  extern void store_constructor (tree, rtx, int, poly_int64, bool);
>  extern HOST_WIDE_INT int_expr_size (const_tree exp);
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 8908e7381e72c..ff2c8f35303b8 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -8193,6 +8193,36 @@ native_encode_string (const_tree expr, unsigned char *ptr, int len, int off)
>    return len;
>  }
>
> +/* Subroutine of native_encode_expr.  Encode the CONSTRUCTOR
> +   specified by EXPR into the buffer PTR of length LEN bytes.
> +   Return the number of bytes placed in the buffer, or zero
> +   upon failure.  */
> +
> +static int
> +native_encode_constructor (const_tree expr, unsigned char *ptr, int len, int off)
> +{
> +  /* We are only concerned with zero-initialization constructors here.  */
> +  if (CONSTRUCTOR_NELTS (expr))
> +    return 0;
> +
> +  /* Wide-char strings are encoded in target byte-order so native
> +     encoding them is trivial.  */
> +  if (BITS_PER_UNIT != CHAR_BIT
> +      || !tree_fits_shwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr))))
> +    return 0;
> +
> +  HOST_WIDE_INT total_bytes = tree_to_shwi (TYPE_SIZE_UNIT (TREE_TYPE (expr)));
> +  if ((off == -1 && total_bytes > len) || off >= total_bytes)
> +    return 0;
> +  if (off == -1)
> +    off = 0;
> +  len = MIN (total_bytes - off, len);
> +  if (ptr == NULL)
> +    /* Dry run.  */;
> +  else
> +    memset (ptr, 0, len);
> +  return len;
> +}
>
>  /* Subroutine of fold_view_convert_expr.  Encode the INTEGER_CST, REAL_CST,
>     FIXED_CST, COMPLEX_CST, STRING_CST, or VECTOR_CST specified by EXPR into
> @@ -8229,6 +8259,9 @@ native_encode_expr (const_tree expr, unsigned char *ptr, int len, int off)
>      case STRING_CST:
>        return native_encode_string (expr, ptr, len, off);
>
> +    case CONSTRUCTOR:
> +      return native_encode_constructor (expr, ptr, len, off);
> +
>      default:
>        return 0;
>      }
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 18d7a6b176db7..f955a006ebe4e 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -4661,6 +4661,56 @@ clear_padding_type_may_have_padding_p (tree type)
>      }
>  }
>
> +/* Return true if TYPE has padding bits aside from those in fields,
> +   elements, etc.  */
> +
> +bool
> +type_has_padding_at_level_p (tree type)
> +{
> +  switch (TREE_CODE (type))
> +    {
> +    case RECORD_TYPE:
> +      {
> +       tree bitpos = size_zero_node;
> +       /* Expect fields to be sorted by bit position.  */
> +       for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
> +         if (TREE_CODE (f) == FIELD_DECL)
> +           {
> +             if (DECL_PADDING_P (f))
> +               return true;
> +             tree pos = bit_position (f);
> +             if (simple_cst_equal (bitpos, pos) != 1)
> +               return true;
> +             if (!DECL_SIZE (f))
> +               return true;
> +             bitpos = int_const_binop (PLUS_EXPR, pos, DECL_SIZE (f));
> +           }
> +       if (simple_cst_equal (bitpos, TYPE_SIZE (type)) != 1)
> +         return true;
> +       return false;
> +      }
> +    case UNION_TYPE:
> +      /* If any of the fields is smaller than the whole, there is padding.  */
> +      for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
> +       if (TREE_CODE (f) == FIELD_DECL)
> +         if (simple_cst_equal (TYPE_SIZE (TREE_TYPE (f)),
> +                               TREE_TYPE (type)) != 1)
> +           return true;
> +      return false;
> +    case ARRAY_TYPE:
> +    case COMPLEX_TYPE:
> +    case VECTOR_TYPE:
> +      /* No recursing here, no padding at this level.  */
> +      return false;
> +    case REAL_TYPE:
> +      return clear_padding_real_needs_padding_p (type);
> +    case BITINT_TYPE:
> +      return clear_padding_bitint_needs_padding_p (type);
> +    default:
> +      return false;
> +    }
> +}
> +
>  /* Emit a runtime loop:
>     for (; buf.base != end; buf.base += sz)
>       __builtin_clear_padding (buf.base);  */
> diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> index 26a216e151d6f..354a6f53a7216 100644
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -5564,7 +5564,8 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
>         struct gimplify_init_ctor_preeval_data preeval_data;
>         HOST_WIDE_INT num_ctor_elements, num_nonzero_elements;
>         HOST_WIDE_INT num_unique_nonzero_elements;
> -       bool complete_p, valid_const_initializer;
> +       int complete_p;
> +       bool valid_const_initializer;
>
>         /* Aggregate types must lower constructors to initialization of
>            individual elements.  The exception is that a CONSTRUCTOR node
> @@ -5668,6 +5669,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
>           /* If a single access to the target must be ensured and all elements
>              are zero, then it's optimal to clear whatever their number.  */
>           cleared = true;
> +       /* If the object is small enough to go in registers, and it's
> +          not required to be constructed in memory, clear it first.
> +          That will avoid wasting cycles preserving any padding bits
> +          that might be there, and if there aren't any, the compiler
> +          is smart enough to optimize the clearing out.  */
> +       else if (complete_p <= 0

I wonder if for complete_p == 1 zeroing first also makes a difference?
 I also wonder
whether the extra zeroing confuses SRA - try say struct s { short a;
int b; } and see
if SRA is still happily considering it.

> +                && !TREE_ADDRESSABLE (ctor) && !TREE_THIS_VOLATILE (object)

2nd && to the next line

> +                && (TYPE_MODE (type) != BLKmode || TYPE_NO_FORCE_BLK (type))
> +                && (opt_for_fn (cfun->decl, optimize)
> +                    || opt_for_fn (cfun->decl, optimize_size)))

that's simplified as && optimize

> +         cleared = true;
>         else
>           cleared = false;
>
> diff --git a/gcc/testsuite/gcc.dg/init-pad-1.c b/gcc/testsuite/gcc.dg/init-pad-1.c
> new file mode 100644
> index 0000000000000..801f93813e3ad
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/init-pad-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Og -fdump-tree-gimple" } */
> +
> +struct s {
> +  short a : 3;
> +  short b : 3;
> +  char  c;
> +};
> +
> +extern void g(struct s *);
> +
> +void f() {
> +  struct s x = { 0, 0, 1 };
> +  g (&x);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "= {};" 1 "gimple" } } */
> +/* { dg-final { scan-tree-dump-not "= 0;" "gimple" } } */
>
> --
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>    Free Software Activist                   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive
diff mbox series

Patch

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 2089c2b86a98a..a701c67b3485d 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -7096,7 +7096,7 @@  count_type_elements (const_tree type, bool for_ctor_p)
 static bool
 categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
 			    HOST_WIDE_INT *p_unique_nz_elts,
-			    HOST_WIDE_INT *p_init_elts, bool *p_complete)
+			    HOST_WIDE_INT *p_init_elts, int *p_complete)
 {
   unsigned HOST_WIDE_INT idx;
   HOST_WIDE_INT nz_elts, unique_nz_elts, init_elts, num_fields;
@@ -7218,7 +7218,10 @@  categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
 
   if (*p_complete && !complete_ctor_at_level_p (TREE_TYPE (ctor),
 						num_fields, elt_type))
-    *p_complete = false;
+    *p_complete = 0;
+  else if (*p_complete > 0
+	   && type_has_padding_at_level_p (TREE_TYPE (ctor)))
+    *p_complete = -1;
 
   *p_nz_elts += nz_elts;
   *p_unique_nz_elts += unique_nz_elts;
@@ -7239,7 +7242,10 @@  categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
      and place it in *P_ELT_COUNT.
    * whether the constructor is complete -- in the sense that every
      meaningful byte is explicitly given a value --
-     and place it in *P_COMPLETE.
+     and place it in *P_COMPLETE:
+     -  0 if any field is missing
+     -  1 if all fields are initialized, and there's no padding
+     - -1 if all fields are initialized, but there's padding
 
    Return whether or not CTOR is a valid static constant initializer, the same
    as "initializer_constant_valid_p (CTOR, TREE_TYPE (CTOR)) != 0".  */
@@ -7247,12 +7253,12 @@  categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
 bool
 categorize_ctor_elements (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
 			  HOST_WIDE_INT *p_unique_nz_elts,
-			  HOST_WIDE_INT *p_init_elts, bool *p_complete)
+			  HOST_WIDE_INT *p_init_elts, int *p_complete)
 {
   *p_nz_elts = 0;
   *p_unique_nz_elts = 0;
   *p_init_elts = 0;
-  *p_complete = true;
+  *p_complete = 1;
 
   return categorize_ctor_elements_1 (ctor, p_nz_elts, p_unique_nz_elts,
 				     p_init_elts, p_complete);
@@ -7313,7 +7319,7 @@  mostly_zeros_p (const_tree exp)
   if (TREE_CODE (exp) == CONSTRUCTOR)
     {
       HOST_WIDE_INT nz_elts, unz_elts, init_elts;
-      bool complete_p;
+      int complete_p;
 
       categorize_ctor_elements (exp, &nz_elts, &unz_elts, &init_elts,
 				&complete_p);
@@ -7331,7 +7337,7 @@  all_zeros_p (const_tree exp)
   if (TREE_CODE (exp) == CONSTRUCTOR)
     {
       HOST_WIDE_INT nz_elts, unz_elts, init_elts;
-      bool complete_p;
+      int complete_p;
 
       categorize_ctor_elements (exp, &nz_elts, &unz_elts, &init_elts,
 				&complete_p);
diff --git a/gcc/expr.h b/gcc/expr.h
index 533ae0af3871a..04782b15f192c 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -361,7 +361,8 @@  extern unsigned HOST_WIDE_INT highest_pow2_factor (const_tree);
 
 extern bool categorize_ctor_elements (const_tree, HOST_WIDE_INT *,
 				      HOST_WIDE_INT *, HOST_WIDE_INT *,
-				      bool *);
+				      int *);
+extern bool type_has_padding_at_level_p (tree);
 extern bool immediate_const_ctor_p (const_tree, unsigned int words = 1);
 extern void store_constructor (tree, rtx, int, poly_int64, bool);
 extern HOST_WIDE_INT int_expr_size (const_tree exp);
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 8908e7381e72c..ff2c8f35303b8 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -8193,6 +8193,36 @@  native_encode_string (const_tree expr, unsigned char *ptr, int len, int off)
   return len;
 }
 
+/* Subroutine of native_encode_expr.  Encode the CONSTRUCTOR
+   specified by EXPR into the buffer PTR of length LEN bytes.
+   Return the number of bytes placed in the buffer, or zero
+   upon failure.  */
+
+static int
+native_encode_constructor (const_tree expr, unsigned char *ptr, int len, int off)
+{
+  /* We are only concerned with zero-initialization constructors here.  */
+  if (CONSTRUCTOR_NELTS (expr))
+    return 0;
+
+  /* Wide-char strings are encoded in target byte-order so native
+     encoding them is trivial.  */
+  if (BITS_PER_UNIT != CHAR_BIT
+      || !tree_fits_shwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr))))
+    return 0;
+
+  HOST_WIDE_INT total_bytes = tree_to_shwi (TYPE_SIZE_UNIT (TREE_TYPE (expr)));
+  if ((off == -1 && total_bytes > len) || off >= total_bytes)
+    return 0;
+  if (off == -1)
+    off = 0;
+  len = MIN (total_bytes - off, len);
+  if (ptr == NULL)
+    /* Dry run.  */;
+  else
+    memset (ptr, 0, len);
+  return len;
+}
 
 /* Subroutine of fold_view_convert_expr.  Encode the INTEGER_CST, REAL_CST,
    FIXED_CST, COMPLEX_CST, STRING_CST, or VECTOR_CST specified by EXPR into
@@ -8229,6 +8259,9 @@  native_encode_expr (const_tree expr, unsigned char *ptr, int len, int off)
     case STRING_CST:
       return native_encode_string (expr, ptr, len, off);
 
+    case CONSTRUCTOR:
+      return native_encode_constructor (expr, ptr, len, off);
+
     default:
       return 0;
     }
diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 18d7a6b176db7..f955a006ebe4e 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -4661,6 +4661,56 @@  clear_padding_type_may_have_padding_p (tree type)
     }
 }
 
+/* Return true if TYPE has padding bits aside from those in fields,
+   elements, etc.  */
+
+bool
+type_has_padding_at_level_p (tree type)
+{
+  switch (TREE_CODE (type))
+    {
+    case RECORD_TYPE:
+      {
+	tree bitpos = size_zero_node;
+	/* Expect fields to be sorted by bit position.  */
+	for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
+	  if (TREE_CODE (f) == FIELD_DECL)
+	    {
+	      if (DECL_PADDING_P (f))
+		return true;
+	      tree pos = bit_position (f);
+	      if (simple_cst_equal (bitpos, pos) != 1)
+		return true;
+	      if (!DECL_SIZE (f))
+		return true;
+	      bitpos = int_const_binop (PLUS_EXPR, pos, DECL_SIZE (f));
+	    }
+	if (simple_cst_equal (bitpos, TYPE_SIZE (type)) != 1)
+	  return true;
+	return false;
+      }
+    case UNION_TYPE:
+      /* If any of the fields is smaller than the whole, there is padding.  */
+      for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
+	if (TREE_CODE (f) == FIELD_DECL)
+	  if (simple_cst_equal (TYPE_SIZE (TREE_TYPE (f)),
+				TREE_TYPE (type)) != 1)
+	    return true;
+      return false;
+    case ARRAY_TYPE:
+    case COMPLEX_TYPE:
+    case VECTOR_TYPE:
+      /* No recursing here, no padding at this level.  */
+      return false;
+    case REAL_TYPE:
+      return clear_padding_real_needs_padding_p (type);
+    case BITINT_TYPE:
+      return clear_padding_bitint_needs_padding_p (type);
+    default:
+      return false;
+    }
+}
+
 /* Emit a runtime loop:
    for (; buf.base != end; buf.base += sz)
      __builtin_clear_padding (buf.base);  */
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 26a216e151d6f..354a6f53a7216 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -5564,7 +5564,8 @@  gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	struct gimplify_init_ctor_preeval_data preeval_data;
 	HOST_WIDE_INT num_ctor_elements, num_nonzero_elements;
 	HOST_WIDE_INT num_unique_nonzero_elements;
-	bool complete_p, valid_const_initializer;
+	int complete_p;
+	bool valid_const_initializer;
 
 	/* Aggregate types must lower constructors to initialization of
 	   individual elements.  The exception is that a CONSTRUCTOR node
@@ -5668,6 +5669,17 @@  gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	  /* If a single access to the target must be ensured and all elements
 	     are zero, then it's optimal to clear whatever their number.  */
 	  cleared = true;
+	/* If the object is small enough to go in registers, and it's
+	   not required to be constructed in memory, clear it first.
+	   That will avoid wasting cycles preserving any padding bits
+	   that might be there, and if there aren't any, the compiler
+	   is smart enough to optimize the clearing out.  */
+	else if (complete_p <= 0
+		 && !TREE_ADDRESSABLE (ctor) && !TREE_THIS_VOLATILE (object)
+		 && (TYPE_MODE (type) != BLKmode || TYPE_NO_FORCE_BLK (type))
+		 && (opt_for_fn (cfun->decl, optimize)
+		     || opt_for_fn (cfun->decl, optimize_size)))
+	  cleared = true;
 	else
 	  cleared = false;
 
diff --git a/gcc/testsuite/gcc.dg/init-pad-1.c b/gcc/testsuite/gcc.dg/init-pad-1.c
new file mode 100644
index 0000000000000..801f93813e3ad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/init-pad-1.c
@@ -0,0 +1,18 @@ 
+/* { dg-do compile } */
+/* { dg-options "-Og -fdump-tree-gimple" } */
+
+struct s {
+  short a : 3;
+  short b : 3;
+  char  c;
+};
+
+extern void g(struct s *);
+
+void f() {
+  struct s x = { 0, 0, 1 };
+  g (&x);
+}
+
+/* { dg-final { scan-tree-dump-times "= {};" 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-not "= 0;" "gimple" } } */