diff mbox series

expr, v2: Don't clear whole unions [PR116416]

Message ID ZwAfnu/gQvxWL3xU@tucnak
State New
Headers show
Series expr, v2: Don't clear whole unions [PR116416] | expand

Commit Message

Jakub Jelinek Oct. 4, 2024, 5:02 p.m. UTC
On Thu, Oct 03, 2024 at 12:14:35PM -0400, Jason Merrill wrote:
> Agreed, the padding bits have indeterminate values (or erroneous in C++26),
> so it's correct for infoleak-1.c to complain about 4b.

I've been afraid what the kernel people would say about this change (because
reading Linus' mails shows he doesn't care about what the standards say,
but what he expects to see, anything else is "broken").

> > Though, looking at godbolt, clang and icc 19 and older gcc all do zero
> > initialize the whole union before storing the single member in there (if
> > non-zero, otherwise just clear).
> > 
> > So whether we want to do this or do it by default is another question.
> 
> We will want to initialize the padding (for all types) to something for
> C++26, but that's a separate issue...

But ideally in a way where uninit warnings know the bits aren't initialized
even if they are.

> > Anyway, bootstrapped/regtested on x86_64-linux and i686-linux successfully.
> > 
> > 2024-09-28  Jakub Jelinek  <jakub@redhat.com>
> > 
> > 	PR c++/116416
> > 	* expr.cc (categorize_ctor_elements_1): Fix up union handling of
> > 	*p_complete.  Clear it only if num_fields is 0 and the union has
> > 	at least one FIELD_DECL, set to -1 if either union has no fields
> > 	and non-zero size, or num_fields is 1 and complete_ctor_at_level_p
> > 	returned false.
> 
> Hmm, complete_ctor_at_level_p also seems to need a change for this
> understanding of union semantics: "every meaningful byte" depends on the
> active member, so it seems like it should return true for a union iff
> num_elts == 1.

I thought complete_ctor_at_level_p has a single caller, but apparently
that isn't the case, cp/typeck2.cc uses it too.

Here is an updated version of the patch, which
a) moves some of the stuff into complete_ctor_at_level_p (but not
   all the *p_complete = 0; case, for that it would need to change
   so that it passes around the ctor rather than just its type)
b) introduces a new option, so that users can either get the new
   behavior (only what is guaranteed by the standards, the default),
   or previous behavior (union padding zero initialization, no such
   guarantees in structures) or also a guarantee in structures
c) introduces a new CONSTRUCTOR flag which says that the padding bits
   (if any) should be zero initialized (and sets it for now in the C++
   FE for C23 {} initializers).

Am not sure the CONSTRUCTOR_ZERO_PADDING_BITS flag is really needed
for C23, if there is just empty initializer, I think we already mark
it as incomplete if there are any missing initializers.  Maybe with
some designated initializer games, say
void foo () {
  struct S { char a; long long b; };
  struct T { struct S c; } t = { .c = {}, .c.a = 1, .c.b = 2 };
...
}
Is this supposed to initialize padding bits in C23 and then the .c.a = 1
and .c.b = 2 stores preserve those padding bits, so is that supposed
to be different from struct T t2 = { .c = { 1, 2 } };
?  What about just struct T t3 = { .c.a = 1, .c.b = 2 }; ?

And I haven't touched the C++ FE for the flag, because I'm afraid I'm lost
on where exactly is zero-initialization done (vs. other types of
initialization) and where is e.g. zero-initialization of a temporary then
(member-wise) copied.
Say
struct S { char a; long long b; };
struct T { constexpr T (int a, int b) : c () { c.a = a; c.b = b; } S c; };
void bar (T *);

void
foo ()
{
  T t (1, 2);
  bar (&t);
}
Is the c () value-initialization of t.c followed by c.a and c.b updates
which preserve the zero initialized padding bits?  Or is there some
copy construction involved which does member-wise copying and makes the
padding bits undefined?
Looking at (older) clang++ with -O2, it initializes also the padding bits
when c () is used and doesn't with c {}.
For GCC, note that there is that optimization from Alex to zero padding bits
for optimization purposes for small aggregates, so either one needs to look
at -O0 -fdump-tree-gimple dumps, or use larger structures which aren't
optimized that way.

Only lightly tested so far, this is mostly for further discussions.
And also a question what exactly does cp/typeck2.cc want from
complete_ctor_at_level_p, e.g. if it wants false for all the cases where
categorize_ctor_elements_1 does *p_complete = 0; (in that case it would need
to know whether CONSTRUCTOR_ZERO_PADDING_BITS flag was set).

2024-10-04  Jakub Jelinek  <jakub@redhat.com>

	PR c++/116416
gcc/
	* flag-types.h (enum zero_init_padding_bits_kind): New type.
	* tree.h (CONSTRUCTOR_ZERO_PADDING_BITS): Define.
	* common.opt (fzero-init-padding-bits=): New option.
	* expr.cc (categorize_ctor_elements_1): Handle
	CONSTRUCTOR_ZERO_PADDING_BITS or
	flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_ALL.  Fix up
	*p_complete = -1; setting for unions.
	(complete_ctor_at_level_p): Handle unions differently for
	flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_STANDARD.
	* gimple-fold.cc (type_has_padding_at_level_p): Fix up UNION_TYPE
	handling, return also true for UNION_TYPE with no FIELD_DECLs
	and non-zero size, handle QUAL_UNION_TYPE like UNION_TYPE.
	* doc/invoke.texi (-fzero-init-padding-bits=@var{value}): Document.
gcc/c/
	* c-parser.cc (c_parser_braced_init): Set CONSTRUCTOR_ZERO_PADDING_BITS
	for flag_isoc23 empty initializers.
gcc/testsuite/
	* gcc.dg/plugin/infoleak-1.c (test_union_2b, test_union_4b): Expect
	diagnostics.



	Jakub
diff mbox series

Patch

--- gcc/flag-types.h.jj	2024-09-23 09:07:42.591932097 +0200
+++ gcc/flag-types.h	2024-10-04 16:36:24.484249827 +0200
@@ -291,6 +291,13 @@  enum auto_init_type {
   AUTO_INIT_ZERO = 2
 };
 
+/* Initialization of padding bits with zeros.  */
+enum zero_init_padding_bits_kind {
+  ZERO_INIT_PADDING_BITS_STANDARD = 0,
+  ZERO_INIT_PADDING_BITS_UNIONS = 1,
+  ZERO_INIT_PADDING_BITS_ALL = 2
+};
+
 /* Different instrumentation modes.  */
 enum sanitize_code {
   /* AddressSanitizer.  */
--- gcc/tree.h.jj	2024-09-02 17:07:51.549836445 +0200
+++ gcc/tree.h	2024-10-04 15:09:10.976894865 +0200
@@ -1225,6 +1225,9 @@  extern void omp_clause_range_check_faile
   (vec_safe_length (CONSTRUCTOR_ELTS (NODE)))
 #define CONSTRUCTOR_NO_CLEARING(NODE) \
   (CONSTRUCTOR_CHECK (NODE)->base.public_flag)
+/* True if even padding bits should be zeroed during initialization.  */
+#define CONSTRUCTOR_ZERO_PADDING_BITS(NODE) \
+  (CONSTRUCTOR_CHECK (NODE)->base.default_def_flag)
 
 /* Iterate through the vector V of CONSTRUCTOR_ELT elements, yielding the
    value of each element (stored within VAL). IX must be a scratch variable
--- gcc/common.opt.jj	2024-10-03 17:27:34.067148984 +0200
+++ gcc/common.opt	2024-10-04 16:46:08.781152850 +0200
@@ -3505,6 +3505,22 @@  fzero-call-used-regs=
 Common RejectNegative Joined
 Clear call-used registers upon function return.
 
+fzero-init-padding-bits=
+Common Joined RejectNegative Enum(zero_init_padding_bits_kind) Var(flag_zero_init_padding_bits) Init(ZERO_INIT_PADDING_BITS_STANDARD)
+-fzero-init-padding-bits=[standard|unions|all]	Zero padding bits in initializers.
+
+Enum
+Name(zero_init_padding_bits_kind) Type(enum zero_init_padding_bits_kind) UnknownError(unrecognized zero init padding bits kind %qs)
+
+EnumValue
+Enum(zero_init_padding_bits_kind) String(standard) Value(ZERO_INIT_PADDING_BITS_STANDARD)
+
+EnumValue
+Enum(zero_init_padding_bits_kind) String(unions) Value(ZERO_INIT_PADDING_BITS_UNIONS)
+
+EnumValue
+Enum(zero_init_padding_bits_kind) String(all) Value(ZERO_INIT_PADDING_BITS_ALL)
+
 g
 Common Driver RejectNegative JoinedOrMissing
 Generate debug information in default format.
--- gcc/expr.cc.jj	2024-09-27 16:05:33.463898698 +0200
+++ gcc/expr.cc	2024-10-04 16:24:33.596111730 +0200
@@ -7219,6 +7219,28 @@  categorize_ctor_elements_1 (const_tree c
   if (*p_complete && !complete_ctor_at_level_p (TREE_TYPE (ctor),
 						num_fields, elt_type))
     *p_complete = 0;
+  else if (TREE_CODE (TREE_TYPE (ctor)) == UNION_TYPE
+	   || TREE_CODE (TREE_TYPE (ctor)) == QUAL_UNION_TYPE)
+    {
+      if (*p_complete
+	  && CONSTRUCTOR_ZERO_PADDING_BITS (ctor)
+	  && (num_fields
+	      ? simple_cst_equal (TYPE_SIZE (TREE_TYPE (ctor)),
+				  TYPE_SIZE (elt_type)) != 1
+	      : type_has_padding_at_level_p (TREE_TYPE (ctor))))
+	*p_complete = 0;
+      else if (*p_complete > 0
+	       && (num_fields
+		   ? simple_cst_equal (TYPE_SIZE (TREE_TYPE (ctor)),
+				       TYPE_SIZE (elt_type)) != 1
+		   : type_has_padding_at_level_p (TREE_TYPE (ctor))))
+	*p_complete = -1;
+    }
+  else if (*p_complete
+	   && (CONSTRUCTOR_ZERO_PADDING_BITS (ctor)
+	       || flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_ALL)
+	   && type_has_padding_at_level_p (TREE_TYPE (ctor)))
+    *p_complete = 0;
   else if (*p_complete > 0
 	   && type_has_padding_at_level_p (TREE_TYPE (ctor)))
     *p_complete = -1;
@@ -7293,19 +7315,32 @@  bool
 complete_ctor_at_level_p (const_tree type, HOST_WIDE_INT num_elts,
 			  const_tree last_type)
 {
-  if (TREE_CODE (type) == UNION_TYPE
-      || TREE_CODE (type) == QUAL_UNION_TYPE)
+  if (TREE_CODE (type) == UNION_TYPE || TREE_CODE (type) == QUAL_UNION_TYPE)
     {
       if (num_elts == 0)
-	return false;
+	{
+	  if (flag_zero_init_padding_bits >= ZERO_INIT_PADDING_BITS_UNIONS)
+	    return false;
+
+	  /* If the CONSTRUCTOR doesn't have any elts, it is
+	     incomplete if the union has at least one field.  */
+	  for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
+	    if (TREE_CODE (f) == FIELD_DECL)
+	      return false;
+
+	  return true;
+	}
 
       gcc_assert (num_elts == 1 && last_type);
 
-      /* ??? We could look at each element of the union, and find the
-	 largest element.  Which would avoid comparing the size of the
-	 initialized element against any tail padding in the union.
-	 Doesn't seem worth the effort...  */
-      return simple_cst_equal (TYPE_SIZE (type), TYPE_SIZE (last_type)) == 1;
+      if (flag_zero_init_padding_bits >= ZERO_INIT_PADDING_BITS_UNIONS)
+	/* ??? We could look at each element of the union, and find the
+	   largest element.  Which would avoid comparing the size of the
+	   initialized element against any tail padding in the union.
+	   Doesn't seem worth the effort...  */
+	return simple_cst_equal (TYPE_SIZE (type), TYPE_SIZE (last_type)) == 1;
+
+      return true;
     }
 
   return count_type_elements (type, true) == num_elts;
--- gcc/gimple-fold.cc.jj	2024-09-27 16:05:33.482898441 +0200
+++ gcc/gimple-fold.cc	2024-10-04 15:04:26.103849271 +0200
@@ -4814,12 +4814,22 @@  type_has_padding_at_level_p (tree type)
 	return false;
       }
     case UNION_TYPE:
+    case QUAL_UNION_TYPE:
+      bool any_fields;
+      any_fields = false;
       /* If any of the fields is smaller than the whole, there is padding.  */
       for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
-	if (TREE_CODE (f) == FIELD_DECL)
-	  if (simple_cst_equal (TYPE_SIZE (TREE_TYPE (f)),
-				TREE_TYPE (type)) != 1)
-	    return true;
+	if (TREE_CODE (f) != FIELD_DECL)
+	  continue;
+	else if (simple_cst_equal (TYPE_SIZE (TREE_TYPE (f)),
+				   TYPE_SIZE (type)) != 1)
+	  return true;
+	else
+	  any_fields = true;
+      /* If the union doesn't have any fields and still has non-zero size,
+	 all of it is padding.  */
+      if (!any_fields && !integer_zerop (TYPE_SIZE (type)))
+	return true;
       return false;
     case ARRAY_TYPE:
     case COMPLEX_TYPE:
--- gcc/doc/invoke.texi.jj	2024-10-04 12:28:08.722900531 +0200
+++ gcc/doc/invoke.texi	2024-10-04 17:55:26.227576480 +0200
@@ -745,7 +745,8 @@  Objective-C and Objective-C++ Dialects}.
 -ftrampolines -ftrampoline-impl=@r{[}stack@r{|}heap@r{]}
 -ftrapv  -fwrapv
 -fvisibility=@r{[}default@r{|}internal@r{|}hidden@r{|}protected@r{]}
--fstrict-volatile-bitfields  -fsync-libcalls}
+-fstrict-volatile-bitfields  -fsync-libcalls
+-fzero-init-padding-bits=@var{value}}
 
 @item Developer Options
 @xref{Developer Options,,GCC Developer Options}.
@@ -20013,6 +20014,43 @@  The default value of this option is enab
 of the option is @option{-fno-sync-libcalls}.  This option is used in
 the implementation of the @file{libatomic} runtime library.
 
+@opindex fzero-init-padding-bits=@var{value}
+@item -fzero-init-padding-bits=@var{value}
+Guarantee zero initalization of padding bits in automatic variable
+initializers.
+Certain languages guarantee zero initialization of padding bits in
+certain cases, e.g. C23 when using empty initializers (@code{@{@}}),
+or C++ when using zero-initialization or C guarantees that fields
+not specified in an initializer have their padding bits zero initialized.
+This option allows to change when padding bits in initializers are
+guaranteed to be zero initialized.
+The default is @code{-fzero-init-padding-bits=standard}, which makes
+no further guarantees than the corresponding standard.  E.g.@:
+
+@smallexample
+  struct A @{ char a; unsigned long long b; char c; @};
+  union B @{ char a; unsigned long long b; @};
+  struct A a = @{@}; // C23 guarantees padding bits are zero.
+  struct A b = @{ 1, 2, 3 @}; // No guarantees.
+  union B c = @{@}; // C23 guarantees padding bits are zero.
+  union B d = @{ 1 @}; // No guarantees.
+@end smallexample
+
+@code{-fzero-init-padding-bits=unions} guarantees zero initialization
+of padding bits in unions on top of what the standards guarantee,
+if the initializer of an union is empty (then all bits of the union
+are zero initialized) or if the initialized member of the union is
+smaller than the size of the union (in that case guarantees padding
+bits outside of the initialized member to be zero initialized).
+This was the GCC behavior before GCC 15 and in the above example guarantees
+zero initialization of last @code{sizeof (unsigned long long) - 1}
+bytes in the union.
+
+@code{-fzero-init-padding-bits=all} guarantees additionally
+zero initialization of padding bits of other aggregates, so
+the padding in between @code{b.a} and @code{b.b} (if any) and
+tail padding in the structure (if any).
+
 @end table
 
 @node Developer Options
--- gcc/c/c-parser.cc.jj	2024-10-02 13:30:10.984427128 +0200
+++ gcc/c/c-parser.cc	2024-10-04 16:37:59.770929382 +0200
@@ -6189,6 +6189,7 @@  c_parser_braced_init (c_parser *parser,
   gcc_assert (c_parser_next_token_is (parser, CPP_OPEN_BRACE));
   bool save_c_omp_array_section_p = c_omp_array_section_p;
   c_omp_array_section_p = false;
+  bool zero_init_padding_bits = false;
   matching_braces braces;
   braces.consume_open (parser);
   if (nested_p)
@@ -6202,6 +6203,8 @@  c_parser_braced_init (c_parser *parser,
     {
       pedwarn_c11 (brace_loc, OPT_Wpedantic,
 		   "ISO C forbids empty initializer braces before C23");
+      if (flag_isoc23)
+	zero_init_padding_bits = true;
     }
   else
     {
@@ -6242,6 +6245,8 @@  c_parser_braced_init (c_parser *parser,
   location_t close_loc = next_tok->location;
   c_parser_consume_token (parser);
   ret = pop_init_level (brace_loc, 0, &braced_init_obstack, close_loc);
+  if (zero_init_padding_bits && TREE_CODE (ret.value) == CONSTRUCTOR)
+    CONSTRUCTOR_ZERO_PADDING_BITS (ret.value) = 1;
   obstack_free (&braced_init_obstack, NULL);
   set_c_expr_source_range (&ret, brace_loc, close_loc);
   return ret;
--- gcc/testsuite/gcc.dg/plugin/infoleak-1.c.jj	2022-09-12 10:32:00.845088105 +0200
+++ gcc/testsuite/gcc.dg/plugin/infoleak-1.c	2024-10-04 15:04:26.159848494 +0200
@@ -123,9 +123,12 @@  void test_union_2a (void __user *dst, u8
 
 void test_union_2b (void __user *dst, u8 v)
 {
-  union un_b u = {0};
+  union un_b u = {0}; /* { dg-message "region created on stack here" "where" } */
+  /* { dg-message "capacity: 4 bytes" "capacity" { target *-*-* } .-1 } */
   u.j = v;
-  copy_to_user(dst, &u, sizeof (union un_b));
+  copy_to_user(dst, &u, sizeof (union un_b)); /* { dg-warning "potential exposure of sensitive information by copying uninitialized data from stack" "warning" } */
+  /* { dg-message "3 bytes are uninitialized" "note how much" { target *-*-* } .-1 } */
+  /* { dg-message "bytes 1 - 3 are uninitialized" "note how much" { target *-*-* } .-2 } */
 }
 
 void test_union_3a (void __user *dst, u32 v)
@@ -150,8 +153,11 @@  void test_union_4a (void __user *dst, u8
 
 void test_union_4b (void __user *dst, u8 v)
 {
-  union un_b u = {0};
-  copy_to_user(dst, &u, sizeof (union un_b)); /* { dg-bogus "" } */
+  union un_b u = {0}; /* { dg-message "region created on stack here" "where" } */
+  /* { dg-message "capacity: 4 bytes" "capacity" { target *-*-* } .-1 } */
+  copy_to_user(dst, &u, sizeof (union un_b)); /* { dg-warning "potential exposure of sensitive information by copying uninitialized data from stack" "warning" } */
+  /* { dg-message "3 bytes are uninitialized" "note how much" { target *-*-* } .-1 } */
+  /* { dg-message "bytes 1 - 3 are uninitialized" "note how much" { target *-*-* } .-2 } */
 }
 
 struct st_union_5