diff mbox series

Invalid gimple __BB# accepted due to usage of atoi -> replace atoi with stroul in c_parser_gimple_parse_bb_spec [PR114541]

Message ID 8b3ea7a3-9c7d-4906-bc0b-59b1c5eba45c@hexco.de
State New
Headers show
Series Invalid gimple __BB# accepted due to usage of atoi -> replace atoi with stroul in c_parser_gimple_parse_bb_spec [PR114541] | expand

Commit Message

Heiko Eißfeldt Dec. 5, 2024, 12:41 a.m. UTC
As commented in PR114541 here is a first patch that
1. replaces atoi() with strtoul() with ERANGE checking and
2. fixes the handling of misparsed gimple compounds to return early.
3. adds two new test cases.

There is more work to do for Andrews testcase to succeed, so PR114541
is not done yet.

===

Replace atoi() with strtoul() with ERANGE checking.

The function c_parser_gimple_parse_bb_spec uses atoi,
which can silently return valid numbers even for
some too large numbers in the string.

Furthermore in function c_parser_parse_gimple_body
handle the case of gimple compound statement errors
more generically. In the case of cdil != cdil_gimple
now consider them as errors and return early.
This avoids further processing with erroneous data.

2024-12-05 Heiko Eißfeldt <heiko@hexco.de>

	PR c/114541
	* gimple-parser.cc (c_parser_gimple_parse_bb_spec):
	Use strtoul with ERANGE check instead of atoi

	* gimple-parser.cc (c_parser_parse_gimple_body):
	separate check for errors in c_parser_gimple_compound_statement
	and special handling of cdil == cdil_gimple to allow
	a return in case of errors for cdil != cdil_gimple

	* gcc.dg/pr114541-else-BB#-and-garbagechar.c: New test.
	* gcc.dg/pr114541-then-BB#-and-garbagechar.c: New test.


Signed-off-by: Heiko Eißfeldt <heiko@hexco.de>

Comments

Richard Biener Dec. 5, 2024, 7:45 a.m. UTC | #1
On Thu, Dec 5, 2024 at 1:55 AM Heiko Eißfeldt <heiko@hexco.de> wrote:
>
> As commented in PR114541 here is a first patch that
> 1. replaces atoi() with strtoul() with ERANGE checking and
> 2. fixes the handling of misparsed gimple compounds to return early.
> 3. adds two new test cases.
>
> There is more work to do for Andrews testcase to succeed, so PR114541
> is not done yet.
>
> ===
>
> Replace atoi() with strtoul() with ERANGE checking.
>
> The function c_parser_gimple_parse_bb_spec uses atoi,
> which can silently return valid numbers even for
> some too large numbers in the string.
>
> Furthermore in function c_parser_parse_gimple_body
> handle the case of gimple compound statement errors
> more generically. In the case of cdil != cdil_gimple
> now consider them as errors and return early.
> This avoids further processing with erroneous data.

c_parser_gimple_compound_statement returns whether the
compound statement ended with a return statement, not
whether there was an error, so this change looks wrong.

The hunk in c_parser_gimple_parse_bb_spec is OK.

Richard.

> 2024-12-05 Heiko Eißfeldt <heiko@hexco.de>
>
> PR c/114541
> * gimple-parser.cc (c_parser_gimple_parse_bb_spec):
> Use strtoul with ERANGE check instead of atoi
>
> * gimple-parser.cc (c_parser_parse_gimple_body):
> separate check for errors in c_parser_gimple_compound_statement
> and special handling of cdil == cdil_gimple to allow
> a return in case of errors for cdil != cdil_gimple
>
> * gcc.dg/pr114541-else-BB#-and-garbagechar.c: New test.
> * gcc.dg/pr114541-then-BB#-and-garbagechar.c: New test.
>
>
> Signed-off-by: Heiko Eißfeldt <heiko@hexco.de>
>
>
>
>
Heiko Eißfeldt Dec. 5, 2024, 9:09 a.m. UTC | #2
On 12/5/24 8:45 AM, Richard Biener wrote:
> c_parser_gimple_compound_statement returns whether the
> compound statement ended with a return statement, not
> whether there was an error, so this change looks wrong.
Thanks, I will have to dig deeper to understand error handling then.

Heiko
Heiko Eißfeldt Dec. 6, 2024, 11:12 p.m. UTC | #3
2nd try,

1. replaces atoi() with strtoul() with ERANGE checking (as before)
2. fixes the handling of misparsed 'bb_spec's in c_parser_gimple_if_stmt to return early.
3. adds a new test case.

I hope I am wright with the assumption that in c_parser_gimple_if_stmt
(cfun->curr_properties & PROP_cfg) should imply valid bb_spec's after goto.

	PR c/114541
	* gimple-parser.cc (c_parser_gimple_parse_bb_spec):
	Use strtoul with ERANGE check instead of atoi to avoid UB

	* gimple-parser.cc (c_parser_gimple_if_stmt):
	require valid __BB# basic block indices after goto
	in both branches otherwise return with c_parser_error

	* gcc.dg/pr114541Andrew.c: New test based on
	Andrew's template in the PR.

Signed-off-by: Heiko Eißfeldt <heiko@hexco.de>
On 12/5/24 8:45 AM, Richard Biener wrote:

> On Thu, Dec 5, 2024 at 1:55 AM Heiko Eißfeldt<heiko@hexco.de> wrote:
>> As commented in PR114541 here is a first patch that
>> 1. replaces atoi() with strtoul() with ERANGE checking and
>> 2. fixes the handling of misparsed gimple compounds to return early.
>> 3. adds two new test cases.
>>
>> There is more work to do for Andrews testcase to succeed, so PR114541
>> is not done yet.
>>
>> ===
>>
>> Replace atoi() with strtoul() with ERANGE checking.
>>
>> The function c_parser_gimple_parse_bb_spec uses atoi,
>> which can silently return valid numbers even for
>> some too large numbers in the string.
>>
>> Furthermore in function c_parser_parse_gimple_body
>> handle the case of gimple compound statement errors
>> more generically. In the case of cdil != cdil_gimple
>> now consider them as errors and return early.
>> This avoids further processing with erroneous data.
> c_parser_gimple_compound_statement returns whether the
> compound statement ended with a return statement, not
> whether there was an error, so this change looks wrong.
>
> The hunk in c_parser_gimple_parse_bb_spec is OK.
>
> Richard.
>
>> 2024-12-05 Heiko Eißfeldt<heiko@hexco.de>
>>
>> PR c/114541
>> * gimple-parser.cc (c_parser_gimple_parse_bb_spec):
>> Use strtoul with ERANGE check instead of atoi
>>
>> * gimple-parser.cc (c_parser_parse_gimple_body):
>> separate check for errors in c_parser_gimple_compound_statement
>> and special handling of cdil == cdil_gimple to allow
>> a return in case of errors for cdil != cdil_gimple
>>
>> * gcc.dg/pr114541-else-BB#-and-garbagechar.c: New test.
>> * gcc.dg/pr114541-then-BB#-and-garbagechar.c: New test.
>>
>>
>> Signed-off-by: Heiko Eißfeldt<heiko@hexco.de>
>>
>>
>>
>>
Heiko Eißfeldt Dec. 6, 2024, 11:16 p.m. UTC | #4
and here is the forgotten patch (it is late...)
diff --git a/gcc/c/gimple-parser.cc b/gcc/c/gimple-parser.cc
index 78e85d93487..b018bb6afb6 100644
--- a/gcc/c/gimple-parser.cc
+++ b/gcc/c/gimple-parser.cc
@@ -133,11 +133,21 @@ c_parser_gimple_parse_bb_spec (tree val, int *index)
 {
   if (!startswith (IDENTIFIER_POINTER (val), "__BB"))
     return false;
-  for (const char *p = IDENTIFIER_POINTER (val) + 4; *p; ++p)
-    if (!ISDIGIT (*p))
-      return false;
-  *index = atoi (IDENTIFIER_POINTER (val) + 4);
-  return *index > 0;
+
+  const char *bb = IDENTIFIER_POINTER (val) + 4;
+  if (! ISDIGIT (*bb))
+    return false;
+
+  char *pend;
+  errno = 0;
+  const unsigned long number = strtoul (bb, &pend, 10);
+  if (errno == ERANGE
+      || *pend != '\0'
+      || number > INT_MAX)
+    return false;
+
+  *index = number;
+  return true;
 }
 
 /* See if VAL is an identifier matching __BB<num> and return <num>
@@ -2384,11 +2394,20 @@ c_parser_gimple_if_stmt (gimple_parser &parser, gimple_seq *seq)
       c_parser_consume_token (parser);
       int dest_index;
       profile_probability prob;
-      if ((cfun->curr_properties & PROP_cfg)
-	  && c_parser_gimple_parse_bb_spec_edge_probability (label, parser,
-							     &dest_index, &prob))
-	parser.push_edge (parser.current_bb->index, dest_index,
-			  EDGE_TRUE_VALUE, prob);
+      if (cfun->curr_properties & PROP_cfg)
+	{
+	  if (c_parser_gimple_parse_bb_spec_edge_probability (label, parser,
+							      &dest_index, &prob))
+	    {
+	      parser.push_edge (parser.current_bb->index, dest_index,
+				EDGE_TRUE_VALUE, prob);
+	    }
+	  else
+	    {
+	      c_parser_error (parser, "expected valid __BB#");
+	      return;
+	    }
+	}
       else
 	t_label = lookup_label_for_goto (loc, label);
       if (! c_parser_require (parser, CPP_SEMICOLON, "expected %<;%>"))
@@ -2421,11 +2440,20 @@ c_parser_gimple_if_stmt (gimple_parser &parser, gimple_seq *seq)
       c_parser_consume_token (parser);
       int dest_index;
       profile_probability prob;
-      if ((cfun->curr_properties & PROP_cfg)
-	  && c_parser_gimple_parse_bb_spec_edge_probability (label, parser,
-							     &dest_index, &prob))
-	parser.push_edge (parser.current_bb->index, dest_index,
-			  EDGE_FALSE_VALUE, prob);
+      if (cfun->curr_properties & PROP_cfg)
+	{
+	  if (c_parser_gimple_parse_bb_spec_edge_probability (label, parser,
+							      &dest_index, &prob))
+	    {
+	      parser.push_edge (parser.current_bb->index, dest_index,
+			       EDGE_FALSE_VALUE, prob);
+	    }
+	  else
+	    {
+	      c_parser_error (parser, "expected valid __BB#");
+	      return;
+	    }
+	}
       else
 	f_label = lookup_label_for_goto (loc, label);
       if (! c_parser_require (parser, CPP_SEMICOLON, "expected %<;%>"))
diff --git a/gcc/testsuite/gcc.dg/pr114541Andrew.c b/gcc/testsuite/gcc.dg/pr114541Andrew.c
new file mode 100644
index 00000000000..bc92473a79e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114541Andrew.c
@@ -0,0 +1,28 @@
+/* PR middle-end/114541 */
+/* { dg-do compile } */
+/* { dg-options "-O -fgimple -c" } */
+
+void __GIMPLE (ssa,startwith ("dse2")) foo ()
+{
+  int a;
+
+__BB(2):
+  if (a_5(D) > 4)
+    goto __BB4294967299;  /* { dg-error "expected valid __BB# before ';' token" } */
+  else
+    goto __BB4;
+
+__BB(3):
+  a_2 = 10;
+  goto __BB5;
+
+__BB(4):
+  a_3 = 20;
+  goto __BB5;
+
+__BB(5):
+  a_1 = __PHI (__BB3: a_2, __BB4: a_3);
+  a_4 = a_1 + 4;
+
+return;
+}
Richard Biener Dec. 8, 2024, 9:17 a.m. UTC | #5
On Sat, Dec 7, 2024 at 12:28 AM Heiko Eißfeldt <heiko@hexco.de> wrote:
>
> and here is the forgotten patch (it is late...)

Hmm.  While this might seem to be "good", the failure mode after fixing
the issue with atoi (or the alternate failure mode when using __BBB1234)
shows that we are missing handling in the GIMPLE frontend for
labels used that are not declared - lookup_label_for_goto tentatively
records a label.  I couldn't quickly find how the C frontend later discovers
used but not bound labels - in principle the GIMPLE FE would need sth
similar.

There's also missing support for indirect gotos it seems.

void foo(int x)
{
  void *l = &&lab;
lab:
  goto *l;
}

is, when in SSA form

void __GIMPLE (ssa)
foo (int x)
{
  void * gotovar.0;
  void * l;
  void * _2;

  __BB(2):
  l_1 = &lab;
  goto __BB3;

  __BB(3):
lab:
  _2 = l_1;
  goto __BB4;

  __BB(4):
  goto _2;

}

so you can see we have to support a non-__BB(N) goto even on GIMPLE
(labels shouldn't appear there - those are resolved to __BB(N)).

With your patch applied the following still ICEs in calc_dfs_tree for me
and no error is emitted:

void __GIMPLE (cfg)
foo (int x)
{
  void * gotovar;
  void * l;

  __BB(2):
lab:
  goto __BBB3;

  __BB(3):
  if (x != 0)
    goto __BB5;
  else
    goto __BB6;

  __BB(4):
  goto gotovar;

  __BB(5):
  gotovar = l;
  goto __BB4;

  __BB(6):
  return;

}

Note you picked a difficult patch - as said, the atoi fix looks OK to
me, even if
it doesn't solve downstream issues.  Can you split that out so I can
approve that
separately?

The GIMPLE FE is really not meant to parse arbitrary errorneous input - the ICEs
you run into should be viewed as error reporting (even if not exactly
helpful in this case).

That said, we should emit helpful errors for feeding
-fdump-tree-X-gimple output into
the GIMPLE FE since the dump output you get is not yet parseable without manual
editing.  "Fuzzing" attempts shouldn't be considered priority when fixing.

Thanks,
Richard.
Heiko Eißfeldt Dec. 9, 2024, 7:26 a.m. UTC | #6
On 12/8/24 10:17 AM, Richard Biener wrote:
> Note you picked a difficult patch - as said, the atoi fix looks OK to
> me, even if
> it doesn't solve downstream issues.  Can you split that out so I can
> approve that
> separately?
Thanks for your insights. I am a bit relieved that I don't have to
handle all of the fallout.

	PR c/114541
	* gimple-parser.cc (c_parser_gimple_parse_bb_spec):
	Use strtoul with ERANGE check instead of atoi to avoid UB
	and detect invalid __BB#. The full treatment of these invalid
	values was considered out of scope for this patch.

Signed-off-by: Heiko Eißfeldt <heiko@hexco.de>
diff --git a/gcc/c/gimple-parser.cc b/gcc/c/gimple-parser.cc
index 78e85d93487..b018bb6afb6 100644
--- a/gcc/c/gimple-parser.cc
+++ b/gcc/c/gimple-parser.cc
@@ -133,11 +133,21 @@ c_parser_gimple_parse_bb_spec (tree val, int *index)
 {
   if (!startswith (IDENTIFIER_POINTER (val), "__BB"))
     return false;
-  for (const char *p = IDENTIFIER_POINTER (val) + 4; *p; ++p)
-    if (!ISDIGIT (*p))
-      return false;
-  *index = atoi (IDENTIFIER_POINTER (val) + 4);
-  return *index > 0;
+
+  const char *bb = IDENTIFIER_POINTER (val) + 4;
+  if (! ISDIGIT (*bb))
+    return false;
+
+  char *pend;
+  errno = 0;
+  const unsigned long number = strtoul (bb, &pend, 10);
+  if (errno == ERANGE
+      || *pend != '\0'
+      || number > INT_MAX)
+    return false;
+
+  *index = number;
+  return true;
 }
 
 /* See if VAL is an identifier matching __BB<num> and return <num>
Richard Biener Dec. 9, 2024, 9:15 a.m. UTC | #7
On Mon, Dec 9, 2024 at 8:39 AM Heiko Eißfeldt <heiko@hexco.de> wrote:
>
> On 12/8/24 10:17 AM, Richard Biener wrote:
>
> Note you picked a difficult patch - as said, the atoi fix looks OK to
> me, even if
> it doesn't solve downstream issues.  Can you split that out so I can
> approve that
> separately?
>
> Thanks for your insights. I am a bit relieved that I don't have to
> handle all of the fallout.

Can you send the patch in git format-patch format please so that
it includes the commit message and Author tag and I can git am
for pushing?  At least you are not listed in MAINTAINERS and thus
likely do not have git commit access?

Thanks,
Richard.

> PR c/114541
> * gimple-parser.cc (c_parser_gimple_parse_bb_spec):
> Use strtoul with ERANGE check instead of atoi to avoid UB
> and detect invalid __BB#. The full treatment of these invalid
> values was considered out of scope for this patch.
>
> Signed-off-by: Heiko Eißfeldt <heiko@hexco.de>
>
Heiko Eißfeldt Dec. 9, 2024, 11:22 a.m. UTC | #8
On 12/9/24 10:15 AM, Richard Biener wrote:
> Can you send the patch in git format-patch format please so that
> it includes the commit message and Author tag and I can git am
> for pushing?  At least you are not listed in MAINTAINERS and thus
> likely do not have git commit access?
Yes, currently I do not have no commit access.
Here it comes:

	PR c/114541
	* gimple-parser.cc (c_parser_gimple_parse_bb_spec):
	Use strtoul with ERANGE check instead of atoi to avoid UB
	and detect invalid __BB#. The full treatment of these invalid
	values was considered out of scope for this patch.
Richard Biener Dec. 9, 2024, 12:17 p.m. UTC | #9
On Mon, Dec 9, 2024 at 12:34 PM Heiko Eißfeldt <heiko@hexco.de> wrote:
>
> On 12/9/24 10:15 AM, Richard Biener wrote:
> > Can you send the patch in git format-patch format please so that
> > it includes the commit message and Author tag and I can git am
> > for pushing?  At least you are not listed in MAINTAINERS and thus
> > likely do not have git commit access?
> Yes, currently I do not have no commit access.
> Here it comes:
>
>         PR c/114541
>         * gimple-parser.cc (c_parser_gimple_parse_bb_spec):
>         Use strtoul with ERANGE check instead of atoi to avoid UB
>         and detect invalid __BB#. The full treatment of these invalid
>         values was considered out of scope for this patch.

Pushed.  I've edited the above ChangeLog entry into the commit message
as follows.

From: Heiko Eißfeldt <heiko@hexco.de>

The full treatment of these invalid values was considered out of
scope for this patch.

        PR c/114541
        * gimple-parser.cc (c_parser_gimple_parse_bb_spec):
        Use strtoul with ERANGE check instead of atoi to avoid UB
        and detect invalid __BB#.

Signed-off-by: Heiko Eißfeldt <heiko@hexco.de>
---
 gcc/c/gimple-parser.cc | 20 +++++++++++++++-----
...
diff mbox series

Patch

diff --git a/gcc/c/gimple-parser.cc b/gcc/c/gimple-parser.cc
index 78e85d93487..21631fa02f5 100644
--- a/gcc/c/gimple-parser.cc
+++ b/gcc/c/gimple-parser.cc
@@ -133,11 +133,21 @@  c_parser_gimple_parse_bb_spec (tree val, int *index)
 {
   if (!startswith (IDENTIFIER_POINTER (val), "__BB"))
     return false;
-  for (const char *p = IDENTIFIER_POINTER (val) + 4; *p; ++p)
-    if (!ISDIGIT (*p))
-      return false;
-  *index = atoi (IDENTIFIER_POINTER (val) + 4);
-  return *index > 0;
+
+  const char *bb = IDENTIFIER_POINTER (val) + 4;
+  if (! ISDIGIT (*bb))
+    return false;
+
+  char *pend;
+  errno = 0;
+  const unsigned long number = strtoul (bb, &pend, 10);
+  if (errno == ERANGE
+      || *pend != '\0'
+      || number > INT_MAX)
+    return false;
+
+  *index = number;
+  return true;
 }
 
 /* See if VAL is an identifier matching __BB<num> and return <num>
@@ -250,11 +260,18 @@  c_parser_parse_gimple_body (c_parser *cparser, char *gimple_pass,
 	}
     }
 
-  if (! c_parser_gimple_compound_statement (parser, &seq)
-      && cdil == cdil_gimple)
+  if (! c_parser_gimple_compound_statement (parser, &seq))
     {
-      gimple *ret = gimple_build_return (NULL);
-      gimple_seq_add_stmt_without_update (&seq, ret);
+      if (cdil == cdil_gimple)
+	{
+	  gimple *ret = gimple_build_return (NULL);
+	  gimple_seq_add_stmt_without_update (&seq, ret);
+	}
+      else
+	{
+	  /* in case of syntax errors abort early */
+	  return;
+	}
     }
 
   tree block = pop_scope ();
diff --git a/gcc/testsuite/gcc.dg/pr114541-else-BB#-and-garbagechar.c b/gcc/testsuite/gcc.dg/pr114541-else-BB#-and-garbagechar.c
new file mode 100644
index 00000000000..4cb990f4423
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114541-else-BB#-and-garbagechar.c
@@ -0,0 +1,29 @@ 
+/* PR middle-end/114541 */
+/* { dg-do compile } */
+/* { dg-options "-fgimple" } */
+/* { dg-bogus "internal compiler error" } */
+
+void __GIMPLE (ssa,startwith ("dse2")) foo ()
+{
+  int a;
+
+__BB(2):
+  if (a_5(D) > 4)
+    goto __BB3;
+  else
+    goto __BB4&; /* { dg-error "expected ';' before" } */
+
+__BB(3):
+  a_2 = 10;
+  goto __BB5;
+
+__BB(4):
+  a_3 = 20;
+  goto __BB5;
+
+__BB(5):
+  a_1 = __PHI (__BB3: a_2, __BB4: a_3);
+  a_4 = a_1 + 4;
+
+return;
+}
diff --git a/gcc/testsuite/gcc.dg/pr114541-then-BB#-and-garbagechar.c b/gcc/testsuite/gcc.dg/pr114541-then-BB#-and-garbagechar.c
new file mode 100644
index 00000000000..bcb6a937283
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114541-then-BB#-and-garbagechar.c
@@ -0,0 +1,29 @@ 
+/* PR middle-end/114541 */
+/* { dg-do compile } */
+/* { dg-options "-fgimple" } */
+/* { dg-bogus "internal compiler error" } */
+
+void __GIMPLE (ssa,startwith ("dse2")) foo ()
+{
+  int a;
+
+__BB(2):
+  if (a_5(D) > 4)
+    goto __BB3&; /* { dg-error "expected ';' before" } */
+  else
+    goto __BB4;
+
+__BB(3):
+  a_2 = 10;
+  goto __BB5;
+
+__BB(4):
+  a_3 = 20;
+  goto __BB5;
+
+__BB(5):
+  a_1 = __PHI (__BB3: a_2, __BB4: a_3);
+  a_4 = a_1 + 4;
+
+return;
+}