Message ID | 8b3ea7a3-9c7d-4906-bc0b-59b1c5eba45c@hexco.de |
---|---|
State | New |
Headers | show |
Series | Invalid gimple __BB# accepted due to usage of atoi -> replace atoi with stroul in c_parser_gimple_parse_bb_spec [PR114541] | expand |
On Thu, Dec 5, 2024 at 1:55 AM Heiko Eißfeldt <heiko@hexco.de> wrote: > > As commented in PR114541 here is a first patch that > 1. replaces atoi() with strtoul() with ERANGE checking and > 2. fixes the handling of misparsed gimple compounds to return early. > 3. adds two new test cases. > > There is more work to do for Andrews testcase to succeed, so PR114541 > is not done yet. > > === > > Replace atoi() with strtoul() with ERANGE checking. > > The function c_parser_gimple_parse_bb_spec uses atoi, > which can silently return valid numbers even for > some too large numbers in the string. > > Furthermore in function c_parser_parse_gimple_body > handle the case of gimple compound statement errors > more generically. In the case of cdil != cdil_gimple > now consider them as errors and return early. > This avoids further processing with erroneous data. c_parser_gimple_compound_statement returns whether the compound statement ended with a return statement, not whether there was an error, so this change looks wrong. The hunk in c_parser_gimple_parse_bb_spec is OK. Richard. > 2024-12-05 Heiko Eißfeldt <heiko@hexco.de> > > PR c/114541 > * gimple-parser.cc (c_parser_gimple_parse_bb_spec): > Use strtoul with ERANGE check instead of atoi > > * gimple-parser.cc (c_parser_parse_gimple_body): > separate check for errors in c_parser_gimple_compound_statement > and special handling of cdil == cdil_gimple to allow > a return in case of errors for cdil != cdil_gimple > > * gcc.dg/pr114541-else-BB#-and-garbagechar.c: New test. > * gcc.dg/pr114541-then-BB#-and-garbagechar.c: New test. > > > Signed-off-by: Heiko Eißfeldt <heiko@hexco.de> > > > >
On 12/5/24 8:45 AM, Richard Biener wrote: > c_parser_gimple_compound_statement returns whether the > compound statement ended with a return statement, not > whether there was an error, so this change looks wrong. Thanks, I will have to dig deeper to understand error handling then. Heiko
2nd try, 1. replaces atoi() with strtoul() with ERANGE checking (as before) 2. fixes the handling of misparsed 'bb_spec's in c_parser_gimple_if_stmt to return early. 3. adds a new test case. I hope I am wright with the assumption that in c_parser_gimple_if_stmt (cfun->curr_properties & PROP_cfg) should imply valid bb_spec's after goto. PR c/114541 * gimple-parser.cc (c_parser_gimple_parse_bb_spec): Use strtoul with ERANGE check instead of atoi to avoid UB * gimple-parser.cc (c_parser_gimple_if_stmt): require valid __BB# basic block indices after goto in both branches otherwise return with c_parser_error * gcc.dg/pr114541Andrew.c: New test based on Andrew's template in the PR. Signed-off-by: Heiko Eißfeldt <heiko@hexco.de> On 12/5/24 8:45 AM, Richard Biener wrote: > On Thu, Dec 5, 2024 at 1:55 AM Heiko Eißfeldt<heiko@hexco.de> wrote: >> As commented in PR114541 here is a first patch that >> 1. replaces atoi() with strtoul() with ERANGE checking and >> 2. fixes the handling of misparsed gimple compounds to return early. >> 3. adds two new test cases. >> >> There is more work to do for Andrews testcase to succeed, so PR114541 >> is not done yet. >> >> === >> >> Replace atoi() with strtoul() with ERANGE checking. >> >> The function c_parser_gimple_parse_bb_spec uses atoi, >> which can silently return valid numbers even for >> some too large numbers in the string. >> >> Furthermore in function c_parser_parse_gimple_body >> handle the case of gimple compound statement errors >> more generically. In the case of cdil != cdil_gimple >> now consider them as errors and return early. >> This avoids further processing with erroneous data. > c_parser_gimple_compound_statement returns whether the > compound statement ended with a return statement, not > whether there was an error, so this change looks wrong. > > The hunk in c_parser_gimple_parse_bb_spec is OK. > > Richard. > >> 2024-12-05 Heiko Eißfeldt<heiko@hexco.de> >> >> PR c/114541 >> * gimple-parser.cc (c_parser_gimple_parse_bb_spec): >> Use strtoul with ERANGE check instead of atoi >> >> * gimple-parser.cc (c_parser_parse_gimple_body): >> separate check for errors in c_parser_gimple_compound_statement >> and special handling of cdil == cdil_gimple to allow >> a return in case of errors for cdil != cdil_gimple >> >> * gcc.dg/pr114541-else-BB#-and-garbagechar.c: New test. >> * gcc.dg/pr114541-then-BB#-and-garbagechar.c: New test. >> >> >> Signed-off-by: Heiko Eißfeldt<heiko@hexco.de> >> >> >> >>
and here is the forgotten patch (it is late...) diff --git a/gcc/c/gimple-parser.cc b/gcc/c/gimple-parser.cc index 78e85d93487..b018bb6afb6 100644 --- a/gcc/c/gimple-parser.cc +++ b/gcc/c/gimple-parser.cc @@ -133,11 +133,21 @@ c_parser_gimple_parse_bb_spec (tree val, int *index) { if (!startswith (IDENTIFIER_POINTER (val), "__BB")) return false; - for (const char *p = IDENTIFIER_POINTER (val) + 4; *p; ++p) - if (!ISDIGIT (*p)) - return false; - *index = atoi (IDENTIFIER_POINTER (val) + 4); - return *index > 0; + + const char *bb = IDENTIFIER_POINTER (val) + 4; + if (! ISDIGIT (*bb)) + return false; + + char *pend; + errno = 0; + const unsigned long number = strtoul (bb, &pend, 10); + if (errno == ERANGE + || *pend != '\0' + || number > INT_MAX) + return false; + + *index = number; + return true; } /* See if VAL is an identifier matching __BB<num> and return <num> @@ -2384,11 +2394,20 @@ c_parser_gimple_if_stmt (gimple_parser &parser, gimple_seq *seq) c_parser_consume_token (parser); int dest_index; profile_probability prob; - if ((cfun->curr_properties & PROP_cfg) - && c_parser_gimple_parse_bb_spec_edge_probability (label, parser, - &dest_index, &prob)) - parser.push_edge (parser.current_bb->index, dest_index, - EDGE_TRUE_VALUE, prob); + if (cfun->curr_properties & PROP_cfg) + { + if (c_parser_gimple_parse_bb_spec_edge_probability (label, parser, + &dest_index, &prob)) + { + parser.push_edge (parser.current_bb->index, dest_index, + EDGE_TRUE_VALUE, prob); + } + else + { + c_parser_error (parser, "expected valid __BB#"); + return; + } + } else t_label = lookup_label_for_goto (loc, label); if (! c_parser_require (parser, CPP_SEMICOLON, "expected %<;%>")) @@ -2421,11 +2440,20 @@ c_parser_gimple_if_stmt (gimple_parser &parser, gimple_seq *seq) c_parser_consume_token (parser); int dest_index; profile_probability prob; - if ((cfun->curr_properties & PROP_cfg) - && c_parser_gimple_parse_bb_spec_edge_probability (label, parser, - &dest_index, &prob)) - parser.push_edge (parser.current_bb->index, dest_index, - EDGE_FALSE_VALUE, prob); + if (cfun->curr_properties & PROP_cfg) + { + if (c_parser_gimple_parse_bb_spec_edge_probability (label, parser, + &dest_index, &prob)) + { + parser.push_edge (parser.current_bb->index, dest_index, + EDGE_FALSE_VALUE, prob); + } + else + { + c_parser_error (parser, "expected valid __BB#"); + return; + } + } else f_label = lookup_label_for_goto (loc, label); if (! c_parser_require (parser, CPP_SEMICOLON, "expected %<;%>")) diff --git a/gcc/testsuite/gcc.dg/pr114541Andrew.c b/gcc/testsuite/gcc.dg/pr114541Andrew.c new file mode 100644 index 00000000000..bc92473a79e --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr114541Andrew.c @@ -0,0 +1,28 @@ +/* PR middle-end/114541 */ +/* { dg-do compile } */ +/* { dg-options "-O -fgimple -c" } */ + +void __GIMPLE (ssa,startwith ("dse2")) foo () +{ + int a; + +__BB(2): + if (a_5(D) > 4) + goto __BB4294967299; /* { dg-error "expected valid __BB# before ';' token" } */ + else + goto __BB4; + +__BB(3): + a_2 = 10; + goto __BB5; + +__BB(4): + a_3 = 20; + goto __BB5; + +__BB(5): + a_1 = __PHI (__BB3: a_2, __BB4: a_3); + a_4 = a_1 + 4; + +return; +}
On Sat, Dec 7, 2024 at 12:28 AM Heiko Eißfeldt <heiko@hexco.de> wrote: > > and here is the forgotten patch (it is late...) Hmm. While this might seem to be "good", the failure mode after fixing the issue with atoi (or the alternate failure mode when using __BBB1234) shows that we are missing handling in the GIMPLE frontend for labels used that are not declared - lookup_label_for_goto tentatively records a label. I couldn't quickly find how the C frontend later discovers used but not bound labels - in principle the GIMPLE FE would need sth similar. There's also missing support for indirect gotos it seems. void foo(int x) { void *l = &&lab; lab: goto *l; } is, when in SSA form void __GIMPLE (ssa) foo (int x) { void * gotovar.0; void * l; void * _2; __BB(2): l_1 = &lab; goto __BB3; __BB(3): lab: _2 = l_1; goto __BB4; __BB(4): goto _2; } so you can see we have to support a non-__BB(N) goto even on GIMPLE (labels shouldn't appear there - those are resolved to __BB(N)). With your patch applied the following still ICEs in calc_dfs_tree for me and no error is emitted: void __GIMPLE (cfg) foo (int x) { void * gotovar; void * l; __BB(2): lab: goto __BBB3; __BB(3): if (x != 0) goto __BB5; else goto __BB6; __BB(4): goto gotovar; __BB(5): gotovar = l; goto __BB4; __BB(6): return; } Note you picked a difficult patch - as said, the atoi fix looks OK to me, even if it doesn't solve downstream issues. Can you split that out so I can approve that separately? The GIMPLE FE is really not meant to parse arbitrary errorneous input - the ICEs you run into should be viewed as error reporting (even if not exactly helpful in this case). That said, we should emit helpful errors for feeding -fdump-tree-X-gimple output into the GIMPLE FE since the dump output you get is not yet parseable without manual editing. "Fuzzing" attempts shouldn't be considered priority when fixing. Thanks, Richard.
On 12/8/24 10:17 AM, Richard Biener wrote: > Note you picked a difficult patch - as said, the atoi fix looks OK to > me, even if > it doesn't solve downstream issues. Can you split that out so I can > approve that > separately? Thanks for your insights. I am a bit relieved that I don't have to handle all of the fallout. PR c/114541 * gimple-parser.cc (c_parser_gimple_parse_bb_spec): Use strtoul with ERANGE check instead of atoi to avoid UB and detect invalid __BB#. The full treatment of these invalid values was considered out of scope for this patch. Signed-off-by: Heiko Eißfeldt <heiko@hexco.de> diff --git a/gcc/c/gimple-parser.cc b/gcc/c/gimple-parser.cc index 78e85d93487..b018bb6afb6 100644 --- a/gcc/c/gimple-parser.cc +++ b/gcc/c/gimple-parser.cc @@ -133,11 +133,21 @@ c_parser_gimple_parse_bb_spec (tree val, int *index) { if (!startswith (IDENTIFIER_POINTER (val), "__BB")) return false; - for (const char *p = IDENTIFIER_POINTER (val) + 4; *p; ++p) - if (!ISDIGIT (*p)) - return false; - *index = atoi (IDENTIFIER_POINTER (val) + 4); - return *index > 0; + + const char *bb = IDENTIFIER_POINTER (val) + 4; + if (! ISDIGIT (*bb)) + return false; + + char *pend; + errno = 0; + const unsigned long number = strtoul (bb, &pend, 10); + if (errno == ERANGE + || *pend != '\0' + || number > INT_MAX) + return false; + + *index = number; + return true; } /* See if VAL is an identifier matching __BB<num> and return <num>
On Mon, Dec 9, 2024 at 8:39 AM Heiko Eißfeldt <heiko@hexco.de> wrote: > > On 12/8/24 10:17 AM, Richard Biener wrote: > > Note you picked a difficult patch - as said, the atoi fix looks OK to > me, even if > it doesn't solve downstream issues. Can you split that out so I can > approve that > separately? > > Thanks for your insights. I am a bit relieved that I don't have to > handle all of the fallout. Can you send the patch in git format-patch format please so that it includes the commit message and Author tag and I can git am for pushing? At least you are not listed in MAINTAINERS and thus likely do not have git commit access? Thanks, Richard. > PR c/114541 > * gimple-parser.cc (c_parser_gimple_parse_bb_spec): > Use strtoul with ERANGE check instead of atoi to avoid UB > and detect invalid __BB#. The full treatment of these invalid > values was considered out of scope for this patch. > > Signed-off-by: Heiko Eißfeldt <heiko@hexco.de> >
On 12/9/24 10:15 AM, Richard Biener wrote: > Can you send the patch in git format-patch format please so that > it includes the commit message and Author tag and I can git am > for pushing? At least you are not listed in MAINTAINERS and thus > likely do not have git commit access? Yes, currently I do not have no commit access. Here it comes: PR c/114541 * gimple-parser.cc (c_parser_gimple_parse_bb_spec): Use strtoul with ERANGE check instead of atoi to avoid UB and detect invalid __BB#. The full treatment of these invalid values was considered out of scope for this patch.
On Mon, Dec 9, 2024 at 12:34 PM Heiko Eißfeldt <heiko@hexco.de> wrote: > > On 12/9/24 10:15 AM, Richard Biener wrote: > > Can you send the patch in git format-patch format please so that > > it includes the commit message and Author tag and I can git am > > for pushing? At least you are not listed in MAINTAINERS and thus > > likely do not have git commit access? > Yes, currently I do not have no commit access. > Here it comes: > > PR c/114541 > * gimple-parser.cc (c_parser_gimple_parse_bb_spec): > Use strtoul with ERANGE check instead of atoi to avoid UB > and detect invalid __BB#. The full treatment of these invalid > values was considered out of scope for this patch. Pushed. I've edited the above ChangeLog entry into the commit message as follows. From: Heiko Eißfeldt <heiko@hexco.de> The full treatment of these invalid values was considered out of scope for this patch. PR c/114541 * gimple-parser.cc (c_parser_gimple_parse_bb_spec): Use strtoul with ERANGE check instead of atoi to avoid UB and detect invalid __BB#. Signed-off-by: Heiko Eißfeldt <heiko@hexco.de> --- gcc/c/gimple-parser.cc | 20 +++++++++++++++----- ...
diff --git a/gcc/c/gimple-parser.cc b/gcc/c/gimple-parser.cc index 78e85d93487..21631fa02f5 100644 --- a/gcc/c/gimple-parser.cc +++ b/gcc/c/gimple-parser.cc @@ -133,11 +133,21 @@ c_parser_gimple_parse_bb_spec (tree val, int *index) { if (!startswith (IDENTIFIER_POINTER (val), "__BB")) return false; - for (const char *p = IDENTIFIER_POINTER (val) + 4; *p; ++p) - if (!ISDIGIT (*p)) - return false; - *index = atoi (IDENTIFIER_POINTER (val) + 4); - return *index > 0; + + const char *bb = IDENTIFIER_POINTER (val) + 4; + if (! ISDIGIT (*bb)) + return false; + + char *pend; + errno = 0; + const unsigned long number = strtoul (bb, &pend, 10); + if (errno == ERANGE + || *pend != '\0' + || number > INT_MAX) + return false; + + *index = number; + return true; } /* See if VAL is an identifier matching __BB<num> and return <num> @@ -250,11 +260,18 @@ c_parser_parse_gimple_body (c_parser *cparser, char *gimple_pass, } } - if (! c_parser_gimple_compound_statement (parser, &seq) - && cdil == cdil_gimple) + if (! c_parser_gimple_compound_statement (parser, &seq)) { - gimple *ret = gimple_build_return (NULL); - gimple_seq_add_stmt_without_update (&seq, ret); + if (cdil == cdil_gimple) + { + gimple *ret = gimple_build_return (NULL); + gimple_seq_add_stmt_without_update (&seq, ret); + } + else + { + /* in case of syntax errors abort early */ + return; + } } tree block = pop_scope (); diff --git a/gcc/testsuite/gcc.dg/pr114541-else-BB#-and-garbagechar.c b/gcc/testsuite/gcc.dg/pr114541-else-BB#-and-garbagechar.c new file mode 100644 index 00000000000..4cb990f4423 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr114541-else-BB#-and-garbagechar.c @@ -0,0 +1,29 @@ +/* PR middle-end/114541 */ +/* { dg-do compile } */ +/* { dg-options "-fgimple" } */ +/* { dg-bogus "internal compiler error" } */ + +void __GIMPLE (ssa,startwith ("dse2")) foo () +{ + int a; + +__BB(2): + if (a_5(D) > 4) + goto __BB3; + else + goto __BB4&; /* { dg-error "expected ';' before" } */ + +__BB(3): + a_2 = 10; + goto __BB5; + +__BB(4): + a_3 = 20; + goto __BB5; + +__BB(5): + a_1 = __PHI (__BB3: a_2, __BB4: a_3); + a_4 = a_1 + 4; + +return; +} diff --git a/gcc/testsuite/gcc.dg/pr114541-then-BB#-and-garbagechar.c b/gcc/testsuite/gcc.dg/pr114541-then-BB#-and-garbagechar.c new file mode 100644 index 00000000000..bcb6a937283 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr114541-then-BB#-and-garbagechar.c @@ -0,0 +1,29 @@ +/* PR middle-end/114541 */ +/* { dg-do compile } */ +/* { dg-options "-fgimple" } */ +/* { dg-bogus "internal compiler error" } */ + +void __GIMPLE (ssa,startwith ("dse2")) foo () +{ + int a; + +__BB(2): + if (a_5(D) > 4) + goto __BB3&; /* { dg-error "expected ';' before" } */ + else + goto __BB4; + +__BB(3): + a_2 = 10; + goto __BB5; + +__BB(4): + a_3 = 20; + goto __BB5; + +__BB(5): + a_1 = __PHI (__BB3: a_2, __BB4: a_3); + a_4 = a_1 + 4; + +return; +}
As commented in PR114541 here is a first patch that 1. replaces atoi() with strtoul() with ERANGE checking and 2. fixes the handling of misparsed gimple compounds to return early. 3. adds two new test cases. There is more work to do for Andrews testcase to succeed, so PR114541 is not done yet. === Replace atoi() with strtoul() with ERANGE checking. The function c_parser_gimple_parse_bb_spec uses atoi, which can silently return valid numbers even for some too large numbers in the string. Furthermore in function c_parser_parse_gimple_body handle the case of gimple compound statement errors more generically. In the case of cdil != cdil_gimple now consider them as errors and return early. This avoids further processing with erroneous data. 2024-12-05 Heiko Eißfeldt <heiko@hexco.de> PR c/114541 * gimple-parser.cc (c_parser_gimple_parse_bb_spec): Use strtoul with ERANGE check instead of atoi * gimple-parser.cc (c_parser_parse_gimple_body): separate check for errors in c_parser_gimple_compound_statement and special handling of cdil == cdil_gimple to allow a return in case of errors for cdil != cdil_gimple * gcc.dg/pr114541-else-BB#-and-garbagechar.c: New test. * gcc.dg/pr114541-then-BB#-and-garbagechar.c: New test. Signed-off-by: Heiko Eißfeldt <heiko@hexco.de>