diff mbox

[3/3] Improve -Wmissing-indentation heuristics

Message ID 1433871067-30661-3-git-send-email-patrick@parcs.ath.cx
State New
Headers show

Commit Message

Patrick Palka June 9, 2015, 5:31 p.m. UTC
This patch improves the heuristics of the warning in a number of ways.
The improvements are hopefully adequately documented in the code
comments.

The additions to the test case also highlight the improvements.

I tested an earlier version of this patch on more than a dozen C code
bases.  I only found one class of bogus warnings yet emitted, in the
libpng and bdwgc projects.  These projects have a coding style which
indents code inside #ifdefs as if this code was guarded by an if(), e.g.

  if (foo != 0)
    x = 10;
  else       // GUARD
    y = 100; // BODY

  #ifdef BAR
    blah ();  // NEXT
  #endif

These bogus warnings are pre-existing, however (i.e. not caused by this
patch).

gcc/c-family/ChangeLog:

	* c-indentation.c (should_warn_for_misleading_indentation):
	Improve heuristics.

gcc/testsuite/ChangeLog:

	* c-c++-common/Wmisleading-indentation.c: Add more tests.
---
 gcc/c-family/c-indentation.c                       | 167 ++++++++++++++++++---
 .../c-c++-common/Wmisleading-indentation.c         | 166 ++++++++++++++++++++
 2 files changed, 313 insertions(+), 20 deletions(-)

Comments

David Malcolm June 18, 2015, 4:31 p.m. UTC | #1
On Tue, 2015-06-09 at 13:31 -0400, Patrick Palka wrote:
> This patch improves the heuristics of the warning in a number of ways.
> The improvements are hopefully adequately documented in the code
> comments.
> 
> The additions to the test case also highlight the improvements.
> 
> I tested an earlier version of this patch on more than a dozen C code
> bases.  I only found one class of bogus warnings yet emitted, in the
> libpng and bdwgc projects.  These projects have a coding style which
> indents code inside #ifdefs as if this code was guarded by an if(), e.g.
> 
>   if (foo != 0)
>     x = 10;
>   else       // GUARD
>     y = 100; // BODY
> 
>   #ifdef BAR
>     blah ();  // NEXT
>   #endif

We have detect_preprocessor_logic which suppresses warnings when there's
preprocessor logic between BODY_EXPLOC and NEXT_STMT_EXPLOC, for cases
like this:

	if (flagA)
	  foo ();
	  ^ BODY_EXPLOC
      #if SOME_CONDITION_THAT_DOES_NOT_HOLD
	if (flagB)
      #endif
	  bar ();
	  ^ NEXT_STMT_EXPLOC

This currently requires that it be multiline preprocessor logic: there
must be 3 or more lines between BODY_EXPLOC and NEXT_STMT_EXPLOC for
this rejection heuristic to fire.

Perhaps we could tweak or reuse that heuristic, perhaps if there's an
entirely blank (or all whitespace) line separating them (given that this
warning is all about the "look" of the code).

> These bogus warnings are pre-existing, however (i.e. not caused by this
> patch).

(nods)   Fixing the false positives from libpng/bdwgc sounds like a
separate issue and thus a separate patch then.

[...snip...]
Patrick Palka June 18, 2015, 4:57 p.m. UTC | #2
On Thu, Jun 18, 2015 at 12:31 PM, David Malcolm <dmalcolm@redhat.com> wrote:
> On Tue, 2015-06-09 at 13:31 -0400, Patrick Palka wrote:
>> This patch improves the heuristics of the warning in a number of ways.
>> The improvements are hopefully adequately documented in the code
>> comments.
>>
>> The additions to the test case also highlight the improvements.
>>
>> I tested an earlier version of this patch on more than a dozen C code
>> bases.  I only found one class of bogus warnings yet emitted, in the
>> libpng and bdwgc projects.  These projects have a coding style which
>> indents code inside #ifdefs as if this code was guarded by an if(), e.g.
>>
>>   if (foo != 0)
>>     x = 10;
>>   else       // GUARD
>>     y = 100; // BODY
>>
>>   #ifdef BAR
>>     blah ();  // NEXT
>>   #endif
>
> We have detect_preprocessor_logic which suppresses warnings when there's
> preprocessor logic between BODY_EXPLOC and NEXT_STMT_EXPLOC, for cases
> like this:
>
>         if (flagA)
>           foo ();
>           ^ BODY_EXPLOC
>       #if SOME_CONDITION_THAT_DOES_NOT_HOLD
>         if (flagB)
>       #endif
>           bar ();
>           ^ NEXT_STMT_EXPLOC
>
> This currently requires that it be multiline preprocessor logic: there
> must be 3 or more lines between BODY_EXPLOC and NEXT_STMT_EXPLOC for
> this rejection heuristic to fire.

Oh I now see why it requires 3 or more lines: one line each for the
#if, #endif and for the

>
> Perhaps we could tweak or reuse that heuristic, perhaps if there's an
> entirely blank (or all whitespace) line separating them (given that this
> warning is all about the "look" of the code).

That makes sense.  What about just checking in
detect_preprocessor_logic() if there is > 1 line (instead of >= 3
lines) between the body and the next statement?  When that's the case,
then whatever is in between the start of the body must either be more
of the body (if it's a multi-line compound statement) or whitespace.
In either case we should not warn if the next statement is aligned
with the body.  Yet we will still rightfully warn on the following
code:

    if (foo)  // GUARD
      bar ();  // BODY
    #ifdef BAZ
      baz ();  // NEXT
    #endif

because there is just one line between the body and the next
statement. The user can add a line between the body and the next
statement to suppress the warning if it's bogus.

>
>> These bogus warnings are pre-existing, however (i.e. not caused by this
>> patch).
>
> (nods)   Fixing the false positives from libpng/bdwgc sounds like a
> separate issue and thus a separate patch then.
>
> [...snip...]
>
>
Patrick Palka June 18, 2015, 5:10 p.m. UTC | #3
On Thu, Jun 18, 2015 at 12:57 PM, Patrick Palka <patrick@parcs.ath.cx> wrote:
> On Thu, Jun 18, 2015 at 12:31 PM, David Malcolm <dmalcolm@redhat.com> wrote:
>> On Tue, 2015-06-09 at 13:31 -0400, Patrick Palka wrote:
>>> This patch improves the heuristics of the warning in a number of ways.
>>> The improvements are hopefully adequately documented in the code
>>> comments.
>>>
>>> The additions to the test case also highlight the improvements.
>>>
>>> I tested an earlier version of this patch on more than a dozen C code
>>> bases.  I only found one class of bogus warnings yet emitted, in the
>>> libpng and bdwgc projects.  These projects have a coding style which
>>> indents code inside #ifdefs as if this code was guarded by an if(), e.g.
>>>
>>>   if (foo != 0)
>>>     x = 10;
>>>   else       // GUARD
>>>     y = 100; // BODY
>>>
>>>   #ifdef BAR
>>>     blah ();  // NEXT
>>>   #endif
>>
>> We have detect_preprocessor_logic which suppresses warnings when there's
>> preprocessor logic between BODY_EXPLOC and NEXT_STMT_EXPLOC, for cases
>> like this:
>>
>>         if (flagA)
>>           foo ();
>>           ^ BODY_EXPLOC
>>       #if SOME_CONDITION_THAT_DOES_NOT_HOLD
>>         if (flagB)
>>       #endif
>>           bar ();
>>           ^ NEXT_STMT_EXPLOC
>>
>> This currently requires that it be multiline preprocessor logic: there
>> must be 3 or more lines between BODY_EXPLOC and NEXT_STMT_EXPLOC for
>> this rejection heuristic to fire.
>
> Oh I now see why it requires 3 or more lines: one line each for the
> #if, #endif and for the
>
>>
>> Perhaps we could tweak or reuse that heuristic, perhaps if there's an
>> entirely blank (or all whitespace) line separating them (given that this
>> warning is all about the "look" of the code).
>
> That makes sense.  What about just checking in
> detect_preprocessor_logic() if there is > 1 line (instead of >= 3
> lines) between the body and the next statement?  When that's the case,
> then whatever is in between the start of the body must either be more
> of the body (if it's a multi-line compound statement) or whitespace.
> In either case we should not warn if the next statement is aligned
> with the body.

Actually, the body cannot be a compound statement because (with this
patch) we have already bailed out in that case.  So if there is > 1
line between the body and the next statement then there must be
whitespace... unless there are just more #ifdefs, e.g.

    if (foo)
      bar ();
    #ifdef A
    #ifdef B
      baz ();
    #endif
    #endif

We would want to warn here even though there are 2 lines, not 1 line,
in between the body and the next statement.  So we still have to
directly check for a whitespace line,  I think.

> Yet we will still rightfully warn on the following
> code:

>
>     if (foo)  // GUARD
>       bar ();  // BODY
>     #ifdef BAZ
>       baz ();  // NEXT
>     #endif
>
> because there is just one line between the body and the next
> statement. The user can add a line between the body and the next
> statement to suppress the warning if it's bogus.

I meant to say, a (empty) line between the body and the #ifdef.

>
>>
>>> These bogus warnings are pre-existing, however (i.e. not caused by this
>>> patch).
>>
>> (nods)   Fixing the false positives from libpng/bdwgc sounds like a
>> separate issue and thus a separate patch then.
>>
>> [...snip...]
>>
>>
Jeff Law June 22, 2015, 4:37 p.m. UTC | #4
On 06/18/2015 10:57 AM, Patrick Palka wrote:
[ big snip ]
>>
>>> These bogus warnings are pre-existing, however (i.e. not caused by this
>>> patch).
>>
>> (nods)   Fixing the false positives from libpng/bdwgc sounds like a
>> separate issue and thus a separate patch then.
Agreed.

jeff
Jeff Law June 22, 2015, 5:37 p.m. UTC | #5
On 06/09/2015 11:31 AM, Patrick Palka wrote:
> This patch improves the heuristics of the warning in a number of ways.
> The improvements are hopefully adequately documented in the code
> comments.
>
> The additions to the test case also highlight the improvements.
>
> I tested an earlier version of this patch on more than a dozen C code
> bases.  I only found one class of bogus warnings yet emitted, in the
> libpng and bdwgc projects.  These projects have a coding style which
> indents code inside #ifdefs as if this code was guarded by an if(), e.g.
>
>    if (foo != 0)
>      x = 10;
>    else       // GUARD
>      y = 100; // BODY
>
>    #ifdef BAR
>      blah ();  // NEXT
>    #endif
>
> These bogus warnings are pre-existing, however (i.e. not caused by this
> patch).
>
> gcc/c-family/ChangeLog:
>
> 	* c-indentation.c (should_warn_for_misleading_indentation):
> 	Improve heuristics.
>
> gcc/testsuite/ChangeLog:
>
> 	* c-c++-common/Wmisleading-indentation.c: Add more tests.
OK after confirming a successful bootstrap & regression test.

jeff
diff mbox

Patch

diff --git a/gcc/c-family/c-indentation.c b/gcc/c-family/c-indentation.c
index f43aee6..8de3bc5 100644
--- a/gcc/c-family/c-indentation.c
+++ b/gcc/c-family/c-indentation.c
@@ -185,6 +185,7 @@  should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo,
   location_t body_loc = body_tinfo.location;
   location_t next_stmt_loc = next_tinfo.location;
 
+  enum cpp_ttype body_type = body_tinfo.type;
   enum cpp_ttype next_tok_type = next_tinfo.type;
 
   /* Don't attempt to compare the indentation of BODY_LOC and NEXT_STMT_LOC
@@ -230,12 +231,33 @@  should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo,
       || next_tinfo.keyword == RID_ELSE)
     return false;
 
+  /* Likewise, if the body of the guard is a compound statement then control
+     flow is quite visually explicit regardless of the code's possibly poor
+     indentation, e.g.
+
+     while (foo)  <- GUARD
+       {          <- BODY
+       bar ();
+       }
+       baz ();    <- NEXT
+
+    Things only get muddy when the body of the guard does not have
+    braces, e.g.
+
+    if (foo)  <- GUARD
+      bar (); <- BODY
+      baz (); <- NEXT
+  */
+  if (body_type == CPP_OPEN_BRACE)
+    return false;
+
   /* Don't warn here about spurious semicolons.  */
   if (next_tok_type == CPP_SEMICOLON)
     return false;
 
   expanded_location body_exploc = expand_location (body_loc);
   expanded_location next_stmt_exploc = expand_location (next_stmt_loc);
+  expanded_location guard_exploc = expand_location (guard_loc);
 
   /* They must be in the same file.  */
   if (next_stmt_exploc.file != body_exploc.file)
@@ -253,13 +275,20 @@  should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo,
        if (flag) foo (); bar ();
                          ^ WARN HERE
 
+
+       if (flag) ; {
+                   ^ WARN HERE
+
+       if (flag)
+        ; {
+          ^ WARN HERE
+
      Cases where we don't want to issue a warning:
 
        various_code (); if (flag) foo (); bar (); more_code ();
                                           ^ DON'T WARN HERE.  */
   if (next_stmt_exploc.line == body_exploc.line)
     {
-      expanded_location guard_exploc = expand_location (guard_loc);
       if (guard_exploc.file != body_exploc.file)
 	return true;
       if (guard_exploc.line < body_exploc.line)
@@ -307,14 +336,10 @@  should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo,
 	  bar ();
 	  ^ DON'T WARN HERE
 
-        if (flag) {
-          foo ();
-        } else
-        {
-          bar ();
-        }
-        baz ();
-        ^ DON'T WARN HERE
+	if (flag)
+	  ;
+	  foo ();
+	  ^ DON'T WARN HERE
   */
   if (next_stmt_exploc.line > body_exploc.line)
     {
@@ -322,29 +347,84 @@  should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo,
 	 "visual column"...  */
       unsigned int next_stmt_vis_column;
       unsigned int body_vis_column;
+      unsigned int body_line_first_nws;
       /* If we can't determine it, don't issue a warning.  This is sometimes
 	 the case for input files containing #line directives, and these
 	 are often for autogenerated sources (e.g. from .md files), where
 	 it's not clear that it's meaningful to look at indentation.  */
       if (!get_visual_column (next_stmt_exploc, &next_stmt_vis_column))
 	return false;
-      if (!get_visual_column (body_exploc, &body_vis_column))
+      if (!get_visual_column (body_exploc,
+			      &body_vis_column,
+			      &body_line_first_nws))
 	return false;
-      if (next_stmt_vis_column == body_vis_column)
+      if ((body_type != CPP_SEMICOLON
+	   && next_stmt_vis_column == body_vis_column)
+	  /* As a special case handle the case where the body is a semicolon
+	     that may be hidden by a preceding comment, e.g.  */
+
+	  // if (p)
+	  //   /* blah */;
+	  //   foo (1);
+
+	  /*  by looking instead at the column of the first non-whitespace
+	      character on the body line.  */
+	  || (body_type == CPP_SEMICOLON
+	      && body_exploc.line > guard_exploc.line
+	      && body_line_first_nws != body_vis_column
+	      && next_stmt_vis_column == body_line_first_nws))
 	{
-	  /* Don't warn if they aren't aligned on the same column
-	     as the guard itself (suggesting autogenerated code that
-	     doesn't bother indenting at all).  */
-	  expanded_location guard_exploc = expand_location (guard_loc);
+          /* Don't warn if they are aligned on the same column
+	     as the guard itself (suggesting autogenerated code that doesn't
+	     bother indenting at all).  We consider the column of the first
+	     non-whitespace character on the guard line instead of the column
+	     of the actual guard token itself because it is more sensible.
+	     Consider:
+
+	     if (p) {
+	     foo (1);
+	     } else     // GUARD
+	     foo (2);   // BODY
+	     foo (3);   // NEXT
+
+	     and:
+
+	     if (p)
+	       foo (1);
+	     } else       // GUARD
+	       foo (2);   // BODY
+	       foo (3);   // NEXT
+
+	     If we just used the column of the guard token, we would warn on
+	     the first example and not warn on the second.  But we want the
+	     exact opposite to happen: to not warn on the first example (which
+	     is probably autogenerated) and to warn on the second (whose
+	     indentation is misleading).  Using the column of the first
+	     non-whitespace character on the guard line makes that
+	     happen.  */
 	  unsigned int guard_vis_column;
-	  if (!get_visual_column (guard_exploc, &guard_vis_column))
+	  unsigned int guard_line_first_nws;
+	  if (!get_visual_column (guard_exploc,
+				  &guard_vis_column,
+				  &guard_line_first_nws))
 	    return false;
-	  if (guard_vis_column == body_vis_column)
+
+	  if (guard_line_first_nws == body_vis_column)
 	    return false;
 
-	  /* PR 66220: Don't warn if the guarding statement is more
-	     indented than the next/body stmts.  */
-	  if (guard_vis_column > body_vis_column)
+	  /* We may have something like:
+
+	     if (p)
+	       {
+	       foo (1);
+	       } else  // GUARD
+	     foo (2);  // BODY
+	     foo (3);  // NEXT
+
+	     in which case the columns are not aligned but the code is not
+	     misleadingly indented.  If the column of the body is less than
+	     that of the guard line then don't warn.  */
+	  if (body_vis_column < guard_line_first_nws)
 	    return false;
 
 	  /* Don't warn if there is multiline preprocessor logic between
@@ -355,6 +435,53 @@  should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo,
 	  /* Otherwise, they are visually aligned: issue a warning.  */
 	  return true;
 	}
+
+	/* Also issue a warning for code having the form:
+
+	   if (flag);
+	     foo ();
+
+	   while (flag);
+	   {
+	     ...
+	   }
+
+	   for (...);
+	     {
+	       ...
+	     }
+
+	   if (flag)
+	     ;
+	   else if (flag);
+	     foo ();
+
+	   where the semicolon at the end of each guard is most likely spurious.
+
+	   But do not warn on:
+
+	   for (..);
+	   foo ();
+
+	   where the next statement is aligned with the guard.
+	*/
+	if (body_type == CPP_SEMICOLON)
+	  {
+	    if (body_exploc.line == guard_exploc.line)
+	      {
+		unsigned int guard_vis_column;
+		unsigned int guard_line_first_nws;
+		if (!get_visual_column (guard_exploc,
+					&guard_vis_column,
+					&guard_line_first_nws))
+		  return false;
+
+		if (next_stmt_vis_column > guard_line_first_nws
+		    || (next_tok_type == CPP_OPEN_BRACE
+			&& next_stmt_vis_column == guard_vis_column))
+		  return true;
+	      }
+	  }
     }
 
   return false;
diff --git a/gcc/testsuite/c-c++-common/Wmisleading-indentation.c b/gcc/testsuite/c-c++-common/Wmisleading-indentation.c
index 443e3dd..0d6d8d2 100644
--- a/gcc/testsuite/c-c++-common/Wmisleading-indentation.c
+++ b/gcc/testsuite/c-c++-common/Wmisleading-indentation.c
@@ -691,3 +691,169 @@  fn_36 (void)
 	}
 	foo(6); /* We shouldn't warn here.  */
 }
+
+/* The following function contain code whose indentation is misleading, thus
+   we warn about it.  */
+
+void
+fn_37 (void)
+{
+  int i;
+
+#define EMPTY
+#define FOR_EACH(VAR, START, STOP) for (VAR = START; VAR < STOP; VAR++)
+
+  while (flagA); /* { dg-message "3: ...this 'while' clause" } */
+    foo (0); /* { dg-warning "statement is indented as if" } */
+
+  if (flagA)
+    ;
+  else if (flagA); /* { dg-message "8: ...this 'if' clause" } */
+    foo (0); /* { dg-warning "statement is indented as if" } */
+  while (flagA) /* { dg-message "3: ...this 'while' clause" } */
+    /* blah */;
+    foo (0); /* { dg-warning "statement is indented as if" } */
+
+  if (flagA)
+    ;
+  else if (flagA) /* { dg-message "8: ...this 'if' clause" } */
+    foo (1);
+    foo (2); /* { dg-warning "statement is indented as if" } */
+
+  if (flagA)
+    foo (1);
+  else if (flagA) /* { dg-message "8: ...this 'if' clause" } */
+    foo (2);
+    foo (3); /* { dg-warning "statement is indented as if" } */
+
+  if (flagB) /* { dg-message "3: ...this 'if' clause" } */
+    /* blah */;
+    { /* { dg-warning "statement is indented as if" } */
+      foo (0);
+    }
+
+  if (flagB)
+    ;
+  else; foo (0); /* { dg-warning "statement is indented as if" } */
+
+  if (flagC); foo (2); /* { dg-warning "statement is indented as if" } */
+
+  if (flagA)
+    ; /* blah */ { /* { dg-warning "statement is indented as if" } */
+      foo (1);
+    }
+
+  if (flagB) ; /* { dg-message "3: ...this 'if' clause" } */
+    return; /* { dg-warning "statement is indented as if" } */
+
+  if (flagB) EMPTY; /* { dg-message "3: ...this 'if' clause" } */
+    foo (1); /* { dg-warning "statement is indented as if" } */
+
+  for (i = 0; i < 10; i++); /* { dg-message "3: ...this 'for' clause" } */
+    foo (2); /* { dg-warning "statement is indented as if" } */
+
+  FOR_EACH (i, 0, 10);
+    foo (2); /* { dg-warning "statement is indented as if" } */
+
+  FOR_EACH (i, 0, 10);
+    { /* { dg-warning "statement is indented as if" } */
+      foo (3);
+    }
+
+  FOR_EACH (i, 0, 10);
+  { /* { dg-warning "statement is indented as if" } */
+    foo (3);
+  }
+
+  while (i++); { /* { dg-warning "statement is indented as if" } */
+    foo (3);
+  }
+
+  if (i++); { /* { dg-warning "statement is indented as if" } */
+    foo (3);
+  }
+
+  if (flagA) {
+    foo (1);
+  } else /* { dg-message "5: ...this 'else' clause" } */
+    if (flagB)
+       foo (2);
+    foo (3); /* { dg-warning "statement is indented as if" } */
+
+  if (flagA)
+    foo (1);
+  else if (flagB); /* { dg-message "8: ...this 'if' clause" } */
+    foo (2); /* { dg-warning "statement is indented as if" } */
+
+#undef EMPTY
+#undef FOR_EACH
+}
+
+/* The following function contains code whose indentation is not great but not
+   misleading, thus we don't warn.  */
+
+void
+fn_38 (void)
+{
+  int i = 0;
+
+  while (flagA)
+    ;
+    foo (0);
+
+  if (flagB)
+    ;
+    {
+      foo (0);
+    }
+
+  while (flagC);
+  foo (2);
+
+  if (flagA)
+    while (flagC++);
+  else
+    foo (2);
+
+  if (i)
+    while (i++ < 10000);
+  foo (5);
+
+  if (i) while (i++ < 10000);
+  foo (5);
+
+  if (flagA) {
+    foo (1);
+  } else
+  if (flagB)
+    foo (2);
+  foo (3);
+
+  if (flagA)
+    {
+    foo (1);
+    } else
+  if (flagB)
+    foo (2);
+  foo (3);
+}
+
+/* The following function contains good indentation which we definitely should
+   not warn about.  */
+
+void
+fn_39 (void)
+{
+  if (flagA)
+    ;
+  if (flagB)
+    ;
+
+  if (flagA)
+    if (flagB)
+      foo (0);
+    else
+      foo (1);
+  else
+    foo (2);
+}