diff mbox series

diagnostics: libcpp: Improve locations for _Pragma lexing diagnostics [PR114423]

Message ID 20241018132520.817073-1-lhyatt@gmail.com
State New
Headers show
Series diagnostics: libcpp: Improve locations for _Pragma lexing diagnostics [PR114423] | expand

Commit Message

Lewis Hyatt Oct. 18, 2024, 1:25 p.m. UTC
Hello-

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114423

The diagnostics we issue while lexing tokens from a _Pragma string have
always come out at invalid locations. I had tried a couple years ago to
fix this in a general way, but I think that ended up being too invasive a
change to fix a problem that's pretty minor in practice, and it never got
over the finish line. Here is a simple patch that improves the situation
and addresses the recently filed PR on that topic, hopefully this
incremental improvement is a better way to make some progress on this?

It just adds a way for libcpp to override the location for all diagnostics
temporarily, so that the diagnostics issued while lexing from a _Pragma
string are issued at a real location (the location of the _Pragma token)
and not a bogus one. That's a lot simpler than trying to arrange to
produce valid locations when lexing tokens from an internal
buffer. Bootstrap + regtest all languages on x86-64 Linux, tweaked a few
existing tests to adjust to the new locations. OK for trunk? Thanks!

-Lewis

-- >8 --

libcpp is not currently set up to be able to generate valid
locations for tokens lexed from a _Pragma string. Instead, after obtaining
the tokens, it sets their locations all to the location of the _Pragma
operator itself. This makes things like _Pragma("GCC diagnostic") work well
enough, but if any diagnostics are issued during lexing, prior to resetting
the token locations, those diagnostics get issued at the invalid
locations. Fix that up by adding a new field pfile->diagnostic_override_loc
that instructs libcpp to issue diagnostics at the alternate location.

libcpp/ChangeLog:

	PR preprocessor/114423
	* internal.h (struct cpp_reader): Add DIAGNOSTIC_OVERRIDE_LOC
	field.
	* directives.cc (destringize_and_run): Set the new field to the
	location of the _Pragma operator.
	* errors.cc (cpp_diagnostic_at): Support DIAGNOSTIC_OVERRIDE_LOC to
	temporarily issue diagnostics at a different location.
	(cpp_diagnostic_with_line): Likewise.

gcc/testsuite/ChangeLog:

	PR preprocessor/114423
	* c-c++-common/cpp/pragma-diagnostic-loc.c: New test.
	* c-c++-common/cpp/diagnostic-pragma-1.c: Adjust expected output.
	* g++.dg/pch/operator-1.C: Likewise.
---
 libcpp/directives.cc                          |  7 +++++++
 libcpp/errors.cc                              | 19 +++++++++++++------
 libcpp/internal.h                             |  4 ++++
 .../c-c++-common/cpp/diagnostic-pragma-1.c    |  9 ++++-----
 .../c-c++-common/cpp/pragma-diagnostic-loc.c  | 17 +++++++++++++++++
 gcc/testsuite/g++.dg/pch/operator-1.C         |  8 +++++++-
 6 files changed, 52 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/pragma-diagnostic-loc.c

Comments

David Malcolm Oct. 18, 2024, 3:24 p.m. UTC | #1
On Fri, 2024-10-18 at 09:25 -0400, Lewis Hyatt wrote:
> Hello-
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114423
> 
> The diagnostics we issue while lexing tokens from a _Pragma string
> have
> always come out at invalid locations. I had tried a couple years ago
> to
> fix this in a general way, but I think that ended up being too
> invasive a
> change to fix a problem that's pretty minor in practice, and it never
> got
> over the finish line. Here is a simple patch that improves the
> situation
> and addresses the recently filed PR on that topic, hopefully this
> incremental improvement is a better way to make some progress on
> this?
> 
> It just adds a way for libcpp to override the location for all
> diagnostics
> temporarily, so that the diagnostics issued while lexing from a
> _Pragma
> string are issued at a real location (the location of the _Pragma
> token)
> and not a bogus one. That's a lot simpler than trying to arrange to
> produce valid locations when lexing tokens from an internal
> buffer. Bootstrap + regtest all languages on x86-64 Linux, tweaked a
> few
> existing tests to adjust to the new locations. OK for trunk? Thanks!

[...snip...]

> diff --git a/libcpp/errors.cc b/libcpp/errors.cc
> index ad45f61913c..96fc165c12a 100644
> --- a/libcpp/errors.cc
> +++ b/libcpp/errors.cc
> @@ -60,13 +60,14 @@ cpp_diagnostic_at (cpp_reader * pfile, enum
> cpp_diagnostic_level level,
>  		   enum cpp_warning_reason reason, rich_location
> *richloc,
>  		   const char *msgid, va_list *ap)
>  {
> -  bool ret;
> -
>    if (!pfile->cb.diagnostic)
>      abort ();
> -  ret = pfile->cb.diagnostic (pfile, level, reason, richloc,
> _(msgid), ap);
> -
> -  return ret;
> +  if (pfile->diagnostic_override_loc && level != CPP_DL_NOTE)
> +    {
> +      rich_location rc2{pfile->line_table, pfile-
> >diagnostic_override_loc};
> +      return pfile->cb.diagnostic (pfile, level, reason, &rc2,
> _(msgid), ap);
> +    }

This will effectively override the primary location in the
rich_location, but by using a second rich_location instance it will
also ignore any secondary locations and fix-it hints.

This might will be what we want here, but did you consider 
  richloc.set_range (0, pfile->diagnostic_override_loc,
                     SHOW_RANGE_WITH_CARET);
to reset the primary location?

Otherwise, looks good to me.

[...snip...]

Thanks
Dave
Lewis Hyatt Oct. 18, 2024, 5:58 p.m. UTC | #2
On Fri, Oct 18, 2024 at 11:25 AM David Malcolm <dmalcolm@redhat.com> wrote:
> >    if (!pfile->cb.diagnostic)
> >      abort ();
> > -  ret = pfile->cb.diagnostic (pfile, level, reason, richloc,
> > _(msgid), ap);
> > -
> > -  return ret;
> > +  if (pfile->diagnostic_override_loc && level != CPP_DL_NOTE)
> > +    {
> > +      rich_location rc2{pfile->line_table, pfile-
> > >diagnostic_override_loc};
> > +      return pfile->cb.diagnostic (pfile, level, reason, &rc2,
> > _(msgid), ap);
> > +    }
>
> This will effectively override the primary location in the
> rich_location, but by using a second rich_location instance it will
> also ignore any secondary locations and fix-it hints.
>
> This might will be what we want here, but did you consider
>   richloc.set_range (0, pfile->diagnostic_override_loc,
>                      SHOW_RANGE_WITH_CARET);
> to reset the primary location?
>
> Otherwise, looks good to me.
>
> [...snip...]
>
> Thanks
> Dave
>

Thanks for taking a look! My thinking was, when libcpp produces tokens
from a _Pragma string, basically every location_t that it generates is
wrong and shouldn't be used. Because it doesn't actually set up the
line_map, it gets something random that's just sorta close to
reasonable. So I think it makes sense to discard fixits and secondary
locations too.

libcpp does use rich_location pretty sparingly, but I suppose the goal
is to use it more over time. We use one fixit hint for invalid
directive names currently, that one can't show up in a _Pragma though.
Right now we do define an encoding_rich_location subclass for escaping
unprintable bytes, which inherits rich_location and just adds a new
constructor to set the escape flag when it is created. You are
definitely right that this patch as of now loses that information.

Here's a source that uses an improperly normalized character:

_Pragma("ோ")

Currently we output:

t.cpp:1:1: warning: ‘\U00000bc7\U00000bbe’ is not in NFC [-Wnormalized=]
    1 | _Pragma("<U+0BC7><U+0BBE>")
      | ^~~~~~

With the patch we output:
t.cpp:1:1: warning: ‘\U00000bc7\U00000bbe’ is not in NFC [-Wnormalized=]
    1 | _Pragma("ோ")
      | ^~~~~~~

So the main location range is improved (it underlines all of _Pragma
instead of most of it), but we have indeed lost the intended feature
that the incorrect bytes are escaped on the source line.

For this particular case I could improve it with a one line addition like

rc2.set_escape_on_output (richloc->escape_on_output_p ());

and that would actually handle all currently needed cases since we
don't use a lot of rich_locations in libcpp. It would just become
stale if some other option gets added to rich_location in the future
that we also should preserve. I think to fix it in a fully general
way, it would be necessary to add a new interface to class
rich_location. It already has a method to delete all the fixit hints,
it would also need a method to delete all the ranges. Then we could
make a copy of the richloc and just delete everything we don't want to
preserve. Do you have a preference one way or the other?

By the way, your suggestion was to directly modify richloc... these
functions do take the richloc by non-const pointer, but is it OK to
modify it or is a copy needed? The functions like cpp_warning_at() are
exposed in the public header file, although at the moment, all call
sites are within libcpp and don't look like they would notice if the
argument was modified. I wasn't sure what is the expected interface
here.

Thanks again...

-Lewis
diff mbox series

Patch

diff --git a/libcpp/directives.cc b/libcpp/directives.cc
index 9d235fa1b05..5706c28b835 100644
--- a/libcpp/directives.cc
+++ b/libcpp/directives.cc
@@ -2430,6 +2430,12 @@  destringize_and_run (cpp_reader *pfile, const cpp_string *in,
   pfile->buffer->file = pfile->buffer->prev->file;
   pfile->buffer->sysp = pfile->buffer->prev->sysp;
 
+  /* See comment below regarding the use of expansion_loc as the location
+     for all tokens; arrange here that diagnostics issued during lexing
+     get the same treatment.  */
+  const auto prev_loc_override = pfile->diagnostic_override_loc;
+  pfile->diagnostic_override_loc = expansion_loc;
+
   start_directive (pfile);
   _cpp_clean_line (pfile);
   save_directive = pfile->directive;
@@ -2497,6 +2503,7 @@  destringize_and_run (cpp_reader *pfile, const cpp_string *in,
      make that applicable to the real buffer too.  */
   pfile->buffer->prev->sysp = pfile->buffer->sysp;
   _cpp_pop_buffer (pfile);
+  pfile->diagnostic_override_loc = prev_loc_override;
 
   /* Reset the old macro state before ...  */
   XDELETE (pfile->context);
diff --git a/libcpp/errors.cc b/libcpp/errors.cc
index ad45f61913c..96fc165c12a 100644
--- a/libcpp/errors.cc
+++ b/libcpp/errors.cc
@@ -60,13 +60,14 @@  cpp_diagnostic_at (cpp_reader * pfile, enum cpp_diagnostic_level level,
 		   enum cpp_warning_reason reason, rich_location *richloc,
 		   const char *msgid, va_list *ap)
 {
-  bool ret;
-
   if (!pfile->cb.diagnostic)
     abort ();
-  ret = pfile->cb.diagnostic (pfile, level, reason, richloc, _(msgid), ap);
-
-  return ret;
+  if (pfile->diagnostic_override_loc && level != CPP_DL_NOTE)
+    {
+      rich_location rc2{pfile->line_table, pfile->diagnostic_override_loc};
+      return pfile->cb.diagnostic (pfile, level, reason, &rc2, _(msgid), ap);
+    }
+  return pfile->cb.diagnostic (pfile, level, reason, richloc, _(msgid), ap);
 }
 
 /* Print a diagnostic at the location of the previously lexed token.  */
@@ -201,8 +202,14 @@  cpp_diagnostic_with_line (cpp_reader * pfile, enum cpp_diagnostic_level level,
   
   if (!pfile->cb.diagnostic)
     abort ();
+  /* Don't override note locations, which will likely make the note
+     more confusing.  */
+  const bool do_loc_override
+    = pfile->diagnostic_override_loc && level != CPP_DL_NOTE;
+  if (do_loc_override)
+    src_loc = pfile->diagnostic_override_loc;
   rich_location richloc (pfile->line_table, src_loc);
-  if (column)
+  if (column && !do_loc_override)
     richloc.override_column (column);
   ret = pfile->cb.diagnostic (pfile, level, reason, &richloc, _(msgid), ap);
 
diff --git a/libcpp/internal.h b/libcpp/internal.h
index a658a8c5739..13186c5a5b0 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -616,6 +616,10 @@  struct cpp_reader
      zero of said file.  */
   location_t main_loc;
 
+  /* If non-zero, override diagnostic locations (other than DK_NOTE
+     diagnostics) to this one.  */
+  location_t diagnostic_override_loc;
+
   /* Returns true iff we should warn about UTF-8 bidirectional control
      characters.  */
   bool warn_bidi_p () const
diff --git a/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-1.c b/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-1.c
index 9867c94a8dd..6e37294fd2b 100644
--- a/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-1.c
+++ b/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-1.c
@@ -3,9 +3,8 @@ 
 #pragma GCC warning "warn-a" // { dg-warning warn-a }
 #pragma GCC error "err-b" // { dg-error err-b }
 
-#define CONST1 _Pragma("GCC warning \"warn-c\"") 1
-#define CONST2 _Pragma("GCC error \"err-d\"") 2
-
-char a[CONST1]; // { dg-warning warn-c }
-char b[CONST2]; // { dg-error err-d }
+#define CONST1 _Pragma("GCC warning \"warn-c\"") 1 // { dg-warning warn-c }
+#define CONST2 _Pragma("GCC error \"err-d\"") 2 // { dg-error err-d }
 
+char a[CONST1]; // { dg-note "in expansion of macro 'CONST1'" }
+char b[CONST2]; // { dg-note "in expansion of macro 'CONST2'" }
diff --git a/gcc/testsuite/c-c++-common/cpp/pragma-diagnostic-loc.c b/gcc/testsuite/c-c++-common/cpp/pragma-diagnostic-loc.c
new file mode 100644
index 00000000000..4ef40420cdd
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/pragma-diagnostic-loc.c
@@ -0,0 +1,17 @@ 
+/* { dg-do preprocess } */
+/* PR preprocessor/114423 */
+/* Check that we now issue diagnostics at the location of the _Pragma
+   instead of an invalid location.  If we someday manage to issue
+   diagnostics at better locations in the future, this will need
+   updating.  */
+_Pragma("GCC warning \"warning1\"") /* { dg-warning "1:warning1" } */
+#define P _Pragma("GCC warning \"warning2\"") /* { dg-warning "11:warning2" } */
+P /* { dg-note "in expansion of macro" } */
+#define S "GCC warning \"warning3\""
+/**/ _Pragma(S) /* { dg-warning "6:warning3" } */
+
+/* This diagnostic uses a different code path (cpp_diagnostic_at() rather
+   than cpp_error_with_line()).  Also make sure that the dg-note location
+   does not get overridden to the _Pragma location.  */
+#pragma GCC poison xyz /* { dg-note "poisoned here" } */
+/* */ _Pragma("xyz") /* { dg-error "7:attempt to use poisoned" } */
diff --git a/gcc/testsuite/g++.dg/pch/operator-1.C b/gcc/testsuite/g++.dg/pch/operator-1.C
index 290b5f7ab21..1138a969536 100644
--- a/gcc/testsuite/g++.dg/pch/operator-1.C
+++ b/gcc/testsuite/g++.dg/pch/operator-1.C
@@ -1,2 +1,8 @@ 
 #include "operator-1.H"
-int main(void){ major(0);} /* { dg-warning "Did not Work" } */
+int main(void){ major(0);} /* { dg-note "in expansion of macro 'major'" } */
+/* Line numbers below pertain to the header file.  */
+/* { dg-warning "Did not Work" "" { target *-*-* } 1 } */
+/* { dg-note "in expansion of macro '__glibc_macro_warning1'" "" { target *-*-* } 3 } */
+/* { dg-note "in expansion of macro '__glibc_macro_warning'" "" { target *-*-* } 4 } */
+/* { dg-note "in expansion of macro '__SYSMACROS_DM1'" "" { target *-*-* } 6 } */
+/* { dg-note "in expansion of macro '__SYSMACROS_DM'" "" { target *-*-* } 9 } */