diff mbox series

[RFC,v1] Provide more contexts for -Warray-bounds warning messages

Message ID 20240702161750.2948930-1-qing.zhao@oracle.com
State New
Headers show
Series [RFC,v1] Provide more contexts for -Warray-bounds warning messages | expand

Commit Message

Qing Zhao July 2, 2024, 4:17 p.m. UTC
due to code duplication from jump threading [PR109071]
Control this with a new option -fdiagnostic-try-to-explain-harder.

This patch has been tested with -fdiagnostic-try-to-expain-harder on by
default to bootstrap gcc and regression testing on both x86 and aarch64,
resolved all bootstrap issues and regression testing issues. 

I need some help in the following two items:
1. suggestions on better documentation wordings for the new option.
2. checking on the new data structures copy_history and the
   memory management for it.
   In the beginning, I tried to use the GGC for it.
   but have met quite some issues (possibly because the gimple and the containing
   basic block elimination), then I gave up the GGC scheme and instead
   used the obstack and manually clean and deleted the obstack in the end of
   the compilation.  

The following is more details on the patchs

Thanks a lot!

Qing

$ cat t.c
extern void warn(void);
static inline void assign(int val, int *regs, int *index)
{
  if (*index >= 4)
    warn();
  *regs = val;
}
struct nums {int vals[4];};

void sparx5_set (int *ptr, struct nums *sg, int index)
{
  int *val = &sg->vals[index];

  assign(0,    ptr, &index);
  assign(*val, ptr, &index);
}

$ gcc -Wall -O2  -c -o t.o t.c
t.c: In function ‘sparx5_set’:
t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ [-Warray-bounds=]
   12 |   int *val = &sg->vals[index];
      |               ~~~~~~~~^~~~~~~
t.c:8:18: note: while referencing ‘vals’
    8 | struct nums {int vals[4];};
      |                  ^~~~

In the above, Although the warning is correct in theory, the warning message
itself is confusing to the end-user since there is information that cannot
be connected to the source code directly.

It will be a nice improvement to add more information in the warning message
to report where such index value come from.

In order to achieve this, we add a new data structure copy_history to record
the condition and the transformation that triggered the code duplication.
Whenever there is a code duplication due to some specific transformations,
such as jump threading, loop switching, etc, a copy_history structure is
created and attached to the duplicated gimple statement.

During array out-of-bound checking or other warning checking, the copy_history
that was attached to the gimple statement is used to form a sequence of
diagnostic events that are added to the corresponding rich location to be used
to report the warning message.

This behavior is controled by the new option -fdiagnostic-try-to-explain-harder
which is off by default.

With this change, by adding -fdiagnostic-try-to-explain-harder,
the warning message for the above testing case is now:

t.c: In function ‘sparx5_set’:
t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ [-Warray-bounds=]
   12 |   int *val = &sg->vals[index];
      |               ~~~~~~~~^~~~~~~
  event 1
    |
    |    4 |   if (*index >= 4)
    |      |      ^
    |      |      |
    |      |      (1) when the condition is evaluated to true
    |
t.c:8:18: note: while referencing ‘vals’
    8 | struct nums {int vals[4];};
      |                  ^~~~

gcc/ChangeLog:

	PR tree-optimization/109071

	* Makefile.in (OBJS): Add diagnostic-copy-history.o.
	* gcc/common.opt (fdiagnostics-try-to-explain-harder): New option.
	* gcc/doc/invoke.texi (fdiagnostics-try-to-explain-harder): Add
	documentation for the new option.
	* gimple-array-bounds.cc (check_out_of_bounds_and_warn): Change
	location_t to gcc_rich_location for warning.
	(array_bounds_checker::check_array_ref): Likewise.
	(array_bounds_checker::check_mem_ref): Likewise.
	(array_bounds_checker::check_addr_expr): Likewise.
	(array_bounds_checker::check_array_bounds): Generate a rich
	location for the location and add events to the rich location.
	* gimple-array-bounds.h: Update the prototypes of routines.
	* gimple-iterator.cc (gsi_remove): Remove the copy history when
	removing the gimple.
	* toplev.cc (toplev::finalize): Call copy_history_finalize.
	* tree-ssa-threadupdate.cc (set_copy_history_to_stmts_in_bb): New
	function.
	(back_jt_path_registry::duplicate_thread_path): Call the new function
	to set copy history for every duplicated gimple.
	* diagnostic-copy-history.cc: New file.
	* diagnostic-copy-history.h: New file.

gcc/testsuite/ChangeLog:

	PR tree-optimization/109071

	* gcc.dg/pr109071.c: New test.
---
 gcc/Makefile.in                 |   4 +-
 gcc/common.opt                  |   4 +
 gcc/diagnostic-copy-history.cc  | 163 ++++++++++++++++++++++++++++++++
 gcc/diagnostic-copy-history.h   |  81 ++++++++++++++++
 gcc/doc/invoke.texi             |   7 ++
 gcc/gimple-array-bounds.cc      |  62 +++++++-----
 gcc/gimple-array-bounds.h       |   8 +-
 gcc/gimple-iterator.cc          |   3 +
 gcc/testsuite/gcc.dg/pr109071.c |  36 +++++++
 gcc/toplev.cc                   |   3 +
 gcc/tree-ssa-threadupdate.cc    |  42 ++++++++
 11 files changed, 386 insertions(+), 27 deletions(-)
 create mode 100644 gcc/diagnostic-copy-history.cc
 create mode 100644 gcc/diagnostic-copy-history.h
 create mode 100644 gcc/testsuite/gcc.dg/pr109071.c

Comments

David Malcolm July 2, 2024, 10:02 p.m. UTC | #1
On Tue, 2024-07-02 at 16:17 +0000, Qing Zhao wrote:
> due to code duplication from jump threading [PR109071]
> Control this with a new option -fdiagnostic-try-to-explain-harder.

The name -fdiagnostic-try-to-explain-harder seems a little too "cute"
to me, but I can't think of a better name.

Various comments inline below...  I'm sorry I didn't take a close look
at the copy history implementation; I'm hoping Richi will dig into that
part of the patch.

> 
> This patch has been tested with -fdiagnostic-try-to-expain-harder on
> by
> default to bootstrap gcc and regression testing on both x86 and
> aarch64,
> resolved all bootstrap issues and regression testing issues. 
> 
> I need some help in the following two items:
> 1. suggestions on better documentation wordings for the new option.
> 2. checking on the new data structures copy_history and the
>    memory management for it.
>    In the beginning, I tried to use the GGC for it.
>    but have met quite some issues (possibly because the gimple and
> the containing
>    basic block elimination), then I gave up the GGC scheme and
> instead
>    used the obstack and manually clean and deleted the obstack in the
> end of
>    the compilation.  
> 
> The following is more details on the patchs
> 
> Thanks a lot!
> 
> Qing
> 
> $ cat t.c
> extern void warn(void);
> static inline void assign(int val, int *regs, int *index)
> {
>   if (*index >= 4)
>     warn();
>   *regs = val;
> }
> struct nums {int vals[4];};
> 
> void sparx5_set (int *ptr, struct nums *sg, int index)
> {
>   int *val = &sg->vals[index];
> 
>   assign(0,    ptr, &index);
>   assign(*val, ptr, &index);
> }
> 
> $ gcc -Wall -O2  -c -o t.o t.c
> t.c: In function ‘sparx5_set’:
> t.c:12:23: warning: array subscript 4 is above array bounds of
> ‘int[4]’ [-Warray-bounds=]
>    12 |   int *val = &sg->vals[index];
>       |               ~~~~~~~~^~~~~~~
> t.c:8:18: note: while referencing ‘vals’
>     8 | struct nums {int vals[4];};
>       |                  ^~~~
> 
> In the above, Although the warning is correct in theory, the warning
> message
> itself is confusing to the end-user since there is information that
> cannot
> be connected to the source code directly.
> 
> It will be a nice improvement to add more information in the warning
> message
> to report where such index value come from.
> 
> In order to achieve this, we add a new data structure copy_history to
> record
> the condition and the transformation that triggered the code
> duplication.
> Whenever there is a code duplication due to some specific
> transformations,
> such as jump threading, loop switching, etc, a copy_history structure
> is
> created and attached to the duplicated gimple statement.
> 
> During array out-of-bound checking or other warning checking, the
> copy_history
> that was attached to the gimple statement is used to form a sequence
> of
> diagnostic events that are added to the corresponding rich location
> to be used
> to report the warning message.
> 
> This behavior is controled by the new option -fdiagnostic-try-to-
> explain-harder
> which is off by default.
> 
> With this change, by adding -fdiagnostic-try-to-explain-harder,
> the warning message for the above testing case is now:
> 
> t.c: In function ‘sparx5_set’:
> t.c:12:23: warning: array subscript 4 is above array bounds of
> ‘int[4]’ [-Warray-bounds=]
>    12 |   int *val = &sg->vals[index];
>       |               ~~~~~~~~^~~~~~~
>   event 1
>     |
>     |    4 |   if (*index >= 4)
>     |      |      ^
>     |      |      |
>     |      |      (1) when the condition is evaluated to true
>     |
> t.c:8:18: note: while referencing ‘vals’
>     8 | struct nums {int vals[4];};
>       |                  ^~~~

BTW I notice in the above example you have the extra vertical line
below "event 1" here:

  event 1
    |
    |    4 |   if (*index >= 4)
    |      |      ^
    |      |      |
    |      |      (1) when the condition is evaluated to true
    |

whereas in the testcase it looks like you're expecting the simpler:

  event 1
   4 |   if (*index >= 4)
     |      ^
     |      |
     |      (1) when the condition is evaluated to true

which is due to r15-533-g3cd267446755ab, so I think the example text is
slightly outdated.

I wonder if the wording of the event could be improved to better
explain to the user what the optimizer is "thinking".

Perhaps it could be two events:

(1) when specializing the code for both branches...
(2) ...and considering the 'true' branch...

or something like that?  (I'm not sure)

Is there a more complicated testcase with multiple events?

Various other points about the path:

When the analyzer builds paths for its diagnostics, it adds a final
event that repeats the problem.  This is somewhat tautological, but
might make the path more readable - I'm not sure.  Consider:

t.c: In function ‘sparx5_set’:
t.c:12:23: warning: array subscript 4 is above array bounds of
‘int[4]’ [-Warray-bounds=]
   12 |   int *val = &sg->vals[index];
      |               ~~~~~~~~^~~~~~~
  events 1-2
    4 |   if (*index >= 4)
      |      ^
      |      |
      |      (1) when the condition is evaluated to true
   ...
   12 |   int *val = &sg->vals[index];
      |               ~~~~~~~~^~~~~~~
      |                       |
      |                       (2) ⚠️ out of array bounds here

t.c:8:18: note: while referencing ‘vals’
    8 | struct nums {int vals[4];};
      |                  ^~~~

FWIW there's a way to get get a ⚠️  emoji on that final event, by
having its meaning have verb == diagnostic_event::VERB_danger.

GCC 15 also supports visualizing control flow between events, by having
diagnostic_event::connect_to_next_event_p () return true; see r15-636-
g770657d02c986c, so that might be something to consider adding.


[...snip...]

> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 7d3ea27a6ab..36a5b5fa305 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1436,6 +1436,7 @@ OBJS = \
>         df-problems.o \
>         df-scan.o \
>         dfp.o \
> +       diagnostic-copy-history.o \
>         digraph.o \
>         dojump.o \
>         dominance.o \
> @@ -1821,7 +1822,8 @@ OBJS = \
>  
>  # Objects in libcommon.a, potentially used by all host binaries and
> with
>  # no target dependencies.
> -OBJS-libcommon = diagnostic-spec.o diagnostic.o diagnostic-color.o \
> +OBJS-libcommon = diagnostic-spec.o \
> +       diagnostic.o diagnostic-color.o \

Is this change to OBJS-libcommon a stray edit?  I don't think it
changes the meaning.


>         diagnostic-format-json.o \
>         diagnostic-format-sarif.o \
>         diagnostic-global-context.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 327230967ea..833cc2f4da4 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1558,6 +1558,10 @@ fdiagnostics-minimum-margin-width=
>  Common Joined UInteger Var(diagnostics_minimum_margin_width) Init(6)
>  Set minimum width of left margin of source code when showing source.
>  
> +fdiagnostics-try-to-explain-harder
> +Common Var(flag_diagnostics_try_to_explain_harder)
> +Collect and print more context information for diagnostics.
> +

Eventually you'll want to run "make regenerate-opt-urls".

[...snip...]

> diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
> index 1637a2fc4f4..9050b06bf1f 100644
> --- a/gcc/gimple-array-bounds.cc
> +++ b/gcc/gimple-array-bounds.cc

[...snip...]

> @@ -801,6 +802,21 @@ array_bounds_checker::check_array_bounds (tree
> *tp, int *walk_subtree,
>    else
>      location = gimple_location (wi->stmt);
>  
> +  /* Generate a rich location for this location.  */
> +  gcc_rich_location richloc (location);
> +  simple_diagnostic_path path (global_dc->printer);
> +
> +  /* Add events to this diagnostic_path per the copy_history
> attached to
> +     the corresponding STMT.  */
> +  copy_history_t cp_history = wi->stmt ? get_copy_history (wi->stmt)
> : NULL;
> +  for (copy_history_t cur_ch = cp_history; cur_ch;
> +       cur_ch = cur_ch->prev_copy)
> +    path.add_event (cur_ch->condition, NULL_TREE, 0,
> +                   "when the condition is evaluated to %s",
> +                   cur_ch->is_true_path ? "true" : "false");

add_event's 2nd param is the function in which the event occurs, which
you may want to set to cfun, and set the stack depth to 1, rather than
0, suggesting a purely intraprocedural control flow.

FWIW the analyzer has some nasty logic to try to reconstruct functions
and stack frames in the light of inlining, which might or might not
make sense for you to make use of (I'm not sure).
 

> +
> +  richloc.set_path (&path);

Creating and populating "path" here is doing non-trivial work per call
to check_array_bounds, and if I'm reading things right that's being
done for every statement.

So maybe we want something like:

  lazy_diaqnostic_path path (callback, data);

that is backed by a diagnostic_path *, which gets populated on demand
if any vfuncs on the lazy_diagnostic_path are actually called.

Or we could abandon simple_diagnostic_path here and have a:

  copy_history_diagnostic_path path (wi->stmt);

subclass of diagnostic_path that implements its vfuncs directly in
terms of a copy_history_t; the ctor simply sets up the vtable ptr and
initializes a gimple *.

Or, all those

      if (for_array_bound)
	warned = warning_at (richloc, OPT_Warray_bounds_, 
                             /* etc */);

could become:

      if (for_array_bound)
        {
          copy_history_diagnostic_path path (location, stmt);
	  warned = warning_at (richloc, OPT_Warray_bounds_, 
                               /* etc */);
        }

or somesuch, explicitly deferring creation of the path until we're
actually about to emit a warning, and putting the logic in
copy_history_diagnostic_path's ctor.

> +
>    *walk_subtree = true;
>  
>    bool warned = false;
> @@ -808,14 +824,14 @@ array_bounds_checker::check_array_bounds (tree
> *tp, int *walk_subtree,
>    gcc_assert (checker->m_stmt == wi->stmt);
>  
>    if (TREE_CODE (t) == ARRAY_REF)
> -    warned = checker->check_array_ref (location, t, wi->stmt,
> +    warned = checker->check_array_ref (&richloc, t, wi->stmt,
>                                        false/*ignore_off_by_one*/);
>    else if (TREE_CODE (t) == MEM_REF)
> -    warned = checker->check_mem_ref (location, t,
> +    warned = checker->check_mem_ref (&richloc, t,
>                                      false /*ignore_off_by_one*/);
>    else if (TREE_CODE (t) == ADDR_EXPR)
>      {
> -      checker->check_addr_expr (location, t, wi->stmt);
> +      checker->check_addr_expr (&richloc, t, wi->stmt);
>        *walk_subtree = false;
>      }
>    else if (inbounds_memaccess_p (t, wi->stmt))

[...snip...]

> diff --git a/gcc/gimple-array-bounds.h b/gcc/gimple-array-bounds.h
> index aa7ca8e9730..c2f7a017fc2 100644
> --- a/gcc/gimple-array-bounds.h
> +++ b/gcc/gimple-array-bounds.h
> @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
>  #define GCC_GIMPLE_ARRAY_BOUNDS_H
>  
>  #include "pointer-query.h"
> +#include "gcc-rich-location.h"
>  
>  class array_bounds_checker
>  {
> @@ -32,9 +33,10 @@ public:
>  
>  private:
>    static tree check_array_bounds (tree *tp, int *walk_subtree, void
> *data);
> -  bool check_array_ref (location_t, tree, gimple *, bool
> ignore_off_by_one);
> -  bool check_mem_ref (location_t, tree, bool ignore_off_by_one);
> -  void check_addr_expr (location_t, tree, gimple *);
> +  bool check_array_ref (gcc_rich_location *, tree, gimple *,
> +                       bool ignore_off_by_one);

Note that these could just take a rich_location *, I think, and drop
the #include above - I don't think there's anything you're using in
gcc_rich_location that isn't in the rich_location base class.

> +  bool check_mem_ref (gcc_rich_location *, tree, bool
> ignore_off_by_one);
> +  void check_addr_expr (gcc_rich_location *, tree, gimple *);
>    void get_value_range (irange &r, const_tree op, gimple *);
>  
>    /* Current function.  */

[...snip...]


Hope this is constructive
Dave
David Malcolm July 2, 2024, 10:08 p.m. UTC | #2
On Tue, 2024-07-02 at 18:02 -0400, David Malcolm wrote:
> On Tue, 2024-07-02 at 16:17 +0000, Qing Zhao wrote:

[...snip...]
> 
> > +    path.add_event (cur_ch->condition, NULL_TREE, 0,
> > +                   "when the condition is evaluated to %s",
> > +                   cur_ch->is_true_path ? "true" : "false");
> 
> add_event's 2nd param is the function in which the event occurs,
> which
> you may want to set to cfun, and set the stack depth to 1, rather
                         ^^^^
                         cfun's decl, that is, sorry.


[...snip...]

> 
> Or, all those
> 
>       if (for_array_bound)
>         warned = warning_at (richloc, OPT_Warray_bounds_, 
>                              /* etc */);
> 
> could become:
> 
>       if (for_array_bound)
>         {
>           copy_history_diagnostic_path path (location, stmt);
>           warned = warning_at (richloc, OPT_Warray_bounds_, 
>                                /* etc */);
>         }

Or rather:

      if (for_array_bound)
        {
          copy_history_location richloc (location, stmt);
          warned = warning_at (&richloc, OPT_Warray_bounds_, 
                               /* etc */);
        }

where the copy_history_location is a new rich_location subclass, and it
only populates the path if it's asked for one (when we're actually
emitting the diagnostic).

> or somesuch, explicitly deferring creation of the path until we're
> actually about to emit a warning, and putting the logic in
> copy_history_diagnostic_path's ctor.

[...snip...]


Dave
Qing Zhao July 3, 2024, 2:23 p.m. UTC | #3
> On Jul 2, 2024, at 18:02, David Malcolm <dmalcolm@redhat.com> wrote:
> 
> On Tue, 2024-07-02 at 16:17 +0000, Qing Zhao wrote:
>> due to code duplication from jump threading [PR109071]
>> Control this with a new option -fdiagnostic-try-to-explain-harder.
> 
> The name -fdiagnostic-try-to-explain-harder seems a little too "cute"
> to me, but I can't think of a better name.
Me either -:)
> 
> Various comments inline below...  I'm sorry I didn't take a close look
> at the copy history implementation; I'm hoping Richi will dig into that
> part of the patch.
> 
>> 
>> This patch has been tested with -fdiagnostic-try-to-expain-harder on
>> by
>> default to bootstrap gcc and regression testing on both x86 and
>> aarch64,
>> resolved all bootstrap issues and regression testing issues. 
>> 
>> I need some help in the following two items:
>> 1. suggestions on better documentation wordings for the new option.
>> 2. checking on the new data structures copy_history and the
>>    memory management for it.
>>    In the beginning, I tried to use the GGC for it.
>>    but have met quite some issues (possibly because the gimple and
>> the containing
>>    basic block elimination), then I gave up the GGC scheme and
>> instead
>>    used the obstack and manually clean and deleted the obstack in the
>> end of
>>    the compilation.  
>> 
>> The following is more details on the patchs
>> 
>> Thanks a lot!
>> 
>> Qing
>> 
>> $ cat t.c
>> extern void warn(void);
>> static inline void assign(int val, int *regs, int *index)
>> {
>>   if (*index >= 4)
>>     warn();
>>   *regs = val;
>> }
>> struct nums {int vals[4];};
>> 
>> void sparx5_set (int *ptr, struct nums *sg, int index)
>> {
>>   int *val = &sg->vals[index];
>> 
>>   assign(0,    ptr, &index);
>>   assign(*val, ptr, &index);
>> }
>> 
>> $ gcc -Wall -O2  -c -o t.o t.c
>> t.c: In function ‘sparx5_set’:
>> t.c:12:23: warning: array subscript 4 is above array bounds of
>> ‘int[4]’ [-Warray-bounds=]
>>    12 |   int *val = &sg->vals[index];
>>       |               ~~~~~~~~^~~~~~~
>> t.c:8:18: note: while referencing ‘vals’
>>     8 | struct nums {int vals[4];};
>>       |                  ^~~~
>> 
>> In the above, Although the warning is correct in theory, the warning
>> message
>> itself is confusing to the end-user since there is information that
>> cannot
>> be connected to the source code directly.
>> 
>> It will be a nice improvement to add more information in the warning
>> message
>> to report where such index value come from.
>> 
>> In order to achieve this, we add a new data structure copy_history to
>> record
>> the condition and the transformation that triggered the code
>> duplication.
>> Whenever there is a code duplication due to some specific
>> transformations,
>> such as jump threading, loop switching, etc, a copy_history structure
>> is
>> created and attached to the duplicated gimple statement.
>> 
>> During array out-of-bound checking or other warning checking, the
>> copy_history
>> that was attached to the gimple statement is used to form a sequence
>> of
>> diagnostic events that are added to the corresponding rich location
>> to be used
>> to report the warning message.
>> 
>> This behavior is controled by the new option -fdiagnostic-try-to-
>> explain-harder
>> which is off by default.
>> 
>> With this change, by adding -fdiagnostic-try-to-explain-harder,
>> the warning message for the above testing case is now:
>> 
>> t.c: In function ‘sparx5_set’:
>> t.c:12:23: warning: array subscript 4 is above array bounds of
>> ‘int[4]’ [-Warray-bounds=]
>>    12 |   int *val = &sg->vals[index];
>>       |               ~~~~~~~~^~~~~~~
>>   event 1
>>     |
>>     |    4 |   if (*index >= 4)
>>     |      |      ^
>>     |      |      |
>>     |      |      (1) when the condition is evaluated to true
>>     |
>> t.c:8:18: note: while referencing ‘vals’
>>     8 | struct nums {int vals[4];};
>>       |                  ^~~~
> 
> BTW I notice in the above example you have the extra vertical line
> below "event 1" here:
> 
>  event 1
>    |
>    |    4 |   if (*index >= 4)
>    |      |      ^
>    |      |      |
>    |      |      (1) when the condition is evaluated to true
>    |
> 
> whereas in the testcase it looks like you're expecting the simpler:
> 
>  event 1
>   4 |   if (*index >= 4)
>     |      ^
>     |      |
>     |      (1) when the condition is evaluated to true
> 
> which is due to r15-533-g3cd267446755ab, so I think the example text is
> slightly outdated.

Yes, you are right, I updated the format in the testing case but forgot to update it in
the comment of my patch. 
I will update the comment of the patch too.
> 
> I wonder if the wording of the event could be improved to better
> explain to the user what the optimizer is "thinking".
> 
> Perhaps it could be two events:
> 
> (1) when specializing the code for both branches...
> (2) ...and considering the 'true' branch...
> 
> or something like that?  (I'm not sure)
I can explain the event in more details as your suggested
> 
> Is there a more complicated testcase with multiple events?

When I bootstrapped GCC, there were some issues relating to the multiple copies of the same gimple statement,
all in jump threading pass, however, no any warning issue. 

It’s not easy to come up with a testing case with multiple copies of the same gimple statement and generate warning 
at the same time.
I will try to think harder here.

> 
> Various other points about the path:
> 
> When the analyzer builds paths for its diagnostics, it adds a final
> event that repeats the problem.  This is somewhat tautological, but
> might make the path more readable - I'm not sure.  Consider:
> 
> t.c: In function ‘sparx5_set’:
> t.c:12:23: warning: array subscript 4 is above array bounds of
> ‘int[4]’ [-Warray-bounds=]
>   12 |   int *val = &sg->vals[index];
>      |               ~~~~~~~~^~~~~~~
>  events 1-2
>    4 |   if (*index >= 4)
>      |      ^
>      |      |
>      |      (1) when the condition is evaluated to true
>   ...
>   12 |   int *val = &sg->vals[index];
>      |               ~~~~~~~~^~~~~~~
>      |                       |
>      |                       (2) ⚠️ out of array bounds here
> 
> t.c:8:18: note: while referencing ‘vals’
>    8 | struct nums {int vals[4];};
>      |                  ^~~~
> 
> FWIW there's a way to get get a ⚠️  emoji on that final event, by
> having its meaning have verb == diagnostic_event::VERB_danger.
> 
> GCC 15 also supports visualizing control flow between events, by having
> diagnostic_event::connect_to_next_event_p () return true; see r15-636-
> g770657d02c986c, so that might be something to consider adding.

Yes, that above diagnostic message is much more clear, I will try this.
Thanks for the suggestion.
> 
> 
> [...snip...]
> 
>> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
>> index 7d3ea27a6ab..36a5b5fa305 100644
>> --- a/gcc/Makefile.in
>> +++ b/gcc/Makefile.in
>> @@ -1436,6 +1436,7 @@ OBJS = \
>>         df-problems.o \
>>         df-scan.o \
>>         dfp.o \
>> +       diagnostic-copy-history.o \
>>         digraph.o \
>>         dojump.o \
>>         dominance.o \
>> @@ -1821,7 +1822,8 @@ OBJS = \
>>  
>>  # Objects in libcommon.a, potentially used by all host binaries and
>> with
>>  # no target dependencies.
>> -OBJS-libcommon = diagnostic-spec.o diagnostic.o diagnostic-color.o \
>> +OBJS-libcommon = diagnostic-spec.o \
>> +       diagnostic.o diagnostic-color.o \
> 
> Is this change to OBJS-libcommon a stray edit?  I don't think it
> changes the meaning.

You are right, will delete this change.
> 
> 
>>         diagnostic-format-json.o \
>>         diagnostic-format-sarif.o \
>>         diagnostic-global-context.o \
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index 327230967ea..833cc2f4da4 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -1558,6 +1558,10 @@ fdiagnostics-minimum-margin-width=
>>  Common Joined UInteger Var(diagnostics_minimum_margin_width) Init(6)
>>  Set minimum width of left margin of source code when showing source.
>>  
>> +fdiagnostics-try-to-explain-harder
>> +Common Var(flag_diagnostics_try_to_explain_harder)
>> +Collect and print more context information for diagnostics.
>> +
> 
> Eventually you'll want to run "make regenerate-opt-urls".

Yes, will do that.
> 
> [...snip...]
> 
>> diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
>> index 1637a2fc4f4..9050b06bf1f 100644
>> --- a/gcc/gimple-array-bounds.cc
>> +++ b/gcc/gimple-array-bounds.cc
> 
> [...snip...]
> 
>> @@ -801,6 +802,21 @@ array_bounds_checker::check_array_bounds (tree
>> *tp, int *walk_subtree,
>>    else
>>      location = gimple_location (wi->stmt);
>>  
>> +  /* Generate a rich location for this location.  */
>> +  gcc_rich_location richloc (location);
>> +  simple_diagnostic_path path (global_dc->printer);
>> +
>> +  /* Add events to this diagnostic_path per the copy_history
>> attached to
>> +     the corresponding STMT.  */
>> +  copy_history_t cp_history = wi->stmt ? get_copy_history (wi->stmt)
>> : NULL;
>> +  for (copy_history_t cur_ch = cp_history; cur_ch;
>> +       cur_ch = cur_ch->prev_copy)
>> +    path.add_event (cur_ch->condition, NULL_TREE, 0,
>> +                   "when the condition is evaluated to %s",
>> +                   cur_ch->is_true_path ? "true" : "false");
> 
> add_event's 2nd param is the function in which the event occurs, which
> you may want to set to cfun, and set the stack depth to 1, rather than
> 0, suggesting a purely intraprocedural control flow.
> 
> FWIW the analyzer has some nasty logic to try to reconstruct functions
> and stack frames in the light of inlining, which might or might not
> make sense for you to make use of (I'm not sure).

Okay, thanks for the info, will study a little bit here.
> 
> 
>> +
>> +  richloc.set_path (&path);
> 
> Creating and populating "path" here is doing non-trivial work per call
> to check_array_bounds, and if I'm reading things right that's being
> done for every statement.
> 
> So maybe we want something like:
> 
>  lazy_diaqnostic_path path (callback, data);
> 
> that is backed by a diagnostic_path *, which gets populated on demand
> if any vfuncs on the lazy_diagnostic_path are actually called.
> 
> Or we could abandon simple_diagnostic_path here and have a:
> 
>  copy_history_diagnostic_path path (wi->stmt);
> 
> subclass of diagnostic_path that implements its vfuncs directly in
> terms of a copy_history_t; the ctor simply sets up the vtable ptr and
> initializes a gimple *.
> 
> Or, all those
> 
>      if (for_array_bound)
> warned = warning_at (richloc, OPT_Warray_bounds_, 
>                             /* etc */);
> 
> could become:
> 
>      if (for_array_bound)
>        {
>          copy_history_diagnostic_path path (location, stmt);
>   warned = warning_at (richloc, OPT_Warray_bounds_, 
>                               /* etc */);
>        }
> 
> or somesuch, explicitly deferring creation of the path until we're
> actually about to emit a warning, and putting the logic in
> copy_history_diagnostic_path's ctor.

So, the major complain here is, the creating and populating “path” is expensive, it’s a waste to do this
for every gimple, it’s better to delay the creation of the path until we’re about to emit a warning.

Agreed, I will try to delay the creation of the path until necessary.

> 
>> +
>>    *walk_subtree = true;
>>  
>>    bool warned = false;
>> @@ -808,14 +824,14 @@ array_bounds_checker::check_array_bounds (tree
>> *tp, int *walk_subtree,
>>    gcc_assert (checker->m_stmt == wi->stmt);
>>  
>>    if (TREE_CODE (t) == ARRAY_REF)
>> -    warned = checker->check_array_ref (location, t, wi->stmt,
>> +    warned = checker->check_array_ref (&richloc, t, wi->stmt,
>>                                        false/*ignore_off_by_one*/);
>>    else if (TREE_CODE (t) == MEM_REF)
>> -    warned = checker->check_mem_ref (location, t,
>> +    warned = checker->check_mem_ref (&richloc, t,
>>                                      false /*ignore_off_by_one*/);
>>    else if (TREE_CODE (t) == ADDR_EXPR)
>>      {
>> -      checker->check_addr_expr (location, t, wi->stmt);
>> +      checker->check_addr_expr (&richloc, t, wi->stmt);
>>        *walk_subtree = false;
>>      }
>>    else if (inbounds_memaccess_p (t, wi->stmt))
> 
> [...snip...]
> 
>> diff --git a/gcc/gimple-array-bounds.h b/gcc/gimple-array-bounds.h
>> index aa7ca8e9730..c2f7a017fc2 100644
>> --- a/gcc/gimple-array-bounds.h
>> +++ b/gcc/gimple-array-bounds.h
>> @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #define GCC_GIMPLE_ARRAY_BOUNDS_H
>>  
>>  #include "pointer-query.h"
>> +#include "gcc-rich-location.h"
>>  
>>  class array_bounds_checker
>>  {
>> @@ -32,9 +33,10 @@ public:
>>  
>>  private:
>>    static tree check_array_bounds (tree *tp, int *walk_subtree, void
>> *data);
>> -  bool check_array_ref (location_t, tree, gimple *, bool
>> ignore_off_by_one);
>> -  bool check_mem_ref (location_t, tree, bool ignore_off_by_one);
>> -  void check_addr_expr (location_t, tree, gimple *);
>> +  bool check_array_ref (gcc_rich_location *, tree, gimple *,
>> +                       bool ignore_off_by_one);
> 
> Note that these could just take a rich_location *, I think, and drop
> the #include above - I don't think there's anything you're using in
> gcc_rich_location that isn't in the rich_location base class.

Okay, will update this.
> 
>> +  bool check_mem_ref (gcc_rich_location *, tree, bool
>> ignore_off_by_one);
>> +  void check_addr_expr (gcc_rich_location *, tree, gimple *);
>>    void get_value_range (irange &r, const_tree op, gimple *);
>>  
>>    /* Current function.  */
> 
> [...snip...]
> 
> 
> Hope this is constructive

Thanks a lot for all your comment and suggestions, very helpful.

Qing
> Dave
diff mbox series

Patch

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7d3ea27a6ab..36a5b5fa305 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1436,6 +1436,7 @@  OBJS = \
 	df-problems.o \
 	df-scan.o \
 	dfp.o \
+	diagnostic-copy-history.o \
 	digraph.o \
 	dojump.o \
 	dominance.o \
@@ -1821,7 +1822,8 @@  OBJS = \
 
 # Objects in libcommon.a, potentially used by all host binaries and with
 # no target dependencies.
-OBJS-libcommon = diagnostic-spec.o diagnostic.o diagnostic-color.o \
+OBJS-libcommon = diagnostic-spec.o \
+	diagnostic.o diagnostic-color.o \
 	diagnostic-format-json.o \
 	diagnostic-format-sarif.o \
 	diagnostic-global-context.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 327230967ea..833cc2f4da4 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1558,6 +1558,10 @@  fdiagnostics-minimum-margin-width=
 Common Joined UInteger Var(diagnostics_minimum_margin_width) Init(6)
 Set minimum width of left margin of source code when showing source.
 
+fdiagnostics-try-to-explain-harder
+Common Var(flag_diagnostics_try_to_explain_harder)
+Collect and print more context information for diagnostics.
+
 fdisable-
 Common Joined RejectNegative Var(common_deferred_options) Defer
 -fdisable-[tree|rtl|ipa]-<pass>=range1+range2	Disable an optimization pass.
diff --git a/gcc/diagnostic-copy-history.cc b/gcc/diagnostic-copy-history.cc
new file mode 100644
index 00000000000..2733d137622
--- /dev/null
+++ b/gcc/diagnostic-copy-history.cc
@@ -0,0 +1,163 @@ 
+/* Functions to handle copy history.
+   Copyright (C) 2024-2024 Free Software Foundation, Inc.
+   Contributed by Qing Zhao <qing.zhao@oracle.com>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it under
+   the terms of the GNU General Public License as published by the Free
+   Software Foundation; either version 3, or (at your option) any later
+   version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or
+   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+   for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "diagnostic-copy-history.h"
+
+/* A mapping from a gimple to a pointer to the copy history of it.  */
+static copy_history_map_t *copy_history_map;
+
+/* Obstack for copy history.  */
+static struct obstack copy_history_obstack;
+
+/* Create a new copy history.  */
+
+copy_history_t
+create_copy_history (location_t condition,
+		     bool is_true_path,
+		     enum copy_reason reason,
+		     copy_history_t prev_copy)
+{
+  static bool initialized = false;
+
+  if (!initialized)
+    {
+      gcc_obstack_init (&copy_history_obstack);
+      initialized = true;
+    }
+
+  copy_history_t copy_history
+    = (copy_history_t) obstack_alloc (&copy_history_obstack,
+				      sizeof (struct copy_history));
+  copy_history->condition = condition;
+  copy_history->is_true_path = is_true_path;
+  copy_history->reason = reason;
+  copy_history->prev_copy = prev_copy;
+  return copy_history;
+}
+
+/* Insert the copy history for the gimple STMT assuming the linked list
+   of CP_HISTORY does not have duplications.  It's the caller's
+   responsibility to make sure that the linked list of CP_HISTORY does
+   not have duplications.  */
+
+void
+insert_copy_history (gimple *stmt, copy_history_t cp_history)
+{
+  if (!copy_history_map)
+    copy_history_map = new copy_history_map_t;
+
+  copy_history_map->put (stmt, cp_history);
+  return;
+}
+
+/* Get the copy history for the gimple STMT, return NULL when there is
+   no associated copy history.  */
+
+copy_history_t
+get_copy_history (gimple *stmt)
+{
+  if (!copy_history_map)
+    return NULL;
+
+  if (const copy_history_t *cp_history_p = copy_history_map->get (stmt))
+    return *cp_history_p;
+
+  return NULL;
+}
+
+/* Remove the copy history for STMT.  */
+
+void
+remove_copy_history (gimple *stmt)
+{
+  if (!copy_history_map)
+    return;
+  copy_history_map->remove (stmt);
+  return;
+}
+
+/* Check whether the cond_location, is_true_path and reason existed
+ * in the OLD_COPY_HISTORY.  */
+
+static bool
+is_copy_history_existed (location_t cond_location, bool is_true_path,
+			 enum copy_reason reason,
+			 copy_history_t old_copy_history)
+{
+  for (copy_history_t cur_ch = old_copy_history; cur_ch;
+       cur_ch = cur_ch->prev_copy)
+    if ((cur_ch->condition == cond_location)
+	&& (cur_ch->is_true_path == is_true_path)
+	&& (cur_ch->reason == reason))
+      return true;
+
+  return false;
+}
+
+/* Set copy history for the gimple STMT.  Return TRUE when a new copy
+ * history is created and inserted.  Otherwise return FALSE.  */
+
+bool
+set_copy_history (gimple *stmt, location_t cond_location,
+		  bool is_true_path, enum copy_reason reason)
+{
+
+  /* First, get the old copy history associated with this STMT.  */
+  copy_history_t old_cp_history = get_copy_history (stmt);
+
+  /* If the current copy history is not in the STMT's copy history linked
+     list yet, create the new copy history, put the old_copy_history as the
+     prev_copy of it.  */
+  copy_history_t new_cp_history = NULL;
+  if (!is_copy_history_existed (cond_location, is_true_path,
+				reason, old_cp_history))
+    new_cp_history
+      = create_copy_history (cond_location, is_true_path,
+			     reason, old_cp_history);
+
+  /* Insert the copy history into the hash map.  */
+  if (new_cp_history)
+    {
+      insert_copy_history (stmt, new_cp_history);
+      return true;
+    }
+
+  return false;
+}
+
+/* Reset all state for diagnostic-copy-history.cc so that we can rerun the
+   compiler within the same process.  For use by toplev::finalize.  */
+
+void
+copy_history_finalize (void)
+{
+  if (copy_history_map)
+    {
+      delete copy_history_map;
+      copy_history_map = NULL;
+    }
+  obstack_free (&copy_history_obstack, NULL);
+  return;
+}
diff --git a/gcc/diagnostic-copy-history.h b/gcc/diagnostic-copy-history.h
new file mode 100644
index 00000000000..98c95388070
--- /dev/null
+++ b/gcc/diagnostic-copy-history.h
@@ -0,0 +1,81 @@ 
+/* Copy history associated with a gimple statement to record its history
+   of duplications due to different transformations.
+   The copy history will be used to construct events for later diagnostic.
+
+   Copyright (C) 2024-2024 Free Software Foundation, Inc.
+   Contributed by Qing Zhao <qing.zhao@oracle.com>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it under
+   the terms of the GNU General Public License as published by the Free
+   Software Foundation; either version 3, or (at your option) any later
+   version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or
+   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+   for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef DIAGNOSTIC_COPY_HISTORY_H_INCLUDED
+#define DIAGNOSTIC_COPY_HISTORY_H_INCLUDED
+
+#include "hash-map.h"
+#include "line-map.h"
+#include "gimple.h"
+
+/* An enum for the reason why this copy is made.  Right now, there are
+   three reasons, we can add more if needed.  */
+enum copy_reason {
+  COPY_BY_THREAD_JUMP,
+  COPY_BY_LOOP_UNSWITCH,
+  COPY_BY_LOOP_UNROLL
+};
+
+/* This data structure records the information when a statement is
+   duplicated during different transformations.  Such information
+   will be used by the later diagnostic messages to report more
+   contexts of the warnings or errors.  */
+struct copy_history {
+  /* The location of the condition statement that triggered the code
+     duplication.  */
+  location_t condition;
+
+  /* Whether this copy is on the TRUE path of the condition.  */
+  bool is_true_path;
+
+  /* The reason for the code duplication.  */
+  enum copy_reason reason;
+
+  /* This statement itself might be a previous code duplication.  */
+  struct copy_history *prev_copy;
+};
+
+typedef struct copy_history *copy_history_t;
+
+/* Create a new copy history.  */
+extern copy_history_t create_copy_history (location_t, bool,
+					   enum copy_reason, copy_history_t);
+
+typedef hash_map<gimple *, copy_history_t> copy_history_map_t;
+
+/* Get the copy history for the gimple STMT, return NULL when there is
+   no associated copy history.  */
+extern copy_history_t get_copy_history (gimple *);
+
+/* Remove the copy history for STMT from the copy_history_map.  */
+extern void remove_copy_history (gimple *);
+
+/* Set copy history for the gimple STMT.  */
+extern bool set_copy_history (gimple *, location_t,
+			      bool, enum copy_reason);
+
+/* Reset all state for diagnostic-copy-history.cc so that we can rerun the
+   compiler within the same process.  For use by toplev::finalize.  */
+extern void copy_history_finalize (void);
+
+#endif // DIAGNOSTIC_COPY_HISTORY_H_INCLUDED
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 30c4b002d1f..e1509950165 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -321,6 +321,7 @@  Objective-C and Objective-C++ Dialects}.
 -fdiagnostics-column-origin=@var{origin}
 -fdiagnostics-escape-format=@r{[}unicode@r{|}bytes@r{]}
 -fdiagnostics-text-art-charset=@r{[}none@r{|}ascii@r{|}unicode@r{|}emoji@r{]}}
+-fdiagnostics-try-to-explain-harder
 
 @item Warning Options
 @xref{Warning Options,,Options to Request or Suppress Warnings}.
@@ -5517,6 +5518,12 @@  left margin.
 This option controls the minimum width of the left margin printed by
 @option{-fdiagnostics-show-line-numbers}.  It defaults to 6.
 
+@opindex fdiagnostics-try-to-explain-harder
+@item -fdiagnostics-try-to-explain-harder
+With this option, the compiler collects more context information for
+diagnostics and emits them to the users to provide more hints on how
+the diagnostics come from.
+
 @opindex fdiagnostics-parseable-fixits
 @item -fdiagnostics-parseable-fixits
 Emit fix-it hints in a machine-parseable format, suitable for consumption
diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
index 1637a2fc4f4..9050b06bf1f 100644
--- a/gcc/gimple-array-bounds.cc
+++ b/gcc/gimple-array-bounds.cc
@@ -31,6 +31,8 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-dfa.h"
 #include "fold-const.h"
 #include "diagnostic-core.h"
+#include "simple-diagnostic-path.h"
+#include "diagnostic-copy-history.h"
 #include "intl.h"
 #include "tree-vrp.h"
 #include "alloc-pool.h"
@@ -261,7 +263,7 @@  get_up_bounds_for_array_ref (tree ref, tree *decl,
    return TRUE if warnings are issued.  */
 
 static bool
-check_out_of_bounds_and_warn (location_t location, tree ref,
+check_out_of_bounds_and_warn (gcc_rich_location *richloc, tree ref,
 			      tree low_sub_org, tree low_sub, tree up_sub,
 			      tree up_bound, tree up_bound_p1,
 			      const irange *vr,
@@ -280,7 +282,7 @@  check_out_of_bounds_and_warn (location_t location, tree ref,
     {
       *out_of_bound = true;
       if (for_array_bound)
-	warned = warning_at (location, OPT_Warray_bounds_,
+	warned = warning_at (richloc, OPT_Warray_bounds_,
 			     "array subscript %E is outside array"
 			     " bounds of %qT", low_sub_org, artype);
     }
@@ -299,7 +301,7 @@  check_out_of_bounds_and_warn (location_t location, tree ref,
 	{
 	  *out_of_bound = true;
 	  if (for_array_bound)
-	    warned = warning_at (location, OPT_Warray_bounds_,
+	    warned = warning_at (richloc, OPT_Warray_bounds_,
 				 "array subscript [%E, %E] is outside "
 				 "array bounds of %qT",
 				 low_sub, up_sub, artype);
@@ -313,7 +315,7 @@  check_out_of_bounds_and_warn (location_t location, tree ref,
     {
       *out_of_bound = true;
       if (for_array_bound)
-	warned = warning_at (location, OPT_Warray_bounds_,
+	warned = warning_at (richloc, OPT_Warray_bounds_,
 			     "array subscript %E is above array bounds of %qT",
 			     up_sub, artype);
     }
@@ -322,7 +324,7 @@  check_out_of_bounds_and_warn (location_t location, tree ref,
     {
       *out_of_bound = true;
       if (for_array_bound)
-	warned = warning_at (location, OPT_Warray_bounds_,
+	warned = warning_at (richloc, OPT_Warray_bounds_,
 			     "array subscript %E is below array bounds of %qT",
 			     low_sub, artype);
     }
@@ -338,7 +340,7 @@  check_out_of_bounds_and_warn (location_t location, tree ref,
    no-warning is set.  */
 
 bool
-array_bounds_checker::check_array_ref (location_t location, tree ref,
+array_bounds_checker::check_array_ref (gcc_rich_location *richloc , tree ref,
 				       gimple *stmt, bool ignore_off_by_one)
 {
   if (warning_suppressed_p (ref, OPT_Warray_bounds_))
@@ -388,15 +390,14 @@  array_bounds_checker::check_array_ref (location_t location, tree ref,
 	}
     }
 
-  warned = check_out_of_bounds_and_warn (location, ref,
+  warned = check_out_of_bounds_and_warn (richloc, ref,
 					 low_sub_org, low_sub, up_sub,
 					 up_bound, up_bound_p1, &vr,
 					 ignore_off_by_one, warn_array_bounds,
 					 &out_of_bound);
 
-
   if (!warned && sam == special_array_member::int_0)
-    warned = warning_at (location, OPT_Wzero_length_bounds,
+    warned = warning_at (richloc, OPT_Wzero_length_bounds,
 			 (TREE_CODE (low_sub) == INTEGER_CST
 			  ? G_("array subscript %E is outside the bounds "
 			       "of an interior zero-length array %qT")
@@ -420,7 +421,7 @@  array_bounds_checker::check_array_ref (location_t location, tree ref,
       && DECL_NOT_FLEXARRAY (afield_decl))
     {
       bool warned1
-	= warning_at (location, OPT_Wstrict_flex_arrays,
+	= warning_at (richloc, OPT_Wstrict_flex_arrays,
 		      "trailing array %qT should not be used as "
 		      "a flexible array member",
 		      artype);
@@ -477,7 +478,7 @@  array_bounds_checker::check_array_ref (location_t location, tree ref,
    Returns true if a warning has been issued.  */
 
 bool
-array_bounds_checker::check_mem_ref (location_t location, tree ref,
+array_bounds_checker::check_mem_ref (gcc_rich_location *richloc, tree ref,
 				     bool ignore_off_by_one)
 {
   if (warning_suppressed_p (ref, OPT_Warray_bounds_))
@@ -580,12 +581,12 @@  array_bounds_checker::check_mem_ref (location_t location, tree ref,
   if (lboob)
     {
       if (offrange[0] == offrange[1])
-	warned = warning_at (location, OPT_Warray_bounds_,
+	warned = warning_at (richloc, OPT_Warray_bounds_,
 			     "array subscript %wi is outside array bounds "
 			     "of %qT",
 			     offrange[0].to_shwi (), reftype);
       else
-	warned = warning_at (location, OPT_Warray_bounds_,
+	warned = warning_at (richloc, OPT_Warray_bounds_,
 			     "array subscript [%wi, %wi] is outside "
 			     "array bounds of %qT",
 			     offrange[0].to_shwi (),
@@ -600,7 +601,7 @@  array_bounds_checker::check_mem_ref (location_t location, tree ref,
 	backtype = build_array_type_nelts (unsigned_char_type_node,
 					   aref.sizrng[1].to_uhwi ());
 
-      warned = warning_at (location, OPT_Warray_bounds_,
+      warned = warning_at (richloc, OPT_Warray_bounds_,
 			   "array subscript %<%T[%wi]%> is partly "
 			   "outside array bounds of %qT",
 			   axstype, offrange[0].to_shwi (), backtype);
@@ -623,7 +624,7 @@  array_bounds_checker::check_mem_ref (location_t location, tree ref,
     {
       HOST_WIDE_INT tmpidx = (aref.offmax[i] / eltsize).to_shwi ();
 
-      if (warning_at (location, OPT_Warray_bounds_,
+      if (warning_at (richloc, OPT_Warray_bounds_,
 		      "intermediate array offset %wi is outside array bounds "
 		      "of %qT", tmpidx, reftype))
 	{
@@ -639,7 +640,7 @@  array_bounds_checker::check_mem_ref (location_t location, tree ref,
    address of an ARRAY_REF, and call check_array_ref on it.  */
 
 void
-array_bounds_checker::check_addr_expr (location_t location, tree t,
+array_bounds_checker::check_addr_expr (gcc_rich_location *richloc, tree t,
 				       gimple *stmt)
 {
   /* For the most significant subscript only, accept taking the address
@@ -652,11 +653,11 @@  array_bounds_checker::check_addr_expr (location_t location, tree t,
       bool warned = false;
       if (TREE_CODE (t) == ARRAY_REF)
 	{
-	  warned = check_array_ref (location, t, stmt, ignore_off_by_one);
+	  warned = check_array_ref (richloc, t, stmt, ignore_off_by_one);
 	  ignore_off_by_one = false;
 	}
       else if (TREE_CODE (t) == MEM_REF)
-	warned = check_mem_ref (location, t, ignore_off_by_one);
+	warned = check_mem_ref (richloc, t, ignore_off_by_one);
 
       if (warned)
 	suppress_warning (t, OPT_Warray_bounds_);
@@ -702,7 +703,7 @@  array_bounds_checker::check_addr_expr (location_t location, tree t,
 	  dump_generic_expr (MSG_NOTE, TDF_SLIM, t);
 	  fprintf (dump_file, "\n");
 	}
-      warned = warning_at (location, OPT_Warray_bounds_,
+      warned = warning_at (richloc, OPT_Warray_bounds_,
 			   "array subscript %wi is below "
 			   "array bounds of %qT",
 			   idx.to_shwi (), TREE_TYPE (tem));
@@ -716,7 +717,7 @@  array_bounds_checker::check_addr_expr (location_t location, tree t,
 	  dump_generic_expr (MSG_NOTE, TDF_SLIM, t);
 	  fprintf (dump_file, "\n");
 	}
-      warned = warning_at (location, OPT_Warray_bounds_,
+      warned = warning_at (richloc, OPT_Warray_bounds_,
 			   "array subscript %wu is above "
 			   "array bounds of %qT",
 			   idx.to_uhwi (), TREE_TYPE (tem));
@@ -801,6 +802,21 @@  array_bounds_checker::check_array_bounds (tree *tp, int *walk_subtree,
   else
     location = gimple_location (wi->stmt);
 
+  /* Generate a rich location for this location.  */
+  gcc_rich_location richloc (location);
+  simple_diagnostic_path path (global_dc->printer);
+
+  /* Add events to this diagnostic_path per the copy_history attached to
+     the corresponding STMT.  */
+  copy_history_t cp_history = wi->stmt ? get_copy_history (wi->stmt) : NULL;
+  for (copy_history_t cur_ch = cp_history; cur_ch;
+       cur_ch = cur_ch->prev_copy)
+    path.add_event (cur_ch->condition, NULL_TREE, 0,
+		    "when the condition is evaluated to %s",
+		    cur_ch->is_true_path ? "true" : "false");
+
+  richloc.set_path (&path);
+
   *walk_subtree = true;
 
   bool warned = false;
@@ -808,14 +824,14 @@  array_bounds_checker::check_array_bounds (tree *tp, int *walk_subtree,
   gcc_assert (checker->m_stmt == wi->stmt);
 
   if (TREE_CODE (t) == ARRAY_REF)
-    warned = checker->check_array_ref (location, t, wi->stmt,
+    warned = checker->check_array_ref (&richloc, t, wi->stmt,
 				       false/*ignore_off_by_one*/);
   else if (TREE_CODE (t) == MEM_REF)
-    warned = checker->check_mem_ref (location, t,
+    warned = checker->check_mem_ref (&richloc, t,
 				     false /*ignore_off_by_one*/);
   else if (TREE_CODE (t) == ADDR_EXPR)
     {
-      checker->check_addr_expr (location, t, wi->stmt);
+      checker->check_addr_expr (&richloc, t, wi->stmt);
       *walk_subtree = false;
     }
   else if (inbounds_memaccess_p (t, wi->stmt))
diff --git a/gcc/gimple-array-bounds.h b/gcc/gimple-array-bounds.h
index aa7ca8e9730..c2f7a017fc2 100644
--- a/gcc/gimple-array-bounds.h
+++ b/gcc/gimple-array-bounds.h
@@ -21,6 +21,7 @@  along with GCC; see the file COPYING3.  If not see
 #define GCC_GIMPLE_ARRAY_BOUNDS_H
 
 #include "pointer-query.h"
+#include "gcc-rich-location.h"
 
 class array_bounds_checker
 {
@@ -32,9 +33,10 @@  public:
 
 private:
   static tree check_array_bounds (tree *tp, int *walk_subtree, void *data);
-  bool check_array_ref (location_t, tree, gimple *, bool ignore_off_by_one);
-  bool check_mem_ref (location_t, tree, bool ignore_off_by_one);
-  void check_addr_expr (location_t, tree, gimple *);
+  bool check_array_ref (gcc_rich_location *, tree, gimple *,
+			bool ignore_off_by_one);
+  bool check_mem_ref (gcc_rich_location *, tree, bool ignore_off_by_one);
+  void check_addr_expr (gcc_rich_location *, tree, gimple *);
   void get_value_range (irange &r, const_tree op, gimple *);
 
   /* Current function.  */
diff --git a/gcc/gimple-iterator.cc b/gcc/gimple-iterator.cc
index 93646262eac..472a79360d7 100644
--- a/gcc/gimple-iterator.cc
+++ b/gcc/gimple-iterator.cc
@@ -33,6 +33,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa.h"
 #include "value-prof.h"
 #include "gimplify.h"
+#include "diagnostic-copy-history.h"
 
 
 /* Mark the statement STMT as modified, and update it.  */
@@ -581,6 +582,8 @@  gsi_remove (gimple_stmt_iterator *i, bool remove_permanently)
 	cfun->debug_marker_count--;
       require_eh_edge_purge = remove_stmt_from_eh_lp (stmt);
       gimple_remove_stmt_histograms (cfun, stmt);
+      if (get_copy_history (stmt) != NULL)
+	remove_copy_history (stmt);
     }
 
   /* Update the iterator and re-wire the links in I->SEQ.  */
diff --git a/gcc/testsuite/gcc.dg/pr109071.c b/gcc/testsuite/gcc.dg/pr109071.c
new file mode 100644
index 00000000000..fd96a1a3a7f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr109071.c
@@ -0,0 +1,36 @@ 
+/* PR tree-optimization/109071 need more context for -Warray-bounds warnings
+   due to code duplication from jump threading.  */  
+/* { dg-options "-O2 -Wall -fdiagnostics-try-to-explain-harder" } */
+/* { dg-additional-options "-fdiagnostics-show-line-numbers -fdiagnostics-path-format=inline-events -fdiagnostics-show-caret" } */
+/* { dg-enable-nn-line-numbers "" } */
+
+extern void warn(void);
+static inline void assign(int val, int *regs, int index)
+{
+  if (index >= 4)
+    warn();
+  *regs = val;
+}
+struct nums {int vals[4];};
+
+void sparx5_set (int *ptr, struct nums *sg, int index)
+{
+  int *val = &sg->vals[index]; /* { dg-warning "is above array bounds" } */
+
+  assign(0,    ptr, index);
+  assign(*val, ptr, index);
+}
+/* { dg-begin-multiline-output "" }
+   NN |   int *val = &sg->vals[index];
+      |               ~~~~~~~~^~~~~~~
+  event 1
+   NN |   if (index >= 4)
+      |      ^
+      |      |
+      |      (1) when the condition is evaluated to true
+   { dg-end-multiline-output "" } */
+
+/* { dg-begin-multiline-output "" }
+   NN | struct nums {int vals[4];};
+      |                  ^~~~
+   { dg-end-multiline-output "" } */
diff --git a/gcc/toplev.cc b/gcc/toplev.cc
index f715f977b72..e9ae55f64be 100644
--- a/gcc/toplev.cc
+++ b/gcc/toplev.cc
@@ -41,6 +41,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "cgraph.h"
 #include "coverage.h"
 #include "diagnostic.h"
+#include "diagnostic-copy-history.h"
 #include "varasm.h"
 #include "tree-inline.h"
 #include "realmpfr.h"	/* For GMP/MPFR/MPC versions, in print_version.  */
@@ -2381,6 +2382,8 @@  toplev::finalize (void)
   tree_cc_finalize ();
   reginfo_cc_finalize ();
 
+  copy_history_finalize ();
+
   /* save_decoded_options uses opts_obstack, so these must
      be cleaned up together.  */
   obstack_free (&opts_obstack, NULL);
diff --git a/gcc/tree-ssa-threadupdate.cc b/gcc/tree-ssa-threadupdate.cc
index fa61ba9512b..78ee17da3a1 100644
--- a/gcc/tree-ssa-threadupdate.cc
+++ b/gcc/tree-ssa-threadupdate.cc
@@ -36,6 +36,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-cfg.h"
 #include "tree-vectorizer.h"
 #include "tree-pass.h"
+#include "diagnostic-copy-history.h"
 
 /* Given a block B, update the CFG and SSA graph to reflect redirecting
    one or more in-edges to B to instead reach the destination of an
@@ -2371,6 +2372,37 @@  back_jt_path_registry::adjust_paths_after_duplication (unsigned curr_path_num)
     }
 }
 
+/* Set copy history to all the stmts in the basic block BB based on
+   the entry edge ENTRY and whether this basic block is on the true
+   path of the condition in the destination of the edge ENTRY.
+   Return TRUE when copy history are set, otherwise return FALSE.  */
+
+static bool
+set_copy_history_to_stmts_in_bb (basic_block bb, edge entry, bool is_true_path)
+{
+  /* First, get the condition statement in the destination of the
+     edge ENTRY.  */
+  basic_block cond_block = entry->dest;
+  gimple *cond_stmt = NULL;
+  gimple_stmt_iterator gsi;
+  gsi = gsi_last_bb (cond_block);
+  /* if the cond_block ends with a conditional statement, get it.  */
+  if (!gsi_end_p (gsi)
+      && gsi_stmt (gsi)
+      && (gimple_code (gsi_stmt (gsi)) == GIMPLE_COND))
+    cond_stmt = gsi_stmt (gsi);
+
+  if (!cond_stmt)
+    return false;
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    set_copy_history (gsi_stmt (gsi),
+		      gimple_location (cond_stmt),
+		      is_true_path,
+		      COPY_BY_THREAD_JUMP);
+  return true;
+}
+
 /* Duplicates a jump-thread path of N_REGION basic blocks.
    The ENTRY edge is redirected to the duplicate of the region.
 
@@ -2419,6 +2451,16 @@  back_jt_path_registry::duplicate_thread_path (edge entry,
   copy_bbs (region, n_region, region_copy, &exit, 1, &exit_copy, loop,
 	    split_edge_bb_loc (entry), false);
 
+  /* Set the copy history for all the stmts in both original and copied
+     basic blocks.  The copied regions are in the true path.  */
+  if (flag_diagnostics_try_to_explain_harder)
+    for (i = 0; i < n_region; i++)
+      {
+	set_copy_history_to_stmts_in_bb (region[i], entry, false);
+	set_copy_history_to_stmts_in_bb (region_copy[i], entry, true);
+      }
+
+
   /* Fix up: copy_bbs redirects all edges pointing to copied blocks.  The
      following code ensures that all the edges exiting the jump-thread path are
      redirected back to the original code: these edges are exceptions