diff mbox

Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

Message ID CA+4CFy4eTkdzsh=xFukjduXXRvg6OWhMb9MOafBqEjWQEhSg9Q@mail.gmail.com
State New
Headers show

Commit Message

Wei Mi Nov. 25, 2013, 7:16 p.m. UTC
On Mon, Nov 25, 2013 at 10:36 AM, Jeff Law <law@redhat.com> wrote:
> On 11/24/13 00:30, Wei Mi wrote:
>>
>> Sorry about the problem.
>>
>> For the failed testcase, it was compiled using -fmodulo-sched.
>> modulo-sched phase set SCHED_GROUP_P of a jump insn to be true, which
>> means the jump insn should be scheduled with prev insn as a group.
>> When modulo scheduling is finished, the flag of SCHED_GROUP_P is not
>> cleaned up. After that, pass_jump2 phase split the bb and move the
>> prev insn to another bb. Then pass_sched2 see the flag and mistakenly
>> try to bind the jump insn with a code label.
>>
>> I am thinking other cases setting SCHED_GROUP_P should have the same
>> problem because SCHED_GROUP_P is not cleaned up after scheduling is
>> done. The flag may become inconsistent after some optimizations and
>> may cause problem if it is used by later scheduling passes. I don't
>> know why similar problem was never exposed before.
>>
>> The fix is to simply cleanup SCHED_GROUP_P flag in sched_finish.
>
> I think this is showing up because this is the first time we have used
> SCHED_GROUP_P in cases where we merely want to keep two instructions
> consecutive vs cases where we are required to keep certain instructions
> consecutive.  For example, all the RTL passes already know they need to keep
> a cc0 setter and cc0 user consecutive on a HAVE_cc0 target.
>
> In the latter case passes should already be doing what is necessary to keep
> those instructions consecutive.  In the former case, we'd have to audit &
> fix passes to honor the desire to keep certain instructions consecutive.
>

I see. Thanks for showing me the reason.

>>
>> bootstrap is ok. regression test is going on. Is it ok if regression
>> passes?
>>
>> Thanks,
>> Wei.
>>
>> 2013-11-23  Wei Mi  <wmi@google.com>
>>
>>          PR rtl-optimization/59020
>>          * haifa-sched.c (cleanup_sched_group): New function.
>>          (sched_finish): Call cleanup_sched_group to cleanup
>> SCHED_GROUP_P.
>>
>> 2013-11-23  Wei Mi  <wmi@google.com>
>>          PR rtl-optimization/59020
>>          * testsuite/gcc.dg/pr59020.c (void f):
>
> I'll note you're doing an extra pass over all the RTL here.   Is there any
> clean way you can clean SCHED_GROUP_P without that extra pass over the RTL?
> Perhaps when the group actually gets scheduled?
>
> jeff
>

With your help to understand that sched group will not be broken by
other passes in other cases, I can cleanup SCHED_GROUP_P for
macrofusion only by checking every condjump insn which is at the end
of BB. Then the cost will be in the same scale with bb nums. Do you
think it is ok?

Thanks,
Wei.

2013-11-25  Wei Mi  <wmi@google.com>

        PR rtl-optimization/59020
        * haifa-sched.c (cleanup_sched_group): New function.
        (sched_finish): Call cleanup_sched_group to cleanup SCHED_GROUP_P.

2013-11-25  Wei Mi  <wmi@google.com>
        PR rtl-optimization/59020
        * testsuite/gcc.dg/pr59020.c (void f):

Comments

Jeff Law Nov. 25, 2013, 7:25 p.m. UTC | #1
On 11/25/13 12:16, Wei Mi wrote:
>>
>> I'll note you're doing an extra pass over all the RTL here.   Is there any
>> clean way you can clean SCHED_GROUP_P without that extra pass over the RTL?
>> Perhaps when the group actually gets scheduled?
>>
>> jeff
>>
>
> With your help to understand that sched group will not be broken by
> other passes in other cases, I can cleanup SCHED_GROUP_P for
> macrofusion only by checking every condjump insn which is at the end
> of BB. Then the cost will be in the same scale with bb nums. Do you
> think it is ok?
>
> Thanks,
> Wei.
>
> 2013-11-25  Wei Mi  <wmi@google.com>
>
>          PR rtl-optimization/59020
>          * haifa-sched.c (cleanup_sched_group): New function.
>          (sched_finish): Call cleanup_sched_group to cleanup SCHED_GROUP_P.
>
> 2013-11-25  Wei Mi  <wmi@google.com>
>          PR rtl-optimization/59020
>          * testsuite/gcc.dg/pr59020.c (void f):
But there's nothing that requires the SCHED_GROUP_P to be at the end of 
a block.  The cc0-setter/cc0-user case was just an example.  Another 
example would be groups created around call insns on small register 
class machines.

ISTM that when an insn moves from the ready list to back to the main 
insn chain, that you can just clear SCHED_GROUP_P at that time.  Is that 
not the case?

jeff
Wei Mi Nov. 25, 2013, 7:48 p.m. UTC | #2
On Mon, Nov 25, 2013 at 11:25 AM, Jeff Law <law@redhat.com> wrote:
> On 11/25/13 12:16, Wei Mi wrote:
>>>
>>>
>>> I'll note you're doing an extra pass over all the RTL here.   Is there
>>> any
>>> clean way you can clean SCHED_GROUP_P without that extra pass over the
>>> RTL?
>>> Perhaps when the group actually gets scheduled?
>>>
>>> jeff
>>>
>>
>> With your help to understand that sched group will not be broken by
>> other passes in other cases, I can cleanup SCHED_GROUP_P for
>> macrofusion only by checking every condjump insn which is at the end
>> of BB. Then the cost will be in the same scale with bb nums. Do you
>> think it is ok?
>>
>> Thanks,
>> Wei.
>>
>> 2013-11-25  Wei Mi  <wmi@google.com>
>>
>>          PR rtl-optimization/59020
>>          * haifa-sched.c (cleanup_sched_group): New function.
>>          (sched_finish): Call cleanup_sched_group to cleanup
>> SCHED_GROUP_P.
>>
>> 2013-11-25  Wei Mi  <wmi@google.com>
>>          PR rtl-optimization/59020
>>          * testsuite/gcc.dg/pr59020.c (void f):
>
> But there's nothing that requires the SCHED_GROUP_P to be at the end of a
> block.  The cc0-setter/cc0-user case was just an example.  Another example
> would be groups created around call insns on small register class machines.
>

Doing the cleanup at the end of BB could ensure all the groups
inserted for macrofusion will be cleaned. For groups not at the end of
a block, no matter whether they are cleaned up or not, nothing will
happen because other passes will not mess up those groups -- you said
cc0-setter/cc0-user was such a case. Is it call group a different
case?

> ISTM that when an insn moves from the ready list to back to the main insn
> chain, that you can just clear SCHED_GROUP_P at that time.  Is that not the
> case?
>
> jeff
>

For sched1 and sched2, we can do that. Actually, I find it has been
done in move_insn when commit_schedule. But for modulo scheduling, I
havn't found a good place to do it.

Thanks,
Wei.
Jeff Law Nov. 25, 2013, 10:12 p.m. UTC | #3
>
> Doing the cleanup at the end of BB could ensure all the groups
> inserted for macrofusion will be cleaned. For groups not at the end of
> a block, no matter whether they are cleaned up or not, nothing will
> happen because other passes will not mess up those groups -- you said
> cc0-setter/cc0-user was such a case. Is it call group a different
> case?
True, it would be safe, but it seems inconsistent and confusing that 
some SCHED_GROUP_P references would be purged and others remain.

Given SCHED_GROUP_P is to used strictly in the scheduler ISTM that we 
should be wiping it as we leave and that our RTL checkers ought to be 
verifying there are no insns with SCHED_GROUP_P left on.

>
> For sched1 and sched2, we can do that. Actually, I find it has been
> done in move_insn when commit_schedule. But for modulo scheduling, I
> havn't found a good place to do it.
Well, that's where I'd suggest focusing attention.

jeff
Wei Mi Nov. 26, 2013, 7:33 p.m. UTC | #4
On Mon, Nov 25, 2013 at 2:12 PM, Jeff Law <law@redhat.com> wrote:
>
>>
>> Doing the cleanup at the end of BB could ensure all the groups
>> inserted for macrofusion will be cleaned. For groups not at the end of
>> a block, no matter whether they are cleaned up or not, nothing will
>> happen because other passes will not mess up those groups -- you said
>> cc0-setter/cc0-user was such a case. Is it call group a different
>> case?
>
> True, it would be safe, but it seems inconsistent and confusing that some
> SCHED_GROUP_P references would be purged and others remain.
>
> Given SCHED_GROUP_P is to used strictly in the scheduler ISTM that we should
> be wiping it as we leave and that our RTL checkers ought to be verifying
> there are no insns with SCHED_GROUP_P left on.
>

How about add a verifier TODO_verify_sched_group_flag similar as
TODO_verify_rtl_sharing, and add the verifier in the todo lists of all
the scheduling passes.

>
>>
>> For sched1 and sched2, we can do that. Actually, I find it has been
>> done in move_insn when commit_schedule. But for modulo scheduling, I
>> havn't found a good place to do it.
>
> Well, that's where I'd suggest focusing attention.
>
> jeff
>

After looking it carefully, even for sched1 and sched2, it is not ok
to depend on move_insn in commit_schedule to clean up all the
SCHED_GROUP_P, suppose a block is decided not to be scheduled by
dbg_cnt, then SCHED_GROUP_P inside the block will not be cleaned.
It is even more difficult to find a place inside SMS scheduling to do
the cleanup.

Any suggestions?

Thanks,
Wei.
Jeff Law Nov. 27, 2013, 5:34 a.m. UTC | #5
On 11/26/13 12:33, Wei Mi wrote:
> On Mon, Nov 25, 2013 at 2:12 PM, Jeff Law <law@redhat.com> wrote:
>>
>>>
>>> Doing the cleanup at the end of BB could ensure all the groups
>>> inserted for macrofusion will be cleaned. For groups not at the end of
>>> a block, no matter whether they are cleaned up or not, nothing will
>>> happen because other passes will not mess up those groups -- you said
>>> cc0-setter/cc0-user was such a case. Is it call group a different
>>> case?
>>
>> True, it would be safe, but it seems inconsistent and confusing that some
>> SCHED_GROUP_P references would be purged and others remain.
>>
>> Given SCHED_GROUP_P is to used strictly in the scheduler ISTM that we should
>> be wiping it as we leave and that our RTL checkers ought to be verifying
>> there are no insns with SCHED_GROUP_P left on.
>>
>
> How about add a verifier TODO_verify_sched_group_flag similar as
> TODO_verify_rtl_sharing, and add the verifier in the todo lists of all
> the scheduling passes.
>
>>
>>>
>>> For sched1 and sched2, we can do that. Actually, I find it has been
>>> done in move_insn when commit_schedule. But for modulo scheduling, I
>>> havn't found a good place to do it.
>>
>> Well, that's where I'd suggest focusing attention.
>>
>> jeff
>>
>
> After looking it carefully, even for sched1 and sched2, it is not ok
> to depend on move_insn in commit_schedule to clean up all the
> SCHED_GROUP_P, suppose a block is decided not to be scheduled by
> dbg_cnt, then SCHED_GROUP_P inside the block will not be cleaned.
> It is even more difficult to find a place inside SMS scheduling to do
> the cleanup.
>
> Any suggestions?
Hmm, maybe attack from the other direction? -- could we clear 
SCHED_GROUP_P for each insn at the start of this loop in sched_analyze?

It's not as clean in the sense that SCHED_GROUP_P "escapes" the 
scheduler, but it might be an option.

    for (insn = head;; insn = NEXT_INSN (insn))
     {

       if (INSN_P (insn))
         {
           /* And initialize deps_lists.  */
           sd_init_insn (insn);
         }

       deps_analyze_insn (deps, insn);

       if (insn == tail)
         {
           if (sched_deps_info->use_cselib)
             cselib_finish ();
           return;
         }
     }
Jeff
>
Wei Mi Nov. 27, 2013, 6:55 a.m. UTC | #6
On Tue, Nov 26, 2013 at 9:34 PM, Jeff Law <law@redhat.com> wrote:
> On 11/26/13 12:33, Wei Mi wrote:
>>
>> On Mon, Nov 25, 2013 at 2:12 PM, Jeff Law <law@redhat.com> wrote:
>>>
>>>
>>>>
>>>> Doing the cleanup at the end of BB could ensure all the groups
>>>> inserted for macrofusion will be cleaned. For groups not at the end of
>>>> a block, no matter whether they are cleaned up or not, nothing will
>>>> happen because other passes will not mess up those groups -- you said
>>>> cc0-setter/cc0-user was such a case. Is it call group a different
>>>> case?
>>>
>>>
>>> True, it would be safe, but it seems inconsistent and confusing that some
>>> SCHED_GROUP_P references would be purged and others remain.
>>>
>>> Given SCHED_GROUP_P is to used strictly in the scheduler ISTM that we
>>> should
>>> be wiping it as we leave and that our RTL checkers ought to be verifying
>>> there are no insns with SCHED_GROUP_P left on.
>>>
>>
>> How about add a verifier TODO_verify_sched_group_flag similar as
>> TODO_verify_rtl_sharing, and add the verifier in the todo lists of all
>> the scheduling passes.
>>
>>>
>>>>
>>>> For sched1 and sched2, we can do that. Actually, I find it has been
>>>> done in move_insn when commit_schedule. But for modulo scheduling, I
>>>> havn't found a good place to do it.
>>>
>>>
>>> Well, that's where I'd suggest focusing attention.
>>>
>>> jeff
>>>
>>
>> After looking it carefully, even for sched1 and sched2, it is not ok
>> to depend on move_insn in commit_schedule to clean up all the
>> SCHED_GROUP_P, suppose a block is decided not to be scheduled by
>> dbg_cnt, then SCHED_GROUP_P inside the block will not be cleaned.
>> It is even more difficult to find a place inside SMS scheduling to do
>> the cleanup.
>>
>> Any suggestions?
>
> Hmm, maybe attack from the other direction? -- could we clear SCHED_GROUP_P
> for each insn at the start of this loop in sched_analyze?
>
> It's not as clean in the sense that SCHED_GROUP_P "escapes" the scheduler,
> but it might be an option.
>
>    for (insn = head;; insn = NEXT_INSN (insn))
>     {
>
>       if (INSN_P (insn))
>         {
>           /* And initialize deps_lists.  */
>           sd_init_insn (insn);
>         }
>
>       deps_analyze_insn (deps, insn);
>
>       if (insn == tail)
>         {
>           if (sched_deps_info->use_cselib)
>             cselib_finish ();
>           return;
>         }
>     }
> Jeff
>>
>>
>

Thanks for the suggestion. It looks workable. Then I need to move the
SCHED_GROUP_P setting for macrofusion from sched_init to a place
inside sched_analyze after the SCHED_GROUP_P cleanup. It will be more
consistent with the settings for cc0 setter-user group and call group,
which are both inside sched_analyze.
I am trying this method...

Thanks,
Wei.
diff mbox

Patch

Index: haifa-sched.c
===================================================================
--- haifa-sched.c       (revision 204923)
+++ haifa-sched.c       (working copy)
@@ -6598,6 +6598,25 @@  group_insns_for_macro_fusion ()
     try_group_insn (BB_END (bb));
 }

+/* Cleanup SCHED_GROUP_P after scheduling is done. This is necessary because
+   bb may be changed by other optimizations and the flag from last scheduling
+   may become invalid. If later scheduler see the flag generated from last
+   scheduling, it may produces incorrect result.  */
+
+static void
+cleanup_sched_group ()
+{
+  basic_block bb;
+  rtx insn;
+
+  FOR_EACH_BB (bb)
+    {
+      insn = BB_END (bb);
+      if (INSN_P (insn) && SCHED_GROUP_P (insn))
+       SCHED_GROUP_P (insn) = 0;
+    }
+}
+
 /* Initialize some global state for the scheduler.  This function works
    with the common data shared between all the schedulers.  It is called
    from the scheduler specific initialization routine.  */
@@ -6841,6 +6860,8 @@  sched_finish (void)
     }
   free (curr_state);

+  cleanup_sched_group ();
+
   if (targetm.sched.finish_global)
     targetm.sched.finish_global (sched_dump, sched_verbose);

Index: testsuite/gcc.dg/pr59020.c
===================================================================
--- testsuite/gcc.dg/pr59020.c  (revision 0)
+++ testsuite/gcc.dg/pr59020.c  (revision 0)
@@ -0,0 +1,15 @@ 
+/* PR rtl-optimization/59020 */
+
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -fmodulo-sched -fno-inline -march=corei7" } */
+
+int a, b, d;
+unsigned c;
+
+void f()
+{
+  unsigned q;
+  for(; a; a++)
+    if(((c %= d && 1) ? : 1) & 1)
+      for(; b; q++);
+}