diff mbox

New automaton_option "collapse-ndfa"

Message ID 4E1B6671.5000405@codesourcery.com
State New
Headers show

Commit Message

Bernd Schmidt July 11, 2011, 9:09 p.m. UTC
On C6X, we want to use the "ndfa" option to give the scheduler maximum
freedom when assigning units to instructions. After scheduling is
complete, we process the insns again in c6x_reorg, looking at each cycle
and assigning a unit specifier to each instruction so that there is no
conflict within a cycle.

This works, except for a few reservations that span more than the first
cycle. To handle these properly, one possibility would be to consider
the entire scheduled block with a more complicated algorithm. While
feasible, I'd prefer not to go there for the moment.

I came up with the notion of adding a new transition to the NDFA, one
which collapses a nondeterministic state (which is composed of multiple
possible deterministic ones) to just one of its component states. This
can be done at the end of each cycle, and gives a state that can be
processed with cpu_unit_reservation_p to identify the units chosen by
the scheduler. The following patch implements this.

The new option also modifies the generation of advance-cycle transitions
so that they only exist in deterministic states. This matches the
expected use of the feature where we have a collapse-ndfa transition
before the end of each cycle (using the dfa_pre_cycle_insn hook).
state_transition now recognizes const0_rtx as the collapse-ndfa
transition (NULL_RTX was taken for advance-cycle).

Tested with 4.5 c6x-elf so far. I hope to commit the C6X port to
mainline soon and will retest the patch with that as well. IA64 is
another user of the "ndfa" option, but it failed to bootstrap a clean
tree when I tried it a few days ago, so I've only built a cross-cc1 and
examined the generated insn-automata.c before/after the patch. No
changes beyond slight expected reorganization in the code recognizing
NULL_RTX as advance-cycle, and no changes in the "ia64.dfa" file
generated with the "v" option.

Ok?


Bernd
* doc/md.texi (automata_option): Document collapse-ndfa.
	* genautomata.c (COLLAPSE_OPTION): New macro.
	(collapse_flag): New static variable.
	(struct description): New member normal_decls_num.
	(struct automaton): New members advance_ainsn and collapse_ainsn.
	(gen_automata_option): Check for COLLAPSE_OPTION.
	(collapse_ndfa_insn_decl): New static variable.
	(add_collapse_ndfa_insn_decl, special_decl_p): New functions.
	(find_arc): If insn is the collapse-ndfa insn, accept any arc we
	find.
	(transform_insn_regexps): Call add_collapse_ndfa_insn_decl if
	necessary.  Use normal_decls_num rather than decls_num, remove
	test for special decls.
	(create_alt_states, form_ainsn_with_same_reservs): Use
	special_decl_p.
	(make_automaton); Likewise.  Use the new advance_cycle_insn member
	of struct automaton.
	(create_composed_state): Disallow advance-cycle arcs if collapse_flag
	is set.
	(NDFA_to_DFA): Don't create composed states for the collapse-ndfa
	transition.  Create the necessary transitions for it.
	(create_ainsns): Return void.  Take an automaton_t argument, and
	update its ainsn_list, advance_ainsn and collapse_ainsn members.  All
	callers changed.
	(COLLAPSE_NDFA_VALUE_NAME): New macro.
	(output_tables): Output code to define it.
	(output_internal_insn_code_evaluation): Output code to accept
	const0_rtx as collapse-ndfa transition.
	(output_default_latencies, output_print_reservation_func,
	output_print_description): Reorganize loops to use normal_decls_num
	as loop bound; remove special case for advance_cycle_insn_decl.
	(initiate_automaton_gen): Handle COLLAPSE_OPTION.
	(check_automata_insn_issues): Check for collapse_ainsn.
	(expand_automate): Allocate sufficient space.  Initialize
	normal_decls_num.

Comments

Vladimir Makarov July 18, 2011, 3:09 p.m. UTC | #1
On 07/11/2011 05:09 PM, Bernd Schmidt wrote:
> On C6X, we want to use the "ndfa" option to give the scheduler maximum
> freedom when assigning units to instructions. After scheduling is
> complete, we process the insns again in c6x_reorg, looking at each cycle
> and assigning a unit specifier to each instruction so that there is no
> conflict within a cycle.
>
> This works, except for a few reservations that span more than the first
> cycle. To handle these properly, one possibility would be to consider
> the entire scheduled block with a more complicated algorithm. While
> feasible, I'd prefer not to go there for the moment.
>
> I came up with the notion of adding a new transition to the NDFA, one
> which collapses a nondeterministic state (which is composed of multiple
> possible deterministic ones) to just one of its component states. This
> can be done at the end of each cycle, and gives a state that can be
> processed with cpu_unit_reservation_p to identify the units chosen by
> the scheduler. The following patch implements this.
>
> The new option also modifies the generation of advance-cycle transitions
> so that they only exist in deterministic states. This matches the
> expected use of the feature where we have a collapse-ndfa transition
> before the end of each cycle (using the dfa_pre_cycle_insn hook).
> state_transition now recognizes const0_rtx as the collapse-ndfa
> transition (NULL_RTX was taken for advance-cycle).
>
> Tested with 4.5 c6x-elf so far. I hope to commit the C6X port to
> mainline soon and will retest the patch with that as well. IA64 is
> another user of the "ndfa" option, but it failed to bootstrap a clean
> tree when I tried it a few days ago, so I've only built a cross-cc1 and
> examined the generated insn-automata.c before/after the patch. No
> changes beyond slight expected reorganization in the code recognizing
> NULL_RTX as advance-cycle, and no changes in the "ia64.dfa" file
> generated with the "v" option.
>
> Ok?
>
Bernd, sorry for the delay.  I was on vacation last week.

The patch is ok for me.  Thanks, Bernd.

IA64 actually uses the two automata,  one automaton can be considered 
NDFA and is used for scheduling, another one can be considered as DFA 
and used for insn bundling.

I should acknowledge your solution is not trivial and more elegant and 
permitting to use smaller description.

It would be interesting to rewrite IA64 in your way.  Although I think 
it will be never done because it is nontrivial work to do and probably 
better not to fix something when it already works.
Bernd Schmidt July 18, 2011, 3:27 p.m. UTC | #2
On 07/18/11 17:09, Vladimir Makarov wrote:
> On 07/11/2011 05:09 PM, Bernd Schmidt wrote:
>> I came up with the notion of adding a new transition to the NDFA, one
>> which collapses a nondeterministic state (which is composed of multiple
>> possible deterministic ones) to just one of its component states. This
>> can be done at the end of each cycle, and gives a state that can be
>> processed with cpu_unit_reservation_p to identify the units chosen by
>> the scheduler. The following patch implements this.

> Bernd, sorry for the delay.  I was on vacation last week.

No problem. A week is rapid :)

> IA64 actually uses the two automata,  one automaton can be considered
> NDFA and is used for scheduling, another one can be considered as DFA
> and used for insn bundling.
> 
> I should acknowledge your solution is not trivial and more elegant and
> permitting to use smaller description.
> 
> It would be interesting to rewrite IA64 in your way.  Although I think
> it will be never done because it is nontrivial work to do and probably
> better not to fix something when it already works.

I'd been wondering about whether it could be usable for things like
ia64, but I don't actually understand the ia64 scheduling description
very well... On ia64 there's the additional complication that
end-of-cycle and end-of-bundle don't always coincide.
Probably no one cares enough about this target anyway to attempt such
changes.

I'll probably also need another new option, "no_comb_vectors", as I have
a slight problem with this loop:

 3884568992: 7379:  for (comb_vect_index = 1, i = vect_length; i <
comb_vect_els_num;
3884508354: 7380:       comb_vect_index++, i++)
        -: 7381:    {
3884508592: 7382:      comb_vect_mask = (comb_vect_mask << 1) | 1;
7769017184: 7383:      comb_vect_mask ^= (VEC_index (vect_el_t,
tab->comb_vect, i)
        -: 7384:                         == undefined_vect_el_value);
3884508592: 7385:      if ((vect_mask & comb_vect_mask) == 0)
        -: 7386:        goto found;
        -: 7387:    }

(Those are the numbers _before_ I make the changes that explode the
compile time). Any objection in principle to such an option, or any
ideas how to optimize this code? Having a description of a "comb vector"
in genautomata.c would be a good idea; I think I've figured out what
it's supposed to do, but it's not immediately obvious.


Bernd
Vladimir Makarov July 18, 2011, 4:47 p.m. UTC | #3
On 07/18/2011 11:27 AM, Bernd Schmidt wrote:
> On 07/18/11 17:09, Vladimir Makarov wrote:
>> On 07/11/2011 05:09 PM, Bernd Schmidt wrote:
>>> I came up with the notion of adding a new transition to the NDFA, one
>>> which collapses a nondeterministic state (which is composed of multiple
>>> possible deterministic ones) to just one of its component states. This
>>> can be done at the end of each cycle, and gives a state that can be
>>> processed with cpu_unit_reservation_p to identify the units chosen by
>>> the scheduler. The following patch implements this.
>> Bernd, sorry for the delay.  I was on vacation last week.
> No problem. A week is rapid :)
>
>> IA64 actually uses the two automata,  one automaton can be considered
>> NDFA and is used for scheduling, another one can be considered as DFA
>> and used for insn bundling.
>>
>> I should acknowledge your solution is not trivial and more elegant and
>> permitting to use smaller description.
>>
>> It would be interesting to rewrite IA64 in your way.  Although I think
>> it will be never done because it is nontrivial work to do and probably
>> better not to fix something when it already works.
> I'd been wondering about whether it could be usable for things like
> ia64, but I don't actually understand the ia64 scheduling description
> very well... On ia64 there's the additional complication that
> end-of-cycle and end-of-bundle don't always coincide.
> Probably no one cares enough about this target anyway to attempt such
> changes.
>
Yes, it is not such popular as it was when the description was written.  
It is better not touch it if it is not broken.

On the other hand, coming in 2012 Poulson i64 processor which will be 
12-wide (instead of 6 insns wide as current processors) will need a new 
description.  I think it could be an opportunity to rewrite the old one 
too.
> I'll probably also need another new option, "no_comb_vectors", as I have
> a slight problem with this loop:
>
>   3884568992: 7379:  for (comb_vect_index = 1, i = vect_length; i<
> comb_vect_els_num;
> 3884508354: 7380:       comb_vect_index++, i++)
>          -: 7381:    {
> 3884508592: 7382:      comb_vect_mask = (comb_vect_mask<<  1) | 1;
> 7769017184: 7383:      comb_vect_mask ^= (VEC_index (vect_el_t,
> tab->comb_vect, i)
>          -: 7384:                         == undefined_vect_el_value);
> 3884508592: 7385:      if ((vect_mask&  comb_vect_mask) == 0)
>          -: 7386:        goto found;
>          -: 7387:    }
>
> (Those are the numbers _before_ I make the changes that explode the
> compile time). Any objection in principle to such an option, or any
> ideas how to optimize this code? Having a description of a "comb vector"
> in genautomata.c would be a good idea; I think I've figured out what
> it's supposed to do, but it's not immediately obvious.
>
Comb-vectors is a widely used compression table technique.  It is used 
in (f)lex, (b)yacc/bison and as I remember it is described in the Dragon 
book.

There are a lot of compression table algorithms with different 
characteristics compression ratio, compression speed, access time.  
About 25 years ago I read an article overviewing about 30 such 
algorithms.  But I don't remember its name.

So I guess the alternative would be in usage of algorithm with the same 
access time, compression ratio, and faster compression speed.

But I guess comb-vector is popular for a reason.  We could tolerate slow 
compression time because it is done once but worse compression and 
slower access would have a really bad impact on the compiler time.
diff mbox

Patch

Index: doc/md.texi
===================================================================
--- doc/md.texi	(revision 176171)
+++ doc/md.texi	(working copy)
@@ -7859,6 +7859,16 @@  nondeterministic treatment means trying
 may be rejected by reservations in the subsequent insns.
 
 @item
+@dfn{collapse-ndfa} modifies the behaviour of the generator when
+producing an automaton.  An additional state transition to collapse a
+nondeterministic @acronym{NDFA} state to a deterministic @acronym{DFA}
+state is generated.  It can be triggered by passing @code{const0_rtx} to
+state_transition.  In such an automaton, cycle advance transitions are
+available only for these collapsed states.  This option is useful for
+ports that want to use the @code{ndfa} option, but also want to use
+@code{define_query_cpu_unit} to assign units to insns issued in a cycle.
+
+@item
 @dfn{progress} means output of a progress bar showing how many states
 were generated so far for automaton being processed.  This is useful
 during debugging a @acronym{DFA} description.  If you see too many
Index: genautomata.c
===================================================================
--- genautomata.c	(revision 176179)
+++ genautomata.c	(working copy)
@@ -251,6 +251,7 @@  static arc_t next_out_arc              (
 #define V_OPTION "-v"
 #define W_OPTION "-w"
 #define NDFA_OPTION "-ndfa"
+#define COLLAPSE_OPTION "-collapse-ndfa"
 #define PROGRESS_OPTION "-progress"
 
 /* The following flags are set up by function `initiate_automaton_gen'.  */
@@ -258,6 +259,11 @@  static arc_t next_out_arc              (
 /* Make automata with nondeterministic reservation by insns (`-ndfa').  */
 static int ndfa_flag;
 
+/* When making an NDFA, produce additional transitions that collapse
+   NDFA state into a deterministic one suitable for querying CPU units.
+   Provide avance-state transitions only for deterministic states.  */
+static int collapse_flag;
+
 /* Do not make minimization of DFA (`-no-minimization').  */
 static int no_minimization_flag;
 
@@ -603,7 +609,7 @@  struct regexp
    NDFA.  */
 struct description
 {
-  int decls_num;
+  int decls_num, normal_decls_num;
 
   /* The following fields are defined by checker.  */
 
@@ -623,9 +629,8 @@  struct description
   automaton_t first_automaton;
 
   /* The following field is created by pipeline hazard parser and
-     contains all declarations.  We allocate additional entry for
-     special insn "cycle advancing" which is added by the automaton
-     generator.  */
+     contains all declarations.  We allocate additional entries for
+     two special insns which are added by the automaton generator.  */
   decl_t decls [1];
 };
 
@@ -810,6 +815,9 @@  struct automaton
   /* The following field value is the list of insn declarations for
      given automaton.  */
   ainsn_t ainsn_list;
+  /* Pointers to the ainsns corresponding to the special reservations.  */
+  ainsn_t advance_ainsn, collapse_ainsn;
+
   /* The following field value is the corresponding automaton
      declaration.  This field is not NULL only if the automatic
      partition on automata is not used.  */
@@ -1528,6 +1536,8 @@  gen_automata_option (rtx def)
     w_flag = 1;
   else if (strcmp (XSTR (def, 0), NDFA_OPTION + 1) == 0)
     ndfa_flag = 1;
+  else if (strcmp (XSTR (def, 0), COLLAPSE_OPTION + 1) == 0)
+    collapse_flag = 1;
   else if (strcmp (XSTR (def, 0), PROGRESS_OPTION + 1) == 0)
     progress_flag = 1;
   else
@@ -3134,6 +3144,10 @@  static ticker_t all_time;
 
 /* Pseudo insn decl which denotes advancing cycle.  */
 static decl_t advance_cycle_insn_decl;
+/* Pseudo insn decl which denotes collapsing the NDFA state.  */
+static decl_t collapse_ndfa_insn_decl;
+
+/* Create and record a decl for the special advance-cycle transition.  */
 static void
 add_advance_cycle_insn_decl (void)
 {
@@ -3149,6 +3163,31 @@  add_advance_cycle_insn_decl (void)
   description->insns_num++;
 }
 
+/* Create and record a decl for the special collapse-NDFA transition.  */
+static void
+add_collapse_ndfa_insn_decl (void)
+{
+  collapse_ndfa_insn_decl = XCREATENODE (struct decl);
+  collapse_ndfa_insn_decl->mode = dm_insn_reserv;
+  collapse_ndfa_insn_decl->pos = no_pos;
+  DECL_INSN_RESERV (collapse_ndfa_insn_decl)->regexp = NULL;
+  DECL_INSN_RESERV (collapse_ndfa_insn_decl)->name = "$collapse_ndfa";
+  DECL_INSN_RESERV (collapse_ndfa_insn_decl)->insn_num
+    = description->insns_num;
+  description->decls [description->decls_num] = collapse_ndfa_insn_decl;
+  description->decls_num++;
+  description->insns_num++;
+}
+
+/* True if DECL is either of the two special decls we created.  */
+static bool
+special_decl_p (struct insn_reserv_decl *decl)
+{
+  return (decl == DECL_INSN_RESERV (advance_cycle_insn_decl)
+	  || (collapse_flag
+	      && decl == DECL_INSN_RESERV (collapse_ndfa_insn_decl)));
+}
+
 
 /* Abstract data `alternative states' which represents
    nondeterministic nature of the description (see comments for
@@ -3903,7 +3940,12 @@  find_arc (state_t from_state, state_t to
   arc_t arc;
 
   for (arc = first_out_arc (from_state); arc != NULL; arc = next_out_arc (arc))
-    if (arc->to_state == to_state && arc->insn == insn)
+    if (arc->insn == insn
+	&& (arc->to_state == to_state
+	    || (collapse_flag
+		/* Any arc is good enough for a collapse-ndfa transition.  */
+		&& (insn->insn_reserv_decl
+		    == DECL_INSN_RESERV (collapse_ndfa_insn_decl)))))
       return arc;
   return NULL;
 }
@@ -4888,12 +4930,14 @@  transform_insn_regexps (void)
 
   transform_time = create_ticker ();
   add_advance_cycle_insn_decl ();
+  if (collapse_flag)
+    add_collapse_ndfa_insn_decl ();
   if (progress_flag)
     fprintf (stderr, "Reservation transformation...");
-  for (i = 0; i < description->decls_num; i++)
+  for (i = 0; i < description->normal_decls_num; i++)
     {
       decl = description->decls [i];
-      if (decl->mode == dm_insn_reserv && decl != advance_cycle_insn_decl)
+      if (decl->mode == dm_insn_reserv)
 	DECL_INSN_RESERV (decl)->transformed_regexp
 	  = transform_regexp (copy_insn_regexp
 			      (DECL_INSN_RESERV (decl)->regexp));
@@ -5364,7 +5408,7 @@  create_alt_states (automaton_t automaton
        curr_ainsn = curr_ainsn->next_ainsn)
     {
       reserv_decl = curr_ainsn->insn_reserv_decl;
-      if (reserv_decl != DECL_INSN_RESERV (advance_cycle_insn_decl))
+      if (!special_decl_p (reserv_decl))
         {
           curr_ainsn->alt_states = NULL;
           process_alts_for_forming_states (reserv_decl->transformed_regexp,
@@ -5393,8 +5437,7 @@  form_ainsn_with_same_reservs (automaton_
   for (curr_ainsn = automaton->ainsn_list;
        curr_ainsn != NULL;
        curr_ainsn = curr_ainsn->next_ainsn)
-    if (curr_ainsn->insn_reserv_decl
-	== DECL_INSN_RESERV (advance_cycle_insn_decl))
+    if (special_decl_p (curr_ainsn->insn_reserv_decl))
       {
         curr_ainsn->next_same_reservs_insn = NULL;
         curr_ainsn->first_insn_with_same_reservs = 1;
@@ -5462,7 +5505,6 @@  make_automaton (automaton_t automaton)
   state_t state;
   state_t start_state;
   state_t state2;
-  ainsn_t advance_cycle_ainsn;
   VEC(state_t, heap) *state_stack = VEC_alloc(state_t, heap, 150);
   int states_n;
   reserv_sets_t reservs_matter = form_reservs_matter (automaton);
@@ -5476,14 +5518,13 @@  make_automaton (automaton_t automaton)
   while (VEC_length (state_t, state_stack) != 0)
     {
       state = VEC_pop (state_t, state_stack);
-      advance_cycle_ainsn = NULL;
       for (ainsn = automaton->ainsn_list;
 	   ainsn != NULL;
 	   ainsn = ainsn->next_ainsn)
         if (ainsn->first_insn_with_same_reservs)
           {
             insn_reserv_decl = ainsn->insn_reserv_decl;
-            if (insn_reserv_decl != DECL_INSN_RESERV (advance_cycle_insn_decl))
+            if (!special_decl_p (insn_reserv_decl))
               {
 		/* We process alt_states in the same order as they are
                    present in the description.  */
@@ -5510,8 +5551,6 @@  make_automaton (automaton_t automaton)
                       }
                   }
               }
-            else
-              advance_cycle_ainsn = ainsn;
           }
       /* Add transition to advance cycle.  */
       state2 = state_shift (state, reservs_matter);
@@ -5523,8 +5562,7 @@  make_automaton (automaton_t automaton)
 	  if (progress_flag && states_n % 100 == 0)
 	    fprintf (stderr, ".");
         }
-      gcc_assert (advance_cycle_ainsn);
-      add_arc (state, state2, advance_cycle_ainsn);
+      add_arc (state, state2, automaton->advance_ainsn);
     }
   VEC_free (state_t, heap, state_stack);
 }
@@ -5632,7 +5670,13 @@  create_composed_state (state_t original_
                 for (curr_arc = first_out_arc (curr_alt_state->state);
                      curr_arc != NULL;
                      curr_arc = next_out_arc (curr_arc))
-		  add_arc (state, curr_arc->to_state, curr_arc->insn);
+		  if (!collapse_flag
+		      /* When producing collapse-NDFA transitions, we
+			 only add advance-cycle transitions to the
+			 collapsed states.  */
+		      || (curr_arc->insn->insn_reserv_decl
+			  != DECL_INSN_RESERV (advance_cycle_insn_decl)))
+		    add_arc (state, curr_arc->to_state, curr_arc->insn);
             }
           arcs_marked_by_insn->to_state = state;
           for (alts_number = 0,
@@ -5682,6 +5726,7 @@  NDFA_to_DFA (automaton_t automaton)
 	{
 	  decl = description->decls [i];
 	  if (decl->mode == dm_insn_reserv
+	      && decl != collapse_ndfa_insn_decl
 	      && create_composed_state
 	         (state, DECL_INSN_RESERV (decl)->arcs_marked_by_insn,
 		  &state_stack))
@@ -5691,6 +5736,22 @@  NDFA_to_DFA (automaton_t automaton)
 		fprintf (stderr, ".");
 	    }
 	}
+      /* Add a transition to collapse the NDFA.  */
+      if (collapse_flag)
+	{
+	  if (state->component_states != NULL)
+	    {
+	      state_t state2 = state->component_states->state;
+	      if (!state2->it_was_placed_in_stack_for_DFA_forming)
+		{
+		  state2->it_was_placed_in_stack_for_DFA_forming = 1;
+		  VEC_safe_push (state_t, heap, state_stack, state2);
+		}
+	      add_arc (state, state2, automaton->collapse_ainsn);
+	    }
+	  else
+	    add_arc (state, state, automaton->collapse_ainsn);
+	}
     }
   VEC_free (state_t, heap, state_stack);
 }
@@ -5746,8 +5807,7 @@  add_achieved_state (state_t state)
 
 /* The function sets up equivalence numbers of insns which mark all
    out arcs of STATE by equiv_class_num_1 (if ODD_ITERATION_FLAG has
-   nonzero value) or by equiv_class_num_2 of the destination state.
-   The function returns number of out arcs of STATE.  */
+   nonzero value) or by equiv_class_num_2 of the destination state.  */
 static void
 set_out_arc_insns_equiv_num (state_t state, int odd_iteration_flag)
 {
@@ -6515,8 +6575,8 @@  units_to_automata_heuristic_distr (void)
 /* The functions creates automaton insns for each automata.  Automaton
    insn is simply insn for given automaton which makes reservation
    only of units of the automaton.  */
-static ainsn_t
-create_ainsns (void)
+static void
+create_ainsns (automaton_t automaton)
 {
   decl_t decl;
   ainsn_t first_ainsn;
@@ -6539,10 +6599,14 @@  create_ainsns (void)
 	    first_ainsn = curr_ainsn;
 	  else
 	    prev_ainsn->next_ainsn = curr_ainsn;
+	  if (decl == advance_cycle_insn_decl)
+	    automaton->advance_ainsn = curr_ainsn;
+	  else if (decl == collapse_ndfa_insn_decl)
+	    automaton->collapse_ainsn = curr_ainsn;
 	  prev_ainsn = curr_ainsn;
 	}
     }
-  return first_ainsn;
+  automaton->ainsn_list = first_ainsn;
 }
 
 /* The function assigns automata to units according to constructions
@@ -6590,7 +6654,7 @@  create_automata (void)
            curr_automaton_num++, prev_automaton = curr_automaton)
         {
 	  curr_automaton = XCREATENODE (struct automaton);
-	  curr_automaton->ainsn_list = create_ainsns ();
+	  create_ainsns (curr_automaton);
 	  curr_automaton->corresponding_automaton_decl = NULL;
 	  curr_automaton->next_automaton = NULL;
           curr_automaton->automaton_order_num = curr_automaton_num;
@@ -6611,7 +6675,7 @@  create_automata (void)
 	      && DECL_AUTOMATON (decl)->automaton_is_used)
 	    {
 	      curr_automaton = XCREATENODE (struct automaton);
-	      curr_automaton->ainsn_list = create_ainsns ();
+	      create_ainsns (curr_automaton);
 	      curr_automaton->corresponding_automaton_decl
 		= DECL_AUTOMATON (decl);
 	      curr_automaton->next_automaton = NULL;
@@ -6628,7 +6692,7 @@  create_automata (void)
       if (curr_automaton_num == 0)
 	{
 	  curr_automaton = XCREATENODE (struct automaton);
-	  curr_automaton->ainsn_list = create_ainsns ();
+	  create_ainsns (curr_automaton);
 	  curr_automaton->corresponding_automaton_decl = NULL;
 	  curr_automaton->next_automaton = NULL;
 	  description->first_automaton = curr_automaton;
@@ -6857,10 +6921,11 @@  output_temp_chip_member_name (FILE *f, a
   output_chip_member_name (f, automaton);
 }
 
-/* This is name of macro value which is code of pseudo_insn
-   representing advancing cpu cycle.  Its value is used as internal
-   code unknown insn.  */
+/* This is name of macro value which is code of pseudo_insns
+   representing advancing cpu cycle and collapsing the NDFA.
+   Its value is used as internal code unknown insn.  */
 #define ADVANCE_CYCLE_VALUE_NAME "DFA__ADVANCE_CYCLE"
+#define COLLAPSE_NDFA_VALUE_NAME "NDFA__COLLAPSE"
 
 /* Output name of translate vector for given automaton.  */
 static void
@@ -7735,6 +7800,9 @@  output_tables (void)
     }
   fprintf (output_file, "\n#define %s %d\n\n", ADVANCE_CYCLE_VALUE_NAME,
            DECL_INSN_RESERV (advance_cycle_insn_decl)->insn_num);
+  if (collapse_flag)
+    fprintf (output_file, "\n#define %s %d\n\n", COLLAPSE_NDFA_VALUE_NAME,
+	     DECL_INSN_RESERV (collapse_ndfa_insn_decl)->insn_num);
 }
 
 /* The function outputs definition and value of PHR interface variable
@@ -8012,13 +8080,20 @@  output_internal_insn_code_evaluation (co
 				      const char *insn_code_name,
 				      int code)
 {
-  fprintf (output_file, "\n  if (%s != 0)\n    {\n", insn_name);
+  fprintf (output_file, "\n  if (%s == 0)\n", insn_name);
+  fprintf (output_file, "    %s = %s;\n\n",
+	   insn_code_name, ADVANCE_CYCLE_VALUE_NAME);
+  if (collapse_flag)
+    {
+      fprintf (output_file, "\n  else if (%s == const0_rtx)\n", insn_name);
+      fprintf (output_file, "    %s = %s;\n\n",
+	       insn_code_name, COLLAPSE_NDFA_VALUE_NAME);
+    }
+  fprintf (output_file, "\n  else\n    {\n");
   fprintf (output_file, "      %s = %s (%s);\n", insn_code_name,
 	   DFA_INSN_CODE_FUNC_NAME, insn_name);
-  fprintf (output_file, "      if (%s > %s)\n        return %d;\n",
+  fprintf (output_file, "      if (%s > %s)\n        return %d;\n    }\n",
 	   insn_code_name, ADVANCE_CYCLE_VALUE_NAME, code);
-  fprintf (output_file, "    }\n  else\n    %s = %s;\n\n",
-	   insn_code_name, ADVANCE_CYCLE_VALUE_NAME);
 }
 
 
@@ -8219,9 +8294,8 @@  output_default_latencies (void)
   fprintf (output_file, "  static const %s default_latencies[] =\n    {",
 	   tabletype);
 
-  for (i = 0, j = 0, col = 7; i < description->decls_num; i++)
-    if (description->decls[i]->mode == dm_insn_reserv
-	&& description->decls[i] != advance_cycle_insn_decl)
+  for (i = 0, j = 0, col = 7; i < description->normal_decls_num; i++)
+    if (description->decls[i]->mode == dm_insn_reserv)
       {
 	if ((col = (col+1) % 8) == 0)
 	  fputs ("\n     ", output_file);
@@ -8230,7 +8304,7 @@  output_default_latencies (void)
 	fprintf (output_file, "% 4d,",
 		 DECL_INSN_RESERV (decl)->default_latency);
       }
-  gcc_assert (j == DECL_INSN_RESERV (advance_cycle_insn_decl)->insn_num);
+  gcc_assert (j == description->insns_num - (collapse_flag ? 2 : 1));
   fputs ("\n    };\n", output_file);
 }
 
@@ -8411,10 +8485,10 @@  output_print_reservation_func (void)
   fputs ("  static const char *const reservation_names[] =\n    {",
 	 output_file);
 
-  for (i = 0, j = 0; i < description->decls_num; i++)
+  for (i = 0, j = 0; i < description->normal_decls_num; i++)
     {
       decl = description->decls [i];
-      if (decl->mode == dm_insn_reserv && decl != advance_cycle_insn_decl)
+      if (decl->mode == dm_insn_reserv)
 	{
 	  gcc_assert (j == DECL_INSN_RESERV (decl)->insn_num);
 	  j++;
@@ -8424,7 +8498,7 @@  output_print_reservation_func (void)
 	  finish_regexp_representation ();
 	}
     }
-  gcc_assert (j == DECL_INSN_RESERV (advance_cycle_insn_decl)->insn_num);
+  gcc_assert (j == description->insns_num - (collapse_flag ? 2 : 1));
 
   fprintf (output_file, "\n      \"%s\"\n    };\n  int %s;\n\n",
 	   NOTHING_NAME, INTERNAL_INSN_CODE_NAME);
@@ -8734,7 +8808,7 @@  output_description (void)
 	}
     }
   fprintf (output_description_file, "\n");
-  for (i = 0; i < description->decls_num; i++)
+  for (i = 0; i < description->normal_decls_num; i++)
     {
       decl = description->decls [i];
       if (decl->mode == dm_reserv)
@@ -8744,7 +8818,7 @@  output_description (void)
           output_regexp (DECL_RESERV (decl)->regexp);
           fprintf (output_description_file, "\n");
         }
-      else if (decl->mode == dm_insn_reserv && decl != advance_cycle_insn_decl)
+      else if (decl->mode == dm_insn_reserv)
         {
           fprintf (output_description_file, "insn reservation %s ",
 		   DECL_INSN_RESERV (decl)->name);
@@ -9150,6 +9224,8 @@  initiate_automaton_gen (int argc, char *
       w_flag = 1;
     else if (strcmp (argv [i], NDFA_OPTION) == 0)
       ndfa_flag = 1;
+    else if (strcmp (argv [i], COLLAPSE_OPTION) == 0)
+      collapse_flag = 1;
     else if (strcmp (argv [i], PROGRESS_OPTION) == 0)
       progress_flag = 1;
     else if (strcmp (argv [i], "-split") == 0)
@@ -9192,7 +9268,8 @@  check_automata_insn_issues (void)
       for (ainsn = automaton->ainsn_list;
 	   ainsn != NULL;
 	   ainsn = ainsn->next_ainsn)
-	if (ainsn->first_insn_with_same_reservs && !ainsn->arc_exists_p)
+	if (ainsn->first_insn_with_same_reservs && !ainsn->arc_exists_p
+	    && ainsn != automaton->collapse_ainsn)
 	  {
 	    for (reserv_ainsn = ainsn;
 		 reserv_ainsn != NULL;
@@ -9306,9 +9383,10 @@  expand_automata (void)
 
   description = XCREATENODEVAR (struct description,
 				sizeof (struct description)
-				/* One entry for cycle advancing insn.  */
-				+ sizeof (decl_t) * VEC_length (decl_t, decls));
+				/* Two entries for special insns.  */
+				+ sizeof (decl_t) * (VEC_length (decl_t, decls) + 1));
   description->decls_num = VEC_length (decl_t, decls);
+  description->normal_decls_num = description->decls_num;
   description->query_units_num = 0;
   for (i = 0; i < description->decls_num; i++)
     {