diff mbox series

[aarch64] v2: Preparatory patch to place target independent and,dependent changed code in one file

Message ID 6f43d8ab-4f66-4079-b1dc-e846af95cb3a@linux.ibm.com
State New
Headers show
Series [aarch64] v2: Preparatory patch to place target independent and,dependent changed code in one file | expand

Commit Message

Ajit Agarwal April 21, 2024, 7:52 a.m. UTC
Hello Alex/Richard:

All review comments are addressed and changes are made to transform_for_base
function as per consensus.

Common infrastructure of load store pair fusion is divided into target
independent and target dependent changed code.

Target independent code is the Generic code with pure virtual function
to interface betwwen target independent and dependent code.

Target dependent code is the implementation of pure virtual function for
aarch64 target and the call to target independent code.

Bootstrapped on aarch64-linux-gnu.

Thanks & Regards
Ajit



aarch64: Preparatory patch to place target independent and
dependent changed code in one file

Common infrastructure of load store pair fusion is divided into target
independent and target dependent changed code.

Target independent code is the Generic code with pure virtual function
to interface betwwen target independent and dependent code.

Target dependent code is the implementation of pure virtual function for
aarch64 target and the call to target independent code.

2024-04-21  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>

gcc/ChangeLog:

	* config/aarch64/aarch64-ldp-fusion.cc: Place target
	independent and dependent changed code
---
 gcc/config/aarch64/aarch64-ldp-fusion.cc | 484 +++++++++++++++--------
 1 file changed, 325 insertions(+), 159 deletions(-)

Comments

Alex Coplan May 8, 2024, 12:48 p.m. UTC | #1
Hi Ajit,

Sorry for the long delay in reviewing this.

This is really getting there now.  I've left a few more comments below.

Apart from minor style things, the main remaining issues are mostly
around comments.  It's important to have good clear comments for
functions with the parameters (and return value, if any) clearly
described.  See https://www.gnu.org/prep/standards/standards.html#Comments

Note that this now needs a little rebasing, too.

On 21/04/2024 13:22, Ajit Agarwal wrote:
> Hello Alex/Richard:
> 
> All review comments are addressed and changes are made to transform_for_base
> function as per consensus.
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the Generic code with pure virtual function
> to interface betwwen target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> Bootstrapped on aarch64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> 
> 
> aarch64: Preparatory patch to place target independent and
> dependent changed code in one file
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the Generic code with pure virtual function
> to interface betwwen target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> 2024-04-21  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>
> 
> gcc/ChangeLog:
> 
> 	* config/aarch64/aarch64-ldp-fusion.cc: Place target
> 	independent and dependent changed code
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 484 +++++++++++++++--------
>  1 file changed, 325 insertions(+), 159 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 365dcf48b22..83a917e1d20 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -138,6 +138,189 @@ struct alt_base
>    poly_int64 offset;
>  };
>  
> +// Virtual base class for load/store walkers used in alias analysis.
> +struct alias_walker
> +{
> +  virtual bool conflict_p (int &budget) const = 0;
> +  virtual insn_info *insn () const = 0;
> +  virtual bool valid () const = 0;
> +  virtual void advance () = 0;
> +};
> +
> +// Forward declaration to be used inside the aarch64_pair_fusion class.
> +bool ldp_operand_mode_ok_p (machine_mode mode);
> +rtx aarch64_destructure_load_pair (rtx regs[2], rtx pattern);
> +rtx aarch64_destructure_store_pair (rtx regs[2], rtx pattern);
> +rtx aarch64_gen_writeback_pair (rtx wb_effect, rtx pair_mem, rtx regs[2],
> +			    bool load_p);

I don't think we want to change the linkage of these, they should be kept
static.

> +enum class writeback{

Nit: space before '{'

> +  WRITEBACK_PAIR_P,
> +  WRITEBACK
> +};

We're going to want some more descriptive names here.  How about
EXISTING and ALL?  Note that the WRITEBACK_ prefix isn't necessary as
you're using an enum class, so uses of the enumerators need to be
prefixed with writeback:: anyway.  A comment describing the usage of the
enum as well as comments above the enumerators describing their
interpretation would be good.

> +
> +struct pair_fusion {
> +

Nit: excess blank line.

> +  pair_fusion ()
> +  {
> +    calculate_dominance_info (CDI_DOMINATORS);
> +    df_analyze ();
> +    crtl->ssa = new rtl_ssa::function_info (cfun);
> +  };

Can we have one blank line between the virtual functions, please?  I
think that would be more readable now that there are comments above each
of them.

> +  // Return true if GPR is FP or SIMD accesses, passed
> +  // with GPR reg_op rtx, machine mode and load_p.

It's slightly awkward trying to document this without the parameter
names, but I can see that you're omitting them to avoid unused parameter
warnings.  One option would be to introduce names in the comment as you
go.  How about this instead:

// Given:
// - an rtx REG_OP, the non-memory operand in a load/store insn,
// - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
// - a boolean LOAD_P (true iff the insn is a load), then:
// return true if the access should be considered an FP/SIMD access.
// Such accesses are segregated from GPR accesses, since we only want to
// form pairs for accesses that use the same register file.

> +  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
> +  {
> +    return false;
> +  }
> +  // Return true if pair operand mode is ok. Passed with
> +  // machine mode.

Could you use something closer to the comment that is already above
ldp_operand_mode_ok_p?  The purpose of this predicate is really to test
the following: "is it a good idea (for optimization) to form paired
accesses with this operand mode at this stage in compilation?"

> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
> +  // Return true if reg operand is ok, passed with load_p,
> +  // reg_op rtx and machine mode.

Can you try to use the parameter names in the comment where possible?
The comment needs to be a bit more precise, what does "ok" mean here?
How about:

// Return true iff REG_OP is a suitable register operand for a paired
// memory access, where LOAD_P is true if we're asking about loads and
// false for stores.  MEM_MODE gives the mode of the operand.

Also ...

> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> +				      machine_mode mem_mode) = 0;

... can we use just mode (instead of mem_mode) as the parameter name
here?  We're only using the mem mode because the non-memory operand
might be a const_int, in which case we need to infer the mode of the
register operand from the mode of the memory operand.  But really it's
just the mode of REG_OP we're passing here.

> +  // Return alias check limit.
> +  virtual int pair_mem_alias_check_limit () = 0;

Can you extend this comment with something like:  "This is needed to
avoid unbounded quadratic behaviour when performing alias analysis."

> +  // Return true if there is writeback opportunities. Passed
> +  // with enum writeback.

Not exactly.  This returns true if we should try to handle writeback
opportunities (not whether there are any).

> +  virtual bool handle_writeback_opportunities (enum writeback wback) = 0 ;
> +  // Return true if mem ok ldp stp policy model passed with
> +  // rtx mem, load_p and machine mode.

This doesn't parse.  Please can you write full grammatically-correct
sentences in comments?

I can't see a need to pass the mode separately here (the current code
just passes GET_MODE (first_mem)), so I think we only need two parameters
here: first_mem and load_p.  Let's rename s/first_mem/base_mem/ here, too.

Then the comment can read something like:

// Given BASE_MEM, the mem from the lower candidate access for a pair,
// and LOAD_P (true if the access is a load), check if we should proceed
// to form the pair given the target's code generation policy on
// paired accesses.

> +  virtual bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
> +					machine_mode mode) = 0;
> +  // Gen load store mem pair. Return load store rtx passed
> +  // with arguments load store pattern, writeback rtx and
> +  // load_p.

As above, let's use the formal parameter names here.  This comment could
also do with more detail.  How about:

// Generate the pattern for a paired access.  PATS gives the patterns
// for the individual memory accesses (which by this point must share a
// common base register).  If WRITEBACK is non-NULL, then this rtx
// describes the update to the base register that should be performed by
// the resulting insn.  LOAD_P is true iff the accesses are loads.

> +  virtual rtx gen_mem_pair (rtx *pats, rtx writeback,
> +			    bool load_p) = 0;

Nit on naming: how about just gen_pair?  It is clear from being a member
function of the pass what sort of pair we are talking about.

> +  // Return true if memory writeback can be promoted, passed
> +  // with insn, rtx pattern and load_p. load_p is set by this
> +  // hook.
> +  virtual bool pair_mem_promote_writeback_p (insn_info *, rtx, bool &)
> +  {
> +     return false;
> +  }

I think we should change the interface here, see the comments in
ldp_fusion_bb below.

> +  // Return true if we track loads.

I think this reads better if you say "if we _should_ track loads".
Likewise for the store case below.

> +  virtual bool track_loads_p ()
> +  {
> +    return true;
> +  }
> +  // Return true if we track stores.
> +  virtual bool track_stores_p ()
> +  {
> +    return true;
> +  }
> +  // Return true if offset is out of range.
> +  virtual bool pair_mem_out_of_range_p (HOST_WIDE_INT off) = 0;
> +  // Return destructure pair. Passed with rtx reg, insn pattern
> +  // and load_p.
> +  virtual rtx gen_destructure_pair (rtx regs[2], rtx rti, bool load_p) = 0;

So firstly this shouldn't be prefixed with gen_ as there is no RTL generation
going on, it's just pulling apart an existing pair insn.

On the comment: again you should use the formal parameter names and describe
the purpose of each parameter.

> +  // Return writeback pair. Passed with rtx writeback effect, mem rtx
> +  // regs rtx and load_p.
> +  virtual rtx gen_writeback_pair (rtx wb_effect, rtx mem,
> +				  rtx regs[2], bool load_p) = 0;

I realise this is pre-existing, but: I think we want a more descriptive name
for this function.  The key bit of the comment above
aarch64_gen_writeback_pair is:

// This is used when promoting existing non-writeback pairs to writeback
// variants.

so how about gen_promote_writeback_pair instead?  Including that part of
the comment above would be good, too.  I wasn't sure on a first glance
how this function differed from your gen_mem_pair function, which can
also generate writeback pairs.

> +  void ldp_fusion_bb (bb_info *bb);
> +  insn_info * find_trailing_add (insn_info *insns[2],

Style nit: no space after *.  Also this should probably be marked inline
as we want the linkage kept within this file.  Likewise with the next
two functions.

> +				 const insn_range_info &pair_range,
> +				 int initial_writeback,
> +				 rtx *writeback_effect,
> +				 def_info **add_def,
> +				 def_info *base_def,
> +				 poly_int64 initial_offset,
> +				 unsigned access_size);
> +  int get_viable_bases (insn_info *insns[2],
> +			vec<base_cand> &base_cands,
> +			rtx cand_mems[2],
> +			unsigned access_size,
> +			bool reversed);
> +  void do_alias_analysis (insn_info *alias_hazards[4],
> +			  alias_walker *walkers[4],
> +			  bool load_p);
> +  void run ();
> +  ~pair_fusion()
> +  {
> +    if (crtl->ssa->perform_pending_updates ())
> +      cleanup_cfg (0);
> +
> +    free_dominance_info (CDI_DOMINATORS);
> +
> +    delete crtl->ssa;
> +    crtl->ssa = nullptr;
> +  }
> +};
> +
> +void
> +pair_fusion::run ()

This function could do with a comment.

> +{
> +  if (!track_loads_p () && !track_stores_p ())
> +    return;
> +
> +  for (auto bb : crtl->ssa->bbs ())
> +    ldp_fusion_bb (bb);
> +}
> +
> +struct aarch64_pair_fusion : public pair_fusion
> +{
> +  bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
> +		    bool load_p) override final
> +  {
> +    return reload_completed
> +      ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
> +      : (GET_MODE_CLASS (mem_mode) != MODE_INT
> +	 && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
> +  }
> +  bool pair_mem_promote_writeback_p (insn_info *insn, rtx pat, bool &load_p);

All of these other virtual overrides should be marked with the
override final keywords, too.

> +  bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
> +				machine_mode mode)
> +  {
> +    return aarch64_mem_ok_with_ldpstp_policy_model (first_mem,
> +						     load_p,
> +						     mode);
> +  }
> +  bool pair_operand_mode_ok_p (machine_mode mode)
> +  {
> +    return ldp_operand_mode_ok_p (mode);
> +  }

Here you can just rename the static function ldp_operand_mode_ok_p
to aarch64_pair_fusion::pair_operand_mode_ok_p.  No need for the
wrapper function.

> +  rtx gen_mem_pair (rtx *pats, rtx writeback, bool load_p);
> +  bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> +			      machine_mode mem_mode)
> +  {
> +    return (load_p
> +	     ? aarch64_ldp_reg_operand (reg_op, mem_mode)
> +	     : aarch64_stp_reg_operand (reg_op, mem_mode));
> +  }
> +  int pair_mem_alias_check_limit ()
> +  {
> +    return aarch64_ldp_alias_check_limit;
> +  }
> +  bool handle_writeback_opportunities (enum writeback wback)

Just a suggestion: you could call this parameter "which" as it is
talking about which opportunities we want to handle.

> +  {
> +    if (wback == writeback::WRITEBACK_PAIR_P)
> +      return aarch64_ldp_writeback > 1;
> +    else
> +      return aarch64_ldp_writeback;
> +  }
> +  bool track_loads_p ()
> +  {
> +    return
> +      aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;

Style nit: I _think_ this would be better formatted with the LHS of the
expression on the same line as the return keyword, then the != on a new
line.  I.e.:

return aarch64_tune_params.ldp_policy_model
       != AARCH64_LDP_STP_POLICY_NEVER;

> +  }
> +  bool track_stores_p ()
> +  {
> +    return
> +      aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;

Likewise.

> +  }
> +  bool pair_mem_out_of_range_p (HOST_WIDE_INT off)
> +  {
> +    return (off < LDP_MIN_IMM || off > LDP_MAX_IMM);
> +  }

When I said (in
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648878.html):

> Please can you invert the condition and fix up callers appropriately?

I was expecting you to keep the function named pair_mem_in_range_p and invert
the condition, then update callers to introduce negations.

I think it's more natural to have the function test for the positive condition
(is this offset in range) and then callers negate it if they want to check if
an offset is out of range.  That matches existing practice elsewhere.

> +  rtx gen_writeback_pair (rtx wb_effect, rtx mem, rtx regs[2], bool load_p)
> +  {
> +    return aarch64_gen_writeback_pair (wb_effect, mem, regs, load_p);
> +  }

Just rename the static function aarch64_gen_writeback_pair to
aarch64_pair_fusion::gen_writeback_pair, then there is no need for this
wrapper.

> +  rtx gen_destructure_pair (rtx regs[2], rtx rti, bool load_p);
> +};
> +
>  // State used by the pass for a given basic block.
>  struct ldp_bb_info
>  {
> @@ -160,8 +343,11 @@ struct ldp_bb_info
>  
>    static const size_t obstack_alignment = sizeof (void *);
>    bb_info *m_bb;
> +  pair_fusion *m_pass;

I realise this is pre-existing, but can m_bb and m_pass both be private
members?

>  
> -  ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
> +  ldp_bb_info (bb_info *bb,
> +		      pair_fusion *d) : m_bb (bb),
> +		      m_pass (d), m_emitted_tombstone (false)

I think the preferred style is to have the parameters on the same line
and the initializer list (starting with ':') on a new line.

>    {
>      obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE,
>  				obstack_alignment, obstack_chunk_alloc,
> @@ -177,10 +363,28 @@ struct ldp_bb_info
>  	bitmap_obstack_release (&m_bitmap_obstack);
>        }
>    }
> -

Any need for this whitespace change?

>    inline void track_access (insn_info *, bool load, rtx mem);
>    inline void transform ();
>    inline void cleanup_tombstones ();
> +  inline void merge_pairs (insn_list_t &, insn_list_t &,
> +		    bool load_p, unsigned access_size);
> +  inline void transform_for_base (int load_size, access_group &group);
> +
> +  inline bool try_fuse_pair (bool load_p, unsigned access_size,
> +			     insn_info *i1, insn_info *i2);
> +
> +  inline bool fuse_pair (bool load_p, unsigned access_size,
> +		  int writeback,
> +		  insn_info *i1, insn_info *i2,
> +		  base_cand &base,
> +		  const insn_range_info &move_range);
> +
> +  inline void track_tombstone (int uid);
> +
> +  inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
> +
> +  template<typename Map>
> +    void traverse_base_map (Map &map);

What's the reason for changing the access of these functions?
Can they be kept private?

>  
>  private:
>    obstack m_obstack;
> @@ -191,27 +395,60 @@ private:
>    bool m_emitted_tombstone;
>  
>    inline splay_tree_node<access_record *> *node_alloc (access_record *);
> +};
>  
> -  template<typename Map>
> -  inline void traverse_base_map (Map &map);
> -  inline void transform_for_base (int load_size, access_group &group);
> -
> -  inline void merge_pairs (insn_list_t &, insn_list_t &,
> -			   bool load_p, unsigned access_size);
> +bool
> +aarch64_pair_fusion::pair_mem_promote_writeback_p (insn_info *insn, rtx pat,
> +						   bool &load_p)
> +{
> +  if (reload_completed
> +      && aarch64_ldp_writeback > 1
> +      && GET_CODE (pat) == PARALLEL
> +      && XVECLEN (pat, 0) == 2)
> +    {
> +      auto rti = insn->rtl ();
> +      const auto attr = get_attr_ldpstp (rti);
> +      if (attr == LDPSTP_NONE)
> +	return false;
>  
> -  inline bool try_fuse_pair (bool load_p, unsigned access_size,
> -			     insn_info *i1, insn_info *i2);
> +      load_p = (attr == LDPSTP_LDP);
> +      gcc_checking_assert (load_p || attr == LDPSTP_STP);
> +      return true;
> +    }
> +  return false;
> +}
>  
> -  inline bool fuse_pair (bool load_p, unsigned access_size,
> -			 int writeback,
> -			 insn_info *i1, insn_info *i2,
> -			 base_cand &base,
> -			 const insn_range_info &move_range);
> +rtx
> +aarch64_pair_fusion::gen_destructure_pair (rtx regs[2], rtx rti, bool load_p)
> +{
> +  if (load_p)
> +    return aarch64_destructure_load_pair (regs, rti);
> +  else
> +    return aarch64_destructure_store_pair (regs, rti);
> +}
>  
> -  inline void track_tombstone (int uid);
> +rtx
> +aarch64_pair_fusion::gen_mem_pair (rtx *pats,
> +		  rtx writeback,
> +		  bool load_p)
> +{
> +  rtx pair_pat;
>  
> -  inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
> -};
> +  if (writeback)
> +    {
> +      auto patvec = gen_rtvec (3, writeback, pats[0], pats[1]);
> +      return  gen_rtx_PARALLEL (VOIDmode, patvec);
> +    }
> +  else if (load_p)
> +    return aarch64_gen_load_pair (XEXP (pats[0], 0),
> +				  XEXP (pats[1], 0),
> +				  XEXP (pats[0], 1));
> +  else
> +    return aarch64_gen_store_pair (XEXP (pats[0], 0),
> +				   XEXP (pats[0], 1),
> +				   XEXP (pats[1], 1));
> +  return pair_pat;
> +}
>  
>  splay_tree_node<access_record *> *
>  ldp_bb_info::node_alloc (access_record *access)
> @@ -312,7 +549,7 @@ any_post_modify_p (rtx x)
>  
>  // Return true if we should consider forming ldp/stp insns from memory
>  // accesses with operand mode MODE at this stage in compilation.
> -static bool
> +bool
>  ldp_operand_mode_ok_p (machine_mode mode)
>  {
>    const bool allow_qregs
> @@ -412,9 +649,10 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
>    const machine_mode mem_mode = GET_MODE (mem);
>    const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
>  
> -  // Punt on misaligned offsets.  LDP/STP instructions require offsets to be a
> -  // multiple of the access size, and we believe that misaligned offsets on
> -  // MEM_EXPR bases are likely to lead to misaligned offsets w.r.t. RTL bases.
> +  // Punt on misaligned offsets.  Paired memory access instructions require
> +  // offsets to be a multiple of the access size, and we believe that
> +  // misaligned offsets on MEM_EXPR bases are likely to lead to misaligned
> +  // offsets w.r.t. RTL bases.
>    if (!multiple_p (offset, mem_size))
>      return false;
>  
> @@ -438,10 +676,10 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
>  }
>  
>  // Main function to begin pair discovery.  Given a memory access INSN,
> -// determine whether it could be a candidate for fusing into an ldp/stp,
> -// and if so, track it in the appropriate data structure for this basic
> -// block.  LOAD_P is true if the access is a load, and MEM is the mem
> -// rtx that occurs in INSN.
> +// determine whether it could be a candidate for fusing into a paired
> +// access, and if so, track it in the appropriate data structure for
> +// this basic block.  LOAD_P is true if the access is a load, and MEM
> +//  is the mem rtx that occurs in INSN.
>  void
>  ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>  {
> @@ -449,35 +687,26 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>    if (MEM_VOLATILE_P (mem))
>      return;
>  
> -  // Ignore writeback accesses if the param says to do so.
> -  if (!aarch64_ldp_writeback
> +  // Ignore writeback accesses if the hook says to do so.
> +  if (!m_pass->handle_writeback_opportunities (writeback::WRITEBACK)
>        && GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC)
>      return;
>  
>    const machine_mode mem_mode = GET_MODE (mem);
> -  if (!ldp_operand_mode_ok_p (mem_mode))
> +  if (!m_pass->pair_operand_mode_ok_p (mem_mode))
>      return;
>  
>    rtx reg_op = XEXP (PATTERN (insn->rtl ()), !load_p);
>  
> -  // Ignore the access if the register operand isn't suitable for ldp/stp.
> -  if (load_p
> -      ? !aarch64_ldp_reg_operand (reg_op, mem_mode)
> -      : !aarch64_stp_reg_operand (reg_op, mem_mode))
> +  if (!m_pass->pair_reg_operand_ok_p (load_p, reg_op, mem_mode))
>      return;
> -

Let's keep this newline.

>    // We want to segregate FP/SIMD accesses from GPR accesses.
>    //
>    // Before RA, we use the modes, noting that stores of constant zero
>    // operands use GPRs (even in non-integer modes).  After RA, we use
>    // the hard register numbers.

The second para of this comment would probably be best moved to the aarch64
implementation of the hook (with the first sentence duplicated in both places).

> -  const bool fpsimd_op_p
> -    = reload_completed
> -    ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
> -    : (GET_MODE_CLASS (mem_mode) != MODE_INT
> -       && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
> -
> -  // Note ldp_operand_mode_ok_p already rejected VL modes.
> +  const bool fpsimd_op_p = m_pass->fpsimd_op_p (reg_op, mem_mode, load_p);

Might be more readable with a blank line inserted here.

> +  // Note pair_operand_mode_ok_p already rejected VL modes.
>    const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
>    const lfs_fields lfs = { load_p, fpsimd_op_p, mem_size };
>  
> @@ -506,8 +735,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>    // elimination offset pre-RA, we should postpone forming pairs on such
>    // accesses until after RA.
>    //
> -  // As it stands, addresses with offsets in range for LDR but not
> -  // in range for LDP/STP are currently reloaded inefficiently,
> +  // As it stands, addresses in range for an individual load/store but not
> +  // for a paired access are currently reloaded inefficiently,
>    // ending up with a separate base register for each pair.
>    //
>    // In theory LRA should make use of
> @@ -519,8 +748,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>    // that calls targetm.legitimize_address_displacement.
>    //
>    // So for now, it's better to punt when we can't be sure that the
> -  // offset is in range for LDP/STP.  Out-of-range cases can then be
> -  // handled after RA by the out-of-range LDP/STP peepholes.  Eventually, it
> +  // offset is in range for paired access.  Out-of-range cases can then be
> +  // handled after RA by the out-of-range PAIR MEM  peepholes.  Eventually, it

Does this part really apply to rs6000?  Do you have peephole2 patterns for lxvp?
If not you may want to caveat this part as being specific to aarch64.  E.g. "On
aarch64, out-of-range cases ..."

>    // would be nice to handle known out-of-range opportunities in the
>    // pass itself (for stack accesses, this would be in the post-RA pass).
>    if (!reload_completed
> @@ -573,8 +802,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>  	gcc_unreachable (); // Base defs should be unique.
>      }
>  
> -  // Punt on misaligned offsets.  LDP/STP require offsets to be a multiple of
> -  // the access size.
> +  // Punt on misaligned offsets.  Paired memory accesses require offsets
> +  // to be a multiple of the access size.
>    if (!multiple_p (mem_off, mem_size))
>      return;
>  
> @@ -1207,8 +1436,8 @@ extract_writebacks (bool load_p, rtx pats[2], int changed)
>  // base register.  If there is one, we choose the first such update after
>  // PAIR_DST that is still in the same BB as our pair.  We return the new def in
>  // *ADD_DEF and the resulting writeback effect in *WRITEBACK_EFFECT.
> -static insn_info *
> -find_trailing_add (insn_info *insns[2],
> +insn_info *
> +pair_fusion::find_trailing_add (insn_info *insns[2],
>  		   const insn_range_info &pair_range,
>  		   int initial_writeback,
>  		   rtx *writeback_effect,
> @@ -1286,7 +1515,7 @@ find_trailing_add (insn_info *insns[2],
>  
>    off_hwi /= access_size;
>  
> -  if (off_hwi < LDP_MIN_IMM || off_hwi > LDP_MAX_IMM)
> +  if (pair_mem_out_of_range_p (off_hwi))
>      return nullptr;
>  
>    auto dump_prefix = [&]()
> @@ -1800,7 +2029,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>      {
>        if (dump_file)
>  	fprintf (dump_file,
> -		 "  ldp: i%d has wb but subsequent i%d has non-wb "
> +		 "  load pair: i%d has wb but subsequent i%d has non-wb "
>  		 "update of base (r%d), dropping wb\n",
>  		 insns[0]->uid (), insns[1]->uid (), base_regno);
>        gcc_assert (writeback_effect);
> @@ -1823,7 +2052,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>      }
>  
>    // If either of the original insns had writeback, but the resulting pair insn
> -  // does not (can happen e.g. in the ldp edge case above, or if the writeback
> +  // does not (can happen e.g. in the load pair edge case above, or if the writeback
>    // effects cancel out), then drop the def(s) of the base register as
>    // appropriate.
>    //
> @@ -1842,7 +2071,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>    // update of the base register and try and fold it in to make this into a
>    // writeback pair.
>    insn_info *trailing_add = nullptr;
> -  if (aarch64_ldp_writeback > 1
> +  if (m_pass->handle_writeback_opportunities (writeback::WRITEBACK_PAIR_P)
>        && !writeback_effect
>        && (!load_p || (!refers_to_regno_p (base_regno, base_regno + 1,
>  					 XEXP (pats[0], 0), nullptr)
> @@ -1850,7 +2079,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>  					     XEXP (pats[1], 0), nullptr))))
>      {
>        def_info *add_def;
> -      trailing_add = find_trailing_add (insns, move_range, writeback,
> +      trailing_add = m_pass->find_trailing_add (insns, move_range, writeback,
>  					&writeback_effect,
>  					&add_def, base.def, offsets[0],
>  					access_size);
> @@ -1863,14 +2092,14 @@ ldp_bb_info::fuse_pair (bool load_p,
>      }
>  
>    // Now that we know what base mem we're going to use, check if it's OK
> -  // with the ldp/stp policy.
> +  // with the pair mem policy.
>    rtx first_mem = XEXP (pats[0], load_p);
> -  if (!aarch64_mem_ok_with_ldpstp_policy_model (first_mem,
> -						load_p,
> -						GET_MODE (first_mem)))
> +  if (!m_pass->pair_mem_ok_with_policy (first_mem,
> +			  load_p,
> +			  GET_MODE (first_mem)))

The indentation looks off here.  I suppose when you drop the mode parameter
(as mentioned above) this will all fit on one line anyway.

>      {
>        if (dump_file)
> -	fprintf (dump_file, "punting on pair (%d,%d), ldp/stp policy says no\n",
> +	fprintf (dump_file, "punting on pair (%d,%d), pair mem  policy says no\n",

Nit: excess space after mem.

>  		 i1->uid (), i2->uid ());
>        return false;
>      }
> @@ -1878,21 +2107,10 @@ ldp_bb_info::fuse_pair (bool load_p,
>    rtx reg_notes = combine_reg_notes (first, second, load_p);
>  
>    rtx pair_pat;
> -  if (writeback_effect)
> -    {
> -      auto patvec = gen_rtvec (3, writeback_effect, pats[0], pats[1]);
> -      pair_pat = gen_rtx_PARALLEL (VOIDmode, patvec);
> -    }
> -  else if (load_p)
> -    pair_pat = aarch64_gen_load_pair (XEXP (pats[0], 0),
> -				      XEXP (pats[1], 0),
> -				      XEXP (pats[0], 1));
> -  else
> -    pair_pat = aarch64_gen_store_pair (XEXP (pats[0], 0),
> -				       XEXP (pats[0], 1),
> -				       XEXP (pats[1], 1));
>  
> +  pair_pat = m_pass->gen_mem_pair (pats, writeback_effect, load_p);

Please can you declare pair_pat on the same line it is set, so that we have:

  rtx pair_pat = m_pass->gen_mem_pair (pats, writeback_effect, load_p);


>    insn_change *pair_change = nullptr;
> +

Spurious whitespace change?

>    auto set_pair_pat = [pair_pat,reg_notes](insn_change *change) {
>      rtx_insn *rti = change->insn ()->rtl ();
>      validate_unshare_change (rti, &PATTERN (rti), pair_pat, true);
> @@ -2133,15 +2351,6 @@ load_modified_by_store_p (insn_info *load,
>    return false;
>  }
>  
> -// Virtual base class for load/store walkers used in alias analysis.
> -struct alias_walker
> -{
> -  virtual bool conflict_p (int &budget) const = 0;
> -  virtual insn_info *insn () const = 0;
> -  virtual bool valid () const = 0;
> -  virtual void advance () = 0;
> -};
> -
>  // Implement some common functionality used by both store_walker
>  // and load_walker.
>  template<bool reverse>
> @@ -2259,13 +2468,13 @@ public:
>  //
>  // We try to maintain the invariant that if a walker becomes invalid, we
>  // set its pointer to null.
> -static void
> -do_alias_analysis (insn_info *alias_hazards[4],
> +void
> +pair_fusion::do_alias_analysis (insn_info *alias_hazards[4],
>  		   alias_walker *walkers[4],
>  		   bool load_p)
>  {
>    const int n_walkers = 2 + (2 * !load_p);
> -  int budget = aarch64_ldp_alias_check_limit;
> +  int budget = pair_mem_alias_check_limit ();
>  
>    auto next_walker = [walkers,n_walkers](int current) -> int {
>      for (int j = 1; j <= n_walkers; j++)
> @@ -2350,8 +2559,8 @@ do_alias_analysis (insn_info *alias_hazards[4],
>  //
>  // Returns an integer where bit (1 << i) is set if INSNS[i] uses writeback
>  // addressing.
> -static int
> -get_viable_bases (insn_info *insns[2],
> +int
> +pair_fusion::get_viable_bases (insn_info *insns[2],
>  		  vec<base_cand> &base_cands,
>  		  rtx cand_mems[2],
>  		  unsigned access_size,
> @@ -2397,7 +2606,7 @@ get_viable_bases (insn_info *insns[2],
>        if (!is_lower)
>  	base_off--;
>  
> -      if (base_off < LDP_MIN_IMM || base_off > LDP_MAX_IMM)
> +      if (pair_mem_out_of_range_p (base_off))
>  	continue;
>  
>        use_info *use = find_access (insns[i]->uses (), REGNO (base));
> @@ -2454,7 +2663,7 @@ get_viable_bases (insn_info *insns[2],
>  }
>  
>  // Given two adjacent memory accesses of the same size, I1 and I2, try
> -// and see if we can merge them into a ldp or stp.
> +// and see if we can merge them into a paired access load and store.

Delete "load and store" here.

>  //
>  // ACCESS_SIZE gives the (common) size of a single access, LOAD_P is true
>  // if the accesses are both loads, otherwise they are both stores.
> @@ -2494,7 +2703,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>      {
>        if (dump_file)
>  	fprintf (dump_file,
> -		 "punting on ldp due to reg conflcits (%d,%d)\n",
> +		 "punting on pair mem load  due to reg conflcits (%d,%d)\n",
>  		 insns[0]->uid (), insns[1]->uid ());
>        return false;
>      }
> @@ -2512,7 +2721,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>  
>    auto_vec<base_cand, 2> base_cands (2);
>  
> -  int writeback = get_viable_bases (insns, base_cands, cand_mems,
> +  int writeback = m_pass->get_viable_bases (insns, base_cands, cand_mems,
>  				    access_size, reversed);
>    if (base_cands.is_empty ())
>      {
> @@ -2641,7 +2850,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>      walkers[1] = &backward_store_walker;
>  
>    if (load_p && (mem_defs[0] || mem_defs[1]))
> -    do_alias_analysis (alias_hazards, walkers, load_p);
> +    m_pass->do_alias_analysis (alias_hazards, walkers, load_p);
>    else
>      {
>        // We want to find any loads hanging off the first store.
> @@ -2650,7 +2859,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>        load_walker<true> backward_load_walker (mem_defs[1], insns[1], insns[0]);
>        walkers[2] = &forward_load_walker;
>        walkers[3] = &backward_load_walker;
> -      do_alias_analysis (alias_hazards, walkers, load_p);
> +      m_pass->do_alias_analysis (alias_hazards, walkers, load_p);
>        // Now consolidate hazards back down.
>        if (alias_hazards[2]
>  	  && (!alias_hazards[0] || (*alias_hazards[2] < *alias_hazards[0])))
> @@ -2891,7 +3100,7 @@ ldp_bb_info::merge_pairs (insn_list_t &left_list,
>  // merge_pairs.
>  void
>  ldp_bb_info::transform_for_base (int encoded_lfs,
> -				 access_group &group)
> +					 access_group &group)

Looks like a bad change in indentation.

>  {
>    const auto lfs = decode_lfs (encoded_lfs);
>    const unsigned access_size = lfs.size;
> @@ -2964,29 +3173,9 @@ ldp_bb_info::transform ()
>    traverse_base_map (def_map);
>  }
>  
> -static void
> -ldp_fusion_init ()
> -{
> -  calculate_dominance_info (CDI_DOMINATORS);
> -  df_analyze ();
> -  crtl->ssa = new rtl_ssa::function_info (cfun);
> -}
> -
> -static void
> -ldp_fusion_destroy ()
> -{
> -  if (crtl->ssa->perform_pending_updates ())
> -    cleanup_cfg (0);
> -
> -  free_dominance_info (CDI_DOMINATORS);
> -
> -  delete crtl->ssa;
> -  crtl->ssa = nullptr;
> -}
> -
>  // Given a load pair insn in PATTERN, unpack the insn, storing
>  // the registers in REGS and returning the mem.
> -static rtx
> +rtx

Again, no need to change the linkage here.

>  aarch64_destructure_load_pair (rtx regs[2], rtx pattern)
>  {
>    rtx mem = NULL_RTX;
> @@ -3012,7 +3201,7 @@ aarch64_destructure_load_pair (rtx regs[2], rtx pattern)
>  
>  // Given a store pair insn in PATTERN, unpack the insn, storing
>  // the register operands in REGS, and returning the mem.
> -static rtx
> +rtx

Likewise.

>  aarch64_destructure_store_pair (rtx regs[2], rtx pattern)
>  {
>    rtx mem = XEXP (pattern, 0);
> @@ -3030,7 +3219,7 @@ aarch64_destructure_store_pair (rtx regs[2], rtx pattern)
>  //
>  // This is used when promoting existing non-writeback pairs to writeback
>  // variants.
> -static rtx
> +rtx
>  aarch64_gen_writeback_pair (rtx wb_effect, rtx pair_mem, rtx regs[2],
>  			    bool load_p)
>  {
> @@ -3068,22 +3257,13 @@ aarch64_gen_writeback_pair (rtx wb_effect, rtx pair_mem, rtx regs[2],
>  // the base register which we can fold in to make this pair use
>  // a writeback addressing mode.
>  static void
> -try_promote_writeback (insn_info *insn)
> +try_promote_writeback (insn_info *insn, bool load_p, pair_fusion *pass)

I think this would be better as a member function of pair_fusion.  Then
there's no need to pass a pointer to the pass structure explicitly.

>  {
> -  auto rti = insn->rtl ();
> -  const auto attr = get_attr_ldpstp (rti);
> -  if (attr == LDPSTP_NONE)
> -    return;
> -
> -  bool load_p = (attr == LDPSTP_LDP);
> -  gcc_checking_assert (load_p || attr == LDPSTP_STP);
> -
>    rtx regs[2];
>    rtx mem = NULL_RTX;
> -  if (load_p)
> -    mem = aarch64_destructure_load_pair (regs, PATTERN (rti));
> -  else
> -    mem = aarch64_destructure_store_pair (regs, PATTERN (rti));
> +
> +  mem = pass->gen_destructure_pair (regs, PATTERN (insn->rtl ()), load_p);
> +
>    gcc_checking_assert (MEM_P (mem));
>  
>    poly_int64 offset;
> @@ -3120,9 +3300,10 @@ try_promote_writeback (insn_info *insn)
>    def_info *add_def;
>    const insn_range_info pair_range (insn);
>    insn_info *insns[2] = { nullptr, insn };
> -  insn_info *trailing_add = find_trailing_add (insns, pair_range, 0, &wb_effect,
> -					       &add_def, base_def, offset,
> -					       access_size);
> +  insn_info *trailing_add
> +    = pass->find_trailing_add (insns, pair_range, 0, &wb_effect,
> +			       &add_def, base_def, offset,
> +			       access_size);
>    if (!trailing_add)
>      return;
>  
> @@ -3132,8 +3313,9 @@ try_promote_writeback (insn_info *insn)
>    insn_change del_change (trailing_add, insn_change::DELETE);
>    insn_change *changes[] = { &pair_change, &del_change };
>  
> -  rtx pair_pat = aarch64_gen_writeback_pair (wb_effect, mem, regs, load_p);
> -  validate_unshare_change (rti, &PATTERN (rti), pair_pat, true);
> +  rtx pair_pat = pass->gen_writeback_pair (wb_effect, mem, regs, load_p);
> +

No need for this blank line IMO.

> +  validate_unshare_change (insn->rtl(), &PATTERN (insn->rtl()), pair_pat, true);
>  
>    // The pair must gain the def of the base register from the add.
>    pair_change.new_defs = insert_access (attempt,
> @@ -3167,14 +3349,12 @@ try_promote_writeback (insn_info *insn)
>  // for load/store candidates.  If running after RA, also try and promote
>  // non-writeback pairs to use writeback addressing.  Then try to fuse
>  // candidates into pairs.
> -void ldp_fusion_bb (bb_info *bb)
> +void pair_fusion::ldp_fusion_bb (bb_info *bb)
>  {
> -  const bool track_loads
> -    = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> -  const bool track_stores
> -    = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> +  const bool track_loads = track_loads_p ();
> +  const bool track_stores = track_stores_p ();
>  
> -  ldp_bb_info bb_state (bb);
> +  ldp_bb_info bb_state (bb, this);
>  
>    for (auto insn : bb->nondebug_insns ())
>      {
> @@ -3184,11 +3364,9 @@ void ldp_fusion_bb (bb_info *bb)
>  	continue;
>  
>        rtx pat = PATTERN (rti);
> -      if (reload_completed
> -	  && aarch64_ldp_writeback > 1
> -	  && GET_CODE (pat) == PARALLEL
> -	  && XVECLEN (pat, 0) == 2)
> -	try_promote_writeback (insn);
> +      bool load_p;
> +      if (pair_mem_promote_writeback_p (insn, pat, load_p))
> +	try_promote_writeback (insn, load_p, this);

As above, if we make try_promote_writeback a member function of pair_fusion,
the this pointer would be implicit, I think that would be a lot cleaner.

Also, sorry for the run-around, but I think the function
pair_mem_promote_writeback_p is trying to do too many things at once.

I think we should keep these checks (making the latter suitable for generic
code):

  if (reload_completed
      && aarch64_ldp_writeback > 1

in ldp_fusion_bb, at which point pair_mem_promote_writeback_p just
becomes a test for "is this insn a (non-writeback) paired memory
access? (and if so, is it a load?)"

So ultimately I think we want the following in ldp_fusion_bb:

  bool load_p;
  if (reload_completed
      && handle_writeback_opportunities (writeback::ALL)
      && pair_mem_insn_p (rti, load_p))
    try_promote_writeback (insn, load_p);

  rtx pat = PATTERN (rti);
  [...]

It probably makes most sense to pass just the rtx_insn * to
pair_mem_insn_p instead of passing both an insn_info * and an rtx for
the PATTERN.

>  
>        if (GET_CODE (pat) != SET)
>  	continue;
> @@ -3205,12 +3383,8 @@ void ldp_fusion_bb (bb_info *bb)
>  
>  void ldp_fusion ()
>  {
> -  ldp_fusion_init ();
> -
> -  for (auto bb : crtl->ssa->bbs ())
> -    ldp_fusion_bb (bb);
> -
> -  ldp_fusion_destroy ();
> +  aarch64_pair_fusion pass;
> +  pass.run ();
>  }

I think we mentioned this before, but since this is now a very simple
two-line function, can we inline it into pass_ldp_fusion::execute?
ISTM there's no need for the additional wrapper.

>  
>  namespace {
> @@ -3242,14 +3416,6 @@ public:
>        if (!optimize || optimize_debug)
>  	return false;
>  
> -      // If the tuning policy says never to form ldps or stps, don't run
> -      // the pass.
> -      if ((aarch64_tune_params.ldp_policy_model
> -	   == AARCH64_LDP_STP_POLICY_NEVER)
> -	  && (aarch64_tune_params.stp_policy_model
> -	      == AARCH64_LDP_STP_POLICY_NEVER))
> -	return false;
> -
>        if (reload_completed)
>  	return flag_aarch64_late_ldp_fusion;
>        else
> -- 
> 2.39.3
> 

Thanks,
Alex
Ajit Agarwal May 9, 2024, 10:16 a.m. UTC | #2
Hello Alex:

On 08/05/24 6:18 pm, Alex Coplan wrote:
> Hi Ajit,
> 
> Sorry for the long delay in reviewing this.
> 
> This is really getting there now.  I've left a few more comments below.
> 
> Apart from minor style things, the main remaining issues are mostly
> around comments.  It's important to have good clear comments for
> functions with the parameters (and return value, if any) clearly
> described.  See https://www.gnu.org/prep/standards/standards.html#Comments
> 
> Note that this now needs a little rebasing, too.
> 

Done.

> On 21/04/2024 13:22, Ajit Agarwal wrote:
>> Hello Alex/Richard:
>>
>> All review comments are addressed and changes are made to transform_for_base
>> function as per consensus.
>>
>> Common infrastructure of load store pair fusion is divided into target
>> independent and target dependent changed code.
>>
>> Target independent code is the Generic code with pure virtual function
>> to interface betwwen target independent and dependent code.
>>
>> Target dependent code is the implementation of pure virtual function for
>> aarch64 target and the call to target independent code.
>>
>> Bootstrapped on aarch64-linux-gnu.
>>
>> Thanks & Regards
>> Ajit
>>
>>
>>
>> aarch64: Preparatory patch to place target independent and
>> dependent changed code in one file
>>
>> Common infrastructure of load store pair fusion is divided into target
>> independent and target dependent changed code.
>>
>> Target independent code is the Generic code with pure virtual function
>> to interface betwwen target independent and dependent code.
>>
>> Target dependent code is the implementation of pure virtual function for
>> aarch64 target and the call to target independent code.
>>
>> 2024-04-21  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>
>>
>> gcc/ChangeLog:
>>
>> 	* config/aarch64/aarch64-ldp-fusion.cc: Place target
>> 	independent and dependent changed code
>> ---
>>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 484 +++++++++++++++--------
>>  1 file changed, 325 insertions(+), 159 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> index 365dcf48b22..83a917e1d20 100644
>> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> @@ -138,6 +138,189 @@ struct alt_base
>>    poly_int64 offset;
>>  };
>>  
>> +// Virtual base class for load/store walkers used in alias analysis.
>> +struct alias_walker
>> +{
>> +  virtual bool conflict_p (int &budget) const = 0;
>> +  virtual insn_info *insn () const = 0;
>> +  virtual bool valid () const = 0;
>> +  virtual void advance () = 0;
>> +};
>> +
>> +// Forward declaration to be used inside the aarch64_pair_fusion class.
>> +bool ldp_operand_mode_ok_p (machine_mode mode);
>> +rtx aarch64_destructure_load_pair (rtx regs[2], rtx pattern);
>> +rtx aarch64_destructure_store_pair (rtx regs[2], rtx pattern);
>> +rtx aarch64_gen_writeback_pair (rtx wb_effect, rtx pair_mem, rtx regs[2],
>> +			    bool load_p);
> 
> I don't think we want to change the linkage of these, they should be kept
> static.
> 
>> +enum class writeback{
> 
> Nit: space before '{'
> 
>> +  WRITEBACK_PAIR_P,
>> +  WRITEBACK
>> +};
> 
> We're going to want some more descriptive names here.  How about
> EXISTING and ALL?  Note that the WRITEBACK_ prefix isn't necessary as
> you're using an enum class, so uses of the enumerators need to be
> prefixed with writeback:: anyway.  A comment describing the usage of the
> enum as well as comments above the enumerators describing their
> interpretation would be good.
> 

Done.
>> +
>> +struct pair_fusion {
>> +
> 
> Nit: excess blank line.
> 
>> +  pair_fusion ()
>> +  {
>> +    calculate_dominance_info (CDI_DOMINATORS);
>> +    df_analyze ();
>> +    crtl->ssa = new rtl_ssa::function_info (cfun);
>> +  };
> 
> Can we have one blank line between the virtual functions, please?  I
> think that would be more readable now that there are comments above each
> of them.
> 

Done.

>> +  // Return true if GPR is FP or SIMD accesses, passed
>> +  // with GPR reg_op rtx, machine mode and load_p.
> 
> It's slightly awkward trying to document this without the parameter
> names, but I can see that you're omitting them to avoid unused parameter
> warnings.  One option would be to introduce names in the comment as you
> go.  How about this instead:
> 
> // Given:
> // - an rtx REG_OP, the non-memory operand in a load/store insn,
> // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
> // - a boolean LOAD_P (true iff the insn is a load), then:
> // return true if the access should be considered an FP/SIMD access.
> // Such accesses are segregated from GPR accesses, since we only want to
> // form pairs for accesses that use the same register file.
> 

Done.
>> +  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
>> +  {
>> +    return false;
>> +  }
>> +  // Return true if pair operand mode is ok. Passed with
>> +  // machine mode.
> 
> Could you use something closer to the comment that is already above
> ldp_operand_mode_ok_p?  The purpose of this predicate is really to test
> the following: "is it a good idea (for optimization) to form paired
> accesses with this operand mode at this stage in compilation?"
> 

Done.
>> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
>> +  // Return true if reg operand is ok, passed with load_p,
>> +  // reg_op rtx and machine mode.
> 
> Can you try to use the parameter names in the comment where possible?
> The comment needs to be a bit more precise, what does "ok" mean here?
> How about:
> 
> // Return true iff REG_OP is a suitable register operand for a paired
> // memory access, where LOAD_P is true if we're asking about loads and
> // false for stores.  MEM_MODE gives the mode of the operand.
> 
> Also ...
> 
>> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
>> +				      machine_mode mem_mode) = 0;
> 
> ... can we use just mode (instead of mem_mode) as the parameter name
> here?  We're only using the mem mode because the non-memory operand
> might be a const_int, in which case we need to infer the mode of the
> register operand from the mode of the memory operand.  But really it's
> just the mode of REG_OP we're passing here.
> 

Done.
>> +  // Return alias check limit.
>> +  virtual int pair_mem_alias_check_limit () = 0;
> 
> Can you extend this comment with something like:  "This is needed to
> avoid unbounded quadratic behaviour when performing alias analysis."
> 
>> +  // Return true if there is writeback opportunities. Passed
>> +  // with enum writeback.
> 
> Not exactly.  This returns true if we should try to handle writeback
> opportunities (not whether there are any).
> 
>> +  virtual bool handle_writeback_opportunities (enum writeback wback) = 0 ;
>> +  // Return true if mem ok ldp stp policy model passed with
>> +  // rtx mem, load_p and machine mode.
> 
> This doesn't parse.  Please can you write full grammatically-correct
> sentences in comments?
> 
Done.

> I can't see a need to pass the mode separately here (the current code
> just passes GET_MODE (first_mem)), so I think we only need two parameters
> here: first_mem and load_p.  Let's rename s/first_mem/base_mem/ here, too.
> 
> Then the comment can read something like:
> 
> // Given BASE_MEM, the mem from the lower candidate access for a pair,
> // and LOAD_P (true if the access is a load), check if we should proceed
> // to form the pair given the target's code generation policy on
> // paired accesses.
> 
>> +  virtual bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
>> +					machine_mode mode) = 0;
>> +  // Gen load store mem pair. Return load store rtx passed
>> +  // with arguments load store pattern, writeback rtx and
>> +  // load_p.
> 

Done.
> As above, let's use the formal parameter names here.  This comment could
> also do with more detail.  How about:
> 
> // Generate the pattern for a paired access.  PATS gives the patterns
> // for the individual memory accesses (which by this point must share a
> // common base register).  If WRITEBACK is non-NULL, then this rtx
> // describes the update to the base register that should be performed by
> // the resulting insn.  LOAD_P is true iff the accesses are loads.
> 
>> +  virtual rtx gen_mem_pair (rtx *pats, rtx writeback,
>> +			    bool load_p) = 0;
> 
> Nit on naming: how about just gen_pair?  It is clear from being a member
> function of the pass what sort of pair we are talking about.
> 
>> +  // Return true if memory writeback can be promoted, passed
>> +  // with insn, rtx pattern and load_p. load_p is set by this
>> +  // hook.
>> +  virtual bool pair_mem_promote_writeback_p (insn_info *, rtx, bool &)
>> +  {
>> +     return false;
>> +  }
> 
> I think we should change the interface here, see the comments in
> ldp_fusion_bb below.
> 
>> +  // Return true if we track loads.
> 
> I think this reads better if you say "if we _should_ track loads".
> Likewise for the store case below.
> 
Done.
>> +  virtual bool track_loads_p ()
>> +  {
>> +    return true;
>> +  }
>> +  // Return true if we track stores.
>> +  virtual bool track_stores_p ()
>> +  {
>> +    return true;
>> +  }
>> +  // Return true if offset is out of range.
>> +  virtual bool pair_mem_out_of_range_p (HOST_WIDE_INT off) = 0;
>> +  // Return destructure pair. Passed with rtx reg, insn pattern
>> +  // and load_p.
>> +  virtual rtx gen_destructure_pair (rtx regs[2], rtx rti, bool load_p) = 0;
> 
> So firstly this shouldn't be prefixed with gen_ as there is no RTL generation
> going on, it's just pulling apart an existing pair insn.
> 
> On the comment: again you should use the formal parameter names and describe
> the purpose of each parameter.
> 

Done.
>> +  // Return writeback pair. Passed with rtx writeback effect, mem rtx
>> +  // regs rtx and load_p.
>> +  virtual rtx gen_writeback_pair (rtx wb_effect, rtx mem,
>> +				  rtx regs[2], bool load_p) = 0;
> 
> I realise this is pre-existing, but: I think we want a more descriptive name
> for this function.  The key bit of the comment above
> aarch64_gen_writeback_pair is:
> 
> // This is used when promoting existing non-writeback pairs to writeback
> // variants.
> 
> so how about gen_promote_writeback_pair instead?  Including that part of
> the comment above would be good, too.  I wasn't sure on a first glance
> how this function differed from your gen_mem_pair function, which can
> also generate writeback pairs.
> 

Done.
>> +  void ldp_fusion_bb (bb_info *bb);
>> +  insn_info * find_trailing_add (insn_info *insns[2],
> 
> Style nit: no space after *.  Also this should probably be marked inline
> as we want the linkage kept within this file.  Likewise with the next
> two functions.
> 
>> +				 const insn_range_info &pair_range,
>> +				 int initial_writeback,
>> +				 rtx *writeback_effect,
>> +				 def_info **add_def,
>> +				 def_info *base_def,
>> +				 poly_int64 initial_offset,
>> +				 unsigned access_size);
>> +  int get_viable_bases (insn_info *insns[2],
>> +			vec<base_cand> &base_cands,
>> +			rtx cand_mems[2],
>> +			unsigned access_size,
>> +			bool reversed);
>> +  void do_alias_analysis (insn_info *alias_hazards[4],
>> +			  alias_walker *walkers[4],
>> +			  bool load_p);
>> +  void run ();
>> +  ~pair_fusion()
>> +  {
>> +    if (crtl->ssa->perform_pending_updates ())
>> +      cleanup_cfg (0);
>> +
>> +    free_dominance_info (CDI_DOMINATORS);
>> +
>> +    delete crtl->ssa;
>> +    crtl->ssa = nullptr;
>> +  }
>> +};
>> +
>> +void
>> +pair_fusion::run ()
> 
> This function could do with a comment.
> 

Done.
>> +{
>> +  if (!track_loads_p () && !track_stores_p ())
>> +    return;
>> +
>> +  for (auto bb : crtl->ssa->bbs ())
>> +    ldp_fusion_bb (bb);
>> +}
>> +
>> +struct aarch64_pair_fusion : public pair_fusion
>> +{
>> +  bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
>> +		    bool load_p) override final
>> +  {
>> +    return reload_completed
>> +      ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
>> +      : (GET_MODE_CLASS (mem_mode) != MODE_INT
>> +	 && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
>> +  }
>> +  bool pair_mem_promote_writeback_p (insn_info *insn, rtx pat, bool &load_p);
> 
> All of these other virtual overrides should be marked with the
> override final keywords, too.
> 

Done.
>> +  bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
>> +				machine_mode mode)
>> +  {
>> +    return aarch64_mem_ok_with_ldpstp_policy_model (first_mem,
>> +						     load_p,
>> +						     mode);
>> +  }
>> +  bool pair_operand_mode_ok_p (machine_mode mode)
>> +  {
>> +    return ldp_operand_mode_ok_p (mode);
>> +  }
> 
> Here you can just rename the static function ldp_operand_mode_ok_p
> to aarch64_pair_fusion::pair_operand_mode_ok_p.  No need for the
> wrapper function.
> 
>> +  rtx gen_mem_pair (rtx *pats, rtx writeback, bool load_p);
>> +  bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
>> +			      machine_mode mem_mode)
>> +  {
>> +    return (load_p
>> +	     ? aarch64_ldp_reg_operand (reg_op, mem_mode)
>> +	     : aarch64_stp_reg_operand (reg_op, mem_mode));
>> +  }
>> +  int pair_mem_alias_check_limit ()
>> +  {
>> +    return aarch64_ldp_alias_check_limit;
>> +  }
>> +  bool handle_writeback_opportunities (enum writeback wback)
> 
> Just a suggestion: you could call this parameter "which" as it is
> talking about which opportunities we want to handle.
> 

Done.
>> +  {
>> +    if (wback == writeback::WRITEBACK_PAIR_P)
>> +      return aarch64_ldp_writeback > 1;
>> +    else
>> +      return aarch64_ldp_writeback;
>> +  }
>> +  bool track_loads_p ()
>> +  {
>> +    return
>> +      aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> 
> Style nit: I _think_ this would be better formatted with the LHS of the
> expression on the same line as the return keyword, then the != on a new
> line.  I.e.:
> 
> return aarch64_tune_params.ldp_policy_model
>        != AARCH64_LDP_STP_POLICY_NEVER;
> 

Done.
>> +  }
>> +  bool track_stores_p ()
>> +  {
>> +    return
>> +      aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> 
> Likewise.
> 

Done.
>> +  }
>> +  bool pair_mem_out_of_range_p (HOST_WIDE_INT off)
>> +  {
>> +    return (off < LDP_MIN_IMM || off > LDP_MAX_IMM);
>> +  }
> 
> When I said (in
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648878.html):
> 
>> Please can you invert the condition and fix up callers appropriately?
> 
> I was expecting you to keep the function named pair_mem_in_range_p and invert
> the condition, then update callers to introduce negations.
> 
> I think it's more natural to have the function test for the positive condition
> (is this offset in range) and then callers negate it if they want to check if
> an offset is out of range.  That matches existing practice elsewhere.
> 

Done.
>> +  rtx gen_writeback_pair (rtx wb_effect, rtx mem, rtx regs[2], bool load_p)
>> +  {
>> +    return aarch64_gen_writeback_pair (wb_effect, mem, regs, load_p);
>> +  }
> 

Done.
> Just rename the static function aarch64_gen_writeback_pair to
> aarch64_pair_fusion::gen_writeback_pair, then there is no need for this
> wrapper.
> 
>> +  rtx gen_destructure_pair (rtx regs[2], rtx rti, bool load_p);
>> +};
>> +
>>  // State used by the pass for a given basic block.
>>  struct ldp_bb_info
>>  {
>> @@ -160,8 +343,11 @@ struct ldp_bb_info
>>  
>>    static const size_t obstack_alignment = sizeof (void *);
>>    bb_info *m_bb;
>> +  pair_fusion *m_pass;
> 
> I realise this is pre-existing, but can m_bb and m_pass both be private
> members?
> 

Done.
>>  
>> -  ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
>> +  ldp_bb_info (bb_info *bb,
>> +		      pair_fusion *d) : m_bb (bb),
>> +		      m_pass (d), m_emitted_tombstone (false)
> 
> I think the preferred style is to have the parameters on the same line
> and the initializer list (starting with ':') on a new line.
> 
>>    {
>>      obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE,
>>  				obstack_alignment, obstack_chunk_alloc,
>> @@ -177,10 +363,28 @@ struct ldp_bb_info
>>  	bitmap_obstack_release (&m_bitmap_obstack);
>>        }
>>    }
>> -
> 
> Any need for this whitespace change?
> 

Done.
>>    inline void track_access (insn_info *, bool load, rtx mem);
>>    inline void transform ();
>>    inline void cleanup_tombstones ();
>> +  inline void merge_pairs (insn_list_t &, insn_list_t &,
>> +		    bool load_p, unsigned access_size);
>> +  inline void transform_for_base (int load_size, access_group &group);
>> +
>> +  inline bool try_fuse_pair (bool load_p, unsigned access_size,
>> +			     insn_info *i1, insn_info *i2);
>> +
>> +  inline bool fuse_pair (bool load_p, unsigned access_size,
>> +		  int writeback,
>> +		  insn_info *i1, insn_info *i2,
>> +		  base_cand &base,
>> +		  const insn_range_info &move_range);
>> +
>> +  inline void track_tombstone (int uid);
>> +
>> +  inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
>> +
>> +  template<typename Map>
>> +    void traverse_base_map (Map &map);
> 
> What's the reason for changing the access of these functions?
> Can they be kept private?
> 
Done.
>>  
>>  private:
>>    obstack m_obstack;
>> @@ -191,27 +395,60 @@ private:
>>    bool m_emitted_tombstone;
>>  
>>    inline splay_tree_node<access_record *> *node_alloc (access_record *);
>> +};
>>  
>> -  template<typename Map>
>> -  inline void traverse_base_map (Map &map);
>> -  inline void transform_for_base (int load_size, access_group &group);
>> -
>> -  inline void merge_pairs (insn_list_t &, insn_list_t &,
>> -			   bool load_p, unsigned access_size);
>> +bool
>> +aarch64_pair_fusion::pair_mem_promote_writeback_p (insn_info *insn, rtx pat,
>> +						   bool &load_p)
>> +{
>> +  if (reload_completed
>> +      && aarch64_ldp_writeback > 1
>> +      && GET_CODE (pat) == PARALLEL
>> +      && XVECLEN (pat, 0) == 2)
>> +    {
>> +      auto rti = insn->rtl ();
>> +      const auto attr = get_attr_ldpstp (rti);
>> +      if (attr == LDPSTP_NONE)
>> +	return false;
>>  
>> -  inline bool try_fuse_pair (bool load_p, unsigned access_size,
>> -			     insn_info *i1, insn_info *i2);
>> +      load_p = (attr == LDPSTP_LDP);
>> +      gcc_checking_assert (load_p || attr == LDPSTP_STP);
>> +      return true;
>> +    }
>> +  return false;
>> +}
>>  
>> -  inline bool fuse_pair (bool load_p, unsigned access_size,
>> -			 int writeback,
>> -			 insn_info *i1, insn_info *i2,
>> -			 base_cand &base,
>> -			 const insn_range_info &move_range);
>> +rtx
>> +aarch64_pair_fusion::gen_destructure_pair (rtx regs[2], rtx rti, bool load_p)
>> +{
>> +  if (load_p)
>> +    return aarch64_destructure_load_pair (regs, rti);
>> +  else
>> +    return aarch64_destructure_store_pair (regs, rti);
>> +}
>>  
>> -  inline void track_tombstone (int uid);
>> +rtx
>> +aarch64_pair_fusion::gen_mem_pair (rtx *pats,
>> +		  rtx writeback,
>> +		  bool load_p)
>> +{
>> +  rtx pair_pat;
>>  
>> -  inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
>> -};
>> +  if (writeback)
>> +    {
>> +      auto patvec = gen_rtvec (3, writeback, pats[0], pats[1]);
>> +      return  gen_rtx_PARALLEL (VOIDmode, patvec);
>> +    }
>> +  else if (load_p)
>> +    return aarch64_gen_load_pair (XEXP (pats[0], 0),
>> +				  XEXP (pats[1], 0),
>> +				  XEXP (pats[0], 1));
>> +  else
>> +    return aarch64_gen_store_pair (XEXP (pats[0], 0),
>> +				   XEXP (pats[0], 1),
>> +				   XEXP (pats[1], 1));
>> +  return pair_pat;
>> +}
>>  
>>  splay_tree_node<access_record *> *
>>  ldp_bb_info::node_alloc (access_record *access)
>> @@ -312,7 +549,7 @@ any_post_modify_p (rtx x)
>>  
>>  // Return true if we should consider forming ldp/stp insns from memory
>>  // accesses with operand mode MODE at this stage in compilation.
>> -static bool
>> +bool
>>  ldp_operand_mode_ok_p (machine_mode mode)
>>  {
>>    const bool allow_qregs
>> @@ -412,9 +649,10 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
>>    const machine_mode mem_mode = GET_MODE (mem);
>>    const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
>>  
>> -  // Punt on misaligned offsets.  LDP/STP instructions require offsets to be a
>> -  // multiple of the access size, and we believe that misaligned offsets on
>> -  // MEM_EXPR bases are likely to lead to misaligned offsets w.r.t. RTL bases.
>> +  // Punt on misaligned offsets.  Paired memory access instructions require
>> +  // offsets to be a multiple of the access size, and we believe that
>> +  // misaligned offsets on MEM_EXPR bases are likely to lead to misaligned
>> +  // offsets w.r.t. RTL bases.
>>    if (!multiple_p (offset, mem_size))
>>      return false;
>>  
>> @@ -438,10 +676,10 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
>>  }
>>  
>>  // Main function to begin pair discovery.  Given a memory access INSN,
>> -// determine whether it could be a candidate for fusing into an ldp/stp,
>> -// and if so, track it in the appropriate data structure for this basic
>> -// block.  LOAD_P is true if the access is a load, and MEM is the mem
>> -// rtx that occurs in INSN.
>> +// determine whether it could be a candidate for fusing into a paired
>> +// access, and if so, track it in the appropriate data structure for
>> +// this basic block.  LOAD_P is true if the access is a load, and MEM
>> +//  is the mem rtx that occurs in INSN.
>>  void
>>  ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>>  {
>> @@ -449,35 +687,26 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>>    if (MEM_VOLATILE_P (mem))
>>      return;
>>  
>> -  // Ignore writeback accesses if the param says to do so.
>> -  if (!aarch64_ldp_writeback
>> +  // Ignore writeback accesses if the hook says to do so.
>> +  if (!m_pass->handle_writeback_opportunities (writeback::WRITEBACK)
>>        && GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC)
>>      return;
>>  
>>    const machine_mode mem_mode = GET_MODE (mem);
>> -  if (!ldp_operand_mode_ok_p (mem_mode))
>> +  if (!m_pass->pair_operand_mode_ok_p (mem_mode))
>>      return;
>>  
>>    rtx reg_op = XEXP (PATTERN (insn->rtl ()), !load_p);
>>  
>> -  // Ignore the access if the register operand isn't suitable for ldp/stp.
>> -  if (load_p
>> -      ? !aarch64_ldp_reg_operand (reg_op, mem_mode)
>> -      : !aarch64_stp_reg_operand (reg_op, mem_mode))
>> +  if (!m_pass->pair_reg_operand_ok_p (load_p, reg_op, mem_mode))
>>      return;
>> -
> 
> Let's keep this newline.
> 
>>    // We want to segregate FP/SIMD accesses from GPR accesses.
>>    //
>>    // Before RA, we use the modes, noting that stores of constant zero
>>    // operands use GPRs (even in non-integer modes).  After RA, we use
>>    // the hard register numbers.
> 
> The second para of this comment would probably be best moved to the aarch64
> implementation of the hook (with the first sentence duplicated in both places).
> 
>> -  const bool fpsimd_op_p
>> -    = reload_completed
>> -    ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
>> -    : (GET_MODE_CLASS (mem_mode) != MODE_INT
>> -       && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
>> -
>> -  // Note ldp_operand_mode_ok_p already rejected VL modes.
>> +  const bool fpsimd_op_p = m_pass->fpsimd_op_p (reg_op, mem_mode, load_p);
> 
> Might be more readable with a blank line inserted here.
>
Done.
 
>> +  // Note pair_operand_mode_ok_p already rejected VL modes.
>>    const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
>>    const lfs_fields lfs = { load_p, fpsimd_op_p, mem_size };
>>  
>> @@ -506,8 +735,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>>    // elimination offset pre-RA, we should postpone forming pairs on such
>>    // accesses until after RA.
>>    //
>> -  // As it stands, addresses with offsets in range for LDR but not
>> -  // in range for LDP/STP are currently reloaded inefficiently,
>> +  // As it stands, addresses in range for an individual load/store but not
>> +  // for a paired access are currently reloaded inefficiently,
>>    // ending up with a separate base register for each pair.
>>    //
>>    // In theory LRA should make use of
>> @@ -519,8 +748,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>>    // that calls targetm.legitimize_address_displacement.
>>    //
>>    // So for now, it's better to punt when we can't be sure that the
>> -  // offset is in range for LDP/STP.  Out-of-range cases can then be
>> -  // handled after RA by the out-of-range LDP/STP peepholes.  Eventually, it
>> +  // offset is in range for paired access.  Out-of-range cases can then be
>> +  // handled after RA by the out-of-range PAIR MEM  peepholes.  Eventually, it
> 
> Does this part really apply to rs6000?  Do you have peephole2 patterns for lxvp?
> If not you may want to caveat this part as being specific to aarch64.  E.g. "On
> aarch64, out-of-range cases ..."
> 
>>    // would be nice to handle known out-of-range opportunities in the
>>    // pass itself (for stack accesses, this would be in the post-RA pass).
>>    if (!reload_completed
>> @@ -573,8 +802,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>>  	gcc_unreachable (); // Base defs should be unique.
>>      }
>>  
>> -  // Punt on misaligned offsets.  LDP/STP require offsets to be a multiple of
>> -  // the access size.
>> +  // Punt on misaligned offsets.  Paired memory accesses require offsets
>> +  // to be a multiple of the access size.
>>    if (!multiple_p (mem_off, mem_size))
>>      return;
>>  
>> @@ -1207,8 +1436,8 @@ extract_writebacks (bool load_p, rtx pats[2], int changed)
>>  // base register.  If there is one, we choose the first such update after
>>  // PAIR_DST that is still in the same BB as our pair.  We return the new def in
>>  // *ADD_DEF and the resulting writeback effect in *WRITEBACK_EFFECT.
>> -static insn_info *
>> -find_trailing_add (insn_info *insns[2],
>> +insn_info *
>> +pair_fusion::find_trailing_add (insn_info *insns[2],
>>  		   const insn_range_info &pair_range,
>>  		   int initial_writeback,
>>  		   rtx *writeback_effect,
>> @@ -1286,7 +1515,7 @@ find_trailing_add (insn_info *insns[2],
>>  
>>    off_hwi /= access_size;
>>  
>> -  if (off_hwi < LDP_MIN_IMM || off_hwi > LDP_MAX_IMM)
>> +  if (pair_mem_out_of_range_p (off_hwi))
>>      return nullptr;
>>  
>>    auto dump_prefix = [&]()
>> @@ -1800,7 +2029,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>>      {
>>        if (dump_file)
>>  	fprintf (dump_file,
>> -		 "  ldp: i%d has wb but subsequent i%d has non-wb "
>> +		 "  load pair: i%d has wb but subsequent i%d has non-wb "
>>  		 "update of base (r%d), dropping wb\n",
>>  		 insns[0]->uid (), insns[1]->uid (), base_regno);
>>        gcc_assert (writeback_effect);
>> @@ -1823,7 +2052,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>>      }
>>  
>>    // If either of the original insns had writeback, but the resulting pair insn
>> -  // does not (can happen e.g. in the ldp edge case above, or if the writeback
>> +  // does not (can happen e.g. in the load pair edge case above, or if the writeback
>>    // effects cancel out), then drop the def(s) of the base register as
>>    // appropriate.
>>    //
>> @@ -1842,7 +2071,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>>    // update of the base register and try and fold it in to make this into a
>>    // writeback pair.
>>    insn_info *trailing_add = nullptr;
>> -  if (aarch64_ldp_writeback > 1
>> +  if (m_pass->handle_writeback_opportunities (writeback::WRITEBACK_PAIR_P)
>>        && !writeback_effect
>>        && (!load_p || (!refers_to_regno_p (base_regno, base_regno + 1,
>>  					 XEXP (pats[0], 0), nullptr)
>> @@ -1850,7 +2079,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>>  					     XEXP (pats[1], 0), nullptr))))
>>      {
>>        def_info *add_def;
>> -      trailing_add = find_trailing_add (insns, move_range, writeback,
>> +      trailing_add = m_pass->find_trailing_add (insns, move_range, writeback,
>>  					&writeback_effect,
>>  					&add_def, base.def, offsets[0],
>>  					access_size);
>> @@ -1863,14 +2092,14 @@ ldp_bb_info::fuse_pair (bool load_p,
>>      }
>>  
>>    // Now that we know what base mem we're going to use, check if it's OK
>> -  // with the ldp/stp policy.
>> +  // with the pair mem policy.
>>    rtx first_mem = XEXP (pats[0], load_p);
>> -  if (!aarch64_mem_ok_with_ldpstp_policy_model (first_mem,
>> -						load_p,
>> -						GET_MODE (first_mem)))
>> +  if (!m_pass->pair_mem_ok_with_policy (first_mem,
>> +			  load_p,
>> +			  GET_MODE (first_mem)))
> 
> The indentation looks off here.  I suppose when you drop the mode parameter
> (as mentioned above) this will all fit on one line anyway.
> 
>>      {
>>        if (dump_file)
>> -	fprintf (dump_file, "punting on pair (%d,%d), ldp/stp policy says no\n",
>> +	fprintf (dump_file, "punting on pair (%d,%d), pair mem  policy says no\n",
> 
> Nit: excess space after mem.
> 
Done.
>>  		 i1->uid (), i2->uid ());
>>        return false;
>>      }
>> @@ -1878,21 +2107,10 @@ ldp_bb_info::fuse_pair (bool load_p,
>>    rtx reg_notes = combine_reg_notes (first, second, load_p);
>>  
>>    rtx pair_pat;
>> -  if (writeback_effect)
>> -    {
>> -      auto patvec = gen_rtvec (3, writeback_effect, pats[0], pats[1]);
>> -      pair_pat = gen_rtx_PARALLEL (VOIDmode, patvec);
>> -    }
>> -  else if (load_p)
>> -    pair_pat = aarch64_gen_load_pair (XEXP (pats[0], 0),
>> -				      XEXP (pats[1], 0),
>> -				      XEXP (pats[0], 1));
>> -  else
>> -    pair_pat = aarch64_gen_store_pair (XEXP (pats[0], 0),
>> -				       XEXP (pats[0], 1),
>> -				       XEXP (pats[1], 1));
>>  
>> +  pair_pat = m_pass->gen_mem_pair (pats, writeback_effect, load_p);
> 
> Please can you declare pair_pat on the same line it is set, so that we have:
> 
>   rtx pair_pat = m_pass->gen_mem_pair (pats, writeback_effect, load_p);
> 
> 

Done.
>>    insn_change *pair_change = nullptr;
>> +
> 
> Spurious whitespace change?
> 
>>    auto set_pair_pat = [pair_pat,reg_notes](insn_change *change) {
>>      rtx_insn *rti = change->insn ()->rtl ();
>>      validate_unshare_change (rti, &PATTERN (rti), pair_pat, true);
>> @@ -2133,15 +2351,6 @@ load_modified_by_store_p (insn_info *load,
>>    return false;
>>  }
>>  
>> -// Virtual base class for load/store walkers used in alias analysis.
>> -struct alias_walker
>> -{
>> -  virtual bool conflict_p (int &budget) const = 0;
>> -  virtual insn_info *insn () const = 0;
>> -  virtual bool valid () const = 0;
>> -  virtual void advance () = 0;
>> -};
>> -
>>  // Implement some common functionality used by both store_walker
>>  // and load_walker.
>>  template<bool reverse>
>> @@ -2259,13 +2468,13 @@ public:
>>  //
>>  // We try to maintain the invariant that if a walker becomes invalid, we
>>  // set its pointer to null.
>> -static void
>> -do_alias_analysis (insn_info *alias_hazards[4],
>> +void
>> +pair_fusion::do_alias_analysis (insn_info *alias_hazards[4],
>>  		   alias_walker *walkers[4],
>>  		   bool load_p)
>>  {
>>    const int n_walkers = 2 + (2 * !load_p);
>> -  int budget = aarch64_ldp_alias_check_limit;
>> +  int budget = pair_mem_alias_check_limit ();
>>  
>>    auto next_walker = [walkers,n_walkers](int current) -> int {
>>      for (int j = 1; j <= n_walkers; j++)
>> @@ -2350,8 +2559,8 @@ do_alias_analysis (insn_info *alias_hazards[4],
>>  //
>>  // Returns an integer where bit (1 << i) is set if INSNS[i] uses writeback
>>  // addressing.
>> -static int
>> -get_viable_bases (insn_info *insns[2],
>> +int
>> +pair_fusion::get_viable_bases (insn_info *insns[2],
>>  		  vec<base_cand> &base_cands,
>>  		  rtx cand_mems[2],
>>  		  unsigned access_size,
>> @@ -2397,7 +2606,7 @@ get_viable_bases (insn_info *insns[2],
>>        if (!is_lower)
>>  	base_off--;
>>  
>> -      if (base_off < LDP_MIN_IMM || base_off > LDP_MAX_IMM)
>> +      if (pair_mem_out_of_range_p (base_off))
>>  	continue;
>>  
>>        use_info *use = find_access (insns[i]->uses (), REGNO (base));
>> @@ -2454,7 +2663,7 @@ get_viable_bases (insn_info *insns[2],
>>  }
>>  
>>  // Given two adjacent memory accesses of the same size, I1 and I2, try
>> -// and see if we can merge them into a ldp or stp.
>> +// and see if we can merge them into a paired access load and store.
> 
> Delete "load and store" here.
>

Done.
 
>>  //
>>  // ACCESS_SIZE gives the (common) size of a single access, LOAD_P is true
>>  // if the accesses are both loads, otherwise they are both stores.
>> @@ -2494,7 +2703,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>>      {
>>        if (dump_file)
>>  	fprintf (dump_file,
>> -		 "punting on ldp due to reg conflcits (%d,%d)\n",
>> +		 "punting on pair mem load  due to reg conflcits (%d,%d)\n",
>>  		 insns[0]->uid (), insns[1]->uid ());
>>        return false;
>>      }
>> @@ -2512,7 +2721,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>>  
>>    auto_vec<base_cand, 2> base_cands (2);
>>  
>> -  int writeback = get_viable_bases (insns, base_cands, cand_mems,
>> +  int writeback = m_pass->get_viable_bases (insns, base_cands, cand_mems,
>>  				    access_size, reversed);
>>    if (base_cands.is_empty ())
>>      {
>> @@ -2641,7 +2850,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>>      walkers[1] = &backward_store_walker;
>>  
>>    if (load_p && (mem_defs[0] || mem_defs[1]))
>> -    do_alias_analysis (alias_hazards, walkers, load_p);
>> +    m_pass->do_alias_analysis (alias_hazards, walkers, load_p);
>>    else
>>      {
>>        // We want to find any loads hanging off the first store.
>> @@ -2650,7 +2859,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>>        load_walker<true> backward_load_walker (mem_defs[1], insns[1], insns[0]);
>>        walkers[2] = &forward_load_walker;
>>        walkers[3] = &backward_load_walker;
>> -      do_alias_analysis (alias_hazards, walkers, load_p);
>> +      m_pass->do_alias_analysis (alias_hazards, walkers, load_p);
>>        // Now consolidate hazards back down.
>>        if (alias_hazards[2]
>>  	  && (!alias_hazards[0] || (*alias_hazards[2] < *alias_hazards[0])))
>> @@ -2891,7 +3100,7 @@ ldp_bb_info::merge_pairs (insn_list_t &left_list,
>>  // merge_pairs.
>>  void
>>  ldp_bb_info::transform_for_base (int encoded_lfs,
>> -				 access_group &group)
>> +					 access_group &group)
> 
> Looks like a bad change in indentation.
> 

Done.
>>  {
>>    const auto lfs = decode_lfs (encoded_lfs);
>>    const unsigned access_size = lfs.size;
>> @@ -2964,29 +3173,9 @@ ldp_bb_info::transform ()
>>    traverse_base_map (def_map);
>>  }
>>  
>> -static void
>> -ldp_fusion_init ()
>> -{
>> -  calculate_dominance_info (CDI_DOMINATORS);
>> -  df_analyze ();
>> -  crtl->ssa = new rtl_ssa::function_info (cfun);
>> -}
>> -
>> -static void
>> -ldp_fusion_destroy ()
>> -{
>> -  if (crtl->ssa->perform_pending_updates ())
>> -    cleanup_cfg (0);
>> -
>> -  free_dominance_info (CDI_DOMINATORS);
>> -
>> -  delete crtl->ssa;
>> -  crtl->ssa = nullptr;
>> -}
>> -
>>  // Given a load pair insn in PATTERN, unpack the insn, storing
>>  // the registers in REGS and returning the mem.
>> -static rtx
>> +rtx
> 
Done.
> Again, no need to change the linkage here.
> 
>>  aarch64_destructure_load_pair (rtx regs[2], rtx pattern)
>>  {
>>    rtx mem = NULL_RTX;
>> @@ -3012,7 +3201,7 @@ aarch64_destructure_load_pair (rtx regs[2], rtx pattern)
>>  
>>  // Given a store pair insn in PATTERN, unpack the insn, storing
>>  // the register operands in REGS, and returning the mem.
>> -static rtx
>> +rtx
> 
> Likewise.

Done.
> 
>>  aarch64_destructure_store_pair (rtx regs[2], rtx pattern)
>>  {
>>    rtx mem = XEXP (pattern, 0);
>> @@ -3030,7 +3219,7 @@ aarch64_destructure_store_pair (rtx regs[2], rtx pattern)
>>  //
>>  // This is used when promoting existing non-writeback pairs to writeback
>>  // variants.
>> -static rtx
>> +rtx
>>  aarch64_gen_writeback_pair (rtx wb_effect, rtx pair_mem, rtx regs[2],
>>  			    bool load_p)
>>  {
>> @@ -3068,22 +3257,13 @@ aarch64_gen_writeback_pair (rtx wb_effect, rtx pair_mem, rtx regs[2],
>>  // the base register which we can fold in to make this pair use
>>  // a writeback addressing mode.
>>  static void
>> -try_promote_writeback (insn_info *insn)
>> +try_promote_writeback (insn_info *insn, bool load_p, pair_fusion *pass)
> 
> I think this would be better as a member function of pair_fusion.  Then
> there's no need to pass a pointer to the pass structure explicitly.
> 
>>  {
>> -  auto rti = insn->rtl ();
>> -  const auto attr = get_attr_ldpstp (rti);
>> -  if (attr == LDPSTP_NONE)
>> -    return;
>> -
>> -  bool load_p = (attr == LDPSTP_LDP);
>> -  gcc_checking_assert (load_p || attr == LDPSTP_STP);
>> -
>>    rtx regs[2];
>>    rtx mem = NULL_RTX;
>> -  if (load_p)
>> -    mem = aarch64_destructure_load_pair (regs, PATTERN (rti));
>> -  else
>> -    mem = aarch64_destructure_store_pair (regs, PATTERN (rti));
>> +
>> +  mem = pass->gen_destructure_pair (regs, PATTERN (insn->rtl ()), load_p);
>> +
>>    gcc_checking_assert (MEM_P (mem));
>>  
>>    poly_int64 offset;
>> @@ -3120,9 +3300,10 @@ try_promote_writeback (insn_info *insn)
>>    def_info *add_def;
>>    const insn_range_info pair_range (insn);
>>    insn_info *insns[2] = { nullptr, insn };
>> -  insn_info *trailing_add = find_trailing_add (insns, pair_range, 0, &wb_effect,
>> -					       &add_def, base_def, offset,
>> -					       access_size);
>> +  insn_info *trailing_add
>> +    = pass->find_trailing_add (insns, pair_range, 0, &wb_effect,
>> +			       &add_def, base_def, offset,
>> +			       access_size);
>>    if (!trailing_add)
>>      return;
>>  
>> @@ -3132,8 +3313,9 @@ try_promote_writeback (insn_info *insn)
>>    insn_change del_change (trailing_add, insn_change::DELETE);
>>    insn_change *changes[] = { &pair_change, &del_change };
>>  
>> -  rtx pair_pat = aarch64_gen_writeback_pair (wb_effect, mem, regs, load_p);
>> -  validate_unshare_change (rti, &PATTERN (rti), pair_pat, true);
>> +  rtx pair_pat = pass->gen_writeback_pair (wb_effect, mem, regs, load_p);
>> +
> 
> No need for this blank line IMO.
> 
>> +  validate_unshare_change (insn->rtl(), &PATTERN (insn->rtl()), pair_pat, true);
>>  
>>    // The pair must gain the def of the base register from the add.
>>    pair_change.new_defs = insert_access (attempt,
>> @@ -3167,14 +3349,12 @@ try_promote_writeback (insn_info *insn)
>>  // for load/store candidates.  If running after RA, also try and promote
>>  // non-writeback pairs to use writeback addressing.  Then try to fuse
>>  // candidates into pairs.
>> -void ldp_fusion_bb (bb_info *bb)
>> +void pair_fusion::ldp_fusion_bb (bb_info *bb)
>>  {
>> -  const bool track_loads
>> -    = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
>> -  const bool track_stores
>> -    = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
>> +  const bool track_loads = track_loads_p ();
>> +  const bool track_stores = track_stores_p ();
>>  
>> -  ldp_bb_info bb_state (bb);
>> +  ldp_bb_info bb_state (bb, this);
>>  
>>    for (auto insn : bb->nondebug_insns ())
>>      {
>> @@ -3184,11 +3364,9 @@ void ldp_fusion_bb (bb_info *bb)
>>  	continue;
>>  
>>        rtx pat = PATTERN (rti);
>> -      if (reload_completed
>> -	  && aarch64_ldp_writeback > 1
>> -	  && GET_CODE (pat) == PARALLEL
>> -	  && XVECLEN (pat, 0) == 2)
>> -	try_promote_writeback (insn);
>> +      bool load_p;
>> +      if (pair_mem_promote_writeback_p (insn, pat, load_p))
>> +	try_promote_writeback (insn, load_p, this);
> 
> As above, if we make try_promote_writeback a member function of pair_fusion,
> the this pointer would be implicit, I think that would be a lot cleaner.
> 
> Also, sorry for the run-around, but I think the function
> pair_mem_promote_writeback_p is trying to do too many things at once.
> 
> I think we should keep these checks (making the latter suitable for generic
> code):
> 
>   if (reload_completed
>       && aarch64_ldp_writeback > 1
> 
> in ldp_fusion_bb, at which point pair_mem_promote_writeback_p just
> becomes a test for "is this insn a (non-writeback) paired memory
> access? (and if so, is it a load?)"
> 
> So ultimately I think we want the following in ldp_fusion_bb:
> 
>   bool load_p;
>   if (reload_completed
>       && handle_writeback_opportunities (writeback::ALL)
>       && pair_mem_insn_p (rti, load_p))
>     try_promote_writeback (insn, load_p);
> 
>   rtx pat = PATTERN (rti);
>   [...]
> 

Done.
> It probably makes most sense to pass just the rtx_insn * to
> pair_mem_insn_p instead of passing both an insn_info * and an rtx for
> the PATTERN.
> 
>>  
>>        if (GET_CODE (pat) != SET)
>>  	continue;
>> @@ -3205,12 +3383,8 @@ void ldp_fusion_bb (bb_info *bb)
>>  
>>  void ldp_fusion ()
>>  {
>> -  ldp_fusion_init ();
>> -
>> -  for (auto bb : crtl->ssa->bbs ())
>> -    ldp_fusion_bb (bb);
>> -
>> -  ldp_fusion_destroy ();
>> +  aarch64_pair_fusion pass;
>> +  pass.run ();
>>  }
> 
> I think we mentioned this before, but since this is now a very simple
> two-line function, can we inline it into pass_ldp_fusion::execute?
> ISTM there's no need for the additional wrapper.
> 

Done.

Thanks & Regards
Ajit
>>  
>>  namespace {
>> @@ -3242,14 +3416,6 @@ public:
>>        if (!optimize || optimize_debug)
>>  	return false;
>>  
>> -      // If the tuning policy says never to form ldps or stps, don't run
>> -      // the pass.
>> -      if ((aarch64_tune_params.ldp_policy_model
>> -	   == AARCH64_LDP_STP_POLICY_NEVER)
>> -	  && (aarch64_tune_params.stp_policy_model
>> -	      == AARCH64_LDP_STP_POLICY_NEVER))
>> -	return false;
>> -
>>        if (reload_completed)
>>  	return flag_aarch64_late_ldp_fusion;
>>        else
>> -- 
>> 2.39.3
>>
> 
> Thanks,
> Alex
diff mbox series

Patch

diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 365dcf48b22..83a917e1d20 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -138,6 +138,189 @@  struct alt_base
   poly_int64 offset;
 };
 
+// Virtual base class for load/store walkers used in alias analysis.
+struct alias_walker
+{
+  virtual bool conflict_p (int &budget) const = 0;
+  virtual insn_info *insn () const = 0;
+  virtual bool valid () const = 0;
+  virtual void advance () = 0;
+};
+
+// Forward declaration to be used inside the aarch64_pair_fusion class.
+bool ldp_operand_mode_ok_p (machine_mode mode);
+rtx aarch64_destructure_load_pair (rtx regs[2], rtx pattern);
+rtx aarch64_destructure_store_pair (rtx regs[2], rtx pattern);
+rtx aarch64_gen_writeback_pair (rtx wb_effect, rtx pair_mem, rtx regs[2],
+			    bool load_p);
+enum class writeback{
+  WRITEBACK_PAIR_P,
+  WRITEBACK
+};
+
+struct pair_fusion {
+
+  pair_fusion ()
+  {
+    calculate_dominance_info (CDI_DOMINATORS);
+    df_analyze ();
+    crtl->ssa = new rtl_ssa::function_info (cfun);
+  };
+  // Return true if GPR is FP or SIMD accesses, passed
+  // with GPR reg_op rtx, machine mode and load_p.
+  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
+  {
+    return false;
+  }
+  // Return true if pair operand mode is ok. Passed with
+  // machine mode.
+  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
+  // Return true if reg operand is ok, passed with load_p,
+  // reg_op rtx and machine mode.
+  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
+				      machine_mode mem_mode) = 0;
+  // Return alias check limit.
+  virtual int pair_mem_alias_check_limit () = 0;
+  // Return true if there is writeback opportunities. Passed
+  // with enum writeback.
+  virtual bool handle_writeback_opportunities (enum writeback wback) = 0 ;
+  // Return true if mem ok ldp stp policy model passed with
+  // rtx mem, load_p and machine mode.
+  virtual bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
+					machine_mode mode) = 0;
+  // Gen load store mem pair. Return load store rtx passed
+  // with arguments load store pattern, writeback rtx and
+  // load_p.
+  virtual rtx gen_mem_pair (rtx *pats, rtx writeback,
+			    bool load_p) = 0;
+  // Return true if memory writeback can be promoted, passed
+  // with insn, rtx pattern and load_p. load_p is set by this
+  // hook.
+  virtual bool pair_mem_promote_writeback_p (insn_info *, rtx, bool &)
+  {
+     return false;
+  }
+  // Return true if we track loads.
+  virtual bool track_loads_p ()
+  {
+    return true;
+  }
+  // Return true if we track stores.
+  virtual bool track_stores_p ()
+  {
+    return true;
+  }
+  // Return true if offset is out of range.
+  virtual bool pair_mem_out_of_range_p (HOST_WIDE_INT off) = 0;
+  // Return destructure pair. Passed with rtx reg, insn pattern
+  // and load_p.
+  virtual rtx gen_destructure_pair (rtx regs[2], rtx rti, bool load_p) = 0;
+  // Return writeback pair. Passed with rtx writeback effect, mem rtx
+  // regs rtx and load_p.
+  virtual rtx gen_writeback_pair (rtx wb_effect, rtx mem,
+				  rtx regs[2], bool load_p) = 0;
+  void ldp_fusion_bb (bb_info *bb);
+  insn_info * find_trailing_add (insn_info *insns[2],
+				 const insn_range_info &pair_range,
+				 int initial_writeback,
+				 rtx *writeback_effect,
+				 def_info **add_def,
+				 def_info *base_def,
+				 poly_int64 initial_offset,
+				 unsigned access_size);
+  int get_viable_bases (insn_info *insns[2],
+			vec<base_cand> &base_cands,
+			rtx cand_mems[2],
+			unsigned access_size,
+			bool reversed);
+  void do_alias_analysis (insn_info *alias_hazards[4],
+			  alias_walker *walkers[4],
+			  bool load_p);
+  void run ();
+  ~pair_fusion()
+  {
+    if (crtl->ssa->perform_pending_updates ())
+      cleanup_cfg (0);
+
+    free_dominance_info (CDI_DOMINATORS);
+
+    delete crtl->ssa;
+    crtl->ssa = nullptr;
+  }
+};
+
+void
+pair_fusion::run ()
+{
+  if (!track_loads_p () && !track_stores_p ())
+    return;
+
+  for (auto bb : crtl->ssa->bbs ())
+    ldp_fusion_bb (bb);
+}
+
+struct aarch64_pair_fusion : public pair_fusion
+{
+  bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
+		    bool load_p) override final
+  {
+    return reload_completed
+      ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
+      : (GET_MODE_CLASS (mem_mode) != MODE_INT
+	 && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
+  }
+  bool pair_mem_promote_writeback_p (insn_info *insn, rtx pat, bool &load_p);
+  bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
+				machine_mode mode)
+  {
+    return aarch64_mem_ok_with_ldpstp_policy_model (first_mem,
+						     load_p,
+						     mode);
+  }
+  bool pair_operand_mode_ok_p (machine_mode mode)
+  {
+    return ldp_operand_mode_ok_p (mode);
+  }
+  rtx gen_mem_pair (rtx *pats, rtx writeback, bool load_p);
+  bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
+			      machine_mode mem_mode)
+  {
+    return (load_p
+	     ? aarch64_ldp_reg_operand (reg_op, mem_mode)
+	     : aarch64_stp_reg_operand (reg_op, mem_mode));
+  }
+  int pair_mem_alias_check_limit ()
+  {
+    return aarch64_ldp_alias_check_limit;
+  }
+  bool handle_writeback_opportunities (enum writeback wback)
+  {
+    if (wback == writeback::WRITEBACK_PAIR_P)
+      return aarch64_ldp_writeback > 1;
+    else
+      return aarch64_ldp_writeback;
+  }
+  bool track_loads_p ()
+  {
+    return
+      aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
+  }
+  bool track_stores_p ()
+  {
+    return
+      aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
+  }
+  bool pair_mem_out_of_range_p (HOST_WIDE_INT off)
+  {
+    return (off < LDP_MIN_IMM || off > LDP_MAX_IMM);
+  }
+  rtx gen_writeback_pair (rtx wb_effect, rtx mem, rtx regs[2], bool load_p)
+  {
+    return aarch64_gen_writeback_pair (wb_effect, mem, regs, load_p);
+  }
+  rtx gen_destructure_pair (rtx regs[2], rtx rti, bool load_p);
+};
+
 // State used by the pass for a given basic block.
 struct ldp_bb_info
 {
@@ -160,8 +343,11 @@  struct ldp_bb_info
 
   static const size_t obstack_alignment = sizeof (void *);
   bb_info *m_bb;
+  pair_fusion *m_pass;
 
-  ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
+  ldp_bb_info (bb_info *bb,
+		      pair_fusion *d) : m_bb (bb),
+		      m_pass (d), m_emitted_tombstone (false)
   {
     obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE,
 				obstack_alignment, obstack_chunk_alloc,
@@ -177,10 +363,28 @@  struct ldp_bb_info
 	bitmap_obstack_release (&m_bitmap_obstack);
       }
   }
-
   inline void track_access (insn_info *, bool load, rtx mem);
   inline void transform ();
   inline void cleanup_tombstones ();
+  inline void merge_pairs (insn_list_t &, insn_list_t &,
+		    bool load_p, unsigned access_size);
+  inline void transform_for_base (int load_size, access_group &group);
+
+  inline bool try_fuse_pair (bool load_p, unsigned access_size,
+			     insn_info *i1, insn_info *i2);
+
+  inline bool fuse_pair (bool load_p, unsigned access_size,
+		  int writeback,
+		  insn_info *i1, insn_info *i2,
+		  base_cand &base,
+		  const insn_range_info &move_range);
+
+  inline void track_tombstone (int uid);
+
+  inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
+
+  template<typename Map>
+    void traverse_base_map (Map &map);
 
 private:
   obstack m_obstack;
@@ -191,27 +395,60 @@  private:
   bool m_emitted_tombstone;
 
   inline splay_tree_node<access_record *> *node_alloc (access_record *);
+};
 
-  template<typename Map>
-  inline void traverse_base_map (Map &map);
-  inline void transform_for_base (int load_size, access_group &group);
-
-  inline void merge_pairs (insn_list_t &, insn_list_t &,
-			   bool load_p, unsigned access_size);
+bool
+aarch64_pair_fusion::pair_mem_promote_writeback_p (insn_info *insn, rtx pat,
+						   bool &load_p)
+{
+  if (reload_completed
+      && aarch64_ldp_writeback > 1
+      && GET_CODE (pat) == PARALLEL
+      && XVECLEN (pat, 0) == 2)
+    {
+      auto rti = insn->rtl ();
+      const auto attr = get_attr_ldpstp (rti);
+      if (attr == LDPSTP_NONE)
+	return false;
 
-  inline bool try_fuse_pair (bool load_p, unsigned access_size,
-			     insn_info *i1, insn_info *i2);
+      load_p = (attr == LDPSTP_LDP);
+      gcc_checking_assert (load_p || attr == LDPSTP_STP);
+      return true;
+    }
+  return false;
+}
 
-  inline bool fuse_pair (bool load_p, unsigned access_size,
-			 int writeback,
-			 insn_info *i1, insn_info *i2,
-			 base_cand &base,
-			 const insn_range_info &move_range);
+rtx
+aarch64_pair_fusion::gen_destructure_pair (rtx regs[2], rtx rti, bool load_p)
+{
+  if (load_p)
+    return aarch64_destructure_load_pair (regs, rti);
+  else
+    return aarch64_destructure_store_pair (regs, rti);
+}
 
-  inline void track_tombstone (int uid);
+rtx
+aarch64_pair_fusion::gen_mem_pair (rtx *pats,
+		  rtx writeback,
+		  bool load_p)
+{
+  rtx pair_pat;
 
-  inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
-};
+  if (writeback)
+    {
+      auto patvec = gen_rtvec (3, writeback, pats[0], pats[1]);
+      return  gen_rtx_PARALLEL (VOIDmode, patvec);
+    }
+  else if (load_p)
+    return aarch64_gen_load_pair (XEXP (pats[0], 0),
+				  XEXP (pats[1], 0),
+				  XEXP (pats[0], 1));
+  else
+    return aarch64_gen_store_pair (XEXP (pats[0], 0),
+				   XEXP (pats[0], 1),
+				   XEXP (pats[1], 1));
+  return pair_pat;
+}
 
 splay_tree_node<access_record *> *
 ldp_bb_info::node_alloc (access_record *access)
@@ -312,7 +549,7 @@  any_post_modify_p (rtx x)
 
 // Return true if we should consider forming ldp/stp insns from memory
 // accesses with operand mode MODE at this stage in compilation.
-static bool
+bool
 ldp_operand_mode_ok_p (machine_mode mode)
 {
   const bool allow_qregs
@@ -412,9 +649,10 @@  ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
   const machine_mode mem_mode = GET_MODE (mem);
   const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
 
-  // Punt on misaligned offsets.  LDP/STP instructions require offsets to be a
-  // multiple of the access size, and we believe that misaligned offsets on
-  // MEM_EXPR bases are likely to lead to misaligned offsets w.r.t. RTL bases.
+  // Punt on misaligned offsets.  Paired memory access instructions require
+  // offsets to be a multiple of the access size, and we believe that
+  // misaligned offsets on MEM_EXPR bases are likely to lead to misaligned
+  // offsets w.r.t. RTL bases.
   if (!multiple_p (offset, mem_size))
     return false;
 
@@ -438,10 +676,10 @@  ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
 }
 
 // Main function to begin pair discovery.  Given a memory access INSN,
-// determine whether it could be a candidate for fusing into an ldp/stp,
-// and if so, track it in the appropriate data structure for this basic
-// block.  LOAD_P is true if the access is a load, and MEM is the mem
-// rtx that occurs in INSN.
+// determine whether it could be a candidate for fusing into a paired
+// access, and if so, track it in the appropriate data structure for
+// this basic block.  LOAD_P is true if the access is a load, and MEM
+//  is the mem rtx that occurs in INSN.
 void
 ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
 {
@@ -449,35 +687,26 @@  ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
   if (MEM_VOLATILE_P (mem))
     return;
 
-  // Ignore writeback accesses if the param says to do so.
-  if (!aarch64_ldp_writeback
+  // Ignore writeback accesses if the hook says to do so.
+  if (!m_pass->handle_writeback_opportunities (writeback::WRITEBACK)
       && GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC)
     return;
 
   const machine_mode mem_mode = GET_MODE (mem);
-  if (!ldp_operand_mode_ok_p (mem_mode))
+  if (!m_pass->pair_operand_mode_ok_p (mem_mode))
     return;
 
   rtx reg_op = XEXP (PATTERN (insn->rtl ()), !load_p);
 
-  // Ignore the access if the register operand isn't suitable for ldp/stp.
-  if (load_p
-      ? !aarch64_ldp_reg_operand (reg_op, mem_mode)
-      : !aarch64_stp_reg_operand (reg_op, mem_mode))
+  if (!m_pass->pair_reg_operand_ok_p (load_p, reg_op, mem_mode))
     return;
-
   // We want to segregate FP/SIMD accesses from GPR accesses.
   //
   // Before RA, we use the modes, noting that stores of constant zero
   // operands use GPRs (even in non-integer modes).  After RA, we use
   // the hard register numbers.
-  const bool fpsimd_op_p
-    = reload_completed
-    ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
-    : (GET_MODE_CLASS (mem_mode) != MODE_INT
-       && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
-
-  // Note ldp_operand_mode_ok_p already rejected VL modes.
+  const bool fpsimd_op_p = m_pass->fpsimd_op_p (reg_op, mem_mode, load_p);
+  // Note pair_operand_mode_ok_p already rejected VL modes.
   const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
   const lfs_fields lfs = { load_p, fpsimd_op_p, mem_size };
 
@@ -506,8 +735,8 @@  ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
   // elimination offset pre-RA, we should postpone forming pairs on such
   // accesses until after RA.
   //
-  // As it stands, addresses with offsets in range for LDR but not
-  // in range for LDP/STP are currently reloaded inefficiently,
+  // As it stands, addresses in range for an individual load/store but not
+  // for a paired access are currently reloaded inefficiently,
   // ending up with a separate base register for each pair.
   //
   // In theory LRA should make use of
@@ -519,8 +748,8 @@  ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
   // that calls targetm.legitimize_address_displacement.
   //
   // So for now, it's better to punt when we can't be sure that the
-  // offset is in range for LDP/STP.  Out-of-range cases can then be
-  // handled after RA by the out-of-range LDP/STP peepholes.  Eventually, it
+  // offset is in range for paired access.  Out-of-range cases can then be
+  // handled after RA by the out-of-range PAIR MEM  peepholes.  Eventually, it
   // would be nice to handle known out-of-range opportunities in the
   // pass itself (for stack accesses, this would be in the post-RA pass).
   if (!reload_completed
@@ -573,8 +802,8 @@  ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
 	gcc_unreachable (); // Base defs should be unique.
     }
 
-  // Punt on misaligned offsets.  LDP/STP require offsets to be a multiple of
-  // the access size.
+  // Punt on misaligned offsets.  Paired memory accesses require offsets
+  // to be a multiple of the access size.
   if (!multiple_p (mem_off, mem_size))
     return;
 
@@ -1207,8 +1436,8 @@  extract_writebacks (bool load_p, rtx pats[2], int changed)
 // base register.  If there is one, we choose the first such update after
 // PAIR_DST that is still in the same BB as our pair.  We return the new def in
 // *ADD_DEF and the resulting writeback effect in *WRITEBACK_EFFECT.
-static insn_info *
-find_trailing_add (insn_info *insns[2],
+insn_info *
+pair_fusion::find_trailing_add (insn_info *insns[2],
 		   const insn_range_info &pair_range,
 		   int initial_writeback,
 		   rtx *writeback_effect,
@@ -1286,7 +1515,7 @@  find_trailing_add (insn_info *insns[2],
 
   off_hwi /= access_size;
 
-  if (off_hwi < LDP_MIN_IMM || off_hwi > LDP_MAX_IMM)
+  if (pair_mem_out_of_range_p (off_hwi))
     return nullptr;
 
   auto dump_prefix = [&]()
@@ -1800,7 +2029,7 @@  ldp_bb_info::fuse_pair (bool load_p,
     {
       if (dump_file)
 	fprintf (dump_file,
-		 "  ldp: i%d has wb but subsequent i%d has non-wb "
+		 "  load pair: i%d has wb but subsequent i%d has non-wb "
 		 "update of base (r%d), dropping wb\n",
 		 insns[0]->uid (), insns[1]->uid (), base_regno);
       gcc_assert (writeback_effect);
@@ -1823,7 +2052,7 @@  ldp_bb_info::fuse_pair (bool load_p,
     }
 
   // If either of the original insns had writeback, but the resulting pair insn
-  // does not (can happen e.g. in the ldp edge case above, or if the writeback
+  // does not (can happen e.g. in the load pair edge case above, or if the writeback
   // effects cancel out), then drop the def(s) of the base register as
   // appropriate.
   //
@@ -1842,7 +2071,7 @@  ldp_bb_info::fuse_pair (bool load_p,
   // update of the base register and try and fold it in to make this into a
   // writeback pair.
   insn_info *trailing_add = nullptr;
-  if (aarch64_ldp_writeback > 1
+  if (m_pass->handle_writeback_opportunities (writeback::WRITEBACK_PAIR_P)
       && !writeback_effect
       && (!load_p || (!refers_to_regno_p (base_regno, base_regno + 1,
 					 XEXP (pats[0], 0), nullptr)
@@ -1850,7 +2079,7 @@  ldp_bb_info::fuse_pair (bool load_p,
 					     XEXP (pats[1], 0), nullptr))))
     {
       def_info *add_def;
-      trailing_add = find_trailing_add (insns, move_range, writeback,
+      trailing_add = m_pass->find_trailing_add (insns, move_range, writeback,
 					&writeback_effect,
 					&add_def, base.def, offsets[0],
 					access_size);
@@ -1863,14 +2092,14 @@  ldp_bb_info::fuse_pair (bool load_p,
     }
 
   // Now that we know what base mem we're going to use, check if it's OK
-  // with the ldp/stp policy.
+  // with the pair mem policy.
   rtx first_mem = XEXP (pats[0], load_p);
-  if (!aarch64_mem_ok_with_ldpstp_policy_model (first_mem,
-						load_p,
-						GET_MODE (first_mem)))
+  if (!m_pass->pair_mem_ok_with_policy (first_mem,
+			  load_p,
+			  GET_MODE (first_mem)))
     {
       if (dump_file)
-	fprintf (dump_file, "punting on pair (%d,%d), ldp/stp policy says no\n",
+	fprintf (dump_file, "punting on pair (%d,%d), pair mem  policy says no\n",
 		 i1->uid (), i2->uid ());
       return false;
     }
@@ -1878,21 +2107,10 @@  ldp_bb_info::fuse_pair (bool load_p,
   rtx reg_notes = combine_reg_notes (first, second, load_p);
 
   rtx pair_pat;
-  if (writeback_effect)
-    {
-      auto patvec = gen_rtvec (3, writeback_effect, pats[0], pats[1]);
-      pair_pat = gen_rtx_PARALLEL (VOIDmode, patvec);
-    }
-  else if (load_p)
-    pair_pat = aarch64_gen_load_pair (XEXP (pats[0], 0),
-				      XEXP (pats[1], 0),
-				      XEXP (pats[0], 1));
-  else
-    pair_pat = aarch64_gen_store_pair (XEXP (pats[0], 0),
-				       XEXP (pats[0], 1),
-				       XEXP (pats[1], 1));
 
+  pair_pat = m_pass->gen_mem_pair (pats, writeback_effect, load_p);
   insn_change *pair_change = nullptr;
+
   auto set_pair_pat = [pair_pat,reg_notes](insn_change *change) {
     rtx_insn *rti = change->insn ()->rtl ();
     validate_unshare_change (rti, &PATTERN (rti), pair_pat, true);
@@ -2133,15 +2351,6 @@  load_modified_by_store_p (insn_info *load,
   return false;
 }
 
-// Virtual base class for load/store walkers used in alias analysis.
-struct alias_walker
-{
-  virtual bool conflict_p (int &budget) const = 0;
-  virtual insn_info *insn () const = 0;
-  virtual bool valid () const = 0;
-  virtual void advance () = 0;
-};
-
 // Implement some common functionality used by both store_walker
 // and load_walker.
 template<bool reverse>
@@ -2259,13 +2468,13 @@  public:
 //
 // We try to maintain the invariant that if a walker becomes invalid, we
 // set its pointer to null.
-static void
-do_alias_analysis (insn_info *alias_hazards[4],
+void
+pair_fusion::do_alias_analysis (insn_info *alias_hazards[4],
 		   alias_walker *walkers[4],
 		   bool load_p)
 {
   const int n_walkers = 2 + (2 * !load_p);
-  int budget = aarch64_ldp_alias_check_limit;
+  int budget = pair_mem_alias_check_limit ();
 
   auto next_walker = [walkers,n_walkers](int current) -> int {
     for (int j = 1; j <= n_walkers; j++)
@@ -2350,8 +2559,8 @@  do_alias_analysis (insn_info *alias_hazards[4],
 //
 // Returns an integer where bit (1 << i) is set if INSNS[i] uses writeback
 // addressing.
-static int
-get_viable_bases (insn_info *insns[2],
+int
+pair_fusion::get_viable_bases (insn_info *insns[2],
 		  vec<base_cand> &base_cands,
 		  rtx cand_mems[2],
 		  unsigned access_size,
@@ -2397,7 +2606,7 @@  get_viable_bases (insn_info *insns[2],
       if (!is_lower)
 	base_off--;
 
-      if (base_off < LDP_MIN_IMM || base_off > LDP_MAX_IMM)
+      if (pair_mem_out_of_range_p (base_off))
 	continue;
 
       use_info *use = find_access (insns[i]->uses (), REGNO (base));
@@ -2454,7 +2663,7 @@  get_viable_bases (insn_info *insns[2],
 }
 
 // Given two adjacent memory accesses of the same size, I1 and I2, try
-// and see if we can merge them into a ldp or stp.
+// and see if we can merge them into a paired access load and store.
 //
 // ACCESS_SIZE gives the (common) size of a single access, LOAD_P is true
 // if the accesses are both loads, otherwise they are both stores.
@@ -2494,7 +2703,7 @@  ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
     {
       if (dump_file)
 	fprintf (dump_file,
-		 "punting on ldp due to reg conflcits (%d,%d)\n",
+		 "punting on pair mem load  due to reg conflcits (%d,%d)\n",
 		 insns[0]->uid (), insns[1]->uid ());
       return false;
     }
@@ -2512,7 +2721,7 @@  ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
 
   auto_vec<base_cand, 2> base_cands (2);
 
-  int writeback = get_viable_bases (insns, base_cands, cand_mems,
+  int writeback = m_pass->get_viable_bases (insns, base_cands, cand_mems,
 				    access_size, reversed);
   if (base_cands.is_empty ())
     {
@@ -2641,7 +2850,7 @@  ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
     walkers[1] = &backward_store_walker;
 
   if (load_p && (mem_defs[0] || mem_defs[1]))
-    do_alias_analysis (alias_hazards, walkers, load_p);
+    m_pass->do_alias_analysis (alias_hazards, walkers, load_p);
   else
     {
       // We want to find any loads hanging off the first store.
@@ -2650,7 +2859,7 @@  ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
       load_walker<true> backward_load_walker (mem_defs[1], insns[1], insns[0]);
       walkers[2] = &forward_load_walker;
       walkers[3] = &backward_load_walker;
-      do_alias_analysis (alias_hazards, walkers, load_p);
+      m_pass->do_alias_analysis (alias_hazards, walkers, load_p);
       // Now consolidate hazards back down.
       if (alias_hazards[2]
 	  && (!alias_hazards[0] || (*alias_hazards[2] < *alias_hazards[0])))
@@ -2891,7 +3100,7 @@  ldp_bb_info::merge_pairs (insn_list_t &left_list,
 // merge_pairs.
 void
 ldp_bb_info::transform_for_base (int encoded_lfs,
-				 access_group &group)
+					 access_group &group)
 {
   const auto lfs = decode_lfs (encoded_lfs);
   const unsigned access_size = lfs.size;
@@ -2964,29 +3173,9 @@  ldp_bb_info::transform ()
   traverse_base_map (def_map);
 }
 
-static void
-ldp_fusion_init ()
-{
-  calculate_dominance_info (CDI_DOMINATORS);
-  df_analyze ();
-  crtl->ssa = new rtl_ssa::function_info (cfun);
-}
-
-static void
-ldp_fusion_destroy ()
-{
-  if (crtl->ssa->perform_pending_updates ())
-    cleanup_cfg (0);
-
-  free_dominance_info (CDI_DOMINATORS);
-
-  delete crtl->ssa;
-  crtl->ssa = nullptr;
-}
-
 // Given a load pair insn in PATTERN, unpack the insn, storing
 // the registers in REGS and returning the mem.
-static rtx
+rtx
 aarch64_destructure_load_pair (rtx regs[2], rtx pattern)
 {
   rtx mem = NULL_RTX;
@@ -3012,7 +3201,7 @@  aarch64_destructure_load_pair (rtx regs[2], rtx pattern)
 
 // Given a store pair insn in PATTERN, unpack the insn, storing
 // the register operands in REGS, and returning the mem.
-static rtx
+rtx
 aarch64_destructure_store_pair (rtx regs[2], rtx pattern)
 {
   rtx mem = XEXP (pattern, 0);
@@ -3030,7 +3219,7 @@  aarch64_destructure_store_pair (rtx regs[2], rtx pattern)
 //
 // This is used when promoting existing non-writeback pairs to writeback
 // variants.
-static rtx
+rtx
 aarch64_gen_writeback_pair (rtx wb_effect, rtx pair_mem, rtx regs[2],
 			    bool load_p)
 {
@@ -3068,22 +3257,13 @@  aarch64_gen_writeback_pair (rtx wb_effect, rtx pair_mem, rtx regs[2],
 // the base register which we can fold in to make this pair use
 // a writeback addressing mode.
 static void
-try_promote_writeback (insn_info *insn)
+try_promote_writeback (insn_info *insn, bool load_p, pair_fusion *pass)
 {
-  auto rti = insn->rtl ();
-  const auto attr = get_attr_ldpstp (rti);
-  if (attr == LDPSTP_NONE)
-    return;
-
-  bool load_p = (attr == LDPSTP_LDP);
-  gcc_checking_assert (load_p || attr == LDPSTP_STP);
-
   rtx regs[2];
   rtx mem = NULL_RTX;
-  if (load_p)
-    mem = aarch64_destructure_load_pair (regs, PATTERN (rti));
-  else
-    mem = aarch64_destructure_store_pair (regs, PATTERN (rti));
+
+  mem = pass->gen_destructure_pair (regs, PATTERN (insn->rtl ()), load_p);
+
   gcc_checking_assert (MEM_P (mem));
 
   poly_int64 offset;
@@ -3120,9 +3300,10 @@  try_promote_writeback (insn_info *insn)
   def_info *add_def;
   const insn_range_info pair_range (insn);
   insn_info *insns[2] = { nullptr, insn };
-  insn_info *trailing_add = find_trailing_add (insns, pair_range, 0, &wb_effect,
-					       &add_def, base_def, offset,
-					       access_size);
+  insn_info *trailing_add
+    = pass->find_trailing_add (insns, pair_range, 0, &wb_effect,
+			       &add_def, base_def, offset,
+			       access_size);
   if (!trailing_add)
     return;
 
@@ -3132,8 +3313,9 @@  try_promote_writeback (insn_info *insn)
   insn_change del_change (trailing_add, insn_change::DELETE);
   insn_change *changes[] = { &pair_change, &del_change };
 
-  rtx pair_pat = aarch64_gen_writeback_pair (wb_effect, mem, regs, load_p);
-  validate_unshare_change (rti, &PATTERN (rti), pair_pat, true);
+  rtx pair_pat = pass->gen_writeback_pair (wb_effect, mem, regs, load_p);
+
+  validate_unshare_change (insn->rtl(), &PATTERN (insn->rtl()), pair_pat, true);
 
   // The pair must gain the def of the base register from the add.
   pair_change.new_defs = insert_access (attempt,
@@ -3167,14 +3349,12 @@  try_promote_writeback (insn_info *insn)
 // for load/store candidates.  If running after RA, also try and promote
 // non-writeback pairs to use writeback addressing.  Then try to fuse
 // candidates into pairs.
-void ldp_fusion_bb (bb_info *bb)
+void pair_fusion::ldp_fusion_bb (bb_info *bb)
 {
-  const bool track_loads
-    = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
-  const bool track_stores
-    = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
+  const bool track_loads = track_loads_p ();
+  const bool track_stores = track_stores_p ();
 
-  ldp_bb_info bb_state (bb);
+  ldp_bb_info bb_state (bb, this);
 
   for (auto insn : bb->nondebug_insns ())
     {
@@ -3184,11 +3364,9 @@  void ldp_fusion_bb (bb_info *bb)
 	continue;
 
       rtx pat = PATTERN (rti);
-      if (reload_completed
-	  && aarch64_ldp_writeback > 1
-	  && GET_CODE (pat) == PARALLEL
-	  && XVECLEN (pat, 0) == 2)
-	try_promote_writeback (insn);
+      bool load_p;
+      if (pair_mem_promote_writeback_p (insn, pat, load_p))
+	try_promote_writeback (insn, load_p, this);
 
       if (GET_CODE (pat) != SET)
 	continue;
@@ -3205,12 +3383,8 @@  void ldp_fusion_bb (bb_info *bb)
 
 void ldp_fusion ()
 {
-  ldp_fusion_init ();
-
-  for (auto bb : crtl->ssa->bbs ())
-    ldp_fusion_bb (bb);
-
-  ldp_fusion_destroy ();
+  aarch64_pair_fusion pass;
+  pass.run ();
 }
 
 namespace {
@@ -3242,14 +3416,6 @@  public:
       if (!optimize || optimize_debug)
 	return false;
 
-      // If the tuning policy says never to form ldps or stps, don't run
-      // the pass.
-      if ((aarch64_tune_params.ldp_policy_model
-	   == AARCH64_LDP_STP_POLICY_NEVER)
-	  && (aarch64_tune_params.stp_policy_model
-	      == AARCH64_LDP_STP_POLICY_NEVER))
-	return false;
-
       if (reload_completed)
 	return flag_aarch64_late_ldp_fusion;
       else