
[v3,1/2,APX,CFCMOV] Support APX CFCMOV in if_convert pass

Message ID DM4PR11MB54871BE5F08117C8ABFA7518EC9E2@DM4PR11MB5487.namprd11.prod.outlook.com
State New
Series [v3,1/2,APX,CFCMOV] Support APX CFCMOV in if_convert pass

Commit Message

Kong, Lingling Sept. 6, 2024, 6:27 a.m. UTC
Hi,

This version adds a new optab named 'cfmovcc'.  The new optab is used
in the middle end to expand to cfcmov.  The patch is also simplified by
trying to generate the conditional faulting movcc in the
noce_try_cmove_arith function.

All the changes passed bootstrap & regtest on x86-64-pc-linux-gnu.
We also tested SPEC with SDE and passed the runtime test.

Ok for trunk?


The APX CFCMOV[1] feature implements conditional faulting, which means
that if the comparison is false, all memory faults are suppressed when
loading or storing a memory operand.  With it, a conditional move can
now load or store a memory operand that may trap or fault.

The middle end currently does not support a conditional move if a load
from A or B could trap or fault.  To enable CFCMOV, we added a new
optab named cfmovcc.

A fault-suppressing conditional move for a conditional mem store does
not hoist any arithmetic calculations.  For a conditional mem load, we
currently only support the case where one operand is a trapping MEM
and the other operand neither traps nor is a MEM.


[1] https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html

gcc/ChangeLog:

               * doc/md.texi: Add cfmovcc insn pattern explanation.
               * ifcvt.cc (can_use_cmove_load_mem_notrap): New function
               for conditional faulting movcc for load.
               (can_use_cmove_store_mem_notrap): New function for
               conditional faulting movcc for store.
               (can_use_cfmovcc): New function for conditional faulting.
               (noce_try_cmove_arith): Try to convert to conditional
               faulting movcc.
               (noce_process_if_block): Ditto.
               * optabs.cc (emit_conditional_move): Handle cfmovcc.
               (emit_conditional_move_1): Ditto.
               * optabs.def (OPTAB_D): New optab.
---
gcc/doc/md.texi |  10 ++++
gcc/ifcvt.cc    | 119 ++++++++++++++++++++++++++++++++++++++++++++----
gcc/optabs.cc   |  14 +++++-
gcc/optabs.def  |   1 +
4 files changed, 132 insertions(+), 12 deletions(-)

--
2.31.1

Comments

Richard Sandiford Sept. 6, 2024, 9:18 a.m. UTC | #1
"Kong, Lingling" <lingling.kong@intel.com> writes:
> [...]

Sorry if this is going over old ground (I haven't read the earlier
versions yet), but: instead of adding a new optab, could we treat
CFCMOV as a scalar instance of maskload_optab?  Robin is working on
adding an "else" value for when the condition/mask is false.  After
that, it would seem to be a pretty close match to CFCMOV.

One reason for preferring maskload is that it makes the load an
explicit part of the interface.  We could then potentially use
it in gimple too, not just expand.
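
(For concreteness, a rough sketch of the gimple this could map to once
the "else" operand exists; the internal-fn signature is still Robin's
work in progress, so the operand order below is illustrative only:

  x_1 = .MASK_LOAD (a_2(D), align, cond_3, b_4(D));

i.e. a scalar masked load whose else value supplies the result when
the condition is false.)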

Thanks,
Richard

Kong, Lingling Sept. 11, 2024, 8:18 a.m. UTC | #2
> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Friday, September 6, 2024 5:19 PM
> To: Kong, Lingling <lingling.kong@intel.com>
> Cc: gcc-patches@gcc.gnu.org; Jeff Law <jeffreyalaw@gmail.com>; Richard Biener
> <richard.guenther@gmail.com>; Uros Bizjak <ubizjak@gmail.com>; Hongtao Liu
> <crazylht@gmail.com>; Jakub Jelinek <jakub@redhat.com>
> Subject: Re: [PATCH v3 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert
> pass
> 
> "Kong, Lingling" <lingling.kong@intel.com> writes:
> > [...]
> 
> Sorry if this is going over old ground (I haven't read the earlier versions yet), but:
> instead of adding a new optab, could we treat CFCMOV as a scalar instance of
> maskload_optab?  Robin is working on adding an "else" value for when the
> condition/mask is false.  After that, it would seem to be a pretty close match to
> CFCMOV.
> 
> One reason for preferring maskload is that it makes the load an explicit part of
> the interface.  We could then potentially use it in gimple too, not just expand.
> 

Yes, a conditional load is like a scalar instance of maskload_optab with an else operand.
I could try to use maskload_optab to generate cfcmov in the RTL ifcvt pass, but that is still after expand.
We don't currently have an if-conversion pass for scalars in gimple; do we have plans to add one?

Thanks,
Lingling

Kong, Lingling Sept. 20, 2024, 5:59 a.m. UTC | #3
> > "Kong, Lingling" <lingling.kong@intel.com> writes:
> > > [...]
> >
> > Sorry if this is going over old ground (I haven't read the earlier versions yet), but:
> > instead of adding a new optab, could we treat CFCMOV as a scalar
> > instance of maskload_optab?  Robin is working on adding an "else"
> > value for when the condition/mask is false.  After that, it would seem
> > to be a pretty close match to CFCMOV.
> >
> > One reason for preferring maskload is that it makes the load an
> > explicit part of the interface.  We could then potentially use it in gimple too, not
> > just expand.
> >
> 
> Yes, a conditional load is like a scalar instance of maskload_optab with an
> else operand.
> I could try to use maskload_optab to generate cfcmov in the RTL ifcvt pass,
> but that is still after expand.
> We don't currently have an if-conversion pass for scalars in gimple; do we
> have plans to add one?
> 

Hi,

I have tried to use maskload/maskstore to generate CFCMOV in the ifcvt pass.
Unlike mov<mode>cc, maskload/maskstore are not allowed to FAIL, yet I need
restrictions for CFCMOV at backend expand time.  Since expanding
maskload/maskstore cannot fail, I can only apply those restrictions in ifcvt
and in emit_conditional_move (in optabs.cc), e.g. as sketched below.
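
For example, roughly (a sketch only, written against the cfmovcc form
from this patch; a maskload-based version would query maskload_optab /
maskstore_optab instead):

  /* In emit_conditional_move: the legality check has to happen before
     expansion, because the maskload/maskstore expanders are not
     allowed to FAIL.  */
  if ((may_trap_or_fault_p (op2) || may_trap_or_fault_p (op3))
      && optab_handler (cfmovcc_optab, mode) == CODE_FOR_nothing)
    return NULL_RTX;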

I'm not sure if this is the right approach, do you have any suggestions?

Thanks,
Lingling


Patch

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index a9259112251..5f563787c49 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8591,6 +8591,16 @@  Return 1 if operand 1 is a normal floating point number and 0
otherwise.  @var{m} is a scalar floating point mode.  Operand 0
has mode @code{SImode}, and operand 1 has mode @var{m}.
+@cindex @code{cfmov@var{mode}cc} instruction pattern
+@item @samp{cfmov@var{mode}cc}
+Similar to @samp{mov@var{mode}cc} but with conditional faulting:
+if the comparison is false, all memory faults are suppressed when
+loading or storing a memory operand.
+
+Conditionally move operand 2 or operand 3 into operand 0 according
+to the comparison in operand 1.  If the comparison is true, operand 2
+is moved into operand 0, otherwise operand 3 is moved.
+
@end table
 @end ifset
diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 6487574c514..59845390607 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -778,6 +778,9 @@  static bool noce_try_store_flag_mask (struct noce_if_info *);
static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
                                                rtx, rtx, rtx, rtx = NULL, rtx = NULL);
static bool noce_try_cmove (struct noce_if_info *);
+static bool can_use_cmove_load_mem_notrap (rtx, rtx);
+static bool can_use_cmove_store_mem_notrap (rtx, rtx, rtx, bool);
+static bool can_use_cfmovcc (struct noce_if_info *);
static bool noce_try_cmove_arith (struct noce_if_info *);
static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
static bool noce_try_minmax (struct noce_if_info *);
@@ -2132,6 +2135,69 @@  noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
   return true;
}
+/* Return TRUE if we could convert "if (test) x = *a; else x = b;"
+   or "if (test) x = a; else x = *b;" to conditional faulting movcc,
+   i.e. x86 cfcmov, especially when loading a or b may cause memory faults.  */
+
+static bool
+can_use_cmove_load_mem_notrap (rtx a, rtx b)
+{
+  /* Only handle a conditional move where one operand is a trapping MEM
+     and the other is non-trapping and not a MEM.  */
+  if (!(MEM_P (a) ^ MEM_P (b)))
+    return false;
+  bool a_trap = may_trap_or_fault_p (a);
+  bool b_trap = may_trap_or_fault_p (b);
+
+  if (!(a_trap ^ b_trap))
+    return false;
+  if (a_trap && !MEM_P (a))
+    return false;
+  if (b_trap && !MEM_P (b))
+    return false;
+
+  return true;
+}
+
+/* Return TRUE if we could convert "if (test) *x = a; else skip" to
+   conditional faulting movcc, i.e. x86 cfcmov, especially when storing
+   to x may cause memory faults and x == b in else_bb.  */
+
+static bool
+can_use_cmove_store_mem_notrap (rtx x, rtx a, rtx b, bool a_simple)
+{
+  gcc_assert (MEM_P (x));
+
+  machine_mode x_mode = GET_MODE (x);
+
+  if (!rtx_equal_p (x, b) || !may_trap_or_fault_p (x))
+    return false;
+  if (!a_simple || !register_operand (a, x_mode))
+    return false;
+
+  return true;
+}
+
+/* Return TRUE if the backend supports cfmovcc_optab, which suppresses
+   memory faults when loading or storing a memory operand while the
+   condition code evaluates to false.  */
+
+static bool
+can_use_cfmovcc (struct noce_if_info *if_info)
+{
+  rtx a = if_info->a;
+  rtx b = if_info->b;
+  rtx x = if_info->x;
+
+  if (optab_handler (cfmovcc_optab, GET_MODE (x)) == CODE_FOR_nothing)
+    return false;
+
+  if (MEM_P (x))
+    return can_use_cmove_store_mem_notrap (x, a, b, if_info->then_simple);
+  else
+    return can_use_cmove_load_mem_notrap (a, b);
+}
+
/* Try more complex cases involving conditional_move.  */
 static bool
@@ -2171,7 +2237,15 @@  noce_try_cmove_arith (struct noce_if_info *if_info)
   /* ??? We could handle this if we knew that a load from A or B could
      not trap or fault.  This is also true if we've already loaded
      from the address along the path from ENTRY.  */
-  else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
+  else if (may_trap_or_fault_p (a) && may_trap_or_fault_p (b))
+    return false;
+  /* cfmovcc_optab implements conditional faulting, which means that
+     if the condition code evaluates to false, all memory faults are
+     suppressed when loading or storing a memory operand.  So a
+     conditional move can now load or store a memory operand that may
+     trap or fault.  */
+  else if ((may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
+           && !can_use_cfmovcc (if_info))
     return false;
   /* if (test) x = a + b; else x = c - d;
@@ -2247,9 +2321,14 @@  noce_try_cmove_arith (struct noce_if_info *if_info)
   /* If either operand is complex, load it into a register first.
      The best way to do this is to copy the original insn.  In this
      way we preserve any clobbers etc that the insn may have had.
-     This is of course not possible in the IS_MEM case.  */
+     This is of course not possible in the IS_MEM case.
+     If a load or store operand may trap or fault, do not hoist the
+     load or store; otherwise the memory fault can no longer be
+     suppressed and the insn becomes a normal arithmetic insn instead
+     of a conditional faulting movcc.  */
-  if (! general_operand (a, GET_MODE (a)) || tmp_a)
+  if (! may_trap_or_fault_p (a)
+      && (! general_operand (a, GET_MODE (a)) || tmp_a))
     {
       if (is_mem)
@@ -2278,7 +2357,8 @@  noce_try_cmove_arith (struct noce_if_info *if_info)
              }
     }
-  if (! general_operand (b, GET_MODE (b)) || tmp_b)
+  if (! may_trap_or_fault_p (b)
+      && (! general_operand (b, GET_MODE (b)) || tmp_b))
     {
       if (is_mem)
              {
@@ -4210,12 +4290,31 @@  noce_process_if_block (struct noce_if_info *if_info)
     }
   if (!set_b && MEM_P (orig_x))
-    /* We want to avoid store speculation to avoid cases like
-              if (pthread_mutex_trylock(mutex))
-                 ++global_variable;
-       Rather than go to much effort here, we rely on the SSA optimizers,
-       which do a good enough job these days.  */
-    return false;
+    {
+      /* When the target supports conditional faulting movcc, i.e. x86
+         cfcmov, we can convert the conditional mem store "if (...) *x = a;
+         else skip" to cfmovcc_optab, where x may trap or fault.  */
+      if ((optab_handler (cfmovcc_optab, GET_MODE (orig_x))
+           != CODE_FOR_nothing)
+          && HAVE_conditional_move
+          && may_trap_or_fault_p (orig_x)
+          && register_operand (a, GET_MODE (orig_x)))
+        {
+          x = orig_x;
+          if_info->x = x;
+          if (noce_try_cmove_arith (if_info))
+            goto success;
+          else
+            return false;
+        }
+      /* We want to avoid store speculation to avoid cases like
+           if (pthread_mutex_trylock(mutex))
+             ++global_variable;
+         Rather than go to much effort here, we rely on the SSA optimizers,
+         which do a good enough job these days.  */
+      else
+        return false;
+    }
   if (noce_try_move (if_info))
     goto success;
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 2bcb3f7b47a..181edf64f03 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -5034,6 +5034,7 @@  emit_conditional_move (rtx target, struct rtx_comparison comp,
   rtx_insn *last;
   enum insn_code icode;
   enum rtx_code reversed;
+  optab op = movcc_optab;
   /* If the two source operands are identical, that's just a move.  */
@@ -5082,7 +5083,11 @@  emit_conditional_move (rtx target, struct rtx_comparison comp,
   if (mode == VOIDmode)
     mode = GET_MODE (op2);
-  icode = direct_optab_handler (movcc_optab, mode);
+  if ((may_trap_or_fault_p (op2) || may_trap_or_fault_p (op3))
+      && optab_handler (cfmovcc_optab, mode) != CODE_FOR_nothing)
+    op = cfmovcc_optab;
+
+  icode = direct_optab_handler (op, mode);
   if (icode == CODE_FOR_nothing)
     return NULL_RTX;
@@ -5194,6 +5199,7 @@  emit_conditional_move_1 (rtx target, rtx comparison,
                                             rtx op2, rtx op3, machine_mode mode)
{
   enum insn_code icode;
+  optab op = movcc_optab;
   if (comparison == NULL_RTX || !COMPARISON_P (comparison))
     return NULL_RTX;
@@ -5214,7 +5220,11 @@  emit_conditional_move_1 (rtx target, rtx comparison,
   if (mode == VOIDmode)
     mode = GET_MODE (op2);
-  icode = direct_optab_handler (movcc_optab, mode);
+  if ((may_trap_or_fault_p (op2) || may_trap_or_fault_p (op3))
+      && optab_handler (cfmovcc_optab, mode) != CODE_FOR_nothing)
+    op = cfmovcc_optab;
+
+  icode = direct_optab_handler (op, mode);
   if (icode == CODE_FOR_nothing)
     return NULL_RTX;
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 58a939442bd..893e5c1c6c2 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -551,3 +551,4 @@  OPTAB_D (len_store_optab, "len_store_$a")
OPTAB_D (select_vl_optab, "select_vl$a")
OPTAB_D (andn_optab, "andn$a3")
OPTAB_D (iorn_optab, "iorn$a3")
+OPTAB_D (cfmovcc_optab, "cfmov$acc")