diff mbox

Take known zero bits into account when checking extraction.

Message ID 20160427082004.GE5082@linux.vnet.ibm.com
State New
Headers show

Commit Message

Dominik Vogt April 27, 2016, 8:20 a.m. UTC
The attached patch is a result of discussing an S/390 issue with
"and with complement" in some cases.

  https://gcc.gnu.org/ml/gcc/2016-03/msg00163.html
  https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01586.html

Combine would merge a ZERO_EXTEND and a SET taking the known zero
bits into account, resulting in an AND.  Later on,
make_compound_operation() fails to replace that with a ZERO_EXTEND
which we get for free on S/390 but leaves the AND, eventually
resulting in two consecutive AND instructions.

The current code in make_compound_operation() that detects
opportunities for ZERO_EXTEND does not work here because it does
not take the known zero bits into account:

      /* If the constant is one less than a power of two, this might be
	 representable by an extraction even if no shift is present.
	 If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
	 we are in a COMPARE.  */
      else if ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0)
	new_rtx = make_extraction (mode,
			       make_compound_operation (XEXP (x, 0),
							next_code),
			       0, NULL_RTX, i, 1, 0, in_code == COMPARE);

An attempt to use the zero bits in the above conditions resulted
in many situations that generated worse code, so the patch tries
to fix this in a more conservative way.  While the effect is
completely positive on S/390, this will very likely have
unforeseeable consequences on other targets.

Bootstrapped and regression tested on s390 and s390x only at the
moment.

Ciao

Dominik ^_^  ^_^

Comments

Jeff Law April 28, 2016, 4:24 a.m. UTC | #1
On 04/27/2016 02:20 AM, Dominik Vogt wrote:
> The attached patch is a result of discussing an S/390 issue with
> "and with complement" in some cases.
>
>   https://gcc.gnu.org/ml/gcc/2016-03/msg00163.html
>   https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01586.html
>
> Combine would merge a ZERO_EXTEND and a SET taking the known zero
> bits into account, resulting in an AND.  Later on,
> make_compound_operation() fails to replace that with a ZERO_EXTEND
> which we get for free on S/390 but leaves the AND, eventually
> resulting in two consecutive AND instructions.
>
> The current code in make_compound_operation() that detects
> opportunities for ZERO_EXTEND does not work here because it does
> not take the known zero bits into account:
>
>       /* If the constant is one less than a power of two, this might be
> 	 representable by an extraction even if no shift is present.
> 	 If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
> 	 we are in a COMPARE.  */
>       else if ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0)
> 	new_rtx = make_extraction (mode,
> 			       make_compound_operation (XEXP (x, 0),
> 							next_code),
> 			       0, NULL_RTX, i, 1, 0, in_code == COMPARE);
>
> An attempt to use the zero bits in the above conditions resulted
> in many situations that generated worse code, so the patch tries
> to fix this in a more conservative way.  While the effect is
> completely positive on S/390, this will very likely have
> unforeseeable consequences on other targets.
>
> Bootstrapped and regression tested on s390 and s390x only at the
> moment.
>
> Ciao
>
> Dominik ^_^  ^_^
>
> -- Dominik Vogt IBM Germany
>
>
> 0001-ChangeLog
>
>
> gcc/ChangeLog
>
> 	* combine.c (make_compound_operation): Take known zero bits into
> 	account when checking for possible zero_extend.
I'd strongly recommend writing some tests for this.  Extra credit if 
they can be run on an x86 target which gets more testing than s390.

If I go back to our original discussion, we have this going into combine:

(insn 6 3 7 2 (parallel [
             (set (reg:SI 64)
                 (and:SI (mem:SI (reg/v/f:DI 63 [ a ]) [1 *a_2(D)+0 S4 A32])
                     (const_int -65521 [0xffffffffffff000f])))
             (clobber (reg:CC 33 %cc))
         ]) andc-immediate.c:21 1481 {*andsi3_zarch}
      (expr_list:REG_DEAD (reg/v/f:DI 63 [ a ])
         (expr_list:REG_UNUSED (reg:CC 33 %cc)
             (nil))))
(insn 7 6 12 2 (set (reg:DI 65)
         (zero_extend:DI (reg:SI 64))) andc-immediate.c:21 1207 
{*zero_extendsidi2}
      (expr_list:REG_DEAD (reg:SI 64)
         (nil)))
(insn 12 7 13 2 (set (reg/i:DI 2 %r2)
         (reg:DI 65)) andc-immediate.c:22 1073 {*movdi_64}
      (expr_list:REG_DEAD (reg:DI 65)
         (nil)))

Which combine turns into:

(insn 6 3 7 2 (parallel [
             (set (reg:SI 64)
                 (and:SI (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32])
                     (const_int -65521 [0xffffffffffff000f])))
             (clobber (reg:CC 33 %cc))
         ]) andc-immediate.c:21 1481 {*andsi3_zarch}
      (expr_list:REG_DEAD (reg:DI 2 %r2 [ a ])
         (expr_list:REG_UNUSED (reg:CC 33 %cc)
             (nil))))
(insn 12 7 13 2 (parallel [
             (set (reg/i:DI 2 %r2)
                 (and:DI (subreg:DI (reg:SI 64) 0)
                  ^^^
                     (const_int 4294901775 [0xffff000f])))
                                            ^^^^^^^^^^
             (clobber (reg:CC 33 %cc))
         ]) andc-immediate.c:22 1474 {*anddi3}
      (expr_list:REG_UNUSED (reg:CC 33 %cc)
         (expr_list:REG_DEAD (reg:SI 64)
             (nil))))


Instead you want insn 12 to use a zero-extend to extend (reg:SI 64) into 
(reg:DI 2)?

Can't you achieve this in this clause:

      /* If the constant is one less than a power of two, this might be
          representable by an extraction even if no shift is present.
          If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
          we are in a COMPARE.  */

You extract the constant via UINTVAL (XEXP (x, 1)), then munge it based 
on nonzero_bits and pass the result to exact_log2?

Though I do like how you've conditionalized on the cost of the and vs 
the cost of hte zero-extend.  So maybe your approach is ultimately 
better.  Still curious your thoughts on doing it by just munging the 
constant you pass off to exact_log2 in that earlier clause.


> +      /* If the one operand is a paradoxical subreg of a register or memory and
> +	 the constant (limited to the smaller mode) has only zero bits where
> +	 the sub expression has known zero bits, this can be expressed as
> +	 a zero_extend.  */
> +      else if (GET_CODE (XEXP (x, 0)) == SUBREG)
> +	{
> +	  rtx sub;
> +
> +	  sub = XEXP (XEXP (x, 0), 0);
> +	  machine_mode sub_mode = GET_MODE (sub);
> +	  if ((REG_P (sub) || MEM_P (sub))
> +	      && GET_MODE_PRECISION (sub_mode) < mode_width
> +	      && (UINTVAL (XEXP (x, 1))
> +		  | (~nonzero_bits (sub, sub_mode) & GET_MODE_MASK (sub_mode))
> +		  ) == GET_MODE_MASK (sub_mode))
I'd either factor something out or write a nested conditional to avoid 
the horrid line wrapping here and hopefully make the conditions easier 
to read.

Jeff
Dominik Vogt April 29, 2016, 9:35 a.m. UTC | #2
On Wed, Apr 27, 2016 at 10:24:21PM -0600, Jeff Law wrote:
> On 04/27/2016 02:20 AM, Dominik Vogt wrote:
> >	* combine.c (make_compound_operation): Take known zero bits into
> >	account when checking for possible zero_extend.
> I'd strongly recommend writing some tests for this.  Extra credit if
> they can be run on an x86 target which gets more testing than s390.

I'll look into that later.

> If I go back to our original discussion, we have this going into combine:
> 
> (insn 6 3 7 2 (parallel [
>             (set (reg:SI 64)
>                 (and:SI (mem:SI (reg/v/f:DI 63 [ a ]) [1 *a_2(D)+0 S4 A32])
>                     (const_int -65521 [0xffffffffffff000f])))
>             (clobber (reg:CC 33 %cc))
>         ]) andc-immediate.c:21 1481 {*andsi3_zarch}
>      (expr_list:REG_DEAD (reg/v/f:DI 63 [ a ])
>         (expr_list:REG_UNUSED (reg:CC 33 %cc)
>             (nil))))
> (insn 7 6 12 2 (set (reg:DI 65)
>         (zero_extend:DI (reg:SI 64))) andc-immediate.c:21 1207
> {*zero_extendsidi2}
>      (expr_list:REG_DEAD (reg:SI 64)
>         (nil)))
> (insn 12 7 13 2 (set (reg/i:DI 2 %r2)
>         (reg:DI 65)) andc-immediate.c:22 1073 {*movdi_64}
>      (expr_list:REG_DEAD (reg:DI 65)
>         (nil)))
> 
> Which combine turns into:
> 
> (insn 6 3 7 2 (parallel [
>             (set (reg:SI 64)
>                 (and:SI (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32])
>                     (const_int -65521 [0xffffffffffff000f])))
>             (clobber (reg:CC 33 %cc))
>         ]) andc-immediate.c:21 1481 {*andsi3_zarch}
>      (expr_list:REG_DEAD (reg:DI 2 %r2 [ a ])
>         (expr_list:REG_UNUSED (reg:CC 33 %cc)
>             (nil))))
> (insn 12 7 13 2 (parallel [
>             (set (reg/i:DI 2 %r2)
>                 (and:DI (subreg:DI (reg:SI 64) 0)
>                  ^^^
>                     (const_int 4294901775 [0xffff000f])))
>                                            ^^^^^^^^^^
>             (clobber (reg:CC 33 %cc))
>         ]) andc-immediate.c:22 1474 {*anddi3}
>      (expr_list:REG_UNUSED (reg:CC 33 %cc)
>         (expr_list:REG_DEAD (reg:SI 64)
>             (nil))))
> 
> 
> Instead you want insn 12 to use a zero-extend to extend (reg:SI 64)
> into (reg:DI 2)?

Yes, because we get the zero extend for free in this case (through
the constant in the AND or because the input value is a function
argument that is already zero extended).

> Can't you achieve this in this clause:
> 
>      /* If the constant is one less than a power of two, this might be
>          representable by an extraction even if no shift is present.
>          If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
>          we are in a COMPARE.  */
> 
> You extract the constant via UINTVAL (XEXP (x, 1)), then munge it
> based on nonzero_bits and pass the result to exact_log2?

That's what we tried first, but it resulted in worse code in many
places (saved about 250 instructions in the SPEC2006 testsuite but
added about 42000 elsewhere).  It was so bad that I didn't even
bother to check what's going on.  Probably this was triggered all
over the place by small constants like 1, 3, 7 and the like where
s390 has no cheap way for zero extension.  So I limited this to
constants that are actually mode masks, implicitly assuming that
there are zero extend instructions only for known modes (which is
true for s390 but may not for some other targets).  Being
conservative here shouldn't hurt; but I wonder whether there are
targets where this condition still allows too much.

> Though I do like how you've conditionalized on the cost of the and
> vs the cost of hte zero-extend.  So maybe your approach is
> ultimately better.

Actually we wanted to remove the call to rtx_cost() (because
usually combine just assumes that a zero extend is cheaper).  I've
probably forgotten to remove it before posting the patch.  For
s390 it's meaningless whether rtx_cost() is called or not because
at the moment it doesn't model the cost of zero extension (i.e.
the cost of either way is just one instruction, and without
context it's not possible to decide whether s390 needs a separate
instruction for the zero extend or whether it comes for free).

> Still curious your thoughts on doing it by just
> munging the constant you pass off to exact_log2 in that earlier
> clause.

> >+      /* If the one operand is a paradoxical subreg of a register or memory and
> >+	 the constant (limited to the smaller mode) has only zero bits where
> >+	 the sub expression has known zero bits, this can be expressed as
> >+	 a zero_extend.  */
> >+      else if (GET_CODE (XEXP (x, 0)) == SUBREG)
> >+	{
> >+	  rtx sub;
> >+
> >+	  sub = XEXP (XEXP (x, 0), 0);
> >+	  machine_mode sub_mode = GET_MODE (sub);
> >+	  if ((REG_P (sub) || MEM_P (sub))
> >+	      && GET_MODE_PRECISION (sub_mode) < mode_width
> >+	      && (UINTVAL (XEXP (x, 1))
> >+		  | (~nonzero_bits (sub, sub_mode) & GET_MODE_MASK (sub_mode))
> >+		  ) == GET_MODE_MASK (sub_mode))
> I'd either factor something out or write a nested conditional to
> avoid the horrid line wrapping here and hopefully make the
> conditions easier to read.

Will do.

Ciao

Dominik ^_^  ^_^
Dominik Vogt May 9, 2016, 10:07 a.m. UTC | #3
On Wed, Apr 27, 2016 at 10:24:21PM -0600, Jeff Law wrote:
> On 04/27/2016 02:20 AM, Dominik Vogt wrote:
> >The attached patch is a result of discussing an S/390 issue with
> >"and with complement" in some cases.
> >
> >  https://gcc.gnu.org/ml/gcc/2016-03/msg00163.html
> >  https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01586.html
> >
> >Combine would merge a ZERO_EXTEND and a SET taking the known zero
> >bits into account, resulting in an AND.  Later on,
> >make_compound_operation() fails to replace that with a ZERO_EXTEND
> >which we get for free on S/390 but leaves the AND, eventually
> >resulting in two consecutive AND instructions.
> >
> >The current code in make_compound_operation() that detects
> >opportunities for ZERO_EXTEND does not work here because it does
> >not take the known zero bits into account:
> >
> >      /* If the constant is one less than a power of two, this might be
> >	 representable by an extraction even if no shift is present.
> >	 If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
> >	 we are in a COMPARE.  */
> >      else if ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0)
> >	new_rtx = make_extraction (mode,
> >			       make_compound_operation (XEXP (x, 0),
> >							next_code),
> >			       0, NULL_RTX, i, 1, 0, in_code == COMPARE);
> >
> >An attempt to use the zero bits in the above conditions resulted
> >in many situations that generated worse code, so the patch tries
> >to fix this in a more conservative way.  While the effect is
> >completely positive on S/390, this will very likely have
> >unforeseeable consequences on other targets.

> >	* combine.c (make_compound_operation): Take known zero bits into
> >	account when checking for possible zero_extend.
> I'd strongly recommend writing some tests for this.

This turns out to be quite difficult.  A small test function
effectively just returns the argument:

  unsigned long bar (unsigned long in) 
  { 
    if ((in & 1) == 0) 
      in = (in & ~(unsigned long)1); 
 
    return in; 
  }

However, Gcc does not notice that the AND is a no-op.  As far as I
understand, zero bit tracking is only done in "combine", so when
folding the assignment statement the information that the lowest
bit is zero is not available and therefore the no-op is not
detected?

(I've been trying to trigger the code from the patch with a
function bases on the above construct.)

Ciao

Dominik ^_^  ^_^
Marc Glisse May 9, 2016, 1:36 p.m. UTC | #4
On Mon, 9 May 2016, Dominik Vogt wrote:

> This turns out to be quite difficult.  A small test function
> effectively just returns the argument:
>
>  unsigned long bar (unsigned long in)
>  {
>    if ((in & 1) == 0)
>      in = (in & ~(unsigned long)1);
>
>    return in;
>  }
>
> However, Gcc does not notice that the AND is a no-op.  As far as I
> understand, zero bit tracking is only done in "combine", so when
> folding the assignment statement the information that the lowest
> bit is zero is not available and therefore the no-op is not
> detected?

VRP is also supposed to track bits that may/must be non-zero. It may be 
possible to enhance it to handle this case.
Richard Biener May 10, 2016, 9:14 a.m. UTC | #5
On Mon, May 9, 2016 at 3:36 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Mon, 9 May 2016, Dominik Vogt wrote:
>
>> This turns out to be quite difficult.  A small test function
>> effectively just returns the argument:
>>
>>  unsigned long bar (unsigned long in)
>>  {
>>    if ((in & 1) == 0)
>>      in = (in & ~(unsigned long)1);
>>
>>    return in;
>>  }
>>
>> However, Gcc does not notice that the AND is a no-op.  As far as I
>> understand, zero bit tracking is only done in "combine", so when
>> folding the assignment statement the information that the lowest
>> bit is zero is not available and therefore the no-op is not
>> detected?
>
>
> VRP is also supposed to track bits that may/must be non-zero. It may be
> possible to enhance it to handle this case.

Actually VRP only tracks value-ranges, CCP tracks may/must be non-zero bits
but does not have a conditional lattice.

Richard.

> --
> Marc Glisse
Dominik Vogt May 10, 2016, 10:07 a.m. UTC | #6
On Tue, May 10, 2016 at 11:14:35AM +0200, Richard Biener wrote:
> On Mon, May 9, 2016 at 3:36 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> > On Mon, 9 May 2016, Dominik Vogt wrote:
> >
> >> This turns out to be quite difficult.  A small test function
> >> effectively just returns the argument:
> >>
> >>  unsigned long bar (unsigned long in)
> >>  {
> >>    if ((in & 1) == 0)
> >>      in = (in & ~(unsigned long)1);
> >>
> >>    return in;
> >>  }
> >>
> >> However, Gcc does not notice that the AND is a no-op.  As far as I
> >> understand, zero bit tracking is only done in "combine", so when
> >> folding the assignment statement the information that the lowest
> >> bit is zero is not available and therefore the no-op is not
> >> detected?
> >
> >
> > VRP is also supposed to track bits that may/must be non-zero. It may be
> > possible to enhance it to handle this case.
> 
> Actually VRP only tracks value-ranges, CCP tracks may/must be non-zero bits
> but does not have a conditional lattice.

Does it track known *one* bits too?  Because the AND doesn't get
eliminated here either:

  in = in | 3;
  in = in ^ 1;
  in = (in & ~(unsigned long)1);

Ciao

Dominik ^_^  ^_^
Richard Biener May 10, 2016, 10:25 a.m. UTC | #7
On Tue, May 10, 2016 at 12:07 PM, Dominik Vogt <vogt@linux.vnet.ibm.com> wrote:
> On Tue, May 10, 2016 at 11:14:35AM +0200, Richard Biener wrote:
>> On Mon, May 9, 2016 at 3:36 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
>> > On Mon, 9 May 2016, Dominik Vogt wrote:
>> >
>> >> This turns out to be quite difficult.  A small test function
>> >> effectively just returns the argument:
>> >>
>> >>  unsigned long bar (unsigned long in)
>> >>  {
>> >>    if ((in & 1) == 0)
>> >>      in = (in & ~(unsigned long)1);
>> >>
>> >>    return in;
>> >>  }
>> >>
>> >> However, Gcc does not notice that the AND is a no-op.  As far as I
>> >> understand, zero bit tracking is only done in "combine", so when
>> >> folding the assignment statement the information that the lowest
>> >> bit is zero is not available and therefore the no-op is not
>> >> detected?
>> >
>> >
>> > VRP is also supposed to track bits that may/must be non-zero. It may be
>> > possible to enhance it to handle this case.
>>
>> Actually VRP only tracks value-ranges, CCP tracks may/must be non-zero bits
>> but does not have a conditional lattice.
>
> Does it track known *one* bits too?  Because the AND doesn't get
> eliminated here either:
>
>   in = in | 3;
>   in = in ^ 1;
>   in = (in & ~(unsigned long)1);

Yes, but CCP itself does not eliminate this unless something in match.pd
simplifies this (ccp_fold is now simply dispatching there) later.  This means
CCP doesn't properly identify the copy here:

Visiting statement:
in_2 = in_1(D) | 3;
which is likely CONSTANT
Lattice value changed to CONSTANT 0x3 (0x0fffffffffffffffc).  Adding
SSA edges to worklist.
interesting_ssa_edges: adding SSA use in in_3 = in_2 ^ 1;
marking stmt to be not simulated again

Visiting statement:
in_3 = in_2 ^ 1;
which is likely CONSTANT
Lattice value changed to CONSTANT 0x2 (0x0fffffffffffffffc).  Adding
SSA edges to worklist.
interesting_ssa_edges: adding SSA use in in_4 = in_3 & 18446744073709551614;
marking stmt to be not simulated again

Visiting statement:
in_4 = in_3 & 18446744073709551614;
which is likely CONSTANT
Lattice value changed to CONSTANT 0x2 (0x0fffffffffffffffc).  Adding
SSA edges to worklist.
interesting_ssa_edges: adding SSA use in _5 = in_4;
marking stmt to be not simulated again

so it only computes new known bits but not whether the operation is
value-preserving.

Richard.

> Ciao
>
> Dominik ^_^  ^_^
>
> --
>
> Dominik Vogt
> IBM Germany
>
Jeff Law May 16, 2016, 7:06 p.m. UTC | #8
On 04/29/2016 03:35 AM, Dominik Vogt wrote:
> On Wed, Apr 27, 2016 at 10:24:21PM -0600, Jeff Law wrote:
>> Instead you want insn 12 to use a zero-extend to extend (reg:SI 64)
>> into (reg:DI 2)?
>
> Yes, because we get the zero extend for free in this case (through
> the constant in the AND or because the input value is a function
> argument that is already zero extended).
>
>> Can't you achieve this in this clause:
>>
>>      /* If the constant is one less than a power of two, this might be
>>          representable by an extraction even if no shift is present.
>>          If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
>>          we are in a COMPARE.  */
>>
>> You extract the constant via UINTVAL (XEXP (x, 1)), then munge it
>> based on nonzero_bits and pass the result to exact_log2?
>
> That's what we tried first, but it resulted in worse code in many
> places (saved about 250 instructions in the SPEC2006 testsuite but
> added about 42000 elsewhere).  It was so bad that I didn't even
> bother to check what's going on.  Probably this was triggered all
> over the place by small constants like 1, 3, 7 and the like where
> s390 has no cheap way for zero extension.  So I limited this to
> constants that are actually mode masks, implicitly assuming that
> there are zero extend instructions only for known modes (which is
> true for s390 but may not for some other targets).  Being
> conservative here shouldn't hurt; but I wonder whether there are
> targets where this condition still allows too much.
You're probably right.  FWIW, I do believe a variety of targets can do 
these kind of zero extensions.  The PA for example has extru which can 
extract a field from a general register zero extend it, then place the 
result, right justified into another register.


We don't get them "for free", except as a component of a larger sequence 
of statements for bitfield extraction/manipulation.

I believe PPC has similar capabilities.
jeff
diff mbox

Patch

From e70e6e469200b53b3f4ae52a766cdd322a4d365d Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Tue, 12 Apr 2016 09:53:46 +0100
Subject: [PATCH] Take known zero bits into account when checking
 extraction.

Allows AND Insns with a const_int operand to be expressed as ZERO_EXTEND if the
operand ist a power of 2 - 1 even with the known zero bits masked out.
---
 gcc/combine.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/gcc/combine.c b/gcc/combine.c
index 1d0e8be..44bb1b3 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7988,6 +7988,39 @@  make_compound_operation (rtx x, enum rtx_code in_code)
 							next_code),
 			       i, NULL_RTX, 1, 1, 0, 1);
 
+      /* If the one operand is a paradoxical subreg of a register or memory and
+	 the constant (limited to the smaller mode) has only zero bits where
+	 the sub expression has known zero bits, this can be expressed as
+	 a zero_extend.  */
+      else if (GET_CODE (XEXP (x, 0)) == SUBREG)
+	{
+	  rtx sub;
+
+	  sub = XEXP (XEXP (x, 0), 0);
+	  machine_mode sub_mode = GET_MODE (sub);
+	  if ((REG_P (sub) || MEM_P (sub))
+	      && GET_MODE_PRECISION (sub_mode) < mode_width
+	      && (UINTVAL (XEXP (x, 1))
+		  | (~nonzero_bits (sub, sub_mode) & GET_MODE_MASK (sub_mode))
+		  ) == GET_MODE_MASK (sub_mode))
+	    {
+	      bool speed_p = optimize_insn_for_speed_p ();
+	      rtx temp = gen_rtx_ZERO_EXTEND (mode, sub);
+	      int cost_of_and;
+	      int cost_of_zero_extend;
+
+	      cost_of_and = rtx_cost (x, mode, in_code, 1, speed_p);
+	      cost_of_zero_extend = rtx_cost (temp, mode, in_code, 1, speed_p);
+	      if (cost_of_zero_extend <= cost_of_and)
+		{
+		  new_rtx = make_compound_operation (sub, next_code);
+		  new_rtx = make_extraction (mode, new_rtx, 0, 0,
+					     GET_MODE_PRECISION (sub_mode),
+					     1, 0, in_code == COMPARE);
+		}
+	    }
+	}
+
       break;
 
     case LSHIFTRT:
-- 
2.3.0