diff mbox

Fix RTL patterns for NEON vmovn

Message ID 20101022141220.772d897a@rex.config
State New
Headers show

Commit Message

Julian Brown Oct. 22, 2010, 1:12 p.m. UTC
Hi,

This patch contains a few fixes (likely latent bugs) for the RTL
instruction patterns implementing NEON vmovn support, as added by:

http://gcc.gnu.org/ml/gcc-patches/2010-08/msg02146.html

The fixes are as follows: firstly, the usage of match_dup in the
neon_move_lo_quad_<mode> and neon_move_hi_quad_<mode> is incorrect.
According to the manual, the proper way to implement this type of
operation (modifying a read/write operand) is to use matching
constraints, i.e.:

http://gcc.gnu.org/onlinedocs/gccint/RTL-Template.html#index-match_005fdup-3215

Incidentally this incorrect RTL usage showed up as a wrong-code bug in
the testsuite when backporting vmovn support to our 4.5-based tree,
though I haven't observed it happening with current mainline.
Interestingly or not, on 4.5 the tests which failed first went wrong
because web.c rewrote the destination operand 0 (but not the match_dup'd
source version of same) to a different register for a
neon_move_hi_quad insn. The patch from Bernd Schmitt here:

http://gcc.gnu.org/ml/gcc-patches/2010-03/msg01160.html

appeared sufficient to mask the problem on 4.5, and its presence on
mainline is possibly why the incorrect RTL in the above-named patterns
hasn't shown up problems on the latter so far.

The second and third problems fixed by the patch are incorrect logic to
detect redundant move instructions in neon_move_hi_quad ("dest +
2..."), and a missing length specifier in the
quad-word version of vec_pack_trunc_<mode>.

Tested OK with cross to ARM Linux (-march=armv7-a -mfloat-abi=softfp
-mfpu=neon). OK to apply?

Thanks,

Julian

ChangeLog

    gcc/
    * config/arm/neon.md (neon_move_lo_quad_<mode>): Use matching
    constraint instead of match_dup.
    (neon_move_hi_quad_<mode>): Likewise. Fix condition for omitting
    redundant vmov instruction.
    (move_hi_quad_<mode>): Add extra argument to
    gen_neon_move_hi_quad_<mode> call.
    (move_lo_quad_<mode>): Similar, for gen_neon_move_lo_quad_<mode>
    call.
    (vec_pack_trunc_<mode>): Fix assembly template, add length
    specifier.

Comments

Bernd Schmidt Oct. 22, 2010, 1:22 p.m. UTC | #1
On 10/22/2010 03:12 PM, Julian Brown wrote:
> The fixes are as follows: firstly, the usage of match_dup in the
> neon_move_lo_quad_<mode> and neon_move_hi_quad_<mode> is incorrect.
> According to the manual, the proper way to implement this type of
> operation (modifying a read/write operand) is to use matching
> constraints, i.e.:
> 
> http://gcc.gnu.org/onlinedocs/gccint/RTL-Template.html#index-match_005fdup-3215

I think the docs are misleading here.  Nothing wrong with using
match_dup IMO, except slightly less freedom for the compiler to modify
the operands during optimization.  See the discussion that followed the
web.c patch you quoted.


Bernd
Julian Brown Oct. 26, 2010, 11:46 a.m. UTC | #2
On Fri, 22 Oct 2010 15:22:18 +0200
Bernd Schmidt <bernds@codesourcery.com> wrote:

> On 10/22/2010 03:12 PM, Julian Brown wrote:
> > The fixes are as follows: firstly, the usage of match_dup in the
> > neon_move_lo_quad_<mode> and neon_move_hi_quad_<mode> is incorrect.
> > According to the manual, the proper way to implement this type of
> > operation (modifying a read/write operand) is to use matching
> > constraints, i.e.:
> > 
> > http://gcc.gnu.org/onlinedocs/gccint/RTL-Template.html#index-match_005fdup-3215
> 
> I think the docs are misleading here.  Nothing wrong with using
> match_dup IMO, except slightly less freedom for the compiler to modify
> the operands during optimization.  See the discussion that followed
> the web.c patch you quoted.

To clarify: would you expect using matching constraints rather than
match_dup to still provide some improvement (slightly more freedom
for optimizers to modify operands) and thus still be a useful change?
If not, I could commit the other fixes without that fragment.

Cheers,

Julian
diff mbox

Patch

Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	(revision 165700)
+++ gcc/config/arm/neon.md	(working copy)
@@ -1165,11 +1165,11 @@ 
 ; we're really at the mercy of the register allocator.
 
 (define_insn "neon_move_lo_quad_<mode>"
-  [(set (match_operand:ANY128 0 "s_register_operand" "+w")
+  [(set (match_operand:ANY128 0 "s_register_operand" "=w")
         (vec_concat:ANY128
           (match_operand:<V_HALF> 1 "s_register_operand" "w")
           (vec_select:<V_HALF> 
-		(match_dup 0)
+		(match_operand:ANY128 3 "s_register_operand" "0")
 	        (match_operand:ANY128 2 "vect_par_constant_high" ""))))]
   "TARGET_NEON"
 {
@@ -1177,7 +1177,7 @@ 
   int src = REGNO (operands[1]);
 
   if (dest != src)
-    return "vmov\t%e0, %P1";
+    return "vmov\t%e0, %P1\t@ neon_move_lo_quad";
   else
     return "";
 }
@@ -1185,10 +1185,10 @@ 
 )
 
 (define_insn "neon_move_hi_quad_<mode>"
-  [(set (match_operand:ANY128 0 "s_register_operand" "+w")
+  [(set (match_operand:ANY128 0 "s_register_operand" "=w")
         (vec_concat:ANY128
           (vec_select:<V_HALF>
-		(match_dup 0)
+		(match_operand:ANY128 3 "s_register_operand" "0")
 	        (match_operand:ANY128 2 "vect_par_constant_low" ""))
           (match_operand:<V_HALF> 1 "s_register_operand" "w")))]
 	   
@@ -1197,8 +1197,8 @@ 
   int dest = REGNO (operands[0]);
   int src = REGNO (operands[1]);
 
-  if (dest != src)
-    return "vmov\t%f0, %P1";
+  if (dest + 2 != src)
+    return "vmov\t%f0, %P1\t@ neon_move_hi_quad";
   else
     return "";
 }
@@ -1218,7 +1218,8 @@ 
      RTVEC_ELT (v, i) = GEN_INT (i);
 
   t1 = gen_rtx_PARALLEL (<MODE>mode, v);
-  emit_insn (gen_neon_move_hi_quad_<mode> (operands[0], operands[1], t1));
+  emit_insn (gen_neon_move_hi_quad_<mode> (operands[0], operands[1], t1,
+					   operands[0]));
 
   DONE;
 })
@@ -1236,7 +1237,8 @@ 
      RTVEC_ELT (v, i) = GEN_INT ((<V_mode_nunits>/2) + i);
 
   t1 = gen_rtx_PARALLEL (<MODE>mode, v);
-  emit_insn (gen_neon_move_lo_quad_<mode> (operands[0], operands[1], t1));
+  emit_insn (gen_neon_move_lo_quad_<mode> (operands[0], operands[1], t1,
+					   operands[0]));
 
   DONE;
 })
@@ -5443,8 +5445,9 @@ 
 		(truncate:<V_narrow>
 			(match_operand:VN 2 "register_operand" "w"))))]
  "TARGET_NEON"
- "vmovn.i<V_sz_elem>\t%e0, %q1\n\tvmovn.i<V_sz_elem>\t%f0, %q2"
- [(set_attr "neon_type" "neon_shift_1")]
+ "vmovn.i<V_sz_elem>\t%e0, %q1\;vmovn.i<V_sz_elem>\t%f0, %q2"
+ [(set_attr "neon_type" "neon_shift_1")
+  (set_attr "length" "8")]
 )
 
 ;; For the non-quad case.