From patchwork Mon Sep 13 16:17:21 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 64611
Date: Mon, 13 Sep 2010 09:17:21 -0700
From: "H.J. Lu"
To: gcc-patches@gcc.gnu.org
Cc: Uros Bizjak, Jan Hubicka
Subject: RFC: PATCH: Add -m8bit-idiv for x86
Message-ID: <20100913161721.GA18471@intel.com>
User-Agent: Mutt/1.5.20 (2009-12-10)
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm

Hi,

On some processors like Atom, 32bit integer divide is much slower than
8bit unsigned integer divide.  This patch adds an option, -m8bit-idiv,
for x86.  It turns 32bit integer divide into

if (dividend and divisor are in [0-255])
  use 8bit unsigned integer divide
else
  use 32bit integer divide

I didn't turn it on by default for Atom:

1. If the inputs are outside the 8bit unsigned range, the extra compare
and branch always slow the code down.
2. For

---
int funca( int a, int b )
{
    return (a/b) + (a%b);
}
---

this patch generates 2 idivbs since the optimization is done at RTL
expansion.  Is there a way to delay this until later, after 2 idivls
are optimized into 1 idivl but before IRA, since this optimization
needs a scratch register?  I tried to split idivl into idivb and idivl.
It didn't work due to recursive split.

Thanks.


H.J.
---
gcc/

2010-09-13  H.J. Lu

	* config/i386/i386-protos.h (ix86_expand_idivmod): New.

	* config/i386/i386.c (predict_jump): Add prototype.
	(override_options): Handle -m8bit-idiv.
	(ix86_expand_idivmod): New.

	* config/i386/i386.md (divmod<mode>4): Renamed to ...
	(divmod<mode>4_1): This.
	(udivmod<mode>4): Renamed to ...
	(udivmod<mode>4_1): This.
	(divmod<mode>4): New.
	(udivmod<mode>4): Likewise.

	* config/i386/i386.opt (m8bit-idiv): New.

	* doc/invoke.texi: Document -m8bit-idiv.

gcc/testsuite/

2010-09-13  H.J. Lu

	* gcc.target/i386/divmod-1.c: New.
	* gcc.target/i386/divmod-2.c: Likewise.
	* gcc.target/i386/divmod-3.c: Likewise.
	* gcc.target/i386/udivmod-1.c: Likewise.
	* gcc.target/i386/udivmod-2.c: Likewise.
	* gcc.target/i386/udivmod-3.c: Likewise.
	* gcc.target/i386/umod-1.c: Likewise.
	* gcc.target/i386/umod-2.c: Likewise.
	* gcc.target/i386/umod-3.c: Likewise.

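For readers who want the transformation at the C level, here is a minimal sketch of what the expanded code computes for a signed 32bit divide. The helper name divmod_8bit_fastpath is hypothetical and is not part of the patch; the range check mirrors the IOR of both operands followed by a test against -0x100 that the patch emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: models the control flow
   that -m8bit-idiv expands a 32bit signed divide/modulo into.  */
static void
divmod_8bit_fastpath (int32_t a, int32_t b, int32_t *q, int32_t *r)
{
  /* Division by zero is undefined on both paths, as in plain C.  */
  assert (b != 0);

  /* (a | b) has no bit set above bit 7 only when both operands are in
     [0, 255].  Negative values have the sign bit set and fail the test.  */
  if ((((uint32_t) a | (uint32_t) b) & ~UINT32_C (0xff)) == 0)
    {
      /* Fast path: 8bit unsigned divide.  */
      uint8_t n = (uint8_t) a;
      uint8_t d = (uint8_t) b;
      *q = n / d;
      *r = n % d;
    }
  else
    {
      /* Slow path: the original 32bit signed divide.  */
      *q = a / b;
      *r = a % b;
    }
}
```

Both paths must agree with a plain 32bit divide, which is what the run tests divmod-1.c and udivmod-1.c in this patch check.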
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 900b424..6b1f447 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -129,6 +129,7 @@ extern void ix86_split_ashr (rtx *, rtx, enum machine_mode);
 extern void ix86_split_lshr (rtx *, rtx, enum machine_mode);
 extern rtx ix86_find_base_term (rtx);
 extern bool ix86_check_movabs (rtx, int);
+extern void ix86_expand_idivmod (enum rtx_code, enum machine_mode, rtx[]);
 
 extern rtx assign_386_stack_local (enum machine_mode, enum ix86_stack_slot);
 extern int ix86_attr_length_immediate_default (rtx, int);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1d79a18..21d9c0f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -1981,6 +1981,7 @@ static bool ix86_expand_vector_init_one_nonzero (bool, enum machine_mode,
 static void ix86_add_new_builtins (int);
 static rtx ix86_expand_vec_perm_builtin (tree);
 static tree ix86_canonical_va_list_type (tree);
+static void predict_jump (int);
 
 enum ix86_function_specific_strings
 {
@@ -3699,6 +3700,9 @@ override_options (bool main_args_p)
 #endif
     }
 
+  if (flag_8bit_idiv < 0)
+    flag_8bit_idiv = 0;
+
   /* Save the initial options in case the user does function specific options */
   if (main_args_p)
     target_option_default_node = target_option_current_node
@@ -14647,6 +14651,103 @@ ix86_expand_unary_operator (enum rtx_code code, enum machine_mode mode,
     emit_move_insn (operands[0], dst);
 }
 
+/* Expand 32bit divmod with 8bit unsigned divmod if dividend and
+   divisor are within the range [0-255].  */
+
+void
+ix86_expand_idivmod (enum rtx_code code, enum machine_mode mode,
+                     rtx operands[])
+{
+  rtx end_label, qimode_label;
+  rtx insn, div, mod;
+  rtx tmp0, tmp1, tmp2;
+  rtx (*gen_divmod4) (rtx, rtx, rtx, rtx);
+
+  switch (mode)
+    {
+    case HImode:
+      gen_divmod4 = code == DIV ? gen_divmodhi4_1 : gen_udivmodhi4_1;
+      break;
+    case SImode:
+      gen_divmod4 = code == DIV ? gen_divmodsi4_1 : gen_udivmodsi4_1;
+      break;
+    case DImode:
+      gen_divmod4 = code == DIV ? gen_divmoddi4_1 : gen_udivmoddi4_1;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  div = (*gen_divmod4) (operands[0], operands[1],
+                        operands[2], operands[3]);
+  if (mode != SImode
+      || !flag_8bit_idiv
+      || !TARGET_QIMODE_MATH
+      || optimize_insn_for_size_p ())
+    {
+      emit_insn (div);
+      return;
+    }
+
+  end_label = gen_label_rtx ();
+  qimode_label = gen_label_rtx ();
+
+  tmp0 = gen_reg_rtx (SImode);
+
+  /* Use 8bit unsigned divmod if dividend and divisor are within the
+     range [0-255].  */
+  emit_move_insn (tmp0, operands[1]);
+  tmp0 = expand_simple_binop (mode, IOR, tmp0, operands[2],
+                              tmp0, 1, OPTAB_DIRECT);
+  emit_insn (gen_testsi_ccno_1 (tmp0, GEN_INT (-0x100)));
+  tmp0 = gen_rtx_REG (CCNOmode, FLAGS_REG);
+  tmp0 = gen_rtx_EQ (VOIDmode, tmp0, const0_rtx);
+  tmp0 = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp0,
+                               gen_rtx_LABEL_REF (VOIDmode, qimode_label),
+                               pc_rtx);
+  insn = emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx, tmp0));
+  predict_jump (REG_BR_PROB_BASE * 50 / 100);
+  JUMP_LABEL (insn) = qimode_label;
+
+  /* Generate 32bit signed/unsigned divmod.  */
+  emit_insn (div);
+
+  /* Branch to the end.  */
+  emit_jump_insn (gen_jump (end_label));
+  emit_barrier ();
+
+  /* Generate 8bit unsigned divide.  */
+  emit_label (qimode_label);
+  tmp0 = simplify_gen_subreg (HImode, operands[0], SImode, 0);
+  tmp1 = simplify_gen_subreg (HImode, operands[1], SImode, 0);
+  tmp2 = simplify_gen_subreg (QImode, operands[2], SImode, 0);
+  emit_insn (gen_udivmodhiqi3 (tmp0, tmp1, tmp2));
+
+  if (code == DIV)
+    {
+      div = gen_rtx_DIV (SImode, operands[1], operands[2]);
+      mod = gen_rtx_MOD (SImode, operands[1], operands[2]);
+    }
+  else
+    {
+      div = gen_rtx_UDIV (SImode, operands[1], operands[2]);
+      mod = gen_rtx_UMOD (SImode, operands[1], operands[2]);
+    }
+
+  /* Extract remainder from AH.  */
+  tmp1 = gen_rtx_ZERO_EXTRACT (SImode, tmp0,
+                               GEN_INT (8), GEN_INT (8));
+  insn = emit_move_insn (operands[3], tmp1);
+  set_unique_reg_note (insn, REG_EQUAL, mod);
+
+  /* Zero extend quotient from AL.  */
+  tmp0 = gen_lowpart (QImode, tmp0);
+  insn = emit_insn (gen_zero_extendqisi2 (operands[0], tmp0));
+  set_unique_reg_note (insn, REG_EQUAL, div);
+
+  emit_label (end_label);
+}
+
 #define LEA_SEARCH_THRESHOLD 12
 
 /* Search backward for non-agu definition of register number REGNO1
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9780eef..1ddf0b4 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -7295,7 +7295,7 @@
   [(set_attr "type" "idiv")
    (set_attr "mode" "QI")])
 
-(define_expand "divmod<mode>4"
+(define_expand "divmod<mode>4_1"
   [(parallel [(set (match_operand:SWIM248 0 "register_operand" "")
                    (div:SWIM248
                      (match_operand:SWIM248 1 "register_operand" "")
@@ -7306,6 +7306,17 @@
   ""
   "")
 
+(define_expand "divmod<mode>4"
+  [(parallel [(set (match_operand:SWIM248 0 "register_operand" "")
+                   (div:SWIM248
+                     (match_operand:SWIM248 1 "register_operand" "")
+                     (match_operand:SWIM248 2 "nonimmediate_operand" "")))
+              (set (match_operand:SWIM248 3 "register_operand" "")
+                   (mod:SWIM248 (match_dup 1) (match_dup 2)))
+              (clobber (reg:CC FLAGS_REG))])]
+  ""
+  "ix86_expand_idivmod (DIV, <MODE>mode, operands); DONE;")
+
 (define_insn_and_split "*divmod<mode>4"
   [(set (match_operand:SWIM248 0 "register_operand" "=a")
         (div:SWIM248 (match_operand:SWIM248 2 "register_operand" "0")
@@ -7354,7 +7365,7 @@
   [(set_attr "type" "idiv")
    (set_attr "mode" "<MODE>")])
 
-(define_expand "udivmod<mode>4"
+(define_expand "udivmod<mode>4_1"
   [(parallel [(set (match_operand:SWIM248 0 "register_operand" "")
                    (udiv:SWIM248
                      (match_operand:SWIM248 1 "register_operand" "")
@@ -7365,6 +7376,17 @@
   ""
   "")
 
+(define_expand "udivmod<mode>4"
+  [(parallel [(set (match_operand:SWIM248 0 "register_operand" "")
+                   (udiv:SWIM248
+                     (match_operand:SWIM248 1 "register_operand" "")
+                     (match_operand:SWIM248 2 "nonimmediate_operand" "")))
+              (set (match_operand:SWIM248 3 "register_operand" "")
+                   (umod:SWIM248 (match_dup 1) (match_dup 2)))
+              (clobber (reg:CC FLAGS_REG))])]
+  ""
+  "ix86_expand_idivmod (UDIV, <MODE>mode, operands); DONE;")
+
 (define_insn_and_split "*udivmod<mode>4"
   [(set (match_operand:SWIM248 0 "register_operand" "=a")
         (udiv:SWIM248 (match_operand:SWIM248 2 "register_operand" "0")
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 5790e76..f7abe75 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -388,3 +388,7 @@ Support F16C built-in functions and code generation
 mfentry
 Target Report Var(flag_fentry) Init(-1)
 Emit profiling counter call at function entry before prologue.
+
+m8bit-idiv
+Target Report Var(flag_8bit_idiv) Init(-1) Save
+Expand 32bit integer divide into control flow with 8bit unsigned integer divide
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b354382..cb3a756 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -602,7 +602,7 @@ Objective-C and Objective-C++ Dialects}.
 -momit-leaf-frame-pointer  -mno-red-zone -mno-tls-direct-seg-refs @gol
 -mcmodel=@var{code-model} -mabi=@var{name} @gol
 -m32  -m64 -mlarge-data-threshold=@var{num} @gol
--msse2avx -mfentry}
+-msse2avx -mfentry -m8bit-idiv}
 
 @emph{IA-64 Options}
 @gccoptlist{-mbig-endian  -mlittle-endian  -mgnu-as  -mgnu-ld  -mno-pic @gol
@@ -12647,6 +12647,13 @@ If profiling is active @option{-pg} put the profiling
 counter call before prologue.
 Note: On x86 architectures the attribute @code{ms_hook_prologue}
 isn't possible at the moment for @option{-mfentry} and @option{-pg}.
+
+@item -m8bit-idiv
+@itemx -mno-8bit-idiv
+@opindex 8bit-idiv
+This option will enable GCC to expand 32bit integer divide into control
+flow with 8bit unsigned integer divide.
+
 @end table
 
 These @samp{-m} switches are supported in addition to the above
diff --git a/gcc/testsuite/gcc.target/i386/divmod-1.c b/gcc/testsuite/gcc.target/i386/divmod-1.c
new file mode 100644
index 0000000..2769a21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/divmod-1.c
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -m8bit-idiv" } */
+
+extern void abort (void);
+
+void
+__attribute__((noinline))
+test (int x, int y, int q, int r)
+{
+  if ((x / y) != q || (x % y) != r)
+    abort ();
+}
+
+int
+main ()
+{
+  test (7, 6, 1, 1);
+  test (-7, -6, 1, -1);
+  test (-7, 6, -1, -1);
+  test (7, -6, -1, 1);
+  test (255, 254, 1, 1);
+  test (256, 254, 1, 2);
+  test (256, 256, 1, 0);
+  test (254, 256, 0, 254);
+  test (254, 255, 0, 254);
+  test (254, 1, 254, 0);
+  test (255, 2, 127, 1);
+  test (1, 256, 0, 1);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/divmod-2.c b/gcc/testsuite/gcc.target/i386/divmod-2.c
new file mode 100644
index 0000000..dbe40b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/divmod-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -m8bit-idiv" } */
+
+int
+foo (int x, int y)
+{
+  return x / y;
+}
+
+/* { dg-final { scan-assembler-times "divb" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/divmod-3.c b/gcc/testsuite/gcc.target/i386/divmod-3.c
new file mode 100644
index 0000000..4966d7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/divmod-3.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -m8bit-idiv" } */
+
+int
+foo (int x, int y)
+{
+  return x % y;
+}
+
+/* { dg-final { scan-assembler-times "divb" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/udivmod-1.c b/gcc/testsuite/gcc.target/i386/udivmod-1.c
new file mode 100644
index 0000000..eebd843
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/udivmod-1.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -m8bit-idiv" } */
+
+extern void abort (void);
+
+void
+__attribute__((noinline))
+test (unsigned int x, unsigned int y, unsigned int q, unsigned int r)
+{
+  if ((x / y) != q || (x % y) != r)
+    abort ();
+}
+
+int
+main ()
+{
+  test (7, 6, 1, 1);
+  test (255, 254, 1, 1);
+  test (256, 254, 1, 2);
+  test (256, 256, 1, 0);
+  test (254, 256, 0, 254);
+  test (254, 255, 0, 254);
+  test (254, 1, 254, 0);
+  test (255, 2, 127, 1);
+  test (1, 256, 0, 1);
+  test (0x80000000, 0x7fffffff, 1, 1);
+  test (0x7fffffff, 0x80000000, 0, 0x7fffffff);
+  test (0x80000000, 0x80000003, 0, 0x80000000);
+  test (0xfffffffd, 0xfffffffe, 0, 0xfffffffd);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/udivmod-2.c b/gcc/testsuite/gcc.target/i386/udivmod-2.c
new file mode 100644
index 0000000..e43bbc7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/udivmod-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -m8bit-idiv" } */
+
+unsigned int
+foo (unsigned int x, unsigned int y)
+{
+  return x / y;
+}
+
+/* { dg-final { scan-assembler-times "divb" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/udivmod-3.c b/gcc/testsuite/gcc.target/i386/udivmod-3.c
new file mode 100644
index 0000000..9aead9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/udivmod-3.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -m8bit-idiv" } */
+
+unsigned int
+foo (unsigned int x, unsigned int y)
+{
+  return x % y;
+}
+
+/* { dg-final { scan-assembler-times "divb" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/umod-1.c b/gcc/testsuite/gcc.target/i386/umod-1.c
index 54edf13..a39e75b 100644
--- a/gcc/testsuite/gcc.target/i386/umod-1.c
+++ b/gcc/testsuite/gcc.target/i386/umod-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mtune=atom" } */
+/* { dg-options "-O2 -m8bit-idiv" } */
 
 unsigned char
 foo (unsigned char x, unsigned char y)
diff --git a/gcc/testsuite/gcc.target/i386/umod-2.c b/gcc/testsuite/gcc.target/i386/umod-2.c
index 6fe7384..1ef33bb 100644
--- a/gcc/testsuite/gcc.target/i386/umod-2.c
+++ b/gcc/testsuite/gcc.target/i386/umod-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mtune=atom" } */
+/* { dg-options "-O2 -m8bit-idiv" } */
 
 extern unsigned char z;
 
diff --git a/gcc/testsuite/gcc.target/i386/umod-3.c b/gcc/testsuite/gcc.target/i386/umod-3.c
index 7123bc9..3e5bc40 100644
--- a/gcc/testsuite/gcc.target/i386/umod-3.c
+++ b/gcc/testsuite/gcc.target/i386/umod-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mtune=atom" } */
+/* { dg-options "-O2 -m8bit-idiv" } */
 
 extern void abort (void);
 extern void exit (int);
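
As a note on the "Extract remainder from AH" and "Zero extend quotient from AL" steps in ix86_expand_idivmod: the x86 8bit unsigned divide takes its dividend in AX and leaves the quotient in AL and the remainder in AH. The sketch below models that packing in plain C. The names model_divb, extract_remainder and extract_quotient are hypothetical and exist only for this illustration; they are not GCC functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 8bit unsigned divide: AX / divisor yields
   the quotient in AL (low byte) and the remainder in AH (high byte).  */
static uint16_t
model_divb (uint16_t ax, uint8_t divisor)
{
  assert (divisor != 0);
  /* The real instruction faults if the quotient does not fit in AL.
     On the fast path AX <= 255, so this cannot happen there.  */
  assert (ax / divisor <= 0xff);

  uint8_t al = (uint8_t) (ax / divisor);
  uint8_t ah = (uint8_t) (ax % divisor);
  return (uint16_t) ((ah << 8) | al);
}

/* Models ZERO_EXTRACT (SImode, reg, 8, 8): 8 bits starting at bit 8.  */
static uint32_t
extract_remainder (uint16_t ax)
{
  return (ax >> 8) & 0xff;
}

/* Models zero_extendqisi2 applied to the low byte (AL).  */
static uint32_t
extract_quotient (uint16_t ax)
{
  return ax & 0xff;
}
```

Because both operands are known to be in [0, 255] on this path, zero extension gives the correct 32bit result for signed and unsigned divides alike.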