New parameters to control stringop expansion libcall strategy

Message ID CAAkRFZ+eO3wq9vJhMG5FBc6Awb04+8PQKz44A0PJEpOo7TByfQ@mail.gmail.com
State New

Commit Message

Xinliang David Li Aug. 5, 2013, 3:01 a.m. UTC
Attached is a new patch that implements stringop inline strategy
control through two new -m options:

-mmemcpy-strategy=
-mmemset-strategy=

See the changes in doc/invoke.texi for a description of the new options. Example:
  -mmemcpy-strategy=rep_8byte:64:noalign,unrolled_loop:2048:noalign,libcall:-1:noalign

This tells the compiler to inline memcpy using rep_8byte when the size is
no larger than 64 bytes, to use unrolled_loop when the size is no larger
than 2048 bytes, and to use a library call for sizes > 2048. In all cases,
the destination alignment adjustment is not done.
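
As a rough illustration of the triplet semantics (a standalone sketch, not
code from the patch), the C below parses a strategy string and picks the
first range whose max_size covers a given size. The names strategy_range,
parse_strategy and pick_range and the MAX_RANGES limit are hypothetical; the
alignment field uses the align/noalign spelling that the patch's parser
accepts.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_RANGES 4   /* Hypothetical limit, for this sketch only.  */

struct strategy_range
{
  char alg[32];   /* Algorithm name as spelled in the option.  */
  int max_size;   /* Inclusive upper bound in bytes; -1 means no limit.  */
  int noalign;    /* Nonzero: skip destination alignment adjustment.  */
};

/* Parse "alg:max_size:align,alg:max_size:noalign,..." into OUT.
   Return the number of ranges, or -1 on malformed input.  */
static int
parse_strategy (const char *str, struct strategy_range out[MAX_RANGES])
{
  int n = 0;

  assert (str != NULL);
  while (*str)
    {
      char align[16];
      int used = 0;

      if (n == MAX_RANGES)
        return -1;   /* Too many ranges; checked before writing OUT[n].  */
      /* Field widths keep sscanf from overrunning the buffers.  */
      if (sscanf (str, "%31[^:,]:%d:%15[^,]%n", out[n].alg,
                  &out[n].max_size, align, &used) != 3)
        return -1;
      if (out[n].max_size < -1)
        return -1;

      if (strcmp (align, "align") == 0)
        out[n].noalign = 0;
      else if (strcmp (align, "noalign") == 0)
        out[n].noalign = 1;
      else
        return -1;

      /* Bounds must increase, and only the last range may be -1.  */
      if (n > 0
          && (out[n - 1].max_size == -1
              || (out[n].max_size != -1
                  && out[n].max_size <= out[n - 1].max_size)))
        return -1;

      str += used;
      if (*str == ',')
        {
          str++;
          if (*str == '\0')
            return -1;   /* Trailing comma.  */
        }
      else if (*str != '\0')
        return -1;
      n++;
    }

  if (n == 0 || out[n - 1].max_size != -1)
    return -1;   /* The last range must be the -1 catch-all.  */
  return n;
}

/* Return the first range that covers SIZE, or NULL if none does.  */
static const struct strategy_range *
pick_range (const struct strategy_range *ranges, int n, int size)
{
  for (int i = 0; i < n; i++)
    if (ranges[i].max_size == -1 || size <= ranges[i].max_size)
      return &ranges[i];
  return NULL;
}
```

With the three-triplet example above, sizes up to 64 select rep_8byte, sizes
from 65 to 2048 select unrolled_loop, and anything larger selects libcall.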

Tested on x86-64/linux. Ok for trunk?

thanks,

David

2013-08-02  Xinliang David Li  <davidxl@google.com>

        * config/i386/stringop.def: New file.
        * config/i386/stringop.opt: New file.
        * config/i386/i386-opts.h: Include stringop.def.
        * config/i386/i386.opt: Include stringop.opt.
        * config/i386/i386.c (ix86_option_override_internal):
        Override default size based stringop inline strategies
        with options.
        * config/i386/i386.c (ix86_parse_stringop_strategy_string):
        New function.

2013-08-04  Xinliang David Li  <davidxl@google.com>

        * testsuite/gcc.target/i386/memcpy-strategy-1.c: New test.
        * testsuite/gcc.target/i386/memcpy-strategy-2.c: Ditto.
        * testsuite/gcc.target/i386/memset-strategy-1.c: Ditto.
        * testsuite/gcc.target/i386/memcpy-strategy-3.c: Ditto.
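
For readers unfamiliar with the .def-file pattern used by stringop.def: the
same DEF_ALG list is expanded several times under different macro
definitions, so the enum and the option-name table cannot drift apart. The
sketch below is self-contained and only illustrative. It uses a
STRINGOP_ALGS list macro in place of the #include, and lookup_alg is a
hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for stringop.def: one DEF_ALG (enumerator, option_name)
   entry per algorithm.  */
#define STRINGOP_ALGS \
  DEF_ALG (no_stringop, no_stringop) \
  DEF_ALG (libcall, libcall) \
  DEF_ALG (rep_prefix_1_byte, rep_byte) \
  DEF_ALG (rep_prefix_4_byte, rep_4byte) \
  DEF_ALG (rep_prefix_8_byte, rep_8byte) \
  DEF_ALG (loop_1_byte, byte_loop) \
  DEF_ALG (loop, loop) \
  DEF_ALG (unrolled_loop, unrolled_loop) \
  DEF_ALG (vector_loop, vector_loop)

/* First expansion: the enumerators, with last_alg as a sentinel.  */
enum stringop_alg
{
#define DEF_ALG(alg, name) alg,
  STRINGOP_ALGS
#undef DEF_ALG
  last_alg
};

/* Second expansion: the option spellings, in the same order.  */
static const char *const stringop_alg_names[] = {
#define DEF_ALG(alg, name) #name,
  STRINGOP_ALGS
#undef DEF_ALG
};

/* Map an option spelling to its enumerator; last_alg means
   "not found".  */
static enum stringop_alg
lookup_alg (const char *name)
{
  assert (name != NULL);
  for (int i = 0; i < last_alg; i++)
    if (strcmp (name, stringop_alg_names[i]) == 0)
      return (enum stringop_alg) i;
  return last_alg;
}
```

Adding an algorithm then means adding one DEF_ALG line; both the enum and
the name table pick it up on the next build.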




On Fri, Aug 2, 2013 at 9:21 PM, Xinliang David Li <davidxl@google.com> wrote:
> On x86_64, when the expected size of memcpy/memset is known (e.g., with
> FDO), the libcall strategy is used when the size is > 8192. This value is
> hard coded, which makes performance tuning difficult. This patch
> adds two new parameters to control it. Potential uses include
> per-application tuning of the libcall strategy's minimum size based on
> summary data from FDO (e.g., instruction working set size).
>
> Bootstrapped and tested on x86_64/linux. Ok for trunk?
>
> thanks,
>
> David
>
>
> 2013-08-02  Xinliang David Li  <davidxl@google.com>
>
>         * params.def: New parameters.
>         * config/i386/i386.c (ix86_option_override_internal):
>         Override default libcall size limit with parameters.

Comments

Michael Zolotukhin Aug. 5, 2013, 10:57 a.m. UTC | #1
Hi,
This is a really convenient option, thanks for working on it.
I can't approve it as I'm not a maintainer, but it looks OK to me,
except for one small nitpick: AFAIR, comments should end with
dot-space-space.
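
To spell out the convention (a generic illustration, not code from the
patch): in GNU-style comments each sentence ends with a full stop, and two
spaces separate the final full stop from the comment terminator. The
clamp_size function below is a made-up example used only to carry the
comments.

```c
#include <assert.h>

/* Clamp SIZE to the range [0, LIMIT].  LIMIT must not be negative.  */
static int
clamp_size (int size, int limit)
{
  assert (limit >= 0);
  if (size < 0)
    return 0;   /* Negative sizes collapse to zero.  */
  return size > limit ? limit : size;
}
```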

Michael

On 04 Aug 20:01, Xinliang David Li wrote:
> Index: config/i386/stringop.def
> ===================================================================
> --- config/i386/stringop.def	(revision 0)
> +++ config/i386/stringop.def	(revision 0)
> @@ -0,0 +1,42 @@
> +/* Definitions for option handling for IA-32.
> +   Copyright (C) 2013 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +Under Section 7 of GPL version 3, you are granted additional
> +permissions described in the GCC Runtime Library Exception, version
> +3.1, as published by the Free Software Foundation.
> +
> +You should have received a copy of the GNU General Public License and
> +a copy of the GCC Runtime Library Exception along with this program;
> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +<http://www.gnu.org/licenses/>.  */
> +
> +DEF_ENUM
> +DEF_ALG (no_stringop, no_stringop)
> +DEF_ENUM
> +DEF_ALG (libcall, libcall)
> +DEF_ENUM
> +DEF_ALG (rep_prefix_1_byte, rep_byte)
> +DEF_ENUM
> +DEF_ALG (rep_prefix_4_byte, rep_4byte)
> +DEF_ENUM
> +DEF_ALG (rep_prefix_8_byte, rep_8byte)
> +DEF_ENUM
> +DEF_ALG (loop_1_byte, byte_loop)
> +DEF_ENUM
> +DEF_ALG (loop, loop)
> +DEF_ENUM
> +DEF_ALG (unrolled_loop, unrolled_loop)
> +DEF_ENUM
> +DEF_ALG (vector_loop, vector_loop)
> Index: config/i386/i386.opt
> ===================================================================
> --- config/i386/i386.opt	(revision 201458)
> +++ config/i386/i386.opt	(working copy)
> @@ -316,6 +316,14 @@ mstack-arg-probe
>  Target Report Mask(STACK_PROBE) Save
>  Enable stack probing
>  
> +mmemcpy-strategy=
> +Target RejectNegative Joined Var(ix86_tune_memcpy_strategy)
> +Specify memcpy expansion strategy when expected size is known
> +
> +mmemset-strategy=
> +Target RejectNegative Joined Var(ix86_tune_memset_strategy)
> +Specify memset expansion strategy when expected size is known
> +
>  mstringop-strategy=
>  Target RejectNegative Joined Enum(stringop_alg) Var(ix86_stringop_alg) Init(no_stringop)
>  Chose strategy to generate stringop using
> Index: config/i386/stringop.opt
> ===================================================================
> --- config/i386/stringop.opt	(revision 0)
> +++ config/i386/stringop.opt	(revision 0)
> @@ -0,0 +1,36 @@
> +/* Definitions for option handling for IA-32.
> +   Copyright (C) 2013 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +Under Section 7 of GPL version 3, you are granted additional
> +permissions described in the GCC Runtime Library Exception, version
> +3.1, as published by the Free Software Foundation.
> +
> +You should have received a copy of the GNU General Public License and
> +a copy of the GCC Runtime Library Exception along with this program;
> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +<http://www.gnu.org/licenses/>.  */
> +
> +Enum(stringop_alg) String(rep_byte) Value(rep_prefix_1_byte)
> +
> +#undef DEF_ENUM
> +#define DEF_ENUM EnumValue
> +
> +#undef DEF_ALG
> +#define DEF_ALG(alg, name) Enum(stringop_alg) String(name) Value(alg)
> +
> +#include "stringop.def"
> +
> +#undef DEF_ENUM
> +#undef DEF_ALG
> Index: config/i386/i386.c
> ===================================================================
> --- config/i386/i386.c	(revision 201458)
> +++ config/i386/i386.c	(working copy)
> @@ -156,7 +156,7 @@ struct processor_costs ix86_size_cost =
>  };
>  
>  /* Processor costs (relative to an add) */
> -static const
> +static
>  struct processor_costs i386_cost = {	/* 386 specific costs */
>    COSTS_N_INSNS (1),			/* cost of an add instruction */
>    COSTS_N_INSNS (1),			/* cost of a lea instruction */
> @@ -226,7 +226,7 @@ struct processor_costs i386_cost = {	/*
>    1,					/* cond_not_taken_branch_cost.  */
>  };
>  
> -static const
> +static
>  struct processor_costs i486_cost = {	/* 486 specific costs */
>    COSTS_N_INSNS (1),			/* cost of an add instruction */
>    COSTS_N_INSNS (1),			/* cost of a lea instruction */
> @@ -298,7 +298,7 @@ struct processor_costs i486_cost = {	/*
>    1,					/* cond_not_taken_branch_cost.  */
>  };
>  
> -static const
> +static
>  struct processor_costs pentium_cost = {
>    COSTS_N_INSNS (1),			/* cost of an add instruction */
>    COSTS_N_INSNS (1),			/* cost of a lea instruction */
> @@ -368,7 +368,7 @@ struct processor_costs pentium_cost = {
>    1,					/* cond_not_taken_branch_cost.  */
>  };
>  
> -static const
> +static
>  struct processor_costs pentiumpro_cost = {
>    COSTS_N_INSNS (1),			/* cost of an add instruction */
>    COSTS_N_INSNS (1),			/* cost of a lea instruction */
> @@ -447,7 +447,7 @@ struct processor_costs pentiumpro_cost =
>    1,					/* cond_not_taken_branch_cost.  */
>  };
>  
> -static const
> +static
>  struct processor_costs geode_cost = {
>    COSTS_N_INSNS (1),			/* cost of an add instruction */
>    COSTS_N_INSNS (1),			/* cost of a lea instruction */
> @@ -518,7 +518,7 @@ struct processor_costs geode_cost = {
>    1,					/* cond_not_taken_branch_cost.  */
>  };
>  
> -static const
> +static
>  struct processor_costs k6_cost = {
>    COSTS_N_INSNS (1),			/* cost of an add instruction */
>    COSTS_N_INSNS (2),			/* cost of a lea instruction */
> @@ -591,7 +591,7 @@ struct processor_costs k6_cost = {
>    1,					/* cond_not_taken_branch_cost.  */
>  };
>  
> -static const
> +static
>  struct processor_costs athlon_cost = {
>    COSTS_N_INSNS (1),			/* cost of an add instruction */
>    COSTS_N_INSNS (2),			/* cost of a lea instruction */
> @@ -664,7 +664,7 @@ struct processor_costs athlon_cost = {
>    1,					/* cond_not_taken_branch_cost.  */
>  };
>  
> -static const
> +static
>  struct processor_costs k8_cost = {
>    COSTS_N_INSNS (1),			/* cost of an add instruction */
>    COSTS_N_INSNS (2),			/* cost of a lea instruction */
> @@ -1265,7 +1265,7 @@ struct processor_costs btver2_cost = {
>    1,					/* cond_not_taken_branch_cost.  */
>  };
>  
> -static const
> +static
>  struct processor_costs pentium4_cost = {
>    COSTS_N_INSNS (1),			/* cost of an add instruction */
>    COSTS_N_INSNS (3),			/* cost of a lea instruction */
> @@ -1336,7 +1336,7 @@ struct processor_costs pentium4_cost = {
>    1,					/* cond_not_taken_branch_cost.  */
>  };
>  
> -static const
> +static
>  struct processor_costs nocona_cost = {
>    COSTS_N_INSNS (1),			/* cost of an add instruction */
>    COSTS_N_INSNS (1),			/* cost of a lea instruction */
> @@ -1409,7 +1409,7 @@ struct processor_costs nocona_cost = {
>    1,					/* cond_not_taken_branch_cost.  */
>  };
>  
> -static const
> +static
>  struct processor_costs atom_cost = {
>    COSTS_N_INSNS (1),			/* cost of an add instruction */
>    COSTS_N_INSNS (1) + 1,		/* cost of a lea instruction */
> @@ -1556,7 +1556,7 @@ struct processor_costs slm_cost = {
>  };
>  
>  /* Generic64 should produce code tuned for Nocona and K8.  */
> -static const
> +static
>  struct processor_costs generic64_cost = {
>    COSTS_N_INSNS (1),			/* cost of an add instruction */
>    /* On all chips taken into consideration lea is 2 cycles and more.  With
> @@ -1635,7 +1635,7 @@ struct processor_costs generic64_cost =
>  };
>  
>  /* core_cost should produce code tuned for Core familly of CPUs.  */
> -static const
> +static
>  struct processor_costs core_cost = {
>    COSTS_N_INSNS (1),			/* cost of an add instruction */
>    /* On all chips taken into consideration lea is 2 cycles and more.  With
> @@ -1717,7 +1717,7 @@ struct processor_costs core_cost = {
>  
>  /* Generic32 should produce code tuned for PPro, Pentium4, Nocona,
>     Athlon and K8.  */
> -static const
> +static
>  struct processor_costs generic32_cost = {
>    COSTS_N_INSNS (1),			/* cost of an add instruction */
>    COSTS_N_INSNS (1) + 1,		/* cost of a lea instruction */
> @@ -2900,6 +2900,150 @@ ix86_debug_options (void)
>  
>    return;
>  }
> +
> +static const char *stringop_alg_names[] = {
> +#define DEF_ENUM
> +#define DEF_ALG(alg, name) #name,
> +#include "stringop.def"
> +#undef DEF_ENUM
> +#undef DEF_ALG
> +};
> +
> +/* Parse parameter string passed to -mmemcpy-strategy= or -mmemset-strategy=.
> +   The string is of the following form (or comma separated list of it):
> +
> +     strategy_alg:max_size:[align|noalign]
> +
> +   where the full size range for the strategy is either [0, max_size] or
> +   [min_size, max_size], in which min_size is the max_size + 1 of the
> +   preceding range.  The last size range must have max_size == -1.
> +
> +   Examples:
> +
> +    1.
> +       -mmemcpy-strategy=libcall:-1:noalign
> +
> +      this is equivalent to (for known size memcpy) -mstringop-strategy=libcall
> +
> +
> +   2.
> +      -mmemset-strategy=rep_8byte:16:noalign,vector_loop:2048:align,libcall:-1:noalign
> +
> +      This tells the compiler to use the following strategy for memset:
> +      1) when the expected size is in [0, 16], use the rep_8byte strategy;
> +      2) when the size is in [17, 2048], use vector_loop;
> +      3) when the size is > 2048, use libcall.
> +
> +*/
> +
> +struct stringop_size_range
> +{
> +  int min;
> +  int max;
> +  stringop_alg alg;
> +  bool noalign;
> +};
> +
> +static void
> +ix86_parse_stringop_strategy_string (char *strategy_str, bool is_memset)
> +{
> +  const struct stringop_algs *default_algs;
> +  stringop_size_range input_ranges[MAX_STRINGOP_ALGS];
> +  char *curr_range_str, *next_range_str;
> +  int i = 0, n = 0;
> +
> +  if (is_memset)
> +    default_algs = &ix86_cost->memset[TARGET_64BIT != 0];
> +  else
> +    default_algs = &ix86_cost->memcpy[TARGET_64BIT != 0];
> +
> +  curr_range_str = strategy_str;
> +
> +  do {
> +
> +    int mins = 0, maxs;
> +    stringop_alg alg;
> +    char alg_name[128];
> +    char align[16];
> +
> +    next_range_str = strchr (curr_range_str, ',');
> +    if (next_range_str)
> +      *next_range_str++ = '\0';
> +
> +    if (3 != sscanf (curr_range_str, "%127[^:]:%d:%15s", alg_name, &maxs, align))
> +      {
> +        warning (0, "Wrong arg %s to option %s", curr_range_str,
> +                 is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
> +        return;
> +      }
> +
> +    if (n > 0 && (maxs < (mins = input_ranges[n - 1].max + 1) && maxs != -1))
> +      {
> +        warning (0, "Size ranges of option %s should be increasing",
> +                 is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
> +        return;
> +      }
> +
> +    for (i = 0; i < last_alg; i++)
> +      {
> +        if (!strcmp (alg_name, stringop_alg_names[i]))
> +	  {
> +	    alg = (stringop_alg) i;
> +	    break;
> +          }
> +      }
> +
> +    if (i == last_alg)
> +      {
> +        warning (0, "Wrong stringop strategy name %s specified for option %s",
> +	         alg_name,
> +                 is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
> +	return;
> +      }
> +
> +    input_ranges[n].min = mins;
> +    input_ranges[n].max = maxs;
> +    input_ranges[n].alg = alg;
> +    if (!strcmp (align, "align"))
> +      input_ranges[n].noalign = false;
> +    else if (!strcmp (align, "noalign"))
> +      input_ranges[n].noalign = true;
> +    else
> +      {
> +        warning (0, "Unknown alignment %s specified for option %s",
> +                 align, is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
> +        return;
> +      }
> +    n++;
> +    curr_range_str = next_range_str;
> +  } while (curr_range_str);
> +
> +  if (input_ranges[n - 1].max != -1)
> +    {
> +      warning (0, "The max value for the last size range should be -1"
> +               " for option %s",
> +               is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
> +      return;
> +    }
> +
> +  if (n > MAX_STRINGOP_ALGS)
> +    {
> +      warning (0, "Too many size ranges specified in option %s",
> +               is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
> +      return;
> +    }
> +
> +  /* Now override the default algs array  */
> +  for (i = 0; i < n; i++)
> +    {
> +      *const_cast<int *>(&default_algs->size[i].max) = input_ranges[i].max;
> +      *const_cast<stringop_alg *>(&default_algs->size[i].alg)
> +          = input_ranges[i].alg;
> +      *const_cast<int *>(&default_algs->size[i].noalign)
> +          = input_ranges[i].noalign;
> +    }
> +}
> +
>  
>  /* Override various settings based on options.  If MAIN_ARGS_P, the
>     options are from the command line, otherwise they are from
> @@ -4021,6 +4165,21 @@ ix86_option_override_internal (bool main
>    /* Handle stack protector */
>    if (!global_options_set.x_ix86_stack_protector_guard)
>      ix86_stack_protector_guard = TARGET_HAS_BIONIC ? SSP_GLOBAL : SSP_TLS;
> +
> +  /* Handle -mmemcpy-strategy= and -mmemset-strategy=  */
> +  if (ix86_tune_memcpy_strategy)
> +    {
> +      char *str = xstrdup (ix86_tune_memcpy_strategy);
> +      ix86_parse_stringop_strategy_string (str, false);
> +      free (str);
> +    }
> +
> +  if (ix86_tune_memset_strategy)
> +    {
> +      char *str = xstrdup (ix86_tune_memset_strategy);
> +      ix86_parse_stringop_strategy_string (str, true);
> +      free (str);
> +    }
>  }
>  
>  /* Implement the TARGET_OPTION_OVERRIDE hook.  */
> @@ -22903,6 +23062,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>      {
>      case libcall:
>      case no_stringop:
> +    case last_alg:
>        gcc_unreachable ();
>      case loop_1_byte:
>        need_zero_guard = true;
> @@ -23093,6 +23253,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>      {
>      case libcall:
>      case no_stringop:
> +    case last_alg:
>        gcc_unreachable ();
>      case loop_1_byte:
>      case loop:
> @@ -23304,6 +23465,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>      {
>      case libcall:
>      case no_stringop:
> +    case last_alg:
>        gcc_unreachable ();
>      case loop:
>        need_zero_guard = true;
> @@ -23481,6 +23643,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>      {
>      case libcall:
>      case no_stringop:
> +    case last_alg:
>        gcc_unreachable ();
>      case loop_1_byte:
>      case loop:
> Index: config/i386/i386-opts.h
> ===================================================================
> --- config/i386/i386-opts.h	(revision 201458)
> +++ config/i386/i386-opts.h	(working copy)
> @@ -28,15 +28,17 @@ see the files COPYING3 and COPYING.RUNTI
>  /* Algorithm to expand string function with.  */
>  enum stringop_alg
>  {
> -   no_stringop,
> -   libcall,
> -   rep_prefix_1_byte,
> -   rep_prefix_4_byte,
> -   rep_prefix_8_byte,
> -   loop_1_byte,
> -   loop,
> -   unrolled_loop,
> -   vector_loop
> +#undef DEF_ENUM
> +#define DEF_ENUM
> +
> +#undef DEF_ALG
> +#define DEF_ALG(alg, name) alg, 
> +
> +#include "stringop.def"
> +last_alg
> +
> +#undef DEF_ENUM
> +#undef DEF_ALG
>  };
>  
>  /* Available call abi.  */
> Index: doc/invoke.texi
> ===================================================================
> --- doc/invoke.texi	(revision 201458)
> +++ doc/invoke.texi	(working copy)
> @@ -649,6 +649,7 @@ Objective-C and Objective-C++ Dialects}.
>  -mbmi2 -mrtm -mlwp -mthreads @gol
>  -mno-align-stringops  -minline-all-stringops @gol
>  -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
> +-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy} @gol
>  -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
>  -m96bit-long-double -mlong-double-64 -mlong-double-80 @gol
>  -mregparm=@var{num}  -msseregparm @gol
> @@ -14598,6 +14599,24 @@ Expand into an inline loop.
>  Always use a library call.
>  @end table
>  
> +@item -mmemcpy-strategy=@var{strategy}
> +@opindex mmemcpy-strategy=@var{strategy}
> +Override the internal heuristic that decides whether @code{__builtin_memcpy}
> +should be inlined, and which inline algorithm to use, when the expected size
> +of the copy operation is known.  @var{strategy} is a comma-separated list of
> +@var{alg}:@var{max_size}:@var{dest_align} triplets.  @var{alg} is one of the
> +algorithms accepted by @option{-mstringop-strategy}, @var{max_size} is the
> +largest byte size for which @var{alg} is used, and @var{dest_align} is either
> +@code{align} or @code{noalign}.  The @var{max_size} values must be listed in
> +increasing order, and the @var{max_size} of the last triplet must be @code{-1}.
> +The smallest byte size for @var{alg} is @code{0} for the first triplet and
> +@code{@var{max_size} + 1} of the preceding triplet for each later one.
> +
> +@item -mmemset-strategy=@var{strategy}
> +@opindex mmemset-strategy=@var{strategy}
> +This option is similar to @option{-mmemcpy-strategy=}, except that it controls
> +@code{__builtin_memset} expansion.
> +
>  @item -momit-leaf-frame-pointer
>  @opindex momit-leaf-frame-pointer
>  Don't keep the frame pointer in a register for leaf functions.  This
> Index: testsuite/gcc.target/i386/memcpy-strategy-1.c
> ===================================================================
> --- testsuite/gcc.target/i386/memcpy-strategy-1.c	(revision 0)
> +++ testsuite/gcc.target/i386/memcpy-strategy-1.c	(revision 0)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:-1:align" } */
> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
> +
> +char a[2048];
> +char b[2048];
> +void t (void)
> +{
> +  __builtin_memcpy (a, b, 2048);
> +}
> +
> Index: testsuite/gcc.target/i386/memcpy-strategy-2.c
> ===================================================================
> --- testsuite/gcc.target/i386/memcpy-strategy-2.c	(revision 0)
> +++ testsuite/gcc.target/i386/memcpy-strategy-2.c	(revision 0)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:3000:align,libcall:-1:align" } */
> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
> +
> +char a[2048];
> +char b[2048];
> +void t (void)
> +{
> +  __builtin_memcpy (a, b, 2048);
> +}
> +
> Index: testsuite/gcc.target/i386/memset-strategy-1.c
> ===================================================================
> --- testsuite/gcc.target/i386/memset-strategy-1.c	(revision 0)
> +++ testsuite/gcc.target/i386/memset-strategy-1.c	(revision 0)
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=atom -mmemset-strategy=libcall:-1:align" } */
> +/* { dg-final { scan-assembler-times "memset" 2  } } */
> +
> +char a[2048];
> +void t (void)
> +{
> +  __builtin_memset (a, 1, 2048);
> +}
> +
> Index: testsuite/gcc.target/i386/memcpy-strategy-3.c
> ===================================================================
> --- testsuite/gcc.target/i386/memcpy-strategy-3.c	(revision 0)
> +++ testsuite/gcc.target/i386/memcpy-strategy-3.c	(revision 0)
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:2000:align,libcall:-1:align" } */
> +/* { dg-final { scan-assembler-times "memcpy" 2  } } */
> +
> +char a[2048];
> +char b[2048];
> +void t (void)
> +{
> +  __builtin_memcpy (a, b, 2048);
> +}
> +

Patch

Index: config/i386/stringop.def
===================================================================
--- config/i386/stringop.def	(revision 0)
+++ config/i386/stringop.def	(revision 0)
@@ -0,0 +1,42 @@ 
+/* Definitions for option handling for IA-32.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+DEF_ENUM
+DEF_ALG (no_stringop, no_stringop)
+DEF_ENUM
+DEF_ALG (libcall, libcall)
+DEF_ENUM
+DEF_ALG (rep_prefix_1_byte, rep_byte)
+DEF_ENUM
+DEF_ALG (rep_prefix_4_byte, rep_4byte)
+DEF_ENUM
+DEF_ALG (rep_prefix_8_byte, rep_8byte)
+DEF_ENUM
+DEF_ALG (loop_1_byte, byte_loop)
+DEF_ENUM
+DEF_ALG (loop, loop)
+DEF_ENUM
+DEF_ALG (unrolled_loop, unrolled_loop)
+DEF_ENUM
+DEF_ALG (vector_loop, vector_loop)
Index: config/i386/i386.opt
===================================================================
--- config/i386/i386.opt	(revision 201458)
+++ config/i386/i386.opt	(working copy)
@@ -316,6 +316,14 @@  mstack-arg-probe
 Target Report Mask(STACK_PROBE) Save
 Enable stack probing
 
+mmemcpy-strategy=
+Target RejectNegative Joined Var(ix86_tune_memcpy_strategy)
+Specify memcpy expansion strategy when expected size is known
+
+mmemset-strategy=
+Target RejectNegative Joined Var(ix86_tune_memset_strategy)
+Specify memset expansion strategy when expected size is known
+
 mstringop-strategy=
 Target RejectNegative Joined Enum(stringop_alg) Var(ix86_stringop_alg) Init(no_stringop)
 Chose strategy to generate stringop using
Index: config/i386/stringop.opt
===================================================================
--- config/i386/stringop.opt	(revision 0)
+++ config/i386/stringop.opt	(revision 0)
@@ -0,0 +1,36 @@ 
+/* Definitions for option handling for IA-32.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+Enum(stringop_alg) String(rep_byte) Value(rep_prefix_1_byte)
+
+#undef DEF_ENUM
+#define DEF_ENUM EnumValue
+
+#undef DEF_ALG
+#define DEF_ALG(alg, name) Enum(stringop_alg) String(name) Value(alg)
+
+#include "stringop.def"
+
+#undef DEF_ENUM
+#undef DEF_ALG
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 201458)
+++ config/i386/i386.c	(working copy)
@@ -156,7 +156,7 @@  struct processor_costs ix86_size_cost =
 };
 
 /* Processor costs (relative to an add) */
-static const
+static
 struct processor_costs i386_cost = {	/* 386 specific costs */
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (1),			/* cost of a lea instruction */
@@ -226,7 +226,7 @@  struct processor_costs i386_cost = {	/*
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs i486_cost = {	/* 486 specific costs */
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (1),			/* cost of a lea instruction */
@@ -298,7 +298,7 @@  struct processor_costs i486_cost = {	/*
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs pentium_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (1),			/* cost of a lea instruction */
@@ -368,7 +368,7 @@  struct processor_costs pentium_cost = {
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs pentiumpro_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (1),			/* cost of a lea instruction */
@@ -447,7 +447,7 @@  struct processor_costs pentiumpro_cost =
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs geode_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (1),			/* cost of a lea instruction */
@@ -518,7 +518,7 @@  struct processor_costs geode_cost = {
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs k6_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (2),			/* cost of a lea instruction */
@@ -591,7 +591,7 @@  struct processor_costs k6_cost = {
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs athlon_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (2),			/* cost of a lea instruction */
@@ -664,7 +664,7 @@  struct processor_costs athlon_cost = {
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs k8_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (2),			/* cost of a lea instruction */
@@ -1265,7 +1265,7 @@  struct processor_costs btver2_cost = {
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs pentium4_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (3),			/* cost of a lea instruction */
@@ -1336,7 +1336,7 @@  struct processor_costs pentium4_cost = {
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs nocona_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (1),			/* cost of a lea instruction */
@@ -1409,7 +1409,7 @@  struct processor_costs nocona_cost = {
   1,					/* cond_not_taken_branch_cost.  */
 };
 
-static const
+static
 struct processor_costs atom_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (1) + 1,		/* cost of a lea instruction */
@@ -1556,7 +1556,7 @@  struct processor_costs slm_cost = {
 };
 
 /* Generic64 should produce code tuned for Nocona and K8.  */
-static const
+static
 struct processor_costs generic64_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   /* On all chips taken into consideration lea is 2 cycles and more.  With
@@ -1635,7 +1635,7 @@  struct processor_costs generic64_cost =
 };
 
 /* core_cost should produce code tuned for Core familly of CPUs.  */
-static const
+static
 struct processor_costs core_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   /* On all chips taken into consideration lea is 2 cycles and more.  With
@@ -1717,7 +1717,7 @@  struct processor_costs core_cost = {
 
 /* Generic32 should produce code tuned for PPro, Pentium4, Nocona,
    Athlon and K8.  */
-static const
+static
 struct processor_costs generic32_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
   COSTS_N_INSNS (1) + 1,		/* cost of a lea instruction */
@@ -2900,6 +2900,150 @@  ix86_debug_options (void)
 
   return;
 }
+
+static const char *stringop_alg_names[] = {
+#define DEF_ENUM
+#define DEF_ALG(alg, name) #name,
+#include "stringop.def"
+#undef DEF_ENUM
+#undef DEF_ALG
+};
+
+/* Parse parameter string passed to -mmemcpy-strategy= or -mmemset-strategy=.
+   The string is a comma-separated list of one or more triplets of the form:
+
+     strategy_alg:max_size:[align|noalign]
+
+   where the size range for the strategy is [0, max_size] for the first
+   triplet and [min_size, max_size] otherwise, min_size being max_size + 1
+   of the preceding range.  The last size range must have max_size == -1.
+
+   Examples:
+
+   1.
+      -mmemcpy-strategy=libcall:-1:noalign
+
+      For memcpy of known size, this is equivalent to
+      -mstringop-strategy=libcall.
+
+   2.
+      -mmemset-strategy=rep_8byte:16:noalign,vector_loop:2048:align,libcall:-1:noalign
+
+      This tells the compiler to use the following strategy for memset:
+      1) when the expected size is in [0, 16], use rep_8byte;
+      2) when the expected size is in [17, 2048], use vector_loop;
+      3) when the expected size is > 2048, use libcall.
+
+*/
+
+struct stringop_size_range
+{
+  int min;
+  int max;
+  stringop_alg alg;
+  bool noalign;
+};
+
+static void
+ix86_parse_stringop_strategy_string (char *strategy_str, bool is_memset)
+{
+  const struct stringop_algs *default_algs;
+  stringop_size_range input_ranges[MAX_STRINGOP_ALGS];
+  char *curr_range_str, *next_range_str;
+  int i = 0, n = 0;
+
+  if (is_memset)
+    default_algs = &ix86_cost->memset[TARGET_64BIT != 0];
+  else
+    default_algs = &ix86_cost->memcpy[TARGET_64BIT != 0];
+
+  curr_range_str = strategy_str;
+
+  do {
+
+    int mins = 0, maxs;
+    stringop_alg alg;
+    char alg_name[128];
+    char align[16];
+
+    next_range_str = strchr (curr_range_str, ',');
+    if (next_range_str)
+      *next_range_str++ = '\0';
+
+    if (3 != sscanf (curr_range_str, "%127[^:]:%d:%15s", alg_name, &maxs, align))
+      {
+        warning (0, "Wrong arg %s to option %s", curr_range_str,
+                 is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
+        return;
+      }
+
+    if (n > 0 && (maxs < (mins = input_ranges[n - 1].max + 1) && maxs != -1))
+      {
+        warning (0, "Size ranges of option %s should be increasing",
+                 is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
+        return;
+      }
+
+    for (i = 0; i < last_alg; i++)
+      {
+        if (!strcmp (alg_name, stringop_alg_names[i]))
+	  {
+	    alg = (stringop_alg) i;
+	    break;
+          }
+      }
+
+    if (i == last_alg)
+      {
+        warning (0, "Wrong stringop strategy name %s specified for option %s",
+	         alg_name,
+                 is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
+	return;
+      }
+
+    input_ranges[n].min = mins;
+    input_ranges[n].max = maxs;
+    input_ranges[n].alg = alg;
+    if (!strcmp (align, "align"))
+      input_ranges[n].noalign = false;
+    else if (!strcmp (align, "noalign"))
+      input_ranges[n].noalign = true;
+    else
+      {
+        warning (0, "Unknown alignment %s specified for option %s",
+                 align, is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
+        return;
+      }
+    n++;
+    curr_range_str = next_range_str;
+  } while (curr_range_str && n < MAX_STRINGOP_ALGS);
+  /* Stop before input_ranges overflows; leftover ranges are an error.  */
+  if (curr_range_str)
+    {
+      warning (0, "Too many size ranges specified in option %s",
+               is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
+      return;
+    }
+
+  if (input_ranges[n - 1].max != -1)
+    {
+      warning (0, "The max value for the last size range should be -1"
+               " for option %s",
+               is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
+      return;
+    }
+
+  /* Now override the default algs array  */
+  for (i = 0; i < n; i++)
+    {
+      *const_cast<int *>(&default_algs->size[i].max) = input_ranges[i].max;
+      *const_cast<stringop_alg *>(&default_algs->size[i].alg)
+          = input_ranges[i].alg;
+      *const_cast<int *>(&default_algs->size[i].noalign)
+          = input_ranges[i].noalign;
+    }
+}
+
 
 /* Override various settings based on options.  If MAIN_ARGS_P, the
    options are from the command line, otherwise they are from
@@ -4021,6 +4165,21 @@  ix86_option_override_internal (bool main
   /* Handle stack protector */
   if (!global_options_set.x_ix86_stack_protector_guard)
     ix86_stack_protector_guard = TARGET_HAS_BIONIC ? SSP_GLOBAL : SSP_TLS;
+
+  /* Handle -mmemcpy-strategy= and -mmemset-strategy=  */
+  if (ix86_tune_memcpy_strategy)
+    {
+      char *str = xstrdup (ix86_tune_memcpy_strategy);
+      ix86_parse_stringop_strategy_string (str, false);
+      free (str);
+    }
+
+  if (ix86_tune_memset_strategy)
+    {
+      char *str = xstrdup (ix86_tune_memset_strategy);
+      ix86_parse_stringop_strategy_string (str, true);
+      free (str);
+    }
 }
 
 /* Implement the TARGET_OPTION_OVERRIDE hook.  */
@@ -22903,6 +23062,7 @@  ix86_expand_movmem (rtx dst, rtx src, rt
     {
     case libcall:
     case no_stringop:
+    case last_alg:
       gcc_unreachable ();
     case loop_1_byte:
       need_zero_guard = true;
@@ -23093,6 +23253,7 @@  ix86_expand_movmem (rtx dst, rtx src, rt
     {
     case libcall:
     case no_stringop:
+    case last_alg:
       gcc_unreachable ();
     case loop_1_byte:
     case loop:
@@ -23304,6 +23465,7 @@  ix86_expand_setmem (rtx dst, rtx count_e
     {
     case libcall:
     case no_stringop:
+    case last_alg:
       gcc_unreachable ();
     case loop:
       need_zero_guard = true;
@@ -23481,6 +23643,7 @@  ix86_expand_setmem (rtx dst, rtx count_e
     {
     case libcall:
     case no_stringop:
+    case last_alg:
       gcc_unreachable ();
     case loop_1_byte:
     case loop:
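For reference while reviewing ix86_parse_stringop_strategy_string, here is a
minimal standalone sketch of the same parsing rules in plain C. It is not the
patch's code: MAX_RANGES, struct size_range and parse_strategy are
hypothetical names, the algorithm name is kept as a string rather than mapped
to stringop_alg, and errors are reported through the return value instead of
warning (). It bounds the sscanf fields and checks the range count before
writing into the array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_RANGES 4	/* Hypothetical stand-in for MAX_STRINGOP_ALGS.  */

struct size_range	/* Hypothetical; loosely mirrors stringop_size_range.  */
{
  int min;
  int max;
  char alg[32];
  int noalign;
};

/* Parse "alg:max_size:align|noalign[,...]" into OUT, which must have room
   for MAX_RANGES entries.  STR is modified in place.  Return the number of
   ranges parsed, or -1 if the string is malformed.  */
static int
parse_strategy (char *str, struct size_range *out)
{
  int n = 0;
  char *curr = str;

  assert (str != NULL && out != NULL);

  while (curr)
    {
      char align[16];
      char *next = strchr (curr, ',');
      /* The first range starts at 0, later ones at the previous max + 1.  */
      int min = n > 0 ? out[n - 1].max + 1 : 0;

      if (next)
	*next++ = '\0';
      if (n >= MAX_RANGES)
	return -1;		/* Too many ranges.  */
      /* Field widths keep sscanf inside alg[32] and align[16].  */
      if (sscanf (curr, "%31[^:]:%d:%15s", out[n].alg, &out[n].max, align) != 3)
	return -1;		/* Not a well-formed triplet.  */
      if (n > 0 && out[n].max != -1 && out[n].max < min)
	return -1;		/* max_size values must increase.  */
      if (strcmp (align, "align") == 0)
	out[n].noalign = 0;
      else if (strcmp (align, "noalign") == 0)
	out[n].noalign = 1;
      else
	return -1;		/* Unknown alignment keyword.  */
      out[n].min = min;
      n++;
      curr = next;
    }

  /* The last range must be open-ended.  */
  return (n > 0 && out[n - 1].max == -1) ? n : -1;
}
```

Fed the memset example from the comment above
(rep_8byte:16:noalign,vector_loop:2048:align,libcall:-1:noalign), this sketch
yields three ranges starting at 0, 17 and 2049.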
Index: config/i386/i386-opts.h
===================================================================
--- config/i386/i386-opts.h	(revision 201458)
+++ config/i386/i386-opts.h	(working copy)
@@ -28,15 +28,17 @@  see the files COPYING3 and COPYING.RUNTI
 /* Algorithm to expand string function with.  */
 enum stringop_alg
 {
-   no_stringop,
-   libcall,
-   rep_prefix_1_byte,
-   rep_prefix_4_byte,
-   rep_prefix_8_byte,
-   loop_1_byte,
-   loop,
-   unrolled_loop,
-   vector_loop
+#undef DEF_ENUM
+#define DEF_ENUM
+
+#undef DEF_ALG
+#define DEF_ALG(alg, name) alg,
+
+#include "stringop.def"
+last_alg
+
+#undef DEF_ENUM
+#undef DEF_ALG
 };
 
 /* Available call abi.  */
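The enum above and the stringop_alg_names table in i386.c are both generated
from stringop.def by redefining DEF_ALG before each include. A self-contained
sketch of that X-macro pattern follows. Since a separate .def file cannot be
shown inline, the list is written as a macro here; STRINGOP_ALGS, AS_ENUM,
AS_NAME and lookup_alg are hypothetical names, and the four entries are only
an illustrative subset, not the contents of stringop.def.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical inline stand-in for stringop.def: one entry per algorithm,
   giving the enumerator and the name accepted on the command line.  */
#define STRINGOP_ALGS(DEF_ALG) \
  DEF_ALG (no_stringop, no_stringop) \
  DEF_ALG (libcall, libcall) \
  DEF_ALG (rep_prefix_8_byte, rep_8byte) \
  DEF_ALG (vector_loop, vector_loop)

/* First expansion: the enumerators, followed by a last_alg sentinel.  */
#define AS_ENUM(alg, name) alg,
enum stringop_alg { STRINGOP_ALGS (AS_ENUM) last_alg };
#undef AS_ENUM

/* Second expansion: the option names, in the same order as the enum.  */
#define AS_NAME(alg, name) #name,
static const char *const stringop_alg_names[] = { STRINGOP_ALGS (AS_NAME) };
#undef AS_NAME

/* Map an option name to its enumerator; return last_alg if unknown.  */
static enum stringop_alg
lookup_alg (const char *name)
{
  int i;

  assert (name != NULL);
  for (i = 0; i < last_alg; i++)
    if (strcmp (name, stringop_alg_names[i]) == 0)
      return (enum stringop_alg) i;
  return last_alg;
}
```

Because both expansions come from one list, the enum and the name table
cannot drift apart, which is what lets the parser use last_alg as both the
loop bound and the "not found" value.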
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 201458)
+++ doc/invoke.texi	(working copy)
@@ -649,6 +649,7 @@  Objective-C and Objective-C++ Dialects}.
 -mbmi2 -mrtm -mlwp -mthreads @gol
 -mno-align-stringops  -minline-all-stringops @gol
 -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
+-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy} @gol
 -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
 -m96bit-long-double -mlong-double-64 -mlong-double-80 @gol
 -mregparm=@var{num}  -msseregparm @gol
@@ -14598,6 +14599,24 @@  Expand into an inline loop.
 Always use a library call.
 @end table
 
+@item -mmemcpy-strategy=@var{strategy}
+@opindex mmemcpy-strategy=@var{strategy}
+Override the internal heuristic that decides whether @code{__builtin_memcpy}
+is inlined, and which inline algorithm is used, when the expected size of
+the copy operation is known.  @var{strategy} is a comma-separated list of
+@var{alg}:@var{max_size}:@var{dest_align} triplets.  @var{alg} is one of the
+algorithms accepted by @option{-mstringop-strategy}; @var{max_size} is the
+largest byte size for which @var{alg} is used; @var{dest_align} is either
+@code{align} or @code{noalign}.  The @var{max_size} values must be given in
+increasing order, and the last triplet must use @code{-1}.  The smallest byte
+size for @var{alg} is @code{0} for the first triplet and the preceding
+triplet's @var{max_size} plus one for every later triplet.
+
+@item -mmemset-strategy=@var{strategy}
+@opindex mmemset-strategy=@var{strategy}
+Like @option{-mmemcpy-strategy=}, except that it controls the expansion of
+@code{__builtin_memset}.
+
 @item -momit-leaf-frame-pointer
 @opindex momit-leaf-frame-pointer
 Don't keep the frame pointer in a register for leaf functions.  This
Index: testsuite/gcc.target/i386/memcpy-strategy-1.c
===================================================================
--- testsuite/gcc.target/i386/memcpy-strategy-1.c	(revision 0)
+++ testsuite/gcc.target/i386/memcpy-strategy-1.c	(revision 0)
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:-1:align" } */
+/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
+/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
+
+char a[2048];
+char b[2048];
+void t (void)
+{
+  __builtin_memcpy (a, b, 2048);
+}
+
Index: testsuite/gcc.target/i386/memcpy-strategy-2.c
===================================================================
--- testsuite/gcc.target/i386/memcpy-strategy-2.c	(revision 0)
+++ testsuite/gcc.target/i386/memcpy-strategy-2.c	(revision 0)
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:3000:align,libcall:-1:align" } */
+/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
+/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
+
+char a[2048];
+char b[2048];
+void t (void)
+{
+  __builtin_memcpy (a, b, 2048);
+}
+
Index: testsuite/gcc.target/i386/memset-strategy-1.c
===================================================================
--- testsuite/gcc.target/i386/memset-strategy-1.c	(revision 0)
+++ testsuite/gcc.target/i386/memset-strategy-1.c	(revision 0)
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=atom -mmemset-strategy=libcall:-1:align" } */
+/* { dg-final { scan-assembler-times "memset" 2  } } */
+
+char a[2048];
+void t (void)
+{
+  __builtin_memset (a, 1, 2048);
+}
+
Index: testsuite/gcc.target/i386/memcpy-strategy-3.c
===================================================================
--- testsuite/gcc.target/i386/memcpy-strategy-3.c	(revision 0)
+++ testsuite/gcc.target/i386/memcpy-strategy-3.c	(revision 0)
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:2000:align,libcall:-1:align" } */
+/* { dg-final { scan-assembler-times "memcpy" 2  } } */
+
+char a[2048];
+char b[2048];
+void t (void)
+{
+  __builtin_memcpy (a, b, 2048);
+}
+
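memcpy-strategy-2.c and memcpy-strategy-3.c differ only in the first
max_size (3000 versus 2000) for the same 2048-byte copy, so one expects inline
vector_loop code and the other a call to memcpy. The sketch below models that
range lookup in isolation. It is a simplified illustration, not GCC's actual
selection logic, which takes more into account than the size table;
strategy_entry and pick_alg are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One alg:max_size entry; max_size == -1 marks the open-ended last range.  */
struct strategy_entry
{
  const char *alg;
  long max_size;
};

/* Return the algorithm name of the first range that contains SIZE.  */
static const char *
pick_alg (const struct strategy_entry *table, size_t n, long size)
{
  size_t i;

  /* The option requires the last range to be open-ended.  */
  assert (n > 0 && table[n - 1].max_size == -1);

  for (i = 0; i < n; i++)
    if (table[i].max_size == -1 || size <= table[i].max_size)
      return table[i].alg;
  return NULL;	/* Not reached while the assertion above holds.  */
}
```

With the table from memcpy-strategy-2.c a size of 2048 falls in the first
range; with the table from memcpy-strategy-3.c it falls through to libcall.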