Dynamic dispatch of multiversioned functions and CPU mocks for code coverage.

Message ID CAAs8HmwJSCONWpMfVNFxo=Qz3B=kK9T+suLZ7JH-JRkr07G=wA@mail.gmail.com
State New
Commit Message

Sriraman Tallam May 10, 2013, 2:30 a.m. UTC
Hi,

   This patch is an enhancement to the Function Multiversioning
feature. This patch achieves two things:

*  Primarily, this patch makes it easy to test for code coverage
   of multiversioned functions.
*  Secondarily, it makes function multiversioning work when there
   is no ifunc support. Since it invokes the dispatcher for every
   call, it is possible to execute different function versions every
   time. This incurs a performance penalty.

Here is a motivating example for the code coverage case:

__attribute__((target ("default"))) int foo () { ... return 0; }
__attribute__((target ("sse"))) int foo () { ... return 1; }
__attribute__((target ("popcnt"))) int foo () { ... return 2; }

int main ()
{
  return foo();
}

Let's say your test CPU supports popcnt.  A run of this program will
invoke the popcnt version of foo (). Then, how do we test the sse
version of foo ()? To do that for the above example, we need to run
this code on a CPU that has sse support but no popcnt support.
Otherwise, we need to comment out the popcnt version and run this
example. This can get painful when there are many versions. The same
argument applies to testing the default version of foo.

So, I am introducing the ability to mock a CPU. If the CPU you are
testing on supports sse, you should be able to test the sse version.

First, I have introduced a new flag called -fmultiversion-dynamic-dispatch.
With this flag, the function version dispatcher is invoked every time a
call to foo () is made. Without the flag, the version dispatch happens once
at startup time via the IFUNC mechanism.

Also, with -fmultiversion-dynamic-dispatch, the version dispatcher uses
the two new builtins "__builtin_mock_cpu_is" and "__builtin_mock_cpu_supports"
to check the cpu type and cpu isa.
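
To illustrate what per-call dispatch means, here is a hand-written sketch in
plain C of roughly what such a dispatcher does. It is not the code the compiler
generates. The names foo_default, foo_sse, foo_popcnt, foo_dispatch,
set_mocked_isa and mock_supports are all hypothetical, and the sketch assumes
that popcnt is checked before sse.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the three versions of foo ().  */
static int foo_default (void) { return 0; }
static int foo_sse (void) { return 1; }
static int foo_popcnt (void) { return 2; }

/* Hypothetical mock state: the single ISA name currently mocked,
   or "" when nothing is mocked.  */
static const char *mocked_isa = "";

void
set_mocked_isa (const char *isa)
{
  assert (isa != NULL);
  mocked_isa = isa;
}

/* Stand-in for the mock feature check; it only looks at the mock
   state and never at the real CPU.  */
static int
mock_supports (const char *isa)
{
  return strcmp (isa, mocked_isa) == 0;
}

/* Runs on every call, so a change to the mock state between two
   calls selects a different version.  Checks are ordered from the
   (assumed) highest priority version to the default.  */
int
foo_dispatch (void)
{
  if (mock_supports ("popcnt"))
    return foo_popcnt ();
  if (mock_supports ("sse"))
    return foo_sse ();
  return foo_default ();
}
```

With IFUNC the choice would be made once at startup. Here, calling
set_mocked_isa ("sse") and then foo_dispatch () reaches the sse version even
on a CPU that also supports popcnt.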

Then, I plan to add the following hooks to libgcc (in a different patch) :

int set_mock_cpu_is (const char *cpu);
int set_mock_cpu_supports (const char *isa);
int init_mock_cpu (); // Clear the values of the mock cpu.

With this support, here is how you can test for code coverage of the
"sse" version and the "default" version of foo in the above example:

int main ()
{
  // Test SSE version.
  if (__builtin_cpu_supports ("sse"))
    {
      init_mock_cpu ();
      set_mock_cpu_supports ("sse");
      assert (foo () == 1);
    }
  // Test default version.
  init_mock_cpu ();
  assert (foo () == 0);
}

Invoking a multiversioned binary several times with appropriate mock
cpu values for the various ISAs and CPUs will give the complete code
coverage desired. Of course, the underlying platform should be able to
support the various features.

Note that the above test will work only with -fmultiversion-dynamic-dispatch
as the dispatcher must be invoked on every multiversioned call to be able to
dynamically change the version.

Multiple ISA features can be set in the mock cpu by calling
"set_mock_cpu_supports" several times with different ISA names.
Calling "init_mock_cpu" will clear all the values. "set_mock_cpu_is"
will set the CPU type.
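
Since the libgcc side is not written yet, the following is only a sketch of
how the three hooks could keep their state. It assumes a bitmask over a fixed
table of ISA names. The table contents, the return values (0 on success, -1
for an unknown name) and the query helpers mock_cpu_supports and mock_cpu_is
are assumptions made for illustration and are not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ISA names borrowed from the test case; the table itself is an
   assumption of this sketch.  */
static const char *const isa_names[] = {
  "mmx", "sse", "sse2", "sse3", "ssse3",
  "sse4.1", "sse4.2", "popcnt", "avx", "avx2"
};
#define NUM_ISAS (sizeof isa_names / sizeof isa_names[0])

static unsigned int mock_isa_mask;  /* Bit i set: isa_names[i] is mocked.  */
static const char *mock_cpu_name;   /* NULL: no CPU type is mocked.  */

/* Return the table index of ISA, or -1 if it is not in the table.  */
static int
isa_index (const char *isa)
{
  size_t i;
  assert (isa != NULL);
  for (i = 0; i < NUM_ISAS; i++)
    if (strcmp (isa, isa_names[i]) == 0)
      return (int) i;
  return -1;
}

/* Clear all mocked values.  */
int
init_mock_cpu (void)
{
  mock_isa_mask = 0;
  mock_cpu_name = NULL;
  return 0;
}

/* Add ISA to the mocked feature set; earlier features are kept.  */
int
set_mock_cpu_supports (const char *isa)
{
  int i = isa_index (isa);
  if (i < 0)
    return -1;
  mock_isa_mask |= 1u << i;
  return 0;
}

/* Set the mocked CPU type.  The string is not copied.  */
int
set_mock_cpu_is (const char *cpu)
{
  mock_cpu_name = cpu;
  return 0;
}

/* Hypothetical queries that the mock builtins could consult.  */
int
mock_cpu_supports (const char *isa)
{
  int i = isa_index (isa);
  return i >= 0 && ((mock_isa_mask >> i) & 1u) != 0;
}

int
mock_cpu_is (const char *cpu)
{
  return mock_cpu_name != NULL && strcmp (cpu, mock_cpu_name) == 0;
}
```

Calling set_mock_cpu_supports twice with "sse" and "popcnt" leaves both bits
set, and init_mock_cpu clears them again, matching the behaviour described
above.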

This patch only includes the gcc changes.  I will separately prepare a
patch for the libgcc changes. Right now, since the libgcc changes are
not available, the two new mock cpu builtins check the real CPU, like
"__builtin_cpu_is" and "__builtin_cpu_supports".

Patch attached.  Please look at mv14_debug_code_coverage.C for an
exhaustive example of testing for code coverage in the presence of
multiple versions.

This patch was already discussed when sent earlier to google/gcc-4_7
branch.  That is here: http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00557.html

Some of the alternatives suggested there are:

* Lazy IFUNC relocation, which got shot down due to problems with bad
interactions with other shared libraries.
* Using environment variables to mock CPU architectures:  This may still be
plausible. For instance:
LD_CPU_FEATURES=sse,sse2 ./a.out  # run as if only sse and sse2 are available

However, with dynamic dispatch, there is the unique advantage of executing
different function versions in the same execution.
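
For comparison, the environment variable alternative could look roughly like
the sketch below. LD_CPU_FEATURES is only the example name used above and
nothing reads it today. The helpers feature_listed and env_cpu_supports are
hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if NAME is one whole item of the comma-separated LIST,
   so that "sse" does not match the item "sse2".  */
int
feature_listed (const char *list, const char *name)
{
  size_t len;
  assert (name != NULL);
  len = strlen (name);
  while (list != NULL && *list != '\0')
    {
      const char *end = strchr (list, ',');
      size_t item_len = end ? (size_t) (end - list) : strlen (list);
      if (item_len == len && strncmp (list, name, len) == 0)
	return 1;
      list = end ? end + 1 : NULL;
    }
  return 0;
}

/* Check NAME against the hypothetical LD_CPU_FEATURES variable.
   Returns 0 when the variable is unset; a real implementation would
   presumably fall back to the real CPU check in that case.  */
int
env_cpu_supports (const char *name)
{
  const char *list = getenv ("LD_CPU_FEATURES");
  return list != NULL && feature_listed (list, name);
}
```

The variable is fixed for the whole run, so this approach selects one version
per execution, whereas the mock hooks can switch versions between calls.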


Patch attached.  Comments please.

Thanks
Sri

	* cgraphunit.c (cgraph_analyze_function): Pass value of
	-fmultiversion-dynamic-dispatch when building resolver.
	* common.opt (fmultiversion-dynamic-dispatch): New flag.
	* target.def (generate_version_dispatcher_body): New
	parameter.
	* doc/tm.texi (TARGET_GENERATE_VERSION_DISPATCHER_BODY):
	Regenerate.
	* doc/tm.texi.in (TARGET_GENERATE_VERSION_DISPATCHER_BODY):
	Update.
	* doc/invoke.texi (-fmultiversion-dynamic-dispatch): Document
	new flag.
	* testsuite/g++.dg/ext/mv1_debug.C: New test.
	* testsuite/g++.dg/ext/mv2_debug.C: New test.
	* testsuite/g++.dg/ext/mv6_debug.C: New test.
	* testsuite/g++.dg/ext/mv14_debug_code_coverage.C: New test.
	* config/i386/i386.c (IX86_BUILTIN_MOCK_CPU_IS): New enum
	value.
	(IX86_BUILTIN_MOCK_CPU_SUPPORTS): Ditto.
	(add_condition_to_bb): New parameter. Handle code gen when
	dynamic dispatch is needed.
	(get_builtin_code_for_version): New parameter.  Handle
	dynamic dispatch.
	(ix86_compare_version_priority): Call get_builtin_code_for_version
	with updated parameters.
	(dispatch_function_versions): New parameter.  Handle dynamic
	dispatch.
	(make_resolver_func): New parameter.  Handle dynamic dispatch.
	(ix86_generate_version_dispatcher_body): Ditto.
	(ix86_init_platform_type_builtins): New builtins.
	(ix86_expand_builtin): Expand new builtins.

Comments

Joseph Myers May 10, 2013, 1:34 p.m. UTC | #1
On Thu, 9 May 2013, Sriraman Tallam wrote:

> Then, I plan to add the following hooks to libgcc (in a different patch) :
> 
> int set_mock_cpu_is (const char *cpu);
> int set_mock_cpu_supports (const char *isa);
> int init_mock_cpu (); // Clear the values of the mock cpu.

Those names are in the user's namespace; I think libgcc should only 
provide or use symbols in the implementation namespace.
Sriraman Tallam May 10, 2013, 6 p.m. UTC | #2
On Fri, May 10, 2013 at 6:34 AM, Joseph S. Myers
<joseph@codesourcery.com> wrote:
> On Thu, 9 May 2013, Sriraman Tallam wrote:
>
>> Then, I plan to add the following hooks to libgcc (in a different patch) :
>>
>> int set_mock_cpu_is (const char *cpu);
>> int set_mock_cpu_supports (const char *isa);
>> int init_mock_cpu (); // Clear the values of the mock cpu.
>
> Those names are in the user's namespace; I think libgcc should only
> provide or use symbols in the implementation namespace.

Shall I just use __builtin prefixes for these too?, would that work?

Thanks
Sri

>
> --
> Joseph S. Myers
> joseph@codesourcery.com
Joseph Myers May 10, 2013, 9:09 p.m. UTC | #3
On Fri, 10 May 2013, Sriraman Tallam wrote:

> On Fri, May 10, 2013 at 6:34 AM, Joseph S. Myers
> <joseph@codesourcery.com> wrote:
> > On Thu, 9 May 2013, Sriraman Tallam wrote:
> >
> >> Then, I plan to add the following hooks to libgcc (in a different patch) :
> >>
> >> int set_mock_cpu_is (const char *cpu);
> >> int set_mock_cpu_supports (const char *isa);
> >> int init_mock_cpu (); // Clear the values of the mock cpu.
> >
> > Those names are in the user's namespace; I think libgcc should only
> > provide or use symbols in the implementation namespace.
> 
> Shall I just use __builtin prefixes for these too?, would that work?

I'm not sure if that's a good idea for something that's actually a library 
function (we've previously discussed rejecting explicit declarations of 
__builtin_* identifiers to some extent - see bug 32455 - which would be an 
issue for defining library functions with such a name if we do decide in 
future to reject such declarations), but use __ prefixes in some form, 
certainly.
Xinliang David Li May 13, 2013, 6:26 p.m. UTC | #4
The MV testing support includes 3 logical parts:
1) runtime APIs to check mocked CPU types and features
(__builtin_mock_cpu_supports ..)
2) runtime APIs to do CPU mocking;
3) compile time option to do lazy dispatching (instead of using IFUNC).

3)  can also be used to support targets without IFUNC support, but it
should be handled differently -- for instance, it does not need an
option, nor should it use the mock version of the feature testing.

I like the flexibility the patch provides for testing -- it allows
global mocking via environment variable, and fine-grained mocking at
each callsite. The former is good for application testing, and the latter
is suitable for unit testing.

What is the design of the environment variable used to control the
behavior of __builtin_mock_cpu...? They are part of the user interface
and should be documented somewhere.

thanks,

David



Patch

Index: cgraphunit.c
===================================================================
--- cgraphunit.c	(revision 198754)
+++ cgraphunit.c	(working copy)
@@ -640,7 +640,13 @@  cgraph_analyze_function (struct cgraph_node *node)
 	{
 	  tree resolver = NULL_TREE;
 	  gcc_assert (targetm.generate_version_dispatcher_body);
-	  resolver = targetm.generate_version_dispatcher_body (node);
+	  /* When -fmultiversion-dynamic-dispatch is not turned on, the
+	     dispatcher should be invoked optimally (once using ifunc support).
+	     When -fmultiversion-dynamic-dispatch is on, the dispatcher should
+	     be invoked every time a call to the multiversioned function is
+	     made.  */
+	  resolver = targetm.generate_version_dispatcher_body (node,
+				flag_multiversion_dynamic_dispatch);
 	  gcc_assert (resolver != NULL_TREE);
 	}
     }
Index: common.opt
===================================================================
--- common.opt	(revision 198754)
+++ common.opt	(working copy)
@@ -1555,6 +1555,10 @@  fmove-loop-invariants
 Common Report Var(flag_move_loop_invariants) Init(1) Optimization
 Move loop invariant computations out of loops
 
+fmultiversion-dynamic-dispatch
+Common Report Var(flag_multiversion_dynamic_dispatch) Init(0)
+Invoke the function version dispatcher for every multiversioned function call.
+
 fdce
 Common Var(flag_dce) Init(1) Optimization
 Use the RTL dead code elimination pass
Index: target.def
===================================================================
--- target.def	(revision 198754)
+++ target.def	(working copy)
@@ -1323,11 +1323,12 @@  DEFHOOK
 /*  Target hook is used to generate the dispatcher logic to invoke the right
     function version at run-time for a given set of function versions.
     ARG points to the callgraph node of the dispatcher function whose body
-    must be generated.  */
+    must be generated.  The version dispatcher is invoked on every call when
+    debug_mode is 1.  */
 DEFHOOK
 (generate_version_dispatcher_body,
  "",
- tree, (void *arg), NULL) 
+ tree, (void *arg, int debug_mode), NULL) 
 
 /* Target hook is used to get the dispatcher function for a set of function
    versions.  The dispatcher function is called to invoke the right function
Index: doc/tm.texi
===================================================================
--- doc/tm.texi	(revision 198754)
+++ doc/tm.texi	(working copy)
@@ -10961,11 +10961,13 @@  version at run-time. @var{decl} is one version fro
 identical versions.
 @end deftypefn
 
-@deftypefn {Target Hook} tree TARGET_GENERATE_VERSION_DISPATCHER_BODY (void *@var{arg})
+@deftypefn {Target Hook} tree TARGET_GENERATE_VERSION_DISPATCHER_BODY (void *@var{arg}, int @var{debug_mode})
 This hook is used to generate the dispatcher logic to invoke the right
 function version at run-time for a given set of function versions.
 @var{arg} points to the callgraph node of the dispatcher function whose
-body must be generated.
+body must be generated.  When @var{debug_mode} is 1, the dispatcher
+logic is invoked on every call. Otherwise, the dispatcher is invoked
+only at start up to minimize call overhead.
 @end deftypefn
 
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
Index: doc/tm.texi.in
===================================================================
--- doc/tm.texi.in	(revision 198754)
+++ doc/tm.texi.in	(working copy)
@@ -10804,7 +10804,9 @@  identical versions.
 This hook is used to generate the dispatcher logic to invoke the right
 function version at run-time for a given set of function versions.
 @var{arg} points to the callgraph node of the dispatcher function whose
-body must be generated.
+body must be generated.  When @var{debug_mode} is 1, the dispatcher
+logic is invoked on every call. Otherwise, the dispatcher is invoked
+only at start up to minimize call overhead.
 @end deftypefn
 
 @hook TARGET_INVALID_WITHIN_DOLOOP
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 198754)
+++ doc/invoke.texi	(working copy)
@@ -178,6 +178,7 @@  in the following sections.
 @xref{C++ Dialect Options,,Options Controlling C++ Dialect}.
 @gccoptlist{-fabi-version=@var{n}  -fno-access-control  -fcheck-new @gol
 -fconstexpr-depth=@var{n}  -ffriend-injection @gol
+-fmultiversion-dynamic-dispatch @gol
 -fno-elide-constructors @gol
 -fno-enforce-eh-specs @gol
 -ffor-scope  -fno-for-scope  -fno-gnu-keywords @gol
@@ -2023,6 +2024,13 @@  earlier releases.
 This option is for compatibility, and may be removed in a future
 release of G++.
 
+@item -fmultiversion-dynamic-dispatch
+@opindex fmultiversion-dynamic-dispatch
+When using function multiversioning, the function versions dispatcher is
+invoked only once at start-up using IFUNC support to minimize call overhead.
+This flag can be used to instead invoke the dispatcher every time a call to
+a multiversioned function is made.
+
 @item -fno-elide-constructors
 @opindex fno-elide-constructors
 The C++ standard allows an implementation to omit creating a temporary
Index: testsuite/g++.dg/ext/mv1_debug.C
===================================================================
--- testsuite/g++.dg/ext/mv1_debug.C	(revision 0)
+++ testsuite/g++.dg/ext/mv1_debug.C	(revision 0)
@@ -0,0 +1,4 @@ 
+/* Test case to check if mv1.C works with -fmultiversion-dynamic-dispatch additionally added.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -fPIC -fmultiversion-dynamic-dispatch" } */
+/* { dg-additional-sources "mv1.C" } */
Index: testsuite/g++.dg/ext/mv14_debug_code_coverage.C
===================================================================
--- testsuite/g++.dg/ext/mv14_debug_code_coverage.C	(revision 0)
+++ testsuite/g++.dg/ext/mv14_debug_code_coverage.C	(revision 0)
@@ -0,0 +1,214 @@ 
+/* Test case to show how code coverage testing of a multiversioned function
+   can be done using cpu mocks.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -fmultiversion-dynamic-dispatch" } */
+
+#include <assert.h>
+#include <string.h>
+
+/* Temporary code till the libgcc hooks for this are checked in. Override
+   __builtin_mock_cpu_* builtins to change the mock cpu.  */
+const char *mock_cpu = NULL;
+int __builtin_mock_cpu_is (const char *cpu)
+{
+  if (strcmp (cpu, mock_cpu) == 0)
+    return 1;
+  return 0;  
+}
+
+/* Temporary code till the libgcc hooks for this are checked in.
+   Only mock one ISA type.  The libgcc hooks will allow mocking multiple
+   ISA features together, like popcnt and avx2.  */
+const char *mock_isa = NULL;
+int __builtin_mock_cpu_supports (const char *isa)
+{
+  if (strcmp (isa, mock_isa) == 0)
+    return 1;
+  return 0;
+}
+/* End of temporary code.  */
+
+
+/* Default version.  */
+int foo () __attribute__ ((target ("default")));
+
+int foo () __attribute__ ((target ("mmx")));
+int foo () __attribute__ ((target ("sse")));
+int foo () __attribute__ ((target ("sse2")));
+int foo () __attribute__ ((target ("sse3")));
+int foo () __attribute__ ((target ("ssse3")));
+int foo () __attribute__ ((target ("sse4.1")));
+int foo () __attribute__ ((target ("sse4.2")));
+int foo () __attribute__ ((target ("popcnt")));
+int foo () __attribute__ ((target ("avx")));
+int foo () __attribute__ ((target ("avx2")));
+
+int foo () __attribute__ ((target ("arch=corei7")));
+
+int main ()
+{
+  /* Using CPU mocks run each version of foo() when possible and
+     check the return value.  */
+
+  /* Run Intel corei7 version if possible.  Test if this
+     CPU can mock corei7.  It should support SSE4.2 and
+     below, SSSE3 and MMX. */
+  if (__builtin_cpu_supports ("sse4.2")
+      && __builtin_cpu_supports ("ssse3")
+      && __builtin_cpu_supports ("mmx"))
+    {
+      mock_cpu = "corei7";
+      mock_isa = "";
+      assert (foo () == 11);
+    }
+
+  /* Run avx2 version if possible.  */
+  if (__builtin_cpu_supports ("avx2"))
+    {
+      mock_cpu = "";
+      mock_isa = "avx2";
+      assert (foo () == 1);
+    }
+  /* Run avx version if possible.  */
+  if (__builtin_cpu_supports ("avx"))
+    {
+      mock_cpu = "";
+      mock_isa = "avx";
+      assert (foo () == 2);
+    }
+  /* Run popcnt version if possible.  */
+  if (__builtin_cpu_supports ("popcnt"))
+    {
+      mock_cpu = "";
+      mock_isa = "popcnt";
+      assert (foo () == 3);
+    }
+  /* Run sse4.2 version if possible.  */
+  if (__builtin_cpu_supports ("sse4.2"))
+    {
+      mock_cpu = "";
+      mock_isa = "sse4.2";
+      assert (foo () == 4);
+    }
+  /* Run sse4.1 version if possible.  */
+  if (__builtin_cpu_supports ("sse4.1"))
+    {
+      mock_cpu = "";
+      mock_isa = "sse4.1";
+      assert (foo () == 5);
+    }
+  /* Run ssse3 version if possible.  */
+  if (__builtin_cpu_supports ("ssse3"))
+    {
+      mock_cpu = "";
+      mock_isa = "ssse3";
+      assert (foo () == 6);
+    }
+  /* Run sse3 version if possible.  */
+  if (__builtin_cpu_supports ("sse3"))
+    {
+      mock_cpu = "";
+      mock_isa = "sse3";
+      assert (foo () == 7);
+    }
+  /* Run sse2 version if possible.  */
+  if (__builtin_cpu_supports ("sse2"))
+    {
+      mock_cpu = "";
+      mock_isa = "sse2";
+      assert (foo () == 8);
+    }
+  /* Run sse version if possible.  */
+  if (__builtin_cpu_supports ("sse"))
+    {
+      mock_cpu = "";
+      mock_isa = "sse";
+      assert (foo () == 9);
+    }
+  /* Run mmx version if possible.  */
+  if (__builtin_cpu_supports ("mmx"))
+    {
+      mock_cpu = "";
+      mock_isa = "mmx";
+      assert (foo () == 10);
+    }
+
+  /* Run the default version.  */
+  mock_cpu = "";
+  mock_isa = "";
+  assert (foo () == 0);
+
+  return 0;
+}
+
+int __attribute__ ((target("default")))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 11;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 2;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 1;
+}
Index: testsuite/g++.dg/ext/mv2_debug.C
===================================================================
--- testsuite/g++.dg/ext/mv2_debug.C	(revision 0)
+++ testsuite/g++.dg/ext/mv2_debug.C	(revision 0)
@@ -0,0 +1,4 @@ 
+/* Test case to check if mv2.C works with -fmultiversion-dynamic-dispatch additionally added.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -fmultiversion-dynamic-dispatch" } */
+/* { dg-additional-sources "mv2.C" } */
Index: testsuite/g++.dg/ext/mv6_debug.C
===================================================================
--- testsuite/g++.dg/ext/mv6_debug.C	(revision 0)
+++ testsuite/g++.dg/ext/mv6_debug.C	(revision 0)
@@ -0,0 +1,4 @@ 
+/* Test case to check if mv6.C works with -fmultiversion-dynamic-dispatch additionally added.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-march=x86-64 -fmultiversion-dynamic-dispatch" } */
+/* { dg-additional-sources "mv6.C" } */
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 198754)
+++ config/i386/i386.c	(working copy)
@@ -26779,6 +26779,11 @@  enum ix86_builtins
   IX86_BUILTIN_CPU_IS,
   IX86_BUILTIN_CPU_SUPPORTS,
 
+  /* Builtins to mock CPU and ISA features, for
+     testing multiversioned functions.  */
+  IX86_BUILTIN_MOCK_CPU_IS,
+  IX86_BUILTIN_MOCK_CPU_SUPPORTS,
+
   IX86_BUILTIN_MAX
 };
 
@@ -28631,11 +28636,14 @@  ix86_init_mmx_sse_builtins (void)
    to return a pointer to VERSION_DECL if the outcome of the expression
    formed by PREDICATE_CHAIN is true.  This function will be called during
    version dispatch to decide which function version to execute.  It returns
-   the basic block at the end, to which more conditions can be added.  */
+   the basic block at the end, to which more conditions can be added.  When
+   DEBUG_MODE is 1, the version dispatcher is invoked for every call
+   to the multiversioned function.  */
 
 static basic_block
 add_condition_to_bb (tree function_decl, tree version_decl,
-		     tree predicate_chain, basic_block new_bb)
+		     tree predicate_chain, basic_block new_bb,
+		     int debug_mode)
 {
   gimple return_stmt;
   tree convert_expr, result_var;
@@ -28656,11 +28664,43 @@  add_condition_to_bb (tree function_decl, tree vers
   gcc_assert (new_bb != NULL);
   gseq = bb_seq (new_bb);
 
+  /* If debug_mode is true, generate a call to the versioned function
+     and return the output of the call.  Otherwise, return a pointer to
+     the versioned function.  */
 
-  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
-	     		 build_fold_addr_expr (version_decl));
-  result_var = create_tmp_var (ptr_type_node, NULL);
-  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  if (debug_mode)
+    {
+      tree arg;
+      tree ret_type = TREE_TYPE (TREE_TYPE (function_decl));
+      vec<tree> tmp_vec = vNULL;
+      tmp_vec.create (2);
+      
+      arg = DECL_ARGUMENTS (function_decl);
+
+      while (arg)
+	{
+	  tmp_vec.safe_push (arg);
+	  arg = DECL_CHAIN (arg);
+	}
+
+      convert_stmt = gimple_build_call_vec (version_decl, tmp_vec);
+      tmp_vec.release ();
+      result_var = NULL;
+
+      if (ret_type != void_type_node)
+	{
+          result_var = DECL_RESULT (function_decl);
+          gimple_call_set_lhs (convert_stmt, result_var);
+	}
+    }
+  else
+    {
+      convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		     build_fold_addr_expr (version_decl));
+      result_var = DECL_RESULT (function_decl);
+      convert_stmt = gimple_build_assign (result_var, convert_expr); 
+    }
+
   return_stmt = gimple_build_return (result_var);
 
   if (predicate_chain == NULL_TREE)
@@ -28742,10 +28782,11 @@  add_condition_to_bb (tree function_decl, tree vers
    the right builtin to use to match the platform specification.
    It returns the priority value for this version decl.  If PREDICATE_LIST
    is not NULL, it stores the list of cpu features that need to be checked
-   before dispatching this function.  */
+   before dispatching this function.   When debug_mode is 1, use the mock
+   cpu check builtins to do the dispatch.  */
 
 static unsigned int
-get_builtin_code_for_version (tree decl, tree *predicate_list)
+get_builtin_code_for_version (tree decl, tree *predicate_list, int debug_mode)
 {
   tree attrs;
   struct cl_target_option cur_target;
@@ -28882,7 +28923,10 @@  static unsigned int
     
       if (predicate_list)
 	{
-          predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+	  if (debug_mode)
+	    predicate_decl = ix86_builtins [(int) IX86_BUILTIN_MOCK_CPU_IS];
+	  else
+	    predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
           /* For a C string literal the length includes the trailing NULL.  */
           predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
           predicate_chain = tree_cons (predicate_decl, predicate_arg,
@@ -28894,8 +28938,12 @@  static unsigned int
   tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);
   strcpy (tok_str, attrs_str);
   token = strtok (tok_str, ",");
-  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
 
+  if (debug_mode)
+    predicate_decl = ix86_builtins [(int) IX86_BUILTIN_MOCK_CPU_SUPPORTS];
+  else
+    predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
   while (token != NULL)
     {
       /* Do not process "arch="  */
@@ -28957,8 +29005,8 @@  static unsigned int
 static int
 ix86_compare_version_priority (tree decl1, tree decl2)
 {
-  unsigned int priority1 = get_builtin_code_for_version (decl1, NULL);
-  unsigned int priority2 = get_builtin_code_for_version (decl2, NULL);
+  unsigned int priority1 = get_builtin_code_for_version (decl1, NULL, false);
+  unsigned int priority2 = get_builtin_code_for_version (decl2, NULL, false);
 
   return (int)priority1 - (int)priority2;
 }
@@ -28985,12 +29033,15 @@  feature_compare (const void *v1, const void *v2)
    multi-versioned functions.  DISPATCH_DECL is the function which will
    contain the dispatch logic.  FNDECLS are the function choices for
    dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer
-   in DISPATCH_DECL in which the dispatch code is generated.  */
+   in DISPATCH_DECL in which the dispatch code is generated.  When
+   DEBUG_MODE is 1, the version dispatcher is invoked for every call
+   to the multiversioned function.  */
 
 static int
 dispatch_function_versions (tree dispatch_decl,
 			    void *fndecls_p,
-			    basic_block *empty_bb)
+			    basic_block *empty_bb,
+			    int debug_mode)
 {
   tree default_decl;
   gimple ifunc_cpu_init_stmt;
@@ -29048,8 +29099,8 @@  dispatch_function_versions (tree dispatch_decl,
       /* Get attribute string, parse it and find the right predicate decl.
          The predicate function could be a lengthy combination of many
 	 features, like arch-type and various isa-variants.  */
-      priority = get_builtin_code_for_version (version_decl,
-	 			               &predicate_chain);
+      priority = get_builtin_code_for_version (version_decl, &predicate_chain,
+					       debug_mode);
 
       if (predicate_chain == NULL_TREE)
 	continue;
@@ -29072,11 +29123,11 @@  dispatch_function_versions (tree dispatch_decl,
     *empty_bb = add_condition_to_bb (dispatch_decl,
 				     function_version_info[i].version_decl,
 				     function_version_info[i].predicate_chain,
-				     *empty_bb);
+				     *empty_bb, debug_mode);
 
   /* dispatch default version at the end.  */
   *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
-				   NULL, *empty_bb);
+				   NULL, *empty_bb, debug_mode);
 
   free (function_version_info);
   return 0;
@@ -29446,7 +29497,7 @@  ix86_get_function_versions_dispatcher (void *decl)
   default_node = default_version_info->this_node;
 
 #if defined (ASM_OUTPUT_TYPE_DIRECTIVE)
-  if (targetm.has_ifunc_p ())
+  if (targetm.has_ifunc_p () || flag_multiversion_dynamic_dispatch)
     {
       struct cgraph_function_version_info *it_v = NULL;
       struct cgraph_node *dispatcher_node = NULL;
@@ -29475,8 +29526,9 @@  ix86_get_function_versions_dispatcher (void *decl)
 #endif
     {
       error_at (DECL_SOURCE_LOCATION (default_node->symbol.decl),
-		"multiversioning needs ifunc which is not supported "
-		"on this target");
+		"multiversioning needs ifunc which is not supported "
+		"on this target; use -fmultiversion-dynamic-dispatch "
+		"instead");
     }
 
   return dispatch_decl;
@@ -29503,15 +29555,19 @@  make_attribute (const char *name, const char *arg_
 /* Make the resolver function decl to dispatch the versions of
    a multi-versioned function,  DEFAULT_DECL.  Create an
    empty basic block in the resolver and store the pointer in
-   EMPTY_BB.  Return the decl of the resolver function.  */
+   EMPTY_BB.  Return the decl of the resolver function.  When
+   DEBUG_MODE is 1, the resolver function body is not an
+   ifunc resolver; it simply calls the appropriate function
+   version and returns the call output.  */
 
 static tree
 make_resolver_func (const tree default_decl,
 		    const tree dispatch_decl,
-		    basic_block *empty_bb)
+		    basic_block *empty_bb,
+		    int debug_mode)
 {
   char *resolver_name;
-  tree decl, type, decl_name, t;
+  tree decl, type, decl_name, t = NULL;
   bool is_uniq = false;
 
   /* IFUNC's have to be globally visible.  So, if the default_decl is
@@ -29526,8 +29582,19 @@  make_resolver_func (const tree default_decl,
      another module which is based on the same version name.  */
   resolver_name = make_name (default_decl, "resolver", is_uniq);
 
-  /* The resolver function should return a (void *). */
-  type = build_function_type_list (ptr_type_node, NULL_TREE);
+  if (debug_mode)
+    {
+      /* In debug_mode, the resolver function calls the appropriate
+	 function version.  Its type is the same as dispatch_decl's.  */
+      tree fn_type = TREE_TYPE (dispatch_decl);
+      type = build_function_type (TREE_TYPE (fn_type),
+				  TYPE_ARG_TYPES (fn_type));
+    }
+  else
+    {
+      /* The resolver function should return a (void *). */
+      type = build_function_type_list (ptr_type_node, NULL_TREE);
+    }
 
   decl = build_fn_decl (resolver_name, type);
   decl_name = get_identifier (resolver_name);
@@ -29549,6 +29616,16 @@  make_resolver_func (const tree default_decl,
   DECL_INITIAL (decl) = make_node (BLOCK);
   DECL_STATIC_CONSTRUCTOR (decl) = 0;
 
+  /* In debug_mode, the resolver function is not an ifunc resolver.  Its
+     signature is the same as that of dispatch_decl and default_decl.  */
+  if (debug_mode)
+    {
+      tree arg;
+      DECL_ARGUMENTS (decl) = copy_list (DECL_ARGUMENTS (default_decl));
+      for (arg = DECL_ARGUMENTS (decl); arg; arg = DECL_CHAIN (arg))
+	DECL_CONTEXT (arg) = decl;
+    }
+
   if (DECL_COMDAT_GROUP (default_decl)
       || TREE_PUBLIC (default_decl))
     {
@@ -29559,7 +29636,9 @@  make_resolver_func (const tree default_decl,
       make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
     }
   /* Build result decl and add to function_decl. */
-  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE,
+		  TREE_TYPE (TREE_TYPE (decl)));
+
   DECL_ARTIFICIAL (t) = 1;
   DECL_IGNORED_P (t) = 1;
   DECL_RESULT (decl) = t;
@@ -29574,9 +29653,17 @@  make_resolver_func (const tree default_decl,
   pop_cfun ();
 
   gcc_assert (dispatch_decl != NULL);
-  /* Mark dispatch_decl as "ifunc" with resolver as resolver_name.  */
-  DECL_ATTRIBUTES (dispatch_decl) 
-    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
+ 
+  /* Mark dispatch_decl as "alias" or "ifunc" with resolver as
+     resolver_name.  */
+  if (debug_mode)
+    DECL_ATTRIBUTES (dispatch_decl)
+      = make_attribute ("alias", resolver_name,
+			DECL_ATTRIBUTES (dispatch_decl));
+  else
+    DECL_ATTRIBUTES (dispatch_decl)
+      = make_attribute ("ifunc", resolver_name,
+			DECL_ATTRIBUTES (dispatch_decl));
 
   /* Create the alias for dispatch to resolver here.  */
   /*cgraph_create_function_alias (dispatch_decl, decl);*/
@@ -29588,10 +29675,13 @@  make_resolver_func (const tree default_decl,
 /* Generate the dispatching code body to dispatch multi-versioned function
    DECL.  The target hook is called to process the "target" attributes and
    provide the code to dispatch the right function at run-time.  NODE points
-   to the dispatcher decl whose body will be created.  */
+   to the dispatcher decl whose body will be created.  When DEBUG_MODE is
+   1, the dispatch checks should be made during every call to the versioned
+   function.  When DEBUG_MODE is 0, ifunc based dispatching is used to
+   keep the call overhead small.  */
 
 static tree 
-ix86_generate_version_dispatcher_body (void *node_p)
+ix86_generate_version_dispatcher_body (void *node_p, int debug_mode)
 {
   tree resolver_decl;
   basic_block empty_bb;
@@ -29618,8 +29708,8 @@  static tree
   /* node is going to be an alias, so remove the finalized bit.  */
   node->local.finalized = false;
 
-  resolver_decl = make_resolver_func (default_ver_decl,
-				      node->symbol.decl, &empty_bb);
+  resolver_decl = make_resolver_func (default_ver_decl, node->symbol.decl,
+				      &empty_bb, debug_mode);
 
   node_version_info->dispatcher_resolver = resolver_decl;
 
@@ -29642,7 +29732,8 @@  static tree
       fn_ver_vec.safe_push (versn->symbol.decl);
     }
 
-  dispatch_function_versions (resolver_decl, &fn_ver_vec, &empty_bb);
+  dispatch_function_versions (resolver_decl, &fn_ver_vec,
+			      &empty_bb, debug_mode);
   fn_ver_vec.release ();
   rebuild_cgraph_edges (); 
   pop_cfun ();
@@ -29828,7 +29919,8 @@  fold_builtin_cpu (tree fndecl, tree *args)
 
   gcc_assert (param_string_cst);
 
-  if (fn_code == IX86_BUILTIN_CPU_IS)
+  if (fn_code == IX86_BUILTIN_CPU_IS
+      || fn_code == IX86_BUILTIN_MOCK_CPU_IS)
     {
       tree ref;
       tree field;
@@ -29877,7 +29969,8 @@  fold_builtin_cpu (tree fndecl, tree *args)
 		      build_int_cstu (unsigned_type_node, field_val));
       return build1 (CONVERT_EXPR, integer_type_node, final);
     }
-  else if (fn_code == IX86_BUILTIN_CPU_SUPPORTS)
+  else if (fn_code == IX86_BUILTIN_CPU_SUPPORTS
+	   || fn_code == IX86_BUILTIN_MOCK_CPU_SUPPORTS)
     {
       tree ref;
       tree array_elt;
@@ -29931,7 +30024,9 @@  ix86_fold_builtin (tree fndecl, int n_args,
       enum ix86_builtins fn_code = (enum ix86_builtins)
 				   DECL_FUNCTION_CODE (fndecl);
       if (fn_code ==  IX86_BUILTIN_CPU_IS
-	  || fn_code == IX86_BUILTIN_CPU_SUPPORTS)
+	  || fn_code == IX86_BUILTIN_CPU_SUPPORTS
+	  || fn_code == IX86_BUILTIN_MOCK_CPU_IS
+	  || fn_code == IX86_BUILTIN_MOCK_CPU_SUPPORTS)
 	{
 	  gcc_assert (n_args == 1);
           return fold_builtin_cpu (fndecl, args);
@@ -29981,6 +30076,13 @@  ix86_init_platform_type_builtins (void)
 			 INT_FTYPE_PCCHAR, true);
   make_cpu_type_builtin ("__builtin_cpu_supports", IX86_BUILTIN_CPU_SUPPORTS, 
 			 INT_FTYPE_PCCHAR, true);
+  /* Create builtins that mock cpu type and isa features.  This is meant to
+     be used for code coverage testing of multiversioned functions.  */
+  make_cpu_type_builtin ("__builtin_mock_cpu_is", IX86_BUILTIN_MOCK_CPU_IS,
+			 INT_FTYPE_PCCHAR, false);
+  make_cpu_type_builtin ("__builtin_mock_cpu_supports",
+			 IX86_BUILTIN_MOCK_CPU_SUPPORTS,
+			 INT_FTYPE_PCCHAR, false);
 }
 
 /* Internal method for ix86_init_builtins.  */
@@ -31701,6 +31803,8 @@  ix86_expand_builtin (tree exp, rtx target, rtx sub
 	call_expr = build_call_expr (fndecl, 0); 
 	return expand_expr (call_expr, target, mode, EXPAND_NORMAL);
       }
+    case IX86_BUILTIN_MOCK_CPU_IS:
+    case IX86_BUILTIN_MOCK_CPU_SUPPORTS:
     case IX86_BUILTIN_CPU_IS:
     case IX86_BUILTIN_CPU_SUPPORTS:
       {