diff mbox

Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

Message ID 87mvve95af.fsf@schwinge.name
State New
Headers show

Commit Message

Thomas Schwinge Oct. 19, 2015, 4:44 p.m. UTC
Hi!

Ping...

On Wed, 30 Sep 2015 17:54:07 +0200, I wrote:
> On Tue, 29 Sep 2015 10:18:14 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Mon, Sep 28, 2015 at 11:39:10AM +0200, Thomas Schwinge wrote:
> > > On Fri, 11 Sep 2015 17:43:49 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> > > > So, do I understand well that you'll call GOMP_set_offload_targets from
> > > > construct[ors] of all shared libraries (and the binary) that contain offloaded
> > > > code?  If yes, that is surely going to fail the assertions in there.
> > > 
> > > Indeed.  My original plan has been to generate/invoke this constructor
> > > only for/from the final executable and not for any shared libraries, but
> > > it seems I didn't implemented this correctly.
> > 
> > How would you mean to implement it?
> 
> I have come to realize that we need to generate/invoke this constructor
> From everything that links against libgomp (which is what I implemented),
> that is, executables as well as shared libraries.
> 
> > -fopenmp or -fopenacc code with
> > offloading bits might not be in the final executable at all, nor in shared
> > libraries it is linked against; such libraries could be only dlopened,
> > consider say python plugin.  And this is not just made up, perhaps not with
> > offloading yet, but people regularly use OpenMP code in plugins and then we
> > get complains that fork child of the main program is not allowed to do
> > anything but async-signal-safe functions.
> 
> I'm not sure I'm completely understanding that paragraph?  Are you saying
> that offloaded code can be in libraries that are not linked against
> libgomp?  How would these register (GOMP_offload_register) their
> offloaded code?  I think it's a reasonable to expect that every shared
> library that contains offloaded code must link against libgomp, which
> will happen automatically given that it is built with -fopenmp/-fopenacc?
> 
> > > > You can dlopen such libraries etc.  What if you link one library with
> > > > -fopenmp=nvptx-none and another one with -fopenmp=x86_64-intelmicemul-linux?
> > > 
> > > So, the first question to answer is: what do we expect to happen in this
> > > case, or similarly, if the executable and any shared libraries are
> > > compiled with different/incompatible -foffload options?
> > 
> > As the device numbers are per-process, the only possibility I see is that
> > all the physically available devices are always available, and just if you
> > try to offload from some code to a device that doesn't support it, you get
> > host fallback.  Because, one shared library could carefully use device(xyz)
> > to offload to say XeonPhi it is compiled for and supports, and another
> > library device(abc) to offload to PTX it is compiled for and supports.
> 
> OK, I think I get that, and it makes sense.  Even though, I don't know
> how you'd do that today: as far as I can tell, there is no specification
> covering the OpenMP 4 target device IDs, so I have no idea how a user
> program/library could realiably use them in practice?  For example, in
> the current GCC implementation, the OpenMP 4 target device IDs depend on
> the number of individual devices availble in the system, and the order in
> which libgomp loads the plugins, which is defined (arbitrarily) by the
> GCC configuration?
> 
> > > For this, I propose that the only mode of operation that we currently can
> > > support is that all of the executable and any shared libraries agree on
> > > the offload targets specified by -foffload, and I thus propose the
> > > following patch on top of what Joseph has posted before (passes the
> > > testsuite, but not yet tested otherwise):
> > 
> > See above, no.
> 
> OK.
> 
> How's the following (complete patch instead of incremental patch; the
> driver changes are still the same as before)?  The changes are:
> 
>   * libgomp/target.c:gomp_target_init again loads all the plugins.
>   * libgomp/target.c:resolve_device and
>     libgomp/oacc-init.c:resolve_device verify that a default device
>     (OpenMP device-var ICV, and acc_device_default, respectively) is
>     actually enabled, or resort to host fallback if not.
>   * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used
>     to enable devices specified by -foffload.  Can be called multiple
>     times (executable, any shared libraries); the set of enabled devices
>     is the union of all those ever requested.
>   * GOMP_offload_register (but not the new GOMP_offload_register_ver)
>     changed to enable all devices.  This is to maintain compatibility
>     with old executables and shared libraries built without the -foffload
>     constructor support.
>   * IntelMIC mkoffload changed to use GOMP_offload_register_ver instead
>     of GOMP_offload_register, and GOMP_offload_unregister_ver instead of
>     GOMP_offload_unregister.  To avoid enabling all devices
>     (GOMP_offload_register).
>   * New test cases to verify this (-foffload=disable, host fallback).

(Will write ChangeLog once the general approach has been approved.)

> Ilya, I'm aware of your work on additional changes (shared memory),
> <http://news.gmane.org/find-root.php?message_id=%3CCADG%3DZ0EBuhj89WEZdmaNUPy%3DE%3D63BmWofS8An8nY7rygTmdJ_w%40mail.gmail.com%3E>,
> but I think my patch is still an improvement already?
> 
> Jakub, is this OK as an incremental step forward?

Rebased on top of current trunk:

 gcc/config/i386/intelmic-mkoffload.c               |  20 +-
 gcc/fortran/gfortranspec.c                         |   2 +-
 gcc/gcc.c                                          | 139 +++++++++++---
 gcc/gcc.h                                          |   2 +-
 gcc/java/jvspec.c                                  |   2 +-
 libgomp/config.h.in                                |   2 +-
 libgomp/configure                                  |   6 +-
 libgomp/libgomp-plugin.h                           |   3 +-
 libgomp/libgomp.h                                  |   1 +
 libgomp/libgomp.map                                |   1 +
 libgomp/libgomp_g.h                                |   1 +
 libgomp/oacc-init.c                                |  18 +-
 libgomp/plugin/configfrag.ac                       |   8 +-
 libgomp/target.c                                   | 210 +++++++++++++++++----
 libgomp/testsuite/lib/libgomp.exp                  |  24 +--
 .../libgomp.c++/target-1-foffload_disable.C        |   3 +
 .../libgomp.c++/target-foffload_disable.C          |   3 +
 .../libgomp.c/target-1-foffload_disable.c          |   3 +
 .../testsuite/libgomp.c/target-foffload_disable.c  |  18 ++
 .../libgomp.fortran/target-foffload_disable.f      |  14 ++
 .../libgomp.fortran/target1-foffload_disable.f90   |   3 +
 libgomp/testsuite/libgomp.oacc-c++/c++.exp         |  14 +-
 libgomp/testsuite/libgomp.oacc-c/c.exp             |  13 +-
 libgomp/testsuite/libgomp.oacc-fortran/fortran.exp |  14 +-
 24 files changed, 393 insertions(+), 131 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c++/target-1-foffload_disable.C
 create mode 100644 libgomp/testsuite/libgomp.c++/target-foffload_disable.C
 create mode 100644 libgomp/testsuite/libgomp.c/target-1-foffload_disable.c
 create mode 100644 libgomp/testsuite/libgomp.c/target-foffload_disable.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-foffload_disable.f
 create mode 100644 libgomp/testsuite/libgomp.fortran/target1-foffload_disable.f90



Grüße
 Thomas

Comments

Jakub Jelinek Oct. 20, 2015, 10:02 a.m. UTC | #1
On Mon, Oct 19, 2015 at 06:44:40PM +0200, Thomas Schwinge wrote:
> > How's the following (complete patch instead of incremental patch; the
> > driver changes are still the same as before)?  The changes are:
> > 
> >   * libgomp/target.c:gomp_target_init again loads all the plugins.
> >   * libgomp/target.c:resolve_device and
> >     libgomp/oacc-init.c:resolve_device verify that a default device
> >     (OpenMP device-var ICV, and acc_device_default, respectively) is
> >     actually enabled, or resort to host fallback if not.
> >   * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used
> >     to enable devices specified by -foffload.  Can be called multiple
> >     times (executable, any shared libraries); the set of enabled devices
> >     is the union of all those ever requested.
> >   * GOMP_offload_register (but not the new GOMP_offload_register_ver)
> >     changed to enable all devices.  This is to maintain compatibility
> >     with old executables and shared libraries built without the -foffload
> >     constructor support.

Any reason not to pass the bitmask of the enabled targets to
GOMP_offload_register_ver instead, to decrease the amount of ctors and
the times you lock the various locks during initialization, or just enable
automatically the devices you load data for during GOMP_offload_register_ver?
I mean, GOMP_offload_register would enable for compatibility all devices,
GOMP_offload_register_ver would enable the device it is registered for.
For -foffload=disable on all shared libraries/binaries, naturally you would
not register anything, thus would not enable any devices (only host fallback
would work).

Or are you worried about the case where one shared library is compiled
with say -foffload=intelmic,ptx but doesn't actually contain any
#pragma omp target/#pragma omp declare target (or OpenACC similar
#directives), but only contains #pragma omp target data and/or the device
query/copying routines, then dlopens some other shared library that actually
has the offloading device code?
That could be solved by adding the call you are talking about, but
if we really should care about that unlikely case, it would be better to
only arrange for it if really needed by the shared library (i.e. if it calls
one of the OpenMP or OpenACC library routines that talk to the devices, or
has #pragma omp target data or similar constructs;
I'd strongly prefer not to have constructors in code that just got compiled
with -fopenmp, even in configuration where some offloading is configured by
default, when nothing in the code really cares about offloading.

> --- a/gcc/gcc.c
> +++ b/gcc/gcc.c
> @@ -401,6 +401,8 @@ static const char *compare_debug_auxbase_opt_spec_function (int, const char **);
>  static const char *pass_through_libs_spec_func (int, const char **);
>  static const char *replace_extension_spec_func (int, const char **);
>  static const char *greater_than_spec_func (int, const char **);
> +static const char *add_omp_infile_spec_func (int, const char **);
> +
>  static char *convert_white_space (char *);
>  
>  /* The Specs Language

I'd like to defer review of the driver bits, can Joseph or Bernd please have
a look at those?

> diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
> index 24fbb94..5da4fa7 100644
> --- a/libgomp/libgomp-plugin.h
> +++ b/libgomp/libgomp-plugin.h
> @@ -48,7 +48,8 @@ enum offload_target_type
>    OFFLOAD_TARGET_TYPE_HOST = 2,
>    /* OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3 removed.  */
>    OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5,
> -  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
> +  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6,
> +  OFFLOAD_TARGET_TYPE_HWM

What is HWM?  Is that OFFLOAD_TARGET_TYPE_LAST what you mean?

> diff --git a/libgomp/target.c b/libgomp/target.c
> index b767410..df51bfb 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -72,6 +72,9 @@ static int num_offload_images;
>  /* Array of descriptors for all available devices.  */
>  static struct gomp_device_descr *devices;
>  
> +/* Set of enabled devices.  */
> +static bool devices_enabled[OFFLOAD_TARGET_TYPE_HWM];

I must say I don't like the locking for this.
If all you ever change on this is that you change it from 0 to 1,
then supposedly just storing it with __atomic_store, perhaps with
rel semantics, and reading it as __atomic_load, with acquire semantics,
would be good enough?  And perhaps change it into int array,
so that it is actually atomic even on the old Alphas (if there are any
around).

	Jakub
Bernd Schmidt Oct. 20, 2015, 10:28 a.m. UTC | #2
On 10/20/2015 12:02 PM, Jakub Jelinek wrote:
> I'd like to defer review of the driver bits, can Joseph or Bernd please have
> a look at those?

Last time around I think I asked for some minor changes, like updated 
documentation for give_switch. Other than that, I'm ok with the patch 
iff you are happy with the overall approach.


Bernd
Thomas Schwinge Oct. 20, 2015, 11:17 a.m. UTC | #3
Hi Jakub!

Thanks for the review.

On Tue, 20 Oct 2015 12:02:45 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Oct 19, 2015 at 06:44:40PM +0200, Thomas Schwinge wrote:
> > > How's the following (complete patch instead of incremental patch; the
> > > driver changes are still the same as before)?  The changes are:
> > > 
> > >   * libgomp/target.c:gomp_target_init again loads all the plugins.
> > >   * libgomp/target.c:resolve_device and
> > >     libgomp/oacc-init.c:resolve_device verify that a default device
> > >     (OpenMP device-var ICV, and acc_device_default, respectively) is
> > >     actually enabled, or resort to host fallback if not.
> > >   * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used
> > >     to enable devices specified by -foffload.  Can be called multiple
> > >     times (executable, any shared libraries); the set of enabled devices
> > >     is the union of all those ever requested.
> > >   * GOMP_offload_register (but not the new GOMP_offload_register_ver)
> > >     changed to enable all devices.  This is to maintain compatibility
> > >     with old executables and shared libraries built without the -foffload
> > >     constructor support.
> 
> Any reason not to pass the bitmask of the enabled targets to
> GOMP_offload_register_ver instead, to decrease the amount of ctors and
> the times you lock the various locks during initialization, or just enable
> automatically the devices you load data for during GOMP_offload_register_ver?
> I mean, GOMP_offload_register would enable for compatibility all devices,
> GOMP_offload_register_ver would enable the device it is registered for.
> For -foffload=disable on all shared libraries/binaries, naturally you would
> not register anything, thus would not enable any devices (only host fallback
> would work).

As explained a few times already: GOMP_offload_register_ver constructors
will only be generated if there actually are offloaded code regions, but
for example:

    #include <openacc.h>
    int main()
    {
      __builtin_printf("%d\n", acc_get_num_devices(acc_device_nvidia));
      return 0;
    }

... is a valid OpenACC program (untested), which doesn't contain any
offloaded code regions.  As a user I'd expect it to return different
answers if compiled with -foffload=nvptx-none in contrast to
-foffload=disable.  Actually, I can foresee exactly such code to be used
to probe for offloading being available, for example in testsuites.  And,
I guess we agree that under -foffload=disable we'd like the
compilation/runtime system to be configured in a way that no offloading
will happen?

Always creating (dummy) GOMP_offload_register_ver constructors has been
another suggestion that I had voiced much earlier in this thread (months
ago), but everyone (including me) taking part in the discussion agreed
that it'd cause even higher compile-time overhead.

> Or are you worried about the case where one shared library is compiled
> with say -foffload=intelmic,ptx but doesn't actually contain any
> #pragma omp target/#pragma omp declare target (or OpenACC similar
> #directives), but only contains #pragma omp target data and/or the device
> query/copying routines, then dlopens some other shared library that actually
> has the offloading device code?

That's another example, yes.

> That could be solved by adding the call you are talking about, but
> if we really should care about that unlikely case, it would be better to
> only arrange for it if really needed by the shared library (i.e. if it calls
> one of the OpenMP or OpenACC library routines that talk to the devices, or
> has #pragma omp target data or similar constructs;
> I'd strongly prefer not to have constructors in code that just got compiled
> with -fopenmp, even in configuration where some offloading is configured by
> default, when nothing in the code really cares about offloading.

So, how to resolve our different opinions?  I mean, for any serious
program code, there will be constructor calls into libgomp already; are
you expecting that adding one more really will cause any noticeable
overhead?

I agree that enabling devices for GOMP_offload_register_ver calls makes
sense.  (I indeed had considered this earlier, but it didn't lead to
solving the problem complete -- see above.)  Can we come up with a scheme
to do it this way, and only generate the GOMP_enable_offload_targets
constructor of no GOMP_offload_register_ver constructors have been
generated?  But I have no idea how to implement that in a non-convoluted
way.  (And, it sounds excessive to me in terms of implementation overhead
on our side, in contrast to execution overhead of one libgomp constructor
call.)

> > --- a/gcc/gcc.c
> > +++ b/gcc/gcc.c
> > @@ -401,6 +401,8 @@ static const char *compare_debug_auxbase_opt_spec_function (int, const char **);
> >  static const char *pass_through_libs_spec_func (int, const char **);
> >  static const char *replace_extension_spec_func (int, const char **);
> >  static const char *greater_than_spec_func (int, const char **);
> > +static const char *add_omp_infile_spec_func (int, const char **);
> > +
> >  static char *convert_white_space (char *);
> >  
> >  /* The Specs Language
> 
> I'd like to defer review of the driver bits, can Joseph or Bernd please have
> a look at those?

Joseph has already been working on this code, completing my earlier WIP
patch while I've been out of office, and has submitted it for trunk
inclusion, so I'm assuming these changes do have his blessing.

> > --- a/libgomp/libgomp-plugin.h
> > +++ b/libgomp/libgomp-plugin.h
> > @@ -48,7 +48,8 @@ enum offload_target_type
> >    OFFLOAD_TARGET_TYPE_HOST = 2,
> >    /* OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3 removed.  */
> >    OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5,
> > -  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
> > +  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6,
> > +  OFFLOAD_TARGET_TYPE_HWM
> 
> What is HWM?  Is that OFFLOAD_TARGET_TYPE_LAST what you mean?

Nathan has used this term before (libgomp/openacc.h:acc_device_t), and he
told me this means "High Water Mark".  I have no strong opinion on the
name to use, just want to mention that "*_LAST" sounds to me like that
one still is part of the accepted set, whereas in this case it'd be the
first enumerator outside of the accepted ones.  (And I guess, we agree
that "OFFLOAD_TARGET_TYPE_INTEL_LAST = 6" followed by
"OFFLOAD_TARGET_TYPE_INTEL_MIC = OFFLOAD_TARGET_TYPE_INTEL_LAST" is
ugly?)

> > --- a/libgomp/target.c
> > +++ b/libgomp/target.c
> > @@ -72,6 +72,9 @@ static int num_offload_images;
> >  /* Array of descriptors for all available devices.  */
> >  static struct gomp_device_descr *devices;
> >  
> > +/* Set of enabled devices.  */
> > +static bool devices_enabled[OFFLOAD_TARGET_TYPE_HWM];
> 
> I must say I don't like the locking for this.

Are you worried about the performance issues of a very short locking
cycle that in the majority of all cases should happen without blocking,
in comparison to performance issues related to host/device memory
transfers or kernel launches that will follow after the call to
gomp_offload_target_enabled_p?  I don't really think that is reasonable
to worry about.

> If all you ever change on this is that you change it from 0 to 1,
> then supposedly just storing it with __atomic_store, perhaps with
> rel semantics, and reading it as __atomic_load, with acquire semantics,
> would be good enough?  And perhaps change it into int array,
> so that it is actually atomic even on the old Alphas (if there are any
> around).

If you're really worried about this, I can look into that, but to me that
sounds like unwarranted code complexity/premature optimization...


Grüße
 Thomas
Bernd Schmidt Oct. 20, 2015, 11:45 a.m. UTC | #4
On 10/20/2015 01:17 PM, Thomas Schwinge wrote:
>
> As explained a few times already: GOMP_offload_register_ver constructors
> will only be generated if there actually are offloaded code regions, but
> for example:
>
>      #include <openacc.h>
>      int main()
>      {
>        __builtin_printf("%d\n", acc_get_num_devices(acc_device_nvidia));
>        return 0;
>      }
>
> ... is a valid OpenACC program (untested), which doesn't contain any
> offloaded code regions.  As a user I'd expect it to return different
> answers if compiled with -foffload=nvptx-none in contrast to
> -foffload=disable.  Actually, I can foresee exactly such code to be used
> to probe for offloading being available, for example in testsuites.  And,
> I guess we agree that under -foffload=disable we'd like the
> compilation/runtime system to be configured in a way that no offloading
> will happen?

Both of you can ignore me if you feel I'm not making sense, but what 
exactly is the use case for -foffload=disable? Isn't it slightly 
redundant with -fno-openacc? IMO it's not an option that alters the 
available devices, that's a question that is answered at run-time and 
doesn't (or shouldn't) really depend on compiler switches. As a user I'd 
expect -foffload=disable to just prevent generation of offloaded code 
for the things I'm compiling. As Jakub pointed out, shared libraries may 
still contain other pieces that are offloadable.

I guess I don't fully understand why you want to go to great lengths to 
disable devices at run-time based on a compile-time switch. What's the 
reasoning here?

> Nathan has used this term before (libgomp/openacc.h:acc_device_t), and he
> told me this means "High Water Mark".  I have no strong opinion on the
> name to use, just want to mention that "*_LAST" sounds to me like that
> one still is part of the accepted set, whereas in this case it'd be the
> first enumerator outside of the accepted ones.  (And I guess, we agree
> that "OFFLOAD_TARGET_TYPE_INTEL_LAST = 6" followed by
> "OFFLOAD_TARGET_TYPE_INTEL_MIC = OFFLOAD_TARGET_TYPE_INTEL_LAST" is
> ugly?)

Nah, just rename HWM to LAST, that's fairly common usage I think.


Bernd
Jakub Jelinek Oct. 20, 2015, 11:50 a.m. UTC | #5
On Tue, Oct 20, 2015 at 01:17:45PM +0200, Thomas Schwinge wrote:
> Always creating (dummy) GOMP_offload_register_ver constructors has been
> another suggestion that I had voiced much earlier in this thread (months
> ago), but everyone (including me) taking part in the discussion agreed
> that it'd cause even higher compile-time overhead.

I'd prefer to just set a flag like "force creation of the GOMP offloading
sections" whenever you see one of the APIs or constructs used in the TU,
and if that flag is set, even when there are no offloaded vars or
functions/kernels, force creation of the corresponding data sections.
Either it can be stardard offloading LTO sections, just not containing
anything, or, if you want to improve compile-time, it could be special too,
so that the linker plugin can quickly identify those that only need
offloading support, but don't have any offloaded vars or code.
But that can certainly be done as an incremental optimization.

For OpenMP that would be whenever
#pragma omp target{, data, enter data, exit data} construct is seen
(e.g. during gimplification or OMP region nesting checking even better),
or for

omp_set_default_device
omp_get_default_device
omp_get_num_devices
omp_is_initial_device
omp_get_initial_device
omp_target_alloc
omp_target_free
omp_target_is_present
omp_target_memcpy
omp_target_memcpy_rect
omp_target_associate_ptr
omp_target_disassociate_ptr

calls.  Guess for OpenACC you have similar set of calls.
The thing is, while OpenACC is standard is pretty much solely about offloading,
OpenMP is not, and in many cases programs just use host OpenMP parallelization
(at least right now, I bet such programs are significantly larger set
than programs that use OpenACC or OpenMP offloading together).
Distributions and others will eventually configure the compilers they are
shipping to enable the offloading, and if that forces a constructor to every
TU or even every shared library just because it has been compiled with
-fopenmp, it is unacceptable overhead.

For the vendor shipped binary compilers, I'm envisioning ideal would be to
be able to configure gcc for many offloading targets, then build such main
compiler and offloading target compilers, but package them separately (one
package (or set of packages) the base compiler, and then another package (or
set of them) for each offloading target.  What the -foffload= actually will
be in the end from the linked shared library or binary POV would depend both
on the configured offloading target, but also on whether the mkoffload
binaries are found (or whatever else is needed first from the offloading
target).  That would mean that we'd not issue hard error or any kind of
diagnostics if mkoffload is missing.  Is that acceptable, or should that
e.g. be limited just to the compiled in configure default (i.e. explicit
-foffload= would error if the requested mkoffload is missing, default
-foffload= would silently skip unavailable ones; I guess this would be my
preference), or should we have two ways of configuring the offloading
targets, as hard requirements and as optional support?

> So, how to resolve our different opinions?  I mean, for any serious
> program code, there will be constructor calls into libgomp already; are
> you expecting that adding one more really will cause any noticeable
> overhead?

See above, that is really not the case.  Most of OpenMP code doesn't have
any constructor calls into libgomp at all, the only exception is
GOMP_offload_register{,_ver} at this point.

> > What is HWM?  Is that OFFLOAD_TARGET_TYPE_LAST what you mean?
> 
> Nathan has used this term before (libgomp/openacc.h:acc_device_t), and he
> told me this means "High Water Mark".  I have no strong opinion on the
> name to use, just want to mention that "*_LAST" sounds to me like that
> one still is part of the accepted set, whereas in this case it'd be the
> first enumerator outside of the accepted ones.  (And I guess, we agree
> that "OFFLOAD_TARGET_TYPE_INTEL_LAST = 6" followed by
> "OFFLOAD_TARGET_TYPE_INTEL_MIC = OFFLOAD_TARGET_TYPE_INTEL_LAST" is
> ugly?)

*_LAST or *_last is actually what we use pretty much everywhere, see e.g.
lots of places in tree-core.h.

> Are you worried about the performance issues of a very short locking
> cycle that in the majority of all cases should happen without blocking,
> in comparison to performance issues related to host/device memory
> transfers or kernel launches that will follow after the call to
> gomp_offload_target_enabled_p?  I don't really think that is reasonable
> to worry about.

Yes, I'm worried about that.  The lock could be contended, and if you take
the lock many times for each construct, it can show up, I'm worried about
cache effects etc.  It is already bad enough that we take/release the locks
for the same device e.g. in each of:
  void *fn_addr = gomp_get_target_fn_addr (devicep, fn);

  struct target_mem_desc *tgt_vars
    = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
                     GOMP_MAP_VARS_TARGET);

	Jakub
Jakub Jelinek Oct. 20, 2015, 12:13 p.m. UTC | #6
On Tue, Oct 20, 2015 at 01:45:37PM +0200, Bernd Schmidt wrote:
> Both of you can ignore me if you feel I'm not making sense, but what exactly
> is the use case for -foffload=disable? Isn't it slightly redundant with
> -fno-openacc? IMO it's not an option that alters the available devices,
> that's a question that is answered at run-time and doesn't (or shouldn't)
> really depend on compiler switches. As a user I'd expect -foffload=disable
> to just prevent generation of offloaded code for the things I'm compiling.
> As Jakub pointed out, shared libraries may still contain other pieces that
> are offloadable.
> 
> I guess I don't fully understand why you want to go to great lengths to
> disable devices at run-time based on a compile-time switch. What's the
> reasoning here?

At least for OpenMP, I'm also happy with what we do now (except for the
ability to configure offloading targets as optional, i.e. dynamically
configure the default based on what packages user install rather than
just on how it has been configured, so that e.g. just because it has been
configured for PTX offloading the host GCC itself doesn't have to have a
dependency on the proprietary CUDA stuff in any way).
I believe in OpenMP nobody says that if the device HW is available, but user
chose to not compile offloading code/variables for that particular device
that it can't show up among omp_get_num_devices ().  And I think it is
entirely fine if say target data map succeeds to that device, but then
target is offloaded, if that is caused by users configure or command line
choice.  Maybe OpenACC has different requirements, is it required to
terminate the program if it can't fulfill the requested offloading?

In any case, I'm fine with something I've noted in the last mail, or with
the status quo, but not with running constructors in TUs or even shared
libraries just because they have been compiled with -fopenmp (and either
haven't used any OpenMP code at all, or just the non-*target* directives).

	Jakub
diff mbox

Patch

diff --git a/gcc/config/i386/intelmic-mkoffload.c b/gcc/config/i386/intelmic-mkoffload.c
index 828b415..a4960a2 100644
--- a/gcc/config/i386/intelmic-mkoffload.c
+++ b/gcc/config/i386/intelmic-mkoffload.c
@@ -370,26 +370,34 @@  generate_host_descr_file (const char *host_compiler)
 	   "#ifdef __cplusplus\n"
 	   "extern \"C\"\n"
 	   "#endif\n"
-	   "void GOMP_offload_register (const void *, int, const void *);\n"
+	   "void GOMP_offload_register_ver "
+	   "(unsigned version, const void *, int, const void *);\n"
 	   "#ifdef __cplusplus\n"
 	   "extern \"C\"\n"
 	   "#endif\n"
-	   "void GOMP_offload_unregister (const void *, int, const void *);\n\n"
+	   "void GOMP_offload_unregister_ver "
+	   "(unsigned version, const void *, int, const void *);\n\n"
 
 	   "__attribute__((constructor))\n"
 	   "static void\n"
 	   "init (void)\n"
 	   "{\n"
-	   "  GOMP_offload_register (&__OFFLOAD_TABLE__, %d, __offload_target_data);\n"
-	   "}\n\n", GOMP_DEVICE_INTEL_MIC);
+	   "  GOMP_offload_register_ver (%#x, &__OFFLOAD_TABLE__, "
+	   "%d, __offload_target_data);\n"
+	   "}\n\n",
+	   GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_INTEL_MIC),
+	   GOMP_DEVICE_INTEL_MIC);
 
   fprintf (src_file,
 	   "__attribute__((destructor))\n"
 	   "static void\n"
 	   "fini (void)\n"
 	   "{\n"
-	   "  GOMP_offload_unregister (&__OFFLOAD_TABLE__, %d, __offload_target_data);\n"
-	   "}\n", GOMP_DEVICE_INTEL_MIC);
+	   "  GOMP_offload_unregister_ver (%#x, &__OFFLOAD_TABLE__, "
+	   "%d, __offload_target_data);\n"
+	   "}\n",
+	   GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_INTEL_MIC),
+	   GOMP_DEVICE_INTEL_MIC);
 
   fclose (src_file);
 
diff --git a/gcc/fortran/gfortranspec.c b/gcc/fortran/gfortranspec.c
index fe594db..e3e83ba 100644
--- a/gcc/fortran/gfortranspec.c
+++ b/gcc/fortran/gfortranspec.c
@@ -439,7 +439,7 @@  int
 lang_specific_pre_link (void)
 {
   if (library)
-    do_spec ("%:include(libgfortran.spec)");
+    do_spec ("%:include(libgfortran.spec)", 0);
 
   return 0;
 }
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 7f5a36e..02795e7 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -401,6 +401,8 @@  static const char *compare_debug_auxbase_opt_spec_function (int, const char **);
 static const char *pass_through_libs_spec_func (int, const char **);
 static const char *replace_extension_spec_func (int, const char **);
 static const char *greater_than_spec_func (int, const char **);
+static const char *add_omp_infile_spec_func (int, const char **);
+
 static char *convert_white_space (char *);
 
 /* The Specs Language
@@ -1193,6 +1195,11 @@  static const char *const multilib_defaults_raw[] = MULTILIB_DEFAULTS;
 
 static const char *const driver_self_specs[] = {
   "%{fdump-final-insns:-fdump-final-insns=.} %<fdump-final-insns",
+#ifdef ENABLE_OFFLOADING
+  /* If linking against libgomp, add a setup file.  */
+  "%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*} 1):" \
+  "%:add-omp-infile()}",
+#endif /* ENABLE_OFFLOADING */
   DRIVER_SELF_SPECS, CONFIGURE_SPECS, GOMP_SELF_SPECS, GTM_SELF_SPECS,
   CILK_SELF_SPECS
 };
@@ -1620,6 +1627,7 @@  static const struct spec_function static_spec_functions[] =
   { "pass-through-libs",	pass_through_libs_spec_func },
   { "replace-extension",	replace_extension_spec_func },
   { "gt",			greater_than_spec_func },
+  { "add-omp-infile",		add_omp_infile_spec_func },
 #ifdef EXTRA_SPEC_FUNCTIONS
   EXTRA_SPEC_FUNCTIONS
 #endif
@@ -3216,7 +3224,8 @@  execute (void)
    The `validated' field describes whether any spec has looked at this switch;
    if it remains false at the end of the run, the switch must be meaningless.
    The `ordering' field is used to temporarily mark switches that have to be
-   kept in a specific order.  */
+   kept in a specific order.
+   The `lang_mask' field stores the flags associated with this option.  */
 
 #define SWITCH_LIVE    			(1 << 0)
 #define SWITCH_FALSE   			(1 << 1)
@@ -3232,6 +3241,7 @@  struct switchstr
   bool known;
   bool validated;
   bool ordering;
+  unsigned int lang_mask;
 };
 
 static struct switchstr *switches;
@@ -3240,6 +3250,10 @@  static int n_switches;
 
 static int n_switches_alloc;
 
+/* If nonzero, do not pass through switches for languages not matching
+   this mask.  */
+static unsigned int spec_lang_mask_accept;
+
 /* Set to zero if -fcompare-debug is disabled, positive if it's
    enabled and we're running the first compilation, negative if it's
    enabled and we're running the second compilation.  For most of the
@@ -3277,6 +3291,7 @@  struct infile
   const char *name;
   const char *language;
   struct compiler *incompiler;
+  unsigned int lang_mask;
   bool compiled;
   bool preprocessed;
 };
@@ -3470,15 +3485,16 @@  alloc_infile (void)
     }
 }
 
-/* Store an input file with the given NAME and LANGUAGE in
+/* Store an input file with the given NAME and LANGUAGE and LANG_MASK in
    infiles.  */
 
 static void
-add_infile (const char *name, const char *language)
+add_infile (const char *name, const char *language, unsigned int lang_mask)
 {
   alloc_infile ();
   infiles[n_infiles].name = name;
-  infiles[n_infiles++].language = language;
+  infiles[n_infiles].language = language;
+  infiles[n_infiles++].lang_mask = lang_mask;
 }
 
 /* Allocate space for a switch in switches.  */
@@ -3499,11 +3515,12 @@  alloc_switch (void)
 }
 
 /* Save an option OPT with N_ARGS arguments in array ARGS, marking it
-   as validated if VALIDATED and KNOWN if it is an internal switch.  */
+   as validated if VALIDATED and KNOWN if it is an internal switch.
+   LANG_MASK is the flags associated with this option.  */
 
 static void
 save_switch (const char *opt, size_t n_args, const char *const *args,
-	     bool validated, bool known)
+	     bool validated, bool known, unsigned int lang_mask)
 {
   alloc_switch ();
   switches[n_switches].part1 = opt + 1;
@@ -3520,6 +3537,7 @@  save_switch (const char *opt, size_t n_args, const char *const *args,
   switches[n_switches].validated = validated;
   switches[n_switches].known = known;
   switches[n_switches].ordering = 0;
+  switches[n_switches].lang_mask = lang_mask;
   n_switches++;
 }
 
@@ -3537,7 +3555,8 @@  driver_unknown_option_callback (const struct cl_decoded_option *decoded)
 	 diagnosed only if there are warnings.  */
       save_switch (decoded->canonical_option[0],
 		   decoded->canonical_option_num_elements - 1,
-		   &decoded->canonical_option[1], false, true);
+		   &decoded->canonical_option[1], false, true,
+		   cl_options[decoded->opt_index].flags);
       return false;
     }
   if (decoded->opt_index == OPT_SPECIAL_unknown)
@@ -3545,7 +3564,8 @@  driver_unknown_option_callback (const struct cl_decoded_option *decoded)
       /* Give it a chance to define it a spec file.  */
       save_switch (decoded->canonical_option[0],
 		   decoded->canonical_option_num_elements - 1,
-		   &decoded->canonical_option[1], false, false);
+		   &decoded->canonical_option[1], false, false,
+		   cl_options[decoded->opt_index].flags);
       return false;
     }
   else
@@ -3572,7 +3592,8 @@  driver_wrong_lang_callback (const struct cl_decoded_option *decoded,
   else
     save_switch (decoded->canonical_option[0],
 		 decoded->canonical_option_num_elements - 1,
-		 &decoded->canonical_option[1], false, true);
+		 &decoded->canonical_option[1], false, true,
+		 option->flags);
 }
 
 static const char *spec_lang = 0;
@@ -3821,7 +3842,8 @@  driver_handle_option (struct gcc_options *opts,
 	compare_debug_opt = NULL;
       else
 	compare_debug_opt = arg;
-      save_switch (compare_debug_replacement_opt, 0, NULL, validated, true);
+      save_switch (compare_debug_replacement_opt, 0, NULL, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
     case OPT_fdiagnostics_color_:
@@ -3876,17 +3898,17 @@  driver_handle_option (struct gcc_options *opts,
 	for (j = 0; arg[j]; j++)
 	  if (arg[j] == ',')
 	    {
-	      add_infile (save_string (arg + prev, j - prev), "*");
+	      add_infile (save_string (arg + prev, j - prev), "*", 0);
 	      prev = j + 1;
 	    }
 	/* Record the part after the last comma.  */
-	add_infile (arg + prev, "*");
+	add_infile (arg + prev, "*", 0);
       }
       do_save = false;
       break;
 
     case OPT_Xlinker:
-      add_infile (arg, "*");
+      add_infile (arg, "*", 0);
       do_save = false;
       break;
 
@@ -3903,19 +3925,21 @@  driver_handle_option (struct gcc_options *opts,
     case OPT_l:
       /* POSIX allows separation of -l and the lib arg; canonicalize
 	 by concatenating -l with its arg */
-      add_infile (concat ("-l", arg, NULL), "*");
+      add_infile (concat ("-l", arg, NULL), "*", 0);
       do_save = false;
       break;
 
     case OPT_L:
       /* Similarly, canonicalize -L for linkers that may not accept
 	 separate arguments.  */
-      save_switch (concat ("-L", arg, NULL), 0, NULL, validated, true);
+      save_switch (concat ("-L", arg, NULL), 0, NULL, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
     case OPT_F:
       /* Likewise -F.  */
-      save_switch (concat ("-F", arg, NULL), 0, NULL, validated, true);
+      save_switch (concat ("-F", arg, NULL), 0, NULL, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
     case OPT_save_temps:
@@ -4038,7 +4062,8 @@  driver_handle_option (struct gcc_options *opts,
       save_temps_prefix = xstrdup (arg);
       /* On some systems, ld cannot handle "-o" without a space.  So
 	 split the option from its argument.  */
-      save_switch ("-o", 1, &arg, validated, true);
+      save_switch ("-o", 1, &arg, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
 #ifdef ENABLE_DEFAULT_PIE
@@ -4074,7 +4099,8 @@  driver_handle_option (struct gcc_options *opts,
   if (do_save)
     save_switch (decoded->canonical_option[0],
 		 decoded->canonical_option_num_elements - 1,
-		 &decoded->canonical_option[1], validated, true);
+		 &decoded->canonical_option[1], validated, true,
+		 cl_options[opt_index].flags);
   return true;
 }
 
@@ -4371,7 +4397,7 @@  process_command (unsigned int decoded_options_count,
           if (strcmp (fname, "-") != 0 && access (fname, F_OK) < 0)
 	    perror_with_name (fname);
           else
-	    add_infile (arg, spec_lang);
+	    add_infile (arg, spec_lang, 0);
 
           free (fname);
 	  continue;
@@ -4520,7 +4546,8 @@  process_command (unsigned int decoded_options_count,
   if (compare_debug == 2 || compare_debug == 3)
     {
       const char *opt = concat ("-fcompare-debug=", compare_debug_opt, NULL);
-      save_switch (opt, 0, NULL, false, true);
+      save_switch (opt, 0, NULL, false, true,
+		   cl_options[OPT_fcompare_debug_].flags);
       compare_debug = 1;
     }
 
@@ -4531,7 +4558,7 @@  process_command (unsigned int decoded_options_count,
 
       /* Create a dummy input file, so that we can pass
 	 the help option on to the various sub-processes.  */
-      add_infile ("help-dummy", "c");
+      add_infile ("help-dummy", "c", 0);
     }
 
   alloc_switch ();
@@ -4732,13 +4759,15 @@  insert_wrapper (const char *wrapper)
 }
 
 /* Process the spec SPEC and run the commands specified therein.
+   If LANG_MASK is nonzero, switches for other languages are discarded.
    Returns 0 if the spec is successfully processed; -1 if failed.  */
 
 int
-do_spec (const char *spec)
+do_spec (const char *spec, unsigned int lang_mask)
 {
   int value;
 
+  spec_lang_mask_accept = lang_mask;
   value = do_spec_2 (spec);
 
   /* Force out any unfinished command.
@@ -4896,7 +4925,8 @@  do_self_spec (const char *spec)
 	      save_switch (decoded_options[j].canonical_option[0],
 			   (decoded_options[j].canonical_option_num_elements
 			    - 1),
-			   &decoded_options[j].canonical_option[1], false, true);
+			   &decoded_options[j].canonical_option[1], false, true,
+			   cl_options[decoded_options[j].opt_index].flags);
 	      break;
 
 	    default:
@@ -6492,6 +6522,14 @@  check_live_switch (int switchnum, int prefix_length)
 static void
 give_switch (int switchnum, int omit_first_word)
 {
+  int lang_mask = switches[switchnum].lang_mask & ((1U << cl_lang_count) - 1);
+  unsigned int lang_mask_accept = (1U << cl_lang_count) - 1;
+  if (spec_lang_mask_accept != 0)
+    lang_mask_accept = spec_lang_mask_accept;
+  /* Drop switches specific to a language not in the given mask.  */
+  if (lang_mask != 0 && !(lang_mask & lang_mask_accept))
+    return;
+
   if ((switches[switchnum].live_cond & SWITCH_IGNORE) != 0)
     return;
 
@@ -7593,9 +7631,6 @@  driver::maybe_putenv_OFFLOAD_TARGETS () const
 		    strlen (offload_targets) + 1);
       xputenv (XOBFINISH (&collect_obstack, char *));
     }
-
-  free (offload_targets);
-  offload_targets = NULL;
 }
 
 /* Reject switches that no pass was interested in.  */
@@ -7899,7 +7934,8 @@  driver::do_spec_on_infiles () const
 		  debug_check_temp_file[1] = NULL;
 		}
 
-	      value = do_spec (input_file_compiler->spec);
+	      value = do_spec (input_file_compiler->spec,
+			       infiles[i].lang_mask);
 	      infiles[i].compiled = true;
 	      if (value < 0)
 		this_file_error = 1;
@@ -7913,7 +7949,8 @@  driver::do_spec_on_infiles () const
 		  n_switches_alloc = n_switches_alloc_debug_check[1];
 		  switches = switches_debug_check[1];
 
-		  value = do_spec (input_file_compiler->spec);
+		  value = do_spec (input_file_compiler->spec,
+				   infiles[i].lang_mask);
 
 		  compare_debug = -compare_debug;
 		  n_switches = n_switches_debug_check[0];
@@ -8068,7 +8105,7 @@  driver::maybe_run_linker (const char *argv0) const
 		    " to the linker.\n\n"));
 	  fflush (stdout);
 	}
-      int value = do_spec (link_command_spec);
+      int value = do_spec (link_command_spec, 0);
       if (value < 0)
 	errorcount = 1;
       linker_was_run = (tmp != execution_count);
@@ -9659,6 +9696,50 @@  greater_than_spec_func (int argc, const char **argv)
   return NULL;
 }
 
+/* If applicable, generate a C source file containing a constructor call to
+   GOMP_enable_offload_targets, to inform libgomp which offload targets have
+   actually been requested (-foffload=[...]), and add that as an infile.  */
+
+static const char *
+add_omp_infile_spec_func (int argc, const char **)
+{
+  gcc_assert (argc == 0);
+  gcc_assert (offload_targets != NULL);
+
+  /* Nothing to do if we're not actually linking.  */
+  if (have_c)
+    return NULL;
+
+  int err;
+  const char *tmp_filename;
+  tmp_filename = make_temp_file (".c");
+  record_temp_file (tmp_filename, !save_temps_flag, 0);
+  FILE *f = fopen (tmp_filename, "w");
+  if (f == NULL)
+    fatal_error (input_location,
+		 "could not open temporary file %s", tmp_filename);
+  /* As libgomp uses constructors internally, and this code is only added when
+     linking against libgomp, it is fine to use a constructor here.  */
+  err = fprintf (f,
+		 "extern void GOMP_enable_offload_targets (const char *);\n"
+		 "static __attribute__ ((constructor)) void\n"
+		 "init (void)\n"
+		 "{\n"
+		 "  GOMP_enable_offload_targets (\"%s\");\n"
+		 "}\n",
+		 offload_targets);
+  if (err < 0)
+    fatal_error (input_location,
+		 "could not write to temporary file %s", tmp_filename);
+  err = fclose (f);
+  if (err == EOF)
+    fatal_error (input_location,
+		 "could not close temporary file %s", tmp_filename);
+
+  add_infile (tmp_filename, "cpp-output", CL_C);
+  return NULL;
+}
+
 /* Insert backslash before spaces in ORIG (usually a file path), to 
    avoid being broken by spec parser.
 
diff --git a/gcc/gcc.h b/gcc/gcc.h
index e1abe43..c71582d 100644
--- a/gcc/gcc.h
+++ b/gcc/gcc.h
@@ -68,7 +68,7 @@  struct spec_function
 };
 
 /* These are exported by gcc.c.  */
-extern int do_spec (const char *);
+extern int do_spec (const char *, unsigned int);
 extern void record_temp_file (const char *, int, int);
 extern void pfatal_with_name (const char *) ATTRIBUTE_NORETURN;
 extern void set_input (const char *);
diff --git a/gcc/java/jvspec.c b/gcc/java/jvspec.c
index d4efb73..518aa4d 100644
--- a/gcc/java/jvspec.c
+++ b/gcc/java/jvspec.c
@@ -629,7 +629,7 @@  lang_specific_pre_link (void)
      class name.  Append dummy `.c' that can be stripped by set_input so %b
      is correct.  */ 
   set_input (concat (main_class_name, "main.c", NULL));
-  err = do_spec (jvgenmain_spec);
+  err = do_spec (jvgenmain_spec, 0);
   if (err == 0)
     {
       /* Shift the outfiles array so the generated main comes first.
diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index 2e4c698..d63e56a 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -95,7 +95,7 @@ 
    */
 #undef LT_OBJDIR
 
-/* Define to offload targets, separated by commas. */
+/* Define to offload targets, separated by colons. */
 #undef OFFLOAD_TARGETS
 
 /* Name of package */
diff --git a/libgomp/configure b/libgomp/configure
index 74d4e82..36ae548 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15236,10 +15236,8 @@  if test x"$enable_offload_targets" != x; then
     tgt=`echo $tgt | sed 's/=.*//'`
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
-	tgt_name=intelmic
 	;;
       nvptx*)
-        tgt_name=nvptx
 	PLUGIN_NVPTX=$tgt
 	PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
 	PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
@@ -15282,9 +15280,9 @@  rm -f core conftest.err conftest.$ac_objext \
 	;;
     esac
     if test x"$offload_targets" = x; then
-      offload_targets=$tgt_name
+      offload_targets=$tgt
     else
-      offload_targets=$offload_targets,$tgt_name
+      offload_targets=$offload_targets:$tgt
     fi
     if test x"$tgt_dir" != x; then
       offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 24fbb94..5da4fa7 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -48,7 +48,8 @@  enum offload_target_type
   OFFLOAD_TARGET_TYPE_HOST = 2,
   /* OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3 removed.  */
   OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5,
-  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
+  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6,
+  OFFLOAD_TARGET_TYPE_HWM
 };
 
 /* Auxiliary struct, used for transferring pairs of addresses from plugin
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 9c8b1fb..e945851 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -739,6 +739,7 @@  extern void gomp_free_thread (void *);
 
 extern void gomp_init_targets_once (void);
 extern int gomp_get_num_devices (void);
+extern bool gomp_offload_target_enabled_p (enum offload_target_type);
 extern void gomp_target_task_fn (void *);
 
 typedef struct splay_tree_node_s *splay_tree_node;
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 2153661..05d5195 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -286,6 +286,7 @@  GOMP_4.5 {
 	GOMP_loop_ull_doacross_static_start;
 	GOMP_doacross_ull_post;
 	GOMP_doacross_ull_wait;
+	GOMP_enable_offload_targets;
 } GOMP_4.0.1;
 
 OACC_2.0 {
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index c28ad21..cc19767 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -247,6 +247,7 @@  extern void GOMP_single_copy_end (void *);
 
 /* target.c */
 
+extern void GOMP_enable_offload_targets (const char *);
 extern void GOMP_target (int, void (*) (void *), const void *,
 			 size_t, void **, size_t *, unsigned char *);
 extern void GOMP_target_41 (int, void (*) (void *), size_t, void **, size_t *,
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index a0e62a4..2b357e1 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -122,7 +122,9 @@  resolve_device (acc_device_t d, bool fail_is_error)
       {
 	if (goacc_device_type)
 	  {
-	    /* Lookup the named device.  */
+	    /* Lookup the device that has been explicitly named, so do not pay
+	       attention to gomp_offload_target_enabled_p.  (That is, hard
+	       error if not actually enabled.)  */
 	    while (++d != _ACC_device_hwm)
 	      if (dispatchers[d]
 		  && !strcasecmp (goacc_device_type,
@@ -148,8 +150,14 @@  resolve_device (acc_device_t d, bool fail_is_error)
     case acc_device_not_host:
       /* Find the first available device after acc_device_not_host.  */
       while (++d != _ACC_device_hwm)
-	if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0)
+	if (dispatchers[d]
+	    && dispatchers[d]->get_num_devices_func () > 0
+	    /* No device has been explicitly named, so pay attention to
+	       gomp_offload_target_enabled_p, to not decide on an offload
+	       target that has not been enabled.  */
+	    && gomp_offload_target_enabled_p (dispatchers[d]->type))
 	  goto found;
+      /* No non-host device found.  */
       if (d_arg == acc_device_default)
 	{
 	  d = acc_device_host;
@@ -164,9 +172,6 @@  resolve_device (acc_device_t d, bool fail_is_error)
         return NULL;
       break;
 
-    case acc_device_host:
-      break;
-
     default:
       if (d > _ACC_device_hwm)
 	{
@@ -181,7 +186,8 @@  resolve_device (acc_device_t d, bool fail_is_error)
 
   assert (d != acc_device_none
 	  && d != acc_device_default
-	  && d != acc_device_not_host);
+	  && d != acc_device_not_host
+	  && d < _ACC_device_hwm);
 
   if (dispatchers[d] == NULL && fail_is_error)
     {
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index ad70dd1..a1bfec6 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -92,10 +92,8 @@  if test x"$enable_offload_targets" != x; then
     tgt=`echo $tgt | sed 's/=.*//'`
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
-	tgt_name=intelmic
 	;;
       nvptx*)
-        tgt_name=nvptx
 	PLUGIN_NVPTX=$tgt
 	PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
 	PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
@@ -127,9 +125,9 @@  if test x"$enable_offload_targets" != x; then
 	;;
     esac
     if test x"$offload_targets" = x; then
-      offload_targets=$tgt_name
+      offload_targets=$tgt
     else
-      offload_targets=$offload_targets,$tgt_name
+      offload_targets=$offload_targets:$tgt
     fi
     if test x"$tgt_dir" != x; then
       offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
@@ -141,7 +139,7 @@  if test x"$enable_offload_targets" != x; then
   done
 fi
 AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",
-  [Define to offload targets, separated by commas.])
+  [Define to offload targets, separated by colons.])
 AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
   [Define to 1 if the NVIDIA plugin is built, 0 if not.])
diff --git a/libgomp/target.c b/libgomp/target.c
index b767410..df51bfb 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -72,6 +72,9 @@  static int num_offload_images;
 /* Array of descriptors for all available devices.  */
 static struct gomp_device_descr *devices;
 
+/* Set of enabled devices.  */
+static bool devices_enabled[OFFLOAD_TARGET_TYPE_HWM];
+
 /* Total number of available devices.  */
 static int num_devices;
 
@@ -123,17 +126,27 @@  gomp_get_num_devices (void)
 }
 
 static struct gomp_device_descr *
-resolve_device (int device_id)
+resolve_device (int device)
 {
-  if (device_id == GOMP_DEVICE_ICV)
+  int device_id;
+  if (device == GOMP_DEVICE_ICV)
     {
       struct gomp_task_icv *icv = gomp_icv (false);
       device_id = icv->default_device_var;
     }
+  else
+    device_id = device;
 
   if (device_id < 0 || device_id >= gomp_get_num_devices ())
     return NULL;
 
+  /* If the device specified by the device-var ICV is not actually enabled,
+     don't try use it (which will fail if it doesn't have offload data
+     available), and use host fallback instead.  */
+  if (device == GOMP_DEVICE_ICV
+      && !gomp_offload_target_enabled_p (devices[device_id].type))
+    return NULL;
+
   gomp_mutex_lock (&devices[device_id].lock);
   if (!devices[device_id].is_initialized)
     gomp_init_device (&devices[device_id]);
@@ -1063,6 +1076,8 @@  void
 GOMP_offload_register_ver (unsigned version, const void *host_table,
 			   int target_type, const void *target_data)
 {
+  gomp_debug (0, "%s (%#x, %d)\n", __FUNCTION__, version, target_type);
+
   int i;
 
   if (GOMP_VERSION_LIB (version) > GOMP_VERSION)
@@ -1100,6 +1115,18 @@  void
 GOMP_offload_register (const void *host_table, int target_type,
 		       const void *target_data)
 {
+  gomp_debug (0, "%s (%d)\n", __FUNCTION__, target_type);
+
+  gomp_mutex_lock (&register_lock);
+  /* If we're seeing this function called, then default to the old behavior of
+     enabling all offload targets: this is what old executables and shared
+     libraries expect.  */
+  for (enum offload_target_type type = 0;
+       type < OFFLOAD_TARGET_TYPE_HWM;
+       ++type)
+    devices_enabled[type] = true;
+  gomp_mutex_unlock (&register_lock);
+
   GOMP_offload_register_ver (0, host_table, target_type, target_data);
 }
 
@@ -1111,6 +1138,8 @@  void
 GOMP_offload_unregister_ver (unsigned version, const void *host_table,
 			     int target_type, const void *target_data)
 {
+  gomp_debug (0, "%s (%#x, %d)\n", __FUNCTION__, version, target_type);
+
   int i;
 
   gomp_mutex_lock (&register_lock);
@@ -1141,6 +1170,8 @@  void
 GOMP_offload_unregister (const void *host_table, int target_type,
 			 const void *target_data)
 {
+  gomp_debug (0, "%s (%d)\n", __FUNCTION__, target_type);
+
   GOMP_offload_unregister_ver (0, host_table, target_type, target_data);
 }
 
@@ -1213,6 +1244,24 @@  gomp_fini_device (struct gomp_device_descr *devicep)
   devicep->is_initialized = false;
 }
 
+/* Has the offload target type TYPE been enabled?
+
+   We cannot verify that *all* offload data is available that could possibly be
+   required, so if we later find any offload data missing for this offload
+   target, then that's user error.  */
+
+attribute_hidden bool
+gomp_offload_target_enabled_p (enum offload_target_type type)
+{
+  bool ret;
+
+  gomp_mutex_lock (&register_lock);
+  ret = devices_enabled[type];
+  gomp_mutex_unlock (&register_lock);
+
+  return ret;
+}
+
 /* Host fallback for GOMP_target{,_41} routines.  */
 
 static void
@@ -2071,6 +2120,8 @@  static bool
 gomp_load_plugin_for_device (struct gomp_device_descr *device,
 			     const char *plugin_name)
 {
+  gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, plugin_name);
+
   const char *err = NULL, *last_missing = NULL;
 
   void *plugin_handle = dlopen (plugin_name, RTLD_LAZY);
@@ -2169,6 +2220,78 @@  gomp_load_plugin_for_device (struct gomp_device_descr *device,
   return 0;
 }
 
+/* Return the corresponding offload target type for the offload target name
+   OFFLOAD_TARGET, or 0 if unknown.  */
+
+static enum offload_target_type
+offload_target_to_type (const char *offload_target)
+{
+  if (strstr (offload_target, "-intelmic") != NULL)
+    return OFFLOAD_TARGET_TYPE_INTEL_MIC;
+  else if (strncmp (offload_target, "nvptx", 5) == 0)
+    return OFFLOAD_TARGET_TYPE_NVIDIA_PTX;
+  else
+    return 0;
+}
+
+/* Return the corresponding plugin name for the offload target type TYPE, or
+   NULL if unknown.  */
+
+static const char *
+offload_target_type_to_plugin_name (enum offload_target_type type)
+{
+  switch (type)
+    {
+    case OFFLOAD_TARGET_TYPE_INTEL_MIC:
+      return "intelmic";
+    case OFFLOAD_TARGET_TYPE_NVIDIA_PTX:
+      return "nvptx";
+    default:
+      return NULL;
+    }
+}
+
+/* Enable the specified OFFLOAD_TARGETS, the set passed to the compiler at link
+   time.  */
+
+void
+GOMP_enable_offload_targets (const char *offload_targets)
+{
+  gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, offload_targets);
+
+  char *offload_targets_dup = strdup (offload_targets);
+  if (offload_targets_dup == NULL)
+    gomp_fatal ("Out of memory");
+
+  gomp_mutex_lock (&register_lock);
+
+  char *cur = offload_targets_dup;
+  while (cur)
+    {
+      char *next = strchr (cur, ':');
+      if (next != NULL)
+	{
+	  *next = '\0';
+	  ++next;
+	}
+      enum offload_target_type type = offload_target_to_type (cur);
+      if (type == 0)
+	{
+	  /* An unknown offload target has been requested; ignore it.  This
+	     makes us (future-)proof if offload targets are requested that
+	     are not supported in this build of libgomp.  */
+	}
+      else
+	devices_enabled[type] = true;
+
+      cur = next;
+    }
+
+  gomp_mutex_unlock (&register_lock);
+
+  free (offload_targets_dup);
+}
+
 /* This function initializes the runtime needed for offloading.
    It parses the list of offload targets and tries to load the plugins for
    these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -2176,13 +2299,13 @@  gomp_load_plugin_for_device (struct gomp_device_descr *device,
    corresponding devices, first the GOMP_OFFLOAD_CAP_OPENMP_400 ones, follows
    by the others.  */
 
+static const char *gomp_plugin_prefix ="libgomp-plugin-";
+static const char *gomp_plugin_suffix = SONAME_SUFFIX (1);
+
 static void
 gomp_target_init (void)
 {
-  const char *prefix ="libgomp-plugin-";
-  const char *suffix = SONAME_SUFFIX (1);
   const char *cur, *next;
-  char *plugin_name;
   int i, new_num_devices;
 
   num_devices = 0;
@@ -2192,44 +2315,58 @@  gomp_target_init (void)
   if (*cur)
     do
       {
-	struct gomp_device_descr current_device;
-
-	next = strchr (cur, ',');
-
-	plugin_name = (char *) malloc (1 + (next ? next - cur : strlen (cur))
-				       + strlen (prefix) + strlen (suffix));
-	if (!plugin_name)
-	  {
-	    num_devices = 0;
-	    break;
-	  }
-
-	strcpy (plugin_name, prefix);
-	strncat (plugin_name, cur, next ? next - cur : strlen (cur));
-	strcat (plugin_name, suffix);
+	next = strchr (cur, ':');
+	/* If no other offload target following...  */
+	if (next == NULL)
+	  /* ..., point to the terminating NUL character.  */
+	  next = strchr (cur, '\0');
+
+	size_t gomp_plugin_prefix_len = strlen (gomp_plugin_prefix);
+	size_t cur_len = next - cur;
+	size_t gomp_plugin_suffix_len = strlen (gomp_plugin_suffix);
+	char *plugin_name
+	  = gomp_realloc_unlock (NULL, (gomp_plugin_prefix_len
+					+ cur_len
+					+ gomp_plugin_suffix_len
+					+ 1));
+	memcpy (plugin_name, gomp_plugin_prefix, gomp_plugin_prefix_len);
+	memcpy (plugin_name + gomp_plugin_prefix_len, cur, cur_len);
+	/* NUL-terminate the string here...  */
+	plugin_name[gomp_plugin_prefix_len + cur_len] = '\0';
+	/* ..., so that we can then use it to translate the offload target to
+	   the plugin name...  */
+	enum offload_target_type type
+	  = offload_target_to_type (plugin_name + gomp_plugin_prefix_len);
+	const char *cur_plugin_name
+	  = offload_target_type_to_plugin_name (type);
+	size_t cur_plugin_name_len = strlen (cur_plugin_name);
+	assert (cur_plugin_name_len <= cur_len);
+	/* ..., and then rewrite it.  */
+	memcpy (plugin_name + gomp_plugin_prefix_len,
+		cur_plugin_name, cur_plugin_name_len);
+	memcpy (plugin_name + gomp_plugin_prefix_len + cur_plugin_name_len,
+		gomp_plugin_suffix, gomp_plugin_suffix_len);
+	plugin_name[gomp_plugin_prefix_len
+		    + cur_plugin_name_len
+		    + gomp_plugin_suffix_len] = '\0';
 
+	struct gomp_device_descr current_device;
 	if (gomp_load_plugin_for_device (&current_device, plugin_name))
 	  {
 	    new_num_devices = current_device.get_num_devices_func ();
 	    if (new_num_devices >= 1)
 	      {
-		/* Augment DEVICES and NUM_DEVICES.  */
-
-		devices = realloc (devices, (num_devices + new_num_devices)
-				   * sizeof (struct gomp_device_descr));
-		if (!devices)
-		  {
-		    num_devices = 0;
-		    free (plugin_name);
-		    break;
-		  }
-
 		current_device.name = current_device.get_name_func ();
 		/* current_device.capabilities has already been set.  */
 		current_device.type = current_device.get_type_func ();
 		current_device.mem_map.root = NULL;
 		current_device.is_initialized = false;
 		current_device.openacc.data_environ = NULL;
+
+		/* Augment DEVICES and NUM_DEVICES.  */
+		devices = gomp_realloc_unlock
+		  (devices, ((num_devices + new_num_devices)
+			     * sizeof (struct gomp_device_descr)));
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.target_id = i;
@@ -2243,18 +2380,13 @@  gomp_target_init (void)
 	free (plugin_name);
 	cur = next + 1;
       }
-    while (next);
+    while (*next);
 
   /* In DEVICES, sort the GOMP_OFFLOAD_CAP_OPENMP_400 ones first, and set
      NUM_DEVICES_OPENMP.  */
   struct gomp_device_descr *devices_s
-    = malloc (num_devices * sizeof (struct gomp_device_descr));
-  if (!devices_s)
-    {
-      num_devices = 0;
-      free (devices);
-      devices = NULL;
-    }
+    = gomp_realloc_unlock (NULL,
+			   num_devices * sizeof (struct gomp_device_descr));
   num_devices_openmp = 0;
   for (i = 0; i < num_devices; i++)
     if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 6dc1e8e..07f85ef 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -37,24 +37,21 @@  load_gcc_lib fortran-modules.exp
 load_file libgomp-test-support.exp
 
 # Populate offload_targets_s (offloading targets separated by a space), and
-# offload_targets_s_openacc (the same, but with OpenACC names; OpenACC spells
-# some of them a little differently).
-set offload_targets_s [split $offload_targets ","]
+# offload_targets_s_openacc (those suitable for OpenACC).
+set offload_targets_s [split $offload_targets ":"]
 set offload_targets_s_openacc {}
 foreach offload_target_openacc $offload_targets_s {
-    switch $offload_target_openacc {
-	intelmic {
+    switch -glob $offload_target_openacc {
+	*-intelmic* {
 	    # Skip; will all FAIL because of missing
 	    # GOMP_OFFLOAD_CAP_OPENACC_200.
 	    continue
 	}
-	nvptx {
-	    set offload_target_openacc "nvidia"
-	}
     }
     lappend offload_targets_s_openacc "$offload_target_openacc"
 }
-lappend offload_targets_s_openacc "host"
+# Host fallback.
+lappend offload_targets_s_openacc "disable"
 
 set dg-do-what-default run
 
@@ -135,7 +132,7 @@  proc libgomp_init { args } {
     # Add liboffloadmic build directory in LD_LIBRARY_PATH to support
     # non-fallback testing for Intel MIC targets
     global offload_targets
-    if { [string match "*,intelmic,*" ",$offload_targets,"] } {
+    if { [string match "*:*-intelmic*:*" ":$offload_targets:"] } {
 	append always_ld_library_path ":${blddir}/../liboffloadmic/.libs"
 	append always_ld_library_path ":${blddir}/../liboffloadmic/plugin/.libs"
 	# libstdc++ is required by liboffloadmic
@@ -346,15 +343,14 @@  proc check_effective_target_openacc_nvidia_accel_present { } {
 }
 
 # Return 1 if at least one nvidia board is present, and the nvidia device type
-# is selected by default by means of setting the environment variable
-# ACC_DEVICE_TYPE.
+# is selected by default.
 
 proc check_effective_target_openacc_nvidia_accel_selected { } {
     if { ![check_effective_target_openacc_nvidia_accel_present] } {
 	return 0;
     }
     global offload_target_openacc
-    if { $offload_target_openacc == "nvidia" } {
+    if { [string match "nvptx*" $offload_target_openacc] } {
         return 1;
     }
     return 0;
@@ -364,7 +360,7 @@  proc check_effective_target_openacc_nvidia_accel_selected { } {
 
 proc check_effective_target_openacc_host_selected { } {
     global offload_target_openacc
-    if { $offload_target_openacc == "host" } {
+    if { $offload_target_openacc == "disable" } {
         return 1;
     }
     return 0;
diff --git a/libgomp/testsuite/libgomp.c++/target-1-foffload_disable.C b/libgomp/testsuite/libgomp.c++/target-1-foffload_disable.C
new file mode 100644
index 0000000..15b9432
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/target-1-foffload_disable.C
@@ -0,0 +1,3 @@ 
+/* { dg-additional-options "-foffload=disable" } */
+
+#include "target-1.C"
diff --git a/libgomp/testsuite/libgomp.c++/target-foffload_disable.C b/libgomp/testsuite/libgomp.c++/target-foffload_disable.C
new file mode 100644
index 0000000..c07dea1
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/target-foffload_disable.C
@@ -0,0 +1,3 @@ 
+/* { dg-additional-options "-foffload=disable" } */
+
+#include "../libgomp.c/target-foffload_disable.c"
diff --git a/libgomp/testsuite/libgomp.c/target-1-foffload_disable.c b/libgomp/testsuite/libgomp.c/target-1-foffload_disable.c
new file mode 100644
index 0000000..177cceb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-1-foffload_disable.c
@@ -0,0 +1,3 @@ 
+/* { dg-additional-options "-foffload=disable" } */
+
+#include "target-1.c"
diff --git a/libgomp/testsuite/libgomp.c/target-foffload_disable.c b/libgomp/testsuite/libgomp.c/target-foffload_disable.c
new file mode 100644
index 0000000..4a712da
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-foffload_disable.c
@@ -0,0 +1,18 @@ 
+/* { dg-additional-options "-foffload=disable" } */
+
+#include <omp.h>
+
+int main()
+{
+  if (!omp_is_initial_device())
+    __builtin_abort();
+#pragma omp target
+  {
+    if (!omp_is_initial_device())
+      __builtin_abort();
+  }
+  if (!omp_is_initial_device())
+    __builtin_abort();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/target-foffload_disable.f b/libgomp/testsuite/libgomp.fortran/target-foffload_disable.f
new file mode 100644
index 0000000..0d60534
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/target-foffload_disable.f
@@ -0,0 +1,14 @@ 
+!     { dg-additional-options "-foffload=disable" }
+
+      PROGRAM MAIN
+      IMPLICIT NONE
+
+      INCLUDE "omp_lib.h"
+
+      IF (.NOT. OMP_IS_INITIAL_DEVICE()) CALL ABORT
+!$OMP TARGET
+      IF (.NOT. OMP_IS_INITIAL_DEVICE()) CALL ABORT
+!$OMP END TARGET
+      IF (.NOT. OMP_IS_INITIAL_DEVICE()) CALL ABORT
+
+      END
diff --git a/libgomp/testsuite/libgomp.fortran/target1-foffload_disable.f90 b/libgomp/testsuite/libgomp.fortran/target1-foffload_disable.f90
new file mode 100644
index 0000000..005328e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/target1-foffload_disable.f90
@@ -0,0 +1,3 @@ 
+! { dg-additional-options "-cpp -foffload=disable" }
+
+#include "target1.f90"
diff --git a/libgomp/testsuite/libgomp.oacc-c++/c++.exp b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
index 88b0269..aa545a2 100644
--- a/libgomp/testsuite/libgomp.oacc-c++/c++.exp
+++ b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
@@ -75,13 +75,12 @@  if { $lang_test_file_found } {
 
     # Test OpenACC with available accelerators.
     foreach offload_target_openacc $offload_targets_s_openacc {
-	set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
-
-	switch $offload_target_openacc {
-	    host {
+	switch -glob $offload_target_openacc {
+	    disable {
 		set acc_mem_shared 1
+		set tagopt "-DACC_DEVICE_TYPE_host=1"
 	    }
-	    nvidia {
+	    nvptx* {
 		if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		    # Don't bother; execution testing is going to FAIL.
 		    untested "$subdir $offload_target_openacc offloading"
@@ -95,14 +94,13 @@  if { $lang_test_file_found } {
 		lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/libgomp.oacc-c-c++-common"
 
 		set acc_mem_shared 0
+		set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	    }
 	    default {
 		set acc_mem_shared 0
 	    }
 	}
-	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
-
-	setenv ACC_DEVICE_TYPE $offload_target_openacc
+	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
 	dg-runtest $tests "$tagopt" "$libstdcxx_includes $DEFAULT_CFLAGS"
     }
diff --git a/libgomp/testsuite/libgomp.oacc-c/c.exp b/libgomp/testsuite/libgomp.oacc-c/c.exp
index 5020e6a..9d2065f 100644
--- a/libgomp/testsuite/libgomp.oacc-c/c.exp
+++ b/libgomp/testsuite/libgomp.oacc-c/c.exp
@@ -38,13 +38,13 @@  set_ld_library_path_env_vars
 set SAVE_ALWAYS_CFLAGS "$ALWAYS_CFLAGS"
 foreach offload_target_openacc $offload_targets_s_openacc {
     set ALWAYS_CFLAGS "$SAVE_ALWAYS_CFLAGS"
-    set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
 
-    switch $offload_target_openacc {
-	host {
+    switch -glob $offload_target_openacc {
+	disable {
 	    set acc_mem_shared 1
+	    set tagopt "-DACC_DEVICE_TYPE_host=1"
 	}
-	nvidia {
+	nvptx* {
 	    if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		# Don't bother; execution testing is going to FAIL.
 		untested "$subdir $offload_target_openacc offloading"
@@ -58,14 +58,13 @@  foreach offload_target_openacc $offload_targets_s_openacc {
 	    lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/libgomp.oacc-c-c++-common"
 
 	    set acc_mem_shared 0
+	    set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	}
 	default {
 	    set acc_mem_shared 0
 	}
     }
-    set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
-
-    setenv ACC_DEVICE_TYPE $offload_target_openacc
+    set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
     dg-runtest $tests "$tagopt" $DEFAULT_CFLAGS
 }
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
index 2d6b647..3f678ba 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
+++ b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
@@ -67,13 +67,12 @@  if { $lang_test_file_found } {
 
     # Test OpenACC with available accelerators.
     foreach offload_target_openacc $offload_targets_s_openacc {
-	set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
-
-	switch $offload_target_openacc {
-	    host {
+	switch -glob $offload_target_openacc {
+	    disable {
 		set acc_mem_shared 1
+		set tagopt "-DACC_DEVICE_TYPE_host=1"
 	    }
-	    nvidia {
+	    nvptx* {
 		if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		    # Don't bother; execution testing is going to FAIL.
 		    untested "$subdir $offload_target_openacc offloading"
@@ -81,14 +80,13 @@  if { $lang_test_file_found } {
 		}
 
 		set acc_mem_shared 0
+		set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	    }
 	    default {
 		set acc_mem_shared 0
 	    }
 	}
-	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
-
-	setenv ACC_DEVICE_TYPE $offload_target_openacc
+	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
 	# For Fortran we're doing torture testing, as Fortran has far more tests
 	# with arrays etc. that testing just -O0 or -O2 is insufficient, that is