The nvptx port [1/11+] indirect jumps

Message ID	544519D8.70606@codesourcery.com
State	New
Headers	show Return-Path: <gcc-patches-return-381143-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:subject:references :in-reply-to:content-type; q=dns; s=default; b=bgt9q6CMd+RY8wIM5 62WbN4Cvs2HEyaoVtUXjpQv6y/wrWW6/+ZqC1+vNkyL28RfDxAvNq1z131xf6mod 4BkfQzsfr//3huTUap7wPiyoFDF0nKmPyAOR8I97FA4gUT5Ar9+pS1ir2VNse7wU ISk4nJI+JGG06Kjs9vc3tNfGcM= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org Message-ID: <544519D8.70606@codesourcery.com> Date: Mon, 20 Oct 2014 16:19:04 +0200 From: Bernd Schmidt <bernds@codesourcery.com> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.8.0 MIME-Version: 1.0 To: GCC Patches <gcc-patches@gcc.gnu.org> Subject: The nvptx port [1/11+] indirect jumps References: <54451994.9070209@codesourcery.com> In-Reply-To: <54451994.9070209@codesourcery.com> Content-Type: multipart/mixed; boundary="------------080301090103000807000206"

Bernd Schmidt Oct. 20, 2014, 2:19 p.m. UTC

ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be
defined.  Add a sorry.


Bernd

Jeff Law Oct. 21, 2014, 6:26 p.m. UTC | #1

On 10/20/14 14:19, Bernd Schmidt wrote:
> ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be
> defined.  Add a sorry.
>
>
> Bernd
>
> 001-indjumps.diff
>
>
> 	gcc/
> 	* optabs.c (emit_indirect_jump): Test HAVE_indirect_jump and emit a
> 	sorry if necessary.
So doesn't this imply no hot-cold partitioning since we use indirect 
jumps to get across the partition?  Similarly doesn't this imply other 
missing features (setjmp/longjmp, nonlocal gotos, computed jumps?

Do you need some mechanism to ensure that hot/cold partitioning isn't 
enabled?  Do you need some kind of message specific to the other 
features, or are we going to assume that the user will map from the 
indirect jump message back to the use of setjmp/longjmp or something 
similar?

How are switches implemented (if at all)?

Jeff

Bernd Schmidt Oct. 21, 2014, 9 p.m. UTC | #2

On 10/21/2014 08:26 PM, Jeff Law wrote:
>>     * optabs.c (emit_indirect_jump): Test HAVE_indirect_jump and emit a
>>     sorry if necessary.
> So doesn't this imply no hot-cold partitioning since we use indirect
> jumps to get across the partition?  Similarly doesn't this imply other
> missing features (setjmp/longjmp, nonlocal gotos, computed jumps?

Pretty much yes to all.

> Do you need some mechanism to ensure that hot/cold partitioning isn't
> enabled?

I guess I could clear flag_reorder_blocks_and_partition in 
nvptx_option_override. The problem hasn't come up so far.

> Do you need some kind of message specific to the other
> features, or are we going to assume that the user will map from the
> indirect jump message back to the use of setjmp/longjmp or something
> similar?

I have some sorry calls in things like a dummy nonlocal_goto pattern. It 
doesn't quite manage to catch everything without an ICE yet though.

> How are switches implemented (if at all)?

Comparison tree as you'd generate for small switches on all other targets.

Bernd

Jakub Jelinek Oct. 21, 2014, 9:30 p.m. UTC | #3

On Tue, Oct 21, 2014 at 11:00:35PM +0200, Bernd Schmidt wrote:
> On 10/21/2014 08:26 PM, Jeff Law wrote:
> >>    * optabs.c (emit_indirect_jump): Test HAVE_indirect_jump and emit a
> >>    sorry if necessary.
> >So doesn't this imply no hot-cold partitioning since we use indirect
> >jumps to get across the partition?  Similarly doesn't this imply other
> >missing features (setjmp/longjmp, nonlocal gotos, computed jumps?
> 
> Pretty much yes to all.
> 
> >Do you need some mechanism to ensure that hot/cold partitioning isn't
> >enabled?
> 
> I guess I could clear flag_reorder_blocks_and_partition in
> nvptx_option_override. The problem hasn't come up so far.
> 
> >Do you need some kind of message specific to the other
> >features, or are we going to assume that the user will map from the
> >indirect jump message back to the use of setjmp/longjmp or something
> >similar?
> 
> I have some sorry calls in things like a dummy nonlocal_goto pattern. It
> doesn't quite manage to catch everything without an ICE yet though.

With all the sorry additions, what is actually the plan for OpenMP (dunno how
OpenACC is different in this regard)?
At least for OpenMP, the best would be if the #pragma omp target regions
and/or #pragma omp declare target functions contain anything a particular
offloading accelerator can't handle, instead of failing the whole
compilation perhaps just emit some at least by default non-fatal warning
and not emit anything for the particular offloading target, which would mean
either host fallback, or, if some other offloading target succeeded, just
that target.
The unsupported stuff can be machine dependent builtins that can't be
transformed, or e.g. the various things you've listed as unsupportable by
the PTX backend right now.

	Jakub

Bernd Schmidt Oct. 21, 2014, 9:32 p.m. UTC | #4

On 10/21/2014 11:30 PM, Jakub Jelinek wrote:
> At least for OpenMP, the best would be if the #pragma omp target regions
> and/or #pragma omp declare target functions contain anything a particular
> offloading accelerator can't handle, instead of failing the whole
> compilation perhaps just emit some at least by default non-fatal warning
> and not emit anything for the particular offloading target, which would mean
> either host fallback, or, if some other offloading target succeeded, just
> that target.

I guess a test could be added to mkoffload if gcc were to return a 
different value for a sorry vs. any other compilation failure. The tool 
could then choose not to produce offloading support for that target.


Bernd

Richard Biener Oct. 22, 2014, 8:18 a.m. UTC | #5

On Tue, Oct 21, 2014 at 11:32 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 10/21/2014 11:30 PM, Jakub Jelinek wrote:
>>
>> At least for OpenMP, the best would be if the #pragma omp target regions
>> and/or #pragma omp declare target functions contain anything a particular
>> offloading accelerator can't handle, instead of failing the whole
>> compilation perhaps just emit some at least by default non-fatal warning
>> and not emit anything for the particular offloading target, which would
>> mean
>> either host fallback, or, if some other offloading target succeeded, just
>> that target.
>
>
> I guess a test could be added to mkoffload if gcc were to return a different
> value for a sorry vs. any other compilation failure. The tool could then
> choose not to produce offloading support for that target.

But that would be for the whole file instead of for the specific region?

So maybe we should produce one LTO offload object for each offload
function and make the symbols they are supposed to provide weak
so a fail doesn't end up failing to link the main program?

Looks like this gets somewhat awkward with the LTO setup.

Richard.

>
> Bernd
>

Jakub Jelinek Oct. 22, 2014, 8:32 a.m. UTC | #6

On Wed, Oct 22, 2014 at 10:18:49AM +0200, Richard Biener wrote:
> On Tue, Oct 21, 2014 at 11:32 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> > On 10/21/2014 11:30 PM, Jakub Jelinek wrote:
> >>
> >> At least for OpenMP, the best would be if the #pragma omp target regions
> >> and/or #pragma omp declare target functions contain anything a particular
> >> offloading accelerator can't handle, instead of failing the whole
> >> compilation perhaps just emit some at least by default non-fatal warning
> >> and not emit anything for the particular offloading target, which would
> >> mean
> >> either host fallback, or, if some other offloading target succeeded, just
> >> that target.
> >
> >
> > I guess a test could be added to mkoffload if gcc were to return a different
> > value for a sorry vs. any other compilation failure. The tool could then
> > choose not to produce offloading support for that target.
> 
> But that would be for the whole file instead of for the specific region?
> 
> So maybe we should produce one LTO offload object for each offload
> function and make the symbols they are supposed to provide weak
> so a fail doesn't end up failing to link the main program?
> 
> Looks like this gets somewhat awkward with the LTO setup.

I don't think we want to do a fine-grained granularity here, it will only
lead to significant nightmares.  E.g. a target region can call other target
functions, if a target function it calls (perhaps directly through a series
of other target functions, perhaps indirectly through function pointers
etc.) can't be supported by the host, you'd need to give up on offloading
all target regions that do or could invoke that.  That can be in another TU
within the same shared library etc.  And, if some regions are emitted and
others are not, #pragma omp target data will behave less predictably and
more confusingly, right now it can test, does this library have usable
offloading for everything it provides (i.e. libgomp would ask the plugin to
initialize offloading from the current shared library if not already done,
and if successful, say it supports offloading for the particular device and
map variables to that device as requested, otherwise it would just assume
only host fallback is possible and not really map anything).  When a target
region is hit, from either within the target data region or elsewhere, it is
already figured out if it has to fallback to host or not.

Now, if you have fine-grained offloading, 33.2% of target regions being
offloadable, the rest not, what would you actually do in target data region?
It doesn't generically know what target regions will be encountered.
So act as if offloading perhaps was possible?  But then at each target
region find out if it is really possible?

IMHO people that care about performance will use target regions with care,
with the offloading targets that they care about in mind, for those that
don't care about that, either they will be lucky and things will work out
all, or they will just end up with host fallback.

	Jakub

Thomas Schwinge Oct. 22, 2014, 8:34 a.m. UTC | #7

Hi!

On Wed, 22 Oct 2014 10:18:49 +0200, Richard Biener <richard.guenther@gmail.com> wrote:
> On Tue, Oct 21, 2014 at 11:32 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> > On 10/21/2014 11:30 PM, Jakub Jelinek wrote:
> >>
> >> At least for OpenMP, the best would be if the #pragma omp target regions
> >> and/or #pragma omp declare target functions contain anything a particular
> >> offloading accelerator can't handle, instead of failing the whole
> >> compilation perhaps just emit some at least by default non-fatal warning
> >> and not emit anything for the particular offloading target, which would
> >> mean
> >> either host fallback, or, if some other offloading target succeeded, just
> >> that target.
> >
> >
> > I guess a test could be added to mkoffload if gcc were to return a different
> > value for a sorry vs. any other compilation failure. The tool could then
> > choose not to produce offloading support for that target.
> 
> But that would be for the whole file instead of for the specific region?

I'm not sure that's what you're suggesting, but at least on non-shared
memory offloading devices, you can't switch arbitrarily between
offloading device(s) and host-fallback, for you have to do data
management between the non-shared memories.

> So maybe we should produce one LTO offload object for each offload
> function and make the symbols they are supposed to provide weak
> so a fail doesn't end up failing to link the main program?
> 
> Looks like this gets somewhat awkward with the LTO setup.


Grüße,
 Thomas

Richard Biener Oct. 22, 2014, 10:02 a.m. UTC | #8

On Wed, Oct 22, 2014 at 10:34 AM, Thomas Schwinge
<thomas@codesourcery.com> wrote:
> Hi!
>
> On Wed, 22 Oct 2014 10:18:49 +0200, Richard Biener <richard.guenther@gmail.com> wrote:
>> On Tue, Oct 21, 2014 at 11:32 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>> > On 10/21/2014 11:30 PM, Jakub Jelinek wrote:
>> >>
>> >> At least for OpenMP, the best would be if the #pragma omp target regions
>> >> and/or #pragma omp declare target functions contain anything a particular
>> >> offloading accelerator can't handle, instead of failing the whole
>> >> compilation perhaps just emit some at least by default non-fatal warning
>> >> and not emit anything for the particular offloading target, which would
>> >> mean
>> >> either host fallback, or, if some other offloading target succeeded, just
>> >> that target.
>> >
>> >
>> > I guess a test could be added to mkoffload if gcc were to return a different
>> > value for a sorry vs. any other compilation failure. The tool could then
>> > choose not to produce offloading support for that target.
>>
>> But that would be for the whole file instead of for the specific region?
>
> I'm not sure that's what you're suggesting, but at least on non-shared
> memory offloading devices, you can't switch arbitrarily between
> offloading device(s) and host-fallback, for you have to do data
> management between the non-shared memories.

Oh, I see.  For HSA we simply don't emit an offload variant for code
we cannot handle.  But only for those parts.

So it's only offload or fallback for other devices?  Thus also never
share work between both for example (run N threads on the CPU
and M threads on the offload target)?

Richard.

>> So maybe we should produce one LTO offload object for each offload
>> function and make the symbols they are supposed to provide weak
>> so a fail doesn't end up failing to link the main program?
>>
>> Looks like this gets somewhat awkward with the LTO setup.
>
>
> Grüße,
>  Thomas

Jakub Jelinek Oct. 22, 2014, 10:19 a.m. UTC | #9

On Wed, Oct 22, 2014 at 12:02:16PM +0200, Richard Biener wrote:
> > I'm not sure that's what you're suggesting, but at least on non-shared
> > memory offloading devices, you can't switch arbitrarily between
> > offloading device(s) and host-fallback, for you have to do data
> > management between the non-shared memories.
> 
> Oh, I see.  For HSA we simply don't emit an offload variant for code
> we cannot handle.  But only for those parts.
> 
> So it's only offload or fallback for other devices?  Thus also never

Yeah.

> share work between both for example (run N threads on the CPU
> and M threads on the offload target)?

I believe at least for the non-shared memory the OpenMP model wouldn't allow
that.  Of course, user can do the sharing explicitly (though OpenMP 4.0
doesn't have asynchronous target regions): one could e.g. run a couple of
host tasks on the offloading region with if (0) - forced host fallback,
ensure e.g. one team and one parallel thread in that case,
and then in one host task with if (1) and use as many teams and parallel
threads as available on the offloading device.

	Jakub

Bernd Schmidt Nov. 4, 2014, 3:32 p.m. UTC | #10

On 10/20/2014 04:19 PM, Bernd Schmidt wrote:
> ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be
> defined.  Add a sorry.

Looking back through all the mails it turns out this one wasn't approved 
yet. Ping?


Bernd

Richard Henderson Nov. 4, 2014, 3:43 p.m. UTC | #11

On 11/04/2014 04:32 PM, Bernd Schmidt wrote:
> On 10/20/2014 04:19 PM, Bernd Schmidt wrote:
>> ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be
>> defined.  Add a sorry.
> 
> Looking back through all the mails it turns out this one wasn't approved yet.
> Ping?

Ok.


r~

The nvptx port [1/11+] indirect jumps

Commit Message

Comments

Patch