Message ID | 544519D8.70606@codesourcery.com |
---|---|
State | New |
Headers | show |
On 10/20/14 14:19, Bernd Schmidt wrote: > ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be > defined. Add a sorry. > > > Bernd > > 001-indjumps.diff > > > gcc/ > * optabs.c (emit_indirect_jump): Test HAVE_indirect_jump and emit a > sorry if necessary. So doesn't this imply no hot-cold partitioning since we use indirect jumps to get across the partition? Similarly doesn't this imply other missing features (setjmp/longjmp, nonlocal gotos, computed jumps? Do you need some mechanism to ensure that hot/cold partitioning isn't enabled? Do you need some kind of message specific to the other features, or are we going to assume that the user will map from the indirect jump message back to the use of setjmp/longjmp or something similar? How are switches implemented (if at all)? Jeff
On 10/21/2014 08:26 PM, Jeff Law wrote: >> * optabs.c (emit_indirect_jump): Test HAVE_indirect_jump and emit a >> sorry if necessary. > So doesn't this imply no hot-cold partitioning since we use indirect > jumps to get across the partition? Similarly doesn't this imply other > missing features (setjmp/longjmp, nonlocal gotos, computed jumps? Pretty much yes to all. > Do you need some mechanism to ensure that hot/cold partitioning isn't > enabled? I guess I could clear flag_reorder_blocks_and_partition in nvptx_option_override. The problem hasn't come up so far. > Do you need some kind of message specific to the other > features, or are we going to assume that the user will map from the > indirect jump message back to the use of setjmp/longjmp or something > similar? I have some sorry calls in things like a dummy nonlocal_goto pattern. It doesn't quite manage to catch everything without an ICE yet though. > How are switches implemented (if at all)? Comparison tree as you'd generate for small switches on all other targets. Bernd
On Tue, Oct 21, 2014 at 11:00:35PM +0200, Bernd Schmidt wrote: > On 10/21/2014 08:26 PM, Jeff Law wrote: > >> * optabs.c (emit_indirect_jump): Test HAVE_indirect_jump and emit a > >> sorry if necessary. > >So doesn't this imply no hot-cold partitioning since we use indirect > >jumps to get across the partition? Similarly doesn't this imply other > >missing features (setjmp/longjmp, nonlocal gotos, computed jumps? > > Pretty much yes to all. > > >Do you need some mechanism to ensure that hot/cold partitioning isn't > >enabled? > > I guess I could clear flag_reorder_blocks_and_partition in > nvptx_option_override. The problem hasn't come up so far. > > >Do you need some kind of message specific to the other > >features, or are we going to assume that the user will map from the > >indirect jump message back to the use of setjmp/longjmp or something > >similar? > > I have some sorry calls in things like a dummy nonlocal_goto pattern. It > doesn't quite manage to catch everything without an ICE yet though. With all the sorry additions, what is actually the plan for OpenMP (dunno how OpenACC is different in this regard)? At least for OpenMP, the best would be if the #pragma omp target regions and/or #pragma omp declare target functions contain anything a particular offloading accelerator can't handle, instead of failing the whole compilation perhaps just emit some at least by default non-fatal warning and not emit anything for the particular offloading target, which would mean either host fallback, or, if some other offloading target succeeded, just that target. The unsupported stuff can be machine dependent builtins that can't be transformed, or e.g. the various things you've listed as unsupportable by the PTX backend right now. Jakub
On 10/21/2014 11:30 PM, Jakub Jelinek wrote: > At least for OpenMP, the best would be if the #pragma omp target regions > and/or #pragma omp declare target functions contain anything a particular > offloading accelerator can't handle, instead of failing the whole > compilation perhaps just emit some at least by default non-fatal warning > and not emit anything for the particular offloading target, which would mean > either host fallback, or, if some other offloading target succeeded, just > that target. I guess a test could be added to mkoffload if gcc were to return a different value for a sorry vs. any other compilation failure. The tool could then choose not to produce offloading support for that target. Bernd
On Tue, Oct 21, 2014 at 11:32 PM, Bernd Schmidt <bernds@codesourcery.com> wrote: > On 10/21/2014 11:30 PM, Jakub Jelinek wrote: >> >> At least for OpenMP, the best would be if the #pragma omp target regions >> and/or #pragma omp declare target functions contain anything a particular >> offloading accelerator can't handle, instead of failing the whole >> compilation perhaps just emit some at least by default non-fatal warning >> and not emit anything for the particular offloading target, which would >> mean >> either host fallback, or, if some other offloading target succeeded, just >> that target. > > > I guess a test could be added to mkoffload if gcc were to return a different > value for a sorry vs. any other compilation failure. The tool could then > choose not to produce offloading support for that target. But that would be for the whole file instead of for the specific region? So maybe we should produce one LTO offload object for each offload function and make the symbols they are supposed to provide weak so a fail doesn't end up failing to link the main program? Looks like this gets somewhat awkward with the LTO setup. Richard. > > Bernd >
On Wed, Oct 22, 2014 at 10:18:49AM +0200, Richard Biener wrote: > On Tue, Oct 21, 2014 at 11:32 PM, Bernd Schmidt <bernds@codesourcery.com> wrote: > > On 10/21/2014 11:30 PM, Jakub Jelinek wrote: > >> > >> At least for OpenMP, the best would be if the #pragma omp target regions > >> and/or #pragma omp declare target functions contain anything a particular > >> offloading accelerator can't handle, instead of failing the whole > >> compilation perhaps just emit some at least by default non-fatal warning > >> and not emit anything for the particular offloading target, which would > >> mean > >> either host fallback, or, if some other offloading target succeeded, just > >> that target. > > > > > > I guess a test could be added to mkoffload if gcc were to return a different > > value for a sorry vs. any other compilation failure. The tool could then > > choose not to produce offloading support for that target. > > But that would be for the whole file instead of for the specific region? > > So maybe we should produce one LTO offload object for each offload > function and make the symbols they are supposed to provide weak > so a fail doesn't end up failing to link the main program? > > Looks like this gets somewhat awkward with the LTO setup. I don't think we want to do a fine-grained granularity here, it will only lead to significant nightmares. E.g. a target region can call other target functions, if a target function it calls (perhaps directly through a series of other target functions, perhaps indirectly through function pointers etc.) can't be supported by the host, you'd need to give up on offloading all target regions that do or could invoke that. That can be in another TU within the same shared library etc. And, if some regions are emitted and others are not, #pragma omp target data will behave less predictably and more confusingly, right now it can test, does this library have usable offloading for everything it provides (i.e. libgomp would ask the plugin to initialize offloading from the current shared library if not already done, and if successful, say it supports offloading for the particular device and map variables to that device as requested, otherwise it would just assume only host fallback is possible and not really map anything). When a target region is hit, from either within the target data region or elsewhere, it is already figured out if it has to fallback to host or not. Now, if you have fine-grained offloading, 33.2% of target regions being offloadable, the rest not, what would you actually do in target data region? It doesn't generically know what target regions will be encountered. So act as if offloading perhaps was possible? But then at each target region find out if it is really possible? IMHO people that care about performance will use target regions with care, with the offloading targets that they care about in mind, for those that don't care about that, either they will be lucky and things will work out all, or they will just end up with host fallback. Jakub
Hi! On Wed, 22 Oct 2014 10:18:49 +0200, Richard Biener <richard.guenther@gmail.com> wrote: > On Tue, Oct 21, 2014 at 11:32 PM, Bernd Schmidt <bernds@codesourcery.com> wrote: > > On 10/21/2014 11:30 PM, Jakub Jelinek wrote: > >> > >> At least for OpenMP, the best would be if the #pragma omp target regions > >> and/or #pragma omp declare target functions contain anything a particular > >> offloading accelerator can't handle, instead of failing the whole > >> compilation perhaps just emit some at least by default non-fatal warning > >> and not emit anything for the particular offloading target, which would > >> mean > >> either host fallback, or, if some other offloading target succeeded, just > >> that target. > > > > > > I guess a test could be added to mkoffload if gcc were to return a different > > value for a sorry vs. any other compilation failure. The tool could then > > choose not to produce offloading support for that target. > > But that would be for the whole file instead of for the specific region? I'm not sure that's what you're suggesting, but at least on non-shared memory offloading devices, you can't switch arbitrarily between offloading device(s) and host-fallback, for you have to do data management between the non-shared memories. > So maybe we should produce one LTO offload object for each offload > function and make the symbols they are supposed to provide weak > so a fail doesn't end up failing to link the main program? > > Looks like this gets somewhat awkward with the LTO setup. Grüße, Thomas
On Wed, Oct 22, 2014 at 10:34 AM, Thomas Schwinge <thomas@codesourcery.com> wrote: > Hi! > > On Wed, 22 Oct 2014 10:18:49 +0200, Richard Biener <richard.guenther@gmail.com> wrote: >> On Tue, Oct 21, 2014 at 11:32 PM, Bernd Schmidt <bernds@codesourcery.com> wrote: >> > On 10/21/2014 11:30 PM, Jakub Jelinek wrote: >> >> >> >> At least for OpenMP, the best would be if the #pragma omp target regions >> >> and/or #pragma omp declare target functions contain anything a particular >> >> offloading accelerator can't handle, instead of failing the whole >> >> compilation perhaps just emit some at least by default non-fatal warning >> >> and not emit anything for the particular offloading target, which would >> >> mean >> >> either host fallback, or, if some other offloading target succeeded, just >> >> that target. >> > >> > >> > I guess a test could be added to mkoffload if gcc were to return a different >> > value for a sorry vs. any other compilation failure. The tool could then >> > choose not to produce offloading support for that target. >> >> But that would be for the whole file instead of for the specific region? > > I'm not sure that's what you're suggesting, but at least on non-shared > memory offloading devices, you can't switch arbitrarily between > offloading device(s) and host-fallback, for you have to do data > management between the non-shared memories. Oh, I see. For HSA we simply don't emit an offload variant for code we cannot handle. But only for those parts. So it's only offload or fallback for other devices? Thus also never share work between both for example (run N threads on the CPU and M threads on the offload target)? Richard. >> So maybe we should produce one LTO offload object for each offload >> function and make the symbols they are supposed to provide weak >> so a fail doesn't end up failing to link the main program? >> >> Looks like this gets somewhat awkward with the LTO setup. > > > Grüße, > Thomas
On Wed, Oct 22, 2014 at 12:02:16PM +0200, Richard Biener wrote: > > I'm not sure that's what you're suggesting, but at least on non-shared > > memory offloading devices, you can't switch arbitrarily between > > offloading device(s) and host-fallback, for you have to do data > > management between the non-shared memories. > > Oh, I see. For HSA we simply don't emit an offload variant for code > we cannot handle. But only for those parts. > > So it's only offload or fallback for other devices? Thus also never Yeah. > share work between both for example (run N threads on the CPU > and M threads on the offload target)? I believe at least for the non-shared memory the OpenMP model wouldn't allow that. Of course, user can do the sharing explicitly (though OpenMP 4.0 doesn't have asynchronous target regions): one could e.g. run a couple of host tasks on the offloading region with if (0) - forced host fallback, ensure e.g. one team and one parallel thread in that case, and then in one host task with if (1) and use as many teams and parallel threads as available on the offloading device. Jakub
On 10/20/2014 04:19 PM, Bernd Schmidt wrote: > ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be > defined. Add a sorry. Looking back through all the mails it turns out this one wasn't approved yet. Ping? Bernd
On 11/04/2014 04:32 PM, Bernd Schmidt wrote: > On 10/20/2014 04:19 PM, Bernd Schmidt wrote: >> ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be >> defined. Add a sorry. > > Looking back through all the mails it turns out this one wasn't approved yet. > Ping? Ok. r~
gcc/ * optabs.c (emit_indirect_jump): Test HAVE_indirect_jump and emit a sorry if necessary. ------------------------------------------------------------------------ Index: gcc/optabs.c =================================================================== --- gcc/optabs.c (revision 422345) +++ gcc/optabs.c (revision 422346) @@ -4477,13 +4477,16 @@ prepare_float_lib_cmp (rtx x, rtx y, enu /* Generate code to indirectly jump to a location given in the rtx LOC. */ void -emit_indirect_jump (rtx loc) +emit_indirect_jump (rtx loc ATTRIBUTE_UNUSED) { +#ifndef HAVE_indirect_jump + sorry ("indirect jumps are not available on this target"); +#else struct expand_operand ops[1]; - create_address_operand (&ops[0], loc); expand_jump_insn (CODE_FOR_indirect_jump, 1, ops); emit_barrier (); +#endif } #ifdef HAVE_conditional_move