Message ID | 55a722a5-4452-ed93-267e-e44b1d0572ed@mentor.com |
---|---|
State | New |
Headers | show |
[ was: Re: [nvptx, committed, PR81069] Insert diverging jump alap in nvptx_single ] On 07/17/2017 10:41 AM, Tom de Vries wrote: > Hi, > > Consider nvptx_single: > ... > /* Single neutering according to MASK. FROM is the incoming block and > TO is the outgoing block. These may be the same block. Insert at > start of FROM: > > if (tid.<axis>) goto end. > > and insert before ending branch of TO (if there is such an insn): > > end: > <possibly-broadcast-cond> > <branch> > > We currently only use differnt FROM and TO when skipping an entire > loop. We could do more if we detected superblocks. */ > > static void > nvptx_single (unsigned mask, basic_block from, basic_block to) > ... > > When compiling libgomp.oacc-fortran/nested-function-1.f90 at -O1, we > observed the following pattern: > ... > <bb2>: > goto bb3; > > <bb3>: (with single predecessor) > <some code> > <cond-branch> > ... > > which was translated by nvptx_single into: > ... > <bb2> > if (tid.<axis>) goto end. > goto bb3; > > <bb3>: > <some code> > end: > <broadcast-cond> > <cond-branch> > ... > > There is no benefit to be gained from doing the goto bb3 in neutered > mode, and there is no need to, so we might as well insert the neutering > branch as late as possible: > ... > <bb2> > goto bb3; > > <bb3>: > if (tid.<axis>) goto end. > <some code> > end: > <broadcast-cond> > <cond-branch> > ... > > This patch implements inserting the neutering branch as late as possible. > > [ As it happens, the actual code for > libgomp.oacc-fortran/nested-function-1.f90 at -O1 was more complicated: > there were other bbs inbetween bb2 and bb3. While this doesn't change > anything from a control flow graph point of view, it did trigger a bug > in the ptx JIT compiler where it inserts the synchronization point for > the diverging branch later than the immediate post-dominator point at > the end label. Consequently, the condition broadcast was executed in > divergent mode (which is known to give undefined results), resulting in > a hang. > This patch also works around this ptx JIT compiler bug, for this > test-case. ] > > Build and tested on x86_64 with nvptx accelerator. > > Committed. Jakub, I'd like to backport this nvptx patch to the gcc-7-branch. The patch doesn't trivially fit into the category of regression or documentation fix. Without this patch, when building an nvptx offloading compiler and running the libgomp testsuite for the gcc-7-branch, the GPU hangs, and I've had a report from a colleague who experienced system crashes because of it. However, in principle gcc is not doing anything wrong: the generated code is according to the ptx spec. It's just that the patch makes it less likely to run into a ptx JIT bug. Then again, it's an nvptx patch, neither a primary nor secondary target. I'll commit the backport some time this week, unless there are objections. Thanks, - Tom > 0001-Insert-diverging-jump-alap-in-nvptx_single.patch > > > Insert diverging jump alap in nvptx_single > > 2017-07-17 Tom de Vries <tom@codesourcery.com> > > PR target/81069 > * config/nvptx/nvptx.c (nvptx_single): Insert diverging branch as late > as possible. > > --- > gcc/config/nvptx/nvptx.c | 22 +++++++++++++++++++--- > 1 file changed, 19 insertions(+), 3 deletions(-) > > diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c > index daeec27..cb11686 100644 > --- a/gcc/config/nvptx/nvptx.c > +++ b/gcc/config/nvptx/nvptx.c > @@ -3866,9 +3866,25 @@ nvptx_single (unsigned mask, basic_block from, basic_block to) > rtx_insn *tail = BB_END (to); > unsigned skip_mask = mask; > > - /* Find first insn of from block */ > - while (head != BB_END (from) && !INSN_P (head)) > - head = NEXT_INSN (head); > + while (true) > + { > + /* Find first insn of from block. */ > + while (head != BB_END (from) && !INSN_P (head)) > + head = NEXT_INSN (head); > + > + if (from == to) > + break; > + > + if (!(JUMP_P (head) && single_succ_p (from))) > + break; > + > + basic_block jump_target = single_succ (from); > + if (!single_pred_p (jump_target)) > + break; > + > + from = jump_target; > + head = BB_HEAD (from); > + } > > /* Find last insn of to block */ > rtx_insn *limit = from == to ? head : BB_HEAD (to); >
On Tue, Jul 18, 2017 at 02:37:38PM +0200, Tom de Vries wrote: > I'd like to backport this nvptx patch to the gcc-7-branch. > > The patch doesn't trivially fit into the category of regression or > documentation fix. > > Without this patch, when building an nvptx offloading compiler and running > the libgomp testsuite for the gcc-7-branch, the GPU hangs, and I've had a > report from a colleague who experienced system crashes because of it. > > However, in principle gcc is not doing anything wrong: the generated code is > according to the ptx spec. It's just that the patch makes it less likely to > run into a ptx JIT bug. > > Then again, it's an nvptx patch, neither a primary nor secondary target. > > I'll commit the backport some time this week, unless there are objections. Ok, thanks. > > 0001-Insert-diverging-jump-alap-in-nvptx_single.patch > > > > > > Insert diverging jump alap in nvptx_single > > > > 2017-07-17 Tom de Vries <tom@codesourcery.com> > > > > PR target/81069 > > * config/nvptx/nvptx.c (nvptx_single): Insert diverging branch as late > > as possible. Jakub
Insert diverging jump alap in nvptx_single 2017-07-17 Tom de Vries <tom@codesourcery.com> PR target/81069 * config/nvptx/nvptx.c (nvptx_single): Insert diverging branch as late as possible. --- gcc/config/nvptx/nvptx.c | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index daeec27..cb11686 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -3866,9 +3866,25 @@ nvptx_single (unsigned mask, basic_block from, basic_block to) rtx_insn *tail = BB_END (to); unsigned skip_mask = mask; - /* Find first insn of from block */ - while (head != BB_END (from) && !INSN_P (head)) - head = NEXT_INSN (head); + while (true) + { + /* Find first insn of from block. */ + while (head != BB_END (from) && !INSN_P (head)) + head = NEXT_INSN (head); + + if (from == to) + break; + + if (!(JUMP_P (head) && single_succ_p (from))) + break; + + basic_block jump_target = single_succ (from); + if (!single_pred_p (jump_target)) + break; + + from = jump_target; + head = BB_HEAD (from); + } /* Find last insn of to block */ rtx_insn *limit = from == to ? head : BB_HEAD (to);