diff mbox

nvptx offloading patches [2/n]

Message ID 5454C944.4090107@codesourcery.com
State New
Headers show

Commit Message

Bernd Schmidt Nov. 1, 2014, 11:51 a.m. UTC
LTO has a mechanism not to stream out common nodes that are expected to 
be identical on each run. When using LTO to communicate between 
compilers for different targets, the va_list_type_node and related ones 
must be excluded from this.

Richard B mentioned in a recent mail that the i386 backend uses direct 
comparisons to va_list_type_node. After investigating a bit it seems to 
me that this is not actually a problem: what's being compared is the 
return value of ix86_canonical_va_list_type, which always chooses one of 
va_list_type_node or its ABI variants, so the comparison should hold 
even with this patch.

Bootstrapped and tested on x86_64-linux, ok?


Bernd

Comments

Jeff Law Nov. 3, 2014, 10:23 p.m. UTC | #1
On 11/01/14 05:51, Bernd Schmidt wrote:
> LTO has a mechanism not to stream out common nodes that are expected to
> be identical on each run. When using LTO to communicate between
> compilers for different targets, the va_list_type_node and related ones
> must be excluded from this.
>
> Richard B mentioned in a recent mail that the i386 backend uses direct
> comparisons to va_list_type_node. After investigating a bit it seems to
> me that this is not actually a problem: what's being compared is the
> return value of ix86_canonical_va_list_type, which always chooses one of
> va_list_type_node or its ABI variants, so the comparison should hold
> even with this patch.
>
> Bootstrapped and tested on x86_64-linux, ok?
Would like the SuSE guys to chime in here since they know more about the 
streamer than I.

jeff
Bernd Schmidt Nov. 14, 2014, 6:22 p.m. UTC | #2
On 11/03/2014 11:23 PM, Jeff Law wrote:
> On 11/01/14 05:51, Bernd Schmidt wrote:
>> LTO has a mechanism not to stream out common nodes that are expected to
>> be identical on each run. When using LTO to communicate between
>> compilers for different targets, the va_list_type_node and related ones
>> must be excluded from this.
>>
>> Richard B mentioned in a recent mail that the i386 backend uses direct
>> comparisons to va_list_type_node. After investigating a bit it seems to
>> me that this is not actually a problem: what's being compared is the
>> return value of ix86_canonical_va_list_type, which always chooses one of
>> va_list_type_node or its ABI variants, so the comparison should hold
>> even with this patch.
>>
>> Bootstrapped and tested on x86_64-linux, ok?
> Would like the SuSE guys to chime in here since they know more about the
> streamer than I.

Ping.


Bernd
Jakub Jelinek Feb. 4, 2015, 10:55 a.m. UTC | #3
On Sat, Nov 01, 2014 at 12:51:32PM +0100, Bernd Schmidt wrote:
> LTO has a mechanism not to stream out common nodes that are expected to be
> identical on each run. When using LTO to communicate between compilers for
> different targets, the va_list_type_node and related ones must be excluded
> from this.
> 
> Richard B mentioned in a recent mail that the i386 backend uses direct
> comparisons to va_list_type_node. After investigating a bit it seems to me
> that this is not actually a problem: what's being compared is the return
> value of ix86_canonical_va_list_type, which always chooses one of
> va_list_type_node or its ABI variants, so the comparison should hold even
> with this patch.
> 
> Bootstrapped and tested on x86_64-linux, ok?

How can the offloading of functions using va_start/va_end/va_arg work,
until we apply (in GCC 6?) Michael's patches and extend them - make
all those 3 internal functions lowered only after IPA?

I mean, nvptx supposedly contains different va_list type (from quick glance
it uses void *, while e.g. x86_64 uses a struct [1]), and we gimplify it
early, so for GCC 5 the only option is IMHO to refuse to compile (sorry?)
when streaming functions that use the host va_list type.

For GCC 6, presumably if it is lowered late, if the host va_list would be
at least as big as target va_list, we could stick stuff in there, or rewrite
to the target va_list.  Still, if e.g. va_list is embedded in structures, or
used in global vars, we'd need to pad the structures or something.

	Jakub
Jakub Jelinek Feb. 4, 2015, 10:58 a.m. UTC | #4
On Wed, Feb 04, 2015 at 11:55:54AM +0100, Jakub Jelinek wrote:
> On Sat, Nov 01, 2014 at 12:51:32PM +0100, Bernd Schmidt wrote:
> > LTO has a mechanism not to stream out common nodes that are expected to be
> > identical on each run. When using LTO to communicate between compilers for
> > different targets, the va_list_type_node and related ones must be excluded
> > from this.
> > 
> > Richard B mentioned in a recent mail that the i386 backend uses direct
> > comparisons to va_list_type_node. After investigating a bit it seems to me
> > that this is not actually a problem: what's being compared is the return
> > value of ix86_canonical_va_list_type, which always chooses one of
> > va_list_type_node or its ABI variants, so the comparison should hold even
> > with this patch.
> > 
> > Bootstrapped and tested on x86_64-linux, ok?
> 
> How can the offloading of functions using va_start/va_end/va_arg work,
> until we apply (in GCC 6?) Michael's patches and extend them - make
> all those 3 internal functions lowered only after IPA?
> 
> I mean, nvptx supposedly contains different va_list type (from quick glance
> it uses void *, while e.g. x86_64 uses a struct [1]), and we gimplify it
> early, so for GCC 5 the only option is IMHO to refuse to compile (sorry?)
> when streaming functions that use the host va_list type.
> 
> For GCC 6, presumably if it is lowered late, if the host va_list would be
> at least as big as target va_list, we could stick stuff in there, or rewrite
> to the target va_list.  Still, if e.g. va_list is embedded in structures, or
> used in global vars, we'd need to pad the structures or something.

That said, if your patch doesn't break normal LTO, I agree it doesn't make
much sense to 
> 
> 	Jakub

	Jakub
Richard Biener Feb. 9, 2015, 10:16 a.m. UTC | #5
On Wed, Feb 4, 2015 at 11:55 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Sat, Nov 01, 2014 at 12:51:32PM +0100, Bernd Schmidt wrote:
>> LTO has a mechanism not to stream out common nodes that are expected to be
>> identical on each run. When using LTO to communicate between compilers for
>> different targets, the va_list_type_node and related ones must be excluded
>> from this.
>>
>> Richard B mentioned in a recent mail that the i386 backend uses direct
>> comparisons to va_list_type_node. After investigating a bit it seems to me
>> that this is not actually a problem: what's being compared is the return
>> value of ix86_canonical_va_list_type, which always chooses one of
>> va_list_type_node or its ABI variants, so the comparison should hold even
>> with this patch.
>>
>> Bootstrapped and tested on x86_64-linux, ok?
>
> How can the offloading of functions using va_start/va_end/va_arg work,
> until we apply (in GCC 6?) Michael's patches and extend them - make
> all those 3 internal functions lowered only after IPA?
>
> I mean, nvptx supposedly contains different va_list type (from quick glance
> it uses void *, while e.g. x86_64 uses a struct [1]), and we gimplify it
> early, so for GCC 5 the only option is IMHO to refuse to compile (sorry?)
> when streaming functions that use the host va_list type.
>
> For GCC 6, presumably if it is lowered late, if the host va_list would be
> at least as big as target va_list, we could stick stuff in there, or rewrite
> to the target va_list.  Still, if e.g. va_list is embedded in structures, or
> used in global vars, we'd need to pad the structures or something.

In principle I am always happy these days to preload less nodes.

Thus, if your patch survives LTO bootstrap and you can still LTO
a TU with ms_abi valist functions successfully (not sure if that's
exercised in the testsuite) then it is fine.

Note that I _did_ run into issues with excempting nodes from
preloading because of pointer comparisons.  The issue is that
types created by the backends and the middle-end do not
participate in the type merging done by LTO.  Thus the actual
issue may be not on x86 (because it implements
the canonical_va_list_type hook) but on other targets that
end up using std_canonical_va_list_type.

Thanks,
Richard.

>         Jakub
Bernd Schmidt Feb. 17, 2015, 4:37 p.m. UTC | #6
On 02/09/2015 11:16 AM, Richard Biener wrote:

> Thus, if your patch survives LTO bootstrap and you can still LTO
> a TU with ms_abi valist functions successfully (not sure if that's
> exercised in the testsuite) then it is fine.

I've now done the LTO bootstrap, and the program below compiled with 
-flto still works. Does that seem sufficient?

> Note that I _did_ run into issues with excempting nodes from
> preloading because of pointer comparisons.  The issue is that
> types created by the backends and the middle-end do not
> participate in the type merging done by LTO.  Thus the actual
> issue may be not on x86 (because it implements
> the canonical_va_list_type hook) but on other targets that
> end up using std_canonical_va_list_type.

Hmm. That doesn't really make me want to commit it at this stage of the 
development process.


Bernd
#include <stdarg.h>
#include <cross-stdarg.h>
#include <stdio.h>
#include <stdlib.h>

static int x;
static const char *y;

__attribute__((noinline,noclone)) static void verror_msg(va_list p)
{
  x = va_arg (p, int);
  y = va_arg (p, const char *);
}

__attribute__((noinline,noclone)) static void err(int errnum, const char *s, ...)
{
  va_list p;

  va_start(p, s);
  verror_msg(p);
  va_end(p);
}

__attribute__((noinline,noclone,ms_abi)) static void verror_msg2(ms_va_list p)
{
  x = va_arg (p, int);
  y = va_arg (p, const char *);
}

__attribute__((noinline,noclone,ms_abi)) static void err2(int errnum, const char *s, ...)
{
  ms_va_list p;

  __ms_va_start (p, s);
  verror_msg2 (p);
  __ms_va_end(p);
}

int main ()
{ 
  const char *p1 = "t1";
  const char *p2 = "t2";
  err (0, "test", 3, p1);
  if (x != 3 || y != p1)
    abort ();
  err2 (0, "ms", 2, p2);
  if (x != 2 || y != p2)
    abort ();
  exit(0);
}
Jakub Jelinek Feb. 17, 2015, 5:10 p.m. UTC | #7
On Tue, Feb 17, 2015 at 05:37:04PM +0100, Bernd Schmidt wrote:
> On 02/09/2015 11:16 AM, Richard Biener wrote:
> 
> >Thus, if your patch survives LTO bootstrap and you can still LTO
> >a TU with ms_abi valist functions successfully (not sure if that's
> >exercised in the testsuite) then it is fine.
> 
> I've now done the LTO bootstrap, and the program below compiled with -flto
> still works. Does that seem sufficient?

E.g. va_list_gpr_counter_field and va_list_fpr_counter_field are compared
for equality though, for va_list_type_node TYPE_MAIN_VARIANT is compared for
equality.  Otherwise the stdarg pass might misbehave.

What exact testcase are you trying to fix with this patch, and how do you
think offloading of code using va_list can work?

	Jakub
Bernd Schmidt Feb. 17, 2015, 8:55 p.m. UTC | #8
On 02/17/2015 06:10 PM, Jakub Jelinek wrote:
>
> What exact testcase are you trying to fix with this patch, and how do you
> think offloading of code using va_list can work?

The exact testcase is any offloaded program - streaming in lto will 
crash if there is a mismatch in these preloaded nodes.

For OpenACC programs using va_list - I don't expect them to work at all. 
I don't believe the spec considers such issues, and ptx isn't expected 
to support variadic functions in the first place ("The current version 
of PTX does not support variadic functions" is what the spec has to say; 
the gcc port overachieves a little by implementing them anyway).


Bernd
diff mbox

Patch

	* tre-streamer.c (preload_common_nodes): Skip TI_VA_LIST_TYPE and
	related nodes.

Index: gcc/tree-streamer.c
===================================================================
--- gcc/tree-streamer.c.orig
+++ gcc/tree-streamer.c
@@ -309,10 +309,14 @@  preload_common_nodes (struct streamer_tr
     record_common_node (cache, sizetype_tab[i]);
 
   for (i = 0; i < TI_MAX; i++)
-    /* Skip boolean type and constants, they are frontend dependent.  */
+    /* Skip boolean type and constants, they are frontend dependent.
+       Skip va_list types, target dependent and may not survive offloading.  */
     if (i != TI_BOOLEAN_TYPE
 	&& i != TI_BOOLEAN_FALSE
-	&& i != TI_BOOLEAN_TRUE)
+	&& i != TI_BOOLEAN_TRUE
+	&& i != TI_VA_LIST_TYPE
+	&& i != TI_VA_LIST_GPR_COUNTER_FIELD
+	&& i != TI_VA_LIST_FPR_COUNTER_FIELD)
       record_common_node (cache, global_trees[i]);
 }