Message ID | 20201029155054.GA54974@kam.mff.cuni.cz |
---|---|
State | New |
Headers | show |
Series | Avoid char[] array in tree_def | expand |
On Thu, Oct 29, 2020 at 04:50:54PM +0100, Jan Hubicka wrote: > * tree.c (build_string): Update. > * tree-core.h (tree_fixed_cst): Avoid typeless storage. Is it valid then to #define TREE_STRING_POINTER(NODE) \ ((const char *)(STRING_CST_CHECK (NODE)->string.str)) and strcpy etc. it around though? Maybe yes, because stores through char can alias anything. > diff --git a/gcc/tree-core.h b/gcc/tree-core.h > index c9280a8d3b1..63dbb5b8eab 100644 > --- a/gcc/tree-core.h > +++ b/gcc/tree-core.h > @@ -1401,7 +1401,8 @@ struct GTY(()) tree_fixed_cst { > struct GTY(()) tree_string { > struct tree_typed typed; > int length; > - char str[1]; > + /* Avoid char array that would make whole type to be typeless storage. */ > + struct {char c;} str[1]; > }; > > struct GTY(()) tree_complex { > diff --git a/gcc/tree.c b/gcc/tree.c > index 81f867ddded..84115630184 100644 > --- a/gcc/tree.c > +++ b/gcc/tree.c > @@ -2273,7 +2273,7 @@ build_string (unsigned len, const char *str /*= NULL */) > memcpy (s->string.str, str, len); > else > memset (s->string.str, 0, len); > - s->string.str[len] = '\0'; > + s->string.str[len].c = '\0'; > > return s; > } Jakub
On Thu, 29 Oct 2020, Jan Hubicka wrote: > Hi, > this patch removes second char array from tree_def union and makes it > !TYPELESS_STORAGE. Now all accesses to anything in tree no longer have alias > set 0, but they all have alias set 1 :) > This is because the way we handle unions. However it still increases TBAA > effectivity by about 12%. From: > > Alias oracle query stats: > refs_may_alias_p: 65066258 disambiguations, 74846942 queries > ref_maybe_used_by_call_p: 152444 disambiguations, 65966862 queries > call_may_clobber_ref_p: 22546 disambiguations, 28559 queries > nonoverlapping_component_refs_p: 0 disambiguations, 36816 queries > nonoverlapping_refs_since_match_p: 27230 disambiguations, 58300 must overlaps, 86300 queries > aliasing_component_refs_p: 66090 disambiguations, 2048800 queries > TBAA oracle: 25578632 disambiguations 59483650 queries > 12219919 are in alias set 0 > 10534575 queries asked about the same object > 125 queries asked about the same alias set > 0 access volatile > 9491563 are dependent in the DAG > 1658836 are aritificially in conflict with void * > > Modref stats: > modref use: 14421 disambiguations, 48129 queries > modref clobber: 1528229 disambiguations, 1926907 queries > 3881547 tbaa queries (2.014392 per modref query) > 565057 base compares (0.293246 per modref query) > > PTA query stats: > pt_solution_includes: 947491 disambiguations, 13119151 queries > pt_solutions_intersect: 1043695 disambiguations, 13221495 queries > > To: > > Alias oracle query stats: > refs_may_alias_p: 66455561 disambiguations, 75202803 queries > ref_maybe_used_by_call_p: 155301 disambiguations, 67370278 queries > call_may_clobber_ref_p: 22550 disambiguations, 28587 queries > nonoverlapping_component_refs_p: 0 disambiguations, 37058 queries > nonoverlapping_refs_since_match_p: 28126 disambiguations, 59906 must overlaps, 88990 queries > aliasing_component_refs_p: 66375 disambiguations, 2440039 queries > TBAA oracle: 28800751 disambiguations 64328055 queries > 8053661 are in alias set 0 > 11181983 queries asked about the same object > 125 queries asked about the same alias set > 0 access volatile > 13905691 are dependent in the DAG > 2385844 are aritificially in conflict with void * > > Modref stats: > modref use: 16781 disambiguations, 52031 queries > modref clobber: 1745589 disambiguations, 2149518 queries > 4192266 tbaa queries (1.950328 per modref query) > 559148 base compares (0.260127 per modref query) > > PTA query stats: > pt_solution_includes: 906487 disambiguations, 13105994 queries > pt_solutions_intersect: 1041144 disambiguations, 13659726 queries > > Bootstrapped/regtested x86_64-linux, OK? > > * tree.c (build_string): Update. > * tree-core.h (tree_fixed_cst): Avoid typeless storage. > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h > index c9280a8d3b1..63dbb5b8eab 100644 > --- a/gcc/tree-core.h > +++ b/gcc/tree-core.h > @@ -1401,7 +1401,8 @@ struct GTY(()) tree_fixed_cst { > struct GTY(()) tree_string { > struct tree_typed typed; > int length; > - char str[1]; > + /* Avoid char array that would make whole type to be typeless storage. */ > + struct {char c;} str[1]; That's ugly and will for sure defeat warning / access code when we access this as char[], no? I mean, we could as well use 'int str[1];' here? Maybe we can invent some C++ attribute for this? [[gnu::string]] or so that marks it as actual char and not typeless storage? Richard. > }; > > struct GTY(()) tree_complex { > diff --git a/gcc/tree.c b/gcc/tree.c > index 81f867ddded..84115630184 100644 > --- a/gcc/tree.c > +++ b/gcc/tree.c > @@ -2273,7 +2273,7 @@ build_string (unsigned len, const char *str /*= NULL */) > memcpy (s->string.str, str, len); > else > memset (s->string.str, 0, len); > - s->string.str[len] = '\0'; > + s->string.str[len].c = '\0'; > > return s; > } >
> On Thu, Oct 29, 2020 at 04:50:54PM +0100, Jan Hubicka wrote: > > * tree.c (build_string): Update. > > * tree-core.h (tree_fixed_cst): Avoid typeless storage. > > Is it valid then to > #define TREE_STRING_POINTER(NODE) \ > ((const char *)(STRING_CST_CHECK (NODE)->string.str)) > and strcpy etc. it around though? > Maybe yes, because stores through char can alias anything. Yep, I think it should be valid for that reason. The whole thing is not terribly pretty (the wide-int change was better), but I do not know of better solution and it affects our core datastructure. Typeless storage is really complicated concept. Forutnately it seems that there are no more hacks like this needed: both tree and gimple now gets non-zero alias set. Honza > > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h > > index c9280a8d3b1..63dbb5b8eab 100644 > > --- a/gcc/tree-core.h > > +++ b/gcc/tree-core.h > > @@ -1401,7 +1401,8 @@ struct GTY(()) tree_fixed_cst { > > struct GTY(()) tree_string { > > struct tree_typed typed; > > int length; > > - char str[1]; > > + /* Avoid char array that would make whole type to be typeless storage. */ > > + struct {char c;} str[1]; > > }; > > > > struct GTY(()) tree_complex { > > diff --git a/gcc/tree.c b/gcc/tree.c > > index 81f867ddded..84115630184 100644 > > --- a/gcc/tree.c > > +++ b/gcc/tree.c > > @@ -2273,7 +2273,7 @@ build_string (unsigned len, const char *str /*= NULL */) > > memcpy (s->string.str, str, len); > > else > > memset (s->string.str, 0, len); > > - s->string.str[len] = '\0'; > > + s->string.str[len].c = '\0'; > > > > return s; > > } > > Jakub >
> > That's ugly and will for sure defeat warning / access code > when we access this as char[], no? I mean, we could > as well use 'int str[1];' here? Well, we always get char pointer via macro that is IMO OK, but I am also not very much in love with this. > > Maybe we can invent some C++ attribute for this? > > [[gnu::string]] > > or so that marks it as actual char and not typeless storage? Attribute would probably make sense. Not sure if gnu::string is best name given that it can be also meaningful for array of small integers (such as in wide_int). Honza > > Richard. > > > }; > > > > struct GTY(()) tree_complex { > > diff --git a/gcc/tree.c b/gcc/tree.c > > index 81f867ddded..84115630184 100644 > > --- a/gcc/tree.c > > +++ b/gcc/tree.c > > @@ -2273,7 +2273,7 @@ build_string (unsigned len, const char *str /*= NULL */) > > memcpy (s->string.str, str, len); > > else > > memset (s->string.str, 0, len); > > - s->string.str[len] = '\0'; > > + s->string.str[len].c = '\0'; > > > > return s; > > } > > > > -- > Richard Biener <rguenther@suse.de> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > Germany; GF: Felix Imend
On Thu, 29 Oct 2020, Jan Hubicka wrote: > > > > That's ugly and will for sure defeat warning / access code > > when we access this as char[], no? I mean, we could > > as well use 'int str[1];' here? > > Well, we always get char pointer via macro that is IMO OK, but I am also > not very much in love with this. > > > > Maybe we can invent some C++ attribute for this? > > > > [[gnu::string]] > > > > or so that marks it as actual char and not typeless storage? > > Attribute would probably make sense. Not sure if gnu::string is best > name given that it can be also meaningful for array of small integers > (such as in wide_int). OK, maybe [[gnu::strictly_typed]] then? > Honza > > > > Richard. > > > > > }; > > > > > > struct GTY(()) tree_complex { > > > diff --git a/gcc/tree.c b/gcc/tree.c > > > index 81f867ddded..84115630184 100644 > > > --- a/gcc/tree.c > > > +++ b/gcc/tree.c > > > @@ -2273,7 +2273,7 @@ build_string (unsigned len, const char *str /*= NULL */) > > > memcpy (s->string.str, str, len); > > > else > > > memset (s->string.str, 0, len); > > > - s->string.str[len] = '\0'; > > > + s->string.str[len].c = '\0'; > > > > > > return s; > > > } > > > > > > > -- > > Richard Biener <rguenther@suse.de> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > > Germany; GF: Felix Imend >
On Thu, Oct 29, 2020 at 05:00:40PM +0100, Jan Hubicka wrote: > > > > That's ugly and will for sure defeat warning / access code > > when we access this as char[], no? I mean, we could > > as well use 'int str[1];' here? > > Well, we always get char pointer via macro that is IMO OK, but I am also > not very much in love with this. Do we treat signed char [...]; as typeless storage too, or just what the C++ standard requires (i.e. char, unsigned char and std::byte where the last one is enum type with unsigned char underlying type)? Jakub
> On Thu, Oct 29, 2020 at 05:00:40PM +0100, Jan Hubicka wrote: > > > > > > That's ugly and will for sure defeat warning / access code > > > when we access this as char[], no? I mean, we could > > > as well use 'int str[1];' here? > > > > Well, we always get char pointer via macro that is IMO OK, but I am also > > not very much in love with this. > > Do we treat signed char [...]; as typeless storage too, or just > what the C++ standard requires (i.e. char, unsigned char and std::byte > where the last one is enum type with unsigned char underlying type)? struct a {signed char b[10];int d;} c; void test () { c.d=1; } still leads to alias set 0 access, so perhaps this can be improved. Where the standard specifies this? (also coincidentally I have no idea where C++ sets typeless storage to 1 :) Honza > > Jakub >
> On Thu, 29 Oct 2020, Jan Hubicka wrote: > > > > > > > That's ugly and will for sure defeat warning / access code > > > when we access this as char[], no? I mean, we could > > > as well use 'int str[1];' here? > > > > Well, we always get char pointer via macro that is IMO OK, but I am also > > not very much in love with this. > > > > > > Maybe we can invent some C++ attribute for this? > > > > > > [[gnu::string]] > > > > > > or so that marks it as actual char and not typeless storage? > > > > Attribute would probably make sense. Not sure if gnu::string is best > > name given that it can be also meaningful for array of small integers > > (such as in wide_int). > > OK, maybe [[gnu::strictly_typed]] then? This looks like good idea to me (and probably also making difference with signed char as Jakub suggests). I am adding Jason to CC, since he may know better. Honza
On Thu, 29 Oct 2020, Jakub Jelinek wrote: > On Thu, Oct 29, 2020 at 05:00:40PM +0100, Jan Hubicka wrote: > > > > > > That's ugly and will for sure defeat warning / access code > > > when we access this as char[], no? I mean, we could > > > as well use 'int str[1];' here? > > > > Well, we always get char pointer via macro that is IMO OK, but I am also > > not very much in love with this. > > Do we treat signed char [...]; as typeless storage too, or just > what the C++ standard requires (i.e. char, unsigned char and std::byte > where the last one is enum type with unsigned char underlying type)? All that is covered by is_byte_access_type which includes all character types (including char16_t and wchar it seems) and std::byte. Richard. > Jakub > >
On 10/29/20 1:40 PM, Richard Biener wrote: > On Thu, 29 Oct 2020, Jakub Jelinek wrote: > >> On Thu, Oct 29, 2020 at 05:00:40PM +0100, Jan Hubicka wrote: >>>> >>>> That's ugly and will for sure defeat warning / access code >>>> when we access this as char[], no? I mean, we could >>>> as well use 'int str[1];' here? >>> >>> Well, we always get char pointer via macro that is IMO OK, but I am also >>> not very much in love with this. >> >> Do we treat signed char [...]; as typeless storage too, or just >> what the C++ standard requires (i.e. char, unsigned char and std::byte >> where the last one is enum type with unsigned char underlying type)? > > All that is covered by is_byte_access_type which includes all > character types (including char16_t and wchar it seems) and std::byte. Well, that's a bug, apparently from the PR94923 patch (r11-499); previously it was char, signed char, and unsigned char. And even that is too much; even C++98 said just char and unsigned char could be used for bytewise access. When C++17 clarified this with the notion of "provides storage", it applied to only unsigned char and std::byte, not even the full set of byte-access types. We still need to allow bytewise access using plain char, but perhaps we don't need to treat plain char arrays as typeless. Attributes to say that a particular array does or does not provide storage for objects of other types do sound useful. Jason
> On 10/29/20 1:40 PM, Richard Biener wrote: > > On Thu, 29 Oct 2020, Jakub Jelinek wrote: > > > > > On Thu, Oct 29, 2020 at 05:00:40PM +0100, Jan Hubicka wrote: > > > > > > > > > > That's ugly and will for sure defeat warning / access code > > > > > when we access this as char[], no? I mean, we could > > > > > as well use 'int str[1];' here? > > > > > > > > Well, we always get char pointer via macro that is IMO OK, but I am also > > > > not very much in love with this. > > > > > > Do we treat signed char [...]; as typeless storage too, or just > > > what the C++ standard requires (i.e. char, unsigned char and std::byte > > > where the last one is enum type with unsigned char underlying type)? > > > > All that is covered by is_byte_access_type which includes all > > character types (including char16_t and wchar it seems) and std::byte. > > Well, that's a bug, apparently from the PR94923 patch (r11-499); previously > it was char, signed char, and unsigned char. And even that is too much; > even C++98 said just char and unsigned char could be used for bytewise > access. > > When C++17 clarified this with the notion of "provides storage", it applied > to only unsigned char and std::byte, not even the full set of byte-access > types. We still need to allow bytewise access using plain char, but perhaps > we don't need to treat plain char arrays as typeless. > > Attributes to say that a particular array does or does not provide storage > for objects of other types do sound useful. Thanks a lot for explanation! I am adding Martin Sebor to CC. Perhaps you can fix the problem with std::byte, and signed char to imply typeless storage and ideally also add an attribute? This seem too deep into C++ FE details for me. I was thiking a bit more about all accesses to trees getting same alias set. This is because we allow type puning through unions. If our tree datastructure was a C++ class hiearchy, this would not happen since accesses to specialized tree node would not be through unions but through casted pointers. We could do that with our acessor macros, too. I tried to turn (NODE)->base in our accesors to ((tree_base *)(node))->base and this breaks with const_tree pointers. However the following seem to work and get better TBAA. I think converting other acessors should be quite easy and make our predicates more easy to optimize. What do you think? Honza diff --git a/gcc/tree.h b/gcc/tree.h index 7f0aa5b8d1d..cd8146808f7 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -235,13 +235,24 @@ as_internal_fn (combined_fn code) #define NULL_TREE (tree) NULL +inline tree_base * +as_a_tree_base (tree t) +{ + return (tree_base *)t; +} +inline const tree_base * +as_a_tree_base (const_tree t) +{ + return (const tree_base *)t; +} + /* Define accessors for the fields that all tree nodes have (though some fields are not used for all kinds of nodes). */ /* The tree-code says what kind of node it is. Codes are defined in tree.def. */ -#define TREE_CODE(NODE) ((enum tree_code) (NODE)->base.code) -#define TREE_SET_CODE(NODE, VALUE) ((NODE)->base.code = (VALUE)) +#define TREE_CODE(NODE) ((enum tree_code) (as_a_tree_base (NODE)->code)) +#define TREE_SET_CODE(NODE, VALUE) (as_a_tree_base (NODE)->code = (VALUE)) /* When checking is enabled, errors will be generated if a tree node is accessed incorrectly. The macros die with a fatal error. */ @@ -642,7 +653,7 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int, In IDENTIFIER_NODEs, this means that some extern decl for this name had its address taken. That matters for inline functions. In a STMT_EXPR, it means we want the result of the enclosed expression. */ -#define TREE_ADDRESSABLE(NODE) ((NODE)->base.addressable_flag) +#define TREE_ADDRESSABLE(NODE) (as_a_tree_base (NODE)->addressable_flag) /* Set on a CALL_EXPR if the call is in a tail position, ie. just before the exit of a function. Calls for which this is true are candidates for tail @@ -670,7 +681,7 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int, /* In a VAR_DECL, nonzero means allocate static storage. In a FUNCTION_DECL, nonzero if function has been defined. In a CONSTRUCTOR, nonzero means allocate static storage. */ -#define TREE_STATIC(NODE) ((NODE)->base.static_flag) +#define TREE_STATIC(NODE) (as_a_tree_base (NODE)->static_flag) /* In an ADDR_EXPR, nonzero means do not use a trampoline. */ #define TREE_NO_TRAMPOLINE(NODE) (ADDR_EXPR_CHECK (NODE)->base.static_flag) @@ -701,7 +712,7 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int, warnings concerning the decl should be suppressed. This is used at least for used-before-set warnings, and it set after one warning is emitted. */ -#define TREE_NO_WARNING(NODE) ((NODE)->base.nowarning_flag) +#define TREE_NO_WARNING(NODE) (as_a_tree_base (NODE)->nowarning_flag) /* Nonzero if we should warn about the change in empty class parameter passing ABI in this TU. */ @@ -744,7 +755,7 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int, In an IDENTIFIER_NODE, nonzero means an external declaration accessible from outside this translation unit was previously seen for this name in an inner scope. */ -#define TREE_PUBLIC(NODE) ((NODE)->base.public_flag) +#define TREE_PUBLIC(NODE) (as_a_tree_base (NODE)->public_flag) /* In a _TYPE, indicates whether TYPE_CACHED_VALUES contains a vector of cached values, or is something else. */ @@ -766,7 +777,7 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int, reference to a volatile variable. In a ..._DECL, this is set only if the declaration said `volatile'. This will never be set for a constant. */ #define TREE_SIDE_EFFECTS(NODE) \ - (NON_TYPE_CHECK (NODE)->base.side_effects_flag) + (as_a_tree_base (NON_TYPE_CHECK (NODE))->side_effects_flag) /* In a LABEL_DECL, nonzero means this label had its address taken and therefore can never be deleted and is a jump target for @@ -796,7 +807,7 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int, because eventually we may make that a different bit. If this bit is set in an expression, so is TREE_SIDE_EFFECTS. */ -#define TREE_THIS_VOLATILE(NODE) ((NODE)->base.volatile_flag) +#define TREE_THIS_VOLATILE(NODE) (as_a_tree_base (NODE)->volatile_flag) /* Nonzero means this node will not trap. In an INDIRECT_REF, means accessing the memory pointed to won't generate a trap. However, @@ -877,20 +888,20 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int, In a BLOCK node, nonzero if reorder_blocks has already seen this block. In an SSA_NAME node, nonzero if the SSA_NAME occurs in an abnormal PHI node. */ -#define TREE_ASM_WRITTEN(NODE) ((NODE)->base.asm_written_flag) +#define TREE_ASM_WRITTEN(NODE) (as_a_tree_base (NODE)->asm_written_flag) /* Nonzero in a _DECL if the name is used in its scope. Nonzero in an expr node means inhibit warning if value is unused. In IDENTIFIER_NODEs, this means that some extern decl for this name was used. In a BLOCK, this means that the block contains variables that are used. */ -#define TREE_USED(NODE) ((NODE)->base.used_flag) +#define TREE_USED(NODE) (as_a_tree_base (NODE)->used_flag) /* In a FUNCTION_DECL, nonzero means a call to the function cannot throw an exception. In a CALL_EXPR, nonzero means the call cannot throw. We can't easily check the node type here as the C++ frontend also uses this flag (for AGGR_INIT_EXPR). */ -#define TREE_NOTHROW(NODE) ((NODE)->base.nothrow_flag) +#define TREE_NOTHROW(NODE) (as_a_tree_base (NODE)->nothrow_flag) /* In a CALL_EXPR, means that it's safe to use the target of the call expansion as the return slot for a call that returns in memory. */ @@ -939,9 +950,9 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int, (CALL_EXPR_CHECK (NODE)->base.protected_flag) /* Used in classes in C++. */ -#define TREE_PRIVATE(NODE) ((NODE)->base.private_flag) +#define TREE_PRIVATE(NODE) (as_a_tree_base (NODE)->private_flag) /* Used in classes in C++. */ -#define TREE_PROTECTED(NODE) ((NODE)->base.protected_flag) +#define TREE_PROTECTED(NODE) (as_a_tree_base (NODE)->protected_flag) /* True if reference type NODE is a C++ rvalue reference. */ #define TYPE_REF_IS_RVALUE(NODE) \ @@ -950,7 +961,7 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int, /* Nonzero in a _DECL if the use of the name is defined as a deprecated feature by __attribute__((deprecated)). */ #define TREE_DEPRECATED(NODE) \ - ((NODE)->base.deprecated_flag) + (as_a_tree_base (NODE)->deprecated_flag) /* Nonzero indicates an IDENTIFIER_NODE that names an anonymous aggregate, (as created by anon_aggr_name_format). */ @@ -2170,7 +2181,7 @@ extern tree vector_element_bits_tree (const_tree); /* Used to keep track of visited nodes in tree traversals. This is set to 0 by copy_node and make_node. */ -#define TREE_VISITED(NODE) ((NODE)->base.visited) +#define TREE_VISITED(NODE) (as_a_tree_base (NODE)->visited) /* If set in an ARRAY_TYPE, indicates a string type (for languages that distinguish string from array of char).
On October 31, 2020 11:35:01 PM GMT+01:00, Jan Hubicka <hubicka@ucw.cz> wrote: >> On 10/29/20 1:40 PM, Richard Biener wrote: >> > On Thu, 29 Oct 2020, Jakub Jelinek wrote: >> > >> > > On Thu, Oct 29, 2020 at 05:00:40PM +0100, Jan Hubicka wrote: >> > > > > >> > > > > That's ugly and will for sure defeat warning / access code >> > > > > when we access this as char[], no? I mean, we could >> > > > > as well use 'int str[1];' here? >> > > > >> > > > Well, we always get char pointer via macro that is IMO OK, but >I am also >> > > > not very much in love with this. >> > > >> > > Do we treat signed char [...]; as typeless storage too, or just >> > > what the C++ standard requires (i.e. char, unsigned char and >std::byte >> > > where the last one is enum type with unsigned char underlying >type)? >> > >> > All that is covered by is_byte_access_type which includes all >> > character types (including char16_t and wchar it seems) and >std::byte. >> >> Well, that's a bug, apparently from the PR94923 patch (r11-499); >previously >> it was char, signed char, and unsigned char. And even that is too >much; >> even C++98 said just char and unsigned char could be used for >bytewise >> access. >> >> When C++17 clarified this with the notion of "provides storage", it >applied >> to only unsigned char and std::byte, not even the full set of >byte-access >> types. We still need to allow bytewise access using plain char, but >perhaps >> we don't need to treat plain char arrays as typeless. >> >> Attributes to say that a particular array does or does not provide >storage >> for objects of other types do sound useful. > >Thanks a lot for explanation! >I am adding Martin Sebor to CC. Perhaps you can fix the problem with >std::byte, and signed char to imply typeless storage and ideally also >add an attribute? This seem too deep into C++ FE details for me. > >I was thiking a bit more about all accesses to trees getting same alias >set. This is because we allow type puning through unions. If our tree >datastructure was a C++ class hiearchy, this would not happen since >accesses to specialized tree node would not be through unions but >through casted pointers. > >We could do that with our acessor macros, too. >I tried to turn (NODE)->base in our accesors to ((tree_base >*)(node))->base >and this breaks with const_tree pointers. However the following seem to >work and get better TBAA. > >I think converting other acessors should be quite easy and make our >predicates more easy to optimize. > >What do you think? Can you adjust TREE_SET_CODE to assert the corresponding tree struct doesn't change? Otherwise sure - this would be a good change. Can we avoid the online function though? It'll make -O0 debugging even more awkward... Richard. >Honza > >diff --git a/gcc/tree.h b/gcc/tree.h >index 7f0aa5b8d1d..cd8146808f7 100644 >--- a/gcc/tree.h >+++ b/gcc/tree.h >@@ -235,13 +235,24 @@ as_internal_fn (combined_fn code) > > #define NULL_TREE (tree) NULL > >+inline tree_base * >+as_a_tree_base (tree t) >+{ >+ return (tree_base *)t; >+} >+inline const tree_base * >+as_a_tree_base (const_tree t) >+{ >+ return (const tree_base *)t; >+} >+ > /* Define accessors for the fields that all tree nodes have > (though some fields are not used for all kinds of nodes). */ > > /* The tree-code says what kind of node it is. > Codes are defined in tree.def. */ >-#define TREE_CODE(NODE) ((enum tree_code) (NODE)->base.code) >-#define TREE_SET_CODE(NODE, VALUE) ((NODE)->base.code = (VALUE)) >+#define TREE_CODE(NODE) ((enum tree_code) (as_a_tree_base >(NODE)->code)) >+#define TREE_SET_CODE(NODE, VALUE) (as_a_tree_base (NODE)->code = >(VALUE)) > > /* When checking is enabled, errors will be generated if a tree node > is accessed incorrectly. The macros die with a fatal error. */ >@@ -642,7 +653,7 @@ extern void omp_clause_range_check_failed >(const_tree, const char *, int, > In IDENTIFIER_NODEs, this means that some extern decl for this name > had its address taken. That matters for inline functions. >In a STMT_EXPR, it means we want the result of the enclosed expression. > */ >-#define TREE_ADDRESSABLE(NODE) ((NODE)->base.addressable_flag) >+#define TREE_ADDRESSABLE(NODE) (as_a_tree_base >(NODE)->addressable_flag) > >/* Set on a CALL_EXPR if the call is in a tail position, ie. just >before the >exit of a function. Calls for which this is true are candidates for >tail >@@ -670,7 +681,7 @@ extern void omp_clause_range_check_failed >(const_tree, const char *, int, > /* In a VAR_DECL, nonzero means allocate static storage. > In a FUNCTION_DECL, nonzero if function has been defined. > In a CONSTRUCTOR, nonzero means allocate static storage. */ >-#define TREE_STATIC(NODE) ((NODE)->base.static_flag) >+#define TREE_STATIC(NODE) (as_a_tree_base (NODE)->static_flag) > > /* In an ADDR_EXPR, nonzero means do not use a trampoline. */ >#define TREE_NO_TRAMPOLINE(NODE) (ADDR_EXPR_CHECK >(NODE)->base.static_flag) >@@ -701,7 +712,7 @@ extern void omp_clause_range_check_failed >(const_tree, const char *, int, > warnings concerning the decl should be suppressed. This is used at > least for used-before-set warnings, and it set after one warning is > emitted. */ >-#define TREE_NO_WARNING(NODE) ((NODE)->base.nowarning_flag) >+#define TREE_NO_WARNING(NODE) (as_a_tree_base (NODE)->nowarning_flag) > > /* Nonzero if we should warn about the change in empty class parameter > passing ABI in this TU. */ >@@ -744,7 +755,7 @@ extern void omp_clause_range_check_failed >(const_tree, const char *, int, > In an IDENTIFIER_NODE, nonzero means an external declaration > accessible from outside this translation unit was previously seen > for this name in an inner scope. */ >-#define TREE_PUBLIC(NODE) ((NODE)->base.public_flag) >+#define TREE_PUBLIC(NODE) (as_a_tree_base (NODE)->public_flag) > > /* In a _TYPE, indicates whether TYPE_CACHED_VALUES contains a vector > of cached values, or is something else. */ >@@ -766,7 +777,7 @@ extern void omp_clause_range_check_failed >(const_tree, const char *, int, >reference to a volatile variable. In a ..._DECL, this is set only if >the >declaration said `volatile'. This will never be set for a constant. >*/ > #define TREE_SIDE_EFFECTS(NODE) \ >- (NON_TYPE_CHECK (NODE)->base.side_effects_flag) >+ (as_a_tree_base (NON_TYPE_CHECK (NODE))->side_effects_flag) > > /* In a LABEL_DECL, nonzero means this label had its address taken > and therefore can never be deleted and is a jump target for >@@ -796,7 +807,7 @@ extern void omp_clause_range_check_failed >(const_tree, const char *, int, > because eventually we may make that a different bit. > > If this bit is set in an expression, so is TREE_SIDE_EFFECTS. */ >-#define TREE_THIS_VOLATILE(NODE) ((NODE)->base.volatile_flag) >+#define TREE_THIS_VOLATILE(NODE) (as_a_tree_base >(NODE)->volatile_flag) > > /* Nonzero means this node will not trap. In an INDIRECT_REF, means > accessing the memory pointed to won't generate a trap. However, >@@ -877,20 +888,20 @@ extern void omp_clause_range_check_failed >(const_tree, const char *, int, >In a BLOCK node, nonzero if reorder_blocks has already seen this block. > In an SSA_NAME node, nonzero if the SSA_NAME occurs in an abnormal > PHI node. */ >-#define TREE_ASM_WRITTEN(NODE) ((NODE)->base.asm_written_flag) >+#define TREE_ASM_WRITTEN(NODE) (as_a_tree_base >(NODE)->asm_written_flag) > > /* Nonzero in a _DECL if the name is used in its scope. > Nonzero in an expr node means inhibit warning if value is unused. > In IDENTIFIER_NODEs, this means that some extern decl for this name > was used. >In a BLOCK, this means that the block contains variables that are used. > */ >-#define TREE_USED(NODE) ((NODE)->base.used_flag) >+#define TREE_USED(NODE) (as_a_tree_base (NODE)->used_flag) > > /* In a FUNCTION_DECL, nonzero means a call to the function cannot > throw an exception. In a CALL_EXPR, nonzero means the call cannot > throw. We can't easily check the node type here as the C++ > frontend also uses this flag (for AGGR_INIT_EXPR). */ >-#define TREE_NOTHROW(NODE) ((NODE)->base.nothrow_flag) >+#define TREE_NOTHROW(NODE) (as_a_tree_base (NODE)->nothrow_flag) > > /* In a CALL_EXPR, means that it's safe to use the target of the call > expansion as the return slot for a call that returns in memory. */ >@@ -939,9 +950,9 @@ extern void omp_clause_range_check_failed >(const_tree, const char *, int, > (CALL_EXPR_CHECK (NODE)->base.protected_flag) > > /* Used in classes in C++. */ >-#define TREE_PRIVATE(NODE) ((NODE)->base.private_flag) >+#define TREE_PRIVATE(NODE) (as_a_tree_base (NODE)->private_flag) > /* Used in classes in C++. */ >-#define TREE_PROTECTED(NODE) ((NODE)->base.protected_flag) >+#define TREE_PROTECTED(NODE) (as_a_tree_base (NODE)->protected_flag) > > /* True if reference type NODE is a C++ rvalue reference. */ > #define TYPE_REF_IS_RVALUE(NODE) \ >@@ -950,7 +961,7 @@ extern void omp_clause_range_check_failed >(const_tree, const char *, int, > /* Nonzero in a _DECL if the use of the name is defined as a > deprecated feature by __attribute__((deprecated)). */ > #define TREE_DEPRECATED(NODE) \ >- ((NODE)->base.deprecated_flag) >+ (as_a_tree_base (NODE)->deprecated_flag) > > /* Nonzero indicates an IDENTIFIER_NODE that names an anonymous > aggregate, (as created by anon_aggr_name_format). */ >@@ -2170,7 +2181,7 @@ extern tree vector_element_bits_tree >(const_tree); > >/* Used to keep track of visited nodes in tree traversals. This is set >to > 0 by copy_node and make_node. */ >-#define TREE_VISITED(NODE) ((NODE)->base.visited) >+#define TREE_VISITED(NODE) (as_a_tree_base (NODE)->visited) > > /* If set in an ARRAY_TYPE, indicates a string type (for languages > that distinguish string from array of char).
On Sat, Oct 31, 2020 at 6:35 PM Jan Hubicka <hubicka@ucw.cz> wrote: > > On 10/29/20 1:40 PM, Richard Biener wrote: > > > On Thu, 29 Oct 2020, Jakub Jelinek wrote: > > > > > > > On Thu, Oct 29, 2020 at 05:00:40PM +0100, Jan Hubicka wrote: > > > > > > > > > > > > That's ugly and will for sure defeat warning / access code > > > > > > when we access this as char[], no? I mean, we could > > > > > > as well use 'int str[1];' here? > > > > > > > > > > Well, we always get char pointer via macro that is IMO OK, but I > am also > > > > > not very much in love with this. > > > > > > > > Do we treat signed char [...]; as typeless storage too, or just > > > > what the C++ standard requires (i.e. char, unsigned char and > std::byte > > > > where the last one is enum type with unsigned char underlying type)? > > > > > > All that is covered by is_byte_access_type which includes all > > > character types (including char16_t and wchar it seems) and std::byte. > > > > Well, that's a bug, apparently from the PR94923 patch (r11-499); > previously > > it was char, signed char, and unsigned char. And even that is too much; > > even C++98 said just char and unsigned char could be used for bytewise > > access. > > > > When C++17 clarified this with the notion of "provides storage", it > applied > > to only unsigned char and std::byte, not even the full set of byte-access > > types. We still need to allow bytewise access using plain char, but > perhaps > > we don't need to treat plain char arrays as typeless. > > > > Attributes to say that a particular array does or does not provide > storage > > for objects of other types do sound useful. > > Thanks a lot for explanation! > I am adding Martin Sebor to CC. Perhaps you can fix the problem with > std::byte, and signed char to imply typeless storage and ideally also > add an attribute? This seem too deep into C++ FE details for me. > I've fixed is_byte_access_type; now we shouldn't treat arrays of signed char as typeless storage. I don't have time to add the attribute in stage 1. Jason
diff --git a/gcc/tree-core.h b/gcc/tree-core.h index c9280a8d3b1..63dbb5b8eab 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -1401,7 +1401,8 @@ struct GTY(()) tree_fixed_cst { struct GTY(()) tree_string { struct tree_typed typed; int length; - char str[1]; + /* Avoid char array that would make whole type to be typeless storage. */ + struct {char c;} str[1]; }; struct GTY(()) tree_complex { diff --git a/gcc/tree.c b/gcc/tree.c index 81f867ddded..84115630184 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -2273,7 +2273,7 @@ build_string (unsigned len, const char *str /*= NULL */) memcpy (s->string.str, str, len); else memset (s->string.str, 0, len); - s->string.str[len] = '\0'; + s->string.str[len].c = '\0'; return s; }