Message ID | AM5PR0701MB2657F9BAF49A613759D90D13E42E0@AM5PR0701MB2657.eurprd07.prod.outlook.com |
---|---|
State | New |
Headers | show |
Series | [Ada] Make middle-end string literals NUL terminated | expand |
Hi Bernd, > On 31 Jul 2018, at 14:07, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: > > Hi! > > > This fixes a couple STRING_CST that are not explicitly NUL terminated. > These were caught in a new check in varasm.c I am currently working on. > > Having a NUL terminated string does not change the binary output, but it > makes things easier for he middle-end. > > > Bootstrapped and reg-tested on x86_64-pc-linux-gnu. > Is it OK for trunk? --- gcc/ada/gcc-interface/utils2.c 2017-12-21 07:57:41.000000000 +0100 +++ gcc/ada/gcc-interface/utils2.c 2018-07-31 11:44:01.517117923 +0200 @@ -1844,7 +1844,7 @@ expand_sloc (Node_Id gnat_node, tree *fi } const int len = strlen (str); - *filename = build_string (len, str); + *filename = build_string (len + 1, str); TREE_TYPE (*filename) = build_array_type (char_type_node, build_index_type (size_int (len))); *line = build_int_cst (NULL_TREE, line_number); This one looks good to me. I'm not officially listed as a maintainer so I'll let Eric have the final word. I'm answering because ... --- gcc/ada/gcc-interface/trans.c 2018-07-17 10:10:04.000000000 +0200 +++ gcc/ada/gcc-interface/trans.c 2018-07-31 11:16:27.350728886 +0200 @@ -6079,7 +6079,7 @@ gnat_to_gnu (Node_Id gnat_node) where GCC will want to treat it as a C string. */ string[i] = 0; - gnu_result = build_string (length, string); + gnu_result = build_string (length + 1, string); /* Strings in GCC don't normally have types, but we want this to not be converted to the array type. */ We have a local variant of this one, on which I worked after we realized that it was not enough to achieve the intended effect. The difference at this spot is simply that we prevent the +1 if the original string happens to be explicitly nul terminated already. Like: build_string ((length > 0 && string[length-1] == 0) ? length : length + 1, string); This is however not enough because the extra zero is only conveyed through the string value, not the corresponding type, and varasm.c: mergeable_string_section currently uses the type bounds to search for a zero termination. We can't really change the type, so came up with an adjustment to varasm. The motivation for using the type bounds was https://gcc.gnu.org/ml/gcc-patches/2006-06/msg01004.html which, IIUC, was tightly linked to string constants used as initializers for objects which have a size of their own. (Jakub cc'ed) For string constants not used as initializers, it seems that we might be able to use the string bounds instead, possibly wider. The attached patch does that and passed testing a few weeks ago. So, while we're at it, does that look right, in light of all the string length related issues that have been discussed lately ? I'm not sure I grasped all the ramifications. Thanks in advance! * varasm.c (mergeable_string_section): Accept an extra SCAT argument conveying the section category from which the request originates. Only restrict the search to the string type bounds if we're queried for an initializer. Consider the string bounds otherwise. (default_elf_select_section, STR and STR_INIT): Pass the section category to mergeable_string_section.
On 08/03/18 18:50, Olivier Hainque wrote: > Hi Bernd, > >> On 31 Jul 2018, at 14:07, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: >> >> Hi! >> >> >> This fixes a couple STRING_CST that are not explicitly NUL terminated. >> These were caught in a new check in varasm.c I am currently working on. >> >> Having a NUL terminated string does not change the binary output, but it >> makes things easier for he middle-end. >> >> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. >> Is it OK for trunk? > > --- gcc/ada/gcc-interface/utils2.c 2017-12-21 07:57:41.000000000 +0100 > +++ gcc/ada/gcc-interface/utils2.c 2018-07-31 11:44:01.517117923 +0200 > @@ -1844,7 +1844,7 @@ expand_sloc (Node_Id gnat_node, tree *fi > } > > const int len = strlen (str); > - *filename = build_string (len, str); > + *filename = build_string (len + 1, str); > TREE_TYPE (*filename) = build_array_type (char_type_node, > build_index_type (size_int (len))); > *line = build_int_cst (NULL_TREE, line_number); > > This one looks good to me. I'm not officially listed as a maintainer > so I'll let Eric have the final word. I'm answering because ... > > > --- gcc/ada/gcc-interface/trans.c 2018-07-17 10:10:04.000000000 +0200 > +++ gcc/ada/gcc-interface/trans.c 2018-07-31 11:16:27.350728886 +0200 > @@ -6079,7 +6079,7 @@ gnat_to_gnu (Node_Id gnat_node) > where GCC will want to treat it as a C string. */ > string[i] = 0; > > - gnu_result = build_string (length, string); > + gnu_result = build_string (length + 1, string); > > /* Strings in GCC don't normally have types, but we want > this to not be converted to the array type. */ > > We have a local variant of this one, on which I worked after we realized > that it was not enough to achieve the intended effect. > > The difference at this spot is simply that we prevent the +1 if the > original string happens to be explicitly nul terminated already. Like: > > build_string > ((length > 0 && string[length-1] == 0) ? length : length + 1, > string); > > This is however not enough because the extra zero is only conveyed > through the string value, not the corresponding type, and > > varasm.c: mergeable_string_section > > currently uses the type bounds to search for a zero termination. > > We can't really change the type, so came up with an adjustment to > varasm. The motivation for using the type bounds was > > https://gcc.gnu.org/ml/gcc-patches/2006-06/msg01004.html > > which, IIUC, was tightly linked to string constants used as > initializers for objects which have a size of their own. > (Jakub cc'ed) > > For string constants not used as initializers, it seems that > we might be able to use the string bounds instead, possibly > wider. The attached patch does that and passed testing a few weeks > ago. > > So, while we're at it, does that look right, in light of all the > string length related issues that have been discussed lately ? > > I'm not sure I grasped all the ramifications. > > Thanks in advance! > > > * varasm.c (mergeable_string_section): Accept an extra SCAT > argument conveying the section category from which the request > originates. Only restrict the search to the string type bounds > if we're queried for an initializer. Consider the string bounds > otherwise. > (default_elf_select_section, STR and STR_INIT): Pass the section > category to mergeable_string_section. > > > > > Oh, how interesting. My patch only intends to add a dummy byte that is stripped again in the assembly. The purpose is to make it trivial to find out if a string constant is NUL-terminated in the middle-end by comparing TYPE_SIZE_UNIT (TREE_TYPE (decl)) with TREE_STRING_LENGTH (decl). TYPE_SIZE_UNIT >= TREE_STRING_LENGTH means zero terminated TYPE_SIZE_UNIT < TREE_STRING_LENGTH means not necessarily zero terminated, Such strings shall never chop off more than one nul character. Ada strings are generally not zero terminated, right? So in your search loop you would not have to look after type_len, because that byte would be guaranteed to be zero. That is if we agree that we want to restrict the string constants in that way as I proposed. In the case str_len > type_len I do not understand if that is right. Because varasm will only output type_size bytes, this seems only to select a string section but the last nul byte will not be assembled. Something that is not contained in the type_size should not be assembled. What exactly do you want to accomplish? Thanks Bernd.
Hi Olivier, I think I like your idea a lot, it should be highly useful for Fortran and GO as well. Would somthing something like this (untested patch) work for you on top of my patch series. It makes use of the zero-termination properties of STRING_CSTs, that the other patch ensures. I have two C test cases for this: $ cat y3.c const char str[3] = "abc"; $ gcc -fmerge-all-constants -S y3.c $ cat y3.s .file "y3.c" .text .globl str .section .rodata.str1.1,"aMS",@progbits,1 .type str, @object .size str, 3 str: .string "abc" .ident "GCC: (GNU) 9.0.0 20180730 (experimental)" .section .note.GNU-stack,"",@progbits $ cat y4.c const char str[4] = "abc"; $ gcc -fmerge-all-constants -S y4.c $ cat y4.s $ cat y4.s .file "y4.c" .text .globl str .section .rodata.str1.1,"aMS",@progbits,1 .type str, @object .size str, 4 str: .string "abc" .ident "GCC: (GNU) 9.0.0 20180730 (experimental)" .section .note.GNU-stack,"",@progbits The .size directive uses the DECL_SIZE_UNIT, and is still 3. I don't know of that needs to change or not. But what is important that we have .string "abc" and not .ascii "abc". What do you think? Thanks Bernd. Index: gcc/varasm.c =================================================================== --- gcc/varasm.c (revision 263072) +++ gcc/varasm.c (working copy) @@ -111,7 +111,8 @@ static int compare_constant (const tree, const tre static void output_constant_def_contents (rtx); static void output_addressed_constants (tree); static unsigned HOST_WIDE_INT output_constant (tree, unsigned HOST_WIDE_INT, - unsigned int, bool); + unsigned int, bool, + bool = false); static void globalize_decl (tree); static bool decl_readonly_section_1 (enum section_category); #ifdef BSS_SECTION_ASM_OP @@ -827,7 +828,7 @@ mergeable_string_section (tree decl ATTRIBUTE_UNUS unit = GET_MODE_SIZE (mode); /* Check for embedded NUL characters. */ - for (i = 0; i < len; i += unit) + for (i = 0; i < len - unit; i += unit) { for (j = 0; j < unit; j++) if (str[i + j] != '\0') @@ -2117,7 +2118,7 @@ assemble_noswitch_variable (tree decl, const char static void assemble_variable_contents (tree decl, const char *name, - bool dont_output_data) + bool dont_output_data, bool merge_strings = false) { /* Do any machine/system dependent processing of the object. */ #ifdef ASM_DECLARE_OBJECT_NAME @@ -2140,7 +2141,7 @@ assemble_variable_contents (tree decl, const char output_constant (DECL_INITIAL (decl), tree_to_uhwi (DECL_SIZE_UNIT (decl)), get_variable_align (decl), - false); + false, merge_strings); else /* Leave space for it. */ assemble_zeros (tree_to_uhwi (DECL_SIZE_UNIT (decl))); @@ -2316,7 +2317,9 @@ assemble_variable (tree decl, int top_level ATTRIB switch_to_section (sect); if (align > BITS_PER_UNIT) ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT)); - assemble_variable_contents (decl, name, dont_output_data); + assemble_variable_contents (decl, name, dont_output_data, + sect->common.flags & SECTION_MERGE + && sect->common.flags & SECTION_STRINGS); if (asan_protected) { unsigned HOST_WIDE_INT int size @@ -4786,6 +4789,19 @@ static unsigned HOST_WIDE_INT output_constructor (tree, unsigned HOST_WIDE_INT, unsigned int, bool, oc_outer_state *); +/* Find out if a string P of length LEN and character size UNIT + has double nul termination. + The string is already known to have a nul termination at + offset LEN. Find out if there is already a NUL termination + at offset LEN - UNIT. */ + +static bool +redundant_nul_term_p (const char *p, unsigned len, unsigned unit) +{ + gcc_assert(len >= unit); + return !memcmp (p + len, p + len - unit, unit); +} + /* Output assembler code for constant EXP, with no label. This includes the pseudo-op such as ".int" or ".byte", and a newline. Assumes output_addressed_constants has been done on EXP already. @@ -4812,7 +4828,7 @@ output_constructor (tree, unsigned HOST_WIDE_INT, static unsigned HOST_WIDE_INT output_constant (tree exp, unsigned HOST_WIDE_INT size, unsigned int align, - bool reverse) + bool reverse, bool merge_strings /* = false */) { enum tree_code code; unsigned HOST_WIDE_INT thissize; @@ -4940,8 +4956,13 @@ output_constant (tree exp, unsigned HOST_WIDE_INT case CONSTRUCTOR: return output_constructor (exp, size, align, reverse, NULL); case STRING_CST: - thissize - = MIN ((unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp), size); + thissize = (unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp); + if (thissize > size && merge_strings + && !redundant_nul_term_p (TREE_STRING_POINTER (exp), + size, thissize - size)) + ; + else + thissize = MIN (thissize, size); assemble_string (TREE_STRING_POINTER (exp), thissize); break; case VECTOR_CST:
Hi! This patch is inspired by Olivier's feedback to my previous patch on the zero-termination of Ada STRING_CST. The idea is that strings that do not have embedded nul characters _and_ do not happen to be zero-terminated in the DECL_UNIT_SIZE, are currently not in the merge string sections. To be in the merge string section they need a terminating nul character, that gets written directly in varasm while assembling the string constant. This patch uses the new string properties that my previous patch series implements, and is based on the other patches here: [PATCH] Check the STRING_CSTs in varasm.c https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00331.html [PATCH] Handle overlength strings in the C FE https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00142.html [PATCH] Handle overlength strings in C++ FE https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00045.html Approved: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00105.html [PATCH] Make GO string literals properly NUL terminated https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01931.html [PATCH] [Ada] Make middle-end string literals NUL terminated https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01929.html The existing test case gcc.dg/merge-all-constants-1.c contains two overlength strings that get now stripped down by the C FE, and look in the middle end exactly like normal Ada, Fortran or Go strings. And get now allocated in the merge.str section, unless they have an embedded nul character, which I changed to make the test pass again. Olivier, could you add test cases from the Ada side to this? Bootstrapped and reg-tested on x86_64-pc-linux-gnu. Is it OK for trunk (after all pre-condition patches are approved/committed)? Thanks Bernd.
Again, with patch.... On 08/04/18 17:43, Bernd Edlinger wrote: > Hi! > > > This patch is inspired by Olivier's feedback to my previous patch on the > zero-termination of Ada STRING_CST. > > The idea is that strings that do not have embedded nul characters _and_ > do not happen to be zero-terminated in the DECL_UNIT_SIZE, are currently > not in the merge string sections. To be in the merge string section > they need a terminating nul character, that gets written directly > in varasm while assembling the string constant. This patch uses > the new string properties that my previous patch series implements, > and is based on the other patches here: > > [PATCH] Check the STRING_CSTs in varasm.c > https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00331.html > > [PATCH] Handle overlength strings in the C FE > https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00142.html > > [PATCH] Handle overlength strings in C++ FE > https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00045.html > Approved: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00105.html > > [PATCH] Make GO string literals properly NUL terminated > https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01931.html > > [PATCH] [Ada] Make middle-end string literals NUL terminated > https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01929.html > > > The existing test case gcc.dg/merge-all-constants-1.c > contains two overlength strings that get now stripped down > by the C FE, and look in the middle end exactly like normal > Ada, Fortran or Go strings. And get now allocated > in the merge.str section, unless they have an embedded > nul character, which I changed to make the test pass again. > > Olivier, could you add test cases from the Ada side to this? > > > Bootstrapped and reg-tested on x86_64-pc-linux-gnu. > Is it OK for trunk (after all pre-condition patches > are approved/committed)? > > > Thanks > Bernd. gcc: 2018-08-04 Bernd Edlinger <bernd.edlinger@hotmail.de> * varasm.c (output_constant): Add new parameter merge_strings. (assemble_variable_contents): Likewise. (mergeable_string_section): Don't fail if the last char is non-zero. (assemble_variable): Pass merge_strings for string merge sections. (redundant_nul_term_p): New helper function. (output_constant): Make strings properly zero terminated in merge string sections. testsuite: 2018-08-04 Bernd Edlinger <bernd.edlinger@hotmail.de> * gcc.dg/merge-all-constants-1.c: Adjust test. Index: gcc/varasm.c =================================================================== --- gcc/varasm.c (revision 263072) +++ gcc/varasm.c (working copy) @@ -111,7 +111,8 @@ static int compare_constant (const tree, const tre static void output_constant_def_contents (rtx); static void output_addressed_constants (tree); static unsigned HOST_WIDE_INT output_constant (tree, unsigned HOST_WIDE_INT, - unsigned int, bool); + unsigned int, bool, + bool = false); static void globalize_decl (tree); static bool decl_readonly_section_1 (enum section_category); #ifdef BSS_SECTION_ASM_OP @@ -827,7 +828,7 @@ mergeable_string_section (tree decl ATTRIBUTE_UNUS unit = GET_MODE_SIZE (mode); /* Check for embedded NUL characters. */ - for (i = 0; i < len; i += unit) + for (i = 0; i < len - unit; i += unit) { for (j = 0; j < unit; j++) if (str[i + j] != '\0') @@ -2117,7 +2118,7 @@ assemble_noswitch_variable (tree decl, const char static void assemble_variable_contents (tree decl, const char *name, - bool dont_output_data) + bool dont_output_data, bool merge_strings = false) { /* Do any machine/system dependent processing of the object. */ #ifdef ASM_DECLARE_OBJECT_NAME @@ -2140,7 +2141,7 @@ assemble_variable_contents (tree decl, const char output_constant (DECL_INITIAL (decl), tree_to_uhwi (DECL_SIZE_UNIT (decl)), get_variable_align (decl), - false); + false, merge_strings); else /* Leave space for it. */ assemble_zeros (tree_to_uhwi (DECL_SIZE_UNIT (decl))); @@ -2316,7 +2317,9 @@ assemble_variable (tree decl, int top_level ATTRIB switch_to_section (sect); if (align > BITS_PER_UNIT) ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT)); - assemble_variable_contents (decl, name, dont_output_data); + assemble_variable_contents (decl, name, dont_output_data, + sect->common.flags & SECTION_MERGE + && sect->common.flags & SECTION_STRINGS); if (asan_protected) { unsigned HOST_WIDE_INT int size @@ -4786,6 +4789,19 @@ static unsigned HOST_WIDE_INT output_constructor (tree, unsigned HOST_WIDE_INT, unsigned int, bool, oc_outer_state *); +/* Find out if a string P of length LEN and character size UNIT + has double nul termination. + The string is already known to have a nul termination at + offset LEN. Find out if there is already a NUL termination + at offset LEN - UNIT. */ + +static bool +redundant_nul_term_p (const char *p, unsigned len, unsigned unit) +{ + gcc_assert(len >= unit); + return !memcmp (p + len, p + len - unit, unit); +} + /* Output assembler code for constant EXP, with no label. This includes the pseudo-op such as ".int" or ".byte", and a newline. Assumes output_addressed_constants has been done on EXP already. @@ -4812,7 +4828,7 @@ output_constructor (tree, unsigned HOST_WIDE_INT, static unsigned HOST_WIDE_INT output_constant (tree exp, unsigned HOST_WIDE_INT size, unsigned int align, - bool reverse) + bool reverse, bool merge_strings /* = false */) { enum tree_code code; unsigned HOST_WIDE_INT thissize; @@ -4940,8 +4956,13 @@ output_constant (tree exp, unsigned HOST_WIDE_INT case CONSTRUCTOR: return output_constructor (exp, size, align, reverse, NULL); case STRING_CST: - thissize - = MIN ((unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp), size); + thissize = (unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp); + if (thissize > size && merge_strings + && !redundant_nul_term_p (TREE_STRING_POINTER (exp), + size, thissize - size)) + ; + else + thissize = MIN (thissize, size); gcc_checking_assert (check_string_literal (exp, thissize)); assemble_string (TREE_STRING_POINTER (exp), thissize); break; case VECTOR_CST: Index: gcc/testsuite/gcc.dg/merge-all-constants-1.c =================================================================== --- gcc/testsuite/gcc.dg/merge-all-constants-1.c (revision 263072) +++ gcc/testsuite/gcc.dg/merge-all-constants-1.c (working copy) @@ -1,8 +1,8 @@ /* { dg-do compile } */ /* { dg-options "-w -O2 -fmerge-all-constants" } */ -const char str1[36] = "0123456789abcdefghijklmnopqrstuvwxyz"; +const char str1[36] = "\000123456789abcdefghijklmnopqrstuvwxyz"; const char str2[38] = "0123456789abcdefghijklmnopqrstuvwxyz"; -const char str3[10] = "0123456789abcdefghijklmnopqrstuvwxyz"; +const char str3[10] = "\000123456789abcdefghijklmnopqrstuvwxyz"; /* { dg-final { scan-assembler-not "\.rodata\.str" } } */
Aehm, I forgot the Fortran FE patch which is also a pre-condition (at least for building with all languages): [PATCH] Create internally nul terminated string literals in fortan FE https://gcc.gnu.org/ml/fortran/2018-08/msg00000.html Thanks Bernd. On 08/04/18 17:44, Bernd Edlinger wrote: > Again, with patch.... > > On 08/04/18 17:43, Bernd Edlinger wrote: >> Hi! >> >> >> This patch is inspired by Olivier's feedback to my previous patch on the >> zero-termination of Ada STRING_CST. >> >> The idea is that strings that do not have embedded nul characters _and_ >> do not happen to be zero-terminated in the DECL_UNIT_SIZE, are currently >> not in the merge string sections. To be in the merge string section >> they need a terminating nul character, that gets written directly >> in varasm while assembling the string constant. This patch uses >> the new string properties that my previous patch series implements, >> and is based on the other patches here: >> >> [PATCH] Check the STRING_CSTs in varasm.c >> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00331.html >> >> [PATCH] Handle overlength strings in the C FE >> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00142.html >> >> [PATCH] Handle overlength strings in C++ FE >> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00045.html >> Approved: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00105.html >> >> [PATCH] Make GO string literals properly NUL terminated >> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01931.html >> >> [PATCH] [Ada] Make middle-end string literals NUL terminated >> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01929.html >> >> >> The existing test case gcc.dg/merge-all-constants-1.c >> contains two overlength strings that get now stripped down >> by the C FE, and look in the middle end exactly like normal >> Ada, Fortran or Go strings. And get now allocated >> in the merge.str section, unless they have an embedded >> nul character, which I changed to make the test pass again. >> >> Olivier, could you add test cases from the Ada side to this? >> >> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. >> Is it OK for trunk (after all pre-condition patches >> are approved/committed)? >> >> >> Thanks >> Bernd.
Hi Bernd, Things are moving fast so I'll answer your messages incrementally :-) > On 3 Aug 2018, at 20:37, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: [...] > Oh, how interesting. > > My patch only intends to add a dummy byte that is stripped again > in the assembly. The purpose is to make it trivial to find out if > a string constant is NUL-terminated in the middle-end by comparing > TYPE_SIZE_UNIT (TREE_TYPE (decl)) with TREE_STRING_LENGTH (decl). Ah, Ok. I hadn't quite realized that yet. > TYPE_SIZE_UNIT >= TREE_STRING_LENGTH means zero terminated > TYPE_SIZE_UNIT < TREE_STRING_LENGTH means not necessarily zero terminated, That is opposite to the situation we were having for Ada before the patch series: > Such strings shall never chop off more than one nul character. > Ada strings are generally not zero terminated, right? Right. Strings have explicit bounds in Ada, and a NUL character is like any other. It can appear anywhere. For example, the excerpt below constructs a string of 5 characters, bounds 1 to 5, with a nul character in second position: X : String (1 .. 5) := "1" & Ascii.Nul & "345"; ('&' is the catenation operator) The problem we had was that string literals not explicitly NUL terminated (as in X : String := "123" & Ascii.NUL) would never go in mergeable sections. For example: Ada.Text_IO.Put_Line ("Hello World!"); Ada has so called "generic" units that are sometimes instanciated many many times so there is real interest in improving the situation there. > So in your search loop you would not have to look after > type_len, because that byte would be guaranteed to be zero. > > That is if we agree that we want to restrict the string constants > in that way as I proposed. > > In the case str_len > type_len I do not understand if that is right. In your new model, I don't know what sense it would make, indeed. In the previous model, it was clearly expected to be a possibility. > Because varasm will only output type_size bytes, > this seems only to select a string section > but the last nul byte will not be assembled. > Something that is not contained in the type_size > should not be assembled. > > What exactly do you want to accomplish? Based on the previous (say, gcc-7 or gcc-8) code base, arrange for most Ada string literals to end up in a mergeable section on ELF targets. In at least gcc-7 and gcc-8, our gigi adjustment + the patch I posted achieve this effect. For example, for the following source: procedure Hello is procedure Process (S : String) with Import, Convention => C; begin Process ("1234"); end; Without the gigi change, I get (x86_64-linux, -O2 -fmerge-all-constants): .section .rodata .LC0: .ascii "1234" Regular section, no terminating zero. The gigi change alone gets us the terminating nul (*), but not yet a mergeable section, because the extra nul isn't covered by the type bounds: .section .rodata .LC0: .string "1234" With the varasm change on top, checking beyond the type bounds when the string constant isn't an initializer, we get: .section .rodata.str1.1,"aMS",@progbits,1 .LC0: .string "1234" (*) The terminating zero, part of the string value, not spanned by the type bounds, gets assembled through: assemble_constant_contents (...) ... size = get_constant_size (exp); then: get_constant_size (tree exp) size = int_size_in_bytes (TREE_TYPE (exp)); if (TREE_CODE (exp) == STRING_CST) size = MAX (TREE_STRING_LENGTH (exp), size); Again, this is just providing more context on the original patch I posted, based on your questions. I understand there are proposals to change things pretty significantly in this area, so ... now on to your next messages and ideas :-) Olivier
Hi, I realized that the previous version did not handle zero terminated Ada constants correctly as in $ cat hello.adb procedure Hello is procedure puts (S : String) with Import, Convention => C; X : String(1..8) := "abcdefg" & Ascii.Nul; begin puts ("1234" & Ascii.Nul ); puts (X); end; accidentally strings were doubly nul-terminated, because I forgot to handle merge string sections in assemble_constant_contents, and because get_constant_size increased the string literal by one. Fixed, and added a positive test case for the merging non-zero terminated strings in C. Boot-strapped and regtested on x86_64-pc-linux-gnu. Is it OK for trunk (after pre-condition patches)? Thanks Bernd. gcc: 2018-08-04 Bernd Edlinger <bernd.edlinger@hotmail.de> * varasm.c (output_constant): Add new parameter merge_strings. Make strings properly zero terminated in merge string sections. (mergeable_string_section): Don't fail if the last char is non-zero. (assemble_variable_contents): Handle merge string sections. (assemble_variable): Likewise. (assemble_constant_contents): Likewise. (output_constant_def_contents): Likewise. (get_constant_size): Remove special handling of STRING_CSTs. (redundant_nul_term_p): New helper function. testsuite: 2018-08-04 Bernd Edlinger <bernd.edlinger@hotmail.de> * gcc.dg/merge-all-constants-1.c: Adjust test. * gcc.dg/merge-all-constants-2.c: New test. Index: gcc/varasm.c =================================================================== --- gcc/varasm.c (revision 263072) +++ gcc/varasm.c (working copy) @@ -111,7 +111,8 @@ static int compare_constant (const tree, const tre static void output_constant_def_contents (rtx); static void output_addressed_constants (tree); static unsigned HOST_WIDE_INT output_constant (tree, unsigned HOST_WIDE_INT, - unsigned int, bool); + unsigned int, bool, + bool = false); static void globalize_decl (tree); static bool decl_readonly_section_1 (enum section_category); #ifdef BSS_SECTION_ASM_OP @@ -827,7 +828,7 @@ mergeable_string_section (tree decl ATTRIBUTE_UNUS unit = GET_MODE_SIZE (mode); /* Check for embedded NUL characters. */ - for (i = 0; i < len; i += unit) + for (i = 0; i < len - unit; i += unit) { for (j = 0; j < unit; j++) if (str[i + j] != '\0') @@ -2117,7 +2118,7 @@ assemble_noswitch_variable (tree decl, const char static void assemble_variable_contents (tree decl, const char *name, - bool dont_output_data) + bool dont_output_data, bool merge_strings = false) { /* Do any machine/system dependent processing of the object. */ #ifdef ASM_DECLARE_OBJECT_NAME @@ -2140,7 +2141,7 @@ assemble_variable_contents (tree decl, const char output_constant (DECL_INITIAL (decl), tree_to_uhwi (DECL_SIZE_UNIT (decl)), get_variable_align (decl), - false); + false, merge_strings); else /* Leave space for it. */ assemble_zeros (tree_to_uhwi (DECL_SIZE_UNIT (decl))); @@ -2316,7 +2317,9 @@ assemble_variable (tree decl, int top_level ATTRIB switch_to_section (sect); if (align > BITS_PER_UNIT) ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT)); - assemble_variable_contents (decl, name, dont_output_data); + assemble_variable_contents (decl, name, dont_output_data, + sect->common.flags & SECTION_MERGE + && sect->common.flags & SECTION_STRINGS); if (asan_protected) { unsigned HOST_WIDE_INT int size @@ -3300,12 +3303,7 @@ get_constant_section (tree exp, unsigned int align static HOST_WIDE_INT get_constant_size (tree exp) { - HOST_WIDE_INT size; - - size = int_size_in_bytes (TREE_TYPE (exp)); - if (TREE_CODE (exp) == STRING_CST) - size = MAX (TREE_STRING_LENGTH (exp), size); - return size; + return int_size_in_bytes (TREE_TYPE (exp)); } /* Subroutine of output_constant_def: @@ -3468,7 +3466,8 @@ maybe_output_constant_def_contents (struct constan constant's alignment in bits. */ static void -assemble_constant_contents (tree exp, const char *label, unsigned int align) +assemble_constant_contents (tree exp, const char *label, unsigned int align, + bool merge_strings = false) { HOST_WIDE_INT size; @@ -3478,7 +3477,7 @@ static void targetm.asm_out.declare_constant_name (asm_out_file, label, exp, size); /* Output the value of EXP. */ - output_constant (exp, size, align, false); + output_constant (exp, size, align, false, merge_strings); targetm.asm_out.decl_end (); } @@ -3519,10 +3518,13 @@ output_constant_def_contents (rtx symbol) || (VAR_P (decl) && DECL_IN_CONSTANT_POOL (decl)) ? DECL_ALIGN (decl) : symtab_node::get (decl)->definition_alignment ()); - switch_to_section (get_constant_section (exp, align)); + section *sect = get_constant_section (exp, align); + switch_to_section (sect); if (align > BITS_PER_UNIT) ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT)); - assemble_constant_contents (exp, XSTR (symbol, 0), align); + assemble_constant_contents (exp, XSTR (symbol, 0), align, + sect->common.flags & SECTION_MERGE + && sect->common.flags & SECTION_STRINGS); if (asan_protected) { HOST_WIDE_INT size = get_constant_size (exp); @@ -4786,6 +4789,19 @@ static unsigned HOST_WIDE_INT output_constructor (tree, unsigned HOST_WIDE_INT, unsigned int, bool, oc_outer_state *); +/* Find out if a string P of length LEN and character size UNIT + has double nul termination. + The string is already known to have a nul termination at + offset LEN. Find out if there is already a NUL termination + at offset LEN - UNIT. */ + +static bool +redundant_nul_term_p (const char *p, unsigned len, unsigned unit) +{ + gcc_assert(len >= unit); + return !memcmp (p + len, p + len - unit, unit); +} + /* Output assembler code for constant EXP, with no label. This includes the pseudo-op such as ".int" or ".byte", and a newline. Assumes output_addressed_constants has been done on EXP already. @@ -4812,7 +4850,7 @@ output_constructor (tree, unsigned HOST_WIDE_INT, static unsigned HOST_WIDE_INT output_constant (tree exp, unsigned HOST_WIDE_INT size, unsigned int align, - bool reverse) + bool reverse, bool merge_strings /* = false */) { enum tree_code code; unsigned HOST_WIDE_INT thissize; @@ -4940,8 +4978,13 @@ output_constant (tree exp, unsigned HOST_WIDE_INT case CONSTRUCTOR: return output_constructor (exp, size, align, reverse, NULL); case STRING_CST: - thissize - = MIN ((unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp), size); + thissize = (unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp); + if (thissize > size && merge_strings + && !redundant_nul_term_p (TREE_STRING_POINTER (exp), + size, thissize - size)) + ; + else + thissize = MIN (thissize, size); gcc_checking_assert (check_string_literal (exp, size)); assemble_string (TREE_STRING_POINTER (exp), thissize); break; Index: gcc/testsuite/gcc.dg/merge-all-constants-1.c =================================================================== --- gcc/testsuite/gcc.dg/merge-all-constants-1.c (revision 263072) +++ gcc/testsuite/gcc.dg/merge-all-constants-1.c (working copy) @@ -1,8 +1,8 @@ /* { dg-do compile } */ /* { dg-options "-w -O2 -fmerge-all-constants" } */ -const char str1[36] = "0123456789abcdefghijklmnopqrstuvwxyz"; +const char str1[36] = "\000123456789abcdefghijklmnopqrstuvwxyz"; const char str2[38] = "0123456789abcdefghijklmnopqrstuvwxyz"; -const char str3[10] = "0123456789abcdefghijklmnopqrstuvwxyz"; +const char str3[10] = "\000123456789abcdefghijklmnopqrstuvwxyz"; /* { dg-final { scan-assembler-not "\.rodata\.str" } } */ Index: gcc/testsuite/gcc.dg/merge-all-constants-2.c =================================================================== --- gcc/testsuite/gcc.dg/merge-all-constants-2.c (revision 0) +++ gcc/testsuite/gcc.dg/merge-all-constants-2.c (working copy) --- /dev/null 2018-07-24 12:25:16.122475518 +0200 +++ gcc/testsuite/gcc.dg/merge-all-constants-2.c 2018-08-07 10:14:35.384190377 +0200 @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-options "-w -O2 -fmerge-all-constants" } */ + +const char str1[36] = "0123456789abcdefghijklmnopqrstuvwxyz"; +const char str2[37] = "0123456789abcdefghijklmnopqrstuvwxyz"; +const char str3[10] = "0123456789abcdefghijklmnopqrstuvwxyz"; + +/* { dg-final { scan-assembler-not "\.rodata\[\n\r\]" } } */
Hi Olivier, On 08/06/18 20:01, Olivier Hainque wrote: > Hi Bernd, > > Things are moving fast so I'll answer your > messages incrementally :-) > >> On 3 Aug 2018, at 20:37, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: > [...] >> Oh, how interesting. >> >> My patch only intends to add a dummy byte that is stripped again >> in the assembly. The purpose is to make it trivial to find out if >> a string constant is NUL-terminated in the middle-end by comparing >> TYPE_SIZE_UNIT (TREE_TYPE (decl)) with TREE_STRING_LENGTH (decl). > > Ah, Ok. I hadn't quite realized that yet. > >> TYPE_SIZE_UNIT >= TREE_STRING_LENGTH means zero terminated >> TYPE_SIZE_UNIT < TREE_STRING_LENGTH means not necessarily zero terminated, > > That is opposite to the situation we were having > for Ada before the patch series: > >> Such strings shall never chop off more than one nul character. >> Ada strings are generally not zero terminated, right? > > Right. Strings have explicit bounds in Ada, and a NUL character > is like any other. It can appear anywhere. For example, the > excerpt below constructs a string of 5 characters, bounds 1 to 5, > with a nul character in second position: > > X : String (1 .. 5) := "1" & Ascii.Nul & "345"; > > ('&' is the catenation operator) > > The problem we had was that string literals not explicitly > NUL terminated (as in X : String := "123" & Ascii.NUL) would > never go in mergeable sections. For example: > > Ada.Text_IO.Put_Line ("Hello World!"); > > Ada has so called "generic" units that are sometimes > instanciated many many times so there is real interest > in improving the situation there. > >> So in your search loop you would not have to look after >> type_len, because that byte would be guaranteed to be zero. >> >> That is if we agree that we want to restrict the string constants >> in that way as I proposed. >> >> In the case str_len > type_len I do not understand if that is right. > > In your new model, I don't know what sense it would make, indeed. > > In the previous model, it was clearly expected to be a possibility. > >> Because varasm will only output type_size bytes, >> this seems only to select a string section >> but the last nul byte will not be assembled. >> Something that is not contained in the type_size >> should not be assembled. >> >> What exactly do you want to accomplish? > > Based on the previous (say, gcc-7 or gcc-8) code base, arrange > for most Ada string literals to end up in a mergeable section > on ELF targets. > > In at least gcc-7 and gcc-8, our gigi adjustment + the > patch I posted achieve this effect. > > For example, for the following source: > > procedure Hello is > procedure Process (S : String) with Import, Convention => C; > begin > Process ("1234"); > end; > Yes, thanks for this example, $ cat hello.adb procedure Hello is procedure puts (S : String) with Import, Convention => C; X : String(1..8) := "abcdefg" & Ascii.Nul; begin puts ("1234" & Ascii.Nul ); puts (X); end; produces again identical output with the latest patch that I posted right now. > Without the gigi change, I get (x86_64-linux, > -O2 -fmerge-all-constants): > > .section .rodata > .LC0: > .ascii "1234" > > Regular section, no terminating zero. > > The gigi change alone gets us the terminating nul (*), > but not yet a mergeable section, because the extra nul > isn't covered by the type bounds: > > .section .rodata > .LC0: > .string "1234" > > With the varasm change on top, checking beyond the > type bounds when the string constant isn't an initializer, > we get: > > .section .rodata.str1.1,"aMS",@progbits,1 > .LC0: > .string "1234" > > (*) The terminating zero, part of the string value, > not spanned by the type bounds, gets assembled through: > > assemble_constant_contents (...) > ... > size = get_constant_size (exp); > > then: > > get_constant_size (tree exp) > > size = int_size_in_bytes (TREE_TYPE (exp)); > if (TREE_CODE (exp) == STRING_CST) > size = MAX (TREE_STRING_LENGTH (exp), size); > > Again, this is just providing more context on the > original patch I posted, based on your questions. > Ah, good point. That needs to be changed... Actually I would like to have get_constant_size return just the type size. > I understand there are proposals to change things > pretty significantly in this area, so ... > > now on to your next messages and ideas :-) > > When I try this example: $ cat array9.adb -- { dg-do run } procedure Array9 is V1 : String(1..10) := "1234567890"; V2 : String(1..-1) := ""; procedure Compare (S : String) is begin if S'Size /= 8*S'Length then raise Program_Error; end if; end; begin Compare (""); Compare ("1234"); Compare (V1); Compare (V2); end; I see that "1234" is put in the merge section, but V1 is not. Maybe because of the alignment requirement? But it sould not be much different from the C test case, which is now able to merge the non-zero terminated strings. Thanks Bernd.
Hi Bernd, (Thanks for your interest in the Ada case as well :) > On 7 Aug 2018, at 15:07, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: > When I try this example: > > $ cat array9.adb > -- { dg-do run } > > procedure Array9 is > > V1 : String(1..10) := "1234567890"; > V2 : String(1..-1) := ""; > > procedure Compare (S : String) is > begin > if S'Size /= 8*S'Length then > raise Program_Error; > end if; > end; > > begin > Compare (""); > Compare ("1234"); > Compare (V1); > Compare (V2); > end; > > I see that "1234" is put in the merge section, > but V1 is not. Maybe because of the alignment requirement? > > But it sould not be much different from the C test case, > which is now able to merge the non-zero terminated strings. I'm happy to have a look. I'd just like to make sure I'll be looking at the right thing: I would need to start from trunk + the patches you referenced at https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00339.html + the last one you sent at https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00481.html Correct ? Can you then confirm the target triplet and compilation options ? Thanks in advance! Olivier
On 08/07/18 18:40, Olivier Hainque wrote: > Hi Bernd, > > (Thanks for your interest in the Ada case as well :) > >> On 7 Aug 2018, at 15:07, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: > >> When I try this example: >> >> $ cat array9.adb >> -- { dg-do run } >> >> procedure Array9 is >> >> V1 : String(1..10) := "1234567890"; >> V2 : String(1..-1) := ""; >> >> procedure Compare (S : String) is >> begin >> if S'Size /= 8*S'Length then >> raise Program_Error; >> end if; >> end; >> >> begin >> Compare (""); >> Compare ("1234"); >> Compare (V1); >> Compare (V2); >> end; >> >> I see that "1234" is put in the merge section, >> but V1 is not. Maybe because of the alignment requirement? >> >> But it sould not be much different from the C test case, >> which is now able to merge the non-zero terminated strings. > > I'm happy to have a look. I'd just like to make > sure I'll be looking at the right thing: > > I would need to start from trunk + the patches you > referenced at > > https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00339.html > > + the last one you sent at > > https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00481.html > > Correct ? > Yes, Please use the latest patch patch here: [PATCH] Check the STRING_CSTs in varasm.c https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00361.html There is also a Fortran patch: [PATCH] Create internally nul terminated string literals in fortan FE https://gcc.gnu.org/ml/fortran/2018-08/msg00000.html And a JIT FE patch: [PATCH] Fix not properly nul-terminated string constants in JIT https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00370.html My host triplet is x86_64-pc-linux-gnu (I let config.guess determine it). I use gmp, mpfr, mpc, isl in-tree. And configure simply with ../gcc-trunk/configure --prefix=/home/ed/gnu/install --enable-languages=all Thanks Bernd. > Can you then confirm the target triplet and > compilation options ? > > Thanks in advance! > > Olivier > > > >
Hi Bernd, > On 07 Aug 2018, at 19:10, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: > >>> procedure Array9 is >>> I see that "1234" is put in the merge section, >>> but V1 is not. Maybe because of the alignment requirement? >>> >>> But it sould not be much different from the C test case, >>> which is now able to merge the non-zero terminated strings. >> >> I'm happy to have a look. An effect of alignments and inlining I think. I see expected behavior with calls to an external program instead of the local one. I tried the testcases we have here plus a couple of variations, e.g. with strings explicitly terminated by multiple ascii.nul, and they all seem to be working fine :-) Regarding Ada testcases for the dejagnu suite, I'm not quite sure how to go about them. For starters at least, we could for example contemplate something like: -- { dg-do compile } -- { dg-options "-O1 -fmerge-all-constants" } procedure String_Merge1 is procedure Process (X : String); pragma Import (Ada, Process); begin Process ("1234"); Process ("ABCD" & Ascii.NUL); end; -- We expect something like: -- .section .rodata.str1.1,"aMS",@progbits,1 -- .LC2: -- .string "1234" -- .LC3: -- .string "ABCD" -- Not clear how to match that generally enough. -- { dg-final { scan-assembler ".rodata.str1.1,\"aMS\"\[^\n\r\]*\n\.LC.:\n\[^\n\ \r\]*\.string \"1234\"\n\.LC.:\n\[^\n\r\]*\.string \"ABCD\"\n" } } As the comment suggests, I'm not clear if that's robust/general enough. I suppose we probably can't always expect .string, for instance. I suppose we could also have other bits between the .section and the literals. Would be fine as long as it's not something like ".section .rodata". Olivier
Hi, this is an update of the patch, which has just two Ada test cases added for string merging which are based on Olivier's suggested test case. Note that except the "Check STRING_CSTs in varasm.c" patch series, there is now another patch needed to work around issues with 71625: [PATCH] Call braced_list_to_string after array size is fixed https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01015.html Thanks Bernd. On 08/07/18 14:55, Bernd Edlinger wrote: > Hi, > > I realized that the previous version did not handle > zero terminated Ada constants correctly as in > > $ cat hello.adb > procedure Hello is > procedure puts (S : String) with Import, Convention => C; > X : String(1..8) := "abcdefg" & Ascii.Nul; > begin > puts ("1234" & Ascii.Nul ); > puts (X); > end; > > accidentally strings were doubly nul-terminated, because I forgot to > handle merge string sections in assemble_constant_contents, and > because get_constant_size increased the string literal by one. > > Fixed, and added a positive test case for the merging non-zero terminated > strings in C. > > > Boot-strapped and regtested on x86_64-pc-linux-gnu. > Is it OK for trunk (after pre-condition patches)? > > > Thanks > Bernd. gcc: 2018-08-04 Bernd Edlinger <bernd.edlinger@hotmail.de> * varasm.c (output_constant): Add new parameter merge_strings. Make strings properly zero terminated in merge string sections. (mergeable_string_section): Don't fail if the last char is non-zero. (assemble_variable_contents): Handle merge string sections. (assemble_variable): Likewise. (assemble_constant_contents): Likewise. (output_constant_def_contents): Likewise. (get_constant_size): Remove special handling of STRING_CSTs. (redundant_nul_term_p): New helper function. testsuite: 2018-08-04 Bernd Edlinger <bernd.edlinger@hotmail.de> * gnat.dg/string_merge1.adb: New test. * gnat.dg/string_merge2.adb: New test. * gcc.dg/merge-all-constants-1.c: Adjust test. * gcc.dg/merge-all-constants-2.c: New test. Index: gcc/varasm.c =================================================================== --- gcc/varasm.c (revision 263072) +++ gcc/varasm.c (working copy) @@ -111,7 +111,8 @@ static int compare_constant (const tree, const tre static void output_constant_def_contents (rtx); static void output_addressed_constants (tree); static unsigned HOST_WIDE_INT output_constant (tree, unsigned HOST_WIDE_INT, - unsigned int, bool); + unsigned int, bool, + bool = false); static void globalize_decl (tree); static bool decl_readonly_section_1 (enum section_category); #ifdef BSS_SECTION_ASM_OP @@ -827,7 +828,7 @@ mergeable_string_section (tree decl ATTRIBUTE_UNUS unit = GET_MODE_SIZE (mode); /* Check for embedded NUL characters. */ - for (i = 0; i < len; i += unit) + for (i = 0; i < len - unit; i += unit) { for (j = 0; j < unit; j++) if (str[i + j] != '\0') @@ -2117,7 +2118,7 @@ assemble_noswitch_variable (tree decl, const char static void assemble_variable_contents (tree decl, const char *name, - bool dont_output_data) + bool dont_output_data, bool merge_strings = false) { /* Do any machine/system dependent processing of the object. */ #ifdef ASM_DECLARE_OBJECT_NAME @@ -2140,7 +2141,7 @@ assemble_variable_contents (tree decl, const char output_constant (DECL_INITIAL (decl), tree_to_uhwi (DECL_SIZE_UNIT (decl)), get_variable_align (decl), - false); + false, merge_strings); else /* Leave space for it. */ assemble_zeros (tree_to_uhwi (DECL_SIZE_UNIT (decl))); @@ -2316,7 +2317,9 @@ assemble_variable (tree decl, int top_level ATTRIB switch_to_section (sect); if (align > BITS_PER_UNIT) ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT)); - assemble_variable_contents (decl, name, dont_output_data); + assemble_variable_contents (decl, name, dont_output_data, + sect->common.flags & SECTION_MERGE + && sect->common.flags & SECTION_STRINGS); if (asan_protected) { unsigned HOST_WIDE_INT int size @@ -3300,12 +3303,7 @@ get_constant_section (tree exp, unsigned int align static HOST_WIDE_INT get_constant_size (tree exp) { - HOST_WIDE_INT size; - - size = int_size_in_bytes (TREE_TYPE (exp)); - if (TREE_CODE (exp) == STRING_CST) - size = MAX (TREE_STRING_LENGTH (exp), size); - return size; + return int_size_in_bytes (TREE_TYPE (exp)); } /* Subroutine of output_constant_def: @@ -3468,7 +3466,8 @@ maybe_output_constant_def_contents (struct constan constant's alignment in bits. */ static void -assemble_constant_contents (tree exp, const char *label, unsigned int align) +assemble_constant_contents (tree exp, const char *label, unsigned int align, + bool merge_strings = false) { HOST_WIDE_INT size; @@ -3478,7 +3477,7 @@ static void targetm.asm_out.declare_constant_name (asm_out_file, label, exp, size); /* Output the value of EXP. */ - output_constant (exp, size, align, false); + output_constant (exp, size, align, false, merge_strings); targetm.asm_out.decl_end (); } @@ -3519,10 +3518,13 @@ output_constant_def_contents (rtx symbol) || (VAR_P (decl) && DECL_IN_CONSTANT_POOL (decl)) ? DECL_ALIGN (decl) : symtab_node::get (decl)->definition_alignment ()); - switch_to_section (get_constant_section (exp, align)); + section *sect = get_constant_section (exp, align); + switch_to_section (sect); if (align > BITS_PER_UNIT) ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT)); - assemble_constant_contents (exp, XSTR (symbol, 0), align); + assemble_constant_contents (exp, XSTR (symbol, 0), align, + sect->common.flags & SECTION_MERGE + && sect->common.flags & SECTION_STRINGS); if (asan_protected) { HOST_WIDE_INT size = get_constant_size (exp); @@ -4786,6 +4789,19 @@ static unsigned HOST_WIDE_INT output_constructor (tree, unsigned HOST_WIDE_INT, unsigned int, bool, oc_outer_state *); +/* Find out if a string P of length LEN and character size UNIT + has double nul termination. + The string is already known to have a nul termination at + offset LEN. Find out if there is already a NUL termination + at offset LEN - UNIT. */ + +static bool +redundant_nul_term_p (const char *p, unsigned len, unsigned unit) +{ + gcc_assert(len >= unit); + return !memcmp (p + len, p + len - unit, unit); +} + /* Output assembler code for constant EXP, with no label. This includes the pseudo-op such as ".int" or ".byte", and a newline. Assumes output_addressed_constants has been done on EXP already. @@ -4812,7 +4850,7 @@ output_constructor (tree, unsigned HOST_WIDE_INT, static unsigned HOST_WIDE_INT output_constant (tree exp, unsigned HOST_WIDE_INT size, unsigned int align, - bool reverse) + bool reverse, bool merge_strings /* = false */) { enum tree_code code; unsigned HOST_WIDE_INT thissize; @@ -4940,8 +4978,13 @@ output_constant (tree exp, unsigned HOST_WIDE_INT case CONSTRUCTOR: return output_constructor (exp, size, align, reverse, NULL); case STRING_CST: - thissize - = MIN ((unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp), size); + thissize = (unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp); + if (thissize > size && merge_strings + && !redundant_nul_term_p (TREE_STRING_POINTER (exp), + size, thissize - size)) + ; + else + thissize = MIN (thissize, size); gcc_checking_assert (check_string_literal (exp, size)); assemble_string (TREE_STRING_POINTER (exp), thissize); break; Index: gcc/testsuite/gnat.dg/string_merge1.adb =================================================================== --- gcc/testsuite/gnat.dg/string_merge1.adb (revision 0) +++ gcc/testsuite/gnat.dg/string_merge1.adb (working copy) @@ -0,0 +1,19 @@ +-- { dg-do compile } +-- { dg-options "-O1 -fmerge-all-constants" } + +procedure String_Merge1 is + procedure Process (X : String); + pragma Import (Ada, Process); +begin + Process ("ABCD"); +end; + +-- We expect something like: + +-- .section .rodata.str1.1,"aMS",@progbits,1 +-- .LC1: +-- .string "ABCD" + +-- { dg-final { scan-assembler-times "\\.rodata\\.str" 1 } } +-- { dg-final { scan-assembler-times "\\.string" 1 } } +-- { dg-final { scan-assembler-times "\"ABCD\"" 1 } } Index: gcc/testsuite/gnat.dg/string_merge2.adb =================================================================== --- gcc/testsuite/gnat.dg/string_merge2.adb (revision 0) +++ gcc/testsuite/gnat.dg/string_merge2.adb (working copy) @@ -0,0 +1,19 @@ +-- { dg-do compile } +-- { dg-options "-O1 -fmerge-all-constants" } + +procedure String_Merge2 is + procedure Process (X : String); + pragma Import (Ada, Process); +begin + Process ("ABCD" & Ascii.NUL); +end; + +-- We expect something like: + +-- .section .rodata.str1.1,"aMS",@progbits,1 +-- .LC1: +-- .string "ABCD" + +-- { dg-final { scan-assembler-times "\\.rodata\\.str" 1 } } +-- { dg-final { scan-assembler-times "\\.string" 1 } } +-- { dg-final { scan-assembler-times "\"ABCD\"" 1 } } Index: gcc/testsuite/gcc.dg/merge-all-constants-1.c =================================================================== --- gcc/testsuite/gcc.dg/merge-all-constants-1.c (revision 263072) +++ gcc/testsuite/gcc.dg/merge-all-constants-1.c (working copy) @@ -1,8 +1,8 @@ /* { dg-do compile } */ /* { dg-options "-w -O2 -fmerge-all-constants" } */ -const char str1[36] = "0123456789abcdefghijklmnopqrstuvwxyz"; +const char str1[36] = "\000123456789abcdefghijklmnopqrstuvwxyz"; const char str2[38] = "0123456789abcdefghijklmnopqrstuvwxyz"; -const char str3[10] = "0123456789abcdefghijklmnopqrstuvwxyz"; +const char str3[10] = "\000123456789abcdefghijklmnopqrstuvwxyz"; -/* { dg-final { scan-assembler-not "\.rodata\.str" } } */ +/* { dg-final { scan-assembler-not "\\.rodata\\.str" } } */ Index: gcc/testsuite/gcc.dg/merge-all-constants-2.c =================================================================== --- gcc/testsuite/gcc.dg/merge-all-constants-2.c (revision 0) +++ gcc/testsuite/gcc.dg/merge-all-constants-2.c (working copy) @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-options "-w -O2 -fmerge-all-constants" } */ + +const char str1[36] = "0123456789abcdefghijklmnopqrstuvwxyz"; +const char str2[37] = "0123456789abcdefghijklmnopqrstuvwxyz"; +const char str3[10] = "0123456789abcdefghijklmnopqrstuvwxyz"; + +/* { dg-final { scan-assembler-not "\\.rodata\[\n\r\]" } } */
2018-07-31 Bernd Edlinger <bernd.edlinger@hotmail.de> * gcc-interface/trans.c (gnat_to_gnu): Make string literal properly NUL terminated. * gcc-interface/utils2.c (expand_sloc): Likewise. diff -pur gcc/ada/gcc-interface/trans.c gcc/ada/gcc-interface/trans.c --- gcc/ada/gcc-interface/trans.c 2018-07-17 10:10:04.000000000 +0200 +++ gcc/ada/gcc-interface/trans.c 2018-07-31 11:16:27.350728886 +0200 @@ -6079,7 +6079,7 @@ gnat_to_gnu (Node_Id gnat_node) where GCC will want to treat it as a C string. */ string[i] = 0; - gnu_result = build_string (length, string); + gnu_result = build_string (length + 1, string); /* Strings in GCC don't normally have types, but we want this to not be converted to the array type. */ diff -pur gcc/ada/gcc-interface/utils2.c gcc/ada/gcc-interface/utils2.c --- gcc/ada/gcc-interface/utils2.c 2017-12-21 07:57:41.000000000 +0100 +++ gcc/ada/gcc-interface/utils2.c 2018-07-31 11:44:01.517117923 +0200 @@ -1844,7 +1844,7 @@ expand_sloc (Node_Id gnat_node, tree *fi } const int len = strlen (str); - *filename = build_string (len, str); + *filename = build_string (len + 1, str); TREE_TYPE (*filename) = build_array_type (char_type_node, build_index_type (size_int (len))); *line = build_int_cst (NULL_TREE, line_number);