Message ID | 20240711092853.3810283-1-stli@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | s390x: Fix segfault in wcsncmp [BZ #31934] | expand |
On 7/11/24 5:28 AM, Stefan Liebler wrote: > The z13/vector-optimized wcsncmp implementation segfaults if n=1 > and there is only one character (equal on both strings) before > the page end. Then it loads and compares one character and misses > to check n again. The following load fails. LGTM. Thank you for fixing this! Please feel free to push since we're currently in bug fixing phase and this has no ABI impact. Reviewed-by: Carlos O'Donell <carlos@redhat.com> > This patch removes the extra load and compare of the first character > and just start with the loop which uses vector-load-to-block-boundary. > This code-path also checks n. > > With this patch both tests are passing: > - the simplified one mentioned in the bugzilla 31934 > - the full one in Florian Weimer's patch: > "manual: Document a GNU extension for strncmp/wcsncmp" > (https://patchwork.sourceware.org/project/glibc/patch/874j9eml6y.fsf@oldenburg.str.redhat.com/): > On s390x-linux-gnu (z16), the new wcsncmp test fails due to bug 31934. > --- > sysdeps/s390/wcsncmp-vx.S | 10 +--------- > 1 file changed, 1 insertion(+), 9 deletions(-) > > diff --git a/sysdeps/s390/wcsncmp-vx.S b/sysdeps/s390/wcsncmp-vx.S > index bf6dfa6bc2..8b081567a2 100644 > --- a/sysdeps/s390/wcsncmp-vx.S > +++ b/sysdeps/s390/wcsncmp-vx.S > @@ -59,14 +59,7 @@ ENTRY(WCSNCMP_Z13) > sllg %r4,%r4,2 /* Convert character-count to byte-count. */ > locgrne %r4,%r1 /* Use max byte-count, if bit 0/1 was one. */ > > - /* Check first character without vector load. */ > - lghi %r5,4 /* current_len = 4 bytes. */ > - /* Check s1/2[0]. */ > - lt %r0,0(%r2) > - l %r1,0(%r3) > - je .Lend_cmp_one_char > - crjne %r0,%r1,.Lend_cmp_one_char OK. Removes the unrolled processing of just 1 character. > - > + lghi %r5,0 /* current_len = 0 bytes. */ OK. Current progress is set to 0 at the start (as expected). > .Lloop: > vlbb %v17,0(%r5,%r3),6 /* Load s2 to block boundary. */ > vlbb %v16,0(%r5,%r2),6 /* Load s1 to block boundary. */ ... and we use vlbb to avoid crossing the block boundary (page boundary). > @@ -167,7 +160,6 @@ ENTRY(WCSNCMP_Z13) > srl %r4,2 /* And convert it to character-index. */ > vlgvf %r0,%v16,0(%r4) /* Load character-values. */ > vlgvf %r1,%v17,0(%r4) > -.Lend_cmp_one_char: OK. Label no longer needed. > cr %r0,%r1 > je .Lend_equal > lghi %r2,1
On 11.07.24 14:57, Carlos O'Donell wrote: > On 7/11/24 5:28 AM, Stefan Liebler wrote: >> The z13/vector-optimized wcsncmp implementation segfaults if n=1 >> and there is only one character (equal on both strings) before >> the page end. Then it loads and compares one character and misses >> to check n again. The following load fails. > > LGTM. Thank you for fixing this! > > Please feel free to push since we're currently in bug fixing phase and this has > no ABI impact. > > Reviewed-by: Carlos O'Donell <carlos@redhat.com> Thanks for reviewing. I've just committed it. Note that the wcsncmp implementation was introduced in glibc 2.23: https://sourceware.org/git/?p=glibc.git;a=commit;h=cee82e70ccb7b2f054cd781b0a603ae244523e72 'S390: Optimize strncmp and wcsncmp.' Can you please advice which release-branches are maintained and to which one I should backport them? To all from glibc 2.39 down to 2.23? Thanks, Stefan > >> This patch removes the extra load and compare of the first character >> and just start with the loop which uses vector-load-to-block-boundary. >> This code-path also checks n. >> >> With this patch both tests are passing: >> - the simplified one mentioned in the bugzilla 31934 >> - the full one in Florian Weimer's patch: >> "manual: Document a GNU extension for strncmp/wcsncmp" >> (https://patchwork.sourceware.org/project/glibc/patch/874j9eml6y.fsf@oldenburg.str.redhat.com/): >> On s390x-linux-gnu (z16), the new wcsncmp test fails due to bug 31934. >> --- >> sysdeps/s390/wcsncmp-vx.S | 10 +--------- >> 1 file changed, 1 insertion(+), 9 deletions(-) >> >> diff --git a/sysdeps/s390/wcsncmp-vx.S b/sysdeps/s390/wcsncmp-vx.S >> index bf6dfa6bc2..8b081567a2 100644 >> --- a/sysdeps/s390/wcsncmp-vx.S >> +++ b/sysdeps/s390/wcsncmp-vx.S >> @@ -59,14 +59,7 @@ ENTRY(WCSNCMP_Z13) >> sllg %r4,%r4,2 /* Convert character-count to byte-count. */ >> locgrne %r4,%r1 /* Use max byte-count, if bit 0/1 was one. */ >> >> - /* Check first character without vector load. */ >> - lghi %r5,4 /* current_len = 4 bytes. */ >> - /* Check s1/2[0]. */ >> - lt %r0,0(%r2) >> - l %r1,0(%r3) >> - je .Lend_cmp_one_char >> - crjne %r0,%r1,.Lend_cmp_one_char > > OK. Removes the unrolled processing of just 1 character. > >> - >> + lghi %r5,0 /* current_len = 0 bytes. */ > > OK. Current progress is set to 0 at the start (as expected). > >> .Lloop: >> vlbb %v17,0(%r5,%r3),6 /* Load s2 to block boundary. */ >> vlbb %v16,0(%r5,%r2),6 /* Load s1 to block boundary. */ > > ... and we use vlbb to avoid crossing the block boundary (page boundary). > >> @@ -167,7 +160,6 @@ ENTRY(WCSNCMP_Z13) >> srl %r4,2 /* And convert it to character-index. */ >> vlgvf %r0,%v16,0(%r4) /* Load character-values. */ >> vlgvf %r1,%v17,0(%r4) >> -.Lend_cmp_one_char: > > OK. Label no longer needed. > >> cr %r0,%r1 >> je .Lend_equal >> lghi %r2,1 >
Am Donnerstag, 11. Juli 2024, 15:17:11 CEST schrieb Stefan Liebler: > On 11.07.24 14:57, Carlos O'Donell wrote: > > On 7/11/24 5:28 AM, Stefan Liebler wrote: > >> The z13/vector-optimized wcsncmp implementation segfaults if n=1 > >> and there is only one character (equal on both strings) before > >> the page end. Then it loads and compares one character and misses > >> to check n again. The following load fails. > > > > LGTM. Thank you for fixing this! > > > > Please feel free to push since we're currently in bug fixing phase and this has > > no ABI impact. > > > > Reviewed-by: Carlos O'Donell <carlos@redhat.com> > > Thanks for reviewing. > I've just committed it. > > Note that the wcsncmp implementation was introduced in glibc 2.23: > https://sourceware.org/git/?p=glibc.git;a=commit;h=cee82e70ccb7b2f054cd781b0a603ae244523e72 > 'S390: Optimize strncmp and wcsncmp.' > > Can you please advice which release-branches are maintained and to which > one I should backport them? > To all from glibc 2.39 down to 2.23? Let's see what Carlos says, but we also have the table on the wiki for this: https://sourceware.org/glibc/wiki/Release I took the liberty of splitting it up into "active" and "EOL'ed" (just now). If anything has been placed wrong there, please feel free to correct. > > Thanks, > Stefan > >
On 7/11/24 9:17 AM, Stefan Liebler wrote: > On 11.07.24 14:57, Carlos O'Donell wrote: >> On 7/11/24 5:28 AM, Stefan Liebler wrote: >>> The z13/vector-optimized wcsncmp implementation segfaults if n=1 >>> and there is only one character (equal on both strings) before >>> the page end. Then it loads and compares one character and misses >>> to check n again. The following load fails. >> >> LGTM. Thank you for fixing this! >> >> Please feel free to push since we're currently in bug fixing phase and this has >> no ABI impact. >> >> Reviewed-by: Carlos O'Donell <carlos@redhat.com> > > Thanks for reviewing. > I've just committed it. > > Note that the wcsncmp implementation was introduced in glibc 2.23: > https://sourceware.org/git/?p=glibc.git;a=commit;h=cee82e70ccb7b2f054cd781b0a603ae244523e72 > 'S390: Optimize strncmp and wcsncmp.' > > Can you please advice which release-branches are maintained and to which > one I should backport them? > To all from glibc 2.39 down to 2.23? The active release branches are here: https://sourceware.org/glibc/wiki/Release/#Current_branches You need only backport from 2.39 down to 2.32. We are actively trying to limit the open branches to those matching downstream distribution efforts. >> >>> This patch removes the extra load and compare of the first character >>> and just start with the loop which uses vector-load-to-block-boundary. >>> This code-path also checks n. >>> >>> With this patch both tests are passing: >>> - the simplified one mentioned in the bugzilla 31934 >>> - the full one in Florian Weimer's patch: >>> "manual: Document a GNU extension for strncmp/wcsncmp" >>> (https://patchwork.sourceware.org/project/glibc/patch/874j9eml6y.fsf@oldenburg.str.redhat.com/): >>> On s390x-linux-gnu (z16), the new wcsncmp test fails due to bug 31934. >>> --- >>> sysdeps/s390/wcsncmp-vx.S | 10 +--------- >>> 1 file changed, 1 insertion(+), 9 deletions(-) >>> >>> diff --git a/sysdeps/s390/wcsncmp-vx.S b/sysdeps/s390/wcsncmp-vx.S >>> index bf6dfa6bc2..8b081567a2 100644 >>> --- a/sysdeps/s390/wcsncmp-vx.S >>> +++ b/sysdeps/s390/wcsncmp-vx.S >>> @@ -59,14 +59,7 @@ ENTRY(WCSNCMP_Z13) >>> sllg %r4,%r4,2 /* Convert character-count to byte-count. */ >>> locgrne %r4,%r1 /* Use max byte-count, if bit 0/1 was one. */ >>> >>> - /* Check first character without vector load. */ >>> - lghi %r5,4 /* current_len = 4 bytes. */ >>> - /* Check s1/2[0]. */ >>> - lt %r0,0(%r2) >>> - l %r1,0(%r3) >>> - je .Lend_cmp_one_char >>> - crjne %r0,%r1,.Lend_cmp_one_char >> >> OK. Removes the unrolled processing of just 1 character. >> >>> - >>> + lghi %r5,0 /* current_len = 0 bytes. */ >> >> OK. Current progress is set to 0 at the start (as expected). >> >>> .Lloop: >>> vlbb %v17,0(%r5,%r3),6 /* Load s2 to block boundary. */ >>> vlbb %v16,0(%r5,%r2),6 /* Load s1 to block boundary. */ >> >> ... and we use vlbb to avoid crossing the block boundary (page boundary). >> >>> @@ -167,7 +160,6 @@ ENTRY(WCSNCMP_Z13) >>> srl %r4,2 /* And convert it to character-index. */ >>> vlgvf %r0,%v16,0(%r4) /* Load character-values. */ >>> vlgvf %r1,%v17,0(%r4) >>> -.Lend_cmp_one_char: >> >> OK. Label no longer needed. >> >>> cr %r0,%r1 >>> je .Lend_equal >>> lghi %r2,1 >> >
On 11.07.24 16:53, Carlos O'Donell wrote: > On 7/11/24 9:17 AM, Stefan Liebler wrote: >> On 11.07.24 14:57, Carlos O'Donell wrote: >>> On 7/11/24 5:28 AM, Stefan Liebler wrote: >>>> The z13/vector-optimized wcsncmp implementation segfaults if n=1 >>>> and there is only one character (equal on both strings) before >>>> the page end. Then it loads and compares one character and misses >>>> to check n again. The following load fails. >>> >>> LGTM. Thank you for fixing this! >>> >>> Please feel free to push since we're currently in bug fixing phase and this has >>> no ABI impact. >>> >>> Reviewed-by: Carlos O'Donell <carlos@redhat.com> >> >> Thanks for reviewing. >> I've just committed it. >> >> Note that the wcsncmp implementation was introduced in glibc 2.23: >> https://sourceware.org/git/?p=glibc.git;a=commit;h=cee82e70ccb7b2f054cd781b0a603ae244523e72 >> 'S390: Optimize strncmp and wcsncmp.' >> >> Can you please advice which release-branches are maintained and to which >> one I should backport them? >> To all from glibc 2.39 down to 2.23? > > The active release branches are here: > https://sourceware.org/glibc/wiki/Release/#Current_branches > > You need only backport from 2.39 down to 2.32. > > We are actively trying to limit the open branches to those matching > downstream distribution efforts. Thanks Andreas, Carlos, then I will cherry-pick it to 2.39 down to 2.32 release-branches. > >>> >>>> This patch removes the extra load and compare of the first character >>>> and just start with the loop which uses vector-load-to-block-boundary. >>>> This code-path also checks n. >>>> >>>> With this patch both tests are passing: >>>> - the simplified one mentioned in the bugzilla 31934 >>>> - the full one in Florian Weimer's patch: >>>> "manual: Document a GNU extension for strncmp/wcsncmp" >>>> (https://patchwork.sourceware.org/project/glibc/patch/874j9eml6y.fsf@oldenburg.str.redhat.com/): >>>> On s390x-linux-gnu (z16), the new wcsncmp test fails due to bug 31934. >>>> --- >>>> sysdeps/s390/wcsncmp-vx.S | 10 +--------- >>>> 1 file changed, 1 insertion(+), 9 deletions(-) >>>> >>>> diff --git a/sysdeps/s390/wcsncmp-vx.S b/sysdeps/s390/wcsncmp-vx.S >>>> index bf6dfa6bc2..8b081567a2 100644 >>>> --- a/sysdeps/s390/wcsncmp-vx.S >>>> +++ b/sysdeps/s390/wcsncmp-vx.S >>>> @@ -59,14 +59,7 @@ ENTRY(WCSNCMP_Z13) >>>> sllg %r4,%r4,2 /* Convert character-count to byte-count. */ >>>> locgrne %r4,%r1 /* Use max byte-count, if bit 0/1 was one. */ >>>> >>>> - /* Check first character without vector load. */ >>>> - lghi %r5,4 /* current_len = 4 bytes. */ >>>> - /* Check s1/2[0]. */ >>>> - lt %r0,0(%r2) >>>> - l %r1,0(%r3) >>>> - je .Lend_cmp_one_char >>>> - crjne %r0,%r1,.Lend_cmp_one_char >>> >>> OK. Removes the unrolled processing of just 1 character. >>> >>>> - >>>> + lghi %r5,0 /* current_len = 0 bytes. */ >>> >>> OK. Current progress is set to 0 at the start (as expected). >>> >>>> .Lloop: >>>> vlbb %v17,0(%r5,%r3),6 /* Load s2 to block boundary. */ >>>> vlbb %v16,0(%r5,%r2),6 /* Load s1 to block boundary. */ >>> >>> ... and we use vlbb to avoid crossing the block boundary (page boundary). >>> >>>> @@ -167,7 +160,6 @@ ENTRY(WCSNCMP_Z13) >>>> srl %r4,2 /* And convert it to character-index. */ >>>> vlgvf %r0,%v16,0(%r4) /* Load character-values. */ >>>> vlgvf %r1,%v17,0(%r4) >>>> -.Lend_cmp_one_char: >>> >>> OK. Label no longer needed. >>> >>>> cr %r0,%r1 >>>> je .Lend_equal >>>> lghi %r2,1 >>> >> >
diff --git a/sysdeps/s390/wcsncmp-vx.S b/sysdeps/s390/wcsncmp-vx.S index bf6dfa6bc2..8b081567a2 100644 --- a/sysdeps/s390/wcsncmp-vx.S +++ b/sysdeps/s390/wcsncmp-vx.S @@ -59,14 +59,7 @@ ENTRY(WCSNCMP_Z13) sllg %r4,%r4,2 /* Convert character-count to byte-count. */ locgrne %r4,%r1 /* Use max byte-count, if bit 0/1 was one. */ - /* Check first character without vector load. */ - lghi %r5,4 /* current_len = 4 bytes. */ - /* Check s1/2[0]. */ - lt %r0,0(%r2) - l %r1,0(%r3) - je .Lend_cmp_one_char - crjne %r0,%r1,.Lend_cmp_one_char - + lghi %r5,0 /* current_len = 0 bytes. */ .Lloop: vlbb %v17,0(%r5,%r3),6 /* Load s2 to block boundary. */ vlbb %v16,0(%r5,%r2),6 /* Load s1 to block boundary. */ @@ -167,7 +160,6 @@ ENTRY(WCSNCMP_Z13) srl %r4,2 /* And convert it to character-index. */ vlgvf %r0,%v16,0(%r4) /* Load character-values. */ vlgvf %r1,%v17,0(%r4) -.Lend_cmp_one_char: cr %r0,%r1 je .Lend_equal lghi %r2,1