Message ID | 20240122193150.2731985-1-xry111@xry111.site |
---|---|
State | New |
Headers | show |
Series | qsort: Fix a typo causing unnecessary malloc/free | expand |
On Mon, Jan 22, 2024 at 11:32 AM Xi Ruoyao <xry111@xry111.site> wrote: > > In qsort_r we allocate a buffer sized QSORT_STACK_SIZE (1024) on stack > and we intend to use it if all elements can fit into it. But there is a > typo: > > if (total_size < sizeof buf) > buf = tmp; > else > /* allocate a buffer on heap and use it ... */ > > Here "buf" is a pointer, thus sizeof buf is just 4 or 8, instead of > 1024. This sounds like a real bug. A glibc bug report is needed to track it. > This bug is detected debugging some strange heap corruption running the > Ruby-3.3.0 test suite (on an experimental Linux From Scratch build using > Binutils-2.41.90 and Glibc trunk, and also Fedora Rawhide [1]). It > seems Ruby is doing some wild "optimization" by jumping into somewhere > in qsort_r instead of calling it normally, resulting in a double free of > buf if we allocate it on heap. The issue can be reproduced > deterministically with: > > LD_PRELOAD=/usr/lib/libc_malloc_debug.so MALLOC_CHECK_=3 \ > LD_LIBRARY_PATH=. ./ruby test/runner.rb test/ruby/test_enum.rb > > in Ruby-3.3.0 tree after building it. This change would hide the issue > for Ruby, but Ruby is likely still buggy (if using this "optimization" > sorting larger arrays). > > [1]:https://kojipkgs.fedoraproject.org//work/tasks/9729/111889729/build.log > > Signed-off-by: Xi Ruoyao <xry111@xry111.site> > --- > stdlib/qsort.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/stdlib/qsort.c b/stdlib/qsort.c > index 7f5a00fb33..1c1505e7b2 100644 > --- a/stdlib/qsort.c > +++ b/stdlib/qsort.c > @@ -353,7 +353,7 @@ __qsort_r (void *const pbase, size_t total_elems, size_t size, > if (size > INDIRECT_SORT_SIZE_THRES) > total_size = 2 * total_elems * sizeof (void *) + size; > > - if (total_size < sizeof buf) > + if (total_size < sizeof tmp) > buf = tmp; > else > { > -- > 2.43.0 >
On Mon, 2024-01-22 at 11:46 -0800, H.J. Lu wrote: > On Mon, Jan 22, 2024 at 11:32 AM Xi Ruoyao <xry111@xry111.site> wrote: > > > > In qsort_r we allocate a buffer sized QSORT_STACK_SIZE (1024) on stack > > and we intend to use it if all elements can fit into it. But there is a > > typo: > > > > if (total_size < sizeof buf) > > buf = tmp; > > else > > /* allocate a buffer on heap and use it ... */ > > > > Here "buf" is a pointer, thus sizeof buf is just 4 or 8, instead of > > 1024. > > This sounds like a real bug. A glibc bug report is needed to track it. https://sourceware.org/bugzilla/show_bug.cgi?id=31276
On Tue, 2024-01-23 at 03:54 +0800, Xi Ruoyao wrote: > On Mon, 2024-01-22 at 11:46 -0800, H.J. Lu wrote: > > On Mon, Jan 22, 2024 at 11:32 AM Xi Ruoyao <xry111@xry111.site> wrote: > > > > > > In qsort_r we allocate a buffer sized QSORT_STACK_SIZE (1024) on stack > > > and we intend to use it if all elements can fit into it. But there is a > > > typo: > > > > > > if (total_size < sizeof buf) Hmm, and it seems this should be "<=" instead of "<". I'm preparing a V2 with the operator changed too, and the BZ number in commit message... > > > buf = tmp; > > > else > > > /* allocate a buffer on heap and use it ... */ > > > > > > Here "buf" is a pointer, thus sizeof buf is just 4 or 8, instead of > > > 1024. > > > > This sounds like a real bug. A glibc bug report is needed to track it. > > https://sourceware.org/bugzilla/show_bug.cgi?id=31276 >
* Xi Ruoyao: > This bug is detected debugging some strange heap corruption running the > Ruby-3.3.0 test suite (on an experimental Linux From Scratch build using > Binutils-2.41.90 and Glibc trunk, and also Fedora Rawhide [1]). It > seems Ruby is doing some wild "optimization" by jumping into somewhere > in qsort_r instead of calling it normally, resulting in a double free of > buf if we allocate it on heap. The issue can be reproduced > deterministically with: > > LD_PRELOAD=/usr/lib/libc_malloc_debug.so MALLOC_CHECK_=3 \ > LD_LIBRARY_PATH=. ./ruby test/runner.rb test/ruby/test_enum.rb > > in Ruby-3.3.0 tree after building it. This change would hide the issue > for Ruby, but Ruby is likely still buggy (if using this "optimization" > sorting larger arrays). > > [1]:https://kojipkgs.fedoraproject.org//work/tasks/9729/111889729/build.log Sorry, missed that part. I suspect it's because the Ruby garbage collector updates pointers in the array during invocation of the comparison function, and we later discard those updates when we copy the sorted runs back into the application-supplied array. Ruby probably scans the stack conservatively, triggering object pinning, which is probably the reason why we don't see the issue with the on-stack copy. The issue is reproducible with the old (glibc 2.37) qsort_r if the Ruby test case uses a suitable array size. We've posted our findings to the Ruby bug: <https://bugs.ruby-lang.org/issues/20203> Thanks, Florian
On Wed, 2024-01-24 at 14:45 +0100, Florian Weimer wrote: > * Xi Ruoyao: > > > This bug is detected debugging some strange heap corruption running the > > Ruby-3.3.0 test suite (on an experimental Linux From Scratch build using > > Binutils-2.41.90 and Glibc trunk, and also Fedora Rawhide [1]). It > > seems Ruby is doing some wild "optimization" by jumping into somewhere > > in qsort_r instead of calling it normally, resulting in a double free of > > buf if we allocate it on heap. The issue can be reproduced > > deterministically with: > > > > LD_PRELOAD=/usr/lib/libc_malloc_debug.so MALLOC_CHECK_=3 \ > > LD_LIBRARY_PATH=. ./ruby test/runner.rb test/ruby/test_enum.rb > > > > in Ruby-3.3.0 tree after building it. This change would hide the issue > > for Ruby, but Ruby is likely still buggy (if using this "optimization" > > sorting larger arrays). > > > > [1]:https://kojipkgs.fedoraproject.org//work/tasks/9729/111889729/build.log > > Sorry, missed that part. I suspect it's because the Ruby garbage > collector updates pointers in the array during invocation of the > comparison function, and we later discard those updates when we copy the > sorted runs back into the application-supplied array. Ruby probably > scans the stack conservatively, triggering object pinning, which is > probably the reason why we don't see the issue with the on-stack copy. > > The issue is reproducible with the old (glibc 2.37) qsort_r if the Ruby > test case uses a suitable array size. > > We've posted our findings to the Ruby bug: > > <https://bugs.ruby-lang.org/issues/20203> Ooooops. We've disabled the use of qsort_r for Ruby in BLFS for now: https://wiki.linuxfromscratch.org/blfs/changeset/77b455955c17e22d6aa544c766b30850978960ab. I guess their own sort should work with their own GC anyway...
diff --git a/stdlib/qsort.c b/stdlib/qsort.c index 7f5a00fb33..1c1505e7b2 100644 --- a/stdlib/qsort.c +++ b/stdlib/qsort.c @@ -353,7 +353,7 @@ __qsort_r (void *const pbase, size_t total_elems, size_t size, if (size > INDIRECT_SORT_SIZE_THRES) total_size = 2 * total_elems * sizeof (void *) + size; - if (total_size < sizeof buf) + if (total_size < sizeof tmp) buf = tmp; else {
In qsort_r we allocate a buffer sized QSORT_STACK_SIZE (1024) on stack and we intend to use it if all elements can fit into it. But there is a typo: if (total_size < sizeof buf) buf = tmp; else /* allocate a buffer on heap and use it ... */ Here "buf" is a pointer, thus sizeof buf is just 4 or 8, instead of 1024. This bug is detected debugging some strange heap corruption running the Ruby-3.3.0 test suite (on an experimental Linux From Scratch build using Binutils-2.41.90 and Glibc trunk, and also Fedora Rawhide [1]). It seems Ruby is doing some wild "optimization" by jumping into somewhere in qsort_r instead of calling it normally, resulting in a double free of buf if we allocate it on heap. The issue can be reproduced deterministically with: LD_PRELOAD=/usr/lib/libc_malloc_debug.so MALLOC_CHECK_=3 \ LD_LIBRARY_PATH=. ./ruby test/runner.rb test/ruby/test_enum.rb in Ruby-3.3.0 tree after building it. This change would hide the issue for Ruby, but Ruby is likely still buggy (if using this "optimization" sorting larger arrays). [1]:https://kojipkgs.fedoraproject.org//work/tasks/9729/111889729/build.log Signed-off-by: Xi Ruoyao <xry111@xry111.site> --- stdlib/qsort.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)