diff mbox

Report dlsym, dlvsym lookup errors using dlerror [BZ #19509]

Message ID 56B9F83A.1000500@redhat.com
State New
Headers show

Commit Message

Florian Weimer Feb. 9, 2016, 2:31 p.m. UTC
I'm not sure if this is the right fix.  I traced back the lack of error
reporting to the original addition of the RTLD_NEXT functionality.  I
don't know why it was omitted from it.

I could not find any callers of _dl_lookup_symbol_x which with skip_map
!= NULL and which are not related to RTLD_NEXT, so this change looks
pretty conservative to me.

Florian

Comments

Martin Sebor Feb. 9, 2016, 9:10 p.m. UTC | #1
On 02/09/2016 07:31 AM, Florian Weimer wrote:
> I'm not sure if this is the right fix.  I traced back the lack of error
> reporting to the original addition of the RTLD_NEXT functionality.  I
> don't know why it was omitted from it.

I happened to come across the corresponding bug in Red Hat Bugzilla
and wanted to chime in on the patch.

POSIX requires dlerror() to return a non-null string after a dynamic
linking error.  After a failed call to dlopen(), dlclose(), or dlsym(),
POSIX says that "More detailed diagnostic information shall be
available through dlerror()."  dlerror() is required to return null
only when no error has occurred since the last call to the function,
or on failure.  (It doesn't matter that RTLD_NEXT isn't fully
specified by POSIX.)

So AFAICS, the patch does the right thing.  I would only suggest to
enhance the test to verify that dlerror() does return a null pointer
when no error has occurred (e.g., the second call to dlerror() with
no other intervening dlxxx() calls should return null).

If there isn't already a test in the test suite that exercises
the interaction of dladdr() and dlerrror() (I couldn't find one)
it would be worth adding one to verify that a failed dladdr()
doesn't affect dlerrror() (it doesn't seem to in my testing).
But that's probably outside of the scope of this bug fix.

Martin
Carlos O'Donell Feb. 9, 2016, 9:37 p.m. UTC | #2
On 02/09/2016 04:10 PM, Martin Sebor wrote:
> On 02/09/2016 07:31 AM, Florian Weimer wrote:
>> I'm not sure if this is the right fix.  I traced back the lack of error
>> reporting to the original addition of the RTLD_NEXT functionality.  I
>> don't know why it was omitted from it.
> 
> I happened to come across the corresponding bug in Red Hat Bugzilla
> and wanted to chime in on the patch.
> 
> POSIX requires dlerror() to return a non-null string after a dynamic
> linking error.  After a failed call to dlopen(), dlclose(), or dlsym(),
> POSIX says that "More detailed diagnostic information shall be
> available through dlerror()."  dlerror() is required to return null
> only when no error has occurred since the last call to the function,
> or on failure.  (It doesn't matter that RTLD_NEXT isn't fully
> specified by POSIX.)

What does "failure" mean? The definition of the API is such that
if no symbol is found (is that an error or not?) then the function
returns NULL. So it might be seen as a normal form of operation
as opposed to real errors like ENOMEM.

A POSIX clarification should probably be made here, but I agree
that from a QoI it is better to return some kind of error because
that's users expect. I also expect that's what other implementations
do (but haven't verified).

Cheers,
Carlos.
Martin Sebor Feb. 9, 2016, 10:19 p.m. UTC | #3
On 02/09/2016 02:37 PM, Carlos O'Donell wrote:
> On 02/09/2016 04:10 PM, Martin Sebor wrote:
>> On 02/09/2016 07:31 AM, Florian Weimer wrote:
>>> I'm not sure if this is the right fix.  I traced back the lack of error
>>> reporting to the original addition of the RTLD_NEXT functionality.  I
>>> don't know why it was omitted from it.
>>
>> I happened to come across the corresponding bug in Red Hat Bugzilla
>> and wanted to chime in on the patch.
>>
>> POSIX requires dlerror() to return a non-null string after a dynamic
>> linking error.  After a failed call to dlopen(), dlclose(), or dlsym(),
>> POSIX says that "More detailed diagnostic information shall be
>> available through dlerror()."  dlerror() is required to return null
>> only when no error has occurred since the last call to the function,
>> or on failure.  (It doesn't matter that RTLD_NEXT isn't fully
>> specified by POSIX.)
>
> What does "failure" mean? The definition of the API is such that
> if no symbol is found (is that an error or not?) then the function
> returns NULL. So it might be seen as a normal form of operation
> as opposed to real errors like ENOMEM.

My reading of the text in dlsym() (for example) is that a successful
completion implies that name refers to either a function or an object.

Otherwise "If handle does not refer to a valid symbol table handle
" or "if the symbol named by name cannot be found in the symbol table
associated with handle" that's a failure.  The text doesn't explicitly
use the word failure but I think that's implied.

Martin

>
> A POSIX clarification should probably be made here, but I agree
> that from a QoI it is better to return some kind of error because
> that's users expect. I also expect that's what other implementations
> do (but haven't verified).
>
> Cheers,
> Carlos.
>
Florian Weimer Feb. 9, 2016, 10:35 p.m. UTC | #4
* Carlos O'Donell:

> A POSIX clarification should probably be made here, but I agree
> that from a QoI it is better to return some kind of error because
> that's users expect. I also expect that's what other implementations
> do (but haven't verified).

It's also what the Solaris examples suggest:

  <http://docs.oracle.com/cd/E19683-01/816-1386/chapter3-26/index.html>

I don't see any reason why RTLD_NEXT should behave differently from
other handles.
Florian Weimer Feb. 9, 2016, 10:37 p.m. UTC | #5
* Martin Sebor:

> So AFAICS, the patch does the right thing.  I would only suggest to
> enhance the test to verify that dlerror() does return a null pointer
> when no error has occurred (e.g., the second call to dlerror() with
> no other intervening dlxxx() calls should return null).

Fair enough, I will put this into the test.

Have you and Carlos looked at more than the test? :)

I'm worried that I'm missing something in the callgraph—as far as I
can tell, the change only affects RTLD_NEXT, but it seems bizarre that
there wasn't any error reporting specifically for that.
Carlos O'Donell Feb. 10, 2016, 7:18 p.m. UTC | #6
On 02/09/2016 05:35 PM, Florian Weimer wrote:
> * Carlos O'Donell:
> 
>> A POSIX clarification should probably be made here, but I agree
>> that from a QoI it is better to return some kind of error because
>> that's users expect. I also expect that's what other implementations
>> do (but haven't verified).
> 
> It's also what the Solaris examples suggest:
> 
>   <http://docs.oracle.com/cd/E19683-01/816-1386/chapter3-26/index.html>
> 
> I don't see any reason why RTLD_NEXT should behave differently from
> other handles.

Agreed, and that's the *biggest* reason to fix this, namely that the
interfaces should behave the same (principle of least surprise).

I am reviewing your change.

I expect you are suggesting this change for 2.24 and not "right now"
for the upcoming release? I fear the change might change something
of consequence and I'd rather the checkin wait until Adhemerval cuts
the branch on the 18th.

Cheers,
Carlos.
Florian Weimer Feb. 10, 2016, 7:26 p.m. UTC | #7
* Carlos O'Donell:

> I expect you are suggesting this change for 2.24 and not "right now"
> for the upcoming release?

Yes, all patches are for after 2.24 branches.  I'm familiar with the
concept of a freeze. :)

> I fear the change might change something of consequence and I'd
> rather the checkin wait until Adhemerval cuts the branch on the
> 18th.

Absolutely.

Florian
Florian Weimer March 7, 2016, 8:42 p.m. UTC | #8
On 02/09/2016 03:31 PM, Florian Weimer wrote:
> I'm not sure if this is the right fix.  I traced back the lack of error
> reporting to the original addition of the RTLD_NEXT functionality.  I
> don't know why it was omitted from it.
> 
> I could not find any callers of _dl_lookup_symbol_x which with skip_map
> != NULL and which are not related to RTLD_NEXT, so this change looks
> pretty conservative to me.

Ping?

Thanks,
Florian
Florian Weimer March 30, 2016, 11:53 a.m. UTC | #9
On 02/09/2016 03:31 PM, Florian Weimer wrote:
> 2016-02-09  Florian Weimer  <fweimer@redhat.com>
> 
> 	[BZ #19509]
> 	* elf/dl-lookup.c (_dl_lookup_symbol_x): Report error even if
> 	skip_map != NULL.
> 	* elf/bug19509.c: New file.
> 	* elf/Makefile (tests): Add bug19509.
> 	(bug19509): Link against libdl.

Ping?

  https://sourceware.org/ml/libc-alpha/2016-02/msg00172.html

Thanks,
Florian
Carlos O'Donell March 30, 2016, 11:50 p.m. UTC | #10
On 03/30/2016 07:53 AM, Florian Weimer wrote:
> On 02/09/2016 03:31 PM, Florian Weimer wrote:
>> 2016-02-09  Florian Weimer  <fweimer@redhat.com>
>>
>> 	[BZ #19509]
>> 	* elf/dl-lookup.c (_dl_lookup_symbol_x): Report error even if
>> 	skip_map != NULL.
>> 	* elf/bug19509.c: New file.
>> 	* elf/Makefile (tests): Add bug19509.
>> 	(bug19509): Link against libdl.
> 
> Ping?
> 
>   https://sourceware.org/ml/libc-alpha/2016-02/msg00172.html

I agree. The only caller with non-NULL skip_map is:

146   else if (handle == RTLD_NEXT)
147     {
148       if (__glibc_unlikely (match == GL(dl_ns)[LM_ID_BASE]._ns_loaded))
149         {
150           if (match == NULL
151               || caller < match->l_map_start
152               || caller >= match->l_map_end)
153             GLRO(dl_signal_error) (0, NULL, NULL, N_("\
154 RTLD_NEXT used in code not dynamically loaded"));
155         }
156 
157       struct link_map *l = match;
158       while (l->l_loader != NULL)
159         l = l->l_loader;
160 
161       result = GLRO(dl_lookup_symbol_x) (name, match, &ref, l->l_local_scope,
162                                          vers, 0, 0, match);
163     }

Whose purpose it is to skip the current map for lookup.

Your patch and testcase look good to me.
diff mbox

Patch

2016-02-09  Florian Weimer  <fweimer@redhat.com>

	[BZ #19509]
	* elf/dl-lookup.c (_dl_lookup_symbol_x): Report error even if
	skip_map != NULL.
	* elf/bug19509.c: New file.
	* elf/Makefile (tests): Add bug19509.
	(bug19509): Link against libdl.

diff --git a/elf/Makefile b/elf/Makefile
index 63a5355..18baa23 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -149,7 +149,8 @@  tests += loadtest restest1 preloadtest loadfail multiload origtest resolvfail \
 	 tst-nodelete) \
 	 tst-initorder tst-initorder2 tst-relsort1 tst-null-argv \
 	 tst-ptrguard1 tst-tlsalign tst-tlsalign-extern tst-nodelete-opened \
-	 tst-nodelete2 tst-audit11 tst-audit12
+	 tst-nodelete2 tst-audit11 tst-audit12 \
+	 bug19509
 #	 reldep9
 ifeq ($(build-hardcoded-path-in-tests),yes)
 tests += tst-dlopen-aout
@@ -1252,3 +1253,5 @@  $(objpfx)tst-prelink-cmp.out: tst-prelink.exp \
 			      $(objpfx)tst-prelink-conflict.out
 	cmp $^ > $@; \
 	$(evaluate-test)
+
+$(objpfx)bug19509: $(libdl)
diff --git a/elf/bug19509.c b/elf/bug19509.c
new file mode 100644
index 0000000..e7ccf88
--- /dev/null
+++ b/elf/bug19509.c
@@ -0,0 +1,106 @@ 
+/* Test error reporting for dlsym, dlvsym failures.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <gnu/lib-names.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+/* Used to disambiguate symbol names.  */
+static int counter;
+
+static void
+test_one (void *handle, const char *name, void *(func) (void *, const char *),
+          const char *suffix)
+{
+  ++counter;
+  char symbol[32];
+  snprintf (symbol, sizeof (symbol), "no_such_symbol_%d", counter);
+  char *expected_message;
+  if (asprintf (&expected_message, ": undefined symbol: %s%s",
+                symbol, suffix) < 0)
+    {
+      printf ("error: asprintf: %m\n");
+      abort ();
+    }
+
+  void *addr = func (handle, symbol);
+  if (addr != NULL)
+    {
+      printf ("error: %s: found symbol \"no_such_symbol\"\n", name);
+      abort ();
+    }
+  const char *message = dlerror ();
+  if (message == NULL)
+    {
+      printf ("error: %s: missing error message\n", name);
+      abort ();
+    }
+  const char *message_without_path = strchrnul (message, ':');
+  if (strcmp (message_without_path, expected_message) != 0)
+    {
+      printf ("error: %s: unexpected error message: %s\n", name, message);
+      abort ();
+    }
+  free (expected_message);
+}
+
+static void
+test_handles (const char *name, void *(func) (void *, const char *),
+              const char *suffix)
+{
+  test_one (RTLD_DEFAULT, name, func, suffix);
+  test_one (RTLD_NEXT, name, func, suffix);
+
+  void *handle = dlopen (LIBC_SO, RTLD_LAZY);
+  if (handle == NULL)
+    {
+      printf ("error: cannot dlopen %s: %s\n", LIBC_SO, dlerror ());
+      abort ();
+    }
+  test_one (handle, name, func, suffix);
+  dlclose (handle);
+}
+
+static void *
+dlvsym_no_such_version (void *handle, const char *name)
+{
+  return dlvsym (handle, name, "NO_SUCH_VERSION");
+}
+
+static void *
+dlvsym_glibc_private (void *handle, const char *name)
+{
+  return dlvsym (handle, name, "GLIBC_PRIVATE");
+}
+
+static int
+do_test (void)
+{
+  test_handles ("dlsym", dlsym, "");
+  test_handles ("dlvsym", dlvsym_no_such_version,
+                ", version NO_SUCH_VERSION");
+  test_handles ("dlvsym", dlvsym_glibc_private,
+                ", version GLIBC_PRIVATE");
+
+  return 0;
+}
+
+
+#define TEST_FUNCTION do_test ()
+#include "../test-skeleton.c"
diff --git a/elf/dl-lookup.c b/elf/dl-lookup.c
index f577759..6d299c1 100644
--- a/elf/dl-lookup.c
+++ b/elf/dl-lookup.c
@@ -858,7 +858,6 @@  _dl_lookup_symbol_x (const char *undef_name, struct link_map *undef_map,
   if (__glibc_unlikely (current_value.s == NULL))
     {
       if ((*ref == NULL || ELFW(ST_BIND) ((*ref)->st_info) != STB_WEAK)
-	  && skip_map == NULL
 	  && !(GLRO(dl_debug_mask) & DL_DEBUG_UNUSED))
 	{
 	  /* We could find no value for a strong reference.  */
-- 
2.4.3