Message ID: 1437033625-13561-1-git-send-email-siddhesh@redhat.com
State: New
On 16 Jul 2015 13:30, Siddhesh Poyarekar wrote:
> + if (__glibc_unlikely (ph->p_offset + ph->p_filesz > st.st_size))
p_offset and p_filesz are 64bit w/64bit ELFs, so seems like this too could
overflow wrt the 64bit st_size. is there some other check earlier on that
would catch that ?
-mike
On 07/16/2015 04:00 AM, Siddhesh Poyarekar wrote:
> Some valid ELF objects, like .debug files may refer to sections
> outside themselves since they're loaded and patched up to their parent
> ELF. Calling ldd on them may result in a segfault since it may try to
> read beyond the end of the mapping. Red Hat bz:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=741105
>
> has an example, although I haven't been able to find a sample
> reproducer file immediately. This patch has been carried in Fedora
> and RHEL for a couple of years now. Also tested on x86_64 to verify
> that there are no regressions.
>
> [BZ #18685]
> * sysdeps/generic/dl-fileid.h (_dl_get_file_id): Accept
> pointer to a struct stat64.
> * sysdeps/posix/dl-fileid.h (_dl_get_file_id): Likewise.
> * elf/dl-load.c (_dl_map_object_from_fd): Avoid mapping past
> end of ELF file.

This is not the right fix for this problem. The right fix has not been attempted because it involves someone doing some real leg work to gather consensus. This fix adds complex checking in ld.so for minimal gain, and eventually you'll get a debuginfo file that is different again in some odd way.

The fix is to allocate a new ET_DEBUG_EXEC and ET_DEBUG_DYN and mark the debuginfo with those flags. That way ldd knows that the debuginfo is going to be incomplete and can take action, e.g. print a message saying "This is debuginfo for a library" without processing any of the sections. Then you update binutils to add the ET_* flags to debuginfo objects.

It can't be a generic ELF header bit because most arches have used them all up. It can't be a .note because that is allocated and may reorganize the debuginfo in ways which mean it can't easily be layered on top of the original executable from which it was stripped. Not to mention .notes are too hard to find for some tools. The best option is a new set of ET_* flags.
If you can't get those values standardized, and yes we'd have to go to the generic ELF ABI people and ask for them, then we could use ET_GNU_* and use the OS-specific value range.

Cheers,
Carlos.
Perhaps ldd should use a specially compiled ld.so that has a lot of extra checks added (so that it can be run on arbitrary objects without creating security hazards). Andreas.
On 07/16/2015 12:36 PM, Andreas Schwab wrote:
> Perhaps ldd should use a specially compiled ld.so that has a lot of
> extra checks added (so that it can be run on arbitrary objects without
> creating security hazards).

ldd should be a distinct tool that uses libelf, and provides deeper introspection and options. This way we have a second implementation of the loader rules in a cleaner and more concise form that we can use to double-check assertions about load order and cycle breakage, and cross-check ld.so changes.

I have a half-written eu-ldd that I've started. I guess I should contribute that to elfutils soon.

c.
I concur with Carlos's NAK on this change. The desirability of the proposed alternatives is not clear to me offhand. Those should be discussed in separate threads.
On 16 Jul 2015 13:17, Carlos O'Donell wrote:
> On 07/16/2015 12:36 PM, Andreas Schwab wrote:
> > Perhaps ldd should use a specially compiled ld.so that has a lot of
> > extra checks added (so that it can be run on arbitrary objects without
> > creating security hazards).
>
> ldd should be a distinct tool that uses libelf, and provides deeper
> introspection and options. This way we have a second implementation
> of the loader rules in a cleaner and concise form that we can use
> to double-check assertions about load order and cycle breakage,
> and cross-check ld.so changes.

i agree in having a dedicated/robust tool, but i don't think it means we should disable the trace option in the ldso itself. having a way to get the actual details out of the ldso i think is still valuable.

i also don't think glibc should be dependent upon the elfutils package ...

-mike
On 07/16/2015 10:08 PM, Mike Frysinger wrote:
> On 16 Jul 2015 13:17, Carlos O'Donell wrote:
>> On 07/16/2015 12:36 PM, Andreas Schwab wrote:
>>> Perhaps ldd should use a specially compiled ld.so that has a lot of
>>> extra checks added (so that it can be run on arbitrary objects without
>>> creating security hazards).
>>
>> ldd should be a distinct tool that uses libelf, and provides deeper
>> introspection and options. This way we have a second implementation
>> of the loader rules in a cleaner and concise form that we can use
>> to double-check assertions about load order and cycle breakage,
>> and cross-check ld.so changes.
>
> i agree in having a dedicated/robust tool, but i don't think it means we should
> disable the trace option in the ldso itself. having a way to get the actual
> details out of the ldso i think is still valuable.

Agreed. We need trace to do comparison between eu-ldd and ld.so. We need trace for developers and for runtime dumps which include looking at dlopen/dlmopen calls. We need more trace data to look at namespaces for dlmopen.

Making this trace robust against corrupt binaries is a questionable endeavor though?

> i also don't think glibc should be dependent upon the elfutils package ...

It would be, but only for testing, not building, and in a bootstrap without eu-ldd you'd just have UNSUPPORTED tests, like if you didn't have libstdc++-static available for all the C++ tests.

c.
On 16 Jul 2015 22:45, Carlos O'Donell wrote:
> On 07/16/2015 10:08 PM, Mike Frysinger wrote:
> > On 16 Jul 2015 13:17, Carlos O'Donell wrote:
> >> On 07/16/2015 12:36 PM, Andreas Schwab wrote:
> >>> Perhaps ldd should use a specially compiled ld.so that has a lot of
> >>> extra checks added (so that it can be run on arbitrary objects without
> >>> creating security hazards).
> >>
> >> ldd should be a distinct tool that uses libelf, and provides deeper
> >> introspection and options. This way we have a second implementation
> >> of the loader rules in a cleaner and concise form that we can use
> >> to double-check assertions about load order and cycle breakage,
> >> and cross-check ld.so changes.
> >
> > i agree in having a dedicated/robust tool, but i don't think it means we should
> > disable the trace option in the ldso itself. having a way to get the actual
> > details out of the ldso i think is still valuable.
>
> Agreed. We need trace to do comparison between eu-ldd and ld.so.
> We need trace for developers and for runtime dumps which include
> looking at dlopen/dlmopen calls. We need more trace data to look
> at namespaces for dlmopen.
>
> Making this trace robust against corrupt binaries is a questionable
> endeavor though?

i think we need to clarify the goals before we dive into technical details. how about:

    running `ldd <foo>` should be safe (i.e. not crash or be a security hazard)

if we can all agree on that, then a lot of things shake out from there.

> > i also don't think glibc should be dependent upon the elfutils package ...
>
> It would be, but only for testing, not building, and in a bootstrap
> without eu-ldd you'd just have UNSUPPORTED tests, like if you didn't
> have libstdc++-static available for all the C++ tests.

if it's not installed, then i have no complaints :). random/excess deps for debug/dev tools are fine.

-mike
On Thu, Jul 16, 2015 at 1:00 AM, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> Some valid ELF objects, like .debug files may refer to sections
> outside themselves since they're loaded and patched up to their parent
> ELF. Calling ldd on them may result in a segfault since it may try to
> read beyond the end of the mapping. Red Hat bz:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=741105
>
> has an example, although I haven't been able to find a sample
> reproducer file immediately. This patch has been carried in Fedora
> and RHEL for a couple of years now. Also tested on x86_64 to verify
> that there are no regressions.
>
> [BZ #18685]
> * sysdeps/generic/dl-fileid.h (_dl_get_file_id): Accept
> pointer to a struct stat64.
> * sysdeps/posix/dl-fileid.h (_dl_get_file_id): Likewise.
> * elf/dl-load.c (_dl_map_object_from_fd): Avoid mapping past
> end of ELF file.

If I understand it correctly, ld.so crashes on an ELF file with segment headers where the file offset is larger than the file itself. Isn't it the same as a corrupted ELF file with segment headers where the file offset is larger than the file itself? I can create such a file with a binary editor. How should ld.so deal with such ELF files?

H.J.
On 07/16/2015 10:55 PM, Mike Frysinger wrote:
> On 16 Jul 2015 22:45, Carlos O'Donell wrote:
>> On 07/16/2015 10:08 PM, Mike Frysinger wrote:
>>> On 16 Jul 2015 13:17, Carlos O'Donell wrote:
>>>> On 07/16/2015 12:36 PM, Andreas Schwab wrote:
>>>>> Perhaps ldd should use a specially compiled ld.so that has a lot of
>>>>> extra checks added (so that it can be run on arbitrary objects without
>>>>> creating security hazards).
>>>>
>>>> ldd should be a distinct tool that uses libelf, and provides deeper
>>>> introspection and options. This way we have a second implementation
>>>> of the loader rules in a cleaner and concise form that we can use
>>>> to double-check assertions about load order and cycle breakage,
>>>> and cross-check ld.so changes.
>>>
>>> i agree in having a dedicated/robust tool, but i don't think it means we should
>>> disable the trace option in the ldso itself. having a way to get the actual
>>> details out of the ldso i think is still valuable.
>>
>> Agreed. We need trace to do comparison between eu-ldd and ld.so.
>> We need trace for developers and for runtime dumps which include
>> looking at dlopen/dlmopen calls. We need more trace data to look
>> at namespaces for dlmopen.
>>
>> Making this trace robust against corrupt binaries is a questionable
>> endeavor though?
>
> i think we need to clarify the goals before we dive into technical details.
> how about:
>     running `ldd <foo>` should be safe (i.e. not crash or be a security hazard)
>
> if we can all agree on that, then a lot of things shake out from there.

Running `ldd <foo>` should be safe and free of security hazards at a distro level, but that `ldd` *should* be eu-ldd and based on a hardened and fuzzed libelf. It should leverage the work we are doing to harden libelf. Hardening the real loader to serve as an alternate tool is against the "do one thing well" philosophy.

We will always have conflicting goals if we want ld.so to be both a hardened introspection tool and a fast, performant loader.

Running the result of `elf/ldd.bash.in` (invoking the real loader) should run the real loader in trace mode. The real loader should rarely compromise to being slower because it's in trace mode. It should IMO not unduly compromise performance for hardening against problems which are the result of other failures in the system, e.g. storage, memory, kernel, etc. Where possible some hardening is good, given clearly stated rationale for the change, e.g. a real security threat.

In this particular case, hardening against stripped debug data is the wrong solution. The debug data must be clearly marked as such, since it's actually not real DSOs or executables; we've cheated by using objcopy to get something that looks like a valid ELF file but isn't anymore. We need to stop being sloppy about how we label that kind of object.

Is that clear enough? I want to avoid a blanket statement. In this case I think the hardening rationale, that of supporting debug files, is not worth the change in the code.

Cheers,
Carlos.
On Thu, Jul 16, 2015 at 11:59:18AM -0400, Carlos O'Donell wrote:
> This is not the right fix for this problem. The right fix has not
> been attempted because it involves someone doing some real leg work
> to gather consensus. This fix adds complex checking in ld.so for
> minimal gain, and eventually you'll get a debuginfo file that is
> different again in some odd way.

This is not specifically about being able to read debug files, nor is it about ldd. It just happens to be ldd (ld.so [--verify|--list]) that crashed, but the offending code is bang in the middle of generic ld.so code that can potentially be exploited when running arbitrary binaries. While it is true that one should not run arbitrary code anyway, it shouldn't be an excuse for not fixing bugs. I don't see the point of not adding such checks as they come up; performance is an excuse made quite regularly, but what is the actual cost of such checks?

To be clear, I am not against having an eu-ldd, but that shouldn't be an excuse for not patching ld.so. Things that don't crash on eu-ldd should not crash on ld.so.

Oh, and did I mention that eu-ldd (and most of elfutils) should ideally be written in an interpreted language (cough*python*cough) so that we reduce the attack surface on them?

Siddhesh
On Thu, Jul 16, 2015 at 08:15:17PM -0700, H.J. Lu wrote:
> If I understand it correctly, ld.so crashes on an ELF file with segment
> headers where file offset is larger than the file itself. Isn't it the
> same as a corrupted ELF file with segment headers where file offset is
> larger than the file itself? I can create such a file with a binary editor.

Yes, which is the point of including such fixes IMO.

Siddhesh
On Thu, Jul 16, 2015 at 8:33 PM, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> On Thu, Jul 16, 2015 at 08:15:17PM -0700, H.J. Lu wrote:
>> If I understand it correctly, ld.so crashes on an ELF file with segment
>> headers where file offset is larger than the file itself. Isn't it the
>> same as a corrupted ELF file with segment headers where file offset is
>> larger than the file itself? I can create such a file with a binary editor.
>
> Yes, which is the point of including such fixes IMO.

You can create such a testcase with a simple program which updates the segment header in a good ELF file.
On 07/16/2015 11:28 PM, Siddhesh Poyarekar wrote:
> On Thu, Jul 16, 2015 at 11:59:18AM -0400, Carlos O'Donell wrote:
>> This is not the right fix for this problem. The right fix has not
>> been attempted because it involves someone doing some real leg work
>> to gather consensus. This fix adds complex checking in ld.so for
>> minimal gain, and eventually you'll get a debuginfo file that is
>> different again in some odd way.
>
> This is not specifically about being able to read debug files, nor is
> it about ldd. It just happens to be ldd (ld.so [--verify|--list])
> that crashed, but the offending code is bang in the middle of generic
> ld.so code that can potentially be exploited when running arbitrary
> binaries. While it is true that one should not run arbitrary code
> anyway, it shouldn't be an excuse for not fixing bugs. I don't see
> the point of not adding such checks as they come up; performance is an
> excuse made quite regularly, but what is the actual cost of such
> checks?

Is this really a bug?

The cost of the checks is more than just performance, but also maintenance and support of that code into the future. It is about testing that code, making sure we exercise those weird conditional failure paths which we thought would be a good thing to add. Each of these checks is an additional complication, added branch, and conditional in already complicated code. I would like to see the core loader as simple as possible and to expect as correct as possible input.

If I ask you for a test case for bug 18685, do we have one we can check in to test this code? That goes a long way to helping maintain this code.

If ldd should not be run on untrusted input... then why are we looking at this patch? I think the fact on the ground is that ldd *is* run on untrusted input and eu-ldd isn't ready, and we face the grim specter of security issues.

However, if we're talking security issues, there are a million ways I could craft relocs to overwrite and assemble ROP gadgets to execute a custom program that would run crafty exploits. Running ldd is equivalent to ./app.

See: "DEFCON 20: Programming Weird Machines with ELF Metadata"
https://www.youtube.com/watch?v=YgtxxLCVD-o

So again, we can never support real ld.so running or processing untrusted input? The real fix becomes: Don't use ld.so, don't process relocs, don't run anything, just analyze.

> To be clear, I am not against having an eu-ldd, but that shouldn't be
> an excuse for not patching ld.so. Things that don't crash on eu-ldd
> should not crash on ld.so.

To fix or not to fix ld.so should be decided on the merit of the individual change.

I disagree that eu-ldd and ld.so should both be equally robust against bad input. Just consider the myriad of problems the binary could have, the fuzzing we put libelf through; all of that is costly verification that ld.so should not do.

> Oh, and did I mention that eu-ldd (and most of elfutils) should
> ideally be written in an interpreted language (cough*python*cough) so
> that we reduce the attack surface on them?

How does python reduce the attack surface? I've already started eu-ldd in C because it needs to know quite a bit about the low-level pieces and it was easiest for me to do this in C (headers, constants etc). However, from an "independent implementation" perspective, writing it in python would make it truly orthogonal from the C version.

Any other benefits to python?

Cheers,
Carlos.
On Fri, Jul 17, 2015 at 12:02:43AM -0400, Carlos O'Donell wrote:
> Is this really a bug?

Yes, because it is in the dynamic linker path and is not provably isolated to the ldd use case. It just happens so that it is exposed by ldd, but I don't think it has been definitively proven yet that it is not possible to expose the bug in the dynamic linker proper.

> The cost of the checks is more than just performance, but also maintenance
> and support of that code into the future. It is about testing that code,
> making sure we exercise those weird conditional failure paths which we
> thought would be a good thing to add. Each of these checks is an additional
> complication, added branch, and conditional in already complicated code.
> I would like to see the core loader as simple as possible and to expect
> as correct as possible input.
>
> If I ask you for a test case for bug 18685, do we have one we can check in
> to test this code? That goes a long way to helping maintain this code.

I don't yet; I can explore H. J.'s suggestion of writing out a corrupt binary, but I don't yet have a reproducer that will trigger the crash without --verify. But not checking it in because of this is a weak reason IMO when it is very clear that the bug exists.

> If ldd should not be run on untrusted input... then why are we looking
> at this patch? I think the fact on the ground is that ldd *is* run on

Like I said, because it is not specific to ldd; it just happens to be exposed by ldd.

> untrusted input and eu-ldd isn't ready, and we face the grim specter of
> security issues. However, if we're talking security issues, there are
> a million ways I could craft relocs to overwrite and assemble ROP
> gadgets to execute a custom program that would run crafty exploits.

Then we add those million checks as we find them, if they affect ld.so proper, i.e. not only the --verify path. Although I suspect the number will be much less than a million :)

> Running ldd is equivalent to ./app.

Not really, but it is not related to the argument I'm making, so I'll let it go.

> See: "DEFCON 20: Programming Weird Machines with ELF Metadata"
> https://www.youtube.com/watch?v=YgtxxLCVD-o
>
> So again, we can never support real ld.so running or processing untrusted
> input? The real fix becomes: Don't use ld.so, don't process relocs,
> don't run anything, just analyze.

If that is what we want to do, then we should remove ld.so --verify and --list immediately and encourage use of other tools. That is still not related to this fix, which is directly in the ld.so path.

> I disagree that eu-ldd and ld.so should both be equally robust against
> bad input. Just consider the myriad of problems the binary could have,
> the fuzzing we put libelf through, all of that is costly verification
> that ld.so should not do.

We don't know the cost of verification yet. We fear that it will be too much, but we have no idea how much that 'too much' is. If it is a 5% penalty on load time, IMO it is fine for the hardening it provides. Maybe we could provide a build or runtime flag to bypass those checks, but that is an added complication that is not worth the 5% IMO.

> How does python reduce the attack surface? I've already started eu-ldd
> in C because it needs to know quite a bit about the low-level pieces
> and it was easiest for me to do this in C (headers, constants etc).
> However, from an "independent implementation" perspective, writing it
> in python would make it truly orthogonal from the C version.
>
> Any other benefits to python?

The biggest benefit is what happens in case of bugs in the program. If the C bits are limited to just the core elfutils library, the worst bug that the tools can have is a python error due to a missing validation, which will just result in a crash. With C, the worst error is a buffer overrun or an arbitrary pointer dereference, which may be exploitable. So what is needed is a python wrapper around libelfutils, which eu-ldd can then use.

I remember someone talking about such a project, but I don't know if anyone actually got around to doing it.

Siddhesh
On 07/16/2015 10:00 AM, Siddhesh Poyarekar wrote:
> Some valid ELF objects, like .debug files may refer to sections
> outside themselves since they're loaded and patched up to their parent
> ELF. Calling ldd on them may result in a segfault since it may try to
> read beyond the end of the mapping. Red Hat bz:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=741105
>
> has an example, although I haven't been able to find a sample
> reproducer file immediately. This patch has been carried in Fedora
> and RHEL for a couple of years now. Also tested on x86_64 to verify
> that there are no regressions.

Can you show the ldd output with this patch applied on some files which crashed before? Is it useful at all?

I know that debuginfo files usually have garbage interpreter fields, so I really doubt that ldd can produce anything useful on debuginfo files.

    + errstring = N_("ELF load command past end of file");

The expectation appears to be that end users see this error message. It needs to better reflect what's going on.

Is there a way that an ELF file is corrupted in this way, but can still be loaded correctly? I wonder if this change will break currently working programs. I do think we need to support loading corrupted ELF files indefinitely if there ever was a binutils version producing them.
On Fri, Jul 17, 2015 at 07:28:45AM +0200, Florian Weimer wrote:
> Can you show the ldd output with this patch applied on some files which
> crashed before? Is it useful at all?

I'll have to build one since I can't find any from a brief look through files I have on my system. The examples were mostly the .debug files IIRC, so that has more to do with not crashing than actually providing any useful information.

> I know that debuginfo files usually have garbage interpreter fields, so
> I really doubt that ldd can produce anything useful on debuginfo files.
>
>     + errstring = N_("ELF load command past end of file");
>
> The expectation appears to be that end users see this error message. It
> needs to better reflect what's going on.

OK, that makes sense.

> Is there a way that an ELF file is corrupted in this way, but can still
> be loaded correctly? I wonder if this change will break currently
> working programs. I do think we need to support loading corrupted ELF
> files indefinitely if there ever was a binutils version producing them.

I don't know of any valid ELF programs that broke due to this. The patch (AFAICT) has been in Fedora for about 3 years, so if something had to break, it should have been known by now.

Siddhesh
On Fri, Jul 17, 2015 at 07:28:45AM +0200, Florian Weimer wrote:
> On 07/16/2015 10:00 AM, Siddhesh Poyarekar wrote:
> > Some valid ELF objects, like .debug files may refer to sections
> > outside themselves since they're loaded and patched up to their parent
> > ELF. Calling ldd on them may result in a segfault since it may try to
> > read beyond the end of the mapping. Red Hat bz:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=741105
> >
> > has an example, although I haven't been able to find a sample
> > reproducer file immediately. This patch has been carried in Fedora
> > and RHEL for a couple of years now. Also tested on x86_64 to verify
> > that there are no regressions.
>
> Can you show the ldd output with this patch applied on some files which
> crashed before? Is it useful at all?
>
> I know that debuginfo files usually have garbage interpreter fields, so
> I really doubt that ldd can produce anything useful on debuginfo files.
>
>     + errstring = N_("ELF load command past end of file");
>
> The expectation appears to be that end users see this error message. It
> needs to better reflect what's going on.
>
> Is there a way that an ELF file is corrupted in this way, but can still
> be loaded correctly? I wonder if this change will break currently
> working programs. I do think we need to support loading corrupted ELF
> files indefinitely if there ever was a binutils version producing them.

Also, could you explain how this is a security risk? If that needs modification of the ELF, then an attacker could run arbitrary commands by overriding main, for example.
On 07/17/2015 09:35 AM, Ondřej Bílka wrote:
> Also, could you explain how this is a security risk? If that needs
> modification of the ELF, then an attacker could run arbitrary commands
> by overriding main, for example.

Just because I comment on a thread doesn't mean I consider it security-related. Or what do you mean?
On Fri, Jul 17, 2015 at 09:43:29AM +0200, Florian Weimer wrote:
> On 07/17/2015 09:35 AM, Ondřej Bílka wrote:
> > Also, could you explain how this is a security risk? If that needs
> > modification of the ELF, then an attacker could run arbitrary commands
> > by overriding main, for example.
>
> Just because I comment on a thread doesn't mean I consider it
> security-related.
>
> Or what do you mean?

Well, here I got a bit distracted: I was reading the thread, started a reply, and then attached what I wanted to ask to the wrong post. I wanted to reply instead to what Carlos wrote:

"Running the result of `elf/ldd.bash.in` (invoking the real loader) should run the real loader in trace mode. The real loader should rarely compromise to being slower because it's in trace mode. It should IMO not unduly compromise performance for hardening against problems which are the result of other failures in the system e.g. storage, memory, kernel, etc. Where possible some hardening is good given clearly stated rationale for the change e.g. a real security threat. In this particular case, hardening against stripped debug data, is the wrong solution. The debug data must be clearly marked as such since it's actually not real DSOs or executables, and we've cheated by using objcopy to get something that looks like a valid ELF file but isn't anymore. We need to stop being sloppy about how we label that kind of object. Is that clear enough?"
On 07/17/2015 08:15 AM, Siddhesh Poyarekar wrote:
> On Fri, Jul 17, 2015 at 07:28:45AM +0200, Florian Weimer wrote:
>> Can you show the ldd output with this patch applied on some files which
>> crashed before? Is it useful at all?
>
> I'll have to build one since I can't find any from a brief look
> through files I have on my system. The examples were mostly the
> .debug files IIRC, so that has more to do with not crashing than
> actually providing any useful information.

Useful information would be that it's a debuginfo file. Showing no output or garbage is hardly more helpful than crashing because the user still doesn't know what's wrong. They won't even report a bug, so we can't help them.

>> Is there a way that an ELF file is corrupted in this way, but can still
>> be loaded correctly? I wonder if this change will break currently
>> working programs. I do think we need to support loading corrupted ELF
>> files indefinitely if there ever was a binutils version producing them.
>
> I don't know of any valid ELF programs that broke due to this. The
> patch (AFAICT) has been in Fedora for about 3 years, so if something
> had to break, it should have been known by now.

Unfortunately, Fedora coverage for non-i386/x86_64 and proprietary legacy applications is quite poor.

In this check,

    + if (__glibc_unlikely (ph->p_offset + ph->p_filesz > st.st_size))

do p_offset and p_filesz correspond to the program header values in the file, or have they already been modified?

Looking at the unmodified program header values (as reported by elfutils and the gelf_getphdr function) in a fairly large set of RPMs, I do not see production ELF files (as opposed to debuginfo files or firmware images) where this constraint is violated (for type 1 headers, that is). The RPMs are from every minor Red Hat Enterprise Linux release going back to version 4 and from recent Fedora releases. It's not a complete set of everything shipped by Red Hat or Fedora by a long shot, but it strongly suggests that we did not have a recent bug introducing systematic corruption.

One more question, regarding the new check: Is there already a check that the addition does not overflow?
On 07/17/2015 10:03 AM, Ondřej Bílka wrote:
> On Fri, Jul 17, 2015 at 09:43:29AM +0200, Florian Weimer wrote:
>> On 07/17/2015 09:35 AM, Ondřej Bílka wrote:
>>> Also, could you explain how this is a security risk? If that needs
>>> modification of the ELF, then an attacker could run arbitrary commands
>>> by overriding main, for example.
>>
>> Just because I comment on a thread doesn't mean I consider it
>> security-related.
>>
>> Or what do you mean?
>
> Well, here I got a bit distracted: I was reading the thread, started a
> reply, and then attached what I wanted to ask to the wrong post. I
> wanted to reply instead to what Carlos wrote:

Okay, well, I think a safe ldd should be a goal, but I don't think we can currently make such a commitment.
> On Jul 17, 2015, at 12:02 PM, Carlos O'Donell <carlos@redhat.com> wrote: > >> On 07/16/2015 11:28 PM, Siddhesh Poyarekar wrote: >>> On Thu, Jul 16, 2015 at 11:59:18AM -0400, Carlos O'Donell wrote: >>> This is not the right fix for this problem. The right fix has not >>> been attempted because it involves someone doing some real leg work >>> to gather consensus. This fix adds complex checking in ld.so for >>> minimal gain, and eventually you'll get a debuginfo file that is >>> different again in some odd way. >> >> This is not specifically about being able to read debug files, nor is >> it about ldd. It just happens to be ldd (ld.so [--verify|--list]) >> that crashed, but the offending code is bang in the middle of generic >> ld.so code that can potentially be exploited when running arbitrary >> binaries. While it is true that one should not run arbitrary code >> anyway, it shouldn't be an excuse for not fixing bugs. I don't see >> the point of not adding such checks as they come up; performance is an >> excuse made quite regularly, but what is the actual cost of such >> checks? > > Is this really a bug? > > The cost of the checks is more than just performance, but also maintenance > and support of that code into the future. It is about testing that code, > making sure we exercise those weird conditional failure paths which we > thought would be a good thing to add. Each of these checks is an additional > complication, added branch, and conditional in already complicated code. > I would like to see the core loader as simple as possible and to expect > as correct as possible input. Yes but the cost is cheaper than someone accidentally doing a dlopen of a file and causing a crash rather than retuning malformed elf file. I suspect that is a way how to reproduce this failure. Thanks, Andrew > > If I ask you for a test case for bug 18685, do we have one we can checkin > to test this code? That goes a long way to helping maintain this code. 
>
> If ldd should not be run on untrusted input... then why are we looking at this patch?  I think the fact on the ground is that ldd *is* run on untrusted input and eu-ldd isn't ready, and we face the grim specter of security issues.  However, if we're talking security issues, there are a million ways I could craft relocs to overwrite and assemble ROP widgets to execute a custom program that would run crafty exploits.  Running ldd is equivalent to ./app.
>
> See: "DEFCON 20: Programming Weird Machines with ELF Metadata"
> https://www.youtube.com/watch?v=YgtxxLCVD-o
>
> So again, we can never support real ld.so running or processing untrusted input?  The real fix becomes: Don't use ld.so, don't process relocs, don't run anything, just analyze.
>
>> To be clear, I am not against having an eu-ldd, but that shouldn't be an excuse for not patching ld.so.  Things that don't crash on eu-ldd should not crash on ld.so.
>
> To fix or not to fix ld.so should be done on the merit of the individual change.
>
> I disagree that eu-ldd and ld.so should both be equally robust against bad input.  Just consider the myriad of problems the binary could have, the fuzzing we put libelf through; all of that is costly verification that ld.so should not do.
>
>> Oh, and did I mention that eu-ldd (and most of elfutils) should ideally be written in an interpreted language (cough*python*cough) so that we reduce the attack surface on them?
>
> How does python reduce the attack surface?  I've already started eu-ldd in C because it needs to know quite a bit about the low-level pieces and it was easiest for me to do this in C (headers, constants etc).  However, from an "independent implementation" perspective, writing it in python would make it truly orthogonal to the C version.
>
> Any other benefits to python?
>
> Cheers,
> Carlos.
On Fri, Jul 17, 2015 at 01:24:05PM +0200, Florian Weimer wrote:
> Useful information would be that it's a debuginfo file.  Showing no output or garbage is hardly more helpful than crashing, because the user still doesn't know what's wrong.  They won't even report a bug, so we can't help them.

The trouble is, you don't know for sure that it is a debuginfo file.  That's where Carlos talked about the need for a new flag.

> Unfortunately, Fedora coverage for non-i386/x86_64 and proprietary legacy applications is quite poor.

The patch has been in rhel-6 for a similar amount of time as well, i.e. since before I started maintaining the tree.

> In this check,
>
> + if (__glibc_unlikely (ph->p_offset + ph->p_filesz > st.st_size))
>
> do p_offset and p_filesz correspond to the program header values in the file, or have they already been modified?

They correspond to the program header values; why do you think they have been modified?

> Looking at the unmodified program header values (as reported by elfutils and the gelf_getphdr function) in a fairly large set of RPMs, I do not see production ELF files (as opposed to debuginfo files or firmware images) where this constraint is violated (for type 1 headers, that is).  The RPMs are from every minor Red Hat Enterprise Linux release going back to version 4 and from recent Fedora releases.  It's not a complete set of everything shipped by Red Hat or Fedora by a long shot, but it strongly suggests that we did not have a recent bug introducing systematic corruption.

Yes, ELF files (minus debuginfo) built by our assemblers and linkers should not have this issue.

> One more question, regarding the new check: Is there already a check that the addition does not overflow?

There isn't.  I intend to add it, but it doesn't seem like there's agreement on including this patch at all.

Siddhesh
On 07/17/2015 02:21 PM, Siddhesh Poyarekar wrote:
> On Fri, Jul 17, 2015 at 01:24:05PM +0200, Florian Weimer wrote:
>> Useful information would be that it's a debuginfo file.  Showing no output or garbage is hardly more helpful than crashing, because the user still doesn't know what's wrong.  They won't even report a bug, so we can't help them.
>
> The trouble is, you don't know for sure that it is a debuginfo file.  That's where Carlos talked about the need for a new flag.
>
>> Unfortunately, Fedora coverage for non-i386/x86_64 and proprietary legacy applications is quite poor.
>
> The patch has been in rhel-6 for a similar amount of time as well, i.e. since before I started maintaining the tree.

Okay, good to know.  This addresses my concerns about the backwards compatibility impact of this change.  Consider them withdrawn.

>> In this check,
>>
>> + if (__glibc_unlikely (ph->p_offset + ph->p_filesz > st.st_size))
>>
>> do p_offset and p_filesz correspond to the program header values in the file, or have they already been modified?
>
> They correspond to the program header values; why do you think they have been modified?

No particular reason, I just wanted to be sure. :-)

>> One more question, regarding the new check: Is there already a check that the addition does not overflow?
>
> There isn't.  I intend to add it, but it doesn't seem like there's agreement on including this patch at all.

Oh well, understood.  (I was mainly worried about the compatibility impact.)
On 07/17/2015 08:00 AM, pinskia@gmail.com wrote:
>> Is this really a bug?
>>
>> The cost of the checks is more than just performance, but also maintenance and support of that code into the future.  It is about testing that code, making sure we exercise those weird conditional failure paths which we thought would be a good thing to add.  Each of these checks is an additional complication, added branch, and conditional in already complicated code.  I would like to see the core loader as simple as possible, and to expect input that is as correct as possible.
>
> Yes, but the cost is cheaper than someone accidentally doing a dlopen of a file and causing a crash rather than returning a malformed-ELF-file error.  I suspect that is one way to reproduce this failure.

Andrew,

Sorry, could you please clarify: Cheaper for the glibc community, or cheaper for the user?

Could you also clarify what you mean by "returning a malformed ELF file error"?

Cheers,
Carlos.
> We don't know the cost of verification yet.  We fear that it will be too much, but we have no idea how much that 'too much' is.  If it is a 5% penalty on load time, IMO it is fine for the hardening it provides.  Maybe we could provide a build or runtime flag to bypass those checks, but that is an added complication that is not worth the 5% IMO.

The design goal of the dynamic loader is to consume correctly formed ELF files and to assemble an in-memory image of the application for execution by the operating system and hardware.

The dynamic loader will not assume the ELF files are corrupt, and can proceed directly to use any of the information provided and encoded in the format in order to carry out its operations as quickly and efficiently as possible.

The dynamic loader implementation will be as minimal as possible to support auditing, code coverage, unit testing, and regression testing.  This means that any added code that is not part of the above purposes goes against the design goals of the dynamic loader.

There are no security risks in running this way.  The only security risk is that you ran a binary from an untrusted source, or that the underlying transport system was unreliable and corrupted the binary.  There are tools to verify the binary, and if there aren't, they can be written using interpreted languages like Python and ELF format handling libraries like libelf.

---

The patch in this thread...

https://www.sourceware.org/ml/libc-alpha/2015-07/msg00480.html

... violates the design goals of the dynamic loader by slowing it down, even infinitesimally (since design goals are pedantic things), against the goal of quick and efficient execution of binaries.  The additional code adds to the maintenance burden of the dynamic loader without meeting the goals of the loader.

Arguing that it is a bug is treading on thin ice, since the behaviour does not violate the design goals as stated.
We can and should discuss the design goals of the loader, since they are at the core of this discussion.  Not listed above is: What diagnostics is the loader to provide if the ELF file is faulty?  Any?  A crash does not seem like a good diagnostic?

I argue that a crash is a fine diagnostic, that it is not the job of the dynamic loader to verify the integrity of the binary or libraries, and that we should write other tools for that purpose which could potentially be leveraged by the dynamic loader running in a strict runtime verification mode (another discussion to be had).

The course of action I recommend is based on the reality of our situation and the true causes of crashes in ld.so, which mostly happen to be distro tooling issues and running ldd against things which are not complete ELF files.

Next steps:

* Work to get ET_* markings for debuginfo files, to avoid the hacks we have presently thought about adding to the dynamic loader.

* Work to complete eu-ldd to replace elf/ldd.bash.in, and then work to remove or rewrite the ld.so trace support to provide a developer-centric interface for debugging that is not restricted by a need to support ldd.

* Work to create a new project, e.g. eu-verifyelf (if it doesn't exist), which performs strict ELF file verification and can, for example, consume the output of an ld.so trace to verify dlmopen/dlopen calls as well (eu-ldd could also get this support).

* Lastly, add a mode where ld.so calls out to eu-verifyelf in some way to verify the initial binaries and libraries when run in some strict ELF checking mode.  This cost is borne once at startup and every time we call dlopen/dlmopen.

Cheers,
Carlos.
I concur with this analysis of what rtld is and should be. I concur with the goal of getting ldd support out of rtld proper. The details beyond that need more discussion before becoming specific goals (to me).
I disagree with the design goals as they're stated.  I won't waste time with this, though, because I don't see anybody agreeing with my more conservative stand on how robust the dynamic linker should be.

Siddhesh
> I disagree with the design goals as they're stated.  I won't waste time with this though because I don't see anybody agreeing with my more conservative stand on how robust the dynamic linker should be.

I think we can discuss it further to understand your rationale better, if you care at all.  Do you disagree with the notion of getting to an ldd that does not involve running rtld?  If you intend to support the ldd case, then the motivation for graceful handling of bad ELF files is clear.  If that is not the sole motivation, then I would like to hear you elaborate on what other motivations lead you to wanting this sort of change in rtld.

Thanks,
Roland
On Fri, Jul 17, 2015 at 07:15:15PM -0700, Roland McGrath wrote:
> I think we can discuss it further to understand your rationale better, if you care at all.  Do you disagree with the notion of getting to an ldd that does not involve running rtld?  If you intend to support the

No, I am very much in favour of getting ldd out of libc altogether; in fact I am of the opinion that we remove those options completely from the dynamic linker once we have a usable alternative.

> ldd case, then the motivation for graceful handling of bad ELF files is clear.  If that is not the sole motivation, then I would like to hear you elaborate on what other motivations lead you to wanting this sort of change in rtld.

My motivation for wanting this sort of change is primarily to do with the way the dynamic linker would behave if it weren't patched, i.e. it would access or modify arbitrary addresses and either crash or do something that it is not intended to do.  We have a reproducer with ldd, but the code path is not exclusive to ldd, i.e. ld.so <binary> could trigger this as well.

We have some validations in place already, so it is not clear to me what the criteria for selecting them are; at present the selection seems arbitrary.  If there is a clear guideline for what kind of sanity checks are OK for the linker and what aren't, I can live (albeit slightly uncomfortably, since I still cannot reconcile with arbitrary memory accesses being expected behaviour) with not adding such a check.

Siddhesh
On Sat, Jul 18, 2015 at 08:19:53AM +0530, Siddhesh Poyarekar wrote:
> We have some validations in place already, so it is not clear to me what the criteria for selecting them is and at present it seems arbitrary.  If there is a clear guideline for what kind of sanity

OK, so I took another look, and the only validation that seems kinda related is the one that detects the wrong ELF class.  That too is not specifically a protection against invalid ELF; it's just not something this linker runs.  So I guess it is easy to claim that ld.so assumes that the ELF is valid.

I withdraw my patch if the general idea of allowing the linker to do arbitrary things or crash on invalid ELF is acceptable.

Siddhesh
On Fri, Jul 17, 2015 at 12:47:44PM -0400, Carlos O'Donell wrote:
>> We don't know the cost of verification yet.  We fear that it will be too much, but we have no idea how much that 'too much' is.  If it is a 5% penalty on load time, IMO it is fine for the hardening it provides.  Maybe we could provide a build or runtime flag to bypass those checks, but that is an added complication that is not worth the 5% IMO.
>
> The design goal of the dynamic loader is to consume correctly formed ELF files and to assemble an in-memory image of the application for execution by the operating system and hardware.
>
> The dynamic loader will not assume the ELF files are corrupt, and can proceed directly to use any of the information provided and encoded in the format in order to carry out its operations as quickly and efficiently as possible.

Note that any attempt to validate ELF contents at runtime is utterly useless unless you abandon mmap and read the file contents into memory instead.  Since Linux lacks MAP_COPY and only has MAP_PRIVATE or MAP_SHARED, there is always a risk that the file will be modified after validation but before use.

Rich
* Carlos O'Donell:

> The design goal of the dynamic loader is to consume correctly formed ELF files and to assemble an in-memory image of the application for execution by the operating system and hardware.
>
> The dynamic loader will not assume the ELF files are corrupt, and can proceed directly to use any of the information provided and encoded in the format in order to carry out its operations as quickly and efficiently as possible.
>
> The dynamic loader implementation will be as minimal as possible to support auditing, code coverage, unit testing, and regression testing.  This means that any added code that is not part of the above purposes goes against the design goals of the dynamic loader.

Grudgingly, I agree.

> There are no security risks in running this way.

Correct, and I really hope it stays this way.  There is considerable interest in implementing code signing, but I have philosophical objections to that, because it can only be used to take away freedoms from users (and device owners).
On 07/17/2015 10:49 PM, Siddhesh Poyarekar wrote:
> On Fri, Jul 17, 2015 at 07:15:15PM -0700, Roland McGrath wrote:
>> I think we can discuss it further to understand your rationale better, if you care at all.  Do you disagree with the notion of getting to an ldd that does not involve running rtld?  If you intend to support the
>
> No, I am very much in favour of getting ldd out of libc altogether; in fact I am of the opinion that we remove those options completely from the dynamic linker once we have a usable alternative.

I agree completely, but we would need to replace it with something like stap-based probe tracing, which compiles to nops normally but allows an external program like systemtap to be used to trace library loading with all the same detailed data as before, for developers only (and ship the stap script with libc.so.6).

>> ldd case, then the motivation for graceful handling of bad ELF files is clear.  If that is not the sole motivation, then I would like to hear you elaborate on what other motivations lead you to wanting this sort of change in rtld.
>
> My motivation for wanting this sort of change is primarily to do with the way the dynamic linker would behave if it weren't patched, i.e. it would access or modify arbitrary addresses and either crash or do something that it is not intended to do.  We have a reproducer with ldd, but the code path is not exclusive to ldd, i.e. ld.so <binary> could trigger this as well.
>
> We have some validations in place already, so it is not clear to me what the criteria for selecting them is and at present it seems arbitrary.  If there is a clear guideline for what kind of sanity checks are OK for the linker and what aren't, I can live (albeit slightly uncomfortably since I still cannot reconcile with arbitrary memory accesses being expected behaviour) with not adding such a check.

It is important to have these discussions.
We should have community members with different viewpoints, and be ready to use those views to better serve our users should the needs of those users change.

I too find a certain degree of cognitive dissonance between what I as a developer want in ld.so and what our users want.  I want a robust ld.so, but our users want performance.  Therefore I judge the best approach to be two distinct tools, e.g. ld.so and ldd (where ldd has all or most of the trace features of ld.so).

Cheers,
Carlos.
On Mon, Jul 20, 2015 at 08:56:08AM -0400, Carlos O'Donell wrote:
> I agree completely, but we would need to replace it with something like stap-based probe tracing, which compiles to nops normally but allows an external program like systemtap to be used to trace library loading with all the same detailed data as before, for developers only (and ship the stap script with libc.so.6).

I don't understand - we already have stap probe points in ld.so that do this, and we also have LD_DEBUG, which gives the ability to trace ld.so activities the conventional way, i.e. by dumping to a file or stderr.  Neither has anything to do with ldd.

> It is important to have these discussions.  We should have community members with different viewpoints, and be ready to use those views to better serve our users should the needs of those users change.

I have kinda reconciled with this since I realized that we don't already validate ELF binaries - I had misread the validations.

Siddhesh
On 07/20/2015 09:51 AM, Siddhesh Poyarekar wrote:
> On Mon, Jul 20, 2015 at 08:56:08AM -0400, Carlos O'Donell wrote:
>> I agree completely, but we would need to replace it with something like stap-based probe tracing, which compiles to nops normally but allows an external program like systemtap to be used to trace library loading with all the same detailed data as before, for developers only (and ship the stap script with libc.so.6).
>
> I don't understand - we already have stap probe points in ld.so that do this, and we also have LD_DEBUG, which gives the ability to trace ld.so activities the conventional way, i.e. by dumping to a file or stderr.  Neither has anything to do with ldd.

The latter two are code in the core loader that is more difficult to audit than stap-based probes?  If we removed all trace and debug code and moved to stap, we would have something that is even easier to audit?

Cheers,
Carlos.
On Mon, Jul 20, 2015 at 10:19:05AM -0400, Carlos O'Donell wrote:
> The latter two are code in the core loader that is more difficult to audit than stap-based probes?  If we removed all trace and debug code and moved to stap, we would have something that is even easier to audit?

Possibly; if stap with dyninst becomes stable enough (I had a few difficulties using it a few months ago, I don't remember why), then we can tout that as the replacement.

Siddhesh
On Fri, Jul 17, 2015 at 11:47 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>
> [...]
> The dynamic loader implementation will be as minimal as possible to support auditing, code coverage, unit testing, and regression testing.  This means that any added code that is not part of the above purposes goes against the design goals of the dynamic loader.
>
> There are no security risks in running this way.  The only security risk is that you ran a binary from an untrusted source, or that the underlying transport system was unreliable and corrupted the binary.

I'm just a glibc n00b, so I may be missing something, but it seems like the stated design goal results in the code being more brittle, which leads to security risks.  Even if every code reviewer has convinced themselves that an unchecked pointer dereference is safe, it is still possible for things to go wrong in a way that nobody thought of - and in TLS, for instance, we are still finding and fixing holes years after the code was written.

It used to be that we had to take these sorts of chances so as to get reasonable performance, but I don't think that has been necessary for a while.  Do people have statistics showing ld.so dominating execution time?

Stan
On 07/20/2015 11:31 AM, Stan Shebs wrote:
> On Fri, Jul 17, 2015 at 11:47 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>>
>> [...]
>> The dynamic loader implementation will be as minimal as possible to support auditing, code coverage, unit testing, and regression testing.  This means that any added code that is not part of the above purposes goes against the design goals of the dynamic loader.
>>
>> There are no security risks in running this way.  The only security risk is that you ran a binary from an untrusted source, or that the underlying transport system was unreliable and corrupted the binary.
>
> I'm just a glibc n00b, so I may be missing something, but it seems like the stated design goal results in the code being more brittle, which leads to security risks.  Even if every code reviewer has convinced themselves that an unchecked pointer dereference is safe, it is still possible for things to go wrong in a way that nobody thought of - and in TLS, for instance, we are still finding and fixing holes years after the code was written.

The design goal sets the bar high, and forces you to argue for your change with solid evidence and fact.  It's just a goal, after all, and we often fall short.

> It used to be that we had to take these sorts of chances so as to get reasonable performance, but I don't think that has been necessary for a while.  Do people have statistics showing ld.so dominating execution time?

Isn't this a self-fulfilling prophecy, though?  We don't show up in execution time precisely because of the measures we take? :-)

Cheers,
Carlos.
diff --git a/elf/dl-load.c b/elf/dl-load.c
index 0c052e4..f8aaa60 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -880,7 +880,8 @@ _dl_map_object_from_fd (const char *name, int fd, struct filebuf *fbp,
   /* Get file information.  */
   struct r_file_id id;
-  if (__glibc_unlikely (!_dl_get_file_id (fd, &id)))
+  struct stat64 st;
+  if (__glibc_unlikely (!_dl_get_file_id (fd, &id, &st)))
     {
       errstring = N_("cannot stat shared object");
     call_lose_errno:
@@ -1076,6 +1077,17 @@ _dl_map_object_from_fd (const char *name, int fd, struct filebuf *fbp,
 	      goto call_lose;
 	    }
 
+	  if (__glibc_unlikely (ph->p_offset + ph->p_filesz > st.st_size))
+	    {
+	      /* If the segment requires zeroing of part of its last
+		 page, we'll crash when accessing the unmapped page.
+		 There's still a possibility of a race, if the shared
+		 object is truncated between the fxstat above and the
+		 memset below.  */
+	      errstring = N_("ELF load command past end of file");
+	      goto call_lose;
+	    }
+
 	  struct loadcmd *c = &loadcmds[nloadcmds++];
 	  c->mapstart = ph->p_vaddr & ~(GLRO(dl_pagesize) - 1);
 	  c->mapend = ((ph->p_vaddr + ph->p_filesz + GLRO(dl_pagesize) - 1)
diff --git a/sysdeps/generic/dl-fileid.h b/sysdeps/generic/dl-fileid.h
index 2cbd21d..9b7f410 100644
--- a/sysdeps/generic/dl-fileid.h
+++ b/sysdeps/generic/dl-fileid.h
@@ -29,7 +29,8 @@ struct r_file_id
    On error, returns false, with errno set.  */
 static inline bool
 _dl_get_file_id (int fd __attribute__ ((unused)),
-		 struct r_file_id *id __attribute__ ((unused)))
+		 struct r_file_id *id __attribute__ ((unused)),
+		 struct stat64 *st __attribute__ ((unused)))
 {
   return true;
 }
diff --git a/sysdeps/posix/dl-fileid.h b/sysdeps/posix/dl-fileid.h
index d0d5436..7115c3b 100644
--- a/sysdeps/posix/dl-fileid.h
+++ b/sysdeps/posix/dl-fileid.h
@@ -27,18 +27,16 @@ struct r_file_id
   ino64_t ino;
 };
 
-/* Sample FD to fill in *ID.  Returns true on success.
+/* Sample FD to fill in *ID and *ST.  Returns true on success.
    On error, returns false, with errno set.  */
 static inline bool
-_dl_get_file_id (int fd, struct r_file_id *id)
+_dl_get_file_id (int fd, struct r_file_id *id, struct stat64 *st)
 {
-  struct stat64 st;
-
-  if (__glibc_unlikely (__fxstat64 (_STAT_VER, fd, &st) < 0))
+  if (__glibc_unlikely (__fxstat64 (_STAT_VER, fd, st) < 0))
     return false;
-  id->dev = st.st_dev;
-  id->ino = st.st_ino;
+  id->dev = st->st_dev;
+  id->ino = st->st_ino;
   return true;
 }