Message ID: 1437033625-13561-1-git-send-email-siddhesh@redhat.com
State: New
On 16 Jul 2015 13:30, Siddhesh Poyarekar wrote:
> + if (__glibc_unlikely (ph->p_offset + ph->p_filesz > st.st_size))
p_offset and p_filesz are 64bit w/64bit ELFs, so seems like this too could
overflow wrt the 64bit st_size. is there some other check earlier on that
would catch that ?
-mike
On 07/16/2015 04:00 AM, Siddhesh Poyarekar wrote:
> Some valid ELF objects, like .debug files may refer to sections
> outside themselves since they're loaded and patched up to their parent
> ELF. Calling ldd on them may result in a segfault since it may try to
> read beyond the end of the mapping. Red Hat bz:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=741105
>
> has an example, although I haven't been able to find a sample
> reproducer file immediately. This patch has been carried in Fedora
> and RHEL for a couple of years now. Also tested on x86_64 to verify
> that there are no regressions.
>
> [BZ #18685]
> * sysdeps/generic/dl-fileid.h (_dl_get_file_id): Accept
> pointer to a struct stat64.
> * sysdeps/posix/dl-fileid.h (_dl_get_file_id): Likewise.
> * elf/dl-load.c (_dl_map_object_from_fd): Avoid mapping past
> end of ELF file.

This is not the right fix for this problem. The right fix has not been attempted because it involves someone doing some real leg work to gather consensus. This fix adds complex checking in ld.so for minimal gain, and eventually you'll get a debuginfo file that is different again in some odd way.

The fix is to allocate a new ET_DEBUG_EXEC and ET_DEBUG_DYN and mark the debuginfo with those flags. That way ldd knows that the debuginfo is going to be incomplete and can take action, e.g. print a message saying "This is debuginfo for a library" without processing any of the sections. Then you update binutils to add the ET_* flags to debuginfo objects.

It can't be a generic ELF header bit because most arches have used them all up. It can't be a .note because that is allocated and may reorganize the debuginfo in ways which mean it can't easily be layered on top of the original executable from which it was stripped. Not to mention .notes are too hard to find for some tools. The best option is a new set of ET_* flags.
If you can't get those values standardized, and yes we'd have to go to the generic ELF ABI people and ask for them, then we could use ET_GNU_* and use the OS-specific value range.

Cheers,
Carlos.
Perhaps ldd should use a specially compiled ld.so that has a lot of extra checks added (so that it can be run on arbitrary objects without creating security hazards). Andreas.
On 07/16/2015 12:36 PM, Andreas Schwab wrote:
> Perhaps ldd should use a specially compiled ld.so that has a lot of
> extra checks added (so that it can be run on arbitrary objects without
> creating security hazards).

ldd should be a distinct tool that uses libelf, and provides deeper introspection and options. This way we have a second implementation of the loader rules in a cleaner and more concise form that we can use to double-check assertions about load order and cycle breakage, and cross-check ld.so changes.

I have a half-written eu-ldd that I've started. I guess I should contribute that to elfutils soon.

c.
I concur with Carlos's NAK on this change. The desirability of the proposed alternatives is not clear to me offhand. Those should be discussed in separate threads.
On 16 Jul 2015 13:17, Carlos O'Donell wrote:
> On 07/16/2015 12:36 PM, Andreas Schwab wrote:
> > Perhaps ldd should use a specially compiled ld.so that has a lot of
> > extra checks added (so that it can be run on arbitrary objects without
> > creating security hazards).
>
> ldd should be a distinct tool that uses libelf, and provides deeper
> introspection and options. This way we have a second implementation
> of the loader rules in a cleaner and concise form that we can use
> to double-check assertions about load order and cycle breakage,
> and cross-check ld.so changes.

i agree in having a dedicated/robust tool, but i don't think it means we should disable the trace option in the ldso itself. having a way to get the actual details out of the ldso i think is still valuable.

i also don't think glibc should be dependent upon the elfutils package ...

-mike
On 07/16/2015 10:08 PM, Mike Frysinger wrote:
> On 16 Jul 2015 13:17, Carlos O'Donell wrote:
>> On 07/16/2015 12:36 PM, Andreas Schwab wrote:
>>> Perhaps ldd should use a specially compiled ld.so that has a lot of
>>> extra checks added (so that it can be run on arbitrary objects without
>>> creating security hazards).
>>
>> ldd should be a distinct tool that uses libelf, and provides deeper
>> introspection and options. This way we have a second implementation
>> of the loader rules in a cleaner and concise form that we can use
>> to double-check assertions about load order and cycle breakage,
>> and cross-check ld.so changes.
>
> i agree in having a dedicated/robust tool, but i don't think it means we should
> disable the trace option in the ldso itself. having a way to get the actual
> details out of the ldso i think is still valuable.

Agreed. We need trace to do comparison between eu-ldd and ld.so. We need trace for developers and for runtime dumps which include looking at dlopen/dlmopen calls. We need more trace data to look at namespaces for dlmopen.

Making this trace robust against corrupt binaries is a questionable endeavor though?

> i also don't think glibc should be dependent upon the elfutils package ...

It would be, but only for testing, not building, and in a bootstrap without eu-ldd you'd just have UNSUPPORTED tests, like if you didn't have libstdc++-static available for all the C++ tests.

c.
On 16 Jul 2015 22:45, Carlos O'Donell wrote:
> On 07/16/2015 10:08 PM, Mike Frysinger wrote:
> > On 16 Jul 2015 13:17, Carlos O'Donell wrote:
> >> On 07/16/2015 12:36 PM, Andreas Schwab wrote:
> >>> Perhaps ldd should use a specially compiled ld.so that has a lot of
> >>> extra checks added (so that it can be run on arbitrary objects without
> >>> creating security hazards).
> >>
> >> ldd should be a distinct tool that uses libelf, and provides deeper
> >> introspection and options. This way we have a second implementation
> >> of the loader rules in a cleaner and concise form that we can use
> >> to double-check assertions about load order and cycle breakage,
> >> and cross-check ld.so changes.
> >
> > i agree in having a dedicated/robust tool, but i don't think it means we should
> > disable the trace option in the ldso itself. having a way to get the actual
> > details out of the ldso i think is still valuable.
>
> Agreed. We need trace to do comparison between eu-ldd and ld.so.
> We need trace for developers and for runtime dumps which include
> looking at dlopen/dlmopen calls. We need more trace data to look
> at namespaces for dlmopen.
>
> Making this trace robust against corrupt binaries is a questionable
> endeavor though?

i think we need to clarify the goals before we dive into technical details. how about:

    running `ldd <foo>` should be safe (i.e. not crash or be a security hazard)

if we can all agree on that, then a lot of things shake out from there.

> > i also don't think glibc should be dependent upon the elfutils package ...
>
> It would be, but only for testing, not building, and in a bootstrap
> without eu-ldd you'd just have UNSUPPORTED tests, like if you didn't
> have libstdc++-static available for all the C++ tests.

if it's not installed, then i have no complaints :). random/excess deps for debug/dev tools are fine.

-mike
On Thu, Jul 16, 2015 at 1:00 AM, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> Some valid ELF objects, like .debug files may refer to sections
> outside themselves since they're loaded and patched up to their parent
> ELF. Calling ldd on them may result in a segfault since it may try to
> read beyond the end of the mapping. Red Hat bz:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=741105
>
> has an example, although I haven't been able to find a sample
> reproducer file immediately. This patch has been carried in Fedora
> and RHEL for a couple of years now. Also tested on x86_64 to verify
> that there are no regressions.
>
> [BZ #18685]
> * sysdeps/generic/dl-fileid.h (_dl_get_file_id): Accept
> pointer to a struct stat64.
> * sysdeps/posix/dl-fileid.h (_dl_get_file_id): Likewise.
> * elf/dl-load.c (_dl_map_object_from_fd): Avoid mapping past
> end of ELF file.

If I understand it correctly, ld.so crashes on an ELF file with segment headers where the file offset is larger than the file itself. Isn't it the same as a corrupted ELF file with segment headers where the file offset is larger than the file itself? I can create such a file with a binary editor. How should ld.so deal with such ELF files?

H.J.
On 07/16/2015 10:55 PM, Mike Frysinger wrote:
> On 16 Jul 2015 22:45, Carlos O'Donell wrote:
>> On 07/16/2015 10:08 PM, Mike Frysinger wrote:
>>> On 16 Jul 2015 13:17, Carlos O'Donell wrote:
>>>> On 07/16/2015 12:36 PM, Andreas Schwab wrote:
>>>>> Perhaps ldd should use a specially compiled ld.so that has a lot of
>>>>> extra checks added (so that it can be run on arbitrary objects without
>>>>> creating security hazards).
>>>>
>>>> ldd should be a distinct tool that uses libelf, and provides deeper
>>>> introspection and options. This way we have a second implementation
>>>> of the loader rules in a cleaner and concise form that we can use
>>>> to double-check assertions about load order and cycle breakage,
>>>> and cross-check ld.so changes.
>>>
>>> i agree in having a dedicated/robust tool, but i don't think it means we should
>>> disable the trace option in the ldso itself. having a way to get the actual
>>> details out of the ldso i think is still valuable.
>>
>> Agreed. We need trace to do comparison between eu-ldd and ld.so.
>> We need trace for developers and for runtime dumps which include
>> looking at dlopen/dlmopen calls. We need more trace data to look
>> at namespaces for dlmopen.
>>
>> Making this trace robust against corrupt binaries is a questionable
>> endeavor though?
>
> i think we need to clarify the goals before we dive into technical details.
> how about:
>     running `ldd <foo>` should be safe (i.e. not crash or be a security hazard)
>
> if we can all agree on that, then a lot of things shake out from there.

Running `ldd <foo>` should be safe and free of security hazards at a distro level, but that `ldd` *should* be eu-ldd and based on a hardened and fuzzed libelf. It should leverage the work we are doing to harden libelf. Hardening the real loader to serve as an alternate tool is against the "do one thing well" philosophy.

We will always have conflicting goals if we want ld.so to be both a hardened introspection tool and a fast, performant loader.

Running the result of `elf/ldd.bash.in` (invoking the real loader) should run the real loader in trace mode. The real loader should rarely compromise to being slower because it's in trace mode. It should IMO not unduly compromise performance for hardening against problems which are the result of other failures in the system, e.g. storage, memory, kernel, etc. Where possible some hardening is good, given clearly stated rationale for the change, e.g. a real security threat.

In this particular case, hardening against stripped debug data is the wrong solution. The debug data must be clearly marked as such, since it's actually not real DSOs or executables; we've cheated by using objcopy to get something that looks like a valid ELF file but isn't anymore. We need to stop being sloppy about how we label that kind of object.

Is that clear enough? I want to avoid a blanket statement. In this case I think the hardening rationale, that of supporting debug files, is not worth the change in the code.

Cheers,
Carlos.
On Thu, Jul 16, 2015 at 11:59:18AM -0400, Carlos O'Donell wrote:
> This is not the right fix for this problem. The right fix has not
> been attempted because it involves someone doing some real leg work
> to gather consensus. This fix adds complex checking in ld.so for
> minimal gain, and eventually you'll get a debuginfo file that is
> different again in some odd way.

This is not specifically about being able to read debug files, nor is it about ldd. It just happens to be ldd (ld.so [--verify|--list]) that crashed, but the offending code is bang in the middle of generic ld.so code that can potentially be exploited when running arbitrary binaries. While it is true that one should not run arbitrary code anyway, it shouldn't be an excuse for not fixing bugs. I don't see the point of not adding such checks as they come up; performance is an excuse made quite regularly, but what is the actual cost of such checks?

To be clear, I am not against having an eu-ldd, but that shouldn't be an excuse for not patching ld.so. Things that don't crash on eu-ldd should not crash on ld.so.

Oh, and did I mention that eu-ldd (and most of elfutils) should ideally be written in an interpreted language (cough*python*cough) so that we reduce the attack surface on them?

Siddhesh
On Thu, Jul 16, 2015 at 08:15:17PM -0700, H.J. Lu wrote:
> If I understand it correctly, ld.so crashes on an ELF file with segment
> headers where file offset is larger than the file itself. Isn't it the
> same as a corrupted ELF file with segment headers where file offset is
> larger than the file itself? I can create such a file with a binary editor.

Yes, which is the point of including such fixes IMO.

Siddhesh
On Thu, Jul 16, 2015 at 8:33 PM, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> On Thu, Jul 16, 2015 at 08:15:17PM -0700, H.J. Lu wrote:
>> If I understand it correctly, ld.so crashes on an ELF file with segment
>> headers where file offset is larger than the file itself. Isn't it the
>> same as a corrupted ELF file with segment headers where file offset is
>> larger than the file itself? I can create such a file with a binary editor.
>
> Yes, which is the point of including such fixes IMO.

You can create such a testcase with a simple program which updates the segment header in a good ELF file.
On 07/16/2015 11:28 PM, Siddhesh Poyarekar wrote:
> On Thu, Jul 16, 2015 at 11:59:18AM -0400, Carlos O'Donell wrote:
>> This is not the right fix for this problem. The right fix has not
>> been attempted because it involves someone doing some real leg work
>> to gather consensus. This fix adds complex checking in ld.so for
>> minimal gain, and eventually you'll get a debuginfo file that is
>> different again in some odd way.
>
> This is not specifically about being able to read debug files, nor is
> it about ldd. It just happens to be ldd (ld.so [--verify|--list])
> that crashed, but the offending code is bang in the middle of generic
> ld.so code that can potentially be exploited when running arbitrary
> binaries. While it is true that one should not run arbitrary code
> anyway, it shouldn't be an excuse for not fixing bugs. I don't see
> the point of not adding such checks as they come up; performance is an
> excuse made quite regularly, but what is the actual cost of such
> checks?

Is this really a bug?

The cost of the checks is more than just performance, but also maintenance and support of that code into the future. It is about testing that code, making sure we exercise those weird conditional failure paths which we thought would be a good thing to add. Each of these checks is an additional complication, added branch, and conditional in already complicated code. I would like to see the core loader as simple as possible and to expect as correct as possible input.

If I ask you for a test case for bug 18685, do we have one we can check in to test this code? That goes a long way to helping maintain this code.

If ldd should not be run on untrusted input... then why are we looking at this patch? I think the fact on the ground is that ldd *is* run on untrusted input and eu-ldd isn't ready, and we face the grim specter of security issues.

However, if we're talking security issues, there are a million ways I could craft relocs to overwrite and assemble ROP gadgets to execute a custom program that would run crafty exploits. Running ldd is equivalent to ./app.

See: "DEFCON 20: Programming Weird Machines with ELF Metadata"
https://www.youtube.com/watch?v=YgtxxLCVD-o

So again, we can never support real ld.so running or processing untrusted input? The real fix becomes: Don't use ld.so, don't process relocs, don't run anything, just analyze.

> To be clear, I am not against having an eu-ldd, but that shouldn't be
> an excuse for not patching ld.so. Things that don't crash on eu-ldd
> should not crash on ld.so.

To fix or not to fix ld.so should be decided on the merit of the individual change.

I disagree that eu-ldd and ld.so should both be equally robust against bad input. Just consider the myriad of problems the binary could have, the fuzzing we put libelf through; all of that is costly verification that ld.so should not do.

> Oh, and did I mention that eu-ldd (and most of elfutils) should
> ideally be written in an interpreted language (cough*python*cough) so
> that we reduce the attack surface on them?

How does python reduce the attack surface? I've already started eu-ldd in C because it needs to know quite a bit about the low-level pieces and it was easiest for me to do this in C (headers, constants etc). However, from an "independent implementation" perspective, writing it in python would make it truly orthogonal from the C version.

Any other benefits to python?

Cheers,
Carlos.
On Fri, Jul 17, 2015 at 12:02:43AM -0400, Carlos O'Donell wrote:
> Is this really a bug?

Yes, because it is in the dynamic linker path and is not provably isolated to the ldd use case. It just happens so that it is exposed by ldd, but I don't think it has been definitively proven yet that it is not possible to expose the bug in the dynamic linker proper.

> The cost of the checks is more than just performance, but also maintenance
> and support of that code into the future. It is about testing that code,
> making sure we exercise those weird conditional failure paths which we
> thought would be a good thing to add. Each of these checks is an additional
> complication, added branch, and conditional in already complicated code.
> I would like to see the core loader as simple as possible and to expect
> as correct as possible input.
>
> If I ask you for a test case for bug 18685, do we have one we can check in
> to test this code? That goes a long way to helping maintain this code.

I don't yet; I can explore H. J.'s suggestion of writing out a corrupt binary, but I don't yet have a reproducer that will trigger the crash without --verify. But not checking it in because of this is a weak reason IMO when it is very clear that the bug exists.

> If ldd should not be run on untrusted input... then why are we looking
> at this patch? I think the fact on the ground is that ldd *is* run on

Like I said, because it is not specific to ldd; it just happens to be exposed by ldd.

> untrusted input and eu-ldd isn't ready, and we face the grim specter of
> security issues. However, if we're talking security issues, there are
> a million ways I could craft relocs to overwrite and assemble ROP
> gadgets to execute a custom program that would run crafty exploits.

Then we add those million checks as we find them, if they affect ld.so proper, i.e. not only the --verify path. Although I suspect the number will be much less than a million :)

> Running ldd is equivalent to ./app.

Not really, but it is not related to the argument I'm making, so I'll let it go.

> See: "DEFCON 20: Programming Weird Machines with ELF Metadata"
> https://www.youtube.com/watch?v=YgtxxLCVD-o
>
> So again, we can never support real ld.so running or processing untrusted
> input? The real fix becomes: Don't use ld.so, don't process relocs,
> don't run anything, just analyze.

If that is what we want to do, then we should remove ld.so --verify and --list immediately and encourage use of other tools. That is still not related to this fix, which is directly in the ld.so path.

> I disagree that eu-ldd and ld.so should both be equally robust against
> bad input. Just consider the myriad of problems the binary could have,
> the fuzzing we put libelf through, all of that is costly verification
> that ld.so should not do.

We don't know the cost of verification yet. We fear that it will be too much, but we have no idea how much that 'too much' is. If it is a 5% penalty on load time, IMO it is fine for the hardening it provides. Maybe we could provide a build or runtime flag to bypass those checks, but that is an added complication that is not worth the 5% IMO.

> How does python reduce the attack surface? I've already started eu-ldd
> in C because it needs to know quite a bit about the low-level pieces
> and it was easiest for me to do this in C (headers, constants etc).
> However, from an "independent implementation" perspective, writing it
> in python would make it truly orthogonal from the C version.
>
> Any other benefits to python?

The biggest benefit is what happens in case of bugs in the program. If the C bits are limited to just the core elfutils library, the worst bug that the tools can have is a python error due to a missing validation, which will just result in a crash. With C, the worst error is a buffer overrun or an arbitrary pointer dereference, which may be exploitable. So what is needed is a python wrapper around libelfutils, which eu-ldd can then use.

I remember someone talking about such a project, but I don't know if anyone actually got around to doing it.

Siddhesh
On 07/16/2015 10:00 AM, Siddhesh Poyarekar wrote:
> Some valid ELF objects, like .debug files may refer to sections
> outside themselves since they're loaded and patched up to their parent
> ELF. Calling ldd on them may result in a segfault since it may try to
> read beyond the end of the mapping. Red Hat bz:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=741105
>
> has an example, although I haven't been able to find a sample
> reproducer file immediately. This patch has been carried in Fedora
> and RHEL for a couple of years now. Also tested on x86_64 to verify
> that there are no regressions.

Can you show the ldd output with this patch applied on some files which crashed before? Is it useful at all?

I know that debuginfo files usually have garbage interpreter fields, so I really doubt that ldd can produce anything useful on debuginfo files.

    + errstring = N_("ELF load command past end of file");

The expectation appears to be that end users see this error message. It needs to better reflect what's going on.

Is there a way that an ELF file is corrupted in this way, but can still be loaded correctly? I wonder if this change will break currently working programs. I do think we need to support loading corrupted ELF files indefinitely if there ever was a binutils version producing them.
On Fri, Jul 17, 2015 at 07:28:45AM +0200, Florian Weimer wrote:
> Can you show the ldd output with this patch applied on some files which
> crashed before? Is it useful at all?

I'll have to build one since I can't find any from a brief look through files I have on my system. The examples were mostly the .debug files IIRC, so that has more to do with not crashing than actually providing any useful information.

> I know that debuginfo files usually have garbage interpreter fields, so
> I really doubt that ldd can produce anything useful on debuginfo files.
>
>     + errstring = N_("ELF load command past end of file");
>
> The expectation appears to be that end users see this error message. It
> needs to better reflect what's going on.

OK, that makes sense.

> Is there a way that an ELF file is corrupted in this way, but can still
> be loaded correctly? I wonder if this change will break currently
> working programs. I do think we need to support loading corrupted ELF
> files indefinitely if there ever was a binutils version producing them.

I don't know of any valid ELF programs that broke due to this. The patch (AFAICT) has been in Fedora for about 3 years, so if something had to break, it should have been known by now.

Siddhesh
On Fri, Jul 17, 2015 at 07:28:45AM +0200, Florian Weimer wrote:
> On 07/16/2015 10:00 AM, Siddhesh Poyarekar wrote:
> > Some valid ELF objects, like .debug files may refer to sections
> > outside themselves since they're loaded and patched up to their parent
> > ELF. Calling ldd on them may result in a segfault since it may try to
> > read beyond the end of the mapping. Red Hat bz:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=741105
> >
> > has an example, although I haven't been able to find a sample
> > reproducer file immediately. This patch has been carried in Fedora
> > and RHEL for a couple of years now. Also tested on x86_64 to verify
> > that there are no regressions.
>
> Can you show the ldd output with this patch applied on some files which
> crashed before? Is it useful at all?
>
> I know that debuginfo files usually have garbage interpreter fields, so
> I really doubt that ldd can produce anything useful on debuginfo files.
>
>     + errstring = N_("ELF load command past end of file");
>
> The expectation appears to be that end users see this error message. It
> needs to better reflect what's going on.
>
> Is there a way that an ELF file is corrupted in this way, but can still
> be loaded correctly? I wonder if this change will break currently
> working programs. I do think we need to support loading corrupted ELF
> files indefinitely if there ever was a binutils version producing them.

Also, could you explain how this is a security risk? If that needs modification of the ELF, then an attacker could run arbitrary commands by overriding main, for example.
On 07/17/2015 09:35 AM, Ondřej Bílka wrote:
> Also, could you explain how this is a security risk? If that needs
> modification of the ELF, then an attacker could run arbitrary commands
> by overriding main, for example.

Just because I comment on a thread doesn't mean I consider it security-related. Or what do you mean?
On Fri, Jul 17, 2015 at 09:43:29AM +0200, Florian Weimer wrote:
> On 07/17/2015 09:35 AM, Ondřej Bílka wrote:
> > Also, could you explain how this is a security risk? If that needs
> > modification of the ELF, then an attacker could run arbitrary commands
> > by overriding main, for example.
>
> Just because I comment on a thread doesn't mean I consider it
> security-related.
>
> Or what do you mean?

Well, here I got a bit distracted: I was reading the thread, started a reply, and then attached what I wanted to ask to the wrong post. I wanted to reply instead to what Carlos wrote:

"Running the result of `elf/ldd.bash.in` (invoking the real loader) should run the real loader in trace mode. The real loader should rarely compromise to being slower because it's in trace mode. It should IMO not unduly compromise performance for hardening against problems which are the result of other failures in the system e.g. storage, memory, kernel, etc. Where possible some hardening is good given clearly stated rationale for the change e.g. a real security threat. In this particular case, hardening against stripped debug data, is the wrong solution. The debug data must be clearly marked as such since it's actually not real DSOs or executables, and we've cheated by using objcopy to get something that looks like a valid ELF file but isn't anymore. We need to stop being sloppy about how we label that kind of object. Is that clear enough?"
On 07/17/2015 08:15 AM, Siddhesh Poyarekar wrote:
> On Fri, Jul 17, 2015 at 07:28:45AM +0200, Florian Weimer wrote:
>> Can you show the ldd output with this patch applied on some files which
>> crashed before? Is it useful at all?
>
> I'll have to build one since I can't find any from a brief look
> through files I have on my system. The examples were mostly the
> .debug files IIRC, so that has more to do with not crashing than
> actually providing any useful information.

Useful information would be that it's a debuginfo file. Showing no output or garbage is hardly more helpful than crashing because the user still doesn't know what's wrong. They won't even report a bug, so we can't help them.

>> Is there a way that an ELF file is corrupted in this way, but can still
>> be loaded correctly? I wonder if this change will break currently
>> working programs. I do think we need to support loading corrupted ELF
>> files indefinitely if there ever was a binutils version producing them.
>
> I don't know of any valid ELF programs that broke due to this. The
> patch (AFAICT) has been in Fedora for about 3 years, so if something
> had to break, it should have been known by now.

Unfortunately, Fedora coverage for non-i386/x86_64 and proprietary legacy applications is quite poor.

In this check,

    + if (__glibc_unlikely (ph->p_offset + ph->p_filesz > st.st_size))

do p_offset and p_filesz correspond to the program header values in the file, or have they already been modified?

Looking at the unmodified program header values (as reported by elfutils and the gelf_getphdr function) in a fairly large set of RPMs, I do not see production ELF files (as opposed to debuginfo files or firmware images) where this constraint is violated (for type 1 headers, that is). The RPMs are from every minor Red Hat Enterprise Linux release going back to version 4 and from recent Fedora releases. It's not a complete set of everything shipped by Red Hat or Fedora by a long shot, but it strongly suggests that we did not have a recent bug introducing systematic corruption.

One more question, regarding the new check: Is there already a check that the addition does not overflow?
On 07/17/2015 10:03 AM, Ondřej Bílka wrote:
> On Fri, Jul 17, 2015 at 09:43:29AM +0200, Florian Weimer wrote:
>> On 07/17/2015 09:35 AM, Ondřej Bílka wrote:
>>> Also, could you explain how this is a security risk? If that needs
>>> modification of the ELF, then an attacker could run arbitrary commands
>>> by overriding main, for example.
>>
>> Just because I comment on a thread doesn't mean I consider it
>> security-related.
>>
>> Or what do you mean?
>
> Well, here I got a bit distracted: I was reading the thread, started a
> reply, and then attached what I wanted to ask to the wrong post. I
> wanted to reply instead to what Carlos wrote:

Okay, well, I think a safe ldd should be a goal, but I don't think we can currently make such a commitment.
> On Jul 17, 2015, at 12:02 PM, Carlos O'Donell <carlos@redhat.com> wrote: > >> On 07/16/2015 11:28 PM, Siddhesh Poyarekar wrote: >>> On Thu, Jul 16, 2015 at 11:59:18AM -0400, Carlos O'Donell wrote: >>> This is not the right fix for this problem. The right fix has not >>> been attempted because it involves someone doing some real leg work >>> to gather consensus. This fix adds complex checking in ld.so for >>> minimal gain, and eventually you'll get a debuginfo file that is >>> different again in some odd way. >> >> This is not specifically about being able to read debug files, nor is >> it about ldd. It just happens to be ldd (ld.so [--verify|--list]) >> that crashed, but the offending code is bang in the middle of generic >> ld.so code that can potentially be exploited when running arbitrary >> binaries. While it is true that one should not run arbitrary code >> anyway, it shouldn't be an excuse for not fixing bugs. I don't see >> the point of not adding such checks as they come up; performance is an >> excuse made quite regularly, but what is the actual cost of such >> checks? > > Is this really a bug? > > The cost of the checks is more than just performance, but also maintenance > and support of that code into the future. It is about testing that code, > making sure we exercise those weird conditional failure paths which we > thought would be a good thing to add. Each of these checks is an additional > complication, added branch, and conditional in already complicated code. > I would like to see the core loader as simple as possible and to expect > as correct as possible input. Yes but the cost is cheaper than someone accidentally doing a dlopen of a file and causing a crash rather than retuning malformed elf file. I suspect that is a way how to reproduce this failure. Thanks, Andrew > > If I ask you for a test case for bug 18685, do we have one we can checkin > to test this code? That goes a long way to helping maintain this code. 
>
> If ldd should not be run on untrusted input... then why are we looking at this patch?  I think the fact on the ground is that ldd *is* run on untrusted input and eu-ldd isn't ready, and we face the grim specter of security issues.  However, if we're talking security issues, there are a million ways I could craft relocs to overwrite and assemble ROP widgets to execute a custom program that would run crafty exploits.  Running ldd is equivalent to ./app.
>
> See: "DEFCON 20: Programming Weird Machines with ELF Metadata"
> https://www.youtube.com/watch?v=YgtxxLCVD-o
>
> So again, we can never support real ld.so running or processing untrusted input?  The real fix becomes: Don't use ld.so, don't process relocs, don't run anything, just analyze.
>
>> To be clear, I am not against having an eu-ldd, but that shouldn't be an excuse for not patching ld.so.  Things that don't crash on eu-ldd should not crash on ld.so.
>
> To fix or not to fix ld.so should be done on the merit of the individual change.
>
> I disagree that eu-ldd and ld.so should both be equally robust against bad input.  Just consider the myriad of problems the binary could have, the fuzzing we put libelf through; all of that is costly verification that ld.so should not do.
>
>> Oh, and did I mention that eu-ldd (and most of elfutils) should ideally be written in an interpreted language (cough*python*cough) so that we reduce the attack surface on them?
>
> How does python reduce the attack surface?  I've already started eu-ldd in C because it needs to know quite a bit about the low-level pieces and it was easiest for me to do this in C (headers, constants etc).  However, from an "independent implementation" perspective, writing it in python would make it truly orthogonal to the C version.
>
> Any other benefits to python?
>
> Cheers,
> Carlos.
On Fri, Jul 17, 2015 at 01:24:05PM +0200, Florian Weimer wrote:
> Useful information would be that it's a debuginfo file.  Showing no output or garbage is hardly more helpful than crashing, because the user still doesn't know what's wrong.  They won't even report a bug, so we can't help them.

The trouble is, you don't know for sure that it is a debuginfo file.  That's where Carlos talked about the need for a new flag.

> Unfortunately, Fedora coverage for non-i386/x86_64 and proprietary legacy applications is quite poor.

The patch has been in rhel-6 for a similar amount of time as well, i.e. since before I started maintaining the tree.

> In this check,
>
> + if (__glibc_unlikely (ph->p_offset + ph->p_filesz > st.st_size))
>
> do p_offset and p_filesz correspond to the program header values in the file, or have they already been modified?

They correspond to the program header values; why do you think they have been modified?

> Looking at the unmodified program header values (as reported by elfutils and the gelf_getphdr function) in a fairly large set of RPMs, I do not see production ELF files (as opposed to debuginfo files or firmware images) where this constraint is violated (for type 1 headers, that is).  The RPMs are from every minor Red Hat Enterprise Linux release going back to version 4 and from recent Fedora releases.  It's not a complete set of everything shipped by Red Hat or Fedora by a long shot, but it strongly suggests that we did not have a recent bug introducing systematic corruption.

Yes, ELF files (minus debuginfo) built by our assemblers and linkers should not have this issue.

> One more question, regarding the new check: Is there already a check that the addition does not overflow?

There isn't.  I intend to add it, but it doesn't seem like there's agreement on including this patch at all.

Siddhesh
On 07/17/2015 02:21 PM, Siddhesh Poyarekar wrote:
> On Fri, Jul 17, 2015 at 01:24:05PM +0200, Florian Weimer wrote:
>> Useful information would be that it's a debuginfo file.  Showing no output or garbage is hardly more helpful than crashing, because the user still doesn't know what's wrong.  They won't even report a bug, so we can't help them.
>
> The trouble is, you don't know for sure that it is a debuginfo file.  That's where Carlos talked about the need for a new flag.
>
>> Unfortunately, Fedora coverage for non-i386/x86_64 and proprietary legacy applications is quite poor.
>
> The patch has been in rhel-6 for a similar amount of time as well, i.e. since before I started maintaining the tree.

Okay, good to know.  This addresses my concerns about the backwards compatibility impact of this change.  Consider them withdrawn.

>> In this check,
>>
>> + if (__glibc_unlikely (ph->p_offset + ph->p_filesz > st.st_size))
>>
>> do p_offset and p_filesz correspond to the program header values in the file, or have they already been modified?
>
> They correspond to the program header values; why do you think they have been modified?

No particular reason, I just wanted to be sure. :-)

>> One more question, regarding the new check: Is there already a check that the addition does not overflow?
>
> There isn't.  I intend to add it, but it doesn't seem like there's agreement on including this patch at all.

Oh well, understood.  (I was mainly worried about the compatibility impact.)
On 07/17/2015 08:00 AM, pinskia@gmail.com wrote:
>> Is this really a bug?
>>
>> The cost of the checks is more than just performance, but also maintenance and support of that code into the future.  It is about testing that code, making sure we exercise those weird conditional failure paths which we thought would be a good thing to add.  Each of these checks is an additional complication, added branch, and conditional in already complicated code.  I would like to see the core loader as simple as possible, and to expect input that is as correct as possible.
>
> Yes, but the cost is cheaper than someone accidentally doing a dlopen of a file and causing a crash rather than returning a malformed-ELF-file error.  I suspect that is one way to reproduce this failure.

Andrew,

Sorry, could you please clarify: Cheaper for the glibc community, or cheaper for the user?

Could you also clarify what you mean by "returning a malformed ELF file error"?

Cheers,
Carlos.
> We don't know the cost of verification yet.  We fear that it will be too much, but we have no idea how much that 'too much' is.  If it is a 5% penalty on load time, IMO it is fine for the hardening it provides.  Maybe we could provide a build or runtime flag to bypass those checks, but that is an added complication that is not worth the 5% IMO.

The design goal of the dynamic loader is to consume correctly formed ELF files and to assemble an in-memory image of the application for execution by the operating system and hardware.

The dynamic loader will not assume the ELF files are corrupt, and can proceed directly to use any of the information provided and encoded in the format in order to carry out its operations as quickly and efficiently as possible.

The dynamic loader implementation will be as minimal as possible to support auditing, code coverage, unit testing, and regression testing.  This means that any added code that is not part of the above purposes goes against the design goals of the dynamic loader.

There are no security risks in running this way.  The only security risk is that you ran a binary from an untrusted source, or that the underlying transport system was unreliable and corrupted the binary.  There are tools to verify the binary, and if there aren't, they can be written using interpreted languages like Python and ELF format handling libraries like libelf.

---

The patch in this thread...

https://www.sourceware.org/ml/libc-alpha/2015-07/msg00480.html

... violates the design goals of the dynamic loader by slowing it down, even infinitesimally (since design goals are pedantic things), against the goal of quick and efficient execution of binaries.  The additional code adds to the maintenance burden of the dynamic loader without meeting the goals of the loader.

Arguing that it is a bug is treading on thin ice, since the behaviour does not violate the design goals as stated.
We can and should discuss the design goals of the loader, since they are at the core of this discussion.  Not listed above is: What diagnostics is the loader to provide if the ELF file is faulty?  Any?  A crash does not seem like a good diagnostic?

I argue that a crash is a fine diagnostic, that it is not the job of the dynamic loader to verify the integrity of the binary or libraries, and that we should write other tools for that purpose which could potentially be leveraged by the dynamic loader running in a strict runtime verification mode (another discussion to be had).

The course of action I recommend is based on the reality of our situation and the true causes of crashes in ld.so, which mostly happen to be distro tooling issues and running ldd against things which are not complete ELF files.

Next steps:

* Work to get ET_* markings for debuginfo files, to avoid the hacks we have presently thought about adding to the dynamic loader.

* Work to complete eu-ldd to replace elf/ldd.bash.in, and then work to remove or rewrite the ld.so trace support to provide a developer-centric interface for debugging that is not restricted by a need to support ldd.

* Work to create a new project, e.g. eu-verifyelf (if it doesn't exist), which performs strict ELF file verification and can, for example, consume the output of an ld.so trace to verify dlmopen/dlopen calls as well (eu-ldd could also get this support).

* Lastly, add a mode where ld.so calls out to eu-verifyelf in some way to verify the initial binaries and libraries when run in some strict ELF checking mode.  This cost is borne once at startup and every time we call dlopen/dlmopen.

Cheers,
Carlos.
I concur with this analysis of what rtld is and should be. I concur with the goal of getting ldd support out of rtld proper. The details beyond that need more discussion before becoming specific goals (to me).
I disagree with the design goals as they're stated.  I won't waste time with this, though, because I don't see anybody agreeing with my more conservative stand on how robust the dynamic linker should be.

Siddhesh
> I disagree with the design goals as they're stated.  I won't waste time with this though because I don't see anybody agreeing with my more conservative stand on how robust the dynamic linker should be.

I think we can discuss it further to understand your rationale better, if you care at all.  Do you disagree with the notion of getting to an ldd that does not involve running rtld?  If you intend to support the ldd case, then the motivation for graceful handling of bad ELF files is clear.  If that is not the sole motivation, then I would like to hear you elaborate on what other motivations lead you to wanting this sort of change in rtld.

Thanks,
Roland
On Fri, Jul 17, 2015 at 07:15:15PM -0700, Roland McGrath wrote:
> I think we can discuss it further to understand your rationale better, if you care at all.  Do you disagree with the notion of getting to an ldd that does not involve running rtld?  If you intend to support the

No, I am very much in favour of getting ldd out of libc altogether; in fact I am of the opinion that we remove those options completely from the dynamic linker once we have a usable alternative.

> ldd case, then the motivation for graceful handling of bad ELF files is clear.  If that is not the sole motivation, then I would like to hear you elaborate on what other motivations lead you to wanting this sort of change in rtld.

My motivation for wanting this sort of change is primarily to do with the way the dynamic linker would behave if it weren't patched, i.e. it would access or modify arbitrary addresses and either crash or do something that it is not intended to do.  We have a reproducer with ldd, but the code path is not exclusive to ldd, i.e. ld.so <binary> could trigger this as well.

We have some validations in place already, so it is not clear to me what the criteria for selecting them are; at present the selection seems arbitrary.  If there is a clear guideline for what kind of sanity checks are OK for the linker and what aren't, I can live (albeit slightly uncomfortably, since I still cannot reconcile with arbitrary memory accesses being expected behaviour) with not adding such a check.

Siddhesh
On Sat, Jul 18, 2015 at 08:19:53AM +0530, Siddhesh Poyarekar wrote:
> We have some validations in place already, so it is not clear to me what the criteria for selecting them is and at present it seems arbitrary.  If there is a clear guideline for what kind of sanity

OK, so I took another look, and the only validation that seems kinda related is the one that detects the wrong ELF class.  That too is not specifically a protection against invalid ELF; it's just not something this linker runs.  So I guess it is easy to claim that ld.so assumes that the ELF is valid.

I withdraw my patch if the general idea of allowing the linker to do arbitrary things or crash on invalid ELF is acceptable.

Siddhesh
On Fri, Jul 17, 2015 at 12:47:44PM -0400, Carlos O'Donell wrote:
>> We don't know the cost of verification yet.  We fear that it will be too much, but we have no idea how much that 'too much' is.  If it is a 5% penalty on load time, IMO it is fine for the hardening it provides.  Maybe we could provide a build or runtime flag to bypass those checks, but that is an added complication that is not worth the 5% IMO.
>
> The design goal of the dynamic loader is to consume correctly formed ELF files and to assemble an in-memory image of the application for execution by the operating system and hardware.
>
> The dynamic loader will not assume the ELF files are corrupt, and can proceed directly to use any of the information provided and encoded in the format in order to carry out its operations as quickly and efficiently as possible.

Note that any attempt to validate ELF contents at runtime is utterly useless unless you abandon mmap and read the file contents into memory instead.  Since Linux lacks MAP_COPY and only has MAP_PRIVATE or MAP_SHARED, there is always a risk that the file will be modified after validation but before use.

Rich
* Carlos O'Donell:

> The design goal of the dynamic loader is to consume correctly formed ELF files and to assemble an in-memory image of the application for execution by the operating system and hardware.
>
> The dynamic loader will not assume the ELF files are corrupt, and can proceed directly to use any of the information provided and encoded in the format in order to carry out its operations as quickly and efficiently as possible.
>
> The dynamic loader implementation will be as minimal as possible to support auditing, code coverage, unit testing, and regression testing.  This means that any added code that is not part of the above purposes goes against the design goals of the dynamic loader.

Grudgingly, I agree.

> There are no security risks in running this way.

Correct, and I really hope it stays this way.  There is considerable interest in implementing code signing, but I have philosophical objections to that, because it can only be used to take away freedoms from users (and device owners).
On 07/17/2015 10:49 PM, Siddhesh Poyarekar wrote:
> On Fri, Jul 17, 2015 at 07:15:15PM -0700, Roland McGrath wrote:
>> I think we can discuss it further to understand your rationale better, if you care at all.  Do you disagree with the notion of getting to an ldd that does not involve running rtld?  If you intend to support the
>
> No, I am very much in favour of getting ldd out of libc altogether; in fact I am of the opinion that we remove those options completely from the dynamic linker once we have a usable alternative.

I agree completely, but we would need to replace it with something like stap-based probe tracing, which compiles to nops normally but allows an external program like systemtap to be used to trace library loading with all the same detailed data as before, for developers only (and ship the stap script with libc.so.6).

>> ldd case, then the motivation for graceful handling of bad ELF files is clear.  If that is not the sole motivation, then I would like to hear you elaborate on what other motivations lead you to wanting this sort of change in rtld.
>
> My motivation for wanting this sort of change is primarily to do with the way the dynamic linker would behave if it weren't patched, i.e. it would access or modify arbitrary addresses and either crash or do something that it is not intended to do.  We have a reproducer with ldd, but the code path is not exclusive to ldd, i.e. ld.so <binary> could trigger this as well.
>
> We have some validations in place already, so it is not clear to me what the criteria for selecting them is and at present it seems arbitrary.  If there is a clear guideline for what kind of sanity checks are OK for the linker and what aren't, I can live (albeit slightly uncomfortably since I still cannot reconcile with arbitrary memory accesses being expected behaviour) with not adding such a check.

It is important to have these discussions.
We should have community members with different viewpoints, and be ready to use those views to better serve our users should the needs of those users change.

I too find a certain degree of cognitive dissonance between what I as a developer want in ld.so and what our users want.  I want a robust ld.so, but our users want performance.  Therefore I judge the best approach to be two distinct tools, e.g. ld.so and ldd (where ldd has all or most of the trace features of ld.so).

Cheers,
Carlos.
On Mon, Jul 20, 2015 at 08:56:08AM -0400, Carlos O'Donell wrote:
> I agree completely, but we would need to replace it with something like stap-based probe tracing, which compiles to nops normally but allows an external program like systemtap to be used to trace library loading with all the same detailed data as before, for developers only (and ship the stap script with libc.so.6).

I don't understand - we already have stap probe points in ld.so that do this, and we also have LD_DEBUG, which gives the ability to trace ld.so activities the conventional way, i.e. by dumping to a file or stderr.  Neither has anything to do with ldd.

> It is important to have these discussions.  We should have community members with different viewpoints, and be ready to use those views to better serve our users should the needs of those users change.

I have kinda reconciled with this since I realized that we don't already validate ELF binaries - I had misread the validations.

Siddhesh
On 07/20/2015 09:51 AM, Siddhesh Poyarekar wrote:
> On Mon, Jul 20, 2015 at 08:56:08AM -0400, Carlos O'Donell wrote:
>> I agree completely, but we would need to replace it with something like stap-based probe tracing, which compiles to nops normally but allows an external program like systemtap to be used to trace library loading with all the same detailed data as before, for developers only (and ship the stap script with libc.so.6).
>
> I don't understand - we already have stap probe points in ld.so that do this, and we also have LD_DEBUG, which gives the ability to trace ld.so activities the conventional way, i.e. by dumping to a file or stderr.  Neither has anything to do with ldd.

The latter two are code in the core loader that is more difficult to audit than stap-based probes?  If we removed all trace and debug code and moved to stap, we would have something that is even easier to audit?

Cheers,
Carlos.
On Mon, Jul 20, 2015 at 10:19:05AM -0400, Carlos O'Donell wrote:
> The latter two are code in the core loader that is more difficult to audit than stap-based probes?  If we removed all trace and debug code and moved to stap, we would have something that is even easier to audit?

Possibly; if stap with dyninst becomes stable enough (I had a few difficulties using it a few months ago, I don't remember why), then we can tout that as the replacement.

Siddhesh
On Fri, Jul 17, 2015 at 11:47 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>
> [...]
> The dynamic loader implementation will be as minimal as possible to support auditing, code coverage, unit testing, and regression testing.  This means that any added code that is not part of the above purposes goes against the design goals of the dynamic loader.
>
> There are no security risks in running this way.  The only security risk is that you ran a binary from an untrusted source, or that the underlying transport system was unreliable and corrupted the binary.

I'm just a glibc n00b, so I may be missing something, but it seems like the stated design goal results in the code being more brittle, which leads to security risks.  Even if every code reviewer has convinced themselves that an unchecked pointer dereference is safe, it is still possible for things to go wrong in a way that nobody thought of - and in TLS, for instance, we are still finding and fixing holes years after the code was written.

It used to be that we had to take these sorts of chances so as to get reasonable performance, but I don't think that has been necessary for a while.  Do people have statistics showing ld.so dominating execution time?

Stan
On 07/20/2015 11:31 AM, Stan Shebs wrote:
> On Fri, Jul 17, 2015 at 11:47 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>>
>> [...]
>> The dynamic loader implementation will be as minimal as possible to support auditing, code coverage, unit testing, and regression testing.  This means that any added code that is not part of the above purposes goes against the design goals of the dynamic loader.
>>
>> There are no security risks in running this way.  The only security risk is that you ran a binary from an untrusted source, or that the underlying transport system was unreliable and corrupted the binary.
>
> I'm just a glibc n00b, so I may be missing something, but it seems like the stated design goal results in the code being more brittle, which leads to security risks.  Even if every code reviewer has convinced themselves that an unchecked pointer dereference is safe, it is still possible for things to go wrong in a way that nobody thought of - and in TLS, for instance, we are still finding and fixing holes years after the code was written.

The design goal sets the bar high, and forces you to argue for your change with solid evidence and fact.  It's just a goal, after all, and we often fall short.

> It used to be that we had to take these sorts of chances so as to get reasonable performance, but I don't think that has been necessary for a while.  Do people have statistics showing ld.so dominating execution time?

Isn't this a self-fulfilling prophecy, though?  We don't show up in execution time precisely because of the measures we take? :-)

Cheers,
Carlos.
diff --git a/elf/dl-load.c b/elf/dl-load.c
index 0c052e4..f8aaa60 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -880,7 +880,8 @@ _dl_map_object_from_fd (const char *name, int fd, struct filebuf *fbp,
   /* Get file information.  */
   struct r_file_id id;
-  if (__glibc_unlikely (!_dl_get_file_id (fd, &id)))
+  struct stat64 st;
+  if (__glibc_unlikely (!_dl_get_file_id (fd, &id, &st)))
     {
       errstring = N_("cannot stat shared object");
     call_lose_errno:
@@ -1076,6 +1077,17 @@ _dl_map_object_from_fd (const char *name, int fd, struct filebuf *fbp,
 	      goto call_lose;
 	    }
 
+	  if (__glibc_unlikely (ph->p_offset + ph->p_filesz > st.st_size))
+	    {
+	      /* If the segment requires zeroing of part of its last
+		 page, we'll crash when accessing the unmapped page.
+		 There's still a possibility of a race, if the shared
+		 object is truncated between the fxstat above and the
+		 memset below.  */
+	      errstring = N_("ELF load command past end of file");
+	      goto call_lose;
+	    }
+
 	  struct loadcmd *c = &loadcmds[nloadcmds++];
 	  c->mapstart = ph->p_vaddr & ~(GLRO(dl_pagesize) - 1);
 	  c->mapend = ((ph->p_vaddr + ph->p_filesz + GLRO(dl_pagesize) - 1)
diff --git a/sysdeps/generic/dl-fileid.h b/sysdeps/generic/dl-fileid.h
index 2cbd21d..9b7f410 100644
--- a/sysdeps/generic/dl-fileid.h
+++ b/sysdeps/generic/dl-fileid.h
@@ -29,7 +29,8 @@ struct r_file_id
    On error, returns false, with errno set.  */
 static inline bool
 _dl_get_file_id (int fd __attribute__ ((unused)),
-		 struct r_file_id *id __attribute__ ((unused)))
+		 struct r_file_id *id __attribute__ ((unused)),
+		 struct stat64 *st __attribute__ ((unused)))
 {
   return true;
 }
diff --git a/sysdeps/posix/dl-fileid.h b/sysdeps/posix/dl-fileid.h
index d0d5436..7115c3b 100644
--- a/sysdeps/posix/dl-fileid.h
+++ b/sysdeps/posix/dl-fileid.h
@@ -27,18 +27,16 @@ struct r_file_id
   ino64_t ino;
 };
 
-/* Sample FD to fill in *ID.  Returns true on success.
+/* Sample FD to fill in *ID and *ST.  Returns true on success.
    On error, returns false, with errno set.  */
 static inline bool
-_dl_get_file_id (int fd, struct r_file_id *id)
+_dl_get_file_id (int fd, struct r_file_id *id, struct stat64 *st)
 {
-  struct stat64 st;
-
-  if (__glibc_unlikely (__fxstat64 (_STAT_VER, fd, &st) < 0))
+  if (__glibc_unlikely (__fxstat64 (_STAT_VER, fd, st) < 0))
     return false;
-  id->dev = st.st_dev;
-  id->ino = st.st_ino;
+  id->dev = st->st_dev;
+  id->ino = st->st_ino;
   return true;
 }