Message ID | 1314024650-28510-1-git-send-email-coreyb@linux.vnet.ibm.com |
---|---|
State | New |
Headers | show |
I'm still totally against this. FD passing is a nice feature for sandboxing, but the passing should be between closely cooperating programs. We'll need a tool shipped from the qemu source tree to open and set up the FDs, and not someone external. With that setup in place we can use a protocol similar to the various OpenBSD privilegue separated deaemons to also allow reopening / snapshots / etc. Opening fds in libvirt and passing them into qemu is exactly the wrong way, and just cements the current horrors where libvirt duplicates parsing of image format headers.
On 08/22/2011 11:38 AM, Christoph Hellwig wrote: > I'm still totally against this. FD passing is a nice feature for sandboxing, > but the passing should be between closely cooperating programs. We'll > need a tool shipped from the qemu source tree to open and set up the > FDs, and not someone external. With that setup in place we can use > a protocol similar to the various OpenBSD privilegue separated deaemons > to also allow reopening / snapshots / etc. > > Opening fds in libvirt and passing them into qemu is exactly the wrong way, > and just cements the current horrors where libvirt duplicates parsing > of image format headers. > This is following suit with exiting support that passes an fd for a TAP interface. Libvirt already passes a file descriptor to Qemu via '-net tap,fd='. Are you against that as well? Regards, Corey
On Mon, Aug 22, 2011 at 05:38:20PM +0200, Christoph Hellwig wrote: > I'm still totally against this. FD passing is a nice feature for sandboxing, > but the passing should be between closely cooperating programs. We'll > need a tool shipped from the qemu source tree to open and set up the > FDs, and not someone external. With that setup in place we can use > a protocol similar to the various OpenBSD privilegue separated deaemons > to also allow reopening / snapshots / etc. > > Opening fds in libvirt and passing them into qemu is exactly the wrong way, > and just cements the current horrors where libvirt duplicates parsing > of image format headers. The primary goal of this work is to allow QEMU to use a file, without giving it permission to open the file. This lets us cope with the current limitations of NFS wrt SELinux labelling. Where ordinarily we'd relabel the disk file to allow QEMU to open them, on NFS we can't do that. So we setup a SELinux policy that allows QEMU to read any NFS files that it is passed, but not actually open them. This allows secure use of QEMU with NFS, without having to solve the NFS + SELinux labelling problems, which is still a long term ongoing effort by NFS vendors. Whether or not libvirt parses image format headers, is a completely unrelated. Consider if libvirt did not parse image formats and instead required the mgmt app to pass in details of all backing files. We still have the problem of how to securely grant just one QEMU instance access to the files. This still needs the FD passing support being proposed here to cope with NFS. So the question of whether or not libvirt should be parsing image format headers is completely irrelevant to acceptability of this FD passing support. Regards, Daniel
On 08/22/2011 11:24 AM, Daniel P. Berrange wrote: > On Mon, Aug 22, 2011 at 05:38:20PM +0200, Christoph Hellwig wrote: >> I'm still totally against this. FD passing is a nice feature for sandboxing, >> but the passing should be between closely cooperating programs. We'll >> need a tool shipped from the qemu source tree to open and set up the >> FDs, and not someone external. With that setup in place we can use >> a protocol similar to the various OpenBSD privilegue separated deaemons >> to also allow reopening / snapshots / etc. >> >> Opening fds in libvirt and passing them into qemu is exactly the wrong way, >> and just cements the current horrors where libvirt duplicates parsing >> of image format headers. > > The primary goal of this work is to allow QEMU to use a file, without > giving it permission to open the file. This lets us cope with the current > limitations of NFS wrt SELinux labelling. Where ordinarily we'd relabel > the disk file to allow QEMU to open them, on NFS we can't do that. So we > setup a SELinux policy that allows QEMU to read any NFS files that it is > passed, but not actually open them. This allows secure use of QEMU with > NFS, without having to solve the NFS + SELinux labelling problems, which > is still a long term ongoing effort by NFS vendors. I think you miss the point Christoph is making. Christoph is suggesting that we have two qemu executables, qemu-fe and qemu-system-x86_64. qemu-fe would be smaller and would carry more rights than qemu-system-x86_64. But I don't think this fixes the problem. Something needs to do dynamic labelling of the backing files to implement a Chinese Wall MAC policy. In order to do that, something needs to parse the image formats. I don't think it makes sense to have qemu-fe do dynamic labelling. You certainly could avoid the fd passing by having qemu-fe do the open though and just let qemu-fe run without the restricted security context. But libvirt would still need to parse image files. Regards, Anthony Liguori > > Whether or not libvirt parses image format headers, is a completely > unrelated. Consider if libvirt did not parse image formats and instead > required the mgmt app to pass in details of all backing files. We still > have the problem of how to securely grant just one QEMU instance access > to the files. This still needs the FD passing support being proposed > here to cope with NFS. > > So the question of whether or not libvirt should be parsing image format > headers is completely irrelevant to acceptability of this FD passing > support. > > Regards, > Daniel
On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote: > On 08/22/2011 11:24 AM, Daniel P. Berrange wrote: > >On Mon, Aug 22, 2011 at 05:38:20PM +0200, Christoph Hellwig wrote: > >>I'm still totally against this. FD passing is a nice feature for sandboxing, > >>but the passing should be between closely cooperating programs. We'll > >>need a tool shipped from the qemu source tree to open and set up the > >>FDs, and not someone external. With that setup in place we can use > >>a protocol similar to the various OpenBSD privilegue separated deaemons > >>to also allow reopening / snapshots / etc. > >> > >>Opening fds in libvirt and passing them into qemu is exactly the wrong way, > >>and just cements the current horrors where libvirt duplicates parsing > >>of image format headers. > > > >The primary goal of this work is to allow QEMU to use a file, without > >giving it permission to open the file. This lets us cope with the current > >limitations of NFS wrt SELinux labelling. Where ordinarily we'd relabel > >the disk file to allow QEMU to open them, on NFS we can't do that. So we > >setup a SELinux policy that allows QEMU to read any NFS files that it is > >passed, but not actually open them. This allows secure use of QEMU with > >NFS, without having to solve the NFS + SELinux labelling problems, which > >is still a long term ongoing effort by NFS vendors. > > I think you miss the point Christoph is making. > > Christoph is suggesting that we have two qemu executables, qemu-fe > and qemu-system-x86_64. qemu-fe would be smaller and would carry > more rights than qemu-system-x86_64. > > But I don't think this fixes the problem. Something needs to do > dynamic labelling of the backing files to implement a Chinese Wall > MAC policy. In order to do that, something needs to parse the image > formats. > > I don't think it makes sense to have qemu-fe do dynamic labelling. > You certainly could avoid the fd passing by having qemu-fe do the > open though and just let qemu-fe run without the restricted security > context. qemu-fe would also not be entirely simple, because it will need to act as a proxy for the monitor, in order to make hotplug work. ie the mgmt app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would then have to open the file and send 'drive_add fd:NN' onto the real QEMU, and then pass the results on back. In addition qemu-fe would still have to be under some kind of restricted security context for it to be acceptable. This is going to want to be as locked down as possible. So I'd see that you'd likely end up with the qemu-fe security policy being identical to the qemu security policy, with the exception that it would be allowed to open files on NFS without needing them to be labelled. So I don't really see that all this gives us any tangible benefits over just allowing the mgmt app to pass in the FDs directly. > But libvirt would still need to parse image files. Not neccessarily. As mentioned below, it is entirely possible to enable the mgmt app to pass in details of the backing files, at which point no image parsing is required by libvirt. Hence my assertion that the question of who does image parsing is irrelevant to this discussion. > >Whether or not libvirt parses image format headers, is a completely > >unrelated. Consider if libvirt did not parse image formats and instead > >required the mgmt app to pass in details of all backing files. We still > >have the problem of how to securely grant just one QEMU instance access > >to the files. This still needs the FD passing support being proposed > >here to cope with NFS. > > > >So the question of whether or not libvirt should be parsing image format > >headers is completely irrelevant to acceptability of this FD passing > >support. Daniel
On 08/22/2011 11:50 AM, Daniel P. Berrange wrote: > On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote: >> I don't think it makes sense to have qemu-fe do dynamic labelling. >> You certainly could avoid the fd passing by having qemu-fe do the >> open though and just let qemu-fe run without the restricted security >> context. > > qemu-fe would also not be entirely simple, Indeed. > because it will need to act > as a proxy for the monitor, in order to make hotplug work. ie the mgmt > app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would > then have to open the file and send 'drive_add fd:NN' onto the real QEMU, > and then pass the results on back. > > In addition qemu-fe would still have to be under some kind of restricted > security context for it to be acceptable. This is going to want to be as > locked down as possible. I think there's got to be some give and take here. It should at least be as locked down as libvirtd. From a security point of view, we should be able to agree that we want libvirtd to be as locked down as possible. But there shouldn't be a hard requirement to lock down qemu-fe more than libvirtd. Instead, the requirement should be for qemu-fe to be as/more vigilant in not trusting qemu-system-x86_64 as libvirtd is. The fundamental problem here, is that there is some logic in libvirtd that rightly belongs in QEMU. In order to preserve the security model, that means that we're going to have to take a subsection of QEMU and trust it more. > So I'd see that you'd likely end up with the > qemu-fe security policy being identical to the qemu security policy, Then there's no point in doing qemu-fe. qemu-fe should be thought of as QEMU supplied libvirtd plugin. > with the exception that it would be allowed to open files on NFS without > needing them to be labelled. So I don't really see that all this gives us > any tangible benefits over just allowing the mgmt app to pass in the FDs > directly. > >> But libvirt would still need to parse image files. > > Not neccessarily. As mentioned below, it is entirely possible to > enable the mgmt app to pass in details of the backing files, at > which point no image parsing is required by libvirt. Hence my > assertion that the question of who does image parsing is irrelevant > to this discussion. That's certainly true. Regards, Anthony Liguori
On 08/22/2011 01:25 PM, Anthony Liguori wrote: > On 08/22/2011 11:50 AM, Daniel P. Berrange wrote: >> On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote: >>> I don't think it makes sense to have qemu-fe do dynamic labelling. >>> You certainly could avoid the fd passing by having qemu-fe do the >>> open though and just let qemu-fe run without the restricted security >>> context. >> >> qemu-fe would also not be entirely simple, > > Indeed. > I do like the idea of a privileged qemu-fe performing the open and passing the fd to a restricted qemu. However, I get the impression that this won't get delivered nearly as quickly as fd: passing could be. How soon do we need image isolation for NFS? Btw, this sounds similar to what Blue Swirl recommended here on v1 of this patch: http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02187.html Regards, Corey >> because it will need to act >> as a proxy for the monitor, in order to make hotplug work. ie the mgmt >> app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would >> then have to open the file and send 'drive_add fd:NN' onto the real QEMU, >> and then pass the results on back. >> >> In addition qemu-fe would still have to be under some kind of restricted >> security context for it to be acceptable. This is going to want to be as >> locked down as possible. > > I think there's got to be some give and take here. > > It should at least be as locked down as libvirtd. From a security point > of view, we should be able to agree that we want libvirtd to be as > locked down as possible. > > But there shouldn't be a hard requirement to lock down qemu-fe more than > libvirtd. Instead, the requirement should be for qemu-fe to be as/more > vigilant in not trusting qemu-system-x86_64 as libvirtd is. > > The fundamental problem here, is that there is some logic in libvirtd > that rightly belongs in QEMU. In order to preserve the security model, > that means that we're going to have to take a subsection of QEMU and > trust it more. > >> So I'd see that you'd likely end up with the >> qemu-fe security policy being identical to the qemu security policy, > > Then there's no point in doing qemu-fe. qemu-fe should be thought of as > QEMU supplied libvirtd plugin. > >> with the exception that it would be allowed to open files on NFS without >> needing them to be labelled. So I don't really see that all this gives us >> any tangible benefits over just allowing the mgmt app to pass in the FDs >> directly. >> >>> But libvirt would still need to parse image files. >> >> Not neccessarily. As mentioned below, it is entirely possible to >> enable the mgmt app to pass in details of the backing files, at >> which point no image parsing is required by libvirt. Hence my >> assertion that the question of who does image parsing is irrelevant >> to this discussion. > > That's certainly true. > > Regards, > > Anthony Liguori
On Mon, Aug 22, 2011 at 12:25:25PM -0500, Anthony Liguori wrote: > On 08/22/2011 11:50 AM, Daniel P. Berrange wrote: > >On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote: > >>I don't think it makes sense to have qemu-fe do dynamic labelling. > >>You certainly could avoid the fd passing by having qemu-fe do the > >>open though and just let qemu-fe run without the restricted security > >>context. > > > >qemu-fe would also not be entirely simple, > > Indeed. > > >because it will need to act > >as a proxy for the monitor, in order to make hotplug work. ie the mgmt > >app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would > >then have to open the file and send 'drive_add fd:NN' onto the real QEMU, > >and then pass the results on back. > > > >In addition qemu-fe would still have to be under some kind of restricted > >security context for it to be acceptable. This is going to want to be as > >locked down as possible. > > I think there's got to be some give and take here. > > It should at least be as locked down as libvirtd. From a security > point of view, we should be able to agree that we want libvirtd to > be as locked down as possible. > > But there shouldn't be a hard requirement to lock down qemu-fe more > than libvirtd. Instead, the requirement should be for qemu-fe to be > as/more vigilant in not trusting qemu-system-x86_64 as libvirtd is. > > The fundamental problem here, is that there is some logic in > libvirtd that rightly belongs in QEMU. In order to preserve the > security model, that means that we're going to have to take a > subsection of QEMU and trust it more. Well we have a process that makes security decisions, and a process which applies those security decisions and a process which is confined by those decisions. Currently libvirtd makes & applies the decisions, and qemu is confined. A qemu-fe model would mean that libvirt is making the decisions, but is then relying on qemu-fe to apply them. IMHO that split is undesirable, but that's besides the point, since this is not a decision that needs to be made now. 'qemu-fe' needs to have a way to communicate with the confined process ('qemu-system-XXX') to supply it the resources (file FDs) it needs to access. The requirements of such a comms channel for qemu-fe are going to be the same as those needed by libvirtd talking to QEMU today, or indeed by any process that is applying security decisions to QEMU. So inventing a 'qemu-fe' does not make the need for passing FDs into QEMU go away, nor does it improve/change the overall security of the architecture, it is merely a different architectural choice. It does however require alot more development work to create this new 'qemu-fe' program and get it debugged & generally usable in production deployments So adding FD passing to QEMU blocks creation of a 'qemu-fe' program, but is not dependant on it. Thus we can add FD passing to QEMU today and be a step closer to being able to create a 'qemu-fe' in the future, while also enabling libvirtd & other mgmt apps to take advantage of this to solve the immediate security problem we have with NFS today, without having to wait a months or years for 'qemu-fe' to get into a usable state. Regards, Daniel
On Mon, Aug 22, 2011 at 5:42 PM, Corey Bryant <coreyb@linux.vnet.ibm.com> wrote: > > > On 08/22/2011 01:25 PM, Anthony Liguori wrote: >> >> On 08/22/2011 11:50 AM, Daniel P. Berrange wrote: >>> >>> On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote: >>>> >>>> I don't think it makes sense to have qemu-fe do dynamic labelling. >>>> You certainly could avoid the fd passing by having qemu-fe do the >>>> open though and just let qemu-fe run without the restricted security >>>> context. >>> >>> qemu-fe would also not be entirely simple, >> >> Indeed. >> > > I do like the idea of a privileged qemu-fe performing the open and passing > the fd to a restricted qemu. Me too. > However, I get the impression that this won't > get delivered nearly as quickly as fd: passing could be. How soon do we > need image isolation for NFS? > > Btw, this sounds similar to what Blue Swirl recommended here on v1 of this > patch: http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02187.html I was thinking about simply doing fork() + setuid() at some point and using the FD passing structures directly. But would it bring advantages to have two separate executables, are they different from access control point of view vs. single but forked one? > Regards, > Corey > >>> because it will need to act >>> as a proxy for the monitor, in order to make hotplug work. ie the mgmt >>> app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would >>> then have to open the file and send 'drive_add fd:NN' onto the real QEMU, >>> and then pass the results on back. >>> >>> In addition qemu-fe would still have to be under some kind of restricted >>> security context for it to be acceptable. This is going to want to be as >>> locked down as possible. >> >> I think there's got to be some give and take here. >> >> It should at least be as locked down as libvirtd. From a security point >> of view, we should be able to agree that we want libvirtd to be as >> locked down as possible. >> >> But there shouldn't be a hard requirement to lock down qemu-fe more than >> libvirtd. Instead, the requirement should be for qemu-fe to be as/more >> vigilant in not trusting qemu-system-x86_64 as libvirtd is. >> >> The fundamental problem here, is that there is some logic in libvirtd >> that rightly belongs in QEMU. In order to preserve the security model, >> that means that we're going to have to take a subsection of QEMU and >> trust it more. >> >>> So I'd see that you'd likely end up with the >>> qemu-fe security policy being identical to the qemu security policy, >> >> Then there's no point in doing qemu-fe. qemu-fe should be thought of as >> QEMU supplied libvirtd plugin. >> >>> with the exception that it would be allowed to open files on NFS without >>> needing them to be labelled. So I don't really see that all this gives us >>> any tangible benefits over just allowing the mgmt app to pass in the FDs >>> directly. >>> >>>> But libvirt would still need to parse image files. >>> >>> Not neccessarily. As mentioned below, it is entirely possible to >>> enable the mgmt app to pass in details of the backing files, at >>> which point no image parsing is required by libvirt. Hence my >>> assertion that the question of who does image parsing is irrelevant >>> to this discussion. >> >> That's certainly true. >> >> Regards, >> >> Anthony Liguori > > >
On Mon, Aug 22, 2011 at 6:22 PM, Daniel P. Berrange <berrange@redhat.com> wrote: > On Mon, Aug 22, 2011 at 12:25:25PM -0500, Anthony Liguori wrote: >> On 08/22/2011 11:50 AM, Daniel P. Berrange wrote: >> >On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote: >> >>I don't think it makes sense to have qemu-fe do dynamic labelling. >> >>You certainly could avoid the fd passing by having qemu-fe do the >> >>open though and just let qemu-fe run without the restricted security >> >>context. >> > >> >qemu-fe would also not be entirely simple, >> >> Indeed. >> >> >because it will need to act >> >as a proxy for the monitor, in order to make hotplug work. ie the mgmt >> >app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would >> >then have to open the file and send 'drive_add fd:NN' onto the real QEMU, >> >and then pass the results on back. >> > >> >In addition qemu-fe would still have to be under some kind of restricted >> >security context for it to be acceptable. This is going to want to be as >> >locked down as possible. >> >> I think there's got to be some give and take here. >> >> It should at least be as locked down as libvirtd. From a security >> point of view, we should be able to agree that we want libvirtd to >> be as locked down as possible. >> >> But there shouldn't be a hard requirement to lock down qemu-fe more >> than libvirtd. Instead, the requirement should be for qemu-fe to be >> as/more vigilant in not trusting qemu-system-x86_64 as libvirtd is. >> >> The fundamental problem here, is that there is some logic in >> libvirtd that rightly belongs in QEMU. In order to preserve the >> security model, that means that we're going to have to take a >> subsection of QEMU and trust it more. > > Well we have a process that makes security decisions, and a process > which applies those security decisions and a process which is confined > by those decisions. Currently libvirtd makes & applies the decisions, > and qemu is confined. A qemu-fe model would mean that libvirt is making > the decisions, but is then relying on qemu-fe to apply them. IMHO that > split is undesirable, but that's besides the point, since this is not > a decision that needs to be made now. > > 'qemu-fe' needs to have a way to communicate with the confined process > ('qemu-system-XXX') to supply it the resources (file FDs) it needs to > access. The requirements of such a comms channel for qemu-fe are going > to be the same as those needed by libvirtd talking to QEMU today, or > indeed by any process that is applying security decisions to QEMU. > > So inventing a 'qemu-fe' does not make the need for passing FDs into > QEMU go away, nor does it improve/change the overall security of the > architecture, it is merely a different architectural choice. It does > however require alot more development work to create this new 'qemu-fe' > program and get it debugged & generally usable in production deployments > > So adding FD passing to QEMU blocks creation of a 'qemu-fe' program, > but is not dependant on it. Thus we can add FD passing to QEMU today > and be a step closer to being able to create a 'qemu-fe' in the future, > while also enabling libvirtd & other mgmt apps to take advantage of > this to solve the immediate security problem we have with NFS today, > without having to wait a months or years for 'qemu-fe' to get into a > usable state. The advantage of this qemu-fe approach is that block format internals does not need to be shared between QEMU and libvirt. FD passing can still be useful for other purposes.
On 08/22/2011 01:22 PM, Daniel P. Berrange wrote: > On Mon, Aug 22, 2011 at 12:25:25PM -0500, Anthony Liguori wrote: >> On 08/22/2011 11:50 AM, Daniel P. Berrange wrote: >>> On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote: >>>> I don't think it makes sense to have qemu-fe do dynamic labelling. >>>> You certainly could avoid the fd passing by having qemu-fe do the >>>> open though and just let qemu-fe run without the restricted security >>>> context. >>> >>> qemu-fe would also not be entirely simple, >> >> Indeed. >> >>> because it will need to act >>> as a proxy for the monitor, in order to make hotplug work. ie the mgmt >>> app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would >>> then have to open the file and send 'drive_add fd:NN' onto the real QEMU, >>> and then pass the results on back. >>> >>> In addition qemu-fe would still have to be under some kind of restricted >>> security context for it to be acceptable. This is going to want to be as >>> locked down as possible. >> >> I think there's got to be some give and take here. >> >> It should at least be as locked down as libvirtd. From a security >> point of view, we should be able to agree that we want libvirtd to >> be as locked down as possible. >> >> But there shouldn't be a hard requirement to lock down qemu-fe more >> than libvirtd. Instead, the requirement should be for qemu-fe to be >> as/more vigilant in not trusting qemu-system-x86_64 as libvirtd is. >> >> The fundamental problem here, is that there is some logic in >> libvirtd that rightly belongs in QEMU. In order to preserve the >> security model, that means that we're going to have to take a >> subsection of QEMU and trust it more. > > Well we have a process that makes security decisions, and a process > which applies those security decisions and a process which is confined > by those decisions. Currently libvirtd makes& applies the decisions, > and qemu is confined. A qemu-fe model would mean that libvirt is making > the decisions, but is then relying on qemu-fe to apply them. IMHO that > split is undesirable, but that's besides the point, since this is not > a decision that needs to be made now. > > 'qemu-fe' needs to have a way to communicate with the confined process > ('qemu-system-XXX') to supply it the resources (file FDs) it needs to > access. The requirements of such a comms channel for qemu-fe are going > to be the same as those needed by libvirtd talking to QEMU today, or > indeed by any process that is applying security decisions to QEMU. But the fundamental difference is that libvirtd uses what's ostensible a public, supported interface. That means when we add things like this, we're stuck supporting it for general use cases. It's much more palatable to do these things using a private interface such that we can change these things down the road without worrying about compatibility with third-party tools. Regards, Anthony Liguori
On 08/22/2011 03:25 PM, Anthony Liguori wrote: > On 08/22/2011 01:22 PM, Daniel P. Berrange wrote: >> On Mon, Aug 22, 2011 at 12:25:25PM -0500, Anthony Liguori wrote: >>> On 08/22/2011 11:50 AM, Daniel P. Berrange wrote: >>>> On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote: >>>>> I don't think it makes sense to have qemu-fe do dynamic labelling. >>>>> You certainly could avoid the fd passing by having qemu-fe do the >>>>> open though and just let qemu-fe run without the restricted security >>>>> context. >>>> >>>> qemu-fe would also not be entirely simple, >>> >>> Indeed. >>> >>>> because it will need to act >>>> as a proxy for the monitor, in order to make hotplug work. ie the mgmt >>>> app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would >>>> then have to open the file and send 'drive_add fd:NN' onto the real >>>> QEMU, >>>> and then pass the results on back. >>>> >>>> In addition qemu-fe would still have to be under some kind of >>>> restricted >>>> security context for it to be acceptable. This is going to want to >>>> be as >>>> locked down as possible. >>> >>> I think there's got to be some give and take here. >>> >>> It should at least be as locked down as libvirtd. From a security >>> point of view, we should be able to agree that we want libvirtd to >>> be as locked down as possible. >>> >>> But there shouldn't be a hard requirement to lock down qemu-fe more >>> than libvirtd. Instead, the requirement should be for qemu-fe to be >>> as/more vigilant in not trusting qemu-system-x86_64 as libvirtd is. >>> >>> The fundamental problem here, is that there is some logic in >>> libvirtd that rightly belongs in QEMU. In order to preserve the >>> security model, that means that we're going to have to take a >>> subsection of QEMU and trust it more. >> >> Well we have a process that makes security decisions, and a process >> which applies those security decisions and a process which is confined >> by those decisions. Currently libvirtd makes& applies the decisions, >> and qemu is confined. A qemu-fe model would mean that libvirt is making >> the decisions, but is then relying on qemu-fe to apply them. IMHO that >> split is undesirable, but that's besides the point, since this is not >> a decision that needs to be made now. >> >> 'qemu-fe' needs to have a way to communicate with the confined process >> ('qemu-system-XXX') to supply it the resources (file FDs) it needs to >> access. The requirements of such a comms channel for qemu-fe are going >> to be the same as those needed by libvirtd talking to QEMU today, or >> indeed by any process that is applying security decisions to QEMU. > > But the fundamental difference is that libvirtd uses what's ostensible a > public, supported interface. That means when we add things like this, > we're stuck supporting it for general use cases. > > It's much more palatable to do these things using a private interface > such that we can change these things down the road without worrying > about compatibility with third-party tools. > > Regards, > > Anthony Liguori > Is this a nack for the fd: protocol? Or do we want to implement the fd: protocol as a stepping stone on the way to a privilege-separated qemu model? I know the fd: protocol is not ideal, but it does provide NFS image isolation, perhaps much sooner than privilege-separated qemu can.
On 08/23/2011 09:26 AM, Corey Bryant wrote: > On 08/22/2011 03:25 PM, Anthony Liguori wrote: >> On 08/22/2011 01:22 PM, Daniel P. Berrange wrote: >>> On Mon, Aug 22, 2011 at 12:25:25PM -0500, Anthony Liguori wrote: >>>> On 08/22/2011 11:50 AM, Daniel P. Berrange wrote: >>>>> On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote: >>>>>> I don't think it makes sense to have qemu-fe do dynamic labelling. >>>>>> You certainly could avoid the fd passing by having qemu-fe do the >>>>>> open though and just let qemu-fe run without the restricted security >>>>>> context. >>>>> >>>>> qemu-fe would also not be entirely simple, >>>> >>>> Indeed. >>>> >>>>> because it will need to act >>>>> as a proxy for the monitor, in order to make hotplug work. ie the mgmt >>>>> app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would >>>>> then have to open the file and send 'drive_add fd:NN' onto the real >>>>> QEMU, >>>>> and then pass the results on back. >>>>> >>>>> In addition qemu-fe would still have to be under some kind of >>>>> restricted >>>>> security context for it to be acceptable. This is going to want to >>>>> be as >>>>> locked down as possible. >>>> >>>> I think there's got to be some give and take here. >>>> >>>> It should at least be as locked down as libvirtd. From a security >>>> point of view, we should be able to agree that we want libvirtd to >>>> be as locked down as possible. >>>> >>>> But there shouldn't be a hard requirement to lock down qemu-fe more >>>> than libvirtd. Instead, the requirement should be for qemu-fe to be >>>> as/more vigilant in not trusting qemu-system-x86_64 as libvirtd is. >>>> >>>> The fundamental problem here, is that there is some logic in >>>> libvirtd that rightly belongs in QEMU. In order to preserve the >>>> security model, that means that we're going to have to take a >>>> subsection of QEMU and trust it more. >>> >>> Well we have a process that makes security decisions, and a process >>> which applies those security decisions and a process which is confined >>> by those decisions. Currently libvirtd makes& applies the decisions, >>> and qemu is confined. A qemu-fe model would mean that libvirt is making >>> the decisions, but is then relying on qemu-fe to apply them. IMHO that >>> split is undesirable, but that's besides the point, since this is not >>> a decision that needs to be made now. >>> >>> 'qemu-fe' needs to have a way to communicate with the confined process >>> ('qemu-system-XXX') to supply it the resources (file FDs) it needs to >>> access. The requirements of such a comms channel for qemu-fe are going >>> to be the same as those needed by libvirtd talking to QEMU today, or >>> indeed by any process that is applying security decisions to QEMU. >> >> But the fundamental difference is that libvirtd uses what's ostensible a >> public, supported interface. That means when we add things like this, >> we're stuck supporting it for general use cases. >> >> It's much more palatable to do these things using a private interface >> such that we can change these things down the road without worrying >> about compatibility with third-party tools. >> >> Regards, >> >> Anthony Liguori >> > > Is this a nack for the fd: protocol? No, I think we're trying to understand what the options are. Regards, Anthony Liguori Or do we want to implement the fd: > protocol as a stepping stone on the way to a privilege-separated qemu > model? I know the fd: protocol is not ideal, but it does provide NFS > image isolation, perhaps much sooner than privilege-separated qemu can. >
On 08/22/2011 02:39 PM, Blue Swirl wrote: > On Mon, Aug 22, 2011 at 5:42 PM, Corey Bryant<coreyb@linux.vnet.ibm.com> wrote: >> > >> > >> > On 08/22/2011 01:25 PM, Anthony Liguori wrote: >>> >> >>> >> On 08/22/2011 11:50 AM, Daniel P. Berrange wrote: >>>> >>> >>>> >>> On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote: >>>>> >>>> >>>>> >>>> I don't think it makes sense to have qemu-fe do dynamic labelling. >>>>> >>>> You certainly could avoid the fd passing by having qemu-fe do the >>>>> >>>> open though and just let qemu-fe run without the restricted security >>>>> >>>> context. >>>> >>> >>>> >>> qemu-fe would also not be entirely simple, >>> >> >>> >> Indeed. >>> >> >> > >> > I do like the idea of a privileged qemu-fe performing the open and passing >> > the fd to a restricted qemu. > Me too. > >> > However, I get the impression that this won't >> > get delivered nearly as quickly as fd: passing could be. How soon do we >> > need image isolation for NFS? >> > >> > Btw, this sounds similar to what Blue Swirl recommended here on v1 of this >> > patch:http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02187.html > I was thinking about simply doing fork() + setuid() at some point and > using the FD passing structures directly. But would it bring > advantages to have two separate executables, are they different from > access control point of view vs. single but forked one? > We could put together an SELinux policy that would transition qemu-fe to a more restricted domain (ie. no open privilege on NFS files) when it executes qemu-system-x86_64.
On Tue, Aug 23, 2011 at 11:13:34AM -0400, Corey Bryant wrote: > > > On 08/22/2011 02:39 PM, Blue Swirl wrote: > >On Mon, Aug 22, 2011 at 5:42 PM, Corey Bryant<coreyb@linux.vnet.ibm.com> wrote: > >>> > >>> > >>> On 08/22/2011 01:25 PM, Anthony Liguori wrote: > >>>>> > >>>>> On 08/22/2011 11:50 AM, Daniel P. Berrange wrote: > >>>>>>> > >>>>>>> On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote: > >>>>>>>>> > >>>>>>>>> I don't think it makes sense to have qemu-fe do dynamic labelling. > >>>>>>>>> You certainly could avoid the fd passing by having qemu-fe do the > >>>>>>>>> open though and just let qemu-fe run without the restricted security > >>>>>>>>> context. > >>>>>>> > >>>>>>> qemu-fe would also not be entirely simple, > >>>>> > >>>>> Indeed. > >>>>> > >>> > >>> I do like the idea of a privileged qemu-fe performing the open and passing > >>> the fd to a restricted qemu. > >Me too. > > > >>> However, I get the impression that this won't > >>> get delivered nearly as quickly as fd: passing could be. How soon do we > >>> need image isolation for NFS? > >>> > >>> Btw, this sounds similar to what Blue Swirl recommended here on v1 of this > >>> patch:http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02187.html > >I was thinking about simply doing fork() + setuid() at some point and > >using the FD passing structures directly. But would it bring > >advantages to have two separate executables, are they different from > >access control point of view vs. single but forked one? > > > > We could put together an SELinux policy that would transition > qemu-fe to a more restricted domain (ie. no open privilege on NFS > files) when it executes qemu-system-x86_64. Thinking about this some more, I don't really think the idea of delegating open of NFS files to a separate qemu-fe is very desirable. Libvirt makes the decision on the security policy that the VM will run under, and provides audit records to log what resources are being assigned to the VM. From that point onwards, we must be able to guarantee that MAC will be enforced on the VM, according to what we logged via the auditd system. In the case where we delegate opening of the files to qemu-fe, and allow its policy to open NFS files, we no longer have a guarentee that the MAC policy will be enforced as we originally intended. Yes, qemu-fe will very likely honour what we tell it and open the correct files, and yes qmeu-fe has lower attack surface wrt the guest than the real qemu does, but we still loose the guarentee of MAC enforcement from libvirt's POV. Daniel
Am 23.08.2011 17:26, schrieb Daniel P. Berrange: > On Tue, Aug 23, 2011 at 11:13:34AM -0400, Corey Bryant wrote: >> >> >> On 08/22/2011 02:39 PM, Blue Swirl wrote: >>> On Mon, Aug 22, 2011 at 5:42 PM, Corey Bryant<coreyb@linux.vnet.ibm.com> wrote: >>>>> >>>>> >>>>> On 08/22/2011 01:25 PM, Anthony Liguori wrote: >>>>>>> >>>>>>> On 08/22/2011 11:50 AM, Daniel P. Berrange wrote: >>>>>>>>> >>>>>>>>> On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote: >>>>>>>>>>> >>>>>>>>>>> I don't think it makes sense to have qemu-fe do dynamic labelling. >>>>>>>>>>> You certainly could avoid the fd passing by having qemu-fe do the >>>>>>>>>>> open though and just let qemu-fe run without the restricted security >>>>>>>>>>> context. >>>>>>>>> >>>>>>>>> qemu-fe would also not be entirely simple, >>>>>>> >>>>>>> Indeed. >>>>>>> >>>>> >>>>> I do like the idea of a privileged qemu-fe performing the open and passing >>>>> the fd to a restricted qemu. >>> Me too. >>> >>>>> However, I get the impression that this won't >>>>> get delivered nearly as quickly as fd: passing could be. How soon do we >>>>> need image isolation for NFS? >>>>> >>>>> Btw, this sounds similar to what Blue Swirl recommended here on v1 of this >>>>> patch:http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02187.html >>> I was thinking about simply doing fork() + setuid() at some point and >>> using the FD passing structures directly. But would it bring >>> advantages to have two separate executables, are they different from >>> access control point of view vs. single but forked one? >>> >> >> We could put together an SELinux policy that would transition >> qemu-fe to a more restricted domain (ie. no open privilege on NFS >> files) when it executes qemu-system-x86_64. > > Thinking about this some more, I don't really think the idea of delegating > open of NFS files to a separate qemu-fe is very desirable. Libvirt makes the > decision on the security policy that the VM will run under, and provides > audit records to log what resources are being assigned to the VM. From that > point onwards, we must be able to guarantee that MAC will be enforced on > the VM, according to what we logged via the auditd system. > > In the case where we delegate opening of the files to qemu-fe, and allow > its policy to open NFS files, we no longer have a guarentee that the MAC > policy will be enforced as we originally intended. Yes, qemu-fe will very > likely honour what we tell it and open the correct files, and yes qmeu-fe > has lower attack surface wrt the guest than the real qemu does, but we > still loose the guarentee of MAC enforcement from libvirt's POV. On the other hand, from a qemu POV libvirt is only one possible management tool. In practice, another very popular "management tool" for qemu is bash. With qemu-fe all the other tools, including direct invocation from the command line, would get some protection, too. Kevin
On Tue, Aug 23, 2011 at 05:50:03PM +0200, Kevin Wolf wrote: > Am 23.08.2011 17:26, schrieb Daniel P. Berrange: > > On Tue, Aug 23, 2011 at 11:13:34AM -0400, Corey Bryant wrote: > >> > >> > >> On 08/22/2011 02:39 PM, Blue Swirl wrote: > >>> On Mon, Aug 22, 2011 at 5:42 PM, Corey Bryant<coreyb@linux.vnet.ibm.com> wrote: > >>>>> > >>>>> > >>>>> On 08/22/2011 01:25 PM, Anthony Liguori wrote: > >>>>>>> > >>>>>>> On 08/22/2011 11:50 AM, Daniel P. Berrange wrote: > >>>>>>>>> > >>>>>>>>> On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote: > >>>>>>>>>>> > >>>>>>>>>>> I don't think it makes sense to have qemu-fe do dynamic labelling. > >>>>>>>>>>> You certainly could avoid the fd passing by having qemu-fe do the > >>>>>>>>>>> open though and just let qemu-fe run without the restricted security > >>>>>>>>>>> context. > >>>>>>>>> > >>>>>>>>> qemu-fe would also not be entirely simple, > >>>>>>> > >>>>>>> Indeed. > >>>>>>> > >>>>> > >>>>> I do like the idea of a privileged qemu-fe performing the open and passing > >>>>> the fd to a restricted qemu. > >>> Me too. > >>> > >>>>> However, I get the impression that this won't > >>>>> get delivered nearly as quickly as fd: passing could be. How soon do we > >>>>> need image isolation for NFS? > >>>>> > >>>>> Btw, this sounds similar to what Blue Swirl recommended here on v1 of this > >>>>> patch:http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02187.html > >>> I was thinking about simply doing fork() + setuid() at some point and > >>> using the FD passing structures directly. But would it bring > >>> advantages to have two separate executables, are they different from > >>> access control point of view vs. single but forked one? > >>> > >> > >> We could put together an SELinux policy that would transition > >> qemu-fe to a more restricted domain (ie. no open privilege on NFS > >> files) when it executes qemu-system-x86_64. > > > > Thinking about this some more, I don't really think the idea of delegating > > open of NFS files to a separate qemu-fe is very desirable. Libvirt makes the > > decision on the security policy that the VM will run under, and provides > > audit records to log what resources are being assigned to the VM. From that > > point onwards, we must be able to guarantee that MAC will be enforced on > > the VM, according to what we logged via the auditd system. > > > > In the case where we delegate opening of the files to qemu-fe, and allow > > its policy to open NFS files, we no longer have a guarentee that the MAC > > policy will be enforced as we originally intended. Yes, qemu-fe will very > > likely honour what we tell it and open the correct files, and yes qmeu-fe > > has lower attack surface wrt the guest than the real qemu does, but we > > still loose the guarentee of MAC enforcement from libvirt's POV. > > On the other hand, from a qemu POV libvirt is only one possible > management tool. In practice, another very popular "management tool" for > qemu is bash. With qemu-fe all the other tools, including direct > invocation from the command line, would get some protection, too. That's why I said a qemu-fe like tool need not be mutually exclusive with exposing FD passing to a management tool like libvirt. Both qemu-fe and libvirt need to some mechanism to pass open FDs to the real QEMU. We could needlessly invent a new communication channel between qemu-fe and qemu that only it can use, or we can use the channel we already have - QMP - to provide an interface that either qemu-fe or libvirt, can use to pass FDs into the real QEMU. Daniel
On Tue, Aug 23, 2011 at 04:51:31PM +0100, Daniel P. Berrange wrote: > On Tue, Aug 23, 2011 at 05:50:03PM +0200, Kevin Wolf wrote: > > Am 23.08.2011 17:26, schrieb Daniel P. Berrange: > > > On Tue, Aug 23, 2011 at 11:13:34AM -0400, Corey Bryant wrote: > > >> > > >> > > >> On 08/22/2011 02:39 PM, Blue Swirl wrote: > > >>> On Mon, Aug 22, 2011 at 5:42 PM, Corey Bryant<coreyb@linux.vnet.ibm.com> wrote: > > >>>>> > > >>>>> > > >>>>> On 08/22/2011 01:25 PM, Anthony Liguori wrote: > > >>>>>>> > > >>>>>>> On 08/22/2011 11:50 AM, Daniel P. Berrange wrote: > > >>>>>>>>> > > >>>>>>>>> On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote: > > >>>>>>>>>>> > > >>>>>>>>>>> I don't think it makes sense to have qemu-fe do dynamic labelling. > > >>>>>>>>>>> You certainly could avoid the fd passing by having qemu-fe do the > > >>>>>>>>>>> open though and just let qemu-fe run without the restricted security > > >>>>>>>>>>> context. > > >>>>>>>>> > > >>>>>>>>> qemu-fe would also not be entirely simple, > > >>>>>>> > > >>>>>>> Indeed. > > >>>>>>> > > >>>>> > > >>>>> I do like the idea of a privileged qemu-fe performing the open and passing > > >>>>> the fd to a restricted qemu. > > >>> Me too. > > >>> > > >>>>> However, I get the impression that this won't > > >>>>> get delivered nearly as quickly as fd: passing could be. How soon do we > > >>>>> need image isolation for NFS? > > >>>>> > > >>>>> Btw, this sounds similar to what Blue Swirl recommended here on v1 of this > > >>>>> patch:http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02187.html > > >>> I was thinking about simply doing fork() + setuid() at some point and > > >>> using the FD passing structures directly. But would it bring > > >>> advantages to have two separate executables, are they different from > > >>> access control point of view vs. single but forked one? > > >>> > > >> > > >> We could put together an SELinux policy that would transition > > >> qemu-fe to a more restricted domain (ie. no open privilege on NFS > > >> files) when it executes qemu-system-x86_64. > > > > > > Thinking about this some more, I don't really think the idea of delegating > > > open of NFS files to a separate qemu-fe is very desirable. Libvirt makes the > > > decision on the security policy that the VM will run under, and provides > > > audit records to log what resources are being assigned to the VM. From that > > > point onwards, we must be able to guarantee that MAC will be enforced on > > > the VM, according to what we logged via the auditd system. > > > > > > In the case where we delegate opening of the files to qemu-fe, and allow > > > its policy to open NFS files, we no longer have a guarentee that the MAC > > > policy will be enforced as we originally intended. Yes, qemu-fe will very > > > likely honour what we tell it and open the correct files, and yes qmeu-fe > > > has lower attack surface wrt the guest than the real qemu does, but we > > > still loose the guarentee of MAC enforcement from libvirt's POV. > > > > On the other hand, from a qemu POV libvirt is only one possible > > management tool. In practice, another very popular "management tool" for > > qemu is bash. With qemu-fe all the other tools, including direct > > invocation from the command line, would get some protection, too. > > That's why I said a qemu-fe like tool need not be mutually exclusive > with exposing FD passing to a management tool like libvirt. Both > qemu-fe and libvirt need to some mechanism to pass open FDs to the > real QEMU. We could needlessly invent a new communication channel > between qemu-fe and qemu that only it can use, or we can use the > channel we already have - QMP - to provide an interface that either > qemu-fe or libvirt, can use to pass FDs into the real QEMU. Or to put it another way... It should be possible to build a 'qemu-fe' tool which does sandboxing using soley the QEMU command line + QMP monitor. If this is not possible then, IMHO, QMP should be considered incomplete / a failure, or may point to other holes in QEMU's mgmt app APIs. eg perhaps it would demonstrate that we do in fact need a libblockdriver.so for mgmt apps to query info about disks. Regards, Daniel
On 08/23/2011 11:50 AM, Kevin Wolf wrote: > Am 23.08.2011 17:26, schrieb Daniel P. Berrange: >> > On Tue, Aug 23, 2011 at 11:13:34AM -0400, Corey Bryant wrote: >>> >> >>> >> >>> >> On 08/22/2011 02:39 PM, Blue Swirl wrote: >>>> >>> On Mon, Aug 22, 2011 at 5:42 PM, Corey Bryant<coreyb@linux.vnet.ibm.com> wrote: >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> On 08/22/2011 01:25 PM, Anthony Liguori wrote: >>>>>>>> >>>>>>> >>>>>>>> >>>>>>> On 08/22/2011 11:50 AM, Daniel P. Berrange wrote: >>>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>> On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote: >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> I don't think it makes sense to have qemu-fe do dynamic labelling. >>>>>>>>>>>> >>>>>>>>>>> You certainly could avoid the fd passing by having qemu-fe do the >>>>>>>>>>>> >>>>>>>>>>> open though and just let qemu-fe run without the restricted security >>>>>>>>>>>> >>>>>>>>>>> context. >>>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>> qemu-fe would also not be entirely simple, >>>>>>>> >>>>>>> >>>>>>>> >>>>>>> Indeed. >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>>> >>>>> I do like the idea of a privileged qemu-fe performing the open and passing >>>>>> >>>>> the fd to a restricted qemu. >>>> >>> Me too. >>>> >>> >>>>>> >>>>> However, I get the impression that this won't >>>>>> >>>>> get delivered nearly as quickly as fd: passing could be. How soon do we >>>>>> >>>>> need image isolation for NFS? >>>>>> >>>>> >>>>>> >>>>> Btw, this sounds similar to what Blue Swirl recommended here on v1 of this >>>>>> >>>>> patch:http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02187.html >>>> >>> I was thinking about simply doing fork() + setuid() at some point and >>>> >>> using the FD passing structures directly. But would it bring >>>> >>> advantages to have two separate executables, are they different from >>>> >>> access control point of view vs. single but forked one? >>>> >>> >>> >> >>> >> We could put together an SELinux policy that would transition >>> >> qemu-fe to a more restricted domain (ie. no open privilege on NFS >>> >> files) when it executes qemu-system-x86_64. >> > >> > Thinking about this some more, I don't really think the idea of delegating >> > open of NFS files to a separate qemu-fe is very desirable. Libvirt makes the >> > decision on the security policy that the VM will run under, and provides >> > audit records to log what resources are being assigned to the VM. From that >> > point onwards, we must be able to guarantee that MAC will be enforced on >> > the VM, according to what we logged via the auditd system. >> > >> > In the case where we delegate opening of the files to qemu-fe, and allow >> > its policy to open NFS files, we no longer have a guarentee that the MAC >> > policy will be enforced as we originally intended. Yes, qemu-fe will very >> > likely honour what we tell it and open the correct files, and yes qmeu-fe >> > has lower attack surface wrt the guest than the real qemu does, but we >> > still loose the guarentee of MAC enforcement from libvirt's POV. > On the other hand, from a qemu POV libvirt is only one possible > management tool. In practice, another very popular "management tool" for > qemu is bash. With qemu-fe all the other tools, including direct > invocation from the command line, would get some protection, too. > > Kevin True, but like you said it provides just some protection. To really be useful qemu-fe would need the ability to label qemu guest processes and image files to provide MAC isolation.
diff --git a/block.c b/block.c index 785c88e..e9e5613 100644 --- a/block.c +++ b/block.c @@ -555,7 +555,6 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags, if (flags & BDRV_O_SNAPSHOT) { BlockDriverState *bs1; int64_t total_size; - int is_protocol = 0; BlockDriver *bdrv_qcow2; QEMUOptionParameter *options; char tmp_filename[PATH_MAX]; @@ -573,19 +572,17 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags, } total_size = bdrv_getlength(bs1) & BDRV_SECTOR_MASK; - if (bs1->drv && bs1->drv->protocol_name) - is_protocol = 1; - bdrv_delete(bs1); get_tmp_filename(tmp_filename, sizeof(tmp_filename)); /* Real path is meaningless for protocols */ - if (is_protocol) + if (path_has_protocol(filename)) { snprintf(backing_filename, sizeof(backing_filename), "%s", filename); - else if (!realpath(filename, backing_filename)) + } else if (!realpath(filename, backing_filename)) { return -errno; + } bdrv_qcow2 = bdrv_find_format("qcow2"); options = parse_option_parameters("", bdrv_qcow2->create_options, NULL); diff --git a/block/raw-posix.c b/block/raw-posix.c index c5c9944..42ae2f4 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -28,6 +28,7 @@ #include "block_int.h" #include "module.h" #include "block/raw-posix-aio.h" +#include "monitor.h" #ifdef CONFIG_COCOA #include <paths.h> @@ -185,7 +186,8 @@ static int raw_open_common(BlockDriverState *bs, const char *filename, int bdrv_flags, int open_flags) { BDRVRawState *s = bs->opaque; - int fd, ret; + int fd = -1; + int ret; ret = raw_normalize_devicepath(&filename); if (ret != 0) { @@ -207,13 +209,21 @@ static int raw_open_common(BlockDriverState *bs, const char *filename, if (!(bdrv_flags & BDRV_O_CACHE_WB)) s->open_flags |= O_DSYNC; - s->fd = -1; - fd = qemu_open(filename, s->open_flags, 0644); - if (fd < 0) { - ret = -errno; - if (ret == -EROFS) - ret = -EACCES; - return ret; + if (s->fd == -1) { + fd = qemu_open(filename, s->open_flags, 0644); + if (fd < 0) { + ret = -errno; + if (ret == -EROFS) { + ret = -EACCES; + } + return ret; + } + } else { + fd = dup(s->fd); + if (fd < 0) { + ret = -errno; + return ret; + } } s->fd = fd; s->aligned_buf = NULL; @@ -271,6 +281,7 @@ static int raw_open(BlockDriverState *bs, const char *filename, int flags) { BDRVRawState *s = bs->opaque; + s->fd = -1; s->type = FTYPE_FILE; return raw_open_common(bs, filename, flags, 0); } @@ -904,6 +915,74 @@ static BlockDriver bdrv_file = { .create_options = raw_create_options, }; +static int raw_open_fd(BlockDriverState *bs, const char *filename, int flags) +{ + BDRVRawState *s = bs->opaque; + const char *fd_str; + int fd; + + /* extract the file descriptor - fail if it's not fd: */ + if (!strstart(filename, "fd:", &fd_str)) { + return -EINVAL; + } + + if (!qemu_isdigit(fd_str[0])) { + /* get fd from monitor, but don't remove from monitor's fd list */ + fd = qemu_get_fd(fd_str, 0); + if (fd == -1) { + return -EBADF; + } + } else { + char *endptr = NULL; + + fd = strtol(fd_str, &endptr, 10); + if (*endptr || (fd == 0 && fd_str == endptr)) { + return -EBADF; + } + } + + s->fd = fd; + s->type = FTYPE_FILE; + + return raw_open_common(bs, filename, flags, 0); +} + +static void raw_close_fd(BlockDriverState *bs) +{ + const char *fd_str; + + /* if file descriptor is an fdname, remove it from monitor's fd list */ + if (strstart(bs->filename, "fd:", &fd_str)) { + if (!qemu_isdigit(fd_str[0])) { + qemu_get_fd(fd_str, 1); + } + } + + raw_close(bs); +} + +static BlockDriver bdrv_file_fd = { + .format_name = "file", + .protocol_name = "fd", + .instance_size = sizeof(BDRVRawState), + .bdrv_probe = NULL, /* no probe for protocols */ + .bdrv_file_open = raw_open_fd, + .bdrv_read = raw_read, + .bdrv_write = raw_write, + .bdrv_close = raw_close_fd, + .bdrv_flush = raw_flush, + .bdrv_discard = raw_discard, + + .bdrv_aio_readv = raw_aio_readv, + .bdrv_aio_writev = raw_aio_writev, + .bdrv_aio_flush = raw_aio_flush, + + .bdrv_truncate = raw_truncate, + .bdrv_getlength = raw_getlength, + + .create_options = raw_create_options, +}; + /***********************************************/ /* host device */ @@ -1012,6 +1091,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags) } #endif + s->fd = -1; s->type = FTYPE_FILE; #if defined(__linux__) { @@ -1184,6 +1264,7 @@ static int floppy_open(BlockDriverState *bs, const char *filename, int flags) BDRVRawState *s = bs->opaque; int ret; + s->fd = -1; s->type = FTYPE_FD; /* open will not fail even if no floppy is inserted, so add O_NONBLOCK */ @@ -1302,6 +1383,7 @@ static int cdrom_open(BlockDriverState *bs, const char *filename, int flags) { BDRVRawState *s = bs->opaque; + s->fd = -1; s->type = FTYPE_CD; /* open will not fail even if no CD is inserted, so add O_NONBLOCK */ @@ -1527,6 +1609,7 @@ static void bdrv_file_init(void) * Register all the drivers. Note that order is important, the driver * registered last will get probed first. */ + bdrv_register(&bdrv_file_fd); bdrv_register(&bdrv_file); bdrv_register(&bdrv_host_device); #ifdef __linux__ diff --git a/migration-fd.c b/migration-fd.c index aee690a..b1ba625 100644 --- a/migration-fd.c +++ b/migration-fd.c @@ -61,7 +61,7 @@ MigrationState *fd_start_outgoing_migration(Monitor *mon, s = g_malloc0(sizeof(*s)); - s->fd = monitor_get_fd(mon, fdname); + s->fd = monitor_get_fd(mon, fdname, 1); if (s->fd == -1) { DPRINTF("fd_migration: invalid file descriptor identifier\n"); goto err_after_alloc; diff --git a/monitor.c b/monitor.c index 39791dc..d03e762 100644 --- a/monitor.c +++ b/monitor.c @@ -1198,21 +1198,21 @@ static int add_graphics_client(Monitor *mon, const QDict *qdict, QObject **ret_d qerror_report(QERR_DEVICE_NOT_ACTIVE, "spice"); return -1; } - qerror_report(QERR_ADD_CLIENT_FAILED); - return -1; + qerror_report(QERR_ADD_CLIENT_FAILED); + return -1; #ifdef CONFIG_VNC } else if (strcmp(protocol, "vnc") == 0) { - int fd = monitor_get_fd(mon, fdname); - vnc_display_add_client(NULL, fd, skipauth); - return 0; + int fd = monitor_get_fd(mon, fdname, 1); + vnc_display_add_client(NULL, fd, skipauth); + return 0; #endif } else if ((s = qemu_chr_find(protocol)) != NULL) { - int fd = monitor_get_fd(mon, fdname); - if (qemu_chr_add_client(s, fd) < 0) { - qerror_report(QERR_ADD_CLIENT_FAILED); - return -1; - } - return 0; + int fd = monitor_get_fd(mon, fdname, 1); + if (qemu_chr_add_client(s, fd) < 0) { + qerror_report(QERR_ADD_CLIENT_FAILED); + return -1; + } + return 0; } qerror_report(QERR_INVALID_PARAMETER, "protocol"); @@ -2833,7 +2833,7 @@ static void do_loadvm(Monitor *mon, const QDict *qdict) } } -int monitor_get_fd(Monitor *mon, const char *fdname) +int monitor_get_fd(Monitor *mon, const char *fdname, unsigned char remove) { mon_fd_t *monfd; @@ -2846,10 +2846,12 @@ int monitor_get_fd(Monitor *mon, const char *fdname) fd = monfd->fd; - /* caller takes ownership of fd */ - QLIST_REMOVE(monfd, next); - g_free(monfd->name); - g_free(monfd); + if (remove) { + /* caller takes ownership of fd */ + QLIST_REMOVE(monfd, next); + g_free(monfd->name); + g_free(monfd); + } return fd; } @@ -2857,6 +2859,11 @@ int monitor_get_fd(Monitor *mon, const char *fdname) return -1; } +int qemu_get_fd(const char *fdname, unsigned char remove) +{ + return cur_mon ? monitor_get_fd(cur_mon, fdname, remove) : -1; +} + static const mon_cmd_t mon_cmds[] = { #include "hmp-commands.h" { NULL, NULL, }, diff --git a/monitor.h b/monitor.h index 4f2d328..f18de19 100644 --- a/monitor.h +++ b/monitor.h @@ -50,7 +50,8 @@ int monitor_read_bdrv_key_start(Monitor *mon, BlockDriverState *bs, BlockDriverCompletionFunc *completion_cb, void *opaque); -int monitor_get_fd(Monitor *mon, const char *fdname); +int monitor_get_fd(Monitor *mon, const char *fdname, unsigned char remove); +int qemu_get_fd(const char *fdname, unsigned char remove); void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap) GCC_FMT_ATTR(2, 0); diff --git a/net.c b/net.c index d05930c..3482598 100644 --- a/net.c +++ b/net.c @@ -727,7 +727,7 @@ int net_handle_fd_param(Monitor *mon, const char *param) if (!qemu_isdigit(param[0]) && mon) { - fd = monitor_get_fd(mon, param); + fd = monitor_get_fd(mon, param, 1); if (fd == -1) { error_report("No file descriptor named %s found", param); return -1; diff --git a/qemu-options.hx b/qemu-options.hx index d86815d..869320b 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -131,7 +131,7 @@ using @file{/dev/cdrom} as filename (@pxref{host_drives}). ETEXI DEF("drive", HAS_ARG, QEMU_OPTION_drive, - "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n" + "-drive [file=[fd:]file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n" " [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n" " [,cache=writethrough|writeback|none|unsafe][,format=f]\n" " [,serial=s][,addr=A][,id=name][,aio=threads|native]\n" @@ -144,10 +144,12 @@ STEXI Define a new drive. Valid options are: @table @option -@item file=@var{file} +@item file=[fd:]@var{file} This option defines which disk image (@pxref{disk_images}) to use with this drive. If the filename contains comma, you must double it -(for instance, "file=my,,file" to use file "my,file"). +(for instance, "file=my,,file" to use file "my,file"). @option{fd:}@var{file} +specifies the file descriptor of an already open disk image. +@option{format=}@var{format} is required by @option{fd:}@var{file}. @item if=@var{interface} This option defines on which type on interface the drive is connected. Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio. diff --git a/qemu-tool.c b/qemu-tool.c index eb89fe0..a2a0129 100644 --- a/qemu-tool.c +++ b/qemu-tool.c @@ -96,3 +96,8 @@ int64_t qemu_get_clock_ns(QEMUClock *clock) { return 0; } + +int qemu_get_fd(const char *fdname, unsigned char remove) +{ + return -1; +}
sVirt provides SELinux MAC isolation for Qemu guest processes and their corresponding resources (image files). sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, which is needed for image file isolation when using the sVirt SELinux security driver in libvirt. The proposed solution entails a combination of Qemu, libvirt, and SELinux patches that work together to isolate multiple guests' images when they're stored in the same NFS mount. This results in an environment where sVirt isolation and NFS image file isolation can both be provided. Currently, Qemu opens an image file in addition to performing the necessary read and write operations. The proposed solution will move the open out of Qemu and into libvirt. Once libvirt opens an image file for the guest, it will pass the file descriptor to Qemu via a new fd: protocol. If the image file resides in an NFS mount, the following SELinux policy changes will provide image isolation: - A new SELinux boolean is created (e.g. virt_read_write_nfs) to allow Qemu (svirt_t) to only have SELinux read and write permissions on nfs_t files - Qemu (svirt_t) also gets SELinux use permissions on libvirt (virtd_t) file descriptors Following is a sample invocation of Qemu using the fd: protocol on the command line: qemu -drive file=fd:4,format=qcow2 The fd: protocol is also supported by the drive_add monitor command. This requires that the specified file descriptor is passed to the monitor alongside a prior getfd monitor command. This patch also supports the following features for the fd: protocol: - -snapshot command line option - savevm monitor command This patch does not contain support for the following features, all of which are planned to be supported in the future: - Copy-on-write backing files - snapshot_blkdev monitor command - -cdrom command line option - -drive command line option with media=cdrom - change monitor command v2: - Add drive_add monitor command support - Fence off unsupported features that re-open image file v3: - Fence off cdrom and change monitor command support v4: - Removed checks that fenced off features for fd: protocol - Enabled -snapshot and savevm support Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- block.c | 9 ++--- block/raw-posix.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++---- migration-fd.c | 2 +- monitor.c | 39 ++++++++++++-------- monitor.h | 3 +- net.c | 2 +- qemu-options.hx | 8 +++-- qemu-tool.c | 5 +++ 8 files changed, 131 insertions(+), 36 deletions(-)