diff mbox

[v4] Add support for fd: protocol

Message ID 1314024650-28510-1-git-send-email-coreyb@linux.vnet.ibm.com
State New
Headers show

Commit Message

Corey Bryant Aug. 22, 2011, 2:50 p.m. UTC
sVirt provides SELinux MAC isolation for Qemu guest processes and their
corresponding resources (image files). sVirt provides this support
by labeling guests and resources with security labels that are stored
in file system extended attributes. Some file systems, such as NFS, do
not support the extended attribute security namespace, which is needed
for image file isolation when using the sVirt SELinux security driver
in libvirt.

The proposed solution entails a combination of Qemu, libvirt, and
SELinux patches that work together to isolate multiple guests' images
when they're stored in the same NFS mount. This results in an
environment where sVirt isolation and NFS image file isolation can both
be provided.

Currently, Qemu opens an image file in addition to performing the
necessary read and write operations. The proposed solution will move
the open out of Qemu and into libvirt. Once libvirt opens an image
file for the guest, it will pass the file descriptor to Qemu via a
new fd: protocol.

If the image file resides in an NFS mount, the following SELinux policy
changes will provide image isolation:

  - A new SELinux boolean is created (e.g. virt_read_write_nfs) to
    allow Qemu (svirt_t) to only have SELinux read and write
    permissions on nfs_t files

  - Qemu (svirt_t) also gets SELinux use permissions on libvirt
    (virtd_t) file descriptors

Following is a sample invocation of Qemu using the fd: protocol on
the command line:

    qemu -drive file=fd:4,format=qcow2

The fd: protocol is also supported by the drive_add monitor command.
This requires that the specified file descriptor is passed to the
monitor alongside a prior getfd monitor command.

This patch also supports the following features for the fd: protocol:
  - -snapshot command line option
  - savevm monitor command

This patch does not contain support for the following features, all
of which are planned to be supported in the future:
  - Copy-on-write backing files
  - snapshot_blkdev monitor command
  - -cdrom command line option
  - -drive command line option with media=cdrom
  - change monitor command

v2:
  - Add drive_add monitor command support
  - Fence off unsupported features that re-open image file

v3:
  - Fence off cdrom and change monitor command support

v4:
  - Removed checks that fenced off features for fd: protocol
  - Enabled -snapshot and savevm support

Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
---
 block.c           |    9 ++---
 block/raw-posix.c |   99 ++++++++++++++++++++++++++++++++++++++++++++++++----
 migration-fd.c    |    2 +-
 monitor.c         |   39 ++++++++++++--------
 monitor.h         |    3 +-
 net.c             |    2 +-
 qemu-options.hx   |    8 +++--
 qemu-tool.c       |    5 +++
 8 files changed, 131 insertions(+), 36 deletions(-)

Comments

Christoph Hellwig Aug. 22, 2011, 3:38 p.m. UTC | #1
I'm still totally against this.  FD passing is a nice feature for sandboxing,
but the passing should be between closely cooperating programs.  We'll
need a tool shipped from the qemu source tree to open and set up the
FDs, and not someone external.  With that setup in place we can use
a protocol similar to the various OpenBSD privilegue separated deaemons
to also allow reopening / snapshots / etc.

Opening fds in libvirt and passing them into qemu is exactly the wrong way,
and just cements the current horrors where libvirt duplicates parsing
of image format headers.
Corey Bryant Aug. 22, 2011, 4:06 p.m. UTC | #2
On 08/22/2011 11:38 AM, Christoph Hellwig wrote:
> I'm still totally against this.  FD passing is a nice feature for sandboxing,
> but the passing should be between closely cooperating programs.  We'll
> need a tool shipped from the qemu source tree to open and set up the
> FDs, and not someone external.  With that setup in place we can use
> a protocol similar to the various OpenBSD privilegue separated deaemons
> to also allow reopening / snapshots / etc.
>
> Opening fds in libvirt and passing them into qemu is exactly the wrong way,
> and just cements the current horrors where libvirt duplicates parsing
> of image format headers.
>

This is following suit with exiting support that passes an fd for a TAP 
interface.  Libvirt already passes a file descriptor to Qemu via '-net 
tap,fd='.  Are you against that as well?

Regards,
Corey
Daniel P. Berrangé Aug. 22, 2011, 4:24 p.m. UTC | #3
On Mon, Aug 22, 2011 at 05:38:20PM +0200, Christoph Hellwig wrote:
> I'm still totally against this.  FD passing is a nice feature for sandboxing,
> but the passing should be between closely cooperating programs.  We'll
> need a tool shipped from the qemu source tree to open and set up the
> FDs, and not someone external.  With that setup in place we can use
> a protocol similar to the various OpenBSD privilegue separated deaemons
> to also allow reopening / snapshots / etc.
> 
> Opening fds in libvirt and passing them into qemu is exactly the wrong way,
> and just cements the current horrors where libvirt duplicates parsing
> of image format headers.

The primary goal of this work is to allow QEMU to use a file, without
giving it permission to open the file. This lets us cope with the current
limitations of NFS wrt SELinux labelling. Where ordinarily we'd relabel
the disk file to allow QEMU to open them, on NFS we can't do that. So we
setup a SELinux policy that allows QEMU to read any NFS files that it is
passed, but not actually open them. This allows secure use of QEMU with
NFS, without having to solve the NFS + SELinux labelling problems, which
is still a long term ongoing effort by NFS vendors.

Whether or not libvirt parses image format headers, is a completely
unrelated. Consider if libvirt did not parse image formats and instead
required the mgmt app to pass in details of all backing files. We still
have the problem of how to securely grant just one QEMU instance access
to the files. This still needs the FD passing support being proposed
here to cope with NFS.

So the question of whether or not libvirt should be parsing image format
headers is completely irrelevant to acceptability of this FD passing
support.

Regards,
Daniel
Anthony Liguori Aug. 22, 2011, 4:29 p.m. UTC | #4
On 08/22/2011 11:24 AM, Daniel P. Berrange wrote:
> On Mon, Aug 22, 2011 at 05:38:20PM +0200, Christoph Hellwig wrote:
>> I'm still totally against this.  FD passing is a nice feature for sandboxing,
>> but the passing should be between closely cooperating programs.  We'll
>> need a tool shipped from the qemu source tree to open and set up the
>> FDs, and not someone external.  With that setup in place we can use
>> a protocol similar to the various OpenBSD privilegue separated deaemons
>> to also allow reopening / snapshots / etc.
>>
>> Opening fds in libvirt and passing them into qemu is exactly the wrong way,
>> and just cements the current horrors where libvirt duplicates parsing
>> of image format headers.
>
> The primary goal of this work is to allow QEMU to use a file, without
> giving it permission to open the file. This lets us cope with the current
> limitations of NFS wrt SELinux labelling. Where ordinarily we'd relabel
> the disk file to allow QEMU to open them, on NFS we can't do that. So we
> setup a SELinux policy that allows QEMU to read any NFS files that it is
> passed, but not actually open them. This allows secure use of QEMU with
> NFS, without having to solve the NFS + SELinux labelling problems, which
> is still a long term ongoing effort by NFS vendors.

I think you miss the point Christoph is making.

Christoph is suggesting that we have two qemu executables, qemu-fe and 
qemu-system-x86_64.  qemu-fe would be smaller and would carry more 
rights than qemu-system-x86_64.

But I don't think this fixes the problem.  Something needs to do dynamic 
labelling of the backing files to implement a Chinese Wall MAC policy. 
In order to do that, something needs to parse the image formats.

I don't think it makes sense to have qemu-fe do dynamic labelling.  You 
certainly could avoid the fd passing by having qemu-fe do the open 
though and just let qemu-fe run without the restricted security context.

But libvirt would still need to parse image files.

Regards,

Anthony Liguori

>
> Whether or not libvirt parses image format headers, is a completely
> unrelated. Consider if libvirt did not parse image formats and instead
> required the mgmt app to pass in details of all backing files. We still
> have the problem of how to securely grant just one QEMU instance access
> to the files. This still needs the FD passing support being proposed
> here to cope with NFS.
>
> So the question of whether or not libvirt should be parsing image format
> headers is completely irrelevant to acceptability of this FD passing
> support.
>
> Regards,
> Daniel
Daniel P. Berrangé Aug. 22, 2011, 4:50 p.m. UTC | #5
On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote:
> On 08/22/2011 11:24 AM, Daniel P. Berrange wrote:
> >On Mon, Aug 22, 2011 at 05:38:20PM +0200, Christoph Hellwig wrote:
> >>I'm still totally against this.  FD passing is a nice feature for sandboxing,
> >>but the passing should be between closely cooperating programs.  We'll
> >>need a tool shipped from the qemu source tree to open and set up the
> >>FDs, and not someone external.  With that setup in place we can use
> >>a protocol similar to the various OpenBSD privilegue separated deaemons
> >>to also allow reopening / snapshots / etc.
> >>
> >>Opening fds in libvirt and passing them into qemu is exactly the wrong way,
> >>and just cements the current horrors where libvirt duplicates parsing
> >>of image format headers.
> >
> >The primary goal of this work is to allow QEMU to use a file, without
> >giving it permission to open the file. This lets us cope with the current
> >limitations of NFS wrt SELinux labelling. Where ordinarily we'd relabel
> >the disk file to allow QEMU to open them, on NFS we can't do that. So we
> >setup a SELinux policy that allows QEMU to read any NFS files that it is
> >passed, but not actually open them. This allows secure use of QEMU with
> >NFS, without having to solve the NFS + SELinux labelling problems, which
> >is still a long term ongoing effort by NFS vendors.
> 
> I think you miss the point Christoph is making.
> 
> Christoph is suggesting that we have two qemu executables, qemu-fe
> and qemu-system-x86_64.  qemu-fe would be smaller and would carry
> more rights than qemu-system-x86_64.
>
> But I don't think this fixes the problem.  Something needs to do
> dynamic labelling of the backing files to implement a Chinese Wall
> MAC policy. In order to do that, something needs to parse the image
> formats.
> 
> I don't think it makes sense to have qemu-fe do dynamic labelling.
> You certainly could avoid the fd passing by having qemu-fe do the
> open though and just let qemu-fe run without the restricted security
> context.

qemu-fe would also not be entirely simple, because it will need to act
as a proxy for the monitor, in order to make hotplug work. ie the mgmt
app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would
then have to open the file and send 'drive_add fd:NN' onto the real QEMU,
and then pass the results on back.

In addition qemu-fe would still have to be under some kind of restricted
security context for it to be acceptable. This is going to want to be as
locked down as possible. So I'd see that you'd likely end up with the
qemu-fe security policy being identical to the qemu security policy,
with the exception that it would be allowed to open files on NFS without
needing them to be labelled. So I don't really see that all this gives us
any tangible benefits over just allowing the mgmt app to pass in the FDs
directly.

> But libvirt would still need to parse image files.

Not neccessarily. As mentioned below, it is entirely possible to
enable the mgmt app to pass in details of the backing files, at
which point no image parsing is required by libvirt. Hence my
assertion that the question of who does image parsing is irrelevant
to this discussion.

> >Whether or not libvirt parses image format headers, is a completely
> >unrelated. Consider if libvirt did not parse image formats and instead
> >required the mgmt app to pass in details of all backing files. We still
> >have the problem of how to securely grant just one QEMU instance access
> >to the files. This still needs the FD passing support being proposed
> >here to cope with NFS.
> >
> >So the question of whether or not libvirt should be parsing image format
> >headers is completely irrelevant to acceptability of this FD passing
> >support.


Daniel
Anthony Liguori Aug. 22, 2011, 5:25 p.m. UTC | #6
On 08/22/2011 11:50 AM, Daniel P. Berrange wrote:
> On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote:
>> I don't think it makes sense to have qemu-fe do dynamic labelling.
>> You certainly could avoid the fd passing by having qemu-fe do the
>> open though and just let qemu-fe run without the restricted security
>> context.
>
> qemu-fe would also not be entirely simple,

Indeed.

> because it will need to act
> as a proxy for the monitor, in order to make hotplug work. ie the mgmt
> app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would
> then have to open the file and send 'drive_add fd:NN' onto the real QEMU,
> and then pass the results on back.
>
> In addition qemu-fe would still have to be under some kind of restricted
> security context for it to be acceptable. This is going to want to be as
> locked down as possible.

I think there's got to be some give and take here.

It should at least be as locked down as libvirtd.  From a security point 
of view, we should be able to agree that we want libvirtd to be as 
locked down as possible.

But there shouldn't be a hard requirement to lock down qemu-fe more than 
libvirtd.  Instead, the requirement should be for qemu-fe to be as/more 
vigilant in not trusting qemu-system-x86_64 as libvirtd is.

The fundamental problem here, is that there is some logic in libvirtd 
that rightly belongs in QEMU.  In order to preserve the security model, 
that means that we're going to have to take a subsection of QEMU and 
trust it more.

> So I'd see that you'd likely end up with the
> qemu-fe security policy being identical to the qemu security policy,

Then there's no point in doing qemu-fe.  qemu-fe should be thought of as 
QEMU supplied libvirtd plugin.

> with the exception that it would be allowed to open files on NFS without
> needing them to be labelled. So I don't really see that all this gives us
> any tangible benefits over just allowing the mgmt app to pass in the FDs
> directly.
>
>> But libvirt would still need to parse image files.
>
> Not neccessarily. As mentioned below, it is entirely possible to
> enable the mgmt app to pass in details of the backing files, at
> which point no image parsing is required by libvirt. Hence my
> assertion that the question of who does image parsing is irrelevant
> to this discussion.

That's certainly true.

Regards,

Anthony Liguori
Corey Bryant Aug. 22, 2011, 5:42 p.m. UTC | #7
On 08/22/2011 01:25 PM, Anthony Liguori wrote:
> On 08/22/2011 11:50 AM, Daniel P. Berrange wrote:
>> On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote:
>>> I don't think it makes sense to have qemu-fe do dynamic labelling.
>>> You certainly could avoid the fd passing by having qemu-fe do the
>>> open though and just let qemu-fe run without the restricted security
>>> context.
>>
>> qemu-fe would also not be entirely simple,
>
> Indeed.
>

I do like the idea of a privileged qemu-fe performing the open and 
passing the fd to a restricted qemu.  However, I get the impression that 
this won't get delivered nearly as quickly as fd: passing could be.  How 
soon do we need image isolation for NFS?

Btw, this sounds similar to what Blue Swirl recommended here on v1 of 
this patch: 
http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02187.html

Regards,
Corey

>> because it will need to act
>> as a proxy for the monitor, in order to make hotplug work. ie the mgmt
>> app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would
>> then have to open the file and send 'drive_add fd:NN' onto the real QEMU,
>> and then pass the results on back.
>>
>> In addition qemu-fe would still have to be under some kind of restricted
>> security context for it to be acceptable. This is going to want to be as
>> locked down as possible.
>
> I think there's got to be some give and take here.
>
> It should at least be as locked down as libvirtd. From a security point
> of view, we should be able to agree that we want libvirtd to be as
> locked down as possible.
>
> But there shouldn't be a hard requirement to lock down qemu-fe more than
> libvirtd. Instead, the requirement should be for qemu-fe to be as/more
> vigilant in not trusting qemu-system-x86_64 as libvirtd is.
>
> The fundamental problem here, is that there is some logic in libvirtd
> that rightly belongs in QEMU. In order to preserve the security model,
> that means that we're going to have to take a subsection of QEMU and
> trust it more.
>
>> So I'd see that you'd likely end up with the
>> qemu-fe security policy being identical to the qemu security policy,
>
> Then there's no point in doing qemu-fe. qemu-fe should be thought of as
> QEMU supplied libvirtd plugin.
>
>> with the exception that it would be allowed to open files on NFS without
>> needing them to be labelled. So I don't really see that all this gives us
>> any tangible benefits over just allowing the mgmt app to pass in the FDs
>> directly.
>>
>>> But libvirt would still need to parse image files.
>>
>> Not neccessarily. As mentioned below, it is entirely possible to
>> enable the mgmt app to pass in details of the backing files, at
>> which point no image parsing is required by libvirt. Hence my
>> assertion that the question of who does image parsing is irrelevant
>> to this discussion.
>
> That's certainly true.
>
> Regards,
>
> Anthony Liguori
Daniel P. Berrangé Aug. 22, 2011, 6:22 p.m. UTC | #8
On Mon, Aug 22, 2011 at 12:25:25PM -0500, Anthony Liguori wrote:
> On 08/22/2011 11:50 AM, Daniel P. Berrange wrote:
> >On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote:
> >>I don't think it makes sense to have qemu-fe do dynamic labelling.
> >>You certainly could avoid the fd passing by having qemu-fe do the
> >>open though and just let qemu-fe run without the restricted security
> >>context.
> >
> >qemu-fe would also not be entirely simple,
> 
> Indeed.
> 
> >because it will need to act
> >as a proxy for the monitor, in order to make hotplug work. ie the mgmt
> >app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would
> >then have to open the file and send 'drive_add fd:NN' onto the real QEMU,
> >and then pass the results on back.
> >
> >In addition qemu-fe would still have to be under some kind of restricted
> >security context for it to be acceptable. This is going to want to be as
> >locked down as possible.
> 
> I think there's got to be some give and take here.
> 
> It should at least be as locked down as libvirtd.  From a security
> point of view, we should be able to agree that we want libvirtd to
> be as locked down as possible.
> 
> But there shouldn't be a hard requirement to lock down qemu-fe more
> than libvirtd.  Instead, the requirement should be for qemu-fe to be
> as/more vigilant in not trusting qemu-system-x86_64 as libvirtd is.
> 
> The fundamental problem here, is that there is some logic in
> libvirtd that rightly belongs in QEMU.  In order to preserve the
> security model, that means that we're going to have to take a
> subsection of QEMU and trust it more.

Well we have a process that makes security decisions, and a process
which applies those security decisions and a process which is confined
by those decisions. Currently libvirtd makes & applies the decisions,
and qemu is confined. A qemu-fe model would mean that libvirt is making
the decisions, but is then relying on qemu-fe to apply them. IMHO that
split is undesirable, but that's besides the point, since this is not
a decision that needs to be made now.

'qemu-fe' needs to have a way to communicate with the confined process
('qemu-system-XXX') to supply it the resources (file FDs) it needs to
access. The requirements of such a comms channel for qemu-fe are going
to be the same as those needed by libvirtd talking to QEMU today, or
indeed by any process that is applying security decisions to QEMU.

So inventing a 'qemu-fe' does not make the need for passing FDs into
QEMU go away, nor does it improve/change the overall security of the
architecture, it is merely a different architectural choice. It does
however require alot more development work to create this new 'qemu-fe'
program and get it debugged & generally usable in production deployments

So adding FD passing to QEMU blocks creation of a 'qemu-fe' program,
but is not dependant on it. Thus we can add FD passing to QEMU today
and be a step closer to being able to create a 'qemu-fe' in the future,
while also enabling libvirtd & other mgmt apps to take advantage of
this to solve the immediate security problem we have with NFS today,
without having to wait a months or years for 'qemu-fe' to get into a
usable state.

Regards,
Daniel
Blue Swirl Aug. 22, 2011, 6:39 p.m. UTC | #9
On Mon, Aug 22, 2011 at 5:42 PM, Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
>
>
> On 08/22/2011 01:25 PM, Anthony Liguori wrote:
>>
>> On 08/22/2011 11:50 AM, Daniel P. Berrange wrote:
>>>
>>> On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote:
>>>>
>>>> I don't think it makes sense to have qemu-fe do dynamic labelling.
>>>> You certainly could avoid the fd passing by having qemu-fe do the
>>>> open though and just let qemu-fe run without the restricted security
>>>> context.
>>>
>>> qemu-fe would also not be entirely simple,
>>
>> Indeed.
>>
>
> I do like the idea of a privileged qemu-fe performing the open and passing
> the fd to a restricted qemu.

Me too.

>  However, I get the impression that this won't
> get delivered nearly as quickly as fd: passing could be.  How soon do we
> need image isolation for NFS?
>
> Btw, this sounds similar to what Blue Swirl recommended here on v1 of this
> patch: http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02187.html

I was thinking about simply doing fork() + setuid() at some point and
using the FD passing structures directly. But would it bring
advantages to have two separate executables, are they different from
access control point of view vs. single but forked one?

> Regards,
> Corey
>
>>> because it will need to act
>>> as a proxy for the monitor, in order to make hotplug work. ie the mgmt
>>> app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would
>>> then have to open the file and send 'drive_add fd:NN' onto the real QEMU,
>>> and then pass the results on back.
>>>
>>> In addition qemu-fe would still have to be under some kind of restricted
>>> security context for it to be acceptable. This is going to want to be as
>>> locked down as possible.
>>
>> I think there's got to be some give and take here.
>>
>> It should at least be as locked down as libvirtd. From a security point
>> of view, we should be able to agree that we want libvirtd to be as
>> locked down as possible.
>>
>> But there shouldn't be a hard requirement to lock down qemu-fe more than
>> libvirtd. Instead, the requirement should be for qemu-fe to be as/more
>> vigilant in not trusting qemu-system-x86_64 as libvirtd is.
>>
>> The fundamental problem here, is that there is some logic in libvirtd
>> that rightly belongs in QEMU. In order to preserve the security model,
>> that means that we're going to have to take a subsection of QEMU and
>> trust it more.
>>
>>> So I'd see that you'd likely end up with the
>>> qemu-fe security policy being identical to the qemu security policy,
>>
>> Then there's no point in doing qemu-fe. qemu-fe should be thought of as
>> QEMU supplied libvirtd plugin.
>>
>>> with the exception that it would be allowed to open files on NFS without
>>> needing them to be labelled. So I don't really see that all this gives us
>>> any tangible benefits over just allowing the mgmt app to pass in the FDs
>>> directly.
>>>
>>>> But libvirt would still need to parse image files.
>>>
>>> Not neccessarily. As mentioned below, it is entirely possible to
>>> enable the mgmt app to pass in details of the backing files, at
>>> which point no image parsing is required by libvirt. Hence my
>>> assertion that the question of who does image parsing is irrelevant
>>> to this discussion.
>>
>> That's certainly true.
>>
>> Regards,
>>
>> Anthony Liguori
>
>
>
Blue Swirl Aug. 22, 2011, 6:54 p.m. UTC | #10
On Mon, Aug 22, 2011 at 6:22 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
> On Mon, Aug 22, 2011 at 12:25:25PM -0500, Anthony Liguori wrote:
>> On 08/22/2011 11:50 AM, Daniel P. Berrange wrote:
>> >On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote:
>> >>I don't think it makes sense to have qemu-fe do dynamic labelling.
>> >>You certainly could avoid the fd passing by having qemu-fe do the
>> >>open though and just let qemu-fe run without the restricted security
>> >>context.
>> >
>> >qemu-fe would also not be entirely simple,
>>
>> Indeed.
>>
>> >because it will need to act
>> >as a proxy for the monitor, in order to make hotplug work. ie the mgmt
>> >app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would
>> >then have to open the file and send 'drive_add fd:NN' onto the real QEMU,
>> >and then pass the results on back.
>> >
>> >In addition qemu-fe would still have to be under some kind of restricted
>> >security context for it to be acceptable. This is going to want to be as
>> >locked down as possible.
>>
>> I think there's got to be some give and take here.
>>
>> It should at least be as locked down as libvirtd.  From a security
>> point of view, we should be able to agree that we want libvirtd to
>> be as locked down as possible.
>>
>> But there shouldn't be a hard requirement to lock down qemu-fe more
>> than libvirtd.  Instead, the requirement should be for qemu-fe to be
>> as/more vigilant in not trusting qemu-system-x86_64 as libvirtd is.
>>
>> The fundamental problem here, is that there is some logic in
>> libvirtd that rightly belongs in QEMU.  In order to preserve the
>> security model, that means that we're going to have to take a
>> subsection of QEMU and trust it more.
>
> Well we have a process that makes security decisions, and a process
> which applies those security decisions and a process which is confined
> by those decisions. Currently libvirtd makes & applies the decisions,
> and qemu is confined. A qemu-fe model would mean that libvirt is making
> the decisions, but is then relying on qemu-fe to apply them. IMHO that
> split is undesirable, but that's besides the point, since this is not
> a decision that needs to be made now.
>
> 'qemu-fe' needs to have a way to communicate with the confined process
> ('qemu-system-XXX') to supply it the resources (file FDs) it needs to
> access. The requirements of such a comms channel for qemu-fe are going
> to be the same as those needed by libvirtd talking to QEMU today, or
> indeed by any process that is applying security decisions to QEMU.
>
> So inventing a 'qemu-fe' does not make the need for passing FDs into
> QEMU go away, nor does it improve/change the overall security of the
> architecture, it is merely a different architectural choice. It does
> however require alot more development work to create this new 'qemu-fe'
> program and get it debugged & generally usable in production deployments
>
> So adding FD passing to QEMU blocks creation of a 'qemu-fe' program,
> but is not dependant on it. Thus we can add FD passing to QEMU today
> and be a step closer to being able to create a 'qemu-fe' in the future,
> while also enabling libvirtd & other mgmt apps to take advantage of
> this to solve the immediate security problem we have with NFS today,
> without having to wait a months or years for 'qemu-fe' to get into a
> usable state.

The advantage of this qemu-fe approach is that block format internals
does not need to be shared between QEMU and libvirt.

FD passing can still be useful for other purposes.
Anthony Liguori Aug. 22, 2011, 7:25 p.m. UTC | #11
On 08/22/2011 01:22 PM, Daniel P. Berrange wrote:
> On Mon, Aug 22, 2011 at 12:25:25PM -0500, Anthony Liguori wrote:
>> On 08/22/2011 11:50 AM, Daniel P. Berrange wrote:
>>> On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote:
>>>> I don't think it makes sense to have qemu-fe do dynamic labelling.
>>>> You certainly could avoid the fd passing by having qemu-fe do the
>>>> open though and just let qemu-fe run without the restricted security
>>>> context.
>>>
>>> qemu-fe would also not be entirely simple,
>>
>> Indeed.
>>
>>> because it will need to act
>>> as a proxy for the monitor, in order to make hotplug work. ie the mgmt
>>> app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would
>>> then have to open the file and send 'drive_add fd:NN' onto the real QEMU,
>>> and then pass the results on back.
>>>
>>> In addition qemu-fe would still have to be under some kind of restricted
>>> security context for it to be acceptable. This is going to want to be as
>>> locked down as possible.
>>
>> I think there's got to be some give and take here.
>>
>> It should at least be as locked down as libvirtd.  From a security
>> point of view, we should be able to agree that we want libvirtd to
>> be as locked down as possible.
>>
>> But there shouldn't be a hard requirement to lock down qemu-fe more
>> than libvirtd.  Instead, the requirement should be for qemu-fe to be
>> as/more vigilant in not trusting qemu-system-x86_64 as libvirtd is.
>>
>> The fundamental problem here, is that there is some logic in
>> libvirtd that rightly belongs in QEMU.  In order to preserve the
>> security model, that means that we're going to have to take a
>> subsection of QEMU and trust it more.
>
> Well we have a process that makes security decisions, and a process
> which applies those security decisions and a process which is confined
> by those decisions. Currently libvirtd makes&  applies the decisions,
> and qemu is confined. A qemu-fe model would mean that libvirt is making
> the decisions, but is then relying on qemu-fe to apply them. IMHO that
> split is undesirable, but that's besides the point, since this is not
> a decision that needs to be made now.
>
> 'qemu-fe' needs to have a way to communicate with the confined process
> ('qemu-system-XXX') to supply it the resources (file FDs) it needs to
> access. The requirements of such a comms channel for qemu-fe are going
> to be the same as those needed by libvirtd talking to QEMU today, or
> indeed by any process that is applying security decisions to QEMU.

But the fundamental difference is that libvirtd uses what's ostensible a 
public, supported interface.  That means when we add things like this, 
we're stuck supporting it for general use cases.

It's much more palatable to do these things using a private interface 
such that we can change these things down the road without worrying 
about compatibility with third-party tools.

Regards,

Anthony Liguori
Corey Bryant Aug. 23, 2011, 2:26 p.m. UTC | #12
On 08/22/2011 03:25 PM, Anthony Liguori wrote:
> On 08/22/2011 01:22 PM, Daniel P. Berrange wrote:
>> On Mon, Aug 22, 2011 at 12:25:25PM -0500, Anthony Liguori wrote:
>>> On 08/22/2011 11:50 AM, Daniel P. Berrange wrote:
>>>> On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote:
>>>>> I don't think it makes sense to have qemu-fe do dynamic labelling.
>>>>> You certainly could avoid the fd passing by having qemu-fe do the
>>>>> open though and just let qemu-fe run without the restricted security
>>>>> context.
>>>>
>>>> qemu-fe would also not be entirely simple,
>>>
>>> Indeed.
>>>
>>>> because it will need to act
>>>> as a proxy for the monitor, in order to make hotplug work. ie the mgmt
>>>> app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would
>>>> then have to open the file and send 'drive_add fd:NN' onto the real
>>>> QEMU,
>>>> and then pass the results on back.
>>>>
>>>> In addition qemu-fe would still have to be under some kind of
>>>> restricted
>>>> security context for it to be acceptable. This is going to want to
>>>> be as
>>>> locked down as possible.
>>>
>>> I think there's got to be some give and take here.
>>>
>>> It should at least be as locked down as libvirtd. From a security
>>> point of view, we should be able to agree that we want libvirtd to
>>> be as locked down as possible.
>>>
>>> But there shouldn't be a hard requirement to lock down qemu-fe more
>>> than libvirtd. Instead, the requirement should be for qemu-fe to be
>>> as/more vigilant in not trusting qemu-system-x86_64 as libvirtd is.
>>>
>>> The fundamental problem here, is that there is some logic in
>>> libvirtd that rightly belongs in QEMU. In order to preserve the
>>> security model, that means that we're going to have to take a
>>> subsection of QEMU and trust it more.
>>
>> Well we have a process that makes security decisions, and a process
>> which applies those security decisions and a process which is confined
>> by those decisions. Currently libvirtd makes& applies the decisions,
>> and qemu is confined. A qemu-fe model would mean that libvirt is making
>> the decisions, but is then relying on qemu-fe to apply them. IMHO that
>> split is undesirable, but that's besides the point, since this is not
>> a decision that needs to be made now.
>>
>> 'qemu-fe' needs to have a way to communicate with the confined process
>> ('qemu-system-XXX') to supply it the resources (file FDs) it needs to
>> access. The requirements of such a comms channel for qemu-fe are going
>> to be the same as those needed by libvirtd talking to QEMU today, or
>> indeed by any process that is applying security decisions to QEMU.
>
> But the fundamental difference is that libvirtd uses what's ostensible a
> public, supported interface. That means when we add things like this,
> we're stuck supporting it for general use cases.
>
> It's much more palatable to do these things using a private interface
> such that we can change these things down the road without worrying
> about compatibility with third-party tools.
>
> Regards,
>
> Anthony Liguori
>

Is this a nack for the fd: protocol?  Or do we want to implement the fd: 
protocol as a stepping stone on the way to a privilege-separated qemu 
model?  I know the fd: protocol is not ideal, but it does provide NFS 
image isolation, perhaps much sooner than privilege-separated qemu can.
Anthony Liguori Aug. 23, 2011, 2:33 p.m. UTC | #13
On 08/23/2011 09:26 AM, Corey Bryant wrote:
> On 08/22/2011 03:25 PM, Anthony Liguori wrote:
>> On 08/22/2011 01:22 PM, Daniel P. Berrange wrote:
>>> On Mon, Aug 22, 2011 at 12:25:25PM -0500, Anthony Liguori wrote:
>>>> On 08/22/2011 11:50 AM, Daniel P. Berrange wrote:
>>>>> On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote:
>>>>>> I don't think it makes sense to have qemu-fe do dynamic labelling.
>>>>>> You certainly could avoid the fd passing by having qemu-fe do the
>>>>>> open though and just let qemu-fe run without the restricted security
>>>>>> context.
>>>>>
>>>>> qemu-fe would also not be entirely simple,
>>>>
>>>> Indeed.
>>>>
>>>>> because it will need to act
>>>>> as a proxy for the monitor, in order to make hotplug work. ie the mgmt
>>>>> app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would
>>>>> then have to open the file and send 'drive_add fd:NN' onto the real
>>>>> QEMU,
>>>>> and then pass the results on back.
>>>>>
>>>>> In addition qemu-fe would still have to be under some kind of
>>>>> restricted
>>>>> security context for it to be acceptable. This is going to want to
>>>>> be as
>>>>> locked down as possible.
>>>>
>>>> I think there's got to be some give and take here.
>>>>
>>>> It should at least be as locked down as libvirtd. From a security
>>>> point of view, we should be able to agree that we want libvirtd to
>>>> be as locked down as possible.
>>>>
>>>> But there shouldn't be a hard requirement to lock down qemu-fe more
>>>> than libvirtd. Instead, the requirement should be for qemu-fe to be
>>>> as/more vigilant in not trusting qemu-system-x86_64 as libvirtd is.
>>>>
>>>> The fundamental problem here, is that there is some logic in
>>>> libvirtd that rightly belongs in QEMU. In order to preserve the
>>>> security model, that means that we're going to have to take a
>>>> subsection of QEMU and trust it more.
>>>
>>> Well we have a process that makes security decisions, and a process
>>> which applies those security decisions and a process which is confined
>>> by those decisions. Currently libvirtd makes& applies the decisions,
>>> and qemu is confined. A qemu-fe model would mean that libvirt is making
>>> the decisions, but is then relying on qemu-fe to apply them. IMHO that
>>> split is undesirable, but that's besides the point, since this is not
>>> a decision that needs to be made now.
>>>
>>> 'qemu-fe' needs to have a way to communicate with the confined process
>>> ('qemu-system-XXX') to supply it the resources (file FDs) it needs to
>>> access. The requirements of such a comms channel for qemu-fe are going
>>> to be the same as those needed by libvirtd talking to QEMU today, or
>>> indeed by any process that is applying security decisions to QEMU.
>>
>> But the fundamental difference is that libvirtd uses what's ostensible a
>> public, supported interface. That means when we add things like this,
>> we're stuck supporting it for general use cases.
>>
>> It's much more palatable to do these things using a private interface
>> such that we can change these things down the road without worrying
>> about compatibility with third-party tools.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>
> Is this a nack for the fd: protocol?

No, I think we're trying to understand what the options are.

Regards,

Anthony Liguori

  Or do we want to implement the fd:
> protocol as a stepping stone on the way to a privilege-separated qemu
> model? I know the fd: protocol is not ideal, but it does provide NFS
> image isolation, perhaps much sooner than privilege-separated qemu can.
>
Corey Bryant Aug. 23, 2011, 3:13 p.m. UTC | #14
On 08/22/2011 02:39 PM, Blue Swirl wrote:
> On Mon, Aug 22, 2011 at 5:42 PM, Corey Bryant<coreyb@linux.vnet.ibm.com>  wrote:
>> >
>> >
>> >  On 08/22/2011 01:25 PM, Anthony Liguori wrote:
>>> >>
>>> >>  On 08/22/2011 11:50 AM, Daniel P. Berrange wrote:
>>>> >>>
>>>> >>>  On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote:
>>>>> >>>>
>>>>> >>>>  I don't think it makes sense to have qemu-fe do dynamic labelling.
>>>>> >>>>  You certainly could avoid the fd passing by having qemu-fe do the
>>>>> >>>>  open though and just let qemu-fe run without the restricted security
>>>>> >>>>  context.
>>>> >>>
>>>> >>>  qemu-fe would also not be entirely simple,
>>> >>
>>> >>  Indeed.
>>> >>
>> >
>> >  I do like the idea of a privileged qemu-fe performing the open and passing
>> >  the fd to a restricted qemu.
> Me too.
>
>> >    However, I get the impression that this won't
>> >  get delivered nearly as quickly as fd: passing could be.  How soon do we
>> >  need image isolation for NFS?
>> >
>> >  Btw, this sounds similar to what Blue Swirl recommended here on v1 of this
>> >  patch:http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02187.html
> I was thinking about simply doing fork() + setuid() at some point and
> using the FD passing structures directly. But would it bring
> advantages to have two separate executables, are they different from
> access control point of view vs. single but forked one?
>

We could put together an SELinux policy that would transition qemu-fe to 
a more restricted domain (ie. no open privilege on NFS files) when it 
executes qemu-system-x86_64.
Daniel P. Berrangé Aug. 23, 2011, 3:26 p.m. UTC | #15
On Tue, Aug 23, 2011 at 11:13:34AM -0400, Corey Bryant wrote:
> 
> 
> On 08/22/2011 02:39 PM, Blue Swirl wrote:
> >On Mon, Aug 22, 2011 at 5:42 PM, Corey Bryant<coreyb@linux.vnet.ibm.com>  wrote:
> >>>
> >>>
> >>>  On 08/22/2011 01:25 PM, Anthony Liguori wrote:
> >>>>>
> >>>>>  On 08/22/2011 11:50 AM, Daniel P. Berrange wrote:
> >>>>>>>
> >>>>>>>  On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote:
> >>>>>>>>>
> >>>>>>>>>  I don't think it makes sense to have qemu-fe do dynamic labelling.
> >>>>>>>>>  You certainly could avoid the fd passing by having qemu-fe do the
> >>>>>>>>>  open though and just let qemu-fe run without the restricted security
> >>>>>>>>>  context.
> >>>>>>>
> >>>>>>>  qemu-fe would also not be entirely simple,
> >>>>>
> >>>>>  Indeed.
> >>>>>
> >>>
> >>>  I do like the idea of a privileged qemu-fe performing the open and passing
> >>>  the fd to a restricted qemu.
> >Me too.
> >
> >>>    However, I get the impression that this won't
> >>>  get delivered nearly as quickly as fd: passing could be.  How soon do we
> >>>  need image isolation for NFS?
> >>>
> >>>  Btw, this sounds similar to what Blue Swirl recommended here on v1 of this
> >>>  patch:http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02187.html
> >I was thinking about simply doing fork() + setuid() at some point and
> >using the FD passing structures directly. But would it bring
> >advantages to have two separate executables, are they different from
> >access control point of view vs. single but forked one?
> >
> 
> We could put together an SELinux policy that would transition
> qemu-fe to a more restricted domain (ie. no open privilege on NFS
> files) when it executes qemu-system-x86_64.

Thinking about this some more, I don't really think the idea of delegating
open of NFS files to a separate qemu-fe is very desirable. Libvirt makes the
decision on the security policy that the VM will run under, and provides
audit records to log what resources are being assigned to the VM. From that
point onwards, we must be able to guarantee that MAC will be enforced on
the VM, according to what we logged via the auditd system.

In the case where we delegate opening of the files to qemu-fe, and allow
its policy to open NFS files, we no longer have a guarentee that the MAC
policy will be enforced as we originally intended. Yes, qemu-fe will very
likely honour what we tell it and open the correct files, and yes qmeu-fe
has lower attack surface wrt the guest than the real qemu does, but we
still loose the guarentee of MAC enforcement from libvirt's POV.

Daniel
Kevin Wolf Aug. 23, 2011, 3:50 p.m. UTC | #16
Am 23.08.2011 17:26, schrieb Daniel P. Berrange:
> On Tue, Aug 23, 2011 at 11:13:34AM -0400, Corey Bryant wrote:
>>
>>
>> On 08/22/2011 02:39 PM, Blue Swirl wrote:
>>> On Mon, Aug 22, 2011 at 5:42 PM, Corey Bryant<coreyb@linux.vnet.ibm.com>  wrote:
>>>>>
>>>>>
>>>>>  On 08/22/2011 01:25 PM, Anthony Liguori wrote:
>>>>>>>
>>>>>>>  On 08/22/2011 11:50 AM, Daniel P. Berrange wrote:
>>>>>>>>>
>>>>>>>>>  On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote:
>>>>>>>>>>>
>>>>>>>>>>>  I don't think it makes sense to have qemu-fe do dynamic labelling.
>>>>>>>>>>>  You certainly could avoid the fd passing by having qemu-fe do the
>>>>>>>>>>>  open though and just let qemu-fe run without the restricted security
>>>>>>>>>>>  context.
>>>>>>>>>
>>>>>>>>>  qemu-fe would also not be entirely simple,
>>>>>>>
>>>>>>>  Indeed.
>>>>>>>
>>>>>
>>>>>  I do like the idea of a privileged qemu-fe performing the open and passing
>>>>>  the fd to a restricted qemu.
>>> Me too.
>>>
>>>>>    However, I get the impression that this won't
>>>>>  get delivered nearly as quickly as fd: passing could be.  How soon do we
>>>>>  need image isolation for NFS?
>>>>>
>>>>>  Btw, this sounds similar to what Blue Swirl recommended here on v1 of this
>>>>>  patch:http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02187.html
>>> I was thinking about simply doing fork() + setuid() at some point and
>>> using the FD passing structures directly. But would it bring
>>> advantages to have two separate executables, are they different from
>>> access control point of view vs. single but forked one?
>>>
>>
>> We could put together an SELinux policy that would transition
>> qemu-fe to a more restricted domain (ie. no open privilege on NFS
>> files) when it executes qemu-system-x86_64.
> 
> Thinking about this some more, I don't really think the idea of delegating
> open of NFS files to a separate qemu-fe is very desirable. Libvirt makes the
> decision on the security policy that the VM will run under, and provides
> audit records to log what resources are being assigned to the VM. From that
> point onwards, we must be able to guarantee that MAC will be enforced on
> the VM, according to what we logged via the auditd system.
> 
> In the case where we delegate opening of the files to qemu-fe, and allow
> its policy to open NFS files, we no longer have a guarentee that the MAC
> policy will be enforced as we originally intended. Yes, qemu-fe will very
> likely honour what we tell it and open the correct files, and yes qmeu-fe
> has lower attack surface wrt the guest than the real qemu does, but we
> still loose the guarentee of MAC enforcement from libvirt's POV.

On the other hand, from a qemu POV libvirt is only one possible
management tool. In practice, another very popular "management tool" for
qemu is bash. With qemu-fe all the other tools, including direct
invocation from the command line, would get some protection, too.

Kevin
Daniel P. Berrangé Aug. 23, 2011, 3:51 p.m. UTC | #17
On Tue, Aug 23, 2011 at 05:50:03PM +0200, Kevin Wolf wrote:
> Am 23.08.2011 17:26, schrieb Daniel P. Berrange:
> > On Tue, Aug 23, 2011 at 11:13:34AM -0400, Corey Bryant wrote:
> >>
> >>
> >> On 08/22/2011 02:39 PM, Blue Swirl wrote:
> >>> On Mon, Aug 22, 2011 at 5:42 PM, Corey Bryant<coreyb@linux.vnet.ibm.com>  wrote:
> >>>>>
> >>>>>
> >>>>>  On 08/22/2011 01:25 PM, Anthony Liguori wrote:
> >>>>>>>
> >>>>>>>  On 08/22/2011 11:50 AM, Daniel P. Berrange wrote:
> >>>>>>>>>
> >>>>>>>>>  On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>  I don't think it makes sense to have qemu-fe do dynamic labelling.
> >>>>>>>>>>>  You certainly could avoid the fd passing by having qemu-fe do the
> >>>>>>>>>>>  open though and just let qemu-fe run without the restricted security
> >>>>>>>>>>>  context.
> >>>>>>>>>
> >>>>>>>>>  qemu-fe would also not be entirely simple,
> >>>>>>>
> >>>>>>>  Indeed.
> >>>>>>>
> >>>>>
> >>>>>  I do like the idea of a privileged qemu-fe performing the open and passing
> >>>>>  the fd to a restricted qemu.
> >>> Me too.
> >>>
> >>>>>    However, I get the impression that this won't
> >>>>>  get delivered nearly as quickly as fd: passing could be.  How soon do we
> >>>>>  need image isolation for NFS?
> >>>>>
> >>>>>  Btw, this sounds similar to what Blue Swirl recommended here on v1 of this
> >>>>>  patch:http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02187.html
> >>> I was thinking about simply doing fork() + setuid() at some point and
> >>> using the FD passing structures directly. But would it bring
> >>> advantages to have two separate executables, are they different from
> >>> access control point of view vs. single but forked one?
> >>>
> >>
> >> We could put together an SELinux policy that would transition
> >> qemu-fe to a more restricted domain (ie. no open privilege on NFS
> >> files) when it executes qemu-system-x86_64.
> > 
> > Thinking about this some more, I don't really think the idea of delegating
> > open of NFS files to a separate qemu-fe is very desirable. Libvirt makes the
> > decision on the security policy that the VM will run under, and provides
> > audit records to log what resources are being assigned to the VM. From that
> > point onwards, we must be able to guarantee that MAC will be enforced on
> > the VM, according to what we logged via the auditd system.
> > 
> > In the case where we delegate opening of the files to qemu-fe, and allow
> > its policy to open NFS files, we no longer have a guarentee that the MAC
> > policy will be enforced as we originally intended. Yes, qemu-fe will very
> > likely honour what we tell it and open the correct files, and yes qmeu-fe
> > has lower attack surface wrt the guest than the real qemu does, but we
> > still loose the guarentee of MAC enforcement from libvirt's POV.
> 
> On the other hand, from a qemu POV libvirt is only one possible
> management tool. In practice, another very popular "management tool" for
> qemu is bash. With qemu-fe all the other tools, including direct
> invocation from the command line, would get some protection, too.

That's why I said a qemu-fe like tool need not be mutually exclusive
with exposing FD passing to a management tool like libvirt. Both
qemu-fe and libvirt need to some mechanism to pass open FDs to the
real QEMU.  We could needlessly invent a new communication channel
between qemu-fe and qemu that only it can use, or we can use the
channel we already have - QMP - to provide an interface that either
qemu-fe or libvirt, can use to pass FDs into the real QEMU.

Daniel
Daniel P. Berrangé Aug. 23, 2011, 4:04 p.m. UTC | #18
On Tue, Aug 23, 2011 at 04:51:31PM +0100, Daniel P. Berrange wrote:
> On Tue, Aug 23, 2011 at 05:50:03PM +0200, Kevin Wolf wrote:
> > Am 23.08.2011 17:26, schrieb Daniel P. Berrange:
> > > On Tue, Aug 23, 2011 at 11:13:34AM -0400, Corey Bryant wrote:
> > >>
> > >>
> > >> On 08/22/2011 02:39 PM, Blue Swirl wrote:
> > >>> On Mon, Aug 22, 2011 at 5:42 PM, Corey Bryant<coreyb@linux.vnet.ibm.com>  wrote:
> > >>>>>
> > >>>>>
> > >>>>>  On 08/22/2011 01:25 PM, Anthony Liguori wrote:
> > >>>>>>>
> > >>>>>>>  On 08/22/2011 11:50 AM, Daniel P. Berrange wrote:
> > >>>>>>>>>
> > >>>>>>>>>  On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>  I don't think it makes sense to have qemu-fe do dynamic labelling.
> > >>>>>>>>>>>  You certainly could avoid the fd passing by having qemu-fe do the
> > >>>>>>>>>>>  open though and just let qemu-fe run without the restricted security
> > >>>>>>>>>>>  context.
> > >>>>>>>>>
> > >>>>>>>>>  qemu-fe would also not be entirely simple,
> > >>>>>>>
> > >>>>>>>  Indeed.
> > >>>>>>>
> > >>>>>
> > >>>>>  I do like the idea of a privileged qemu-fe performing the open and passing
> > >>>>>  the fd to a restricted qemu.
> > >>> Me too.
> > >>>
> > >>>>>    However, I get the impression that this won't
> > >>>>>  get delivered nearly as quickly as fd: passing could be.  How soon do we
> > >>>>>  need image isolation for NFS?
> > >>>>>
> > >>>>>  Btw, this sounds similar to what Blue Swirl recommended here on v1 of this
> > >>>>>  patch:http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02187.html
> > >>> I was thinking about simply doing fork() + setuid() at some point and
> > >>> using the FD passing structures directly. But would it bring
> > >>> advantages to have two separate executables, are they different from
> > >>> access control point of view vs. single but forked one?
> > >>>
> > >>
> > >> We could put together an SELinux policy that would transition
> > >> qemu-fe to a more restricted domain (ie. no open privilege on NFS
> > >> files) when it executes qemu-system-x86_64.
> > > 
> > > Thinking about this some more, I don't really think the idea of delegating
> > > open of NFS files to a separate qemu-fe is very desirable. Libvirt makes the
> > > decision on the security policy that the VM will run under, and provides
> > > audit records to log what resources are being assigned to the VM. From that
> > > point onwards, we must be able to guarantee that MAC will be enforced on
> > > the VM, according to what we logged via the auditd system.
> > > 
> > > In the case where we delegate opening of the files to qemu-fe, and allow
> > > its policy to open NFS files, we no longer have a guarentee that the MAC
> > > policy will be enforced as we originally intended. Yes, qemu-fe will very
> > > likely honour what we tell it and open the correct files, and yes qmeu-fe
> > > has lower attack surface wrt the guest than the real qemu does, but we
> > > still loose the guarentee of MAC enforcement from libvirt's POV.
> > 
> > On the other hand, from a qemu POV libvirt is only one possible
> > management tool. In practice, another very popular "management tool" for
> > qemu is bash. With qemu-fe all the other tools, including direct
> > invocation from the command line, would get some protection, too.
> 
> That's why I said a qemu-fe like tool need not be mutually exclusive
> with exposing FD passing to a management tool like libvirt. Both
> qemu-fe and libvirt need to some mechanism to pass open FDs to the
> real QEMU.  We could needlessly invent a new communication channel
> between qemu-fe and qemu that only it can use, or we can use the
> channel we already have - QMP - to provide an interface that either
> qemu-fe or libvirt, can use to pass FDs into the real QEMU.

Or to put it another way...

It should be possible to build a 'qemu-fe' tool which does sandboxing
using soley the QEMU command line + QMP monitor. If this is not possible
then, IMHO, QMP should be considered incomplete / a failure, or may point
to other holes in QEMU's mgmt app APIs. eg perhaps it would demonstrate
that we do in fact need a libblockdriver.so for mgmt apps to query info
about disks.

Regards,
Daniel
Corey Bryant Aug. 23, 2011, 4:14 p.m. UTC | #19
On 08/23/2011 11:50 AM, Kevin Wolf wrote:
> Am 23.08.2011 17:26, schrieb Daniel P. Berrange:
>> >  On Tue, Aug 23, 2011 at 11:13:34AM -0400, Corey Bryant wrote:
>>> >>
>>> >>
>>> >>  On 08/22/2011 02:39 PM, Blue Swirl wrote:
>>>> >>>  On Mon, Aug 22, 2011 at 5:42 PM, Corey Bryant<coreyb@linux.vnet.ibm.com>   wrote:
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>>    On 08/22/2011 01:25 PM, Anthony Liguori wrote:
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>    On 08/22/2011 11:50 AM, Daniel P. Berrange wrote:
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>>    On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote:
>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>    I don't think it makes sense to have qemu-fe do dynamic labelling.
>>>>>>>>>>>> >>>>>>>>>>>    You certainly could avoid the fd passing by having qemu-fe do the
>>>>>>>>>>>> >>>>>>>>>>>    open though and just let qemu-fe run without the restricted security
>>>>>>>>>>>> >>>>>>>>>>>    context.
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>>    qemu-fe would also not be entirely simple,
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>    Indeed.
>>>>>>>> >>>>>>>
>>>>>> >>>>>
>>>>>> >>>>>    I do like the idea of a privileged qemu-fe performing the open and passing
>>>>>> >>>>>    the fd to a restricted qemu.
>>>> >>>  Me too.
>>>> >>>
>>>>>> >>>>>      However, I get the impression that this won't
>>>>>> >>>>>    get delivered nearly as quickly as fd: passing could be.  How soon do we
>>>>>> >>>>>    need image isolation for NFS?
>>>>>> >>>>>
>>>>>> >>>>>    Btw, this sounds similar to what Blue Swirl recommended here on v1 of this
>>>>>> >>>>>    patch:http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02187.html
>>>> >>>  I was thinking about simply doing fork() + setuid() at some point and
>>>> >>>  using the FD passing structures directly. But would it bring
>>>> >>>  advantages to have two separate executables, are they different from
>>>> >>>  access control point of view vs. single but forked one?
>>>> >>>
>>> >>
>>> >>  We could put together an SELinux policy that would transition
>>> >>  qemu-fe to a more restricted domain (ie. no open privilege on NFS
>>> >>  files) when it executes qemu-system-x86_64.
>> >
>> >  Thinking about this some more, I don't really think the idea of delegating
>> >  open of NFS files to a separate qemu-fe is very desirable. Libvirt makes the
>> >  decision on the security policy that the VM will run under, and provides
>> >  audit records to log what resources are being assigned to the VM. From that
>> >  point onwards, we must be able to guarantee that MAC will be enforced on
>> >  the VM, according to what we logged via the auditd system.
>> >
>> >  In the case where we delegate opening of the files to qemu-fe, and allow
>> >  its policy to open NFS files, we no longer have a guarentee that the MAC
>> >  policy will be enforced as we originally intended. Yes, qemu-fe will very
>> >  likely honour what we tell it and open the correct files, and yes qmeu-fe
>> >  has lower attack surface wrt the guest than the real qemu does, but we
>> >  still loose the guarentee of MAC enforcement from libvirt's POV.
> On the other hand, from a qemu POV libvirt is only one possible
> management tool. In practice, another very popular "management tool" for
> qemu is bash. With qemu-fe all the other tools, including direct
> invocation from the command line, would get some protection, too.
>
> Kevin

True, but like you said it provides just some protection.  To really be 
useful qemu-fe would need the ability to label qemu guest processes and 
image files to provide MAC isolation.
diff mbox

Patch

diff --git a/block.c b/block.c
index 785c88e..e9e5613 100644
--- a/block.c
+++ b/block.c
@@ -555,7 +555,6 @@  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
     if (flags & BDRV_O_SNAPSHOT) {
         BlockDriverState *bs1;
         int64_t total_size;
-        int is_protocol = 0;
         BlockDriver *bdrv_qcow2;
         QEMUOptionParameter *options;
         char tmp_filename[PATH_MAX];
@@ -573,19 +572,17 @@  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
         }
         total_size = bdrv_getlength(bs1) & BDRV_SECTOR_MASK;
 
-        if (bs1->drv && bs1->drv->protocol_name)
-            is_protocol = 1;
-
         bdrv_delete(bs1);
 
         get_tmp_filename(tmp_filename, sizeof(tmp_filename));
 
         /* Real path is meaningless for protocols */
-        if (is_protocol)
+        if (path_has_protocol(filename)) {
             snprintf(backing_filename, sizeof(backing_filename),
                      "%s", filename);
-        else if (!realpath(filename, backing_filename))
+        } else if (!realpath(filename, backing_filename)) {
             return -errno;
+        }
 
         bdrv_qcow2 = bdrv_find_format("qcow2");
         options = parse_option_parameters("", bdrv_qcow2->create_options, NULL);
diff --git a/block/raw-posix.c b/block/raw-posix.c
index c5c9944..42ae2f4 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -28,6 +28,7 @@ 
 #include "block_int.h"
 #include "module.h"
 #include "block/raw-posix-aio.h"
+#include "monitor.h"
 
 #ifdef CONFIG_COCOA
 #include <paths.h>
@@ -185,7 +186,8 @@  static int raw_open_common(BlockDriverState *bs, const char *filename,
                            int bdrv_flags, int open_flags)
 {
     BDRVRawState *s = bs->opaque;
-    int fd, ret;
+    int fd = -1;
+    int ret;
 
     ret = raw_normalize_devicepath(&filename);
     if (ret != 0) {
@@ -207,13 +209,21 @@  static int raw_open_common(BlockDriverState *bs, const char *filename,
     if (!(bdrv_flags & BDRV_O_CACHE_WB))
         s->open_flags |= O_DSYNC;
 
-    s->fd = -1;
-    fd = qemu_open(filename, s->open_flags, 0644);
-    if (fd < 0) {
-        ret = -errno;
-        if (ret == -EROFS)
-            ret = -EACCES;
-        return ret;
+    if (s->fd == -1) {
+        fd = qemu_open(filename, s->open_flags, 0644);
+        if (fd < 0) {
+            ret = -errno;
+            if (ret == -EROFS) {
+                ret = -EACCES;
+            }
+            return ret;
+        }
+    } else {
+        fd = dup(s->fd);
+        if (fd < 0) {
+            ret = -errno;
+            return ret;
+        }
     }
     s->fd = fd;
     s->aligned_buf = NULL;
@@ -271,6 +281,7 @@  static int raw_open(BlockDriverState *bs, const char *filename, int flags)
 {
     BDRVRawState *s = bs->opaque;
 
+    s->fd = -1;
     s->type = FTYPE_FILE;
     return raw_open_common(bs, filename, flags, 0);
 }
@@ -904,6 +915,74 @@  static BlockDriver bdrv_file = {
     .create_options = raw_create_options,
 };
 
+static int raw_open_fd(BlockDriverState *bs, const char *filename, int flags)
+{
+    BDRVRawState *s = bs->opaque;
+    const char *fd_str;
+    int fd;
+
+    /* extract the file descriptor - fail if it's not fd: */
+    if (!strstart(filename, "fd:", &fd_str)) {
+        return -EINVAL;
+    }
+
+    if (!qemu_isdigit(fd_str[0])) {
+        /* get fd from monitor, but don't remove from monitor's fd list */
+        fd = qemu_get_fd(fd_str, 0);
+        if (fd == -1) {
+            return -EBADF;
+        }
+    } else {
+        char *endptr = NULL;
+
+        fd = strtol(fd_str, &endptr, 10);
+        if (*endptr || (fd == 0 && fd_str == endptr)) {
+            return -EBADF;
+        }
+    }
+
+    s->fd = fd;
+    s->type = FTYPE_FILE;
+
+    return raw_open_common(bs, filename, flags, 0);
+}
+
+static void raw_close_fd(BlockDriverState *bs)
+{
+    const char *fd_str;
+
+    /* if file descriptor is an fdname, remove it from monitor's fd list */
+    if (strstart(bs->filename, "fd:", &fd_str)) {
+        if (!qemu_isdigit(fd_str[0])) {
+            qemu_get_fd(fd_str, 1);
+        }
+    }
+
+    raw_close(bs);
+}
+
+static BlockDriver bdrv_file_fd = {
+    .format_name = "file",
+    .protocol_name = "fd",
+    .instance_size = sizeof(BDRVRawState),
+    .bdrv_probe = NULL, /* no probe for protocols */
+    .bdrv_file_open = raw_open_fd,
+    .bdrv_read = raw_read,
+    .bdrv_write = raw_write,
+    .bdrv_close = raw_close_fd,
+    .bdrv_flush = raw_flush,
+    .bdrv_discard = raw_discard,
+
+    .bdrv_aio_readv = raw_aio_readv,
+    .bdrv_aio_writev = raw_aio_writev,
+    .bdrv_aio_flush = raw_aio_flush,
+
+    .bdrv_truncate = raw_truncate,
+    .bdrv_getlength = raw_getlength,
+
+    .create_options = raw_create_options,
+};
+
 /***********************************************/
 /* host device */
 
@@ -1012,6 +1091,7 @@  static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
     }
 #endif
 
+    s->fd = -1;
     s->type = FTYPE_FILE;
 #if defined(__linux__)
     {
@@ -1184,6 +1264,7 @@  static int floppy_open(BlockDriverState *bs, const char *filename, int flags)
     BDRVRawState *s = bs->opaque;
     int ret;
 
+    s->fd = -1;
     s->type = FTYPE_FD;
 
     /* open will not fail even if no floppy is inserted, so add O_NONBLOCK */
@@ -1302,6 +1383,7 @@  static int cdrom_open(BlockDriverState *bs, const char *filename, int flags)
 {
     BDRVRawState *s = bs->opaque;
 
+    s->fd = -1;
     s->type = FTYPE_CD;
 
     /* open will not fail even if no CD is inserted, so add O_NONBLOCK */
@@ -1527,6 +1609,7 @@  static void bdrv_file_init(void)
      * Register all the drivers.  Note that order is important, the driver
      * registered last will get probed first.
      */
+    bdrv_register(&bdrv_file_fd);
     bdrv_register(&bdrv_file);
     bdrv_register(&bdrv_host_device);
 #ifdef __linux__
diff --git a/migration-fd.c b/migration-fd.c
index aee690a..b1ba625 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -61,7 +61,7 @@  MigrationState *fd_start_outgoing_migration(Monitor *mon,
 
     s = g_malloc0(sizeof(*s));
 
-    s->fd = monitor_get_fd(mon, fdname);
+    s->fd = monitor_get_fd(mon, fdname, 1);
     if (s->fd == -1) {
         DPRINTF("fd_migration: invalid file descriptor identifier\n");
         goto err_after_alloc;
diff --git a/monitor.c b/monitor.c
index 39791dc..d03e762 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1198,21 +1198,21 @@  static int add_graphics_client(Monitor *mon, const QDict *qdict, QObject **ret_d
             qerror_report(QERR_DEVICE_NOT_ACTIVE, "spice");
             return -1;
         }
-	qerror_report(QERR_ADD_CLIENT_FAILED);
-	return -1;
+        qerror_report(QERR_ADD_CLIENT_FAILED);
+        return -1;
 #ifdef CONFIG_VNC
     } else if (strcmp(protocol, "vnc") == 0) {
-	int fd = monitor_get_fd(mon, fdname);
-	vnc_display_add_client(NULL, fd, skipauth);
-	return 0;
+        int fd = monitor_get_fd(mon, fdname, 1);
+        vnc_display_add_client(NULL, fd, skipauth);
+        return 0;
 #endif
     } else if ((s = qemu_chr_find(protocol)) != NULL) {
-	int fd = monitor_get_fd(mon, fdname);
-	if (qemu_chr_add_client(s, fd) < 0) {
-	    qerror_report(QERR_ADD_CLIENT_FAILED);
-	    return -1;
-	}
-	return 0;
+        int fd = monitor_get_fd(mon, fdname, 1);
+        if (qemu_chr_add_client(s, fd) < 0) {
+            qerror_report(QERR_ADD_CLIENT_FAILED);
+            return -1;
+        }
+        return 0;
     }
 
     qerror_report(QERR_INVALID_PARAMETER, "protocol");
@@ -2833,7 +2833,7 @@  static void do_loadvm(Monitor *mon, const QDict *qdict)
     }
 }
 
-int monitor_get_fd(Monitor *mon, const char *fdname)
+int monitor_get_fd(Monitor *mon, const char *fdname, unsigned char remove)
 {
     mon_fd_t *monfd;
 
@@ -2846,10 +2846,12 @@  int monitor_get_fd(Monitor *mon, const char *fdname)
 
         fd = monfd->fd;
 
-        /* caller takes ownership of fd */
-        QLIST_REMOVE(monfd, next);
-        g_free(monfd->name);
-        g_free(monfd);
+        if (remove) {
+            /* caller takes ownership of fd */
+            QLIST_REMOVE(monfd, next);
+            g_free(monfd->name);
+            g_free(monfd);
+        }
 
         return fd;
     }
@@ -2857,6 +2859,11 @@  int monitor_get_fd(Monitor *mon, const char *fdname)
     return -1;
 }
 
+int qemu_get_fd(const char *fdname, unsigned char remove)
+{
+    return cur_mon ? monitor_get_fd(cur_mon, fdname, remove) : -1;
+}
+
 static const mon_cmd_t mon_cmds[] = {
 #include "hmp-commands.h"
     { NULL, NULL, },
diff --git a/monitor.h b/monitor.h
index 4f2d328..f18de19 100644
--- a/monitor.h
+++ b/monitor.h
@@ -50,7 +50,8 @@  int monitor_read_bdrv_key_start(Monitor *mon, BlockDriverState *bs,
                                 BlockDriverCompletionFunc *completion_cb,
                                 void *opaque);
 
-int monitor_get_fd(Monitor *mon, const char *fdname);
+int monitor_get_fd(Monitor *mon, const char *fdname, unsigned char remove);
+int qemu_get_fd(const char *fdname, unsigned char remove);
 
 void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
     GCC_FMT_ATTR(2, 0);
diff --git a/net.c b/net.c
index d05930c..3482598 100644
--- a/net.c
+++ b/net.c
@@ -727,7 +727,7 @@  int net_handle_fd_param(Monitor *mon, const char *param)
 
     if (!qemu_isdigit(param[0]) && mon) {
 
-        fd = monitor_get_fd(mon, param);
+        fd = monitor_get_fd(mon, param, 1);
         if (fd == -1) {
             error_report("No file descriptor named %s found", param);
             return -1;
diff --git a/qemu-options.hx b/qemu-options.hx
index d86815d..869320b 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -131,7 +131,7 @@  using @file{/dev/cdrom} as filename (@pxref{host_drives}).
 ETEXI
 
 DEF("drive", HAS_ARG, QEMU_OPTION_drive,
-    "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
+    "-drive [file=[fd:]file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
     "       [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
     "       [,cache=writethrough|writeback|none|unsafe][,format=f]\n"
     "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
@@ -144,10 +144,12 @@  STEXI
 Define a new drive. Valid options are:
 
 @table @option
-@item file=@var{file}
+@item file=[fd:]@var{file}
 This option defines which disk image (@pxref{disk_images}) to use with
 this drive. If the filename contains comma, you must double it
-(for instance, "file=my,,file" to use file "my,file").
+(for instance, "file=my,,file" to use file "my,file"). @option{fd:}@var{file}
+specifies the file descriptor of an already open disk image.
+@option{format=}@var{format} is required by @option{fd:}@var{file}.
 @item if=@var{interface}
 This option defines on which type on interface the drive is connected.
 Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio.
diff --git a/qemu-tool.c b/qemu-tool.c
index eb89fe0..a2a0129 100644
--- a/qemu-tool.c
+++ b/qemu-tool.c
@@ -96,3 +96,8 @@  int64_t qemu_get_clock_ns(QEMUClock *clock)
 {
     return 0;
 }
+
+int qemu_get_fd(const char *fdname, unsigned char remove)
+{
+    return -1;
+}