| Message ID | 1271797792-24571-5-git-send-email-lcapitulino@redhat.com |
|---|---|
| State | New |
Luiz Capitulino <lcapitulino@redhat.com> wrote:

> do_loadvm(), which implements the 'loadvm' Monitor command, pauses the emulation to load the saved VM; however, it will only resume it if the loading succeeds.
>
> In other words, if the user issues 'loadvm' and it fails, the end result will be an error message and a paused VM.
>
> This seems an undesirable side effect to me because, most of the time, if a Monitor command fails the best thing we can do is to leave the VM as it was before the command was executed.
>
> FIXME: This will try to run a potentially corrupted image; the solution is to split load_vmstate() in two and only keep the VM paused if qemu_loadvm_state() fails.

Any of the other errors in loadvm also requires you not to load the state.

> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>

Nack.

This can cause disk corruption. You tried to load a VM image, failed the load somehow (notice the "somehow"), and you try to run it anyway. That is a recipe for disaster. If load_vmstate() fails -> you don't run.

The current code is a mess, but don't make it worse.

Later, Juan.
On Tue, 20 Apr 2010 23:28:23 +0200 Juan Quintela <quintela@redhat.com> wrote:

> Luiz Capitulino <lcapitulino@redhat.com> wrote:
> > do_loadvm(), which implements the 'loadvm' Monitor command, pauses the emulation to load the saved VM; however, it will only resume it if the loading succeeds.
> >
> > In other words, if the user issues 'loadvm' and it fails, the end result will be an error message and a paused VM.
> >
> > This seems an undesirable side effect to me because, most of the time, if a Monitor command fails the best thing we can do is to leave the VM as it was before the command was executed.
> >
> > FIXME: This will try to run a potentially corrupted image; the solution is to split load_vmstate() in two and only keep the VM paused if qemu_loadvm_state() fails.
>
> Any of the other errors in loadvm also requires you not to load the state.

Really? Everything that happens before qemu_fopen_bdrv() seems to be only looking for the snapshot..

> > Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
>
> Nack.
>
> This can cause disk corruption. You tried to load a VM image, failed the load somehow (notice the "somehow"), and you try to run it anyway. That is a recipe for disaster. If load_vmstate() fails -> you don't run.

My understanding is that the loading only happens in qemu_loadvm_state(), is this wrong?

If it isn't, my plan is to split load_vmstate() in two functions:

 - load_vmstate_prepare(): everything before qemu_fopen_bdrv()
 - load_vmstate_finish(): the qemu_loadvm_state() block

Then, do_loadvm() would do:

    err = load_vmstate_prepare();
    if (err && vm_running) {
        vm_start();
        return -1;
    }

    err = load_vmstate_finish();
    if (err) {
        return -1;
    }

    vm_start();
    return 0;

And load_vmstate() would just call prepare() and finish(), maintaining its current behavior.

It's important to understand why this is needed: currently there is a 'return 0' at line savevm.c:1872. It's an error path, but I'm almost sure this trick is needed because if you return -1 there and loadvm fails for trivial reasons (like a snapshot that is not found) you get a paused VM.
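A minimal sketch of the split described above, under the proposal's own assumptions (the function names are Luiz's, the bodies are illustrative only, error reporting is stripped down, and nothing like this exists in the tree):

    /* Phase 1: everything up to qemu_fopen_bdrv(): find and activate the
     * snapshot on every snapshot-capable drive. */
    static int load_vmstate_prepare(const char *name, BlockDriverState **vmstate_bs)
    {
        BlockDriverState *bs, *bs1;
        DriveInfo *dinfo;
        int ret;

        bs = get_bs_snapshots();
        if (!bs) {
            error_report("No block device supports snapshots");
            return -EINVAL;
        }

        /* Flush all IO requests so they don't interfere with the new state. */
        qemu_aio_flush();

        QTAILQ_FOREACH(dinfo, &drives, next) {
            bs1 = dinfo->bdrv;
            if (bdrv_has_snapshot(bs1)) {
                ret = bdrv_snapshot_goto(bs1, name);
                if (ret < 0 && bs == bs1) {
                    return ret; /* fatal on the snapshot block device */
                }
            }
        }

        *vmstate_bs = bs;
        return 0;
    }

    /* Phase 2: read the machine state; a failure here leaves the VM paused. */
    static int load_vmstate_finish(BlockDriverState *bs)
    {
        QEMUFile *f;
        int ret;

        f = qemu_fopen_bdrv(bs, 0);
        if (!f) {
            error_report("Could not open VM state file");
            return -EINVAL;
        }

        ret = qemu_loadvm_state(f);
        qemu_fclose(f);
        return ret < 0 ? -EINVAL : 0;
    }

Note, as Juan argues below, that bdrv_snapshot_goto() already modifies disk state, so this particular cut does not make the prepare phase side-effect free; that is the core of the disagreement in the rest of the thread.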
Luiz Capitulino <lcapitulino@redhat.com> wrote:
> On Tue, 20 Apr 2010 23:28:23 +0200 Juan Quintela <quintela@redhat.com> wrote:
>
>> Luiz Capitulino <lcapitulino@redhat.com> wrote:
>> > do_loadvm(), which implements the 'loadvm' Monitor command, pauses the emulation to load the saved VM; however, it will only resume it if the loading succeeds.
>> >
>> > In other words, if the user issues 'loadvm' and it fails, the end result will be an error message and a paused VM.
>> >
>> > This seems an undesirable side effect to me because, most of the time, if a Monitor command fails the best thing we can do is to leave the VM as it was before the command was executed.
>> >
>> > FIXME: This will try to run a potentially corrupted image; the solution is to split load_vmstate() in two and only keep the VM paused if qemu_loadvm_state() fails.
>>
>> Any of the other errors in loadvm also requires you not to load the state.
>
> Really? Everything that happens before qemu_fopen_bdrv() seems to be only looking for the snapshot..

Let's see:

    bs = get_bs_snapshots();
    if (!bs) {
        error_report("No block device supports snapshots");
        return -EINVAL;
    }

//// If we are asked to load a VM and there are no snapshots on any disk,
//// trying to run the image looks like overkill.

    /* Flush all IO requests so they don't interfere with the new state. */
    qemu_aio_flush();

    QTAILQ_FOREACH(dinfo, &drives, next) {
        bs1 = dinfo->bdrv;
        if (bdrv_has_snapshot(bs1)) {

/// We found a device that has snapshots.

            ret = bdrv_snapshot_goto(bs1, name);
            if (ret < 0) {

/// ...and it doesn't have a snapshot with the name that we wanted.

                switch(ret) {
                case -ENOTSUP:
                    error_report("%sSnapshots not supported on device '%s'",
                                 bs != bs1 ? "Warning: " : "",
                                 bdrv_get_device_name(bs1));
                    break;
                case -ENOENT:
                    error_report("%sCould not find snapshot '%s' on device '%s'",
                                 bs != bs1 ? "Warning: " : "",
                                 name, bdrv_get_device_name(bs1));
                    break;
                default:
                    error_report("%sError %d while activating snapshot on '%s'",
                                 bs != bs1 ? "Warning: " : "",
                                 ret, bdrv_get_device_name(bs1));
                    break;
                }
                /* fatal on snapshot block device */

// I think that one unconditional exit with prejudice could be in order here.

// Notice that bdrv_snapshot_goto() modifies the disk, which is as bad as it
// gets. It just opens the disk, opens the snapshot, increases its user
// counter, and makes it available for use from here on (i.e. for loading
// state, possibly conflicting with the previously running VM, a.k.a. disk
// corruption).

                if (bs == bs1)
                    return 0;

// This error is as bad as it can get :( We have to load a vmstate, and the
// disk that should have the memory image doesn't have it. This is an error;
// I just put the wrong number the previous time. Notice that this error
// should be very rare.
            }
        }
    }

As stated, I don't think that trying to run the machine at any point would make any sense. The only case where it is safe to run it is if the failure is at get_bs_snapshots(), but at that point running the machine means:

    <something happens>
    $ loadvm other_image
    Error "other_image" snapshot doesn't exist.
    $

Running the previous VM looks like something that should be done explicitly. If the error happened after get_bs_snapshots(), we would need a new flag to just refuse to continue. The only valid operations at that point are other loadvm operations, i.e. our state is wrong one way or another.

> My understanding is that the loading only happens in qemu_loadvm_state(), is this wrong?

As stated in the description, that split doesn't make sense.

In any case, what we need is a new flag that disallows run operations other than loadvm.

Later, Juan.
On 20.04.2010 23:09, Luiz Capitulino wrote:
> do_loadvm(), which implements the 'loadvm' Monitor command, pauses the emulation to load the saved VM; however, it will only resume it if the loading succeeds.
>
> In other words, if the user issues 'loadvm' and it fails, the end result will be an error message and a paused VM.
>
> This seems an undesirable side effect to me because, most of the time, if a Monitor command fails the best thing we can do is to leave the VM as it was before the command was executed.

I completely agree with Juan here, this is broken.

If you could leave the VM as it was before, just like you describe above, everything would be okay. But in fact you can't tell where the error occurred. You may still have the old state; or you may have loaded the snapshot on one disk, but not on the other one; or you may have loaded snapshots for all disks, but only half of the devices.

We must not run a machine in such an unknown state. I'd even go further and say that after a failed loadvm we must prevent the user from using the cont command to resume manually.

> FIXME: This will try to run a potentially corrupted image; the solution is to split load_vmstate() in two and only keep the VM paused if qemu_loadvm_state() fails.

Your suggestion of having a prepare function that doesn't change any state looks fine to me. It could just check that the snapshot is there and contains VM state. This should cover all of the trivial cases where recovery is really as easy as resuming the old state.

Another problem that I see is that it's really hard to define what an error is. The current code prints "Warning: ..." for all non-primary images and continues with loading the snapshot. I'm really unsure what the right behaviour would be if a snapshot doesn't exist on a secondary image (note that this is not the CD-ROM case; those drives are already excluded by bdrv_has_snapshot as they are read-only).

Kevin
On Wed, 21 Apr 2010 10:36:29 +0200 Juan Quintela <quintela@redhat.com> wrote:

>     QTAILQ_FOREACH(dinfo, &drives, next) {
>         bs1 = dinfo->bdrv;
>         if (bdrv_has_snapshot(bs1)) {
>
> /// We found a device that has snapshots.
>
>             ret = bdrv_snapshot_goto(bs1, name);
>             if (ret < 0) {
>
> /// ...and it doesn't have a snapshot with the name that we wanted.
>
>                 switch(ret) {
>                 case -ENOTSUP:
>                     error_report("%sSnapshots not supported on device '%s'",
>                                  bs != bs1 ? "Warning: " : "",
>                                  bdrv_get_device_name(bs1));
>                     break;
>                 case -ENOENT:
>                     error_report("%sCould not find snapshot '%s' on device '%s'",
>                                  bs != bs1 ? "Warning: " : "",
>                                  name, bdrv_get_device_name(bs1));
>                     break;
>                 default:
>                     error_report("%sError %d while activating snapshot on '%s'",
>                                  bs != bs1 ? "Warning: " : "",
>                                  ret, bdrv_get_device_name(bs1));
>                     break;
>                 }
>                 /* fatal on snapshot block device */
>
> // I think that one unconditional exit with prejudice could be in order here.
>
> // Notice that bdrv_snapshot_goto() modifies the disk, which is as bad as it
> // gets. It just opens the disk, opens the snapshot, increases its user
> // counter, and makes it available for use from here on (i.e. for loading
> // state, possibly conflicting with the previously running VM, a.k.a. disk
> // corruption).
>
>                 if (bs == bs1)
>                     return 0;
>
> // This error is as bad as it can get :( We have to load a vmstate, and the
> // disk that should have the memory image doesn't have it. This is an error;
> // I just put the wrong number the previous time. Notice that this error
> // should be very rare.

So, the current code is buggy and if you fix it (by returning -1) you'll get another bug: loadvm will stop the VM for trivial errors like a snapshot that is not found. How do you plan to fix this?

> As stated, I don't think that trying to run the machine at any point would make any sense. The only case where it is safe to run it is if the failure is at get_bs_snapshots(), but at that point running the machine means:

Actually, it must not pause the VM when recovery is (clearly) possible, otherwise it's a usability bug for the user Monitor and a possibly serious bug when you don't have human intervention (e.g. QMP).

>     <something happens>
>     $ loadvm other_image
>     Error "other_image" snapshot doesn't exist.
>     $
>
> Running the previous VM looks like something that should be done explicitly. If the error happened after get_bs_snapshots(), we would need a new flag to just refuse to continue. The only valid operations at that point are other loadvm operations, i.e. our state is wrong one way or another.

It's not clear to me how this flag can help, but anyway, what we need here is:

1. Fail when failure is reported (vs. report a failure and return OK)
2. Don't keep the VM paused when recovery is possible

If you can fix that, it's OK with me: I'll drop this and the next patch.

Otherwise I'll have to insist on the split.
On Wed, 21 Apr 2010 15:28:16 +0200 Kevin Wolf <kwolf@redhat.com> wrote:

> On 20.04.2010 23:09, Luiz Capitulino wrote:
> > do_loadvm(), which implements the 'loadvm' Monitor command, pauses the emulation to load the saved VM; however, it will only resume it if the loading succeeds.
> >
> > In other words, if the user issues 'loadvm' and it fails, the end result will be an error message and a paused VM.
> >
> > This seems an undesirable side effect to me because, most of the time, if a Monitor command fails the best thing we can do is to leave the VM as it was before the command was executed.
>
> I completely agree with Juan here, this is broken.

Yeah, it's an RFC. I decided to send it as is because I needed feedback, as this series is a snowball.

> If you could leave the VM as it was before, just like you describe above, everything would be okay. But in fact you can't tell where the error occurred. You may still have the old state; or you may have loaded the snapshot on one disk, but not on the other one; or you may have loaded snapshots for all disks, but only half of the devices.
>
> We must not run a machine in such an unknown state. I'd even go further and say that after a failed loadvm we must prevent the user from using the cont command to resume manually.

Maybe 'info status' should have a specific status for this too.

(Assuming we don't want to radically call exit(1)).

> > FIXME: This will try to run a potentially corrupted image; the solution is to split load_vmstate() in two and only keep the VM paused if qemu_loadvm_state() fails.
>
> Your suggestion of having a prepare function that doesn't change any state looks fine to me. It could just check that the snapshot is there and contains VM state. This should cover all of the trivial cases where recovery is really as easy as resuming the old state.

That's exactly what I want to do.

> Another problem that I see is that it's really hard to define what an error is. The current code prints "Warning: ..." for all non-primary images and continues with loading the snapshot. I'm really unsure what the right behaviour would be if a snapshot doesn't exist on a secondary image (note that this is not the CD-ROM case; those drives are already excluded by bdrv_has_snapshot as they are read-only).

Defining the right behavior and deciding what to expose have proven to be very difficult tasks in the QMP world.
On 21.04.2010 17:08, Luiz Capitulino wrote:
> On Wed, 21 Apr 2010 15:28:16 +0200 Kevin Wolf <kwolf@redhat.com> wrote:
>
>> On 20.04.2010 23:09, Luiz Capitulino wrote:
>>> do_loadvm(), which implements the 'loadvm' Monitor command, pauses the emulation to load the saved VM; however, it will only resume it if the loading succeeds.
>>>
>>> In other words, if the user issues 'loadvm' and it fails, the end result will be an error message and a paused VM.
>>>
>>> This seems an undesirable side effect to me because, most of the time, if a Monitor command fails the best thing we can do is to leave the VM as it was before the command was executed.
>>
>> I completely agree with Juan here, this is broken.
>
> Yeah, it's an RFC. I decided to send it as is because I needed feedback, as this series is a snowball.

Which is the right thing to do, so we can find such problems. :-)

>> If you could leave the VM as it was before, just like you describe above, everything would be okay. But in fact you can't tell where the error occurred. You may still have the old state; or you may have loaded the snapshot on one disk, but not on the other one; or you may have loaded snapshots for all disks, but only half of the devices.
>>
>> We must not run a machine in such an unknown state. I'd even go further and say that after a failed loadvm we must prevent the user from using the cont command to resume manually.
>
> Maybe 'info status' should have a specific status for this too.
>
> (Assuming we don't want to radically call exit(1)).

A different status that disallows cont sounds good. But exit() would work for me as well and is probably easier. However, I'm not sure if it would work well for management.

>>> FIXME: This will try to run a potentially corrupted image; the solution is to split load_vmstate() in two and only keep the VM paused if qemu_loadvm_state() fails.
>>
>> Your suggestion of having a prepare function that doesn't change any state looks fine to me. It could just check that the snapshot is there and contains VM state. This should cover all of the trivial cases where recovery is really as easy as resuming the old state.
>
> That's exactly what I want to do.
>
>> Another problem that I see is that it's really hard to define what an error is. The current code prints "Warning: ..." for all non-primary images and continues with loading the snapshot. I'm really unsure what the right behaviour would be if a snapshot doesn't exist on a secondary image (note that this is not the CD-ROM case; those drives are already excluded by bdrv_has_snapshot as they are read-only).
>
> Defining the right behavior and deciding what to expose have proven to be very difficult tasks in the QMP world.

There's nothing QMP-specific about it. The problem has always been there; QMP just means that you involuntarily get to review all of it.

Kevin
Luiz Capitulino <lcapitulino@redhat.com> wrote:
> On Wed, 21 Apr 2010 10:36:29 +0200 Juan Quintela <quintela@redhat.com> wrote:
>
>>     QTAILQ_FOREACH(dinfo, &drives, next) {
>>         bs1 = dinfo->bdrv;
>>         if (bdrv_has_snapshot(bs1)) {
>>
>> /// We found a device that has snapshots.
>>
>>             ret = bdrv_snapshot_goto(bs1, name);
>>             if (ret < 0) {
>>
>> /// ...and it doesn't have a snapshot with the name that we wanted.
>>
>>                 switch(ret) {
>>                 case -ENOTSUP:
>>                     error_report("%sSnapshots not supported on device '%s'",
>>                                  bs != bs1 ? "Warning: " : "",
>>                                  bdrv_get_device_name(bs1));
>>                     break;
>>                 case -ENOENT:
>>                     error_report("%sCould not find snapshot '%s' on device '%s'",
>>                                  bs != bs1 ? "Warning: " : "",
>>                                  name, bdrv_get_device_name(bs1));
>>                     break;
>>                 default:
>>                     error_report("%sError %d while activating snapshot on '%s'",
>>                                  bs != bs1 ? "Warning: " : "",
>>                                  ret, bdrv_get_device_name(bs1));
>>                     break;
>>                 }
>>                 /* fatal on snapshot block device */
>>
>> // I think that one unconditional exit with prejudice could be in order here.
>>
>> // Notice that bdrv_snapshot_goto() modifies the disk, which is as bad as it
>> // gets. It just opens the disk, opens the snapshot, increases its user
>> // counter, and makes it available for use from here on (i.e. for loading
>> // state, possibly conflicting with the previously running VM, a.k.a. disk
>> // corruption).
>>
>>                 if (bs == bs1)
>>                     return 0;
>>
>> // This error is as bad as it can get :( We have to load a vmstate, and the
>> // disk that should have the memory image doesn't have it. This is an error;
>> // I just put the wrong number the previous time. Notice that this error
>> // should be very rare.
>
> So, the current code is buggy and if you fix it (by returning -1) you'll get another bug: loadvm will stop the VM for trivial errors like a snapshot that is not found.

It is not a trivial error!!!! And worse, it is not recoverable :(

> How do you plan to fix this?

Returning an error and stopping the machine.

>> As stated, I don't think that trying to run the machine at any point would make any sense. The only case where it is safe to run it is if the failure is at get_bs_snapshots(), but at that point running the machine means:
>
> Actually, it must not pause the VM when recovery is (clearly) possible, otherwise it's a usability bug for the user Monitor and a possibly serious bug when you don't have human intervention (e.g. QMP).

It is not possible: we have changed the device status from what it was before. Bets are off. We don't have a way to go back to the "safe state".

>>     <something happens>
>>     $ loadvm other_image
>>     Error "other_image" snapshot doesn't exist.
>>     $
>>
>> Running the previous VM looks like something that should be done explicitly. If the error happened after get_bs_snapshots(), we would need a new flag to just refuse to continue. The only valid operations at that point are other loadvm operations, i.e. our state is wrong one way or another.
>
> It's not clear to me how this flag can help, but anyway, what we need here is:
>
> 1. Fail when failure is reported (vs. report a failure and return OK)

This is a bug, plain and simple.

> 2. Don't keep the VM paused when recovery is possible
>
> If you can fix that, it's OK with me: I'll drop this and the next patch.
>
> Otherwise I'll have to insist on the split.

Re-read my email. At this point, nothing is fixable :( After doing the first:

>>             ret = bdrv_snapshot_goto(bs1, name);

and it not returning an error -> state has changed, period. You can't restart the machine.

If you prefer, you can change loadvm in a way that after a failure -> you can't "cont" until you get a "working" loadvm.

Later, Juan.
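A minimal sketch of the kind of guard being proposed here (entirely hypothetical: the flag name, where it lives, and the handler shapes are assumptions, not existing QEMU code; only load_vmstate(), vm_start() and error_report() are taken from the thread):

    /* Set when the guest's state can no longer be trusted, e.g. after a
     * failed loadvm or a completed outgoing migration. */
    static bool vmstate_untrusted;

    static void do_loadvm_sketch(const char *name, int saved_vm_running)
    {
        if (load_vmstate(name) < 0) {
            /* Disks/devices may be half-switched to the snapshot: refuse to run. */
            vmstate_untrusted = true;
            return;
        }
        vmstate_untrusted = false;
        if (saved_vm_running) {
            vm_start();
        }
    }

    static int do_cont_sketch(void)
    {
        if (vmstate_untrusted) {
            error_report("VM state is inconsistent after a failed loadvm; "
                         "load a snapshot before continuing");
            return -1;
        }
        vm_start();
        return 0;
    }

Whether such a flag should be absolute or overridable (the "I_know_what_I_am_doing" variant discussed below) is exactly the open question in the following messages.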
On 21.04.2010 17:39, Juan Quintela wrote:
> Luiz Capitulino <lcapitulino@redhat.com> wrote:
>> 2. Don't keep the VM paused when recovery is possible
>>
>> If you can fix that, it's OK with me: I'll drop this and the next patch.
>>
>> Otherwise I'll have to insist on the split.
>
> Re-read my email. At this point, nothing is fixable :( After doing the first:
>
>>>             ret = bdrv_snapshot_goto(bs1, name);
>
> and it not returning an error -> state has changed, period. You can't restart the machine.

Right, at this point. But the most likely error for bdrv_snapshot_goto is that the snapshot doesn't even exist. This is something that you can check before you change any state. I think this is what Luiz meant (at least it is what I said and he agreed).

Kevin
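A sketch of the kind of pre-flight check Kevin describes, done before any bdrv_snapshot_goto() call (the helper names are made up; it leans on the public bdrv_snapshot_list() and on the drives list used elsewhere in this thread):

    /* Return 1 if 'bs' has a snapshot whose id or name matches 'name'. */
    static int snapshot_exists_on_drive(BlockDriverState *bs, const char *name)
    {
        QEMUSnapshotInfo *sn_tab, *sn;
        int nb_sns, i, found = 0;

        nb_sns = bdrv_snapshot_list(bs, &sn_tab);
        if (nb_sns < 0) {
            return 0;
        }
        for (i = 0; i < nb_sns; i++) {
            sn = &sn_tab[i];
            if (!strcmp(sn->id_str, name) || !strcmp(sn->name, name)) {
                found = 1;
                break;
            }
        }
        qemu_free(sn_tab);
        return found;
    }

    /* Check every snapshot-capable drive before touching any disk state. */
    static int check_snapshots_exist(const char *name)
    {
        DriveInfo *dinfo;

        QTAILQ_FOREACH(dinfo, &drives, next) {
            if (bdrv_has_snapshot(dinfo->bdrv) &&
                !snapshot_exists_on_drive(dinfo->bdrv, name)) {
                error_report("Could not find snapshot '%s' on device '%s'",
                             name, bdrv_get_device_name(dinfo->bdrv));
                return -ENOENT;
            }
        }
        return 0;
    }

This sketch treats a missing snapshot on any drive as fatal; whether a non-primary image should only produce a warning, as the current code does, is the behaviour question Kevin raises above.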
Luiz Capitulino <lcapitulino@redhat.com> wrote:
> On Wed, 21 Apr 2010 15:28:16 +0200 Kevin Wolf <kwolf@redhat.com> wrote:
>
>> On 20.04.2010 23:09, Luiz Capitulino wrote:
>> > do_loadvm(), which implements the 'loadvm' Monitor command, pauses the emulation to load the saved VM; however, it will only resume it if the loading succeeds.
>> >
>> > In other words, if the user issues 'loadvm' and it fails, the end result will be an error message and a paused VM.
>> >
>> > This seems an undesirable side effect to me because, most of the time, if a Monitor command fails the best thing we can do is to leave the VM as it was before the command was executed.
>>
>> I completely agree with Juan here, this is broken.
>
> Yeah, it's an RFC. I decided to send it as is because I needed feedback, as this series is a snowball.
>
>> If you could leave the VM as it was before, just like you describe above, everything would be okay. But in fact you can't tell where the error occurred. You may still have the old state; or you may have loaded the snapshot on one disk, but not on the other one; or you may have loaded snapshots for all disks, but only half of the devices.
>>
>> We must not run a machine in such an unknown state. I'd even go further and say that after a failed loadvm we must prevent the user from using the cont command to resume manually.
>
> Maybe 'info status' should have a specific status for this too.
>
> (Assuming we don't want to radically call exit(1)).

I tried a variation of this in the past, and there was no clear agreement.

Basically, after a working migration to another host, you don't want to allow "cont" on the source node (if the target has changed anything, it would give disk corruption). But my suggestion to disable "cont" after that got complaints that people wanted an "I_know_what_I_am_doing_cont" (not the real syntax). Perhaps it is time to revisit this issue?

Later, Juan.
Kevin Wolf <kwolf@redhat.com> wrote:
> On 21.04.2010 17:08, Luiz Capitulino wrote:
>
> A different status that disallows cont sounds good. But exit() would work for me as well and is probably easier. However, I'm not sure if it would work well for management.

I think that management would prefer it. It is developers who try to loadvm several times on the same machine. For management, if loadvm fails, having to restart qemu is the least of its problems; they already have the infrastructure to kill processes and rerun (probably with something tweaked).

>> Defining the right behavior and deciding what to expose have proven to be very difficult tasks in the QMP world.
>
> There's nothing QMP-specific about it. The problem has always been there; QMP just means that you involuntarily get to review all of it.

And we are thankful for the review :)

Later, Juan.
Juan Quintela wrote:
> Luiz Capitulino <lcapitulino@redhat.com> wrote:
> > On Wed, 21 Apr 2010 15:28:16 +0200 Kevin Wolf <kwolf@redhat.com> wrote:
>
> I tried a variation of this in the past, and there was no clear agreement.
>
> Basically, after a working migration to another host, you don't want to allow "cont" on the source node (if the target has changed anything, it would give disk corruption).

This is not true if the target is using a copy of the disk. Making copies is cheap on some hosts (Linux btrfs with its COW features). Forking a guest can be handy for testing things, starting from a known run state. The only thing that gets confused is networking, because of duplicate addresses, and there are host-side ways around that (Linux network namespaces).

If I understand correctly, we can already do this by migrating to a file and copying the files. There's no reason to block the live equivalent, provided there is a way to copy the disk image when it's quiesced.

So it's wrong to block "cont" on the source, but "cont --I_know_what_I_am_doing" might be good advice :-)

> But my suggestion to disable "cont" after that got complaints that people wanted an "I_know_what_I_am_doing_cont" (not the real syntax). Perhaps it is time to revisit this issue?

-- Jamie
On Wed, 21 Apr 2010 17:42:54 +0200 Kevin Wolf <kwolf@redhat.com> wrote:

> On 21.04.2010 17:39, Juan Quintela wrote:
> > Luiz Capitulino <lcapitulino@redhat.com> wrote:
> >> 2. Don't keep the VM paused when recovery is possible
> >>
> >> If you can fix that, it's OK with me: I'll drop this and the next patch.
> >>
> >> Otherwise I'll have to insist on the split.
> >
> > Re-read my email. At this point, nothing is fixable :( After doing the first:
> >
> >>>             ret = bdrv_snapshot_goto(bs1, name);
> >
> > and it not returning an error -> state has changed, period. You can't restart the machine.
>
> Right, at this point. But the most likely error for bdrv_snapshot_goto is that the snapshot doesn't even exist. This is something that you can check before you change any state. I think this is what Luiz meant (at least it is what I said and he agreed).

Yes, that's it.
diff --git a/monitor.c b/monitor.c
index 975e77c..542c858 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2472,8 +2472,11 @@ static void do_loadvm(Monitor *mon, const QDict *qdict)
 
     vm_stop(0);
 
-    if (load_vmstate(name) >= 0 && saved_vm_running)
+    load_vmstate(name);
+
+    if (saved_vm_running) {
         vm_start();
+    }
 }
 
 int monitor_get_fd(Monitor *mon, const char *fdname)
do_loadvm(), which implements the 'loadvm' Monitor command, pauses the emulation to load the saved VM; however, it will only resume it if the loading succeeds.

In other words, if the user issues 'loadvm' and it fails, the end result will be an error message and a paused VM.

This seems an undesirable side effect to me because, most of the time, if a Monitor command fails the best thing we can do is to leave the VM as it was before the command was executed.

FIXME: This will try to run a potentially corrupted image; the solution is to split load_vmstate() in two and only keep the VM paused if qemu_loadvm_state() fails.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
 monitor.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)
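For comparison, a sketch of the do_loadvm() shape the FIXME above points toward, assuming the check-only prepare phase discussed in the thread: load_vmstate_prepare() would only verify that the snapshot exists everywhere, load_vmstate_finish() would do the actual snapshot switch and state load. Both helpers are hypothetical and do not exist in the tree; qdict_get_str(), vm_running, vm_stop() and vm_start() are used as in the patch:

    static void do_loadvm(Monitor *mon, const QDict *qdict)
    {
        int saved_vm_running = vm_running;
        const char *name = qdict_get_str(qdict, "name");

        vm_stop(0);

        if (load_vmstate_prepare(name) < 0) {
            /* Check-only phase failed: no state was touched, so it is
             * safe to resume the previous run state. */
            if (saved_vm_running) {
                vm_start();
            }
            return;
        }

        if (load_vmstate_finish(name) < 0) {
            /* Snapshot activation or state load failed: disk/device
             * state may be inconsistent, so stay paused. */
            return;
        }

        if (saved_vm_running) {
            vm_start();
        }
    }

Whether "stay paused" should additionally forbid a plain cont, as Juan and Kevin argue, is the follow-up question this sketch does not decide.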