Message ID | 20230517123752.21615-1-vsementsov@yandex-team.ru |
---|---|
Series | Restore vmstate on cancelled/failed migration |

Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> wrote:
> Hi all.
>
> The problem I want to solve is that guest-panicked state may be lost
> when migration is failed (or cancelled) after source stop.
>
> Still, I try to go further and restore all possible paused states in the
> same way. The key patch is the last one and others are refactoring and
> preparation.

Hi

I like and agree with the spirit of the series in general. But I think
that we need to drop the "never fail in global_state_store()". We
shouldn't kill a guest because we found a bug on migration.

Later, Juan.

On 18.05.23 14:23, Juan Quintela wrote:
> Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> wrote:
>> Hi all.
>>
>> The problem I want to solve is that guest-panicked state may be lost
>> when migration is failed (or cancelled) after source stop.
>>
>> Still, I try to go further and restore all possible paused states in the
>> same way. The key patch is the last one and others are refactoring and
>> preparation.
>
> Hi
>
> I like and agree with the spirit of the series in general. But I think
> that we need to drop the "never fail in global_state_store()". We
> shouldn't kill a guest because we found a bug on migration.
>

Why migration is better in this sense than non-migration? We have a lot
of places where we just assert things instead of creating unreachable
error messages. I think assert/abort is always better in such cases.
Really, if we fail in this assertion it means that memory is corrupted,
and stopping the execution is the best thing to do.

(Should we consider the case that in future we add 100 character length
vmstate? I hope we should not)

Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> wrote:
> On 18.05.23 14:23, Juan Quintela wrote:
>> Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> wrote:
>>> Hi all.
>>>
>>> The problem I want to solve is that guest-panicked state may be lost
>>> when migration is failed (or cancelled) after source stop.
>>>
>>> Still, I try to go further and restore all possible paused states in the
>>> same way. The key patch is the last one and others are refactoring and
>>> preparation.
>> Hi
>> I like and agree with the spirit of the series in general. But I think
>> that we need to drop the "never fail in global_state_store()". We
>> shouldn't kill a guest because we found a bug on migration.
>>
>
> Why migration is better in this sense than non-migration? We have a
> lot of places where we just assert things instead of creating
> unreachable error messages. I think assert/abort is always better in
> such cases. Really, if we fail in this assertion it means that memory
> is corrupted, and stopping the execution is the best thing to do.
>
> (Should we consider the case that in future we add 100 character
> length vmstate? I hope we should not)

Ok, I give up and integrate the series as they are O:-)

I agree that this is a case that shouldn't happen, so assert() is not as
out of question.

What I am trying to get migration is to really detect errors and be able
to recover from them. My long term crusade is getting rid of
qemu_file_get_error() and just check the return value for functions that
do IO. Yes, it is a big long term because we need to change the whole
interface to something saner.

Later, Juan.
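
For readers following the assert-versus-error-return question, the two
styles discussed above can be sketched roughly as follows. This is only
an illustration and not the QEMU code: the helper names, the buffer
size macro and the choice of -EINVAL are all assumptions made for the
example. The first helper reports an over-long state name to the caller,
which can then fail the migration and keep the guest running. The second
treats the same condition as a programming error and aborts.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size of the fixed buffer that holds the state name. */
#define STATE_BUF_LEN 100

/*
 * Style 1 (hypothetical helper): return an error when the name does
 * not fit, so the caller decides how to recover.
 */
static int store_state_checked(char *buf, size_t len, const char *name)
{
    size_t n = strlen(name);

    if (n >= len) {
        return -EINVAL;         /* no room for the name plus the NUL */
    }
    memcpy(buf, name, n + 1);   /* copy including the terminating NUL */
    return 0;
}

/*
 * Style 2 (hypothetical helper): the name is expected to always fit,
 * so a violation is treated as a bug and the process aborts.
 */
static void store_state_asserted(char *buf, size_t len, const char *name)
{
    size_t n = strlen(name);

    assert(n < len);
    memcpy(buf, name, n + 1);
}
```

With the first style every caller has to check and propagate the return
value; with the second the call cannot fail, but a violated assumption
takes the whole process down, guest included.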