Message ID: 1361352723-218544-4-git-send-email-dietmar@proxmox.com
State: New
On Wed, Feb 20, 2013 at 10:32:00AM +0100, Dietmar Maurer wrote: > We use a generic BackupDriver struct to encapsulate all archive format > related function. > > Another option would be to simply dump <devid,cluster_num,cluster_data> to > the output fh (pipe), and an external binary saves the data. That way we > could move the whole archive format related code out of qemu. That sounds like the NBD option - write the backup to an NBD disk image. The NBD server process can take the WRITE requests and do whatever it wants. The disadvantage of using NBD is that it strictly transports block data. It doesn't really have the concept of VM configuration or device state. > diff --git a/backup.h b/backup.h > index d9395bc..c8ba153 100644 > --- a/backup.h > +++ b/backup.h > @@ -29,4 +29,16 @@ int backup_job_create(BlockDriverState *bs, BackupDumpFunc *backup_dump_cb, > BlockDriverCompletionFunc *backup_complete_cb, > void *opaque, int64_t speed); > > +typedef struct BackupDriver { > + const char *format; > + void *(*open_cb)(const char *filename, uuid_t uuid, Error **errp); > + int (*close_cb)(void *opaque, Error **errp); > + int (*register_config_cb)(void *opaque, const char *name, gpointer data, > + size_t data_len); > + int (*register_stream_cb)(void *opaque, const char *devname, size_t size); > + int (*dump_cb)(void *opaque, uint8_t dev_id, int64_t cluster_num, > + unsigned char *buf, size_t *zero_bytes); > + int (*complete_cb)(void *opaque, uint8_t dev_id, int ret); No need to suffix the functions with _cb. 
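With the _cb suffixes dropped as Stefan suggests, the vtable from the patch might look like the sketch below. This is only an illustration: QEMU's Error, uuid_t and gpointer types are stubbed with plain C equivalents so the fragment stands alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the QEMU/glib types used in the original patch,
 * so this fragment compiles on its own. */
typedef struct Error Error;
typedef unsigned char uuid_t[16];

/* The BackupDriver vtable with the _cb suffixes removed. */
typedef struct BackupDriver {
    const char *format;
    void *(*open)(const char *filename, uuid_t uuid, Error **errp);
    int (*close)(void *opaque, Error **errp);
    int (*register_config)(void *opaque, const char *name, void *data,
                           size_t data_len);
    int (*register_stream)(void *opaque, const char *devname, size_t size);
    int (*dump)(void *opaque, uint8_t dev_id, int64_t cluster_num,
                unsigned char *buf, size_t *zero_bytes);
    int (*complete)(void *opaque, uint8_t dev_id, int ret);
} BackupDriver;
```

Each archive backend (VMA, NBD writer, or anything else) would fill in this struct and register it under its `format` name.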
> + /* add configuration file to archive */ > + if (has_config_file) { > + char *cdata = NULL; > + gsize clen = 0; > + GError *err = NULL; > + if (!g_file_get_contents(config_file, &cdata, &clen, &err)) { > + error_setg(errp, "unable to read file '%s'", config_file); > + goto err; > + } > + > + const char *basename = g_path_get_basename(config_file); > + if (driver->register_config_cb(writer, basename, cdata, clen) < 0) { > + error_setg(errp, "register_config failed"); > + g_free(cdata); > + goto err; > + } > + g_free(cdata); > + } This is doing too much inside QEMU. First off, when QEMU is sandboxed or run with SELinux, we cannot expect to be able to access arbitrary files. This is why new QMP commands of this sort usually support file descriptor passing - then a more privileged component can give QEMU access. g_file_get_contents() hangs the VM if the file is over NFS while the server is down. This is bad for reliability. Management tools (proxmox, libvirt, or something else) handle the VM configuration. It may not be a single file. Saving external VM configuration is out of scope for QEMU. > +## > +# @backup: > +# > +# Starts a VM backup. > +# > +# @backup-file: the backup file name > +# > +# @format: format of the backup file > +# > +# @config-filename: #optional name of a configuration file to include into > +# the backup archive. > +# > +# @speed: #optional the maximum speed, in bytes per second > +# > +# @devlist: #optional list of block device names (separated by ',', ';' > +# or ':'). By default the backup includes all writable block devices. > +# > +# Returns: the uuid of the backup job > +# > +# Since: 1.5.0 > +## > +{ 'command': 'backup', 'data': { 'backup-file': 'str', > + '*format': 'BackupFormat', > + '*config-file': 'str', > + '*devlist': 'str', '*speed': 'int' }, devlist should be ['String']. 
> diff --git a/qmp-commands.hx b/qmp-commands.hx > index 799adea..17e381b 100644 > --- a/qmp-commands.hx > +++ b/qmp-commands.hx > @@ -889,6 +889,18 @@ EQMP > }, > > { > + .name = "backup", > + .args_type = "backup-file:s,format:s?,config-file:F?,speed:o?,devlist:s?", > + .mhandler.cmd_new = qmp_marshal_input_backup, > + }, > + > + { > + .name = "backup-cancel", > + .args_type = "", > + .mhandler.cmd_new = qmp_marshal_input_backup_cancel, > + }, We might want to run more than one backup if the guest has multiple disks. For example, the database disk is backed up independently of the main OS disk. This API only allows one backup to run at a time.
> > Another option would be to simply dump > > <devid,cluster_num,cluster_data> to the output fh (pipe), and an > > external binary saves the data. That way we could move the whole archive > format related code out of qemu. > > That sounds like the NBD option - write the backup to an NBD disk image. > The NBD server process can take the WRITE requests and do whatever it wants. > > The disadvantage of using NBD is that it strictly transports block data. > It doesn't really have the concept of VM configuration or device state. My thought was that the BackupDriver is the simplest way to allow different backends. For example, you can easily create a BackupDriver to write to an NBD server (sure, you still need to find a way to store configuration). > > diff --git a/backup.h b/backup.h > > index d9395bc..c8ba153 100644 > > --- a/backup.h > > +++ b/backup.h > > @@ -29,4 +29,16 @@ int backup_job_create(BlockDriverState *bs, > BackupDumpFunc *backup_dump_cb, > > BlockDriverCompletionFunc *backup_complete_cb, > > void *opaque, int64_t speed); > > > > +typedef struct BackupDriver { > > + const char *format; > > + void *(*open_cb)(const char *filename, uuid_t uuid, Error **errp); > > + int (*close_cb)(void *opaque, Error **errp); > > + int (*register_config_cb)(void *opaque, const char *name, gpointer data, > > + size_t data_len); > > + int (*register_stream_cb)(void *opaque, const char *devname, size_t size); > > + int (*dump_cb)(void *opaque, uint8_t dev_id, int64_t cluster_num, > > + unsigned char *buf, size_t *zero_bytes); > > + int (*complete_cb)(void *opaque, uint8_t dev_id, int ret); > > No need to suffix the functions with _cb. Ok, will remove that.
> > > + /* add configuration file to archive */ > > + if (has_config_file) { > > + char *cdata = NULL; > > + gsize clen = 0; > > + GError *err = NULL; > > + if (!g_file_get_contents(config_file, &cdata, &clen, &err)) { > > + error_setg(errp, "unable to read file '%s'", config_file); > > + goto err; > > + } > > + > > + const char *basename = g_path_get_basename(config_file); > > + if (driver->register_config_cb(writer, basename, cdata, clen) < 0) { > > + error_setg(errp, "register_config failed"); > > + g_free(cdata); > > + goto err; > > + } > > + g_free(cdata); > > + } > > This is doing too much inside QEMU. > > First off, when QEMU is sandboxed or run with SELinux, we cannot expect to be > able to access arbitrary files. This is why new QMP commands of this sort > usually support file descriptor passing - then a more privileged component can > give QEMU access. We can allow to pass fd for that later (if someone really uses it that way). > g_file_get_contents() hangs the VM if the file is over NFS while the server is > down. This is bad for reliability. The solution is to not pass a path which is on NFS. But that is up to the management tool. > Management tools (proxmox, libvirt, or something else) handle the VM > configuration. It may not be a single file. Saving external VM configuration is > out of scope for QEMU. Backup includes configuration, so you need a way to save that. But I do not really get your suggestion - creating a backup without configuration does not make much sense? In future, we can allow to pass multiple config files - the vma archive format can already handle that. > > +## > > +# @backup: > > +# > > +# Starts a VM backup. > > +# > > +# @backup-file: the backup file name > > +# > > +# @format: format of the backup file > > +# > > +# @config-filename: #optional name of a configuration file to include > > +into # the backup archive. 
> > +# > > +# @speed: #optional the maximum speed, in bytes per second # # > > +@devlist: #optional list of block device names (separated by ',', ';' > > +# or ':'). By default the backup includes all writable block devices. > > +# > > +# Returns: the uuid of the backup job # # Since: 1.5.0 ## { > > +'command': 'backup', 'data': { 'backup-file': 'str', > > + '*format': 'BackupFormat', > > + '*config-file': 'str', > > + '*devlist': 'str', '*speed': 'int' > > +}, > > devlist should be ['String']. I want to be able to use that command on the human monitor. That is no longer possible if I use ['String']? > > > diff --git a/qmp-commands.hx b/qmp-commands.hx index 799adea..17e381b > > 100644 > > --- a/qmp-commands.hx > > +++ b/qmp-commands.hx > > @@ -889,6 +889,18 @@ EQMP > > }, > > > > { > > + .name = "backup", > > + .args_type = "backup-file:s,format:s?,config-file:F?,speed:o?,devlist:s?", > > + .mhandler.cmd_new = qmp_marshal_input_backup, > > + }, > > + > > + { > > + .name = "backup-cancel", > > + .args_type = "", > > + .mhandler.cmd_new = qmp_marshal_input_backup_cancel, > > + }, > > We might want to more than one backup if the guest has multiple disks. > > For example, the database disk is backed up independently of the main OS disk. > > > > This API only allows one backup to run at a time. I do not want multiple backup jobs. You can easily run 2 jobs in sequence to solve the above use case.
On Thu, Feb 21, 2013 at 06:21:32AM +0000, Dietmar Maurer wrote: > > > + /* add configuration file to archive */ > > > + if (has_config_file) { > > > + char *cdata = NULL; > > > + gsize clen = 0; > > > + GError *err = NULL; > > > + if (!g_file_get_contents(config_file, &cdata, &clen, &err)) { > > > + error_setg(errp, "unable to read file '%s'", config_file); > > > + goto err; > > > + } > > > + > > > + const char *basename = g_path_get_basename(config_file); > > > + if (driver->register_config_cb(writer, basename, cdata, clen) < 0) { > > > + error_setg(errp, "register_config failed"); > > > + g_free(cdata); > > > + goto err; > > > + } > > > + g_free(cdata); > > > + } > > > > This is doing too much inside QEMU. > > > > First off, when QEMU is sandboxed or run with SELinux, we cannot expect to be > > able to access arbitrary files. This is why new QMP commands of this sort > > usually support file descriptor passing - then a more privileged component can > > give QEMU access. > > We can allow to pass fd for that later (if someone really uses it that way). > > > g_file_get_contents() hangs the VM if the file is over NFS while the server is > > down. This is bad for reliability. > > The solution is to not pass a path which is on NFS. But that is up to the management tool. > > > Management tools (proxmox, libvirt, or something else) handle the VM > > configuration. It may not be a single file. Saving external VM configuration is > > out of scope for QEMU. > > Backup includes configuration, so you need a way to save that. > But I do not really get your suggestion - creating a backup without configuration > does not make much sense? > > In future, we can allow to pass multiple config files - the vma archive format can > already handle that. My point is that QEMU has no business dealing with the management tool's VM configuration file. And I think the management tool-specific archive format also shouldn't be in QEMU. 
How will a proxmox user restore a VMA file generated with oVirt since the configuration file comes from the management layer? They can't because the VMA is specific to the management layer and should be handled there, not inside QEMU. QEMU must provide the mechanism for point-in-time backups of block devices - your backup block job implements this. Where I disagree with this patch series is that you put the management tool-specific archive format writer into QEMU. That is outside the scope of QEMU. The management tool should tell QEMU to take the backups of block devices and do a live migration to file. The management tool can use a NBD server if it wants to capture all the block device backups into a single archive. And the management tool can bundle the VM configuration into that archive too. But those steps are beyond the scope of QEMU. The approach I'm suggesting is more modular. For example, the backup block job can also be used to copy out the state of a disk into a new qcow2 file. > > > +## > > > +# @backup: > > > +# > > > +# Starts a VM backup. > > > +# > > > +# @backup-file: the backup file name > > > +# > > > +# @format: format of the backup file > > > +# > > > +# @config-filename: #optional name of a configuration file to include > > > +into # the backup archive. > > > +# > > > +# @speed: #optional the maximum speed, in bytes per second # # > > > +@devlist: #optional list of block device names (separated by ',', ';' > > > +# or ':'). By default the backup includes all writable block devices. > > > +# > > > +# Returns: the uuid of the backup job # # Since: 1.5.0 ## { > > > +'command': 'backup', 'data': { 'backup-file': 'str', > > > + '*format': 'BackupFormat', > > > + '*config-file': 'str', > > > + '*devlist': 'str', '*speed': 'int' > > > +}, > > > > devlist should be ['String']. > > I want to be able to use that command on the human monitor. > That is no longer possible if I use ['String']? Good question. 
I don't know the answer but perhaps Luiz or Anthony do (CCed)? > > > diff --git a/qmp-commands.hx b/qmp-commands.hx index 799adea..17e381b > > > 100644 > > > --- a/qmp-commands.hx > > > +++ b/qmp-commands.hx > > > @@ -889,6 +889,18 @@ EQMP > > > }, > > > > > > { > > > + .name = "backup", > > > + .args_type = "backup-file:s,format:s?,config-file:F?,speed:o?,devlist:s?", > > > + .mhandler.cmd_new = qmp_marshal_input_backup, > > > + }, > > > + > > > + { > > > + .name = "backup-cancel", > > > + .args_type = "", > > > + .mhandler.cmd_new = qmp_marshal_input_backup_cancel, > > > + }, > > > > We might want to more than one backup if the guest has multiple disks. > > For example, the database disk is backed up independently of the main OS disk. > > > > This API only allows one backup to run at a time. > > I do not want multiple backup jobs. You can easily run 2 jobs in sequence to solve above use case. Why do you not want multiple backup jobs? It makes perfect sense to separate database disks from main OS disks. They have different backup characteristics (how often to back up, how to restore) so it's likely that users will ask for multiple backup jobs at the same time. Let's get the QMP interfaces right so that it can be supported in the future, if not right away. Stefan
On Thu, Feb 21, 2013 at 06:21:32AM +0000, Dietmar Maurer wrote: > > > +## > > > +# @backup: > > > +# > > > +# Starts a VM backup. > > > +# > > > +# @backup-file: the backup file name > > > +# > > > +# @format: format of the backup file > > > +# > > > +# @config-filename: #optional name of a configuration file to include > > > +into # the backup archive. > > > +# > > > +# @speed: #optional the maximum speed, in bytes per second # # > > > +@devlist: #optional list of block device names (separated by ',', ';' > > > +# or ':'). By default the backup includes all writable block devices. > > > +# > > > +# Returns: the uuid of the backup job # # Since: 1.5.0 ## { > > > +'command': 'backup', 'data': { 'backup-file': 'str', > > > + '*format': 'BackupFormat', > > > + '*config-file': 'str', > > > + '*devlist': 'str', '*speed': 'int' > > > +}, > > > > devlist should be ['String']. > > I want to be able to use that command on the human monitor. > That is no longer possible if I use ['String']? I forgot to CC Luiz and Anthony. Question for them. Stefan
> > In future, we can allow to pass multiple config files - the vma > > archive format can already handle that. > > My point is that QEMU has no business dealing with the management tool's VM > configuration file. And I think the management tool-specific archive format also > shouldn't be in QEMU. > > How will a proxmox user restore a VMA file generated with oVirt since the > configuration file comes from the management layer? They can't because the > VMA is specific to the management layer and should be handled there, not inside > QEMU. The management tool just needs to convert the config - looks quite easy to me. > QEMU must provide the mechanism for point-in-time backups of block devices - > your backup block job implements this. > > Where I disagree with this patch series is that you put the management tool- > specific archive format writer into QEMU. That is outside the scope of QEMU. > The management tool should tell QEMU to take the backups of block devices > and do a live migration to file. > > The management tool can use a NBD server if it wants to capture all the block > device backups into a single archive. And the management tool can bundle the > VM configuration into that archive too. But those steps are beyond the scope of > QEMU. > > The approach I'm suggesting is more modular. For example, the backup block > job can also be used to copy out the state of a disk into a new > qcow2 file. OK, I can try to split the code. But I think I will simply define a 'protocol' instead of using an NBD server (what for?) > > > > +## > > > > +# @backup: > > > > +# > > > > +# Starts a VM backup. > > > > +# > > > > +# @backup-file: the backup file name # # @format: format of the > > > > +backup file # # @config-filename: #optional name of a > > > > +configuration file to include into # the backup archive.
> > > > +# > > > > +# @speed: #optional the maximum speed, in bytes per second # # > > > > +@devlist: #optional list of block device names (separated by ',', ';' > > > > +# or ':'). By default the backup includes all writable block devices. > > > > +# > > > > +# Returns: the uuid of the backup job # # Since: 1.5.0 ## { > > > > +'command': 'backup', 'data': { 'backup-file': 'str', > > > > + '*format': 'BackupFormat', > > > > + '*config-file': 'str', > > > > + '*devlist': 'str', '*speed': 'int' > > > > +}, > > > > > > devlist should be ['String']. > > > > I want to be able to use that command on the human monitor. > > That is no longer possible if I use ['String']? > > Good question. I don't know the answer but perhaps Luiz or Anthony do (CCed)? > > > > > diff --git a/qmp-commands.hx b/qmp-commands.hx index > > > > 799adea..17e381b > > > > 100644 > > > > --- a/qmp-commands.hx > > > > +++ b/qmp-commands.hx > > > > @@ -889,6 +889,18 @@ EQMP > > > > }, > > > > > > > > { > > > > + .name = "backup", > > > > + .args_type = "backup-file:s,format:s?,config- > file:F?,speed:o?,devlist:s?", > > > > + .mhandler.cmd_new = qmp_marshal_input_backup, > > > > + }, > > > > + > > > > + { > > > > + .name = "backup-cancel", > > > > + .args_type = "", > > > > + .mhandler.cmd_new = qmp_marshal_input_backup_cancel, > > > > + }, > > > > > > We might want to more than one backup if the guest has multiple disks. > > > For example, the database disk is backed up independently of the main OS > disk. > > > > > > This API only allows one backup to run at a time. > > > > I do not want multiple backup jobs. You can easily run 2 jobs in sequence to > solve above use case. > > Why do you not want multiple backup jobs? It makes perfect sense to separate > database disks from main OS disks. They have different backup characteristics > (how often to back up, how to restore) so it's likely that users will ask for > multiple backup jobs at the same time. 
> Let's get the QMP interfaces right so that it can be supported in the future, if not right away. So you want to pass the 'uuid' to backup-cancel?
On Thu, 21 Feb 2013 14:47:08 +0100 Stefan Hajnoczi <stefanha@gmail.com> wrote: > On Thu, Feb 21, 2013 at 06:21:32AM +0000, Dietmar Maurer wrote: > > > > +## > > > > +# @backup: > > > > +# > > > > +# Starts a VM backup. > > > > +# > > > > +# @backup-file: the backup file name > > > > +# > > > > +# @format: format of the backup file > > > > +# > > > > +# @config-filename: #optional name of a configuration file to include > > > > +into # the backup archive. > > > > +# > > > > +# @speed: #optional the maximum speed, in bytes per second # # > > > > +@devlist: #optional list of block device names (separated by ',', ';' > > > > +# or ':'). By default the backup includes all writable block devices. > > > > +# > > > > +# Returns: the uuid of the backup job # # Since: 1.5.0 ## { > > > > +'command': 'backup', 'data': { 'backup-file': 'str', > > > > + '*format': 'BackupFormat', > > > > + '*config-file': 'str', > > > > + '*devlist': 'str', '*speed': 'int' > > > > +}, > > > > > > devlist should be ['String']. > > > > I want to be able to use that command on the human monitor. > > That is no longer possible if I use ['String']? Unless I'm missing something, it should work just fine. The HMP command can have an argument which is a device list, then it will have to parse that list and transform it into the list expected by QAPI functions, which is what is going to be passed to the qmp_ function.
On Thu, Feb 21, 2013 at 03:48:57PM +0000, Dietmar Maurer wrote: > > > In future, we can allow to pass multiple config files - the vma > > > archive format can already handle that. > > > > My point is that QEMU has no business dealing with the management tool's VM > > configuration file. And I think the management tool-specific archive format also > > shouldn't be in QEMU. > > > > How will a proxmox user restore a VMA file generated with oVirt since the > > configuration file comes from the management layer? They can't because the > > VMA is specific to the management layer and should be handled there, not inside > > QEMU. > > The management tooI just needs to convert the config - looks quite easy to me. It's not an easy problem. This is why there is no central vm-images.com where everyone can share/sell virtual appliances. You cannot trivially convert between VMware, oVirt, proxmox, Xen, EC2, etc. > > QEMU must provide the mechanism for point-in-time backups of block devices - > > your backup block job implements this. > > > > Where I disagree with this patch series is that you put the management tool- > > specific archive format writer into QEMU. That is outside the scope of QEMU. > > The management tool should tell QEMU to take the backups of block devices > > and do a live migration to file. > > > > The management tool can use a NBD server if it wants to capture all the block > > device backups into a single archive. And the management tool can bundle the > > VM configuration into that archive too. But those steps are beyond the scope of > > QEMU. > > > > The approach I'm suggesting is more modular. For example, the backup block > > job can also be used to copy out the state of a disk into a new > > qcow2 file. > > OK, I can try to split the code. But I think I will simply define a 'protocol' instead > of using an NBD server (what for?) block/nbd.c already exists so it saves you from writing the QEMU-side code to export data to another process. 
The archive writer program opens a listen socket for each block device that is being backed up. Then it handles NBD WRITE requests from QEMU. > > > > > diff --git a/qmp-commands.hx b/qmp-commands.hx index > > > > > 799adea..17e381b > > > > > 100644 > > > > > --- a/qmp-commands.hx > > > > > +++ b/qmp-commands.hx > > > > > @@ -889,6 +889,18 @@ EQMP > > > > > }, > > > > > > > > > > { > > > > > + .name = "backup", > > > > > + .args_type = "backup-file:s,format:s?,config- > > file:F?,speed:o?,devlist:s?", > > > > > + .mhandler.cmd_new = qmp_marshal_input_backup, > > > > > + }, > > > > > + > > > > > + { > > > > > + .name = "backup-cancel", > > > > > + .args_type = "", > > > > > + .mhandler.cmd_new = qmp_marshal_input_backup_cancel, > > > > > + }, > > > > > > > > We might want to more than one backup if the guest has multiple disks. > > > > For example, the database disk is backed up independently of the main OS > > disk. > > > > > > > > This API only allows one backup to run at a time. > > > > > > I do not want multiple backup jobs. You can easily run 2 jobs in sequence to > > solve above use case. > > > > Why do you not want multiple backup jobs? It makes perfect sense to separate > > database disks from main OS disks. They have different backup characteristics > > (how often to back up, how to restore) so it's likely that users will ask for > > multiple backup jobs at the same time. Let's get the QMP interfaces right so that > > it can be supported in the future, if not right away. > > So you want to pass the 'uuid' to backup-cancel? Yes, that works. Or let the user specify the ID for this backup job. That way the management tool can use its own IDs. QEMU doesn't care, it treats the IDs as opaque strings. Stefan
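The opaque-ID scheme Stefan describes, where backup-cancel takes the caller-supplied job ID, can be sketched as a toy registry. All names here are hypothetical; nothing below is from the patch series.

```c
#include <stdio.h>
#include <string.h>

#define MAX_JOBS 8

/* Toy backup-job registry keyed by caller-supplied opaque ID strings. */
typedef struct BackupJob {
    char id[64];
    int running;
} BackupJob;

static BackupJob jobs[MAX_JOBS];

/* Start a job under the management tool's chosen ID. */
static BackupJob *backup_job_start(const char *id)
{
    for (int i = 0; i < MAX_JOBS; i++) {
        if (!jobs[i].running) {
            snprintf(jobs[i].id, sizeof(jobs[i].id), "%s", id);
            jobs[i].running = 1;
            return &jobs[i];
        }
    }
    return NULL; /* all slots busy */
}

/* Cancel by ID; QEMU never interprets the string. */
static int backup_job_cancel(const char *id)
{
    for (int i = 0; i < MAX_JOBS; i++) {
        if (jobs[i].running && strcmp(jobs[i].id, id) == 0) {
            jobs[i].running = 0;
            return 0;
        }
    }
    return -1; /* no such job */
}
```

This is the property that makes concurrent jobs possible later: the database disk and OS disk backups get distinct IDs and can be cancelled independently.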
> > The management tool just needs to convert the config - looks quite easy to > me. > > It's not an easy problem. This is why there is no central vm-images.com where > everyone can share/sell virtual appliances. You cannot trivially convert between > VMware, oVirt, proxmox, Xen, EC2, etc. For that case, I added the ability to write several different config files into a VMA archive. So if someone wants to distribute an appliance, they can simply add a config file for each manager. > > > QEMU must provide the mechanism for point-in-time backups of block > > > devices - your backup block job implements this. > > > > > > Where I disagree with this patch series is that you put the > > > management tool- specific archive format writer into QEMU. That is outside > the scope of QEMU. > > > The management tool should tell QEMU to take the backups of block > > > devices and do a live migration to file. > > > > > > The management tool can use a NBD server if it wants to capture all > > > the block device backups into a single archive. And the management > > > tool can bundle the VM configuration into that archive too. But > > > those steps are beyond the scope of QEMU. > > > > > > The approach I'm suggesting is more modular. For example, the > > > backup block job can also be used to copy out the state of a disk > > > into a new > > > qcow2 file. > > > > OK, I can try to split the code. But I think I will simply define a > > 'protocol' instead of using an NBD server (what for?) > > block/nbd.c already exists so it saves you from writing the QEMU-side code to > export data to another process. Is there some documentation about that NBD protocol? > The archive writer program opens a listen socket for each block device that is > being backed up. Then it handles NBD WRITE requests from QEMU. I still think using a specific protocol is wrong. You don't want to use NBD if you can directly talk to some backup server?
On Fri, Feb 22, 2013 at 10:14:03AM +0000, Dietmar Maurer wrote: > > > > QEMU must provide the mechanism for point-in-time backups of block > > > > devices - your backup block job implements this. > > > > > > > > Where I disagree with this patch series is that you put the > > > > management tool- specific archive format writer into QEMU. That is outside > > the scope of QEMU. > > > > The management tool should tell QEMU to take the backups of block > > > > devices and do a live migration to file. > > > > > > > > The management tool can use a NBD server if it wants to capture all > > > > the block device backups into a single archive. And the management > > > > tool can bundle the VM configuration into that archive too. But > > > > those steps are beyond the scope of QEMU. > > > > > > > > The approach I'm suggesting is more modular. For example, the > > > > backup block job can also be used to copy out the state of a disk > > > > into a new > > > > qcow2 file. > > > > > > OK, I can try to split the code. But I think I will simply define a > > > 'protocol' instead of using an NBD server (what for?) > > > > block/nbd.c already exists so it saves you from writing the QEMU-side code to > > export data to another process. > > Is there some documentation about that NBD protocol? See /usr/include/linux/nbd.h. There are four commands to the protocol:

    enum {
        NBD_CMD_READ = 0,
        NBD_CMD_WRITE = 1,
        NBD_CMD_DISC = 2,
        /* there is a gap here to match userspace */
        NBD_CMD_TRIM = 4
    };

NBD_CMD_DISC is "disconnect". NBD_CMD_TRIM discards regions of the device. The command flow is request-response. Each request has a unique identifier called the "handle", which allows the client to issue multiple requests in parallel:

    /*
     * This is the packet used for communication between client and
     * server. All data are in network byte order.
     */
    struct nbd_request {
        __be32 magic;
        __be32 type;            /* == READ || == WRITE */
        char handle[8];
        __be64 from;
        __be32 len;
    } __attribute__((packed));

    /*
     * This is the reply packet that nbd-server sends back to the client
     * after it has completed an I/O request (or an error occurs).
     */
    struct nbd_reply {
        __be32 magic;
        __be32 error;           /* 0 = ok, else error */
        char handle[8];         /* handle you got from request */
    };

> > The archive writer program opens a listen socket for each block device that is > > being backed up. Then it handles NBD WRITE requests from QEMU. > > I still think using a specific protocol is wrong. You don't want to use NBD if you can directly > talk to some backup server? NBD enables interprocess communication - any form of IPC requires a protocol and NBD is quite a trivial one. What is a simpler way of talking to a backup server? The only other vendor-independent method is writing a raw file to an iSCSI, NFS, or CIFS export on the backup server. QEMU also supports that. Those are two reasonable options, depending on whether you want to implement a simple protocol (NBD) or you already have an existing iSCSI/NFS/CIFS stack. Stefan
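Rewritten with standard fixed-width types, the request header from nbd.h can be built and byte-swapped like this. A sketch only: the magic constant is the NBD_REQUEST_MAGIC value I believe the kernel header defines, and the helper names are made up for illustration.

```c
#include <arpa/inet.h>   /* htonl */
#include <stdint.h>
#include <string.h>

#define NBD_REQUEST_MAGIC 0x25609513  /* assumed from <linux/nbd.h> */
#define NBD_CMD_WRITE 1

/* Same layout as the kernel's struct nbd_request, with portable types. */
struct nbd_request {
    uint32_t magic;
    uint32_t type;
    char     handle[8];
    uint64_t from;
    uint32_t len;
} __attribute__((packed));

/* Convert a 64-bit value to network byte order without relying on
 * htobe64(), so the sketch stays portable. */
static uint64_t to_be64(uint64_t v)
{
    uint8_t b[8];
    for (int i = 0; i < 8; i++)
        b[i] = (uint8_t)(v >> (56 - 8 * i));
    uint64_t out;
    memcpy(&out, b, 8);
    return out;
}

/* Fill in a WRITE request header for data at the given offset. */
static void nbd_write_request(struct nbd_request *req, uint64_t handle,
                              uint64_t offset, uint32_t len)
{
    req->magic = htonl(NBD_REQUEST_MAGIC);
    req->type  = htonl(NBD_CMD_WRITE);
    memcpy(req->handle, &handle, 8);
    req->from  = to_be64(offset);
    req->len   = htonl(len);
}
```

The archive writer would read this 28-byte header off the socket, then read `len` bytes of payload, and answer with a 16-byte nbd_reply carrying the same handle.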
> NBD enables interprocess communication - any form of IPC requires a protocol > and NBD is quite a trivial one. What is a simpler way of talking to a backup > server? Unfortunately, NBD adds considerable overhead. I guess the socket communication copies data. This is really unnecessary if I can write directly to the output stream.
On Wed, Feb 27, 2013 at 03:50:53PM +0000, Dietmar Maurer wrote: > > NBD enables interprocess communication - any form of IPC requires a protocol > > and NBD is quite a trivial one. What is a simpler way of talking to a backup > > server? > > Unfortunately, NBD add considerable overheads. I guess the socket communications copies data. > This is really unnecessary if I can write directly to the output stream. The disk is the bottleneck, not memory bandwidth. Hard disks only do 10-100 MB/sec and SSDs only do a couple 100 MB/sec. Memory copy is insignificant compared to the I/O activity required to copy out the entire disk image, not to mention delaying guest writes until we read the original data from the disk. Unless there's a concrete performance problem here this is premature optimization. Stefan
> > Unfortunately, NBD add considerable overheads. I guess the socket > communications copies data. > > This is really unnecessary if I can write directly to the output stream. > > The disk is the bottleneck, not memory bandwidth. Hard disks only do > 10-100 MB/sec and SSDs only do a couple 100 MB/sec. Memory copy is > insignificant compared to the I/O activity required to copy out the entire disk > image, not to mention delaying guest writes until we read the original data from > the disk. > > Unless there's a concrete performance problem here this is premature > optimization. I currently test with about 700MB/s read speed, and get a slowdown by a factor of 1.7. So memory copy is very significant, or there is something wrong with nbd.c. RAID with normal HDs can easily give you 500MB/s, and RAID with multiple SSDs can give you more than 1000MB/s read speed. There are some high-performance IPC libraries, for example the one from corosync: http://kernel.org/doc/ols/2009/ols2009-pages-61-68.pdf Such libraries try to avoid copying data.
> The disk is the bottleneck, not memory bandwidth. Hard disks only do > 10-100 MB/sec and SSDs only do a couple 100 MB/sec. Memory copy is > insignificant compared to the I/O activity required to copy out the entire disk > image, not to mention delaying guest writes until we read the original data from > the disk. Not to mention Fusion-io cards, which currently reach 6GB/s.
On Thu, Feb 28, 2013 at 03:24:27PM +0000, Dietmar Maurer wrote: > > > Unfortunately, NBD add considerable overheads. I guess the socket > > communications copies data. > > > This is really unnecessary if I can write directly to the output stream. > > > > The disk is the bottleneck, not memory bandwidth. Hard disks only do > > 10-100 MB/sec and SSDs only do a couple 100 MB/sec. Memory copy is > > insignificant compared to the I/O activity required to copy out the entire disk > > image, not to mention delaying guest writes until we read the original data from > > the disk. > > > > Unless there's a concrete performance problem here this is premature > > optimization. > > I currently test with about 700MB read speed, and get a slow down by factor 1.7. > So memory copy is very significant , or there is something wrong with nbd.c. What are the details of the test? Is it using 64 KB writes and have you tried 256 KB writes? Stefan
> What are the details of the test? > > Is it using 64 KB writes and have you tried 256 KB writes? I use a modified 'qemu-img convert' at 64KB block size (I need 64KB for backup).
Am 04.03.2013 um 14:16 hat Dietmar Maurer geschrieben: > > What are the details of the test? > > > > Is it using 64 KB writes and have you tried 256 KB writes? > > I use a modified 'qemu-img convert' at 64KB block size (I need 64KB for backup). Maybe you'd better use a different output format that doesn't restrict you to 64k writes. Kevin
> > > Is it using 64 KB writes and have you tried 256 KB writes? > > > > I use a modified 'qemu-img convert' at 64KB block size (I need 64KB for > backup). > > Maybe you'd better use a different output format that doesn't restrict you to > 64k writes.

The output format is not really the restriction. The problem is that an additional IPC layer adds overhead, and I do not want that (because it is totally unnecessary).
On Mon, Mar 04, 2013 at 02:33:16PM +0000, Dietmar Maurer wrote: > > > > Is it using 64 KB writes and have you tried 256 KB writes? > > > > > > I use a modified 'qemu-img convert' at 64KB block size (I need 64KB for > > backup). > > > > Maybe you'd better use a different output format that doesn't restrict you to > > 64k writes. > > The output format is not really the restriction. The problem is that an additional > IPC layer adds overhead, and I do not want that (because it is totally unnecessary).

I missed the reason why you cannot increase the block size.

At least increase the NBD write size - the VMA process can break up a large write however it likes. Do this and the overhead is divided by 4.

Stefan
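To illustrate Stefan's point: the NBD write size and the archive's cluster size do not have to match, because the receiving process can split large writes back into 64 KB clusters itself. A minimal sketch of that splitting, with a hypothetical `cluster_sink` callback standing in for the archive writer's dump function (the names are illustrative, not the actual patch API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLUSTER_SIZE (64 * 1024)

/* Hypothetical per-cluster sink, standing in for the archive writer's
 * dump function.  Returns 0 on success, -1 on error. */
typedef int (*cluster_sink)(void *opaque, int64_t cluster_num,
                            const uint8_t *buf, size_t len);

/* Split one large write (e.g. a 256 KB NBD WRITE request) into 64 KB
 * clusters, so the archive writer still sees its fixed cluster size.
 * Assumes the offset is cluster-aligned. */
static int split_write(void *opaque, cluster_sink sink,
                       int64_t offset, const uint8_t *buf, size_t len)
{
    assert(offset % CLUSTER_SIZE == 0);

    while (len > 0) {
        size_t chunk = len < CLUSTER_SIZE ? len : CLUSTER_SIZE;
        if (sink(opaque, offset / CLUSTER_SIZE, buf, chunk) < 0) {
            return -1;
        }
        offset += chunk;
        buf += chunk;
        len -= chunk;
    }
    return 0;
}
```

With this in the VMA process, the guest-facing write size and the archive cluster size become independent knobs.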
> > > Maybe you'd better use a different output format that doesn't > > > restrict you to 64k writes. > > > > The output format is not really the restriction. The problem is that > > an additional IPC layer adds overhead, and I do not want that (because it is > totally unnecessary). > > I missed the reason why you cannot increase the block size.

When we run backup, we need to read such a block on every write from the guest. So if we increase the block size we get additional delays.
Am 06.03.2013 um 15:42 hat Dietmar Maurer geschrieben: > > > > Maybe you'd better use a different output format that doesn't > > > > restrict you to 64k writes. > > > > > > The output format is not really the restriction. The problem is that > > > an additional IPC layer adds overhead, and I do not want that (because it is > > totally unnecessary). > > > > I missed the reason why you cannot increase the block size. > > When we run backup, we need to read such block on every write from the guest. > So if we increase block size we get additional delays.

How about variable block sizes? I mean this is a stream format that has a header for each block anyway. Include a size there and be done.

Kevin
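Kevin's suggestion amounts to something like the following on-stream record: a per-extent header that carries its own byte count, so the stream can mix 64 KB clusters (copies triggered by guest writes) with multi-megabyte extents (the bulk pass). This struct is purely illustrative, not the VMA format:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-stream record: each extent carries its own byte
 * count instead of assuming a fixed 64 KB cluster.  Field names are
 * illustrative. */
typedef struct ExtentHeader {
    uint8_t  dev_id;     /* device the extent belongs to */
    uint8_t  flags;      /* e.g. bit 0: all-zero extent, payload omitted */
    uint16_t reserved;
    uint32_t size;       /* payload bytes following this header */
    uint64_t offset;     /* byte offset within the device */
} ExtentHeader;

/* Serialize header plus payload into out; returns bytes written. */
static size_t put_extent(uint8_t *out, uint8_t dev_id, uint64_t offset,
                         const uint8_t *payload, uint32_t size)
{
    ExtentHeader h = { .dev_id = dev_id, .size = size, .offset = offset };

    memcpy(out, &h, sizeof(h));
    if (size) {
        memcpy(out + sizeof(h), payload, size);
    }
    return sizeof(h) + size;
}
```

The size field is the whole trick: the reader consumes header, then exactly `size` payload bytes, so no fixed write granularity is baked into the format.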
> > When we run backup, we need to read such block on every write from the > guest. > > So if we increase block size we get additional delays. > > How about variable block sizes? I mean this is a stream format that has a header > for each block anyway. Include a size there and be done. You can make that as complex as you want. I simply do not need that.
Am 06.03.2013 um 16:33 hat Dietmar Maurer geschrieben: > > > When we run backup, we need to read such block on every write from the > > guest. > > > So if we increase block size we get additional delays. > > > > How about variable block sizes? I mean this is a stream format that has a header > > for each block anyway. Include a size there and be done. > > You can make that as complex as you want. I simply do not need that. Then you also don't need the performance that you lose by using NBD. Kevin
> > > How about variable block sizes? I mean this is a stream format that > > > has a header for each block anyway. Include a size there and be done. > > > > You can make that as complex as you want. I simply do not need that. > > Then you also don't need the performance that you lose by using NBD. Why exactly?
Am 06.03.2013 um 18:39 hat Dietmar Maurer geschrieben: > > > > How about variable block sizes? I mean this is a stream format that > > > > has a header for each block anyway. Include a size there and be done. > > > > > > You can make that as complex as you want. I simply do not need that. > > > > Then you also don't need the performance that you lose by using NBD. > > Why exactly? Because your format kills more performance than any NBD connection could. Kevin
On Wed, Mar 06, 2013 at 02:42:57PM +0000, Dietmar Maurer wrote: > > > > Maybe you'd better use a different output format that doesn't > > > > restrict you to 64k writes. > > > > > > The output format is not really the restriction. The problem is that > > > an additional IPC layer adds overhead, and I do not want that (because it is > > totally unnecessary). > > > > I missed the reason why you cannot increase the block size. > > When we run backup, we need to read such block on every write from the guest. > So if we increase block size we get additional delays.

Don't increase the bitmap block size.

Just let the block job do larger reads. This is the bulk of the I/O workload. You can use large reads here independently of the bitmap block size.

That way guest writes still only have a 64 KB read overhead but you reduce the overhead of doing so many 64 KB writes from the backup block job.

Stefan
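A sketch of what "larger reads with a 64 KB bitmap" could look like: the bulk job scans the not-yet-copied cluster bitmap and coalesces consecutive clusters into one read of up to 4 MB, while the write-intercept path keeps operating on single 64 KB clusters. The bitmap representation here is invented for the example, not QEMU's:

```c
#include <assert.h>
#include <stdbool.h>

/* 64 clusters * 64 KB = 4 MB per bulk read */
#define MAX_READ_CLUSTERS 64

/* Find the next run of consecutive not-yet-copied clusters at or
 * after *pos.  Returns the run length in clusters (0 if none remain),
 * with *pos updated to the run's first cluster.  The block job would
 * then issue a single read covering the whole run, while guest writes
 * keep using 64 KB granularity. */
static int next_dirty_run(const bool *dirty, int nb_clusters, int *pos)
{
    int i = *pos;

    while (i < nb_clusters && !dirty[i]) {
        i++;                      /* skip clusters already copied */
    }
    *pos = i;

    int len = 0;
    while (i < nb_clusters && dirty[i] && len < MAX_READ_CLUSTERS) {
        i++;                      /* extend the run, capped at 4 MB */
        len++;
    }
    return len;
}
```

The bulk pass then does one large read per run instead of one read per cluster, which is where most of the per-request overhead goes.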
> > > > You can make that as complex as you want. I simply do not need that. > > > > > > Then you also don't need the performance that you lose by using NBD. > > > > Why exactly? > > Because your format kills more performance than any NBD connection could. That is not true.
> > When we run backup, we need to read such block on every write from the > guest. > > So if we increase block size we get additional delays. > > Don't increase the bitmap block size. > > Just let the block job do larger reads. This is the bulk of the I/O workload. You > can use large reads here independently of the bitmap block size. > > That way guest writes still only have a 64 KB read overhead but you reduce the > overhead of doing so many 64 KB writes from the backup block job.

If there are many writes from the guest, you still get many 64KB blocks.

Anyway, an additional RPC layer always adds overhead, and it can be completely avoided. Maybe not much, but I want to make the backup as efficient as possible.
On Thu, Mar 07, 2013 at 09:28:40AM +0000, Dietmar Maurer wrote: > > > When we run backup, we need to read such block on every write from the > > guest. > > > So if we increase block size we get additional delays. > > > > Don't increase the bitmap block size. > > > > Just let the block job do larger reads. This is the bulk of the I/O workload. You > > can use large reads here independently of the bitmap block size. > > > > That way guest writes still only have a 64 KB read overhead but you reduce the > > overhead of doing so many 64 KB writes from the backup block job. > > If there are many writes from the guest, you still get many 64KB blocks.

In the common case the backup block job does significantly more I/O than the guest, unless your workload is dd if=/dev/vda of=/dev/null. Remember the block job is reading the *entire* disk!

> Anyway, an additional RPC layer always adds overhead, and it can be completely avoided. > Maybe not much, but I want to make the backup as efficient as possible.

The drawbacks outweigh the performance advantage:

1. QEMU can neither back up nor restore without help from the management tool. This is a strong indicator that the backup archive code should live outside QEMU. It doesn't make sense for proxmox, oVirt, OpenStack, and others to each maintain their backup archive code inside qemu.git, tied to QEMU's C codebase, release cycle, and license.

2. QEMU already has interfaces to export the vmstate and block device snapshots: migration/savevm and BlockDriverState (NBD for IPC or raw/qcow2/vmdk for file). It is not necessary to add a special-purpose interface just for backup.

3. The backup block job can be composed together with other QMP commands to achieve scenarios besides just VMA backup. It's more flexible to add simple primitives that can be combined instead of adding a monolithic backup command.

For these reasons, I'm against putting backup archive code into QEMU. If performance is key, please look into incremental backups.
Taking a full copy of the disk image every time affects guest performance much more than IPC overhead. Plus there'll be less IPC if only dirty blocks are backed up :). Stefan
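The incremental idea Stefan alludes to could be sketched like this: between backups, guest writes mark the touched 64 KB clusters in a bitmap, and the next pass transfers only the marked clusters instead of the whole image. Illustrative only; this is not QEMU's dirty bitmap implementation:

```c
#include <assert.h>
#include <stdint.h>

#define CLUSTER_SIZE (64 * 1024)

/* One bit per 64 KB cluster. */
typedef struct DirtyMap {
    uint8_t *bits;
    int64_t nb_clusters;
} DirtyMap;

/* Intercept point for guest writes: mark every cluster the write
 * touches as dirty. */
static void dirty_mark(DirtyMap *m, int64_t offset, int64_t bytes)
{
    int64_t first = offset / CLUSTER_SIZE;
    int64_t last = (offset + bytes - 1) / CLUSTER_SIZE;

    for (int64_t c = first; c <= last; c++) {
        m->bits[c / 8] |= 1u << (c % 8);
    }
}

/* Bytes the next incremental pass has to transfer - only the dirty
 * clusters, not the whole image. */
static int64_t dirty_bytes(const DirtyMap *m)
{
    int64_t n = 0;

    for (int64_t c = 0; c < m->nb_clusters; c++) {
        if (m->bits[c / 8] & (1u << (c % 8))) {
            n += CLUSTER_SIZE;
        }
    }
    return n;
}
```

On a mostly idle guest, dirty_bytes() is a tiny fraction of the disk size, which is why the incremental pass dwarfs any per-request IPC cost.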
> > Anyway, an additional RPC layer always adds overhead, and it can be > completely avoided. > > Maybe not much, but I want to make the backup as efficient as possible. > > The drawbacks outweigh the performance advantage:

not for me.

> 1. QEMU can neither back up nor restore without help from the management > tool.

Backup works perfectly with the current patches. You can easily trigger a backup using an HMP command. This is not really important, but works.

> This is a strong indicator that the backup archive code should > live outside QEMU. It doesn't make sense for proxmox, oVirt, > OpenStack, and others to each maintain their backup archive code > inside qemu.git, tied to QEMU's C codebase, release cycle, and > license. > 2. QEMU already has interfaces to export the vmstate and block device > snapshots: migration/savevm and BlockDriverState (NBD for IPC or > raw/qcow2/vmdk for file). It is not necessary to add a > special-purpose interface just for backup. > > 3. The backup block job can be composed together with other QMP commands > to achieve scenarios besides just VMA backup. It's more flexible to > add simple primitives that can be combined instead of adding a > monolithic backup command. > > For these reasons, I'm against putting backup archive code into QEMU.

That is OK for me - I already maintain the code outside of qemu.
On Fri, Mar 8, 2013 at 12:01 PM, Dietmar Maurer <dietmar@proxmox.com> wrote: >> > Anyway, an additional RPC layer always adds overhead, and it can be >> completely avoided. >> > Maybe not much, but I want to make the backup as efficient as possible. >> >> The drawbacks outweigh the performance advantage: > > not for me. > >> 1. QEMU can neither back up nor restore without help from the management >> tool. > > Backup works perfectly with the current patches. You can easily trigger a backup using > an HMP command. This is not really important, but works.

If you send me a VMA file I can't restore it without knowing your command-line and possibly setting up the environment. QEMU just can't do that, it's out of scope. That is why the backup archive is a management tool concept.

>> This is a strong indicator that the backup archive code should >> live outside QEMU. It doesn't make sense for proxmox, oVirt, >> OpenStack, and others to each maintain their backup archive code >> inside qemu.git, tied to QEMU's C codebase, release cycle, and >> license. >> 2. QEMU already has interfaces to export the vmstate and block device >> snapshots: migration/savevm and BlockDriverState (NBD for IPC or >> raw/qcow2/vmdk for file). It is not necessary to add a >> special-purpose interface just for backup. >> >> 3. The backup block job can be composed together with other QMP commands >> to achieve scenarios besides just VMA backup. It's more flexible to >> add simple primitives that can be combined instead of adding a >> monolithic backup command. >> >> For these reasons, I'm against putting backup archive code into QEMU. > > That is OK for me - I already maintain the code outside of qemu.

Does this mean you will keep this patch series out-of-tree?

What I am looking for is a stripped down patch series with just a backup block job (no backup archive writer or migration code). That would be easily merged and saves you from rebasing this series as QEMU changes.

Stefan
> >> 1. QEMU can neither back up nor restore without help from the management > >> tool. > > > > Backup works perfectly with the current patches. You can easily > > trigger a backup using an HMP command. This is not really important, but works. > > If you send me a VMA file I can't restore it without knowing your command-line > and possibly setting up the environment. QEMU just can't do that, it's out of > scope. That is why the backup archive is a management tool concept.

I don't really know why you insist on that claim. I can do a backup with an HMP command, and then restore it later using the management tools. I just need to issue the correct HMP backup command. But this is not relevant for this discussion, so let us continue. We talked about the overhead of the additional IPC layer.

> > >> This is a strong indicator that the backup archive code should > >> live outside QEMU. It doesn't make sense for proxmox, oVirt, > >> OpenStack, and others to each maintain their backup archive code > >> inside qemu.git, tied to QEMU's C codebase, release cycle, and > >> license. > >> 2. QEMU already has interfaces to export the vmstate and block device > >> snapshots: migration/savevm and BlockDriverState (NBD for IPC or > >> raw/qcow2/vmdk for file). It is not necessary to add a > >> special-purpose interface just for backup. > >> > >> 3. The backup block job can be composed together with other QMP > commands > >> to achieve scenarios besides just VMA backup. It's more flexible to > >> add simple primitives that can be combined instead of adding a > >> monolithic backup command. > >> > >> For these reasons, I'm against putting backup archive code into QEMU. > > > > That is OK for me - I already maintain the code outside of qemu. > > Does this mean you will keep this patch series out-of-tree? > > What I am looking for is a stripped down patch series with just a backup block > job (no backup archive writer or migration code).
> That would be easily merged > and saves you from rebasing this series as QEMU changes.

That is Patch 2/6?
> >> For these reasons, I'm against putting backup archive code into QEMU. > > > > That is OK for me - I already maintain the code outside of qemu. > > Does this mean you will keep this patch series out-of-tree? You are 'against putting backup archive code into QEMU', so I need to maintain it out-of-tree.
On Fri, Mar 8, 2013 at 6:44 PM, Dietmar Maurer <dietmar@proxmox.com> wrote: >> >> This is a strong indicator that the backup archive code should >> >> live outside QEMU. It doesn't make sense for proxmox, oVirt, >> >> OpenStack, and others to each maintain their backup archive code >> >> inside qemu.git, tied to QEMU's C codebase, release cycle, and >> >> license. >> >> 2. QEMU already has interfaces to export the vmstate and block device >> >> snapshots: migration/savevm and BlockDriverState (NBD for IPC or >> >> raw/qcow2/vmdk for file). It is not necessary to add a >> >> special-purpose interface just for backup. >> >> >> >> 3. The backup block job can be composed together with other QMP >> commands >> >> to achieve scenarios besides just VMA backup. It's more flexible to >> >> add simple primitives that can be combined instead of adding a >> >> monolithic backup command. >> >> >> >> For these reasons, I'm against putting backup archive code into QEMU. >> > >> > That is OK for me - I already maintain the code outside of qemu. >> >> Does this mean you will keep this patch series out-of-tree? >> >> What I am looking for is a stripped down patch series with just a backup block >> job (no backup archive writer or migration code). That would be easily merged >> and saves you from rebasing this series as QEMU changes. > > That is Patch 2/6?

Yes. I sent an RFC series that shows this approach. It is just a prototype but it demonstrates that the NBD approach works and performs reasonably well for a quick Python hack.

The stripped down patch series needs:

1. (Optional) a query-block virtual_size field, if the management tool needs to query the drive size
2. Patch 2/6 with review comments addressed
3. A 'block-backup' QMP command (and optionally an HMP command)

That's it!

Stefan
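In QAPI schema terms, the minimal 'block-backup' command from item 3 might look roughly like this: one job per source device, writing to an arbitrary target (a file or an NBD export), with no archive-format knowledge in QEMU. This is a sketch with guessed field names, not the interface that was eventually merged:

```
##
# @block-backup:
#
# Start a background backup job for a single block device, writing a
# point-in-time copy of its contents to @target.
#
# @device: the name of the source block device
#
# @target: destination file name (or NBD URL) to write to
#
# @speed: #optional the maximum speed, in bytes per second
##
{ 'command': 'block-backup',
  'data': { 'device': 'str', 'target': 'str', '*speed': 'int' } }
```

The management tool would then run one such job per disk and assemble the archive (VMA or anything else) outside QEMU.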
> >> What I am looking for is a stripped down patch series with just a > >> backup block job (no backup archive writer or migration code). That > >> would be easily merged and saves you from rebasing this series as QEMU > changes. > > > > That is Patch 2/6? > > Yes. I sent an RFC series that shows this approach. It is just a prototype but it > demonstrates that the NBD approach works and performs reasonably well for a > quick Python hack. > > The stripped down patch series needs: > 1. (Optional) query-block virtual_size field if management tool needs to query > drive size > 2. Patch 2/6 with review comments addressed > 3. 'block-backup' QMP command (and optionally HMP command) > > That's it!

Thanks. I also have an implementation here using that approach (using nbd.c). The problem is the performance you get. A 28% delay is not acceptable.

In fact, I want to have the fastest possible solution (additional delay = 0).

You want to move the code out of qemu, so that management frameworks can easily plug in their own backup tools. That is perfectly valid, but we should do it in a way that does not influence performance.

So are there any other ideas?
On Sun, Mar 10, 2013 at 10:32 AM, Dietmar Maurer <dietmar@proxmox.com> wrote: >> >> What I am looking for is a stripped down patch series with just a >> >> backup block job (no backup archive writer or migration code). That >> >> would be easily merged and saves you from rebasing this series as QEMU >> changes. >> > >> > That is Patch 2/6? >> >> Yes. I sent an RFC series that shows this approach. It is just a prototype but it >> demonstrates that the NBD approach works and performs reasonably well for a >> quick Python hack. >> >> The stripped down patch series needs: >> 1. (Optional) query-block virtual_size field if management tool needs to query >> drive size >> 2. Patch 2/6 with review comments addressed >> 3. 'block-backup' QMP command (and optionally HMP command) >> >> That's it! > > Thanks. I also have an implementation here using that approach (using nbd.c). > The problem is the performance you get. 28% delay is not acceptable. > > In fact, I want to have the fastest possible solution (additional delay = 0). > > You want to move the code out of qemu, so that management frameworks > can easily plug their own backup tools. That is perfectly valid, but we should do > it in a way that does not influence performance. > > So are there any other ideas?

Same discussion in the other thread, so let's move there where we both just posted.

Stefan
diff --git a/backup.h b/backup.h index d9395bc..c8ba153 100644 --- a/backup.h +++ b/backup.h @@ -29,4 +29,16 @@ int backup_job_create(BlockDriverState *bs, BackupDumpFunc *backup_dump_cb, BlockDriverCompletionFunc *backup_complete_cb, void *opaque, int64_t speed); +typedef struct BackupDriver { + const char *format; + void *(*open_cb)(const char *filename, uuid_t uuid, Error **errp); + int (*close_cb)(void *opaque, Error **errp); + int (*register_config_cb)(void *opaque, const char *name, gpointer data, + size_t data_len); + int (*register_stream_cb)(void *opaque, const char *devname, size_t size); + int (*dump_cb)(void *opaque, uint8_t dev_id, int64_t cluster_num, + unsigned char *buf, size_t *zero_bytes); + int (*complete_cb)(void *opaque, uint8_t dev_id, int ret); +} BackupDriver; + #endif /* QEMU_BACKUP_H */ diff --git a/blockdev.c b/blockdev.c index 63e6f1e..c340fde 100644 --- a/blockdev.c +++ b/blockdev.c @@ -20,6 +20,7 @@ #include "qmp-commands.h" #include "trace.h" #include "sysemu/arch_init.h" +#include "backup.h" static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives); @@ -1334,6 +1335,428 @@ void qmp_drive_mirror(const char *device, const char *target, drive_get_ref(drive_get_by_blockdev(bs)); } +/* Backup related function */ + +static void backup_run_next_job(void); + +static struct GenericBackupState { + Error *error; + bool cancel; + uuid_t uuid; + char uuid_str[37]; + int64_t speed; + time_t start_time; + time_t end_time; + char *backup_file; + const BackupDriver *driver; + void *writer; + GList *bcb_list; + size_t total; + size_t transferred; + size_t zero_bytes; +} backup_state; + +typedef struct BackupCB { + BlockDriverState *bs; + uint8_t dev_id; + bool started; + bool completed; + size_t size; + size_t transferred; + size_t zero_bytes; +} BackupCB; + +static int backup_dump_cb(void *opaque, BlockDriverState *bs, + int64_t cluster_num, unsigned char *buf) +{ + BackupCB *bcb = opaque; + + assert(backup_state.driver); + 
assert(backup_state.writer); + assert(backup_state.driver->dump_cb); + + size_t zero_bytes = 0; + int bytes = backup_state.driver->dump_cb(backup_state.writer, + bcb->dev_id, cluster_num, + buf, &zero_bytes); + + if (bytes > 0) { + bcb->transferred += bytes; + backup_state.transferred += bytes; + if (zero_bytes) { + bcb->zero_bytes += bytes; + backup_state.zero_bytes += zero_bytes; + } + } + + return bytes; +} + +static void backup_cleanup(void) +{ + if (backup_state.writer && backup_state.driver) { + backup_state.end_time = time(NULL); + Error *local_err = NULL; + backup_state.driver->close_cb(backup_state.writer, &local_err); + error_propagate(&backup_state.error, local_err); + backup_state.writer = NULL; + } + + if (backup_state.bcb_list) { + GList *l = backup_state.bcb_list; + while (l) { + BackupCB *bcb = l->data; + l = g_list_next(l); + drive_put_ref_bh_schedule(drive_get_by_blockdev(bcb->bs)); + g_free(bcb); + } + g_list_free(backup_state.bcb_list); + backup_state.bcb_list = NULL; + } +} + +static void backup_complete_cb(void *opaque, int ret) +{ + BackupCB *bcb = opaque; + + assert(backup_state.driver); + assert(backup_state.writer); + assert(backup_state.driver->complete_cb); + assert(backup_state.driver->close_cb); + + bcb->completed = true; + + backup_state.driver->complete_cb(backup_state.writer, bcb->dev_id, ret); + + if (!backup_state.cancel) { + backup_run_next_job(); + } +} + +static void backup_cancel(void) +{ + backup_state.cancel = true; + + if (!backup_state.error) { + error_setg(&backup_state.error, "backup cancelled"); + } + + /* drain all i/o (awake jobs waiting for aio) */ + bdrv_drain_all(); + + int job_count = 0; + GList *l = backup_state.bcb_list; + while (l) { + BackupCB *bcb = l->data; + l = g_list_next(l); + BlockJob *job = bcb->bs->job; + if (job) { + job_count++; + if (!bcb->started) { + bcb->started = true; + backup_job_start(bcb->bs, true); + } + if (!bcb->completed) { + block_job_cancel_sync(job); + } + } + } + + backup_cleanup(); 
+} + +void qmp_backup_cancel(Error **errp) +{ + backup_cancel(); +} + +static void backup_run_next_job(void) +{ + GList *l = backup_state.bcb_list; + while (l) { + BackupCB *bcb = l->data; + l = g_list_next(l); + + if (bcb->bs && bcb->bs->job && !bcb->completed) { + if (!bcb->started) { + bcb->started = true; + bool cancel = backup_state.error || backup_state.cancel; + backup_job_start(bcb->bs, cancel); + } + return; + } + } + + backup_cleanup(); +} + +static void backup_start_jobs(void) +{ + /* create all jobs (one for each device), start first one */ + GList *l = backup_state.bcb_list; + while (l) { + BackupCB *bcb = l->data; + l = g_list_next(l); + + if (backup_job_create(bcb->bs, backup_dump_cb, backup_complete_cb, + bcb, backup_state.speed) != 0) { + error_setg(&backup_state.error, "backup_job_create failed"); + backup_cancel(); + return; + } + } + + backup_run_next_job(); +} + +char *qmp_backup(const char *backup_file, bool has_format, BackupFormat format, + bool has_config_file, const char *config_file, + bool has_devlist, const char *devlist, + bool has_speed, int64_t speed, Error **errp) +{ + BlockDriverState *bs; + Error *local_err = NULL; + uuid_t uuid; + void *writer = NULL; + gchar **devs = NULL; + GList *bcblist = NULL; + + if (backup_state.bcb_list) { + error_set(errp, ERROR_CLASS_GENERIC_ERROR, + "previous backup not finished"); + return NULL; + } + + /* Todo: try to auto-detect format based on file name */ + format = has_format ? 
format : BACKUP_FORMAT_VMA; + + /* fixme: find driver for specifued format */ + const BackupDriver *driver = NULL; + + if (!driver) { + error_set(errp, ERROR_CLASS_GENERIC_ERROR, "unknown backup format"); + return NULL; + } + + if (has_devlist) { + devs = g_strsplit_set(devlist, ",;:", -1); + + gchar **d = devs; + while (d && *d) { + bs = bdrv_find(*d); + if (bs) { + if (bdrv_is_read_only(bs)) { + error_set(errp, QERR_DEVICE_IS_READ_ONLY, *d); + goto err; + } + if (!bdrv_is_inserted(bs)) { + error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, *d); + goto err; + } + BackupCB *bcb = g_new0(BackupCB, 1); + bcb->bs = bs; + bcblist = g_list_append(bcblist, bcb); + } else { + error_set(errp, QERR_DEVICE_NOT_FOUND, *d); + goto err; + } + d++; + } + + } else { + + bs = NULL; + while ((bs = bdrv_next(bs))) { + + if (!bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) { + continue; + } + + BackupCB *bcb = g_new0(BackupCB, 1); + bcb->bs = bs; + bcblist = g_list_append(bcblist, bcb); + } + } + + if (!bcblist) { + error_set(errp, ERROR_CLASS_GENERIC_ERROR, "empty device list"); + goto err; + } + + GList *l = bcblist; + while (l) { + BackupCB *bcb = l->data; + l = g_list_next(l); + if (bcb->bs->job) { + error_set(errp, QERR_DEVICE_IN_USE, bdrv_get_device_name(bcb->bs)); + goto err; + } + } + + uuid_generate(uuid); + + writer = driver->open_cb(backup_file, uuid, &local_err); + if (!writer) { + if (error_is_set(&local_err)) { + error_propagate(errp, local_err); + } + goto err; + } + + size_t total = 0; + + /* register all devices for vma writer */ + l = bcblist; + while (l) { + BackupCB *bcb = l->data; + l = g_list_next(l); + + int64_t size = bdrv_getlength(bcb->bs); + const char *devname = bdrv_get_device_name(bcb->bs); + bcb->dev_id = driver->register_stream_cb(writer, devname, size); + if (bcb->dev_id <= 0) { + error_set(errp, ERROR_CLASS_GENERIC_ERROR, + "register_stream failed"); + goto err; + } + bcb->size = size; + total += size; + } + + /* add configuration file to archive */ + if 
(has_config_file) { + char *cdata = NULL; + gsize clen = 0; + GError *err = NULL; + if (!g_file_get_contents(config_file, &cdata, &clen, &err)) { + error_setg(errp, "unable to read file '%s'", config_file); + goto err; + } + + const char *basename = g_path_get_basename(config_file); + if (driver->register_config_cb(writer, basename, cdata, clen) < 0) { + error_setg(errp, "register_config failed"); + g_free(cdata); + goto err; + } + g_free(cdata); + } + + /* initialize global backup_state now */ + + backup_state.cancel = false; + + if (backup_state.error) { + error_free(backup_state.error); + backup_state.error = NULL; + } + + backup_state.driver = driver; + + backup_state.speed = (has_speed && speed > 0) ? speed : 0; + + backup_state.start_time = time(NULL); + backup_state.end_time = 0; + + if (backup_state.backup_file) { + g_free(backup_state.backup_file); + } + backup_state.backup_file = g_strdup(backup_file); + + backup_state.writer = writer; + + uuid_copy(backup_state.uuid, uuid); + uuid_unparse_lower(uuid, backup_state.uuid_str); + + backup_state.bcb_list = bcblist; + + backup_state.total = total; + backup_state.transferred = 0; + backup_state.zero_bytes = 0; + + /* Grab a reference so hotplug does not delete the + * BlockDriverState from underneath us. 
+ */ + l = bcblist; + while (l) { + BackupCB *bcb = l->data; + l = g_list_next(l); + drive_get_ref(drive_get_by_blockdev(bcb->bs)); + } + + backup_start_jobs(); + + return g_strdup(backup_state.uuid_str); + +err: + + l = bcblist; + while (l) { + g_free(l->data); + l = g_list_next(l); + } + g_list_free(bcblist); + + if (devs) { + g_strfreev(devs); + } + + if (writer) { + unlink(backup_file); + if (driver) { + Error *err = NULL; + driver->close_cb(writer, &err); + } + } + + return NULL; +} + +BackupStatus *qmp_query_backup(Error **errp) +{ + BackupStatus *info = g_malloc0(sizeof(*info)); + + if (!backup_state.start_time) { + /* not started, return {} */ + return info; + } + + info->has_status = true; + info->has_start_time = true; + info->start_time = backup_state.start_time; + + if (backup_state.backup_file) { + info->has_backup_file = true; + info->backup_file = g_strdup(backup_state.backup_file); + } + + info->has_uuid = true; + info->uuid = g_strdup(backup_state.uuid_str); + + if (backup_state.end_time) { + if (backup_state.error) { + info->status = g_strdup("error"); + info->has_errmsg = true; + info->errmsg = g_strdup(error_get_pretty(backup_state.error)); + } else { + info->status = g_strdup("done"); + } + info->has_end_time = true; + info->end_time = backup_state.end_time; + } else { + info->status = g_strdup("active"); + } + + info->has_total = true; + info->total = backup_state.total; + info->has_zero_bytes = true; + info->zero_bytes = backup_state.zero_bytes; + info->has_transferred = true; + info->transferred = backup_state.transferred; + + return info; +} + static BlockJob *find_block_job(const char *device) { BlockDriverState *bs; diff --git a/hmp-commands.hx b/hmp-commands.hx index 64008a9..0f178d8 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -83,6 +83,35 @@ STEXI Copy data from a backing file into a block device. 
ETEXI + { + .name = "backup", + .args_type = "backupfile:s,speed:o?,devlist:s?", + .params = "backupfile [speed [devlist]]", + .help = "create a VM Backup.", + .mhandler.cmd = hmp_backup, + }, + +STEXI +@item backup +@findex backup +Create a VM backup. +ETEXI + + { + .name = "backup_cancel", + .args_type = "", + .params = "", + .help = "cancel the current VM backup", + .mhandler.cmd = hmp_backup_cancel, + }, + +STEXI +@item backup_cancel +@findex backup_cancel +Cancel the current VM backup. + +ETEXI + { .name = "block_job_set_speed", .args_type = "device:B,speed:o", @@ -1630,6 +1659,8 @@ show CPU statistics show user network stack connection states @item info migrate show migration status +@item info backup +show backup status @item info migrate_capabilities show current migration capabilities @item info migrate_cache_size diff --git a/hmp.c b/hmp.c index 2f47a8a..b2c1f23 100644 --- a/hmp.c +++ b/hmp.c @@ -131,6 +131,38 @@ void hmp_info_mice(Monitor *mon, const QDict *qdict) qapi_free_MouseInfoList(mice_list); } +void hmp_info_backup(Monitor *mon, const QDict *qdict) +{ + BackupStatus *info; + + info = qmp_query_backup(NULL); + if (info->has_status) { + if (info->has_errmsg) { + monitor_printf(mon, "Backup status: %s - %s\n", + info->status, info->errmsg); + } else { + monitor_printf(mon, "Backup status: %s\n", info->status); + } + } + if (info->has_backup_file) { + int per = (info->has_total && info->total && + info->has_transferred && info->transferred) ? + (info->transferred * 100)/info->total : 0; + int zero_per = (info->has_total && info->total && + info->has_zero_bytes && info->zero_bytes) ? 
+ (info->zero_bytes * 100)/info->total : 0; + monitor_printf(mon, "Backup file: %s\n", info->backup_file); + monitor_printf(mon, "Backup uuid: %s\n", info->uuid); + monitor_printf(mon, "Total size: %zd\n", info->total); + monitor_printf(mon, "Transferred bytes: %zd (%d%%)\n", + info->transferred, per); + monitor_printf(mon, "Zero bytes: %zd (%d%%)\n", + info->zero_bytes, zero_per); + } + + qapi_free_BackupStatus(info); +} + void hmp_info_migrate(Monitor *mon, const QDict *qdict) { MigrationInfo *info; @@ -998,6 +1030,37 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict) hmp_handle_error(mon, &error); } +void hmp_backup_cancel(Monitor *mon, const QDict *qdict) +{ + Error *errp = NULL; + + qmp_backup_cancel(&errp); + + if (error_is_set(&errp)) { + monitor_printf(mon, "%s\n", error_get_pretty(errp)); + error_free(errp); + return; + } +} + +void hmp_backup(Monitor *mon, const QDict *qdict) +{ + const char *backup_file = qdict_get_str(qdict, "backup-file"); + const char *devlist = qdict_get_try_str(qdict, "devlist"); + int64_t speed = qdict_get_try_int(qdict, "speed", 0); + + Error *errp = NULL; + + qmp_backup(backup_file, true, BACKUP_FORMAT_VMA, false, NULL, !!devlist, + devlist, qdict_haskey(qdict, "speed"), speed, &errp); + + if (error_is_set(&errp)) { + monitor_printf(mon, "%s\n", error_get_pretty(errp)); + error_free(errp); + return; + } +} + void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict) { Error *error = NULL; diff --git a/hmp.h b/hmp.h index 30b3c20..ad4cf80 100644 --- a/hmp.h +++ b/hmp.h @@ -28,6 +28,7 @@ void hmp_info_mice(Monitor *mon, const QDict *qdict); void hmp_info_migrate(Monitor *mon, const QDict *qdict); void hmp_info_migrate_capabilities(Monitor *mon, const QDict *qdict); void hmp_info_migrate_cache_size(Monitor *mon, const QDict *qdict); +void hmp_info_backup(Monitor *mon, const QDict *qdict); void hmp_info_cpus(Monitor *mon, const QDict *qdict); void hmp_info_block(Monitor *mon, const QDict *qdict); void 
hmp_info_blockstats(Monitor *mon, const QDict *qdict);
@@ -65,6 +66,8 @@ void hmp_eject(Monitor *mon, const QDict *qdict);
 void hmp_change(Monitor *mon, const QDict *qdict);
 void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict);
 void hmp_block_stream(Monitor *mon, const QDict *qdict);
+void hmp_backup(Monitor *mon, const QDict *qdict);
+void hmp_backup_cancel(Monitor *mon, const QDict *qdict);
 void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_block_job_cancel(Monitor *mon, const QDict *qdict);
 void hmp_block_job_pause(Monitor *mon, const QDict *qdict);
diff --git a/monitor.c b/monitor.c
index 6a0f257..e4a810c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2666,6 +2666,13 @@ static mon_cmd_t info_cmds[] = {
     },
 #endif
     {
+        .name       = "backup",
+        .args_type  = "",
+        .params     = "",
+        .help       = "show backup status",
+        .mhandler.cmd = hmp_info_backup,
+    },
+    {
         .name       = "migrate",
         .args_type  = "",
         .params     = "",
diff --git a/qapi-schema.json b/qapi-schema.json
index 7275b5d..09ca8ef 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -425,6 +425,39 @@
 { 'type': 'EventInfo', 'data': {'name': 'str'} }
 
 ##
+# @BackupStatus:
+#
+# Detailed backup status.
+#
+# @status: #optional string describing the current backup status.
+#          This can be 'active', 'done', 'error'. If this field is not
+#          returned, no backup process has been initiated
+#
+# @errmsg: #optional error message (only returned if status is 'error')
+#
+# @total: #optional total amount of bytes involved in the backup process
+#
+# @transferred: #optional amount of bytes already backed up.
+#
+# @zero-bytes: #optional amount of 'zero' bytes detected.
+#
+# @start-time: #optional time (epoch) when backup job started.
+#
+# @end-time: #optional time (epoch) when backup job finished.
+#
+# @backup-file: #optional backup file name
+#
+# @uuid: #optional uuid for this backup job
+#
+# Since: 1.5.0
+##
+{ 'type': 'BackupStatus',
+  'data': {'*status': 'str', '*errmsg': 'str', '*total': 'int',
+           '*transferred': 'int', '*zero-bytes': 'int',
+           '*start-time': 'int', '*end-time': 'int',
+           '*backup-file': 'str', '*uuid': 'str' } }
+
+##
 # @query-events:
 #
 # Return a list of supported QMP events by this server
@@ -1824,6 +1857,68 @@
   'data': { 'path': 'str' },
   'returns': [ 'ObjectPropertyInfo' ] }
 
+
+##
+# @BackupFormat:
+#
+# An enumeration of supported backup formats.
+#
+# @vma: Proxmox vma backup format
+##
+{ 'enum': 'BackupFormat',
+  'data': [ 'vma' ] }
+
+##
+# @backup:
+#
+# Starts a VM backup.
+#
+# @backup-file: the backup file name
+#
+# @format: format of the backup file
+#
+# @config-file: #optional name of a configuration file to include into
+#               the backup archive.
+#
+# @speed: #optional the maximum speed, in bytes per second
+#
+# @devlist: #optional list of block device names (separated by ',', ';'
+#           or ':'). By default the backup includes all writable block devices.
+#
+# Returns: the uuid of the backup job
+#
+# Since: 1.5.0
+##
+{ 'command': 'backup', 'data': { 'backup-file': 'str',
+                                 '*format': 'BackupFormat',
+                                 '*config-file': 'str',
+                                 '*devlist': 'str', '*speed': 'int' },
+  'returns': 'str' }
+
+##
+# @query-backup:
+#
+# Returns information about current/last backup task.
+#
+# Returns: @BackupStatus
+#
+# Since: 1.5.0
+##
+{ 'command': 'query-backup', 'returns': 'BackupStatus' }
+
+##
+# @backup-cancel:
+#
+# Cancel the currently executing backup process.
+#
+# Returns: nothing on success
+#
+# Notes: This command succeeds even if there is no backup process running.
+#
+# Since: 1.5.0
+##
+{ 'command': 'backup-cancel' }
+
 ##
 # @qom-get:
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 799adea..17e381b 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -889,6 +889,18 @@ EQMP
     },
 
     {
+        .name       = "backup",
+        .args_type  = "backup-file:s,format:s?,config-file:F?,speed:o?,devlist:s?",
+        .mhandler.cmd_new = qmp_marshal_input_backup,
+    },
+
+    {
+        .name       = "backup-cancel",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_input_backup_cancel,
+    },
+
+    {
         .name       = "block-job-set-speed",
         .args_type  = "device:B,speed:o",
         .mhandler.cmd_new = qmp_marshal_input_block_job_set_speed,
@@ -2566,6 +2578,21 @@ EQMP
     },
 
 SQMP
+
+query-backup
+-------------
+
+Backup status.
+
+EQMP
+
+    {
+        .name       = "query-backup",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_input_query_backup,
+    },
+
+SQMP
 migrate-set-capabilities
 -------
We use a generic BackupDriver struct to encapsulate all archive-format
related functions.

Another option would be to simply dump <devid,cluster_num,cluster_data> to
the output fh (pipe) and have an external binary save the data. That way we
could move all archive-format related code out of QEMU.

Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
---
 backup.h         |   12 ++
 blockdev.c       |  423 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hmp-commands.hx  |   31 ++
 hmp.c            |   63 ++++++++
 hmp.h            |    3 +
 monitor.c        |    7 +
 qapi-schema.json |   95 ++++++++++++
 qmp-commands.hx  |   27 ++++
 8 files changed, 661 insertions(+), 0 deletions(-)