Message ID: 1592606622-29884-1-git-send-email-linuxram@us.ibm.com (mailing list archive)
Series: Migrate non-migrated pages of a SVM.
On Fri, Jun 19, 2020 at 03:43:38PM -0700, Ram Pai wrote:
> The time taken to switch a VM to a Secure-VM increases with the size of
> the VM. A 100GB VM takes about 7 minutes. This is unacceptable. This
> linear increase is caused by suboptimal behavior in the Ultravisor and
> the Hypervisor. The Ultravisor unnecessarily migrates all the GFNs of
> the VM from normal-memory to secure-memory. It only has to migrate the
> necessary and sufficient GFNs.
>
> However, when the optimization is incorporated in the Ultravisor, the
> Hypervisor starts misbehaving. The Hypervisor has an inbuilt assumption
> that the Ultravisor will explicitly request to migrate each and every
> GFN of the VM. If only necessary and sufficient GFNs are requested for
> migration, the Hypervisor continues to manage the remaining GFNs as
> normal GFNs. This leads to memory corruption, manifested consistently
> when the SVM reboots.
>
> The same is true when a memory slot is hotplugged into a SVM. The
> Hypervisor expects the Ultravisor to request migration of all GFNs to
> secure-GFNs. But at the same time, the Hypervisor is unable to handle
> any H_SVM_PAGE_IN requests from the Ultravisor made in the context of
> the UV_REGISTER_MEM_SLOT ucall. This problem manifests as random errors
> in the SVM when a memory slot is hotplugged.
>
> This patch series automatically migrates the non-migrated pages of a
> SVM, and thus solves the problem.

So this is what I understand as the objective of this patchset:

1. Getting all the pages into secure memory right when the guest
   transitions into secure mode is expensive. The Ultravisor wants to get
   just the necessary and sufficient pages in, and put the onus on the
   Hypervisor to mark the remaining pages as secure (without an actual
   page-in) during H_SVM_INIT_DONE.
2. During H_SVM_INIT_DONE, you want a way to differentiate the pages that
   are already secure from the pages that are shared or paged out. For
   this you are introducing all these new states in the HV.

The UV knows about the shared GFNs and maintains their state. Hence, let
the HV send all the pages (minus the already-secured pages) via
H_SVM_PAGE_IN, and if the UV finds any shared pages among them, let it
fail the uv-page-in call. The HV can then fail the migration for that
page, and the page continues to remain shared. With this, you don't need
to maintain a state for secured GFNs in the HV.

In the unlikely case of sending a paged-out page to the UV during
H_SVM_INIT_DONE, let the page-in succeed, and the HV will fault on it
again if required. With this, you don't need a state in the HV to
identify a paged-out-but-encrypted page.

Doesn't the above work? If so, we can avoid all those extra states in the
HV. That way the HV can continue to differentiate only between two types
of pages: secure and not-secure. The rest of the states (shared,
paged-out-encrypted) actually belong to the SVM/UV, so let the UV take
care of them. Or did I miss something?

Regards,
Bharata.
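To make the alternative concrete, here is a minimal C sketch of an
H_SVM_INIT_DONE handler that works with only a secure/not-secure bit per
GFN on the HV side, as proposed above. Every identifier in it
(hv_memslot, uv_page_in_gfn, U_SHARED, ...) is a hypothetical
placeholder, not the actual kvmppc_uvmem code:

#include <stdbool.h>

/*
 * Two-state alternative: the HV keeps one "secured" bit per GFN.  At
 * H_SVM_INIT_DONE it attempts a uv-page-in for every GFN it has not
 * already secured; the UV rejects shared pages, which then simply
 * remain shared.
 */
struct hv_memslot {
        unsigned long base_gfn;
        unsigned long npages;
        bool *secured;                  /* one bit of HV-side state per GFN */
};

#define U_SUCCESS 0
#define U_SHARED  (-1)                  /* UV refuses: the page is shared */

int uv_page_in_gfn(unsigned long gfn);  /* stand-in for the page-in ucall */

static int hv_svm_init_done(struct hv_memslot *slot)
{
        unsigned long i;
        int rc;

        for (i = 0; i < slot->npages; i++) {
                if (slot->secured[i])   /* paged in earlier via H_SVM_PAGE_IN */
                        continue;

                rc = uv_page_in_gfn(slot->base_gfn + i);
                if (rc == U_SHARED)     /* UV says shared: leave it shared */
                        continue;
                if (rc != U_SUCCESS)    /* genuine failure */
                        return rc;

                slot->secured[i] = true;
        }
        return 0;
}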
On Sun, Jun 28, 2020 at 09:41:53PM +0530, Bharata B Rao wrote:
> On Fri, Jun 19, 2020 at 03:43:38PM -0700, Ram Pai wrote:
> > [...]
>
> So this is what I understand as the objective of this patchset:
> [...]
> Doesn't the above work?

I see that you want to in fact skip the uv-page-in calls from
H_SVM_INIT_DONE. So that would need the extra states in the HV which you
are proposing here.

Regards,
Bharata.
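For contrast, a rough sketch of the per-GFN bookkeeping that skipping the
uv-page-in calls would require on the HV side. The state names are
invented for illustration and need not match what the patchset actually
defines:

/*
 * Illustrative HV-side per-GFN states that skipping uv-page-in calls
 * from H_SVM_INIT_DONE would require.
 */
enum hv_gfn_state {
        HV_GFN_NORMAL,          /* never touched by the UV */
        HV_GFN_SECURE,          /* explicitly paged in via H_SVM_PAGE_IN */
        HV_GFN_SHARED,          /* shared with the HV; must stay normal */
        HV_GFN_PAGED_OUT,       /* encrypted and paged out by the HV */
};

The shared and paged-out entries are exactly the two cases the two-state
scheme above pushes to the UV; keeping them in the HV is what lets
H_SVM_INIT_DONE avoid touching those GFNs at all.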
On Mon, Jun 29, 2020 at 07:23:30AM +0530, Bharata B Rao wrote:
> On Sun, Jun 28, 2020 at 09:41:53PM +0530, Bharata B Rao wrote:
> > On Fri, Jun 19, 2020 at 03:43:38PM -0700, Ram Pai wrote:
> > > [...]
> >
> > So this is what I understand as the objective of this patchset:
> > [...]
> > Doesn't the above work?
>
> I see that you want to in fact skip the uv-page-in calls from
> H_SVM_INIT_DONE. So that would need the extra states in the HV which you
> are proposing here.

Yes. I want to skip them to speed up the overall ESM switch.

RP
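For scale: assuming 64KiB pages, a 100GB guest is roughly 1.6 million
GFNs, so the quoted 7 minutes averages out to roughly 250 microseconds
per GFN migrated; every GFN whose uv-page-in can be skipped saves about
that much. A sketch of the resulting state-aware H_SVM_INIT_DONE loop,
reusing the illustrative enum from the previous sketch (again
placeholders, not the patchset's code):

struct hv_memslot {
        unsigned long base_gfn;
        unsigned long npages;
        enum hv_gfn_state *state;       /* one entry per GFN */
};

int uv_page_in_gfn(unsigned long gfn);  /* stand-in for the page-in ucall */

static int hv_svm_init_done(struct hv_memslot *slot)
{
        unsigned long i;
        int rc;

        for (i = 0; i < slot->npages; i++) {
                switch (slot->state[i]) {
                case HV_GFN_SECURE:     /* already migrated: nothing to do */
                case HV_GFN_SHARED:     /* stays shared: no ucall to fail */
                case HV_GFN_PAGED_OUT:  /* stays encrypted on disk: no I/O */
                        continue;
                case HV_GFN_NORMAL:
                        rc = uv_page_in_gfn(slot->base_gfn + i);
                        if (rc)
                                return rc;
                        slot->state[i] = HV_GFN_SECURE;
                        break;
                }
        }
        return 0;
}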