Message ID | 20200317104942.11178-6-david@redhat.com (mailing list archive) |
---|---|
State | Not Applicable |
Series | [v2,1/8] drivers/base/memory: rename MMOP_ONLINE_KEEP to MMOP_ONLINE |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch powerpc/merge (ab326587bb5fb91cc97df9b9f48e9e1469f04621) |
snowpatch_ozlabs/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 56 lines checked |
snowpatch_ozlabs/needsstable | success | Patch has no Fixes tags |
David Hildenbrand <david@redhat.com> writes:

> We get the MEM_ONLINE notifier call if memory is added right from the
> kernel via add_memory() or later from user space.
>
> Let's get rid of the "ha_waiting" flag - the wait event has an inbuilt
> mechanism (->done) for that. Initialize the wait event only once and
> reinitialize before adding memory. Unconditionally call complete() and
> wait_for_completion_timeout().
>
> If there are no waiters, complete() will only increment ->done - which
> will be reset by reinit_completion(). If complete() has already been
> called, wait_for_completion_timeout() will not wait.
>
> There is still the chance for a small race between concurrent
> reinit_completion() and complete(). If complete() wins, we would not
> wait - which is tolerable (and the race exists in current code as
> well).

How can we see concurrent reinit_completion() and complete()? Obviously,
we are not onlining new memory in kernel and hv_mem_hot_add() calls are
serialized, we're waiting up to 5*HZ for the added block to come online
before proceeding to the next one. Or do you mean we actually hit this
5*HZ timeout, proceeded to the next block and immediately after
reinit_completion() we saw complete() for the previously added block?
This is tolerable indeed, we're making forward progress (and this all is
'best effort' anyway).

>
> Note: We only wait for "some" memory to get onlined, which seems to be
> good enough for now.
>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Wei Liu <wei.liu@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: linux-hyperv@vger.kernel.org
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/hv/hv_balloon.c | 25 ++++++++++---------------
>  1 file changed, 10 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index a02ce43d778d..af5e09f08130 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -533,7 +533,6 @@ struct hv_dynmem_device {
>  	 * State to synchronize hot-add.
>  	 */
>  	struct completion ol_waitevent;
> -	bool ha_waiting;
>  	/*
>  	 * This thread handles hot-add
>  	 * requests from the host as well as notifying
> @@ -634,10 +633,7 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
>  	switch (val) {
>  	case MEM_ONLINE:
>  	case MEM_CANCEL_ONLINE:
> -		if (dm_device.ha_waiting) {
> -			dm_device.ha_waiting = false;
> -			complete(&dm_device.ol_waitevent);
> -		}
> +		complete(&dm_device.ol_waitevent);
>  		break;
>
>  	case MEM_OFFLINE:
> @@ -726,8 +722,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
>  		has->covered_end_pfn += processed_pfn;
>  		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
>
> -		init_completion(&dm_device.ol_waitevent);
> -		dm_device.ha_waiting = !memhp_auto_online;
> +		reinit_completion(&dm_device.ol_waitevent);
>
>  		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
>  		ret = add_memory(nid, PFN_PHYS((start_pfn)),
> @@ -753,15 +748,14 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
>  		}
>
>  		/*
> -		 * Wait for the memory block to be onlined when memory onlining
> -		 * is done outside of kernel (memhp_auto_online). Since the hot
> -		 * add has succeeded, it is ok to proceed even if the pages in
> -		 * the hot added region have not been "onlined" within the
> -		 * allowed time.
> +		 * Wait for memory to get onlined. If the kernel onlined the
> +		 * memory when adding it, this will return directly. Otherwise,
> +		 * it will wait for user space to online the memory. This helps
> +		 * to avoid adding memory faster than it is getting onlined. As
> +		 * adding succeeded, it is ok to proceed even if the memory was
> +		 * not onlined in time.
>  		 */
> -		if (dm_device.ha_waiting)
> -			wait_for_completion_timeout(&dm_device.ol_waitevent,
> -						    5*HZ);
> +		wait_for_completion_timeout(&dm_device.ol_waitevent, 5 * HZ);
>  		post_status(&dm_device);
>  	}
>  }
> @@ -1707,6 +1701,7 @@ static int balloon_probe(struct hv_device *dev,
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  	set_online_page_callback(&hv_online_page);
>  	register_memory_notifier(&hv_memory_nb);
> +	init_completion(&dm_device.ol_waitevent);
>  #endif
>
>  	hv_set_drvdata(dev, &dm_device);

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

On 17.03.20 17:29, Vitaly Kuznetsov wrote:
> David Hildenbrand <david@redhat.com> writes:
>
>> We get the MEM_ONLINE notifier call if memory is added right from the
>> kernel via add_memory() or later from user space.
>>
>> Let's get rid of the "ha_waiting" flag - the wait event has an inbuilt
>> mechanism (->done) for that. Initialize the wait event only once and
>> reinitialize before adding memory. Unconditionally call complete() and
>> wait_for_completion_timeout().
>>
>> If there are no waiters, complete() will only increment ->done - which
>> will be reset by reinit_completion(). If complete() has already been
>> called, wait_for_completion_timeout() will not wait.
>>
>> There is still the chance for a small race between concurrent
>> reinit_completion() and complete(). If complete() wins, we would not
>> wait - which is tolerable (and the race exists in current code as
>> well).
>
> How can we see concurrent reinit_completion() and complete()? Obviously,
> we are not onlining new memory in kernel and hv_mem_hot_add() calls are
> serialized, we're waiting up to 5*HZ for the added block to come online
> before proceeding to the next one. Or do you mean we actually hit this
> 5*HZ timeout, proceeded to the next block and immediately after
> reinit_completion() we saw complete() for the previously added block?

Yes exactly - or if an admin manually offlines+re-onlines a random
memory block.

> This is tolerable indeed, we're making forward progress (and this all is
> 'best effort' anyway).

Exactly my thoughts.

[...]

>
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>

Thanks!

> @@ -1707,6 +1701,7 @@ static int balloon_probe(struct hv_device *dev,
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  	set_online_page_callback(&hv_online_page);
>  	register_memory_notifier(&hv_memory_nb);
> +	init_completion(&dm_device.ol_waitevent);

I'll move this one line up.

>  #endif
>
>  	hv_set_drvdata(dev, &dm_device);
>

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index a02ce43d778d..af5e09f08130 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -533,7 +533,6 @@ struct hv_dynmem_device {
 	 * State to synchronize hot-add.
 	 */
 	struct completion ol_waitevent;
-	bool ha_waiting;
 	/*
 	 * This thread handles hot-add
 	 * requests from the host as well as notifying
@@ -634,10 +633,7 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
 	switch (val) {
 	case MEM_ONLINE:
 	case MEM_CANCEL_ONLINE:
-		if (dm_device.ha_waiting) {
-			dm_device.ha_waiting = false;
-			complete(&dm_device.ol_waitevent);
-		}
+		complete(&dm_device.ol_waitevent);
 		break;

 	case MEM_OFFLINE:
@@ -726,8 +722,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		has->covered_end_pfn += processed_pfn;
 		spin_unlock_irqrestore(&dm_device.ha_lock, flags);

-		init_completion(&dm_device.ol_waitevent);
-		dm_device.ha_waiting = !memhp_auto_online;
+		reinit_completion(&dm_device.ol_waitevent);

 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
@@ -753,15 +748,14 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		}

 		/*
-		 * Wait for the memory block to be onlined when memory onlining
-		 * is done outside of kernel (memhp_auto_online). Since the hot
-		 * add has succeeded, it is ok to proceed even if the pages in
-		 * the hot added region have not been "onlined" within the
-		 * allowed time.
+		 * Wait for memory to get onlined. If the kernel onlined the
+		 * memory when adding it, this will return directly. Otherwise,
+		 * it will wait for user space to online the memory. This helps
+		 * to avoid adding memory faster than it is getting onlined. As
+		 * adding succeeded, it is ok to proceed even if the memory was
+		 * not onlined in time.
 		 */
-		if (dm_device.ha_waiting)
-			wait_for_completion_timeout(&dm_device.ol_waitevent,
-						    5*HZ);
+		wait_for_completion_timeout(&dm_device.ol_waitevent, 5 * HZ);
 		post_status(&dm_device);
 	}
 }
@@ -1707,6 +1701,7 @@ static int balloon_probe(struct hv_device *dev,
 #ifdef CONFIG_MEMORY_HOTPLUG
 	set_online_page_callback(&hv_online_page);
 	register_memory_notifier(&hv_memory_nb);
+	init_completion(&dm_device.ol_waitevent);
 #endif

 	hv_set_drvdata(dev, &dm_device);

We get the MEM_ONLINE notifier call if memory is added right from the
kernel via add_memory() or later from user space.

Let's get rid of the "ha_waiting" flag - the wait event has an inbuilt
mechanism (->done) for that. Initialize the wait event only once and
reinitialize before adding memory. Unconditionally call complete() and
wait_for_completion_timeout().

If there are no waiters, complete() will only increment ->done - which
will be reset by reinit_completion(). If complete() has already been
called, wait_for_completion_timeout() will not wait.

There is still the chance for a small race between concurrent
reinit_completion() and complete(). If complete() wins, we would not
wait - which is tolerable (and the race exists in current code as
well).

Note: We only wait for "some" memory to get onlined, which seems to be
good enough for now.

Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: linux-hyperv@vger.kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/hv/hv_balloon.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)