From patchwork Tue Dec 17 12:38:49 2019
X-Patchwork-Id: 1211383
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: David Hildenbrand, Anshuman Khandual, linux-mm@kvack.org, Paul Mackerras, Andrew Morton, linuxppc-dev@lists.ozlabs.org, Rashmica Gupta, Allison Randal
Subject: [PATCH RFC v1 1/3] powerpc/memtrace: Enforce power of 2 for memory buffer size
Date: Tue, 17 Dec 2019 13:38:49 +0100
Message-Id: <20191217123851.8854-2-david@redhat.com>
In-Reply-To: <20191217123851.8854-1-david@redhat.com>
References: <20191217123851.8854-1-david@redhat.com>

The code mentions "Trace memory needs to be aligned to the size", and
round_down(), which the search uses for that alignment, is documented to
work on powers of 2 only. Also, the search is not designed for sizes that
are merely aligned to the memory block size when allocating multiple
memory blocks.

Let's just limit the size to powers of 2 that are at least the memory
block size, the granularity we are using for alloc/offline/unplug.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Andrew Morton
Cc: David Hildenbrand
Cc: Allison Randal
Cc: Anshuman Khandual
Cc: Balbir Singh
Cc: Rashmica Gupta
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: David Hildenbrand
---
 arch/powerpc/platforms/powernv/memtrace.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index eb2e75dac369..0c4c54d2e3c4 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -268,15 +268,11 @@ static int memtrace_online(void)
 
 static int memtrace_enable_set(void *data, u64 val)
 {
-	u64 bytes;
-
-	/*
-	 * Don't attempt to do anything if size isn't aligned to a memory
-	 * block or equal to zero.
-	 */
-	bytes = memory_block_size_bytes();
-	if (val & (bytes - 1)) {
-		pr_err("Value must be aligned with 0x%llx\n", bytes);
+	const unsigned long bytes = memory_block_size_bytes();
+
+	if (val && (!is_power_of_2(val) || val < bytes)) {
+		pr_err("Value must be 0 or a power of 2 (at least 0x%lx)\n",
+		       bytes);
 		return -EINVAL;
 	}
 

From patchwork Tue Dec 17 12:38:50 2019
X-Patchwork-Id: 1211391
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: Jens Axboe, Rashmica Gupta, David Hildenbrand, Anshuman Khandual, linux-mm@kvack.org, Paul Mackerras, Andrew Morton, linuxppc-dev@lists.ozlabs.org, Thomas Gleixner, Allison Randal
Subject: [PATCH RFC v1 2/3] powerpc/memtrace: Factor out readding memory into memtrace_free_node()
Date: Tue, 17 Dec 2019 13:38:50 +0100
Message-Id: <20191217123851.8854-3-david@redhat.com>
In-Reply-To: <20191217123851.8854-1-david@redhat.com>
References: <20191217123851.8854-1-david@redhat.com>

Factor out re-adding (and onlining) the trace memory into
memtrace_free_node(). While at it, place the new helper, together with
online_mem_block(), which it uses, further up in the file, as we want
to reuse it soon.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Andrew Morton
Cc: David Hildenbrand
Cc: Allison Randal
Cc: Jens Axboe
Cc: Anshuman Khandual
Cc: Thomas Gleixner
Cc: Balbir Singh
Cc: Rashmica Gupta
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: David Hildenbrand
---
 arch/powerpc/platforms/powernv/memtrace.c | 44 ++++++++++++++---------
 1 file changed, 27 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 0c4c54d2e3c4..2d2a0a2acd60 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -50,6 +50,32 @@ static const struct file_operations memtrace_fops = {
 	.open = simple_open,
 };
 
+static int online_mem_block(struct memory_block *mem, void *arg)
+{
+	return device_online(&mem->dev);
+}
+
+static int memtrace_free_node(int nid, unsigned long start, unsigned long size)
+{
+	int ret;
+
+	ret = add_memory(nid, start, size);
+	if (!ret) {
+		/*
+		 * If the kernel isn't compiled with the auto online option, we
+		 * will try to online ourselves. We'll ignore any errors here -
+		 * user space can try to online itself later (after all, the
+		 * memory was added successfully).
+		 */
+		if (!memhp_auto_online) {
+			lock_device_hotplug();
+			walk_memory_blocks(start, size, NULL, online_mem_block);
+			unlock_device_hotplug();
+		}
+	}
+	return ret;
+}
+
 static int check_memblock_online(struct memory_block *mem, void *arg)
 {
 	if (mem->state != MEM_ONLINE)
@@ -202,11 +228,6 @@ static int memtrace_init_debugfs(void)
 	return ret;
 }
 
-static int online_mem_block(struct memory_block *mem, void *arg)
-{
-	return device_online(&mem->dev);
-}
-
 /*
  * Iterate through the chunks of memory we have removed from the kernel
  * and attempt to add them back to the kernel.
@@ -229,24 +250,13 @@ static int memtrace_online(void)
 			ent->mem = 0;
 		}
 
-		if (add_memory(ent->nid, ent->start, ent->size)) {
+		if (memtrace_free_node(ent->nid, ent->start, ent->size)) {
 			pr_err("Failed to add trace memory to node %d\n",
 				ent->nid);
 			ret += 1;
 			continue;
 		}
 
-		/*
-		 * If kernel isn't compiled with the auto online option
-		 * we need to online the memory ourselves.
-		 */
-		if (!memhp_auto_online) {
-			lock_device_hotplug();
-			walk_memory_blocks(ent->start, ent->size, NULL,
-					   online_mem_block);
-			unlock_device_hotplug();
-		}
-
 		/*
 		 * Memory was added successfully so clean up references to it
 		 * so on reentry we can tell that this chunk was added.
From patchwork Tue Dec 17 12:38:51 2019
X-Patchwork-Id: 1211396
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: Jens Axboe, Rashmica Gupta, David Hildenbrand, Anshuman Khandual, Michal Hocko, linux-mm@kvack.org, Paul Mackerras, Andrew Morton, linuxppc-dev@lists.ozlabs.org, Thomas Gleixner, Allison Randal, Oscar Salvador
Subject: [PATCH RFC v1 3/3] powerpc/memtrace: Don't offline memory blocks via offline_pages()
Date: Tue, 17 Dec 2019 13:38:51 +0100
Message-Id: <20191217123851.8854-4-david@redhat.com>
In-Reply-To: <20191217123851.8854-1-david@redhat.com>
References: <20191217123851.8854-1-david@redhat.com>

offline_pages() should not be called outside of the MM core. In
particular, having to manually fiddle with the memory block states is a
sign that this is not a good idea. To offline memory block devices
cleanly, device_offline() should be used. Apart from the official
device_offline() path, memtrace is the only remaining caller of
offline_pages().

For example, trying to allocate trace memory right now triggers messages
like

[ 11.227817] page:c00c000000182000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
[ 11.228056] raw: 007ffff000000000 c000000001538860 c000000001538860 0000000000000000
[ 11.228070] raw: 0000000000000000 0000000000000001 00000001ffffffff 0000000000000000
[ 11.228097] page dumped because: unmovable page

and, in theory, we might end up looping for quite a long time trying to
offline memory, which the user would have to cancel manually (CTRL-C).

Memtrace needs to identify and allocate multiple consecutive memory
blocks. It also has to remove the memory blocks in order to remove all
page tables (HW requirement).

Let's use alloc_contig_pages() to allocate memory that spans multiple
memory block devices. We can then mark all pages PageOffline() to allow
these pages to get isolated. A temporary memory notifier can then make
offlining of these pages succeed by dropping its reference to the pages
on MEM_GOING_OFFLINE events (as documented in include/linux/page-flags.h
for PageOffline() pages). Error handling is a bit tricky.

Note1: ZONE_MOVABLE memory blocks won't be considered. I'm not sure that
was ever really relevant (unmovable data would end up on these memory
blocks for a short time frame).

Note2: We don't have to care about online_page_callback_t, as we forbid
re-onlining from our memory notifier.

Note3: I was told this feature is never used along with DIMM-based memory
hotunplug; otherwise, bad things would happen when a DIMM tries to remove
"already-removed" memory (that is still in use).

Tested under QEMU with powernv emulation (kernel + initramfs).

$ mount -t debugfs none /sys/kernel/debug/
$ cat /sys/devices/system/memory/block_size_bytes
10000000
$ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable
[ 19.809790] Offlined Pages 4096
[ 19.835842] Offlined Pages 4096
[ 19.853136] memtrace: Allocated trace memory on node 0 at 0x0000000040000000

Unfortunately, QEMU does not support NUMA for powernv yet, so I cannot
test that.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Andrew Morton
Cc: David Hildenbrand
Cc: Allison Randal
Cc: Jens Axboe
Cc: Anshuman Khandual
Cc: Thomas Gleixner
Cc: Michal Hocko
Cc: Oscar Salvador
Cc: Balbir Singh
Cc: Rashmica Gupta
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: David Hildenbrand
---
 arch/powerpc/platforms/powernv/Kconfig    |   1 +
 arch/powerpc/platforms/powernv/memtrace.c | 175 ++++++++++++++--------
 2 files changed, 112 insertions(+), 64 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 938803eab0ad..571a0fa9f055 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -29,6 +29,7 @@ config OPAL_PRD
 config PPC_MEMTRACE
 	bool "Enable removal of RAM from kernel mappings for tracing"
 	depends on PPC_POWERNV && MEMORY_HOTREMOVE
+	select CONTIG_ALLOC
 	help
 	  Enabling this option allows for the removal of memory (RAM)
 	  from the kernel mappings to be used for hardware tracing.
diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 2d2a0a2acd60..fe1e8f3926a1 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -76,83 +76,130 @@ static int memtrace_free_node(int nid, unsigned long start, unsigned long size)
 	return ret;
 }
 
-static int check_memblock_online(struct memory_block *mem, void *arg)
-{
-	if (mem->state != MEM_ONLINE)
-		return -1;
-
-	return 0;
-}
-
-static int change_memblock_state(struct memory_block *mem, void *arg)
-{
-	unsigned long state = (unsigned long)arg;
-
-	mem->state = state;
-
-	return 0;
-}
+struct memtrace_alloc_info {
+	struct notifier_block memory_notifier;
+	unsigned long base_pfn;
+	unsigned long nr_pages;
+};
 
-/* called with device_hotplug_lock held */
-static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
+static int memtrace_memory_notifier_cb(struct notifier_block *nb,
+				       unsigned long action, void *arg)
 {
-	const unsigned long start = PFN_PHYS(start_pfn);
-	const unsigned long size = PFN_PHYS(nr_pages);
-
-	if (walk_memory_blocks(start, size, NULL, check_memblock_online))
-		return false;
-
-	walk_memory_blocks(start, size, (void *)MEM_GOING_OFFLINE,
-			   change_memblock_state);
-
-	if (offline_pages(start_pfn, nr_pages)) {
-		walk_memory_blocks(start, size, (void *)MEM_ONLINE,
-				   change_memblock_state);
-		return false;
+	struct memtrace_alloc_info *info = container_of(nb,
+						struct memtrace_alloc_info,
+						memory_notifier);
+	unsigned long pfn, start_pfn, end_pfn;
+	const struct memory_notify *mhp = arg;
+	static bool going_offline;
+
+	/* Ignore ranges that don't overlap. */
+	if (mhp->start_pfn + mhp->nr_pages <= info->base_pfn ||
+	    info->base_pfn + info->nr_pages <= mhp->start_pfn)
+		return NOTIFY_OK;
+
+	start_pfn = max_t(unsigned long, info->base_pfn, mhp->start_pfn);
+	end_pfn = min_t(unsigned long, info->base_pfn + info->nr_pages,
+			mhp->start_pfn + mhp->nr_pages);
+
+	/*
+	 * Drop our reference to the allocated (PageOffline()) pages, but
+	 * reacquire them in case offlining fails. We might get called for
+	 * MEM_CANCEL_OFFLINE but not for MEM_GOING_OFFLINE in case another
+	 * notifier aborted offlining.
+	 */
+	switch (action) {
+	case MEM_GOING_OFFLINE:
+		for (pfn = start_pfn; pfn < end_pfn; pfn++)
+			page_ref_dec(pfn_to_page(pfn));
+		going_offline = true;
+		break;
+	case MEM_CANCEL_OFFLINE:
+		if (going_offline)
+			for (pfn = start_pfn; pfn < end_pfn; pfn++)
+				page_ref_inc(pfn_to_page(pfn));
+		going_offline = false;
+		break;
+	case MEM_GOING_ONLINE:
+		/*
+		 * While our notifier is active, user space could
+		 * offline+re-online this memory. Disallow any such activity.
+		 */
+		return notifier_from_errno(-EBUSY);
 	}
-
-	walk_memory_blocks(start, size, (void *)MEM_OFFLINE,
-			   change_memblock_state);
-
-
-	return true;
+	return NOTIFY_OK;
 }
 
 static u64 memtrace_alloc_node(u32 nid, u64 size)
 {
-	u64 start_pfn, end_pfn, nr_pages, pfn;
-	u64 base_pfn;
-	u64 bytes = memory_block_size_bytes();
+	const unsigned long memory_block_bytes = memory_block_size_bytes();
+	const unsigned long nr_pages = size >> PAGE_SHIFT;
+	struct memtrace_alloc_info info = {
+		.memory_notifier = {
+			.notifier_call = memtrace_memory_notifier_cb,
+		},
+	};
+	unsigned long base_pfn, to_remove_pfn, pfn;
+	struct page *page;
+	int ret;
 
 	if (!node_spanned_pages(nid))
 		return 0;
 
-	start_pfn = node_start_pfn(nid);
-	end_pfn = node_end_pfn(nid);
-	nr_pages = size >> PAGE_SHIFT;
-
-	/* Trace memory needs to be aligned to the size */
-	end_pfn = round_down(end_pfn - nr_pages, nr_pages);
-
-	lock_device_hotplug();
-	for (base_pfn = end_pfn; base_pfn > start_pfn; base_pfn -= nr_pages) {
-		if (memtrace_offline_pages(nid, base_pfn, nr_pages) == true) {
-			/*
-			 * Remove memory in memory block size chunks so that
-			 * iomem resources are always split to the same size and
-			 * we never try to remove memory that spans two iomem
-			 * resources.
-			 */
-			end_pfn = base_pfn + nr_pages;
-			for (pfn = base_pfn; pfn < end_pfn; pfn += bytes>> PAGE_SHIFT) {
-				__remove_memory(nid, pfn << PAGE_SHIFT, bytes);
-			}
-			unlock_device_hotplug();
-			return base_pfn << PAGE_SHIFT;
-		}
+	/*
+	 * Try to allocate memory (that might span multiple memory blocks)
+	 * on the requested node. Trace memory needs to be aligned to the size,
+	 * which is guaranteed by alloc_contig_pages().
+	 */
+	page = alloc_contig_pages(nr_pages, __GFP_THISNODE, nid, NULL);
+	if (!page)
+		return 0;
+	to_remove_pfn = base_pfn = page_to_pfn(page);
+	info.base_pfn = base_pfn;
+	info.nr_pages = nr_pages;
+
+	/* PageOffline() allows isolating the memory when offlining. */
+	for (pfn = base_pfn; pfn < base_pfn + nr_pages; pfn++)
+		__SetPageOffline(pfn_to_page(pfn));
+
+	/* A temporary memory notifier allows offlining the isolated memory. */
+	ret = register_memory_notifier(&info.memory_notifier);
+	if (ret)
+		goto out_free_pages;
+
+	/*
+	 * Try to offline and remove all involved memory blocks. This will
+	 * only fail in the unlikely event that another memory notifier NACKs
+	 * the offlining request - no memory has to be migrated.
+	 *
+	 * Remove memory in memory block size chunks so that iomem resources
+	 * are always split to the same size and we never try to remove memory
+	 * that spans two iomem resources.
+	 */
+	for (; to_remove_pfn < base_pfn + nr_pages;
+	     to_remove_pfn += PHYS_PFN(memory_block_bytes)) {
+		ret = offline_and_remove_memory(nid, PFN_PHYS(to_remove_pfn),
+						memory_block_bytes);
+		if (ret)
+			goto out_readd_memory;
 	}
-	unlock_device_hotplug();
 
+	unregister_memory_notifier(&info.memory_notifier);
+	return PFN_PHYS(base_pfn);
+out_readd_memory:
+	/* Unregister before adding+onlining (the notifier blocks onlining). */
+	unregister_memory_notifier(&info.memory_notifier);
+	if (to_remove_pfn != base_pfn) {
+		ret = memtrace_free_node(nid, PFN_PHYS(base_pfn),
+					 PFN_PHYS(to_remove_pfn - base_pfn));
+		if (ret)
+			/* Even more unlikely, log and ignore. */
+			pr_err("Failed to add trace memory to node %d\n", nid);
+	}
+out_free_pages:
+	/* Only free memory that was not temporarily offlined+removed. */
+	for (pfn = to_remove_pfn; pfn < base_pfn + nr_pages; pfn++)
+		__ClearPageOffline(pfn_to_page(pfn));
+	free_contig_range(to_remove_pfn, nr_pages - (to_remove_pfn - base_pfn));
 	return 0;
 }
 