Message ID | 20210315233804.d3e52f6a3422.I9672eef7dfa7ce6c3de1ccf7ab8d9aad1fa7f3a6@changeid
---|---
State | Accepted |
Series | um: implement flush_cache_vmap/flush_cache_vunmap
On 15/03/2021 22:38, Johannes Berg wrote:
> From: Johannes Berg <johannes.berg@intel.com>
>
> vmalloc() heavy workloads in UML are extremely slow, due to
> flushing the entire kernel VM space (flush_tlb_kernel_vm())
> on the first segfault.
>
> Implement flush_cache_vmap() to avoid that, and while at it
> also add flush_cache_vunmap() since it's trivial.
>
> This speeds up my vmalloc() heavy test of copying files out
> from /sys/kernel/debug/gcov/ by 30x (from 30s to 1s.)
>
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> ---
>  arch/um/include/asm/cacheflush.h | 9 +++++++++
>  arch/um/include/asm/tlb.h        | 2 +-
>  2 files changed, 10 insertions(+), 1 deletion(-)
>  create mode 100644 arch/um/include/asm/cacheflush.h
>
> diff --git a/arch/um/include/asm/cacheflush.h b/arch/um/include/asm/cacheflush.h
> new file mode 100644
> index 000000000000..4c9858cd36ec
> --- /dev/null
> +++ b/arch/um/include/asm/cacheflush.h
> @@ -0,0 +1,9 @@
> +#ifndef __UM_ASM_CACHEFLUSH_H
> +#define __UM_ASM_CACHEFLUSH_H
> +
> +#include <asm/tlbflush.h>
> +#define flush_cache_vmap flush_tlb_kernel_range
> +#define flush_cache_vunmap flush_tlb_kernel_range
> +
> +#include <asm-generic/cacheflush.h>
> +#endif /* __UM_ASM_CACHEFLUSH_H */
> diff --git a/arch/um/include/asm/tlb.h b/arch/um/include/asm/tlb.h
> index ff9c62828962..0422467bda5b 100644
> --- a/arch/um/include/asm/tlb.h
> +++ b/arch/um/include/asm/tlb.h
> @@ -5,7 +5,7 @@
>  #include <linux/mm.h>
>
>  #include <asm/tlbflush.h>
> -#include <asm-generic/cacheflush.h>
> +#include <asm/cacheflush.h>
>  #include <asm-generic/tlb.h>
>
>  #endif

Well spotted.

Unless I am mistaken, there may be a slightly better way of doing it.

We can implement arch_sync_kernel_mappings() and sync only where the
page-modified mask says so, by setting ARCH_PAGE_TABLE_SYNC_MASK.

This way flush_cache_* can remain no-ops, as in asm-generic.

I am going to give that a spin; if it works, I will post it to the list
by lunchtime GMT.
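For readers following along: ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() are the generic-mm hooks Anton means. The vmalloc code tracks which page-table levels it modified while mapping or unmapping a range and invokes the arch callback only when that set intersects the mask. A minimal, untested sketch of what this could look like for UML (illustrative only, not Anton's actual patch):

```c
/*
 * Illustrative sketch, not Anton's actual patch.
 *
 * arch/um/include/asm/pgtable.h (sketch): ask the generic vmalloc
 * code to call back whenever kernel PTEs were modified.
 */
#define ARCH_PAGE_TABLE_SYNC_MASK	PGTBL_PTE_MODIFIED

/*
 * arch/um/kernel/tlb.c (sketch): only reached for ranges whose
 * page-table modifications matched the mask above, so this flushes
 * exactly what changed and nothing more.
 */
void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
{
	flush_tlb_kernel_range(start, end);
}
```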
On Tue, 2021-03-16 at 08:46 +0000, Anton Ivanov wrote:
>
> Well spotted.
>
> Unless I am mistaken, there may be a slightly better way of doing it.
>
> We can implement arch_sync_kernel_mappings() and sync only where the
> page-modified mask says so, by setting ARCH_PAGE_TABLE_SYNC_MASK.
>
> This way flush_cache_* can remain no-ops, as in asm-generic.

Would that actually buy us anything?

Not that I mind, or even understand the TLB code well, but it seems
fairly similar?

> I am going to give that a spin; if it works, I will post it to the list
> by lunchtime GMT.

Sounds good to me :)

I also made these patches:

https://lore.kernel.org/lkml/20210315235453.e3fbb86e99a0.I08a3ee6dbe47ea3e8024956083f162884a958e40@changeid/T/#u

so my "vmalloc-heavy workload" is no longer vmalloc-heavy, since it now
uses kvmalloc and never hits vmalloc, always kmalloc :)

johannes
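For context on the kvmalloc() conversion Johannes links to: kvmalloc() attempts a kmalloc() first and only falls back to vmalloc() when the slab allocator cannot satisfy the request, so converted callers rarely touch the vmalloc path at all. A hypothetical example of the pattern; the function name is made up for illustration:

```c
#include <linux/mm.h>	/* kvmalloc(), kvfree() */

/* Hypothetical caller, named for illustration only. */
static char *copy_gcov_data(size_t size)
{
	/*
	 * Tries kmalloc(GFP_KERNEL) first; falls back to vmalloc()
	 * only if the slab allocator cannot satisfy the request.
	 */
	char *buf = kvmalloc(size, GFP_KERNEL);

	if (!buf)
		return NULL;
	/* ... fill buf ... */
	return buf;	/* the caller releases it with kvfree(buf) */
}
```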
On 16/03/2021 08:55, Johannes Berg wrote:
> On Tue, 2021-03-16 at 08:46 +0000, Anton Ivanov wrote:
>>
>> Well spotted.
>>
>> Unless I am mistaken, there may be a slightly better way of doing it.
>>
>> We can implement arch_sync_kernel_mappings() and sync only where the
>> page-modified mask says so, by setting ARCH_PAGE_TABLE_SYNC_MASK.
>>
>> This way flush_cache_* can remain no-ops, as in asm-generic.
>
> Would that actually buy us anything?

It makes flushing conditional on a specific mask from the result of the
mapping op. In theory, it should be better; in practice, probably more
of the same.

>
> Not that I mind, or even understand the TLB code well, but it seems
> fairly similar?
>
>> I am going to give that a spin; if it works, I will post it to the list
>> by lunchtime GMT.
>
> Sounds good to me :)
>
> I also made these patches:
>
> https://lore.kernel.org/lkml/20210315235453.e3fbb86e99a0.I08a3ee6dbe47ea3e8024956083f162884a958e40@changeid/T/#u
>
> so my "vmalloc-heavy workload" is no longer vmalloc-heavy, since it now
> uses kvmalloc and never hits vmalloc, always kmalloc :)

:)

> johannes
On 15/03/2021 22:38, Johannes Berg wrote:
> From: Johannes Berg <johannes.berg@intel.com>
>
> vmalloc() heavy workloads in UML are extremely slow, due to
> flushing the entire kernel VM space (flush_tlb_kernel_vm())
> on the first segfault.
>
> Implement flush_cache_vmap() to avoid that, and while at it
> also add flush_cache_vunmap() since it's trivial.
>
> This speeds up my vmalloc() heavy test of copying files out
> from /sys/kernel/debug/gcov/ by 30x (from 30s to 1s.)
>
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> ---
>  arch/um/include/asm/cacheflush.h | 9 +++++++++
>  arch/um/include/asm/tlb.h        | 2 +-
>  2 files changed, 10 insertions(+), 1 deletion(-)
>  create mode 100644 arch/um/include/asm/cacheflush.h
>
> diff --git a/arch/um/include/asm/cacheflush.h b/arch/um/include/asm/cacheflush.h
> new file mode 100644
> index 000000000000..4c9858cd36ec
> --- /dev/null
> +++ b/arch/um/include/asm/cacheflush.h
> @@ -0,0 +1,9 @@
> +#ifndef __UM_ASM_CACHEFLUSH_H
> +#define __UM_ASM_CACHEFLUSH_H
> +
> +#include <asm/tlbflush.h>
> +#define flush_cache_vmap flush_tlb_kernel_range
> +#define flush_cache_vunmap flush_tlb_kernel_range
> +
> +#include <asm-generic/cacheflush.h>
> +#endif /* __UM_ASM_CACHEFLUSH_H */
> diff --git a/arch/um/include/asm/tlb.h b/arch/um/include/asm/tlb.h
> index ff9c62828962..0422467bda5b 100644
> --- a/arch/um/include/asm/tlb.h
> +++ b/arch/um/include/asm/tlb.h
> @@ -5,7 +5,7 @@
>  #include <linux/mm.h>
>
>  #include <asm/tlbflush.h>
> -#include <asm-generic/cacheflush.h>
> +#include <asm/cacheflush.h>
>  #include <asm-generic/tlb.h>
>
>  #endif

Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
```diff
diff --git a/arch/um/include/asm/cacheflush.h b/arch/um/include/asm/cacheflush.h
new file mode 100644
index 000000000000..4c9858cd36ec
--- /dev/null
+++ b/arch/um/include/asm/cacheflush.h
@@ -0,0 +1,9 @@
+#ifndef __UM_ASM_CACHEFLUSH_H
+#define __UM_ASM_CACHEFLUSH_H
+
+#include <asm/tlbflush.h>
+#define flush_cache_vmap flush_tlb_kernel_range
+#define flush_cache_vunmap flush_tlb_kernel_range
+
+#include <asm-generic/cacheflush.h>
+#endif /* __UM_ASM_CACHEFLUSH_H */
diff --git a/arch/um/include/asm/tlb.h b/arch/um/include/asm/tlb.h
index ff9c62828962..0422467bda5b 100644
--- a/arch/um/include/asm/tlb.h
+++ b/arch/um/include/asm/tlb.h
@@ -5,7 +5,7 @@
 #include <linux/mm.h>
 
 #include <asm/tlbflush.h>
-#include <asm-generic/cacheflush.h>
+#include <asm/cacheflush.h>
 #include <asm-generic/tlb.h>
 
 #endif
```
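As a rough illustration of the path this optimizes: the generic mapping helpers call flush_cache_vmap() after installing a new area's page tables and flush_cache_vunmap() before tearing them down, so with the macros above each operation flushes only its own range rather than all of kernel VM space. A hypothetical test module (not from the thread), assuming a UML kernel with this patch applied:

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/vmalloc.h>

static int __init vmap_flush_demo_init(void)
{
	/* Mapping ends with flush_cache_vmap() over just this range. */
	void *p = vmalloc(16 * PAGE_SIZE);

	if (!p)
		return -ENOMEM;

	/* Unmapping starts with flush_cache_vunmap() over the range. */
	vfree(p);
	return 0;
}
module_init(vmap_flush_demo_init);

MODULE_LICENSE("GPL");
```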