Message ID | alpine.LSU.2.11.1601091651130.9808@eggly.anvils (mailing list archive) |
---|---|
State | Accepted |
On Sat, Jan 09, 2016 at 04:54:59PM -0800, Hugh Dickins wrote:
> Swapoff after swapping hangs on the G5, when CONFIG_CHECKPOINT_RESTORE=y
> but CONFIG_MEM_SOFT_DIRTY is not set.  That's because the non-zero
> _PAGE_SWP_SOFT_DIRTY bit, added by CONFIG_HAVE_ARCH_SOFT_DIRTY=y, is not
> discounted when CONFIG_MEM_SOFT_DIRTY is not set: so swap ptes cannot be
> recognized.
>
> (I suspect that the peculiar dependence of HAVE_ARCH_SOFT_DIRTY on
> CHECKPOINT_RESTORE in arch/powerpc/Kconfig comes from an incomplete
> attempt to solve this problem.)
>
> It's true that the relationship between CONFIG_HAVE_ARCH_SOFT_DIRTY and
> CONFIG_MEM_SOFT_DIRTY is too confusing, and it's true that swapoff
> should be made more robust; but nevertheless, fix up the powerpc ifdefs
> as x86_64 and s390 (which met the same problem) have them, defining the
> bits as 0 if CONFIG_MEM_SOFT_DIRTY is not set.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>

Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>

Thank you, Hugh!
Hugh Dickins <hughd@google.com> writes:

> Swapoff after swapping hangs on the G5, when CONFIG_CHECKPOINT_RESTORE=y
> but CONFIG_MEM_SOFT_DIRTY is not set.  That's because the non-zero
> _PAGE_SWP_SOFT_DIRTY bit, added by CONFIG_HAVE_ARCH_SOFT_DIRTY=y, is not
> discounted when CONFIG_MEM_SOFT_DIRTY is not set: so swap ptes cannot be
> recognized.
>
> (I suspect that the peculiar dependence of HAVE_ARCH_SOFT_DIRTY on
> CHECKPOINT_RESTORE in arch/powerpc/Kconfig comes from an incomplete
> attempt to solve this problem.)
>
> It's true that the relationship between CONFIG_HAVE_ARCH_SOFT_DIRTY and
> CONFIG_MEM_SOFT_DIRTY is too confusing, and it's true that swapoff
> should be made more robust; but nevertheless, fix up the powerpc ifdefs
> as x86_64 and s390 (which met the same problem) have them, defining the
> bits as 0 if CONFIG_MEM_SOFT_DIRTY is not set.

Do we need this patch if we make maybe_same_pte() more robust? The
#ifdef with pte bits is always a confusing one and, IMHO, we should avoid
that if we can.

>
> Signed-off-by: Hugh Dickins <hughd@google.com>
On Mon, 11 Jan 2016, Aneesh Kumar K.V wrote:
> Hugh Dickins <hughd@google.com> writes:
>
> > Swapoff after swapping hangs on the G5, when CONFIG_CHECKPOINT_RESTORE=y
> > but CONFIG_MEM_SOFT_DIRTY is not set.  That's because the non-zero
> > _PAGE_SWP_SOFT_DIRTY bit, added by CONFIG_HAVE_ARCH_SOFT_DIRTY=y, is not
> > discounted when CONFIG_MEM_SOFT_DIRTY is not set: so swap ptes cannot be
> > recognized.
> >
> > (I suspect that the peculiar dependence of HAVE_ARCH_SOFT_DIRTY on
> > CHECKPOINT_RESTORE in arch/powerpc/Kconfig comes from an incomplete
> > attempt to solve this problem.)
> >
> > It's true that the relationship between CONFIG_HAVE_ARCH_SOFT_DIRTY and
> > CONFIG_MEM_SOFT_DIRTY is too confusing, and it's true that swapoff
> > should be made more robust; but nevertheless, fix up the powerpc ifdefs
> > as x86_64 and s390 (which met the same problem) have them, defining the
> > bits as 0 if CONFIG_MEM_SOFT_DIRTY is not set.
>
> Do we need this patch if we make maybe_same_pte() more robust? The
> #ifdef with pte bits is always a confusing one and, IMHO, we should avoid
> that if we can.

If maybe_same_pte() were more robust (as in the pte_same_as_swp() patch),
this patch here becomes an optimization rather than a correctness patch:
without this patch here, pte_same_as_swp() will perform an unnecessary
transformation (masking out _PAGE_SWP_SOFT_DIRTY) from every one of the
millions of ptes it has to examine, on configs where it couldn't be set.
Or perhaps the processor gets that all nicely lined up without any actual
delay, I don't know.

I've already agreed that the way SOFT_DIRTY is currently config'ed is
too confusing; but until that's improved, I strongly recommend that you
follow the same way of handling this as x86_64 and s390 are doing - going
off and doing it differently is liable to lead to error, as we have seen.

So I recommend using the patch below too, whether or not you care for
the optimization.
Hugh
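The optimization argument in the reply above can be sketched in user space. This is only a toy model under stated assumptions: `toy_pte`, the `toy_*` helpers and the bit position are hypothetical and are not the kernel's types or the real powerpc layout. It shows that a mask-then-compare check in the spirit of pte_same_as_swp() is correct with either definition of the bit, and that with the bit defined as 0 the masking step is the identity, which a compiler is free to fold away.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long toy_pte;               /* hypothetical stand-in for pte_t */

#define TOY_SWP_SOFT_DIRTY_SET   (1UL << 5)  /* assumed bit, illustration only */
#define TOY_SWP_SOFT_DIRTY_UNSET 0UL         /* the "defined as 0" variant */

/* Strip the swap soft-dirty bit; with a zero mask this returns pte unchanged. */
static toy_pte toy_swp_clear_soft_dirty(toy_pte pte, unsigned long mask)
{
    return pte & ~mask;
}

/* Robust comparison: clear the bit first, then test for equality. */
static bool toy_same_as_swp(toy_pte pte, toy_pte swp_pte, unsigned long mask)
{
    return toy_swp_clear_soft_dirty(pte, mask) == swp_pte;
}

/* Exercise both mask variants; aborts via assert() if the model is wrong. */
static void toy_check_masking(void)
{
    toy_pte swp_pte = 0x1200UL;

    /* Non-zero mask: a pte carrying the bit still matches after masking. */
    assert(toy_same_as_swp(swp_pte | TOY_SWP_SOFT_DIRTY_SET, swp_pte,
                           TOY_SWP_SOFT_DIRTY_SET));

    /* Zero mask: masking is the identity, so the clear step does no work. */
    assert(toy_swp_clear_soft_dirty(0x1220UL, TOY_SWP_SOFT_DIRTY_UNSET)
           == 0x1220UL);
    assert(toy_same_as_swp(swp_pte, swp_pte, TOY_SWP_SOFT_DIRTY_UNSET));
}
```

Calling `toy_check_masking()` from a test harness runs all three checks. In the real kernel the mask is a compile-time constant rather than a function argument, which is what would let the zero case disappear entirely.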
Hugh Dickins <hughd@google.com> writes:

> On Mon, 11 Jan 2016, Aneesh Kumar K.V wrote:
>> Hugh Dickins <hughd@google.com> writes:
>>
>> > Swapoff after swapping hangs on the G5, when CONFIG_CHECKPOINT_RESTORE=y
>> > but CONFIG_MEM_SOFT_DIRTY is not set.  That's because the non-zero
>> > _PAGE_SWP_SOFT_DIRTY bit, added by CONFIG_HAVE_ARCH_SOFT_DIRTY=y, is not
>> > discounted when CONFIG_MEM_SOFT_DIRTY is not set: so swap ptes cannot be
>> > recognized.
>> >
>> > (I suspect that the peculiar dependence of HAVE_ARCH_SOFT_DIRTY on
>> > CHECKPOINT_RESTORE in arch/powerpc/Kconfig comes from an incomplete
>> > attempt to solve this problem.)
>> >
>> > It's true that the relationship between CONFIG_HAVE_ARCH_SOFT_DIRTY and
>> > CONFIG_MEM_SOFT_DIRTY is too confusing, and it's true that swapoff
>> > should be made more robust; but nevertheless, fix up the powerpc ifdefs
>> > as x86_64 and s390 (which met the same problem) have them, defining the
>> > bits as 0 if CONFIG_MEM_SOFT_DIRTY is not set.
>>
>> Do we need this patch if we make maybe_same_pte() more robust? The
>> #ifdef with pte bits is always a confusing one and, IMHO, we should avoid
>> that if we can.
>
> If maybe_same_pte() were more robust (as in the pte_same_as_swp() patch),
> this patch here becomes an optimization rather than a correctness patch:
> without this patch here, pte_same_as_swp() will perform an unnecessary
> transformation (masking out _PAGE_SWP_SOFT_DIRTY) from every one of the
> millions of ptes it has to examine, on configs where it couldn't be set.
> Or perhaps the processor gets that all nicely lined up without any actual
> delay, I don't know.

But we have

#ifndef CONFIG_HAVE_ARCH_SOFT_DIRTY
static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
{
	return pte;
}
#endif

If we fix CONFIG_HAVE_ARCH_SOFT_DIRTY correctly, we can do the same
optimization without the #ifdef of pte bits, right?

>
> I've already agreed that the way SOFT_DIRTY is currently config'ed is
> too confusing; but until that's improved, I strongly recommend that you
> follow the same way of handling this as x86_64 and s390 are doing - going
> off and doing it differently is liable to lead to error, as we have seen.
>
> So I recommend using the patch below too, whether or not you care for
> the optimization.
>
> Hugh

-aneesh
On Mon, 11 Jan 2016, Aneesh Kumar K.V wrote:
> Hugh Dickins <hughd@google.com> writes:
> > On Mon, 11 Jan 2016, Aneesh Kumar K.V wrote:
> >> Hugh Dickins <hughd@google.com> writes:
> >>
> >> > Swapoff after swapping hangs on the G5, when CONFIG_CHECKPOINT_RESTORE=y
> >> > but CONFIG_MEM_SOFT_DIRTY is not set.  That's because the non-zero
> >> > _PAGE_SWP_SOFT_DIRTY bit, added by CONFIG_HAVE_ARCH_SOFT_DIRTY=y, is not
> >> > discounted when CONFIG_MEM_SOFT_DIRTY is not set: so swap ptes cannot be
> >> > recognized.
> >> >
> >> > (I suspect that the peculiar dependence of HAVE_ARCH_SOFT_DIRTY on
> >> > CHECKPOINT_RESTORE in arch/powerpc/Kconfig comes from an incomplete
> >> > attempt to solve this problem.)
> >> >
> >> > It's true that the relationship between CONFIG_HAVE_ARCH_SOFT_DIRTY and
> >> > CONFIG_MEM_SOFT_DIRTY is too confusing, and it's true that swapoff
> >> > should be made more robust; but nevertheless, fix up the powerpc ifdefs
> >> > as x86_64 and s390 (which met the same problem) have them, defining the
> >> > bits as 0 if CONFIG_MEM_SOFT_DIRTY is not set.
> >>
> >> Do we need this patch if we make maybe_same_pte() more robust? The
> >> #ifdef with pte bits is always a confusing one and, IMHO, we should avoid
> >> that if we can.
> >
> > If maybe_same_pte() were more robust (as in the pte_same_as_swp() patch),
> > this patch here becomes an optimization rather than a correctness patch:
> > without this patch here, pte_same_as_swp() will perform an unnecessary
> > transformation (masking out _PAGE_SWP_SOFT_DIRTY) from every one of the
> > millions of ptes it has to examine, on configs where it couldn't be set.
> > Or perhaps the processor gets that all nicely lined up without any actual
> > delay, I don't know.
>
> But we have
> #ifndef CONFIG_HAVE_ARCH_SOFT_DIRTY
> static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
> {
> 	return pte;
> }
> #endif
>
> If we fix CONFIG_HAVE_ARCH_SOFT_DIRTY correctly, we can do the same
> optimization without the #ifdef of pte bits, right?

I'm not sure that I understand you (I'll have to look at your patch),
but suspect you're not optimizing the CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_MEM_SOFT_DIRTY not set case.  Which would not be the end of the
world, but...

> >
> > I've already agreed that the way SOFT_DIRTY is currently config'ed is
> > too confusing; but until that's improved, I strongly recommend that you
> > follow the same way of handling this as x86_64 and s390 are doing - going
> > off and doing it differently is liable to lead to error, as we have seen.

... as before, I don't think that doing it differently is a good idea.

Hugh
On 10/01/2016 01:54, Hugh Dickins wrote:
> Swapoff after swapping hangs on the G5, when CONFIG_CHECKPOINT_RESTORE=y
> but CONFIG_MEM_SOFT_DIRTY is not set.  That's because the non-zero
> _PAGE_SWP_SOFT_DIRTY bit, added by CONFIG_HAVE_ARCH_SOFT_DIRTY=y, is not
> discounted when CONFIG_MEM_SOFT_DIRTY is not set: so swap ptes cannot be
> recognized.
>
> (I suspect that the peculiar dependence of HAVE_ARCH_SOFT_DIRTY on
> CHECKPOINT_RESTORE in arch/powerpc/Kconfig comes from an incomplete
> attempt to solve this problem.)
>
> It's true that the relationship between CONFIG_HAVE_ARCH_SOFT_DIRTY and
> CONFIG_MEM_SOFT_DIRTY is too confusing, and it's true that swapoff
> should be made more robust; but nevertheless, fix up the powerpc ifdefs
> as x86_64 and s390 (which met the same problem) have them, defining the
> bits as 0 if CONFIG_MEM_SOFT_DIRTY is not set.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>

Acked-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>

Thanks, Hugh!
On Sun, 2016-01-10 at 00:54:59 UTC, Hugh Dickins wrote:
> Swapoff after swapping hangs on the G5, when CONFIG_CHECKPOINT_RESTORE=y
> but CONFIG_MEM_SOFT_DIRTY is not set.  That's because the non-zero
> _PAGE_SWP_SOFT_DIRTY bit, added by CONFIG_HAVE_ARCH_SOFT_DIRTY=y, is not
> discounted when CONFIG_MEM_SOFT_DIRTY is not set: so swap ptes cannot be
> recognized.
>
> (I suspect that the peculiar dependence of HAVE_ARCH_SOFT_DIRTY on
> CHECKPOINT_RESTORE in arch/powerpc/Kconfig comes from an incomplete
> attempt to solve this problem.)
>
> It's true that the relationship between CONFIG_HAVE_ARCH_SOFT_DIRTY and
> CONFIG_MEM_SOFT_DIRTY is too confusing, and it's true that swapoff
> should be made more robust; but nevertheless, fix up the powerpc ifdefs
> as x86_64 and s390 (which met the same problem) have them, defining the
> bits as 0 if CONFIG_MEM_SOFT_DIRTY is not set.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
> Acked-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/2f10f1a7884e97a68e52c4b6f7

cheers
--- 4.4-next/arch/powerpc/include/asm/book3s/64/hash.h	2016-01-06 11:54:01.377508976 -0800
+++ linux/arch/powerpc/include/asm/book3s/64/hash.h	2016-01-09 13:54:24.410893347 -0800
@@ -33,7 +33,12 @@
 #define _PAGE_F_GIX_SHIFT	12
 #define _PAGE_F_SECOND		0x08000 /* Whether to use secondary hash or not */
 #define _PAGE_SPECIAL		0x10000 /* software: special page */
+
+#ifdef CONFIG_MEM_SOFT_DIRTY
 #define _PAGE_SOFT_DIRTY	0x20000 /* software: software dirty tracking */
+#else
+#define _PAGE_SOFT_DIRTY	0x00000
+#endif
 
 /*
  * We need to differentiate between explicit huge page and THP huge
--- 4.4-next/arch/powerpc/include/asm/book3s/64/pgtable.h	2016-01-06 11:54:01.377508976 -0800
+++ linux/arch/powerpc/include/asm/book3s/64/pgtable.h	2016-01-09 13:54:24.410893347 -0800
@@ -162,8 +162,13 @@ static inline void pgd_set(pgd_t *pgdp,
 #define __pte_to_swp_entry(pte)	((swp_entry_t) { pte_val((pte)) })
 #define __swp_entry_to_pte(x)	__pte((x).val)
 
-#ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
+#ifdef CONFIG_MEM_SOFT_DIRTY
 #define _PAGE_SWP_SOFT_DIRTY	(1UL << (SWP_TYPE_BITS + _PAGE_BIT_SWAP_TYPE))
+#else
+#define _PAGE_SWP_SOFT_DIRTY	0UL
+#endif /* CONFIG_MEM_SOFT_DIRTY */
+
+#ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
 static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
 {
 	return __pte(pte_val(pte) | _PAGE_SWP_SOFT_DIRTY);
@@ -176,8 +181,6 @@ static inline pte_t pte_swp_clear_soft_d
 {
 	return __pte(pte_val(pte) & ~_PAGE_SWP_SOFT_DIRTY);
 }
-#else
-#define _PAGE_SWP_SOFT_DIRTY	0
 #endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */
 
 void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
Swapoff after swapping hangs on the G5, when CONFIG_CHECKPOINT_RESTORE=y
but CONFIG_MEM_SOFT_DIRTY is not set.  That's because the non-zero
_PAGE_SWP_SOFT_DIRTY bit, added by CONFIG_HAVE_ARCH_SOFT_DIRTY=y, is not
discounted when CONFIG_MEM_SOFT_DIRTY is not set: so swap ptes cannot be
recognized.

(I suspect that the peculiar dependence of HAVE_ARCH_SOFT_DIRTY on
CHECKPOINT_RESTORE in arch/powerpc/Kconfig comes from an incomplete
attempt to solve this problem.)

It's true that the relationship between CONFIG_HAVE_ARCH_SOFT_DIRTY and
CONFIG_MEM_SOFT_DIRTY is too confusing, and it's true that swapoff
should be made more robust; but nevertheless, fix up the powerpc ifdefs
as x86_64 and s390 (which met the same problem) have them, defining the
bits as 0 if CONFIG_MEM_SOFT_DIRTY is not set.

Signed-off-by: Hugh Dickins <hughd@google.com>
---

 arch/powerpc/include/asm/book3s/64/hash.h    |    5 +++++
 arch/powerpc/include/asm/book3s/64/pgtable.h |    9 ++++++---
 2 files changed, 11 insertions(+), 3 deletions(-)
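
The failure described in the commit message can be illustrated with a small user-space model. Everything here is an assumption made for illustration: `toy_pte` is a plain integer rather than pte_t, the bit position is arbitrary, and the comparison is reduced to a bare equality test, which is how the message implies swapoff behaves when CONFIG_MEM_SOFT_DIRTY is not set. The sketch shows that a non-zero bit ORed in at swap-out defeats the lookup, while a bit defined as 0 does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: a pte is a plain integer, not the kernel's pte_t. */
typedef unsigned long toy_pte;

/* Assumed bit position, chosen only for illustration. */
#define TOY_SWP_SOFT_DIRTY_ARCH (1UL << 5)

/* Swap-out side: the arch helper ORs in whatever the mask is defined as. */
static toy_pte toy_swp_mksoft_dirty(toy_pte pte, unsigned long mask)
{
    return pte | mask;
}

/*
 * Swapoff-side comparison, modelled as a plain equality test that makes
 * no allowance for the soft-dirty bit.
 */
static bool toy_maybe_same_pte(toy_pte pte, toy_pte swp_pte)
{
    return pte == swp_pte;
}

/* Returns true if swapoff would recognize the pte it is searching for. */
static bool toy_swapoff_finds(unsigned long mask)
{
    toy_pte wanted = 0x1200UL;   /* the swap entry swapoff is looking for */
    toy_pte stored;

    /* The sample entry must not already carry the bit under test. */
    assert((wanted & mask) == 0);

    stored = toy_swp_mksoft_dirty(wanted, mask);  /* what swap-out left behind */
    return toy_maybe_same_pte(stored, wanted);
}
```

With these definitions, `toy_swapoff_finds(TOY_SWP_SOFT_DIRTY_ARCH)` returns false and `toy_swapoff_finds(0UL)` returns true. In the model a missed match is just a false return; the hang reported above would correspond to swapoff retrying a search that can never succeed.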