Message ID | 20241108101853.277808-1-sshegde@linux.ibm.com (mailing list archive) |
---|---|
State | Superseded |
Series | powerpc: Add preempt lazy support |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 5 jobs. |
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 21 jobs. |
On 2024-11-08 15:48:53 [+0530], Shrikanth Hegde wrote:
> Define preempt lazy bit for Powerpc. Use bit 9, which is free and within
> the 16-bit range of NEED_RESCHED, so the compiler can issue a single andi.
>
> Since Powerpc doesn't use the generic entry/exit, add the lazy check at
> exit to user. CONFIG_PREEMPTION is defined for lazy/full/rt, so use it
> for return to kernel.
>
> Ran a few benchmarks and a db workload on Power10. Performance is close
> to preempt=none/voluntary. It is possible that some patterns would
> differ in lazy [2]. More details of preempt lazy are here [1].
>
> Since Powerpc systems can have large core counts and large memory,
> preempt lazy is going to be helpful in avoiding soft lockup issues.
>
> [1]: https://lore.kernel.org/lkml/20241007074609.447006177@infradead.org/
> [2]: https://lore.kernel.org/all/1a973dda-c79e-4d95-935b-e4b93eb077b8@linux.ibm.com/

The lazy bits are only in tip.

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> ---
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index af62ec974b97..8f4acc55407b 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -396,7 +396,7 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs)
>  	/* Returning to a kernel context with local irqs enabled. */
>  	WARN_ON_ONCE(!(regs->msr & MSR_EE));
>  again:
> -	if (IS_ENABLED(CONFIG_PREEMPT)) {
> +	if (IS_ENABLED(CONFIG_PREEMPTION)) {
>  		/* Return to preemptible kernel context */
>  		if (unlikely(read_thread_flags() & _TIF_NEED_RESCHED)) {
>  			if (preempt_count() == 0)

Shouldn't exit_vmx_usercopy() also get this
s@CONFIG_PREEMPT@CONFIG_PREEMPTION@ change?

Sebastian
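For context on Sebastian's question, exit_vmx_usercopy() lives in
arch/powerpc/lib/vmx-helper.c. The sketch below shows its approximate shape
around this kernel version; it is reconstructed for illustration, not taken
from the patch, and the exact body may differ:

```c
/*
 * Approximate shape of exit_vmx_usercopy() from
 * arch/powerpc/lib/vmx-helper.c, shown for context only.
 * Calling schedule() here would be unsafe while kernel VMX state is
 * still live, so the function instead sets the decrementer to fire
 * almost immediately, creating a safe preemption point on the
 * interrupt-return path.
 */
int exit_vmx_usercopy(void)
{
	disable_kernel_altivec();
	pagefault_enable();
	preempt_enable_no_resched();
	/* This is the IS_ENABLED(CONFIG_PREEMPT) test in question. */
	if (IS_ENABLED(CONFIG_PREEMPT) && need_resched())
		set_dec(1);
	return 0;
}
```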
Shrikanth Hegde <sshegde@linux.ibm.com> writes:

> Define preempt lazy bit for Powerpc. Use bit 9, which is free and within
> the 16-bit range of NEED_RESCHED, so the compiler can issue a single andi.
>
> Since Powerpc doesn't use the generic entry/exit, add the lazy check at
> exit to user. CONFIG_PREEMPTION is defined for lazy/full/rt, so use it
> for return to kernel.
>
> Ran a few benchmarks and a db workload on Power10. Performance is close
> to preempt=none/voluntary. It is possible that some patterns would
> differ in lazy [2]. More details of preempt lazy are here [1].
>
> Since Powerpc systems can have large core counts and large memory,
> preempt lazy is going to be helpful in avoiding soft lockup issues.
>
> [1]: https://lore.kernel.org/lkml/20241007074609.447006177@infradead.org/
> [2]: https://lore.kernel.org/all/1a973dda-c79e-4d95-935b-e4b93eb077b8@linux.ibm.com/
>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>

Looks good.

Reviewed-by: <ankur.a.arora@oracle.com>

However, I just checked and powerpc does not have
CONFIG_KVM_XFER_TO_GUEST_WORK. Do you need this additional patch
for handling the lazy bit at KVM guest entry?

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index f14329989e9a..7bdf7015bb65 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -84,7 +84,8 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
 	hard_irq_disable();

 	while (true) {
-		if (need_resched()) {
+		unsigned long tf = read_thread_flags();
+		if (tf & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)) {
 			local_irq_enable();
 			cond_resched();
 			hard_irq_disable();

Ankur
Thank you Sebastian for taking a look and for the Reviewed-by tag.

> On 2024-11-08 15:48:53 [+0530], Shrikanth Hegde wrote:
>> Define preempt lazy bit for Powerpc. Use bit 9, which is free and within
>> the 16-bit range of NEED_RESCHED, so the compiler can issue a single andi.
>>
>> Since Powerpc doesn't use the generic entry/exit, add the lazy check at
>> exit to user. CONFIG_PREEMPTION is defined for lazy/full/rt, so use it
>> for return to kernel.
>>
>> Ran a few benchmarks and a db workload on Power10. Performance is close
>> to preempt=none/voluntary. It is possible that some patterns would
>> differ in lazy [2]. More details of preempt lazy are here [1].
>>
>> Since Powerpc systems can have large core counts and large memory,
>> preempt lazy is going to be helpful in avoiding soft lockup issues.
>>
>> [1]: https://lore.kernel.org/lkml/20241007074609.447006177@infradead.org/
>> [2]: https://lore.kernel.org/all/1a973dda-c79e-4d95-935b-e4b93eb077b8@linux.ibm.com/
>
> The lazy bits are only in tip.

Hi Michael, I sent it to the powerpc tree since all the changes were in
arch/powerpc. Please let me know if I should send it to the tip tree instead.

> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
>> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
>> ---
>> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
>> index af62ec974b97..8f4acc55407b 100644
>> --- a/arch/powerpc/kernel/interrupt.c
>> +++ b/arch/powerpc/kernel/interrupt.c
>> @@ -396,7 +396,7 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs)
>>  	/* Returning to a kernel context with local irqs enabled. */
>>  	WARN_ON_ONCE(!(regs->msr & MSR_EE));
>>  again:
>> -	if (IS_ENABLED(CONFIG_PREEMPT)) {
>> +	if (IS_ENABLED(CONFIG_PREEMPTION)) {
>>  		/* Return to preemptible kernel context */
>>  		if (unlikely(read_thread_flags() & _TIF_NEED_RESCHED)) {
>>  			if (preempt_count() == 0)
>
> Shouldn't exit_vmx_usercopy() also get this
> s@CONFIG_PREEMPT@CONFIG_PREEMPTION@ change?

I had seen this, but wasn't sure. I will take a look at it. Thanks for the
pointers.

> Sebastian
On 11/9/24 00:36, Ankur Arora wrote:
>
> Shrikanth Hegde <sshegde@linux.ibm.com> writes:
>
>> Define preempt lazy bit for Powerpc. Use bit 9, which is free and within
>> the 16-bit range of NEED_RESCHED, so the compiler can issue a single andi.
>>
>> Since Powerpc doesn't use the generic entry/exit, add the lazy check at
>> exit to user. CONFIG_PREEMPTION is defined for lazy/full/rt, so use it
>> for return to kernel.
>>
>> Ran a few benchmarks and a db workload on Power10. Performance is close
>> to preempt=none/voluntary. It is possible that some patterns would
>> differ in lazy [2]. More details of preempt lazy are here [1].
>>
>> Since Powerpc systems can have large core counts and large memory,
>> preempt lazy is going to be helpful in avoiding soft lockup issues.
>>
>> [1]: https://lore.kernel.org/lkml/20241007074609.447006177@infradead.org/
>> [2]: https://lore.kernel.org/all/1a973dda-c79e-4d95-935b-e4b93eb077b8@linux.ibm.com/
>>
>> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
>
> Looks good. Reviewed-by: <ankur.a.arora@oracle.com>

Thank you Ankur for taking a look and for the Reviewed-by tag.

>
> However, I just checked and powerpc does not have
> CONFIG_KVM_XFER_TO_GUEST_WORK. Do you need this additional patch
> for handling the lazy bit at KVM guest entry?

I will take a look. Thanks for the pointers.

>
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index f14329989e9a..7bdf7015bb65 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -84,7 +84,8 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
>  	hard_irq_disable();
>
>  	while (true) {
> -		if (need_resched()) {
> +		unsigned long tf = read_thread_flags();
> +		if (tf & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)) {
>  			local_irq_enable();
>  			cond_resched();
>  			hard_irq_disable();
>
> Ankur
Shrikanth Hegde <sshegde@linux.ibm.com> writes:
> Thank you Sebastian for taking a look and for the Reviewed-by tag.
>
>> On 2024-11-08 15:48:53 [+0530], Shrikanth Hegde wrote:
>>> Define preempt lazy bit for Powerpc. Use bit 9, which is free and within
>>> the 16-bit range of NEED_RESCHED, so the compiler can issue a single andi.
>>>
>>> Since Powerpc doesn't use the generic entry/exit, add the lazy check at
>>> exit to user. CONFIG_PREEMPTION is defined for lazy/full/rt, so use it
>>> for return to kernel.
>>>
>>> Ran a few benchmarks and a db workload on Power10. Performance is close
>>> to preempt=none/voluntary. It is possible that some patterns would
>>> differ in lazy [2]. More details of preempt lazy are here [1].
>>>
>>> Since Powerpc systems can have large core counts and large memory,
>>> preempt lazy is going to be helpful in avoiding soft lockup issues.
>>>
>>> [1]: https://lore.kernel.org/lkml/20241007074609.447006177@infradead.org/
>>> [2]: https://lore.kernel.org/all/1a973dda-c79e-4d95-935b-e4b93eb077b8@linux.ibm.com/
>>
>> The lazy bits are only in tip.
>
> Hi Michael, I sent it to the powerpc tree since all the changes were in
> arch/powerpc. Please let me know if I should send it to the tip tree instead.

I think I'd like it to have a full cycle of testing in next before going
into mainline. So I'll plan to take this via the powerpc tree for the
next cycle.

I assume you haven't tested 32-bit at all?

cheers
On 11/14/24 07:31, Michael Ellerman wrote:
> Shrikanth Hegde <sshegde@linux.ibm.com> writes:
>> Thank you Sebastian for taking a look and for the Reviewed-by tag.
>>
>>> On 2024-11-08 15:48:53 [+0530], Shrikanth Hegde wrote:
>>>> Define preempt lazy bit for Powerpc. Use bit 9, which is free and within
>>>> the 16-bit range of NEED_RESCHED, so the compiler can issue a single andi.
>>>>
>>>> Since Powerpc doesn't use the generic entry/exit, add the lazy check at
>>>> exit to user. CONFIG_PREEMPTION is defined for lazy/full/rt, so use it
>>>> for return to kernel.
>>>>
>>>> Ran a few benchmarks and a db workload on Power10. Performance is close
>>>> to preempt=none/voluntary. It is possible that some patterns would
>>>> differ in lazy [2]. More details of preempt lazy are here [1].
>>>>
>>>> Since Powerpc systems can have large core counts and large memory,
>>>> preempt lazy is going to be helpful in avoiding soft lockup issues.
>>>>
>>>> [1]: https://lore.kernel.org/lkml/20241007074609.447006177@infradead.org/
>>>> [2]: https://lore.kernel.org/all/1a973dda-c79e-4d95-935b-e4b93eb077b8@linux.ibm.com/
>>>
>>> The lazy bits are only in tip.
>>

I have added the change suggested by Sebastian. I think it makes sense, since
a large user copy using VMX could take some time, and in a preemptible kernel
it needs to resched as soon as possible. However, I am not making it consider
the lazy bit, since that would lead to quite a few unnecessary context
switches.

-	if (IS_ENABLED(CONFIG_PREEMPT) && need_resched())
+	if (IS_ENABLED(CONFIG_PREEMPTION) && need_resched())
 		set_dec(1);
 	return 0;

>> Hi Michael, I sent it to the powerpc tree since all the changes were in
>> arch/powerpc. Please let me know if I should send it to the tip tree instead.
>
> I think I'd like it to have a full cycle of testing in next before going
> into mainline. So I'll plan to take this via the powerpc tree for the
> next cycle.

Makes sense.

> I assume you haven't tested 32-bit at all?

Yes, 32-bit isn't tested. It would be better if it goes through a test cycle.
I will send out v2 soon.

> cheers
On 11/9/24 22:24, Shrikanth Hegde wrote:
>
> On 11/9/24 00:36, Ankur Arora wrote:
>>
>> Shrikanth Hegde <sshegde@linux.ibm.com> writes:
>>
>>> Define preempt lazy bit for Powerpc. Use bit 9, which is free and within
>>> the 16-bit range of NEED_RESCHED, so the compiler can issue a single andi.
>>>
>>> Since Powerpc doesn't use the generic entry/exit, add the lazy check at
>>> exit to user. CONFIG_PREEMPTION is defined for lazy/full/rt, so use it
>>> for return to kernel.
>>>
>>> Ran a few benchmarks and a db workload on Power10. Performance is close
>>> to preempt=none/voluntary. It is possible that some patterns would
>>> differ in lazy [2]. More details of preempt lazy are here [1].
>>>
>>> Since Powerpc systems can have large core counts and large memory,
>>> preempt lazy is going to be helpful in avoiding soft lockup issues.
>>>
>>> [1]: https://lore.kernel.org/lkml/20241007074609.447006177@infradead.org/
>>> [2]: https://lore.kernel.org/all/1a973dda-c79e-4d95-935b-e4b93eb077b8@linux.ibm.com/
>>>
>>> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
>>
>> Looks good. Reviewed-by: <ankur.a.arora@oracle.com>
>
> Thank you Ankur for taking a look and for the Reviewed-by tag.
>
>>
>> However, I just checked and powerpc does not have
>> CONFIG_KVM_XFER_TO_GUEST_WORK. Do you need this additional patch
>> for handling the lazy bit at KVM guest entry?

It doesn't use the generic KVM entry/exit either, AFAIK. I need to understand
more of this KVM maze; there are quite a lot of combinations.

> I will take a look. Thanks for the pointers.
>
>>
>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>> index f14329989e9a..7bdf7015bb65 100644
>> --- a/arch/powerpc/kvm/powerpc.c
>> +++ b/arch/powerpc/kvm/powerpc.c
>> @@ -84,7 +84,8 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
>>  	hard_irq_disable();
>>
>>  	while (true) {
>> -		if (need_resched()) {
>> +		unsigned long tf = read_thread_flags();
>> +		if (tf & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)) {
>>  			local_irq_enable();
>>  			cond_resched();
>>  			hard_irq_disable();
>>

This is not going to help since, with LAZY, cond_resched() is a nop, so it
doesn't call schedule(). The same is true with preempt=full. I need to figure
out whether the KVM code was tested with preempt=full.

Instead of cond_resched() this needs to call schedule(). I need to test that
and also look at the other places in KVM.

So I need to spend more time on this and figure it out; I will send the
patches after that.

>> Ankur
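A minimal, untested sketch of the direction described above, keeping the
structure of Ankur's diff but calling schedule() directly; the rest of the
kvmppc_prepare_to_enter() body is elided:

```c
	while (true) {
		unsigned long tf = read_thread_flags();

		if (tf & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)) {
			local_irq_enable();
			/*
			 * cond_resched() is a nop under preempt=lazy/full,
			 * so call schedule() directly here.
			 */
			schedule();
			hard_irq_disable();
			continue;
		}
		/* ... rest of kvmppc_prepare_to_enter() ... */
		break;
	}
```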
Shrikanth Hegde <sshegde@linux.ibm.com> writes:

> On 11/9/24 22:24, Shrikanth Hegde wrote:
>>
>> On 11/9/24 00:36, Ankur Arora wrote:
>>>
>>> Shrikanth Hegde <sshegde@linux.ibm.com> writes:
>>>
>>>> Define preempt lazy bit for Powerpc. Use bit 9, which is free and within
>>>> the 16-bit range of NEED_RESCHED, so the compiler can issue a single andi.
>>>>
>>>> Since Powerpc doesn't use the generic entry/exit, add the lazy check at
>>>> exit to user. CONFIG_PREEMPTION is defined for lazy/full/rt, so use it
>>>> for return to kernel.
>>>>
>>>> Ran a few benchmarks and a db workload on Power10. Performance is close
>>>> to preempt=none/voluntary. It is possible that some patterns would
>>>> differ in lazy [2]. More details of preempt lazy are here [1].
>>>>
>>>> Since Powerpc systems can have large core counts and large memory,
>>>> preempt lazy is going to be helpful in avoiding soft lockup issues.
>>>>
>>>> [1]: https://lore.kernel.org/lkml/20241007074609.447006177@infradead.org/
>>>> [2]: https://lore.kernel.org/all/1a973dda-c79e-4d95-935b-e4b93eb077b8@linux.ibm.com/
>>>>
>>>> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
>>>
>>> Looks good. Reviewed-by: <ankur.a.arora@oracle.com>
>>
>> Thank you Ankur for taking a look and for the Reviewed-by tag.
>>
>>>
>>> However, I just checked and powerpc does not have
>>> CONFIG_KVM_XFER_TO_GUEST_WORK. Do you need this additional patch
>>> for handling the lazy bit at KVM guest entry?
>>
>
> It doesn't use the generic KVM entry/exit either, AFAIK. I need to understand
> more of this KVM maze; there are quite a lot of combinations.

The generic KVM entry/exit is gated by CONFIG_KVM_XFER_TO_GUEST_WORK.

>> I will take a look. Thanks for the pointers.
>>
>>>
>>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>>> index f14329989e9a..7bdf7015bb65 100644
>>> --- a/arch/powerpc/kvm/powerpc.c
>>> +++ b/arch/powerpc/kvm/powerpc.c
>>> @@ -84,7 +84,8 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
>>>  	hard_irq_disable();
>>>
>>>  	while (true) {
>>> -		if (need_resched()) {
>>> +		unsigned long tf = read_thread_flags();
>>> +		if (tf & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)) {
>>>  			local_irq_enable();
>>>  			cond_resched();
>>>  			hard_irq_disable();
>>>
>
> This is not going to help since, with LAZY, cond_resched() is a nop, so it
> doesn't call schedule(). The same is true with preempt=full. I need to figure
> out whether the KVM code was tested with preempt=full.
>
> Instead of cond_resched() this needs to call schedule(). I need to test that
> and also look at the other places in KVM.

Oh yeah. Missed that it was calling cond_resched().

> So I need to spend more time on this and figure it out; I will send the
> patches after that.

--
ankur
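For reference, the generic helper that CONFIG_KVM_XFER_TO_GUEST_WORK enables
handles resched at guest entry roughly as below. This is a simplified sketch
of kernel/entry/kvm.c from this era; the arch hook and some details are
omitted, and exact code varies by version. Note that it calls schedule()
directly rather than cond_resched():

```c
/*
 * Simplified sketch of xfer_to_guest_mode_work() from
 * kernel/entry/kvm.c; the arch hook and error handling are omitted.
 */
static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
{
	do {
		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) {
			kvm_handle_signal_exit(vcpu);
			return -EINTR;
		}

		/* A real schedule(), not cond_resched(). */
		if (ti_work & _TIF_NEED_RESCHED)
			schedule();

		if (ti_work & _TIF_NOTIFY_RESUME)
			resume_user_mode_work(NULL);

		ti_work = read_thread_flags();
	} while (ti_work & XFER_TO_GUEST_MODE_WORK || need_resched());
	return 0;
}
```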
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8094a01974cc..2f625aecf94b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -145,6 +145,7 @@ config PPC
 	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	select ARCH_HAS_PHYS_TO_DMA
 	select ARCH_HAS_PMEM_API
+	select ARCH_HAS_PREEMPT_LAZY
 	select ARCH_HAS_PTE_DEVMAP if PPC_BOOK3S_64
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_HAS_SCALED_CPUTIME if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 6ebca2996f18..2785c7462ebf 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -103,6 +103,7 @@ void arch_setup_new_exec(void);
 #define TIF_PATCH_PENDING	6	/* pending live patching update */
 #define TIF_SYSCALL_AUDIT	7	/* syscall auditing active */
 #define TIF_SINGLESTEP		8	/* singlestepping active */
+#define TIF_NEED_RESCHED_LAZY	9	/* Scheduler driven lazy preemption */
 #define TIF_SECCOMP		10	/* secure computing */
 #define TIF_RESTOREALL		11	/* Restore all regs (implies NOERROR) */
 #define TIF_NOERROR		12	/* Force successful syscall return */
@@ -122,6 +123,7 @@ void arch_setup_new_exec(void);
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
 #define _TIF_SIGPENDING		(1<<TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1<<TIF_NEED_RESCHED)
+#define _TIF_NEED_RESCHED_LAZY	(1<<TIF_NEED_RESCHED_LAZY)
 #define _TIF_NOTIFY_SIGNAL	(1<<TIF_NOTIFY_SIGNAL)
 #define _TIF_POLLING_NRFLAG	(1<<TIF_POLLING_NRFLAG)
 #define _TIF_32BIT		(1<<TIF_32BIT)
@@ -142,9 +144,10 @@ void arch_setup_new_exec(void);
 				 _TIF_SYSCALL_EMU)

 #define _TIF_USER_WORK_MASK	(_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
-				 _TIF_NOTIFY_RESUME | _TIF_UPROBE | \
-				 _TIF_RESTORE_TM | _TIF_PATCH_PENDING | \
-				 _TIF_NOTIFY_SIGNAL)
+				 _TIF_NEED_RESCHED_LAZY | _TIF_NOTIFY_RESUME | \
+				 _TIF_UPROBE | _TIF_RESTORE_TM | \
+				 _TIF_PATCH_PENDING | _TIF_NOTIFY_SIGNAL)
+
 #define _TIF_PERSYSCALL_MASK	(_TIF_RESTOREALL|_TIF_NOERROR)

 /* Bits in local_flags */
diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index af62ec974b97..8f4acc55407b 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -185,7 +185,7 @@ interrupt_exit_user_prepare_main(unsigned long ret, struct pt_regs *regs)
 	ti_flags = read_thread_flags();
 	while (unlikely(ti_flags & (_TIF_USER_WORK_MASK & ~_TIF_RESTORE_TM))) {
 		local_irq_enable();
-		if (ti_flags & _TIF_NEED_RESCHED) {
+		if (ti_flags & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)) {
 			schedule();
 		} else {
 			/*
@@ -396,7 +396,7 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs)
 	/* Returning to a kernel context with local irqs enabled. */
 	WARN_ON_ONCE(!(regs->msr & MSR_EE));
 again:
-	if (IS_ENABLED(CONFIG_PREEMPT)) {
+	if (IS_ENABLED(CONFIG_PREEMPTION)) {
 		/* Return to preemptible kernel context */
 		if (unlikely(read_thread_flags() & _TIF_NEED_RESCHED)) {
 			if (preempt_count() == 0)
Define preempt lazy bit for Powerpc. Use bit 9, which is free and within
the 16-bit range of NEED_RESCHED, so the compiler can issue a single andi.

Since Powerpc doesn't use the generic entry/exit, add the lazy check at
exit to user. CONFIG_PREEMPTION is defined for lazy/full/rt, so use it
for return to kernel.

Ran a few benchmarks and a db workload on Power10. Performance is close
to preempt=none/voluntary. It is possible that some patterns would
differ in lazy [2]. More details of preempt lazy are here [1].

Since Powerpc systems can have large core counts and large memory,
preempt lazy is going to be helpful in avoiding soft lockup issues.

[1]: https://lore.kernel.org/lkml/20241007074609.447006177@infradead.org/
[2]: https://lore.kernel.org/all/1a973dda-c79e-4d95-935b-e4b93eb077b8@linux.ibm.com/

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
 arch/powerpc/Kconfig                   | 1 +
 arch/powerpc/include/asm/thread_info.h | 9 ++++++---
 arch/powerpc/kernel/interrupt.c        | 4 ++--
 3 files changed, 9 insertions(+), 5 deletions(-)
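As an aside on the "single andi" point in the commit message: with both
resched bits in the low 16 bits of the thread flags, the combined mask fits
the 16-bit unsigned immediate of the ppc "andi." instruction. The sketch
below is illustrative only, and assumes TIF_NEED_RESCHED is bit 2 (its
current mainline value on powerpc):

```c
/*
 * Illustrative only. Assuming TIF_NEED_RESCHED is bit 2 and
 * TIF_NEED_RESCHED_LAZY is bit 9, the combined mask is
 * (1 << 2) | (1 << 9) = 0x204, which fits the 16-bit unsigned
 * immediate of "andi.". The check below can then compile to a single
 *
 *	andi. r9, r9, 0x204	; tests both flags, sets CR0
 *
 * whereas a flag above bit 15 would need "andis." or a separately
 * constructed mask, i.e. an extra instruction.
 */
if (ti_flags & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY))
	schedule();
```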