Message ID | 20231129075424.240653-1-haren@linux.ibm.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | [v2] powerpc/pseries/vas: Use usleep_range() to support HCALL delay | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 6 jobs. |
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 23 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
Haren Myneni <haren@linux.ibm.com> writes: > VAS allocate, modify and deallocate HCALLs returns > H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy > delay and expects OS to reissue HCALL after that delay. But using > msleep() will often sleep at least 20 msecs even though the > hypervisor expects to reissue these HCALLs after 1 or 10msecs. I would word this as "the architecture suggests that the OS reissue these [...]" instead of framing it as something the platform "expects". > It might cause these HCALLs takes longer when multiple threads > issue open or close VAS windows simultaneously. This is imprecise. Over-sleeping by the OS doesn't cause individual hcalls to take longer. It is more accurate to say that the higher-level operation (allocate, modify, free) may take longer than necessary in cases where the OS must retry the hcalls involved. > So instead of msleep(), use usleep_range() to ensure sleep with > the expected value before issuing HCALL again. > > Signed-off-by: Haren Myneni <haren@linux.ibm.com> > Suggested-by: Nathan Lynch <nathanl@linux.ibm.com> > > --- > v1 -> v2: > - Use usleep_range instead of using RTAS sleep routine as > suggested by Nathan > --- > arch/powerpc/platforms/pseries/vas.c | 24 +++++++++++++++++++++++- > 1 file changed, 23 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c > index 71d52a670d95..bade4402741f 100644 > --- a/arch/powerpc/platforms/pseries/vas.c > +++ b/arch/powerpc/platforms/pseries/vas.c > @@ -36,9 +36,31 @@ static bool migration_in_progress; > > static long hcall_return_busy_check(long rc) > { > + unsigned int ms; This should move down into the H_IS_LONG_BUSY() block if it's not used outside of it. > + > /* Check if we are stalled for some time */ > if (H_IS_LONG_BUSY(rc)) { > - msleep(get_longbusy_msecs(rc)); > + ms = get_longbusy_msecs(rc); > + /* > + * Allocate, Modify and Deallocate HCALLs returns > + * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC > + * for the long delay. So the delay should always be 1 > + * or 10msecs, but sleeps 1msec in case if the long > + * delay is > H_LONG_BUSY_ORDER_10_MSEC. > + */ > + if (ms > 10) > + ms = 1; It's strange to coerce ms to 1 when it's greater than 10. Just clamp it to 10, e.g. ms = clamp(get_longbusy_msecs(rc), 1, 10); > + > + /* > + * msleep() will often sleep at least 20 msecs even > + * though the hypervisor expects to reissue these > + * HCALLs after 1 or 10msecs. So use usleep_range() > + * to sleep with the expected value. > + * > + * See Documentation/timers/timers-howto.rst on using > + * the value range in usleep_range(). > + */ > + usleep_range(ms * 100, ms * 1000); If there's going to be commentary here I think it should just explain why potentially sleeping for less than the suggested time is OK. There is wording you can crib in rtas_busy_delay(). > rc = H_BUSY; > } else if (rc == H_BUSY) { > cond_resched(); > -- > 2.26.3
Haren Myneni <haren@linux.ibm.com> writes: > VAS allocate, modify and deallocate HCALLs returns > H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy > delay and expects OS to reissue HCALL after that delay. But using > msleep() will often sleep at least 20 msecs even though the > hypervisor expects to reissue these HCALLs after 1 or 10msecs. > It might cause these HCALLs takes longer when multiple threads > issue open or close VAS windows simultaneously. > > So instead of msleep(), use usleep_range() to ensure sleep with > the expected value before issuing HCALL again. > > Signed-off-by: Haren Myneni <haren@linux.ibm.com> > Suggested-by: Nathan Lynch <nathanl@linux.ibm.com> > > --- > v1 -> v2: > - Use usleep_range instead of using RTAS sleep routine as > suggested by Nathan > --- > arch/powerpc/platforms/pseries/vas.c | 24 +++++++++++++++++++++++- > 1 file changed, 23 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c > index 71d52a670d95..bade4402741f 100644 > --- a/arch/powerpc/platforms/pseries/vas.c > +++ b/arch/powerpc/platforms/pseries/vas.c > @@ -36,9 +36,31 @@ static bool migration_in_progress; > > static long hcall_return_busy_check(long rc) > { > + unsigned int ms; > + > /* Check if we are stalled for some time */ > if (H_IS_LONG_BUSY(rc)) { > - msleep(get_longbusy_msecs(rc)); > + ms = get_longbusy_msecs(rc); > + /* > + * Allocate, Modify and Deallocate HCALLs returns > + * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC > + * for the long delay. So the delay should always be 1 > + * or 10msecs, but sleeps 1msec in case if the long > + * delay is > H_LONG_BUSY_ORDER_10_MSEC. > + */ > + if (ms > 10) > + ms = 1; I don't understand this. The hypervisor asked you to sleep for more than 10 milliseconds, so instead you sleep for 1? I can understand that we don't want to usleep() for the longer durations that could be returned, but so shouldn't the code be using msleep() for those values? Sleeping for a very short duration definitely seems wrong. > + /* > + * msleep() will often sleep at least 20 msecs even > + * though the hypervisor expects to reissue these That makes it sound like the hypervisor is reissuing the hcalls. Better would be "the hypervisor suggests the kernel should reissue the hcall after ...". > + * HCALLs after 1 or 10msecs. So use usleep_range() > + * to sleep with the expected value. > + * > + * See Documentation/timers/timers-howto.rst on using > + * the value range in usleep_range(). > + */ > + usleep_range(ms * 100, ms * 1000); If ms == 1, then that's 100 usecs, which is not 1 millisecond? Please use USEC_PER_MSEC. > rc = H_BUSY; > } else if (rc == H_BUSY) { > cond_resched(); cheers
On 11/29/23 5:43 PM, Nathan Lynch wrote: > Haren Myneni <haren@linux.ibm.com> writes: >> VAS allocate, modify and deallocate HCALLs returns >> H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy >> delay and expects OS to reissue HCALL after that delay. But using >> msleep() will often sleep at least 20 msecs even though the >> hypervisor expects to reissue these HCALLs after 1 or 10msecs. > > I would word this as "the architecture suggests that the OS reissue > these [...]" instead of framing it as something the platform "expects". > >> It might cause these HCALLs takes longer when multiple threads >> issue open or close VAS windows simultaneously. > > This is imprecise. Over-sleeping by the OS doesn't cause individual > hcalls to take longer. It is more accurate to say that the higher-level > operation (allocate, modify, free) may take longer than necessary in > cases where the OS must retry the hcalls involved. Correct, takes longer with multiple threads opening/closing windows. I will make it clear. > >> So instead of msleep(), use usleep_range() to ensure sleep with >> the expected value before issuing HCALL again. >> >> Signed-off-by: Haren Myneni <haren@linux.ibm.com> >> Suggested-by: Nathan Lynch <nathanl@linux.ibm.com> >> >> --- >> v1 -> v2: >> - Use usleep_range instead of using RTAS sleep routine as >> suggested by Nathan >> --- >> arch/powerpc/platforms/pseries/vas.c | 24 +++++++++++++++++++++++- >> 1 file changed, 23 insertions(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c >> index 71d52a670d95..bade4402741f 100644 >> --- a/arch/powerpc/platforms/pseries/vas.c >> +++ b/arch/powerpc/platforms/pseries/vas.c >> @@ -36,9 +36,31 @@ static bool migration_in_progress; >> >> static long hcall_return_busy_check(long rc) >> { >> + unsigned int ms; > > This should move down into the H_IS_LONG_BUSY() block if it's not used > outside of it. > >> + >> /* Check if we are stalled for some time */ >> if (H_IS_LONG_BUSY(rc)) { >> - msleep(get_longbusy_msecs(rc)); >> + ms = get_longbusy_msecs(rc); >> + /* >> + * Allocate, Modify and Deallocate HCALLs returns >> + * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC >> + * for the long delay. So the delay should always be 1 >> + * or 10msecs, but sleeps 1msec in case if the long >> + * delay is > H_LONG_BUSY_ORDER_10_MSEC. >> + */ >> + if (ms > 10) >> + ms = 1; > > It's strange to coerce ms to 1 when it's greater than 10. Just clamp it > to 10, e.g. > > ms = clamp(get_longbusy_msecs(rc), 1, 10); Sure, these HCALLs should not return > H_LONG_BUSY_ORDER_10_MSEC. > >> + >> + /* >> + * msleep() will often sleep at least 20 msecs even >> + * though the hypervisor expects to reissue these >> + * HCALLs after 1 or 10msecs. So use usleep_range() >> + * to sleep with the expected value. >> + * >> + * See Documentation/timers/timers-howto.rst on using >> + * the value range in usleep_range(). >> + */ >> + usleep_range(ms * 100, ms * 1000); > > If there's going to be commentary here I think it should just explain > why potentially sleeping for less than the suggested time is OK. There > is wording you can crib in rtas_busy_delay(). > > >> rc = H_BUSY; >> } else if (rc == H_BUSY) { >> cond_resched(); >> -- >> 2.26.3
On 11/29/23 6:07 PM, Michael Ellerman wrote: > Haren Myneni <haren@linux.ibm.com> writes: >> VAS allocate, modify and deallocate HCALLs returns >> H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy >> delay and expects OS to reissue HCALL after that delay. But using >> msleep() will often sleep at least 20 msecs even though the >> hypervisor expects to reissue these HCALLs after 1 or 10msecs. >> It might cause these HCALLs takes longer when multiple threads >> issue open or close VAS windows simultaneously. >> >> So instead of msleep(), use usleep_range() to ensure sleep with >> the expected value before issuing HCALL again. >> >> Signed-off-by: Haren Myneni <haren@linux.ibm.com> >> Suggested-by: Nathan Lynch <nathanl@linux.ibm.com> >> >> --- >> v1 -> v2: >> - Use usleep_range instead of using RTAS sleep routine as >> suggested by Nathan >> --- >> arch/powerpc/platforms/pseries/vas.c | 24 +++++++++++++++++++++++- >> 1 file changed, 23 insertions(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c >> index 71d52a670d95..bade4402741f 100644 >> --- a/arch/powerpc/platforms/pseries/vas.c >> +++ b/arch/powerpc/platforms/pseries/vas.c >> @@ -36,9 +36,31 @@ static bool migration_in_progress; >> >> static long hcall_return_busy_check(long rc) >> { >> + unsigned int ms; >> + >> /* Check if we are stalled for some time */ >> if (H_IS_LONG_BUSY(rc)) { >> - msleep(get_longbusy_msecs(rc)); >> + ms = get_longbusy_msecs(rc); >> + /* >> + * Allocate, Modify and Deallocate HCALLs returns >> + * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC >> + * for the long delay. So the delay should always be 1 >> + * or 10msecs, but sleeps 1msec in case if the long >> + * delay is > H_LONG_BUSY_ORDER_10_MSEC. >> + */ >> + if (ms > 10) >> + ms = 1; > > I don't understand this. The hypervisor asked you to sleep for more than > 10 milliseconds, so instead you sleep for 1? > > I can understand that we don't want to usleep() for the longer durations > that could be returned, but so shouldn't the code be using msleep() for > those values? > > Sleeping for a very short duration definitely seems wrong. Allocate, modify and deallocate HCALLs return only 1MSECS and 10MSECS for long delay. we should not expect > 10MSECS for these HCALLs. Hence ms = 1 if ms > 10 But it is confusing. So will use ms = 10 for ms >= 10 as Nathan suggested. > > >> + /* >> + * msleep() will often sleep at least 20 msecs even >> + * though the hypervisor expects to reissue these > > That makes it sound like the hypervisor is reissuing the hcalls. > > Better would be "the hypervisor suggests the kernel should reissue the > hcall after ...". > >> + * HCALLs after 1 or 10msecs. So use usleep_range() >> + * to sleep with the expected value. >> + * >> + * See Documentation/timers/timers-howto.rst on using >> + * the value range in usleep_range(). >> + */ >> + usleep_range(ms * 100, ms * 1000); > > If ms == 1, then that's 100 usecs, which is not 1 millisecond? > > Please use USEC_PER_MSEC. Using usleep_range() same way as mentioned in rtas_busy_delay(). Thanks Haren > >> rc = H_BUSY; >> } else if (rc == H_BUSY) { >> cond_resched(); > > cheers >
diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c index 71d52a670d95..bade4402741f 100644 --- a/arch/powerpc/platforms/pseries/vas.c +++ b/arch/powerpc/platforms/pseries/vas.c @@ -36,9 +36,31 @@ static bool migration_in_progress; static long hcall_return_busy_check(long rc) { + unsigned int ms; + /* Check if we are stalled for some time */ if (H_IS_LONG_BUSY(rc)) { - msleep(get_longbusy_msecs(rc)); + ms = get_longbusy_msecs(rc); + /* + * Allocate, Modify and Deallocate HCALLs returns + * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC + * for the long delay. So the delay should always be 1 + * or 10msecs, but sleeps 1msec in case if the long + * delay is > H_LONG_BUSY_ORDER_10_MSEC. + */ + if (ms > 10) + ms = 1; + + /* + * msleep() will often sleep at least 20 msecs even + * though the hypervisor expects to reissue these + * HCALLs after 1 or 10msecs. So use usleep_range() + * to sleep with the expected value. + * + * See Documentation/timers/timers-howto.rst on using + * the value range in usleep_range(). + */ + usleep_range(ms * 100, ms * 1000); rc = H_BUSY; } else if (rc == H_BUSY) { cond_resched();
VAS allocate, modify and deallocate HCALLs returns H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy delay and expects OS to reissue HCALL after that delay. But using msleep() will often sleep at least 20 msecs even though the hypervisor expects to reissue these HCALLs after 1 or 10msecs. It might cause these HCALLs takes longer when multiple threads issue open or close VAS windows simultaneously. So instead of msleep(), use usleep_range() to ensure sleep with the expected value before issuing HCALL again. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Suggested-by: Nathan Lynch <nathanl@linux.ibm.com> --- v1 -> v2: - Use usleep_range instead of using RTAS sleep routine as suggested by Nathan --- arch/powerpc/platforms/pseries/vas.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-)