diff mbox series

[v2] powerpc/pseries/vas: Use usleep_range() to support HCALL delay

Message ID 20231129075424.240653-1-haren@linux.ibm.com (mailing list archive)
State Changes Requested
Headers show
Series [v2] powerpc/pseries/vas: Use usleep_range() to support HCALL delay | expand

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 6 jobs.
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_selftests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu success Successfully ran 23 jobs.
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.

Commit Message

Haren Myneni Nov. 29, 2023, 7:54 a.m. UTC
VAS allocate, modify and deallocate HCALLs returns
H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy
delay and expects OS to reissue HCALL after that delay. But using
msleep() will often sleep at least 20 msecs even though the
hypervisor expects to reissue these HCALLs after 1 or 10msecs.
It might cause these HCALLs takes longer when multiple threads
issue open or close VAS windows simultaneously.

So instead of msleep(), use usleep_range() to ensure sleep with
the expected value before issuing HCALL again.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>

---
v1 -> v2:
- Use usleep_range instead of using RTAS sleep routine as
  suggested by Nathan
---
 arch/powerpc/platforms/pseries/vas.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

Comments

Nathan Lynch Nov. 30, 2023, 1:43 a.m. UTC | #1
Haren Myneni <haren@linux.ibm.com> writes:
> VAS allocate, modify and deallocate HCALLs returns
> H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy
> delay and expects OS to reissue HCALL after that delay. But using
> msleep() will often sleep at least 20 msecs even though the
> hypervisor expects to reissue these HCALLs after 1 or 10msecs.

I would word this as "the architecture suggests that the OS reissue
these [...]" instead of framing it as something the platform "expects".

> It might cause these HCALLs takes longer when multiple threads
> issue open or close VAS windows simultaneously.

This is imprecise. Over-sleeping by the OS doesn't cause individual
hcalls to take longer. It is more accurate to say that the higher-level
operation (allocate, modify, free) may take longer than necessary in
cases where the OS must retry the hcalls involved.

> So instead of msleep(), use usleep_range() to ensure sleep with
> the expected value before issuing HCALL again.
>
> Signed-off-by: Haren Myneni <haren@linux.ibm.com>
> Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
>
> ---
> v1 -> v2:
> - Use usleep_range instead of using RTAS sleep routine as
>   suggested by Nathan
> ---
>  arch/powerpc/platforms/pseries/vas.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
> index 71d52a670d95..bade4402741f 100644
> --- a/arch/powerpc/platforms/pseries/vas.c
> +++ b/arch/powerpc/platforms/pseries/vas.c
> @@ -36,9 +36,31 @@ static bool migration_in_progress;
>  
>  static long hcall_return_busy_check(long rc)
>  {
> +	unsigned int ms;

This should move down into the H_IS_LONG_BUSY() block if it's not used
outside of it.

> +
>  	/* Check if we are stalled for some time */
>  	if (H_IS_LONG_BUSY(rc)) {
> -		msleep(get_longbusy_msecs(rc));
> +		ms = get_longbusy_msecs(rc);
> +		/*
> +		 * Allocate, Modify and Deallocate HCALLs returns
> +		 * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC
> +		 * for the long delay. So the delay should always be 1
> +		 * or 10msecs, but sleeps 1msec in case if the long
> +		 * delay is > H_LONG_BUSY_ORDER_10_MSEC.
> +		 */
> +		if (ms > 10)
> +			ms = 1;

It's strange to coerce ms to 1 when it's greater than 10. Just clamp it
to 10, e.g.

                ms = clamp(get_longbusy_msecs(rc), 1, 10);

> +
> +		/*
> +		 * msleep() will often sleep at least 20 msecs even
> +		 * though the hypervisor expects to reissue these
> +		 * HCALLs after 1 or 10msecs. So use usleep_range()
> +		 * to sleep with the expected value.
> +		 *
> +		 * See Documentation/timers/timers-howto.rst on using
> +		 * the value range in usleep_range().
> +		 */
> +		usleep_range(ms * 100, ms * 1000);

If there's going to be commentary here I think it should just explain
why potentially sleeping for less than the suggested time is OK. There
is wording you can crib in rtas_busy_delay().


>  		rc = H_BUSY;
>  	} else if (rc == H_BUSY) {
>  		cond_resched();
> -- 
> 2.26.3
Michael Ellerman Nov. 30, 2023, 2:07 a.m. UTC | #2
Haren Myneni <haren@linux.ibm.com> writes:
> VAS allocate, modify and deallocate HCALLs returns
> H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy
> delay and expects OS to reissue HCALL after that delay. But using
> msleep() will often sleep at least 20 msecs even though the
> hypervisor expects to reissue these HCALLs after 1 or 10msecs.
> It might cause these HCALLs takes longer when multiple threads
> issue open or close VAS windows simultaneously.
>
> So instead of msleep(), use usleep_range() to ensure sleep with
> the expected value before issuing HCALL again.
>
> Signed-off-by: Haren Myneni <haren@linux.ibm.com>
> Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
>
> ---
> v1 -> v2:
> - Use usleep_range instead of using RTAS sleep routine as
>   suggested by Nathan
> ---
>  arch/powerpc/platforms/pseries/vas.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
> index 71d52a670d95..bade4402741f 100644
> --- a/arch/powerpc/platforms/pseries/vas.c
> +++ b/arch/powerpc/platforms/pseries/vas.c
> @@ -36,9 +36,31 @@ static bool migration_in_progress;
>  
>  static long hcall_return_busy_check(long rc)
>  {
> +	unsigned int ms;
> +
>  	/* Check if we are stalled for some time */
>  	if (H_IS_LONG_BUSY(rc)) {
> -		msleep(get_longbusy_msecs(rc));
> +		ms = get_longbusy_msecs(rc);
> +		/*
> +		 * Allocate, Modify and Deallocate HCALLs returns
> +		 * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC
> +		 * for the long delay. So the delay should always be 1
> +		 * or 10msecs, but sleeps 1msec in case if the long
> +		 * delay is > H_LONG_BUSY_ORDER_10_MSEC.
> +		 */
> +		if (ms > 10)
> +			ms = 1;
 
I don't understand this. The hypervisor asked you to sleep for more than
10 milliseconds, so instead you sleep for 1?

I can understand that we don't want to usleep() for the longer durations
that could be returned, but so shouldn't the code be using msleep() for
those values?

Sleeping for a very short duration definitely seems wrong.


> +		/*
> +		 * msleep() will often sleep at least 20 msecs even
> +		 * though the hypervisor expects to reissue these
 
That makes it sound like the hypervisor is reissuing the hcalls.

Better would be "the hypervisor suggests the kernel should reissue the
hcall after ...".

> +		 * HCALLs after 1 or 10msecs. So use usleep_range()
> +		 * to sleep with the expected value.
> +		 *
> +		 * See Documentation/timers/timers-howto.rst on using
> +		 * the value range in usleep_range().
> +		 */
> +		usleep_range(ms * 100, ms * 1000);

If ms == 1, then that's 100 usecs, which is not 1 millisecond?

Please use USEC_PER_MSEC.

>  		rc = H_BUSY;
>  	} else if (rc == H_BUSY) {
>  		cond_resched();

cheers
Haren Myneni Dec. 1, 2023, 3:56 a.m. UTC | #3
On 11/29/23 5:43 PM, Nathan Lynch wrote:
> Haren Myneni <haren@linux.ibm.com> writes:
>> VAS allocate, modify and deallocate HCALLs returns
>> H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy
>> delay and expects OS to reissue HCALL after that delay. But using
>> msleep() will often sleep at least 20 msecs even though the
>> hypervisor expects to reissue these HCALLs after 1 or 10msecs.
> 
> I would word this as "the architecture suggests that the OS reissue
> these [...]" instead of framing it as something the platform "expects".
> 
>> It might cause these HCALLs takes longer when multiple threads
>> issue open or close VAS windows simultaneously.
> 
> This is imprecise. Over-sleeping by the OS doesn't cause individual
> hcalls to take longer. It is more accurate to say that the higher-level
> operation (allocate, modify, free) may take longer than necessary in
> cases where the OS must retry the hcalls involved.

Correct, takes longer with multiple threads opening/closing windows. I 
will make it clear.

> 
>> So instead of msleep(), use usleep_range() to ensure sleep with
>> the expected value before issuing HCALL again.
>>
>> Signed-off-by: Haren Myneni <haren@linux.ibm.com>
>> Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
>>
>> ---
>> v1 -> v2:
>> - Use usleep_range instead of using RTAS sleep routine as
>>    suggested by Nathan
>> ---
>>   arch/powerpc/platforms/pseries/vas.c | 24 +++++++++++++++++++++++-
>>   1 file changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
>> index 71d52a670d95..bade4402741f 100644
>> --- a/arch/powerpc/platforms/pseries/vas.c
>> +++ b/arch/powerpc/platforms/pseries/vas.c
>> @@ -36,9 +36,31 @@ static bool migration_in_progress;
>>   
>>   static long hcall_return_busy_check(long rc)
>>   {
>> +	unsigned int ms;
> 
> This should move down into the H_IS_LONG_BUSY() block if it's not used
> outside of it.
> 
>> +
>>   	/* Check if we are stalled for some time */
>>   	if (H_IS_LONG_BUSY(rc)) {
>> -		msleep(get_longbusy_msecs(rc));
>> +		ms = get_longbusy_msecs(rc);
>> +		/*
>> +		 * Allocate, Modify and Deallocate HCALLs returns
>> +		 * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC
>> +		 * for the long delay. So the delay should always be 1
>> +		 * or 10msecs, but sleeps 1msec in case if the long
>> +		 * delay is > H_LONG_BUSY_ORDER_10_MSEC.
>> +		 */
>> +		if (ms > 10)
>> +			ms = 1;
> 
> It's strange to coerce ms to 1 when it's greater than 10. Just clamp it
> to 10, e.g.
> 
>                  ms = clamp(get_longbusy_msecs(rc), 1, 10);

Sure, these HCALLs should not return > H_LONG_BUSY_ORDER_10_MSEC.

> 
>> +
>> +		/*
>> +		 * msleep() will often sleep at least 20 msecs even
>> +		 * though the hypervisor expects to reissue these
>> +		 * HCALLs after 1 or 10msecs. So use usleep_range()
>> +		 * to sleep with the expected value.
>> +		 *
>> +		 * See Documentation/timers/timers-howto.rst on using
>> +		 * the value range in usleep_range().
>> +		 */
>> +		usleep_range(ms * 100, ms * 1000);
> 
> If there's going to be commentary here I think it should just explain
> why potentially sleeping for less than the suggested time is OK. There
> is wording you can crib in rtas_busy_delay().
> 
> 
>>   		rc = H_BUSY;
>>   	} else if (rc == H_BUSY) {
>>   		cond_resched();
>> -- 
>> 2.26.3
Haren Myneni Dec. 1, 2023, 4:10 a.m. UTC | #4
On 11/29/23 6:07 PM, Michael Ellerman wrote:
> Haren Myneni <haren@linux.ibm.com> writes:
>> VAS allocate, modify and deallocate HCALLs returns
>> H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy
>> delay and expects OS to reissue HCALL after that delay. But using
>> msleep() will often sleep at least 20 msecs even though the
>> hypervisor expects to reissue these HCALLs after 1 or 10msecs.
>> It might cause these HCALLs takes longer when multiple threads
>> issue open or close VAS windows simultaneously.
>>
>> So instead of msleep(), use usleep_range() to ensure sleep with
>> the expected value before issuing HCALL again.
>>
>> Signed-off-by: Haren Myneni <haren@linux.ibm.com>
>> Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
>>
>> ---
>> v1 -> v2:
>> - Use usleep_range instead of using RTAS sleep routine as
>>    suggested by Nathan
>> ---
>>   arch/powerpc/platforms/pseries/vas.c | 24 +++++++++++++++++++++++-
>>   1 file changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
>> index 71d52a670d95..bade4402741f 100644
>> --- a/arch/powerpc/platforms/pseries/vas.c
>> +++ b/arch/powerpc/platforms/pseries/vas.c
>> @@ -36,9 +36,31 @@ static bool migration_in_progress;
>>   
>>   static long hcall_return_busy_check(long rc)
>>   {
>> +	unsigned int ms;
>> +
>>   	/* Check if we are stalled for some time */
>>   	if (H_IS_LONG_BUSY(rc)) {
>> -		msleep(get_longbusy_msecs(rc));
>> +		ms = get_longbusy_msecs(rc);
>> +		/*
>> +		 * Allocate, Modify and Deallocate HCALLs returns
>> +		 * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC
>> +		 * for the long delay. So the delay should always be 1
>> +		 * or 10msecs, but sleeps 1msec in case if the long
>> +		 * delay is > H_LONG_BUSY_ORDER_10_MSEC.
>> +		 */
>> +		if (ms > 10)
>> +			ms = 1;
>   
> I don't understand this. The hypervisor asked you to sleep for more than
> 10 milliseconds, so instead you sleep for 1?
> 
> I can understand that we don't want to usleep() for the longer durations
> that could be returned, but so shouldn't the code be using msleep() for
> those values?
> 
> Sleeping for a very short duration definitely seems wrong.

Allocate, modify and deallocate HCALLs return only 1MSECS and 10MSECS 
for long delay. we should not expect > 10MSECS for these HCALLs. Hence 
ms = 1 if ms > 10

But it is confusing. So will use ms = 10 for ms >= 10 as Nathan suggested.

> 
> 
>> +		/*
>> +		 * msleep() will often sleep at least 20 msecs even
>> +		 * though the hypervisor expects to reissue these
>   
> That makes it sound like the hypervisor is reissuing the hcalls.
> 
> Better would be "the hypervisor suggests the kernel should reissue the
> hcall after ...".
> 
>> +		 * HCALLs after 1 or 10msecs. So use usleep_range()
>> +		 * to sleep with the expected value.
>> +		 *
>> +		 * See Documentation/timers/timers-howto.rst on using
>> +		 * the value range in usleep_range().
>> +		 */
>> +		usleep_range(ms * 100, ms * 1000);
> 
> If ms == 1, then that's 100 usecs, which is not 1 millisecond?
> 
> Please use USEC_PER_MSEC.

Using usleep_range() same way as mentioned in  rtas_busy_delay().


Thanks
Haren

> 
>>   		rc = H_BUSY;
>>   	} else if (rc == H_BUSY) {
>>   		cond_resched();
> 
> cheers
>
diff mbox series

Patch

diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
index 71d52a670d95..bade4402741f 100644
--- a/arch/powerpc/platforms/pseries/vas.c
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -36,9 +36,31 @@  static bool migration_in_progress;
 
 static long hcall_return_busy_check(long rc)
 {
+	unsigned int ms;
+
 	/* Check if we are stalled for some time */
 	if (H_IS_LONG_BUSY(rc)) {
-		msleep(get_longbusy_msecs(rc));
+		ms = get_longbusy_msecs(rc);
+		/*
+		 * Allocate, Modify and Deallocate HCALLs returns
+		 * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC
+		 * for the long delay. So the delay should always be 1
+		 * or 10msecs, but sleeps 1msec in case if the long
+		 * delay is > H_LONG_BUSY_ORDER_10_MSEC.
+		 */
+		if (ms > 10)
+			ms = 1;
+
+		/*
+		 * msleep() will often sleep at least 20 msecs even
+		 * though the hypervisor expects to reissue these
+		 * HCALLs after 1 or 10msecs. So use usleep_range()
+		 * to sleep with the expected value.
+		 *
+		 * See Documentation/timers/timers-howto.rst on using
+		 * the value range in usleep_range().
+		 */
+		usleep_range(ms * 100, ms * 1000);
 		rc = H_BUSY;
 	} else if (rc == H_BUSY) {
 		cond_resched();