[v6] powerpc/pseries/vas: Use usleep_range() to support HCALL delay

Message ID 20240116055910.421605-1-haren@linux.ibm.com (mailing list archive)
State Accepted
Commit 43ac9f5cd457bb01930f87448ddaaae455f8a8cf
Series [v6] powerpc/pseries/vas: Use usleep_range() to support HCALL delay

Checks

Context                                     Check    Description
snowpatch_ozlabs/github-powerpc_ppctests    success  Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_selftests   success  Successfully ran 8 jobs.

Commit Message

Haren Myneni Jan. 16, 2024, 5:59 a.m. UTC
VAS allocate, modify and deallocate HCALLs return
H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC as the busy
delay and expect the OS to reissue the HCALL after that delay. But
msleep() will often sleep at least 20 msecs even though the
hypervisor suggests that the OS reissue these HCALLs after 1 or
10 msecs.

The open and close VAS window functions hold a mutex and then issue
these HCALLs. So these operations can take longer than necessary
when multiple threads call the open or close window APIs
simultaneously. This can especially affect performance when
open/close is repeated for each compression request.

Multiple tasks can open / close VAS windows at the same time,
depending on the available VAS credits. For example, a 240-core
system provides 4800 VAS credits, which means 4800 tasks can
issue open VAS window HCALLs under the mutex. Since each
msleep() will often sleep more than 20 msecs, some tasks wait
more than 120 secs to acquire the mutex. This can cause hung task
traces in dmesg due to mutex contention around the open/close
HCALLs.

Instead of msleep(), use usleep_range() so that the sleep stays
close to the expected value before the HCALL is issued again. Since
each task now sleeps 10 msecs at most, this patch allows more tasks
to issue open/close VAS calls without any hung task traces in
dmesg.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>

---
v1 -> v2:
- Use usleep_range instead of using RTAS sleep routine as
  suggested by Nathan
v2 -> v3:
- Sleep 10 msecs even for HCALL delay > 10 msecs, plus other
  commit / comment changes as suggested by Nathan and Ellerman.
v3 -> v4:
- More description in the commit log with the visible impact for
  the current code as suggested by Aneesh
v4 -> v5:
- Use USEC_PER_MSEC macro in usleep_range as suggested by Aneesh
v5 -> v6:
- Use USEC_PER_MSEC macro to calculate all ranges in usleep_range()
  and more description in the commit log.
---
 arch/powerpc/platforms/pseries/vas.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

Comments

Nathan Lynch Jan. 18, 2024, 4:54 p.m. UTC | #1
Haren Myneni <haren@linux.ibm.com> writes:
> [...]

Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>

IMO this can be converted to a more generic helper in the future, should
one emerge.

Michael Ellerman June 20, 2024, 12:49 p.m. UTC | #2
On Mon, 15 Jan 2024 21:59:10 -0800, Haren Myneni wrote:
> VAS allocate, modify and deallocate HCALLs return
> H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC as the busy
> delay and expect the OS to reissue the HCALL after that delay. But
> msleep() will often sleep at least 20 msecs even though the
> hypervisor suggests that the OS reissue these HCALLs after 1 or
> 10 msecs.
> 
> The open and close VAS window functions hold a mutex and then issue
> these HCALLs. So these operations can take longer than necessary
> when multiple threads call the open or close window APIs
> simultaneously. This can especially affect performance when
> open/close is repeated for each compression request.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/pseries/vas: Use usleep_range() to support HCALL delay
      https://git.kernel.org/powerpc/c/43ac9f5cd457bb01930f87448ddaaae455f8a8cf

cheers

Patch

diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
index 71d52a670d95..8e8934564557 100644
--- a/arch/powerpc/platforms/pseries/vas.c
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -38,7 +38,27 @@  static long hcall_return_busy_check(long rc)
 {
 	/* Check if we are stalled for some time */
 	if (H_IS_LONG_BUSY(rc)) {
-		msleep(get_longbusy_msecs(rc));
+		unsigned int ms;
+		/*
+		 * Allocate, Modify and Deallocate HCALLs return
+		 * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC
+		 * for the long delay. So the sleep time should always
+		 * be either 1 or 10 msecs, but if the HCALL ever
+		 * returns a long delay > 10 msecs, clamp the sleep
+		 * time to 10 msecs.
+		 */
+		ms = clamp(get_longbusy_msecs(rc), 1, 10);
+
+		/*
+		 * msleep() will often sleep at least 20 msecs even
+		 * though the hypervisor suggests that the OS reissue
+		 * HCALLs after 1 or 10 msecs. Also, the delay hint from
+		 * the HCALL is just a suggestion, so it is OK to pause
+		 * for less time than the hinted delay. Use usleep_range()
+		 * to ensure we don't sleep much longer than actually
+		 * needed.
+		 */
+		usleep_range(ms * (USEC_PER_MSEC / 10), ms * USEC_PER_MSEC);
 		rc = H_BUSY;
 	} else if (rc == H_BUSY) {
 		cond_resched();