[RFC,v2,00/14] New TM Model

Message ID 1541508028-31865-1-git-send-email-leitao@debian.org (mailing list archive)

Breno Leitao Nov. 6, 2018, 12:40 p.m. UTC
This patchset for the hardware transactional memory (TM) subsystem
aims to avoid spending a lot of time in TM suspended mode in kernel
space. It does so by changing where the reclaim/recheckpoint is
executed.

The hardware is designed so that once a CPU enters transactional state, it
uses a footprint area to track the loads/stores performed in the
transaction, so that it can later be verified whether a conflict happened
due to a change made by another thread. If a transaction is active in
userspace and an exception takes the CPU to kernel space, the CPU moves
the transaction to suspended state but does not discard the registers
(GPR, VEC, VSX, FP) from the footprint area, although the memory footprint
might be discarded.

POWER9 has a known problem [1, 2]: it does not have enough room in the
footprint area for several transactions to be suspended at the same time
on various CPUs, which leads to CPU stalls.

This new model, together with a future 'fake userspace suspended'
implementation, may work around the POWER9 hardware issue.

This patchset aims to reclaim the checkpointed registers as soon as the
kernel is invoked, at the beginning of the exception handlers. The CPU thus
stays in suspended mode only for a short period, which frees room for other
CPUs and avoids having so many CPUs in suspended state that they stall.
The same approach applies on kernel exit: the recheckpoint (which reloads
the checkpointed registers into the CPU's checkpoint area) is done as late
as possible in the exception return path.

This is achieved by a macro (TM_KERNEL_ENTRY) that checks, just after
entering kernel space, whether userspace was in an active transaction, and
reclaims the transaction if so. All exception handlers call this macro as
soon as possible.

All exceptions should reclaim (if necessary) at this stage and only
recheckpoint if the task is tagged as TIF_RESTORE_TM (i.e. was in
transactional state before being interrupted), which will be done at
restore_tm_state().
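
The entry/exit flow described above can be sketched as a small userspace C
model. This is only an illustration under stated assumptions: struct
sim_task, sim_kernel_entry(), sim_kernel_exit() and sim_entry_exit_demo()
are hypothetical names, not code from the series. Only the MSR TS bit
positions follow arch/powerpc/include/asm/reg.h, and the restore_tm boolean
stands in for TIF_RESTORE_TM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSR transaction-state bits (positions as in asm/reg.h on powerpc). */
#define SIM_MSR_TS_S    (1ULL << 33)   /* transaction suspended */
#define SIM_MSR_TS_T    (1ULL << 34)   /* transactional */
#define SIM_MSR_TS_MASK (SIM_MSR_TS_S | SIM_MSR_TS_T)

/* Hypothetical per-task model; not the kernel's data structures. */
struct sim_task {
	uint64_t user_msr;      /* MSR userspace had when the exception hit */
	bool checkpoint_in_hw;  /* checkpointed registers held by the CPU */
	bool restore_tm;        /* stands in for TIF_RESTORE_TM */
};

/* Models TM_KERNEL_ENTRY: reclaim at once if userspace was in a transaction. */
static void sim_kernel_entry(struct sim_task *t)
{
	if ((t->user_msr & SIM_MSR_TS_MASK) == 0)
		return;                  /* no transaction, nothing to reclaim */

	t->checkpoint_in_hw = false; /* "treclaim": CPU checkpoint area freed */
	t->restore_tm = true;        /* remember to recheckpoint on exit */
}

/* Models restore_tm_state(): recheckpoint only for tagged tasks. */
static void sim_kernel_exit(struct sim_task *t)
{
	if (!t->restore_tm)
		return;

	assert(!t->checkpoint_in_hw); /* must have been reclaimed first */
	t->checkpoint_in_hw = true;   /* "trecheckpoint": reload into the CPU */
	t->restore_tm = false;
}

/* Usage: a transactional task goes through one kernel entry and exit. */
static void sim_entry_exit_demo(void)
{
	struct sim_task t = { .user_msr = SIM_MSR_TS_T, .checkpoint_in_hw = true };

	sim_kernel_entry(&t);
	assert(!t.checkpoint_in_hw && t.restore_tm);
	sim_kernel_exit(&t);
	assert(t.checkpoint_in_hw && !t.restore_tm);
}
```

In this model the checkpoint area is occupied only outside the window
between sim_kernel_entry() and sim_kernel_exit(), which is the property the
series is after.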

Hence, by allowing the CPU to be in suspended state for only a brief period
it's possible to create the initial infrastructure that copes with the TM
hardware limitations.

This patchset was tested in different scenarios using different test
suites, such as the kernel selftests, OpenJDK TM tests, and htm-torture [3],
in the following configurations:

 * POWER8/pseries LE and BE
 * POWER8/powernv LE
 * POWER9/pseries LE 
 * POWER8/powernv LE hosting KVM guests running TM tests

This patchset is based on initial work done by Cyril Bur:
    https://patchwork.ozlabs.org/cover/875341/

V1 patchset URL: https://patchwork.ozlabs.org/cover/969173/

Major changes from v1:
 
 * restore_tm_state() is now called later in the kernel exit path, so no
   IRQ can be replayed after it; such a replay would run with TM in
   suspended state. This is mostly described in the 'Recheckpoint at exit
   path' patch.

 * No need to force the TEXASR[FS] bit explicitly. This was required
   because, in a very specific case, the TEXASR SPR was not being restored
   properly while MSR[TM] was set. Fixed in patch 'Do not restore TM
   without SPRs'.

 * All treclaim/trecheckpoint calls have a WARN_ON() that fires if they are
   not made on the kernel entry or exit path. tm_reclaim() is only called
   by TM_KERNEL_ENTRY and tm_recheckpoint() is only called by
   restore_tm_state(). Any other caller causes a warning.
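
The calling rule in the last item can be modeled as follows. This is a
hedged userspace sketch, not the mechanism the patches use: enum sim_path,
sim_warn_on(), sim_tm_reclaim(), sim_tm_recheckpoint() and sim_warn_demo()
are hypothetical names, and the series may detect a wrong call site in a
different way.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical call-site tags; the real patches may check this differently. */
enum sim_path { SIM_PATH_KERNEL_ENTRY, SIM_PATH_KERNEL_EXIT, SIM_PATH_OTHER };

static unsigned int sim_warn_count; /* number of simulated WARN_ON() hits */

/* Like WARN_ON(): record the violation, return the condition, keep going. */
static bool sim_warn_on(bool cond)
{
	if (cond)
		sim_warn_count++;
	return cond;
}

/* Reclaim is only expected on the kernel entry path. */
static void sim_tm_reclaim(enum sim_path path)
{
	sim_warn_on(path != SIM_PATH_KERNEL_ENTRY);
	/* the actual treclaim work would follow here */
}

/* Recheckpoint is only expected on the kernel exit path. */
static void sim_tm_recheckpoint(enum sim_path path)
{
	sim_warn_on(path != SIM_PATH_KERNEL_EXIT);
	/* the actual trecheckpoint work would follow here */
}

/* Usage: the two legitimate calls are silent, a stray one warns. */
static void sim_warn_demo(void)
{
	sim_warn_count = 0;
	sim_tm_reclaim(SIM_PATH_KERNEL_ENTRY);
	sim_tm_recheckpoint(SIM_PATH_KERNEL_EXIT);
	assert(sim_warn_count == 0);

	sim_tm_reclaim(SIM_PATH_OTHER); /* e.g. a leftover caller */
	assert(sim_warn_count == 1);
}
```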
 
Regards,
Breno

[1] Documentation/powerpc/transactional_memory.txt
[2] commit 4bb3c7a0208fc13ca70598efd109901a7cd45ae7
[3] https://github.com/leitao/htm_torture/

Breno Leitao (14):
  powerpc/tm: Reclaim transaction on kernel entry
  powerpc/tm: Reclaim on unavailable exception
  powerpc/tm: Recheckpoint when exiting from kernel
  powerpc/tm: Always set TIF_RESTORE_TM on reclaim
  powerpc/tm: Refactor the __switch_to_tm code
  powerpc/tm: Do not recheckpoint at sigreturn
  powerpc/tm: Do not reclaim on ptrace
  powerpc/tm: Recheckpoint at exit path
  powerpc/tm: Warn if state is transactional
  powerpc/tm: Improve TM debug information
  powerpc/tm: Save MSR to PACA before RFID
  powerpc/tm: Restore transactional SPRs
  powerpc/tm: Do not restore TM without SPRs
  selftests/powerpc: Adapt tm-syscall test to no suspend

 arch/powerpc/include/asm/exception-64s.h      |  50 ++++
 arch/powerpc/include/asm/thread_info.h        |   2 +-
 arch/powerpc/kernel/asm-offsets.c             |   4 +
 arch/powerpc/kernel/entry_64.S                |  37 ++-
 arch/powerpc/kernel/exceptions-64s.S          |  15 +-
 arch/powerpc/kernel/process.c                 | 242 ++++++++++--------
 arch/powerpc/kernel/ptrace.c                  |  16 +-
 arch/powerpc/kernel/signal.c                  |   2 +-
 arch/powerpc/kernel/signal_32.c               |  38 +--
 arch/powerpc/kernel/signal_64.c               |  42 ++-
 arch/powerpc/kernel/tm.S                      |  19 +-
 arch/powerpc/kernel/traps.c                   |  22 +-
 .../testing/selftests/powerpc/tm/tm-syscall.c |   6 -
 13 files changed, 293 insertions(+), 202 deletions(-)

Comments

Florian Weimer Nov. 6, 2018, 6:32 p.m. UTC | #1
* Breno Leitao:

> This  patchset for the hardware transactional memory (TM) subsystem
> aims to avoid spending a lot of time on TM suspended mode in kernel
> space.  It basically changes where the reclaim/recheckpoint will be
> executed.

I assumed that we want to abort on every system call these days?

We have this commit in glibc:

commit f0458cf4f9ff3d870c43b624e6dccaaf657d5e83
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Mon Aug 27 09:42:50 2018 -0300

    powerpc: Only enable TLE with PPC_FEATURE2_HTM_NOSC
    
    Linux from 3.9 through 4.2 does not abort HTM transaction on syscalls,
    instead it suspend and resume it when leaving the kernel.  The
    side-effects of the syscall will always remain visible, even if the
    transaction is aborted.  This is an issue when transaction is used along
    with futex syscall, on pthread_cond_wait for instance, where the futex
    call might succeed but the transaction is rolled back leading the
    pthread_cond object in an inconsistent state.
    
    Glibc used to prevent it by always aborting a transaction before issuing
    a syscall.  Linux 4.2 also decided to abort active transaction in
    syscalls which makes the glibc workaround superfluous.  Worse, glibc
    transaction abortion leads to a performance issue on recent kernels
    where the HTM state is saved/restore lazily (v4.9).  By aborting a
    transaction on every syscalls, regardless whether a transaction has being
    initiated before, GLIBS makes the kernel always save/restore HTM state
    (it can not even lazily disable it after a certain number of syscall
    iterations).
    
    Because of this shortcoming, Transactional Lock Elision is just enabled
    when it has been explicitly set (either by tunables of by a configure
    switch) and if kernel aborts HTM transactions on syscalls
    (PPC_FEATURE2_HTM_NOSC).  It is reported that using simple benchmark [1],
    the context-switch is about 5% faster by not issuing a tabort in every
    syscall in newer kernels.

I wonder how the new TM model interacts with the assumption we currently
have in glibc.

Thanks,
Florian
Breno Leitao Nov. 6, 2018, 7:31 p.m. UTC | #2
Hi Florian,

On 11/06/2018 04:32 PM, Florian Weimer wrote:
> * Breno Leitao:
> 
>> This  patchset for the hardware transactional memory (TM) subsystem
>> aims to avoid spending a lot of time on TM suspended mode in kernel
>> space.  It basically changes where the reclaim/recheckpoint will be
>> executed.
> 
> I assumed that we want to abort on every system call these days?
> 
> We have this commit in glibc:
> 
> commit f0458cf4f9ff3d870c43b624e6dccaaf657d5e83
> Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> Date:   Mon Aug 27 09:42:50 2018 -0300
> 
>     powerpc: Only enable TLE with PPC_FEATURE2_HTM_NOSC
>     
>     Linux from 3.9 through 4.2 does not abort HTM transaction on syscalls,
>     instead it suspend and resume it when leaving the kernel.  The
>     side-effects of the syscall will always remain visible, even if the
>     transaction is aborted.  This is an issue when transaction is used along
>     with futex syscall, on pthread_cond_wait for instance, where the futex
>     call might succeed but the transaction is rolled back leading the
>     pthread_cond object in an inconsistent state.
>     
>     Glibc used to prevent it by always aborting a transaction before issuing
>     a syscall.  Linux 4.2 also decided to abort active transaction in
>     syscalls which makes the glibc workaround superfluous.  Worse, glibc
>     transaction abortion leads to a performance issue on recent kernels
>     where the HTM state is saved/restore lazily (v4.9).  By aborting a
>     transaction on every syscalls, regardless whether a transaction has being
>     initiated before, GLIBS makes the kernel always save/restore HTM state
>     (it can not even lazily disable it after a certain number of syscall
>     iterations).
>     
>     Because of this shortcoming, Transactional Lock Elision is just enabled
>     when it has been explicitly set (either by tunables of by a configure
>     switch) and if kernel aborts HTM transactions on syscalls
>     (PPC_FEATURE2_HTM_NOSC).  It is reported that using simple benchmark [1],
>     the context-switch is about 5% faster by not issuing a tabort in every
>     syscall in newer kernels.
> 
> I wonder how the new TM model interacts with the assumption we currently
> have in glibc.

This new TM model is almost transparent to userspace. My patchset basically
affects where recheckpoint and reclaim happen inside kernel space, and
should not change userspace behavior.

I say "almost transparent" because it might cause some very specific
transactions to have a higher doom rate; see patch 14/14 for more detailed
information, and also for a reference to GLIBC's "tabort prior to system
calls" behavior.

Regarding Adhemerval's patch, it is unaffected by this new model. Prior to
kernel 4.2, the kernel executed a syscall regardless of the TM state,
which caused undesired side effects, hence GLIBC's decision to abort a
transaction prior to calling a syscall.

Later, the kernel system call mechanism became aware of the TM state, and
this GLIBC workaround was no longer necessary.
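
For reference, the PPC_FEATURE2_HTM_NOSC gate from the commit quoted above
could look roughly like the sketch below. It is an assumption-laden
illustration, not glibc's actual code: sketch_tle_usable() and
sketch_tle_demo() are hypothetical names, and requiring PPC_FEATURE2_HTM in
addition to PPC_FEATURE2_HTM_NOSC is my assumption. The two bit values are
the AT_HWCAP2 definitions from the powerpc uapi header <asm/cputable.h>.

```c
#include <assert.h>
#include <stdbool.h>

/* AT_HWCAP2 bits as defined for powerpc in <asm/cputable.h>. */
#ifndef PPC_FEATURE2_HTM
#define PPC_FEATURE2_HTM      0x40000000UL /* hardware TM available */
#endif
#ifndef PPC_FEATURE2_HTM_NOSC
#define PPC_FEATURE2_HTM_NOSC 0x01000000UL /* kernel aborts TM on syscalls */
#endif

/*
 * Hypothetical helper: lock elision is usable only when it was explicitly
 * requested and the kernel advertises that it aborts transactions on
 * syscalls, so userspace needs no tabort before each syscall.
 */
static bool sketch_tle_usable(unsigned long hwcap2, bool tle_requested)
{
	const unsigned long need = PPC_FEATURE2_HTM | PPC_FEATURE2_HTM_NOSC;

	return tle_requested && (hwcap2 & need) == need;
}

/* Usage: only the requested + HTM + HTM_NOSC combination enables TLE. */
static void sketch_tle_demo(void)
{
	assert(sketch_tle_usable(PPC_FEATURE2_HTM | PPC_FEATURE2_HTM_NOSC, true));
	assert(!sketch_tle_usable(PPC_FEATURE2_HTM, true));
	assert(!sketch_tle_usable(PPC_FEATURE2_HTM | PPC_FEATURE2_HTM_NOSC, false));
}
```

On a powerpc Linux system the hwcap2 argument would come from
getauxval(AT_HWCAP2), declared in <sys/auxv.h>.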

More than that, this workaround started to cause performance degradation on
context switches, mainly once the TM facility became lazily enabled, i.e.,
enabled on demand (when a task uses TM explicitly). This happens because
the "abort prior to every system call" workaround caused the TM facility
to be enabled for every task that makes system calls.

In fact, I was the one who identified this performance degradation issue
and reported it to Adhemerval, who kindly fixed it with
f0458cf4f9ff3d870c43b624e6dccaaf657d5e83.

Anyway, I think we are safe here.

Thanks for bringing this up.
Breno
Michael Neuling Nov. 7, 2018, 12:39 a.m. UTC | #3
> In fact, I was the one that identified this performance degradation issue,
> and reported to Adhemerval who kindly fixed it with
> f0458cf4f9ff3d870c43b624e6dccaaf657d5e83.
> 
> Anyway, I think we are safe here.

FWIW Agreed. PPC_FEATURE2_HTM_NOSC should be preserved by this series.

Mikey