um: add RCU syscall hack for time-travel

Message ID	20240830153825.1466691-1-benjamin@sipsolutions.net
State	Changes Requested
Headers	From: Benjamin Berg <benjamin@sipsolutions.net> To: linux-um@lists.infradead.org Cc: Benjamin Berg <benjamin.berg@intel.com> Subject: [PATCH] um: add RCU syscall hack for time-travel Date: Fri, 30 Aug 2024 17:38:25 +0200 Message-ID: <20240830153825.1466691-1-benjamin@sipsolutions.net>
Series	um: add RCU syscall hack for time-travel

Benjamin Berg Aug. 30, 2024, 3:38 p.m. UTC

From: Benjamin Berg <benjamin.berg@intel.com>

In time-travel mode userspace can do a lot of work without any time
passing. Unfortunately, this can result in OOM situations as the RCU
core code will never be run.

Work around that by kicking RCU using rcu_sched_clock_irq(), i.e.
behave towards the RCU code as if a clock tick happened on every syscall.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>

---

This patch is on top of "um: fix time-travel syscall scheduling hack"
---
 arch/um/kernel/skas/syscall.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Richard Weinberger Sept. 12, 2024, 7:02 p.m. UTC | #1

On Fri, Aug 30, 2024 at 5:38 PM Benjamin Berg <benjamin@sipsolutions.net> wrote:
>
> From: Benjamin Berg <benjamin.berg@intel.com>
>
> In time-travel mode userspace can do a lot of work without any time
> passing. Unfortunately, this can result in OOM situations as the RCU
> core code will never be run.
>
> Work around that by kicking the RCU using rcu_sched_clock_irq. So
> behave to the RCU code as if a clock tick happened every syscall.
>
> Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
>
> ---
>
> This patch is on top of "um: fix time-travel syscall scheduling hack"
> ---
>  arch/um/kernel/skas/syscall.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/arch/um/kernel/skas/syscall.c b/arch/um/kernel/skas/syscall.c
> index b09e85279d2b..4b4ab8bf8a0c 100644
> --- a/arch/um/kernel/skas/syscall.c
> +++ b/arch/um/kernel/skas/syscall.c
> @@ -19,6 +19,21 @@ void handle_syscall(struct uml_pt_regs *r)
>         struct pt_regs *regs = container_of(r, struct pt_regs, regs);
>         int syscall;
>
> +       /*
> +        * This is a "bit" of a hack. But in time-travel mode userspace can do
> +        * a lot of work without any time passing. Unfortunately, this can
> +        * result in OOM situations as the RCU core code will never be run.
> +        *
> +        * Work around that by kicking the RCU using rcu_sched_clock_irq. So
> +        * behave to the RCU code as if a clock tick happened every syscall.
> +        */
> +       if (time_travel_mode == TT_MODE_INFCPU ||
> +           time_travel_mode == TT_MODE_EXTERNAL) {
> +               local_irq_disable();
> +               rcu_sched_clock_irq(1);
> +               local_irq_enable();
> +       }
> +

While I acknowledge that time-travel itself is a beautiful hack, I'd
like to keep the hacks to keep it working minimal.
So, the problem here is that RCU callbacks never run and just pile up?

I wonder why such a situation does not happen in a nohz_full setup on
regular systems.

Benjamin Berg Sept. 13, 2024, 10:50 a.m. UTC | #2

Hi,

On Thu, 2024-09-12 at 21:02 +0200, Richard Weinberger wrote:
> On Fri, Aug 30, 2024 at 5:38 PM Benjamin Berg
> <benjamin@sipsolutions.net> wrote:
> > 
> > From: Benjamin Berg <benjamin.berg@intel.com>
> > 
> > In time-travel mode userspace can do a lot of work without any time
> > passing. Unfortunately, this can result in OOM situations as the
> > RCU
> > core code will never be run.
> > 
> > Work around that by kicking the RCU using rcu_sched_clock_irq. So
> > behave to the RCU code as if a clock tick happened every syscall.
> > 
> > Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
> > 
> > [SNIP]
> 
> While I acknowledge that time-travel itself is a beautiful hack, I'd
> like to keep the hacks
> to keep it working minimal.
> So, the problem here is that RCU callbacks never run and just pile up?

Yes. A simple example of this is doing a "find /". This will allocate a
lot of inode information which is only freed at a later point.

> I wonder why such a situation does not happen in a nohz_full setup on
> regular systems.

I had to search for a bit, but I think the boot CPU will still have a
tick even in a NOHZ_FULL setup; see the nohz_full= boot parameter.

It does look like the RCU code might try to force scheduling (tiny RCU)
or wake up a worker (tree RCU) in these situations. But neither of
these attempts is going to fix the situation as there will be no call
to rcu_sched_clock_irq with time-travel.

Benjamin
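
To make the failure mode described above concrete, here is a minimal userspace sketch in C. It is not kernel code and does not use the real RCU API: `toy_rcu`, `toy_defer_free()`, `toy_tick()`, `toy_syscall()` and `toy_run()` are hypothetical names invented for this illustration, and grace periods are ignored entirely. It only models the idea that deferred frees are processed on a tick, so a workload that issues many syscalls while no tick ever arrives builds up an unbounded backlog, whereas pretending that a tick happens on every syscall keeps the backlog at one entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queued RCU callback that frees memory. */
struct deferred_free {
	struct deferred_free *next;
	void *payload;
};

/* Toy model of per-CPU RCU state: just a list of pending callbacks. */
struct toy_rcu {
	struct deferred_free *pending;
	size_t pending_count;
};

/* Queue an allocation for deferred freeing (loosely like call_rcu()). */
int toy_defer_free(struct toy_rcu *rcu, size_t size)
{
	struct deferred_free *cb = malloc(sizeof(*cb));

	if (!cb)
		return -1;
	cb->payload = malloc(size);
	if (!cb->payload) {
		free(cb);
		return -1;
	}
	cb->next = rcu->pending;
	rcu->pending = cb;
	rcu->pending_count++;
	return 0;
}

/* Pretend a clock tick happened: run (free) every pending callback. */
size_t toy_tick(struct toy_rcu *rcu)
{
	size_t freed = 0;

	while (rcu->pending) {
		struct deferred_free *cb = rcu->pending;

		rcu->pending = cb->next;
		free(cb->payload);
		free(cb);
		freed++;
	}
	assert(freed == rcu->pending_count);
	rcu->pending_count = 0;
	return freed;
}

/* One simulated syscall; optionally behave as if a tick happened first. */
int toy_syscall(struct toy_rcu *rcu, int kick_on_syscall)
{
	if (kick_on_syscall)
		toy_tick(rcu);
	return toy_defer_free(rcu, 64);
}

/* Issue n syscalls while no time passes; return the peak backlog seen. */
size_t toy_run(size_t n, int kick_on_syscall)
{
	struct toy_rcu rcu = { NULL, 0 };
	size_t peak = 0;

	for (size_t i = 0; i < n; i++) {
		if (toy_syscall(&rcu, kick_on_syscall) != 0)
			break;
		if (rcu.pending_count > peak)
			peak = rcu.pending_count;
	}
	toy_tick(&rcu);	/* clean up whatever is still queued */
	return peak;
}
```

Calling `toy_run(n, 0)` models time-travel without the hack, where the backlog grows with every syscall; `toy_run(n, 1)` models the behaviour the patch aims for. Whether the real rcu_sched_clock_irq() call achieves this for both tiny RCU and tree RCU is exactly what the thread leaves open.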

Richard Weinberger Sept. 13, 2024, 11:47 a.m. UTC | #3

----- Original Message -----
> From: "Benjamin Berg" <benjamin@sipsolutions.net>
>> While I acknowledge that time-travel itself is a beautiful hack, I'd
>> like to keep the hacks
>> to keep it working minimal.
>> So, the problem here is that RCU callbacks never run and just pile up?
> 
> Yes. A simple example of this is doing a "find /". This will allocate a
> lot of inode information which is only free'ed at a later point.
> 
>> I wonder why such a situation does not happen in a nohz_full setup on
>> regular systems.
> 
> Had to search for a bit. But, I think the boot CPU will still have a
> tick even on a NOHZ_FULL setup. see the nohz_full= boot parameter.
> 
> It does look like the RCU code might try to force scheduling (tiny RCU)
> or wake up a worker (tree RCU) in these situations. But neither of
> these attempts is going to fix the situation as there will be no call
> to rcu_sched_clock_irq with time-travel.

Agreed. I think having a housekeeping CPU (thread) will not work in
time-travel mode.
Kicking RCU whenever a syscall is executed is okay. The question is:
are there other scenarios where RCU work can pile up while no syscall is
run for a long time? Maybe we need to kick it in other places too
(the page fault handler?).

Thanks,
//richard
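
The concern about other entry points can be illustrated with a second small userspace sketch in C. Everything here is hypothetical and invented for this note (`toy_event`, `TOY_KICK()`, `toy_peak_backlog()`, `toy_uniform_peak()`); it does not reflect how the kernel tracks RCU callbacks. Each event queues one callback, and only events whose bit is set in `kick_mask` drain the backlog first, so a workload made up of events that are not kick points still piles up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel entry points that may generate RCU work. */
enum toy_event { TOY_SYSCALL, TOY_PAGE_FAULT, TOY_NET_RX, TOY_EVENT_COUNT };

/* Bit in the mask of entry points at which the toy model "kicks" RCU. */
#define TOY_KICK(ev) (1u << (ev))

/*
 * Replay a workload during which no clock tick ever happens. Every event
 * queues one callback; an event listed in kick_mask drains the backlog
 * before queueing. Returns the peak number of queued callbacks.
 */
size_t toy_peak_backlog(const enum toy_event *events, size_t n,
			unsigned int kick_mask)
{
	size_t backlog = 0, peak = 0;

	for (size_t i = 0; i < n; i++) {
		assert(events[i] < TOY_EVENT_COUNT);
		if (kick_mask & TOY_KICK(events[i]))
			backlog = 0;
		backlog++;
		if (backlog > peak)
			peak = backlog;
	}
	return peak;
}

/* Convenience wrapper: n identical events (n is capped at 1024). */
size_t toy_uniform_peak(enum toy_event ev, size_t n, unsigned int kick_mask)
{
	enum toy_event events[1024];

	if (n > 1024)
		n = 1024;
	for (size_t i = 0; i < n; i++)
		events[i] = ev;
	return toy_peak_backlog(events, n, kick_mask);
}
```

With `kick_mask = TOY_KICK(TOY_SYSCALL)`, a run of `TOY_NET_RX` or `TOY_PAGE_FAULT` events accumulates one callback per event, while adding the corresponding bit to the mask bounds the backlog again. This is only a model of the question raised above, not a claim about which kernel paths actually need a kick.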

Benjamin Berg Sept. 13, 2024, 12:04 p.m. UTC | #4

Hi

First, it doesn't seem like my patch actually works, so please do not
merge it. It actually appears that tree RCU and tiny RCU (which are
selected depending on the preemption setting) are behaving differently.

So now I am wondering if I can come up with a hack that works for both.

On Fri, 2024-09-13 at 13:47 +0200, Richard Weinberger wrote:
> ----- Original Message -----
> > From: "Benjamin Berg" <benjamin@sipsolutions.net>
> > > While I acknowledge that time-travel itself is a beautiful hack, I'd
> > > like to keep the hacks
> > > to keep it working minimal.
> > > So, the problem here is that RCU callbacks never run and just pile up?
> > 
> > Yes. A simple example of this is doing a "find /". This will allocate a
> > lot of inode information which is only free'ed at a later point.
> > 
> > > I wonder why such a situation does not happen in a nohz_full setup on
> > > regular systems.
> > 
> > Had to search for a bit. But, I think the boot CPU will still have a
> > tick even on a NOHZ_FULL setup. see the nohz_full= boot parameter.
> > 
> > It does look like the RCU code might try to force scheduling (tiny RCU)
> > or wake up a worker (tree RCU) in these situations. But neither of
> > these attempts is going to fix the situation as there will be no call
> > to rcu_sched_clock_irq with time-travel.
> 
> Agreed. I think having a house keeping CPU (thread) will not work in
> time-travel mode.
> Kicking RCU whenever a syscall is executed is okay, the question is,
> are there other scenarios where RCU work can pile up and no syscall is
> run for a long time? Maybe we need to kick it at other places (page fault handler?)
> too.

Hmm, that is a good question. I assume that implies major faults for
mapped files (or anonymous memory from swap) happening. I suppose that
can trigger just about anything in the kernel and could also create
load on RCU. Not sure how problematic that is; in our case it was
Python importing a large number of files and bringing the system to its
knees in the process.

Anyway, I'll need to reconsider the hack a bit, maybe we can find a
better solution.

Benjamin

Richard Weinberger Sept. 13, 2024, 12:32 p.m. UTC | #5

Hi!

----- Original Message -----
> From: "Benjamin Berg" <benjamin@sipsolutions.net>
> First, it doesn't seem like my patch actually works, so please do not
> merge it. It actually appears that tree RCU and tiny RCU (which are
> selected depending on the preemption setting) are behaving differently.
> 
> So now I am wondering if I can come up with a hack that works for both.

Ok!
 
> On Fri, 2024-09-13 at 13:47 +0200, Richard Weinberger wrote:
>> ----- Original Message -----
>> > From: "Benjamin Berg" <benjamin@sipsolutions.net>
>> > > While I acknowledge that time-travel itself is a beautiful hack, I'd
>> > > like to keep the hacks
>> > > to keep it working minimal.
>> > > So, the problem here is that RCU callbacks never run and just pile up?
>> > 
>> > Yes. A simple example of this is doing a "find /". This will allocate a
>> > lot of inode information which is only free'ed at a later point.
>> > 
>> > > I wonder why such a situation does not happen in a nohz_full setup on
>> > > regular systems.
>> > 
>> > Had to search for a bit. But, I think the boot CPU will still have a
>> > tick even on a NOHZ_FULL setup. see the nohz_full= boot parameter.
>> > 
>> > It does look like the RCU code might try to force scheduling (tiny RCU)
>> > or wake up a worker (tree RCU) in these situations. But neither of
>> > these attempts is going to fix the situation as there will be no call
>> > to rcu_sched_clock_irq with time-travel.
>> 
>> Agreed. I think having a house keeping CPU (thread) will not work in
>> time-travel mode.
>> Kicking RCU whenever a syscall is executed is okay, the question is,
>> are there other scenarios where RCU work can pile up and no syscall is
>> run for a long time? Maybe we need to kick it at other places (page fault
>> handler?)
>> too.
> 
> Hmm, that is good question. I assume that implies major faults for
> mapped files (or anonymous memory from swap) happening. I suppose, that
> can trigger just about anything in the kernel and could also create
> load on the RCU. Not sure how problematic that is, in our case it was
> python importing a large amount of files and bringing the system to its
> knees in the process.

I also had workloads like heavy network processing without userspace
interaction in mind.
 
> Anyway, I'll need to reconsider the hack a bit, maybe we can find a
> better solution.

We can also bring the RCU folks into the loop. But I guess they first
need a good introduction to what time-travel is. :-D

Thanks,
//richard
