Message ID | 20231025224504.7be512cdfeb4.Ia20103ef9899e53311bd671b8cfbca4f057a4668@changeid |
---|---|
State | Accepted |
Headers | show |
Series | [v2] um: time-travel: fix time corruption | expand |
On Wed, 2023-10-25 at 22:45 +0200, Johannes Berg wrote: > From: Johannes Berg <johannes.berg@intel.com> > > In 'basic' time-travel mode (without =inf-cpu or =ext), we > still get timer interrupts. These can happen at arbitrary > points in time, i.e. while in timer_read(), which pushes > time forward just a little bit. Then, if we happen to get > the interrupt after calculating the new time to push to, > but before actually finishing that, the interrupt will set > the time to a value that's incompatible with the forward, > and we'll crash because time goes backwards when we do the > forwarding. > > Fix this by reading the time_travel_time, calculating the > adjustment, and doing the adjustment all with interrupts > disabled. > > Reported-by: Vincent Whitchurch <Vincent.Whitchurch@axis.com> > Signed-off-by: Johannes Berg <johannes.berg@intel.com> Thanks, this works for me too. However, one question below. > --- > v2: remove stray debug code > --- > arch/um/kernel/time.c | 32 +++++++++++++++++++++++++++----- > 1 file changed, 27 insertions(+), 5 deletions(-) > > diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c > index 8ff46bc86d09..0c01674e14d5 100644 > --- a/arch/um/kernel/time.c > +++ b/arch/um/kernel/time.c > @@ -551,9 +551,29 @@ static void time_travel_update_time(unsigned long long next, bool idle) > time_travel_del_event(&ne); > } > > > +static void time_travel_update_time_rel(unsigned long long offs) > +{ > + unsigned long flags; > + > + /* > + * Disable interrupts before calculating the new time so > + * that a real timer interrupt (signal) can't happen at > + * a bad time e.g. after we read time_travel_time but > + * before we've completed updating the time. > + */ > + local_irq_save(flags); > + time_travel_update_time(time_travel_time + offs, false); > + local_irq_restore(flags); > +} > + > void time_travel_ndelay(unsigned long nsec) > { > - time_travel_update_time(time_travel_time + nsec, false); > + /* > + * Not strictly needed to use _rel() version since this is > + * only used in INFCPU/EXT modes, but it doesn't hurt and > + * is more readable too. > + */ > + time_travel_update_time_rel(nsec); > } > EXPORT_SYMBOL(time_travel_ndelay); > > > @@ -687,7 +707,11 @@ static void time_travel_set_start(void) > #define time_travel_time 0 > #define time_travel_ext_waiting 0 > > > -static inline void time_travel_update_time(unsigned long long ns, bool retearly) > +static inline void time_travel_update_time(unsigned long long ns, bool idle) > +{ > +} > + > +static inline void time_travel_update_time_rel(unsigned long long offs) > { > } > > > @@ -839,9 +863,7 @@ static u64 timer_read(struct clocksource *cs) > */ > if (!irqs_disabled() && !in_interrupt() && !in_softirq() && > !time_travel_ext_waiting) > - time_travel_update_time(time_travel_time + > - TIMER_MULTIPLIER, > - false); > + time_travel_update_time_rel(TIMER_MULTIPLIER); > return time_travel_time / TIMER_MULTIPLIER; > } The reason I hesitated with putting the whole of time_travel_update_time() under local_irq_save() in my attempt was because I didn't quite understand the reason for the !irqs_disabled() condition here and the comment just above it about recursion and things getting messed up. If it's OK to disable interrupts as this patch does, is the !irqs_disabled() condition valid?
On Thu, 2023-10-26 at 07:23 +0000, Vincent Whitchurch wrote: > > @@ -839,9 +863,7 @@ static u64 timer_read(struct clocksource *cs) > > */ > > if (!irqs_disabled() && !in_interrupt() && !in_softirq() && > > !time_travel_ext_waiting) > > - time_travel_update_time(time_travel_time + > > - TIMER_MULTIPLIER, > > - false); > > + time_travel_update_time_rel(TIMER_MULTIPLIER); > > return time_travel_time / TIMER_MULTIPLIER; > > } > > The reason I hesitated with putting the whole of > time_travel_update_time() under local_irq_save() in my attempt was > because I didn't quite understand the reason for the !irqs_disabled() > condition here and the comment just above it about recursion and things > getting messed up. If it's OK to disable interrupts as this patch does, > is the !irqs_disabled() condition valid? Hmm. I was going to say that's different, because it wants to only prevent us from doing this while we're *already* in IRQ context, and the bug you found is calling timer_read() not in IRQ context, but getting an event queued by the signal. But ... now that I think about it, I have a feeling that this was a workaround for the exact same problem, and I just didn't understand it at the time? I mean, recursing into our own processing is now impossible here after this patch - either we're running normally, or the interrupt cannot hit timer_read() in the middle, same as it cannot hit time_travel_handle_real_alarm() in the middle now. Removing that still seems to work with your test, but it's also not a good test for this, since there are no devices etc. that could have interrupts, not sure how to test it right now? Maybe I'll add a comment there saying this might no longer be needed? johannes
On Thu, 2023-10-26 at 09:38 +0200, Johannes Berg wrote: > On Thu, 2023-10-26 at 07:23 +0000, Vincent Whitchurch wrote: > > > @@ -839,9 +863,7 @@ static u64 timer_read(struct clocksource *cs) > > > */ > > > if (!irqs_disabled() && !in_interrupt() && !in_softirq() && > > > !time_travel_ext_waiting) > > > - time_travel_update_time(time_travel_time + > > > - TIMER_MULTIPLIER, > > > - false); > > > + time_travel_update_time_rel(TIMER_MULTIPLIER); > > > return time_travel_time / TIMER_MULTIPLIER; > > > } > > > > The reason I hesitated with putting the whole of > > time_travel_update_time() under local_irq_save() in my attempt was > > because I didn't quite understand the reason for the !irqs_disabled() > > condition here and the comment just above it about recursion and things > > getting messed up. If it's OK to disable interrupts as this patch does, > > is the !irqs_disabled() condition valid? > > Hmm. I was going to say that's different, because it wants to only > prevent us from doing this while we're *already* in IRQ context, and the > bug you found is calling timer_read() not in IRQ context, but getting an > event queued by the signal. > > But ... now that I think about it, I have a feeling that this was a > workaround for the exact same problem, and I just didn't understand it > at the time? I mean, recursing into our own processing is now impossible > here after this patch - either we're running normally, or the interrupt > cannot hit timer_read() in the middle, same as it cannot hit > time_travel_handle_real_alarm() in the middle now. > > Removing that still seems to work with your test, but it's also not a > good test for this, since there are no devices etc. that could have > interrupts, not sure how to test it right now? > > Maybe I'll add a comment there saying this might no longer be needed? I tried removing the !irqs_disabled() check and that blew up pretty quickly (below) when running the full roadtest suite. It works fine with your unmodified patch so no need to change the comment. Kernel panic - not syncing: time-travel: time goes backwards 26374790000864 -> 26374790000853 show_stack.cold (arch/um/kernel/sysrq.c:56) dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) dump_stack (lib/dump_stack.c:114) panic (kernel/panic.c:262 kernel/panic.c:361) timer_handler.cold (arch/um/kernel/time.c:51 arch/um/kernel/time.c:510 arch/um/kernel/time.c:634) timer_real_alarm_handler (arch/um/os-Linux/signal.c:109) unblock_signals (arch/um/os-Linux/signal.c:338) tick_nohz_idle_exit (kernel/time/tick-sched.c:1364) do_idle (kernel/sched/idle.c:310) cpu_startup_entry (kernel/sched/idle.c:379 (discriminator 1)) kernel_init (init/main.c:1435) 0x60001ce6 0x6000220e 0x60004961 new_thread_handler (arch/um/include/asm/thread_info.h:46 arch/um/kernel/process.c:136) uml_finishsetup (arch/um/kernel/um_arch.c:268)
On Thu, 2023-10-26 at 08:49 +0000, Vincent Whitchurch wrote: > On Thu, 2023-10-26 at 09:38 +0200, Johannes Berg wrote: > > On Thu, 2023-10-26 at 07:23 +0000, Vincent Whitchurch wrote: > > > > @@ -839,9 +863,7 @@ static u64 timer_read(struct clocksource *cs) > > > > */ > > > > if (!irqs_disabled() && !in_interrupt() && !in_softirq() && > > > > !time_travel_ext_waiting) > > > > - time_travel_update_time(time_travel_time + > > > > - TIMER_MULTIPLIER, > > > > - false); > > > > + time_travel_update_time_rel(TIMER_MULTIPLIER); > > > > return time_travel_time / TIMER_MULTIPLIER; > > > > } > > > > > > The reason I hesitated with putting the whole of > > > time_travel_update_time() under local_irq_save() in my attempt was > > > because I didn't quite understand the reason for the !irqs_disabled() > > > condition here and the comment just above it about recursion and things > > > getting messed up. If it's OK to disable interrupts as this patch does, > > > is the !irqs_disabled() condition valid? > > > > Hmm. I was going to say that's different, because it wants to only > > prevent us from doing this while we're *already* in IRQ context, and the > > bug you found is calling timer_read() not in IRQ context, but getting an > > event queued by the signal. > > > > But ... now that I think about it, I have a feeling that this was a > > workaround for the exact same problem, and I just didn't understand it > > at the time? I mean, recursing into our own processing is now impossible > > here after this patch - either we're running normally, or the interrupt > > cannot hit timer_read() in the middle, same as it cannot hit > > time_travel_handle_real_alarm() in the middle now. > > > > Removing that still seems to work with your test, but it's also not a > > good test for this, since there are no devices etc. that could have > > interrupts, not sure how to test it right now? > > > > Maybe I'll add a comment there saying this might no longer be needed? > > I tried removing the !irqs_disabled() check and that blew up pretty > quickly (below) when running the full roadtest suite. It works fine > with your unmodified patch so no need to change the comment. > Hah, OK. So maybe when/if I remember what happens there or can figure it out again, I can update the comment :) johannes
Am 26.10.2023 um 11:10 schrieb Johannes Berg: > On Thu, 2023-10-26 at 08:49 +0000, Vincent Whitchurch wrote: >> On Thu, 2023-10-26 at 09:38 +0200, Johannes Berg wrote: >>> On Thu, 2023-10-26 at 07:23 +0000, Vincent Whitchurch wrote: >>>>> @@ -839,9 +863,7 @@ static u64 timer_read(struct clocksource *cs) >>>>> */ >>>>> if (!irqs_disabled() && !in_interrupt() && !in_softirq() && >>>>> !time_travel_ext_waiting) >>>>> - time_travel_update_time(time_travel_time + >>>>> - TIMER_MULTIPLIER, >>>>> - false); >>>>> + time_travel_update_time_rel(TIMER_MULTIPLIER); >>>>> return time_travel_time / TIMER_MULTIPLIER; >>>>> } >>>> >>>> The reason I hesitated with putting the whole of >>>> time_travel_update_time() under local_irq_save() in my attempt was >>>> because I didn't quite understand the reason for the !irqs_disabled() >>>> condition here and the comment just above it about recursion and things >>>> getting messed up. If it's OK to disable interrupts as this patch does, >>>> is the !irqs_disabled() condition valid? >>> >>> Hmm. I was going to say that's different, because it wants to only >>> prevent us from doing this while we're *already* in IRQ context, and the >>> bug you found is calling timer_read() not in IRQ context, but getting an >>> event queued by the signal. >>> >>> But ... now that I think about it, I have a feeling that this was a >>> workaround for the exact same problem, and I just didn't understand it >>> at the time? I mean, recursing into our own processing is now impossible >>> here after this patch - either we're running normally, or the interrupt >>> cannot hit timer_read() in the middle, same as it cannot hit >>> time_travel_handle_real_alarm() in the middle now. >>> >>> Removing that still seems to work with your test, but it's also not a >>> good test for this, since there are no devices etc. that could have >>> interrupts, not sure how to test it right now? >>> >>> Maybe I'll add a comment there saying this might no longer be needed? >> >> I tried removing the !irqs_disabled() check and that blew up pretty >> quickly (below) when running the full roadtest suite. It works fine >> with your unmodified patch so no need to change the comment. >> > > Hah, OK. So maybe when/if I remember what happens there or can figure it > out again, I can update the comment :) As I said, I'm also preparing some patches, therefore my few bits about those problems: - we noticed, that those backward running time is mostly created by the "unprotected" access to the variables of the time travel event list. From the stack trace you see the problem even do not appear directly at timer_read but later. I think mostly this means the event list head was moved and then the interrupt happens and the pointer (or stored time) become outdated and the function (here the alarm_handler) continues with them. - besides this, when you look into local_irq_save for um, this does not really deactivate the interrupts, but delay the processing and it adds a memory barrier. I think that could be one of the important consequences as changes to the event list are forced to be populated. But this is only a guess :-D - I'm also not really convinced, that all accesses to the current time should be delayed. I made a patch with a heuristic, that only delays the time read, if it is read from userspace multiple times in a row. Since there are no "busy" loops in the kernel itself and only in a few (bad behaving) userspace programs, I think that helps to reduce the inaccuracy in simulation from well behaving userspace programs only reading the time once. In this case, irqs_disabled more or less only tend to indicate, that the event list could be manipulated, and therefore the update_time call is a bad idea. Benjamin
On Fri, 2023-10-27 at 16:05 +0200, Benjamin Beichler wrote: > > As I said, I'm also preparing some patches, therefore my few bits about > those problems: > > - we noticed, that those backward running time is mostly created by the > "unprotected" access to the variables of the time travel event list. > From the stack trace you see the problem even do not appear directly at > timer_read but later. I think mostly this means the event list head was > moved and then the interrupt happens and the pointer (or stored time) > become outdated and the function (here the alarm_handler) continues with > them. That's not entirely implausible. > - besides this, when you look into local_irq_save for um, this does not > really deactivate the interrupts, but delay the processing and it adds a > memory barrier. I think that could be one of the important consequences > as changes to the event list are forced to be populated. But this is > only a guess :-D No, it *does* disable interrupts from Linux's POV. The memory barrier is just there to make _that_ actually visible "immediately". Yes it doesn't actually disable the *signal* underneath, but that doesn't really mean anything? Once you get the signal, nothing happens if it's disabled. In UML, the kernel binary is kind of both hypervisor and guest kernel, so you can think of the low-level code here as being part of the hypervisor, so that blocking the signal from actually calling out into the guest is like blocking the PIC from sending to the CPU. Obviously the device(s) still send(s) the interrupt(s), but they don't go into the "guest". > - I'm also not really convinced, that all accesses to the current time > should be delayed. I made a patch with a heuristic, that only delays the > time read, if it is read from userspace multiple times in a row. How would you even detect that though? And on the other hand - why not? There's always a cost to things in a real system, it's not free to access the current time? Maybe it's faster in a real system with VDSO though. > Since > there are no "busy" loops in the kernel itself and only in a few (bad > behaving) userspace programs, I think that helps to reduce the > inaccuracy in simulation from well behaving userspace programs only > reading the time once. So I'm not sure I'd call it an inaccuracy? It's ... perfectly accurate in the model that we're implementing? Maybe you don't like the model ;-) > In this case, irqs_disabled more or less only tend to indicate, that the > event list could be manipulated, and therefore the update_time call is a > bad idea. Which is kind of what I was thinking about earlier, but Vincent says it's not _just_ for that. We probably should just add a spinlock for this though - right now none of this is good for eventual SMP support, and it's much easier to reason about when we have a lock? johannes
Am 27.10.2023 um 16:45 schrieb Johannes Berg: > On Fri, 2023-10-27 at 16:05 +0200, Benjamin Beichler wrote: >> - besides this, when you look into local_irq_save for um, this does >> not really deactivate the interrupts, but delay the processing and it >> adds a memory barrier. I think that could be one of the important >> consequences as changes to the event list are forced to be populated. >> But this is only a guess :-D > No, it *does* disable interrupts from Linux's POV. The memory barrier > is just there to make _that_ actually visible "immediately". Yes it > doesn't actually disable the *signal* underneath, but that doesn't > really mean anything? Once you get the signal, nothing happens if it's > disabled. Maybe I was a bit sloppy here. What I meant was, that a signal could interrupt the code even the interrupts are disabled, resulting into all the nice preemption consequences. In real systems that may only do the NMI ? And especially the SIGIO handler keeps intentionally calling the time travel handlers when the interrupts are disabled. The interrupt handlers of drivers are called later, but the time travel handlers tend to change the event list. > In UML, the kernel binary is kind of both hypervisor and guest kernel, > so you can think of the low-level code here as being part of the > hypervisor, so that blocking the signal from actually calling out into > the guest is like blocking the PIC from sending to the CPU. Obviously > the device(s) still send(s) the interrupt(s), but they don't go into > the "guest". >> - I'm also not really convinced, that all accesses to the current >> time should be delayed. I made a patch with a heuristic, that only >> delays the time read, if it is read from userspace multiple times in >> a row. > How would you even detect that though? And on the other hand - why > not? There's always a cost to things in a real system, it's not free > to access the current time? Maybe it's faster in a real system with > VDSO though. >> Since there are no "busy" loops in the kernel itself and only in a >> few (bad behaving) userspace programs, I think that helps to reduce >> the inaccuracy in simulation from well behaving userspace programs >> only reading the time once. > So I'm not sure I'd call it an inaccuracy? It's ... perfectly accurate > in the model that we're implementing? Maybe you don't like the model ;-) I think I like the model too much, it fits so nicely in common DES primitives. :-D We have unlimited processing power (or every processing has zero execution time), so why delaying a program which get once a while a timestamp from the system. Moreover, since I want a really deterministic model, I anticipate that if I send a msg at timestamp t1, my program should create an answer at t1 and not sporadically at t1+delta, because the file system driver took a timestamp at a background task. >> In this case, irqs_disabled more or less only tend to indicate, that >> the event list could be manipulated, and therefore the update_time >> call is a bad idea. > Which is kind of what I was thinking about earlier, but Vincent says > it's not _just_ for that. Mhh, I think Vincent only wondered about whether the recursion statement was right. > We probably should just add a spinlock for this though - right now > none of this is good for eventual SMP support, and it's much easier to > reason about when we have a lock? Mhh I think that would make sense. As I said, we also had problems in ext-mode, which disappeared, after I introduce local_irq_save(flags); around all operation, that modified the event list. I think a spinlock would show clearer the intention. For SMP we may need some fine grain scheme (or RCU-based locking), to prevent deadlocks, but I'm again not sure. > johannes kind regards Benjamin
On Fri, 2023-10-27 at 17:32 +0200, Benjamin Beichler wrote: > Am 27.10.2023 um 16:45 schrieb Johannes Berg: > > On Fri, 2023-10-27 at 16:05 +0200, Benjamin Beichler wrote: > > > - besides this, when you look into local_irq_save for um, this does > > > not really deactivate the interrupts, but delay the processing and it > > > adds a memory barrier. I think that could be one of the important > > > consequences as changes to the event list are forced to be populated. > > > But this is only a guess :-D > > No, it *does* disable interrupts from Linux's POV. The memory barrier > > is just there to make _that_ actually visible "immediately". Yes it > > doesn't actually disable the *signal* underneath, but that doesn't > > really mean anything? Once you get the signal, nothing happens if it's > > disabled. > Maybe I was a bit sloppy here. What I meant was, that a signal could > interrupt the code even the interrupts are disabled, Yes, briefly, to find it shouldn't do anything. So nothing should happen because of it? > resulting into all the nice preemption consequences. What do you mean? > In real systems that may only do the NMI ? > And especially the SIGIO handler keeps intentionally calling the time > travel handlers > when the interrupts are disabled. The interrupt handlers of drivers are > called later, but the > time travel handlers tend to change the event list. Ah! Yes, that's true too ... hmm. > I think I like the model too much, it fits so nicely in common DES > primitives. :-D :) > We have unlimited processing power (or every processing has zero > execution time), > so why delaying a program which get once a while a timestamp from the > system. Yeah, OK, fair enough. Still, we _do_ need this to make progress at all in some cases, like the python case described there somewhere. > Moreover, since I want a really deterministic model, I anticipate that > if I send a msg > at timestamp t1, my program should create an answer at t1 and not > sporadically at > t1+delta, because the file system driver took a timestamp at a > background task. Well at least it should grab the background task at the same time every time ;-) But yeah that's only really true if you have all of the patches we have to nail down random number generation ... > > > In this case, irqs_disabled more or less only tend to indicate, that > > > the event list could be manipulated, and therefore the update_time > > > call is a bad idea. > > Which is kind of what I was thinking about earlier, but Vincent says > > it's not _just_ for that. > Mhh, I think Vincent only wondered about whether the recursion statement > was right. No, I meant the fact that he reported that changing this to always delay was breaking it. > > We probably should just add a spinlock for this though - right now > > none of this is good for eventual SMP support, and it's much easier to > > reason about when we have a lock? > Mhh I think that would make sense. As I said, we also had problems in > ext-mode, which > disappeared, after I introduce local_irq_save(flags); around all > operation, that modified > the event list. I think a spinlock would show clearer the intention. For > SMP we may > need some fine grain scheme (or RCU-based locking), to prevent > deadlocks, but I'm again > not sure. > RCU isn't really locking, I don't see how that'd work in the face of changing the list. :) But a spinlock should work, we might need to also disable IRQs though. Well anyway, I don't think I'll do any work here before you've also posted your patches. And maybe we should apply all the patches on the list now since they make things better, and then reconsider the model more carefully with the next set of changes? johannes
diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c index 8ff46bc86d09..0c01674e14d5 100644 --- a/arch/um/kernel/time.c +++ b/arch/um/kernel/time.c @@ -551,9 +551,29 @@ static void time_travel_update_time(unsigned long long next, bool idle) time_travel_del_event(&ne); } +static void time_travel_update_time_rel(unsigned long long offs) +{ + unsigned long flags; + + /* + * Disable interrupts before calculating the new time so + * that a real timer interrupt (signal) can't happen at + * a bad time e.g. after we read time_travel_time but + * before we've completed updating the time. + */ + local_irq_save(flags); + time_travel_update_time(time_travel_time + offs, false); + local_irq_restore(flags); +} + void time_travel_ndelay(unsigned long nsec) { - time_travel_update_time(time_travel_time + nsec, false); + /* + * Not strictly needed to use _rel() version since this is + * only used in INFCPU/EXT modes, but it doesn't hurt and + * is more readable too. + */ + time_travel_update_time_rel(nsec); } EXPORT_SYMBOL(time_travel_ndelay); @@ -687,7 +707,11 @@ static void time_travel_set_start(void) #define time_travel_time 0 #define time_travel_ext_waiting 0 -static inline void time_travel_update_time(unsigned long long ns, bool retearly) +static inline void time_travel_update_time(unsigned long long ns, bool idle) +{ +} + +static inline void time_travel_update_time_rel(unsigned long long offs) { } @@ -839,9 +863,7 @@ static u64 timer_read(struct clocksource *cs) */ if (!irqs_disabled() && !in_interrupt() && !in_softirq() && !time_travel_ext_waiting) - time_travel_update_time(time_travel_time + - TIMER_MULTIPLIER, - false); + time_travel_update_time_rel(TIMER_MULTIPLIER); return time_travel_time / TIMER_MULTIPLIER; }