Message ID | 20171019130215.31354-1-kristian.evensen@gmail.com |
---|---|
State | New |
Delegated to: | John Crispin |
Headers | show |
Series | [LEDE-DEV] procd: Restore respawn on SIGTERM timeout | expand |
Kristian Evensen <kristian.evensen@gmail.com> wrote: > When SIGTERM times out, procd sends SIGKILL and then restarts > the process once SIGCHLD has been received. This all works > fine, with one exception - respawn is not restored when > instance_start() is called from instance_exit(). The reason is > that respawn is always set to false in instance_stop(), and the > same service_instance struct is used for the > instance_start()-call. > > The consequence is that if the process is killed/crashes again, > it will not respawn. Solve this issue by adding a variable used > to store the original value of respawn in instance_stop(), and > then restore the original respawn-value in instance_exit(). It smells like this likely applies to many other fields. Is there a path here that's not using the copy/compare routines for a service/instance? Should they be? Does your path even restore all the parameters of respawn? Cheers, Karl P > > Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> > --- > service/instance.c | 6 ++++-- > service/instance.h | 1 + > 2 files changed, 5 insertions(+), 2 deletions(-) > > diff --git a/service/instance.c b/service/instance.c > index b7cb523..76c74ed 100644 > --- a/service/instance.c > +++ b/service/instance.c > @@ -532,9 +532,10 @@ instance_exit(struct uloop_process *p, int ret) > > if (in->halt) { > instance_removepid(in); > - if (in->restart) > + if (in->restart) { > + in->respawn = in->respawn_org; > instance_start(in); > - else { > + } else { > struct service *s = in->srv; > > avl_delete(&s->instances.avl, &in->node.avl); > @@ -567,6 +568,7 @@ instance_stop(struct service_instance *in, bool halt) > if (!in->proc.pending) > return; > in->halt = halt; > + in->respawn_org = in->respawn; > in->restart = in->respawn = false; > kill(in->proc.pid, SIGTERM); > uloop_timeout_set(&in->timeout, in->term_timeout * 1000); > diff --git a/service/instance.h b/service/instance.h > index bdd14de..a0ac302 100644 > --- a/service/instance.h > +++ b/service/instance.h > @@ -48,6 +48,7 @@ struct service_instance { > bool halt; > bool restart; > bool respawn; > + bool respawn_org; > int respawn_count; > int reload_signal; > struct timespec start; > -- > 2.11.0 > > > _______________________________________________ > Lede-dev mailing list > Lede-dev@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/lede-dev
Hi Karl, Sorry for my extremely late reply. For some reason, it was not picked up by Gmail and I did not see it before today. Since it was not picked up by Gmail, I had to do some creative c&p, sorry in advance for weird formatting. >>Kristian Evensen <kristian.evensen@gmail.com> wrote: >> When SIGTERM times out, procd sends SIGKILL and then restarts >> the process once SIGCHLD has been received. This all works >> fine, with one exception - respawn is not restored when >> instance_start() is called from instance_exit(). The reason is >> that respawn is always set to false in instance_stop(), and the >> same service_instance struct is used for the >> instance_start()-call. >> >> The consequence is that if the process is killed/crashes again, >> it will not respawn. Solve this issue by adding a variable used >> to store the original value of respawn in instance_stop(), and >> then restore the original respawn-value in instance_exit(). > > It smells like this likely applies to many other fields. Is there > a path here that's not using the copy/compare routines for a > service/instance? Should they be? Does your path even restore all > the parameters of respawn? As far as I can tell (and based on my tests), only in->respawn is overwritten and not recovered. All other fields keep their value. The rc.local restart-operation seems to be a stop() and then a start()-call, so in->restart is correctly set to false. BR, Kristian
On Thu, Oct 19, 2017 at 3:02 PM, Kristian Evensen <kristian.evensen@gmail.com> wrote: > When SIGTERM times out, procd sends SIGKILL and then restarts the > process once SIGCHLD has been received. This all works fine, with one > exception - respawn is not restored when instance_start() is called from > instance_exit(). The reason is that respawn is always set to false in > instance_stop(), and the same service_instance struct is used for the > instance_start()-call. > > The consequence is that if the process is killed/crashes again, it will > not respawn. Solve this issue by adding a variable used to store the > original value of respawn in instance_stop(), and then restore the > original respawn-value in instance_exit(). > > Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> > --- Ping on this patch :) -Kristian
On 04/01/18 13:17, Kristian Evensen wrote: > On Thu, Oct 19, 2017 at 3:02 PM, Kristian Evensen > <kristian.evensen@gmail.com> wrote: >> When SIGTERM times out, procd sends SIGKILL and then restarts the >> process once SIGCHLD has been received. This all works fine, with one >> exception - respawn is not restored when instance_start() is called from >> instance_exit(). The reason is that respawn is always set to false in >> instance_stop(), and the same service_instance struct is used for the >> instance_start()-call. >> >> The consequence is that if the process is killed/crashes again, it will >> not respawn. Solve this issue by adding a variable used to store the >> original value of respawn in instance_stop(), and then restore the >> original respawn-value in instance_exit(). >> >> Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> >> --- > Ping on this patch :) > > -Kristian > Hi Kristian, its still on my todo list. the bug is confirmed i just dont like the solution. however i have so far failed to come up with a better more generic one. i'll give it another shot the next few days John
diff --git a/service/instance.c b/service/instance.c index b7cb523..76c74ed 100644 --- a/service/instance.c +++ b/service/instance.c @@ -532,9 +532,10 @@ instance_exit(struct uloop_process *p, int ret) if (in->halt) { instance_removepid(in); - if (in->restart) + if (in->restart) { + in->respawn = in->respawn_org; instance_start(in); - else { + } else { struct service *s = in->srv; avl_delete(&s->instances.avl, &in->node.avl); @@ -567,6 +568,7 @@ instance_stop(struct service_instance *in, bool halt) if (!in->proc.pending) return; in->halt = halt; + in->respawn_org = in->respawn; in->restart = in->respawn = false; kill(in->proc.pid, SIGTERM); uloop_timeout_set(&in->timeout, in->term_timeout * 1000); diff --git a/service/instance.h b/service/instance.h index bdd14de..a0ac302 100644 --- a/service/instance.h +++ b/service/instance.h @@ -48,6 +48,7 @@ struct service_instance { bool halt; bool restart; bool respawn; + bool respawn_org; int respawn_count; int reload_signal; struct timespec start;
When SIGTERM times out, procd sends SIGKILL and then restarts the process once SIGCHLD has been received. This all works fine, with one exception - respawn is not restored when instance_start() is called from instance_exit(). The reason is that respawn is always set to false in instance_stop(), and the same service_instance struct is used for the instance_start()-call. The consequence is that if the process is killed/crashes again, it will not respawn. Solve this issue by adding a variable used to store the original value of respawn in instance_stop(), and then restore the original respawn-value in instance_exit(). Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> --- service/instance.c | 6 ++++-- service/instance.h | 1 + 2 files changed, 5 insertions(+), 2 deletions(-)