Message ID | 1421122875-2963-3-git-send-email-alistair@popple.id.au |
---|---|
State | Changes Requested |
Headers | show |
Alistair Popple <alistair@popple.id.au> writes: > Add support for an ipmi watchdog timer. This patch will cause the > system to be reset if opal_run_pollers() isn't called for more than > about 60 seconds. > > The timer is reset just prior to running the payload. It is the > responsibility of the payload to ensure either opal_run_pollers() is > called frequently enough or to disable the watchdog timer by sending > appropriate ipmi commands. I'm a little uncomfortable in changing what is essentially an ABI here without having the payload explicitly enable the WDT. I worry about: - existing payloads - will current payloads behave correctly? Do we have anything that will currently make sure opal_run_pollers() is called frequently? - different ABI for payload kernel versus host kernel. If payload (i.e. petitboot environment) disables the watchdog timer, then we'll have different behavior if booted as payload versus as petitboot. - some concern about really low power modes and having to muck with this. I think I'd prefer us to have to enable the watchdog explicitly... Ben, your thoughts?
Hi, On Wed, 28 Jan 2015 16:37:02 Stewart Smith wrote: > Alistair Popple <alistair@popple.id.au> writes: > > Add support for an ipmi watchdog timer. This patch will cause the > > system to be reset if opal_run_pollers() isn't called for more than > > about 60 seconds. > > > > The timer is reset just prior to running the payload. It is the > > responsibility of the payload to ensure either opal_run_pollers() is > > called frequently enough or to disable the watchdog timer by sending > > appropriate ipmi commands. > > I'm a little uncomfortable in changing what is essentially an ABI here > without having the payload explicitly enable the WDT. > > I worry about: > - existing payloads > - will current payloads behave correctly? Do we have anything that > will currently make sure opal_run_pollers() is called frequently? > - different ABI for payload kernel versus host kernel. If payload > (i.e. petitboot environment) disables the watchdog timer, then we'll > have different behavior if booted as payload versus as petitboot. > - some concern about really low power modes and having to muck with > this. > > I think I'd prefer us to have to enable the watchdog explicitly... I tend to agree, although at the time Jeremy and I couldn't come up with much of an alternative. The problem is that we would like the wdt to be activated before skiboot jumps to the payload so that if the payload fails for some reason we reboot. A quick look at the FSP partition table format (FFS) that we use reveals that there are a number of user data fields (16) and some user flags for each partition. Maybe we could look at setting a bit in there to indicate the payload expects the wdt to be started? > Ben, your thoughts?
On Wed, 2015-01-28 at 16:37 +1100, Stewart Smith wrote: > Alistair Popple <alistair@popple.id.au> writes: > > Add support for an ipmi watchdog timer. This patch will cause the > > system to be reset if opal_run_pollers() isn't called for more than > > about 60 seconds. > > > > The timer is reset just prior to running the payload. It is the > > responsibility of the payload to ensure either opal_run_pollers() is > > called frequently enough or to disable the watchdog timer by sending > > appropriate ipmi commands. > > I'm a little uncomfortable in changing what is essentially an ABI here > without having the payload explicitly enable the WDT. > > I worry about: > - existing payloads > - will current payloads behave correctly? Do we have anything that > will currently make sure opal_run_pollers() is called frequently? Not necessarily. I think we put a kthread in recent kernels to do it but otherwise, we are utterly dependent on having hvc console call at a reasonably regular interval. > - different ABI for payload kernel versus host kernel. If payload > (i.e. petitboot environment) disables the watchdog timer, then we'll > have different behavior if booted as payload versus as petitboot. > - some concern about really low power modes and having to muck with > this. > > I think I'd prefer us to have to enable the watchdog explicitly... > > Ben, your thoughts?
On Wed, 2015-01-28 at 18:57 +1100, Alistair Popple wrote: > A quick look at the FSP partition table format (FFS) that we use reveals that > there are a number of user data fields (16) and some user flags for each > partition. Maybe we could look at setting a bit in there to indicate the > payload expects the wdt to be started? Bah, petitboot kernel will start it and that's it, once it's started, it's started and if we kexec something tat doesn't support it we are toast anyway Ben.
Alistair Popple <alistair@popple.id.au> writes: > On Wed, 28 Jan 2015 16:37:02 Stewart Smith wrote: >> Alistair Popple <alistair@popple.id.au> writes: >> > Add support for an ipmi watchdog timer. This patch will cause the >> > system to be reset if opal_run_pollers() isn't called for more than >> > about 60 seconds. >> > >> > The timer is reset just prior to running the payload. It is the >> > responsibility of the payload to ensure either opal_run_pollers() is >> > called frequently enough or to disable the watchdog timer by sending >> > appropriate ipmi commands. >> >> I'm a little uncomfortable in changing what is essentially an ABI here >> without having the payload explicitly enable the WDT. >> >> I worry about: >> - existing payloads >> - will current payloads behave correctly? Do we have anything that >> will currently make sure opal_run_pollers() is called frequently? >> - different ABI for payload kernel versus host kernel. If payload >> (i.e. petitboot environment) disables the watchdog timer, then we'll >> have different behavior if booted as payload versus as petitboot. >> - some concern about really low power modes and having to muck with >> this. >> >> I think I'd prefer us to have to enable the watchdog explicitly... > > I tend to agree, although at the time Jeremy and I couldn't come up with much > of an alternative. The problem is that we would like the wdt to be activated > before skiboot jumps to the payload so that if the payload fails for some > reason we reboot. We could always have the payload enable it *really* early on.. if we don't get to the point of being able to make an OPAL call to enable it, we're probably unlikely to ever actually boot.
On Thu, 2015-01-29 at 09:13 +1100, Stewart Smith wrote: > > I tend to agree, although at the time Jeremy and I couldn't come up with much > > of an alternative. The problem is that we would like the wdt to be activated > > before skiboot jumps to the payload so that if the payload fails for some > > reason we reboot. > > We could always have the payload enable it *really* early on.. if we > don't get to the point of being able to make an OPAL call to enable it, > we're probably unlikely to ever actually boot. But that's more/less pointless, that means we won't catch possible transcient issues preventing boot in OPAL itself etc... In the end, it doesn't matter because the payload is going to be petitboot kernel which we more/less control and can synchronize with OPAL, it doesn't remove the fact that subsequently booted OSes may not do the right thing. For example on Rhesus, there is no HVC, so we don't get a regular callback, not a problem in this case since we also don't have the IPMI watchdog but still... Cheers, Ben.
Hi Ben, > Bah, petitboot kernel will start it and that's it, once it's started, > it's started and if we kexec something tat doesn't support it we are > toast anyway My plan is to have petitboot disable the watchdog timer just before kexec. This way, we don't export this "ABI" beyond the petitboot environment, which we'd have control of anyway. The OS kernel can re-enable it if necessary, but isn't required to. Cheers, Jeremy
On Thu, 2015-01-29 at 08:54 +0800, Jeremy Kerr wrote: > Hi Ben, > > > Bah, petitboot kernel will start it and that's it, once it's started, > > it's started and if we kexec something tat doesn't support it we are > > toast anyway > > My plan is to have petitboot disable the watchdog timer just before > kexec. This way, we don't export this "ABI" beyond the petitboot > environment, which we'd have control of anyway. > > The OS kernel can re-enable it if necessary, but isn't required to. Which means that if it fails to boot, we have no watchdog... I don't like this *that* much... Can the watchdog be tied to an interrupt from the BMC ? Something we know we will react to unless the kernel is dead ? Ben.
On Thu, 29 Jan 2015 11:56:33 Benjamin Herrenschmidt wrote: > On Thu, 2015-01-29 at 08:54 +0800, Jeremy Kerr wrote: > > Hi Ben, > > > > > Bah, petitboot kernel will start it and that's it, once it's started, > > > it's started and if we kexec something tat doesn't support it we are > > > toast anyway > > > > My plan is to have petitboot disable the watchdog timer just before > > kexec. This way, we don't export this "ABI" beyond the petitboot > > environment, which we'd have control of anyway. > > > > The OS kernel can re-enable it if necessary, but isn't required to. > > Which means that if it fails to boot, we have no watchdog... I don't > like this *that* much... > > Can the watchdog be tied to an interrupt from the BMC ? Something we > know we will react to unless the kernel is dead ? Yes, there is a pre-timeout interrupt which works. So how about instead of requiring the payload OS to disable it we just set a pre-timeout interrupt prior to entering the OS? Once we see that interrupt in skiboot we can just disable the WDT and then it's up to the payload to do what it wants with the WDT through IPMI commands. If we never see the interrupt the WDT will trip and the system will reboot. > Ben.
On Thu, 2015-01-29 at 15:00 +1100, Alistair Popple wrote: > On Thu, 29 Jan 2015 11:56:33 Benjamin Herrenschmidt wrote: > > On Thu, 2015-01-29 at 08:54 +0800, Jeremy Kerr wrote: > > > Hi Ben, > > > > > > > Bah, petitboot kernel will start it and that's it, once it's started, > > > > it's started and if we kexec something tat doesn't support it we are > > > > toast anyway > > > > > > My plan is to have petitboot disable the watchdog timer just before > > > kexec. This way, we don't export this "ABI" beyond the petitboot > > > environment, which we'd have control of anyway. > > > > > > The OS kernel can re-enable it if necessary, but isn't required to. > > > > Which means that if it fails to boot, we have no watchdog... I don't > > like this *that* much... > > > > Can the watchdog be tied to an interrupt from the BMC ? Something we > > know we will react to unless the kernel is dead ? > > Yes, there is a pre-timeout interrupt which works. So how about instead of > requiring the payload OS to disable it we just set a pre-timeout interrupt > prior to entering the OS? Once we see that interrupt in skiboot we can just > disable the WDT and then it's up to the payload to do what it wants with the > WDT through IPMI commands. If we never see the interrupt the WDT will trip and > the system will reboot. Or we just set the pre-timeout interrupt always and use it as the watchdog tickle. Ben.
Hi Ben, > Or we just set the pre-timeout interrupt always and use it as the > watchdog tickle. The downside here is that we'd then have a periodic IPMI transaction, which may introduce jitter. I'd prefer to leave it up to the OS to enable - there's the IPMI watchdog in the kernel, which is capable of handling this. The downside is that the watchdog won't "watch" the period between OPAL platform init (where we handle the first interrupt) and watchdog driver init, but I think that's a reasonable compromise. Cheers, Jeremy
On Thu, 2015-01-29 at 12:59 +0800, Jeremy Kerr wrote: > Hi Ben, > > > Or we just set the pre-timeout interrupt always and use it as the > > watchdog tickle. > > The downside here is that we'd then have a periodic IPMI transaction, > which may introduce jitter. I'd prefer to leave it up to the OS to > enable - there's the IPMI watchdog in the kernel, which is capable of > handling this. > > The downside is that the watchdog won't "watch" the period between OPAL > platform init (where we handle the first interrupt) and watchdog driver > init, but I think that's a reasonable compromise. I'm not sure ... the watchdog driver init will typically be in the initramfs or root disk, lots of things can go wrong... Ben. > Cheers, > > > Jeremy
On Thu, 29 Jan 2015 16:08:12 Benjamin Herrenschmidt wrote: > On Thu, 2015-01-29 at 12:59 +0800, Jeremy Kerr wrote: > > Hi Ben, > > > > > Or we just set the pre-timeout interrupt always and use it as the > > > watchdog tickle. > > > > The downside here is that we'd then have a periodic IPMI transaction, > > which may introduce jitter. I'd prefer to leave it up to the OS to > > enable - there's the IPMI watchdog in the kernel, which is capable of > > handling this. > > > > The downside is that the watchdog won't "watch" the period between OPAL > > platform init (where we handle the first interrupt) and watchdog driver > > init, but I think that's a reasonable compromise. > > I'm not sure ... the watchdog driver init will typically be in the > initramfs or root disk, lots of things can go wrong... One of the problems I ran into when leaving the WDT enabled was that the system would actually become *less* reliable due to transient communication errors with a certain other system component that I haven't been able to debug... > Ben. > > > Cheers, > > > > > > Jeremy
diff --git a/core/init.c b/core/init.c index cffa638..ffd4621 100644 --- a/core/init.c +++ b/core/init.c @@ -44,6 +44,8 @@ #include <hostservices.h> #include <timer.h> +#include <ipmi.h> + /* * Boot semaphore, incremented by each CPU calling in * @@ -367,7 +369,7 @@ void __noreturn load_and_boot_kernel(bool is_reboot) } fsp_console_select_stdout(); - /* + /* * OCC takes few secs to boot. Call this as late as * as possible to avoid delay. */ @@ -672,6 +674,8 @@ void __noreturn main_cpu_entry(const void *fdt, u32 master_cpu) /* ... and add remaining reservations to the DT */ mem_region_add_dt_reserved(); + ipmi_wdt_reset(); + load_and_boot_kernel(false); } @@ -707,4 +711,3 @@ void __noreturn secondary_cpu_entry(void) __secondary_cpu_entry(); } - diff --git a/hw/ipmi/Makefile.inc b/hw/ipmi/Makefile.inc index 02670d7..1c358a9 100644 --- a/hw/ipmi/Makefile.inc +++ b/hw/ipmi/Makefile.inc @@ -1,5 +1,6 @@ SUBDIRS += hw/ipmi IPMI_OBJS = ipmi-rtc.o ipmi-power.o ipmi-opal.o ipmi-fru.o ipmi-sel.o +IPMI_OBJS += ipmi-watchdog.o IPMI = hw/ipmi/built-in.o $(IPMI): $(IPMI_OBJS:%=hw/ipmi/%) diff --git a/hw/ipmi/ipmi-watchdog.c b/hw/ipmi/ipmi-watchdog.c new file mode 100644 index 0000000..20607ec --- /dev/null +++ b/hw/ipmi/ipmi-watchdog.c @@ -0,0 +1,142 @@ + +/* Copyright 2013-2014 IBM Corp. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + * implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include <stdlib.h> +#include <ipmi.h> +#include <lock.h> +#include <opal.h> +#include <device.h> +#include <timer.h> +#include <timebase.h> + +#define TIMER_USE_DONT_LOG 0x80 +#define TIMER_USE_DONT_STOP 0x40 +#define TIMER_USE_POST 0x02 + +/* WDT expiration actions */ +#define WDT_POWER_CYCLE_ACTION 0x01 +#define WDT_NO_ACTION 0x00 + +/* How long to set the overall watchdog timeout for. In units of + * 100ms. If the timer is not reset within this time the watchdog + * expiration action will occur. */ +#define WDT_TIMEOUT 600 + +/* How often to reset the timer using schedule_timer(). Too short and +we risk accidently resetting the system due to opal_run_pollers() not +being called in time, too short and we waste time resetting the wdt +more frequently than neccessary. */ +#define WDT_MARGIN 300 + +static struct lock ipmi_msg_lock = LOCK_UNLOCKED; +static bool ipmi_msg_busy = true; +static struct ipmi_msg *ipmi_msg; +static struct timer wdt_timer; + +static void ipmi_wdt_complete(struct ipmi_msg *msg) +{ + ipmi_msg_busy = false; + + if (msg->cmd == IPMI_CMD(IPMI_RESET_WDT) && !msg->user_data) + schedule_timer(&wdt_timer, msecs_to_tb((WDT_TIMEOUT - WDT_MARGIN)*100)); +} + +/* As the WDT message is used continually we reduce stress on the + * memory allocator by using a single message. This function waits for + * the message to become available by running the pollers to flush the + * queue. For the WDT case this makes sense as a backlog of IPMI + * messages could cause a reset if we don't prioritise the processing + * of the IPMI message queue. */ +static void get_ipmi_msg(void) +{ +retry: + while (ipmi_msg_busy) + time_wait_ms(100); + + lock(&ipmi_msg_lock); + if (ipmi_msg_busy) { + unlock(&ipmi_msg_lock); + goto retry; + } + + ipmi_msg_busy = true; + unlock(&ipmi_msg_lock); +} + +static void set_wdt(uint8_t action, uint16_t count) +{ + get_ipmi_msg(); + ipmi_init_msg(ipmi_msg, IPMI_DEFAULT_INTERFACE, IPMI_SET_WDT, + ipmi_wdt_complete, NULL, 6, 0); + ipmi_msg->data[0] = TIMER_USE_DONT_LOG + | TIMER_USE_POST; /* Timer Use */ + ipmi_msg->data[1] = action; /* Timer Actions */ + ipmi_msg->data[2] = 0; /* Pre-timeout Interval */ + ipmi_msg->data[3] = 0; /* Timer Use Flags */ + ipmi_msg->data[4] = count & 0xff; /* Initial countdown (lsb) */ + ipmi_msg->data[5] = (count >> 8) & 0xff; /* Initial countdown (msb) */ + ipmi_queue_msg(ipmi_msg); +} + +static void reset_wdt(struct timer *t __unused, void *data) +{ + get_ipmi_msg(); + ipmi_init_msg(ipmi_msg, IPMI_DEFAULT_INTERFACE, IPMI_RESET_WDT, + ipmi_wdt_complete, data, 0, 0); + ipmi_queue_msg(ipmi_msg); +} + +void ipmi_wdt_final_reset(void) +{ + cancel_timer(&wdt_timer); + reset_wdt(NULL, (void *) !NULL); +} + +void ipmi_wdt_reset(void) +{ + cancel_timer(&wdt_timer); + reset_wdt(NULL, NULL); +} + +void ipmi_wdt_init(void) +{ + ipmi_msg = ipmi_mkmsg(IPMI_DEFAULT_INTERFACE, IPMI_SET_WDT, + ipmi_wdt_complete, NULL, NULL, 6, 0); + if (!ipmi_msg) { + prerror("Unable to allocate watchdog ipmi message\n"); + return; + } + + ipmi_msg_busy = false; + init_timer(&wdt_timer, reset_wdt, NULL); + set_wdt(WDT_POWER_CYCLE_ACTION, WDT_TIMEOUT); + + /* Start the WDT */ + reset_wdt(NULL, NULL); + + /* For some reason we have to reset it twice to get it to + * actually start the first time. */ + reset_wdt(NULL, NULL); + + /* Crank the state machines until we've processed the reset + * command to start the wdt. This ensures the wdt has started + * before we hand control over to the rest of the system.*/ + get_ipmi_msg(); + ipmi_msg_busy = false; + + return; +} diff --git a/include/ipmi.h b/include/ipmi.h index bbeae5a..40dee24 100644 --- a/include/ipmi.h +++ b/include/ipmi.h @@ -93,6 +93,8 @@ #define IPMI_CHASSIS_CONTROL IPMI_CODE(IPMI_NETFN_CHASSIS, 0x02) #define IPMI_SET_POWER_STATE IPMI_CODE(IPMI_NETFN_APP, 0x06) #define IPMI_GET_POWER_STATE IPMI_CODE(IPMI_NETFN_APP, 0x07) +#define IPMI_RESET_WDT IPMI_CODE(IPMI_NETFN_APP, 0x22) +#define IPMI_SET_WDT IPMI_CODE(IPMI_NETFN_APP, 0x24) #define IPMI_PARTIAL_ADD_ESEL IPMI_CODE(IPMI_NETFN_OEM, 0xf0) @@ -202,4 +204,15 @@ void ipmi_fru_init(uint8_t fru_dev_id); struct errorlog; int ipmi_elog_commit(struct errorlog *elog_buf); +/* Starts the watchdog timer */ +void ipmi_wdt_init(void); + +/* Queue a watchdog timer reset. Schedules future resets to prevent + * timer expiration. */ +void ipmi_wdt_reset(void); + +/* Reset the watchdog timer. Does not return until the timer has been + * reset and does not schedule future resets. */ +void ipmi_wdt_final_reset(void); + #endif diff --git a/platforms/astbmc/common.c b/platforms/astbmc/common.c index 993ac4c..336d275 100644 --- a/platforms/astbmc/common.c +++ b/platforms/astbmc/common.c @@ -51,6 +51,7 @@ void astbmc_init(void) /* Register the BT interface with the IPMI layer */ bt_init(); + ipmi_wdt_init(); ipmi_rtc_init(); ipmi_opal_init(); ipmi_fru_init(0x01); diff --git a/platforms/astbmc/palmetto.c b/platforms/astbmc/palmetto.c index a0030e8..b9ef4a4 100644 --- a/platforms/astbmc/palmetto.c +++ b/platforms/astbmc/palmetto.c @@ -51,4 +51,5 @@ DECLARE_PLATFORM(palmetto) = { .cec_power_down = astbmc_ipmi_power_down, .cec_reboot = astbmc_ipmi_reboot, .elog_commit = ipmi_elog_commit, + .exit = ipmi_wdt_final_reset, };
Add support for an ipmi watchdog timer. This patch will cause the system to be reset if opal_run_pollers() isn't called for more than about 60 seconds. The timer is reset just prior to running the payload. It is the responsibility of the payload to ensure either opal_run_pollers() is called frequently enough or to disable the watchdog timer by sending appropriate ipmi commands. Signed-off-by: Alistair Popple <alistair@popple.id.au> --- core/init.c | 7 ++- hw/ipmi/Makefile.inc | 1 + hw/ipmi/ipmi-watchdog.c | 142 ++++++++++++++++++++++++++++++++++++++++++++ include/ipmi.h | 13 ++++ platforms/astbmc/common.c | 1 + platforms/astbmc/palmetto.c | 1 + 6 files changed, 163 insertions(+), 2 deletions(-) create mode 100644 hw/ipmi/ipmi-watchdog.c