mbox series

[v4,0/3] cyclic/watchdog patches

Message ID 20240521084652.1726460-1-rasmus.villemoes@prevas.dk
Headers show
Series cyclic/watchdog patches | expand

Message

Rasmus Villemoes May 21, 2024, 8:46 a.m. UTC
A bit of a mixed bag. I've been wanting to submit something like 3/3
for a while. So when I stumbled on Marek's patch
https://lore.kernel.org/u-boot/20240316201416.211480-1-marek.vasut+renesas@mailbox.org/
, I got reminded of that plan, and I think that patch could be more
readable if we adopt this model.

While actually doing those mostly mechanical changes, I stumbled on
two separate issues that probably want fixing regardless of the fate
of 3/3.

Mostly just compile-tested, and now also checked that at least the
sandbox test runs succesfully, and that it builds both with and
without CONFIG_CYCLIC.

v4: Make sure there's only one definition of struct cyclic_info, previous
versions failed to move the full definition under #ifdef CONFIG_CYCLIC,
breaking builds with !CONFIG_CYCLIC.

v3: Also update the unit test according to the new API.

v2: Add R-bs from Stefan. Fixup whitespace in the doc/ part. Rebase
to current master (676903c1b97), fixing trivial conflict with
301bac6047c8.

Rasmus Villemoes (3):
  cyclic: stop strdup'ing name in cyclic_register()
  wdt-uclass: prevent multiple cyclic_register calls
  cyclic: make clients embed a struct cyclic_info in their own data
    structure

 board/Marvell/octeon_nic23/board.c |  9 ++++---
 cmd/cyclic.c                       | 12 ++++------
 common/cyclic.c                    | 24 +++++--------------
 doc/develop/cyclic.rst             | 26 ++++++++++++--------
 drivers/watchdog/wdt-uclass.c      | 38 ++++++++++++++++--------------
 include/cyclic.h                   | 37 +++++++++++++++--------------
 test/common/cyclic.c               | 19 +++++++++------
 7 files changed, 84 insertions(+), 81 deletions(-)

Comments

Rasmus Villemoes May 21, 2024, 9:47 a.m. UTC | #1
On 21/05/2024 10.46, Rasmus Villemoes wrote:
> A bit of a mixed bag. I've been wanting to submit something like 3/3
> for a while. So when I stumbled on Marek's patch
> https://lore.kernel.org/u-boot/20240316201416.211480-1-marek.vasut+renesas@mailbox.org/
> , I got reminded of that plan, and I think that patch could be more
> readable if we adopt this model.
> 
> While actually doing those mostly mechanical changes, I stumbled on
> two separate issues that probably want fixing regardless of the fate
> of 3/3.
> 
> Mostly just compile-tested, and now also checked that at least the
> sandbox test runs succesfully, and that it builds both with and
> without CONFIG_CYCLIC.

So I managed to trigger an azure test by pushing to github and creating
a dummy PR: https://github.com/u-boot/u-boot/pull/542

That fails, and while it involves the cyclic framework, I'm pretty sure
these patches are not to blame, since the same error also exists in
other pipelines. It's an "expect" failure, because some watchdog
callback apparently sometimes takes more than 1ms, so the default 1000us
threshold is exceeded, and that prints a warning which breaks the "expect".

An example is

https://dev.azure.com/u-boot/u-boot/_build/results?buildId=8459&view=logs&j=a1270dec-081b-5c65-5cd5-5e915a842596&t=69f6cf72-86f3-551a-807d-f28f62a1426f&l=1055
.

I don't know why it only/mostly seems to happen in clang builds, but I
think the fact that these happen quite frequently warrants either
bumping the threshold used in the CI builds quite a lot, or adding a
config option to suppress that warning/limit altogether for CI builds.

Rasmus
Stefan Roese May 21, 2024, 11:54 a.m. UTC | #2
On 5/21/24 11:47, Rasmus Villemoes wrote:
> On 21/05/2024 10.46, Rasmus Villemoes wrote:
>> A bit of a mixed bag. I've been wanting to submit something like 3/3
>> for a while. So when I stumbled on Marek's patch
>> https://lore.kernel.org/u-boot/20240316201416.211480-1-marek.vasut+renesas@mailbox.org/
>> , I got reminded of that plan, and I think that patch could be more
>> readable if we adopt this model.
>>
>> While actually doing those mostly mechanical changes, I stumbled on
>> two separate issues that probably want fixing regardless of the fate
>> of 3/3.
>>
>> Mostly just compile-tested, and now also checked that at least the
>> sandbox test runs succesfully, and that it builds both with and
>> without CONFIG_CYCLIC.
> 
> So I managed to trigger an azure test by pushing to github and creating
> a dummy PR: https://github.com/u-boot/u-boot/pull/542
> 
> That fails, and while it involves the cyclic framework, I'm pretty sure
> these patches are not to blame, since the same error also exists in
> other pipelines. It's an "expect" failure, because some watchdog
> callback apparently sometimes takes more than 1ms, so the default 1000us
> threshold is exceeded, and that prints a warning which breaks the "expect".
> 
> An example is
> 
> https://dev.azure.com/u-boot/u-boot/_build/results?buildId=8459&view=logs&j=a1270dec-081b-5c65-5cd5-5e915a842596&t=69f6cf72-86f3-551a-807d-f28f62a1426f&l=1055
> .
> 
> I don't know why it only/mostly seems to happen in clang builds, but I
> think the fact that these happen quite frequently warrants either
> bumping the threshold used in the CI builds quite a lot, or adding a
> config option to suppress that warning/limit altogether for CI builds.

I've also seen CI build issues from time to time and restarting the
build magically solved these issues. I'm all for making this CI
build more stable, perhaps Tom has some ideas?

Regarding this cyclic patch:

Still some problems, MIPS64 related at least, octeon_nic23 target:

https://dev.azure.com/sr0718/0cded7c3-6e6a-4b57-8d0f-65c99496c42f/_apis/build/builds/357/logs/415

Thanks,
Stefan
Rasmus Villemoes May 21, 2024, 12:45 p.m. UTC | #3
On 21/05/2024 13.54, Stefan Roese wrote:
> On 5/21/24 11:47, Rasmus Villemoes wrote:
>> On 21/05/2024 10.46, Rasmus Villemoes wrote:
>>> A bit of a mixed bag. I've been wanting to submit something like 3/3
>>> for a while. So when I stumbled on Marek's patch
>>> https://lore.kernel.org/u-boot/20240316201416.211480-1-marek.vasut+renesas@mailbox.org/
>>> , I got reminded of that plan, and I think that patch could be more
>>> readable if we adopt this model.
>>>
>>> While actually doing those mostly mechanical changes, I stumbled on
>>> two separate issues that probably want fixing regardless of the fate
>>> of 3/3.
>>>
>>> Mostly just compile-tested, and now also checked that at least the
>>> sandbox test runs succesfully, and that it builds both with and
>>> without CONFIG_CYCLIC.
>>
>> So I managed to trigger an azure test by pushing to github and creating
>> a dummy PR: https://github.com/u-boot/u-boot/pull/542
>>
>> That fails, and while it involves the cyclic framework, I'm pretty sure
>> these patches are not to blame, since the same error also exists in
>> other pipelines. It's an "expect" failure, because some watchdog
>> callback apparently sometimes takes more than 1ms, so the default 1000us
>> threshold is exceeded, and that prints a warning which breaks the
>> "expect".
>>
>> An example is
>>
>> https://dev.azure.com/u-boot/u-boot/_build/results?buildId=8459&view=logs&j=a1270dec-081b-5c65-5cd5-5e915a842596&t=69f6cf72-86f3-551a-807d-f28f62a1426f&l=1055
>> .
>>
>> I don't know why it only/mostly seems to happen in clang builds, but I
>> think the fact that these happen quite frequently warrants either
>> bumping the threshold used in the CI builds quite a lot, or adding a
>> config option to suppress that warning/limit altogether for CI builds.
> 
> I've also seen CI build issues from time to time and restarting the
> build magically solved these issues. I'm all for making this CI
> build more stable, perhaps Tom has some ideas?

Well, the problem seems to be inherent in the warning from the cyclic
framework; maybe more so when the build server is overloaded (as
sometimes those callbacks are reported to have taken 5+ ms). So when
running sandbox, or under qemu, I think that warning should be disabled.

> Regarding this cyclic patch:
> 
> Still some problems, MIPS64 related at least, octeon_nic23 target:
> 
> https://dev.azure.com/sr0718/0cded7c3-6e6a-4b57-8d0f-65c99496c42f/_apis/build/builds/357/logs/415

Oh my, this is starting to be really embarrassing. The fix is trivial
(as the callback doesn't even use any context):

-static void octeon_board_restore_pf(void *ctx)
+static void octeon_board_restore_pf(struct cyclic_info *c)

Should I resend yet again?

Rasmus
Stefan Roese May 21, 2024, 12:48 p.m. UTC | #4
On 5/21/24 14:45, Rasmus Villemoes wrote:
> On 21/05/2024 13.54, Stefan Roese wrote:
>> On 5/21/24 11:47, Rasmus Villemoes wrote:
>>> On 21/05/2024 10.46, Rasmus Villemoes wrote:
>>>> A bit of a mixed bag. I've been wanting to submit something like 3/3
>>>> for a while. So when I stumbled on Marek's patch
>>>> https://lore.kernel.org/u-boot/20240316201416.211480-1-marek.vasut+renesas@mailbox.org/
>>>> , I got reminded of that plan, and I think that patch could be more
>>>> readable if we adopt this model.
>>>>
>>>> While actually doing those mostly mechanical changes, I stumbled on
>>>> two separate issues that probably want fixing regardless of the fate
>>>> of 3/3.
>>>>
>>>> Mostly just compile-tested, and now also checked that at least the
>>>> sandbox test runs succesfully, and that it builds both with and
>>>> without CONFIG_CYCLIC.
>>>
>>> So I managed to trigger an azure test by pushing to github and creating
>>> a dummy PR: https://github.com/u-boot/u-boot/pull/542
>>>
>>> That fails, and while it involves the cyclic framework, I'm pretty sure
>>> these patches are not to blame, since the same error also exists in
>>> other pipelines. It's an "expect" failure, because some watchdog
>>> callback apparently sometimes takes more than 1ms, so the default 1000us
>>> threshold is exceeded, and that prints a warning which breaks the
>>> "expect".
>>>
>>> An example is
>>>
>>> https://dev.azure.com/u-boot/u-boot/_build/results?buildId=8459&view=logs&j=a1270dec-081b-5c65-5cd5-5e915a842596&t=69f6cf72-86f3-551a-807d-f28f62a1426f&l=1055
>>> .
>>>
>>> I don't know why it only/mostly seems to happen in clang builds, but I
>>> think the fact that these happen quite frequently warrants either
>>> bumping the threshold used in the CI builds quite a lot, or adding a
>>> config option to suppress that warning/limit altogether for CI builds.
>>
>> I've also seen CI build issues from time to time and restarting the
>> build magically solved these issues. I'm all for making this CI
>> build more stable, perhaps Tom has some ideas?
> 
> Well, the problem seems to be inherent in the warning from the cyclic
> framework; maybe more so when the build server is overloaded (as
> sometimes those callbacks are reported to have taken 5+ ms). So when
> running sandbox, or under qemu, I think that warning should be disabled.

Agreed.

>> Regarding this cyclic patch:
>>
>> Still some problems, MIPS64 related at least, octeon_nic23 target:
>>
>> https://dev.azure.com/sr0718/0cded7c3-6e6a-4b57-8d0f-65c99496c42f/_apis/build/builds/357/logs/415
> 
> Oh my, this is starting to be really embarrassing. The fix is trivial
> (as the callback doesn't even use any context):
> 
> -static void octeon_board_restore_pf(void *ctx)
> +static void octeon_board_restore_pf(struct cyclic_info *c)
> 
> Should I resend yet again?

No need. I'll apply this and squash it here. If CI build works, I'll
send the pull request. Otherwise this needs to wait for the next
merge window I'm afraid.

Thanks,
Stefan
Tom Rini May 21, 2024, 4:07 p.m. UTC | #5
On Tue, May 21, 2024 at 01:54:58PM +0200, Stefan Roese wrote:
> On 5/21/24 11:47, Rasmus Villemoes wrote:
> > On 21/05/2024 10.46, Rasmus Villemoes wrote:
> > > A bit of a mixed bag. I've been wanting to submit something like 3/3
> > > for a while. So when I stumbled on Marek's patch
> > > https://lore.kernel.org/u-boot/20240316201416.211480-1-marek.vasut+renesas@mailbox.org/
> > > , I got reminded of that plan, and I think that patch could be more
> > > readable if we adopt this model.
> > > 
> > > While actually doing those mostly mechanical changes, I stumbled on
> > > two separate issues that probably want fixing regardless of the fate
> > > of 3/3.
> > > 
> > > Mostly just compile-tested, and now also checked that at least the
> > > sandbox test runs succesfully, and that it builds both with and
> > > without CONFIG_CYCLIC.
> > 
> > So I managed to trigger an azure test by pushing to github and creating
> > a dummy PR: https://github.com/u-boot/u-boot/pull/542
> > 
> > That fails, and while it involves the cyclic framework, I'm pretty sure
> > these patches are not to blame, since the same error also exists in
> > other pipelines. It's an "expect" failure, because some watchdog
> > callback apparently sometimes takes more than 1ms, so the default 1000us
> > threshold is exceeded, and that prints a warning which breaks the "expect".
> > 
> > An example is
> > 
> > https://dev.azure.com/u-boot/u-boot/_build/results?buildId=8459&view=logs&j=a1270dec-081b-5c65-5cd5-5e915a842596&t=69f6cf72-86f3-551a-807d-f28f62a1426f&l=1055
> > .
> > 
> > I don't know why it only/mostly seems to happen in clang builds, but I
> > think the fact that these happen quite frequently warrants either
> > bumping the threshold used in the CI builds quite a lot, or adding a
> > config option to suppress that warning/limit altogether for CI builds.
> 
> I've also seen CI build issues from time to time and restarting the
> build magically solved these issues. I'm all for making this CI
> build more stable, perhaps Tom has some ideas?

In Azure (and not GitLab) we seem to have some tests that fail when (I
speculate..) we get put on busy or slow resources for the job. When I
last looked, retryCountOnTaskFailure was limited to 2 at most. I would
be open to other suggestions / patches (such as yes, increasing the
threshold in a test) to make the pipeline more reliable.
Stefan Roese June 16, 2024, 10:18 a.m. UTC | #6
On 5/21/24 10:46, Rasmus Villemoes wrote:
> A bit of a mixed bag. I've been wanting to submit something like 3/3
> for a while. So when I stumbled on Marek's patch
> https://lore.kernel.org/u-boot/20240316201416.211480-1-marek.vasut+renesas@mailbox.org/
> , I got reminded of that plan, and I think that patch could be more
> readable if we adopt this model.
> 
> While actually doing those mostly mechanical changes, I stumbled on
> two separate issues that probably want fixing regardless of the fate
> of 3/3.
> 
> Mostly just compile-tested, and now also checked that at least the
> sandbox test runs succesfully, and that it builds both with and
> without CONFIG_CYCLIC.
> 
> v4: Make sure there's only one definition of struct cyclic_info, previous
> versions failed to move the full definition under #ifdef CONFIG_CYCLIC,
> breaking builds with !CONFIG_CYCLIC.
> 
> v3: Also update the unit test according to the new API.
> 
> v2: Add R-bs from Stefan. Fixup whitespace in the doc/ part. Rebase
> to current master (676903c1b97), fixing trivial conflict with
> 301bac6047c8.
> 
> Rasmus Villemoes (3):
>    cyclic: stop strdup'ing name in cyclic_register()
>    wdt-uclass: prevent multiple cyclic_register calls
>    cyclic: make clients embed a struct cyclic_info in their own data
>      structure

Applied to u-boot-watchdog/next

Thanks,
Stefan

>   board/Marvell/octeon_nic23/board.c |  9 ++++---
>   cmd/cyclic.c                       | 12 ++++------
>   common/cyclic.c                    | 24 +++++--------------
>   doc/develop/cyclic.rst             | 26 ++++++++++++--------
>   drivers/watchdog/wdt-uclass.c      | 38 ++++++++++++++++--------------
>   include/cyclic.h                   | 37 +++++++++++++++--------------
>   test/common/cyclic.c               | 19 +++++++++------
>   7 files changed, 84 insertions(+), 81 deletions(-)
> 

Viele Grüße,
Stefan Roese