diff mbox series

[nft,v3,2/6] tests/shell: check and generate JSON dump files

Message ID 20231114160903.409552-1-thaller@redhat.com
State Accepted, archived
Delegated to: Florian Westphal
Headers show
Series add and check dump files for JSON in tests/shell | expand

Commit Message

Thomas Haller Nov. 14, 2023, 4:08 p.m. UTC
The rules after a successful test are good opportunity to test
`nft -j list ruleset` and `nft -j --check`. This quite possibly touches
code paths that are not hit by other tests yet.

The only downside is the increase of the test runtime (which seems
negligible, given the benefits of increasing test coverage).

Future commits will generate and commit those ".json-nft" dump files.

"DUMPGEN=y" will, like before, regenerate only the existing
"*.{nodump,nft,json-nft}" files (unless a test has none of the 3 files,
in which case they are all generated and the user is suggested to commit
the correct ones). Now also "DUMPGEN=all" is honored, that will generate
all 3 files, regardless of whether they already existed. That is useful
if you start out with a test that only has a .nft file, and then you
want to generate a .json-nft file too.

Signed-off-by: Thomas Haller <thaller@redhat.com>
---
 tests/shell/helpers/json-sanitize-ruleset.sh |  23 +++
 tests/shell/helpers/test-wrapper.sh          | 143 ++++++++++++++-----
 tests/shell/run-tests.sh                     |  11 +-
 3 files changed, 138 insertions(+), 39 deletions(-)
 create mode 100755 tests/shell/helpers/json-sanitize-ruleset.sh

Comments

Florian Westphal Nov. 15, 2023, 8:24 a.m. UTC | #1
Thomas Haller <thaller@redhat.com> wrote:
> The rules after a successful test are good opportunity to test
> `nft -j list ruleset` and `nft -j --check`. This quite possibly touches
> code paths that are not hit by other tests yet.

This series looks good to me, I'll apply it in the next few hours if
noone else takes any action by then.
Pablo Neira Ayuso Nov. 15, 2023, 9:54 a.m. UTC | #2
On Wed, Nov 15, 2023 at 09:24:27AM +0100, Florian Westphal wrote:
> Thomas Haller <thaller@redhat.com> wrote:
> > The rules after a successful test are good opportunity to test
> > `nft -j list ruleset` and `nft -j --check`. This quite possibly touches
> > code paths that are not hit by other tests yet.
> 
> This series looks good to me, I'll apply it in the next few hours if
> noone else takes any action by then.

Just a question, patch 3 is missing in patchwork. I guess it is too
big.

My understanding is that this performs the json tests if nft comes with
json support.

I wanted to give this a run, description says a few tests are failing.
Last time we talked it is chain binding support, then there is a good
number of tests that are going to fail (or there is a mechanism to
temporarily disable json tests for this without losing coverage?).

What is the current output from tests? I wanted to make this run
myself so I don't need to ask.

I am asking all this because I am finishing backports for older stable
kernels while this is also going on.
Florian Westphal Nov. 15, 2023, 10:01 a.m. UTC | #3
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Wed, Nov 15, 2023 at 09:24:27AM +0100, Florian Westphal wrote:
> > Thomas Haller <thaller@redhat.com> wrote:
> > > The rules after a successful test are good opportunity to test
> > > `nft -j list ruleset` and `nft -j --check`. This quite possibly touches
> > > code paths that are not hit by other tests yet.
> > 
> > This series looks good to me, I'll apply it in the next few hours if
> > noone else takes any action by then.
> 
> Just a question, patch 3 is missing in patchwork. I guess it is too
> big.
> 
> My understanding is that this performs the json tests if nft comes with
> json support.
> 
> I wanted to give this a run, description says a few tests are failing.

... but it says that no dump files are added for the failing test cases.

I'll double check this of course before pushing this out.
Pablo Neira Ayuso Nov. 15, 2023, 10:05 a.m. UTC | #4
On Wed, Nov 15, 2023 at 11:01:01AM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Wed, Nov 15, 2023 at 09:24:27AM +0100, Florian Westphal wrote:
> > > Thomas Haller <thaller@redhat.com> wrote:
> > > > The rules after a successful test are good opportunity to test
> > > > `nft -j list ruleset` and `nft -j --check`. This quite possibly touches
> > > > code paths that are not hit by other tests yet.
> > > 
> > > This series looks good to me, I'll apply it in the next few hours if
> > > noone else takes any action by then.
> > 
> > Just a question, patch 3 is missing in patchwork. I guess it is too
> > big.
> > 
> > My understanding is that this performs the json tests if nft comes with
> > json support.
> > 
> > I wanted to give this a run, description says a few tests are failing.
> 
> ... but it says that no dump files are added for the failing test cases.

OK. Then json it is skipped in that case, that is fine.

> I'll double check this of course before pushing this out.

Then, please also disable json dump by now for:

        sets/sets_with_ifnames

because I am currently figuring out how to detach pipapo support from
it without losing coverage.
Florian Westphal Nov. 15, 2023, 10:10 a.m. UTC | #5
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> because I am currently figuring out how to detach pipapo support from
> it without losing coverage.

Is it worth the pain?  I'd probably lean towards skipping the test
entirely, splitting it in two is possible but we'd end up with quite
some duplicated scripting.
Thomas Haller Nov. 15, 2023, 10:11 a.m. UTC | #6
On Wed, 2023-11-15 at 10:54 +0100, Pablo Neira Ayuso wrote:
> On Wed, Nov 15, 2023 at 09:24:27AM +0100, Florian Westphal wrote:
> > Thomas Haller <thaller@redhat.com> wrote:
> > > The rules after a successful test are good opportunity to test
> > > `nft -j list ruleset` and `nft -j --check`. This quite possibly
> > > touches
> > > code paths that are not hit by other tests yet.
> > 
> > This series looks good to me, I'll apply it in the next few hours
> > if
> > noone else takes any action by then.
> 
> Just a question, patch 3 is missing in patchwork. I guess it is too
> big.


Yes, it's not on the list, as it's too large. I CC-ed you and Florian
on patch 2/6.

You can also find it here:
https://gitlab.freedesktop.org/thaller/nftables/-/commit/b0edc64d005510b8c3db8a8ebe496a8296271bf4.patch

> 
> My understanding is that this performs the json tests if nft comes
> with
> json support.

right.

> 
> I wanted to give this a run, description says a few tests are
> failing.

For tests that fail, the patch 2/6 does not add a `.json-nft` file.

If you get any failure, that's wrong. Then the corresponding .json-nft
should be excluded from patch 2/6.

> Last time we talked it is chain binding support, then there is a good
> number of tests that are going to fail (or there is a mechanism to
> temporarily disable json tests for this without losing coverage?).

If chain binding support is missing, then that is detected via the
common mechanism (NFT_TEST_HAVE_chain_binding=n) and the test will be
marked as SKIPPED.

SKIPPED tests don't get their .nft dump checked. The same for .json-nft
files.

(of course, it also honors NFT_TEST_HAVE_json=n to skip the check).

> 
> What is the current output from tests? I wanted to make this run
> myself so I don't need to ask.

it's the same as with .nft files.

If the .nft/.json-nft dump does not match, the test fails with [DUMP
FAIL]. As always, you can find the result data in /tmp.

I find it most useful to run

  grep --color=always ^ -a -R /tmp/nft-test.latest.*/ | less -R

but whatever works for you.


Also, the test wrapper will call `nft --check -f x.nft` and `nft -j --
check -f x.json-nft` after tests. When those fail, you'll see [CHK
DUMP].

You can see that with:

  DUMPGEN=all ./tests/shell/run-tests.sh tests/shell/testcases/chains/0011endless_jump_loop_1



Thomas
Pablo Neira Ayuso Nov. 15, 2023, 10:26 a.m. UTC | #7
On Wed, Nov 15, 2023 at 11:10:20AM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > because I am currently figuring out how to detach pipapo support from
> > it without losing coverage.
> 
> Is it worth the pain?  I'd probably lean towards skipping the test
> entirely, splitting it in two is possible but we'd end up with quite
> some duplicated scripting.

I can skip it by now, that is easy, I was just trying not to reduce
coverage.
Florian Westphal Nov. 15, 2023, 10:31 a.m. UTC | #8
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> I can skip it by now, that is easy, I was just trying not to reduce
> coverage.

Sure, I understand that, but OTOH I think there are limitations
as to what we should provide for, in this case the work/benefit ratio
is quite bad...
Pablo Neira Ayuso Nov. 15, 2023, 10:35 a.m. UTC | #9
On Wed, Nov 15, 2023 at 11:31:12AM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > I can skip it by now, that is easy, I was just trying not to reduce
> > coverage.
> 
> Sure, I understand that, but OTOH I think there are limitations
> as to what we should provide for, in this case the work/benefit ratio
> is quite bad...

It can be done later, that is what the commit description said.
Pablo Neira Ayuso Nov. 15, 2023, 10:43 a.m. UTC | #10
On Wed, Nov 15, 2023 at 11:01:01AM +0100, Florian Westphal wrote:
[...]
> I'll double check this of course before pushing this out.

OK, then all has been clarified and this can follow its route to git.

I will follow up with my pending patches for tests/shell and older
kernels, with my last patch 5/4 it is done for kernel 5.4.
Florian Westphal Nov. 15, 2023, 12:21 p.m. UTC | #11
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Wed, Nov 15, 2023 at 11:01:01AM +0100, Florian Westphal wrote:
> [...]
> > I'll double check this of course before pushing this out.
> 
> OK, then all has been clarified and this can follow its route to git.
> 
> I will follow up with my pending patches for tests/shell and older
> kernels, with my last patch 5/4 it is done for kernel 5.4.

I"ve pushed the series out.  Note that I modified two dump files
to account for "src: expand create commands", which did modify
two tests in the mean time.
Pablo Neira Ayuso Nov. 15, 2023, 12:30 p.m. UTC | #12
On Wed, Nov 15, 2023 at 01:21:05PM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Wed, Nov 15, 2023 at 11:01:01AM +0100, Florian Westphal wrote:
> > [...]
> > > I'll double check this of course before pushing this out.
> > 
> > OK, then all has been clarified and this can follow its route to git.
> > 
> > I will follow up with my pending patches for tests/shell and older
> > kernels, with my last patch 5/4 it is done for kernel 5.4.
> 
> I"ve pushed the series out.  Note that I modified two dump files
> to account for "src: expand create commands", which did modify
> two tests in the mean time.

I see _lots_ of DUMP FAIL with kernel 5.4

I: [SKIPPED]      1/387 testcases/flowtable/0015destroy_0
W: [DUMP FAIL]    2/387 testcases/optimizations/merge_stmts_concat_vmap
I: [OK]           3/387 testcases/chains/0037policy_variable_1
I: [OK]           4/387 testcases/transactions/bad_expression
W: [DUMP FAIL]    5/387 testcases/sets/collapse_elem_0
I: [OK]           6/387 testcases/include/0009glob_nofile_1
W: [DUMP FAIL]    7/387 testcases/sets/0049set_define_0
W: [DUMP FAIL]    8/387 testcases/rule_management/0005replace_1
W: [DUMP FAIL]    9/387 testcases/flowtable/0003add_after_flush_0
W: [DUMP FAIL]   10/387 testcases/sets/0017add_after_flush_0
I: [OK]          11/387 testcases/chains/0011endless_jump_loop_1
I: [SKIPPED]     12/387 testcases/maps/0014destroy_0
I: [SKIPPED]     13/387 testcases/sets/0060set_multistmt_0
W: [DUMP FAIL]   14/387 testcases/cache/0004_cache_update_0
I: [OK]          15/387 testcases/transactions/0036set_1
W: [DUMP FAIL]   16/387 testcases/nft-f/0002rollback_rule_0
W: [DUMP FAIL]   17/387 testcases/chains/0022prio_dummy_1
W: [DUMP FAIL]   18/387 testcases/nft-f/0032pknock_0
I: [SKIPPED]     19/387 testcases/maps/0017_map_variable_0
I: [OK]          20/387 testcases/optionals/comments_objects_dup_0
I: [OK]          21/387 testcases/transactions/0014chain_1
W: [DUMP FAIL]   22/387 testcases/chains/0026prio_netdev_1
W: [DUMP FAIL]   23/387 testcases/optionals/handles_1
W: [DUMP FAIL]   24/387 testcases/include/0007glob_double_0
W: [DUMP FAIL]   25/387 testcases/transactions/0048helpers_0
W: [DUMP FAIL]   26/387 testcases/listing/0021ruleset_json_terse_0
W: [DUMP FAIL]   27/387 testcases/listing/0022terse_0
W: [DUMP FAIL]   28/387 testcases/chains/0017masquerade_jump_1
W: [DUMP FAIL]   29/387 testcases/sets/0016element_leak_0
...
Thomas Haller Nov. 15, 2023, 12:36 p.m. UTC | #13
On Wed, 2023-11-15 at 13:30 +0100, Pablo Neira Ayuso wrote:
> On Wed, Nov 15, 2023 at 01:21:05PM +0100, Florian Westphal wrote:
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > On Wed, Nov 15, 2023 at 11:01:01AM +0100, Florian Westphal wrote:
> > > [...]
> > > > I'll double check this of course before pushing this out.
> > > 
> > > OK, then all has been clarified and this can follow its route to
> > > git.
> > > 
> > > I will follow up with my pending patches for tests/shell and
> > > older
> > > kernels, with my last patch 5/4 it is done for kernel 5.4.
> > 
> > I"ve pushed the series out.  Note that I modified two dump files
> > to account for "src: expand create commands", which did modify
> > two tests in the mean time.
> 
> I see _lots_ of DUMP FAIL with kernel 5.4

Hi,

Could you provide more details?

For example,

    make -j && ./tests/shell/run-tests.sh tests/shell/testcases/include/0007glob_double_0 -x
    grep ^ -a -R /tmp/nft-test.latest.*/


Thomas
Pablo Neira Ayuso Nov. 16, 2023, 4:10 p.m. UTC | #14
Hi Thomas,

On Wed, Nov 15, 2023 at 01:36:40PM +0100, Thomas Haller wrote:
> On Wed, 2023-11-15 at 13:30 +0100, Pablo Neira Ayuso wrote:
[...]
> > I see _lots_ of DUMP FAIL with kernel 5.4
> 
> Hi,
> 
> Could you provide more details?
> 
> For example,
> 
>     make -j && ./tests/shell/run-tests.sh tests/shell/testcases/include/0007glob_double_0 -x
>     grep ^ -a -R /tmp/nft-test.latest.*/

# cat [...]/ruleset-diff.json
--- testcases/include/dumps/0007glob_double_0.json-nft  2023-11-15 13:27:20.272084254 +0100
+++ /tmp/nft-test.20231116-170617.584.lrZzMy/test-testcases-include-0007glob_double_0.1/ruleset-after.json      2023-11-16 17:06:18.332535411 +0100
@@ -1 +1 @@
-{"nftables": [{"metainfo": {"version": "VERSION", "release_name": "RELEASE_NAME", "json_schema_version": 1}}, {"table": {"family": "ip", "name": "x", "handle": 1}}, {"table": {"family": "ip", "name": "y", "handle": 2}}]}
+{"nftables": [{"metainfo": {"version": "VERSION", "release_name": "RELEASE_NAME", "json_schema_version": 1}}, {"table": {"family": "ip", "name": "x", "handle": 158}}, {"table": {"family": "ip", "name": "y", "handle": 159}}]}

It seems that handles are a problem in this diff.
Thomas Haller Nov. 16, 2023, 4:49 p.m. UTC | #15
On Thu, 2023-11-16 at 17:10 +0100, Pablo Neira Ayuso wrote:
> It seems that handles are a problem in this diff.


Hi,


It's a bit surprising that this happens. Do you use unshare (the
default), to create a separate netns for each test? For me, those
handles then grow incrementally from 1. 


A possible workaround would be the following (and a `DUMPGEN=y` run)



diff --git c/tests/shell/helpers/json-sanitize-ruleset.sh w/tests/shell/helpers/json-sanitize-ruleset.sh
index 270a6107e0aa..22a57d72110e 100755
--- c/tests/shell/helpers/json-sanitize-ruleset.sh
+++ w/tests/shell/helpers/json-sanitize-ruleset.sh
@@ -6,7 +6,12 @@ die() {
 }
 
 do_sed() {
-	sed '1s/\({"nftables": \[{"metainfo": {"version": "\)[0-9.]\+\(", "release_name": "\)[^"]\+\(", "\)/\1VERSION\2RELEASE_NAME\3/' "$@"
+	# Also normalize all "handle". Optimally, those handles would be stable
+	# and reproducible. However, they are not.
+	sed \
+		-e '1s/\({"nftables": \[{"metainfo": {"version": "\)[0-9.]\+\(", "release_name": "\)[^"]\+\(", "\)/\1VERSION\2RELEASE_NAME\3/' \
+		-e '1s/"handle": [0-9]\+\>/"handle": 1/g' \
+		"$@"
 }
 
 if [ "$#" = 0 ] ; then
Thomas Haller Nov. 16, 2023, 4:55 p.m. UTC | #16
On Thu, 2023-11-16 at 17:49 +0100, Thomas Haller wrote:
> On Thu, 2023-11-16 at 17:10 +0100, Pablo Neira Ayuso wrote:
> > It seems that handles are a problem in this diff.
> 
> A possible workaround would be the following (and a `DUMPGEN=y` run)

actually, I think first should be understood why the handles are not
stable. And whether they are stable on recent kernels (as I think they
are supposed to be -- provided the test runs in a new netns).


Also, it seems that `nft -j list ruleset` has a bug and does not honor
(the lack of) the `--handle` option.


Thomas
Florian Westphal Nov. 16, 2023, 11 p.m. UTC | #17
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi Thomas,
> 
> On Wed, Nov 15, 2023 at 01:36:40PM +0100, Thomas Haller wrote:
> > On Wed, 2023-11-15 at 13:30 +0100, Pablo Neira Ayuso wrote:
> [...]
> > > I see _lots_ of DUMP FAIL with kernel 5.4
> > 
> > Hi,
> > 
> > Could you provide more details?
> > 
> > For example,
> > 
> >     make -j && ./tests/shell/run-tests.sh tests/shell/testcases/include/0007glob_double_0 -x
> >     grep ^ -a -R /tmp/nft-test.latest.*/
> 
> # cat [...]/ruleset-diff.json
> --- testcases/include/dumps/0007glob_double_0.json-nft  2023-11-15 13:27:20.272084254 +0100
> +++ /tmp/nft-test.20231116-170617.584.lrZzMy/test-testcases-include-0007glob_double_0.1/ruleset-after.json      2023-11-16 17:06:18.332535411 +0100
> @@ -1 +1 @@
> -{"nftables": [{"metainfo": {"version": "VERSION", "release_name": "RELEASE_NAME", "json_schema_version": 1}}, {"table": {"family": "ip", "name": "x", "handle": 1}}, {"table": {"family": "ip", "name": "y", "handle": 2}}]}
> +{"nftables": [{"metainfo": {"version": "VERSION", "release_name": "RELEASE_NAME", "json_schema_version": 1}}, {"table": {"family": "ip", "name": "x", "handle": 158}}, {"table": {"family": "ip", "name": "y", "handle": 159}}]}
> 
> It seems that handles are a problem in this diff.

Are you running tests with -s option?

In that case, modules are removed after each test.

I suspect its because we can then hit -EAGAIN mid-transaction
because module is missing (again), then replay logic does its
thing.

But the handle generator isn't transaction aware,
so it has advanced vs. the aborted partial transaction.

I'm not sure what to do here.

One the one hand those rmmods are plain stupid, but on the other
hand this adds partial coverage for the rmmod path.

We could make the handle counter transaction aware to
"fix" this on kernel side; it should not be too much code.

What do you think?
Florian Westphal Nov. 16, 2023, 11:02 p.m. UTC | #18
Florian Westphal <fw@strlen.de> wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Hi Thomas,
> > 
> > On Wed, Nov 15, 2023 at 01:36:40PM +0100, Thomas Haller wrote:
> > > On Wed, 2023-11-15 at 13:30 +0100, Pablo Neira Ayuso wrote:
> > [...]
> > > > I see _lots_ of DUMP FAIL with kernel 5.4
> > > 
> > > Hi,
> > > 
> > > Could you provide more details?
> > > 
> > > For example,
> > > 
> > >     make -j && ./tests/shell/run-tests.sh tests/shell/testcases/include/0007glob_double_0 -x
> > >     grep ^ -a -R /tmp/nft-test.latest.*/
> > 
> > # cat [...]/ruleset-diff.json
> > --- testcases/include/dumps/0007glob_double_0.json-nft  2023-11-15 13:27:20.272084254 +0100
> > +++ /tmp/nft-test.20231116-170617.584.lrZzMy/test-testcases-include-0007glob_double_0.1/ruleset-after.json      2023-11-16 17:06:18.332535411 +0100
> > @@ -1 +1 @@
> > -{"nftables": [{"metainfo": {"version": "VERSION", "release_name": "RELEASE_NAME", "json_schema_version": 1}}, {"table": {"family": "ip", "name": "x", "handle": 1}}, {"table": {"family": "ip", "name": "y", "handle": 2}}]}
> > +{"nftables": [{"metainfo": {"version": "VERSION", "release_name": "RELEASE_NAME", "json_schema_version": 1}}, {"table": {"family": "ip", "name": "x", "handle": 158}}, {"table": {"family": "ip", "name": "y", "handle": 159}}]}
> > 
> > It seems that handles are a problem in this diff.
> 
> Are you running tests with -s option?
> 
> In that case, modules are removed after each test.
> 
> I suspect its because we can then hit -EAGAIN mid-transaction
> because module is missing (again), then replay logic does its
> thing.
> 
> But the handle generator isn't transaction aware,
> so it has advanced vs. the aborted partial transaction.
> 
> I'm not sure what to do here.
> 
> One the one hand those rmmods are plain stupid, but on the other
> hand this adds partial coverage for the rmmod path.
> 
> We could make the handle counter transaction aware to
> "fix" this on kernel side; it should not be too much code.
> 
> What do you think?

Oh, wait, on older kernels the entire handle counter is global,
so "unshare -n" has no effect on it.

But the rmmod scenario explained above happens as well, this
"breaks" json dumps on centos stream 9, which does have the
pernet handle generator fix backported.
Pablo Neira Ayuso Nov. 17, 2023, 8:27 a.m. UTC | #19
On Fri, Nov 17, 2023 at 12:00:24AM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Hi Thomas,
> > 
> > On Wed, Nov 15, 2023 at 01:36:40PM +0100, Thomas Haller wrote:
> > > On Wed, 2023-11-15 at 13:30 +0100, Pablo Neira Ayuso wrote:
> > [...]
> > > > I see _lots_ of DUMP FAIL with kernel 5.4
> > > 
> > > Hi,
> > > 
> > > Could you provide more details?
> > > 
> > > For example,
> > > 
> > >     make -j && ./tests/shell/run-tests.sh tests/shell/testcases/include/0007glob_double_0 -x
> > >     grep ^ -a -R /tmp/nft-test.latest.*/
> > 
> > # cat [...]/ruleset-diff.json
> > --- testcases/include/dumps/0007glob_double_0.json-nft  2023-11-15 13:27:20.272084254 +0100
> > +++ /tmp/nft-test.20231116-170617.584.lrZzMy/test-testcases-include-0007glob_double_0.1/ruleset-after.json      2023-11-16 17:06:18.332535411 +0100
> > @@ -1 +1 @@
> > -{"nftables": [{"metainfo": {"version": "VERSION", "release_name": "RELEASE_NAME", "json_schema_version": 1}}, {"table": {"family": "ip", "name": "x", "handle": 1}}, {"table": {"family": "ip", "name": "y", "handle": 2}}]}
> > +{"nftables": [{"metainfo": {"version": "VERSION", "release_name": "RELEASE_NAME", "json_schema_version": 1}}, {"table": {"family": "ip", "name": "x", "handle": 158}}, {"table": {"family": "ip", "name": "y", "handle": 159}}]}
> > 
> > It seems that handles are a problem in this diff.
> 
> Are you running tests with -s option?

This is plain run with no options.

> In that case, modules are removed after each test.
> 
> I suspect its because we can then hit -EAGAIN mid-transaction
> because module is missing (again), then replay logic does its
> thing.
> 
> But the handle generator isn't transaction aware,
> so it has advanced vs. the aborted partial transaction.
> 
> I'm not sure what to do here.
> 
> One the one hand those rmmods are plain stupid, but on the other
> hand this adds partial coverage for the rmmod path.
> 
> We could make the handle counter transaction aware to
> "fix" this on kernel side; it should not be too much code.
> 
> What do you think?

I don't think this needs a kernel fix.

The kernel is free to allocate handle, the guarantee is that they are
unique. How this handles are allocated could change in the future.
There is no way userspace can forecast how handles are allocated.

Phil made some code to skip comparing handles in tests/py that I
remember to deal with this.
Thomas Haller Nov. 17, 2023, 4:16 p.m. UTC | #20
On Fri, 2023-11-17 at 00:00 +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Hi Thomas,
> > 
> > On Wed, Nov 15, 2023 at 01:36:40PM +0100, Thomas Haller wrote:
> > > On Wed, 2023-11-15 at 13:30 +0100, Pablo Neira Ayuso wrote:
> > [...]
> > > > I see _lots_ of DUMP FAIL with kernel 5.4
> > > 
> > > Hi,
> > > 
> > > Could you provide more details?
> > > 
> > > For example,
> > > 
> > >     make -j && ./tests/shell/run-tests.sh
> > > tests/shell/testcases/include/0007glob_double_0 -x
> > >     grep ^ -a -R /tmp/nft-test.latest.*/
> > 
> > # cat [...]/ruleset-diff.json
> > --- testcases/include/dumps/0007glob_double_0.json-nft  2023-11-15
> > 13:27:20.272084254 +0100
> > +++ /tmp/nft-test.20231116-170617.584.lrZzMy/test-testcases-
> > include-0007glob_double_0.1/ruleset-after.json      2023-11-16
> > 17:06:18.332535411 +0100
> > @@ -1 +1 @@
> > -{"nftables": [{"metainfo": {"version": "VERSION", "release_name":
> > "RELEASE_NAME", "json_schema_version": 1}}, {"table": {"family":
> > "ip", "name": "x", "handle": 1}}, {"table": {"family": "ip",
> > "name": "y", "handle": 2}}]}
> > +{"nftables": [{"metainfo": {"version": "VERSION", "release_name":
> > "RELEASE_NAME", "json_schema_version": 1}}, {"table": {"family":
> > "ip", "name": "x", "handle": 158}}, {"table": {"family": "ip",
> > "name": "y", "handle": 159}}]}
> > 
> > It seems that handles are a problem in this diff.
> 
> Are you running tests with -s option?
> 
> In that case, modules are removed after each test.
> 
> I suspect its because we can then hit -EAGAIN mid-transaction
> because module is missing (again), then replay logic does its
> thing.
> 
> But the handle generator isn't transaction aware,
> so it has advanced vs. the aborted partial transaction.

> I'm not sure what to do here.

a combination of:

a) make an effort, that kernel behavior is consistent and reproducible.
Stable output seems important to me, and the automatic loading of a
kernel module should not make a difference. This is IMO a bug.

b) let `nft -j list ruleset` honor (the lack of) `--handle` option and
not print those handles. That bugfix would change behavior, so maybe
instead add a "--no-handle" option for `nft -j` dumps. 

c) sanitize the output with the sed command (my other mail).



This also means, that the .json-nft dumps won't work, if you run
without `unshare`. IMO, the mode without unshare should not be
supported anymore. But if it's deemed important, then it requires b) or
c) or detect the case and skip the diffs with .json-nft.


Thomas
Pablo Neira Ayuso Nov. 17, 2023, 4:36 p.m. UTC | #21
On Fri, Nov 17, 2023 at 05:16:02PM +0100, Thomas Haller wrote:
> On Fri, 2023-11-17 at 00:00 +0100, Florian Westphal wrote:
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > Hi Thomas,
> > > 
> > > On Wed, Nov 15, 2023 at 01:36:40PM +0100, Thomas Haller wrote:
> > > > On Wed, 2023-11-15 at 13:30 +0100, Pablo Neira Ayuso wrote:
> > > [...]
> > > > > I see _lots_ of DUMP FAIL with kernel 5.4
> > > > 
> > > > Hi,
> > > > 
> > > > Could you provide more details?
> > > > 
> > > > For example,
> > > > 
> > > >     make -j && ./tests/shell/run-tests.sh
> > > > tests/shell/testcases/include/0007glob_double_0 -x
> > > >     grep ^ -a -R /tmp/nft-test.latest.*/
> > > 
> > > # cat [...]/ruleset-diff.json
> > > --- testcases/include/dumps/0007glob_double_0.json-nft  2023-11-15
> > > 13:27:20.272084254 +0100
> > > +++ /tmp/nft-test.20231116-170617.584.lrZzMy/test-testcases-
> > > include-0007glob_double_0.1/ruleset-after.json      2023-11-16
> > > 17:06:18.332535411 +0100
> > > @@ -1 +1 @@
> > > -{"nftables": [{"metainfo": {"version": "VERSION", "release_name":
> > > "RELEASE_NAME", "json_schema_version": 1}}, {"table": {"family":
> > > "ip", "name": "x", "handle": 1}}, {"table": {"family": "ip",
> > > "name": "y", "handle": 2}}]}
> > > +{"nftables": [{"metainfo": {"version": "VERSION", "release_name":
> > > "RELEASE_NAME", "json_schema_version": 1}}, {"table": {"family":
> > > "ip", "name": "x", "handle": 158}}, {"table": {"family": "ip",
> > > "name": "y", "handle": 159}}]}
> > > 
> > > It seems that handles are a problem in this diff.
> > 
> > Are you running tests with -s option?
> > 
> > In that case, modules are removed after each test.
> > 
> > I suspect its because we can then hit -EAGAIN mid-transaction
> > because module is missing (again), then replay logic does its
> > thing.
> > 
> > But the handle generator isn't transaction aware,
> > so it has advanced vs. the aborted partial transaction.
> 
> > I'm not sure what to do here.
> 
> a combination of:
> 
> a) make an effort, that kernel behavior is consistent and reproducible.
> Stable output seems important to me, and the automatic loading of a
> kernel module should not make a difference. This is IMO a bug.

This is not a bug in the kernel. The kernel guarantees that the handle
is unique, but the handle allocation strategy is up to the kernel.
Userspace cannot forecast what handle will get, such thing might lead
to easy to break assumptions from userspace.

> b) let `nft -j list ruleset` honor (the lack of) `--handle` option and
> not print those handles. That bugfix would change behavior, so maybe
> instead add a "--no-handle" option for `nft -j` dumps.

Will honoring -a/--handle break firewalld? I think it is the main user
of the JSON API. That might help disentangle if this makes sense or
not and what the chances of breaking third party applications are.

I'd prefer not to see a --no-handle that will only work for JSON and
that is only useful for this test infrastructure (noone else asked for
this).

> c) sanitize the output with the sed command (my other mail).
>
> This also means, that the .json-nft dumps won't work, if you run
> without `unshare`. IMO, the mode without unshare should not be
> supported anymore. But if it's deemed important, then it requires b) or
> c) or detect the case and skip the diffs with .json-nft.

a) is no-go (kernel update to make test infrastructure or to allow
userspace application to make fragile assumptions on how handles are
allocated is not correct).

b) needs to evaluated, you maintain firewalld, let us know.
Thomas Haller Nov. 17, 2023, 4:56 p.m. UTC | #22
> 
> 
> Will honoring -a/--handle break firewalld?

firewalld doesn't use the nft command line, only libnftables (via
py/src/nftables.py).

However, on the libnftables API the same problem happens. Namely, that
the nft output by default does not show handles, and you have to opt-in
via NFT_CTX_OUTPUT_HANDLE. On the other hand, the JSON output always
outputs handles. Starting to honor a lack of NFT_CTX_OUTPUT_HANDLE with
JSON output is an obvious change in behavior (well, or rather a
bugfix).


The good new is, that firewalld wouldn't care about that either,
because since forever it calls Nftables.set_handle_output(True) and
always sets NFT_CTX_OUTPUT_HANDLE.



Thomas
Phil Sutter Nov. 17, 2023, 4:57 p.m. UTC | #23
On Fri, Nov 17, 2023 at 05:36:23PM +0100, Pablo Neira Ayuso wrote:
> On Fri, Nov 17, 2023 at 05:16:02PM +0100, Thomas Haller wrote:
> > On Fri, 2023-11-17 at 00:00 +0100, Florian Westphal wrote:
> > > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > > Hi Thomas,
> > > > 
> > > > On Wed, Nov 15, 2023 at 01:36:40PM +0100, Thomas Haller wrote:
> > > > > On Wed, 2023-11-15 at 13:30 +0100, Pablo Neira Ayuso wrote:
> > > > [...]
> > > > > > I see _lots_ of DUMP FAIL with kernel 5.4
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > Could you provide more details?
> > > > > 
> > > > > For example,
> > > > > 
> > > > >     make -j && ./tests/shell/run-tests.sh
> > > > > tests/shell/testcases/include/0007glob_double_0 -x
> > > > >     grep ^ -a -R /tmp/nft-test.latest.*/
> > > > 
> > > > # cat [...]/ruleset-diff.json
> > > > --- testcases/include/dumps/0007glob_double_0.json-nft  2023-11-15
> > > > 13:27:20.272084254 +0100
> > > > +++ /tmp/nft-test.20231116-170617.584.lrZzMy/test-testcases-
> > > > include-0007glob_double_0.1/ruleset-after.json      2023-11-16
> > > > 17:06:18.332535411 +0100
> > > > @@ -1 +1 @@
> > > > -{"nftables": [{"metainfo": {"version": "VERSION", "release_name":
> > > > "RELEASE_NAME", "json_schema_version": 1}}, {"table": {"family":
> > > > "ip", "name": "x", "handle": 1}}, {"table": {"family": "ip",
> > > > "name": "y", "handle": 2}}]}
> > > > +{"nftables": [{"metainfo": {"version": "VERSION", "release_name":
> > > > "RELEASE_NAME", "json_schema_version": 1}}, {"table": {"family":
> > > > "ip", "name": "x", "handle": 158}}, {"table": {"family": "ip",
> > > > "name": "y", "handle": 159}}]}
> > > > 
> > > > It seems that handles are a problem in this diff.
> > > 
> > > Are you running tests with -s option?
> > > 
> > > In that case, modules are removed after each test.
> > > 
> > > I suspect its because we can then hit -EAGAIN mid-transaction
> > > because module is missing (again), then replay logic does its
> > > thing.
> > > 
> > > But the handle generator isn't transaction aware,
> > > so it has advanced vs. the aborted partial transaction.
> > 
> > > I'm not sure what to do here.
> > 
> > a combination of:
> > 
> > a) make an effort, that kernel behavior is consistent and reproducible.
> > Stable output seems important to me, and the automatic loading of a
> > kernel module should not make a difference. This is IMO a bug.
> 
> This is not a bug in the kernel. The kernel guarantees that the handle
> is unique, but the handle allocation strategy is up to the kernel.
> Userspace cannot forecast what handle will get, such thing might lead
> to easy to break assumptions from userspace.
> 
> > b) let `nft -j list ruleset` honor (the lack of) `--handle` option and
> > not print those handles. That bugfix would change behavior, so maybe
> > instead add a "--no-handle" option for `nft -j` dumps.


> 
> Will honoring -a/--handle break firewalld? I think it is the main user
> of the JSON API. That might help disentangle if this makes sense or
> not and what the chances of breaking third party applications are.
> 
> I'd prefer not to see a --no-handle that will only work for JSON and
> that is only useful for this test infrastructure (noone else asked for
> this).
> 
> > c) sanitize the output with the sed command (my other mail).
> >
> > This also means, that the .json-nft dumps won't work, if you run
> > without `unshare`. IMO, the mode without unshare should not be
> > supported anymore. But if it's deemed important, then it requires b) or
> > c) or detect the case and skip the diffs with .json-nft.

What is the problem without unshare? Looking at your patch, it seems
possible to drop the handle attributes in json-sanitize-ruleset.sh.

> a) is no-go (kernel update to make test infrastructure or to allow
> userspace application to make fragile assumptions on how handles are
> allocated is not correct).
> 
> b) needs to evaluated, you maintain firewalld, let us know.

Given the inherent importance of the handle value for ruleset
manipulations, I assume *any* application will need to be updated to
pass --handle (or the libnftables-equivalent) to remain functional.

Cheers, Phil
Thomas Haller Nov. 17, 2023, 5:06 p.m. UTC | #24
On Fri, 2023-11-17 at 17:57 +0100, Phil Sutter wrote:
> On Fri, Nov 17, 2023 at 05:36:23PM +0100, Pablo Neira Ayuso wrote:
> > On Fri, Nov 17, 2023 at 05:16:02PM +0100, Thomas Haller wrote:
> > > On Fri, 2023-11-17 at 00:00 +0100, Florian Westphal wrote:
> > > > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > > > Hi Thomas,
> > > > > 
> > > > > On Wed, Nov 15, 2023 at 01:36:40PM +0100, Thomas Haller
> > > > > wrote:
> > > > > > On Wed, 2023-11-15 at 13:30 +0100, Pablo Neira Ayuso wrote:
> > > > > [...]
> > > > > > > I see _lots_ of DUMP FAIL with kernel 5.4
> > > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > Could you provide more details?
> > > > > > 
> > > > > > For example,
> > > > > > 
> > > > > >     make -j && ./tests/shell/run-tests.sh
> > > > > > tests/shell/testcases/include/0007glob_double_0 -x
> > > > > >     grep ^ -a -R /tmp/nft-test.latest.*/
> > > > > 
> > > > > # cat [...]/ruleset-diff.json
> > > > > --- testcases/include/dumps/0007glob_double_0.json-nft  2023-
> > > > > 11-15
> > > > > 13:27:20.272084254 +0100
> > > > > +++ /tmp/nft-test.20231116-170617.584.lrZzMy/test-testcases-
> > > > > include-0007glob_double_0.1/ruleset-after.json      2023-11-
> > > > > 16
> > > > > 17:06:18.332535411 +0100
> > > > > @@ -1 +1 @@
> > > > > -{"nftables": [{"metainfo": {"version": "VERSION",
> > > > > "release_name":
> > > > > "RELEASE_NAME", "json_schema_version": 1}}, {"table":
> > > > > {"family":
> > > > > "ip", "name": "x", "handle": 1}}, {"table": {"family": "ip",
> > > > > "name": "y", "handle": 2}}]}
> > > > > +{"nftables": [{"metainfo": {"version": "VERSION",
> > > > > "release_name":
> > > > > "RELEASE_NAME", "json_schema_version": 1}}, {"table":
> > > > > {"family":
> > > > > "ip", "name": "x", "handle": 158}}, {"table": {"family":
> > > > > "ip",
> > > > > "name": "y", "handle": 159}}]}
> > > > > 
> > > > > It seems that handles are a problem in this diff.
> > > > 
> > > > Are you running tests with -s option?
> > > > 
> > > > In that case, modules are removed after each test.
> > > > 
> > > > I suspect its because we can then hit -EAGAIN mid-transaction
> > > > because module is missing (again), then replay logic does its
> > > > thing.
> > > > 
> > > > But the handle generator isn't transaction aware,
> > > > so it has advanced vs. the aborted partial transaction.
> > > 
> > > > I'm not sure what to do here.
> > > 
> > > a combination of:
> > > 
> > > a) make an effort, that kernel behavior is consistent and
> > > reproducible.
> > > Stable output seems important to me, and the automatic loading of
> > > a
> > > kernel module should not make a difference. This is IMO a bug.
> > 
> > This is not a bug in the kernel. The kernel guarantees that the
> > handle
> > is unique, but the handle allocation strategy is up to the kernel.
> > Userspace cannot forecast what handle will get, such thing might
> > lead
> > to easy to break assumptions from userspace.
> > 
> > > b) let `nft -j list ruleset` honor (the lack of) `--handle`
> > > option and
> > > not print those handles. That bugfix would change behavior, so
> > > maybe
> > > instead add a "--no-handle" option for `nft -j` dumps.
> 
> 
> > 
> > Will honoring -a/--handle break firewalld? I think it is the main
> > user
> > of the JSON API. That might help disentangle if this makes sense or
> > not and what the chances of breaking third party applications are.
> > 
> > I'd prefer not to see a --no-handle that will only work for JSON
> > and
> > that is only useful for this test infrastructure (noone else asked
> > for
> > this).
> > 
> > > c) sanitize the output with the sed command (my other mail).
> > > 
> > > This also means, that the .json-nft dumps won't work, if you run
> > > without `unshare`. IMO, the mode without unshare should not be
> > > supported anymore. But if it's deemed important, then it requires
> > > b) or
> > > c) or detect the case and skip the diffs with .json-nft.
> 
> What is the problem without unshare? Looking at your patch, it seems
> possible to drop the handle attributes in json-sanitize-ruleset.sh.

Yes, (b) would suffice. I said "or" :)

No further problem, but without-unshare seems not a useful thing to
support. The test-run takes significantly longer, interferes with the
caller's netns and requires CAP_NET_ADMIN.

> 
> > a) is no-go (kernel update to make test infrastructure or to allow
> > userspace application to make fragile assumptions on how handles
> > are
> > allocated is not correct).
> > 
> > b) needs to evaluated, you maintain firewalld, let us know.
> 
> Given the inherent importance of the handle value for ruleset
> manipulations, I assume *any* application will need to be updated to
> pass --handle (or the libnftables-equivalent) to remain functional.

Right. So a "--no-handle" / NFT_CTX_OUTPUT_NO_HANDLE flag for JSON
output?



Thomas
Phil Sutter Nov. 17, 2023, 5:11 p.m. UTC | #25
On Fri, Nov 17, 2023 at 06:06:16PM +0100, Thomas Haller wrote:
> On Fri, 2023-11-17 at 17:57 +0100, Phil Sutter wrote:
> > On Fri, Nov 17, 2023 at 05:36:23PM +0100, Pablo Neira Ayuso wrote:
> > > On Fri, Nov 17, 2023 at 05:16:02PM +0100, Thomas Haller wrote:
> > > > On Fri, 2023-11-17 at 00:00 +0100, Florian Westphal wrote:
> > > > > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > > > > Hi Thomas,
> > > > > > 
> > > > > > On Wed, Nov 15, 2023 at 01:36:40PM +0100, Thomas Haller
> > > > > > wrote:
> > > > > > > On Wed, 2023-11-15 at 13:30 +0100, Pablo Neira Ayuso wrote:
> > > > > > [...]
> > > > > > > > I see _lots_ of DUMP FAIL with kernel 5.4
> > > > > > > 
> > > > > > > Hi,
> > > > > > > 
> > > > > > > Could you provide more details?
> > > > > > > 
> > > > > > > For example,
> > > > > > > 
> > > > > > >     make -j && ./tests/shell/run-tests.sh
> > > > > > > tests/shell/testcases/include/0007glob_double_0 -x
> > > > > > >     grep ^ -a -R /tmp/nft-test.latest.*/
> > > > > > 
> > > > > > # cat [...]/ruleset-diff.json
> > > > > > --- testcases/include/dumps/0007glob_double_0.json-nft  2023-
> > > > > > 11-15
> > > > > > 13:27:20.272084254 +0100
> > > > > > +++ /tmp/nft-test.20231116-170617.584.lrZzMy/test-testcases-
> > > > > > include-0007glob_double_0.1/ruleset-after.json      2023-11-
> > > > > > 16
> > > > > > 17:06:18.332535411 +0100
> > > > > > @@ -1 +1 @@
> > > > > > -{"nftables": [{"metainfo": {"version": "VERSION",
> > > > > > "release_name":
> > > > > > "RELEASE_NAME", "json_schema_version": 1}}, {"table":
> > > > > > {"family":
> > > > > > "ip", "name": "x", "handle": 1}}, {"table": {"family": "ip",
> > > > > > "name": "y", "handle": 2}}]}
> > > > > > +{"nftables": [{"metainfo": {"version": "VERSION",
> > > > > > "release_name":
> > > > > > "RELEASE_NAME", "json_schema_version": 1}}, {"table":
> > > > > > {"family":
> > > > > > "ip", "name": "x", "handle": 158}}, {"table": {"family":
> > > > > > "ip",
> > > > > > "name": "y", "handle": 159}}]}
> > > > > > 
> > > > > > It seems that handles are a problem in this diff.
> > > > > 
> > > > > Are you running tests with -s option?
> > > > > 
> > > > > In that case, modules are removed after each test.
> > > > > 
> > > > > I suspect its because we can then hit -EAGAIN mid-transaction
> > > > > because module is missing (again), then replay logic does its
> > > > > thing.
> > > > > 
> > > > > But the handle generator isn't transaction aware,
> > > > > so it has advanced vs. the aborted partial transaction.
> > > > 
> > > > > I'm not sure what to do here.
> > > > 
> > > > a combination of:
> > > > 
> > > > a) make an effort, that kernel behavior is consistent and
> > > > reproducible.
> > > > Stable output seems important to me, and the automatic loading of
> > > > a
> > > > kernel module should not make a difference. This is IMO a bug.
> > > 
> > > This is not a bug in the kernel. The kernel guarantees that the
> > > handle
> > > is unique, but the handle allocation strategy is up to the kernel.
> > > Userspace cannot forecast what handle will get, such thing might
> > > lead
> > > to easy to break assumptions from userspace.
> > > 
> > > > b) let `nft -j list ruleset` honor (the lack of) `--handle`
> > > > option and
> > > > not print those handles. That bugfix would change behavior, so
> > > > maybe
> > > > instead add a "--no-handle" option for `nft -j` dumps.
> > 
> > 
> > > 
> > > Will honoring -a/--handle break firewalld? I think it is the main
> > > user
> > > of the JSON API. That might help disentangle if this makes sense or
> > > not and what the chances of breaking third party applications are.
> > > 
> > > I'd prefer not to see a --no-handle that will only work for JSON
> > > and
> > > that is only useful for this test infrastructure (noone else asked
> > > for
> > > this).
> > > 
> > > > c) sanitize the output with the sed command (my other mail).
> > > > 
> > > > This also means, that the .json-nft dumps won't work, if you run
> > > > without `unshare`. IMO, the mode without unshare should not be
> > > > supported anymore. But if it's deemed important, then it requires
> > > > b) or
> > > > c) or detect the case and skip the diffs with .json-nft.
> > 
> > What is the problem without unshare? Looking at your patch, it seems
> > possible to drop the handle attributes in json-sanitize-ruleset.sh.
> 
> Yes, (b) would suffice. I said "or" :)
> 
> No further problem, but without-unshare seems not a useful thing to
> support. The test-run takes significantly longer, interferes with the
> caller's netns and requires CAP_NET_ADMIN.

No, I was wondering why with option (c) "This also means, that the
.json-nft dumps won't work, if you run without `unshare`."

Because I vote for that option. ;)

> > 
> > > a) is no-go (kernel update to make test infrastructure or to allow
> > > userspace application to make fragile assumptions on how handles
> > > are
> > > allocated is not correct).
> > > 
> > > b) needs to evaluated, you maintain firewalld, let us know.
> > 
> > Given the inherent importance of the handle value for ruleset
> > manipulations, I assume *any* application will need to be updated to
> > pass --handle (or the libnftables-equivalent) to remain functional.
> 
> Right. So a "--no-handle" / NFT_CTX_OUTPUT_NO_HANDLE flag for JSON
> output?

Should not be needed. IIUC, the test infrastructure you're about to
introduce sanitizes the JSON output already anyway, right?
Thomas Haller Nov. 17, 2023, 5:23 p.m. UTC | #26
On Fri, 2023-11-17 at 18:11 +0100, Phil Sutter wrote:
> On Fri, Nov 17, 2023 at 06:06:16PM +0100, Thomas Haller wrote:
> > On Fri, 2023-11-17 at 17:57 +0100, Phil Sutter wrote:
> > > On Fri, Nov 17, 2023 at 05:36:23PM +0100, Pablo Neira Ayuso
> > > wrote:
> > > > On Fri, Nov 17, 2023 at 05:16:02PM +0100, Thomas Haller wrote:
> > > > > On Fri, 2023-11-17 at 00:00 +0100, Florian Westphal wrote:
> > > > > > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > > > > > Hi Thomas,
> > > > > > > 
> > > > > > > On Wed, Nov 15, 2023 at 01:36:40PM +0100, Thomas Haller
> > > > > > > wrote:
> > > > > > > > On Wed, 2023-11-15 at 13:30 +0100, Pablo Neira Ayuso
> > > > > > > > wrote:
> > > > > > > [...]
> > > > > > > > > I see _lots_ of DUMP FAIL with kernel 5.4
> > > > > > > > 
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > Could you provide more details?
> > > > > > > > 
> > > > > > > > For example,
> > > > > > > > 
> > > > > > > >     make -j && ./tests/shell/run-tests.sh
> > > > > > > > tests/shell/testcases/include/0007glob_double_0 -x
> > > > > > > >     grep ^ -a -R /tmp/nft-test.latest.*/
> > > > > > > 
> > > > > > > # cat [...]/ruleset-diff.json
> > > > > > > --- testcases/include/dumps/0007glob_double_0.json-nft 
> > > > > > > 2023-
> > > > > > > 11-15
> > > > > > > 13:27:20.272084254 +0100
> > > > > > > +++ /tmp/nft-test.20231116-170617.584.lrZzMy/test-
> > > > > > > testcases-
> > > > > > > include-0007glob_double_0.1/ruleset-after.json      2023-
> > > > > > > 11-
> > > > > > > 16
> > > > > > > 17:06:18.332535411 +0100
> > > > > > > @@ -1 +1 @@
> > > > > > > -{"nftables": [{"metainfo": {"version": "VERSION",
> > > > > > > "release_name":
> > > > > > > "RELEASE_NAME", "json_schema_version": 1}}, {"table":
> > > > > > > {"family":
> > > > > > > "ip", "name": "x", "handle": 1}}, {"table": {"family":
> > > > > > > "ip",
> > > > > > > "name": "y", "handle": 2}}]}
> > > > > > > +{"nftables": [{"metainfo": {"version": "VERSION",
> > > > > > > "release_name":
> > > > > > > "RELEASE_NAME", "json_schema_version": 1}}, {"table":
> > > > > > > {"family":
> > > > > > > "ip", "name": "x", "handle": 158}}, {"table": {"family":
> > > > > > > "ip",
> > > > > > > "name": "y", "handle": 159}}]}
> > > > > > > 
> > > > > > > It seems that handles are a problem in this diff.
> > > > > > 
> > > > > > Are you running tests with -s option?
> > > > > > 
> > > > > > In that case, modules are removed after each test.
> > > > > > 
> > > > > > I suspect its because we can then hit -EAGAIN mid-
> > > > > > transaction
> > > > > > because module is missing (again), then replay logic does
> > > > > > its
> > > > > > thing.
> > > > > > 
> > > > > > But the handle generator isn't transaction aware,
> > > > > > so it has advanced vs. the aborted partial transaction.
> > > > > 
> > > > > > I'm not sure what to do here.
> > > > > 
> > > > > a combination of:
> > > > > 
> > > > > a) make an effort, that kernel behavior is consistent and
> > > > > reproducible.
> > > > > Stable output seems important to me, and the automatic
> > > > > loading of
> > > > > a
> > > > > kernel module should not make a difference. This is IMO a
> > > > > bug.
> > > > 
> > > > This is not a bug in the kernel. The kernel guarantees that the
> > > > handle
> > > > is unique, but the handle allocation strategy is up to the
> > > > kernel.
> > > > Userspace cannot forecast what handle will get, such thing
> > > > might
> > > > lead
> > > > to easy to break assumptions from userspace.
> > > > 
> > > > > b) let `nft -j list ruleset` honor (the lack of) `--handle`
> > > > > option and
> > > > > not print those handles. That bugfix would change behavior,
> > > > > so
> > > > > maybe
> > > > > instead add a "--no-handle" option for `nft -j` dumps.
> > > 
> > > 
> > > > 
> > > > Will honoring -a/--handle break firewalld? I think it is the
> > > > main
> > > > user
> > > > of the JSON API. That might help disentangle if this makes
> > > > sense or
> > > > not and what the chances of breaking third party applications
> > > > are.
> > > > 
> > > > I'd prefer not to see a --no-handle that will only work for
> > > > JSON
> > > > and
> > > > that is only useful for this test infrastructure (noone else
> > > > asked
> > > > for
> > > > this).
> > > > 
> > > > > c) sanitize the output with the sed command (my other mail).
> > > > > 
> > > > > This also means, that the .json-nft dumps won't work, if you
> > > > > run
> > > > > without `unshare`. IMO, the mode without unshare should not
> > > > > be
> > > > > supported anymore. But if it's deemed important, then it
> > > > > requires
> > > > > b) or
> > > > > c) or detect the case and skip the diffs with .json-nft.
> > > 
> > > What is the problem without unshare? Looking at your patch, it
> > > seems
> > > possible to drop the handle attributes in json-sanitize-
> > > ruleset.sh.
> > 
> > Yes, (b) would suffice. I said "or" :)
> > 
> > No further problem, but without-unshare seems not a useful thing to
> > support. The test-run takes significantly longer, interferes with
> > the
> > caller's netns and requires CAP_NET_ADMIN.
> 
> No, I was wondering why with option (c) "This also means, that the
> .json-nft dumps won't work, if you run without `unshare`."
> 
> Because I vote for that option. ;)

Yes, sorry. I got confused with my own numbering :)

I meant also c)


> > > 
> > > > a) is no-go (kernel update to make test infrastructure or to
> > > > allow
> > > > userspace application to make fragile assumptions on how
> > > > handles
> > > > are
> > > > allocated is not correct).
> > > > 
> > > > b) needs to evaluated, you maintain firewalld, let us know.
> > > 
> > > Given the inherent importance of the handle value for ruleset
> > > manipulations, I assume *any* application will need to be updated
> > > to
> > > pass --handle (or the libnftables-equivalent) to remain
> > > functional.
> > 
> > Right. So a "--no-handle" / NFT_CTX_OUTPUT_NO_HANDLE flag for JSON
> > output?
> 
> Should not be needed. IIUC, the test infrastructure you're about to
> introduce sanitizes the JSON output already anyway, right?

Right. c) alone may very well suffice.

I just sent a patch to that amount.


I still think that `nft -j` ignoring the lack of "--no-handle" /
NFT_CTX_OUTPUT_NO_HANDLE is a bug. At the very last a documentation
bug.


Thomas
Phil Sutter Nov. 17, 2023, 10:30 p.m. UTC | #27
On Fri, Nov 17, 2023 at 06:23:35PM +0100, Thomas Haller wrote:
> On Fri, 2023-11-17 at 18:11 +0100, Phil Sutter wrote:
> > On Fri, Nov 17, 2023 at 06:06:16PM +0100, Thomas Haller wrote:
> > > On Fri, 2023-11-17 at 17:57 +0100, Phil Sutter wrote:
> > > > On Fri, Nov 17, 2023 at 05:36:23PM +0100, Pablo Neira Ayuso
> > > > wrote:
> > > > > On Fri, Nov 17, 2023 at 05:16:02PM +0100, Thomas Haller wrote:
> > > > > > On Fri, 2023-11-17 at 00:00 +0100, Florian Westphal wrote:
> > > > > > > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > > > > > > Hi Thomas,
> > > > > > > > 
> > > > > > > > On Wed, Nov 15, 2023 at 01:36:40PM +0100, Thomas Haller
> > > > > > > > wrote:
> > > > > > > > > On Wed, 2023-11-15 at 13:30 +0100, Pablo Neira Ayuso
> > > > > > > > > wrote:
> > > > > > > > [...]
> > > > > > > > > > I see _lots_ of DUMP FAIL with kernel 5.4
> > > > > > > > > 
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > Could you provide more details?
> > > > > > > > > 
> > > > > > > > > For example,
> > > > > > > > > 
> > > > > > > > >     make -j && ./tests/shell/run-tests.sh
> > > > > > > > > tests/shell/testcases/include/0007glob_double_0 -x
> > > > > > > > >     grep ^ -a -R /tmp/nft-test.latest.*/
> > > > > > > > 
> > > > > > > > # cat [...]/ruleset-diff.json
> > > > > > > > --- testcases/include/dumps/0007glob_double_0.json-nft 
> > > > > > > > 2023-
> > > > > > > > 11-15
> > > > > > > > 13:27:20.272084254 +0100
> > > > > > > > +++ /tmp/nft-test.20231116-170617.584.lrZzMy/test-
> > > > > > > > testcases-
> > > > > > > > include-0007glob_double_0.1/ruleset-after.json      2023-
> > > > > > > > 11-
> > > > > > > > 16
> > > > > > > > 17:06:18.332535411 +0100
> > > > > > > > @@ -1 +1 @@
> > > > > > > > -{"nftables": [{"metainfo": {"version": "VERSION",
> > > > > > > > "release_name":
> > > > > > > > "RELEASE_NAME", "json_schema_version": 1}}, {"table":
> > > > > > > > {"family":
> > > > > > > > "ip", "name": "x", "handle": 1}}, {"table": {"family":
> > > > > > > > "ip",
> > > > > > > > "name": "y", "handle": 2}}]}
> > > > > > > > +{"nftables": [{"metainfo": {"version": "VERSION",
> > > > > > > > "release_name":
> > > > > > > > "RELEASE_NAME", "json_schema_version": 1}}, {"table":
> > > > > > > > {"family":
> > > > > > > > "ip", "name": "x", "handle": 158}}, {"table": {"family":
> > > > > > > > "ip",
> > > > > > > > "name": "y", "handle": 159}}]}
> > > > > > > > 
> > > > > > > > It seems that handles are a problem in this diff.
> > > > > > > 
> > > > > > > Are you running tests with -s option?
> > > > > > > 
> > > > > > > In that case, modules are removed after each test.
> > > > > > > 
> > > > > > > I suspect its because we can then hit -EAGAIN mid-
> > > > > > > transaction
> > > > > > > because module is missing (again), then replay logic does
> > > > > > > its
> > > > > > > thing.
> > > > > > > 
> > > > > > > But the handle generator isn't transaction aware,
> > > > > > > so it has advanced vs. the aborted partial transaction.
> > > > > > 
> > > > > > > I'm not sure what to do here.
> > > > > > 
> > > > > > a combination of:
> > > > > > 
> > > > > > a) make an effort, that kernel behavior is consistent and
> > > > > > reproducible.
> > > > > > Stable output seems important to me, and the automatic
> > > > > > loading of
> > > > > > a
> > > > > > kernel module should not make a difference. This is IMO a
> > > > > > bug.
> > > > > 
> > > > > This is not a bug in the kernel. The kernel guarantees that the
> > > > > handle
> > > > > is unique, but the handle allocation strategy is up to the
> > > > > kernel.
> > > > > Userspace cannot forecast what handle will get, such thing
> > > > > might
> > > > > lead
> > > > > to easy to break assumptions from userspace.
> > > > > 
> > > > > > b) let `nft -j list ruleset` honor (the lack of) `--handle`
> > > > > > option and
> > > > > > not print those handles. That bugfix would change behavior,
> > > > > > so
> > > > > > maybe
> > > > > > instead add a "--no-handle" option for `nft -j` dumps.
> > > > 
> > > > 
> > > > > 
> > > > > Will honoring -a/--handle break firewalld? I think it is the
> > > > > main
> > > > > user
> > > > > of the JSON API. That might help disentangle if this makes
> > > > > sense or
> > > > > not and what the chances of breaking third party applications
> > > > > are.
> > > > > 
> > > > > I'd prefer not to see a --no-handle that will only work for
> > > > > JSON
> > > > > and
> > > > > that is only useful for this test infrastructure (noone else
> > > > > asked
> > > > > for
> > > > > this).
> > > > > 
> > > > > > c) sanitize the output with the sed command (my other mail).
> > > > > > 
> > > > > > This also means, that the .json-nft dumps won't work, if you
> > > > > > run
> > > > > > without `unshare`. IMO, the mode without unshare should not
> > > > > > be
> > > > > > supported anymore. But if it's deemed important, then it
> > > > > > requires
> > > > > > b) or
> > > > > > c) or detect the case and skip the diffs with .json-nft.
> > > > 
> > > > What is the problem without unshare? Looking at your patch, it
> > > > seems
> > > > possible to drop the handle attributes in json-sanitize-
> > > > ruleset.sh.
> > > 
> > > Yes, (b) would suffice. I said "or" :)
> > > 
> > > No further problem, but without-unshare seems not a useful thing to
> > > support. The test-run takes significantly longer, interferes with
> > > the
> > > caller's netns and requires CAP_NET_ADMIN.
> > 
> > No, I was wondering why with option (c) "This also means, that the
> > .json-nft dumps won't work, if you run without `unshare`."
> > 
> > Because I vote for that option. ;)
> 
> Yes, sorry. I got confused with my own numbering :)
> 
> I meant also c)
> 
> 
> > > > 
> > > > > a) is no-go (kernel update to make test infrastructure or to
> > > > > allow
> > > > > userspace application to make fragile assumptions on how
> > > > > handles
> > > > > are
> > > > > allocated is not correct).
> > > > > 
> > > > > b) needs to evaluated, you maintain firewalld, let us know.
> > > > 
> > > > Given the inherent importance of the handle value for ruleset
> > > > manipulations, I assume *any* application will need to be updated
> > > > to
> > > > pass --handle (or the libnftables-equivalent) to remain
> > > > functional.
> > > 
> > > Right. So a "--no-handle" / NFT_CTX_OUTPUT_NO_HANDLE flag for JSON
> > > output?
> > 
> > Should not be needed. IIUC, the test infrastructure you're about to
> > introduce sanitizes the JSON output already anyway, right?
> 
> Right. c) alone may very well suffice.
> 
> I just sent a patch to that amount.
> 
> 
> I still think that `nft -j` ignoring the lack of "--no-handle" /
> NFT_CTX_OUTPUT_NO_HANDLE is a bug. At the very last a documentation
> bug.

It is per design. Same with --numeric. JSON formatting is meant for
programmatic consumption, no point in increasing readability. I don't
see a reason why one would not want the handle attribute included in
dumps apart from your use-case and there is a solution at hand. See for
instance how nft-test.py strips the handle attribute when comparing JSON
output against the record or creating *.json.got files for missing
records.

Cheers, Phil
diff mbox series

Patch

diff --git a/tests/shell/helpers/json-sanitize-ruleset.sh b/tests/shell/helpers/json-sanitize-ruleset.sh
new file mode 100755
index 000000000000..270a6107e0aa
--- /dev/null
+++ b/tests/shell/helpers/json-sanitize-ruleset.sh
@@ -0,0 +1,23 @@ 
+#!/bin/bash -e
+
+die() {
+	printf "%s\n" "$*"
+	exit 1
+}
+
+do_sed() {
+	sed '1s/\({"nftables": \[{"metainfo": {"version": "\)[0-9.]\+\(", "release_name": "\)[^"]\+\(", "\)/\1VERSION\2RELEASE_NAME\3/' "$@"
+}
+
+if [ "$#" = 0 ] ; then
+	do_sed
+	exit $?
+fi
+
+for f ; do
+	test -f "$f" || die "$0: file \"$f\" does not exist"
+done
+
+for f ; do
+	do_sed -i "$f" || die "$0: \`sed -i\` failed for \"$f\""
+done
diff --git a/tests/shell/helpers/test-wrapper.sh b/tests/shell/helpers/test-wrapper.sh
index b74c56168768..62414d0db074 100755
--- a/tests/shell/helpers/test-wrapper.sh
+++ b/tests/shell/helpers/test-wrapper.sh
@@ -15,6 +15,16 @@  array_contains() {
 	return 1
 }
 
+show_file() {
+	local filename="$1"
+	shift
+	local msg="$*"
+
+	printf '%s\n>>>>\n' "$msg"
+	cat "$filename"
+	printf "<<<<\n"
+}
+
 TEST="$1"
 TESTBASE="$(basename "$TEST")"
 TESTDIR="$(dirname "$TEST")"
@@ -109,55 +119,108 @@  if [ "$rc_test" -eq 0 ] ; then
 	"${CMD[@]}" &>> "$NFT_TEST_TESTTMPDIR/testout.log" || rc_test=$?
 fi
 
-$NFT list ruleset > "$NFT_TEST_TESTTMPDIR/ruleset-after"
+rc_chkdump=0
+rc=0
+$NFT list ruleset > "$NFT_TEST_TESTTMPDIR/ruleset-after" 2> "$NFT_TEST_TESTTMPDIR/chkdump" || rc=$?
+if [ "$rc" -ne 0 -o -s "$NFT_TEST_TESTTMPDIR/chkdump" ] ; then
+	show_file "$NFT_TEST_TESTTMPDIR/chkdump" "Command \`$NFT list ruleset\` failed" >> "$NFT_TEST_TESTTMPDIR/rc-failed-chkdump"
+	rc_chkdump=1
+fi
+if [ "$NFT_TEST_HAVE_json" != n ] ; then
+	rc=0
+	$NFT -j list ruleset > "$NFT_TEST_TESTTMPDIR/ruleset-after.json" 2> "$NFT_TEST_TESTTMPDIR/chkdump" || rc=$?
+
+	# Workaround known bug in stmt_print_json(), due to
+	# "chain_stmt_ops.json" being NULL. This spams stderr.
+	sed -i '/^warning: stmt ops chain have no json callback$/d' "$NFT_TEST_TESTTMPDIR/chkdump"
+
+	if [ "$rc" -ne 0 -o -s "$NFT_TEST_TESTTMPDIR/chkdump" ] ; then
+		show_file "$NFT_TEST_TESTTMPDIR/chkdump" "Command \`$NFT -j list ruleset\` failed" >> "$NFT_TEST_TESTTMPDIR/rc-failed-chkdump"
+		rc_chkdump=1
+	fi
+	# Normalize the version number from the JSON output. Otherwise, we'd
+	# have to regenerate the .json-nft files upon release.
+	"$NFT_TEST_BASEDIR/helpers/json-sanitize-ruleset.sh" "$NFT_TEST_TESTTMPDIR/ruleset-after.json"
+fi
 
 read tainted_after < /proc/sys/kernel/tainted
 
 DUMPPATH="$TESTDIR/dumps"
 DUMPFILE="$DUMPPATH/$TESTBASE.nft"
+JDUMPFILE="$DUMPPATH/$TESTBASE.json-nft"
 NODUMPFILE="$DUMPPATH/$TESTBASE.nodump"
 
-dump_written=
-
-# The caller can request a re-geneating of the dumps, by setting
-# DUMPGEN=y.
-#
-# This only will happen if the command completed with success.
+# The caller can request a re-geneating of the .nft, .nodump, .json-nft dump files
+# by setting DUMPGEN=y. In that case, only the existing files will be regenerated
+# (unless all three files are missing, in which case all of them are generated).
 #
-# It also will only happen for tests, that have a "$DUMPPATH" directory. There
-# might be tests, that don't want to have dumps created. The existence of the
-# directory controls that. Tests that have a "$NODUMPFILE" file, don't get a dump generated.
-if [ "$rc_test" -eq 0 -a "$DUMPGEN" = y -a -d "$DUMPPATH" -a ! -f "$NODUMPFILE" ] ; then
+# By setting DUMPGEN=all, all 3 files are always regenerated.
+dump_written=n
+if [ "$rc_test" -eq 0 -a '(' "$DUMPGEN" = all -o "$DUMPGEN" = y ')' ] ; then
 	dump_written=y
-	if [ ! -f "$DUMPFILE" ] ; then
-		# No dumpfile exists yet. We generate both a .nft and a .nodump
-		# file. The user can pick which one to commit to git.
+	if [ ! -d "$DUMPPATH" ] ; then
+		mkdir "$DUMPPATH"
+	fi
+	if [ "$DUMPGEN" = all ] ; then
+		gen_nodumpfile=y
+		gen_dumpfile=y
+		gen_jdumpfile=y
+	else
+		# by default, only regenerate the files that we already have on disk.
+		gen_nodumpfile=n
+		gen_dumpfile=n
+		gen_jdumpfile=n
+		test -f "$DUMPFILE"  && gen_dumpfile=y
+		test -f "$JDUMPFILE" && gen_jdumpfile=y
+		test -f "$NODUMPFILE" && gen_nodumpfile=y
+		if [ "$gen_dumpfile" != y -a "$gen_jdumpfile" != y -a "$gen_nodumpfile" != y ] ; then
+			# Except, if no files exist. Them generate all files.
+			gen_dumpfile=y
+			gen_jdumpfile=y
+			gen_nodumpfile=y
+		fi
+	fi
+	if [ "$gen_nodumpfile" = y ] ; then
 		: > "$NODUMPFILE"
 	fi
-	cat "$NFT_TEST_TESTTMPDIR/ruleset-after" > "$DUMPFILE"
+	if [ "$gen_dumpfile" = y ] ; then
+		cat "$NFT_TEST_TESTTMPDIR/ruleset-after" > "$DUMPFILE"
+	fi
+	if [ "$NFT_TEST_HAVE_json" != n -a "$gen_jdumpfile" = y ] ; then
+		cat "$NFT_TEST_TESTTMPDIR/ruleset-after.json" > "$JDUMPFILE"
+	fi
 fi
 
 rc_dump=0
-if [ "$rc_test" -ne 77 -a -f "$DUMPFILE" ] ; then
-	if [ "$dump_written" != y ] ; then
+if [ "$rc_test" -ne 77 -a "$dump_written" != y ] ; then
+	if [ -f "$DUMPFILE" ] ; then
 		if ! $DIFF -u "$DUMPFILE" "$NFT_TEST_TESTTMPDIR/ruleset-after" &> "$NFT_TEST_TESTTMPDIR/ruleset-diff" ; then
+			show_file "$NFT_TEST_TESTTMPDIR/ruleset-diff" "Failed \`$DIFF -u \"$DUMPFILE\" \"$NFT_TEST_TESTTMPDIR/ruleset-after\"\`" >> "$NFT_TEST_TESTTMPDIR/rc-failed-dump"
 			rc_dump=1
 		else
 			rm -f "$NFT_TEST_TESTTMPDIR/ruleset-diff"
 		fi
 	fi
-fi
-if [ "$rc_dump" -ne 0 ] ; then
-	echo "$DUMPFILE" > "$NFT_TEST_TESTTMPDIR/rc-failed-dump"
+	if [ "$NFT_TEST_HAVE_json" != n -a -f "$JDUMPFILE" ] ; then
+		if ! $DIFF -u "$JDUMPFILE" "$NFT_TEST_TESTTMPDIR/ruleset-after.json" &> "$NFT_TEST_TESTTMPDIR/ruleset-diff.json" ; then
+			show_file "$NFT_TEST_TESTTMPDIR/ruleset-diff.json" "Failed \`$DIFF -u \"$JDUMPFILE\" \"$NFT_TEST_TESTTMPDIR/ruleset-after.json\"\`" >> "$NFT_TEST_TESTTMPDIR/rc-failed-dump"
+			rc_dump=1
+		else
+			rm -f "$NFT_TEST_TESTTMPDIR/ruleset-diff.json"
+		fi
+	fi
 fi
 
-rc_chkdump=0
 # check that a flush after the test succeeds. We anyway need a clean ruleset
 # for the `nft --check` next.
-$NFT flush ruleset &> "$NFT_TEST_TESTTMPDIR/rc-failed-chkdump" || rc_chkdump=1
+rc=0
+$NFT flush ruleset &> "$NFT_TEST_TESTTMPDIR/chkdump" || rc=1
+if [ "$rc" = 1 -o -s "$NFT_TEST_TESTTMPDIR/chkdump" ] ; then
+	show_file "$NFT_TEST_TESTTMPDIR/chkdump" "Command \`$NFT flush ruleset\` failed" >> "$NFT_TEST_TESTTMPDIR/rc-failed-chkdump"
+	rc_chkdump=1
+fi
+# For the dumpfiles, call `nft --check` to possibly cover new code paths.
 if [ -f "$DUMPFILE" ] ; then
-	# We have a dumpfile. Call `nft --check` to possibly cover new code
-	# paths.
 	if [ "$rc_test" -eq 77 ] ; then
 		# The test was skipped. Possibly we don't have the required
 		# features to process this file. Ignore any output and exit
@@ -165,20 +228,30 @@  if [ -f "$DUMPFILE" ] ; then
 		# issue we hope to find).
 		$NFT --check -f "$DUMPFILE" &>/dev/null || :
 	else
-		$NFT --check -f "$DUMPFILE" &>> "$NFT_TEST_TESTTMPDIR/rc-failed-chkdump" || rc_chkdump=1
+		fail=n
+		$NFT --check -f "$DUMPFILE" &> "$NFT_TEST_TESTTMPDIR/chkdump" || fail=y
+		test -s "$NFT_TEST_TESTTMPDIR/chkdump" && fail=y
+		if [ "$fail" = y ] ; then
+			show_file "$NFT_TEST_TESTTMPDIR/chkdump" "Command \`$NFT --check -f \"$DUMPFILE\"\` failed" >> "$NFT_TEST_TESTTMPDIR/rc-failed-chkdump"
+			rc_chkdump=1
+		fi
+		rm -f "$NFT_TEST_TESTTMPDIR/chkdump"
 	fi
 fi
-if [ -s "$NFT_TEST_TESTTMPDIR/rc-failed-chkdump" ] ; then
-	# Non-empty output? That is wrong.
-	rc_chkdump=1
-elif [ "$rc_chkdump" -eq 0 ] ; then
-	rm -rf "$NFT_TEST_TESTTMPDIR/rc-failed-chkdump"
-fi
-if [ "$rc_chkdump" -ne 0 ] ; then
-	# Ensure we don't have empty output files. Always write something, so
-	# that `grep ^ -R` lists the file.
-	echo -e "<<<<<\n\nCalling \`nft --check\` (or \`nft flush ruleset\`) failed for \"$DUMPFILE\"" >> "$NFT_TEST_TESTTMPDIR/rc-failed-chkdump"
+if [ "$NFT_TEST_HAVE_json" != n -a -f "$JDUMPFILE" ] ; then
+	if [ "$rc_test" -eq 77 ] ; then
+		$NFT -j --check -f "$JDUMPFILE" &>/dev/null || :
+	else
+		fail=n
+		$NFT -j --check -f "$JDUMPFILE" &> "$NFT_TEST_TESTTMPDIR/chkdump" || fail=y
+		test -s "$NFT_TEST_TESTTMPDIR/chkdump" && fail=y
+		if [ "$fail" = y ] ; then
+			show_file "$NFT_TEST_TESTTMPDIR/chkdump" "Command \`$NFT -j --check -f \"$JDUMPFILE\"\` failed" >> "$NFT_TEST_TESTTMPDIR/rc-failed-chkdump"
+			rc_chkdump=1
+		fi
+	fi
 fi
+rm -f "$NFT_TEST_TESTTMPDIR/chkdump"
 
 rc_valgrind=0
 [ -f "$NFT_TEST_TESTTMPDIR/rc-failed-valgrind" ] && rc_valgrind=1
diff --git a/tests/shell/run-tests.sh b/tests/shell/run-tests.sh
index 27a0ec43042a..3cde97b7ea17 100755
--- a/tests/shell/run-tests.sh
+++ b/tests/shell/run-tests.sh
@@ -184,9 +184,10 @@  usage() {
 	echo " VERBOSE=*|y   : Enable verbose output."
 	echo " NFT_TEST_VERBOSE_TEST=*|y: if true, enable verbose output for tests. For bash scripts, this means"
 	echo "                 to pass \"-x\" to the interpreter."
-	echo " DUMPGEN=*|y   : Regenerate dump files. Dump files are only recreated if the"
-	echo "                 test completes successfully and the \"dumps\" directory for the"
-	echo "                 test exits."
+	echo " DUMPGEN=*|y|all : Regenerate dump files \".{nft,json-nft,nodump}\". \"DUMPGEN=y\" only regenerates existing"
+	echo "                 files, unless the test has no files (then all three files are generated, and you need to"
+	echo "                 choose which to keep). With \"DUMPGEN=all\" all 3 files are regenerated, regardless"
+	echo "                 whether they already exist."
 	echo " VALGRIND=*|y  : Run \$NFT in valgrind."
 	echo " KMEMLEAK=*|y  : Check for kernel memleaks."
 	echo " NFT_TEST_HAS_REALROOT=*|y : To indicate whether the test has real root permissions."
@@ -279,7 +280,9 @@  _NFT_TEST_JOBS_DEFAULT="$(( _NFT_TEST_JOBS_DEFAULT + (_NFT_TEST_JOBS_DEFAULT + 1
 
 VERBOSE="$(bool_y "$VERBOSE")"
 NFT_TEST_VERBOSE_TEST="$(bool_y "$NFT_TEST_VERBOSE_TEST")"
-DUMPGEN="$(bool_y "$DUMPGEN")"
+if [ "$DUMPGEN" != "all" ] ; then
+	DUMPGEN="$(bool_y "$DUMPGEN")"
+fi
 VALGRIND="$(bool_y "$VALGRIND")"
 KMEMLEAK="$(bool_y "$KMEMLEAK")"
 NFT_TEST_KEEP_LOGS="$(bool_y "$NFT_TEST_KEEP_LOGS")"