diff mbox series

[ovs-dev,RFC] northd: Fix IGMP external subscriber subscribing to internal feed

Message ID 20220503151507.1782-1-aserdean@ovn.org
State RFC
Headers show
Series [ovs-dev,RFC] northd: Fix IGMP external subscriber subscribing to internal feed | expand

Commit Message

Alin-Gabriel Serdean May 3, 2022, 3:15 p.m. UTC
Add localnet ports to their corresponding southbound multicast group.

Update unit tests.

Reported-at: https://github.com/ovn-org/ovn/issues/125
Reported-by: Diko Parvanov <me@dparv.dev>
Suggested-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
---
 northd/northd.c | 17 ++++++++++-------
 tests/ovn.at    | 15 ++-------------
 2 files changed, 12 insertions(+), 20 deletions(-)

Comments

Dumitru Ceara May 23, 2022, 10:15 p.m. UTC | #1
On 5/3/22 17:15, Alin-Gabriel Serdean wrote:
> Add localnet ports to their corresponding southbound multicast group.
> 
> Update unit tests.
> 
> Reported-at: https://github.com/ovn-org/ovn/issues/125
> Reported-by: Diko Parvanov <me@dparv.dev>
> Suggested-by: Dumitru Ceara <dceara@redhat.com>
> Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
> ---

Hi Alin,

Thanks for the patch!

>  northd/northd.c | 17 ++++++++++-------
>  tests/ovn.at    | 15 ++-------------
>  2 files changed, 12 insertions(+), 20 deletions(-)
> 
> diff --git a/northd/northd.c b/northd/northd.c
> index a56666297..4441ef631 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -4591,13 +4591,6 @@ ovn_igmp_group_aggregate_ports(struct ovn_igmp_group *igmp_group,
>          ovn_igmp_group_destroy_entry(entry);
>          free(entry);
>      }
> -
> -    if (igmp_group->datapath->n_localnet_ports) {
> -        ovn_multicast_add_ports(mcast_groups, igmp_group->datapath,
> -                                &igmp_group->mcgroup,
> -                                igmp_group->datapath->localnet_ports,
> -                                igmp_group->datapath->n_localnet_ports);
> -    }
>  }
>  
>  static void
> @@ -15066,6 +15059,16 @@ build_mcast_groups(struct lflow_input *input_data,
>  
>          /* Add the extracted ports to the IGMP group. */
>          ovn_igmp_group_add_entry(igmp_group, igmp_ports, n_igmp_ports);
> +        if (!ovn_igmp_group_allocate_id(igmp_group)) {
> +            ovn_igmp_group_destroy(igmp_groups, igmp_group);
> +            continue;
> +        }
> +        if (igmp_group->datapath->n_localnet_ports) {
> +            ovn_multicast_add_ports(mcast_groups, igmp_group->datapath,
> +                                    &igmp_group->mcgroup,
> +                                    igmp_group->datapath->localnet_ports,
> +                                    igmp_group->datapath->n_localnet_ports);
> +        }

I'm not sure I understand how this changes things.  After looking at the
main branch code more closely, we already include all localnet ports in
the multicast_group generated for an IGMP group.  We just do it at the
end of build_mcast_groups(), in ovn_igmp_group_aggregate_ports() which
is called once for every IGMP Group learnt in a logical datapath.

>      }
>  
>      /* Build IGMP groups for multicast routers with relay enabled. The router
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 34b6abfc0..5440639a8 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -19570,7 +19570,8 @@ ovn-nbctl lsp-add sw3 sw3-p1
>  ovn-nbctl lsp-add sw3 sw3-p2
>  ovn-nbctl lsp-add sw3 sw3-ln                    \
>      -- lsp-set-type sw3-ln localnet             \
> -    -- lsp-set-options sw3-ln network_name=phys
> +    -- lsp-set-options sw3-ln network_name=phys \
> +    -- lsp-set-options sw3-ln mcast_flood=true

This threw me off track for a bit.. "lsp-set-options <port> <options>"
will actually overwrite all options of <port> with <options>.  Which
most likely is not what we want here because we would be removing the
network_name, essentially disconnecting the localnet port.

If I understand the intention correctly, you should probably use:

-- lsp-set-options sw3-ln network_name=phys mcast_flood=true

>  
>  ovn-nbctl lr-add rtr
>  ovn-nbctl lrp-add rtr rtr-sw1 00:00:00:00:01:00 10.0.0.254/24
> @@ -19985,18 +19986,6 @@ store_ip_multicast_pkt \
>      $(ip_to_hex 10 0 0 42) $(ip_to_hex 239 0 1 68) 1e 20 ca70 11 \
>      e518e518000a3b3a0000 expected_switched
>  
> -# TODO: IGMP Relay duplicates IP multicast packets in some conditions, for
> -# more details see TODO.rst, section "IP Multicast Relay". Once that issue is
> -# fixed the duplicated packets should not appear anymore.
> -store_ip_multicast_pkt \
> -    000000000100 01005e000144 \
> -    $(ip_to_hex 10 0 0 42) $(ip_to_hex 239 0 1 68) 1e 1f cb70 11 \
> -    e518e518000a3b3a0000 expected_routed_sw1
> -store_ip_multicast_pkt \
> -    000000000200 01005e000144 \
> -    $(ip_to_hex 10 0 0 42) $(ip_to_hex 239 0 1 68) 1e 1f cb70 11 \
> -    e518e518000a3b3a0000 expected_routed_sw2
> -

After changing the test to properly set mcast_flood=true on the localnet
port, it starts failing due to the duplicated routed packets (the TODO
above).  There was no logical change in northd.c as far as I can tell,
so that makes sense.

>  OVS_WAIT_UNTIL(
>    [check_packets 'hv1/vif1-tx.pcap expected_routed_sw1' \
>                   'hv2/vif3-tx.pcap expected_routed_sw2' \

Then I tried to replicate the issue Diko reported but I couldn't.  The
databases attached to the GitHub issue are truncated, I get:

ovsdb-tool: syntax error: ovnnb_db.db: 731669 bytes starting at offset
62 have SHA-1 hash 161c2831d154dce19aeb316b5d95de5db66988ac but should
have hash 0d9bc5cb94dcce9f58bb99c5588b249b551c27cd

And I tried setting up a unit test to do exactly what is reported in the
GitHub issue but regardless of mcast_flood value I always get multicast
traffic generated from an "internal" source properly forwarded to the
"external" subscribers that I created and attached to the provider
network simulated in the test.

Diko, do you still have an environment where you hit this issue?  Would
it be possible to share the NB/SB database files again, making sure
they're not truncated?

Thanks,
Dumitru
Alin Serdean May 23, 2022, 11:13 p.m. UTC | #2
> On 24 May 2022, at 00:15, Dumitru Ceara <dceara@redhat.com> wrote:
> 
> On 5/3/22 17:15, Alin-Gabriel Serdean wrote:
>> Add localnet ports to their corresponding southbound multicast group.
>> 
>> Update unit tests.
>> 
>> Reported-at: https://github.com/ovn-org/ovn/issues/125
>> Reported-by: Diko Parvanov <me@dparv.dev>
>> Suggested-by: Dumitru Ceara <dceara@redhat.com>
>> Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
>> ---
> 
> Hi Alin,
> 
> Thanks for the patch!

Hi Dumitru,

Thanks for the review!
> 
>> northd/northd.c | 17 ++++++++++-------
>> tests/ovn.at    | 15 ++-------------
>> 2 files changed, 12 insertions(+), 20 deletions(-)
>> 
>> diff --git a/northd/northd.c b/northd/northd.c
>> index a56666297..4441ef631 100644
>> --- a/northd/northd.c
>> +++ b/northd/northd.c
>> @@ -4591,13 +4591,6 @@ ovn_igmp_group_aggregate_ports(struct ovn_igmp_group *igmp_group,
>>         ovn_igmp_group_destroy_entry(entry);
>>         free(entry);
>>     }
>> -
>> -    if (igmp_group->datapath->n_localnet_ports) {
>> -        ovn_multicast_add_ports(mcast_groups, igmp_group->datapath,
>> -                                &igmp_group->mcgroup,
>> -                                igmp_group->datapath->localnet_ports,
>> -                                igmp_group->datapath->n_localnet_ports);
>> -    }
>> }
>> 
>> static void
>> @@ -15066,6 +15059,16 @@ build_mcast_groups(struct lflow_input *input_data,
>> 
>>         /* Add the extracted ports to the IGMP group. */
>>         ovn_igmp_group_add_entry(igmp_group, igmp_ports, n_igmp_ports);
>> +        if (!ovn_igmp_group_allocate_id(igmp_group)) {
>> +            ovn_igmp_group_destroy(igmp_groups, igmp_group);
>> +            continue;
>> +        }
>> +        if (igmp_group->datapath->n_localnet_ports) {
>> +            ovn_multicast_add_ports(mcast_groups, igmp_group->datapath,
>> +                                    &igmp_group->mcgroup,
>> +                                    igmp_group->datapath->localnet_ports,
>> +                                    igmp_group->datapath->n_localnet_ports);
>> +        }
> 
> I'm not sure I understand how this changes things.  After looking at the
> main branch code more closely, we already include all localnet ports in
> the multicast_group generated for an IGMP group.  We just do it at the
> end of build_mcast_groups(), in ovn_igmp_group_aggregate_ports() which
> is called once for every IGMP Group learnt in a logical datapath.

Indeed I got confused between branches and I didn’t saw the ovn_igmp_group_aggregate_ports.

>>     }
>> 
>>     /* Build IGMP groups for multicast routers with relay enabled. The router
>> diff --git a/tests/ovn.at b/tests/ovn.at
>> index 34b6abfc0..5440639a8 100644
>> --- a/tests/ovn.at
>> +++ b/tests/ovn.at
>> @@ -19570,7 +19570,8 @@ ovn-nbctl lsp-add sw3 sw3-p1
>> ovn-nbctl lsp-add sw3 sw3-p2
>> ovn-nbctl lsp-add sw3 sw3-ln                    \
>>     -- lsp-set-type sw3-ln localnet             \
>> -    -- lsp-set-options sw3-ln network_name=phys
>> +    -- lsp-set-options sw3-ln network_name=phys \
>> +    -- lsp-set-options sw3-ln mcast_flood=true
> 
> This threw me off track for a bit.. "lsp-set-options <port> <options>"
> will actually overwrite all options of <port> with <options>.  Which
> most likely is not what we want here because we would be removing the
> network_name, essentially disconnecting the localnet port.
> 
> If I understand the intention correctly, you should probably use:
> 
> -- lsp-set-options sw3-ln network_name=phys mcast_flood=true

You are correct. This actually made me think I was changing things.

> 
>> 
>> ovn-nbctl lr-add rtr
>> ovn-nbctl lrp-add rtr rtr-sw1 00:00:00:00:01:00 10.0.0.254/24
>> @@ -19985,18 +19986,6 @@ store_ip_multicast_pkt \
>>     $(ip_to_hex 10 0 0 42) $(ip_to_hex 239 0 1 68) 1e 20 ca70 11 \
>>     e518e518000a3b3a0000 expected_switched
>> 
>> -# TODO: IGMP Relay duplicates IP multicast packets in some conditions, for
>> -# more details see TODO.rst, section "IP Multicast Relay". Once that issue is
>> -# fixed the duplicated packets should not appear anymore.
>> -store_ip_multicast_pkt \
>> -    000000000100 01005e000144 \
>> -    $(ip_to_hex 10 0 0 42) $(ip_to_hex 239 0 1 68) 1e 1f cb70 11 \
>> -    e518e518000a3b3a0000 expected_routed_sw1
>> -store_ip_multicast_pkt \
>> -    000000000200 01005e000144 \
>> -    $(ip_to_hex 10 0 0 42) $(ip_to_hex 239 0 1 68) 1e 1f cb70 11 \
>> -    e518e518000a3b3a0000 expected_routed_sw2
>> -
> 
> After changing the test to properly set mcast_flood=true on the localnet
> port, it starts failing due to the duplicated routed packets (the TODO
> above).  There was no logical change in northd.c as far as I can tell,
> so that makes sense.
> 
>> OVS_WAIT_UNTIL(
>>   [check_packets 'hv1/vif1-tx.pcap expected_routed_sw1' \
>>                  'hv2/vif3-tx.pcap expected_routed_sw2' \
> 
> Then I tried to replicate the issue Diko reported but I couldn't.  The
> databases attached to the GitHub issue are truncated, I get:
> 
> ovsdb-tool: syntax error: ovnnb_db.db: 731669 bytes starting at offset
> 62 have SHA-1 hash 161c2831d154dce19aeb316b5d95de5db66988ac but should
> have hash 0d9bc5cb94dcce9f58bb99c5588b249b551c27cd
> 
> And I tried setting up a unit test to do exactly what is reported in the
> GitHub issue but regardless of mcast_flood value I always get multicast
> traffic generated from an "internal" source properly forwarded to the
> "external" subscribers that I created and attached to the provider
> network simulated in the test.
> 
> Diko, do you still have an environment where you hit this issue?  Would
> it be possible to share the NB/SB database files again, making sure
> they're not truncated?

FWIW I think the problem with the database is that it’s a 20.03 version of the schema.

After looking a bit over relevant IGMP fixes I think we are missing this commit on our end:

https://patchwork.ozlabs.org/project/ovn/patch/20211015144315.1416454-1-ihrachys@redhat.com/

Sorry for the noise with the patch.

> 
> Thanks,
> Dumitru
>
Dumitru Ceara May 24, 2022, 7:42 a.m. UTC | #3
On 5/24/22 01:13, Alin Serdean wrote:
> 
> 
>> On 24 May 2022, at 00:15, Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 5/3/22 17:15, Alin-Gabriel Serdean wrote:
>>> Add localnet ports to their corresponding southbound multicast group.
>>>
>>> Update unit tests.
>>>
>>> Reported-at: https://github.com/ovn-org/ovn/issues/125
>>> Reported-by: Diko Parvanov <me@dparv.dev>
>>> Suggested-by: Dumitru Ceara <dceara@redhat.com>
>>> Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
>>> ---
>>
>> Hi Alin,
>>
>> Thanks for the patch!
> 
> Hi Dumitru,
> 
> Thanks for the review!
>>
>>> northd/northd.c | 17 ++++++++++-------
>>> tests/ovn.at    | 15 ++-------------
>>> 2 files changed, 12 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/northd/northd.c b/northd/northd.c
>>> index a56666297..4441ef631 100644
>>> --- a/northd/northd.c
>>> +++ b/northd/northd.c
>>> @@ -4591,13 +4591,6 @@ ovn_igmp_group_aggregate_ports(struct ovn_igmp_group *igmp_group,
>>>         ovn_igmp_group_destroy_entry(entry);
>>>         free(entry);
>>>     }
>>> -
>>> -    if (igmp_group->datapath->n_localnet_ports) {
>>> -        ovn_multicast_add_ports(mcast_groups, igmp_group->datapath,
>>> -                                &igmp_group->mcgroup,
>>> -                                igmp_group->datapath->localnet_ports,
>>> -                                igmp_group->datapath->n_localnet_ports);
>>> -    }
>>> }
>>>
>>> static void
>>> @@ -15066,6 +15059,16 @@ build_mcast_groups(struct lflow_input *input_data,
>>>
>>>         /* Add the extracted ports to the IGMP group. */
>>>         ovn_igmp_group_add_entry(igmp_group, igmp_ports, n_igmp_ports);
>>> +        if (!ovn_igmp_group_allocate_id(igmp_group)) {
>>> +            ovn_igmp_group_destroy(igmp_groups, igmp_group);
>>> +            continue;
>>> +        }
>>> +        if (igmp_group->datapath->n_localnet_ports) {
>>> +            ovn_multicast_add_ports(mcast_groups, igmp_group->datapath,
>>> +                                    &igmp_group->mcgroup,
>>> +                                    igmp_group->datapath->localnet_ports,
>>> +                                    igmp_group->datapath->n_localnet_ports);
>>> +        }
>>
>> I'm not sure I understand how this changes things.  After looking at the
>> main branch code more closely, we already include all localnet ports in
>> the multicast_group generated for an IGMP group.  We just do it at the
>> end of build_mcast_groups(), in ovn_igmp_group_aggregate_ports() which
>> is called once for every IGMP Group learnt in a logical datapath.
> 
> Indeed I got confused between branches and I didn’t saw the ovn_igmp_group_aggregate_ports.
> 
>>>     }
>>>
>>>     /* Build IGMP groups for multicast routers with relay enabled. The router
>>> diff --git a/tests/ovn.at b/tests/ovn.at
>>> index 34b6abfc0..5440639a8 100644
>>> --- a/tests/ovn.at
>>> +++ b/tests/ovn.at
>>> @@ -19570,7 +19570,8 @@ ovn-nbctl lsp-add sw3 sw3-p1
>>> ovn-nbctl lsp-add sw3 sw3-p2
>>> ovn-nbctl lsp-add sw3 sw3-ln                    \
>>>     -- lsp-set-type sw3-ln localnet             \
>>> -    -- lsp-set-options sw3-ln network_name=phys
>>> +    -- lsp-set-options sw3-ln network_name=phys \
>>> +    -- lsp-set-options sw3-ln mcast_flood=true
>>
>> This threw me off track for a bit.. "lsp-set-options <port> <options>"
>> will actually overwrite all options of <port> with <options>.  Which
>> most likely is not what we want here because we would be removing the
>> network_name, essentially disconnecting the localnet port.
>>
>> If I understand the intention correctly, you should probably use:
>>
>> -- lsp-set-options sw3-ln network_name=phys mcast_flood=true
> 
> You are correct. This actually made me think I was changing things.
> 
>>
>>>
>>> ovn-nbctl lr-add rtr
>>> ovn-nbctl lrp-add rtr rtr-sw1 00:00:00:00:01:00 10.0.0.254/24
>>> @@ -19985,18 +19986,6 @@ store_ip_multicast_pkt \
>>>     $(ip_to_hex 10 0 0 42) $(ip_to_hex 239 0 1 68) 1e 20 ca70 11 \
>>>     e518e518000a3b3a0000 expected_switched
>>>
>>> -# TODO: IGMP Relay duplicates IP multicast packets in some conditions, for
>>> -# more details see TODO.rst, section "IP Multicast Relay". Once that issue is
>>> -# fixed the duplicated packets should not appear anymore.
>>> -store_ip_multicast_pkt \
>>> -    000000000100 01005e000144 \
>>> -    $(ip_to_hex 10 0 0 42) $(ip_to_hex 239 0 1 68) 1e 1f cb70 11 \
>>> -    e518e518000a3b3a0000 expected_routed_sw1
>>> -store_ip_multicast_pkt \
>>> -    000000000200 01005e000144 \
>>> -    $(ip_to_hex 10 0 0 42) $(ip_to_hex 239 0 1 68) 1e 1f cb70 11 \
>>> -    e518e518000a3b3a0000 expected_routed_sw2
>>> -
>>
>> After changing the test to properly set mcast_flood=true on the localnet
>> port, it starts failing due to the duplicated routed packets (the TODO
>> above).  There was no logical change in northd.c as far as I can tell,
>> so that makes sense.
>>
>>> OVS_WAIT_UNTIL(
>>>   [check_packets 'hv1/vif1-tx.pcap expected_routed_sw1' \
>>>                  'hv2/vif3-tx.pcap expected_routed_sw2' \
>>
>> Then I tried to replicate the issue Diko reported but I couldn't.  The
>> databases attached to the GitHub issue are truncated, I get:
>>
>> ovsdb-tool: syntax error: ovnnb_db.db: 731669 bytes starting at offset
>> 62 have SHA-1 hash 161c2831d154dce19aeb316b5d95de5db66988ac but should
>> have hash 0d9bc5cb94dcce9f58bb99c5588b249b551c27cd
>>
>> And I tried setting up a unit test to do exactly what is reported in the
>> GitHub issue but regardless of mcast_flood value I always get multicast
>> traffic generated from an "internal" source properly forwarded to the
>> "external" subscribers that I created and attached to the provider
>> network simulated in the test.
>>
>> Diko, do you still have an environment where you hit this issue?  Would
>> it be possible to share the NB/SB database files again, making sure
>> they're not truncated?
> 
> FWIW I think the problem with the database is that it’s a 20.03 version of the schema.
> 

Maybe, although I was running "ovsdb-tool cluster-to-standalone" which
should be schema agnostic.

> After looking a bit over relevant IGMP fixes I think we are missing this commit on our end:
> 
> https://patchwork.ozlabs.org/project/ovn/patch/20211015144315.1416454-1-ihrachys@redhat.com/
> 

Maybe, however this patch above is for localports which are not the same
as localnet ports.  There might still be other, not yet fixed, issues.
Please let me know if you still hit the problem with newer OVN code.

> Sorry for the noise with the patch.
> 

No problem at all. :)

>>
>> Thanks,
>> Dumitru
>>
>
Alin-Gabriel Serdean May 24, 2022, 8:05 a.m. UTC | #4
-----Original Message-----
From: Dumitru Ceara <dceara@redhat.com> 
Sent: Tuesday, May 24, 2022 10:42 AM
To: Alin Serdean <alinserdean@gmail.com>
Cc: Alin-Gabriel Serdean <aserdean@ovn.org>; dev@openvswitch.org; Diko Parvanov <me@dparv.dev>
Subject: Re: [PATCH] [RFC] [ovn] northd: Fix IGMP external subscriber subscribing to internal feed

On 5/24/22 01:13, Alin Serdean wrote:
> 
> 
>> On 24 May 2022, at 00:15, Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 5/3/22 17:15, Alin-Gabriel Serdean wrote:
>>> Add localnet ports to their corresponding southbound multicast group.
>>>
>>> Update unit tests.
>>>
>>> Reported-at: https://github.com/ovn-org/ovn/issues/125
>>> Reported-by: Diko Parvanov <me@dparv.dev>
>>> Suggested-by: Dumitru Ceara <dceara@redhat.com>
>>> Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
>>> ---
>>
>> Hi Alin,
>>
>> Thanks for the patch!
> 
> Hi Dumitru,
> 
> Thanks for the review!
>>
>>> northd/northd.c | 17 ++++++++++-------
>>> tests/ovn.at    | 15 ++-------------
>>> 2 files changed, 12 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/northd/northd.c b/northd/northd.c index 
>>> a56666297..4441ef631 100644
>>> --- a/northd/northd.c
>>> +++ b/northd/northd.c
>>> @@ -4591,13 +4591,6 @@ ovn_igmp_group_aggregate_ports(struct ovn_igmp_group *igmp_group,
>>>         ovn_igmp_group_destroy_entry(entry);
>>>         free(entry);
>>>     }
>>> -
>>> -    if (igmp_group->datapath->n_localnet_ports) {
>>> -        ovn_multicast_add_ports(mcast_groups, igmp_group->datapath,
>>> -                                &igmp_group->mcgroup,
>>> -                                igmp_group->datapath->localnet_ports,
>>> -                                igmp_group->datapath->n_localnet_ports);
>>> -    }
>>> }
>>>
>>> static void
>>> @@ -15066,6 +15059,16 @@ build_mcast_groups(struct lflow_input 
>>> *input_data,
>>>
>>>         /* Add the extracted ports to the IGMP group. */
>>>         ovn_igmp_group_add_entry(igmp_group, igmp_ports, 
>>> n_igmp_ports);
>>> +        if (!ovn_igmp_group_allocate_id(igmp_group)) {
>>> +            ovn_igmp_group_destroy(igmp_groups, igmp_group);
>>> +            continue;
>>> +        }
>>> +        if (igmp_group->datapath->n_localnet_ports) {
>>> +            ovn_multicast_add_ports(mcast_groups, igmp_group->datapath,
>>> +                                    &igmp_group->mcgroup,
>>> +                                    igmp_group->datapath->localnet_ports,
>>> +                                    igmp_group->datapath->n_localnet_ports);
>>> +        }
>>
>> I'm not sure I understand how this changes things.  After looking at 
>> the main branch code more closely, we already include all localnet 
>> ports in the multicast_group generated for an IGMP group.  We just do 
>> it at the end of build_mcast_groups(), in 
>> ovn_igmp_group_aggregate_ports() which is called once for every IGMP Group learnt in a logical datapath.
> 
> Indeed I got confused between branches and I didn’t saw the ovn_igmp_group_aggregate_ports.
> 
>>>     }
>>>
>>>     /* Build IGMP groups for multicast routers with relay enabled. 
>>> The router diff --git a/tests/ovn.at b/tests/ovn.at index 
>>> 34b6abfc0..5440639a8 100644
>>> --- a/tests/ovn.at
>>> +++ b/tests/ovn.at
>>> @@ -19570,7 +19570,8 @@ ovn-nbctl lsp-add sw3 sw3-p1 ovn-nbctl 
>>> lsp-add sw3 sw3-p2
>>> ovn-nbctl lsp-add sw3 sw3-ln                    \
>>>     -- lsp-set-type sw3-ln localnet             \
>>> -    -- lsp-set-options sw3-ln network_name=phys
>>> +    -- lsp-set-options sw3-ln network_name=phys \
>>> +    -- lsp-set-options sw3-ln mcast_flood=true
>>
>> This threw me off track for a bit.. "lsp-set-options <port> <options>"
>> will actually overwrite all options of <port> with <options>.  Which 
>> most likely is not what we want here because we would be removing the 
>> network_name, essentially disconnecting the localnet port.
>>
>> If I understand the intention correctly, you should probably use:
>>
>> -- lsp-set-options sw3-ln network_name=phys mcast_flood=true
> 
> You are correct. This actually made me think I was changing things.
> 
>>
>>>
>>> ovn-nbctl lr-add rtr
>>> ovn-nbctl lrp-add rtr rtr-sw1 00:00:00:00:01:00 10.0.0.254/24 @@ 
>>> -19985,18 +19986,6 @@ store_ip_multicast_pkt \
>>>     $(ip_to_hex 10 0 0 42) $(ip_to_hex 239 0 1 68) 1e 20 ca70 11 \
>>>     e518e518000a3b3a0000 expected_switched
>>>
>>> -# TODO: IGMP Relay duplicates IP multicast packets in some 
>>> conditions, for -# more details see TODO.rst, section "IP Multicast 
>>> Relay". Once that issue is -# fixed the duplicated packets should not appear anymore.
>>> -store_ip_multicast_pkt \
>>> -    000000000100 01005e000144 \
>>> -    $(ip_to_hex 10 0 0 42) $(ip_to_hex 239 0 1 68) 1e 1f cb70 11 \
>>> -    e518e518000a3b3a0000 expected_routed_sw1
>>> -store_ip_multicast_pkt \
>>> -    000000000200 01005e000144 \
>>> -    $(ip_to_hex 10 0 0 42) $(ip_to_hex 239 0 1 68) 1e 1f cb70 11 \
>>> -    e518e518000a3b3a0000 expected_routed_sw2
>>> -
>>
>> After changing the test to properly set mcast_flood=true on the 
>> localnet port, it starts failing due to the duplicated routed packets 
>> (the TODO above).  There was no logical change in northd.c as far as 
>> I can tell, so that makes sense.
>>
>>> OVS_WAIT_UNTIL(
>>>   [check_packets 'hv1/vif1-tx.pcap expected_routed_sw1' \
>>>                  'hv2/vif3-tx.pcap expected_routed_sw2' \
>>
>> Then I tried to replicate the issue Diko reported but I couldn't.  
>> The databases attached to the GitHub issue are truncated, I get:
>>
>> ovsdb-tool: syntax error: ovnnb_db.db: 731669 bytes starting at 
>> offset
>> 62 have SHA-1 hash 161c2831d154dce19aeb316b5d95de5db66988ac but 
>> should have hash 0d9bc5cb94dcce9f58bb99c5588b249b551c27cd
>>
>> And I tried setting up a unit test to do exactly what is reported in 
>> the GitHub issue but regardless of mcast_flood value I always get 
>> multicast traffic generated from an "internal" source properly 
>> forwarded to the "external" subscribers that I created and attached 
>> to the provider network simulated in the test.
>>
>> Diko, do you still have an environment where you hit this issue?  
>> Would it be possible to share the NB/SB database files again, making 
>> sure they're not truncated?
> 
> FWIW I think the problem with the database is that it’s a 20.03 version of the schema.
> 

Maybe, although I was running "ovsdb-tool cluster-to-standalone" which should be schema agnostic.

[AS] We will try to convert into standalone and a newer DB and upload it.

> After looking a bit over relevant IGMP fixes I think we are missing this commit on our end:
> 
> https://patchwork.ozlabs.org/project/ovn/patch/20211015144315.1416454-
> 1-ihrachys@redhat.com/
> 

Maybe, however this patch above is for localports which are not the same as localnet ports.  There might still be other, not yet fixed, issues.
Please let me know if you still hit the problem with newer OVN code.

[AS] I will try to reproduce the issue on the latest codebase and let you know.

> Sorry for the noise with the patch.
> 

No problem at all. :)

>>
>> Thanks,
>> Dumitru
>>
>
diff mbox series

Patch

diff --git a/northd/northd.c b/northd/northd.c
index a56666297..4441ef631 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -4591,13 +4591,6 @@  ovn_igmp_group_aggregate_ports(struct ovn_igmp_group *igmp_group,
         ovn_igmp_group_destroy_entry(entry);
         free(entry);
     }
-
-    if (igmp_group->datapath->n_localnet_ports) {
-        ovn_multicast_add_ports(mcast_groups, igmp_group->datapath,
-                                &igmp_group->mcgroup,
-                                igmp_group->datapath->localnet_ports,
-                                igmp_group->datapath->n_localnet_ports);
-    }
 }
 
 static void
@@ -15066,6 +15059,16 @@  build_mcast_groups(struct lflow_input *input_data,
 
         /* Add the extracted ports to the IGMP group. */
         ovn_igmp_group_add_entry(igmp_group, igmp_ports, n_igmp_ports);
+        if (!ovn_igmp_group_allocate_id(igmp_group)) {
+            ovn_igmp_group_destroy(igmp_groups, igmp_group);
+            continue;
+        }
+        if (igmp_group->datapath->n_localnet_ports) {
+            ovn_multicast_add_ports(mcast_groups, igmp_group->datapath,
+                                    &igmp_group->mcgroup,
+                                    igmp_group->datapath->localnet_ports,
+                                    igmp_group->datapath->n_localnet_ports);
+        }
     }
 
     /* Build IGMP groups for multicast routers with relay enabled. The router
diff --git a/tests/ovn.at b/tests/ovn.at
index 34b6abfc0..5440639a8 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -19570,7 +19570,8 @@  ovn-nbctl lsp-add sw3 sw3-p1
 ovn-nbctl lsp-add sw3 sw3-p2
 ovn-nbctl lsp-add sw3 sw3-ln                    \
     -- lsp-set-type sw3-ln localnet             \
-    -- lsp-set-options sw3-ln network_name=phys
+    -- lsp-set-options sw3-ln network_name=phys \
+    -- lsp-set-options sw3-ln mcast_flood=true
 
 ovn-nbctl lr-add rtr
 ovn-nbctl lrp-add rtr rtr-sw1 00:00:00:00:01:00 10.0.0.254/24
@@ -19985,18 +19986,6 @@  store_ip_multicast_pkt \
     $(ip_to_hex 10 0 0 42) $(ip_to_hex 239 0 1 68) 1e 20 ca70 11 \
     e518e518000a3b3a0000 expected_switched
 
-# TODO: IGMP Relay duplicates IP multicast packets in some conditions, for
-# more details see TODO.rst, section "IP Multicast Relay". Once that issue is
-# fixed the duplicated packets should not appear anymore.
-store_ip_multicast_pkt \
-    000000000100 01005e000144 \
-    $(ip_to_hex 10 0 0 42) $(ip_to_hex 239 0 1 68) 1e 1f cb70 11 \
-    e518e518000a3b3a0000 expected_routed_sw1
-store_ip_multicast_pkt \
-    000000000200 01005e000144 \
-    $(ip_to_hex 10 0 0 42) $(ip_to_hex 239 0 1 68) 1e 1f cb70 11 \
-    e518e518000a3b3a0000 expected_routed_sw2
-
 OVS_WAIT_UNTIL(
   [check_packets 'hv1/vif1-tx.pcap expected_routed_sw1' \
                  'hv2/vif3-tx.pcap expected_routed_sw2' \