diff mbox series

[net-next,09/14] net: bridge: Propagate MC addresses with VID through switchdev

Message ID 20190116200102.2749-10-f.fainelli@gmail.com
State Changes Requested
Delegated to: David Miller
Headers show
Series net: dsa: management mode for bcm_sf2 | expand

Commit Message

Florian Fainelli Jan. 16, 2019, 8 p.m. UTC
In order for bridge port members to get a chance to implement unicast
and multicast address filtering correctly, which would matter for e.g:
switch network devices, synchronize the UC and MC lists down to the
individual bridge port members using switchdev HOST_MDB objects such
that this does not impact drivers that already have a ndo_set_rx_mode()
operation which likely already operate in promiscuous mode.

When the bridge has multicast snooping enabled, proper HOST_MDB
notifications will be sent through br_mdb_notify() already.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 net/bridge/br_device.c | 55 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

Comments

Ido Schimmel Jan. 17, 2019, 2:05 p.m. UTC | #1
On Wed, Jan 16, 2019 at 12:00:57PM -0800, Florian Fainelli wrote:
> In order for bridge port members to get a chance to implement unicast
> and multicast address filtering correctly, which would matter for e.g:
> switch network devices, synchronize the UC and MC lists down to the
> individual bridge port members using switchdev HOST_MDB objects such
> that this does not impact drivers that already have a ndo_set_rx_mode()
> operation which likely already operate in promiscuous mode.
> 
> When the bridge has multicast snooping enabled, proper HOST_MDB
> notifications will be sent through br_mdb_notify() already.

I don't understand the change. HOST_MDB is used to notify underlying
drivers about MDB entries that should be configured to locally receive
packets. This is triggered by the transmission of an IGMP report through
the bridge, for example.

It seems that you're trying to overload HOST_MDB with multicast address
filtering on bridge ports? Why are you performing this filtering?
Shouldn't you allow all MAC addresses to ingress?
Florian Fainelli Jan. 17, 2019, 7:17 p.m. UTC | #2
On 1/17/19 6:05 AM, Ido Schimmel wrote:
> On Wed, Jan 16, 2019 at 12:00:57PM -0800, Florian Fainelli wrote:
>> In order for bridge port members to get a chance to implement unicast
>> and multicast address filtering correctly, which would matter for e.g:
>> switch network devices, synchronize the UC and MC lists down to the
>> individual bridge port members using switchdev HOST_MDB objects such
>> that this does not impact drivers that already have a ndo_set_rx_mode()
>> operation which likely already operate in promiscuous mode.
>>
>> When the bridge has multicast snooping enabled, proper HOST_MDB
>> notifications will be sent through br_mdb_notify() already.
> 
> I don't understand the change. HOST_MDB is used to notify underlying
> drivers about MDB entries that should be configured to locally receive
> packets. This is triggered by the transmission of an IGMP report through
> the bridge, for example.
> 
> It seems that you're trying to overload HOST_MDB with multicast address
> filtering on bridge ports?

I don't really think this is an abuse of HOST_MDB, since in case the
bridge has multicast_snooping enabled, and there is e.g: a multicast
application bound to the bridge master device, we would get those
notifications through HOST_MDB already. This is the same use case that I
am addressing here, ndo_set_rx_mode() learns about local multicast
addresses that should be programmed, which means there is a multicast
application listening on the bridge master device itself.

The problem that I want to solve is that with Broadcom b53/bcm_sf2
switches, we cannot easily filter/flood multicast for the CPU/management
port.

We have per-port controls for MC/IPMC flooding, and we also have a
separate control for CPU/management port receiving multicast. If either
of these two bits/settings are configured, then the CPU port will always
receive multicast, even when we should be filtering it in HW. The only
way to perform selective reception of multicast to the CPU port is to
program a corresponding MDB entry.

>Why are you performing this filtering?

If I do not filter, then non-bridged ports on which there is no
multicast application bound to would be passing up multicast traffic all
the way to the CPU port, which then has to be dropped in software. This
is not acceptable IMHO because it is a deviation from how a standalone
NIC supporting multicast filtering would operate.

> Shouldn't you allow all MAC addresses to ingress?

I do allow all MC addresses to ingress on the front-panel switch ports
(while honoring the multicast_snooping setting), but we have no control
over what the CPU/management port should be doing.

As I wrote earlier, if we flood to the CPU/management port, because
there is at least one switch device port, in the bridge, and that bridge
has multicast_snooping disabled, then this could break filtering for
other, non-bridged ports. That is really not acceptable IMHO.

The reason why I chose switchdev HOST_MDB notification here are two fold:

- this is the same use case as with multicast_snooping=1 and we target
the CPU port within DSA to resolve that use case, so from the switch
driver perspective, there is no difference in the context

- this does not impact network device drivers that have a
ndo_set_rx_mode() and somehow decide to support things through that API
since those would typically have a switchdev_port_attr_set() callback

Let me know if this is not clear.
Ido Schimmel Jan. 18, 2019, 10:43 a.m. UTC | #3
On Thu, Jan 17, 2019 at 11:17:57AM -0800, Florian Fainelli wrote:
> On 1/17/19 6:05 AM, Ido Schimmel wrote:
> > On Wed, Jan 16, 2019 at 12:00:57PM -0800, Florian Fainelli wrote:
> >> In order for bridge port members to get a chance to implement unicast
> >> and multicast address filtering correctly, which would matter for e.g:
> >> switch network devices, synchronize the UC and MC lists down to the
> >> individual bridge port members using switchdev HOST_MDB objects such
> >> that this does not impact drivers that already have a ndo_set_rx_mode()
> >> operation which likely already operate in promiscuous mode.
> >>
> >> When the bridge has multicast snooping enabled, proper HOST_MDB
> >> notifications will be sent through br_mdb_notify() already.
> > 
> > I don't understand the change. HOST_MDB is used to notify underlying
> > drivers about MDB entries that should be configured to locally receive
> > packets. This is triggered by the transmission of an IGMP report through
> > the bridge, for example.
> > 
> > It seems that you're trying to overload HOST_MDB with multicast address
> > filtering on bridge ports?
> 
> I don't really think this is an abuse of HOST_MDB, since in case the
> bridge has multicast_snooping enabled, and there is e.g: a multicast
> application bound to the bridge master device, we would get those
> notifications through HOST_MDB already. This is the same use case that I
> am addressing here, ndo_set_rx_mode() learns about local multicast
> addresses that should be programmed, which means there is a multicast
> application listening on the bridge master device itself.
> 
> The problem that I want to solve is that with Broadcom b53/bcm_sf2
> switches, we cannot easily filter/flood multicast for the CPU/management
> port.
> 
> We have per-port controls for MC/IPMC flooding, and we also have a
> separate control for CPU/management port receiving multicast. If either
> of these two bits/settings are configured, then the CPU port will always
> receive multicast, even when we should be filtering it in HW. The only
> way to perform selective reception of multicast to the CPU port is to
> program a corresponding MDB entry.
> 
> >Why are you performing this filtering?
> 
> If I do not filter, then non-bridged ports on which there is no
> multicast application bound to would be passing up multicast traffic all
> the way to the CPU port, which then has to be dropped in software. This
> is not acceptable IMHO because it is a deviation from how a standalone
> NIC supporting multicast filtering would operate.
> 
> > Shouldn't you allow all MAC addresses to ingress?
> 
> I do allow all MC addresses to ingress on the front-panel switch ports
> (while honoring the multicast_snooping setting), but we have no control
> over what the CPU/management port should be doing.
> 
> As I wrote earlier, if we flood to the CPU/management port, because
> there is at least one switch device port, in the bridge, and that bridge
> has multicast_snooping disabled, then this could break filtering for
> other, non-bridged ports. That is really not acceptable IMHO.

When multicast snooping is disabled every multicast packet should be
flooded to the CPU port. Not only addresses configured on the bridge
device's address list. Check br_handle_frame_finish(), it sets
'local_rcv' to 'true'.

> 
> The reason why I chose switchdev HOST_MDB notification here are two fold:
> 
> - this is the same use case as with multicast_snooping=1 and we target
> the CPU port within DSA to resolve that use case, so from the switch
> driver perspective, there is no difference in the context
> 
> - this does not impact network device drivers that have a
> ndo_set_rx_mode() and somehow decide to support things through that API
> since those would typically have a switchdev_port_attr_set() callback
> 
> Let me know if this is not clear.
> -- 
> Florian
Ido Schimmel Jan. 18, 2019, 11:41 a.m. UTC | #4
On Thu, Jan 17, 2019 at 11:17:57AM -0800, Florian Fainelli wrote:
> On 1/17/19 6:05 AM, Ido Schimmel wrote:
> > On Wed, Jan 16, 2019 at 12:00:57PM -0800, Florian Fainelli wrote:
> >> In order for bridge port members to get a chance to implement unicast
> >> and multicast address filtering correctly, which would matter for e.g:
> >> switch network devices, synchronize the UC and MC lists down to the
> >> individual bridge port members using switchdev HOST_MDB objects such
> >> that this does not impact drivers that already have a ndo_set_rx_mode()
> >> operation which likely already operate in promiscuous mode.
> >>
> >> When the bridge has multicast snooping enabled, proper HOST_MDB
> >> notifications will be sent through br_mdb_notify() already.
> > 
> > I don't understand the change. HOST_MDB is used to notify underlying
> > drivers about MDB entries that should be configured to locally receive
> > packets. This is triggered by the transmission of an IGMP report through
> > the bridge, for example.
> > 
> > It seems that you're trying to overload HOST_MDB with multicast address
> > filtering on bridge ports?
> 
> I don't really think this is an abuse of HOST_MDB, since in case the
> bridge has multicast_snooping enabled, and there is e.g: a multicast
> application bound to the bridge master device, we would get those
> notifications through HOST_MDB already. This is the same use case that I
> am addressing here, ndo_set_rx_mode() learns about local multicast
> addresses that should be programmed, which means there is a multicast
> application listening on the bridge master device itself.
> 
> The problem that I want to solve is that with Broadcom b53/bcm_sf2
> switches, we cannot easily filter/flood multicast for the CPU/management
> port.
> 
> We have per-port controls for MC/IPMC flooding, and we also have a
> separate control for CPU/management port receiving multicast. If either
> of these two bits/settings are configured, then the CPU port will always
> receive multicast, even when we should be filtering it in HW. The only
> way to perform selective reception of multicast to the CPU port is to
> program a corresponding MDB entry.
> 
> >Why are you performing this filtering?
> 
> If I do not filter, then non-bridged ports on which there is no
> multicast application bound to would be passing up multicast traffic all
> the way to the CPU port, which then has to be dropped in software. This
> is not acceptable IMHO because it is a deviation from how a standalone
> NIC supporting multicast filtering would operate.
> 
> > Shouldn't you allow all MAC addresses to ingress?
> 
> I do allow all MC addresses to ingress on the front-panel switch ports
> (while honoring the multicast_snooping setting), but we have no control
> over what the CPU/management port should be doing.
> 
> As I wrote earlier, if we flood to the CPU/management port, because
> there is at least one switch device port, in the bridge, and that bridge
> has multicast_snooping disabled, then this could break filtering for
> other, non-bridged ports. That is really not acceptable IMHO.
> 
> The reason why I chose switchdev HOST_MDB notification here are two fold:
> 
> - this is the same use case as with multicast_snooping=1 and we target
> the CPU port within DSA to resolve that use case, so from the switch
> driver perspective, there is no difference in the context
> 
> - this does not impact network device drivers that have a
> ndo_set_rx_mode() and somehow decide to support things through that API
> since those would typically have a switchdev_port_attr_set() callback

HOST_MDB was added for a very specific use case. To allow the bridge
driver to notify underlying switch drivers about MDB entries that should
be programmed to locally receive packets when multicast is enabled.
Andrew described it very nicely in merge commit
5d37636abd15ace8686a54167b488364ee79e88d

Ingress filtering is something completely different and not applicable
to bridged ports that should allow every address to ingress.

switchdev allows to offload the bridge datapath to capable devices, but
you're abusing to it allow non-bridged ports to perform address
filtering. Completely unrelated.

Therefore, it seems completely inappropriate to me to use HOST_MDB for
this reason. This applies to patch #10 as well.

It really sounds like the HW you're working with is not designed to work
in this mixed state where some ports are bridged and some are expected to
act as standalone NICs.

If you're still determined to support this use case, I suggest the
following. In your driver, program the bridge's address list as MDB
entries when the first port is enslaved to it. Then, add a new netdev
event whenever an address is added / removed from this list (in
__dev_set_rx_mode() ?). Have your driver listen to it and program MDB
entries accordingly.
Florian Fainelli Jan. 18, 2019, 9:48 p.m. UTC | #5
On 1/18/19 3:41 AM, Ido Schimmel wrote:
> On Thu, Jan 17, 2019 at 11:17:57AM -0800, Florian Fainelli wrote:
>> On 1/17/19 6:05 AM, Ido Schimmel wrote:
>>> On Wed, Jan 16, 2019 at 12:00:57PM -0800, Florian Fainelli wrote:
>>>> In order for bridge port members to get a chance to implement unicast
>>>> and multicast address filtering correctly, which would matter for e.g:
>>>> switch network devices, synchronize the UC and MC lists down to the
>>>> individual bridge port members using switchdev HOST_MDB objects such
>>>> that this does not impact drivers that already have a ndo_set_rx_mode()
>>>> operation which likely already operate in promiscuous mode.
>>>>
>>>> When the bridge has multicast snooping enabled, proper HOST_MDB
>>>> notifications will be sent through br_mdb_notify() already.
>>>
>>> I don't understand the change. HOST_MDB is used to notify underlying
>>> drivers about MDB entries that should be configured to locally receive
>>> packets. This is triggered by the transmission of an IGMP report through
>>> the bridge, for example.
>>>
>>> It seems that you're trying to overload HOST_MDB with multicast address
>>> filtering on bridge ports?
>>
>> I don't really think this is an abuse of HOST_MDB, since in case the
>> bridge has multicast_snooping enabled, and there is e.g: a multicast
>> application bound to the bridge master device, we would get those
>> notifications through HOST_MDB already. This is the same use case that I
>> am addressing here, ndo_set_rx_mode() learns about local multicast
>> addresses that should be programmed, which means there is a multicast
>> application listening on the bridge master device itself.
>>
>> The problem that I want to solve is that with Broadcom b53/bcm_sf2
>> switches, we cannot easily filter/flood multicast for the CPU/management
>> port.
>>
>> We have per-port controls for MC/IPMC flooding, and we also have a
>> separate control for CPU/management port receiving multicast. If either
>> of these two bits/settings are configured, then the CPU port will always
>> receive multicast, even when we should be filtering it in HW. The only
>> way to perform selective reception of multicast to the CPU port is to
>> program a corresponding MDB entry.
>>
>>> Why are you performing this filtering?
>>
>> If I do not filter, then non-bridged ports on which there is no
>> multicast application bound to would be passing up multicast traffic all
>> the way to the CPU port, which then has to be dropped in software. This
>> is not acceptable IMHO because it is a deviation from how a standalone
>> NIC supporting multicast filtering would operate.
>>
>>> Shouldn't you allow all MAC addresses to ingress?
>>
>> I do allow all MC addresses to ingress on the front-panel switch ports
>> (while honoring the multicast_snooping setting), but we have no control
>> over what the CPU/management port should be doing.
>>
>> As I wrote earlier, if we flood to the CPU/management port, because
>> there is at least one switch device port, in the bridge, and that bridge
>> has multicast_snooping disabled, then this could break filtering for
>> other, non-bridged ports. That is really not acceptable IMHO.
>>
>> The reason why I chose switchdev HOST_MDB notification here are two fold:
>>
>> - this is the same use case as with multicast_snooping=1 and we target
>> the CPU port within DSA to resolve that use case, so from the switch
>> driver perspective, there is no difference in the context
>>
>> - this does not impact network device drivers that have a
>> ndo_set_rx_mode() and somehow decide to support things through that API
>> since those would typically have a switchdev_port_attr_set() callback
> 
> HOST_MDB was added for a very specific use case. To allow the bridge
> driver to notify underlying switch drivers about MDB entries that should
> be programmed to locally receive packets when multicast is enabled.
> Andrew described it very nicely in merge commit
> 5d37636abd15ace8686a54167b488364ee79e88d
> 
> Ingress filtering is something completely different and not applicable
> to bridged ports that should allow every address to ingress.

I actually made a mistake in this patch because there is no need to
iterate over the switch port members and generate a HOST_MDB
notification for each of them because what we want to target is the CPU
port, which DSA internally resolves for us anyway.

What we want to tell the switch HW here is basically: you have a
multicast application bound to the bridge master device, so please let
this MC address go through your CPU/management port. This is effectively
egress filtering at the CPU port side.

Because the bridge has multicast_snooping=false, the switch ports have
been configured to flood MC/IPMC already, but as I wrote, if we do that
for the CPU port, then we "break" non-bridge ports.

It seems to me that this is exactly the same use case that what Andrew
did originally, and drivers that are not pathological like mine can just
decide to ignore that notification and flood everything to the CPU port.
The end results would be the same from an end user perspective.

Do you still think this is too much of a stretch?

> 
> switchdev allows to offload the bridge datapath to capable devices, but
> you're abusing to it allow non-bridged ports to perform address
> filtering. Completely unrelated.
> 
> Therefore, it seems completely inappropriate to me to use HOST_MDB for
> this reason. This applies to patch #10 as well.
> 
> It really sounds like the HW you're working with is not designed to work
> in this mixed state where some ports are bridged and some are expected to
> act as standalone NICs.

That is quite true, the HW that I work with is limited, and does not
really play well with mixed port usage, but with the help of the network
stack and notifications, we can get very close, or even support it.

One thing that I forgot to explain is that the Ethernet MAC connected to
its internal bcm_sf2 switch, because it is only used with an integrated
switch has been greatly simplified, it does not support any type of
filtering and relies on the switch to do that. It effectively operates
in promiscuous mode all the time.

> 
> If you're still determined to support this use case, I suggest the
> following. In your driver, program the bridge's address list as MDB
> entries when the first port is enslaved to it. Then, add a new netdev
> event whenever an address is added / removed from this list (in
> __dev_set_rx_mode() ?). Have your driver listen to it and program MDB
> entries accordingly.
>
Ido Schimmel Jan. 19, 2019, 1:55 p.m. UTC | #6
On Fri, Jan 18, 2019 at 01:48:56PM -0800, Florian Fainelli wrote:
> On 1/18/19 3:41 AM, Ido Schimmel wrote:
> > On Thu, Jan 17, 2019 at 11:17:57AM -0800, Florian Fainelli wrote:
> >> On 1/17/19 6:05 AM, Ido Schimmel wrote:
> >>> On Wed, Jan 16, 2019 at 12:00:57PM -0800, Florian Fainelli wrote:
> >>>> In order for bridge port members to get a chance to implement unicast
> >>>> and multicast address filtering correctly, which would matter for e.g:
> >>>> switch network devices, synchronize the UC and MC lists down to the
> >>>> individual bridge port members using switchdev HOST_MDB objects such
> >>>> that this does not impact drivers that already have a ndo_set_rx_mode()
> >>>> operation which likely already operate in promiscuous mode.
> >>>>
> >>>> When the bridge has multicast snooping enabled, proper HOST_MDB
> >>>> notifications will be sent through br_mdb_notify() already.
> >>>
> >>> I don't understand the change. HOST_MDB is used to notify underlying
> >>> drivers about MDB entries that should be configured to locally receive
> >>> packets. This is triggered by the transmission of an IGMP report through
> >>> the bridge, for example.
> >>>
> >>> It seems that you're trying to overload HOST_MDB with multicast address
> >>> filtering on bridge ports?
> >>
> >> I don't really think this is an abuse of HOST_MDB, since in case the
> >> bridge has multicast_snooping enabled, and there is e.g: a multicast
> >> application bound to the bridge master device, we would get those
> >> notifications through HOST_MDB already. This is the same use case that I
> >> am addressing here, ndo_set_rx_mode() learns about local multicast
> >> addresses that should be programmed, which means there is a multicast
> >> application listening on the bridge master device itself.
> >>
> >> The problem that I want to solve is that with Broadcom b53/bcm_sf2
> >> switches, we cannot easily filter/flood multicast for the CPU/management
> >> port.
> >>
> >> We have per-port controls for MC/IPMC flooding, and we also have a
> >> separate control for CPU/management port receiving multicast. If either
> >> of these two bits/settings are configured, then the CPU port will always
> >> receive multicast, even when we should be filtering it in HW. The only
> >> way to perform selective reception of multicast to the CPU port is to
> >> program a corresponding MDB entry.
> >>
> >>> Why are you performing this filtering?
> >>
> >> If I do not filter, then non-bridged ports on which there is no
> >> multicast application bound to would be passing up multicast traffic all
> >> the way to the CPU port, which then has to be dropped in software. This
> >> is not acceptable IMHO because it is a deviation from how a standalone
> >> NIC supporting multicast filtering would operate.
> >>
> >>> Shouldn't you allow all MAC addresses to ingress?
> >>
> >> I do allow all MC addresses to ingress on the front-panel switch ports
> >> (while honoring the multicast_snooping setting), but we have no control
> >> over what the CPU/management port should be doing.
> >>
> >> As I wrote earlier, if we flood to the CPU/management port, because
> >> there is at least one switch device port, in the bridge, and that bridge
> >> has multicast_snooping disabled, then this could break filtering for
> >> other, non-bridged ports. That is really not acceptable IMHO.
> >>
> >> The reason why I chose switchdev HOST_MDB notification here are two fold:
> >>
> >> - this is the same use case as with multicast_snooping=1 and we target
> >> the CPU port within DSA to resolve that use case, so from the switch
> >> driver perspective, there is no difference in the context
> >>
> >> - this does not impact network device drivers that have a
> >> ndo_set_rx_mode() and somehow decide to support things through that API
> >> since those would typically have a switchdev_port_attr_set() callback
> > 
> > HOST_MDB was added for a very specific use case. To allow the bridge
> > driver to notify underlying switch drivers about MDB entries that should
> > be programmed to locally receive packets when multicast is enabled.
> > Andrew described it very nicely in merge commit
> > 5d37636abd15ace8686a54167b488364ee79e88d
> > 
> > Ingress filtering is something completely different and not applicable
> > to bridged ports that should allow every address to ingress.
> 
> I actually made a mistake in this patch because there is no need to
> iterate over the switch port members and generate a HOST_MDB
> notification for each of them because what we want to target is the CPU
> port, which DSA internally resolves for us anyway.
> 
> What we want to tell the switch HW here is basically: you have a
> multicast application bound to the bridge master device, so please let
> this MC address go through your CPU/management port. This is effectively
> egress filtering at the CPU port side.
> 
> Because the bridge has multicast_snooping=false, the switch ports have
> been configured to flood MC/IPMC already, but as I wrote, if we do that
> for the CPU port, then we "break" non-bridge ports.
> 
> It seems to me that this is exactly the same use case that what Andrew
> did originally, and drivers that are not pathological like mine can just
> decide to ignore that notification and flood everything to the CPU port.
> The end results would be the same from an end user perspective.
> 
> Do you still think this is too much of a stretch?

Yes, but I don't have a much better solution either.

Can we take a step back and re-consider the whole thing? Is it really
worth it?

IIUC, you are trying to enable Rx filtering on switch ports that are
used as standalone NICs while other switch ports are enslaved to a
bridge. This is with HW that was not designed with this use case in
mind. From that I derive that it is not commonly used like that. I might
be wrong.

Furthermore, there's an application running on top of the bridge which
is using multicast, but for some reason multicast is disabled on the
bridge, despite the fact it is on by default.

To get multicast packets to the application you can either flood them to
the CPU or program specific MDB entries. If you use the first option, it
means that other ports - used as standalone NICs - will also be affected
and will not perform Rx filtering in HW. Is it really critical? The CPU
will be overwhelmed by that? These ports are expected to drop a lot of
packets in HW due to Rx filtering?

I'm not trying to hold you up, I just want to make sure that we are
doing the right thing. This code will need to be maintained and debugged
and if no one will use it beside syzbot, then maybe we can live without
it.

> 
> > 
> > switchdev allows to offload the bridge datapath to capable devices, but
> > you're abusing to it allow non-bridged ports to perform address
> > filtering. Completely unrelated.
> > 
> > Therefore, it seems completely inappropriate to me to use HOST_MDB for
> > this reason. This applies to patch #10 as well.
> > 
> > It really sounds like the HW you're working with is not designed to work
> > in this mixed state where some ports are bridged and some are expected to
> > act as standalone NICs.
> 
> That is quite true, the HW that I work with is limited, and does not
> really play well with mixed port usage, but with the help of the network
> stack and notifications, we can get very close, or even support it.
> 
> One thing that I forgot to explain is that the Ethernet MAC connected to
> its internal bcm_sf2 switch, because it is only used with an integrated
> switch has been greatly simplified, it does not support any type of
> filtering and relies on the switch to do that. It effectively operates
> in promiscuous mode all the time.

Got it. Thanks for explaining.

> 
> > 
> > If you're still determined to support this use case, I suggest the
> > following. In your driver, program the bridge's address list as MDB
> > entries when the first port is enslaved to it. Then, add a new netdev
> > event whenever an address is added / removed from this list (in
> > __dev_set_rx_mode() ?). Have your driver listen to it and program MDB
> > entries accordingly.
> > 
> 
> 
> -- 
> Florian
Florian Fainelli Jan. 20, 2019, 3:22 a.m. UTC | #7
On 1/19/2019 5:55 AM, Ido Schimmel wrote:
> On Fri, Jan 18, 2019 at 01:48:56PM -0800, Florian Fainelli wrote:
>> On 1/18/19 3:41 AM, Ido Schimmel wrote:
>>> On Thu, Jan 17, 2019 at 11:17:57AM -0800, Florian Fainelli wrote:
>>>> On 1/17/19 6:05 AM, Ido Schimmel wrote:
>>>>> On Wed, Jan 16, 2019 at 12:00:57PM -0800, Florian Fainelli wrote:
>>>>>> In order for bridge port members to get a chance to implement unicast
>>>>>> and multicast address filtering correctly, which would matter for e.g:
>>>>>> switch network devices, synchronize the UC and MC lists down to the
>>>>>> individual bridge port members using switchdev HOST_MDB objects such
>>>>>> that this does not impact drivers that already have a ndo_set_rx_mode()
>>>>>> operation which likely already operate in promiscuous mode.
>>>>>>
>>>>>> When the bridge has multicast snooping enabled, proper HOST_MDB
>>>>>> notifications will be sent through br_mdb_notify() already.
>>>>>
>>>>> I don't understand the change. HOST_MDB is used to notify underlying
>>>>> drivers about MDB entries that should be configured to locally receive
>>>>> packets. This is triggered by the transmission of an IGMP report through
>>>>> the bridge, for example.
>>>>>
>>>>> It seems that you're trying to overload HOST_MDB with multicast address
>>>>> filtering on bridge ports?
>>>>
>>>> I don't really think this is an abuse of HOST_MDB, since in case the
>>>> bridge has multicast_snooping enabled, and there is e.g: a multicast
>>>> application bound to the bridge master device, we would get those
>>>> notifications through HOST_MDB already. This is the same use case that I
>>>> am addressing here, ndo_set_rx_mode() learns about local multicast
>>>> addresses that should be programmed, which means there is a multicast
>>>> application listening on the bridge master device itself.
>>>>
>>>> The problem that I want to solve is that with Broadcom b53/bcm_sf2
>>>> switches, we cannot easily filter/flood multicast for the CPU/management
>>>> port.
>>>>
>>>> We have per-port controls for MC/IPMC flooding, and we also have a
>>>> separate control for CPU/management port receiving multicast. If either
>>>> of these two bits/settings are configured, then the CPU port will always
>>>> receive multicast, even when we should be filtering it in HW. The only
>>>> way to perform selective reception of multicast to the CPU port is to
>>>> program a corresponding MDB entry.
>>>>
>>>>> Why are you performing this filtering?
>>>>
>>>> If I do not filter, then non-bridged ports on which there is no
>>>> multicast application bound to would be passing up multicast traffic all
>>>> the way to the CPU port, which then has to be dropped in software. This
>>>> is not acceptable IMHO because it is a deviation from how a standalone
>>>> NIC supporting multicast filtering would operate.
>>>>
>>>>> Shouldn't you allow all MAC addresses to ingress?
>>>>
>>>> I do allow all MC addresses to ingress on the front-panel switch ports
>>>> (while honoring the multicast_snooping setting), but we have no control
>>>> over what the CPU/management port should be doing.
>>>>
>>>> As I wrote earlier, if we flood to the CPU/management port, because
>>>> there is at least one switch device port, in the bridge, and that bridge
>>>> has multicast_snooping disabled, then this could break filtering for
>>>> other, non-bridged ports. That is really not acceptable IMHO.
>>>>
>>>> The reason why I chose switchdev HOST_MDB notification here are two fold:
>>>>
>>>> - this is the same use case as with multicast_snooping=1 and we target
>>>> the CPU port within DSA to resolve that use case, so from the switch
>>>> driver perspective, there is no difference in the context
>>>>
>>>> - this does not impact network device drivers that have a
>>>> ndo_set_rx_mode() and somehow decide to support things through that API
>>>> since those would typically have a switchdev_port_attr_set() callback
>>>
>>> HOST_MDB was added for a very specific use case. To allow the bridge
>>> driver to notify underlying switch drivers about MDB entries that should
>>> be programmed to locally receive packets when multicast is enabled.
>>> Andrew described it very nicely in merge commit
>>> 5d37636abd15ace8686a54167b488364ee79e88d
>>>
>>> Ingress filtering is something completely different and not applicable
>>> to bridged ports that should allow every address to ingress.
>>
>> I actually made a mistake in this patch because there is no need to
>> iterate over the switch port members and generate a HOST_MDB
>> notification for each of them because what we want to target is the CPU
>> port, which DSA internally resolves for us anyway.
>>
>> What we want to tell the switch HW here is basically: you have a
>> multicast application bound to the bridge master device, so please let
>> this MC address go through your CPU/management port. This is effectively
>> egress filtering at the CPU port side.
>>
>> Because the bridge has multicast_snooping=false, the switch ports have
>> been configured to flood MC/IPMC already, but as I wrote, if we do that
>> for the CPU port, then we "break" non-bridge ports.
>>
>> It seems to me that this is exactly the same use case that what Andrew
>> did originally, and drivers that are not pathological like mine can just
>> decide to ignore that notification and flood everything to the CPU port.
>> The end results would be the same from an end user perspective.
>>
>> Do you still think this is too much of a stretch?
> 
> Yes, but I don't have a much better solution either.
> 
> Can we take a step back and re-consider the whole thing? Is it really
> worth it?
> 
> IIUC, you are trying to enable Rx filtering on switch ports that are
> used as standalone NICs while other switch ports are enslaved to a
> bridge. This is with HW that was not designed with this use case in
> mind. From that I derive that it is not commonly used like that. I might
> be wrong.

I bet you are right, we happen to have such an use case but it is
uncommon I agree.

> 
> Furthermore, there's an application running on top of the bridge which
> is using multicast, but for some reason multicast is disabled on the
> bridge, despite the fact it is on by default.

Well, it is again because I wanted to allow supporting all possible use
cases of having multicast snooping on/off and let that be changeable at
runtime. Now that this attribute propagates back to the caller, we could
enforce having bridge multicast snooping always on (provided certain
conditions) and it would not require that specific patch 9 we are
discussing see below for details..

> 
> To get multicast packets to the application you can either flood them to
> the CPU or program specific MDB entries. If you use the first option, it
> means that other ports - used as standalone NICs - will also be affected
> and will not perform Rx filtering in HW. Is it really critical? The CPU
> will be overwhelmed by that? These ports are expected to drop a lot of
> packets in HW due to Rx filtering?
> 
> I'm not trying to hold you up, I just want to make sure that we are
> doing the right thing. This code will need to be maintained and debugged
> and if no one will use it beside syzbot, then maybe we can live without
> it.

I would not have gone through all of this if there was not an use case
that came my way to begin with, and now that there is an use case this
is something that I test too.

Maybe let's be more pragmatic in the approach to support those use cases
and do the following with respect to this particular b53 switch HW: it
is only permissible to have a bridge with multicast snooping disabled
under the following conditions: all other ports are a member of that
bridge or another bridge that has the same multicast snooping setting
(turned on). If there is at least one standalone port, or another bridge
requires multicast snooping on, then turning off multicast snooping in
any bridge device is not possible.

Does that sound acceptable?

Another approach like you hinted is to introduce an even more targeted
event/object in the spirit of ndo_set_rx_mode() but with the flexibility
of switchdev such that only a subset of drivers can act on those
notifications. I felt that HOST_MDB would do, and callers could look at
br_multicast_enabled() on the HOST_MDB's switchdev_obj::orig_dev.

Let's continue discussing the VLAN changes since those could be
beneficial for cpsw as well.

> 
>>
>>>
>>> switchdev allows to offload the bridge datapath to capable devices, but
>>> you're abusing to it allow non-bridged ports to perform address
>>> filtering. Completely unrelated.
>>>
>>> Therefore, it seems completely inappropriate to me to use HOST_MDB for
>>> this reason. This applies to patch #10 as well.
>>>
>>> It really sounds like the HW you're working with is not designed to work
>>> in this mixed state where some ports are bridged and some are expected to
>>> act as standalone NICs.
>>
>> That is quite true, the HW that I work with is limited, and does not
>> really play well with mixed port usage, but with the help of the network
>> stack and notifications, we can get very close, or even support it.
>>
>> One thing that I forgot to explain is that the Ethernet MAC connected to
>> its internal bcm_sf2 switch, because it is only used with an integrated
>> switch has been greatly simplified, it does not support any type of
>> filtering and relies on the switch to do that. It effectively operates
>> in promiscuous mode all the time.
> 
> Got it. Thanks for explaining.
> 
>>
>>>
>>> If you're still determined to support this use case, I suggest the
>>> following. In your driver, program the bridge's address list as MDB
>>> entries when the first port is enslaved to it. Then, add a new netdev
>>> event whenever an address is added / removed from this list (in
>>> __dev_set_rx_mode() ?). Have your driver listen to it and program MDB
>>> entries accordingly.
>>>
>>
>>
>> -- 
>> Florian
Ido Schimmel Jan. 21, 2019, 8:41 a.m. UTC | #8
On Sat, Jan 19, 2019 at 07:22:21PM -0800, Florian Fainelli wrote:
> Maybe let's be more pragmatic in the approach to support those use cases
> and do the following with respect to this particular b53 switch HW: it
> is only permissible to have a bridge with multicast snooping disabled
> under the following conditions: all other ports are a member of that
> bridge or another bridge that has the same multicast snooping setting
> (turned on). If there is at least one standalone port, or another bridge
> requires multicast snooping on, then turning off multicast snooping in
> any bridge device is not possible.
> 
> Does that sound acceptable?

Yes.

> Let's continue discussing the VLAN changes since those could be
> beneficial for cpsw as well.

OK. Will reply there.
Jiri Pirko Jan. 21, 2019, 8:46 a.m. UTC | #9
Fri, Jan 18, 2019 at 12:41:08PM CET, idosch@mellanox.com wrote:
>On Thu, Jan 17, 2019 at 11:17:57AM -0800, Florian Fainelli wrote:

[...]


>
>switchdev allows to offload the bridge datapath to capable devices, but
>you're abusing to it allow non-bridged ports to perform address
>filtering. Completely unrelated.

Florian, please make sure not to do that. Switchdev code is now 100%
bridge offload facility, it would be preferable to keep it that way. I
plan to move the code to bridge and get rid of net/switchdev completely
to avoid confusions. Arkadi was working on that in past but he did not
finish it, unfortunatelly.
diff mbox series

Patch

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 013323b6dbe4..ce10d6b7b151 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -18,6 +18,7 @@ 
 #include <linux/ethtool.h>
 #include <linux/list.h>
 #include <linux/netfilter_bridge.h>
+#include <net/switchdev.h>
 
 #include <linux/uaccess.h>
 #include "br_private.h"
@@ -182,8 +183,62 @@  static int br_dev_open(struct net_device *dev)
 	return 0;
 }
 
+static int bridge_sync_unsync_mc_addr(struct net_device *dev,
+				      const unsigned char *addr,
+				      bool add)
+{
+	struct net_bridge *br = netdev_priv(dev);
+	struct switchdev_obj_port_mdb mdb = {
+		.obj = {
+			.orig_dev = dev,
+			.id = SWITCHDEV_OBJ_ID_HOST_MDB,
+			.flags = SWITCHDEV_F_DEFER,
+		},
+		.vid = 0,
+	};
+	struct net_bridge_port *p;
+	int ret = -EOPNOTSUPP;
+
+#ifdef CONFIG_BRIDGE_VLAN_FILTERING
+	if (br_vlan_enabled(dev))
+		mdb.vid = br->default_pvid;
+#endif
+
+	ether_addr_copy(mdb.addr, addr);
+	spin_lock_bh(&br->lock);
+	list_for_each_entry(p, &br->port_list, list) {
+		if (add)
+			ret = switchdev_port_obj_add(p->dev, &mdb.obj, NULL);
+		else
+			ret = switchdev_port_obj_del(p->dev, &mdb.obj);
+		if (ret)
+			goto out;
+	}
+out:
+	spin_unlock_bh(&br->lock);
+	return ret;
+}
+
+static int bridge_sync_mc_addr(struct net_device *dev,
+			       const unsigned char *addr)
+{
+	return bridge_sync_unsync_mc_addr(dev, addr, true);
+}
+
+static int bridge_unsync_mc_addr(struct net_device *dev,
+				 const unsigned char *addr)
+{
+	return bridge_sync_unsync_mc_addr(dev, addr, false);
+}
+
 static void br_dev_set_multicast_list(struct net_device *dev)
 {
+	/* HOST_MDB notifications are sent through MDB notifications */
+	if (br_multicast_enabled(dev))
+		return;
+
+	__hw_addr_sync_dev(&dev->mc, dev, bridge_sync_mc_addr,
+			   bridge_unsync_mc_addr);
 }
 
 static void br_dev_change_rx_flags(struct net_device *dev, int change)