Message ID | 20190107162946.13072-2-soltys@ziu.info |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | bonding: fix PACKET_ORIGDEV regression | expand |
From: Michal Soltys <soltys@ziu.info> Date: Mon, 7 Jan 2019 17:29:46 +0100 > This patch reverts: > > b89f04c61efe bonding: deliver link-local packets with skb->dev set to link that packets arrived on > > And its subsequent fixups: > > 6a9e461f6fe4 bonding: pass link-local packets to bonding master also. > 0f3b914c9cfc bonding: fix warning message > > The intended functionality of the original patch (as explained by its > author) has been available in the kernel since v2.6.21-350-g80feaacb8a64 > via PACKET_ORIGDEV socket option. The patch also broke that feature, as > it's now no longer possible to get the original incoming device. Quoting > the report: > >> Unfortunately, this doesn't completely restore the previous >> functionality as PACKET_ORIGDEV is broken for the copy: the original >> interface is lost through the call to netif_rx(). A LLDP daemon >> listening to the master interface won't get the original interface like >> it was able to before 4.12. > > The patch reverts to pre-b89f04c61efe state, so: > > - both master and original (via PACKET_ORIGDEV) devices are available > when listening on the master > - original device is available when listening directly on one of its > slaves > > Reported-by: Vincent Bernat <vincent@bernat.ch> > Signed-off-by: Michal Soltys <soltys@ziu.info> It is tiring and disappointing to see the behavior sway back and forth multiple times like this. We are breaking someone, then breaking them again if they adjusted to the new behavior. Pretty much unacceptable. I'm not applying anything until someone can convince me the full scope of the situation and why this time it's "right."
❦ 7 janvier 2019 09:12 -08, David Miller <davem@davemloft.net>: >> The patch reverts to pre-b89f04c61efe state, so: >> >> - both master and original (via PACKET_ORIGDEV) devices are available >> when listening on the master >> - original device is available when listening directly on one of its >> slaves >> >> Reported-by: Vincent Bernat <vincent@bernat.ch> >> Signed-off-by: Michal Soltys <soltys@ziu.info> Maybe add: Fixes: b89f04c61efe ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on") > It is tiring and disappointing to see the behavior sway back and forth > multiple times like this. We are breaking someone, then breaking them > again if they adjusted to the new behavior. > > Pretty much unacceptable. > > I'm not applying anything until someone can convince me the full scope > of the situation and why this time it's "right." For me, the problem was not visible until the fix introduced for 4.19, hence the late report of the problem. Before 4.12 (and from 2.6.27 for non-multicast packets), the behaviour was: - ability to receive link-local packets on physical interfaces by listening directly to them, - ability to receive link-local packets on bonded devices and get the original interface using PACKET_ORIGDEV. The patch introduced in 4.12 doesn't change the first ability and breaks the second one as the packets are not received on bonded devices anymore. With the fix introduced in 4.19 enables to receive link-local packets on bonded devices again, but it's still not possible to get the original interface with PACKET_ORIGDEV. I don't understand what b89f04c61efe purpose is. My first report was on November 30th <https://marc.info/?l=linux-netdev&m=154361371507825&w=2> but I didn't get any additional info, notably how the original interface is retrieved when listening to a bond device. Without an example of usage, it's difficult to understand if a revert would break anything. However, a revert would unbreak users of PACKET_ORIGDEV.
From: Michal Soltys <soltys@ziu.info> Date: Mon, 7 Jan 2019 17:29:46 +0100 > This patch reverts: > > b89f04c61efe bonding: deliver link-local packets with skb->dev set to link that packets arrived on > > And its subsequent fixups: > > 6a9e461f6fe4 bonding: pass link-local packets to bonding master also. > 0f3b914c9cfc bonding: fix warning message > > The intended functionality of the original patch (as explained by its > author) has been available in the kernel since v2.6.21-350-g80feaacb8a64 > via PACKET_ORIGDEV socket option. The patch also broke that feature, as > it's now no longer possible to get the original incoming device. Quoting > the report: > >> Unfortunately, this doesn't completely restore the previous >> functionality as PACKET_ORIGDEV is broken for the copy: the original >> interface is lost through the call to netif_rx(). A LLDP daemon >> listening to the master interface won't get the original interface like >> it was able to before 4.12. > > The patch reverts to pre-b89f04c61efe state, so: > > - both master and original (via PACKET_ORIGDEV) devices are available > when listening on the master > - original device is available when listening directly on one of its > slaves > > Reported-by: Vincent Bernat <vincent@bernat.ch> > Signed-off-by: Michal Soltys <soltys@ziu.info> Google folks I want to hear what you have to say about all of this. Thank you.
So I don't remember the specifics... (note I'm writing this all from memory without looking it up/testing it - I may be utterly wrong or dreaming) But I seem to recall that the core problem we were trying to solve was that a daemon listening on an AF_PACKET ethertype 88CC [LLDP] socket not bound to any device would not receive LLDP packets arriving on inactive bond slaves (either active-backup or lag). [inactive = link/carrier up, but not part of active aggregator] This made monitoring for miscabling harder (IFIRC the only non kernel fix was to get the daemon to create a separate AF_PACKET/88CC socket bound to every physical interface in the system, or monitor for inactive slaves and add extra packet sockets as needed). They would get re-parented to the master and then since the slave was inactive they would be considered RX_HANDLER_EXACT match only and not match the * interface. Honestly I wasn't aware of PACKET_ORIGDEV, although I don't think it helps in this case - AFAICR the packets never made it to the packet socket. Perhaps going from: /* don't change skb->dev for link-local packets */ if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS; if (bond_should_deliver_exact_match(skb, slave, bond)) return RX_HANDLER_EXACT; to something more like: if (bond_should_deliver_exact_match(skb, slave, bond)) { /* don't change skb->dev for link-local packets on inactive slaves */ if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS; return RX_HANDLER_EXACT; } would fix both problems? On Sun, Jan 13, 2019 at 3:05 PM David Miller <davem@davemloft.net> wrote: > > From: Michal Soltys <soltys@ziu.info> > Date: Mon, 7 Jan 2019 17:29:46 +0100 > > > This patch reverts: > > > > b89f04c61efe bonding: deliver link-local packets with skb->dev set to link that packets arrived on > > > > And its subsequent fixups: > > > > 6a9e461f6fe4 bonding: pass link-local packets to bonding master also. > > 0f3b914c9cfc bonding: fix warning message > > > > The intended functionality of the original patch (as explained by its > > author) has been available in the kernel since v2.6.21-350-g80feaacb8a64 > > via PACKET_ORIGDEV socket option. The patch also broke that feature, as > > it's now no longer possible to get the original incoming device. Quoting > > the report: > > > >> Unfortunately, this doesn't completely restore the previous > >> functionality as PACKET_ORIGDEV is broken for the copy: the original > >> interface is lost through the call to netif_rx(). A LLDP daemon > >> listening to the master interface won't get the original interface like > >> it was able to before 4.12. > > > > The patch reverts to pre-b89f04c61efe state, so: > > > > - both master and original (via PACKET_ORIGDEV) devices are available > > when listening on the master > > - original device is available when listening directly on one of its > > slaves > > > > Reported-by: Vincent Bernat <vincent@bernat.ch> > > Signed-off-by: Michal Soltys <soltys@ziu.info> > > Google folks I want to hear what you have to say about all of this. > > Thank you.
❦ 13 janvier 2019 18:01 -08, Maciej Żenczykowski <zenczykowski@gmail.com>: > But I seem to recall that the core problem we were trying to solve was > that a daemon listening > on an AF_PACKET ethertype 88CC [LLDP] socket not bound to any device > would not receive LLDP packets > arriving on inactive bond slaves (either active-backup or lag). Just tested and with 4.9.150, I am in fact unable to receive anything on a backup link when listening to the active-backup master device or to "any" device. > Perhaps going from: > /* don't change skb->dev for link-local packets */ > if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS; > if (bond_should_deliver_exact_match(skb, slave, bond)) return > RX_HANDLER_EXACT; > > to something more like: > if (bond_should_deliver_exact_match(skb, slave, bond)) { > /* don't change skb->dev for link-local packets on inactive slaves */ > if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS; > return RX_HANDLER_EXACT; > } > > would fix both problems? It makes PACKET_ORIGDEV works again. Moreover, when not binding to any interface, we receive packets on both active and backup links. But when binding to the master device, I only receive packets from the active devices (which is the same behaviour than pre-4.12). When not binding to any device and not using PACKET_ORIGDEV, one packet is said to be from the master device and one packet is said to be from the backup device. Previously, I had one packet from active device, one packet from backup device and two packets from master device. For me, this is a better situation than previously as we return to the situation before 4.12 but you can get what you want by not binding to any device _and_ using PACKET_ORIGDEV (otherwise, you are don't get the right interface in all cases). If it's unclear, I can provide more extensive results. I am using this test program (comment the s.bind/s.setsockopt line if needed): #v+ #!/usr/bin/env python3 import sys import socket import datetime socket.SOL_PACKET = 263 socket.PACKET_ORIGDEV = 9 s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(0x88cc)) s.bind(("bond0", 0)) s.setsockopt(socket.SOL_PACKET, socket.PACKET_ORIGDEV, 1) while True: data, addrinfo = s.recvfrom(1500) if addrinfo[2] == socket.PACKET_OUTGOING: continue print(f"{datetime.datetime.now().isoformat()}: " f"Received {len(data)} bytes from {addrinfo}") #v-
On Mon, Jan 14, 2019 at 12:00 AM Vincent Bernat <vincent@bernat.ch> wrote: > > ❦ 13 janvier 2019 18:01 -08, Maciej Żenczykowski <zenczykowski@gmail.com>: > > > But I seem to recall that the core problem we were trying to solve was > > that a daemon listening > > on an AF_PACKET ethertype 88CC [LLDP] socket not bound to any device > > would not receive LLDP packets > > arriving on inactive bond slaves (either active-backup or lag). > > Just tested and with 4.9.150, I am in fact unable to receive anything > on a backup link when listening to the active-backup master device or to > "any" device. > > > Perhaps going from: > > /* don't change skb->dev for link-local packets */ > > if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS; > > if (bond_should_deliver_exact_match(skb, slave, bond)) return > > RX_HANDLER_EXACT; > > > > to something more like: > > if (bond_should_deliver_exact_match(skb, slave, bond)) { > > /* don't change skb->dev for link-local packets on inactive slaves */ > > if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS; > > return RX_HANDLER_EXACT; > > } > > > > would fix both problems? > thanks for jumping in and offering a solution. This should fix the issue. NACK for the revert-patch! Folks, please, revert is not the solution! Last time when there was a problem posted I offered you a solution, so wasn't that enough to prove that we care about solving the problem that you are facing while continuing to have this functionality? No one wants to break your use case, it happens only because one is not aware of it. Thank you David for resorting to resolve it. > It makes PACKET_ORIGDEV works again. Moreover, when not binding to any > interface, we receive packets on both active and backup links. But when > binding to the master device, I only receive packets from the active > devices (which is the same behaviour than pre-4.12). When not binding to > any device and not using PACKET_ORIGDEV, one packet is said to be from > the master device and one packet is said to be from the backup device. > Previously, I had one packet from active device, one packet from backup > device and two packets from master device. > > For me, this is a better situation than previously as we return to the > situation before 4.12 but you can get what you want by not binding to > any device _and_ using PACKET_ORIGDEV (otherwise, you are don't get the > right interface in all cases). > > If it's unclear, I can provide more extensive results. I am using this > test program (comment the s.bind/s.setsockopt line if needed): > > #v+ > #!/usr/bin/env python3 > > import sys > import socket > import datetime > > socket.SOL_PACKET = 263 > socket.PACKET_ORIGDEV = 9 > > s = socket.socket(socket.AF_PACKET, > socket.SOCK_RAW, > socket.htons(0x88cc)) > s.bind(("bond0", 0)) > s.setsockopt(socket.SOL_PACKET, socket.PACKET_ORIGDEV, 1) > while True: > data, addrinfo = s.recvfrom(1500) > if addrinfo[2] == socket.PACKET_OUTGOING: > continue > print(f"{datetime.datetime.now().isoformat()}: " > f"Received {len(data)} bytes from {addrinfo}") > #v- > -- > Localise input and output in subroutines. > - The Elements of Programming Style (Kernighan & Plauger)
On 19/01/14 03:01, Maciej Żenczykowski wrote: > So I don't remember the specifics... > > (note I'm writing this all from memory without looking it up/testing > it - I may be utterly wrong or dreaming) > > But I seem to recall that the core problem we were trying to solve was > that a daemon listening > on an AF_PACKET ethertype 88CC [LLDP] socket not bound to any device > would not receive LLDP packets > arriving on inactive bond slaves (either active-backup or lag). > > [inactive = link/carrier up, but not part of active aggregator] > > This made monitoring for miscabling harder (IFIRC the only non kernel > fix was to get the daemon to create > a separate AF_PACKET/88CC socket bound to every physical interface in > the system, or monitor for > inactive slaves and add extra packet sockets as needed). > > They would get re-parented to the master and then since the slave was > inactive they would be considered RX_HANDLER_EXACT match only and not > match the * interface. > > Honestly I wasn't aware of PACKET_ORIGDEV, although I don't think it > helps in this case - AFAICR the packets never made it to the packet > socket. > > Perhaps going from: > /* don't change skb->dev for link-local packets */ > if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS; > if (bond_should_deliver_exact_match(skb, slave, bond)) return > RX_HANDLER_EXACT; > > to something more like: > if (bond_should_deliver_exact_match(skb, slave, bond)) { > /* don't change skb->dev for link-local packets on inactive slaves */ > if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS; > return RX_HANDLER_EXACT; > } > > would fix both problems? > I'll test this change in bridging scenarios in coming days. And thanks for the explanation of what was the issue in your case.
On 19/01/15 03:19, Mahesh Bandewar (महेश बंडेवार) wrote: > On Mon, Jan 14, 2019 at 12:00 AM Vincent Bernat <vincent@bernat.ch> wrote: >> >> ❦ 13 janvier 2019 18:01 -08, Maciej Żenczykowski <zenczykowski@gmail.com>: >> >> > But I seem to recall that the core problem we were trying to solve was >> > that a daemon listening >> > on an AF_PACKET ethertype 88CC [LLDP] socket not bound to any device >> > would not receive LLDP packets >> > arriving on inactive bond slaves (either active-backup or lag). >> >> Just tested and with 4.9.150, I am in fact unable to receive anything >> on a backup link when listening to the active-backup master device or to >> "any" device. >> >> > Perhaps going from: >> > /* don't change skb->dev for link-local packets */ >> > if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS; >> > if (bond_should_deliver_exact_match(skb, slave, bond)) return >> > RX_HANDLER_EXACT; >> > >> > to something more like: >> > if (bond_should_deliver_exact_match(skb, slave, bond)) { >> > /* don't change skb->dev for link-local packets on inactive slaves */ >> > if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS; >> > return RX_HANDLER_EXACT; >> > } >> > >> > would fix both problems? >> > thanks for jumping in and offering a solution. This should fix the issue. > > NACK for the revert-patch! > > Folks, please, revert is not the solution! Last time when there was a > problem posted I offered you a solution, so wasn't that enough to > prove that we care about solving the problem that you are facing while > continuing to have this functionality? No one wants to break your use > case, it happens only because one is not aware of it. Thank you David > for resorting to resolve it. Mahesh, that's not it. But: Since Vincent reported PACKET_ORIGDEV regression late november, none of you replied to anything posted until now. And if David hadn't called you guys directly, I'm not sure you would have at this point. Reverting and Vincent's offer to patch to update packet(7) were also clearly mentioned in the previous thread, none of them commented/nacked/acked either. Me an Vincent have been scratching our head for a while off list - but our guessing can only go as far as time goes on. Maciej now was the first to ever provide the actual details about the issue you were facing originally. My bad I haven't added him to CC from the very beginning. Anyway, I'll test Maciej's version in bridging context in coming days and look closer at the code overall. It probably works fine if Vincent is seeing packets on masters, but I'd rather be sure.
On 19/01/14 03:01, Maciej Żenczykowski wrote: > So I don't remember the specifics... > > (note I'm writing this all from memory without looking it up/testing > it - I may be utterly wrong or dreaming) > > But I seem to recall that the core problem we were trying to solve was > that a daemon listening > on an AF_PACKET ethertype 88CC [LLDP] socket not bound to any device > would not receive LLDP packets > arriving on inactive bond slaves (either active-backup or lag). > > [inactive = link/carrier up, but not part of active aggregator] > > This made monitoring for miscabling harder (IFIRC the only non kernel > fix was to get the daemon to create > a separate AF_PACKET/88CC socket bound to every physical interface in > the system, or monitor for > inactive slaves and add extra packet sockets as needed). > > They would get re-parented to the master and then since the slave was > inactive they would be considered RX_HANDLER_EXACT match only and not > match the * interface. > > Honestly I wasn't aware of PACKET_ORIGDEV, although I don't think it > helps in this case - AFAICR the packets never made it to the packet > socket. > > Perhaps going from: > /* don't change skb->dev for link-local packets */ > if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS; > if (bond_should_deliver_exact_match(skb, slave, bond)) return > RX_HANDLER_EXACT; > > to something more like: > if (bond_should_deliver_exact_match(skb, slave, bond)) { > /* don't change skb->dev for link-local packets on inactive slaves */ > if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS; > return RX_HANDLER_EXACT; > } Having checked the code (if I get the flow correctly), one thing/question - currently with Mahesh's fixes, not bound LLDP listener will receive all packets - both from active and inactive slaves directly (as the check for suppression is done after the link-local check). The version above will do the suppression check first - so all inactive slaves - excluding non-multi/non-broad ALB - will pass it and return RX_HANDLER_PASS if the packet is link-local. So those will be available w/o binding, but active slaves' packets will be available via master device (but with working PACKET_ORIGDEV now - so slave device can be retrieved easily). This is fine in your scenario I presume ?
I'm not sure there's a truly good answer. Also note, that there's subtle differences between af_packet sockets tied to ALL protocols vs tied to specific protocols (vs none/0 of course). However ETH_P_ALL protocol raw sockets need to be avoided if at all possible due to perf impact. Certainly we don't want any of these created outside of debugging. I believe we only cared about the utterly unbound to any device case working (ie. delivering all LLDP packets all nics are receiving). So that a single simple daemon could collect and export all the link information. [btw. now that I know about the PACKET_ORIGDEV option, I do - unsurprisingly - see our daemon sets it] We really didn't want the complexity of having to bind to individual interfaces and having to try to dynamically adjust the set of raw sockets. (not to mention that less raw sockets is always good, also it's never actually entirely clear which interfaces would need to be monitored due to the preponderance of tunnels/macvlan/ipvlan/veth/dummy and other virtual interface types) I think from a logic perspective: - if you bind to slave (physical interface) you should see *all* packets on that interface, regardless of whether the slave is active or not, and whether the packet is link-local or not. ie. this should show you exactly what nic is receiving (& sending: PACKET_OUTGOING) I should see *exactly* one copy of any packet. This mode has to be useful for debugging. This includes seeing packets which aren't even destined for me if nic receives them. (ie. destined to unicast/multicast mac we'd filter out: PACKET_OTHERHOST) Although by itself this socket's existence shouldn't affect nic mac filters and promiscuous mode (I believe there are all sorts of various additional socket options to change those). - if you don't bind to an interface then I think I'd expect to see packets delivered to stack + link local packets - ie. should not see non-link-local packets discarded due to being on inactive slaves similarly to how I should not see packets filtered out by virtue of mac not matching interface mac filters - IMHO you should see *all* link local packets arriving at system (ie. the original change's purpose) [and I think, but am not certain, I shouldn't need to use any socket options to register the link local macs, although glancing at code I do think the daemon uses PACKET_ADD_MEMBERSHIP to register lldp mac, so perhaps filtering should apply as normal?] - with PACKET_ORIGDEV they should always show up as from the physical interfaces (ie. slaves) - without PACKET_ORIGDEV: - non link local packets should show up as from bonding/team master - link local packets on active slaves could show up on master or slave (probably master is required ifirc some earlier fix???) - link local packets on inactive slaves should show up on slave - and definitely not on master - I don't think I should ever see any PACKET_OTHERHOST packets - if you bind to master you should see packets from active slaves (ie. those that will get delivered to stack) - clearly you should not see non-link-local inactive slave packets (they'll be dropped) - behaviour wrt. link local packets is more dicey... (I believe somewhere in these threads patches there was some description of what standard requires, but I don't off the top of my head remember) - for an active-backup bond it would seem logical to see only active links link local - for a multiple-active aggregate bond I'd be fine with seeing none, or all active slaves link local packets - I guess even though the later is confusing it makes sense - it might be okay to see link-local inactive slave packets, but I think I'd prefer not to (could be configurable though) - this would seem confusing/wrong to me. - and again I don't think I should see any PACKET_OTHERHOST packets here... - PACKET_ORIGDEV as above... I gave the above a fair amount of thought... but I'm not guaranteeing I didn't make typos, or write something utterly stupid or unimplementable or not how stuff should work for other reasons... Comments welcome. I think this continues to be in line with my proposal from earlier in the thread? On Thu, Jan 17, 2019 at 4:27 PM Michal Soltys <soltys@ziu.info> wrote: > > On 19/01/14 03:01, Maciej Żenczykowski wrote: > > So I don't remember the specifics... > > > > (note I'm writing this all from memory without looking it up/testing > > it - I may be utterly wrong or dreaming) > > > > But I seem to recall that the core problem we were trying to solve was > > that a daemon listening > > on an AF_PACKET ethertype 88CC [LLDP] socket not bound to any device > > would not receive LLDP packets > > arriving on inactive bond slaves (either active-backup or lag). > > > > [inactive = link/carrier up, but not part of active aggregator] > > > > This made monitoring for miscabling harder (IFIRC the only non kernel > > fix was to get the daemon to create > > a separate AF_PACKET/88CC socket bound to every physical interface in > > the system, or monitor for > > inactive slaves and add extra packet sockets as needed). > > > > They would get re-parented to the master and then since the slave was > > inactive they would be considered RX_HANDLER_EXACT match only and not > > match the * interface. > > > > Honestly I wasn't aware of PACKET_ORIGDEV, although I don't think it > > helps in this case - AFAICR the packets never made it to the packet > > socket. > > > > Perhaps going from: > > /* don't change skb->dev for link-local packets */ > > if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS; > > if (bond_should_deliver_exact_match(skb, slave, bond)) return > > RX_HANDLER_EXACT; > > > > to something more like: > > if (bond_should_deliver_exact_match(skb, slave, bond)) { > > /* don't change skb->dev for link-local packets on inactive slaves */ > > if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS; > > return RX_HANDLER_EXACT; > > } > > Having checked the code (if I get the flow correctly), one > thing/question - currently with Mahesh's fixes, not bound LLDP listener > will receive all packets - both from active and inactive slaves directly > (as the check for suppression is done after the link-local check). > > The version above will do the suppression check first - so all inactive > slaves - excluding non-multi/non-broad ALB - will pass it and return > RX_HANDLER_PASS if the packet is link-local. So those will be available > w/o binding, but active slaves' packets will be available via master > device (but with working PACKET_ORIGDEV now - so slave device can be > retrieved easily). This is fine in your scenario I presume ? >
On 19/01/18 07:58, Maciej Żenczykowski wrote: > I'm not sure there's a truly good answer. > > Also note, that there's subtle differences between af_packet sockets > tied to ALL protocols vs tied to specific protocols (vs none/0 of > course). > However ETH_P_ALL protocol raw sockets need to be avoided if at all > possible due to perf impact. > Certainly we don't want any of these created outside of debugging. > > I believe we only cared about the utterly unbound to any device case > working (ie. delivering all LLDP packets all nics are receiving). > So that a single simple daemon could collect and export all the link > information. > [btw. now that I know about the PACKET_ORIGDEV option, I do - > unsurprisingly - see our daemon sets it] > We really didn't want the complexity of having to bind to individual > interfaces and having to try to dynamically adjust the set of raw > sockets. > (not to mention that less raw sockets is always good, also it's never > actually entirely clear which interfaces would need to be monitored > due to the preponderance of tunnels/macvlan/ipvlan/veth/dummy and > other virtual interface types) > > I think from a logic perspective: > > - if you bind to slave (physical interface) you should see *all* > packets on that interface, > regardless of whether the slave is active or not, and whether the > packet is link-local or not. > ie. this should show you exactly what nic is receiving (& sending: > PACKET_OUTGOING) > I should see *exactly* one copy of any packet. > This mode has to be useful for debugging. > This includes seeing packets which aren't even destined for me if nic > receives them. > (ie. destined to unicast/multicast mac we'd filter out: PACKET_OTHERHOST) > Although by itself this socket's existence shouldn't affect nic mac > filters and promiscuous mode > (I believe there are all sorts of various additional socket options to > change those). > > - if you don't bind to an interface then I think I'd expect to see > packets delivered to stack + link local packets > - ie. should not see non-link-local packets discarded due to being > on inactive slaves > similarly to how I should not see packets filtered out by virtue > of mac not matching interface mac filters > - IMHO you should see *all* link local packets arriving at system > (ie. the original change's purpose) > [and I think, but am not certain, I shouldn't need to use any > socket options to register the link local macs, > although glancing at code I do think the daemon uses > PACKET_ADD_MEMBERSHIP to register lldp mac, > so perhaps filtering should apply as normal?] > - with PACKET_ORIGDEV they should always show up as from the > physical interfaces (ie. slaves) > - without PACKET_ORIGDEV: > - non link local packets should show up as from bonding/team master > - link local packets on active slaves could show up on master or > slave (probably master is required ifirc some earlier fix???) As far as I can see, they (LLCs coming on active slave) will show via master as from master (w/o the socket option) or as from slave (with). Think Vincent's tests confirmed it. > - link local packets on inactive slaves should show up on slave - > and definitely not on master > - I don't think I should ever see any PACKET_OTHERHOST packets > > - if you bind to master you should see packets from active slaves (ie. > those that will get delivered to stack) > - clearly you should not see non-link-local inactive slave packets > (they'll be dropped) > - behaviour wrt. link local packets is more dicey... (I believe > somewhere in these threads patches there was some description of what > standard requires, but I don't off the top of my head remember) They must be seen, otherwise bonds attached to a linux bridge (I assume enslaving an interface to a bridge essentially counts as bind) will be blind to them (among those - e.g. to spanning tree information - this was what originally caused issues for me last year). Even the bridge code goes to extra length to not carelessly consume stp packets, if it's not actively participating in stp (of any kind). Aside that, certain [recent] bridge features like group_fwd_mask would be non-functional on bond ports as well. As for standard (the one I quoted in the oldest thread) expected the link-local packet to be both readable via master and slaves depending on need (though it didn't go into exact gory details). Your current patch I think cover all possible cases quite greacefully. > - for an active-backup bond it would seem logical to see only active > links link local > - for a multiple-active aggregate bond I'd be fine with seeing none, > or all active slaves link local packets - I guess even though the > later is confusing it makes sense > - it might be okay to see link-local inactive slave packets, but I > think I'd prefer not to (could be configurable though) - this would > seem confusing/wrong to me. > - and again I don't think I should see any PACKET_OTHERHOST packets here... > - PACKET_ORIGDEV as above... > > I gave the above a fair amount of thought... but I'm not guaranteeing > I didn't make typos, or write something utterly stupid or > unimplementable or not how stuff should work for other reasons... > Comments welcome. > > I think this continues to be in line with my proposal from earlier in > the thread? (sorry for late reply) Yes, pretty much spot on. With some confirmation comments above. I did bridging tests yesterday (2 LACP bonds attached via separate switches treating linux bridge as a shared segment in mstp scenario) and all is working fine on my side as well. If everyone agrees with the proposed code, I will submit v2 patch with added comment explaining basic logic (or you could submit if you prefer). > > On Thu, Jan 17, 2019 at 4:27 PM Michal Soltys <soltys@ziu.info> wrote: >> >> On 19/01/14 03:01, Maciej Żenczykowski wrote: >> > So I don't remember the specifics... >> > >> > (note I'm writing this all from memory without looking it up/testing >> > it - I may be utterly wrong or dreaming) >> > >> > But I seem to recall that the core problem we were trying to solve was >> > that a daemon listening >> > on an AF_PACKET ethertype 88CC [LLDP] socket not bound to any device >> > would not receive LLDP packets >> > arriving on inactive bond slaves (either active-backup or lag). >> > >> > [inactive = link/carrier up, but not part of active aggregator] >> > >> > This made monitoring for miscabling harder (IFIRC the only non kernel >> > fix was to get the daemon to create >> > a separate AF_PACKET/88CC socket bound to every physical interface in >> > the system, or monitor for >> > inactive slaves and add extra packet sockets as needed). >> > >> > They would get re-parented to the master and then since the slave was >> > inactive they would be considered RX_HANDLER_EXACT match only and not >> > match the * interface. >> > >> > Honestly I wasn't aware of PACKET_ORIGDEV, although I don't think it >> > helps in this case - AFAICR the packets never made it to the packet >> > socket. >> > >> > Perhaps going from: >> > /* don't change skb->dev for link-local packets */ >> > if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS; >> > if (bond_should_deliver_exact_match(skb, slave, bond)) return >> > RX_HANDLER_EXACT; >> > >> > to something more like: >> > if (bond_should_deliver_exact_match(skb, slave, bond)) { >> > /* don't change skb->dev for link-local packets on inactive slaves */ >> > if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS; >> > return RX_HANDLER_EXACT; >> > } >> >> Having checked the code (if I get the flow correctly), one >> thing/question - currently with Mahesh's fixes, not bound LLDP listener >> will receive all packets - both from active and inactive slaves directly >> (as the check for suppression is done after the link-local check). >> >> The version above will do the suppression check first - so all inactive >> slaves - excluding non-multi/non-broad ALB - will pass it and return >> RX_HANDLER_PASS if the packet is link-local. So those will be available >> w/o binding, but active slaves' packets will be available via master >> device (but with working PACKET_ORIGDEV now - so slave device can be >> retrieved easily). This is fine in your scenario I presume ? >> >
Sounds good to me. On Mon, Jan 28, 2019 at 5:47 PM Michal Soltys <soltys@ziu.info> wrote: > > On 19/01/18 07:58, Maciej Żenczykowski wrote: > > I'm not sure there's a truly good answer. > > > > Also note, that there's subtle differences between af_packet sockets > > tied to ALL protocols vs tied to specific protocols (vs none/0 of > > course). > > However ETH_P_ALL protocol raw sockets need to be avoided if at all > > possible due to perf impact. > > Certainly we don't want any of these created outside of debugging. > > > > I believe we only cared about the utterly unbound to any device case > > working (ie. delivering all LLDP packets all nics are receiving). > > So that a single simple daemon could collect and export all the link > > information. > > [btw. now that I know about the PACKET_ORIGDEV option, I do - > > unsurprisingly - see our daemon sets it] > > We really didn't want the complexity of having to bind to individual > > interfaces and having to try to dynamically adjust the set of raw > > sockets. > > (not to mention that less raw sockets is always good, also it's never > > actually entirely clear which interfaces would need to be monitored > > due to the preponderance of tunnels/macvlan/ipvlan/veth/dummy and > > other virtual interface types) > > > > I think from a logic perspective: > > > > - if you bind to slave (physical interface) you should see *all* > > packets on that interface, > > regardless of whether the slave is active or not, and whether the > > packet is link-local or not. > > ie. this should show you exactly what nic is receiving (& sending: > > PACKET_OUTGOING) > > I should see *exactly* one copy of any packet. > > This mode has to be useful for debugging. > > This includes seeing packets which aren't even destined for me if nic > > receives them. > > (ie. destined to unicast/multicast mac we'd filter out: PACKET_OTHERHOST) > > Although by itself this socket's existence shouldn't affect nic mac > > filters and promiscuous mode > > (I believe there are all sorts of various additional socket options to > > change those). > > > > - if you don't bind to an interface then I think I'd expect to see > > packets delivered to stack + link local packets > > - ie. should not see non-link-local packets discarded due to being > > on inactive slaves > > similarly to how I should not see packets filtered out by virtue > > of mac not matching interface mac filters > > - IMHO you should see *all* link local packets arriving at system > > (ie. the original change's purpose) > > [and I think, but am not certain, I shouldn't need to use any > > socket options to register the link local macs, > > although glancing at code I do think the daemon uses > > PACKET_ADD_MEMBERSHIP to register lldp mac, > > so perhaps filtering should apply as normal?] > > - with PACKET_ORIGDEV they should always show up as from the > > physical interfaces (ie. slaves) > > - without PACKET_ORIGDEV: > > - non link local packets should show up as from bonding/team master > > > - link local packets on active slaves could show up on master or > > slave (probably master is required ifirc some earlier fix???) > > As far as I can see, they (LLCs coming on active slave) will show via > master as from master (w/o the socket option) or as from slave (with). > Think Vincent's tests confirmed it. > > > - link local packets on inactive slaves should show up on slave - > > and definitely not on master > > - I don't think I should ever see any PACKET_OTHERHOST packets > > > > - if you bind to master you should see packets from active slaves (ie. > > those that will get delivered to stack) > > - clearly you should not see non-link-local inactive slave packets > > (they'll be dropped) > > > - behaviour wrt. link local packets is more dicey... (I believe > > somewhere in these threads patches there was some description of what > > standard requires, but I don't off the top of my head remember) > > They must be seen, otherwise bonds attached to a linux bridge (I assume > enslaving an interface to a bridge essentially counts as bind) will be > blind to them (among those - e.g. to spanning tree information - this > was what originally caused issues for me last year). Even the bridge > code goes to extra length to not carelessly consume stp packets, if it's > not actively participating in stp (of any kind). Aside that, certain > [recent] bridge features like group_fwd_mask would be non-functional on > bond ports as well. > > As for standard (the one I quoted in the oldest thread) expected the > link-local packet to be both readable via master and slaves depending on > need (though it didn't go into exact gory details). Your current patch I > think cover all possible cases quite greacefully. > > > - for an active-backup bond it would seem logical to see only active > > links link local > > - for a multiple-active aggregate bond I'd be fine with seeing none, > > or all active slaves link local packets - I guess even though the > > later is confusing it makes sense > > - it might be okay to see link-local inactive slave packets, but I > > think I'd prefer not to (could be configurable though) - this would > > seem confusing/wrong to me. > > - and again I don't think I should see any PACKET_OTHERHOST packets here... > > - PACKET_ORIGDEV as above... > > > > I gave the above a fair amount of thought... but I'm not guaranteeing > > I didn't make typos, or write something utterly stupid or > > unimplementable or not how stuff should work for other reasons... > > Comments welcome. > > > > I think this continues to be in line with my proposal from earlier in > > the thread? > > (sorry for late reply) > > Yes, pretty much spot on. With some confirmation comments above. > > I did bridging tests yesterday (2 LACP bonds attached via separate > switches treating linux bridge as a shared segment in mstp scenario) and > all is working fine on my side as well. > > If everyone agrees with the proposed code, I will submit v2 patch with > added comment explaining basic logic (or you could submit if you prefer). > > > > > On Thu, Jan 17, 2019 at 4:27 PM Michal Soltys <soltys@ziu.info> wrote: > >> > >> On 19/01/14 03:01, Maciej Żenczykowski wrote: > >> > So I don't remember the specifics... > >> > > >> > (note I'm writing this all from memory without looking it up/testing > >> > it - I may be utterly wrong or dreaming) > >> > > >> > But I seem to recall that the core problem we were trying to solve was > >> > that a daemon listening > >> > on an AF_PACKET ethertype 88CC [LLDP] socket not bound to any device > >> > would not receive LLDP packets > >> > arriving on inactive bond slaves (either active-backup or lag). > >> > > >> > [inactive = link/carrier up, but not part of active aggregator] > >> > > >> > This made monitoring for miscabling harder (IFIRC the only non kernel > >> > fix was to get the daemon to create > >> > a separate AF_PACKET/88CC socket bound to every physical interface in > >> > the system, or monitor for > >> > inactive slaves and add extra packet sockets as needed). > >> > > >> > They would get re-parented to the master and then since the slave was > >> > inactive they would be considered RX_HANDLER_EXACT match only and not > >> > match the * interface. > >> > > >> > Honestly I wasn't aware of PACKET_ORIGDEV, although I don't think it > >> > helps in this case - AFAICR the packets never made it to the packet > >> > socket. > >> > > >> > Perhaps going from: > >> > /* don't change skb->dev for link-local packets */ > >> > if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS; > >> > if (bond_should_deliver_exact_match(skb, slave, bond)) return > >> > RX_HANDLER_EXACT; > >> > > >> > to something more like: > >> > if (bond_should_deliver_exact_match(skb, slave, bond)) { > >> > /* don't change skb->dev for link-local packets on inactive slaves */ > >> > if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS; > >> > return RX_HANDLER_EXACT; > >> > } > >> > >> Having checked the code (if I get the flow correctly), one > >> thing/question - currently with Mahesh's fixes, not bound LLDP listener > >> will receive all packets - both from active and inactive slaves directly > >> (as the check for suppression is done after the link-local check). > >> > >> The version above will do the suppression check first - so all inactive > >> slaves - excluding non-multi/non-broad ALB - will pass it and return > >> RX_HANDLER_PASS if the packet is link-local. So those will be available > >> w/o binding, but active slaves' packets will be available via master > >> device (but with working PACKET_ORIGDEV now - so slave device can be > >> retrieved easily). This is fine in your scenario I presume ? > >> > > >
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index a9d597f28023..290235587a0e 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1183,27 +1183,6 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb) } } - /* Link-local multicast packets should be passed to the - * stack on the link they arrive as well as pass them to the - * bond-master device. These packets are mostly usable when - * stack receives it with the link on which they arrive - * (e.g. LLDP) they also must be available on master. Some of - * the use cases include (but are not limited to): LLDP agents - * that must be able to operate both on enslaved interfaces as - * well as on bonds themselves; linux bridges that must be able - * to process/pass BPDUs from attached bonds when any kind of - * STP version is enabled on the network. - */ - if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) { - struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC); - - if (nskb) { - nskb->dev = bond->dev; - nskb->queue_mapping = 0; - netif_rx(nskb); - } - return RX_HANDLER_PASS; - } if (bond_should_deliver_exact_match(skb, slave, bond)) return RX_HANDLER_EXACT;