[net-next,0/2] TLS TX HW offload for Bond

Message ID 20201115134251.4272-1-tariqt@nvidia.com

Message

Tariq Toukan Nov. 15, 2020, 1:42 p.m. UTC
Hi,

This series opens TLS TX HW offload for bond interfaces.
This allows bond interfaces to benefit from capable slave devices.

The first patch adds real_dev field in TLS context structure, and aligns
usages in TLS module and supporting drivers.
The second patch opens the offload for bond interfaces.

To keep simple track of the HW and SW TLS contexts, we bind each socket to
a specific slave for the socket's whole lifetime. This is logically valid
(and similar to the SW kTLS behavior) in the following bond configuration,
so we restrict the offload support to it:

((mode == balance-xor) or (mode == 802.3ad))
and xmit_hash_policy == layer3+4.

For this configuration, SW kTLS keeps picking the same slave.
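
As a rough illustration of the restriction above, here is a minimal userspace sketch of the eligibility check. The enum values, struct bond_cfg and bond_cfg_allows_tls_offload() are hypothetical names made up for this example; they are not the identifiers used in the patches or in the bonding driver.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the bond mode and hash policy settings. */
enum bond_mode {
	MODE_ROUNDROBIN,
	MODE_ACTIVEBACKUP,
	MODE_XOR,		/* balance-xor */
	MODE_BROADCAST,
	MODE_8023AD		/* 802.3ad */
};

enum xmit_policy {
	POLICY_LAYER2,
	POLICY_LAYER34,		/* layer3+4 */
	POLICY_LAYER23
};

struct bond_cfg {
	enum bond_mode mode;
	enum xmit_policy xmit_hash_policy;
};

/*
 * Offload is only allowed when the bond picks the egress device by
 * hashing and the hash policy is layer3+4, i.e. the configuration the
 * cover letter restricts the feature to.
 */
static bool bond_cfg_allows_tls_offload(const struct bond_cfg *cfg)
{
	bool hashed_mode = cfg->mode == MODE_XOR || cfg->mode == MODE_8023AD;

	return hashed_mode && cfg->xmit_hash_policy == POLICY_LAYER34;
}
```

Any other mode or hash policy is rejected in this model, which matches the "restrict the offload support to it" wording above.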

Regards,
Tariq

Tariq Toukan (2):
  net/tls: Add real_dev field to TLS context
  bond: Add TLS TX offload support

 drivers/net/bonding/bond_main.c               | 203 +++++++++++++++++-
 drivers/net/bonding/bond_options.c            |  10 +-
 .../chelsio/inline_crypto/ch_ktls/chcr_ktls.c |   2 +-
 .../mellanox/mlx5/core/en_accel/tls_rxtx.c    |   2 +-
 include/net/bonding.h                         |   4 +
 include/net/tls.h                             |   1 +
 net/tls/tls_device.c                          |   2 +
 net/tls/tls_device_fallback.c                 |   2 +-
 8 files changed, 216 insertions(+), 10 deletions(-)

Comments

Jakub Kicinski Nov. 19, 2020, 12:02 a.m. UTC | #1
On Sun, 15 Nov 2020 15:42:49 +0200 Tariq Toukan wrote:
> This series opens TLS TX HW offload for bond interfaces.
> This allows bond interfaces to benefit from capable slave devices.
> 
> The first patch adds real_dev field in TLS context structure, and aligns
> usages in TLS module and supporting drivers.
> The second patch opens the offload for bond interfaces.
> 
> For the configuration above, SW kTLS keeps picking the same slave
> To keep simple track of the HW and SW TLS contexts, we bind each socket to
> a specific slave for the socket's whole lifetime. This is logically valid
> (and similar to the SW kTLS behavior) in the following bond configuration, 
> so we restrict the offload support to it:
> 
> ((mode == balance-xor) or (mode == 802.3ad))
> and xmit_hash_policy == layer3+4.

This does not feel extremely clean, maybe you can convince me otherwise.

Can we extend netdev_get_xmit_slave() and figure out the output dev
(and if it's "stable") in a more generic way? And just feed that dev
into TLS handling? All non-crypto upper SW devs should be safe to cross
with .decrypted = 1 skbs, right?

Tariq Toukan Nov. 19, 2020, 3:59 p.m. UTC | #2
On 11/19/2020 2:02 AM, Jakub Kicinski wrote:
> On Sun, 15 Nov 2020 15:42:49 +0200 Tariq Toukan wrote:
>> This series opens TLS TX HW offload for bond interfaces.
>> This allows bond interfaces to benefit from capable slave devices.
>>
>> The first patch adds real_dev field in TLS context structure, and aligns
>> usages in TLS module and supporting drivers.
>> The second patch opens the offload for bond interfaces.
>>
>> For the configuration above, SW kTLS keeps picking the same slave
>> To keep simple track of the HW and SW TLS contexts, we bind each socket to
>> a specific slave for the socket's whole lifetime. This is logically valid
>> (and similar to the SW kTLS behavior) in the following bond configuration,
>> so we restrict the offload support to it:
>>
>> ((mode == balance-xor) or (mode == 802.3ad))
>> and xmit_hash_policy == layer3+4.
> 
> This does not feel extremely clean, maybe you can convince me otherwise.
> 
> Can we extend netdev_get_xmit_slave() and figure out the output dev
> (and if it's "stable") in a more generic way? And just feed that dev
> into TLS handling? 

Hi Jakub,

I don't see we go through netdev_get_xmit_slave(), but through 
.ndo_start_xmit (bond_start_xmit). Currently I have my check there to 
catch all skbs belonging to offloaded TLS sockets.

The TLS offload get_slave() logic decision is per socket, so the result 
cannot be saved in the bond memory. Currently I save the real_dev field 
in the TLS context structure.
One way to make it more generic is to save it on the sock structure. I 
agree that this replaces the TLS-specific logic, but demands increasing 
the sock struct, and has larger impact on all other flows...

What do you think?
If we decide to go with this, I can provide the patches.

> All non-crypto upper SW devs should be safe to cross
> with .decrypted = 1 skbs, right?
> 

AFAIU yes.

Jakub Kicinski Nov. 19, 2020, 4:38 p.m. UTC | #3
On Thu, 19 Nov 2020 17:59:38 +0200 Tariq Toukan wrote:
> On 11/19/2020 2:02 AM, Jakub Kicinski wrote:
> > On Sun, 15 Nov 2020 15:42:49 +0200 Tariq Toukan wrote:  
> >> This series opens TLS TX HW offload for bond interfaces.
> >> This allows bond interfaces to benefit from capable slave devices.
> >>
> >> The first patch adds real_dev field in TLS context structure, and aligns
> >> usages in TLS module and supporting drivers.
> >> The second patch opens the offload for bond interfaces.
> >>
> >> For the configuration above, SW kTLS keeps picking the same slave
> >> To keep simple track of the HW and SW TLS contexts, we bind each socket to
> >> a specific slave for the socket's whole lifetime. This is logically valid
> >> (and similar to the SW kTLS behavior) in the following bond configuration,
> >> so we restrict the offload support to it:
> >>
> >> ((mode == balance-xor) or (mode == 802.3ad))
> >> and xmit_hash_policy == layer3+4.  
> > 
> > This does not feel extremely clean, maybe you can convince me otherwise.
> > 
> > Can we extend netdev_get_xmit_slave() and figure out the output dev
> > (and if it's "stable") in a more generic way? And just feed that dev
> > into TLS handling?   
> 
> I don't see we go through netdev_get_xmit_slave(), but through 
> .ndo_start_xmit (bond_start_xmit).

I may be misunderstanding the purpose of netdev_get_xmit_slave(),
please correct me if I'm wrong. AFAIU it's supposed to return a
lower netdev that the skb should then be xmited on.

So what I was thinking was either construct an skb or somehow reshuffle
the netdev_get_xmit_slave() code to take a flow dissector output or
${insert other ideas}. Then add a helper in the core that would drill
down from the socket netdev to the "egress" netdev. Have TLS call
that helper, and talk to the "egress" netdev from the start, rather
than the socket's netdev. Then loosen the checks on software devices.

I'm probably missing the problem you're trying to explain to me :S

Side note - Jarod, I'd be happy to take a patch renaming
netdev_get_xmit_slave() and the ndo, if you have the cycles to send 
a patch. It's a recent addition, and in the core we should make more 
of an effort to avoid sensitive terms.

> Currently I have my check there to 
> catch all skbs belonging to offloaded TLS sockets.
> 
> The TLS offload get_slave() logic decision is per socket, so the result 
> cannot be saved in the bond memory. Currently I save the real_dev field 
> in the TLS context structure.

Right, but we could just have ctx->netdev be the "egress" netdev
always, right? Do we expect somewhere that it's going to be matching
the socket's dst?

> One way to make it more generic is to save it on the sock structure. I 
> agree that this replaces the TLS-specific logic, but demands increasing 
> the sock struct, and has larger impact on all other flows...

Tariq Toukan Nov. 22, 2020, 12:48 p.m. UTC | #4
On 11/19/2020 6:38 PM, Jakub Kicinski wrote:
> On Thu, 19 Nov 2020 17:59:38 +0200 Tariq Toukan wrote:
>> On 11/19/2020 2:02 AM, Jakub Kicinski wrote:
>>> On Sun, 15 Nov 2020 15:42:49 +0200 Tariq Toukan wrote:
>>>> This series opens TLS TX HW offload for bond interfaces.
>>>> This allows bond interfaces to benefit from capable slave devices.
>>>>
>>>> The first patch adds real_dev field in TLS context structure, and aligns
>>>> usages in TLS module and supporting drivers.
>>>> The second patch opens the offload for bond interfaces.
>>>>
>>>> For the configuration above, SW kTLS keeps picking the same slave
>>>> To keep simple track of the HW and SW TLS contexts, we bind each socket to
>>>> a specific slave for the socket's whole lifetime. This is logically valid
>>>> (and similar to the SW kTLS behavior) in the following bond configuration,
>>>> so we restrict the offload support to it:
>>>>
>>>> ((mode == balance-xor) or (mode == 802.3ad))
>>>> and xmit_hash_policy == layer3+4.
>>>
>>> This does not feel extremely clean, maybe you can convince me otherwise.
>>>
>>> Can we extend netdev_get_xmit_slave() and figure out the output dev
>>> (and if it's "stable") in a more generic way? And just feed that dev
>>> into TLS handling?
>>
>> I don't see we go through netdev_get_xmit_slave(), but through
>> .ndo_start_xmit (bond_start_xmit).
> 
> I may be misunderstanding the purpose of netdev_get_xmit_slave(),
> please correct me if I'm wrong. AFAIU it's supposed to return a
> lower netdev that the skb should then be xmited on.

That's true. It was recently added and used by the RDMA team. Not used 
or integrated in the Eth networking stack.

> So what I was thinking was either construct an skb or somehow reshuffle
> the netdev_get_xmit_slave() code to take a flow dissector output or
> ${insert other ideas}. Then add a helper in the core that would drill
> down from the socket netdev to the "egress" netdev. Have TLS call
> that helper, and talk to the "egress" netdev from the start, rather
> than the socket's netdev. Then loosen the checks on software devices.

As I understand it, best if we can even generalize this to apply to all 
kinds of traffic: bond driver won't do the xmit itself anymore, it just 
picks an egress dev and returns it. The core infrastructure will call 
the xmit function for the egress dev.

I like the idea, it can generalize code structures for all kinds of 
upper-devices and sockets, taking them into a common place in core, 
which reduces code duplications.

If we go only half the way, i.e. keep xmit logic in bond for 
non-TLS-offloaded traffic, then we have to let TLS module (and others in 
the future) act differently for different kinds of devs (upper/lower) 
which IMHO reduces generality.

I'm in favor of the deeper change. It will be on a larger scale, and 
totally orthogonal to the current TLS offload support in bond.

If we decide to apply the idea only to TLS sockets (or any subset of 
sockets) we're actually taking a generic one-flow (the xmit path of a 
bond dev) and turning it into two (or potentially more) flows, depending 
on the socket type. This also reduces generality.

> 
> I'm probably missing the problem you're trying to explain to me :S
> 

I kept the patch minimal, and kept the TLS offload logic internal to the 
bond driver, just like it is internal to the device drivers (mlx5e, and 
others), with no core infrastructure modification.

> Side note - Jarod, I'd be happy to take a patch renaming
> netdev_get_xmit_slave() and the ndo, if you have the cycles to send
> a patch. It's a recent addition, and in the core we should make more
> of an effort to avoid sensitive terms.
> 
>> Currently I have my check there to
>> catch all skbs belonging to offloaded TLS sockets.
>>
>> The TLS offload get_slave() logic decision is per socket, so the result
>> cannot be saved in the bond memory. Currently I save the real_dev field
>> in the TLS context structure.
> 
> Right, but we could just have ctx->netdev be the "egress" netdev
> always, right? Do we expect somewhere that it's going to be matching
> the socket's dst?
> 

So once the offload context is established we totally bypass the bond 
dev? and lose interaction or reference to it?
What if the egress dev is detached from the bond? We must then be 
notified somehow.

>> One way to make it more generic is to save it on the sock structure. I
>> agree that this replaces the TLS-specific logic, but demands increasing
>> the sock struct, and has larger impact on all other flows...
>

Jakub Kicinski Nov. 23, 2020, 6:20 p.m. UTC | #5
On Sun, 22 Nov 2020 14:48:04 +0200 Tariq Toukan wrote:
> On 11/19/2020 6:38 PM, Jakub Kicinski wrote:
> > On Thu, 19 Nov 2020 17:59:38 +0200 Tariq Toukan wrote:  
> >> On 11/19/2020 2:02 AM, Jakub Kicinski wrote:  
> >>> On Sun, 15 Nov 2020 15:42:49 +0200 Tariq Toukan wrote:  
> >>>> This series opens TLS TX HW offload for bond interfaces.
> >>>> This allows bond interfaces to benefit from capable slave devices.
> >>>>
> >>>> The first patch adds real_dev field in TLS context structure, and aligns
> >>>> usages in TLS module and supporting drivers.
> >>>> The second patch opens the offload for bond interfaces.
> >>>>
> >>>> For the configuration above, SW kTLS keeps picking the same slave
> >>>> To keep simple track of the HW and SW TLS contexts, we bind each socket to
> >>>> a specific slave for the socket's whole lifetime. This is logically valid
> >>>> (and similar to the SW kTLS behavior) in the following bond configuration,
> >>>> so we restrict the offload support to it:
> >>>>
> >>>> ((mode == balance-xor) or (mode == 802.3ad))
> >>>> and xmit_hash_policy == layer3+4.  
> >>>
> >>> This does not feel extremely clean, maybe you can convince me otherwise.
> >>>
> >>> Can we extend netdev_get_xmit_slave() and figure out the output dev
> >>> (and if it's "stable") in a more generic way? And just feed that dev
> >>> into TLS handling?  
> >>
> >> I don't see we go through netdev_get_xmit_slave(), but through
> >> .ndo_start_xmit (bond_start_xmit).  
> > 
> > I may be misunderstanding the purpose of netdev_get_xmit_slave(),
> > please correct me if I'm wrong. AFAIU it's supposed to return a
> > lower netdev that the skb should then be xmited on.  
> 
> That's true. It was recently added and used by the RDMA team. Not used 
> or integrated in the Eth networking stack.
> 
> > So what I was thinking was either construct an skb or somehow reshuffle
> > the netdev_get_xmit_slave() code to take a flow dissector output or
> > ${insert other ideas}. Then add a helper in the core that would drill
> > down from the socket netdev to the "egress" netdev. Have TLS call
> > that helper, and talk to the "egress" netdev from the start, rather
> > than the socket's netdev. Then loosen the checks on software devices.  
> 
> As I understand it, best if we can even generalize this to apply to all 
> kinds of traffic: bond driver won't do the xmit itself anymore, it just 
> picks an egress dev and returns it. The core infrastructure will call 
> the xmit function for the egress dev.

I think you went way further than I was intending :) I was only
considering the control path. Leave the datapath unchanged.

AFAIK you're making 3 changes:
 - forwarding tls ops
 - pinning flows
 - handling features

Pinning of the TLS device to a leg of the bond looks like ~15LoC.
I think we can live with that.

It's the 150 LoC of forwarding TLS ops and duplicating dev selection
logic in bond_sk_hash_l34() that I'd rather avoid.

Handling features is probably fine, too, I haven't thought about that
much.

> I like the idea, it can generalize code structures for all kinds of 
> upper-devices and sockets, taking them into a common place in core, 
> which reduces code duplications.
> 
> If we go only half the way, i.e. keep xmit logic in bond for 
> non-TLS-offloaded traffic, then we have to let TLS module (and others in 
> the future) act differently for different kinds of devs (upper/lower) 
> which IMHO reduces generality.

How so? I was expecting TLS to just do something like:

	netdev = sk_get_xmit_dev_lowest(sk);

which would recursively call get_xmit_slave(CONST) until it reaches
a device which doesn't resolve further.
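
To make the "drill down until the device doesn't resolve further" idea concrete, here is a minimal userspace sketch. struct netdev_model, struct sock_model, pick_by_hash() and sk_get_xmit_dev_lowest_model() are hypothetical stand-ins for this example only; the helper proposed above does not exist in this form, and the real ndo takes an skb rather than a socket.

```c
#include <stddef.h>

/* Hypothetical model of a socket: only a precomputed flow hash. */
struct sock_model {
	unsigned int flow_hash;
};

/* Hypothetical model of a netdev that may sit on top of lower devices. */
struct netdev_model {
	const char *name;
	/* NULL for a plain device; otherwise picks the lower egress device. */
	struct netdev_model *(*get_xmit_lower)(struct netdev_model *dev,
					       const struct sock_model *sk);
	struct netdev_model *lowers[4];
	size_t n_lowers;
};

/* Bond-like selection: choose a lower device from the socket's flow hash. */
static struct netdev_model *pick_by_hash(struct netdev_model *dev,
					 const struct sock_model *sk)
{
	if (dev->n_lowers == 0)
		return NULL;
	return dev->lowers[sk->flow_hash % dev->n_lowers];
}

/*
 * Walk down the device stack until reaching a device that does not
 * resolve any further. Written as a loop, equivalent to the recursive
 * description above.
 */
static struct netdev_model *
sk_get_xmit_dev_lowest_model(struct netdev_model *dev,
			     const struct sock_model *sk)
{
	while (dev->get_xmit_lower) {
		struct netdev_model *lower = dev->get_xmit_lower(dev, sk);

		if (!lower)
			break;
		dev = lower;
	}
	return dev;
}
```

In this model TLS would only ever talk to the returned device, so the upper device needs no TLS ops of its own.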

BTW is the flow pinning to bond legs actually a must-do? I don't know
much about bonding but wouldn't that mean that if the selected leg goes
down we'd lose connectivity, rather than falling back to SW crypto?

> I'm in favor of the deeper change. It will be on a larger scale, and 
> totally orthogonal to the current TLS offload support in bond.
> 
> If we decide to apply the idea only to TLS sockets (or any subset of 
> sockets) we're actually taking a generic one-flow (the xmit path of a 
> bond dev) and turning it into two (or potentially more) flows, depending 
> on the socket type. This also reduces generality.

I don't follow this part.

> > I'm probably missing the problem you're trying to explain to me :S
> 
> I kept the patch minimal, and kept the TLS offload logic internal to the 
> bond driver, just like it is internal to the device drivers (mlx5e, and 
> others), with no core infrastructure modification.
> 
> > Side note - Jarod, I'd be happy to take a patch renaming
> > netdev_get_xmit_slave() and the ndo, if you have the cycles to send
> > a patch. It's a recent addition, and in the core we should make more
> > of an effort to avoid sensitive terms.
> >   
> >> Currently I have my check there to
> >> catch all skbs belonging to offloaded TLS sockets.
> >>
> >> The TLS offload get_slave() logic decision is per socket, so the result
> >> cannot be saved in the bond memory. Currently I save the real_dev field
> >> in the TLS context structure.  
> > 
> > Right, but we could just have ctx->netdev be the "egress" netdev
> > always, right? Do we expect somewhere that it's going to be matching
> > the socket's dst?
> 
> So once the offload context is established we totally bypass the bond 
> dev? and lose interaction or reference to it?

Yup, I don't think we need it.

> What if the egress dev is detached from the bond? We must then be 
> notified somehow.

Do we notify TLS when routing changes? I think it's a separate topic. 

If we have the code to "un-offload" a flow we could handle clearing
features better and notify from sk_validate_xmit_skb that the flow
started hitting unexpected dev, hence it should be re-offloaded.

I don't think we need an explicit invalidation from the particular
drivers here.

Tariq Toukan Nov. 24, 2020, 3:08 p.m. UTC | #6
On 11/23/2020 8:20 PM, Jakub Kicinski wrote:
> On Sun, 22 Nov 2020 14:48:04 +0200 Tariq Toukan wrote:
>> On 11/19/2020 6:38 PM, Jakub Kicinski wrote:
>>> On Thu, 19 Nov 2020 17:59:38 +0200 Tariq Toukan wrote:
>>>> On 11/19/2020 2:02 AM, Jakub Kicinski wrote:
>>>>> On Sun, 15 Nov 2020 15:42:49 +0200 Tariq Toukan wrote:
>>>>>> This series opens TLS TX HW offload for bond interfaces.
>>>>>> This allows bond interfaces to benefit from capable slave devices.
>>>>>>
>>>>>> The first patch adds real_dev field in TLS context structure, and aligns
>>>>>> usages in TLS module and supporting drivers.
>>>>>> The second patch opens the offload for bond interfaces.
>>>>>>
>>>>>> For the configuration above, SW kTLS keeps picking the same slave
>>>>>> To keep simple track of the HW and SW TLS contexts, we bind each socket to
>>>>>> a specific slave for the socket's whole lifetime. This is logically valid
>>>>>> (and similar to the SW kTLS behavior) in the following bond configuration,
>>>>>> so we restrict the offload support to it:
>>>>>>
>>>>>> ((mode == balance-xor) or (mode == 802.3ad))
>>>>>> and xmit_hash_policy == layer3+4.
>>>>>
>>>>> This does not feel extremely clean, maybe you can convince me otherwise.
>>>>>
>>>>> Can we extend netdev_get_xmit_slave() and figure out the output dev
>>>>> (and if it's "stable") in a more generic way? And just feed that dev
>>>>> into TLS handling?
>>>>
>>>> I don't see we go through netdev_get_xmit_slave(), but through
>>>> .ndo_start_xmit (bond_start_xmit).
>>>
>>> I may be misunderstanding the purpose of netdev_get_xmit_slave(),
>>> please correct me if I'm wrong. AFAIU it's supposed to return a
>>> lower netdev that the skb should then be xmited on.
>>
>> That's true. It was recently added and used by the RDMA team. Not used
>> or integrated in the Eth networking stack.
>>
>>> So what I was thinking was either construct an skb or somehow reshuffle
>>> the netdev_get_xmit_slave() code to take a flow dissector output or
>>> ${insert other ideas}. Then add a helper in the core that would drill
>>> down from the socket netdev to the "egress" netdev. Have TLS call
>>> that helper, and talk to the "egress" netdev from the start, rather
>>> than the socket's netdev. Then loosen the checks on software devices.
>>
>> As I understand it, best if we can even generalize this to apply to all
>> kinds of traffic: bond driver won't do the xmit itself anymore, it just
>> picks an egress dev and returns it. The core infrastructure will call
>> the xmit function for the egress dev.
> 
> I think you went way further than I was intending :) I was only
> considering the control path. Leave the datapath unchanged.
> 
> AFAIK you're making 3 changes:
>   - forwarding tls ops
>   - pinning flows
>   - handling features
> 
> Pinning of the TLS device to a leg of the bond looks like ~15LoC.
> I think we can live with that.
> 

Good.
You mean the changes under __bond_start_xmit ?

> It's the 150 LoC of forwarding TLS ops and duplicating dev selection
> logic in bond_sk_hash_l34() that I'd rather avoid.
> 

I see.
But there are several issues with this:

1. The .ndo_get_xmit_slave acts on an SKB, not a socket. Hence, it 
doesn't fit for the stage of calling tls_dev_add, unless the ndo goes 
through some refactoring before the feature itself.

2. Existing hash logic acts on an SKB. We must have one that acts on a 
socket to be used inside get_slave(sk). Hence, I don't really see how 
the logic under bond_sk_hash_l34() is going to disappear; maybe it will 
just move around to a new place.
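
To illustrate point 2, here is a minimal userspace sketch of a layer3+4 style hash computed from a connection 4-tuple rather than from an SKB. The formula is loosely modelled on the simple XOR formula that older bonding documentation gives for layer3+4; it is an assumption for illustration and not the code in bond_sk_hash_l34() or the current bonding driver. struct flow4, flow_hash_l34() and pick_slave() are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical 4-tuple, as it could be read from a connected socket. */
struct flow4 {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
};

/* Simplified layer3+4 hash: XOR of the ports and the low address bits. */
static uint32_t flow_hash_l34(const struct flow4 *f)
{
	uint32_t ports = (uint32_t)(f->sport ^ f->dport);
	uint32_t addrs = (f->saddr ^ f->daddr) & 0xffff;

	return ports ^ addrs;
}

/* Map the flow to a slave index; slave_count must be non-zero. */
static unsigned int pick_slave(const struct flow4 *f, unsigned int slave_count)
{
	return flow_hash_l34(f) % slave_count;
}
```

Because the inputs are fixed for the lifetime of a connection, the same slave index comes out every time in this model, which is what makes a per-socket decision at context-creation time possible.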


> Handling features is probably fine, too, I haven't thought about that
> much.
> 

Good.

>> I like the idea, it can generalize code structures for all kinds of
>> upper-devices and sockets, taking them into a common place in core,
>> which reduces code duplications.
>>
>> If we go only half the way, i.e. keep xmit logic in bond for
>> non-TLS-offloaded traffic, then we have to let TLS module (and others in
>> the future) act differently for different kinds of devs (upper/lower)
>> which IMHO reduces generality.
> 
> How so? I was expecting TLS to just do something like:
> 
> 	netdev = sk_get_xmit_dev_lowest(sk);
> 
> which would recursively call get_xmit_slave(CONST) until it reaches
> a device which doesn't resolve further.
> 
> BTW is the flow pinning to bond legs actually a must-do? I don't know
> much about bonding but wouldn't that mean that if the selected leg goes
> down we'd lose connectivity, rather than falling back to SW crypto?
> 

Right. As long as we don't have logic for un-offloading.
Currently in TLS, the device-offloaded connections have some 
"independence" once they are created, it's hard to modify them and apply 
configuration modifications to them (example: interaction with tx csum 
offload).
So I think there is a missing un-offloading mechanism in TLS that should 
address all of these together.

This fits with your comments below.

>> I'm in favor of the deeper change. It will be on a larger scale, and
>> totally orthogonal to the current TLS offload support in bond.
>>
>> If we decide to apply the idea only to TLS sockets (or any subset of
>> sockets) we're actually taking a generic one-flow (the xmit path of a
>> bond dev) and turning it into two (or potentially more) flows, depending
>> on the socket type. This also reduces generality.
> 
> I don't follow this part.
> 
>>> I'm probably missing the problem you're trying to explain to me :S
>>
>> I kept the patch minimal, and kept the TLS offload logic internal to the
>> bond driver, just like it is internal to the device drivers (mlx5e, and
>> others), with no core infrastructure modification.
>>
>>> Side note - Jarod, I'd be happy to take a patch renaming
>>> netdev_get_xmit_slave() and the ndo, if you have the cycles to send
>>> a patch. It's a recent addition, and in the core we should make more
>>> of an effort to avoid sensitive terms.
>>>    
>>>> Currently I have my check there to
>>>> catch all skbs belonging to offloaded TLS sockets.
>>>>
>>>> The TLS offload get_slave() logic decision is per socket, so the result
>>>> cannot be saved in the bond memory. Currently I save the real_dev field
>>>> in the TLS context structure.
>>>
>>> Right, but we could just have ctx->netdev be the "egress" netdev
>>> always, right? Do we expect somewhere that it's going to be matching
>>> the socket's dst?
>>
>> So once the offload context is established we totally bypass the bond
>> dev? and lose interaction or reference to it?
> 
> Yup, I don't think we need it.
> 
>> What if the egress dev is detached from the bond? We must then be
>> notified somehow.
> 
> Do we notify TLS when routing changes? I think it's a separate topic.
> 
> If we have the code to "un-offload" a flow we could handle clearing
> features better and notify from sk_validate_xmit_skb that the flow
> started hitting unexpected dev, hence it should be re-offloaded.
> 
> I don't think we need an explicit invalidation from the particular
> drivers here.
> 

Agree.

Boris Pismenny Nov. 30, 2020, 7:35 a.m. UTC | #7
On 23/11/2020 20:20, Jakub Kicinski wrote:
> On Sun, 22 Nov 2020 14:48:04 +0200 Tariq Toukan wrote:
>>
>> As I understand it, best if we can even generalize this to apply to all 
>> kinds of traffic: bond driver won't do the xmit itself anymore, it just 
>> picks an egress dev and returns it. The core infrastructure will call 
>> the xmit function for the egress dev.
> I think you went way further than I was intending :) I was only
> considering the control path. Leave the datapath unchanged.
>
> AFAIK you're making 3 changes:
>  - forwarding tls ops
>  - pinning flows
>  - handling features
>
> Pinning of the TLS device to a leg of the bond looks like ~15LoC.
> I think we can live with that.
>
> It's the 150 LoC of forwarding TLS ops and duplicating dev selection
> logic in bond_sk_hash_l34() that I'd rather avoid.
>
> Handling features is probably fine, too, I haven't thought about that
> much.

Sorry for jumping in late, but I'd like to present an argument in favor
of the approach in the original patch-set, as it may have been
overlooked.

The forwarding of TLS ops approach is very flexible, and it will enable
support for per-SKB hashing in the future (high-availability). This will
require taking ooo_okay into consideration and offloading the context to
more than one NIC, but I think it's doable. Even though this approach
requires more lines of code, it is already used by other offloads, for
instance XFRM offload in bond_main.c.


>> I like the idea, it can generalize code structures for all kinds of 
>> upper-devices and sockets, taking them into a common place in core, 
>> which reduces code duplications.
>>
>> If we go only half the way, i.e. keep xmit logic in bond for 
>> non-TLS-offloaded traffic, then we have to let TLS module (and others in 
>> the future) act differently for different kinds of devs (upper/lower) 
>> which IMHO reduces generality.
> How so? I was expecting TLS to just do something like:
>
> 	netdev = sk_get_xmit_dev_lowest(sk);
>
> which would recursively call get_xmit_slave(CONST) until it reaches
> a device which doesn't resolve further.
>
> BTW is the flow pinning to bond legs actually a must-do? I don't know
> much about bonding but wouldn't that mean that if the selected leg goes
> down we'd lose connectivity, rather than falling back to SW crypto?

It is definitely not a must, and I think we should remove it in the
future, once the use-case presents itself.


>> What if the egress dev is detached from the bond? We must then be 
>> notified somehow.
> Do we notify TLS when routing changes? I think it's a separate topic. 
>
> If we have the code to "un-offload" a flow we could handle clearing
> features better and notify from sk_validate_xmit_skb that the flow
> started hitting unexpected dev, hence it should be re-offloaded.
>
> I don't think we need an explicit invalidation from the particular
> drivers here.

Even though re-offload is not exercised, it is possible: if packets are
no longer using the offload on the old netdev, remove the offload from
it and add the offload to the new netdev. A resync will likely follow,
after which the offload continues on the new netdev.

The question is who identifies/decides when to re-offload. One option is that the bond driver will trigger it.
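
As a rough illustration of the re-offload flow discussed in this thread, here is a minimal userspace sketch in which the transmit-time check detects that a flow has started hitting a device other than the one holding the HW context. All names (struct dev_model, struct tls_ctx_model, validate_xmit(), reoffload()) are hypothetical; this models the idea only and is not the kernel's TLS code, and it leaves out the resync step entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a network device. */
struct dev_model {
	const char *name;
};

/* Hypothetical model of a per-socket TLS offload context. */
struct tls_ctx_model {
	const struct dev_model *netdev;	/* device holding the HW context */
	bool needs_reoffload;		/* set when an unexpected dev is hit */
};

enum xmit_action {
	XMIT_HW_CRYPTO,		/* egress matches, device encrypts */
	XMIT_SW_FALLBACK	/* egress differs, encrypt in software */
};

/* Per-packet check: compare the egress device with the offload device. */
static enum xmit_action validate_xmit(struct tls_ctx_model *ctx,
				      const struct dev_model *egress)
{
	if (ctx->netdev == egress)
		return XMIT_HW_CRYPTO;

	/* Flow moved: keep traffic going in SW and flag the context. */
	ctx->needs_reoffload = true;
	return XMIT_SW_FALLBACK;
}

/* Move the offload to the new device (removal from the old one implied). */
static void reoffload(struct tls_ctx_model *ctx,
		      const struct dev_model *new_dev)
{
	ctx->netdev = new_dev;
	ctx->needs_reoffload = false;
}
```

In this model the decision to re-offload is triggered from the transmit path; having the bond driver trigger it instead, as suggested above, would simply mean another caller of reoffload().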