[net-next,v2,0/3] ] TC datapath hash api

Message ID	20200701184719.8421-1-lariel@mellanox.com
Headers	show Return-Path: <netdev-owner@vger.kernel.org> From: Ariel Levkovich <lariel@mellanox.com> To: netdev@vger.kernel.org Cc: jiri@resnulli.us, kuba@kernel.org, jhs@mojatatu.com, xiyou.wangcong@gmail.com, ast@kernel.org, daniel@iogearbox.net, Ariel Levkovich <lariel@mellanox.com> Subject: [PATCH net-next v2 0/3] ] TC datapath hash api Date: Wed, 1 Jul 2020 21:47:16 +0300 Message-Id: <20200701184719.8421-1-lariel@mellanox.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: netdev-owner@vger.kernel.org Precedence: bulk
Series	] TC datapath hash api \| expand [net-next,v2,0/3] ] TC datapath hash api [net-next,v2,1/3] net/sched: Introduce action hash [net-next,v2,2/3] net/flow_dissector: add packet hash dissection [net-next,v2,3/3] net/sched: cls_flower: Add hash info to flow classification

Ariel Levkovich July 1, 2020, 6:47 p.m. UTC

Supporting datapath hash allows user to set up rules that provide
load balancing of traffic across multiple vports and for ECMP path
selection while keeping the number of rule at minimum.

Instead of matching on exact flow spec, which requires a rule per
flow, user can define rules based on hashing on the packet headers
and distribute the flows to different buckets. The number of rules
in this case will be constant and equal to the number of buckets.

The datapath hash functionality is achieved in two steps -
performing the hash action and then matching on the result, as
part of the packet's classification.

The api allows user to define a filter with a tc hash action
where the hash function can be standard asymetric hashing that Linux
offers or alternatively user can provide a bpf program that
performs hash calculation on a packet.

Usage is as follows:

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp \
action hash bpf object-file <file> \
action goto chain 2

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto udp \
action hash bpf asym_l4 basis <basis> \
action goto chain 2

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x0/0xf  \
action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x1/0xf  \
action mirred egress redirect dev ens1f0_2

Ariel Levkovich (3):
  net/sched: Introduce action hash
  net/flow_dissector: add packet hash dissection
  net/sched: cls_flower: Add hash info to flow classification

 include/linux/skbuff.h              |   4 +
 include/net/act_api.h               |   2 +
 include/net/flow_dissector.h        |   9 +
 include/net/tc_act/tc_hash.h        |  22 ++
 include/uapi/linux/pkt_cls.h        |   4 +
 include/uapi/linux/tc_act/tc_hash.h |  32 +++
 net/core/flow_dissector.c           |  17 ++
 net/sched/Kconfig                   |  11 +
 net/sched/Makefile                  |   1 +
 net/sched/act_hash.c                | 389 ++++++++++++++++++++++++++++
 net/sched/cls_api.c                 |   1 +
 net/sched/cls_flower.c              |  16 ++
 12 files changed, 508 insertions(+)
 create mode 100644 include/net/tc_act/tc_hash.h
 create mode 100644 include/uapi/linux/tc_act/tc_hash.h
 create mode 100644 net/sched/act_hash.c

Jamal Hadi Salim July 3, 2020, 11:22 a.m. UTC | #1

Hi,

Several comments:
1) I agree with previous comments that you should
look at incorporating this into skbedit.
Unless incorporating into skbedit introduces huge
complexity, IMO it belongs there.

2) I think it would make sense to create a skb hash classifier
instead of tying this entirely to flower i.e i should not
have to change u32 just so i can support hash classification.
So policy would be something of the sort:

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp \
action skbedit hash bpf object-file <file> \
action goto chain 2

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
handle 0x0 skbhash  flowid 1:11 mask 0xf  \
action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
handle 0x1 skbhash  flowid 1:11 mask 0xf  \
action mirred egress redirect dev ens1f0_2

IOW, we maintain current modularity as opposed
to dumping everything into flower.
Ive always wanted to write the skbhash classifier but
time was scarce. At one point i had some experiment
where I would copy skb hash into mark in the driver
and use fw classifier for further processing.
It was ugly.

cheers,
jamal

On 2020-07-01 2:47 p.m., Ariel Levkovich wrote:
> Supporting datapath hash allows user to set up rules that provide
> load balancing of traffic across multiple vports and for ECMP path
> selection while keeping the number of rule at minimum.
> 
> Instead of matching on exact flow spec, which requires a rule per
> flow, user can define rules based on hashing on the packet headers
> and distribute the flows to different buckets. The number of rules
> in this case will be constant and equal to the number of buckets.
> 
> The datapath hash functionality is achieved in two steps -
> performing the hash action and then matching on the result, as
> part of the packet's classification.
> 
> The api allows user to define a filter with a tc hash action
> where the hash function can be standard asymetric hashing that Linux
> offers or alternatively user can provide a bpf program that
> performs hash calculation on a packet.
> 
> Usage is as follows:
> 
> $ tc filter add dev ens1f0_0 ingress \
> prio 1 chain 0 proto ip \
> flower ip_proto tcp \
> action hash bpf object-file <file> \
> action goto chain 2
> 
> $ tc filter add dev ens1f0_0 ingress \
> prio 1 chain 0 proto ip \
> flower ip_proto udp \
> action hash bpf asym_l4 basis <basis> \
> action goto chain 2
> 
> $ tc filter add dev ens1f0_0 ingress \
> prio 1 chain 2 proto ip \
> flower hash 0x0/0xf  \
> action mirred egress redirect dev ens1f0_1
> 
> $ tc filter add dev ens1f0_0 ingress \
> prio 1 chain 2 proto ip \
> flower hash 0x1/0xf  \
> action mirred egress redirect dev ens1f0_2
> 
> Ariel Levkovich (3):
>    net/sched: Introduce action hash
>    net/flow_dissector: add packet hash dissection
>    net/sched: cls_flower: Add hash info to flow classification
> 
>   include/linux/skbuff.h              |   4 +
>   include/net/act_api.h               |   2 +
>   include/net/flow_dissector.h        |   9 +
>   include/net/tc_act/tc_hash.h        |  22 ++
>   include/uapi/linux/pkt_cls.h        |   4 +
>   include/uapi/linux/tc_act/tc_hash.h |  32 +++
>   net/core/flow_dissector.c           |  17 ++
>   net/sched/Kconfig                   |  11 +
>   net/sched/Makefile                  |   1 +
>   net/sched/act_hash.c                | 389 ++++++++++++++++++++++++++++
>   net/sched/cls_api.c                 |   1 +
>   net/sched/cls_flower.c              |  16 ++
>   12 files changed, 508 insertions(+)
>   create mode 100644 include/net/tc_act/tc_hash.h
>   create mode 100644 include/uapi/linux/tc_act/tc_hash.h
>   create mode 100644 net/sched/act_hash.c
>

Ariel Levkovich July 5, 2020, 5:26 p.m. UTC | #2

On 7/3/20 7:22 AM, Jamal Hadi Salim wrote:
> Hi,
>
> Several comments:
> 1) I agree with previous comments that you should
> look at incorporating this into skbedit.
> Unless incorporating into skbedit introduces huge
> complexity, IMO it belongs there.

Hi Jamal,

I agree that using skbedit makes some sense and can provide the same 
functionality.

However I believe that from a concept point of view, using it is wrong.

In my honest opinion, the concept here is to perform some calculation on 
the packet itself and its headers while the skb->hash field

is the storage location of the calculation result (in SW).

Furthermore, looking forward to HW offload support, the HW devices will 
be offloading the hash calculation and

not rewriting skb metadata fields. Therefore the action should be the 
hash, not skbedit.

Another thing that I can mention, which is kind of related to what I 
wrote above, is that for all existing skbedit supported fields,

user typically provides a desired value of his choosing to set to a skb 
metadata field.

Here, the value is unknown and probably not a real concern to the user.


To sum it up, I look at this as performing some operation on the packet 
rather then just

setting an skb metadata field and therefore it requires an explicit, new 
action.


What do you think?


>
> 2) I think it would make sense to create a skb hash classifier
> instead of tying this entirely to flower i.e i should not
> have to change u32 just so i can support hash classification.
> So policy would be something of the sort:
>
> $ tc filter add dev ens1f0_0 ingress \
> prio 1 chain 0 proto ip \
> flower ip_proto tcp \
> action skbedit hash bpf object-file <file> \
> action goto chain 2
>
> $ tc filter add dev ens1f0_0 ingress \
> prio 1 chain 2 proto ip \
> handle 0x0 skbhash  flowid 1:11 mask 0xf  \
> action mirred egress redirect dev ens1f0_1
>
> $ tc filter add dev ens1f0_0 ingress \
> prio 1 chain 2 proto ip \
> handle 0x1 skbhash  flowid 1:11 mask 0xf  \
> action mirred egress redirect dev ens1f0_2
>
> IOW, we maintain current modularity as opposed
> to dumping everything into flower.
> Ive always wanted to write the skbhash classifier but
> time was scarce. At one point i had some experiment
> where I would copy skb hash into mark in the driver
> and use fw classifier for further processing.
> It was ugly.

I agree but perhaps we should make it a separate effort and not block 
this one.

I still think we should have support via flower. This is the HW offload 
path eventually.


Regards,

Ariel


> cheers,
> jamal
>
> On 2020-07-01 2:47 p.m., Ariel Levkovich wrote:
>> Supporting datapath hash allows user to set up rules that provide
>> load balancing of traffic across multiple vports and for ECMP path
>> selection while keeping the number of rule at minimum.
>>
>> Instead of matching on exact flow spec, which requires a rule per
>> flow, user can define rules based on hashing on the packet headers
>> and distribute the flows to different buckets. The number of rules
>> in this case will be constant and equal to the number of buckets.
>>
>> The datapath hash functionality is achieved in two steps -
>> performing the hash action and then matching on the result, as
>> part of the packet's classification.
>>
>> The api allows user to define a filter with a tc hash action
>> where the hash function can be standard asymetric hashing that Linux
>> offers or alternatively user can provide a bpf program that
>> performs hash calculation on a packet.
>>
>> Usage is as follows:
>>
>> $ tc filter add dev ens1f0_0 ingress \
>> prio 1 chain 0 proto ip \
>> flower ip_proto tcp \
>> action hash bpf object-file <file> \
>> action goto chain 2
>>
>> $ tc filter add dev ens1f0_0 ingress \
>> prio 1 chain 0 proto ip \
>> flower ip_proto udp \
>> action hash bpf asym_l4 basis <basis> \
>> action goto chain 2
>>
>> $ tc filter add dev ens1f0_0 ingress \
>> prio 1 chain 2 proto ip \
>> flower hash 0x0/0xf  \
>> action mirred egress redirect dev ens1f0_1
>>
>> $ tc filter add dev ens1f0_0 ingress \
>> prio 1 chain 2 proto ip \
>> flower hash 0x1/0xf  \
>> action mirred egress redirect dev ens1f0_2
>>
>> Ariel Levkovich (3):
>>    net/sched: Introduce action hash
>>    net/flow_dissector: add packet hash dissection
>>    net/sched: cls_flower: Add hash info to flow classification
>>
>>   include/linux/skbuff.h              |   4 +
>>   include/net/act_api.h               |   2 +
>>   include/net/flow_dissector.h        |   9 +
>>   include/net/tc_act/tc_hash.h        |  22 ++
>>   include/uapi/linux/pkt_cls.h        |   4 +
>>   include/uapi/linux/tc_act/tc_hash.h |  32 +++
>>   net/core/flow_dissector.c           |  17 ++
>>   net/sched/Kconfig                   |  11 +
>>   net/sched/Makefile                  |   1 +
>>   net/sched/act_hash.c                | 389 ++++++++++++++++++++++++++++
>>   net/sched/cls_api.c                 |   1 +
>>   net/sched/cls_flower.c              |  16 ++
>>   12 files changed, 508 insertions(+)
>>   create mode 100644 include/net/tc_act/tc_hash.h
>>   create mode 100644 include/uapi/linux/tc_act/tc_hash.h
>>   create mode 100644 net/sched/act_hash.c
>>
>

Jamal Hadi Salim July 5, 2020, 9:50 p.m. UTC | #3

Hi Ariel,

On 2020-07-05 1:26 p.m., Ariel Levkovich wrote:
> 
> On 7/3/20 7:22 AM, Jamal Hadi Salim wrote:
[..]
> Hi Jamal,
> 
> I agree that using skbedit makes some sense and can provide the same 
> functionality.
> 
> However I believe that from a concept point of view, using it is wrong.
> 
> In my honest opinion, the concept here is to perform some calculation on 
> the packet itself and its headers while the skb->hash field
> 
> is the storage location of the calculation result (in SW).
> 
> Furthermore, looking forward to HW offload support, the HW devices will 
> be offloading the hash calculation and
> 
> not rewriting skb metadata fields. Therefore the action should be the 
> hash, not skbedit.
> 
> Another thing that I can mention, which is kind of related to what I 
> wrote above, is that for all existing skbedit supported fields,
> 
> user typically provides a desired value of his choosing to set to a skb 
> metadata field.
> 
> Here, the value is unknown and probably not a real concern to the user.
> 
> 
> To sum it up, I look at this as performing some operation on the packet 
> rather then just
> 
> setting an skb metadata field and therefore it requires an explicit, new 
> action.
> 
> 
> What do you think?

skbedit is generally the action for any skb metadata modification
(of which hash is one).
Note: We already have skbedit offload for skbmark today.
The hash feature is useful for software as well (as your use case
showed). I agree with you that the majority of the cases are going to
be a computation of some form that results in dynamic skb->hash.
But the hash should be possible to be statically set by a policy.
BTW, nothing in skbedit is against computing what the new metadata
should be.

IMO: A good arguement to not make it part of skbedit is if it adds
unnecessary complexity to skbedit or policy definitions.

> 
>>
>> 2) I think it would make sense to create a skb hash classifier
>> instead of tying this entirely to flower i.e i should not
>> have to change u32 just so i can support hash classification.
>> So policy would be something of the sort:
>>
>> $ tc filter add dev ens1f0_0 ingress \
>> prio 1 chain 0 proto ip \
>> flower ip_proto tcp \
>> action skbedit hash bpf object-file <file> \
>> action goto chain 2
>>
>> $ tc filter add dev ens1f0_0 ingress \
>> prio 1 chain 2 proto ip \
>> handle 0x0 skbhash  flowid 1:11 mask 0xf  \
>> action mirred egress redirect dev ens1f0_1
>>
>> $ tc filter add dev ens1f0_0 ingress \
>> prio 1 chain 2 proto ip \
>> handle 0x1 skbhash  flowid 1:11 mask 0xf  \
>> action mirred egress redirect dev ens1f0_2
>>
>> IOW, we maintain current modularity as opposed
>> to dumping everything into flower.
>> Ive always wanted to write the skbhash classifier but
>> time was scarce. At one point i had some experiment
>> where I would copy skb hash into mark in the driver
>> and use fw classifier for further processing.
>> It was ugly.
> 
> I agree but perhaps we should make it a separate effort and not block 
> this one.
> 
> I still think we should have support via flower. This is the HW offload 
> path eventually.
> 

My main concern is modularity and the tc principle of doing small
things (and in principle doing them well).
Flower is becoming the sink for everything hardware
offload but f.e u32 also does h/w offload as well and we dont
want to limit it to just those two classifiers for the future...

Note: Flower is not very good performance-wise in the ingress in
s/ware. Something that is more specialized like the way skb mark fw
classifier is will be a lot more efficient. One good reason to make
hardware[1] do the hard work is to save the cyles in the host.
So to me adding to flower does not help that cause.

cheers,
jamal

[1] Whether the hash is set by RSS or an offloaded classifier or
shows up in some simple pkt header (IFE original patch had skb
hash being  set in one machine and transported across machines
for use in a remote machine - that setup is in use in production).

cheers,
jamal

Cong Wang July 6, 2020, 12:23 a.m. UTC | #4

On Sun, Jul 5, 2020 at 10:26 AM Ariel Levkovich <lariel@mellanox.com> wrote:
> However I believe that from a concept point of view, using it is wrong.
>
> In my honest opinion, the concept here is to perform some calculation on
> the packet itself and its headers while the skb->hash field
>
> is the storage location of the calculation result (in SW).

With skbedit, you don't have to pass a value either, whatever you
pass to your act_hash, you can pass it to skbedit too. In your case,
it seems to be an algorithm name.

You can take a look at SKBEDIT_F_INHERITDSFIELD, it calculates
skb->priority from headers, not passed from user-space.

>
> Furthermore, looking forward to HW offload support, the HW devices will
> be offloading the hash calculation and
>
> not rewriting skb metadata fields. Therefore the action should be the
> hash, not skbedit.

Not sure if this makes sense, whatever your code under case
TCA_HASH_ALG_L4 can be just moved to skbedit. I don't see
how making it standalone could be different for HW offloading.

>
> Another thing that I can mention, which is kind of related to what I
> wrote above, is that for all existing skbedit supported fields,
>
> user typically provides a desired value of his choosing to set to a skb
> metadata field.

Again, no one forces this rule. Please feel free to adjust it for your needs.

Thanks.

Cong Wang July 6, 2020, 12:28 a.m. UTC | #5

On Sun, Jul 5, 2020 at 2:50 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> BTW, nothing in skbedit is against computing what the new metadata
> should be.

Yup.

>
> IMO: A good arguement to not make it part of skbedit is if it adds
> unnecessary complexity to skbedit or policy definitions.
>

TCA_HASH_ALG_L4 literally has 4 lines of code, has no way
to add any unnecessary complexity to skbedit. (The BPF algorithm
does not belong to skbedit, obviously.)

Thanks.

Jiri Pirko July 7, 2020, 10:05 a.m. UTC | #6

Fri, Jul 03, 2020 at 01:22:47PM CEST, jhs@mojatatu.com wrote:
>Hi,
>
>Several comments:
>1) I agree with previous comments that you should
>look at incorporating this into skbedit.
>Unless incorporating into skbedit introduces huge
>complexity, IMO it belongs there.
>
>2) I think it would make sense to create a skb hash classifier
>instead of tying this entirely to flower i.e i should not
>have to change u32 just so i can support hash classification.

Well, we don't have multiple classifiers for each flower match, we have
them all in one classifier. It turned out to be very convenient and
intuitive for people to use one classifier to do the job for them.
Modularity is nice, but useability is I think more important in this
case. Flower turned out to do good job there.

+ Nothing stops you from creating separate classifier to match on hash
as you wanted to :)


>So policy would be something of the sort:
>
>$ tc filter add dev ens1f0_0 ingress \
>prio 1 chain 0 proto ip \
>flower ip_proto tcp \
>action skbedit hash bpf object-file <file> \
>action goto chain 2
>
>$ tc filter add dev ens1f0_0 ingress \
>prio 1 chain 2 proto ip \
>handle 0x0 skbhash  flowid 1:11 mask 0xf  \
>action mirred egress redirect dev ens1f0_1
>
>$ tc filter add dev ens1f0_0 ingress \
>prio 1 chain 2 proto ip \
>handle 0x1 skbhash  flowid 1:11 mask 0xf  \
>action mirred egress redirect dev ens1f0_2
>
>IOW, we maintain current modularity as opposed
>to dumping everything into flower.
>Ive always wanted to write the skbhash classifier but
>time was scarce. At one point i had some experiment
>where I would copy skb hash into mark in the driver
>and use fw classifier for further processing.
>It was ugly.
>
>cheers,
>jamal
>
>On 2020-07-01 2:47 p.m., Ariel Levkovich wrote:
>> Supporting datapath hash allows user to set up rules that provide
>> load balancing of traffic across multiple vports and for ECMP path
>> selection while keeping the number of rule at minimum.
>> 
>> Instead of matching on exact flow spec, which requires a rule per
>> flow, user can define rules based on hashing on the packet headers
>> and distribute the flows to different buckets. The number of rules
>> in this case will be constant and equal to the number of buckets.
>> 
>> The datapath hash functionality is achieved in two steps -
>> performing the hash action and then matching on the result, as
>> part of the packet's classification.
>> 
>> The api allows user to define a filter with a tc hash action
>> where the hash function can be standard asymetric hashing that Linux
>> offers or alternatively user can provide a bpf program that
>> performs hash calculation on a packet.
>> 
>> Usage is as follows:
>> 
>> $ tc filter add dev ens1f0_0 ingress \
>> prio 1 chain 0 proto ip \
>> flower ip_proto tcp \
>> action hash bpf object-file <file> \
>> action goto chain 2
>> 
>> $ tc filter add dev ens1f0_0 ingress \
>> prio 1 chain 0 proto ip \
>> flower ip_proto udp \
>> action hash bpf asym_l4 basis <basis> \
>> action goto chain 2
>> 
>> $ tc filter add dev ens1f0_0 ingress \
>> prio 1 chain 2 proto ip \
>> flower hash 0x0/0xf  \
>> action mirred egress redirect dev ens1f0_1
>> 
>> $ tc filter add dev ens1f0_0 ingress \
>> prio 1 chain 2 proto ip \
>> flower hash 0x1/0xf  \
>> action mirred egress redirect dev ens1f0_2
>> 
>> Ariel Levkovich (3):
>>    net/sched: Introduce action hash
>>    net/flow_dissector: add packet hash dissection
>>    net/sched: cls_flower: Add hash info to flow classification
>> 
>>   include/linux/skbuff.h              |   4 +
>>   include/net/act_api.h               |   2 +
>>   include/net/flow_dissector.h        |   9 +
>>   include/net/tc_act/tc_hash.h        |  22 ++
>>   include/uapi/linux/pkt_cls.h        |   4 +
>>   include/uapi/linux/tc_act/tc_hash.h |  32 +++
>>   net/core/flow_dissector.c           |  17 ++
>>   net/sched/Kconfig                   |  11 +
>>   net/sched/Makefile                  |   1 +
>>   net/sched/act_hash.c                | 389 ++++++++++++++++++++++++++++
>>   net/sched/cls_api.c                 |   1 +
>>   net/sched/cls_flower.c              |  16 ++
>>   12 files changed, 508 insertions(+)
>>   create mode 100644 include/net/tc_act/tc_hash.h
>>   create mode 100644 include/uapi/linux/tc_act/tc_hash.h
>>   create mode 100644 net/sched/act_hash.c
>> 
>

Jamal Hadi Salim July 8, 2020, 1:54 p.m. UTC | #7

On 2020-07-07 6:05 a.m., Jiri Pirko wrote:
> Fri, Jul 03, 2020 at 01:22:47PM CEST, jhs@mojatatu.com wrote:
>> Hi,
>>
>> Several comments:
>> 1) I agree with previous comments that you should
>> look at incorporating this into skbedit.
>> Unless incorporating into skbedit introduces huge
>> complexity, IMO it belongs there.
>>
>> 2) I think it would make sense to create a skb hash classifier
>> instead of tying this entirely to flower i.e i should not
>> have to change u32 just so i can support hash classification.
> 
> Well, we don't have multiple classifiers for each flower match, we have
> them all in one classifier.

Packet data matches, yes - makes sense. You could argue the same for
the other classifiers.

> It turned out to be very convenient and
> intuitive for people to use one classifier to do the job for them.

IMO:
For this specific case where _offload_ is the main use case i think
it is not a good idea because flower on ingress is slow.
The goal of offloading classifiers to hardware is so one can reduce
consumed cpu cycles on the host. If the hardware
has done the classification for me, a simple hash lookup of the
32 bit skbhash(similar to fw) in the host would be a lot less
compute intensive than running flower's algorithm.

I think there is a line for adding everything in one place,
my main concern is that this feature this is needed
by all classifiers and not just flower.

> Modularity is nice, but useability is I think more important in this
> case. Flower turned out to do good job there.
> 

For humans, agreed everything in one place is convinient.
Note: your arguement could be used for "ls" to include "grep"
functionality because in my scripts I do both most of the time.

cheers,
jamal

> + Nothing stops you from creating separate classifier to match on hash
> as you wanted to :)
> 
>

Jiri Pirko July 8, 2020, 2:45 p.m. UTC | #8

Wed, Jul 08, 2020 at 03:54:14PM CEST, jhs@mojatatu.com wrote:
>On 2020-07-07 6:05 a.m., Jiri Pirko wrote:
>> Fri, Jul 03, 2020 at 01:22:47PM CEST, jhs@mojatatu.com wrote:
>> > Hi,
>> > 
>> > Several comments:
>> > 1) I agree with previous comments that you should
>> > look at incorporating this into skbedit.
>> > Unless incorporating into skbedit introduces huge
>> > complexity, IMO it belongs there.
>> > 
>> > 2) I think it would make sense to create a skb hash classifier
>> > instead of tying this entirely to flower i.e i should not
>> > have to change u32 just so i can support hash classification.
>> 
>> Well, we don't have multiple classifiers for each flower match, we have
>> them all in one classifier.
>
>Packet data matches, yes - makes sense. You could argue the same for
>the other classifiers.
>
>> It turned out to be very convenient and
>> intuitive for people to use one classifier to do the job for them.
>
>IMO:
>For this specific case where _offload_ is the main use case i think
>it is not a good idea because flower on ingress is slow.

Eh? What do you mean by that?


>The goal of offloading classifiers to hardware is so one can reduce
>consumed cpu cycles on the host. If the hardware
>has done the classification for me, a simple hash lookup of the
>32 bit skbhash(similar to fw) in the host would be a lot less
>compute intensive than running flower's algorithm.

It is totally up to the driver/fw how they decide to offload flower.
There are multiple ways. So I don't really follow what do you mean by
"flower's algorithm"


>
>I think there is a line for adding everything in one place,
>my main concern is that this feature this is needed
>by all classifiers and not just flower.

"All" is effectively only flower. Let's be clear about that.


>
>
>> Modularity is nice, but useability is I think more important in this
>> case. Flower turned out to do good job there.
>> 
>
>For humans, agreed everything in one place is convinient.
>Note: your arguement could be used for "ls" to include "grep"
>functionality because in my scripts I do both most of the time.
>
>cheers,
>jamal
>
>
>
>> + Nothing stops you from creating separate classifier to match on hash
>> as you wanted to :)
>> 
>>

Jamal Hadi Salim July 9, 2020, 11 a.m. UTC | #9

On 2020-07-08 10:45 a.m., Jiri Pirko wrote:
> Wed, Jul 08, 2020 at 03:54:14PM CEST, jhs@mojatatu.com wrote:
>> On 2020-07-07 6:05 a.m., Jiri Pirko wrote:

[..]
>> IMO:
>> For this specific case where _offload_ is the main use case i think
>> it is not a good idea because flower on ingress is slow.
> 
> Eh? What do you mean by that?
> 
> 
>> The goal of offloading classifiers to hardware is so one can reduce
>> consumed cpu cycles on the host. If the hardware
>> has done the classification for me, a simple hash lookup of the
>> 32 bit skbhash(similar to fw) in the host would be a lot less
>> compute intensive than running flower's algorithm.
> 
> It is totally up to the driver/fw how they decide to offload flower.
> There are multiple ways. So I don't really follow what do you mean by
> "flower's algorithm"
> 

Nothing to do with how a driver offloads. That part is fine.

By "flower's algorithm" I mean the fact you have to parse and
create the flow cache from scratch on ingress - that slows down
the ingress path. Compare, from cpu cycles pov, to say fw
classifier which dereferences skbmark and uses it as a key
to lookup a hash table.
An skbhash classifier would look the same as fw in its
approach.
subtle point i was making was: if your goal was to save cpu cycles
by offloading the lookup(whose result you then use to do
less work on the host) then you need all the cycles you can
save.

Main point is: classifying based on hash(and for that
matter any other metadata like mark) is needed as a general
utility for the system and should not be only available for
flower. The one big reason we allow all kinds of classifiers
in tc is in the name of "do one thing and do it well".
It is impossible for any one classifier to classify everything
and do a good job at it. For example, I hope you are NEVER
going to add string classification in flower.

Note, what i am describing has precendence:
I can do the same thing with skbmark offloading today.
On ingress I use fw classifier (not u32 or flower).

> 
>>
>> I think there is a line for adding everything in one place,
>> my main concern is that this feature this is needed
>> by all classifiers and not just flower.
> 
> "All" is effectively only flower. Let's be clear about that.
> 

Unless I am misunderstanding, why is it just about flower?
u32 does offload to some hardware. RSS can set this value
and various existing things like XDP, tc ebpf and the action
posted by Ariel.

cheers,
jamal

Jiri Pirko July 9, 2020, 12:19 p.m. UTC | #10

Thu, Jul 09, 2020 at 01:00:26PM CEST, jhs@mojatatu.com wrote:
>On 2020-07-08 10:45 a.m., Jiri Pirko wrote:
>> Wed, Jul 08, 2020 at 03:54:14PM CEST, jhs@mojatatu.com wrote:
>> > On 2020-07-07 6:05 a.m., Jiri Pirko wrote:
>
>
>[..]
>> > IMO:
>> > For this specific case where _offload_ is the main use case i think
>> > it is not a good idea because flower on ingress is slow.
>> 
>> Eh? What do you mean by that?
>> 
>> 
>> > The goal of offloading classifiers to hardware is so one can reduce
>> > consumed cpu cycles on the host. If the hardware
>> > has done the classification for me, a simple hash lookup of the
>> > 32 bit skbhash(similar to fw) in the host would be a lot less
>> > compute intensive than running flower's algorithm.
>> 
>> It is totally up to the driver/fw how they decide to offload flower.
>> There are multiple ways. So I don't really follow what do you mean by
>> "flower's algorithm"
>> 
>
>Nothing to do with how a driver offloads. That part is fine.
>
>By "flower's algorithm" I mean the fact you have to parse and
>create the flow cache from scratch on ingress - that slows down
>the ingress path. Compare, from cpu cycles pov, to say fw

Could you point to the specific code please?

The skb->hash is only accessed if the user sets it up for matching.
I don't understand what slowdown you are talking about :/


>classifier which dereferences skbmark and uses it as a key
>to lookup a hash table.
>An skbhash classifier would look the same as fw in its
>approach.
>subtle point i was making was: if your goal was to save cpu cycles
>by offloading the lookup(whose result you then use to do
>less work on the host) then you need all the cycles you can
>save.
>
>Main point is: classifying based on hash(and for that
>matter any other metadata like mark) is needed as a general
>utility for the system and should not be only available for
>flower. The one big reason we allow all kinds of classifiers
>in tc is in the name of "do one thing and do it well".

Sure. That classifier can exist, no problem. At the same time, flower
can match on it as well. There are already multiple examples of
classifiers matching on the same thing. I don't see any problem there.


>It is impossible for any one classifier to classify everything
>and do a good job at it. For example, I hope you are NEVER
>going to add string classification in flower.
>
>Note, what i am describing has precendence:
>I can do the same thing with skbmark offloading today.
>On ingress I use fw classifier (not u32 or flower).
>
>> 
>> > 
>> > I think there is a line for adding everything in one place,
>> > my main concern is that this feature this is needed
>> > by all classifiers and not just flower.
>> 
>> "All" is effectively only flower. Let's be clear about that.
>> 
>
>Unless I am misunderstanding, why is it just about flower?
>u32 does offload to some hardware. RSS can set this value
>and various existing things like XDP, tc ebpf and the action
>posted by Ariel.
>
>cheers,
>jamal

Ariel Levkovich July 9, 2020, 1:52 p.m. UTC | #11

Not sure it was sent so trying again...


On 7/5/20 8:28 PM, Cong Wang wrote:
> On Sun, Jul 5, 2020 at 2:50 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>> BTW, nothing in skbedit is against computing what the new metadata
>> should be.
> Yup.
>
>> IMO: A good arguement to not make it part of skbedit is if it adds
>> unnecessary complexity to skbedit or policy definitions.
>>
> TCA_HASH_ALG_L4 literally has 4 lines of code, has no way
> to add any unnecessary complexity to skbedit. (The BPF algorithm
> does not belong to skbedit, obviously.)
>
> Thanks.

Moving TCA_HASH_ALG_L4 to skbedit is very simple, I agree.

However, supporting the bpf option via act_bpf is still problematic as 
it is not offloadable.

We need some kind of indication that this is a hash computation program 
and therefore

it requires specific identifier in the form of a new action.

Ariel

Jamal Hadi Salim July 10, 2020, 12:04 p.m. UTC | #12

On 2020-07-09 8:19 a.m., Jiri Pirko wrote:
> Thu, Jul 09, 2020 at 01:00:26PM CEST, jhs@mojatatu.com wrote:
>> On 2020-07-08 10:45 a.m., Jiri Pirko wrote:
>>> Wed, Jul 08, 2020 at 03:54:14PM CEST, jhs@mojatatu.com wrote:
>>>> On 2020-07-07 6:05 a.m., Jiri Pirko wrote:

>> Nothing to do with how a driver offloads. That part is fine.
>>
>> By "flower's algorithm" I mean the fact you have to parse and
>> create the flow cache from scratch on ingress - that slows down
>> the ingress path. Compare, from cpu cycles pov, to say fw
> 
> Could you point to the specific code please?
> 
> The skb->hash is only accessed if the user sets it up for matching.
> I don't understand what slowdown you are talking about :/
> 

Compare the lookup approach taken by flower in ->classify vs fw.
Then add a few hundred(or thousands of) rules.

> 
>> classifier which dereferences skbmark and uses it as a key
>> to lookup a hash table.
>> An skbhash classifier would look the same as fw in its
>> approach.
>> subtle point i was making was: if your goal was to save cpu cycles
>> by offloading the lookup(whose result you then use to do
>> less work on the host) then you need all the cycles you can
>> save.
>>
>> Main point is: classifying based on hash(and for that
>> matter any other metadata like mark) is needed as a general
>> utility for the system and should not be only available for
>> flower. The one big reason we allow all kinds of classifiers
>> in tc is in the name of "do one thing and do it well".
> 
> Sure. That classifier can exist, no problem. At the same time, flower
> can match on it as well. There are already multiple examples of
> classifiers matching on the same thing. I don't see any problem there.
> 

I keep pointing to the issues and we keep circling back
to your desire to add it to flower. I emphatize with the
desire to have flower as a one stop shop for all things classification
but this is at the expense of other classifiers. I too need this for 
offloading  as well as getting the RSS proper feature i described.ets 
make progress.
You go ahead - i will submit a version to add it as a separate
hash classifier.

cheers,
jamal

Jamal Hadi Salim Aug. 7, 2020, 10:41 a.m. UTC | #13

On 2020-07-10 8:04 a.m., Jamal Hadi Salim wrote:
> On 2020-07-09 8:19 a.m., Jiri Pirko wrote:
>> Thu, Jul 09, 2020 at 01:00:26PM CEST, jhs@mojatatu.com wrote:

>>>
>>> Main point is: classifying based on hash(and for that
>>> matter any other metadata like mark) is needed as a general
>>> utility for the system and should not be only available for
>>> flower. The one big reason we allow all kinds of classifiers
>>> in tc is in the name of "do one thing and do it well".
>>
>> Sure. That classifier can exist, no problem. At the same time, flower
>> can match on it as well. There are already multiple examples of
>> classifiers matching on the same thing. I don't see any problem there.
>>
> 
> I keep pointing to the issues and we keep circling back
> to your desire to add it to flower. I emphatize with the
> desire to have flower as a one stop shop for all things classification
> but this is at the expense of other classifiers. I too need this for 
> offloading  as well as getting the RSS proper feature i described.ets 
> make progress.
> You go ahead - i will submit a version to add it as a separate
> hash classifier.
> 

Some cycles opened up - I will work on this in the next day or
two now that your patches are in...

cheers,
jamal

[net-next,v2,0/3] ] TC datapath hash api

Message

Comments