[SRU,F,0/1] CVE-2024-26586

Message ID 20240531192357.34798-1-bethany.jamison@canonical.com

Message

Bethany Jamison May 31, 2024, 7:23 p.m. UTC
[Impact]

mlxsw: spectrum_acl_tcam: Fix stack corruption

When tc filters are first added to a net device, the corresponding local
port gets bound to an ACL group in the device. The group contains a list
of ACLs. In turn, each ACL points to a different TCAM region where the
filters are stored. During forwarding, the ACLs are sequentially
evaluated until a match is found.
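The lookup order described above can be pictured with a toy model. This is only an illustrative sketch: the struct names, the key/mask matching rule and the `acl_group_lookup` helper are assumptions made for the example and do not reflect the mlxsw data structures or the hardware pipeline.

```c
#include <stddef.h>

/* Hypothetical filter: matches when the masked packet key equals the masked key. */
struct filter {
	unsigned int key;
	unsigned int mask;
	int action;
};

/* Hypothetical TCAM region holding a set of filters. */
struct region {
	const struct filter *filters;
	size_t n_filters;
};

/* Each ACL points to its own region. */
struct acl {
	const struct region *region;
};

/* A group is an ordered list of ACLs bound to a port. */
struct acl_group {
	const struct acl *acls;
	size_t n_acls;
};

/*
 * Walk the ACLs in order and stop at the first filter that matches.
 * Returns 1 and stores the action on a match, 0 if nothing matched.
 */
int acl_group_lookup(const struct acl_group *group, unsigned int pkt_key,
		     int *action)
{
	for (size_t i = 0; i < group->n_acls; i++) {
		const struct region *r = group->acls[i].region;

		for (size_t j = 0; j < r->n_filters; j++) {
			const struct filter *f = &r->filters[j];

			if ((pkt_key & f->mask) == (f->key & f->mask)) {
				*action = f->action;
				return 1;
			}
		}
	}
	return 0;
}
```

In this model, a packet that matches a filter in the first ACL never reaches the later ones, which is why the order of ACLs in the group matters.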

One reason to place filters in different regions is when they are added
with decreasing priorities and in an alternating order so that two
consecutive filters can never fit in the same region because of their
key usage.

In Spectrum-2 and newer ASICs the firmware started to report that the
maximum number of ACLs in a group is more than 16, but the layout of the
register that configures ACL groups (PAGT) was not updated to account
for that. It is therefore possible to hit stack corruption [1] in the
rare case where more than 16 ACLs in a group are required.

Fix by limiting the maximum ACL group size to the smaller of what the
firmware reports and the number of ACLs that fit in the PAGT register.
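The idea of the fix can be sketched in standalone C. This is a minimal illustration, not the actual two-line change to spectrum_acl_tcam.c: the names `PAGT_ACL_MAX_NUM`, `struct pagt_payload`, `acl_group_size_limit` and `pagt_payload_add` are hypothetical, and the value 16 is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the register layout has room for 16 ACL entries. */
#define PAGT_ACL_MAX_NUM 16u

/* Hypothetical stand-in for the fixed-size register payload. */
struct pagt_payload {
	unsigned int acl_ids[PAGT_ACL_MAX_NUM];
	size_t count;
};

/* Clamp the firmware-reported group size to what the register can hold. */
unsigned int acl_group_size_limit(unsigned int fw_max_group_size)
{
	return fw_max_group_size < PAGT_ACL_MAX_NUM ?
	       fw_max_group_size : PAGT_ACL_MAX_NUM;
}

/*
 * Append an ACL to the payload. Returns 0 on success and -1 when the
 * group is already at its limit, instead of writing past the array.
 */
int pagt_payload_add(struct pagt_payload *p, unsigned int limit,
		     unsigned int acl_id)
{
	assert(limit <= PAGT_ACL_MAX_NUM);
	if (p->count >= limit)
		return -1;
	p->acl_ids[p->count++] = acl_id;
	return 0;
}
```

With the limit clamped this way, a firmware-reported maximum of 32 still yields a limit of 16, so a request for a 17th ACL is rejected rather than written beyond the end of the buffer.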

Add a test case to make sure the machine does not crash when this
condition is hit.

[1]
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: mlxsw_sp_acl_tcam_group_update+0x116/0x120
[...]
 dump_stack_lvl+0x36/0x50
 panic+0x305/0x330
 __stack_chk_fail+0x15/0x20
 mlxsw_sp_acl_tcam_group_update+0x116/0x120
 mlxsw_sp_acl_tcam_group_region_attach+0x69/0x110
 mlxsw_sp_acl_tcam_vchunk_get+0x492/0xa20
 mlxsw_sp_acl_tcam_ventry_add+0x25/0xe0
 mlxsw_sp_acl_rule_add+0x47/0x240
 mlxsw_sp_flower_replace+0x1a9/0x1d0
 tc_setup_cb_add+0xdc/0x1c0
 fl_hw_replace_filter+0x146/0x1f0
 fl_change+0xc17/0x1360
 tc_new_tfilter+0x472/0xb90
 rtnetlink_rcv_msg+0x313/0x3b0
 netlink_rcv_skb+0x58/0x100
 netlink_unicast+0x244/0x390
 netlink_sendmsg+0x1e4/0x440
 ____sys_sendmsg+0x164/0x260
 ___sys_sendmsg+0x9a/0xe0
 __sys_sendmsg+0x7a/0xc0
 do_syscall_64+0x40/0xe0
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

[Fix]

Noble:	pending
Mantic:	pending
Jammy:	pending
Focal:	backport - function lived in a different spot in the file
	and cherry-pick couldn't find it
Bionic:	not-affected
Xenial:	not-affected
Trusty:	not-affected

[Test Case]

Compile and boot tested

[Where problems could occur]

This fix affects users of the Mellanox mlxsw driver. A problem with
the fix would be visible to the user as unexpected behavior or a
system crash.

Ido Schimmel (1):
  mlxsw: spectrum_acl_tcam: Fix stack corruption

 .../mellanox/mlxsw/spectrum_acl_tcam.c        |  2 +
 .../drivers/net/mlxsw/spectrum-2/tc_flower.sh | 56 ++++++++++++++++++-
 2 files changed, 57 insertions(+), 1 deletion(-)

Comments

Tim Gardner June 3, 2024, 6:07 p.m. UTC | #1
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Portia Stephens June 3, 2024, 11:09 p.m. UTC | #2

Acked-by: Portia Stephens <portia.stephens@canonical.com>
Stefan Bader June 4, 2024, 1:59 p.m. UTC | #3

Applied to focal:linux/master-next. Thanks.

-Stefan