Message ID | 20231026093006.159693-1-frank.heimes@canonical.com |
---|---|
Headers | show |
Series | SMC stats: Wrong bucket calculation for payload of exactly 4096 bytes (LP: 2039575) | expand |
On 26.10.23 11:30, frank.heimes@canonical.com wrote: > BugLink: https://bugs.launchpad.net/bugs/2039575 > > SRU Justification: > > [Impact] > > * There is a wrong bucket calculation for payload of exactly 4096 bytes > in SMC stats counters. > > * SMC_STAT_PAYLOAD_SUB and SMC_STAT_RMB_SIZE_SUB have these issues. > > * The impact is that a system silently updates an incorrect stats counter > in case the underlying kernel is not UBSAN enabled. > (Difficult to detect.) > > * But if a kernel is UBSAN enabled, one will see a UBSAN > 'array-index-out-of-bounds' warning every time this occurs, like: > [ 26.335381] UBSAN: array-index-out-of-bounds in /build/linux-O6Qi7m/linux-5.15.0/net/smc/af_smc.c:2402:3 > [ 26.335385] index -1 is out of range for type 'u64 [9]' > [ 26.335388] CPU: 0 PID: 274 Comm: iperf3 Tainted: G E 5.15.0-79-generic #86-Ubuntu > [ 26.335391] Hardware name: IBM 8561 T01 772 (KVM/Linux) > [ 26.335393] Call Trace: > [ 26.335397] [<00000000cd92e63a>] dump_stack_lvl+0x62/0x80 > [ 26.335404] [<00000000cd92e36c>] ubsan_epilogue+0x1c/0x48 > [ 26.335406] [<00000000cd52d3c4>] __ubsan_handle_out_of_bounds+0x94/0xa0 > [ 26.335411] [<000003ff8033f9da>] smc_sendmsg+0x2aa/0x2d0 [smc] > [ 26.335425] [<00000000cd6a79a4>] sock_sendmsg+0x64/0x80 > [ 26.335431] [<00000000cd6a7a32>] sock_write_iter+0x72/0xa0 > [ 26.335433] [<00000000cd1d4000>] new_sync_write+0x100/0x190 > [ 26.335438] [<00000000cd1d4bb8>] vfs_write+0x1e8/0x280 > [ 26.335440] [<00000000cd1d7014>] ksys_write+0xb4/0x100 > [ 26.335442] [<00000000cd932c7c>] __do_syscall+0x1bc/0x1f0 > [ 26.335446] [<00000000cd940148>] system_call+0x78/0xa0 > > [Fix] > > * a950a5921db4 a950a5921db450c74212327f69950ff03419483a "net/smc: Fix pos miscalculation in statistics" > > [Test Plan] > > * Since this got identified while the livepatch for jammy patches got added, > one could run a simiar (or the same) test like mentioned at LP#1639924 > (but only jammy comes with official livepatch support). > > * Alternatively a dedicated SMC stats counters test with a payload of > exactly 4096 bytes could be done (which is probably easier): > > * Install uperf (https://uperf.org/ - https://github.com/uperf/uperf). > (Wondering if it makes sense to pick this up as Debian/Ubuntu package ?!) > > * Reset SMC-D stats on client and server side. > > * Start uperf at the server side using: # uperf -vs > > * Update profile with remote IP (server IP) > and start uperf at client: # uperf -vai 5 -m rr1c-4kx4k---1.xml > with uperf profile: > # cat rr1c-4kx4k---1.xml > <?xml version="1.0"?> > <profile name="TCP_RR"> > <group nprocs="1"> > <!--group nthreads="1"--> > <!-- if we want to run processes --> > <!--group nprocs="1"--> > <transaction iterations="1"> > <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" /> > </transaction> > <transaction iterations="1"> > <flowop type="write" options="size=4096"/> > <flowop type="read" options="size=4096"/> > </transaction> > <transaction iterations="1"> > <flowop type="disconnect" /> > </transaction> > </group> > </profile> > > * The smcd stats output is: > # smcd -d stats reset > SMC-D Connections Summary > Total connections handled 2 > SMC connections 2 (client 2, server 0) > v1 0 > v2 2 > Handshake errors 0 (client 0, server 0) > Avg requests per SMC conn 14.0 > TCP fallback 0 (client 0, server 0) > RX Stats > Data transmitted (Bytes) 5796 (5.796K) > Total requests 9 > Buffer full 0 (0.00%) > Buffer downgrades 0 > Buffer reuses 0 > 8KB 16KB 32KB 64KB 128KB 256KB 512KB >512KB > Bufs 0 0 0 2 0 0 0 1 > Reqs 8 0 0 0 0 0 0 0 > TX Stats > Data transmitted (Bytes) 9960 (9.960K) > Total requests 19 > Buffer full 0 (0.00%) > Buffer full (remote) 0 (0.00%) > Buffer too small 0 (0.00%) > Buffer too small (remote) 0 (0.00%) > Buffer downgrades 0 > Buffer reuses 0 > 8KB 16KB 32KB 64KB 128KB 256KB 512KB >512KB > Bufs 0 2 0 0 0 0 0 0 > Reqs 18 0 0 0 0 0 0 1 > Extras > Special socket calls 0 > cork 0 > nodelay 0 > sendpage 0 > splice 0 > urgent data 0 > > * Instead of including the payload in the wrong >512 KB buckets, > the output should be to have 19 reqs in the 8 KB buckets for TX stats > and 9 reqs in the 8 KB bucket for RX stats. > > [Where problems could occur] > > * The changes are in common code /net/smc/smc_stats.h > hence any potential negativ impact is not limited to a specific platform. > but limited to statistics for shared memory communication (SMC) > hardware statistics. > > * But limited to the functions SMC_STAT_PAYLOAD_SUB and > SMC_STAT_RMB_SIZE_SUB. > > * The bucket (aka pos) calculation is not that trivial, > issues here could cause other calculation error, > that may lead to slightly different, but still wrong numbers. > > * Further issues can happen in case variables in use may overflow. > > * Issues can also happen if the ternary conditional operators > are not correctly used, which again may lead to wrong calculations. > > [Other Info] > > * This patch fixes upstream e0e4b8fa5338 > ("net/smc: Add SMC statistics support"). > > * Since the fix got upstream accepted with v6.6-rc6, > not only 22.04, but also 23.04 and 23.10 are affected. > (22.10 already reached it EOS). > > * Will not affected the current development release (N-release), > since it will at least be based on a 6.6 kernel. > > Nils Hoppmann (1): > net/smc: Fix pos miscalculation in statistics > > net/smc/smc_stats.h | 14 +++++++++----- > 1 file changed, 9 insertions(+), 5 deletions(-) > Acked-by: Stefan Bader <stefan.bader@canonical.com>
On 26/10/2023 11:30, frank.heimes@canonical.com wrote: > BugLink: https://bugs.launchpad.net/bugs/2039575 > > SRU Justification: > > [Impact] > > * There is a wrong bucket calculation for payload of exactly 4096 bytes > in SMC stats counters. > > * SMC_STAT_PAYLOAD_SUB and SMC_STAT_RMB_SIZE_SUB have these issues. > > * The impact is that a system silently updates an incorrect stats counter > in case the underlying kernel is not UBSAN enabled. > (Difficult to detect.) > > * But if a kernel is UBSAN enabled, one will see a UBSAN > 'array-index-out-of-bounds' warning every time this occurs, like: > [ 26.335381] UBSAN: array-index-out-of-bounds in /build/linux-O6Qi7m/linux-5.15.0/net/smc/af_smc.c:2402:3 > [ 26.335385] index -1 is out of range for type 'u64 [9]' > [ 26.335388] CPU: 0 PID: 274 Comm: iperf3 Tainted: G E 5.15.0-79-generic #86-Ubuntu > [ 26.335391] Hardware name: IBM 8561 T01 772 (KVM/Linux) > [ 26.335393] Call Trace: > [ 26.335397] [<00000000cd92e63a>] dump_stack_lvl+0x62/0x80 > [ 26.335404] [<00000000cd92e36c>] ubsan_epilogue+0x1c/0x48 > [ 26.335406] [<00000000cd52d3c4>] __ubsan_handle_out_of_bounds+0x94/0xa0 > [ 26.335411] [<000003ff8033f9da>] smc_sendmsg+0x2aa/0x2d0 [smc] > [ 26.335425] [<00000000cd6a79a4>] sock_sendmsg+0x64/0x80 > [ 26.335431] [<00000000cd6a7a32>] sock_write_iter+0x72/0xa0 > [ 26.335433] [<00000000cd1d4000>] new_sync_write+0x100/0x190 > [ 26.335438] [<00000000cd1d4bb8>] vfs_write+0x1e8/0x280 > [ 26.335440] [<00000000cd1d7014>] ksys_write+0xb4/0x100 > [ 26.335442] [<00000000cd932c7c>] __do_syscall+0x1bc/0x1f0 > [ 26.335446] [<00000000cd940148>] system_call+0x78/0xa0 > > [Fix] > > * a950a5921db4 a950a5921db450c74212327f69950ff03419483a "net/smc: Fix pos miscalculation in statistics" > > [Test Plan] > > * Since this got identified while the livepatch for jammy patches got added, > one could run a simiar (or the same) test like mentioned at LP#1639924 > (but only jammy comes with official livepatch support). > > * Alternatively a dedicated SMC stats counters test with a payload of > exactly 4096 bytes could be done (which is probably easier): > > * Install uperf (https://uperf.org/ - https://github.com/uperf/uperf). > (Wondering if it makes sense to pick this up as Debian/Ubuntu package ?!) > > * Reset SMC-D stats on client and server side. > > * Start uperf at the server side using: # uperf -vs > > * Update profile with remote IP (server IP) > and start uperf at client: # uperf -vai 5 -m rr1c-4kx4k---1.xml > with uperf profile: > # cat rr1c-4kx4k---1.xml > <?xml version="1.0"?> > <profile name="TCP_RR"> > <group nprocs="1"> > <!--group nthreads="1"--> > <!-- if we want to run processes --> > <!--group nprocs="1"--> > <transaction iterations="1"> > <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" /> > </transaction> > <transaction iterations="1"> > <flowop type="write" options="size=4096"/> > <flowop type="read" options="size=4096"/> > </transaction> > <transaction iterations="1"> > <flowop type="disconnect" /> > </transaction> > </group> > </profile> > > * The smcd stats output is: > # smcd -d stats reset > SMC-D Connections Summary > Total connections handled 2 > SMC connections 2 (client 2, server 0) > v1 0 > v2 2 > Handshake errors 0 (client 0, server 0) > Avg requests per SMC conn 14.0 > TCP fallback 0 (client 0, server 0) > RX Stats > Data transmitted (Bytes) 5796 (5.796K) > Total requests 9 > Buffer full 0 (0.00%) > Buffer downgrades 0 > Buffer reuses 0 > 8KB 16KB 32KB 64KB 128KB 256KB 512KB >512KB > Bufs 0 0 0 2 0 0 0 1 > Reqs 8 0 0 0 0 0 0 0 > TX Stats > Data transmitted (Bytes) 9960 (9.960K) > Total requests 19 > Buffer full 0 (0.00%) > Buffer full (remote) 0 (0.00%) > Buffer too small 0 (0.00%) > Buffer too small (remote) 0 (0.00%) > Buffer downgrades 0 > Buffer reuses 0 > 8KB 16KB 32KB 64KB 128KB 256KB 512KB >512KB > Bufs 0 2 0 0 0 0 0 0 > Reqs 18 0 0 0 0 0 0 1 > Extras > Special socket calls 0 > cork 0 > nodelay 0 > sendpage 0 > splice 0 > urgent data 0 > > * Instead of including the payload in the wrong >512 KB buckets, > the output should be to have 19 reqs in the 8 KB buckets for TX stats > and 9 reqs in the 8 KB bucket for RX stats. > > [Where problems could occur] > > * The changes are in common code /net/smc/smc_stats.h > hence any potential negativ impact is not limited to a specific platform. > but limited to statistics for shared memory communication (SMC) > hardware statistics. > > * But limited to the functions SMC_STAT_PAYLOAD_SUB and > SMC_STAT_RMB_SIZE_SUB. > > * The bucket (aka pos) calculation is not that trivial, > issues here could cause other calculation error, > that may lead to slightly different, but still wrong numbers. > > * Further issues can happen in case variables in use may overflow. > > * Issues can also happen if the ternary conditional operators > are not correctly used, which again may lead to wrong calculations. > > [Other Info] > > * This patch fixes upstream e0e4b8fa5338 > ("net/smc: Add SMC statistics support"). > > * Since the fix got upstream accepted with v6.6-rc6, > not only 22.04, but also 23.04 and 23.10 are affected. > (22.10 already reached it EOS). > > * Will not affected the current development release (N-release), > since it will at least be based on a 6.6 kernel. > > Nils Hoppmann (1): > net/smc: Fix pos miscalculation in statistics > > net/smc/smc_stats.h | 14 +++++++++----- > 1 file changed, 9 insertions(+), 5 deletions(-) > Acked-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
On 26/10/2023 11:30, frank.heimes@canonical.com wrote: > BugLink: https://bugs.launchpad.net/bugs/2039575 > > SRU Justification: > > [Impact] > > * There is a wrong bucket calculation for payload of exactly 4096 bytes > in SMC stats counters. > > * SMC_STAT_PAYLOAD_SUB and SMC_STAT_RMB_SIZE_SUB have these issues. > > * The impact is that a system silently updates an incorrect stats counter > in case the underlying kernel is not UBSAN enabled. > (Difficult to detect.) > > * But if a kernel is UBSAN enabled, one will see a UBSAN > 'array-index-out-of-bounds' warning every time this occurs, like: > [ 26.335381] UBSAN: array-index-out-of-bounds in /build/linux-O6Qi7m/linux-5.15.0/net/smc/af_smc.c:2402:3 > [ 26.335385] index -1 is out of range for type 'u64 [9]' > [ 26.335388] CPU: 0 PID: 274 Comm: iperf3 Tainted: G E 5.15.0-79-generic #86-Ubuntu > [ 26.335391] Hardware name: IBM 8561 T01 772 (KVM/Linux) > [ 26.335393] Call Trace: > [ 26.335397] [<00000000cd92e63a>] dump_stack_lvl+0x62/0x80 > [ 26.335404] [<00000000cd92e36c>] ubsan_epilogue+0x1c/0x48 > [ 26.335406] [<00000000cd52d3c4>] __ubsan_handle_out_of_bounds+0x94/0xa0 > [ 26.335411] [<000003ff8033f9da>] smc_sendmsg+0x2aa/0x2d0 [smc] > [ 26.335425] [<00000000cd6a79a4>] sock_sendmsg+0x64/0x80 > [ 26.335431] [<00000000cd6a7a32>] sock_write_iter+0x72/0xa0 > [ 26.335433] [<00000000cd1d4000>] new_sync_write+0x100/0x190 > [ 26.335438] [<00000000cd1d4bb8>] vfs_write+0x1e8/0x280 > [ 26.335440] [<00000000cd1d7014>] ksys_write+0xb4/0x100 > [ 26.335442] [<00000000cd932c7c>] __do_syscall+0x1bc/0x1f0 > [ 26.335446] [<00000000cd940148>] system_call+0x78/0xa0 > > [Fix] > > * a950a5921db4 a950a5921db450c74212327f69950ff03419483a "net/smc: Fix pos miscalculation in statistics" > > [Test Plan] > > * Since this got identified while the livepatch for jammy patches got added, > one could run a simiar (or the same) test like mentioned at LP#1639924 > (but only jammy comes with official livepatch support). > > * Alternatively a dedicated SMC stats counters test with a payload of > exactly 4096 bytes could be done (which is probably easier): > > * Install uperf (https://uperf.org/ - https://github.com/uperf/uperf). > (Wondering if it makes sense to pick this up as Debian/Ubuntu package ?!) > > * Reset SMC-D stats on client and server side. > > * Start uperf at the server side using: # uperf -vs > > * Update profile with remote IP (server IP) > and start uperf at client: # uperf -vai 5 -m rr1c-4kx4k---1.xml > with uperf profile: > # cat rr1c-4kx4k---1.xml > <?xml version="1.0"?> > <profile name="TCP_RR"> > <group nprocs="1"> > <!--group nthreads="1"--> > <!-- if we want to run processes --> > <!--group nprocs="1"--> > <transaction iterations="1"> > <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" /> > </transaction> > <transaction iterations="1"> > <flowop type="write" options="size=4096"/> > <flowop type="read" options="size=4096"/> > </transaction> > <transaction iterations="1"> > <flowop type="disconnect" /> > </transaction> > </group> > </profile> > > * The smcd stats output is: > # smcd -d stats reset > SMC-D Connections Summary > Total connections handled 2 > SMC connections 2 (client 2, server 0) > v1 0 > v2 2 > Handshake errors 0 (client 0, server 0) > Avg requests per SMC conn 14.0 > TCP fallback 0 (client 0, server 0) > RX Stats > Data transmitted (Bytes) 5796 (5.796K) > Total requests 9 > Buffer full 0 (0.00%) > Buffer downgrades 0 > Buffer reuses 0 > 8KB 16KB 32KB 64KB 128KB 256KB 512KB >512KB > Bufs 0 0 0 2 0 0 0 1 > Reqs 8 0 0 0 0 0 0 0 > TX Stats > Data transmitted (Bytes) 9960 (9.960K) > Total requests 19 > Buffer full 0 (0.00%) > Buffer full (remote) 0 (0.00%) > Buffer too small 0 (0.00%) > Buffer too small (remote) 0 (0.00%) > Buffer downgrades 0 > Buffer reuses 0 > 8KB 16KB 32KB 64KB 128KB 256KB 512KB >512KB > Bufs 0 2 0 0 0 0 0 0 > Reqs 18 0 0 0 0 0 0 1 > Extras > Special socket calls 0 > cork 0 > nodelay 0 > sendpage 0 > splice 0 > urgent data 0 > > * Instead of including the payload in the wrong >512 KB buckets, > the output should be to have 19 reqs in the 8 KB buckets for TX stats > and 9 reqs in the 8 KB bucket for RX stats. > > [Where problems could occur] > > * The changes are in common code /net/smc/smc_stats.h > hence any potential negativ impact is not limited to a specific platform. > but limited to statistics for shared memory communication (SMC) > hardware statistics. > > * But limited to the functions SMC_STAT_PAYLOAD_SUB and > SMC_STAT_RMB_SIZE_SUB. > > * The bucket (aka pos) calculation is not that trivial, > issues here could cause other calculation error, > that may lead to slightly different, but still wrong numbers. > > * Further issues can happen in case variables in use may overflow. > > * Issues can also happen if the ternary conditional operators > are not correctly used, which again may lead to wrong calculations. > > [Other Info] > > * This patch fixes upstream e0e4b8fa5338 > ("net/smc: Add SMC statistics support"). > > * Since the fix got upstream accepted with v6.6-rc6, > not only 22.04, but also 23.04 and 23.10 are affected. > (22.10 already reached it EOS). > > * Will not affected the current development release (N-release), > since it will at least be based on a 6.6 kernel. > > Nils Hoppmann (1): > net/smc: Fix pos miscalculation in statistics > > net/smc/smc_stats.h | 14 +++++++++----- > 1 file changed, 9 insertions(+), 5 deletions(-) > Applied to j,l,m:master-next. Thanks! Roxana