Message ID | CA+jPhpdaCgJvZXWtc-O4j42RxXQc5kkYnzm8ZtNKNtcUBYKgPg@mail.gmail.com |
---|---|
Series | Fixes for LP1797367 |
On 07.11.18 19:20, Frank Heimes wrote:
> BugLink: http://bugs.launchpad.net/bugs/1797367
>
> == SRU Justification ==
>
> While running a series of network stress tests on a bond device on
> Ubuntu 18.04.1 with kernel 4.15.0-36.39, a kernel panic is observed
> (it also occurs on non-bond devices).
> This looks like a race between disabling a qeth device and accessing debugfs.
> This is critical and repeatedly leads to a crash (sooner or later).
>
> == Fix ==
>
> e19e5be8b4ca ("s390/qeth: sanitize strings in debug messages")
>
> pre-reqs:
> 750b162 ("s390/qeth: reduce hard-coded access to ccw channels")
> d857e11 ("s390/qeth: remove outdated portname debug msg")
> 9d0a58f ("s390/qeth: avoid using is_multicast_ether_addr_64bits on (u8 *)[6]")
> 8174aa8 ("s390/qeth: consolidate qeth MAC address helpers")
> 4641b02 ("s390/qeth: don't keep track of MAC address's cast type")
>
> == Regression Potential ==
>
> Low, because:
> - limited to s390x
> - and furthermore limited to the qeth driver
> - patches a problem identified during testing
> - fix was tested by IBM before submission
>
> == Test Case ==
>
> run:
> #!/bin/bash
> var=0
> while :
> do
>     var=$((var + 1))
>     echo "DBG count is $var"
>     mkdir /tmp/DBGINFO
>     dbginfo.sh -d /tmp/DBGINFO
>     rm -rf /tmp/DBGINFO*
>     echo "chzdev now is $var"
>     chzdev -e <qeth device>
>     chzdev -d <qeth device>
> done
>
> On average, a crash happens in fewer than 20 cycles (usually < 10).
>

And thanks for ASCII ;)

Acked-by: Stefan Bader <stefan.bader@canonical.com>
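The cover letter's reproducer loops forever, and the qeth device ID must be pasted in by hand. The following is a minimal sketch of a bounded variant, assuming bash and that the s390-tools commands `dbginfo.sh` and `chzdev` are installed and run as root. It takes the device as the first argument and stops after a fixed number of cycles. The `stress_qeth` function name and the default of 100 cycles are hypothetical choices and are not part of the SRU. The failure mode is a kernel panic, so the script cannot detect the crash itself. Surviving every cycle on a patched kernel is only evidence that the race no longer triggers.

```shell
#!/bin/bash
# Hypothetical bounded variant of the LP1797367 reproducer (not from the thread).
# Usage: ./qeth-stress.sh <qeth device> [max cycles]
# Assumes dbginfo.sh and chzdev (s390-tools) are available and we run as root.

stress_qeth() {
    local dev=$1
    local max=${2:-100}   # arbitrary default, well above the reported < 20 cycles
    local i
    for ((i = 1; i <= max; i++)); do
        echo "DBG count is $i"
        mkdir -p /tmp/DBGINFO || return 1
        # Collect debug data; per the thread, this is the debugfs side of the race.
        dbginfo.sh -d /tmp/DBGINFO
        rm -rf /tmp/DBGINFO*
        echo "chzdev now is $i"
        # Enable, then disable the device: the other side of the race.
        chzdev -e "$dev" || return 1
        chzdev -d "$dev" || return 1
    done
    echo "survived $max cycles"
}

# Only run when a device is given, so the function can also be sourced.
if (( $# > 0 )); then
    stress_qeth "$@"
fi
```

Stopping on the first failing `chzdev` call keeps a misconfigured device from being reported as "survived". The original loop ignores errors.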
On 11/07/18 19:20, Frank Heimes wrote:
> [...]

Some of the patches probably won't apply because of bogus line breaks, as I mentioned in one of the patches, so extra care is needed when applying them. We'll probably need to cherry-pick the original commit and add the BugLink and the s-o-b.

Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
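The suggestion to cherry-pick the original commit and add the BugLink and the s-o-b can be sketched with plain git. This helper is a hypothetical illustration, not Ubuntu's actual kernel tooling. It cherry-picks an upstream commit with `-x`, which records `(cherry picked from commit ...)`. It then rewrites the message so that a `BugLink:` line follows the subject, and amends with `-s` to append the committer's Signed-off-by. The message layout is an assumption modeled on where the cover letter places its BugLink. If the upstream commit does not apply cleanly to an older kernel base, the function stops and the conflict has to be resolved by hand.

```shell
#!/bin/bash
# Hypothetical helper (not Ubuntu kernel tooling): cherry-pick an upstream
# commit for an SRU, put a BugLink line after the subject, add a Signed-off-by.
# Usage: cherry_pick_sru <upstream sha> <BugLink URL>

cherry_pick_sru() {
    local sha=$1 buglink=$2 msgfile rc
    # -x appends "(cherry picked from commit <sha>)" to the message.
    git cherry-pick -x "$sha" || return 1
    msgfile=$(mktemp) || return 1
    {
        git log -1 --format=%s      # original subject line
        echo
        echo "BugLink: $buglink"
        echo
        git log -1 --format=%b      # original body, incl. the cherry-pick note
    } > "$msgfile"
    # -s appends "Signed-off-by:" using the committer identity.
    git commit -q --amend -s -F "$msgfile"
    rc=$?
    rm -f "$msgfile"
    return $rc
}
```

For example, `cherry_pick_sru e19e5be8b4ca http://bugs.launchpad.net/bugs/1797367`, run on a branch of the target kernel after the upstream commit has been fetched.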
On 07.11.18 19:20, Frank Heimes wrote:
> [...]

For Bionic I had to invert the order of the series to make it apply. The actual fix still needed some adaptation, so it's again more of a backport. For Cosmic, only one of the pre-reqs was needed, and the actual fix had to ignore a bit of context.

Applied to bionic/master-next and cosmic/master-next. Thanks.

-Stefan
On Wed, Nov 07, 2018 at 07:20:40PM +0100, Frank Heimes wrote:
> [...]

Applied patch 1 to unstable/master; the prerequisites were already present. Thanks!