diff mbox series

[SRU,jammy/linux-aws,kinetic/linux-aws,16/20] UBUNTU: SAUCE: block: xen-blkfront: consider new dom0 features on restore

Message ID 20220817085150.2078055-21-gerald.yang@canonical.com
State New
Headers show
Series UBUNTU: SAUCE: PM: Hibernate: Enable Hibernation for Xen Based Instance Types | expand

Commit Message

Gerald Yang Aug. 17, 2022, 8:51 a.m. UTC
From: Eduardo Valentin <eduval@amazon.com>

BugLink: https://bugs.launchpad.net/bugs/1968062

On regular start, the instance will perform a regular boot, in which rootfs
is mounted accordingly to the xen-blkback features (in particular
feature-barrier and feature-flush-cache). That will setup the journal
accordingly to the provided features on SB.
On a start from hibernation, the instance boots, detects that a hibernation
image is present, push the image to memory and jumps back where it was. There
is no regular mount of the rootfs, it uses the data structures already in
the previous saved memory image.
Now, When the instance hibernates, it may move from its original dom0 to a new dom0
when it is restarted.
So, given the above, if the xen-blkback features change then the guest
can be in trouble. And I see the original assumption was that the
dom0 environment would be preserved. I did a couple of experiments,
and I confirm that these particular features change quite a lot across
hibernation attempts:
[ 2343.157903] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;
[ 2444.712339] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;
[ 2537.105884] blkfront: xvda: flush diskcache: enabled; persistent grants: disabled; indirect descriptors: enabled;
[ 2636.641298] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;
[ 2729.868349] blkfront: xvda: flush diskcache: enabled; persistent grants: disabled; indirect descriptors: enabled;
[ 2827.118979] blkfront: xvda: flush diskcache: enabled; persistent grants: disabled; indirect descriptors: enabled;
[ 2924.812599] blkfront: xvda: flush diskcache: enabled; persistent grants: disabled; indirect descriptors: enabled;
[ 3018.063399] blkfront: xvda: flush diskcache: enabled; persistent grants: disabled; indirect descriptors: enabled;
[ 3116.685040] blkfront: xvda: flush diskcache: enabled; persistent grants: disabled; indirect descriptors: enabled;
[ 3209.164475] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;
[ 3317.981362] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;
[ 3415.939725] blkfront: xvda: flush diskcache: enabled; persistent grants: disabled; indirect descriptors: enabled;
[ 3514.202478] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;
[ 3619.355791] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;

Now, considering the above, this patch fixes the following scenario:
a. Instance boots and sets up bio queue on a dom0 A with softbarrier supported.
b. hibernates
c. When asked to restore, the instance is back on dom0 B with unsupported
softbarrier.
d. Restoration goes well until next journal commit is issued. Remember that
it is still using the previous image rootfs data structures, therefore
is gonna request a softbarrier.
e. The bio will error out and throw a "operation not supported" message
and cause the journal to fail, and it will decide to remount
the rootfs as RO.
[ 1138.909290] print_req_error: operation not supported error, dev xvda, sector 4470400, flags 6008
[ 1139.025685] Aborting journal on device xvda1-8.
[ 1139.029758] print_req_error: operation not supported error, dev xvda, sector 4460544, flags 26008
[ 1139.326119] Buffer I/O error on dev xvda1, logical block 0, lost sync page write
[ 1139.331398] EXT4-fs error (device xvda1): ext4_journal_check_start:61: Detected aborted journal
[ 1139.337296] EXT4-fs (xvda1): Remounting filesystem read-only
[ 1139.341006] EXT4-fs (xvda1): previous I/O error to superblock detected
[ 1139.345704] print_req_error: operation not supported error, dev xvda, sector 4096, flags 26008

The fix is essentially to read xenbus to query the new xen
blkback capabilities and update them into the request queue.

Reviewed-by: Balbir Singh <sblbir@amazon.com>
Reviewed-by: Vallish Vaidyeshwara <vallish@amazon.com>
Signed-off-by: Eduardo Valentin <eduval@amazon.com>
(cherry picked from commit 740afde3ef282dfdd4a3c179f3de0ae33162e85c amazon-5.15.y/mainline)
Signed-off-by: Gerald Yang <gerald.yang@canonical.com>
Signed-off-by: Matthew Ruffell <matthew.ruffell@canonical.com>
---
 drivers/block/xen-blkfront.c | 3 +++
 1 file changed, 3 insertions(+)
diff mbox series

Patch

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 3ebff999ca70..d607823c7934 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2630,6 +2630,9 @@  static int blkfront_restore(struct xenbus_device *dev)
 {
 	struct blkfront_info *info = dev_get_drvdata(&dev->dev);
 	int err = 0;
+
+	blkfront_gather_backend_features(info);
+	xlvbd_flush(info);
 	err = talk_to_blkback(dev, info);
 	if (err)
 		goto out;