diff mbox series

hw/block/nvme: fix zone write finalize

Message ID 20210112094235.188686-1-its@irrelevant.dk
State New
Headers show
Series hw/block/nvme: fix zone write finalize | expand

Commit Message

Klaus Jensen Jan. 12, 2021, 9:42 a.m. UTC
From: Klaus Jensen <k.jensen@samsung.com>

The zone write pointer is unconditionally advanced, even for write
faults. Make sure that the zone is always transitioned to Full if the
write pointer reaches zone capacity.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
---
 hw/block/nvme.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Keith Busch Jan. 14, 2021, 11:03 p.m. UTC | #1
On Tue, Jan 12, 2021 at 10:42:35AM +0100, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> The zone write pointer is unconditionally advanced, even for write
> faults. Make sure that the zone is always transitioned to Full if the
> write pointer reaches zone capacity.

Looks like some spec weirdness. It says we can transition to full:

  b) as a result of successful completion of a write operation that
     writes one or more logical blocks that causes the zone to reach its
     writeable zone capacity;

But a failed write also advances the write pointer as you've pointed
out, so they might want to strike "successful".

Looks fine to me.

Reviewed-by: Keith Busch <kbusch@kernel.org>

> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
> ---
>  hw/block/nvme.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 0854ee307227..280b31b4459d 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1268,10 +1268,13 @@ static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req,
>      nlb = le16_to_cpu(rw->nlb) + 1;
>      zone = nvme_get_zone_by_slba(ns, slba);
>  
> +    zone->d.wp += nlb;
> +
>      if (failed) {
>          res->slba = 0;
> -        zone->d.wp += nlb;
> -    } else if (zone->w_ptr == nvme_zone_wr_boundary(zone)) {
> +    }
> +
> +    if (zone->d.wp == nvme_zone_wr_boundary(zone)) {

The previous check was using 'zone->w_ptr', but now it's 'zone->d.wp'.
As far as I can tell, this difference will mean the zone won't finalize
until the last write completes, where before it could finalize after the
zone's last write is submitted. Either way looks okay, but I think these
two values ought to always be in sync.

>          switch (nvme_get_zone_state(zone)) {
>          case NVME_ZONE_STATE_IMPLICITLY_OPEN:
>          case NVME_ZONE_STATE_EXPLICITLY_OPEN:
> @@ -1288,9 +1291,6 @@ static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req,
>          default:
>              assert(false);
>          }
> -        zone->d.wp = zone->w_ptr;
> -    } else {
> -        zone->d.wp += nlb;
>      }
>  }
Klaus Jensen Jan. 15, 2021, 6:01 a.m. UTC | #2
On Jan 14 15:03, Keith Busch wrote:
> On Tue, Jan 12, 2021 at 10:42:35AM +0100, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > The zone write pointer is unconditionally advanced, even for write
> > faults. Make sure that the zone is always transitioned to Full if the
> > write pointer reaches zone capacity.
> 
> Looks like some spec weirdness. It says we can transition to full:
> 
>   b) as a result of successful completion of a write operation that
>      writes one or more logical blocks that causes the zone to reach its
>      writeable zone capacity;
> 
> But a failed write also advances the write pointer as you've pointed
> out, so they might want to strike "successful".
> 

Yes, its slightly inconsistent.

Empty/Closed to Opened is also kinda fuzzy in the spec. The wording is
slightly different when transitioning from Empty and Closed to
Implicitly Opened. ZSE->ZSIO just says "and a write operation writes on
or more logical blocks of that zone". ZSC->ZSIO says "and a write
operation that writes one or more logical blocks to the zone completes
succesfully". This has given me some headaches on if the transition
should occur when "preparing/submitting" the write or when the write
completes. But I guess this is pretty much an implementation detail of
the device.

I should mention this to my representatives ;)

> Looks fine to me.
> 
> Reviewed-by: Keith Busch <kbusch@kernel.org>
> 
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
> > ---
> >  hw/block/nvme.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 0854ee307227..280b31b4459d 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -1268,10 +1268,13 @@ static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req,
> >      nlb = le16_to_cpu(rw->nlb) + 1;
> >      zone = nvme_get_zone_by_slba(ns, slba);
> >  
> > +    zone->d.wp += nlb;
> > +
> >      if (failed) {
> >          res->slba = 0;
> > -        zone->d.wp += nlb;
> > -    } else if (zone->w_ptr == nvme_zone_wr_boundary(zone)) {
> > +    }
> > +
> > +    if (zone->d.wp == nvme_zone_wr_boundary(zone)) {
> 
> The previous check was using 'zone->w_ptr', but now it's 'zone->d.wp'.
> As far as I can tell, this difference will mean the zone won't finalize
> until the last write completes, where before it could finalize after the
> zone's last write is submitted. Either way looks okay, but I think these
> two values ought to always be in sync.
> 

Right - the zone will transition to Full on the last completed write. I
really not sure if this models devices better or not. But I would argue
that a real device would not relinquish resources until the last write
completes.

An issue is that for QD>1 (zone append), we can easily end up with
appends A and B completing in reversed order such that d.wp is advanced
by B_nlb, but we left a "hole" of A_nlb in the zone because that write
has not completed yet.

But - I think *some* inconsistency on the write pointer value is to be
expected when using Zone Append. The host can only ever depend on a
consistent value of the write pointer by quiescing I/O to the zone and
then getting a zone report - and d.wp and d.w_ptr will always
converge in that case.
Klaus Jensen Jan. 20, 2021, 9:14 a.m. UTC | #3
On Jan 12 10:42, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> The zone write pointer is unconditionally advanced, even for write
> faults. Make sure that the zone is always transitioned to Full if the
> write pointer reaches zone capacity.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
> ---
>  hw/block/nvme.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 0854ee307227..280b31b4459d 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1268,10 +1268,13 @@ static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req,
>      nlb = le16_to_cpu(rw->nlb) + 1;
>      zone = nvme_get_zone_by_slba(ns, slba);
>  
> +    zone->d.wp += nlb;
> +
>      if (failed) {
>          res->slba = 0;
> -        zone->d.wp += nlb;
> -    } else if (zone->w_ptr == nvme_zone_wr_boundary(zone)) {
> +    }
> +
> +    if (zone->d.wp == nvme_zone_wr_boundary(zone)) {
>          switch (nvme_get_zone_state(zone)) {
>          case NVME_ZONE_STATE_IMPLICITLY_OPEN:
>          case NVME_ZONE_STATE_EXPLICITLY_OPEN:
> @@ -1288,9 +1291,6 @@ static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req,
>          default:
>              assert(false);
>          }
> -        zone->d.wp = zone->w_ptr;
> -    } else {
> -        zone->d.wp += nlb;
>      }
>  }
>  

Applied to nvme-next.
diff mbox series

Patch

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 0854ee307227..280b31b4459d 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1268,10 +1268,13 @@  static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req,
     nlb = le16_to_cpu(rw->nlb) + 1;
     zone = nvme_get_zone_by_slba(ns, slba);
 
+    zone->d.wp += nlb;
+
     if (failed) {
         res->slba = 0;
-        zone->d.wp += nlb;
-    } else if (zone->w_ptr == nvme_zone_wr_boundary(zone)) {
+    }
+
+    if (zone->d.wp == nvme_zone_wr_boundary(zone)) {
         switch (nvme_get_zone_state(zone)) {
         case NVME_ZONE_STATE_IMPLICITLY_OPEN:
         case NVME_ZONE_STATE_EXPLICITLY_OPEN:
@@ -1288,9 +1291,6 @@  static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req,
         default:
             assert(false);
         }
-        zone->d.wp = zone->w_ptr;
-    } else {
-        zone->d.wp += nlb;
     }
 }