diff mbox series

block/file-posix: fix update_zones_wp() caller

Message ID 20230824173957.8472-1-faithilikerun@gmail.com
State New
Headers show
Series block/file-posix: fix update_zones_wp() caller | expand

Commit Message

Sam Li Aug. 24, 2023, 5:39 p.m. UTC
When the zoned requests that may change wp fail, it needs to
update only wps of the zones within the range of the requests
for not disrupting the other in-flight requests. The wp is updated
successfully after the request completes.

Fixed the callers with right offset and nr_zones.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/file-posix.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Damien Le Moal Aug. 24, 2023, 11:49 p.m. UTC | #1
On 8/25/23 02:39, Sam Li wrote:
> When the zoned requests that may change wp fail, it needs to
> update only wps of the zones within the range of the requests
> for not disrupting the other in-flight requests. The wp is updated
> successfully after the request completes.
> 
> Fixed the callers with right offset and nr_zones.
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>  block/file-posix.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index b16e9c21a1..22559d6c2d 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2522,7 +2522,8 @@ out:
>          }
>      } else {
>          if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
> -            update_zones_wp(bs, s->fd, 0, 1);
> +            update_zones_wp(bs, s->fd, offset,
> +                            ROUND_UP(bytes, bs->bl.zone_size));

Write and zone append operations are not allowed to cross zone boundaries. So I
the number of zones should always be 1. The above changes a number of zones to a
number of bytes, which seems wrong. The correct fix is I think:

		update_zones_wp(bs, s->fd, offset, 1);

>          }
>      }
>  
> @@ -3472,7 +3473,7 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
>                          len >> BDRV_SECTOR_BITS);
>      ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
>      if (ret != 0) {
> -        update_zones_wp(bs, s->fd, offset, i);
> +        update_zones_wp(bs, s->fd, offset, nrz);

Same here. Why would you need to update all zones wp ? This will affect zones
that do not have a write error and potentially change there correct in-memory wp
to a wrong value. I think this also should be:

	   update_zones_wp(bs, s->fd, offset, 1);

>          error_report("ioctl %s failed %d", op_name, ret);
>          return ret;
>      }
Sam Li Aug. 25, 2023, 3:05 a.m. UTC | #2
Damien Le Moal <dlemoal@kernel.org> 于2023年8月25日周五 07:49写道:
>
> On 8/25/23 02:39, Sam Li wrote:
> > When the zoned requests that may change wp fail, it needs to
> > update only wps of the zones within the range of the requests
> > for not disrupting the other in-flight requests. The wp is updated
> > successfully after the request completes.
> >
> > Fixed the callers with right offset and nr_zones.
> >
> > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > ---
> >  block/file-posix.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index b16e9c21a1..22559d6c2d 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -2522,7 +2522,8 @@ out:
> >          }
> >      } else {
> >          if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
> > -            update_zones_wp(bs, s->fd, 0, 1);
> > +            update_zones_wp(bs, s->fd, offset,
> > +                            ROUND_UP(bytes, bs->bl.zone_size));
>
> Write and zone append operations are not allowed to cross zone boundaries. So I
> the number of zones should always be 1. The above changes a number of zones to a
> number of bytes, which seems wrong. The correct fix is I think:
>
>                 update_zones_wp(bs, s->fd, offset, 1);
>

I see. I forgot this constraint.

> >          }
> >      }
> >
> > @@ -3472,7 +3473,7 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> >                          len >> BDRV_SECTOR_BITS);
> >      ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
> >      if (ret != 0) {
> > -        update_zones_wp(bs, s->fd, offset, i);
> > +        update_zones_wp(bs, s->fd, offset, nrz);
>
> Same here. Why would you need to update all zones wp ? This will affect zones
> that do not have a write error and potentially change there correct in-memory wp
> to a wrong value. I think this also should be:
>
>            update_zones_wp(bs, s->fd, offset, 1);
>

Is update_zones_wp for cancelling the writes on invalid zones or
updating corrupted write pointers caused by caller (write, append or
zone_mgmt)?

My thought is based on the latter. Zone_mgmt can manage multiple zones
with a single request. When the request fails, it's hard to tell which
zone is corrupted. The relation between the req (zone_mgmt) and
update_zones_wp is: if req succeeds, no updates; if req fails,
consider the req never happens and do again.

If the former is right, then it assumes only the first zone may
contain an error. I am not sure it's right.

> >          error_report("ioctl %s failed %d", op_name, ret);
> >          return ret;
> >      }
>
> --
> Damien Le Moal
> Western Digital Research
>
Damien Le Moal Aug. 25, 2023, 3:32 a.m. UTC | #3
On 8/25/23 12:05, Sam Li wrote:
> Damien Le Moal <dlemoal@kernel.org> 于2023年8月25日周五 07:49写道:
>>
>> On 8/25/23 02:39, Sam Li wrote:
>>> When the zoned requests that may change wp fail, it needs to
>>> update only wps of the zones within the range of the requests
>>> for not disrupting the other in-flight requests. The wp is updated
>>> successfully after the request completes.
>>>
>>> Fixed the callers with right offset and nr_zones.
>>>
>>> Signed-off-by: Sam Li <faithilikerun@gmail.com>
>>> ---
>>>  block/file-posix.c | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>> index b16e9c21a1..22559d6c2d 100644
>>> --- a/block/file-posix.c
>>> +++ b/block/file-posix.c
>>> @@ -2522,7 +2522,8 @@ out:
>>>          }
>>>      } else {
>>>          if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
>>> -            update_zones_wp(bs, s->fd, 0, 1);
>>> +            update_zones_wp(bs, s->fd, offset,
>>> +                            ROUND_UP(bytes, bs->bl.zone_size));
>>
>> Write and zone append operations are not allowed to cross zone boundaries. So I
>> the number of zones should always be 1. The above changes a number of zones to a
>> number of bytes, which seems wrong. The correct fix is I think:
>>
>>                 update_zones_wp(bs, s->fd, offset, 1);
>>
> 
> I see. I forgot this constraint.
> 
>>>          }
>>>      }
>>>
>>> @@ -3472,7 +3473,7 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
>>>                          len >> BDRV_SECTOR_BITS);
>>>      ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
>>>      if (ret != 0) {
>>> -        update_zones_wp(bs, s->fd, offset, i);
>>> +        update_zones_wp(bs, s->fd, offset, nrz);
>>
>> Same here. Why would you need to update all zones wp ? This will affect zones
>> that do not have a write error and potentially change there correct in-memory wp
>> to a wrong value. I think this also should be:
>>
>>            update_zones_wp(bs, s->fd, offset, 1);
>>
> 
> Is update_zones_wp for cancelling the writes on invalid zones or
> updating corrupted write pointers caused by caller (write, append or
> zone_mgmt)?
> 
> My thought is based on the latter. Zone_mgmt can manage multiple zones
> with a single request. When the request fails, it's hard to tell which
> zone is corrupted. The relation between the req (zone_mgmt) and
> update_zones_wp is: if req succeeds, no updates; if req fails,
> consider the req never happens and do again.

You should update the wp of the zones that were touched by the operation that
failed. No other zone should have its wp updated as that could cause corruptions
of the wp if there are on-going writes on these other zones.

so the call should be "update_zones_wp(bs, s->fd, offset, n);"

with n being the number of zones that the operation targeted.

> 
> If the former is right, then it assumes only the first zone may
> contain an error. I am not sure it's right.
> 
>>>          error_report("ioctl %s failed %d", op_name, ret);
>>>          return ret;
>>>      }
>>
>> --
>> Damien Le Moal
>> Western Digital Research
>>
Sam Li Aug. 25, 2023, 3:36 a.m. UTC | #4
Damien Le Moal <dlemoal@kernel.org> 于2023年8月25日周五 11:32写道:
>
> On 8/25/23 12:05, Sam Li wrote:
> > Damien Le Moal <dlemoal@kernel.org> 于2023年8月25日周五 07:49写道:
> >>
> >> On 8/25/23 02:39, Sam Li wrote:
> >>> When the zoned requests that may change wp fail, it needs to
> >>> update only wps of the zones within the range of the requests
> >>> for not disrupting the other in-flight requests. The wp is updated
> >>> successfully after the request completes.
> >>>
> >>> Fixed the callers with right offset and nr_zones.
> >>>
> >>> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> >>> ---
> >>>  block/file-posix.c | 5 +++--
> >>>  1 file changed, 3 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/block/file-posix.c b/block/file-posix.c
> >>> index b16e9c21a1..22559d6c2d 100644
> >>> --- a/block/file-posix.c
> >>> +++ b/block/file-posix.c
> >>> @@ -2522,7 +2522,8 @@ out:
> >>>          }
> >>>      } else {
> >>>          if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
> >>> -            update_zones_wp(bs, s->fd, 0, 1);
> >>> +            update_zones_wp(bs, s->fd, offset,
> >>> +                            ROUND_UP(bytes, bs->bl.zone_size));
> >>
> >> Write and zone append operations are not allowed to cross zone boundaries. So I
> >> the number of zones should always be 1. The above changes a number of zones to a
> >> number of bytes, which seems wrong. The correct fix is I think:
> >>
> >>                 update_zones_wp(bs, s->fd, offset, 1);
> >>
> >
> > I see. I forgot this constraint.
> >
> >>>          }
> >>>      }
> >>>
> >>> @@ -3472,7 +3473,7 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> >>>                          len >> BDRV_SECTOR_BITS);
> >>>      ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
> >>>      if (ret != 0) {
> >>> -        update_zones_wp(bs, s->fd, offset, i);
> >>> +        update_zones_wp(bs, s->fd, offset, nrz);
> >>
> >> Same here. Why would you need to update all zones wp ? This will affect zones
> >> that do not have a write error and potentially change there correct in-memory wp
> >> to a wrong value. I think this also should be:
> >>
> >>            update_zones_wp(bs, s->fd, offset, 1);
> >>
> >
> > Is update_zones_wp for cancelling the writes on invalid zones or
> > updating corrupted write pointers caused by caller (write, append or
> > zone_mgmt)?
> >
> > My thought is based on the latter. Zone_mgmt can manage multiple zones
> > with a single request. When the request fails, it's hard to tell which
> > zone is corrupted. The relation between the req (zone_mgmt) and
> > update_zones_wp is: if req succeeds, no updates; if req fails,
> > consider the req never happens and do again.
>
> You should update the wp of the zones that were touched by the operation that
> failed. No other zone should have its wp updated as that could cause corruptions
> of the wp if there are on-going writes on these other zones.
>
> so the call should be "update_zones_wp(bs, s->fd, offset, n);"
>
> with n being the number of zones that the operation targeted.

Yes, so it's nrz in zone_mgmt. Thanks!

>
> >
> > If the former is right, then it assumes only the first zone may
> > contain an error. I am not sure it's right.
> >
> >>>          error_report("ioctl %s failed %d", op_name, ret);
> >>>          return ret;
> >>>      }
> >>
> >> --
> >> Damien Le Moal
> >> Western Digital Research
> >>
>
> --
> Damien Le Moal
> Western Digital Research
>
diff mbox series

Patch

diff --git a/block/file-posix.c b/block/file-posix.c
index b16e9c21a1..22559d6c2d 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2522,7 +2522,8 @@  out:
         }
     } else {
         if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
-            update_zones_wp(bs, s->fd, 0, 1);
+            update_zones_wp(bs, s->fd, offset,
+                            ROUND_UP(bytes, bs->bl.zone_size));
         }
     }
 
@@ -3472,7 +3473,7 @@  static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
                         len >> BDRV_SECTOR_BITS);
     ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
     if (ret != 0) {
-        update_zones_wp(bs, s->fd, offset, i);
+        update_zones_wp(bs, s->fd, offset, nrz);
         error_report("ioctl %s failed %d", op_name, ret);
         return ret;
     }