Message ID | febbbd17b3cf4201aaae24e4adb61e4f8a80e9c9.1667822611.git.ritesh.list@gmail.com |
---|---|
State | Under Review |
Delegated to: | Theodore Ts'o |
Headers | show |
Series | e2fsprogs: Parallel fsck support | expand |
On Mon, Nov 07, 2022 at 05:50:49PM +0530, Ritesh Harjani (IBM) wrote: > f_crashdisk test failed with UNIX_IO_FORCE_BOUNCE=yes due to unbalanced > mutex unlock in below path. > > This patch fixes it. > > Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> This has been fixed in upstream (for a while, actually); what commit is your patches based upon? - Ted
On 22/11/17 11:02AM, Theodore Ts'o wrote: > On Mon, Nov 07, 2022 at 05:50:49PM +0530, Ritesh Harjani (IBM) wrote: > > f_crashdisk test failed with UNIX_IO_FORCE_BOUNCE=yes due to unbalanced > > mutex unlock in below path. > > > > This patch fixes it. > > > > Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> > > This has been fixed in upstream (for a while, actually); Hello Ted, I think you are speaking about this patch here [1], which prevents the race of using the fd by multiple threads at a time, by using BOUNCE_MTX lock. The current patch on the other hand fixes the unbalanced mutex_unlock(), which can be reproduced with f_crashdisk test with UNIX_IO_FORCE_BOUNCE=yes. (when tested with --enable-threadsan) Below is the threadsan warning that I see. WARNING: ThreadSanitizer: unlock of an unlocked mutex (or by a wrong thread) (pid=135059) #0 pthread_mutex_unlock <null> (libtsan.so.0+0x3b64a) #1 mutex_unlock ../../../lib/ext2fs/unix_io.c:161 (e2fsck+0x5b392b) #2 raw_read_blk ../../../lib/ext2fs/unix_io.c:331 (e2fsck+0x5b392b) #3 unix_read_blk64 ../../../lib/ext2fs/unix_io.c:1001 (e2fsck+0x5b4981) #4 unix_read_blk ../../../lib/ext2fs/unix_io.c:1053 (e2fsck+0x5b5171) #5 ext2fs_open2 ../../../lib/ext2fs/openfs.c:228 (e2fsck+0x579e5c) #6 try_open_fs ../../e2fsck/unix.c:1242 (e2fsck+0x424e63) #7 main ../../e2fsck/unix.c:1604 (e2fsck+0x41a1d2) Location is heap block of size 376 at 0x7b4800000000 allocated by main thread: #0 malloc <null> (libtsan.so.0+0x32919) #1 ext2fs_get_mem ../../../lib/ext2fs/ext2fs.h:1944 (e2fsck+0x5aec30) #2 unix_open_channel ../../../lib/ext2fs/unix_io.c:698 (e2fsck+0x5aec30) #3 unix_open ../../../lib/ext2fs/unix_io.c:910 (e2fsck+0x5afd67) #4 ext2fs_open2 ../../../lib/ext2fs/openfs.c:175 (e2fsck+0x579918) #5 try_open_fs ../../e2fsck/unix.c:1242 (e2fsck+0x424e63) #6 main ../../e2fsck/unix.c:1604 (e2fsck+0x41a1d2) Mutex M22 (0x7b4800000128) created at: #0 pthread_mutex_init <null> (libtsan.so.0+0x49603) #1 unix_open_channel ../../../lib/ext2fs/unix_io.c:827 (e2fsck+0x5af5ef) #2 unix_open ../../../lib/ext2fs/unix_io.c:910 (e2fsck+0x5afd67) #3 ext2fs_open2 ../../../lib/ext2fs/openfs.c:175 (e2fsck+0x579918) #4 try_open_fs ../../e2fsck/unix.c:1242 (e2fsck+0x424e63) #5 main ../../e2fsck/unix.c:1604 (e2fsck+0x41a1d2) SUMMARY: ThreadSanitizer: unlock of an unlocked mutex (or by a wrong thread) (/lib64/libtsan.so.0+0x3b64a) in pthread_mutex_unlock [1]: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=d557b9659ba97e093f842dcc7e2cfe9a7022c674 > what commit is your patches based upon? 1. This series is rebased on the latest e2fsprogs master branch. https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/log/ Cover letter might provide some more details on how the patch series is broken into. The github link of the series can be found at https://github.com/riteshharjani/e2fsprogs/commits/pfsck-RFCv1 Note: I have provided some details in the cover letter about Patch-073 i.e. ("tests: Add multi-thread tests framework"). But putting it once again here... You can ignore this ^^^ last patch in the github link. This was added to tests all existing e2fsck test with pfsck mode. But later I realized that it will require more work then how it is in the current form. Since I didn't want to hold the patch series any longer and I wanted this series to go through the initial round of review, before it becomes any bigger and difficult to review, hence I decided to drop the last patch from sending it to mailing list. Please let me know if you have any queries on any patch/series. -ritesh
On Nov 7, 2022, at 06:22, Ritesh Harjani (IBM) <ritesh.list@gmail.com> wrote: > > f_crashdisk test failed with UNIX_IO_FORCE_BOUNCE=yes due to unbalanced > mutex unlock in below path. > > This patch fixes it. > > Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> > --- > lib/ext2fs/unix_io.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c > index e53db333..5b894826 100644 > --- a/lib/ext2fs/unix_io.c > +++ b/lib/ext2fs/unix_io.c > @@ -305,7 +305,6 @@ bounce_read: > while (size > 0) { > actual = read(data->dev, data->bounce, align_size); > if (actual != align_size) { > - mutex_unlock(data, BOUNCE_MTX); This patch doesn't show enough context, but AFAIK this is jumping before mutex_down() is called, so this *should* be correct as is? Cheers, Andreas > actual = really_read; > buf -= really_read; > size += really_read; > -- > 2.37.3 >
On 22/11/18 04:34AM, Andreas Dilger wrote: > On Nov 7, 2022, at 06:22, Ritesh Harjani (IBM) <ritesh.list@gmail.com> wrote: > > > > f_crashdisk test failed with UNIX_IO_FORCE_BOUNCE=yes due to unbalanced > > mutex unlock in below path. > > > > This patch fixes it. > > > > Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> > > --- > > lib/ext2fs/unix_io.c | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c > > index e53db333..5b894826 100644 > > --- a/lib/ext2fs/unix_io.c > > +++ b/lib/ext2fs/unix_io.c > > @@ -305,7 +305,6 @@ bounce_read: > > while (size > 0) { > > actual = read(data->dev, data->bounce, align_size); > > if (actual != align_size) { > > - mutex_unlock(data, BOUNCE_MTX); > > This patch doesn't show enough context, but AFAIK this is jumping before mutex_down() > is called, so this *should* be correct as is? Thanks for the review, Andreas. Yeah, the patch diff above is not sufficient since it doesn't share enuf context. But essentially when "actual" is not equal to "align_size", then in this if condition it goes to label "short_read:", which always goto error_unlock, where we anyways call mutex_unlock() Looking at a lot of labels in this function, this definitely looks like something which can be cleaned up ("raw_read_blk()"). I will add that to my list of todos. I have also shared the threadsan warning which detects the unbalanced mutex unlock here [1] [1]: https://lore.kernel.org/linux-ext4/cover.1667822611.git.ritesh.list@gmail.com/T/#md75b3ccb146e4433653bc2d7dd01329a9757ea26 -ritesh
On Nov 18, 2022, at 05:37, Ritesh Harjani (IBM) <ritesh.list@gmail.com> wrote: > > On 22/11/18 04:34AM, Andreas Dilger wrote: >>> On Nov 7, 2022, at 06:22, Ritesh Harjani (IBM) <ritesh.list@gmail.com> wrote: >>> >>> f_crashdisk test failed with UNIX_IO_FORCE_BOUNCE=yes due to unbalanced >>> mutex unlock in below path. >>> >>> This patch fixes it. >>> >>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> >>> --- >>> lib/ext2fs/unix_io.c | 1 - >>> 1 file changed, 1 deletion(-) >>> >>> diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c >>> index e53db333..5b894826 100644 >>> --- a/lib/ext2fs/unix_io.c >>> +++ b/lib/ext2fs/unix_io.c >>> @@ -305,7 +305,6 @@ bounce_read: >>> while (size > 0) { >>> actual = read(data->dev, data->bounce, align_size); >>> if (actual != align_size) { >>> - mutex_unlock(data, BOUNCE_MTX); >> >> This patch doesn't show enough context, but AFAIK this is jumping before mutex_down() >> is called, so this *should* be correct as is? > > Thanks for the review, Andreas. > > Yeah, the patch diff above is not sufficient since it doesn't share enuf > context. > But essentially when "actual" is not equal to "align_size", then in this if > condition it goes to label "short_read:", which always goto error_unlock, > where we anyways call mutex_unlock() > > Looking at a lot of labels in this function, this definitely looks like > something which can be cleaned up ("raw_read_blk()"). > I will add that to my list of todos. You are correct, and it means this code is just not very clear to the reader. I think it would make more sense to move the "short_read:" label to the end of the code: actual = read(...); if (actual != size) goto error_short_read; goto success_unlock; : actual = read(...); if (actual != align_size) { actual = really_read; buf -= really_read; size += really_read; goto error_short_read; } : success_unlock: mutex_unlock(...); return 0; error_short_read: if (actual < 0) { retval = errno; actual = 0; } else { retval = EXT2_ET_SHORT_READ; } error_unlock: mutex_unlock(...); That way the code follows the normal error handling convention and is less likely to be surprising to the reader. Cheers, Andreas
On 22/11/18 07:20AM, Andreas Dilger wrote: > On Nov 18, 2022, at 05:37, Ritesh Harjani (IBM) <ritesh.list@gmail.com> wrote: > > > > On 22/11/18 04:34AM, Andreas Dilger wrote: > >>> On Nov 7, 2022, at 06:22, Ritesh Harjani (IBM) <ritesh.list@gmail.com> wrote: > >>> > >>> f_crashdisk test failed with UNIX_IO_FORCE_BOUNCE=yes due to unbalanced > >>> mutex unlock in below path. > >>> > >>> This patch fixes it. > >>> > >>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> > >>> --- > >>> lib/ext2fs/unix_io.c | 1 - > >>> 1 file changed, 1 deletion(-) > >>> > >>> diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c > >>> index e53db333..5b894826 100644 > >>> --- a/lib/ext2fs/unix_io.c > >>> +++ b/lib/ext2fs/unix_io.c > >>> @@ -305,7 +305,6 @@ bounce_read: > >>> while (size > 0) { > >>> actual = read(data->dev, data->bounce, align_size); > >>> if (actual != align_size) { > >>> - mutex_unlock(data, BOUNCE_MTX); > >> > >> This patch doesn't show enough context, but AFAIK this is jumping before mutex_down() > >> is called, so this *should* be correct as is? > > > > Thanks for the review, Andreas. > > > > Yeah, the patch diff above is not sufficient since it doesn't share enuf > > context. > > But essentially when "actual" is not equal to "align_size", then in this if > > condition it goes to label "short_read:", which always goto error_unlock, > > where we anyways call mutex_unlock() > > > > Looking at a lot of labels in this function, this definitely looks like > > something which can be cleaned up ("raw_read_blk()"). > > I will add that to my list of todos. > > You are correct, and it means this code is just not very clear to the reader. I think it > would make more sense to move the "short_read:" label to the end of the code: > > actual = read(...); > if (actual != size) > goto error_short_read; > goto success_unlock; > : > actual = read(...); > if (actual != align_size) { > actual = really_read; > buf -= really_read; > size += really_read; > goto error_short_read; > } > : > success_unlock: > mutex_unlock(...); > return 0; > > error_short_read: > if (actual < 0) { > retval = errno; > actual = 0; > } else { > retval = EXT2_ET_SHORT_READ; > } > error_unlock: > mutex_unlock(...); > > That way the code follows the normal error handling convention and is less likely to be > surprising to the reader. Yes, you are right. I will do the change in the next rev. -ritesh
On Mon, Nov 07, 2022 at 05:50:49PM +0530, Ritesh Harjani (IBM) wrote: > f_crashdisk test failed with UNIX_IO_FORCE_BOUNCE=yes due to unbalanced > mutex unlock in below path. > > This patch fixes it. > > Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Applied, thanks. - Ted
diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c index e53db333..5b894826 100644 --- a/lib/ext2fs/unix_io.c +++ b/lib/ext2fs/unix_io.c @@ -305,7 +305,6 @@ bounce_read: while (size > 0) { actual = read(data->dev, data->bounce, align_size); if (actual != align_size) { - mutex_unlock(data, BOUNCE_MTX); actual = really_read; buf -= really_read; size += really_read;
f_crashdisk test failed with UNIX_IO_FORCE_BOUNCE=yes due to unbalanced mutex unlock in below path. This patch fixes it. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> --- lib/ext2fs/unix_io.c | 1 - 1 file changed, 1 deletion(-)