[v2,00/12] ext4: enhance simulate fail facility

Message ID 20221110022558.7844-1-yi.zhang@huawei.com

Zhang Yi Nov. 10, 2022, 2:25 a.m. UTC
Changes since v1:
 - Fix format error in ext4_fault_ops_write().

ext4's reliability can currently be tested with the simulate fail facility
introduced in commit 46f870d690fe ("ext4: simulate various I/O and checksum
errors when reading metadata"), which can simulate a checksum error or an
I/O error when reading metadata from disk. But its functionality is
limited: it cannot set the number of failures, a probability, filters, etc.
Fortunately, Linux already has a common fault-injection framework, so these
limitations can easily be lifted by using it in ext4. This patch set adds
an ext4 fault-injection facility to replace the old one, supplies several
kinds of checksum and I/O errors, and also adds group, inode, physical
block and inode logical block filters. With this patch set, failures can
be injected more precisely. The facility can be used for fuzz and stress
tests that include random errors, and it can also be used to reproduce
issues more conveniently.

Patch 1: add a debugfs interface as preparation.
Patch 2: introduce the fault-injection framework for ext4.
Patch 3-11: add various kinds of faults and also do some cleanup.
Patch 12: remove the old simulate fail facility.

It provides a debugfs interface in ext4/<disk>/fault_inject. Besides the
common config interfaces, it adds 6 more:
 - available_faults: list the faults that can be injected.
 - inject_faults: set the faults to inject, multiple can be set at a time.
 - inject_inode: set the inode filter, matches all inodes if not set.
 - inject_group: set the block group filter, similar to inject_inode.
 - inject_logical_block: set the logical block filter for one inode.
 - inject_physical_block: set the physical block filter.
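
As a rough sketch of how these files might be driven from a script, the
helper below writes one value into one file under the fault_inject
directory. The names fi_set and FI_DIR are made up for illustration and
are not part of the patch set; the default path assumes a disk named
pmem0, as in the example in this cover letter.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch set.
# FI_DIR points at the per-disk fault_inject directory in debugfs;
# the default assumes the disk is called pmem0.
FI_DIR="${FI_DIR:-/sys/kernel/debug/ext4/pmem0/fault_inject}"

fi_set() {
    # $1 = file name under FI_DIR (e.g. probability, inject_inode)
    # $2 = value to write
    if [ ! -w "$FI_DIR/$1" ]; then
        echo "fi_set: $FI_DIR/$1 is not writable" >&2
        return 1
    fi
    printf '%s\n' "$2" > "$FI_DIR/$1"
}
```

With this in place a filter could be set with "fi_set inject_inode 262146".
Refusing to write to a file that does not exist avoids silently creating
a stray file when the path is wrong or debugfs is not mounted.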

Currently there are 20 available faults, listed below: 8 kinds of
metadata checksum errors, 7 metadata I/O errors and 5 journal errors.
With this facility in place, more faults can easily be added in the
future.
 - group_desc_checksum
 - inode_bitmap_checksum
 - block_bitmap_checksum
 - inode_checksum
 - extent_block_checksum
 - dir_block_checksum
 - dir_index_block_checksum
 - xattr_block_checksum
 - inode_bitmap_eio
 - block_bitmap_eio
 - inode_eio
 - extent_block_eio
 - dir_block_eio
 - xattr_block_eio
 - symlink_block_eio
 - journal_start
 - journal_start_sb
 - journal_get_create_access
 - journal_get_write_access
 - journal_dirty_metadata
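
The names above follow a simple pattern: checksum faults end in _checksum,
I/O faults end in _eio, and journal faults start with journal_. A small
sketch that sorts names into these three classes, which may be handy when
scripting test runs, could look like this. The function names fault_kind
and count_kinds are hypothetical, and the classification relies only on
the naming pattern of the 20 faults listed here.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the patch set.

fault_kind() {
    # Print the class of a fault name based on its prefix or suffix.
    case "$1" in
        *_checksum) echo checksum ;;
        *_eio)      echo eio ;;
        journal_*)  echo journal ;;
        *)          echo unknown ;;
    esac
}

count_kinds() {
    # Read fault names on stdin, one per line, and print a tally per class.
    c=0 e=0 j=0
    while read -r name; do
        case "$(fault_kind "$name")" in
            checksum) c=$((c + 1)) ;;
            eio)      e=$((e + 1)) ;;
            journal)  j=$((j + 1)) ;;
        esac
    done
    echo "checksum=$c eio=$e journal=$j"
}
```

Feeding the 20 names from the list into count_kinds, one per line, is a
quick way to check the 8/7/5 split stated above.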

For example, to inject an inode metadata checksum error on file 'foo':

$ mkfs.ext4 -F /dev/pmem0
$ mount /dev/pmem0 /mnt
$ mkdir /mnt/dir
$ touch /mnt/dir/foo
$ ls -i /mnt/dir/foo
  262146 /mnt/dir/foo

$ echo 100 > /sys/kernel/debug/ext4/pmem0/fault_inject/probability
$ echo 1 > /sys/kernel/debug/ext4/pmem0/fault_inject/times
$ echo 262146 > /sys/kernel/debug/ext4/pmem0/fault_inject/inject_inode
$ echo inode_checksum > /sys/kernel/debug/ext4/pmem0/fault_inject/inject_faults
$ echo 1 > /sys/kernel/debug/ext4/pmem0/fault_inject/enable
$ echo 3 > /proc/sys/vm/drop_caches   # drop caches
$ stat /mnt/dir/foo
  stat: cannot statx '/mnt/dir/foo': Bad message

The kernel log (dmesg) shows where the fault was injected.

[  461.433817] FAULT_INJECTION: forcing a failure.
[  461.433817] name fault_inject, interval 1, probability 100, space 0, times 1
...
[  461.438609] Call Trace:
[  461.438875]  <TASK>
[  461.439116]  ? dump_stack_lvl+0x73/0xa3
[  461.439534]  ? dump_stack+0x13/0x1f
[  461.439909]  ? should_fail.cold+0x4a/0x57
[  461.440346]  ? ext4_should_fail.cold+0x11f/0x135
[  461.440833]  ? __ext4_iget+0x407/0x1410
[  461.441245]  ? ext4_lookup+0x1be/0x350
[  461.441650]  ? __lookup_slow+0xb9/0x1f0
[  461.442070]  ? lookup_slow+0x46/0x70
[  461.442463]  ? walk_component+0x13e/0x230
[  461.442890]  ? path_lookupat.isra.0+0x8f/0x200
[  461.443369]  ? filename_lookup+0xd6/0x240
[  461.443798]  ? vfs_statx+0xa6/0x200
[  461.444186]  ? do_statx+0x48/0xc0
[  461.444546]  ? __might_sleep+0x56/0xc0
[  461.444950]  ? should_fail_usercopy+0x19/0x30
[  461.445424]  ? strncpy_from_user+0x33/0x2a0
[  461.445870]  ? getname_flags+0x95/0x330
[  461.446288]  ? switch_fpu_return+0x27/0x1e0
[  461.446736]  ? __x64_sys_statx+0x90/0xd0
[  461.447160]  ? do_syscall_64+0x3b/0x90
[  461.447563]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  461.448122]  </TASK>
[  461.448395] EXT4-fs error (device pmem0): ext4_lookup:1840: inode #262146: comm stat: iget: checksum invalid
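
When many faults are injected in a stress run, it can help to pull the
injection site out of the log automatically. The sketch below prints the
symbol of the first call trace frame printed after ext4_should_fail, which
in the trace above is __ext4_iget. The function name injection_site is
made up for illustration, and the approach assumes the trace has the same
layout as the one shown here.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch set.

injection_site() {
    # Read kernel log text on stdin. Once a line mentioning
    # ext4_should_fail has been seen, take the last field of the next
    # line, strip the "+offset/size" suffix, print the symbol and stop.
    awk '
        found { sym = $NF; sub(/\+.*/, "", sym); print sym; exit }
        /ext4_should_fail/ { found = 1 }
    '
}
```

Typical use would be "dmesg | injection_site". Only the first injection in
the input is reported.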

Thanks,
Yi.


Zhang Yi (12):
  ext4: add debugfs interface
  ext4: introduce fault injection facility
  ext4: add several checksum fault injection
  ext4: add bitmaps I/O fault injection
  ext4: add inode I/O fault injection
  ext4: add extent block I/O fault injection
  ext4: add dirblock I/O fault injection
  ext4: call ext4_xattr_get_block() when getting xattr block
  ext4: add xattr block I/O fault injection
  ext4: add symlink block I/O fault injection
  ext4: add journal related fault injection
  ext4: remove simulate fail facility

 fs/ext4/Kconfig     |   9 ++
 fs/ext4/balloc.c    |  14 ++-
 fs/ext4/bitmap.c    |   4 +
 fs/ext4/dir.c       |   3 +
 fs/ext4/ext4.h      | 181 +++++++++++++++++++++++++++++--------
 fs/ext4/ext4_jbd2.c |  22 +++--
 fs/ext4/ext4_jbd2.h |   5 +
 fs/ext4/extents.c   |   7 ++
 fs/ext4/ialloc.c    |  24 +++--
 fs/ext4/inode.c     |  26 ++++--
 fs/ext4/namei.c     |  14 ++-
 fs/ext4/super.c     |   7 +-
 fs/ext4/symlink.c   |   4 +
 fs/ext4/sysfs.c     | 183 +++++++++++++++++++++++++++++++++++--
 fs/ext4/xattr.c     | 216 +++++++++++++++++++-------------------------
 15 files changed, 515 insertions(+), 204 deletions(-)

Comments

Zhang Yi Dec. 14, 2022, 1:46 p.m. UTC | #1
Hello, does anybody have any advice?

Thanks,
Yi.
