[RFCv3,2/4] ext4: Add ext4_ilock & ext4_iunlock API

Message ID	20191120050024.11161-3-riteshh@linux.ibm.com
State	New
Headers	show Return-Path: <linux-ext4-owner@vger.kernel.org> Gateway: Authorized Use Only! Violators will be prosecuted for <linux-ext4@vger.kernel.org> from <riteshh@linux.ibm.com>; Wed, 20 Nov 2019 05:00:36 -0000 Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Wed, 20 Nov 2019 05:00:33 -0000 From: Ritesh Harjani <riteshh@linux.ibm.com> To: jack@suse.cz, tytso@mit.edu, linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, mbobrowski@mbobrowski.org, riteshh@linux.ibm.com Subject: [RFCv3 2/4] ext4: Add ext4_ilock & ext4_iunlock API Date: Wed, 20 Nov 2019 10:30:22 +0530 In-Reply-To: <20191120050024.11161-1-riteshh@linux.ibm.com> References: <20191120050024.11161-1-riteshh@linux.ibm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Message-Id: <20191120050024.11161-3-riteshh@linux.ibm.com> Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk
Series	ext4: Introducing ilock wrapper APIs & fixing i_rwsem scalablity prob. in DIO mixed-rw \| expand [RFCv3,0/4] ext4: Introducing ilock wrapper APIs & fixing i_rwsem scalablity prob. in DIO mixed-rw [RFCv3,1/4] ext4: fix ext4_dax_read/write inode locking sequence for IOCB_NOWAIT [RFCv3,2/4] ext4: Add ext4_ilock & ext4_iunlock API [RFCv3,3/4] ext4: start with shared iolock in case of DIO instead of excl. iolock [RFCv3,4/4] ext4: Move to shared iolock even without dioread_nolock mount opt

Ritesh Harjani Nov. 20, 2019, 5 a.m. UTC

This adds ext4_ilock/iunlock types of APIs.
This is the preparation APIs to make shared
locking/unlocking & restarting with exclusive
locking/unlocking easier in next patch.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 fs/ext4/ext4.h    | 33 ++++++++++++++++++++++++++++++
 fs/ext4/extents.c | 16 +++++++--------
 fs/ext4/file.c    | 52 +++++++++++++++++++++++------------------------
 fs/ext4/inode.c   |  4 ++--
 fs/ext4/ioctl.c   | 16 +++++++--------
 fs/ext4/super.c   | 12 +++++------
 fs/ext4/xattr.c   | 17 ++++++++--------
 7 files changed, 92 insertions(+), 58 deletions(-)

Matthew Bobrowski Nov. 20, 2019, 11:23 a.m. UTC | #1

On Wed, Nov 20, 2019 at 10:30:22AM +0530, Ritesh Harjani wrote:
> This adds ext4_ilock/iunlock types of APIs.
> This is the preparation APIs to make shared
> locking/unlocking & restarting with exclusive
> locking/unlocking easier in next patch.

*scratches head*

A nit, but what's with the changelog wrapping at like ~40 characters?

> +#define EXT4_IOLOCK_EXCL	(1 << 0)
> +#define EXT4_IOLOCK_SHARED	(1 << 1)
>
> +static inline void ext4_ilock(struct inode *inode, unsigned int iolock)
> +{
> +	if (iolock == EXT4_IOLOCK_EXCL)
> +		inode_lock(inode);
> +	else
> +		inode_lock_shared(inode);
> +}
> +
> +static inline void ext4_iunlock(struct inode *inode, unsigned int iolock)
> +{
> +	if (iolock == EXT4_IOLOCK_EXCL)
> +		inode_unlock(inode);
> +	else
> +		inode_unlock_shared(inode);
> +}
> +
> +static inline int ext4_ilock_nowait(struct inode *inode, unsigned int iolock)
> +{
> +	if (iolock == EXT4_IOLOCK_EXCL)
> +		return inode_trylock(inode);
> +	else
> +		return inode_trylock_shared(inode);
> +}

Is it really necessary for all these helpers to actually have the
'else' statement? Could we not just return/set whatever takes the
'else' branch directly from the end of these functions? I think it
would be cleaner that way.

/me doesn't really like the naming of these functions either.

What's people's opinion on changing these for example:
   - ext4_inode_lock()
   - ext4_inode_unlock()

Or, better yet, is there any reason why we've never actually
considered naming such functions to have the verb precede the actual
object that we're performing the operation on? In my opinion, it
totally makes way more sense from a code readability standpoint and
overall intent of the function. For example:
   - ext4_lock_inode()
   - ext4_unlock_inode()

> +static inline void ext4_ilock_demote(struct inode *inode, unsigned int iolock)
> +{
> +	BUG_ON(iolock != EXT4_IOLOCK_EXCL);
> +	downgrade_write(&inode->i_rwsem);
> +}
> +

Same principle would also apply here.

On an ending note, I'm not really sure that I like the name of these
macros. Like, for example, expand the macro 'EXT4_IOLOCK_EXCL' into
plain english words as if you were reading it. This would translate to
something like 'EXT4 INPUT/OUPUT LOCK EXCLUSIVE' or 'EXT4 IO LOCK
EXCLUSIVE'. Just flipping the words around make a significant
improvement for overall readability i.e. 'EXT4_EXCL_IOLOCK', which
would expand out to 'EXT4 EXCLUSIVE IO LOCK'. Speaking of, is there
any reason why we don't mention 'INODE' here seeing as though that's
the object we're actually protecting by taking one of these locking
mechanisms?

/M

Ritesh Harjani Nov. 20, 2019, 12:18 p.m. UTC | #2

Hello Matthew,

Thanks for the review.

On 11/20/19 4:53 PM, Matthew Bobrowski wrote:
> On Wed, Nov 20, 2019 at 10:30:22AM +0530, Ritesh Harjani wrote:
>> This adds ext4_ilock/iunlock types of APIs.
>> This is the preparation APIs to make shared
>> locking/unlocking & restarting with exclusive
>> locking/unlocking easier in next patch.
> 
> *scratches head*
> 
> A nit, but what's with the changelog wrapping at like ~40 characters?

Yup will fix that next time. Thanks.

> 
>> +#define EXT4_IOLOCK_EXCL	(1 << 0)
>> +#define EXT4_IOLOCK_SHARED	(1 << 1)
>>
>> +static inline void ext4_ilock(struct inode *inode, unsigned int iolock)
>> +{
>> +	if (iolock == EXT4_IOLOCK_EXCL)
>> +		inode_lock(inode);
>> +	else
>> +		inode_lock_shared(inode);
>> +}
>> +
>> +static inline void ext4_iunlock(struct inode *inode, unsigned int iolock)
>> +{
>> +	if (iolock == EXT4_IOLOCK_EXCL)
>> +		inode_unlock(inode);
>> +	else
>> +		inode_unlock_shared(inode);
>> +}
>> +
>> +static inline int ext4_ilock_nowait(struct inode *inode, unsigned int iolock)
>> +{
>> +	if (iolock == EXT4_IOLOCK_EXCL)
>> +		return inode_trylock(inode);
>> +	else
>> +		return inode_trylock_shared(inode);
>> +}
> 
> Is it really necessary for all these helpers to actually have the
> 'else' statement? Could we not just return/set whatever takes the
> 'else' branch directly from the end of these functions? I think it
> would be cleaner that way.

Sure np.

> 
> /me doesn't really like the naming of these functions either.

:) difference of opinion.

> 
> What's people's opinion on changing these for example:
>     - ext4_inode_lock()
>     - ext4_inode_unlock()
> 

ext4_ilock/iunlock sounds better to me as it is short too.
But if others have also have a strong opinion towards
ext4_inode_lock/unlock() - I am ok with that.


> Or, better yet, is there any reason why we've never actually
> considered naming such functions to have the verb precede the actual
> object that we're performing the operation on? In my opinion, it
> totally makes way more sense from a code readability standpoint and
> overall intent of the function. For example:
>     - ext4_lock_inode()
>     - ext4_unlock_inode()

Not against your suggestion here.
But in kernel I do see a preference towards object followed by a verb.
At least in vfs I see functions like inode_lock()/unlock().

Plus I would not deny that this naming is also inspired from
xfs_ilock()/iunlock API names.

> 
>> +static inline void ext4_ilock_demote(struct inode *inode, unsigned int iolock)
>> +{
>> +	BUG_ON(iolock != EXT4_IOLOCK_EXCL);
>> +	downgrade_write(&inode->i_rwsem);
>> +}
>> +
> 
> Same principle would also apply here.
> 
> On an ending note, I'm not really sure that I like the name of these
> macros. Like, for example, expand the macro 'EXT4_IOLOCK_EXCL' into
> plain english words as if you were reading it. This would translate to
> something like 'EXT4 INPUT/OUPUT LOCK EXCLUSIVE' or 'EXT4 IO LOCK
> EXCLUSIVE'. Just flipping the words around make a significant
> improvement for overall readability i.e. 'EXT4_EXCL_IOLOCK', which
> would expand out to 'EXT4 EXCLUSIVE IO LOCK'. Speaking of, is there

Ditto. Unless you and others have a strong objection, I would rather
keep this as is :)


> any reason why we don't mention 'INODE' here seeing as though that's
> the object we're actually protecting by taking one of these locking
> mechanisms?

hmm, it was increasing the name of the macro if I do it that way.
But that's ok. Is below macro name better?

#define EXT4_INODE_IOLOCK_EXCL		(1 << 0)
#define EXT4_INODE_IOLOCK_SHARED	(1 << 1)


Thanks for the review!!
-ritesh

Jan Kara Nov. 20, 2019, 1:11 p.m. UTC | #3

On Wed 20-11-19 10:30:22, Ritesh Harjani wrote:
> This adds ext4_ilock/iunlock types of APIs.
> This is the preparation APIs to make shared
> locking/unlocking & restarting with exclusive
> locking/unlocking easier in next patch.
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>

I know XFS does it this way but I don't think we need the obsurity of
additional locking helpers etc. just because of one place in
ext4_dio_write_iter() that will use this. So I'd just drop this patch...

								Honza

> ---
>  fs/ext4/ext4.h    | 33 ++++++++++++++++++++++++++++++
>  fs/ext4/extents.c | 16 +++++++--------
>  fs/ext4/file.c    | 52 +++++++++++++++++++++++------------------------
>  fs/ext4/inode.c   |  4 ++--
>  fs/ext4/ioctl.c   | 16 +++++++--------
>  fs/ext4/super.c   | 12 +++++------
>  fs/ext4/xattr.c   | 17 ++++++++--------
>  7 files changed, 92 insertions(+), 58 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 61987c106511..b4169a92e8d0 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -2960,6 +2960,39 @@ do {								\
>  #define EXT4_FREECLUSTERS_WATERMARK 0
>  #endif
>  
> +#define EXT4_IOLOCK_EXCL	(1 << 0)
> +#define EXT4_IOLOCK_SHARED	(1 << 1)
> +
> +static inline void ext4_ilock(struct inode *inode, unsigned int iolock)
> +{
> +	if (iolock == EXT4_IOLOCK_EXCL)
> +		inode_lock(inode);
> +	else
> +		inode_lock_shared(inode);
> +}
> +
> +static inline void ext4_iunlock(struct inode *inode, unsigned int iolock)
> +{
> +	if (iolock == EXT4_IOLOCK_EXCL)
> +		inode_unlock(inode);
> +	else
> +		inode_unlock_shared(inode);
> +}
> +
> +static inline int ext4_ilock_nowait(struct inode *inode, unsigned int iolock)
> +{
> +	if (iolock == EXT4_IOLOCK_EXCL)
> +		return inode_trylock(inode);
> +	else
> +		return inode_trylock_shared(inode);
> +}
> +
> +static inline void ext4_ilock_demote(struct inode *inode, unsigned int iolock)
> +{
> +	BUG_ON(iolock != EXT4_IOLOCK_EXCL);
> +	downgrade_write(&inode->i_rwsem);
> +}
> +
>  /* Update i_disksize. Requires i_mutex to avoid races with truncate */
>  static inline void ext4_update_i_disksize(struct inode *inode, loff_t newsize)
>  {
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 0e8708b77da6..08dd57558533 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4754,7 +4754,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>  	else
>  		max_blocks -= lblk;
>  
> -	inode_lock(inode);
> +	ext4_ilock(inode, EXT4_IOLOCK_EXCL);
>  
>  	/*
>  	 * Indirect files do not support unwritten extnets
> @@ -4864,7 +4864,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>  
>  	ext4_journal_stop(handle);
>  out_mutex:
> -	inode_unlock(inode);
> +	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
>  	return ret;
>  }
>  
> @@ -4930,7 +4930,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>  	if (mode & FALLOC_FL_KEEP_SIZE)
>  		flags |= EXT4_GET_BLOCKS_KEEP_SIZE;
>  
> -	inode_lock(inode);
> +	ext4_ilock(inode, EXT4_IOLOCK_EXCL);
>  
>  	/*
>  	 * We only support preallocation for extent-based files only
> @@ -4961,7 +4961,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>  						EXT4_I(inode)->i_sync_tid);
>  	}
>  out:
> -	inode_unlock(inode);
> +	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
>  	trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
>  	return ret;
>  }
> @@ -5509,7 +5509,7 @@ int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len)
>  			return ret;
>  	}
>  
> -	inode_lock(inode);
> +	ext4_ilock(inode, EXT4_IOLOCK_EXCL);
>  	/*
>  	 * There is no need to overlap collapse range with EOF, in which case
>  	 * it is effectively a truncate operation
> @@ -5608,7 +5608,7 @@ int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len)
>  out_mmap:
>  	up_write(&EXT4_I(inode)->i_mmap_sem);
>  out_mutex:
> -	inode_unlock(inode);
> +	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
>  	return ret;
>  }
>  
> @@ -5659,7 +5659,7 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
>  			return ret;
>  	}
>  
> -	inode_lock(inode);
> +	ext4_ilock(inode, EXT4_IOLOCK_EXCL);
>  	/* Currently just for extent based files */
>  	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
>  		ret = -EOPNOTSUPP;
> @@ -5786,7 +5786,7 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
>  out_mmap:
>  	up_write(&EXT4_I(inode)->i_mmap_sem);
>  out_mutex:
> -	inode_unlock(inode);
> +	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
>  	return ret;
>  }
>  
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index 977ac58dc718..ebe3f051598d 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -55,14 +55,14 @@ static ssize_t ext4_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  	struct inode *inode = file_inode(iocb->ki_filp);
>  
>  	if (iocb->ki_flags & IOCB_NOWAIT) {
> -		if (!inode_trylock_shared(inode))
> +		if (!ext4_ilock_nowait(inode, EXT4_IOLOCK_SHARED))
>  			return -EAGAIN;
>  	} else {
> -		inode_lock_shared(inode);
> +		ext4_ilock(inode, EXT4_IOLOCK_SHARED);
>  	}
>  
>  	if (!ext4_dio_supported(inode)) {
> -		inode_unlock_shared(inode);
> +		ext4_iunlock(inode, EXT4_IOLOCK_SHARED);
>  		/*
>  		 * Fallback to buffered I/O if the operation being performed on
>  		 * the inode is not supported by direct I/O. The IOCB_DIRECT
> @@ -76,7 +76,7 @@ static ssize_t ext4_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  
>  	ret = iomap_dio_rw(iocb, to, &ext4_iomap_ops, NULL,
>  			   is_sync_kiocb(iocb));
> -	inode_unlock_shared(inode);
> +	ext4_iunlock(inode, EXT4_IOLOCK_SHARED);
>  
>  	file_accessed(iocb->ki_filp);
>  	return ret;
> @@ -89,22 +89,23 @@ static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  	ssize_t ret;
>  
>  	if (iocb->ki_flags & IOCB_NOWAIT) {
> -		if (!inode_trylock_shared(inode))
> +		if (!ext4_ilock_nowait(inode, EXT4_IOLOCK_SHARED))
>  			return -EAGAIN;
>  	} else {
> -		inode_lock_shared(inode);
> +		ext4_ilock(inode, EXT4_IOLOCK_SHARED);
>  	}
> +
>  	/*
>  	 * Recheck under inode lock - at this point we are sure it cannot
>  	 * change anymore
>  	 */
>  	if (!IS_DAX(inode)) {
> -		inode_unlock_shared(inode);
> +		ext4_iunlock(inode, EXT4_IOLOCK_SHARED);
>  		/* Fallback to buffered IO in case we cannot support DAX */
>  		return generic_file_read_iter(iocb, to);
>  	}
>  	ret = dax_iomap_rw(iocb, to, &ext4_iomap_ops);
> -	inode_unlock_shared(inode);
> +	ext4_iunlock(inode, EXT4_IOLOCK_SHARED);
>  
>  	file_accessed(iocb->ki_filp);
>  	return ret;
> @@ -244,7 +245,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
>  	if (iocb->ki_flags & IOCB_NOWAIT)
>  		return -EOPNOTSUPP;
>  
> -	inode_lock(inode);
> +	ext4_ilock(inode, EXT4_IOLOCK_EXCL);
>  	ret = ext4_write_checks(iocb, from);
>  	if (ret <= 0)
>  		goto out;
> @@ -254,7 +255,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
>  	current->backing_dev_info = NULL;
>  
>  out:
> -	inode_unlock(inode);
> +	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
>  	if (likely(ret > 0)) {
>  		iocb->ki_pos += ret;
>  		ret = generic_write_sync(iocb, ret);
> @@ -372,16 +373,17 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  	handle_t *handle;
>  	struct inode *inode = file_inode(iocb->ki_filp);
>  	bool extend = false, overwrite = false, unaligned_aio = false;
> +	unsigned int iolock = EXT4_IOLOCK_EXCL;
>  
>  	if (iocb->ki_flags & IOCB_NOWAIT) {
> -		if (!inode_trylock(inode))
> +		if (!ext4_ilock_nowait(inode, iolock))
>  			return -EAGAIN;
>  	} else {
> -		inode_lock(inode);
> +		ext4_ilock(inode, iolock);
>  	}
>  
>  	if (!ext4_dio_supported(inode)) {
> -		inode_unlock(inode);
> +		ext4_iunlock(inode, iolock);
>  		/*
>  		 * Fallback to buffered I/O if the inode does not support
>  		 * direct I/O.
> @@ -391,7 +393,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  
>  	ret = ext4_write_checks(iocb, from);
>  	if (ret <= 0) {
> -		inode_unlock(inode);
> +		ext4_iunlock(inode, iolock);
>  		return ret;
>  	}
>  
> @@ -416,7 +418,8 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  	if (!unaligned_aio && ext4_overwrite_io(inode, offset, count) &&
>  	    ext4_should_dioread_nolock(inode)) {
>  		overwrite = true;
> -		downgrade_write(&inode->i_rwsem);
> +		ext4_ilock_demote(inode, iolock);
> +		iolock = EXT4_IOLOCK_SHARED;
>  	}
>  
>  	if (offset + count > EXT4_I(inode)->i_disksize) {
> @@ -443,10 +446,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  		ret = ext4_handle_inode_extension(inode, offset, ret, count);
>  
>  out:
> -	if (overwrite)
> -		inode_unlock_shared(inode);
> -	else
> -		inode_unlock(inode);
> +	ext4_iunlock(inode, iolock);
>  
>  	if (ret >= 0 && iov_iter_count(from)) {
>  		ssize_t err;
> @@ -489,10 +489,10 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  	struct inode *inode = file_inode(iocb->ki_filp);
>  
>  	if (iocb->ki_flags & IOCB_NOWAIT) {
> -		if (!inode_trylock(inode))
> +		if (!ext4_ilock_nowait(inode, EXT4_IOLOCK_EXCL))
>  			return -EAGAIN;
>  	} else {
> -		inode_lock(inode);
> +		ext4_ilock(inode, EXT4_IOLOCK_EXCL);
>  	}
>  
>  	ret = ext4_write_checks(iocb, from);
> @@ -524,7 +524,7 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  	if (extend)
>  		ret = ext4_handle_inode_extension(inode, offset, ret, count);
>  out:
> -	inode_unlock(inode);
> +	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
>  	if (ret > 0)
>  		ret = generic_write_sync(iocb, ret);
>  	return ret;
> @@ -757,16 +757,16 @@ loff_t ext4_llseek(struct file *file, loff_t offset, int whence)
>  		return generic_file_llseek_size(file, offset, whence,
>  						maxbytes, i_size_read(inode));
>  	case SEEK_HOLE:
> -		inode_lock_shared(inode);
> +		ext4_ilock(inode, EXT4_IOLOCK_SHARED);
>  		offset = iomap_seek_hole(inode, offset,
>  					 &ext4_iomap_report_ops);
> -		inode_unlock_shared(inode);
> +		ext4_iunlock(inode, EXT4_IOLOCK_SHARED);
>  		break;
>  	case SEEK_DATA:
> -		inode_lock_shared(inode);
> +		ext4_ilock(inode, EXT4_IOLOCK_SHARED);
>  		offset = iomap_seek_data(inode, offset,
>  					 &ext4_iomap_report_ops);
> -		inode_unlock_shared(inode);
> +		ext4_iunlock(inode, EXT4_IOLOCK_SHARED);
>  		break;
>  	}
>  
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 381813205f99..39dcc22667a1 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3930,7 +3930,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
>  			return ret;
>  	}
>  
> -	inode_lock(inode);
> +	ext4_ilock(inode, EXT4_IOLOCK_EXCL);
>  
>  	/* No need to punch hole beyond i_size */
>  	if (offset >= inode->i_size)
> @@ -4037,7 +4037,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
>  out_dio:
>  	up_write(&EXT4_I(inode)->i_mmap_sem);
>  out_mutex:
> -	inode_unlock(inode);
> +	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
>  	return ret;
>  }
>  
> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> index 0b7f316fd30f..43b7a23dc57b 100644
> --- a/fs/ext4/ioctl.c
> +++ b/fs/ext4/ioctl.c
> @@ -855,13 +855,13 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  		if (err)
>  			return err;
>  
> -		inode_lock(inode);
> +		ext4_ilock(inode, EXT4_IOLOCK_EXCL);
>  		err = ext4_ioctl_check_immutable(inode,
>  				from_kprojid(&init_user_ns, ei->i_projid),
>  				flags);
>  		if (!err)
>  			err = ext4_ioctl_setflags(inode, flags);
> -		inode_unlock(inode);
> +		ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
>  		mnt_drop_write_file(filp);
>  		return err;
>  	}
> @@ -892,7 +892,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  			goto setversion_out;
>  		}
>  
> -		inode_lock(inode);
> +		ext4_ilock(inode, EXT4_IOLOCK_EXCL);
>  		handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
>  		if (IS_ERR(handle)) {
>  			err = PTR_ERR(handle);
> @@ -907,7 +907,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  		ext4_journal_stop(handle);
>  
>  unlock_out:
> -		inode_unlock(inode);
> +		ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
>  setversion_out:
>  		mnt_drop_write_file(filp);
>  		return err;
> @@ -1026,9 +1026,9 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  		 * ext4_ext_swap_inode_data before we switch the
>  		 * inode format to prevent read.
>  		 */
> -		inode_lock((inode));
> +		ext4_ilock(inode, EXT4_IOLOCK_EXCL);
>  		err = ext4_ext_migrate(inode);
> -		inode_unlock((inode));
> +		ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
>  		mnt_drop_write_file(filp);
>  		return err;
>  	}
> @@ -1272,7 +1272,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  		if (err)
>  			return err;
>  
> -		inode_lock(inode);
> +		ext4_ilock(inode, EXT4_IOLOCK_EXCL);
>  		ext4_fill_fsxattr(inode, &old_fa);
>  		err = vfs_ioc_fssetxattr_check(inode, &old_fa, &fa);
>  		if (err)
> @@ -1287,7 +1287,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  			goto out;
>  		err = ext4_ioctl_setproject(filp, fa.fsx_projid);
>  out:
> -		inode_unlock(inode);
> +		ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
>  		mnt_drop_write_file(filp);
>  		return err;
>  	}
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 7796e2ffc294..48b83b2cf0ad 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -2682,12 +2682,12 @@ static void ext4_orphan_cleanup(struct super_block *sb,
>  					__func__, inode->i_ino, inode->i_size);
>  			jbd_debug(2, "truncating inode %lu to %lld bytes\n",
>  				  inode->i_ino, inode->i_size);
> -			inode_lock(inode);
> +			ext4_ilock(inode, EXT4_IOLOCK_EXCL);
>  			truncate_inode_pages(inode->i_mapping, inode->i_size);
>  			ret = ext4_truncate(inode);
>  			if (ret)
>  				ext4_std_error(inode->i_sb, ret);
> -			inode_unlock(inode);
> +			ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
>  			nr_truncates++;
>  		} else {
>  			if (test_opt(sb, DEBUG))
> @@ -5785,7 +5785,7 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id,
>  		 * files. If this fails, we return success anyway since quotas
>  		 * are already enabled and this is not a hard failure.
>  		 */
> -		inode_lock(inode);
> +		ext4_ilock(inode, EXT4_IOLOCK_EXCL);
>  		handle = ext4_journal_start(inode, EXT4_HT_QUOTA, 1);
>  		if (IS_ERR(handle))
>  			goto unlock_inode;
> @@ -5795,7 +5795,7 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id,
>  		ext4_mark_inode_dirty(handle, inode);
>  		ext4_journal_stop(handle);
>  	unlock_inode:
> -		inode_unlock(inode);
> +		ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
>  	}
>  	return err;
>  }
> @@ -5887,7 +5887,7 @@ static int ext4_quota_off(struct super_block *sb, int type)
>  	if (err || ext4_has_feature_quota(sb))
>  		goto out_put;
>  
> -	inode_lock(inode);
> +	ext4_ilock(inode, EXT4_IOLOCK_EXCL);
>  	/*
>  	 * Update modification times of quota files when userspace can
>  	 * start looking at them. If we fail, we return success anyway since
> @@ -5902,7 +5902,7 @@ static int ext4_quota_off(struct super_block *sb, int type)
>  	ext4_mark_inode_dirty(handle, inode);
>  	ext4_journal_stop(handle);
>  out_unlock:
> -	inode_unlock(inode);
> +	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
>  out_put:
>  	lockdep_set_quota_inode(inode, I_DATA_SEM_NORMAL);
>  	iput(inode);
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index 8966a5439a22..5c2dcc4c836a 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -422,9 +422,9 @@ static int ext4_xattr_inode_iget(struct inode *parent, unsigned long ea_ino,
>  		ext4_set_inode_state(inode, EXT4_STATE_LUSTRE_EA_INODE);
>  		ext4_xattr_inode_set_ref(inode, 1);
>  	} else {
> -		inode_lock(inode);
> +		ext4_ilock(inode, EXT4_IOLOCK_EXCL);
>  		inode->i_flags |= S_NOQUOTA;
> -		inode_unlock(inode);
> +		ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
>  	}
>  
>  	*ea_inode = inode;
> @@ -976,7 +976,7 @@ static int ext4_xattr_inode_update_ref(handle_t *handle, struct inode *ea_inode,
>  	u32 hash;
>  	int ret;
>  
> -	inode_lock(ea_inode);
> +	ext4_ilock(ea_inode, EXT4_IOLOCK_EXCL);
>  
>  	ret = ext4_reserve_inode_write(handle, ea_inode, &iloc);
>  	if (ret)
> @@ -1030,7 +1030,7 @@ static int ext4_xattr_inode_update_ref(handle_t *handle, struct inode *ea_inode,
>  		ext4_warning_inode(ea_inode,
>  				   "ext4_mark_iloc_dirty() failed ret=%d", ret);
>  out:
> -	inode_unlock(ea_inode);
> +	ext4_iunlock(ea_inode, EXT4_IOLOCK_EXCL);
>  	return ret;
>  }
>  
> @@ -1380,10 +1380,11 @@ static int ext4_xattr_inode_write(handle_t *handle, struct inode *ea_inode,
>  		block += 1;
>  	}
>  
> -	inode_lock(ea_inode);
> +	ext4_ilock(ea_inode, EXT4_IOLOCK_EXCL);
>  	i_size_write(ea_inode, wsize);
>  	ext4_update_i_disksize(ea_inode, wsize);
> -	inode_unlock(ea_inode);
> +	ext4_iunlock(ea_inode, EXT4_IOLOCK_EXCL);
> +
>  
>  	ext4_mark_inode_dirty(handle, ea_inode);
>  
> @@ -1432,9 +1433,9 @@ static struct inode *ext4_xattr_inode_create(handle_t *handle,
>  		 */
>  		dquot_free_inode(ea_inode);
>  		dquot_drop(ea_inode);
> -		inode_lock(ea_inode);
> +		ext4_ilock(ea_inode, EXT4_IOLOCK_EXCL);
>  		ea_inode->i_flags |= S_NOQUOTA;
> -		inode_unlock(ea_inode);
> +		ext4_iunlock(ea_inode, EXT4_IOLOCK_EXCL);
>  	}
>  
>  	return ea_inode;
> -- 
> 2.21.0
>

Darrick Wong Nov. 20, 2019, 4:06 p.m. UTC | #4

On Wed, Nov 20, 2019 at 02:11:07PM +0100, Jan Kara wrote:
> On Wed 20-11-19 10:30:22, Ritesh Harjani wrote:
> > This adds ext4_ilock/iunlock types of APIs.
> > This is the preparation APIs to make shared
> > locking/unlocking & restarting with exclusive
> > locking/unlocking easier in next patch.
> > 
> > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> 
> I know XFS does it this way but I don't think we need the obsurity of
> additional locking helpers etc. just because of one place in
> ext4_dio_write_iter() that will use this. So I'd just drop this patch...

FWIW, if you were to add tracepoints to these inode lock and unlock
helpers, then the helpers would have the additional value that you could
use ftrace + script to diagnose inode deadlocking issues and the like.
XFS has such scripts in the xfsprogs source code and it's really nice to
have an automated system to tell you which thread forgot to let go of
something, and when.

--D

> 								Honza
> 
> > ---
> >  fs/ext4/ext4.h    | 33 ++++++++++++++++++++++++++++++
> >  fs/ext4/extents.c | 16 +++++++--------
> >  fs/ext4/file.c    | 52 +++++++++++++++++++++++------------------------
> >  fs/ext4/inode.c   |  4 ++--
> >  fs/ext4/ioctl.c   | 16 +++++++--------
> >  fs/ext4/super.c   | 12 +++++------
> >  fs/ext4/xattr.c   | 17 ++++++++--------
> >  7 files changed, 92 insertions(+), 58 deletions(-)
> > 
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 61987c106511..b4169a92e8d0 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -2960,6 +2960,39 @@ do {								\
> >  #define EXT4_FREECLUSTERS_WATERMARK 0
> >  #endif
> >  
> > +#define EXT4_IOLOCK_EXCL	(1 << 0)
> > +#define EXT4_IOLOCK_SHARED	(1 << 1)
> > +
> > +static inline void ext4_ilock(struct inode *inode, unsigned int iolock)
> > +{
> > +	if (iolock == EXT4_IOLOCK_EXCL)
> > +		inode_lock(inode);
> > +	else
> > +		inode_lock_shared(inode);
> > +}
> > +
> > +static inline void ext4_iunlock(struct inode *inode, unsigned int iolock)
> > +{
> > +	if (iolock == EXT4_IOLOCK_EXCL)
> > +		inode_unlock(inode);
> > +	else
> > +		inode_unlock_shared(inode);
> > +}
> > +
> > +static inline int ext4_ilock_nowait(struct inode *inode, unsigned int iolock)
> > +{
> > +	if (iolock == EXT4_IOLOCK_EXCL)
> > +		return inode_trylock(inode);
> > +	else
> > +		return inode_trylock_shared(inode);
> > +}
> > +
> > +static inline void ext4_ilock_demote(struct inode *inode, unsigned int iolock)
> > +{
> > +	BUG_ON(iolock != EXT4_IOLOCK_EXCL);
> > +	downgrade_write(&inode->i_rwsem);
> > +}
> > +
> >  /* Update i_disksize. Requires i_mutex to avoid races with truncate */
> >  static inline void ext4_update_i_disksize(struct inode *inode, loff_t newsize)
> >  {
> > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> > index 0e8708b77da6..08dd57558533 100644
> > --- a/fs/ext4/extents.c
> > +++ b/fs/ext4/extents.c
> > @@ -4754,7 +4754,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
> >  	else
> >  		max_blocks -= lblk;
> >  
> > -	inode_lock(inode);
> > +	ext4_ilock(inode, EXT4_IOLOCK_EXCL);
> >  
> >  	/*
> >  	 * Indirect files do not support unwritten extnets
> > @@ -4864,7 +4864,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
> >  
> >  	ext4_journal_stop(handle);
> >  out_mutex:
> > -	inode_unlock(inode);
> > +	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
> >  	return ret;
> >  }
> >  
> > @@ -4930,7 +4930,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> >  	if (mode & FALLOC_FL_KEEP_SIZE)
> >  		flags |= EXT4_GET_BLOCKS_KEEP_SIZE;
> >  
> > -	inode_lock(inode);
> > +	ext4_ilock(inode, EXT4_IOLOCK_EXCL);
> >  
> >  	/*
> >  	 * We only support preallocation for extent-based files only
> > @@ -4961,7 +4961,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> >  						EXT4_I(inode)->i_sync_tid);
> >  	}
> >  out:
> > -	inode_unlock(inode);
> > +	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
> >  	trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
> >  	return ret;
> >  }
> > @@ -5509,7 +5509,7 @@ int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len)
> >  			return ret;
> >  	}
> >  
> > -	inode_lock(inode);
> > +	ext4_ilock(inode, EXT4_IOLOCK_EXCL);
> >  	/*
> >  	 * There is no need to overlap collapse range with EOF, in which case
> >  	 * it is effectively a truncate operation
> > @@ -5608,7 +5608,7 @@ int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len)
> >  out_mmap:
> >  	up_write(&EXT4_I(inode)->i_mmap_sem);
> >  out_mutex:
> > -	inode_unlock(inode);
> > +	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
> >  	return ret;
> >  }
> >  
> > @@ -5659,7 +5659,7 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
> >  			return ret;
> >  	}
> >  
> > -	inode_lock(inode);
> > +	ext4_ilock(inode, EXT4_IOLOCK_EXCL);
> >  	/* Currently just for extent based files */
> >  	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
> >  		ret = -EOPNOTSUPP;
> > @@ -5786,7 +5786,7 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
> >  out_mmap:
> >  	up_write(&EXT4_I(inode)->i_mmap_sem);
> >  out_mutex:
> > -	inode_unlock(inode);
> > +	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
> >  	return ret;
> >  }
> >  
> > diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> > index 977ac58dc718..ebe3f051598d 100644
> > --- a/fs/ext4/file.c
> > +++ b/fs/ext4/file.c
> > @@ -55,14 +55,14 @@ static ssize_t ext4_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
> >  	struct inode *inode = file_inode(iocb->ki_filp);
> >  
> >  	if (iocb->ki_flags & IOCB_NOWAIT) {
> > -		if (!inode_trylock_shared(inode))
> > +		if (!ext4_ilock_nowait(inode, EXT4_IOLOCK_SHARED))
> >  			return -EAGAIN;
> >  	} else {
> > -		inode_lock_shared(inode);
> > +		ext4_ilock(inode, EXT4_IOLOCK_SHARED);
> >  	}
> >  
> >  	if (!ext4_dio_supported(inode)) {
> > -		inode_unlock_shared(inode);
> > +		ext4_iunlock(inode, EXT4_IOLOCK_SHARED);
> >  		/*
> >  		 * Fallback to buffered I/O if the operation being performed on
> >  		 * the inode is not supported by direct I/O. The IOCB_DIRECT
> > @@ -76,7 +76,7 @@ static ssize_t ext4_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
> >  
> >  	ret = iomap_dio_rw(iocb, to, &ext4_iomap_ops, NULL,
> >  			   is_sync_kiocb(iocb));
> > -	inode_unlock_shared(inode);
> > +	ext4_iunlock(inode, EXT4_IOLOCK_SHARED);
> >  
> >  	file_accessed(iocb->ki_filp);
> >  	return ret;
> > @@ -89,22 +89,23 @@ static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to)
> >  	ssize_t ret;
> >  
> >  	if (iocb->ki_flags & IOCB_NOWAIT) {
> > -		if (!inode_trylock_shared(inode))
> > +		if (!ext4_ilock_nowait(inode, EXT4_IOLOCK_SHARED))
> >  			return -EAGAIN;
> >  	} else {
> > -		inode_lock_shared(inode);
> > +		ext4_ilock(inode, EXT4_IOLOCK_SHARED);
> >  	}
> > +
> >  	/*
> >  	 * Recheck under inode lock - at this point we are sure it cannot
> >  	 * change anymore
> >  	 */
> >  	if (!IS_DAX(inode)) {
> > -		inode_unlock_shared(inode);
> > +		ext4_iunlock(inode, EXT4_IOLOCK_SHARED);
> >  		/* Fallback to buffered IO in case we cannot support DAX */
> >  		return generic_file_read_iter(iocb, to);
> >  	}
> >  	ret = dax_iomap_rw(iocb, to, &ext4_iomap_ops);
> > -	inode_unlock_shared(inode);
> > +	ext4_iunlock(inode, EXT4_IOLOCK_SHARED);
> >  
> >  	file_accessed(iocb->ki_filp);
> >  	return ret;
> > @@ -244,7 +245,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
> >  	if (iocb->ki_flags & IOCB_NOWAIT)
> >  		return -EOPNOTSUPP;
> >  
> > -	inode_lock(inode);
> > +	ext4_ilock(inode, EXT4_IOLOCK_EXCL);
> >  	ret = ext4_write_checks(iocb, from);
> >  	if (ret <= 0)
> >  		goto out;
> > @@ -254,7 +255,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
> >  	current->backing_dev_info = NULL;
> >  
> >  out:
> > -	inode_unlock(inode);
> > +	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
> >  	if (likely(ret > 0)) {
> >  		iocb->ki_pos += ret;
> >  		ret = generic_write_sync(iocb, ret);
> > @@ -372,16 +373,17 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
> >  	handle_t *handle;
> >  	struct inode *inode = file_inode(iocb->ki_filp);
> >  	bool extend = false, overwrite = false, unaligned_aio = false;
> > +	unsigned int iolock = EXT4_IOLOCK_EXCL;
> >  
> >  	if (iocb->ki_flags & IOCB_NOWAIT) {
> > -		if (!inode_trylock(inode))
> > +		if (!ext4_ilock_nowait(inode, iolock))
> >  			return -EAGAIN;
> >  	} else {
> > -		inode_lock(inode);
> > +		ext4_ilock(inode, iolock);
> >  	}
> >  
> >  	if (!ext4_dio_supported(inode)) {
> > -		inode_unlock(inode);
> > +		ext4_iunlock(inode, iolock);
> >  		/*
> >  		 * Fallback to buffered I/O if the inode does not support
> >  		 * direct I/O.
> > @@ -391,7 +393,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
> >  
> >  	ret = ext4_write_checks(iocb, from);
> >  	if (ret <= 0) {
> > -		inode_unlock(inode);
> > +		ext4_iunlock(inode, iolock);
> >  		return ret;
> >  	}
> >  
> > @@ -416,7 +418,8 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
> >  	if (!unaligned_aio && ext4_overwrite_io(inode, offset, count) &&
> >  	    ext4_should_dioread_nolock(inode)) {
> >  		overwrite = true;
> > -		downgrade_write(&inode->i_rwsem);
> > +		ext4_ilock_demote(inode, iolock);
> > +		iolock = EXT4_IOLOCK_SHARED;
> >  	}
> >  
> >  	if (offset + count > EXT4_I(inode)->i_disksize) {
> > @@ -443,10 +446,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
> >  		ret = ext4_handle_inode_extension(inode, offset, ret, count);
> >  
> >  out:
> > -	if (overwrite)
> > -		inode_unlock_shared(inode);
> > -	else
> > -		inode_unlock(inode);
> > +	ext4_iunlock(inode, iolock);
> >  
> >  	if (ret >= 0 && iov_iter_count(from)) {
> >  		ssize_t err;
> > @@ -489,10 +489,10 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
> >  	struct inode *inode = file_inode(iocb->ki_filp);
> >  
> >  	if (iocb->ki_flags & IOCB_NOWAIT) {
> > -		if (!inode_trylock(inode))
> > +		if (!ext4_ilock_nowait(inode, EXT4_IOLOCK_EXCL))
> >  			return -EAGAIN;
> >  	} else {
> > -		inode_lock(inode);
> > +		ext4_ilock(inode, EXT4_IOLOCK_EXCL);
> >  	}
> >  
> >  	ret = ext4_write_checks(iocb, from);
> > @@ -524,7 +524,7 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
> >  	if (extend)
> >  		ret = ext4_handle_inode_extension(inode, offset, ret, count);
> >  out:
> > -	inode_unlock(inode);
> > +	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
> >  	if (ret > 0)
> >  		ret = generic_write_sync(iocb, ret);
> >  	return ret;
> > @@ -757,16 +757,16 @@ loff_t ext4_llseek(struct file *file, loff_t offset, int whence)
> >  		return generic_file_llseek_size(file, offset, whence,
> >  						maxbytes, i_size_read(inode));
> >  	case SEEK_HOLE:
> > -		inode_lock_shared(inode);
> > +		ext4_ilock(inode, EXT4_IOLOCK_SHARED);
> >  		offset = iomap_seek_hole(inode, offset,
> >  					 &ext4_iomap_report_ops);
> > -		inode_unlock_shared(inode);
> > +		ext4_iunlock(inode, EXT4_IOLOCK_SHARED);
> >  		break;
> >  	case SEEK_DATA:
> > -		inode_lock_shared(inode);
> > +		ext4_ilock(inode, EXT4_IOLOCK_SHARED);
> >  		offset = iomap_seek_data(inode, offset,
> >  					 &ext4_iomap_report_ops);
> > -		inode_unlock_shared(inode);
> > +		ext4_iunlock(inode, EXT4_IOLOCK_SHARED);
> >  		break;
> >  	}
> >  
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 381813205f99..39dcc22667a1 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -3930,7 +3930,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
> >  			return ret;
> >  	}
> >  
> > -	inode_lock(inode);
> > +	ext4_ilock(inode, EXT4_IOLOCK_EXCL);
> >  
> >  	/* No need to punch hole beyond i_size */
> >  	if (offset >= inode->i_size)
> > @@ -4037,7 +4037,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
> >  out_dio:
> >  	up_write(&EXT4_I(inode)->i_mmap_sem);
> >  out_mutex:
> > -	inode_unlock(inode);
> > +	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
> >  	return ret;
> >  }
> >  
> > diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> > index 0b7f316fd30f..43b7a23dc57b 100644
> > --- a/fs/ext4/ioctl.c
> > +++ b/fs/ext4/ioctl.c
> > @@ -855,13 +855,13 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> >  		if (err)
> >  			return err;
> >  
> > -		inode_lock(inode);
> > +		ext4_ilock(inode, EXT4_IOLOCK_EXCL);
> >  		err = ext4_ioctl_check_immutable(inode,
> >  				from_kprojid(&init_user_ns, ei->i_projid),
> >  				flags);
> >  		if (!err)
> >  			err = ext4_ioctl_setflags(inode, flags);
> > -		inode_unlock(inode);
> > +		ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
> >  		mnt_drop_write_file(filp);
> >  		return err;
> >  	}
> > @@ -892,7 +892,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> >  			goto setversion_out;
> >  		}
> >  
> > -		inode_lock(inode);
> > +		ext4_ilock(inode, EXT4_IOLOCK_EXCL);
> >  		handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
> >  		if (IS_ERR(handle)) {
> >  			err = PTR_ERR(handle);
> > @@ -907,7 +907,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> >  		ext4_journal_stop(handle);
> >  
> >  unlock_out:
> > -		inode_unlock(inode);
> > +		ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
> >  setversion_out:
> >  		mnt_drop_write_file(filp);
> >  		return err;
> > @@ -1026,9 +1026,9 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> >  		 * ext4_ext_swap_inode_data before we switch the
> >  		 * inode format to prevent read.
> >  		 */
> > -		inode_lock((inode));
> > +		ext4_ilock(inode, EXT4_IOLOCK_EXCL);
> >  		err = ext4_ext_migrate(inode);
> > -		inode_unlock((inode));
> > +		ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
> >  		mnt_drop_write_file(filp);
> >  		return err;
> >  	}
> > @@ -1272,7 +1272,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> >  		if (err)
> >  			return err;
> >  
> > -		inode_lock(inode);
> > +		ext4_ilock(inode, EXT4_IOLOCK_EXCL);
> >  		ext4_fill_fsxattr(inode, &old_fa);
> >  		err = vfs_ioc_fssetxattr_check(inode, &old_fa, &fa);
> >  		if (err)
> > @@ -1287,7 +1287,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> >  			goto out;
> >  		err = ext4_ioctl_setproject(filp, fa.fsx_projid);
> >  out:
> > -		inode_unlock(inode);
> > +		ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
> >  		mnt_drop_write_file(filp);
> >  		return err;
> >  	}
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index 7796e2ffc294..48b83b2cf0ad 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -2682,12 +2682,12 @@ static void ext4_orphan_cleanup(struct super_block *sb,
> >  					__func__, inode->i_ino, inode->i_size);
> >  			jbd_debug(2, "truncating inode %lu to %lld bytes\n",
> >  				  inode->i_ino, inode->i_size);
> > -			inode_lock(inode);
> > +			ext4_ilock(inode, EXT4_IOLOCK_EXCL);
> >  			truncate_inode_pages(inode->i_mapping, inode->i_size);
> >  			ret = ext4_truncate(inode);
> >  			if (ret)
> >  				ext4_std_error(inode->i_sb, ret);
> > -			inode_unlock(inode);
> > +			ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
> >  			nr_truncates++;
> >  		} else {
> >  			if (test_opt(sb, DEBUG))
> > @@ -5785,7 +5785,7 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id,
> >  		 * files. If this fails, we return success anyway since quotas
> >  		 * are already enabled and this is not a hard failure.
> >  		 */
> > -		inode_lock(inode);
> > +		ext4_ilock(inode, EXT4_IOLOCK_EXCL);
> >  		handle = ext4_journal_start(inode, EXT4_HT_QUOTA, 1);
> >  		if (IS_ERR(handle))
> >  			goto unlock_inode;
> > @@ -5795,7 +5795,7 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id,
> >  		ext4_mark_inode_dirty(handle, inode);
> >  		ext4_journal_stop(handle);
> >  	unlock_inode:
> > -		inode_unlock(inode);
> > +		ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
> >  	}
> >  	return err;
> >  }
> > @@ -5887,7 +5887,7 @@ static int ext4_quota_off(struct super_block *sb, int type)
> >  	if (err || ext4_has_feature_quota(sb))
> >  		goto out_put;
> >  
> > -	inode_lock(inode);
> > +	ext4_ilock(inode, EXT4_IOLOCK_EXCL);
> >  	/*
> >  	 * Update modification times of quota files when userspace can
> >  	 * start looking at them. If we fail, we return success anyway since
> > @@ -5902,7 +5902,7 @@ static int ext4_quota_off(struct super_block *sb, int type)
> >  	ext4_mark_inode_dirty(handle, inode);
> >  	ext4_journal_stop(handle);
> >  out_unlock:
> > -	inode_unlock(inode);
> > +	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
> >  out_put:
> >  	lockdep_set_quota_inode(inode, I_DATA_SEM_NORMAL);
> >  	iput(inode);
> > diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> > index 8966a5439a22..5c2dcc4c836a 100644
> > --- a/fs/ext4/xattr.c
> > +++ b/fs/ext4/xattr.c
> > @@ -422,9 +422,9 @@ static int ext4_xattr_inode_iget(struct inode *parent, unsigned long ea_ino,
> >  		ext4_set_inode_state(inode, EXT4_STATE_LUSTRE_EA_INODE);
> >  		ext4_xattr_inode_set_ref(inode, 1);
> >  	} else {
> > -		inode_lock(inode);
> > +		ext4_ilock(inode, EXT4_IOLOCK_EXCL);
> >  		inode->i_flags |= S_NOQUOTA;
> > -		inode_unlock(inode);
> > +		ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
> >  	}
> >  
> >  	*ea_inode = inode;
> > @@ -976,7 +976,7 @@ static int ext4_xattr_inode_update_ref(handle_t *handle, struct inode *ea_inode,
> >  	u32 hash;
> >  	int ret;
> >  
> > -	inode_lock(ea_inode);
> > +	ext4_ilock(ea_inode, EXT4_IOLOCK_EXCL);
> >  
> >  	ret = ext4_reserve_inode_write(handle, ea_inode, &iloc);
> >  	if (ret)
> > @@ -1030,7 +1030,7 @@ static int ext4_xattr_inode_update_ref(handle_t *handle, struct inode *ea_inode,
> >  		ext4_warning_inode(ea_inode,
> >  				   "ext4_mark_iloc_dirty() failed ret=%d", ret);
> >  out:
> > -	inode_unlock(ea_inode);
> > +	ext4_iunlock(ea_inode, EXT4_IOLOCK_EXCL);
> >  	return ret;
> >  }
> >  
> > @@ -1380,10 +1380,11 @@ static int ext4_xattr_inode_write(handle_t *handle, struct inode *ea_inode,
> >  		block += 1;
> >  	}
> >  
> > -	inode_lock(ea_inode);
> > +	ext4_ilock(ea_inode, EXT4_IOLOCK_EXCL);
> >  	i_size_write(ea_inode, wsize);
> >  	ext4_update_i_disksize(ea_inode, wsize);
> > -	inode_unlock(ea_inode);
> > +	ext4_iunlock(ea_inode, EXT4_IOLOCK_EXCL);
> > +
> >  
> >  	ext4_mark_inode_dirty(handle, ea_inode);
> >  
> > @@ -1432,9 +1433,9 @@ static struct inode *ext4_xattr_inode_create(handle_t *handle,
> >  		 */
> >  		dquot_free_inode(ea_inode);
> >  		dquot_drop(ea_inode);
> > -		inode_lock(ea_inode);
> > +		ext4_ilock(ea_inode, EXT4_IOLOCK_EXCL);
> >  		ea_inode->i_flags |= S_NOQUOTA;
> > -		inode_unlock(ea_inode);
> > +		ext4_iunlock(ea_inode, EXT4_IOLOCK_EXCL);
> >  	}
> >  
> >  	return ea_inode;
> > -- 
> > 2.21.0
> > 
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

Matthew Wilcox Nov. 20, 2019, 4:35 p.m. UTC | #5

On Wed, Nov 20, 2019 at 05:48:30PM +0530, Ritesh Harjani wrote:
> Not against your suggestion here.
> But in kernel I do see a preference towards object followed by a verb.
> At least in vfs I see functions like inode_lock()/unlock().
> 
> Plus I would not deny that this naming is also inspired from
> xfs_ilock()/iunlock API names.

I see those names as being "classical Unix" heritage (eh, maybe SysV).

> hmm, it was increasing the name of the macro if I do it that way.
> But that's ok. Is below macro name better?
> 
> #define EXT4_INODE_IOLOCK_EXCL		(1 << 0)
> #define EXT4_INODE_IOLOCK_SHARED	(1 << 1)

In particular, Linux tends to prefer read/write instead of
shared/exclusive terminology.  rwlocks, rwsems, rcu_read_lock, seqlocks.
shared/exclusive is used by file locks.  And XFS ;-)

I agree with Jan; just leave them opencoded.

Probably worth adding inode_lock_downgrade() to fs.h instead of
accessing i_rwsem directly.

Ritesh Harjani Nov. 23, 2019, 11:51 a.m. UTC | #6

On 11/20/19 10:05 PM, Matthew Wilcox wrote:
> On Wed, Nov 20, 2019 at 05:48:30PM +0530, Ritesh Harjani wrote:
>> Not against your suggestion here.
>> But in kernel I do see a preference towards object followed by a verb.
>> At least in vfs I see functions like inode_lock()/unlock().
>>
>> Plus I would not deny that this naming is also inspired from
>> xfs_ilock()/iunlock API names.
> 
> I see those names as being "classical Unix" heritage (eh, maybe SysV).
> 
>> hmm, it was increasing the name of the macro if I do it that way.
>> But that's ok. Is below macro name better?
>>
>> #define EXT4_INODE_IOLOCK_EXCL		(1 << 0)
>> #define EXT4_INODE_IOLOCK_SHARED	(1 << 1)
> 
> In particular, Linux tends to prefer read/write instead of
> shared/exclusive terminology.  rwlocks, rwsems, rcu_read_lock, seqlocks.
> shared/exclusive is used by file locks.  And XFS ;-)
> 
> I agree with Jan; just leave them opencoded.

Sure.

> 
> Probably worth adding inode_lock_downgrade() to fs.h instead of
> accessing i_rwsem directly.
> 

Yup, make sense. but since this series is independent of that change,
let me add that as a separate patch after this series.


Thanks for the review!!
-ritesh

[RFCv3,2/4] ext4: Add ext4_ilock & ext4_iunlock API

Commit Message

Comments

Patch