Message ID | 20200421191754.3372370-4-ira.weiny@intel.com |
---|---|
State | Not Applicable |
Series | XFS - Enable per-file/per-directory DAX operations V9 |
On Tue, Apr 21, 2020 at 12:17:45PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
>
> In order for users to determine if a file is currently operating in DAX
> state (effective DAX). Define a statx attribute value and set that
> attribute if the effective DAX flag is set.
>
> To go along with this we propose the following addition to the statx man
> page:
>
> STATX_ATTR_DAX
>
>     The file is in the DAX (cpu direct access) state. DAX state
>     attempts to minimize software cache effects for both I/O and
>     memory mappings of this file. It requires a file system which
>     has been configured to support DAX.
>
>     DAX generally assumes all accesses are via cpu load / store
>     instructions which can minimize overhead for small accesses, but
>     may adversely affect cpu utilization for large transfers.
>
>     File I/O is done directly to/from user-space buffers and memory
>     mapped I/O may be performed with direct memory mappings that
>     bypass kernel page cache.
>
>     While the DAX property tends to result in data being transferred
>     synchronously, it does not give the same guarantees of O_SYNC
>     where data and the necessary metadata are transferred together.
>
>     A DAX file may support being mapped with the MAP_SYNC flag,
>     which enables a program to use CPU cache flush instructions to
>     persist CPU store operations without an explicit fsync(2). See
>     mmap(2) for more information.

One thing I hadn't noticed before -- this is a change to userspace API,
so please cc this series to linux-api@vger.kernel.org when you send V10.

Also, I've started to think about commit order sequencing for actually
landing this series. Usually I try to put vfs and documentation things
before xfs stuff, which means I came up with:

vfs         xfs         I_DONTCACHE
2 3 11 1    4 5 6 7     8 9 10

Note that I separated the DONTCACHE part because it touches VFS
internals, which implies a higher standard of review (aka Al) and I do
not wish to hold up the 2-3-11-1-4-5-6-7 patches if the dontcache part
becomes contentious.

What do you think of that ordering?

(Heck, maybe I'll just put patch 1 in the queue for 5.8 right now...)

--D

> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
>
> ---
> Changes from V2:
>     Update man page text with comments from Darrick, Jan, Dan, and
>     Dave.
> ---
>  fs/stat.c                 | 3 +++
>  include/uapi/linux/stat.h | 1 +
>  2 files changed, 4 insertions(+)
>
> diff --git a/fs/stat.c b/fs/stat.c
> index 030008796479..894699c74dde 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -79,6 +79,9 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
>          if (IS_AUTOMOUNT(inode))
>                  stat->attributes |= STATX_ATTR_AUTOMOUNT;
>
> +        if (IS_DAX(inode))
> +                stat->attributes |= STATX_ATTR_DAX;
> +
>          if (inode->i_op->getattr)
>                  return inode->i_op->getattr(path, stat, request_mask,
>                                              query_flags);
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index ad80a5c885d5..e5f9d5517f6b 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -169,6 +169,7 @@ struct statx {
>  #define STATX_ATTR_ENCRYPTED    0x00000800 /* [I] File requires key to decrypt in fs */
>  #define STATX_ATTR_AUTOMOUNT    0x00001000 /* Dir: Automount trigger */
>  #define STATX_ATTR_VERITY       0x00100000 /* [I] Verity protected file */
> +#define STATX_ATTR_DAX          0x00002000 /* [I] File is DAX */
>
>
>  #endif /* _UAPI_LINUX_STAT_H */
> --
> 2.25.1
>
On Wed, Apr 22, 2020 at 09:29:51AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 21, 2020 at 12:17:45PM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> >
> > In order for users to determine if a file is currently operating in DAX
> > state (effective DAX). Define a statx attribute value and set that
> > attribute if the effective DAX flag is set.
> >
> > To go along with this we propose the following addition to the statx man
> > page:
> >
> > STATX_ATTR_DAX
> >
> >     The file is in the DAX (cpu direct access) state. DAX state
> >     attempts to minimize software cache effects for both I/O and
> >     memory mappings of this file. It requires a file system which
> >     has been configured to support DAX.
> >
> >     DAX generally assumes all accesses are via cpu load / store
> >     instructions which can minimize overhead for small accesses, but
> >     may adversely affect cpu utilization for large transfers.
> >
> >     File I/O is done directly to/from user-space buffers and memory
> >     mapped I/O may be performed with direct memory mappings that
> >     bypass kernel page cache.
> >
> >     While the DAX property tends to result in data being transferred
> >     synchronously, it does not give the same guarantees of O_SYNC
> >     where data and the necessary metadata are transferred together.
> >
> >     A DAX file may support being mapped with the MAP_SYNC flag,
> >     which enables a program to use CPU cache flush instructions to
> >     persist CPU store operations without an explicit fsync(2). See
> >     mmap(2) for more information.
>
> One thing I hadn't noticed before -- this is a change to userspace API,
> so please cc this series to linux-api@vger.kernel.org when you send V10.

Right! Glad you caught me on this because I was just preparing to send V10.

Is there someone I could directly mail who needs to look at this? I guess I
thought we had the important FS people involved for this type of API change.
:-/

>
> Also, I've started to think about commit order sequencing for actually
> landing this series. Usually I try to put vfs and documentation things
> before xfs stuff, which means I came up with:
>
> vfs         xfs         I_DONTCACHE
> 2 3 11 1    4 5 6 7     8 9 10
>
> Note that I separated the DONTCACHE part because it touches VFS
> internals, which implies a higher standard of review (aka Al) and I do
> not wish to hold up the 2-3-11-1-4-5-6-7 patches if the dontcache part
> becomes contentious.
>
> What do you think of that ordering?

I think 1 stands on its own separate from this series... so I would keep it
first. Moving Documentation up is easy.

I've changed to this order...

prelim    vfs       xfs         I_DONTCACHE
1         2 3 11    4 5 6 7     8 9 10

Which is pretty much the same now that I look at it! ;-)

>
> (Heck, maybe I'll just put patch 1 in the queue for 5.8 right now...)

IMHO, I think 1 and 2 can go. While patch 2 is in the VFS layer it is very much
a DAX thing. Jan and Christoph approved it. I think even Dave approved the
version before I removed io_is_direct() but I don't recall now. Dan and I also
discussed it internally when I first found the issue. So I'm very confident in
it! :-D

Unfortunately, 3 and 10 are the critical pieces to the feature. So we could
move 3 out later after 8 and 9 are approved. But I don't think it buys us much
to have the tri-state go in without the rest.

Ira

>
> --D
>
> > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> >
> > ---
> > Changes from V2:
> >     Update man page text with comments from Darrick, Jan, Dan, and
> >     Dave.
> > ---
> >  fs/stat.c                 | 3 +++
> >  include/uapi/linux/stat.h | 1 +
> >  2 files changed, 4 insertions(+)
> >
> > diff --git a/fs/stat.c b/fs/stat.c
> > index 030008796479..894699c74dde 100644
> > --- a/fs/stat.c
> > +++ b/fs/stat.c
> > @@ -79,6 +79,9 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
> >          if (IS_AUTOMOUNT(inode))
> >                  stat->attributes |= STATX_ATTR_AUTOMOUNT;
> >
> > +        if (IS_DAX(inode))
> > +                stat->attributes |= STATX_ATTR_DAX;
> > +
> >          if (inode->i_op->getattr)
> >                  return inode->i_op->getattr(path, stat, request_mask,
> >                                              query_flags);
> > diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> > index ad80a5c885d5..e5f9d5517f6b 100644
> > --- a/include/uapi/linux/stat.h
> > +++ b/include/uapi/linux/stat.h
> > @@ -169,6 +169,7 @@ struct statx {
> >  #define STATX_ATTR_ENCRYPTED    0x00000800 /* [I] File requires key to decrypt in fs */
> >  #define STATX_ATTR_AUTOMOUNT    0x00001000 /* Dir: Automount trigger */
> >  #define STATX_ATTR_VERITY       0x00100000 /* [I] Verity protected file */
> > +#define STATX_ATTR_DAX          0x00002000 /* [I] File is DAX */
> >
> >
> >  #endif /* _UAPI_LINUX_STAT_H */
> > --
> > 2.25.1
> >
On Wed 22-04-20 11:51:21, Ira Weiny wrote:
> On Wed, Apr 22, 2020 at 09:29:51AM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 21, 2020 at 12:17:45PM -0700, ira.weiny@intel.com wrote:
> > > From: Ira Weiny <ira.weiny@intel.com>
> > >
> > > In order for users to determine if a file is currently operating in DAX
> > > state (effective DAX). Define a statx attribute value and set that
> > > attribute if the effective DAX flag is set.
> > >
> > > To go along with this we propose the following addition to the statx man
> > > page:
> > >
> > > STATX_ATTR_DAX
> > >
> > >     The file is in the DAX (cpu direct access) state. DAX state
> > >     attempts to minimize software cache effects for both I/O and
> > >     memory mappings of this file. It requires a file system which
> > >     has been configured to support DAX.
> > >
> > >     DAX generally assumes all accesses are via cpu load / store
> > >     instructions which can minimize overhead for small accesses, but
> > >     may adversely affect cpu utilization for large transfers.
> > >
> > >     File I/O is done directly to/from user-space buffers and memory
> > >     mapped I/O may be performed with direct memory mappings that
> > >     bypass kernel page cache.
> > >
> > >     While the DAX property tends to result in data being transferred
> > >     synchronously, it does not give the same guarantees of O_SYNC
> > >     where data and the necessary metadata are transferred together.
> > >
> > >     A DAX file may support being mapped with the MAP_SYNC flag,
> > >     which enables a program to use CPU cache flush instructions to
> > >     persist CPU store operations without an explicit fsync(2). See
> > >     mmap(2) for more information.
> >
> > One thing I hadn't noticed before -- this is a change to userspace API,
> > so please cc this series to linux-api@vger.kernel.org when you send V10.
>
> Right! Glad you caught me on this because I was just preparing to send V10.
>
> Is there someone I could directly mail who needs to look at this? I guess I
> thought we had the important FS people involved for this type of API change.
> :-/

I believe we have all the important people here. But linux-api is a general
fallback list where people reviewing API changes linger. So when changing
user facing API, it is good to CC this list.

                                                                Honza
diff --git a/fs/stat.c b/fs/stat.c
index 030008796479..894699c74dde 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -79,6 +79,9 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
         if (IS_AUTOMOUNT(inode))
                 stat->attributes |= STATX_ATTR_AUTOMOUNT;
 
+        if (IS_DAX(inode))
+                stat->attributes |= STATX_ATTR_DAX;
+
         if (inode->i_op->getattr)
                 return inode->i_op->getattr(path, stat, request_mask,
                                             query_flags);
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index ad80a5c885d5..e5f9d5517f6b 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -169,6 +169,7 @@ struct statx {
 #define STATX_ATTR_ENCRYPTED    0x00000800 /* [I] File requires key to decrypt in fs */
 #define STATX_ATTR_AUTOMOUNT    0x00001000 /* Dir: Automount trigger */
 #define STATX_ATTR_VERITY       0x00100000 /* [I] Verity protected file */
+#define STATX_ATTR_DAX          0x00002000 /* [I] File is DAX */
 
 
 #endif /* _UAPI_LINUX_STAT_H */