Message ID | 20191129142045.7215-1-agruenba@redhat.com |
---|---|
State | New, archived |
Headers | show |
Series | [v2] fs: Fix page_mkwrite off-by-one errors | expand |
On Fri, Nov 29, 2019 at 03:20:45PM +0100, Andreas Gruenbacher wrote: > The check in block_page_mkwrite meant to determine whether an offset is > within the inode size is off by one. This bug has spread to > iomap_page_mkwrite and to several filesystems (ubifs, ext4, f2fs, ceph). > To fix that, introduce a new page_mkwrite_check_truncate helper that > checks for truncate and computes the bytes in the page up to EOF, and > use that helper in the above mentioned filesystems and in btrfs. > > Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> > > --- > > This patch has a trivial conflict with commit "iomap: Fix overflow in > iomap_page_mkwrite" in Darrick's iomap pull request for 5.5: > > https://lore.kernel.org/lkml/20191125190907.GN6219@magnolia/ > --- > fs/btrfs/inode.c | 15 ++++----------- For the btrfs part Acked-by: David Sterba <dsterba@suse.com> and reviewed that the change is equivalent.
----- Ursprüngliche Mail ----- > Von: "Andreas Gruenbacher" <agruenba@redhat.com> > An: "Christoph Hellwig" <hch@infradead.org>, "Darrick" <darrick.wong@oracle.com> > CC: "Andreas Gruenbacher" <agruenba@redhat.com>, "torvalds" <torvalds@linux-foundation.org>, "linux-kernel" > <linux-kernel@vger.kernel.org>, "Al Viro" <viro@zeniv.linux.org.uk>, "Jeff Layton" <jlayton@kernel.org>, "Sage Weil" > <sage@redhat.com>, "Ilya Dryomov" <idryomov@gmail.com>, "tytso" <tytso@mit.edu>, "Andreas Dilger" > <adilger.kernel@dilger.ca>, "Jaegeuk Kim" <jaegeuk@kernel.org>, "Chao Yu" <chao@kernel.org>, linux-xfs@vger.kernel.org, > "linux-fsdevel" <linux-fsdevel@vger.kernel.org>, "richard" <richard@nod.at>, "Artem Bityutskiy" <dedekind1@gmail.com>, > "Adrian Hunter" <adrian.hunter@intel.com>, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, > linux-f2fs-devel@lists.sourceforge.net, "linux-mtd" <linux-mtd@lists.infradead.org>, "Chris Mason" <clm@fb.com>, "Josef > Bacik" <josef@toxicpanda.com>, "David Sterba" <dsterba@suse.com>, "linux-btrfs" <linux-btrfs@vger.kernel.org> > Gesendet: Freitag, 29. November 2019 15:20:45 > Betreff: [PATCH v2] fs: Fix page_mkwrite off-by-one errors > The check in block_page_mkwrite meant to determine whether an offset is > within the inode size is off by one. This bug has spread to > iomap_page_mkwrite and to several filesystems (ubifs, ext4, f2fs, ceph). > To fix that, introduce a new page_mkwrite_check_truncate helper that > checks for truncate and computes the bytes in the page up to EOF, and > use that helper in the above mentioned filesystems and in btrfs. > > Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Thank you for fixing UBIFS! Acked-by: Richard Weinberger <richard@nod.at> Thanks, //richard
On Fri, Nov 29, 2019 at 6:21 AM Andreas Gruenbacher <agruenba@redhat.com> wrote: > > +/** > + * page_mkwrite_check_truncate - check if page was truncated > + * @page: the page to check > + * @inode: the inode to check the page against > + * > + * Returns the number of bytes in the page up to EOF, > + * or -EFAULT if the page was truncated. > + */ > +static inline int page_mkwrite_check_truncate(struct page *page, > + struct inode *inode) > +{ > + loff_t size = i_size_read(inode); > + pgoff_t end_index = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; This special end_index calculation seems to be redundant. You later want "size >> PAGE_SHIFT" for another test, and that's actually the important part. The "+ PAGE_SIZE - 1" case is purely to handle the "AT the page boundary is special" case, but since you have to calculate "offset_in_page(size)" anyway, that's entirely redundant - the answer is part of that. So I think it would be better to write the logic as loff_t size = i_size_read(inode); pgoff_t index = size >> PAGE_SHIFT; int offset = offset_in_page(size); if (page->mapping != inode->i_mapping) return -EFAULT; /* Page is wholly past the EOF page */ if (page->index > index) return -EFAULT; /* page is wholly inside EOF */ if (page->index < index) return PAGE_SIZE; /* bytes in a page? If 0, it's past EOF */ return offset ? offset : -PAGE_SIZE; instead. That avoids the unnecessary "round up" part, and simply uses the same EOF index for everything. Linus
Am Di., 3. Dez. 2019 um 02:09 Uhr schrieb Linus Torvalds <torvalds@linux-foundation.org>: > > On Fri, Nov 29, 2019 at 6:21 AM Andreas Gruenbacher <agruenba@redhat.com> wrote: > > > > +/** > > + * page_mkwrite_check_truncate - check if page was truncated > > + * @page: the page to check > > + * @inode: the inode to check the page against > > + * > > + * Returns the number of bytes in the page up to EOF, > > + * or -EFAULT if the page was truncated. > > + */ > > +static inline int page_mkwrite_check_truncate(struct page *page, > > + struct inode *inode) > > +{ > > + loff_t size = i_size_read(inode); > > + pgoff_t end_index = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; > > This special end_index calculation seems to be redundant. > > You later want "size >> PAGE_SHIFT" for another test, and that's > actually the important part. > > The "+ PAGE_SIZE - 1" case is purely to handle the "AT the page > boundary is special" case, but since you have to calculate > "offset_in_page(size)" anyway, that's entirely redundant - the answer > is part of that. > > So I think it would be better to write the logic as > > loff_t size = i_size_read(inode); > pgoff_t index = size >> PAGE_SHIFT; > int offset = offset_in_page(size); > > if (page->mapping != inode->i_mapping) > return -EFAULT; > > /* Page is wholly past the EOF page */ > if (page->index > index) > return -EFAULT; > /* page is wholly inside EOF */ > if (page->index < index) > return PAGE_SIZE; > /* bytes in a page? If 0, it's past EOF */ > return offset ? offset : -PAGE_SIZE; > > instead. That avoids the unnecessary "round up" part, and simply uses > the same EOF index for everything. And if we rearrange things slightly, we end up with: /* page is wholly inside EOF */ if (page->index < index) return PAGE_SIZE; /* page is wholly past EOF */ if (page->index > index || !offset) return -EFAULT; /* page is partially inside EOF */ return offset; Thanks, Andreas
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 015910079e73..019948101bc2 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8990,13 +8990,11 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf) ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */ again: lock_page(page); - size = i_size_read(inode); - if ((page->mapping != inode->i_mapping) || - (page_start >= size)) { - /* page got truncated out from underneath us */ + ret2 = page_mkwrite_check_truncate(page, inode); + if (ret2 < 0) goto out_unlock; - } + zero_start = ret2; wait_on_page_writeback(page); lock_extent_bits(io_tree, page_start, page_end, &cached_state); @@ -9017,6 +9015,7 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf) goto again; } + size = i_size_read(inode); if (page->index == ((size - 1) >> PAGE_SHIFT)) { reserved_space = round_up(size - page_start, fs_info->sectorsize); @@ -9049,12 +9048,6 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf) } ret2 = 0; - /* page is wholly or partially inside EOF */ - if (page_start + PAGE_SIZE > size) - zero_start = offset_in_page(size); - else - zero_start = PAGE_SIZE; - if (zero_start != PAGE_SIZE) { kaddr = kmap(page); memset(kaddr + zero_start, 0, PAGE_SIZE - zero_start); diff --git a/fs/buffer.c b/fs/buffer.c index 86a38b979323..b162ec65910e 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2459,23 +2459,13 @@ int block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, struct page *page = vmf->page; struct inode *inode = file_inode(vma->vm_file); unsigned long end; - loff_t size; int ret; lock_page(page); - size = i_size_read(inode); - if ((page->mapping != inode->i_mapping) || - (page_offset(page) > size)) { - /* We overload EFAULT to mean page got truncated */ - ret = -EFAULT; + ret = page_mkwrite_check_truncate(page, inode); + if (ret < 0) goto out_unlock; - } - - /* page is wholly or partially inside EOF */ - if (((page->index + 1) << PAGE_SHIFT) > size) - end = size & ~PAGE_MASK; - else - end = PAGE_SIZE; + end = ret; ret = __block_write_begin(page, 0, end, get_block); if (!ret) diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index 7ab616601141..ef958aa4adb4 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -1575,7 +1575,7 @@ static vm_fault_t ceph_page_mkwrite(struct vm_fault *vmf) do { lock_page(page); - if ((off > size) || (page->mapping != inode->i_mapping)) { + if (page_mkwrite_check_truncate(page, inode) < 0) { unlock_page(page); ret = VM_FAULT_NOPAGE; break; diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 516faa280ced..23bf095e0b29 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -6186,7 +6186,6 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct page *page = vmf->page; - loff_t size; unsigned long len; int err; vm_fault_t ret; @@ -6222,18 +6221,13 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) } lock_page(page); - size = i_size_read(inode); - /* Page got truncated from under us? */ - if (page->mapping != mapping || page_offset(page) > size) { + err = page_mkwrite_check_truncate(page, inode); + if (err < 0) { unlock_page(page); - ret = VM_FAULT_NOPAGE; - goto out; + goto out_ret; } + len = err; - if (page->index == size >> PAGE_SHIFT) - len = size & ~PAGE_MASK; - else - len = PAGE_SIZE; /* * Return if we have all the buffers mapped. This avoids the need to do * journal_start/journal_stop which can block and take a long time diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 29bc0a542759..973f731e7af4 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -51,7 +51,7 @@ static vm_fault_t f2fs_vm_page_mkwrite(struct vm_fault *vmf) struct inode *inode = file_inode(vmf->vma->vm_file); struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct dnode_of_data dn = { .node_changed = false }; - int err; + int offset, err; if (unlikely(f2fs_cp_error(sbi))) { err = -EIO; @@ -70,13 +70,14 @@ static vm_fault_t f2fs_vm_page_mkwrite(struct vm_fault *vmf) file_update_time(vmf->vma->vm_file); down_read(&F2FS_I(inode)->i_mmap_sem); lock_page(page); - if (unlikely(page->mapping != inode->i_mapping || - page_offset(page) > i_size_read(inode) || - !PageUptodate(page))) { + err = -EFAULT; + if (likely(PageUptodate(page))) + err = page_mkwrite_check_truncate(page, inode); + if (unlikely(err < 0)) { unlock_page(page); - err = -EFAULT; goto out_sem; } + offset = err; /* block allocation */ __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, true); @@ -101,14 +102,8 @@ static vm_fault_t f2fs_vm_page_mkwrite(struct vm_fault *vmf) if (PageMappedToDisk(page)) goto out_sem; - /* page is wholly or partially inside EOF */ - if (((loff_t)(page->index + 1) << PAGE_SHIFT) > - i_size_read(inode)) { - loff_t offset; - - offset = i_size_read(inode) & ~PAGE_MASK; + if (offset != PAGE_SIZE) zero_user_segment(page, offset, PAGE_SIZE); - } set_page_dirty(page); if (!PageUptodate(page)) SetPageUptodate(page); diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index e25901ae3ff4..663b5071b154 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1035,23 +1035,14 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops) struct page *page = vmf->page; struct inode *inode = file_inode(vmf->vma->vm_file); unsigned long length; - loff_t offset, size; + loff_t offset; ssize_t ret; lock_page(page); - size = i_size_read(inode); - if ((page->mapping != inode->i_mapping) || - (page_offset(page) > size)) { - /* We overload EFAULT to mean page got truncated */ - ret = -EFAULT; + ret = page_mkwrite_check_truncate(page, inode); + if (ret < 0) goto out_unlock; - } - - /* page is wholly or partially inside EOF */ - if (((page->index + 1) << PAGE_SHIFT) > size) - length = offset_in_page(size); - else - length = PAGE_SIZE; + length = ret; offset = page_offset(page); while (length > 0) { diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c index cd52585c8f4f..91f7a1f2db0d 100644 --- a/fs/ubifs/file.c +++ b/fs/ubifs/file.c @@ -1563,8 +1563,7 @@ static vm_fault_t ubifs_vm_page_mkwrite(struct vm_fault *vmf) } lock_page(page); - if (unlikely(page->mapping != inode->i_mapping || - page_offset(page) > i_size_read(inode))) { + if (unlikely(page_mkwrite_check_truncate(page, inode) < 0)) { /* Page got truncated out from underneath us */ goto sigbus; } diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 37a4d9e32cd3..5a3f860470ad 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -636,4 +636,28 @@ static inline unsigned long dir_pages(struct inode *inode) PAGE_SHIFT; } +/** + * page_mkwrite_check_truncate - check if page was truncated + * @page: the page to check + * @inode: the inode to check the page against + * + * Returns the number of bytes in the page up to EOF, + * or -EFAULT if the page was truncated. + */ +static inline int page_mkwrite_check_truncate(struct page *page, + struct inode *inode) +{ + loff_t size = i_size_read(inode); + pgoff_t end_index = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; + + if (page->mapping != inode->i_mapping || + page->index >= end_index) + return -EFAULT; + if (page->index != size >> PAGE_SHIFT) { + /* page is wholly inside EOF */ + return PAGE_SIZE; + } + return offset_in_page(size); +} + #endif /* _LINUX_PAGEMAP_H */
The check in block_page_mkwrite meant to determine whether an offset is within the inode size is off by one. This bug has spread to iomap_page_mkwrite and to several filesystems (ubifs, ext4, f2fs, ceph). To fix that, introduce a new page_mkwrite_check_truncate helper that checks for truncate and computes the bytes in the page up to EOF, and use that helper in the above mentioned filesystems and in btrfs. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> --- This patch has a trivial conflict with commit "iomap: Fix overflow in iomap_page_mkwrite" in Darrick's iomap pull request for 5.5: https://lore.kernel.org/lkml/20191125190907.GN6219@magnolia/ --- fs/btrfs/inode.c | 15 ++++----------- fs/buffer.c | 16 +++------------- fs/ceph/addr.c | 2 +- fs/ext4/inode.c | 14 ++++---------- fs/f2fs/file.c | 19 +++++++------------ fs/iomap/buffered-io.c | 17 ++++------------- fs/ubifs/file.c | 3 +-- include/linux/pagemap.h | 24 ++++++++++++++++++++++++ 8 files changed, 48 insertions(+), 62 deletions(-)