Message ID | 20240410142948.2817554-4-yi.zhang@huaweicloud.com |
---|---|
State | Superseded |
Series | ext4: use iomap for regular file's buffered IO path and enable large folio |
Zhang Yi <yi.zhang@huaweicloud.com> writes:

> From: Zhang Yi <yi.zhang@huawei.com>
>
> The cached delalloc or hole extent should be trimmed to the map->m_len
> if we map delalloc blocks in ext4_da_map_blocks().

Why do you say the cached delalloc extent should also be trimmed to
m_len? Because we are only inserting delalloc blocks of
min(hole_len, m_len), right?

If we find delalloc blocks, we don't need to insert anything in the ES
cache. So we just return 0 in that case in this function.

> But it doesn't
> trigger any issue now because the map->m_len is always set to one and we
> always insert one delayed block at a time. Fix this by trimming the extent
> once we get one from the cached extent tree, preparing for mapping an
> extent with multiple delalloc blocks.
>

Yes, it wasn't clear until I looked at the discussion in the other
thread. It would be helpful if you could use that example in the commit
msg here for clarity.

"""
Yeah, now we only trim map len if we found an unwritten extent or written
extent in the cache, this isn't okay if we found a hole and
ext4_insert_delayed_block() and ext4_da_map_blocks() support inserting
map->m_len blocks. If we found a hole whose es->es_len is shorter than the
length we want to write, we could delay more blocks than we expected.

Please assume we write data [A, C) to a file that contains a hole extent
[A, B) and a written extent [B, D) in cache.

                    A     B  C   D
before da write: ...hhhhhh|wwwwww....

Then we will get extent [A, B), we should trim map->m_len to B-A before
inserting new delalloc blocks, if not, the range [B, C) is duplicated.
"""

Minor nit: the ext4_da_map_blocks() function comment has become stale.
It is not clear about its return value, the lock it uses, etc. While we
are at it, we might as well fix the function description.

-ritesh
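The trimming discussed above boils down to a small piece of arithmetic. The following is only a sketch of that calculation in standalone C, not the kernel code itself: `struct es_sketch` and `trim_to_extent()` are hypothetical names introduced here for illustration, standing in for the cached extent and for the `retval`/`map->m_len` computation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached extent found by the lookup. */
struct es_sketch {
	uint32_t es_lblk;	/* first logical block covered by the extent */
	uint32_t es_len;	/* number of blocks in the extent */
};

/*
 * Return how many blocks starting at iblock may be handled in this round:
 * the part of the cached extent at or after iblock, capped at m_len.
 */
static uint32_t trim_to_extent(const struct es_sketch *es, uint32_t iblock,
			       uint32_t m_len)
{
	uint32_t remaining;

	/* Assumption: the lookup returned an extent that contains iblock. */
	assert(iblock >= es->es_lblk && iblock - es->es_lblk < es->es_len);

	remaining = es->es_len - (iblock - es->es_lblk);
	return remaining < m_len ? remaining : m_len;
}
```

Taking A = 0, B = 6 and C = 9 for the hole example, a request for 9 blocks at block 0 against the hole [0, 6) is trimmed to 6 blocks, so only [A, B) would be delayed in that round.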
On 2024/5/1 22:31, Ritesh Harjani (IBM) wrote:
> Zhang Yi <yi.zhang@huaweicloud.com> writes:
>
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> The cached delalloc or hole extent should be trimmed to the map->m_len
>> if we map delalloc blocks in ext4_da_map_blocks().
>
> Why do you say the cached delalloc extent should also be trimmed to
> m_len? Because we are only inserting delalloc blocks of
> min(hole_len, m_len), right?
>
> If we find delalloc blocks, we don't need to insert anything in the ES
> cache. So we just return 0 in that case in this function.
>

I'm sorry for the clerical error, it should not be trimmed to m_len;
map->m_len should be trimmed to es->es_len.

If we find a delalloc entry that is shorter than map->m_len, it means the
front part of this write range has already been delayed. We can't insert
the delalloc extent that contains the latter part in this round, so we
need to trim map->m_len and return 0; the caller will then advance the
position and call ext4_da_map_blocks() again.

For example, please assume we write data [A, C) to a file that contains a
delayed extent [A, B) in the cache.

                    A     B   C
before da write: ...dddddd|hhh....

Then we will get delayed extent [A, B), we should trim map->m_len to B-A
and return 0, if not, the caller will incorrectly assume that the write
is complete and won't insert [B, C) later.

>
>> But it doesn't
>> trigger any issue now because the map->m_len is always set to one and we
>> always insert one delayed block at a time. Fix this by trimming the extent
>> once we get one from the cached extent tree, preparing for mapping an
>> extent with multiple delalloc blocks.
>>
>
> Yes, it wasn't clear until I looked at the discussion in the other
> thread. It would be helpful if you could use that example in the commit
> msg here for clarity.
>
>
> """
> Yeah, now we only trim map len if we found an unwritten extent or written
> extent in the cache, this isn't okay if we found a hole and
> ext4_insert_delayed_block() and ext4_da_map_blocks() support inserting
> map->m_len blocks. If we found a hole whose es->es_len is shorter than the
> length we want to write, we could delay more blocks than we expected.
>
> Please assume we write data [A, C) to a file that contains a hole extent
> [A, B) and a written extent [B, D) in cache.
>
>                     A     B  C   D
> before da write: ...hhhhhh|wwwwww....
>
> Then we will get extent [A, B), we should trim map->m_len to B-A before
> inserting new delalloc blocks, if not, the range [B, C) is duplicated.
>
> """
>
> Minor nit: the ext4_da_map_blocks() function comment has become stale.
> It is not clear about its return value, the lock it uses, etc. While we
> are at it, we might as well fix the function description.
>

Thanks for the reminder, I will update it in patch 9 since it does some
cleanup and also changes the return value.

Thanks,
Yi.
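To illustrate why the trimmed length matters to the caller, here is a toy model of the delayed-extent example in standalone C. It is a sketch under loud assumptions and not ext4 code: `toy_da_map_blocks()` and `toy_write()` are hypothetical helpers, each character of the array stands for one block ('d' for already delayed, 'h' for hole), and the returned run length plays the role of the trimmed map->m_len.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical toy model, not ext4 code: each char is one block,
 * 'd' = already delayed, 'h' = hole.
 */
static size_t toy_da_map_blocks(char *blocks, size_t nblocks, size_t iblock,
				size_t len)
{
	size_t run = 0;
	char state;

	assert(iblock < nblocks && len > 0);
	state = blocks[iblock];

	/* Length of the cached extent from iblock onwards, capped at len. */
	while (run < len && iblock + run < nblocks &&
	       blocks[iblock + run] == state)
		run++;

	/* Only a hole gets new delayed blocks; a delayed extent is left alone. */
	if (state == 'h')
		for (size_t i = 0; i < run; i++)
			blocks[iblock + i] = 'd';

	return run;	/* trimmed length, the analogue of map->m_len */
}

/*
 * Caller loop: keep mapping until [start, end) is covered.
 * Assumes end <= nblocks.
 */
static unsigned int toy_write(char *blocks, size_t nblocks, size_t start,
			      size_t end)
{
	unsigned int rounds = 0;

	while (start < end) {
		start += toy_da_map_blocks(blocks, nblocks, start, end - start);
		rounds++;
	}
	return rounds;
}
```

With the layout "ddddddhhh" (A = 0, B = 6, C = 9), writing [0, 9) takes two rounds in this model: the first covers the delayed extent [A, B) without inserting anything, the second delays [B, C). If the first round reported the full 9 blocks instead of the trimmed 6, the loop would stop early and [B, C) would never be delayed.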
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 118b0497a954..e4043ddb07a5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1734,6 +1734,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 
 	/* Lookup extent status tree firstly */
 	if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
+		retval = es.es_len - (iblock - es.es_lblk);
+		if (retval > map->m_len)
+			retval = map->m_len;
+		map->m_len = retval;
+
 		if (ext4_es_is_hole(&es))
 			goto add_delayed;
 
@@ -1750,10 +1755,6 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 		}
 
 		map->m_pblk = ext4_es_pblock(&es) + iblock - es.es_lblk;
-		retval = es.es_len - (iblock - es.es_lblk);
-		if (retval > map->m_len)
-			retval = map->m_len;
-		map->m_len = retval;
 		if (ext4_es_is_written(&es))
 			map->m_flags |= EXT4_MAP_MAPPED;
 		else if (ext4_es_is_unwritten(&es))
@@ -1788,6 +1789,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 	 * whitout holding i_rwsem and folio lock.
 	 */
 	if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
+		retval = es.es_len - (iblock - es.es_lblk);
+		if (retval > map->m_len)
+			retval = map->m_len;
+		map->m_len = retval;
+
 		if (!ext4_es_is_hole(&es)) {
 			up_write(&EXT4_I(inode)->i_data_sem);
 			goto found;