Could you help with move-on-write review?

Hi guys,

I would like to ask you as a big favor to review this patch,
not as candidate for inclusion in mainline of course, but as a code/design
review for my new move-on-write hooks. hopefully, some of this code will
find it's way to ext4 some day soon.

The problems with the old data hooks, which I may have mentioned, are:
1. second write to mmaped page did not call get_block() to trigger
    move-on-write.
2. quota_write() called next3_bread(create=1) without notifying this
    is a partial write and move-on-write has corrupted it's data,
    resulting in broken quota file.
3. direct I/O was not supported.

The new design uses 3 buffer head flags to signal get_block() about
move-on-write:
1. buffer_move_data() is an explicit request to trigger move-on-write.
    users which do not declare move_data (like quota_write) will not
    trigger move-on-write. previous hooks would do move-on-write on
    all regular files data without an explicit request.
    new design suggests that we rather corrupt snapshot data
    (missed move-on-write) then corrupt the file system live data
    (false positive move-on-write).
2. buffer_partial_write() signals that in case of move-on-write, the
old block data
    needs to be read, before mapping the buffer to a new block.
3. buffer_direct_io() signals that in case of move-on-write or hole filling,
    the buffer should not be mapped on return.

There are 4 places I found that map data blocks and should trigger
move-on-write:
1. write_begin() sets move_data and optionally partial_write flags and
    unmaps buffers
2. ordered_writepage() sets move_data flag and unmaps buffers
    (snapshots only work with data=ordered)
3. truncate_page() sets move_data flag (and calls next3_get_block() itself)
4. next3_get_block() sets direct_io and move_data flags when (create && !handle)

Know issue:
snapshot copy of quota files is not consistent with snapshot'ed
file system (can be fixed, but is it relevant for read-only mount?)

you should know that the core move-on-write code in
get_blocks_handle() has not changed much
(only the move_data and direct_io flag checks were added) and it has
been used for quite some time now,
so I do not expect to find major bugs in the block move mechanism itself.

I revised my 'next3 test' scripts to rewrite random data to files via direct I/O
and mmap to test the new move-on-write hooks in next3_get_block() and in
ordered_writepage() and it all seems to work fine.
quotas also seems to work fine now.

I understand if this review is not on the top of your priorities,
but if you can find the time for it, I would very much appreciate it.

Thanks,
Amir.


 #define next3_snapshot_cow(handle, inode, bh, cow) 0
@@ -158,6 +167,31 @@ static inline int next3_snapshot_get_cre
 }

 /*
+ * get_move_access() - move block to snapshot
+ * @handle:    JBD handle
+ * @inode:     owner of @block
+ * @block:     address of @block
+ * @move:      if false, only test if @block needs to be moved
+ *
+ * Called from next3_get_blocks_handle() before overwriting a data block,
+ * when buffer_move() is true.  Specifically, only data blocks of
regular files,
+ * whose data is not being journaled are moved on full page write.
+ * Journaled data blocks are COWed on get_write_access().
+ * Snapshots and excluded files blocks are never moved-on-write.
+ * If @move is true, then truncate_mutex is held.
+ *
+ * Return values:
+ * = 1 - @block was moved or may not be overwritten
+ * = 0 - @block may be overwritten
+ * < 0 - error
+ */
+static inline int next3_snapshot_get_move_access(handle_t *handle,
+               struct inode *inode, next3_fsblk_t block, int move)
+{
+       return next3_snapshot_move(handle, inode, block, 1, move);
+}
+
+/*
 * get_delete_access() - move count blocks to snapshot
 * @handle:    JBD handle
 * @inode:     owner of blocks
@@ -216,6 +250,19 @@ extern next3_fsblk_t next3_get_inode_blo



+/*
+ * check if the data blocks of @inode should be moved-on-write
+ */
+static inline int next3_snapshot_should_move_data(struct inode *inode)
+{
+       if (!NEXT3_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+                               NEXT3_FEATURE_RO_COMPAT_HAS_SNAPSHOT))
+               return 0;
+       /* when a data block is journaled, it is already COWed as metadata */
+       if (next3_should_journal_data(inode))
+               return 0;
+       return 1;
+}
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Could you help with move-on-write review?

Commit Message

Patch