Message ID | 4BBD5D90.4090203@redhat.com |
---|---|
State | New, archived |
Headers | show |
Eric Sandeen wrote: > Eric Sandeen wrote: >> I'll have to think about the right way to do this... it seems pretty >> convoluted to me right now. >> > Something like this probably works, but I really REALLY would not test > it on an important filesystem. :) > > I'm not sure it's a good idea to discard it before returning it > to the prealloc pool, because it may well get re-used again > quickly.... not sure if that's helpful. (now I'm really talking to myself, but scratch that bit - ext4_mb_return_to_preallocation is pretty much a no-op) -Eric > Just a note, I think eventually we may move to more of a batch discard > in the background, because these little discards are actually quite > inefficient on the hardware we've tested so far. > > -Eric > > p.s. really. Don't test this with important data. I haven't tested > it at all yet. > > Index: linux-2.6/fs/ext4/mballoc.c > =================================================================== > --- linux-2.6.orig/fs/ext4/mballoc.c > +++ linux-2.6/fs/ext4/mballoc.c > @@ -4602,6 +4606,8 @@ do_more: > mb_clear_bits(bitmap_bh->b_data, bit, count); > ext4_mb_free_metadata(handle, &e4b, new_entry); > } else { > + ext4_fsblk_t discard_block; > + > /* need to update group_info->bb_free and bitmap > * with group lock held. generate_buddy look at > * them with group lock_held > @@ -4609,6 +4615,11 @@ do_more: > ext4_lock_group(sb, block_group); > mb_clear_bits(bitmap_bh->b_data, bit, count); > mb_free_blocks(inode, &e4b, bit, count); > + discard_block = bit + > + ext4_group_first_block_no(sb, block_group); > + trace_ext4_discard_blocks(sb, > + (unsigned long long)discard_block, count); > + sb_issue_discard(sb, discard_block, count); > ext4_mb_return_to_preallocation(inode, &e4b, block, count); > } > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/08/10 06:47, Eric Sandeen wrote: > Eric Sandeen wrote: >> Eric Sandeen wrote: >>> I'll have to think about the right way to do this... it seems pretty >>> convoluted to me right now. >>> >> Something like this probably works, but I really REALLY would not test >> it on an important filesystem. :) >> >> I'm not sure it's a good idea to discard it before returning it >> to the prealloc pool, because it may well get re-used again >> quickly.... not sure if that's helpful. > > (now I'm really talking to myself, but scratch that bit - > ext4_mb_return_to_preallocation is pretty much a no-op) > > -Eric > >> Just a note, I think eventually we may move to more of a batch discard >> in the background, because these little discards are actually quite >> inefficient on the hardware we've tested so far. >> >> -Eric >> >> p.s. really. Don't test this with important data. I haven't tested >> it at all yet. >> >> Index: linux-2.6/fs/ext4/mballoc.c >> =================================================================== >> --- linux-2.6.orig/fs/ext4/mballoc.c >> +++ linux-2.6/fs/ext4/mballoc.c >> @@ -4602,6 +4606,8 @@ do_more: >> mb_clear_bits(bitmap_bh->b_data, bit, count); >> ext4_mb_free_metadata(handle, &e4b, new_entry); >> } else { >> + ext4_fsblk_t discard_block; >> + >> /* need to update group_info->bb_free and bitmap >> * with group lock held. generate_buddy look at >> * them with group lock_held >> @@ -4609,6 +4615,11 @@ do_more: >> ext4_lock_group(sb, block_group); >> mb_clear_bits(bitmap_bh->b_data, bit, count); >> mb_free_blocks(inode, &e4b, bit, count); >> + discard_block = bit + >> + ext4_group_first_block_no(sb, block_group); >> + trace_ext4_discard_blocks(sb, >> + (unsigned long long)discard_block, count); >> + sb_issue_discard(sb, discard_block, count); >> ext4_mb_return_to_preallocation(inode, &e4b, block, count); >> } Well, to be honest, I'm not some programmer guy, so I doubt my skills can be of any help here. Second, unfortunately, my SSD is now my root partition (just one big sda1), so I cannot experiment with it too much. And finally, >> I'm not sure it's a good idea to discard it before returning it >> to the prealloc pool, because it may well get re-used again >> quickly.... not sure if that's helpful. I'm not sure I understood you well about this prealloc pool - re-using mechanism, but... AFAIK, modern SSDs are using very aggressive wear-leveling algorithms. Writing two times into the same filesystem sector almost newer goes to the same hardware sector. Therefore, saving sectors from discarding and for re-using makes no much sense - once re-used it will be written to some other physical NAND memory cell anyways. Guess we can discard it as soon as we have no valid data in it. Nebojsa -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Nebojsa Trpkovic wrote: > Well, to be honest, I'm not some programmer guy, so I doubt my skills > can be of any help here. > > Second, unfortunately, my SSD is now my root partition (just one big > sda1), so I cannot experiment with it too much. Well, you might just keep in mind that: 1) trimming these small amounts has actually looked very inefficient, and 2) data=writeback really isn't very safe in the face of a crash or power loss, and 3) hopefully we'll have a better trim solution eventually. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/08/10 17:32, Eric Sandeen wrote: > Well, you might just keep in mind that: > > 1) trimming these small amounts has actually looked very inefficient, and > 2) data=writeback really isn't very safe in the face of a crash or power loss, and > 3) hopefully we'll have a better trim solution eventually. 1) I understand that big TRIMs are better then small ones, but skipping some TRIMs completely would lead to slow but sure drive degradation as drive would have less and less spare space for wear leveling. 2) Yes, I'm aware of possible data=writeback inconsistency, but I've tried to let IO scheduler to merge and reorganize as many writes as it can, all to avoid small writes to SSD which are main cause of write amplification. 3) I'll stick with no data=writeback for the time being. I guess I'm doing just fine even without it. :) One more noob quotestion completely out-of-topic: Will md layer pass TRIM command to drive if one has ext4 on linux software RAID 0/1/5 ? Nebojsa -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Nebojsa Trpkovic wrote: > On 04/08/10 17:32, Eric Sandeen wrote: >> Well, you might just keep in mind that: >> >> 1) trimming these small amounts has actually looked very inefficient, and >> 2) data=writeback really isn't very safe in the face of a crash or power loss, and >> 3) hopefully we'll have a better trim solution eventually. > > 1) I understand that big TRIMs are better then small ones, but skipping > some TRIMs completely would lead to slow but sure drive degradation as > drive would have less and less spare space for wear leveling. > > 2) Yes, I'm aware of possible data=writeback inconsistency, but I've > tried to let IO scheduler to merge and reorganize as many writes as it > can, all to avoid small writes to SSD which are main cause of write > amplification. > > 3) I'll stick with no data=writeback for the time being. I guess I'm > doing just fine even without it. :) > > > One more noob quotestion completely out-of-topic: > Will md layer pass TRIM command to drive if one has ext4 on linux > software RAID 0/1/5 ? I think the answer is "not yet but it's being worked on" -Eric > Nebojsa -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Index: linux-2.6/fs/ext4/mballoc.c =================================================================== --- linux-2.6.orig/fs/ext4/mballoc.c +++ linux-2.6/fs/ext4/mballoc.c @@ -4602,6 +4606,8 @@ do_more: mb_clear_bits(bitmap_bh->b_data, bit, count); ext4_mb_free_metadata(handle, &e4b, new_entry); } else { + ext4_fsblk_t discard_block; + /* need to update group_info->bb_free and bitmap * with group lock held. generate_buddy look at * them with group lock_held @@ -4609,6 +4615,11 @@ do_more: ext4_lock_group(sb, block_group); mb_clear_bits(bitmap_bh->b_data, bit, count); mb_free_blocks(inode, &e4b, bit, count); + discard_block = bit + + ext4_group_first_block_no(sb, block_group); + trace_ext4_discard_blocks(sb, + (unsigned long long)discard_block, count); + sb_issue_discard(sb, discard_block, count); ext4_mb_return_to_preallocation(inode, &e4b, block, count); }