Message ID | 20081006175502.GA12937@mit.edu |
---|---|
State | Not Applicable, archived |
Headers | show |
On Mon, Oct 06, 2008 at 01:55:02PM -0400, Theodore Tso wrote: > Here are updates to the two patches I sent you. The first fixes some > printk labels and requires that the debugging dump ioctl be triggered > by root; the second fixes the think-o that Eric pointed out earlier > this morning. Hey Quentin, just wanted to check in and find out if the patches seem to be working for you. They seem to solve the problem on my side; I can no longer reproduce the problem. Is it working for you? Thanks, regards, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 07, 2008 at 06:12:56PM -0400, Theodore Tso wrote: > On Mon, Oct 06, 2008 at 01:55:02PM -0400, Theodore Tso wrote: > > Here are updates to the two patches I sent you. The first fixes some > > printk labels and requires that the debugging dump ioctl be triggered > > by root; the second fixes the think-o that Eric pointed out earlier > > this morning. > > Hey Quentin, just wanted to check in and find out if the patches seem > to be working for you. They seem to solve the problem on my side; I > can no longer reproduce the problem. Is it working for you? I rebooted, but as I didn't know exactly how to trigger the bug except than by waiting a few days I was waiting to see. Do you have an idea about how to trigger the bug? Regards, Quentin Godfroy -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 08, 2008 at 02:02:27AM +0200, Quentin Godfroy wrote: > > I rebooted, but as I didn't know exactly how to trigger the bug except than > by waiting a few days I was waiting to see. > > Do you have an idea about how to trigger the bug? The bug is that each time a transaction commits, if the buffer head hasn't been leaked already, it will leak pinning the memory until the total size of the journal is taking up memory. If you have a 2gigs of memory, and a single filesystem with 256meg journal, you might not notice the problem at all. If you less than 256 megs of memory, you'll notice fairly quickly. So you can test to see if you are running into problems by running the buffer_dump program to trigger a dump of buffers in use. Some buffers will always be pinned, such as the super block, block group descriptors, and journal super blocks. And there will be some number of buffers that are in use; but the number of dirty buffers should be no more than, oh, 30 or 40, and over the next couple of hours, it should not be growing. Even before you start runninng to memory exhaustion problems, the number of in-use buffers should relatively quick rise to hundreds or thousands after a few hours of run-time. Regards, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 07, 2008 at 08:53:38PM -0400, Theodore Tso wrote: > On Wed, Oct 08, 2008 at 02:02:27AM +0200, Quentin Godfroy wrote: > > > > I rebooted, but as I didn't know exactly how to trigger the bug except than > > by waiting a few days I was waiting to see. > > > > Do you have an idea about how to trigger the bug? > > The bug is that each time a transaction commits, if the buffer head > hasn't been leaked already, it will leak pinning the memory until the > total size of the journal is taking up memory. If you have a 2gigs of > memory, and a single filesystem with 256meg journal, you might not > notice the problem at all. If you less than 256 megs of memory, > you'll notice fairly quickly. Indeed after a couple of days of uptime the number of dirty blocks do not go further than ~50, so I think the bug is corrected as far as I am concerned. By the way, why does the kernel not commit to memory these remaining buffers when the memory is scarse (say instead of firing an OOM killer)? Regards Quentin Godfroy -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Oct 09, 2008 at 01:52:46AM +0200, Quentin Godfroy wrote: > > Indeed after a couple of days of uptime the number of dirty blocks do not go > further than ~50, so I think the bug is corrected as far as I am concerned. > > By the way, why does the kernel not commit to memory these remaining buffers > when the memory is scarse (say instead of firing an OOM killer)? The bug was the jbd2 code was bumping the reference count for the buffers, so they were considered "in-use" and thus couldn't be freed and released for other uses. Regards, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/block/compat_ioctl.c b/block/compat_ioctl.c index c23177e..c2a788d 100644 --- a/block/compat_ioctl.c +++ b/block/compat_ioctl.c @@ -786,6 +786,7 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg) switch (cmd) { case HDIO_GETGEO: return compat_hdio_getgeo(disk, bdev, compat_ptr(arg)); + case BLKDUMPUSEDBUFFERS: case BLKFLSBUF: case BLKROSET: /* diff --git a/block/ioctl.c b/block/ioctl.c index 77185e5..5eba4d6 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -279,6 +279,11 @@ int blkdev_ioctl(struct inode *inode, struct file *file, unsigned cmd, return -EFAULT; return 0; } + case BLKDUMPUSEDBUFFERS: + if (!capable(CAP_SYS_ADMIN)) + return -EACCES; + dump_used_buffers(bdev); + return 0; } lock_kernel(); diff --git a/fs/buffer.c b/fs/buffer.c index ac78d4c..34ffeb4 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -33,6 +33,7 @@ #include <linux/writeback.h> #include <linux/hash.h> #include <linux/suspend.h> +#include <linux/pagevec.h> #include <linux/buffer_head.h> #include <linux/task_io_accounting_ops.h> #include <linux/bio.h> @@ -247,6 +248,45 @@ void thaw_bdev(struct block_device *bdev, struct super_block *sb) } EXPORT_SYMBOL(thaw_bdev); +void dump_used_buffers(struct block_device *bdev) +{ + struct inode *bd_inode = bdev->bd_inode; + struct address_space *bd_mapping = bd_inode->i_mapping; + struct buffer_head *bh, *head; + struct pagevec pvec; + unsigned long index = 0; + int nr_pages, i, count, total = 0; + char b[BDEVNAME_SIZE]; + + spin_lock(&bd_mapping->private_lock); + printk(KERN_INFO "Begin dump of block device %s\n", bdevname(bdev, b)); + while (1) { + nr_pages = pagevec_lookup(&pvec, bd_mapping, index, PAGEVEC_SIZE); + if (nr_pages == 0) + break; + for (i = 0; i < nr_pages; i++) { + struct page *page = pvec.pages[i]; + index = page->index + 1; + + if (!page_has_buffers(page)) + continue; + bh = head = page_buffers(page); + do { + count = atomic_read(&bh->b_count); + if (count) { + printk(KERN_INFO + "buffer in-use: block %Lu count %d\n", + (unsigned long long) bh->b_blocknr, count); + total++; + } + bh = bh->b_this_page; + } while (bh != head); + } + } + printk(KERN_INFO "Total number of in-use buffers: %d\n", total); + spin_unlock(&bd_mapping->private_lock); +} + /* * Various filesystems appear to want __find_get_block to be non-blocking. * But it's the page lock which protects the buffers. To get around this, diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index eadaab4..1c48dff 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -193,6 +193,7 @@ void write_boundary_block(struct block_device *bdev, sector_t bblock, unsigned blocksize); int bh_uptodate_or_lock(struct buffer_head *bh); int bh_submit_read(struct buffer_head *bh); +void dump_used_buffers(struct block_device *bdev); extern int buffer_heads_over_limit; diff --git a/include/linux/fs.h b/include/linux/fs.h index 580b513..ae0ab82 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -222,6 +222,7 @@ extern int dir_notify_enable; #define BLKTRACESTART _IO(0x12,116) #define BLKTRACESTOP _IO(0x12,117) #define BLKTRACETEARDOWN _IO(0x12,118) +#define BLKDUMPUSEDBUFFERS _IO(0x12,119) #define BMAP_IOCTL 1 /* obsolete - kept for compatibility */ #define FIBMAP _IO(0x00,1) /* bmap access */