Message ID | 1370253616-8173-2-git-send-email-ruslan.bilovol@ti.com |
---|---|
State | New, archived |
Headers | show |
On Mon, Jun 03, 2013 at 01:00:15PM +0300, Ruslan Bilovol wrote: > The memset() doesn't perform any NULL-pointer checking > before dereferencing passed pointer so this should be > checked before calling it. I can see that __getblk() can return NULL if there is a memory allocation failure (and is defined to do so), so checking to make sure bh is not NULL is a good thing to do. Have you actually seen a case where bh is non-NULL, but bh->b_data is NULL? If not, it might be better to do something like this: > bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize); if (!bh) return NULL; BUG_ON(!bh->b_data); - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Ted, On Mon, Jun 3, 2013 at 6:33 PM, Theodore Ts'o <tytso@mit.edu> wrote: > On Mon, Jun 03, 2013 at 01:00:15PM +0300, Ruslan Bilovol wrote: >> The memset() doesn't perform any NULL-pointer checking >> before dereferencing passed pointer so this should be >> checked before calling it. > > I can see that __getblk() can return NULL if there is a memory > allocation failure (and is defined to do so), so checking to make sure > bh is not NULL is a good thing to do. > > Have you actually seen a case where bh is non-NULL, but bh->b_data is > NULL? If not, it might be better to do something like this: Yes, this is exactly the situation I observe (bh is non-NULL, but bh->b_data is NULL) > >> bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize); > if (!bh) > return NULL; > BUG_ON(!bh->b_data); Is it so critical that we need to stop the kernel here? Can we recover from this state gracefully? Maybe something like this may be better: bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize); if (!bh) return NULL; if(!bh->b_data) { WARN_ON(1); return NULL; } Regards, Ruslan > > - Ted > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/
On Tue, Jun 04, 2013 at 02:15:57PM +0300, Ruslan Bilovol wrote: > > Have you actually seen a case where bh is non-NULL, but bh->b_data is > > NULL? If not, it might be better to do something like this: > > Yes, this is exactly the situation I observe (bh is non-NULL, but > bh->b_data is NULL) Hmm... so the stack trace you sent in the commit description was one where bh->b_data was NULL? I'm trying to make sure there isn't something else going on that we don't understand. Could you put some instrumentation in __find_get_block()? Something like this: struct buffer_head * __find_get_block(struct block_device *bdev, sector_t block, unsigned size) { struct buffer_head *bh = lookup_bh_lru(bdev, block, size); if (bh == NULL) { bh = __find_get_block_slow(bdev, block); if (bh->b_data == NULL) { pr_crit("b_data NULL after find_get_block_slow\n); WARN_ON(1); } if (bh) bh_lru_install(bh); } else { if (bh->b_data == NULL) { pr_crit("b_data NULL after lookup_bh_lru\n"); WARN_ON(1); } } if (bh) touch_buffer(bh); return bh; } ... and then send me the stack trace after running your reproduction case. If it turns out the problem is in __find_get_block_slow(), could you put in similar debugging checks there and try to track it down? I'm pretty sure the case of bh non-NULL and bh->b_data NULL is never supposed to happen, and while we could just put a check where you suggested, there are plenty of other places which use __getblk(), and there may be other bugs that are hiding here. Regards, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Ted, On Tue, Jun 4, 2013 at 4:37 PM, Theodore Ts'o <tytso@mit.edu> wrote: > On Tue, Jun 04, 2013 at 02:15:57PM +0300, Ruslan Bilovol wrote: >> > Have you actually seen a case where bh is non-NULL, but bh->b_data is >> > NULL? If not, it might be better to do something like this: >> >> Yes, this is exactly the situation I observe (bh is non-NULL, but >> bh->b_data is NULL) > > Hmm... so the stack trace you sent in the commit description was one > where bh->b_data was NULL? I'm trying to make sure there isn't > something else going on that we don't understand. > > Could you put some instrumentation in __find_get_block()? Something like this: > > struct buffer_head * > __find_get_block(struct block_device *bdev, sector_t block, unsigned size) > { > struct buffer_head *bh = lookup_bh_lru(bdev, block, size); > > if (bh == NULL) { > bh = __find_get_block_slow(bdev, block); > if (bh->b_data == NULL) { > pr_crit("b_data NULL after find_get_block_slow\n); > WARN_ON(1); > } > if (bh) > bh_lru_install(bh); > } else { > if (bh->b_data == NULL) { > pr_crit("b_data NULL after lookup_bh_lru\n"); > WARN_ON(1); > } > } > if (bh) > touch_buffer(bh); > return bh; > } > > ... and then send me the stack trace after running your reproduction > case. If it turns out the problem is in __find_get_block_slow(), > could you put in similar debugging checks there and try to track it > down? > > I'm pretty sure the case of bh non-NULL and bh->b_data NULL is never > supposed to happen, and while we could just put a check where you > suggested, there are plenty of other places which use __getblk(), and > there may be other bugs that are hiding here. Yes agree, that's what I told about in my cover letter fir this patch series. I will debug it with code you mentioned, but the issue appears very rarely, so I need at lease few days for catching this.. Regards, Ruslan > > Regards, > > - Ted > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index 9545757..48f3da5 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -810,7 +810,7 @@ struct journal_head *jbd2_journal_get_descriptor_buffer(journal_t *journal) return NULL; bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize); - if (!bh) + if (!bh || !bh->b_data) return NULL; lock_buffer(bh); memset(bh->b_data, 0, journal->j_blocksize);
The memset() doesn't perform any NULL-pointer checking before dereferencing passed pointer so this should be checked before calling it. This fixes next issue: [38200.069122] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [38200.078002] pgd = c0004000 [38200.080963] [00000000] *pgd=00000000 [38200.084991] Internal error: Oops: 805 [#1] PREEMPT SMP ARM [38200.091003] Modules linked in: rproc_drm(O) tf_driver(O) gps_drv wl18xx(O) wl12xx(O) wlcore(O) mac80211(O) cfg80211(O) pvrsrvkm_sgx540_120(O) compat(O) [38200.106719] CPU: 1 Tainted: G W O (3.4.34 #1) [38200.112579] PC is at __memzero+0x24/0x80 [38200.116882] LR is at 0x0 [38200.119689] pc : [<c023b004>] lr : [<00000000>] psr: 28000113 [38200.119689] sp : d66b1e2c ip : 00000000 fp : d66b1e54 [38200.132171] r10: 00000000 r9 : d6ad48c0 r8 : c01bd414 [38200.137847] r7 : 00000000 r6 : ffffffff r5 : cb19fe48 r4 : d678bc00 [38200.144958] r3 : 00000000 r2 : 00000000 r1 : 00000fc0 r0 : 00000000 [38200.152008] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel [38200.160034] Control: 10c5387d Table: 967b004a DAC: 00000015 [...] [38200.888031] Backtrace: [38200.890869] [<c01c509c>] (jbd2_journal_get_descriptor_buffer+0x0/0xa4) from [<c01bddf0>] (jbd2_journal_commit_transaction+0x994/0x18f4) [38200.903930] r5:d6ad5348 r4:d678bc00 [38200.907989] [<c01bd45c>] (jbd2_journal_commit_transaction+0x0/0x18f4) from [<c01c2e70>] (kjournald2+0xb4/0x24c) [38200.918884] [<c01c2dbc>] (kjournald2+0x0/0x24c) from [<c0066990>] (kthread+0x90/0x9c) [38200.927429] [<c0066900>] (kthread+0x0/0x9c) from [<c004a968>] (do_exit+0x0/0x804) [38200.935577] r6:c004a968 r5:c0066900 r4:d6749c8c [38200.940887] Code: e52de004 e1a0c002 e1a0e002 e2511040 (a8a0500c) Signed-off-by: Ruslan Bilovol <ruslan.bilovol@ti.com> --- fs/jbd2/journal.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)