Message ID | 20211227032246.2886878-13-chengzhihao1@huawei.com |
---|---|
State | Under Review |
Delegated to: | Richard Weinberger |
Headers | show |
Series | Some bugfixs for ubi/ubifs | expand |
----- Ursprüngliche Mail ----- > Von: "chengzhihao1" <chengzhihao1@huawei.com> > An: "richard" <richard@nod.at>, "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, > "mcoquelin stm32" <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer" > <s.hauer@pengutronix.de> > CC: "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org> > Gesendet: Montag, 27. Dezember 2021 04:22:43 > Betreff: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2 > Fastmap pebs(pnum >= UBI_FM_MAX_START) won't be added into 'ai->fastmap' > while attaching ubi device if 'fm->used_blocks' is greater than 2, which > may cause warning from 'ubi_assert(ubi->good_peb_count == found_pebs)': > > UBI assert failed in ubi_wl_init at 1878 (pid 2409) > Call Trace: > ubi_wl_init.cold+0xae/0x2af [ubi] > ubi_attach+0x1b0/0x780 [ubi] > ubi_init+0x23a/0x3ad [ubi] > load_module+0x22d2/0x2430 > > Reproduce: > ID="0x20,0x33,0x00,0x00" # 16M 16KB PEB, 512 page > modprobe nandsim id_bytes=$ID > modprobe ubi mtd="0,0" fm_autoconvert # Fastmap takes 2 pebs > rmmod ubi > modprobe ubi mtd="0,0" fm_autoconvert # Attach by fastmap > > Add all used fastmap pebs into list 'ai->fastmap' to make sure they can > be counted into 'found_pebs'. > > Fixes: fdf10ed710c0aa ("ubi: Rework Fastmap attach base code") > Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> > --- > drivers/mtd/ubi/fastmap.c | 35 +++++------------------------------ > 1 file changed, 5 insertions(+), 30 deletions(-) > > diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c > index 6b5f1ffd961b..01dcdd94c9d2 100644 > --- a/drivers/mtd/ubi/fastmap.c > +++ b/drivers/mtd/ubi/fastmap.c > /** > * ubi_scan_fastmap - scan the fastmap. > * @ubi: UBI device object > @@ -865,7 +847,6 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct > ubi_attach_info *ai, > struct ubi_vid_hdr *vh; > struct ubi_ec_hdr *ech; > struct ubi_fastmap_layout *fm; > - struct ubi_ainf_peb *aeb; > int i, used_blocks, pnum, fm_anchor, ret = 0; > size_t fm_size; > __be32 crc, tmp_crc; > @@ -875,17 +856,6 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct > ubi_attach_info *ai, > if (fm_anchor < 0) > return UBI_NO_FASTMAP; > > - /* Copy all (possible) fastmap blocks into our new attach structure. */ > - list_for_each_entry(aeb, &scan_ai->fastmap, u.list) { > - struct ubi_ainf_peb *new; > - > - new = clone_aeb(ai, aeb); > - if (!new) > - return -ENOMEM; > - > - list_add(&new->u.list, &ai->fastmap); > - } > - scan_ai->fastmap may contain also old fastmap PEBs. In the area < UBI_FM_MAX_START you can find outdated fastmap PEBs. e.g. after power-cut. That's why scan_ai->fastmap is copied into ai->fastmap. Later in ubi_wl_init() these outdated PEBs will get erased. So, you cannot remove this code. But I fully agree with you that the fm->used_blocks > 1 case is not correct. I fear if scan_ai->fastmap contains old fastmap PEBs and fm->used_blocks is > 1 we need to fall back to scanning mode while attaching. Thanks, //richard
Hi Richard, > scan_ai->fastmap may contain also old fastmap PEBs. > In the area < UBI_FM_MAX_START you can find outdated fastmap PEBs. > e.g. after power-cut. > That's why scan_ai->fastmap is copied into ai->fastmap. > Later in ubi_wl_init() these outdated PEBs will get erased. > So, you cannot remove this code. I thought old fastmap PEBs(async erase works in ubi_update_fastmap()) will be counted into erase PEBs in the next attaching process, because I saw following code snippet in ubi_write_fastmap(): 1260 list_for_each_entry(ubi_wrk, &ubi->works, list) { 1261 if (ubi_is_erase_work(ubi_wrk)) { 1262 wl_e = ubi_wrk->e; 1263 ubi_assert(wl_e); 1264 1265 fec = (struct ubi_fm_ec *)(fm_raw + fm_pos); 1266 1267 fec->pnum = cpu_to_be32(wl_e->pnum); 1268 set_seen(ubi, wl_e->pnum, seen_pebs); 1269 fec->ec = cpu_to_be32(wl_e->ec); 1270 1271 erase_peb_count++; 1272 fm_pos += sizeof(*fec); 1273 ubi_assert(fm_pos <= ubi->fm_size); 1274 } 1275 } 1276 fmh->erase_peb_count = cpu_to_be32(erase_peb_count); Half-writing on fastmap will be recognized in scanning, and UBI fallbacks full scanning, So, I come up with two situations: 1. power-cut before new fastmap written, the old fastmap is completely saved until next attaching, and some free PEBs are written with new fastmap data. Luckly, fastmap anchor PEB's vid header is written first of all, bad fastmap will be returned by ubi_attach_fastmap() in next attaching. 2. power-cut after new fastmap written, the old fastmap PEBs will be added into 'ai->erase' list in next attaching. Did I miss other possible circumstances? > > But I fully agree with you that the fm->used_blocks > 1 case is not correct. > I fear if scan_ai->fastmap contains old fastmap PEBs and fm->used_blocks is > 1 > we need to fall back to scanning mode while attaching.
----- Ursprüngliche Mail ----- > Von: "chengzhihao1" <chengzhihao1@huawei.com> > An: "richard" <richard@nod.at> > CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32" > <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer" > <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org> > Gesendet: Dienstag, 11. Januar 2022 03:48:24 > Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2 > Hi Richard, >> scan_ai->fastmap may contain also old fastmap PEBs. >> In the area < UBI_FM_MAX_START you can find outdated fastmap PEBs. >> e.g. after power-cut. >> That's why scan_ai->fastmap is copied into ai->fastmap. >> Later in ubi_wl_init() these outdated PEBs will get erased. >> So, you cannot remove this code. > I thought old fastmap PEBs(async erase works in ubi_update_fastmap()) > will be counted into erase PEBs in the next attaching process, because I > saw following code snippet in ubi_write_fastmap(): > 1260 list_for_each_entry(ubi_wrk, &ubi->works, list) { > > 1261 if (ubi_is_erase_work(ubi_wrk)) { > > 1262 wl_e = ubi_wrk->e; > > 1263 ubi_assert(wl_e); > > 1264 > > 1265 fec = (struct ubi_fm_ec *)(fm_raw + > fm_pos); > 1266 > > 1267 fec->pnum = cpu_to_be32(wl_e->pnum); > > 1268 set_seen(ubi, wl_e->pnum, seen_pebs); > > 1269 fec->ec = cpu_to_be32(wl_e->ec); > > 1270 > > 1271 erase_peb_count++; > > 1272 fm_pos += sizeof(*fec); > > 1273 ubi_assert(fm_pos <= ubi->fm_size); > > 1274 } > > 1275 } > > 1276 fmh->erase_peb_count = cpu_to_be32(erase_peb_count); > Half-writing on fastmap will be recognized in scanning, and UBI > fallbacks full scanning, So, I come up with two situations: > 1. power-cut before new fastmap written, the old fastmap is completely > saved until next attaching, and some free PEBs are written with new > fastmap data. Luckly, fastmap anchor PEB's vid header is written first > of all, bad fastmap will be returned by ubi_attach_fastmap() in next > attaching. > 2. power-cut after new fastmap written, the old fastmap PEBs will be > added into 'ai->erase' list in next attaching. > Did I miss other possible circumstances? In ubi_wl_init() there is another corner case documented: /* * The fastmap update code might not find a free PEB for * writing the fastmap anchor to and then reuses the * current fastmap anchor PEB. When this PEB gets erased * and a power cut happens before it is written again we * must make sure that the fastmap attach code doesn't * find any outdated fastmap anchors, hence we erase the * outdated fastmap anchor PEBs synchronously here. */ if (aeb->vol_id == UBI_FM_SB_VOLUME_ID) sync = true; So ubi_wl_init() makes sure that all old fastmap anchors get erased before UBI starts to operate. With your change this is no longer satisfied. Thanks, //richard
Hi Richard, > In ubi_wl_init() there is another corner case documented: > /* > * The fastmap update code might not find a free PEB for > * writing the fastmap anchor to and then reuses the > * current fastmap anchor PEB. When this PEB gets erased > * and a power cut happens before it is written again we > * must make sure that the fastmap attach code doesn't > * find any outdated fastmap anchors, hence we erase the > * outdated fastmap anchor PEBs synchronously here. > */ > if (aeb->vol_id == UBI_FM_SB_VOLUME_ID) > sync = true; > > So ubi_wl_init() makes sure that all old fastmap anchors get erased before UBI > starts to operate. With your change this is no longer satisfied I seem to understand the another case. But I'm still confused that why outdated fastmap PEBs cannot be erased. When UBI comes to this point, it means UBI is attached by **full scanning mode**, scan_fast() returns two values: UBI_NO_FASTMAP: scan all pebs from pnum UBI_FM_MAX_START, ai is assigned with scan_ai, at last, all fastmap pebs are added into 'ai->fastmap' UBI_BAD_FASTMAP: scan all pebs from pnum 0, all fastmap pebs are added into 'ai->fastmap'
----- Ursprüngliche Mail ----- > Von: "chengzhihao1" <chengzhihao1@huawei.com> > An: "richard" <richard@nod.at> > CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32" > <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer" > <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org> > Gesendet: Dienstag, 11. Januar 2022 12:44:49 > Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2 > Hi Richard, >> In ubi_wl_init() there is another corner case documented: >> /* >> * The fastmap update code might not find a free PEB for >> * writing the fastmap anchor to and then reuses the >> * current fastmap anchor PEB. When this PEB gets erased >> * and a power cut happens before it is written again we >> * must make sure that the fastmap attach code doesn't >> * find any outdated fastmap anchors, hence we erase the >> * outdated fastmap anchor PEBs synchronously here. >> */ >> if (aeb->vol_id == UBI_FM_SB_VOLUME_ID) >> sync = true; >> >> So ubi_wl_init() makes sure that all old fastmap anchors get erased before UBI >> starts to operate. With your change this is no longer satisfied > I seem to understand the another case. But I'm still confused that why > outdated fastmap PEBs cannot be erased. When UBI comes to this point, it > means UBI is attached by **full scanning mode**, scan_fast() returns two > values: > UBI_NO_FASTMAP: scan all pebs from pnum UBI_FM_MAX_START, ai is > assigned with scan_ai, at last, all fastmap pebs are added into > 'ai->fastmap' > UBI_BAD_FASTMAP: scan all pebs from pnum 0, all fastmap pebs are > added into 'ai->fastmap' I fear, I'm unable to follow your thoughts. ubi_wl_init() is called in both cases, with and without fastmap. And ai->fastmap contains all anchor PEBs that scan_fast() found. This can be the most recent but also outdated anchor PEBs. Thanks, //richard
在 2022/1/11 19:57, Richard Weinberger 写道: > ubi_wl_init() is called in both cases, with and without fastmap. I agree. > And ai->fastmap contains all anchor PEBs that scan_fast() found. > This can be the most recent but also outdated anchor PEBs. Is it exists a case that outdated fastmap PEBs are neither counted into 'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching by fastmap. 1853 if (ubi->lookuptbl[aeb->pnum]) 1854 continue; 1855 1856 /* 1857 * The fastmap update code might not find a free PEB for 1858 * writing the fastmap anchor to and then reuses the 1859 * current fastmap anchor PEB. When this PEB gets erased 1860 * and a power cut happens before it is written again we 1861 * must make sure that the fastmap attach code doesn't 1862 * find any outdated fastmap anchors, hence we erase the 1863 * outdated fastmap anchor PEBs synchronously here. 1864 */ 1865 if (aeb->vol_id == UBI_FM_SB_VOLUME_ID) 1866 sync = true; I think UBI attaches failed by fastmap if kernel goes here. 1870 err = erase_aeb(ubi, aeb, sync);
----- Ursprüngliche Mail ----- > Von: "chengzhihao1" <chengzhihao1@huawei.com> > An: "richard" <richard@nod.at> > CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32" > <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer" > <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org> > Gesendet: Dienstag, 11. Januar 2022 14:23:52 > Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2 > 在 2022/1/11 19:57, Richard Weinberger 写道: >> ubi_wl_init() is called in both cases, with and without fastmap. > I agree. > >> And ai->fastmap contains all anchor PEBs that scan_fast() found. >> This can be the most recent but also outdated anchor PEBs. > Is it exists a case that outdated fastmap PEBs are neither counted into > 'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching > by fastmap. > [...] > I think UBI attaches failed by fastmap if kernel goes here. > 1870 err = erase_aeb(ubi, aeb, sync); Hmm, I think the paranoia check in fastmap.c is too strict these days. if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count - ai->bad_peb_count - fm->used_blocks)) goto fail_bad; It does not account ai->fastmap. So if ai->fastmap contains old anchor PEBs this check will trigger and force falling back to scanning mode. With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap. So I agree that this code path is wonky and can be cleaned up. But please be extremely careful and give all your changes excessive testing with real workload and power-cuts. Thanks, //richard
>>> ubi_wl_init() is called in both cases, with and without fastmap. >> I agree. >> >>> And ai->fastmap contains all anchor PEBs that scan_fast() found. >>> This can be the most recent but also outdated anchor PEBs. >> Is it exists a case that outdated fastmap PEBs are neither counted into >> 'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching >> by fastmap. >> > > [...] > >> I think UBI attaches failed by fastmap if kernel goes here. >> 1870 err = erase_aeb(ubi, aeb, sync); > > Hmm, I think the paranoia check in fastmap.c is too strict these days. > if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count - > ai->bad_peb_count - fm->used_blocks)) > goto fail_bad; > > It does not account ai->fastmap. So if ai->fastmap contains old anchor PEBs > this check will trigger and force falling back to scanning mode. > With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap. Forgive my stubbornness, I think this strict check is good, could you show me a process to trigger this WARN_ON, it would be nice to provide a reproducer. I still insist the point(after my fix patch applied): All outdated fastmap PEBs are added into 'ai->fastmap'(full scanning case) or counted into 'fmhdr->erase_peb_count'(fast attached case). > > So I agree that this code path is wonky and can be cleaned up. But please be > extremely careful and give all your changes excessive testing with real workload > and power-cuts.Let's list all power-cut cases during fastmap updateing(I tried to simulate some of them on nandsim), in theory, I think the WARN_ON check is okay and all outdated fastmap PEBs can be erased in next attaching: ubi_update_fastmap: 1565 for (i = 1; i < new_fm->used_blocks; i++) { 1570 if (!tmp_e) { 1571 if (old_fm && old_fm->e[i]) { /* sync erase old fm data PEBs */ power-cut!!! 1585 } else { /* async erase old fm data PEBs */ power-cut!!! 1595 } 1596 } else { 1599 if (old_fm && old_fm->e[i]) { /* async erase old fm data PEBs */ power-cut!!! 1603 } 1604 } 1605 } 1621 if (old_fm) { 1623 if (!tmp_e) { /* sync erase old fm anchor PEB */ power-cut!!! 1638 } else { /* async erase old fm anchor PEB */ power-cut!!! 1644 } 1645 } else { 1658 } 1660 ret = ubi_write_fastmap(ubi, new_fm); ubi_write_fastmap: 1324 ret = ubi_io_write_vid_hdr(ubi, new_fm->e[0]->pnum, avbuf); power-cut!!! 1345 for (i = 1; i < new_fm->used_blocks; i++) { 1350 ret = ubi_io_write_vid_hdr(...); power-cut!!! 1356 } 1358 for (i = 0; i < new_fm->used_blocks; i++) { 1359 ret = ubi_io_write_data(...), power-cut!!! 1366 }
----- Ursprüngliche Mail ----- > Von: "chengzhihao1" <chengzhihao1@huawei.com> > An: "richard" <richard@nod.at> > CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32" > <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer" > <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org> > Gesendet: Mittwoch, 12. Januar 2022 04:46:28 > Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2 >>>> ubi_wl_init() is called in both cases, with and without fastmap. >>> I agree. >>> >>>> And ai->fastmap contains all anchor PEBs that scan_fast() found. >>>> This can be the most recent but also outdated anchor PEBs. >>> Is it exists a case that outdated fastmap PEBs are neither counted into >>> 'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching >>> by fastmap. >>> >> >> [...] >> >>> I think UBI attaches failed by fastmap if kernel goes here. >>> 1870 err = erase_aeb(ubi, aeb, sync); >> >> Hmm, I think the paranoia check in fastmap.c is too strict these days. >> if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count - >> ai->bad_peb_count - fm->used_blocks)) >> goto fail_bad; >> >> It does not account ai->fastmap. So if ai->fastmap contains old anchor PEBs >> this check will trigger and force falling back to scanning mode. >> With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap. > Forgive my stubbornness, I think this strict check is good, could you > show me a process to trigger this WARN_ON, it would be nice to provide a > reproducer. You can trigger this by interrupting UBI. e.g. When UBI writes a new fastmap to the NAND, it schedules the old fastmap PEBs for erasure. PEB erasure is asynchronous in UBI. So this can be delayed for a very long time. While developing UBI fastmap and performing powercut tests I saw this often on targets. > I still insist the point(after my fix patch applied): All outdated > fastmap PEBs are added into 'ai->fastmap'(full scanning case) or counted > into 'fmhdr->erase_peb_count'(fast attached case). Yes. But if you look into ubi_wl_init() you see that fastmap anchor PEBs get erases synchronously(!). The comment before the erasure explains why. To complicate things, this code is currently unreachable because the WARN_ON() is not right. I misses to count ai->fastmap. So, when there are old fastmap PEBs found, the counter does not match and UBI falls back to full scanning while it could to an attach by fastmap. Fastmap is full with corner cases that have been found by massive amount of testing, sadly. Thanks, //richard
>>> if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count - >>> ai->bad_peb_count - fm->used_blocks)) >>> goto fail_bad; >>> >>> It does not account ai->fastmap. So if ai->fastmap contains old anchor PEBs >>> this check will trigger and force falling back to scanning mode. >>> With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap. >> Forgive my stubbornness, I think this strict check is good, could you >> show me a process to trigger this WARN_ON, it would be nice to provide a >> reproducer. > > You can trigger this by interrupting UBI. > e.g. When UBI writes a new fastmap to the NAND, it schedules the old fastmap > PEBs for erasure. PEB erasure is asynchronous in UBI. So this can be delayed > for a very long time. I tried with following modifications: diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c index 01dcdd94c9d2..253e76e2a7d7 100644 --- a/drivers/mtd/ubi/fastmap.c +++ b/drivers/mtd/ubi/fastmap.c @@ -679,6 +679,7 @@ static int ubi_attach_fastmap(struct ubi_device *ubi, if (fm_pos >= fm_size) goto fail_bad; + pr_err("%s: add %d to erase list\n", __func__, be32_to_cpu(fmec->pnum)); ret = add_aeb(ai, &ai->erase, be32_to_cpu(fmec->pnum), be32_to_cpu(fmec->ec), 1); if (ret) @@ -1103,6 +1104,7 @@ void ubi_fastmap_destroy_checkmap(struct ubi_volume *vol) * * Returns 0 on success, < 0 indicates an internal error. */ +#include <linux/delay.h> static int ubi_write_fastmap(struct ubi_device *ubi, struct ubi_fastmap_layout *new_fm) { @@ -1148,6 +1150,8 @@ static int ubi_write_fastmap(struct ubi_device *ubi, goto out_free_dvbuf; } + pr_err("%s: wait all erase workers deleted from list\n", __func__); + msleep(500); spin_lock(&ubi->volumes_lock); spin_lock(&ubi->wl_lock); @@ -1196,6 +1200,8 @@ static int ubi_write_fastmap(struct ubi_device *ubi, ubi_for_each_free_peb(ubi, wl_e, tmp_rb) { fec = (struct ubi_fm_ec *)(fm_raw + fm_pos); + if (global_old_fm_peb == wl_e->pnum) + pr_err("%s: count peb %d as free(cannot happen)!!!\n", __func__, wl_e->pnum); fec->pnum = cpu_to_be32(wl_e->pnum); set_seen(ubi, wl_e->pnum, seen_pebs); @@ -1208,6 +1214,8 @@ static int ubi_write_fastmap(struct ubi_device *ubi, if (ubi->fm_next_anchor) { fec = (struct ubi_fm_ec *)(fm_raw + fm_pos); + if (global_old_fm_peb == ubi->fm_next_anchor->pnum) + pr_err("%s: count peb %d as next anchor(cannot happen)!!!\n", __func__, ubi->fm_next_anchor->pnum); fec->pnum = cpu_to_be32(ubi->fm_next_anchor->pnum); set_seen(ubi, ubi->fm_next_anchor->pnum, seen_pebs); fec->ec = cpu_to_be32(ubi->fm_next_anchor->ec); @@ -1264,6 +1272,8 @@ static int ubi_write_fastmap(struct ubi_device *ubi, fec = (struct ubi_fm_ec *)(fm_raw + fm_pos); + if (wl_e->pnum == global_old_fm_peb) + pr_err("%s: count fm anchor %d in erase work(should happen)!!!\n", __func__, wl_e->pnum); fec->pnum = cpu_to_be32(wl_e->pnum); set_seen(ubi, wl_e->pnum, seen_pebs); fec->ec = cpu_to_be32(wl_e->ec); @@ -1520,12 +1530,14 @@ static void return_fm_pebs(struct ubi_device *ubi, * * Returns 0 on success, < 0 indicates an internal error. */ +int global_old_fm_peb = -1; int ubi_update_fastmap(struct ubi_device *ubi) { int ret, i, j; struct ubi_fastmap_layout *new_fm, *old_fm; struct ubi_wl_entry *tmp_e; + pr_err("%s: start\n", __func__); down_write(&ubi->fm_protect); down_write(&ubi->work_sem); down_write(&ubi->fm_eba_sem); @@ -1634,6 +1646,9 @@ int ubi_update_fastmap(struct ubi_device *ubi) /* we've got a new anchor PEB, return the old one */ ubi_wl_put_fm_peb(ubi, old_fm->e[0], 0, old_fm->to_be_tortured[0]); + global_old_fm_peb = old_fm->e[0]->pnum; + pr_err("%s: put old fastmap peb %d\n", __func__, old_fm->e[0]->pnum); + /* Not check return value ? Fault injection may cause the WARNON() too! */ new_fm->e[0] = tmp_e; old_fm->e[0] = NULL; } @@ -1665,6 +1680,7 @@ int ubi_update_fastmap(struct ubi_device *ubi) ubi_ensure_anchor_pebs(ubi); + pr_err("%s: done\n", __func__); return ret; err: diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h index 7c083ad58274..17df8dcebbe9 100644 --- a/drivers/mtd/ubi/ubi.h +++ b/drivers/mtd/ubi/ubi.h @@ -37,6 +37,7 @@ #define UBI_NAME_STR "ubi" struct ubi_device; +extern int global_old_fm_peb; /* Normal UBI messages */ __printf(2, 3) diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c index 8455f1d47f3c..a148280f4ce8 100644 --- a/drivers/mtd/ubi/wl.c +++ b/drivers/mtd/ubi/wl.c @@ -198,8 +198,12 @@ static int do_work(struct ubi_device *ubi) * the queue flush code has to be sure the whole queue of works is * done, and it takes the mutex in write mode. */ + if (global_old_fm_peb != -1) + pr_err("-----\n"); down_read(&ubi->work_sem); spin_lock(&ubi->wl_lock); + if (global_old_fm_peb != -1) + pr_err("=====\n"); if (list_empty(&ubi->works)) { spin_unlock(&ubi->wl_lock); up_read(&ubi->work_sem); @@ -1070,6 +1074,7 @@ static int ensure_wear_leveling(struct ubi_device *ubi, int nested) * needed. Returns zero in case of success and a negative error code in case of * failure. */ +#include <linux/delay.h> static int __erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk) { struct ubi_wl_entry *e = wl_wrk->e; @@ -1078,6 +1083,12 @@ static int __erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk) int lnum = wl_wrk->lnum; int err, available_consumed = 0; + if (pnum == global_old_fm_peb) { + pr_err("%s: erase worker(erase %d) has been deleted from list!", __func__, pnum); + pr_err("Dump ubi image to file\n"); + ubi_ro_mode(ubi); + } + dbg_wl("erase PEB %d EC %d LEB %d:%d", pnum, e->ec, wl_wrk->vol_id, wl_wrk->lnum); modprobe nandsim id_bytes="0xec,0xa1,0x00,0x15" modprobe ubi mtd="0,2048" fm_autoconvert sleep 1 # Wait until all erase works done ubimkvol -N vol_a -m -n 0 /dev/ubi0 # trigger fastmap updating dd if=/dev/mtd0 of=flash rmmod ubi nandwrite /dev/mtd0 flash > /dev/null modprobe ubi mtd="0,2048" fm_autoconvert Kernel will print: [13198.624433] ubi_update_fastmap: start [13198.625164] ubi_write_fastmap: wait all erase workers deleted from list [13199.134011] ubi_update_fastmap: done # first attaching triggers wearleveling [13199.623438] ubi_update_fastmap: start # volume creation triggers updating fastmap [13199.624732] ubi_update_fastmap: put old fastmap peb 2 # schedule old fastmap PEB to erase [13199.624757] ----- [13199.626532] ubi_write_fastmap: wait all erase workers deleted from list [13200.133059] ubi_write_fastmap: count fm anchor 2 in erase work(should happen)!!! # Count PEB(in erase work) into 'fmh->erase_peb_count' [13200.136996] ubi_update_fastmap: done [13200.137000] ===== [13200.138833] __erase_worker: erase worker(erase 2) has been deleted from list! [13200.138842] Dump ubi image to file [13203.091720] ubi0: attaching mtd0 [13203.111179] ubi_attach_fastmap: add 2 to erase list [13203.113013] ubi0: attached by fastmap I think the WARNON cannot be triggered because old fastmap PEBs are always counted into 'fmh->erase_peb_count', and 'ubi->work_sem' serilizes ubi_update_fastmap() and do_work(): ubi_update_fastmap ubi_thread down_write(&ubi->work_sem) ubi_wl_put_fm_peb(old fastmap PEB) schedule_erase list_add_tail(&wrk->list, &ubi->works) do_work down_read(&ubi->work_sem) // Stuck here! ubi_write_fastmap list_for_each_entry(ubi_wrk, &ubi->works, list) fec->pnum = cpu_to_be32(wl_e->pnum) up_write(&ubi->work_sem) list_del(&wrk->list) erase_worker > While developing UBI fastmap and performing powercut tests I saw this often > on targets. Was commit 2e8f08deabbc7ee("ubi: Fix races around ubi_refill_pools()") applied at that time? The WANRON can be triggered without this commit, because this commit makes sure do_work() cannot happen between ubi_wl_put_fm_peb() and ubi_write_fastmap(). > >> I still insist the point(after my fix patch applied): All outdated >> fastmap PEBs are added into 'ai->fastmap'(full scanning case) or counted >> into 'fmhdr->erase_peb_count'(fast attached case). > > Yes. But if you look into ubi_wl_init() you see that fastmap anchor PEBs > get erases synchronously(!). The comment before the erasure explains why. About erasing fastmap anchor PEB synchronously, I admit curreunt UBI implementation cannot satisfy it, even with my fix applied. Wait, it seems that UBI has never made it sure. Old fastmap PEBs could be erased asynchronously, they could be counted into 'fmh->erase_peb_count' even in early UBI implementation code, so old fastmap anchor PEB will be added into 'ai->erase' and be erased asynchronously in next attaching. But, I feel it is not a problem, find_fm_anchor() can help us find out the right fastmap anchor PEB according seqnum. > To complicate things, this code is currently unreachable because the WARN_ON() > is not right. I misses to count ai->fastmap. > So, when there are old fastmap PEBs found, the counter does not match > and UBI falls back to full scanning while it could to an attach by fastmap. >I think the mainly cause of failed fastmap attaching is half-written of fastmap. The counter check(eg. WARNON) is okay. > Fastmap is full with corner cases that have been found by massive amount of testing, sadly.
>> Yes. But if you look into ubi_wl_init() you see that fastmap anchor PEBs >> get erases synchronously(!). The comment before the erasure explains why. > About erasing fastmap anchor PEB synchronously, I admit curreunt UBI > implementation cannot satisfy it, even with my fix applied. Wait, it > seems that UBI has never made it sure. Old fastmap PEBs could be erased > asynchronously, they could be counted into 'fmh->erase_peb_count' even > in early UBI implementation code, so old fastmap anchor PEB will be > added into 'ai->erase' and be erased asynchronously in next attaching. In next attaching old fastmap PEBs will be processed as following: ubi_attach_fastmap -> add_aeb(ai, &ai->erase...) ubi_wl_init list_for_each_entry_safe(aeb, tmp, &ai->erase) erase_aeb // erase asynchronously ubi->lookuptbl[e->pnum] = e list_for_each_entry(aeb, &ai->fastmap, u.list) e = ubi_find_fm_block(ubi, aeb->pnum) if (e) { ... } else { if (ubi->lookuptbl[aeb->pnum]) // old fastmap PEBs are assigned to 'ubi->lookuptbl' continue; } > But, I feel it is not a problem, find_fm_anchor() can help us find out > the right fastmap anchor PEB according seqnum.
----- Ursprüngliche Mail ----- > Von: "chengzhihao1" <chengzhihao1@huawei.com> > An: "richard" <richard@nod.at> > CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32" > <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer" > <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org> > Gesendet: Samstag, 15. Januar 2022 09:46:07 > Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2 >>> Yes. But if you look into ubi_wl_init() you see that fastmap anchor PEBs >>> get erases synchronously(!). The comment before the erasure explains why. >> About erasing fastmap anchor PEB synchronously, I admit curreunt UBI >> implementation cannot satisfy it, even with my fix applied. Wait, it >> seems that UBI has never made it sure. Old fastmap PEBs could be erased >> asynchronously, they could be counted into 'fmh->erase_peb_count' even >> in early UBI implementation code, so old fastmap anchor PEB will be >> added into 'ai->erase' and be erased asynchronously in next attaching. > In next attaching old fastmap PEBs will be processed as following: > ubi_attach_fastmap -> add_aeb(ai, &ai->erase...) > ubi_wl_init > list_for_each_entry_safe(aeb, tmp, &ai->erase) > erase_aeb // erase asynchronously > ubi->lookuptbl[e->pnum] = e > list_for_each_entry(aeb, &ai->fastmap, u.list) > e = ubi_find_fm_block(ubi, aeb->pnum) > if (e) { > ... > } else { > if (ubi->lookuptbl[aeb->pnum]) // old fastmap PEBs are > assigned to 'ubi->lookuptbl' > continue; > } >> But, I feel it is not a problem, find_fm_anchor() can help us find out > > the right fastmap anchor PEB according seqnum. FYI, I think I understand now our disagreement. You assume that old Fastmap PEBs are *guaranteed* to be part of Fastmap's erase list. That's okay and this is what Linux as of today does. My point is that we need to be paranoid and check carefully for old Fastmap PEBs which might be *not* on the erase list. I saw such issues in the wild. These were causes by old and/or buggy Fastmap implementations. Keep in mind that Linux is not the only system that implements UBI (and fastmap). So let me give the whole situation another thought on how to improve it. I totally agree with you that currently there is a problem with fm->used_blocks > 1. I'm just careful (maybe too careful) about changing Fastmap code. Thanks, //richard
> FYI, I think I understand now our disagreement. > You assume that old Fastmap PEBs are *guaranteed* to be part of Fastmap's erase list. > That's okay and this is what Linux as of today does. > > My point is that we need to be paranoid and check carefully for old Fastmap PEBs > which might be *not* on the erase list. > I saw such issues in the wild. These were causes by old and/or buggy Fastmap > implementations. > Keep in mind that Linux is not the only system that implements UBI (and fastmap). Uh, that is really a point, I met UBI implemented in Vxworks ever. Now, you convinced me, we should process fastmap with considering bad images(caused by other implementations). Let's keep this wonky assertion until a better fix. > > So let me give the whole situation another thought on how to improve it. > I totally agree with you that currently there is a problem with fm->used_blocks > 1. > I'm just careful (maybe too careful) about changing Fastmap code. > > Thanks, > //richard > . >
Hi Richard, >> FYI, I think I understand now our disagreement. >> You assume that old Fastmap PEBs are *guaranteed* to be part of >> Fastmap's erase list. >> That's okay and this is what Linux as of today does. >> >> My point is that we need to be paranoid and check carefully for old >> Fastmap PEBs >> which might be *not* on the erase list. >> I saw such issues in the wild. These were causes by old and/or buggy >> Fastmap >> implementations. >> Keep in mind that Linux is not the only system that implements UBI >> (and fastmap). > Uh, that is really a point, I met UBI implemented in Vxworks ever. Now, > you convinced me, we should process fastmap with considering bad > images(caused by other implementations). Let's keep this wonky assertion > until a better fix. >> >> So let me give the whole situation another thought on how to improve it. >> I totally agree with you that currently there is a problem with >> fm->used_blocks > 1. >> I'm just careful (maybe too careful) about changing Fastmap code. >>I reconsider the WARNON, it can recognize the bad images and fall back full scanning mode early. If linux kernel encounters the WARNON, it means something wrong with your image(caused by bad UBI implementation). I begin to like this strict check. After comparing WARNON with the assertion: WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count - ai->bad_peb_count - fm->used_blocks) ubi_assert(ubi->good_peb_count == found_pebs) The 'found_pebs' consists of 'ai->erase', 'ai->free', 'ai->volumes' and 'ai->fastmap'. The count_fastmap_pebs() includes 'ai->erase', 'ai->free' and 'ai->volumes'. This means the number of 'ai->fastmap' equals to 'fm->used_blocks'. So, the 'ai->fastmap' adding position in my fix is right. In a word, can you accept the point that 'WARNON' can help us recognize the bad fastmap images?
diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c index 6b5f1ffd961b..01dcdd94c9d2 100644 --- a/drivers/mtd/ubi/fastmap.c +++ b/drivers/mtd/ubi/fastmap.c @@ -828,24 +828,6 @@ static int find_fm_anchor(struct ubi_attach_info *ai) return ret; } -static struct ubi_ainf_peb *clone_aeb(struct ubi_attach_info *ai, - struct ubi_ainf_peb *old) -{ - struct ubi_ainf_peb *new; - - new = ubi_alloc_aeb(ai, old->pnum, old->ec); - if (!new) - return NULL; - - new->vol_id = old->vol_id; - new->sqnum = old->sqnum; - new->lnum = old->lnum; - new->scrub = old->scrub; - new->copy_flag = old->copy_flag; - - return new; -} - /** * ubi_scan_fastmap - scan the fastmap. * @ubi: UBI device object @@ -865,7 +847,6 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai, struct ubi_vid_hdr *vh; struct ubi_ec_hdr *ech; struct ubi_fastmap_layout *fm; - struct ubi_ainf_peb *aeb; int i, used_blocks, pnum, fm_anchor, ret = 0; size_t fm_size; __be32 crc, tmp_crc; @@ -875,17 +856,6 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai, if (fm_anchor < 0) return UBI_NO_FASTMAP; - /* Copy all (possible) fastmap blocks into our new attach structure. */ - list_for_each_entry(aeb, &scan_ai->fastmap, u.list) { - struct ubi_ainf_peb *new; - - new = clone_aeb(ai, aeb); - if (!new) - return -ENOMEM; - - list_add(&new->u.list, &ai->fastmap); - } - down_write(&ubi->fm_protect); memset(ubi->fm_buf, 0, ubi->fm_size); @@ -1029,6 +999,11 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai, "err: %i)", i, pnum, ret); goto free_hdr; } + + /* Add all fastmap blocks into attach structure. */ + ret = add_aeb(ai, &ai->fastmap, pnum, be64_to_cpu(ech->ec), 0); + if (ret) + goto free_hdr; } kfree(fmsb);
Fastmap pebs(pnum >= UBI_FM_MAX_START) won't be added into 'ai->fastmap' while attaching ubi device if 'fm->used_blocks' is greater than 2, which may cause warning from 'ubi_assert(ubi->good_peb_count == found_pebs)': UBI assert failed in ubi_wl_init at 1878 (pid 2409) Call Trace: ubi_wl_init.cold+0xae/0x2af [ubi] ubi_attach+0x1b0/0x780 [ubi] ubi_init+0x23a/0x3ad [ubi] load_module+0x22d2/0x2430 Reproduce: ID="0x20,0x33,0x00,0x00" # 16M 16KB PEB, 512 page modprobe nandsim id_bytes=$ID modprobe ubi mtd="0,0" fm_autoconvert # Fastmap takes 2 pebs rmmod ubi modprobe ubi mtd="0,0" fm_autoconvert # Attach by fastmap Add all used fastmap pebs into list 'ai->fastmap' to make sure they can be counted into 'found_pebs'. Fixes: fdf10ed710c0aa ("ubi: Rework Fastmap attach base code") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> --- drivers/mtd/ubi/fastmap.c | 35 +++++------------------------------ 1 file changed, 5 insertions(+), 30 deletions(-)