Message ID | 200906221938.27574.bzolnier@gmail.com |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
Headers | show |
On Monday 22 June 2009, Bartlomiej Zolnierkiewicz wrote: > On Monday 22 June 2009 17:16:04 Frans Pop wrote: > Thanks. Please notice 0701 0701 words above -- it means that this > device reports both SWDMA0 and MWDMA0 enabled at once (which results > in IDE layer failing DMA tuning). > > The patch below should fix it Yes, this gives back MWDMA2 for hdd. > and it would be quite interesting to try > it on vanilla kernel to see if it helps with unexpected IRQ problem. Will do later. > However this still doesn't explain the regression fully -- we had > ide_id_dma_bug() checks since Dec 2007 (and equivalent > ide_dma_verbose() ones since almost forever) while 2.6.26 (which works > fine) is much younger than that. I suspect that there are some other > kernel changes coming into the picture (Power Management?). Would it > be possible to try 2.6.2[78] and/or bisect this problem further? I suspect commit 8d64fcd9 "ide: identify data word 53 bit 1 doesn't cover words 62 and 63 (take 3)": @@ -396,15 +393,14 @@ int ide_id_dma_bug(ide_drive_t *drive) if (id[ATA_ID_FIELD_VALID] & 4) { if ((id[ATA_ID_UDMA_MODES] >> 8) && (id[ATA_ID_MWDMA_MODES] >> 8)) goto err_out; - } else if (id[ATA_ID_FIELD_VALID] & 2) { - if ((id[ATA_ID_MWDMA_MODES] >> 8) && - (id[ATA_ID_SWDMA_MODES] >> 8)) - goto err_out; - } + } else if ((id[ATA_ID_MWDMA_MODES] >> 8) && + (id[ATA_ID_SWDMA_MODES] >> 8)) + goto err_out; The logs I posted were from 2.6.30. I also tried 2.6.29 and that did *not* yet have the DMA problem. The commit above is from the 2.6.30 development cycle, so that fits. I expect you can verify it from the identify data. > From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> > Subject: [PATCH] ide: relax DMA info validity checking > > There are some broken devices that report multiple DMA xfer modes > enabled at once (ATA spec doesn't allow it) but otherwise work fine > with DMA so just delete ide_id_dma_bug(). The question is maybe: are there other devices that currently have dma disabled because of the (old) code and would stop working with ide_id_dma_bug() completely removed? The conservative thing to do I guess would be to reverse 8d64fcd9. There is one thing I should mention here. I have been seeing the following error with this CD drive: ide-cd: hdd: weird block size 2352 ide-cd: hdd: default to 2kb block size This was present with 2.6.26 and also now with 2.6.31; not sure about older kernels. I initially saw it with a self-burned Debian installation CD. I also now see it with an audio CD. It does not seem to affect reading the disks: installations go fine and the audio CD plays without any problems. Any risk this may be related to something we've been discussing so far, or is this a separate issue? -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Monday 22 June 2009 21:01:37 Frans Pop wrote: > On Monday 22 June 2009, Bartlomiej Zolnierkiewicz wrote: > > On Monday 22 June 2009 17:16:04 Frans Pop wrote: > > Thanks. Please notice 0701 0701 words above -- it means that this > > device reports both SWDMA0 and MWDMA0 enabled at once (which results > > in IDE layer failing DMA tuning). > > > > The patch below should fix it > > Yes, this gives back MWDMA2 for hdd. Cool. > > and it would be quite interesting to try > > it on vanilla kernel to see if it helps with unexpected IRQ problem. > > Will do later. > > > However this still doesn't explain the regression fully -- we had > > ide_id_dma_bug() checks since Dec 2007 (and equivalent > > ide_dma_verbose() ones since almost forever) while 2.6.26 (which works > > fine) is much younger than that. I suspect that there are some other > > kernel changes coming into the picture (Power Management?). Would it > > be possible to try 2.6.2[78] and/or bisect this problem further? > > I suspect commit 8d64fcd9 "ide: identify data word 53 bit 1 doesn't cover > words 62 and 63 (take 3)": > @@ -396,15 +393,14 @@ int ide_id_dma_bug(ide_drive_t *drive) > > if (id[ATA_ID_FIELD_VALID] & 4) { > if ((id[ATA_ID_UDMA_MODES] >> 8) && > (id[ATA_ID_MWDMA_MODES] >> 8)) > goto err_out; > - } else if (id[ATA_ID_FIELD_VALID] & 2) { > - if ((id[ATA_ID_MWDMA_MODES] >> 8) && > - (id[ATA_ID_SWDMA_MODES] >> 8)) > - goto err_out; > - } > + } else if ((id[ATA_ID_MWDMA_MODES] >> 8) && > + (id[ATA_ID_SWDMA_MODES] >> 8)) > + goto err_out; > > > The logs I posted were from 2.6.30. I also tried 2.6.29 and that did *not* This breaks my beautiful theory about the root cause of unexpected IRQs.. ;( > yet have the DMA problem. The commit above is from the 2.6.30 development > cycle, so that fits. I expect you can verify it from the identify data. I had the same idea initially, unfortunately bit 1 is set for word 53 so this must be something else... > > From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> > > Subject: [PATCH] ide: relax DMA info validity checking > > > > There are some broken devices that report multiple DMA xfer modes > > enabled at once (ATA spec doesn't allow it) but otherwise work fine > > with DMA so just delete ide_id_dma_bug(). > > The question is maybe: are there other devices that currently have dma > disabled because of the (old) code and would stop working with > ide_id_dma_bug() completely removed? The conservative thing to do I guess > would be to reverse 8d64fcd9. This is quite unlikely given that libata has never had such checks.. > There is one thing I should mention here. I have been seeing the following > error with this CD drive: > ide-cd: hdd: weird block size 2352 > ide-cd: hdd: default to 2kb block size > > This was present with 2.6.26 and also now with 2.6.31; not sure about > older kernels. I initially saw it with a self-burned Debian installation > CD. I also now see it with an audio CD. It does not seem to affect > reading the disks: installations go fine and the audio CD plays without > any problems. > > Any risk this may be related to something we've been discussing so far, or > is this a separate issue? This is just a harmless warning coming from enabling of the workaround for weird ATAPI devices (the one you have in this sparc machine seems to score really high on the weirdness scale ;) introduced by commit e8e7b9e. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Date: Mon, 22 Jun 2009 23:35:06 +0200 > On Monday 22 June 2009 21:01:37 Frans Pop wrote: >> yet have the DMA problem. The commit above is from the 2.6.30 development >> cycle, so that fits. I expect you can verify it from the identify data. > > I had the same idea initially, unfortunately bit 1 is set for word 53 so this > must be something else... So this change should have had zero effect for Frans's case. I double checked everything with test programs going over his ID dump and it all checks out. We might need to bisect this one. But Frans, just for the record could you simply test reverting just that hunk? Thanks! >> @@ -396,15 +393,14 @@ int ide_id_dma_bug(ide_drive_t *drive) >> >> if (id[ATA_ID_FIELD_VALID] & 4) { >> if ((id[ATA_ID_UDMA_MODES] >> 8) && >> (id[ATA_ID_MWDMA_MODES] >> 8)) >> goto err_out; >> - } else if (id[ATA_ID_FIELD_VALID] & 2) { >> - if ((id[ATA_ID_MWDMA_MODES] >> 8) && >> - (id[ATA_ID_SWDMA_MODES] >> 8)) >> - goto err_out; >> - } >> + } else if ((id[ATA_ID_MWDMA_MODES] >> 8) && >> + (id[ATA_ID_SWDMA_MODES] >> 8)) >> + goto err_out; -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Frans Pop <elendil@planet.nl> Date: Mon, 22 Jun 2009 21:01:37 +0200 > The question is maybe: are there other devices that currently have dma > disabled because of the (old) code and would stop working with > ide_id_dma_bug() completely removed? The conservative thing to do I guess > would be to reverse 8d64fcd9. For this specific situation I would likely avoid a revert at least with how things currently stand. Right now we are not yet certain what introduced the problem. We are also not certain what might break by reverting this commit, either. There are portions of that commit other than the one changing the logic of ide_id_dma_bug(). That's a "fix" based upon two large unknowns, which is not wise I think :) -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tuesday 23 June 2009, you wrote: > We might need to bisect this one. But Frans, just for the record > could you simply test reverting just that hunk? Thanks! I'm way ahead of you :-) Instead of a bisect [1] I decided to first see if some printks in both .26 and .31 would show anything useful. With 2.6.31 and code included below I get: hda: ST34342A, ATA DISK drive FJP: id_dma_bug 0x7: &4: 0x0-0x4 no error hda: MWDMA2 mode selected hdc: Maxtor 6E040L0, ATA DISK drive hdd: CD-ROM 56X/AKH, ATAPI CD/DVD-ROM drive hdc: host max PIO5 wanted PIO255(auto-tune) selected PIO4 FJP: ID_FIELD_VALID: 0x7 (true) FJP: id_dma_bug 0x7: &4: 0x0-0x4 no error hdc: MWDMA2 mode selected hdd: host max PIO5 wanted PIO255(auto-tune) selected PIO4 FJP: ID_FIELD_VALID: 0x2 (true) FJP: id_dma_bug 0x2: &2: 0x1-0x1 bad modes <------------- hdd: bad DMA info in identify block Note that this included a complete revert of 8d64fcd9 (with minor conflict resolved). Here's the same output with 2.6.26.3 with equivalent debug statements: hda: ST34342A, ATA DISK drive FJP: id_dma_bug 0x7: &4: 0x0-0x0 no error hda: MWDMA2 mode selected hdc: Maxtor 6E040L0, ATA DISK drive hdd: CD-ROM 56X/AKH, ATAPI CD/DVD-ROM drive FJP: id_dma_bug 0x7: &4: 0x0-0x0 no error hdc: MWDMA2 mode selected FJP: id_dma_bug 0x2: &2: 0x0-0x0 no error <------------- hdd: MWDMA2 mode selected So it seems to me that in 2.6.26 something was broken in the way these ID fields were handled, at least in this check. This is now fixed (possibly by the changes around 5b90e990..48fb2688) and *that* causes the regression. Note that the hard disks are also affected. If I'm correct I guess that supports Bart's patch to just remove the whole thing. But did this only affect the id_dma_bug check or also the use of these fields elsewhere? Cheers, FJP [1] I don't have a crossbuild environment for sparc, so a bisect would be painful with 300MHz; I at least have plenty memory luckily. int ide_id_dma_bug(ide_drive_t *drive) { u16 *id = drive->id; printk("FJP: id_dma_bug 0x%x:", id[ATA_ID_FIELD_VALID]); if (id[ATA_ID_FIELD_VALID] & 4) { printk(" &4: 0x%x-0x%x", (id[ATA_ID_UDMA_MODES] >> 8), (id[ATA_ID_MWDMA_MODES] >> 8)); if ((id[ATA_ID_UDMA_MODES] >> 8) && (id[ATA_ID_MWDMA_MODES] >> 8)) { printk(" bad modes"); goto err_out; } } else if (id[ATA_ID_FIELD_VALID] & 2) { printk(" &2: 0x%x-0x%x", (id[ATA_ID_MWDMA_MODES] >> 8), (id[ATA_ID_SWDMA_MODES] >> 8)); if ((id[ATA_ID_MWDMA_MODES] >> 8) && (id[ATA_ID_SWDMA_MODES] >> 8)) { printk(" bad modes"); goto err_out; } } printk(" no error\n"); return 0; err_out: printk("\n"); printk(KERN_ERR "%s: bad DMA info in identify block\n", drive->name); return 1; } -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tuesday 23 June 2009 16:58:54 Frans Pop wrote: > On Tuesday 23 June 2009, you wrote: > > We might need to bisect this one. But Frans, just for the record > > could you simply test reverting just that hunk? Thanks! > > I'm way ahead of you :-) > > Instead of a bisect [1] I decided to first see if some printks in both .26 > and .31 would show anything useful. > > With 2.6.31 and code included below I get: > hda: ST34342A, ATA DISK drive > FJP: id_dma_bug 0x7: &4: 0x0-0x4 no error > hda: MWDMA2 mode selected > hdc: Maxtor 6E040L0, ATA DISK drive > hdd: CD-ROM 56X/AKH, ATAPI CD/DVD-ROM drive > hdc: host max PIO5 wanted PIO255(auto-tune) selected PIO4 > FJP: ID_FIELD_VALID: 0x7 (true) > FJP: id_dma_bug 0x7: &4: 0x0-0x4 no error > hdc: MWDMA2 mode selected > hdd: host max PIO5 wanted PIO255(auto-tune) selected PIO4 > FJP: ID_FIELD_VALID: 0x2 (true) > FJP: id_dma_bug 0x2: &2: 0x1-0x1 bad modes <------------- > hdd: bad DMA info in identify block > > Note that this included a complete revert of 8d64fcd9 (with minor conflict > resolved). > > Here's the same output with 2.6.26.3 with equivalent debug statements: > hda: ST34342A, ATA DISK drive > FJP: id_dma_bug 0x7: &4: 0x0-0x0 no error > hda: MWDMA2 mode selected > hdc: Maxtor 6E040L0, ATA DISK drive > hdd: CD-ROM 56X/AKH, ATAPI CD/DVD-ROM drive > FJP: id_dma_bug 0x7: &4: 0x0-0x0 no error > hdc: MWDMA2 mode selected > FJP: id_dma_bug 0x2: &2: 0x0-0x0 no error <------------- > hdd: MWDMA2 mode selected > > So it seems to me that in 2.6.26 something was broken in the way these ID > fields were handled, at least in this check. This is now fixed (possibly Great debugging work, thanks! > by the changes around 5b90e990..48fb2688) and *that* causes the > regression. Note that the hard disks are also affected. I see it now -- in commit c419993 ("ide-iops: only clear DMA words on setting DMA mode") we fixed a bug in ide_config_drive_speed().. [ It was a real bug resulting in incorrect data being passed to the user-space through HDIO_GET_IDENTITY ioctl ('hdparm -i'). ] -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Date: Tue, 23 Jun 2009 18:13:08 +0200 > On Tuesday 23 June 2009 16:58:54 Frans Pop wrote: >> So it seems to me that in 2.6.26 something was broken in the way these ID >> fields were handled, at least in this check. This is now fixed (possibly > > Great debugging work, thanks! Indeed, thanks for all of that work Frans! >> by the changes around 5b90e990..48fb2688) and *that* causes the >> regression. Note that the hard disks are also affected. > > I see it now -- in commit c419993 ("ide-iops: only clear DMA words > on setting DMA mode") we fixed a bug in ide_config_drive_speed().. > > [ It was a real bug resulting in incorrect data being passed to > the user-space through HDIO_GET_IDENTITY ioctl ('hdparm -i'). ] Given all of this I also agree that we should apply Bart's patch to remove the debugging check altogether, and I will thus do that right now. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Index: b/drivers/ide/ide-dma.c =================================================================== --- a/drivers/ide/ide-dma.c +++ b/drivers/ide/ide-dma.c @@ -361,9 +361,6 @@ static int ide_tune_dma(ide_drive_t *dri if (__ide_dma_bad_drive(drive)) return 0; - if (ide_id_dma_bug(drive)) - return 0; - if (hwif->host_flags & IDE_HFLAG_TRUST_BIOS_FOR_DMA) return config_drive_for_dma(drive); @@ -394,24 +391,6 @@ static int ide_dma_check(ide_drive_t *dr return -1; } -int ide_id_dma_bug(ide_drive_t *drive) -{ - u16 *id = drive->id; - - if (id[ATA_ID_FIELD_VALID] & 4) { - if ((id[ATA_ID_UDMA_MODES] >> 8) && - (id[ATA_ID_MWDMA_MODES] >> 8)) - goto err_out; - } else if ((id[ATA_ID_MWDMA_MODES] >> 8) && - (id[ATA_ID_SWDMA_MODES] >> 8)) - goto err_out; - - return 0; -err_out: - printk(KERN_ERR "%s: bad DMA info in identify block\n", drive->name); - return 1; -} - int ide_set_dma(ide_drive_t *drive) { int rc; Index: b/drivers/ide/ide-iops.c =================================================================== --- a/drivers/ide/ide-iops.c +++ b/drivers/ide/ide-iops.c @@ -329,9 +329,6 @@ int ide_driveid_update(ide_drive_t *driv kfree(id); - if ((drive->dev_flags & IDE_DFLAG_USING_DMA) && ide_id_dma_bug(drive)) - ide_dma_off(drive); - return 1; out_err: if (rc == 2) Index: b/include/linux/ide.h =================================================================== --- a/include/linux/ide.h +++ b/include/linux/ide.h @@ -1361,7 +1361,6 @@ int ide_in_drive_list(u16 *, const struc #ifdef CONFIG_BLK_DEV_IDEDMA int ide_dma_good_drive(ide_drive_t *); int __ide_dma_bad_drive(ide_drive_t *); -int ide_id_dma_bug(ide_drive_t *); u8 ide_find_dma_mode(ide_drive_t *, u8); @@ -1402,7 +1401,6 @@ void ide_dma_lost_irq(ide_drive_t *); ide_startstop_t ide_dma_timeout_retry(ide_drive_t *, int); #else -static inline int ide_id_dma_bug(ide_drive_t *drive) { return 0; } static inline u8 ide_find_dma_mode(ide_drive_t *drive, u8 speed) { return 0; } static inline u8 ide_max_dma_mode(ide_drive_t *drive) { return 0; } static inline void ide_dma_off_quietly(ide_drive_t *drive) { ; }