Message ID | 1415336894-15327-2-git-send-email-martin.petersen@oracle.com |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
Headers | show |
On Fri, Nov 07, 2014 at 12:08:12AM -0500, Martin K. Petersen wrote: > if (ata_id_has_trim(args->id)) { > - rbuf[14] |= 0x80; /* TPE */ > + rbuf[14] |= 0x80; /* LBPME */ > > - if (ata_id_has_zero_after_trim(args->id)) > - rbuf[14] |= 0x40; /* TPRZ */ > + if (ata_id_has_zero_after_trim(args->id) && > + dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM) { > + ata_dev_warn(dev, "Enabling discard_zeroes_data\n"); I think this should _info, not _warn. Otherwise looks good to me, Reviewed-by: Christoph Hellwig <hch@lst.de> It would be nice if there was a way to trigger the flag from userspace, so that we don't need to rebuild the kernel to add a whitelist entry. -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes: >> + ata_dev_warn(dev, "Enabling discard_zeroes_data\n"); Christoph> I think this should _info, not _warn. Fixed.
On 07/11/2014 06:08, Martin K. Petersen wrote: > The whitelist is only meant as a starting point and is by no means > comprehensive: > > - All intel SSD models except for 510 > - Micron M5* > - Samsung SSDs > - Seagate SSDs > > Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> > --- > drivers/ata/libata-core.c | 18 ++++++++++++++---- > drivers/ata/libata-scsi.c | 10 ++++++---- > include/linux/libata.h | 1 + > 3 files changed, 21 insertions(+), 8 deletions(-) > > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c > index c5ba15af87d3..f41f24a8bc21 100644 > --- a/drivers/ata/libata-core.c > +++ b/drivers/ata/libata-core.c > @@ -4225,10 +4225,20 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = { > { "PIONEER DVD-RW DVR-216D", NULL, ATA_HORKAGE_NOSETXFER }, > > /* devices that don't properly handle queued TRIM commands */ > - { "Micron_M500*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, > - { "Crucial_CT???M500SSD*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, > - { "Micron_M550*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, > - { "Crucial_CT*M550SSD*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, > + { "Micron_M5?0*", NULL, ATA_HORKAGE_NO_NCQ_TRIM | > + ATA_HORKAGE_ZERO_AFTER_TRIM, }, > + { "Crucial_CT???M5?0SSD*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, I have a Crucial_CT256MX1 (i.e. MX100) and it does reliably zero. BTW. it's the same hardware as the M550, so probably the same set of quirks should apply to both. Paolo > + > + /* > + * DRAT/RZAT are weak guarantees. Explicitly black/whitelist > + * SSDs that provide reliable zero after TRIM. > + */ > + { "INTEL*SSDSC2MH*", NULL, 0, }, /* Blacklist intel 510 */ > + { "INTEL*SSD*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, > + { "SSD*INTEL*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, > + { "Samsung*SSD*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, > + { "SAMSUNG*SSD*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, > + { "ST[1248][0248]0[FH]*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, > > /* > * Some WD SATA-I drives spin up and down erratically when the link > diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c > index 0586f66d70fa..deaa6e34ed4d 100644 > --- a/drivers/ata/libata-scsi.c > +++ b/drivers/ata/libata-scsi.c > @@ -2515,13 +2515,15 @@ static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf) > rbuf[15] = lowest_aligned; > > if (ata_id_has_trim(args->id)) { > - rbuf[14] |= 0x80; /* TPE */ > + rbuf[14] |= 0x80; /* LBPME */ > > - if (ata_id_has_zero_after_trim(args->id)) > - rbuf[14] |= 0x40; /* TPRZ */ > + if (ata_id_has_zero_after_trim(args->id) && > + dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM) { > + ata_dev_warn(dev, "Enabling discard_zeroes_data\n"); > + rbuf[14] |= 0x40; /* LBPRZ */ > + } > } > } > - > return 0; > } > > diff --git a/include/linux/libata.h b/include/linux/libata.h > index bd5fefeaf548..45ac825b8366 100644 > --- a/include/linux/libata.h > +++ b/include/linux/libata.h > @@ -421,6 +421,7 @@ enum { > ATA_HORKAGE_NO_NCQ_TRIM = (1 << 19), /* don't use queued TRIM */ > ATA_HORKAGE_NOLPM = (1 << 20), /* don't use LPM */ > ATA_HORKAGE_WD_BROKEN_LPM = (1 << 21), /* some WDs have broken LPM */ > + ATA_HORKAGE_ZERO_AFTER_TRIM = (1 << 22),/* guarantees zero after trim */ > > /* DMA mask for user DMA control: User visible values; DO NOT > renumber */ > -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi- > owner@vger.kernel.org] On Behalf Of Martin K. Petersen > Sent: Thursday, 06 November, 2014 11:08 PM > To: linux-scsi@vger.kernel.org; linux-ide@vger.kernel.org; linux- > fsdevel@vger.kernel.org; neilb@suse.de > Cc: Martin K. Petersen > Subject: [PATCH 1/3] libata: Whitelist SSDs that are known to properly return > zeroes after TRIM > > As defined, the DRAT (Deterministic Read After Trim) and RZAT (Return > Zero After Trim) flags in the ATA Command Set are unreliable in the > sense that they only define what happens if the device successfully > executed the DSM TRIM command. TRIM is only advisory, however, and the > device is free to silently ignore all or parts of the request. > > In practice this renders the DRAT and RZAT flags completely useless and > because the results are unpredictable we decided to disable discard in > MD for 3.18 to avoid the risk of data corruption. > > Hardware vendors in the real world obviously need better guarantees than > what the standards bodies provide. Unfortuntely those guarantees are > encoded in product requirements documents rather than somewhere we can > key off of them programatically. So we are compelled to disabling > discard_zeroes_data for all devices unless we explicitly have data to > support whitelisting them. > > This patch whitelists SSDs from a few of the main vendors. None of the > whitelists are based on written guarantees. They are purely based on > empirical evidence collected from internal and external users that have > tested or qualified these drives in RAID deployments. > > The whitelist is only meant as a starting point and is by no means > comprehensive: > > - All intel SSD models except for 510 > - Micron M5* > - Samsung SSDs > - Seagate SSDs > That description and Paolo's reply: > From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi- > owner@vger.kernel.org] On Behalf Of Paolo Bonzini > Sent: Friday, 05 December, 2014 10:45 AM ... > I have a Crucial_CT256MX1 (i.e. MX100) and it does reliably zero. make me concerned about this whitelist approach. I think you need a manufacturer assertion that this is indeed the design intent; you cannot be certain based on observation from outside. Since the SCSI and ATA standards allow ignoring the hint, it might be honored most of the time, but ignored in some rare cases (e.g., drive firmware has a malloc() failure that only happens when the drive is handling an overtemperature condition and six other problems at the same time). Maybe there should be two levels: * vendor asserts the drive is designed to always honor the hint * community observes the drive always seems to honor the hint and a sysfs flag for users to select the level at which they feel safe. A user running 3 replicas of the data in different sites might be more trusting than a user for which this is the only copy of the data. --- Rob Elliott HP Server Storage -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Dec 05, 2014 at 10:58:09PM +0000, Elliott, Robert (Server Storage) wrote: > > I have a Crucial_CT256MX1 (i.e. MX100) and it does reliably zero. > > make me concerned about this whitelist approach. > > I think you need a manufacturer assertion that this is indeed > the design intent; you cannot be certain based on observation > from outside. How is this different from a manufacturer assertion that they follow a SCSI or ATA standard? There have been cases in the distant past (fortunately) of disk manufacturers that ignored a CACHE FLUSH command just to get higher Winbench scores. Does that mean we can't trust them to do anything right? What I'd suggest instead is that if a vendor states this on a spec sheet --- more than just an e-mail assertion --- so they can be sued if they knowingly misrepresent their product, that we take their word at it. Of course, there will be bugs, which is why we have blacklists, or why we can remove them from the list if it turns out there are edge conditions where it appears the disk doesn't quite do the right thing. After all, we generally take the manufacturer's word that air bags will work as claimed, even if potentially 11 million of them are currently subject to recall. And do we think that "the community" would necessarily be more suited than the vendors and the manufacturer to figure out whether or not air bags or drives are working as desired? That being said, if someone wants to create a open source program which stress tests SSD's to look for cases where it is dropping a requested discard, that would certainly be a good thing to do... Cheers, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2014-12-08 at 10:15 -0500, Theodore Ts'o wrote: > On Fri, Dec 05, 2014 at 10:58:09PM +0000, Elliott, Robert (Server Storage) wrote: > > > I have a Crucial_CT256MX1 (i.e. MX100) and it does reliably zero. > > > > make me concerned about this whitelist approach. > > > > I think you need a manufacturer assertion that this is indeed > > the design intent; you cannot be certain based on observation > > from outside. > > How is this different from a manufacturer assertion that they follow a > SCSI or ATA standard? There have been cases in the distant past > (fortunately) of disk manufacturers that ignored a CACHE FLUSH command > just to get higher Winbench scores. Does that mean we can't trust > them to do anything right? That answer depends on device type manufacturer. USB devices, hell no. ATA devices, maybe and SCSI devices usually. The main problem is usually testing. Consumer devices like USB and (s)ATA rarely get tested on anything but windows. USB devices tend to supply their own driver, so they're on the "we fix it in the driver" model which is why they bite us so badly. (S)ATA usually comply, but they only test what windows exercises, so if windows doesn't do it, chances are it never got tested. SCSI devices still tend to be tested in legacy UNIX environments, which are as diverse as we are. > What I'd suggest instead is that if a vendor states this on a spec > sheet --- more than just an e-mail assertion --- so they can be sued > if they knowingly misrepresent their product, that we take their word > at it. Of course, there will be bugs, which is why we have > blacklists, or why we can remove them from the list if it turns out > there are edge conditions where it appears the disk doesn't quite do > the right thing. > > After all, we generally take the manufacturer's word that air bags > will work as claimed, even if potentially 11 million of them are > currently subject to recall. And do we think that "the community" > would necessarily be more suited than the vendors and the manufacturer > to figure out whether or not air bags or drives are working as > desired? > > That being said, if someone wants to create a open source program > which stress tests SSD's to look for cases where it is dropping a > requested discard, that would certainly be a good thing to do... The purpose of DRAT and RZAT is to enable disk arrays deterministically to use TRIM/Unmap so arrays know what happens to stripes on discard. Arrays are being built of mostly SATA technology these days, so some manufacturers have retargetted to arrays and consumer technology (and are testing the array cases). However, windows doesn't use either feature, so manufacturers not targetting arrays will never test this feature. Hence, in this case, I think a whitelist does make sense. James -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 8 Dec 2014 10:15:59 -0500 "Theodore Ts'o" <tytso@mit.edu> wrote: > On Fri, Dec 05, 2014 at 10:58:09PM +0000, Elliott, Robert (Server Storage) wrote: > > > I have a Crucial_CT256MX1 (i.e. MX100) and it does reliably zero. > > > > make me concerned about this whitelist approach. > > > > I think you need a manufacturer assertion that this is indeed > > the design intent; you cannot be certain based on observation > > from outside. > > How is this different from a manufacturer assertion that they follow a > SCSI or ATA standard? There have been cases in the distant past > (fortunately) of disk manufacturers that ignored a CACHE FLUSH command > just to get higher Winbench scores. Does that mean we can't trust > them to do anything right? At the time they never promised to honour cache flush. The reason it was became mandatory in the specification was in part so that the vendors could all force each other to play fair. If its "optional" then it's tough..., if they say they meet the standard it's class action 8) If this is a promise then it ought to be good James: "USB devices tend to supply their own driver" has not been true for some years now. Microsoft provide an in-box driver and vendors have the choice of using that or certifying their own via WHQL, which is a bit like choosing between free ice cream and banging your head against a plank cover in nails. Alan -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index c5ba15af87d3..f41f24a8bc21 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -4225,10 +4225,20 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = { { "PIONEER DVD-RW DVR-216D", NULL, ATA_HORKAGE_NOSETXFER }, /* devices that don't properly handle queued TRIM commands */ - { "Micron_M500*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, - { "Crucial_CT???M500SSD*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, - { "Micron_M550*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, - { "Crucial_CT*M550SSD*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, + { "Micron_M5?0*", NULL, ATA_HORKAGE_NO_NCQ_TRIM | + ATA_HORKAGE_ZERO_AFTER_TRIM, }, + { "Crucial_CT???M5?0SSD*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, + + /* + * DRAT/RZAT are weak guarantees. Explicitly black/whitelist + * SSDs that provide reliable zero after TRIM. + */ + { "INTEL*SSDSC2MH*", NULL, 0, }, /* Blacklist intel 510 */ + { "INTEL*SSD*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, + { "SSD*INTEL*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, + { "Samsung*SSD*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, + { "SAMSUNG*SSD*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, + { "ST[1248][0248]0[FH]*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, /* * Some WD SATA-I drives spin up and down erratically when the link diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index 0586f66d70fa..deaa6e34ed4d 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -2515,13 +2515,15 @@ static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf) rbuf[15] = lowest_aligned; if (ata_id_has_trim(args->id)) { - rbuf[14] |= 0x80; /* TPE */ + rbuf[14] |= 0x80; /* LBPME */ - if (ata_id_has_zero_after_trim(args->id)) - rbuf[14] |= 0x40; /* TPRZ */ + if (ata_id_has_zero_after_trim(args->id) && + dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM) { + ata_dev_warn(dev, "Enabling discard_zeroes_data\n"); + rbuf[14] |= 0x40; /* LBPRZ */ + } } } - return 0; } diff --git a/include/linux/libata.h b/include/linux/libata.h index bd5fefeaf548..45ac825b8366 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -421,6 +421,7 @@ enum { ATA_HORKAGE_NO_NCQ_TRIM = (1 << 19), /* don't use queued TRIM */ ATA_HORKAGE_NOLPM = (1 << 20), /* don't use LPM */ ATA_HORKAGE_WD_BROKEN_LPM = (1 << 21), /* some WDs have broken LPM */ + ATA_HORKAGE_ZERO_AFTER_TRIM = (1 << 22),/* guarantees zero after trim */ /* DMA mask for user DMA control: User visible values; DO NOT renumber */
As defined, the DRAT (Deterministic Read After Trim) and RZAT (Return Zero After Trim) flags in the ATA Command Set are unreliable in the sense that they only define what happens if the device successfully executed the DSM TRIM command. TRIM is only advisory, however, and the device is free to silently ignore all or parts of the request. In practice this renders the DRAT and RZAT flags completely useless and because the results are unpredictable we decided to disable discard in MD for 3.18 to avoid the risk of data corruption. Hardware vendors in the real world obviously need better guarantees than what the standards bodies provide. Unfortuntely those guarantees are encoded in product requirements documents rather than somewhere we can key off of them programatically. So we are compelled to disabling discard_zeroes_data for all devices unless we explicitly have data to support whitelisting them. This patch whitelists SSDs from a few of the main vendors. None of the whitelists are based on written guarantees. They are purely based on empirical evidence collected from internal and external users that have tested or qualified these drives in RAID deployments. The whitelist is only meant as a starting point and is by no means comprehensive: - All intel SSD models except for 510 - Micron M5* - Samsung SSDs - Seagate SSDs Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> --- drivers/ata/libata-core.c | 18 ++++++++++++++---- drivers/ata/libata-scsi.c | 10 ++++++---- include/linux/libata.h | 1 + 3 files changed, 21 insertions(+), 8 deletions(-)