diff mbox series

ahci: document and clarify unconventional intel quirk

Message ID 20240207091054.1697236-1-cassel@kernel.org
State New
Headers show
Series ahci: document and clarify unconventional intel quirk | expand

Commit Message

Niklas Cassel Feb. 7, 2024, 9:10 a.m. UTC
The ahci_intel_pcs_quirk is unconventional in several ways:
First of all because it has a board ID for which the quirk should NOT be
applied (board_ahci_pcs7), instead of the usual way where we have a board
ID for which the quirk should be applied.

The second reason is that other than only excluding board_ahci_pcs7 from
the quirk, PCI devices that make use of the generic entry in ahci_pci_tbl
(which matches on AHCI class code) are also excluded.

This can of course lead to very subtle breakage, and did indeed do so in:
commit 104ff59af73a ("ata: ahci: Add Tiger Lake UP{3,4} AHCI controller"),
which added an explicit entry with board_ahci_low_power to ahci_pci_tbl.

This caused many users to complain that their SATA drives disappeared.
The logical assumption was of course that the issue was related to LPM,
and was therefore reverted in commit 6210038aeaf4 ("ata: ahci: Revert
"ata: ahci: Add Tiger Lake UP{3,4} AHCI controller"").

It took a lot of time to figure out that this was all completely unrelated
to LPM, and was instead caused by an unconventional Intel quirk.

While this quirk should definitely be cleaned up to be implemented like
all other quirks, for now, at least document the behavior of this quirk.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217114
Signed-off-by: Niklas Cassel <cassel@kernel.org>
---
 drivers/ata/ahci.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

Comments

Mika Westerberg Feb. 8, 2024, 4:09 p.m. UTC | #1
On Wed, Feb 07, 2024 at 10:10:53AM +0100, Niklas Cassel wrote:
> The ahci_intel_pcs_quirk is unconventional in several ways:
> First of all because it has a board ID for which the quirk should NOT be
> applied (board_ahci_pcs7), instead of the usual way where we have a board
> ID for which the quirk should be applied.
> 
> The second reason is that other than only excluding board_ahci_pcs7 from
> the quirk, PCI devices that make use of the generic entry in ahci_pci_tbl
> (which matches on AHCI class code) are also excluded.
> 
> This can of course lead to very subtle breakage, and did indeed do so in:
> commit 104ff59af73a ("ata: ahci: Add Tiger Lake UP{3,4} AHCI controller"),
> which added an explicit entry with board_ahci_low_power to ahci_pci_tbl.
> 
> This caused many users to complain that their SATA drives disappeared.
> The logical assumption was of course that the issue was related to LPM,
> and was therefore reverted in commit 6210038aeaf4 ("ata: ahci: Revert
> "ata: ahci: Add Tiger Lake UP{3,4} AHCI controller"").
> 
> It took a lot of time to figure out that this was all completely unrelated
> to LPM, and was instead caused by an unconventional Intel quirk.
> 
> While this quirk should definitely be cleaned up to be implemented like
> all other quirks, for now, at least document the behavior of this quirk.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217114
> Signed-off-by: Niklas Cassel <cassel@kernel.org>

Thanks!

Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Damien Le Moal Feb. 8, 2024, 11:33 p.m. UTC | #2
On 2/7/24 18:10, Niklas Cassel wrote:
> The ahci_intel_pcs_quirk is unconventional in several ways:
> First of all because it has a board ID for which the quirk should NOT be
> applied (board_ahci_pcs7), instead of the usual way where we have a board
> ID for which the quirk should be applied.
> 
> The second reason is that other than only excluding board_ahci_pcs7 from
> the quirk, PCI devices that make use of the generic entry in ahci_pci_tbl
> (which matches on AHCI class code) are also excluded.
> 
> This can of course lead to very subtle breakage, and did indeed do so in:
> commit 104ff59af73a ("ata: ahci: Add Tiger Lake UP{3,4} AHCI controller"),
> which added an explicit entry with board_ahci_low_power to ahci_pci_tbl.
> 
> This caused many users to complain that their SATA drives disappeared.
> The logical assumption was of course that the issue was related to LPM,
> and was therefore reverted in commit 6210038aeaf4 ("ata: ahci: Revert
> "ata: ahci: Add Tiger Lake UP{3,4} AHCI controller"").
> 
> It took a lot of time to figure out that this was all completely unrelated
> to LPM, and was instead caused by an unconventional Intel quirk.
> 
> While this quirk should definitely be cleaned up to be implemented like
> all other quirks, for now, at least document the behavior of this quirk.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217114
> Signed-off-by: Niklas Cassel <cassel@kernel.org>
> ---
>  drivers/ata/ahci.c | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> index da2e74fce2d9..122278438092 100644
> --- a/drivers/ata/ahci.c
> +++ b/drivers/ata/ahci.c
> @@ -69,8 +69,8 @@ enum board_ids {
>  	board_ahci_vt8251,
>  
>  	/*
> -	 * board IDs for Intel chipsets that support more than 6 ports
> -	 * *and* end up needing the PCS quirk.
> +	 * board IDs for Intel chipsets that should NOT have the
> +	 * ahci_intel_pcs_quirk applied. Yes, this is not a typo.
>  	 */
>  	board_ahci_pcs7,
>  
> @@ -1670,14 +1670,33 @@ static void ahci_update_initial_lpm_policy(struct ata_port *ap,
>  		ap->target_lpm_policy = policy;
>  }
>  
> +/*
> + * NOTE: this quirk is applied for all board IDs in ahci_pci_tbl, where
> + * the PCI vendor ID == PCI_VENDOR_ID_INTEL (except for board_ahci_pcs7).
> + *
> + * This quirk causes some Intel AHCI controllers (e.g. Intel Tiger Lake)
> + * to not get a link up when Intel VMD is enabled, see:
> + * https://bugzilla.kernel.org/show_bug.cgi?id=217114
> + *
> + * Since the quirk is only applied for explicit entries in ahci_pci_tbl
> + * (it does not apply to the generic entry in ahci_pci_tbl that matches on
> + * AHCI class code), if your Intel AHCI controller does not get a link up
> + * because of this quirk, try to remove the explicit entry from ahci_pci_tbl.
> + */
>  static void ahci_intel_pcs_quirk(struct pci_dev *pdev, struct ahci_host_priv *hpriv)
>  {
> -	const struct pci_device_id *id = pci_match_id(ahci_pci_tbl, pdev);
> +	const struct pci_device_id *id;
>  	u16 tmp16;
>  
> +	/* If the detected PCI device is not an Intel device, skip. */
> +	if (pdev->vendor != PCI_VENDOR_ID_INTEL)
> +		return;

Can't we also skip TigerLake ?

> +
>  	/*
> -	 * Only apply the 6-port PCS quirk for known legacy platforms.
> +	 * See if there is an explicit entry for this PCI device in
> +	 * ahci_pci_tbl, if there is not, do not apply the quirk.
>  	 */
> +	id = pci_match_id(ahci_pci_tbl, pdev);
>  	if (!id || id->vendor != PCI_VENDOR_ID_INTEL)
>  		return;
>
Dan Williams Feb. 9, 2024, 12:58 a.m. UTC | #3
Niklas Cassel wrote:
> The ahci_intel_pcs_quirk is unconventional in several ways:
> First of all because it has a board ID for which the quirk should NOT be
> applied (board_ahci_pcs7), instead of the usual way where we have a board
> ID for which the quirk should be applied.
> 
> The second reason is that other than only excluding board_ahci_pcs7 from
> the quirk, PCI devices that make use of the generic entry in ahci_pci_tbl
> (which matches on AHCI class code) are also excluded.
> 
> This can of course lead to very subtle breakage, and did indeed do so in:
> commit 104ff59af73a ("ata: ahci: Add Tiger Lake UP{3,4} AHCI controller"),
> which added an explicit entry with board_ahci_low_power to ahci_pci_tbl.
> 
> This caused many users to complain that their SATA drives disappeared.
> The logical assumption was of course that the issue was related to LPM,
> and was therefore reverted in commit 6210038aeaf4 ("ata: ahci: Revert
> "ata: ahci: Add Tiger Lake UP{3,4} AHCI controller"").
> 
> It took a lot of time to figure out that this was all completely unrelated
> to LPM, and was instead caused by an unconventional Intel quirk.

Ugh, sorry about that.

> 
> While this quirk should definitely be cleaned up to be implemented like
> all other quirks, for now, at least document the behavior of this quirk.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217114
> Signed-off-by: Niklas Cassel <cassel@kernel.org>
> ---
>  drivers/ata/ahci.c | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> index da2e74fce2d9..122278438092 100644
> --- a/drivers/ata/ahci.c
> +++ b/drivers/ata/ahci.c
> @@ -69,8 +69,8 @@ enum board_ids {
>  	board_ahci_vt8251,
>  
>  	/*
> -	 * board IDs for Intel chipsets that support more than 6 ports
> -	 * *and* end up needing the PCS quirk.
> +	 * board IDs for Intel chipsets that should NOT have the
> +	 * ahci_intel_pcs_quirk applied. Yes, this is not a typo.

I am not sure if this wording helps the next person without the context
of this patch. How about this?

/*
 * NOTE NOTE NOTE this board id is identifying a point in history where
 * the "PCS Quirk" from 2007 should *stop* being applied to more modern
 * platforms. So this is a "stop-quirk" point and board_ids >= to this
 * value do not run the quirk, see ahci_intel_pcs_quirk() for details.
 */


>  	 */
>  	board_ahci_pcs7,
>  
> @@ -1670,14 +1670,33 @@ static void ahci_update_initial_lpm_policy(struct ata_port *ap,
>  		ap->target_lpm_policy = policy;
>  }
>  
> +/*
> + * NOTE: this quirk is applied for all board IDs in ahci_pci_tbl, where
> + * the PCI vendor ID == PCI_VENDOR_ID_INTEL (except for board_ahci_pcs7).
> + *
> + * This quirk causes some Intel AHCI controllers (e.g. Intel Tiger Lake)
> + * to not get a link up when Intel VMD is enabled, see:
> + * https://bugzilla.kernel.org/show_bug.cgi?id=217114
> + *
> + * Since the quirk is only applied for explicit entries in ahci_pci_tbl
> + * (it does not apply to the generic entry in ahci_pci_tbl that matches on
> + * AHCI class code), if your Intel AHCI controller does not get a link up
> + * because of this quirk, try to remove the explicit entry from ahci_pci_tbl.
> + */
>  static void ahci_intel_pcs_quirk(struct pci_dev *pdev, struct ahci_host_priv *hpriv)
>  {
> -	const struct pci_device_id *id = pci_match_id(ahci_pci_tbl, pdev);
> +	const struct pci_device_id *id;
>  	u16 tmp16;
>  
> +	/* If the detected PCI device is not an Intel device, skip. */
> +	if (pdev->vendor != PCI_VENDOR_ID_INTEL)
> +		return;
> +
>  	/*
> -	 * Only apply the 6-port PCS quirk for known legacy platforms.
> +	 * See if there is an explicit entry for this PCI device in

Perhaps:

s/explicit entry/explicit Intel-vendor entry/

?

> +	 * ahci_pci_tbl, if there is not, do not apply the quirk.
>  	 */
> +	id = pci_match_id(ahci_pci_tbl, pdev);
>  	if (!id || id->vendor != PCI_VENDOR_ID_INTEL)

Otherwise you can add:

Acked-by: Dan Williams <dan.j.williams@intel.com>
Niklas Cassel Feb. 9, 2024, 1:26 p.m. UTC | #4
On Thu, Feb 08, 2024 at 04:58:24PM -0800, Dan Williams wrote:
> Niklas Cassel wrote:
> > The ahci_intel_pcs_quirk is unconventional in several ways:
> > First of all because it has a board ID for which the quirk should NOT be
> > applied (board_ahci_pcs7), instead of the usual way where we have a board
> > ID for which the quirk should be applied.
> > 
> > The second reason is that other than only excluding board_ahci_pcs7 from
> > the quirk, PCI devices that make use of the generic entry in ahci_pci_tbl
> > (which matches on AHCI class code) are also excluded.
> > 
> > This can of course lead to very subtle breakage, and did indeed do so in:
> > commit 104ff59af73a ("ata: ahci: Add Tiger Lake UP{3,4} AHCI controller"),
> > which added an explicit entry with board_ahci_low_power to ahci_pci_tbl.
> > 
> > This caused many users to complain that their SATA drives disappeared.
> > The logical assumption was of course that the issue was related to LPM,
> > and was therefore reverted in commit 6210038aeaf4 ("ata: ahci: Revert
> > "ata: ahci: Add Tiger Lake UP{3,4} AHCI controller"").
> > 
> > It took a lot of time to figure out that this was all completely unrelated
> > to LPM, and was instead caused by an unconventional Intel quirk.
> 
> Ugh, sorry about that.

Out of most bad things come something good :)

I've been reading up on LPM in SATA + AHCI specs because of this,
so some of the patches related to LPM probably wouldn't have been
submitted if it wasn't for this :)


> 
> > 
> > While this quirk should definitely be cleaned up to be implemented like
> > all other quirks, for now, at least document the behavior of this quirk.
> > 
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217114
> > Signed-off-by: Niklas Cassel <cassel@kernel.org>
> > ---
> >  drivers/ata/ahci.c | 27 +++++++++++++++++++++++----
> >  1 file changed, 23 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> > index da2e74fce2d9..122278438092 100644
> > --- a/drivers/ata/ahci.c
> > +++ b/drivers/ata/ahci.c
> > @@ -69,8 +69,8 @@ enum board_ids {
> >  	board_ahci_vt8251,
> >  
> >  	/*
> > -	 * board IDs for Intel chipsets that support more than 6 ports
> > -	 * *and* end up needing the PCS quirk.
> > +	 * board IDs for Intel chipsets that should NOT have the
> > +	 * ahci_intel_pcs_quirk applied. Yes, this is not a typo.
> 
> I am not sure if this wording helps the next person without the context
> of this patch. How about this?
> 
> /*
>  * NOTE NOTE NOTE this board id is identifying a point in history where
>  * the "PCS Quirk" from 2007 should *stop* being applied to more modern
>  * platforms. So this is a "stop-quirk" point and board_ids >= to this
>  * value do not run the quirk, see ahci_intel_pcs_quirk() for details.
>  */

The amout of comments needed to describe how "out of the ordinary"
something is, is probably a sign that you should just do a proper
cleanup instead, so that is what I decided to do:
https://lore.kernel.org/linux-ide/20240209130307.39113-1-cassel@kernel.org/

So feel free to review that one instead :)


Kind regards,
Niklas
diff mbox series

Patch

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index da2e74fce2d9..122278438092 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -69,8 +69,8 @@  enum board_ids {
 	board_ahci_vt8251,
 
 	/*
-	 * board IDs for Intel chipsets that support more than 6 ports
-	 * *and* end up needing the PCS quirk.
+	 * board IDs for Intel chipsets that should NOT have the
+	 * ahci_intel_pcs_quirk applied. Yes, this is not a typo.
 	 */
 	board_ahci_pcs7,
 
@@ -1670,14 +1670,33 @@  static void ahci_update_initial_lpm_policy(struct ata_port *ap,
 		ap->target_lpm_policy = policy;
 }
 
+/*
+ * NOTE: this quirk is applied for all board IDs in ahci_pci_tbl, where
+ * the PCI vendor ID == PCI_VENDOR_ID_INTEL (except for board_ahci_pcs7).
+ *
+ * This quirk causes some Intel AHCI controllers (e.g. Intel Tiger Lake)
+ * to not get a link up when Intel VMD is enabled, see:
+ * https://bugzilla.kernel.org/show_bug.cgi?id=217114
+ *
+ * Since the quirk is only applied for explicit entries in ahci_pci_tbl
+ * (it does not apply to the generic entry in ahci_pci_tbl that matches on
+ * AHCI class code), if your Intel AHCI controller does not get a link up
+ * because of this quirk, try to remove the explicit entry from ahci_pci_tbl.
+ */
 static void ahci_intel_pcs_quirk(struct pci_dev *pdev, struct ahci_host_priv *hpriv)
 {
-	const struct pci_device_id *id = pci_match_id(ahci_pci_tbl, pdev);
+	const struct pci_device_id *id;
 	u16 tmp16;
 
+	/* If the detected PCI device is not an Intel device, skip. */
+	if (pdev->vendor != PCI_VENDOR_ID_INTEL)
+		return;
+
 	/*
-	 * Only apply the 6-port PCS quirk for known legacy platforms.
+	 * See if there is an explicit entry for this PCI device in
+	 * ahci_pci_tbl, if there is not, do not apply the quirk.
 	 */
+	id = pci_match_id(ahci_pci_tbl, pdev);
 	if (!id || id->vendor != PCI_VENDOR_ID_INTEL)
 		return;