mbox series

[v3,0/3] Add dw_mmc support for rk3576

Message ID 20240814223555.3695-1-detlev.casanova@collabora.com
Headers show
Series Add dw_mmc support for rk3576 | expand

Message

Detlev Casanova Aug. 14, 2024, 10:33 p.m. UTC
The SD card controller on the rk3576 SoC uses a new tuning version that is
capable of using pre-boot tuning information.

Also, it stores the phase settings into the dw_mmc controller, so the code
has to be adapted to implement that.

Changes since v2:
- Drop rockchip,v2-tuning and use compatible-based detection
- Fix coding style

Changes since v1:
- Renamed use-v2-tuning to v2-tuning
- Rewrite v2-tuning description as the hardware feature

Detlev.

Detlev Casanova (1):
  dt-bindings: mmc: Add support for rk3576 dw-mshc

Shawn Lin (2):
  mmc: dw_mmc-rockchip: Add v2 tuning support
  mmc: dw_mmc-rockchip: Add internal phase support

 .../bindings/mmc/rockchip-dw-mshc.yaml        |   1 +
 drivers/mmc/host/dw_mmc-rockchip.c            | 227 +++++++++++++++++-
 2 files changed, 217 insertions(+), 11 deletions(-)

Comments

Shawn Lin Aug. 15, 2024, 12:55 a.m. UTC | #1
Hi Detlev

在 2024/8/15 6:34, Detlev Casanova 写道:
> From: Shawn Lin <shawn.lin@rock-chips.com>
> 
> v2 tuning will inherit pre-stage loader's phase settings for the first
> time, and do re-tune if necessary.
> Re-tune will still try the rough degrees, for instance, 90, 180, 270,
> 360 but continue to do the fine tuning if sample window isn't good
> enough.
> 
> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
> Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
> ---
>   drivers/mmc/host/dw_mmc-rockchip.c | 49 ++++++++++++++++++++++++++++++
>   1 file changed, 49 insertions(+)
> 
> diff --git a/drivers/mmc/host/dw_mmc-rockchip.c b/drivers/mmc/host/dw_mmc-rockchip.c
> index b07190ba4b7ac..367633f4e8892 100644
> --- a/drivers/mmc/host/dw_mmc-rockchip.c
> +++ b/drivers/mmc/host/dw_mmc-rockchip.c
> @@ -24,6 +24,8 @@ struct dw_mci_rockchip_priv_data {
>   	struct clk		*sample_clk;
>   	int			default_sample_phase;
>   	int			num_phases;
> +	bool			use_v2_tuning;
> +	int			last_degree;
>   };
>   
>   static void dw_mci_rk3288_set_ios(struct dw_mci *host, struct mmc_ios *ios)
> @@ -134,6 +136,42 @@ static void dw_mci_rk3288_set_ios(struct dw_mci *host, struct mmc_ios *ios)
>   #define TUNING_ITERATION_TO_PHASE(i, num_phases) \
>   		(DIV_ROUND_UP((i) * 360, num_phases))
>   
> +static int dw_mci_v2_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
> +{
> +	struct dw_mci *host = slot->host;
> +	struct dw_mci_rockchip_priv_data *priv = host->priv;
> +	struct mmc_host *mmc = slot->mmc;
> +	u32 degrees[4] = {90, 180, 270, 360};
> +	int i;
> +	static bool inherit = true;
> +
> +	if (inherit) {
> +		inherit = false;
> +		i = clk_get_phase(priv->sample_clk) / 90 - 1;
> +		goto done;
> +	}
> +
> +	/* v2 only support 4 degrees in theory */
> +	for (i = 0; i < ARRAY_SIZE(degrees); i++) {
> +		if (degrees[i] == priv->last_degree)
> +			continue;
> +
> +		clk_set_phase(priv->sample_clk, degrees[i]);
> +		if (!mmc_send_tuning(mmc, opcode, NULL))
> +			break;
> +	}
> +
> +	if (i == ARRAY_SIZE(degrees)) {
> +		dev_warn(host->dev, "All phases bad!");
> +		return -EIO;
> +	}
> +
> +done:
> +	dev_info(host->dev, "Successfully tuned phase to %d\n", degrees[i]);
> +	priv->last_degree = degrees[i];
> +	return 0;
> +}
> +
>   static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
>   {
>   	struct dw_mci *host = slot->host;
> @@ -157,6 +195,13 @@ static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
>   		return -EIO;
>   	}
>   
> +	if (priv->use_v2_tuning) {
> +		ret = dw_mci_v2_execute_tuning(slot, opcode);
> +		if (!ret)
> +			return 0;
> +		/* Otherwise we continue using fine tuning */
> +	}
> +
>   	ranges = kmalloc_array(priv->num_phases / 2 + 1,
>   			       sizeof(*ranges), GFP_KERNEL);
>   	if (!ranges)
> @@ -277,6 +322,10 @@ static int dw_mci_rk3288_parse_dt(struct dw_mci *host)
>   					&priv->default_sample_phase))
>   		priv->default_sample_phase = 0;
>   
> +	priv->use_v2_tuning =
> +		of_device_is_compatible(host->dev->of_node,
> +					"rockchip,rk3576-dw-mshc");
> +

v2 is a kind of software decision instead of hardware dependency.
So in theory, any SoC can claim to use it via DT.

>   	priv->drv_clk = devm_clk_get(host->dev, "ciu-drive");
>   	if (IS_ERR(priv->drv_clk))
>   		dev_dbg(host->dev, "ciu-drive not available\n");
Shawn Lin Aug. 15, 2024, 12:57 a.m. UTC | #2
Hi Detlev

在 2024/8/15 6:34, Detlev Casanova 写道:
> From: Shawn Lin <shawn.lin@rock-chips.com>
> 
> Some Rockchip devices put the phase settings into the dw_mmc controller.
> 
> The feature is implemented in devices where the USRID register contains
> 0x20230002.
> 

Thanks for helping upstream it. USRID is 0x20230001 actually, so commit
msg should be amended. Otherwise,

Acked-by: Shawn Lin <shawn.lin@rock-chips.com>

> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
> Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
> ---
>   drivers/mmc/host/dw_mmc-rockchip.c | 184 ++++++++++++++++++++++++++---
>   1 file changed, 170 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/mmc/host/dw_mmc-rockchip.c b/drivers/mmc/host/dw_mmc-rockchip.c
> index 367633f4e8892..03e25a8b8a305 100644
> --- a/drivers/mmc/host/dw_mmc-rockchip.c
> +++ b/drivers/mmc/host/dw_mmc-rockchip.c
> @@ -16,6 +16,17 @@
>   #include "dw_mmc-pltfm.h"
>   
>   #define RK3288_CLKGEN_DIV	2
> +#define USRID_INTER_PHASE	0x20230001
> +#define SDMMC_TIMING_CON0	0x130
> +#define SDMMC_TIMING_CON1	0x134
> +#define ROCKCHIP_MMC_DELAY_SEL BIT(10)
> +#define ROCKCHIP_MMC_DEGREE_MASK 0x3
> +#define ROCKCHIP_MMC_DELAYNUM_OFFSET 2
> +#define ROCKCHIP_MMC_DELAYNUM_MASK (0xff << ROCKCHIP_MMC_DELAYNUM_OFFSET)
> +#define PSECS_PER_SEC 1000000000000LL
> +#define ROCKCHIP_MMC_DELAY_ELEMENT_PSEC 60
> +#define HIWORD_UPDATE(val, mask, shift) \
> +		((val) << (shift) | (mask) << ((shift) + 16))
>   
>   static const unsigned int freqs[] = { 100000, 200000, 300000, 400000 };
>   
> @@ -25,9 +36,121 @@ struct dw_mci_rockchip_priv_data {
>   	int			default_sample_phase;
>   	int			num_phases;
>   	bool			use_v2_tuning;
> +	int			usrid;
>   	int			last_degree;
>   };
>   
> +/*
> + * Each fine delay is between 44ps-77ps. Assume each fine delay is 60ps to
> + * simplify calculations. So 45degs could be anywhere between 33deg and 57.8deg.
> + */
> +static int rockchip_mmc_get_phase(struct dw_mci *host, bool sample)
> +{
> +	unsigned long rate = clk_get_rate(host->ciu_clk);
> +	u32 raw_value;
> +	u16 degrees;
> +	u32 delay_num = 0;
> +
> +	/* Constant signal, no measurable phase shift */
> +	if (!rate)
> +		return 0;
> +
> +	if (sample)
> +		raw_value = mci_readl(host, TIMING_CON1) >> 1;
> +	else
> +		raw_value = mci_readl(host, TIMING_CON0) >> 1;
> +
> +	degrees = (raw_value & ROCKCHIP_MMC_DEGREE_MASK) * 90;
> +
> +	if (raw_value & ROCKCHIP_MMC_DELAY_SEL) {
> +		/* degrees/delaynum * 1000000 */
> +		unsigned long factor = (ROCKCHIP_MMC_DELAY_ELEMENT_PSEC / 10) *
> +					36 * (rate / 10000);
> +
> +		delay_num = (raw_value & ROCKCHIP_MMC_DELAYNUM_MASK);
> +		delay_num >>= ROCKCHIP_MMC_DELAYNUM_OFFSET;
> +		degrees += DIV_ROUND_CLOSEST(delay_num * factor, 1000000);
> +	}
> +
> +	return degrees % 360;
> +}
> +
> +static int rockchip_mmc_set_phase(struct dw_mci *host, bool sample, int degrees)
> +{
> +	unsigned long rate = clk_get_rate(host->ciu_clk);
> +	u8 nineties, remainder;
> +	u8 delay_num;
> +	u32 raw_value;
> +	u32 delay;
> +
> +	/*
> +	 * The below calculation is based on the output clock from
> +	 * MMC host to the card, which expects the phase clock inherits
> +	 * the clock rate from its parent, namely the output clock
> +	 * provider of MMC host. However, things may go wrong if
> +	 * (1) It is orphan.
> +	 * (2) It is assigned to the wrong parent.
> +	 *
> +	 * This check help debug the case (1), which seems to be the
> +	 * most likely problem we often face and which makes it difficult
> +	 * for people to debug unstable mmc tuning results.
> +	 */
> +	if (!rate) {
> +		dev_err(host->dev, "%s: invalid clk rate\n", __func__);
> +		return -EINVAL;
> +	}
> +
> +	nineties = degrees / 90;
> +	remainder = (degrees % 90);
> +
> +	/*
> +	 * Due to the inexact nature of the "fine" delay, we might
> +	 * actually go non-monotonic.  We don't go _too_ monotonic
> +	 * though, so we should be OK.  Here are options of how we may
> +	 * work:
> +	 *
> +	 * Ideally we end up with:
> +	 *   1.0, 2.0, ..., 69.0, 70.0, ...,  89.0, 90.0
> +	 *
> +	 * On one extreme (if delay is actually 44ps):
> +	 *   .73, 1.5, ..., 50.6, 51.3, ...,  65.3, 90.0
> +	 * The other (if delay is actually 77ps):
> +	 *   1.3, 2.6, ..., 88.6. 89.8, ..., 114.0, 90
> +	 *
> +	 * It's possible we might make a delay that is up to 25
> +	 * degrees off from what we think we're making.  That's OK
> +	 * though because we should be REALLY far from any bad range.
> +	 */
> +
> +	/*
> +	 * Convert to delay; do a little extra work to make sure we
> +	 * don't overflow 32-bit / 64-bit numbers.
> +	 */
> +	delay = 10000000; /* PSECS_PER_SEC / 10000 / 10 */
> +	delay *= remainder;
> +	delay = DIV_ROUND_CLOSEST(delay,
> +			(rate / 1000) * 36 *
> +				(ROCKCHIP_MMC_DELAY_ELEMENT_PSEC / 10));
> +
> +	delay_num = (u8) min_t(u32, delay, 255);
> +
> +	raw_value = delay_num ? ROCKCHIP_MMC_DELAY_SEL : 0;
> +	raw_value |= delay_num << ROCKCHIP_MMC_DELAYNUM_OFFSET;
> +	raw_value |= nineties;
> +
> +	if (sample)
> +		mci_writel(host, TIMING_CON1, HIWORD_UPDATE(raw_value, 0x07ff, 1));
> +	else
> +		mci_writel(host, TIMING_CON0, HIWORD_UPDATE(raw_value, 0x07ff, 1));
> +
> +	dev_dbg(host->dev, "set %s_phase(%d) delay_nums=%u actual_degrees=%d\n",
> +		sample ? "sample" : "drv", degrees, delay_num,
> +		rockchip_mmc_get_phase(host, sample)
> +	);
> +
> +	return 0;
> +}
> +
>   static void dw_mci_rk3288_set_ios(struct dw_mci *host, struct mmc_ios *ios)
>   {
>   	struct dw_mci_rockchip_priv_data *priv = host->priv;
> @@ -65,8 +188,12 @@ static void dw_mci_rk3288_set_ios(struct dw_mci *host, struct mmc_ios *ios)
>   	}
>   
>   	/* Make sure we use phases which we can enumerate with */
> -	if (!IS_ERR(priv->sample_clk) && ios->timing <= MMC_TIMING_SD_HS)
> -		clk_set_phase(priv->sample_clk, priv->default_sample_phase);
> +	if (!IS_ERR(priv->sample_clk) && ios->timing <= MMC_TIMING_SD_HS) {
> +		if (priv->usrid == USRID_INTER_PHASE)
> +			rockchip_mmc_set_phase(host, true, priv->default_sample_phase);
> +		else
> +			clk_set_phase(priv->sample_clk, priv->default_sample_phase);
> +	}
>   
>   	/*
>   	 * Set the drive phase offset based on speed mode to achieve hold times.
> @@ -129,7 +256,10 @@ static void dw_mci_rk3288_set_ios(struct dw_mci *host, struct mmc_ios *ios)
>   			break;
>   		}
>   
> -		clk_set_phase(priv->drv_clk, phase);
> +		if (priv->usrid == USRID_INTER_PHASE)
> +			rockchip_mmc_set_phase(host, false, phase);
> +		else
> +			clk_set_phase(priv->drv_clk, phase);
>   	}
>   }
>   
> @@ -141,13 +271,16 @@ static int dw_mci_v2_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
>   	struct dw_mci *host = slot->host;
>   	struct dw_mci_rockchip_priv_data *priv = host->priv;
>   	struct mmc_host *mmc = slot->mmc;
> -	u32 degrees[4] = {90, 180, 270, 360};
> +	u32 degree, degrees[4] = {90, 180, 270, 360};
>   	int i;
>   	static bool inherit = true;
>   
>   	if (inherit) {
>   		inherit = false;
> -		i = clk_get_phase(priv->sample_clk) / 90 - 1;
> +		if (priv->usrid == USRID_INTER_PHASE)
> +			i = rockchip_mmc_get_phase(host, true) / 90;
> +		else
> +			i = clk_get_phase(priv->sample_clk) / 90 - 1;
>   		goto done;
>   	}
>   
> @@ -156,7 +289,11 @@ static int dw_mci_v2_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
>   		if (degrees[i] == priv->last_degree)
>   			continue;
>   
> -		clk_set_phase(priv->sample_clk, degrees[i]);
> +		degree = (degrees[i] + priv->last_degree + 90) % 360;
> +		if (priv->usrid == USRID_INTER_PHASE)
> +			rockchip_mmc_set_phase(host, true, degree);
> +		else
> +			clk_set_phase(priv->sample_clk, degree);
>   		if (!mmc_send_tuning(mmc, opcode, NULL))
>   			break;
>   	}
> @@ -189,6 +326,7 @@ static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
>   	int longest_range_len = -1;
>   	int longest_range = -1;
>   	int middle_phase;
> +	int phase;
>   
>   	if (IS_ERR(priv->sample_clk)) {
>   		dev_err(host->dev, "Tuning clock (sample_clk) not defined.\n");
> @@ -209,8 +347,15 @@ static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
>   
>   	/* Try each phase and extract good ranges */
>   	for (i = 0; i < priv->num_phases; ) {
> -		clk_set_phase(priv->sample_clk,
> -			      TUNING_ITERATION_TO_PHASE(i, priv->num_phases));
> +		/* Cannot guarantee any phases larger than 270 would work well */
> +		if (TUNING_ITERATION_TO_PHASE(i, priv->num_phases) > 270)
> +			break;
> +		if (priv->usrid == USRID_INTER_PHASE)
> +			rockchip_mmc_set_phase(host, true,
> +				TUNING_ITERATION_TO_PHASE(i, priv->num_phases));
> +		else
> +			clk_set_phase(priv->sample_clk,
> +				TUNING_ITERATION_TO_PHASE(i, priv->num_phases));
>   
>   		v = !mmc_send_tuning(mmc, opcode, NULL);
>   
> @@ -256,7 +401,10 @@ static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
>   	}
>   
>   	if (ranges[0].start == 0 && ranges[0].end == priv->num_phases - 1) {
> -		clk_set_phase(priv->sample_clk, priv->default_sample_phase);
> +		if (priv->usrid == USRID_INTER_PHASE)
> +			rockchip_mmc_set_phase(host, true, priv->default_sample_phase);
> +		else
> +			clk_set_phase(priv->sample_clk, priv->default_sample_phase);
>   		dev_info(host->dev, "All phases work, using default phase %d.",
>   			 priv->default_sample_phase);
>   		goto free;
> @@ -293,12 +441,13 @@ static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
>   
>   	middle_phase = ranges[longest_range].start + longest_range_len / 2;
>   	middle_phase %= priv->num_phases;
> -	dev_info(host->dev, "Successfully tuned phase to %d\n",
> -		 TUNING_ITERATION_TO_PHASE(middle_phase, priv->num_phases));
> +	phase = TUNING_ITERATION_TO_PHASE(middle_phase, priv->num_phases);
> +	dev_info(host->dev, "Successfully tuned phase to %d\n", phase);
>   
> -	clk_set_phase(priv->sample_clk,
> -		      TUNING_ITERATION_TO_PHASE(middle_phase,
> -						priv->num_phases));
> +	if (priv->usrid == USRID_INTER_PHASE)
> +		rockchip_mmc_set_phase(host, true, phase);
> +	else
> +		clk_set_phase(priv->sample_clk, phase);
>   
>   free:
>   	kfree(ranges);
> @@ -342,6 +491,7 @@ static int dw_mci_rk3288_parse_dt(struct dw_mci *host)
>   static int dw_mci_rockchip_init(struct dw_mci *host)
>   {
>   	int ret, i;
> +	struct dw_mci_rockchip_priv_data *priv = host->priv;
>   
>   	/* It is slot 8 on Rockchip SoCs */
>   	host->sdio_id0 = 8;
> @@ -365,6 +515,12 @@ static int dw_mci_rockchip_init(struct dw_mci *host)
>   			dev_warn(host->dev, "no valid minimum freq: %d\n", ret);
>   	}
>   
> +	priv->usrid = mci_readl(host, USRID);
> +	if (priv->usrid == USRID_INTER_PHASE) {
> +		priv->sample_clk = NULL;
> +		priv->drv_clk = NULL;
> +	}
> +
>   	return 0;
>   }
>
Heiko Stübner Aug. 15, 2024, 1:17 p.m. UTC | #3
Am Donnerstag, 15. August 2024, 02:55:37 CEST schrieb Shawn Lin:
> Hi Detlev
> 
> 在 2024/8/15 6:34, Detlev Casanova 写道:
> > From: Shawn Lin <shawn.lin@rock-chips.com>
> > 
> > v2 tuning will inherit pre-stage loader's phase settings for the first
> > time, and do re-tune if necessary.
> > Re-tune will still try the rough degrees, for instance, 90, 180, 270,
> > 360 but continue to do the fine tuning if sample window isn't good
> > enough.
> > 
> > Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
> > Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>

> > @@ -277,6 +322,10 @@ static int dw_mci_rk3288_parse_dt(struct dw_mci *host)
> >   					&priv->default_sample_phase))
> >   		priv->default_sample_phase = 0;
> >   
> > +	priv->use_v2_tuning =
> > +		of_device_is_compatible(host->dev->of_node,
> > +					"rockchip,rk3576-dw-mshc");
> > +
> 
> v2 is a kind of software decision instead of hardware dependency.
> So in theory, any SoC can claim to use it via DT.

which actually makes it unsuitable for dt.

Devicetree describes hardware-properties and should _not_ be used for
software configuration.

From the comment above, I assume the rk3576 does not need that feature
and can just work with the regular tuning?

So there are two routes for the immediate future:
(1) rk3576 _needs_ that feature, then going with the compatible is fine

(2) rk3576 does not need absolutely need that feature, then I'd expect
the basic rk3576 to first come without, as I'd expect a lot more explanation
on why it is actually needed, and which cases it does improve.
The commit message does not really explain that much about why this
is a great/needed feature and which areas it does improve.


Heiko
Detlev Casanova Aug. 15, 2024, 1:23 p.m. UTC | #4
On Wednesday, 14 August 2024 20:55:37 EDT Shawn Lin wrote:
> Hi Detlev
> 
> 在 2024/8/15 6:34, Detlev Casanova 写道:
> > From: Shawn Lin <shawn.lin@rock-chips.com>
> > 
> > v2 tuning will inherit pre-stage loader's phase settings for the first
> > time, and do re-tune if necessary.
> > Re-tune will still try the rough degrees, for instance, 90, 180, 270,
> > 360 but continue to do the fine tuning if sample window isn't good
> > enough.
> > 
> > Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
> > Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
> > ---
> > 
> >   drivers/mmc/host/dw_mmc-rockchip.c | 49 ++++++++++++++++++++++++++++++
> >   1 file changed, 49 insertions(+)
> > 
> > diff --git a/drivers/mmc/host/dw_mmc-rockchip.c
> > b/drivers/mmc/host/dw_mmc-rockchip.c index b07190ba4b7ac..367633f4e8892
> > 100644
> > --- a/drivers/mmc/host/dw_mmc-rockchip.c
> > +++ b/drivers/mmc/host/dw_mmc-rockchip.c

[...]

> >   		
> >   		priv->default_sample_phase = 0;
> > 
> > +	priv->use_v2_tuning =
> > +		of_device_is_compatible(host->dev->of_node,
> > +					"rockchip,rk3576-dw-
mshc");
> > +
> 
> v2 is a kind of software decision instead of hardware dependency.
> So in theory, any SoC can claim to use it via DT.

Yes but from my tests, only rk3576 won't work without it. So it makes sense to 
only use v2 for this SoC (and other future ones not supported yet)

> 
> >   	priv->drv_clk = devm_clk_get(host->dev, "ciu-drive");
> >   	if (IS_ERR(priv->drv_clk))
> >   	
> >   		dev_dbg(host->dev, "ciu-drive not available\n");


Detlev.
Heiko Stübner Aug. 15, 2024, 1:52 p.m. UTC | #5
Am Donnerstag, 15. August 2024, 00:34:02 CEST schrieb Detlev Casanova:
> From: Shawn Lin <shawn.lin@rock-chips.com>
> 
> Some Rockchip devices put the phase settings into the dw_mmc controller.
> 
> The feature is implemented in devices where the USRID register contains
> 0x20230002.
> 
> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
> Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
> ---
>  drivers/mmc/host/dw_mmc-rockchip.c | 184 ++++++++++++++++++++++++++---
>  1 file changed, 170 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/mmc/host/dw_mmc-rockchip.c b/drivers/mmc/host/dw_mmc-rockchip.c
> index 367633f4e8892..03e25a8b8a305 100644
> --- a/drivers/mmc/host/dw_mmc-rockchip.c
> +++ b/drivers/mmc/host/dw_mmc-rockchip.c
> @@ -16,6 +16,17 @@
>  #include "dw_mmc-pltfm.h"
>  
>  #define RK3288_CLKGEN_DIV	2
> +#define USRID_INTER_PHASE	0x20230001
> +#define SDMMC_TIMING_CON0	0x130
> +#define SDMMC_TIMING_CON1	0x134
> +#define ROCKCHIP_MMC_DELAY_SEL BIT(10)
> +#define ROCKCHIP_MMC_DEGREE_MASK 0x3
> +#define ROCKCHIP_MMC_DELAYNUM_OFFSET 2

the delaynum offset in the register is 3, please don't encode how you
use constants below into the values. So the constants should reflect the
register setup


> +#define ROCKCHIP_MMC_DELAYNUM_MASK (0xff << ROCKCHIP_MMC_DELAYNUM_OFFSET)

please use one firm of mask values ROCKCHIP_MMC_DEGREE_MASK is used
for values _after_ they are shifted, while your DELAYNUM_MASK is vor values
_before_ they are shifted. Please decide on one form for both.


> +#define PSECS_PER_SEC 1000000000000LL

this is the second definition of this value, after clk/rockchip/clk-mmc-phase.c [0] .
The kernel already has a PSEC_PER_SEC definition in vdso/time64.h [1] and
thus linux/time64.h, so please re-use that - the innosilicon-dsi-phy
already does it too [2].

[0] https://elixir.bootlin.com/linux/v6.10.5/source/drivers/clk/rockchip/clk-mmc-phase.c#L37
[1] https://elixir.bootlin.com/linux/v6.10.5/source/include/vdso/time64.h#L11
[2] https://elixir.bootlin.com/linux/v6.11-rc3/source/drivers/phy/rockchip/phy-rockchip-inno-dsidphy.c#L415


> +#define ROCKCHIP_MMC_DELAY_ELEMENT_PSEC 60
> +#define HIWORD_UPDATE(val, mask, shift) \
> +		((val) << (shift) | (mask) << ((shift) + 16))
>  
>  static const unsigned int freqs[] = { 100000, 200000, 300000, 400000 };
>  
> @@ -25,9 +36,121 @@ struct dw_mci_rockchip_priv_data {
>  	int			default_sample_phase;
>  	int			num_phases;
>  	bool			use_v2_tuning;
> +	int			usrid;
>  	int			last_degree;
>  };
>  
> +/*
> + * Each fine delay is between 44ps-77ps. Assume each fine delay is 60ps to
> + * simplify calculations. So 45degs could be anywhere between 33deg and 57.8deg.
> + */
> +static int rockchip_mmc_get_phase(struct dw_mci *host, bool sample)
> +{
> +	unsigned long rate = clk_get_rate(host->ciu_clk);
> +	u32 raw_value;
> +	u16 degrees;
> +	u32 delay_num = 0;
> +
> +	/* Constant signal, no measurable phase shift */
> +	if (!rate)
> +		return 0;
> +
> +	if (sample)
> +		raw_value = mci_readl(host, TIMING_CON1) >> 1;
> +	else
> +		raw_value = mci_readl(host, TIMING_CON0) >> 1;

please define some sort of 
	#define ROCKCHIP_MMC_DEGREE_OFFSET	1 
above and use that constant instead of hard-coding the 1 shift

> +
> +	degrees = (raw_value & ROCKCHIP_MMC_DEGREE_MASK) * 90;
> +
> +	if (raw_value & ROCKCHIP_MMC_DELAY_SEL) {
> +		/* degrees/delaynum * 1000000 */
> +		unsigned long factor = (ROCKCHIP_MMC_DELAY_ELEMENT_PSEC / 10) *
> +					36 * (rate / 10000);
> +
> +		delay_num = (raw_value & ROCKCHIP_MMC_DELAYNUM_MASK);
> +		delay_num >>= ROCKCHIP_MMC_DELAYNUM_OFFSET;
> +		degrees += DIV_ROUND_CLOSEST(delay_num * factor, 1000000);
> +	}
> +
> +	return degrees % 360;
> +}
> +
> +static int rockchip_mmc_set_phase(struct dw_mci *host, bool sample, int degrees)
> +{
> +	unsigned long rate = clk_get_rate(host->ciu_clk);
> +	u8 nineties, remainder;
> +	u8 delay_num;
> +	u32 raw_value;
> +	u32 delay;
> +
> +	/*
> +	 * The below calculation is based on the output clock from
> +	 * MMC host to the card, which expects the phase clock inherits
> +	 * the clock rate from its parent, namely the output clock
> +	 * provider of MMC host. However, things may go wrong if
> +	 * (1) It is orphan.
> +	 * (2) It is assigned to the wrong parent.
> +	 *
> +	 * This check help debug the case (1), which seems to be the
> +	 * most likely problem we often face and which makes it difficult
> +	 * for people to debug unstable mmc tuning results.
> +	 */
> +	if (!rate) {
> +		dev_err(host->dev, "%s: invalid clk rate\n", __func__);
> +		return -EINVAL;
> +	}
> +
> +	nineties = degrees / 90;
> +	remainder = (degrees % 90);
> +
> +	/*
> +	 * Due to the inexact nature of the "fine" delay, we might
> +	 * actually go non-monotonic.  We don't go _too_ monotonic
> +	 * though, so we should be OK.  Here are options of how we may
> +	 * work:
> +	 *
> +	 * Ideally we end up with:
> +	 *   1.0, 2.0, ..., 69.0, 70.0, ...,  89.0, 90.0
> +	 *
> +	 * On one extreme (if delay is actually 44ps):
> +	 *   .73, 1.5, ..., 50.6, 51.3, ...,  65.3, 90.0
> +	 * The other (if delay is actually 77ps):
> +	 *   1.3, 2.6, ..., 88.6. 89.8, ..., 114.0, 90
> +	 *
> +	 * It's possible we might make a delay that is up to 25
> +	 * degrees off from what we think we're making.  That's OK
> +	 * though because we should be REALLY far from any bad range.
> +	 */
> +
> +	/*
> +	 * Convert to delay; do a little extra work to make sure we
> +	 * don't overflow 32-bit / 64-bit numbers.
> +	 */
> +	delay = 10000000; /* PSECS_PER_SEC / 10000 / 10 */
> +	delay *= remainder;
> +	delay = DIV_ROUND_CLOSEST(delay,
> +			(rate / 1000) * 36 *
> +				(ROCKCHIP_MMC_DELAY_ELEMENT_PSEC / 10));
> +
> +	delay_num = (u8) min_t(u32, delay, 255);
> +
> +	raw_value = delay_num ? ROCKCHIP_MMC_DELAY_SEL : 0;
> +	raw_value |= delay_num << ROCKCHIP_MMC_DELAYNUM_OFFSET;
> +	raw_value |= nineties;
> +
> +	if (sample)
> +		mci_writel(host, TIMING_CON1, HIWORD_UPDATE(raw_value, 0x07ff, 1));
> +	else
> +		mci_writel(host, TIMING_CON0, HIWORD_UPDATE(raw_value, 0x07ff, 1));
> +
> +	dev_dbg(host->dev, "set %s_phase(%d) delay_nums=%u actual_degrees=%d\n",
> +		sample ? "sample" : "drv", degrees, delay_num,
> +		rockchip_mmc_get_phase(host, sample)
> +	);
> +
> +	return 0;
> +}
> +
>  static void dw_mci_rk3288_set_ios(struct dw_mci *host, struct mmc_ios *ios)
>  {
>  	struct dw_mci_rockchip_priv_data *priv = host->priv;
> @@ -65,8 +188,12 @@ static void dw_mci_rk3288_set_ios(struct dw_mci *host, struct mmc_ios *ios)
>  	}
>  
>  	/* Make sure we use phases which we can enumerate with */
> -	if (!IS_ERR(priv->sample_clk) && ios->timing <= MMC_TIMING_SD_HS)
> -		clk_set_phase(priv->sample_clk, priv->default_sample_phase);
> +	if (!IS_ERR(priv->sample_clk) && ios->timing <= MMC_TIMING_SD_HS) {
> +		if (priv->usrid == USRID_INTER_PHASE)
> +			rockchip_mmc_set_phase(host, true, priv->default_sample_phase);
> +		else
> +			clk_set_phase(priv->sample_clk, priv->default_sample_phase);
> +	}
>  
>  	/*
>  	 * Set the drive phase offset based on speed mode to achieve hold times.
> @@ -129,7 +256,10 @@ static void dw_mci_rk3288_set_ios(struct dw_mci *host, struct mmc_ios *ios)
>  			break;
>  		}
>  
> -		clk_set_phase(priv->drv_clk, phase);
> +		if (priv->usrid == USRID_INTER_PHASE)
> +			rockchip_mmc_set_phase(host, false, phase);
> +		else
> +			clk_set_phase(priv->drv_clk, phase);
>  	}
>  }
>  
> @@ -141,13 +271,16 @@ static int dw_mci_v2_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
>  	struct dw_mci *host = slot->host;
>  	struct dw_mci_rockchip_priv_data *priv = host->priv;
>  	struct mmc_host *mmc = slot->mmc;
> -	u32 degrees[4] = {90, 180, 270, 360};
> +	u32 degree, degrees[4] = {90, 180, 270, 360};
>  	int i;
>  	static bool inherit = true;
>  
>  	if (inherit) {
>  		inherit = false;
> -		i = clk_get_phase(priv->sample_clk) / 90 - 1;
> +		if (priv->usrid == USRID_INTER_PHASE)
> +			i = rockchip_mmc_get_phase(host, true) / 90;
> +		else
> +			i = clk_get_phase(priv->sample_clk) / 90 - 1;
>  		goto done;
>  	}
>  
> @@ -156,7 +289,11 @@ static int dw_mci_v2_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
>  		if (degrees[i] == priv->last_degree)
>  			continue;
>  
> -		clk_set_phase(priv->sample_clk, degrees[i]);
> +		degree = (degrees[i] + priv->last_degree + 90) % 360;
> +		if (priv->usrid == USRID_INTER_PHASE)
> +			rockchip_mmc_set_phase(host, true, degree);
> +		else
> +			clk_set_phase(priv->sample_clk, degree);
>  		if (!mmc_send_tuning(mmc, opcode, NULL))
>  			break;
>  	}
> @@ -189,6 +326,7 @@ static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
>  	int longest_range_len = -1;
>  	int longest_range = -1;
>  	int middle_phase;
> +	int phase;
>  
>  	if (IS_ERR(priv->sample_clk)) {
>  		dev_err(host->dev, "Tuning clock (sample_clk) not defined.\n");
> @@ -209,8 +347,15 @@ static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
>  
>  	/* Try each phase and extract good ranges */
>  	for (i = 0; i < priv->num_phases; ) {
> -		clk_set_phase(priv->sample_clk,
> -			      TUNING_ITERATION_TO_PHASE(i, priv->num_phases));
> +		/* Cannot guarantee any phases larger than 270 would work well */
> +		if (TUNING_ITERATION_TO_PHASE(i, priv->num_phases) > 270)
> +			break;

this changes behaviour for the existing user as well.
Not say that this is incorrect, but limiting phases to a working area should be a separate change.

> +		if (priv->usrid == USRID_INTER_PHASE)
> +			rockchip_mmc_set_phase(host, true,
> +				TUNING_ITERATION_TO_PHASE(i, priv->num_phases));
> +		else
> +			clk_set_phase(priv->sample_clk,
> +				TUNING_ITERATION_TO_PHASE(i, priv->num_phases));
>  
>  		v = !mmc_send_tuning(mmc, opcode, NULL);
>  
> @@ -256,7 +401,10 @@ static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
>  	}
>  
>  	if (ranges[0].start == 0 && ranges[0].end == priv->num_phases - 1) {
> -		clk_set_phase(priv->sample_clk, priv->default_sample_phase);
> +		if (priv->usrid == USRID_INTER_PHASE)
> +			rockchip_mmc_set_phase(host, true, priv->default_sample_phase);
> +		else
> +			clk_set_phase(priv->sample_clk, priv->default_sample_phase);
>  		dev_info(host->dev, "All phases work, using default phase %d.",
>  			 priv->default_sample_phase);
>  		goto free;
> @@ -293,12 +441,13 @@ static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
>  
>  	middle_phase = ranges[longest_range].start + longest_range_len / 2;
>  	middle_phase %= priv->num_phases;
> -	dev_info(host->dev, "Successfully tuned phase to %d\n",
> -		 TUNING_ITERATION_TO_PHASE(middle_phase, priv->num_phases));
> +	phase = TUNING_ITERATION_TO_PHASE(middle_phase, priv->num_phases);
> +	dev_info(host->dev, "Successfully tuned phase to %d\n", phase);
>  
> -	clk_set_phase(priv->sample_clk,
> -		      TUNING_ITERATION_TO_PHASE(middle_phase,
> -						priv->num_phases));
> +	if (priv->usrid == USRID_INTER_PHASE)
> +		rockchip_mmc_set_phase(host, true, phase);
> +	else
> +		clk_set_phase(priv->sample_clk, phase);

with the change in compatible (not claiming to be compatible with tk3288)
you can also just introduce a new dw_mci_drv_data for the rk3576 and
following can therefore create a separate tuning function to not need all
those ifs.

And also a shorter parse_dt function that does not tries to get non-existent
clocks.


>  
>  free:
>  	kfree(ranges);
> @@ -342,6 +491,7 @@ static int dw_mci_rk3288_parse_dt(struct dw_mci *host)
>  static int dw_mci_rockchip_init(struct dw_mci *host)
>  {
>  	int ret, i;
> +	struct dw_mci_rockchip_priv_data *priv = host->priv;
>  
>  	/* It is slot 8 on Rockchip SoCs */
>  	host->sdio_id0 = 8;
> @@ -365,6 +515,12 @@ static int dw_mci_rockchip_init(struct dw_mci *host)
>  			dev_warn(host->dev, "no valid minimum freq: %d\n", ret);
>  	}
>  
> +	priv->usrid = mci_readl(host, USRID);
> +	if (priv->usrid == USRID_INTER_PHASE) {
> +		priv->sample_clk = NULL;
> +		priv->drv_clk = NULL;
> +	}
> +

in any case, shouldn't the clocks be NULL anyway?


Heiko
Heiko Stübner Aug. 15, 2024, 2:09 p.m. UTC | #6
Am Donnerstag, 15. August 2024, 15:23:40 CEST schrieb Detlev Casanova:
> On Wednesday, 14 August 2024 20:55:37 EDT Shawn Lin wrote:
> > Hi Detlev
> > 
> > 在 2024/8/15 6:34, Detlev Casanova 写道:
> > > From: Shawn Lin <shawn.lin@rock-chips.com>
> > > 
> > > v2 tuning will inherit pre-stage loader's phase settings for the first
> > > time, and do re-tune if necessary.
> > > Re-tune will still try the rough degrees, for instance, 90, 180, 270,
> > > 360 but continue to do the fine tuning if sample window isn't good
> > > enough.
> > > 
> > > Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
> > > Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
> > > ---
> > > 
> > >   drivers/mmc/host/dw_mmc-rockchip.c | 49 ++++++++++++++++++++++++++++++
> > >   1 file changed, 49 insertions(+)
> > > 
> > > diff --git a/drivers/mmc/host/dw_mmc-rockchip.c
> > > b/drivers/mmc/host/dw_mmc-rockchip.c index b07190ba4b7ac..367633f4e8892
> > > 100644
> > > --- a/drivers/mmc/host/dw_mmc-rockchip.c
> > > +++ b/drivers/mmc/host/dw_mmc-rockchip.c
> 
> [...]
> 
> > >   		
> > >   		priv->default_sample_phase = 0;
> > > 
> > > +	priv->use_v2_tuning =
> > > +		of_device_is_compatible(host->dev->of_node,
> > > +					"rockchip,rk3576-dw-
> mshc");
> > > +
> > 
> > v2 is a kind of software decision instead of hardware dependency.
> > So in theory, any SoC can claim to use it via DT.
> 
> Yes but from my tests, only rk3576 won't work without it. So it makes sense to 
> only use v2 for this SoC (and other future ones not supported yet)

Good know and thanks for testing that scenario.

If you go with my suggestion from patch3 and separate the rk3576
from the original rk3288 type, you can even just assume this v2-tuning
for all those types (rk3576 only so far)


> > >   	priv->drv_clk = devm_clk_get(host->dev, "ciu-drive");
> > >   	if (IS_ERR(priv->drv_clk))
> > >   	
> > >   		dev_dbg(host->dev, "ciu-drive not available\n");
> 
> 
> Detlev.
> 
> 
>
Shawn Lin Aug. 16, 2024, 12:41 a.m. UTC | #7
Hi Heiko,

在 2024/8/15 21:17, Heiko Stübner 写道:
> Am Donnerstag, 15. August 2024, 02:55:37 CEST schrieb Shawn Lin:
>> Hi Detlev
>>
>> 在 2024/8/15 6:34, Detlev Casanova 写道:
>>> From: Shawn Lin <shawn.lin@rock-chips.com>
>>>
>>> v2 tuning will inherit pre-stage loader's phase settings for the first
>>> time, and do re-tune if necessary.
>>> Re-tune will still try the rough degrees, for instance, 90, 180, 270,
>>> 360 but continue to do the fine tuning if sample window isn't good
>>> enough.
>>>
>>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>> Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
> 
>>> @@ -277,6 +322,10 @@ static int dw_mci_rk3288_parse_dt(struct dw_mci *host)
>>>    					&priv->default_sample_phase))
>>>    		priv->default_sample_phase = 0;
>>>    
>>> +	priv->use_v2_tuning =
>>> +		of_device_is_compatible(host->dev->of_node,
>>> +					"rockchip,rk3576-dw-mshc");
>>> +
>>
>> v2 is a kind of software decision instead of hardware dependency.
>> So in theory, any SoC can claim to use it via DT.
> 
> which actually makes it unsuitable for dt. >

Understood.

> Devicetree describes hardware-properties and should _not_ be used for
> software configuration.
> 
>  From the comment above, I assume the rk3576 does not need that feature
> and can just work with the regular tuning?

Yep, your are right.

> 
> So there are two routes for the immediate future:
> (1) rk3576 _needs_ that feature, then going with the compatible is fine
> 
> (2) rk3576 does not need absolutely need that feature, then I'd expect
> the basic rk3576 to first come without, as I'd expect a lot more explanation
> on why it is actually needed, and which cases it does improve.
> The commit message does not really explain that much about why this
> is a great/needed feature and which areas it does improve.
> 

Vote for the 2nd. rk3576 just need
[PATCH v3 3/3] mmc: dw_mmc-rockchip: Add internal phase support


> 
> Heiko
> 
>
Shawn Lin Aug. 16, 2024, 12:43 a.m. UTC | #8
在 2024/8/15 21:23, Detlev Casanova 写道:
> On Wednesday, 14 August 2024 20:55:37 EDT Shawn Lin wrote:
>> Hi Detlev
>>
>> 在 2024/8/15 6:34, Detlev Casanova 写道:
>>> From: Shawn Lin <shawn.lin@rock-chips.com>
>>>
>>> v2 tuning will inherit pre-stage loader's phase settings for the first
>>> time, and do re-tune if necessary.
>>> Re-tune will still try the rough degrees, for instance, 90, 180, 270,
>>> 360 but continue to do the fine tuning if sample window isn't good
>>> enough.
>>>
>>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>> Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
>>> ---
>>>
>>>    drivers/mmc/host/dw_mmc-rockchip.c | 49 ++++++++++++++++++++++++++++++
>>>    1 file changed, 49 insertions(+)
>>>
>>> diff --git a/drivers/mmc/host/dw_mmc-rockchip.c
>>> b/drivers/mmc/host/dw_mmc-rockchip.c index b07190ba4b7ac..367633f4e8892
>>> 100644
>>> --- a/drivers/mmc/host/dw_mmc-rockchip.c
>>> +++ b/drivers/mmc/host/dw_mmc-rockchip.c
> 
> [...]
> 
>>>    		
>>>    		priv->default_sample_phase = 0;
>>>
>>> +	priv->use_v2_tuning =
>>> +		of_device_is_compatible(host->dev->of_node,
>>> +					"rockchip,rk3576-dw-
> mshc");
>>> +
>>
>> v2 is a kind of software decision instead of hardware dependency.
>> So in theory, any SoC can claim to use it via DT.
> 
> Yes but from my tests, only rk3576 won't work without it. So it makes sense to
> only use v2 for this SoC (and other future ones not supported yet)
> 

However from both of the IC design POV and the test from my side,
we just need internal phase support patch, and rk3576 could
work.

>>
>>>    	priv->drv_clk = devm_clk_get(host->dev, "ciu-drive");
>>>    	if (IS_ERR(priv->drv_clk))
>>>    	
>>>    		dev_dbg(host->dev, "ciu-drive not available\n");
> 
> 
> Detlev.
> 
>