Message ID | 20220317211426.38940-1-eajames@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | spi: fsi: Implement a timeout for polling status | expand |
On Thu, 17 Mar 2022 at 21:14, Eddie James <eajames@linux.ibm.com> wrote: > > The data transfer routines must poll the status register to > determine when more data can be shifted in or out. If the hardware > gets into a bad state, these polling loops may never exit. Prevent > this by returning an error if a timeout is exceeded. This makes sense. We may even want to put this code in regardless. However, I'm wondering why the code in fsi_spi_status didn't catch this. > static int fsi_spi_status(struct fsi_spi *ctx, u64 *status, const char *dir) > { > int rc = fsi_spi_read_reg(ctx, SPI_FSI_STATUS, status); You mentioned the error condition is we get back 0xff. That means that status will be 0xffff_ffff_ffff_ffff ? Did you observe status being this value? > if (rc) > return rc; > > if (*status & SPI_FSI_STATUS_ANY_ERROR) { I think that we're checking against 0xffe0f000. > dev_err(ctx->dev, "%s error: %016llx\n", dir, *status); > > rc = fsi_spi_reset(ctx); > if (rc) > return rc; Is the problem here? fsi_spi_reset writes to the clock config registers, but doesn't read the status. Obviously doing the writes causes a call to fsi_spi_check_status, but that checks the FSI2SPI bridge, not the SPI master. ...but it doesn't matter, because we're either going to return an error from the reset, or return EREMOTEIO, so there's no masking of the error. > > return -EREMOTEIO; > } > > return 0; > } > > Signed-off-by: Eddie James <eajames@linux.ibm.com> > --- > drivers/spi/spi-fsi.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/drivers/spi/spi-fsi.c b/drivers/spi/spi-fsi.c > index b6c7467f0b59..d403a7a3021d 100644 > --- a/drivers/spi/spi-fsi.c > +++ b/drivers/spi/spi-fsi.c > @@ -25,6 +25,7 @@ > > #define SPI_FSI_BASE 0x70000 > #define SPI_FSI_INIT_TIMEOUT_MS 1000 > +#define SPI_FSI_STATUS_TIMEOUT_MS 100 Can you add a comment (or put something in the commit message) about why you chose 100ms. > #define SPI_FSI_MAX_RX_SIZE 8 > #define SPI_FSI_MAX_TX_SIZE 40 > > @@ -299,6 +300,7 @@ static int fsi_spi_transfer_data(struct fsi_spi *ctx, > struct spi_transfer *transfer) > { > int rc = 0; > + unsigned long end; > u64 status = 0ULL; > > if (transfer->tx_buf) { > @@ -315,10 +317,14 @@ static int fsi_spi_transfer_data(struct fsi_spi *ctx, > if (rc) > return rc; > > + end = jiffies + msecs_to_jiffies(SPI_FSI_STATUS_TIMEOUT_MS); > do { > rc = fsi_spi_status(ctx, &status, "TX"); > if (rc) > return rc; > + > + if (time_after(jiffies, end)) > + return -ETIMEDOUT; > } while (status & SPI_FSI_STATUS_TDR_FULL); > > sent += nb; > @@ -329,10 +335,14 @@ static int fsi_spi_transfer_data(struct fsi_spi *ctx, > u8 *rx = transfer->rx_buf; > > while (transfer->len > recv) { > + end = jiffies + msecs_to_jiffies(SPI_FSI_STATUS_TIMEOUT_MS); > do { > rc = fsi_spi_status(ctx, &status, "RX"); > if (rc) > return rc; > + > + if (time_after(jiffies, end)) > + return -ETIMEDOUT; > } while (!(status & SPI_FSI_STATUS_RDR_FULL)); > > rc = fsi_spi_read_reg(ctx, SPI_FSI_DATA_RX, &in); > -- > 2.27.0 >
On 3/17/22 23:19, Joel Stanley wrote: > On Thu, 17 Mar 2022 at 21:14, Eddie James <eajames@linux.ibm.com> wrote: >> The data transfer routines must poll the status register to >> determine when more data can be shifted in or out. If the hardware >> gets into a bad state, these polling loops may never exit. Prevent >> this by returning an error if a timeout is exceeded. > This makes sense. We may even want to put this code in regardless. > > However, I'm wondering why the code in fsi_spi_status didn't catch this. Same, which is why I thought the problem couldn't be happening here for a long time. See below with what I think is going on. > >> static int fsi_spi_status(struct fsi_spi *ctx, u64 *status, const char *dir) >> { >> int rc = fsi_spi_read_reg(ctx, SPI_FSI_STATUS, status); > You mentioned the error condition is we get back 0xff. That means that > status will be 0xffff_ffff_ffff_ffff ? > > Did you observe status being this value? No, I think my observation of 0xff is not universal. I suspect that while the CFAM is IN reset, 0xff is returned, but once it's been reset, valid (though maybe uninitialized) data is returned. I observed a status of 0x0001100000000000, which means the controller is idle, which makes sense since it's been reset. So the issue occurs if we start an operation before a CFAM reset and are waiting for it to complete during the CFAM reset, but we also don't get any failed/invalid data FSI operations during/after the reset (very timing dependent - the FSI master does lock during the reset but doesn't wait afterwards for the hardware to initialize). > >> if (rc) >> return rc; >> >> if (*status & SPI_FSI_STATUS_ANY_ERROR) { > I think that we're checking against 0xffe0f000. > >> dev_err(ctx->dev, "%s error: %016llx\n", dir, *status); >> >> rc = fsi_spi_reset(ctx); >> if (rc) >> return rc; > Is the problem here? fsi_spi_reset writes to the clock config > registers, but doesn't read the status. > > Obviously doing the writes causes a call to fsi_spi_check_status, but > that checks the FSI2SPI bridge, not the SPI master. > > ...but it doesn't matter, because we're either going to return an > error from the reset, or return EREMOTEIO, so there's no masking of > the error. Not sure I follow. I don't think we were hitting this path in this error scenario. Do you think we need to check the status after a reset? It should always be the same. > >> return -EREMOTEIO; >> } >> >> return 0; >> } > >> Signed-off-by: Eddie James <eajames@linux.ibm.com> >> --- >> drivers/spi/spi-fsi.c | 10 ++++++++++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/drivers/spi/spi-fsi.c b/drivers/spi/spi-fsi.c >> index b6c7467f0b59..d403a7a3021d 100644 >> --- a/drivers/spi/spi-fsi.c >> +++ b/drivers/spi/spi-fsi.c >> @@ -25,6 +25,7 @@ >> >> #define SPI_FSI_BASE 0x70000 >> #define SPI_FSI_INIT_TIMEOUT_MS 1000 >> +#define SPI_FSI_STATUS_TIMEOUT_MS 100 > Can you add a comment (or put something in the commit message) about > why you chose 100ms. Hm, sure, but I chose it pretty arbitrarily. I'm not sure how to choose something like this. > >> #define SPI_FSI_MAX_RX_SIZE 8 >> #define SPI_FSI_MAX_TX_SIZE 40 >> >> @@ -299,6 +300,7 @@ static int fsi_spi_transfer_data(struct fsi_spi *ctx, >> struct spi_transfer *transfer) >> { >> int rc = 0; >> + unsigned long end; >> u64 status = 0ULL; >> >> if (transfer->tx_buf) { >> @@ -315,10 +317,14 @@ static int fsi_spi_transfer_data(struct fsi_spi *ctx, >> if (rc) >> return rc; >> >> + end = jiffies + msecs_to_jiffies(SPI_FSI_STATUS_TIMEOUT_MS); >> do { >> rc = fsi_spi_status(ctx, &status, "TX"); >> if (rc) >> return rc; >> + >> + if (time_after(jiffies, end)) >> + return -ETIMEDOUT; >> } while (status & SPI_FSI_STATUS_TDR_FULL); >> >> sent += nb; >> @@ -329,10 +335,14 @@ static int fsi_spi_transfer_data(struct fsi_spi *ctx, >> u8 *rx = transfer->rx_buf; >> >> while (transfer->len > recv) { >> + end = jiffies + msecs_to_jiffies(SPI_FSI_STATUS_TIMEOUT_MS); >> do { >> rc = fsi_spi_status(ctx, &status, "RX"); >> if (rc) >> return rc; >> + >> + if (time_after(jiffies, end)) >> + return -ETIMEDOUT; >> } while (!(status & SPI_FSI_STATUS_RDR_FULL)); >> >> rc = fsi_spi_read_reg(ctx, SPI_FSI_DATA_RX, &in); >> -- >> 2.27.0 >>
On Thu, 17 Mar 2022 16:14:26 -0500, Eddie James wrote: > The data transfer routines must poll the status register to > determine when more data can be shifted in or out. If the hardware > gets into a bad state, these polling loops may never exit. Prevent > this by returning an error if a timeout is exceeded. > > Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next Thanks! [1/1] spi: fsi: Implement a timeout for polling status commit: 89b35e3f28514087d3f1e28e8f5634fbfd07c554 All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark
diff --git a/drivers/spi/spi-fsi.c b/drivers/spi/spi-fsi.c index b6c7467f0b59..d403a7a3021d 100644 --- a/drivers/spi/spi-fsi.c +++ b/drivers/spi/spi-fsi.c @@ -25,6 +25,7 @@ #define SPI_FSI_BASE 0x70000 #define SPI_FSI_INIT_TIMEOUT_MS 1000 +#define SPI_FSI_STATUS_TIMEOUT_MS 100 #define SPI_FSI_MAX_RX_SIZE 8 #define SPI_FSI_MAX_TX_SIZE 40 @@ -299,6 +300,7 @@ static int fsi_spi_transfer_data(struct fsi_spi *ctx, struct spi_transfer *transfer) { int rc = 0; + unsigned long end; u64 status = 0ULL; if (transfer->tx_buf) { @@ -315,10 +317,14 @@ static int fsi_spi_transfer_data(struct fsi_spi *ctx, if (rc) return rc; + end = jiffies + msecs_to_jiffies(SPI_FSI_STATUS_TIMEOUT_MS); do { rc = fsi_spi_status(ctx, &status, "TX"); if (rc) return rc; + + if (time_after(jiffies, end)) + return -ETIMEDOUT; } while (status & SPI_FSI_STATUS_TDR_FULL); sent += nb; @@ -329,10 +335,14 @@ static int fsi_spi_transfer_data(struct fsi_spi *ctx, u8 *rx = transfer->rx_buf; while (transfer->len > recv) { + end = jiffies + msecs_to_jiffies(SPI_FSI_STATUS_TIMEOUT_MS); do { rc = fsi_spi_status(ctx, &status, "RX"); if (rc) return rc; + + if (time_after(jiffies, end)) + return -ETIMEDOUT; } while (!(status & SPI_FSI_STATUS_RDR_FULL)); rc = fsi_spi_read_reg(ctx, SPI_FSI_DATA_RX, &in);
The data transfer routines must poll the status register to determine when more data can be shifted in or out. If the hardware gets into a bad state, these polling loops may never exit. Prevent this by returning an error if a timeout is exceeded. Signed-off-by: Eddie James <eajames@linux.ibm.com> --- drivers/spi/spi-fsi.c | 10 ++++++++++ 1 file changed, 10 insertions(+)