Message ID | 20230927154244.3774670-1-zhangjian.3032@bytedance.com |
---|---|
State | New |
Headers | show |
Series | [v2] i2c: aspeed: Fix i2c bus hang in slave read | expand |
Hi Jian, On Wed, Sep 27, 2023 at 11:42:43PM +0800, Jian Zhang wrote: > When the `CONFIG_I2C_SLAVE` option is enabled and the device operates > as a slave, a situation arises where the master sends a START signal > without the accompanying STOP signal. This action results in a > persistent I2C bus timeout. The core issue stems from the fact that > the i2c controller remains in a slave read state without a timeout > mechanism. As a consequence, the bus perpetually experiences timeouts. > > In this case, the i2c bus will be reset, but the slave_state reset is > missing. > > Fixes: fee465150b45 ("i2c: aspeed: Reset the i2c controller when timeout occurs") > Signed-off-by: Jian Zhang <zhangjian.3032@bytedance.com> Why I'm failing to find your v1 patch? And where is the changelog? Andi
> From: "Andi Shyti"<andi.shyti@kernel.org> > Date: Thu, Sep 28, 2023, 22:51 > Subject: [External] Re: [PATCH v2] i2c: aspeed: Fix i2c bus hang in slave read > To: "Jian Zhang"<zhangjian.3032@bytedance.com> > Cc: <brendan.higgins@linux.dev>, <benh@kernel.crashing.org>, <joel@jms.id.au>, <andrew@aj.id.au>, <zhangjian3032@gmail.com>, <yulei.sh@bytedance.com>, <xiexinnan@bytedance.com>, "Tommy Huang"<tommy_huang@aspeedtech.com>, "Wolfram Sang"<wsa@kernel.org>, "open list:ARM/ASPEED I2C DRIVER"<linux-i2c@vger.kernel.org>, "moderated list:ARM/ASPEED I2C DRIVER"<openbmc@lists.ozlabs.org>, "moderated list:ARM/ASPEED MACHINE SUPPORT"<linux-arm-kernel@lists.infradead.org>, "moderated list:ARM/ASPEED MACHINE SUPPORT"<linux-aspeed@lists.ozlabs.org>, "open list"<linux-kernel@vger.kernel.org> > Hi Jian, > > On Wed, Sep 27, 2023 at 11:42:43PM +0800, Jian Zhang wrote: > > When the `CONFIG_I2C_SLAVE` option is enabled and the device operates > > as a slave, a situation arises where the master sends a START signal > > without the accompanying STOP signal. This action results in a > > persistent I2C bus timeout. The core issue stems from the fact that > > the i2c controller remains in a slave read state without a timeout > > mechanism. As a consequence, the bus perpetually experiences timeouts. > > > > In this case, the i2c bus will be reset, but the slave_state reset is > > missing. > > > > Fixes: fee465150b45 ("i2c: aspeed: Reset the i2c controller when timeout occurs") > > Signed-off-by: Jian Zhang <zhangjian.3032@bytedance.com> > > Why I'm failing to find your v1 patch? And where is the > changelog? Sorry, something was missing, v2: * remove the i2c slave reset and only move the `bus->slave_state = ASPEED_I2C_SLAVE_INACTIVE` to the aspeed_i2c_init [0]: https://lore.kernel.org/linux-arm-kernel/20230810072155.3726352-1-zhangjian.3032@bytedance.com/T/ Jian > > Andi
On Wed, Sep 27, 2023 at 11:42:43PM +0800, Jian Zhang wrote: > When the `CONFIG_I2C_SLAVE` option is enabled and the device operates > as a slave, a situation arises where the master sends a START signal > without the accompanying STOP signal. This action results in a > persistent I2C bus timeout. The core issue stems from the fact that > the i2c controller remains in a slave read state without a timeout > mechanism. As a consequence, the bus perpetually experiences timeouts. > > In this case, the i2c bus will be reset, but the slave_state reset is > missing. > > Fixes: fee465150b45 ("i2c: aspeed: Reset the i2c controller when timeout occurs") > Signed-off-by: Jian Zhang <zhangjian.3032@bytedance.com> Somebody wants to add tags here? I think it should go to my pull request this week.
Hi Jian, On Thu, Sep 28, 2023 at 11:04:23AM -0400, Jian Zhang wrote: > > From: "Andi Shyti"<andi.shyti@kernel.org> > > Date: Thu, Sep 28, 2023, 22:51 > > Subject: [External] Re: [PATCH v2] i2c: aspeed: Fix i2c bus hang in slave read > > To: "Jian Zhang"<zhangjian.3032@bytedance.com> > > Cc: <brendan.higgins@linux.dev>, <benh@kernel.crashing.org>, <joel@jms.id.au>, <andrew@aj.id.au>, <zhangjian3032@gmail.com>, <yulei.sh@bytedance.com>, <xiexinnan@bytedance.com>, "Tommy Huang"<tommy_huang@aspeedtech.com>, "Wolfram Sang"<wsa@kernel.org>, "open list:ARM/ASPEED I2C DRIVER"<linux-i2c@vger.kernel.org>, "moderated list:ARM/ASPEED I2C DRIVER"<openbmc@lists.ozlabs.org>, "moderated list:ARM/ASPEED MACHINE SUPPORT"<linux-arm-kernel@lists.infradead.org>, "moderated list:ARM/ASPEED MACHINE SUPPORT"<linux-aspeed@lists.ozlabs.org>, "open list"<linux-kernel@vger.kernel.org> > > Hi Jian, > > > > On Wed, Sep 27, 2023 at 11:42:43PM +0800, Jian Zhang wrote: > > > When the `CONFIG_I2C_SLAVE` option is enabled and the device operates > > > as a slave, a situation arises where the master sends a START signal > > > without the accompanying STOP signal. This action results in a > > > persistent I2C bus timeout. The core issue stems from the fact that > > > the i2c controller remains in a slave read state without a timeout > > > mechanism. As a consequence, the bus perpetually experiences timeouts. > > > > > > In this case, the i2c bus will be reset, but the slave_state reset is > > > missing. Acked-by: Andi Shyti <andi.shyti@kernel.org> I checked the flow in the driver and makes sense to me. I'd also love a last minute comment from Brendan or Benjamin or Joel. > > > Fixes: fee465150b45 ("i2c: aspeed: Reset the i2c controller when timeout occurs") > > > Signed-off-by: Jian Zhang <zhangjian.3032@bytedance.com> > > > > Why I'm failing to find your v1 patch? And where is the > > changelog? > Sorry, something was missing, > v2: > * remove the i2c slave reset and only move the `bus->slave_state = > ASPEED_I2C_SLAVE_INACTIVE` to the aspeed_i2c_init > > [0]: https://lore.kernel.org/linux-arm-kernel/20230810072155.3726352-1-zhangjian.3032@bytedance.com/T/ Thanks! I should really check my filters here. Andi > Jian > > > > Andi
On Fri, 2023-09-29 at 09:39 +0200, Wolfram Sang wrote: > On Wed, Sep 27, 2023 at 11:42:43PM +0800, Jian Zhang wrote: > > When the `CONFIG_I2C_SLAVE` option is enabled and the device operates > > as a slave, a situation arises where the master sends a START signal > > without the accompanying STOP signal. This action results in a > > persistent I2C bus timeout. The core issue stems from the fact that > > the i2c controller remains in a slave read state without a timeout > > mechanism. As a consequence, the bus perpetually experiences timeouts. > > > > In this case, the i2c bus will be reset, but the slave_state reset is > > missing. > > > > Fixes: fee465150b45 ("i2c: aspeed: Reset the i2c controller when timeout occurs") > > Signed-off-by: Jian Zhang <zhangjian.3032@bytedance.com> > > Somebody wants to add tags here? I think it should go to my pull request > this week. > I've tested this patch applied on top of fee465150b45 on an AST2600 and the the system behaviour doesn't seem worse. However, I can still lock the bus up and trigger a hung task panic by surprise-unplugging things. I'll poke around to see if I can get to the bottom of that. Resetting the slave state makes sense, so with the above observation aside: Tested-by: Andrew Jeffery <andrew@codeconstruct.com.au> Reviewed-by: Andrew Jeffery <andrew@codeconstruct.com.au> That said I do wonder whether we should update the slave state in the same place we're updating the hardware state. It would cover off the gap identified by Jian if it were to ever occur anywhere else. Something like: diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c- aspeed.c index 5a416b39b818..28e2a5fc4528 100644 --- a/drivers/i2c/busses/i2c-aspeed.c +++ b/drivers/i2c/busses/i2c-aspeed.c @@ -749,6 +749,8 @@ static void __aspeed_i2c_reg_slave(struct aspeed_i2c_bus *bus, u16 slave_addr) func_ctrl_reg_val = readl(bus->base + ASPEED_I2C_FUN_CTRL_REG); func_ctrl_reg_val |= ASPEED_I2CD_SLAVE_EN; writel(func_ctrl_reg_val, bus->base + ASPEED_I2C_FUN_CTRL_REG); + + bus->slave_state = ASPEED_I2C_SLAVE_INACTIVE; } static int aspeed_i2c_reg_slave(struct i2c_client *client) @@ -765,7 +767,6 @@ static int aspeed_i2c_reg_slave(struct i2c_client *client) __aspeed_i2c_reg_slave(bus, client->addr); bus->slave = client; - bus->slave_state = ASPEED_I2C_SLAVE_INACTIVE; spin_unlock_irqrestore(&bus->lock, flags); return 0;
On 04/10/2023 13:08, Andrew Jeffery wrote: > On Fri, 2023-09-29 at 09:39 +0200, Wolfram Sang wrote: >> On Wed, Sep 27, 2023 at 11:42:43PM +0800, Jian Zhang wrote: >>> When the `CONFIG_I2C_SLAVE` option is enabled and the device operates >>> as a slave, a situation arises where the master sends a START signal >>> without the accompanying STOP signal. This action results in a >>> persistent I2C bus timeout. The core issue stems from the fact that >>> the i2c controller remains in a slave read state without a timeout >>> mechanism. As a consequence, the bus perpetually experiences timeouts. >>> >>> In this case, the i2c bus will be reset, but the slave_state reset is >>> missing. >>> >>> Fixes: fee465150b45 ("i2c: aspeed: Reset the i2c controller when timeout occurs") >>> Signed-off-by: Jian Zhang <zhangjian.3032@bytedance.com> >> >> Somebody wants to add tags here? I think it should go to my pull request >> this week. >> > > I've tested this patch applied on top of fee465150b45 on an AST2600 and > the the system behaviour doesn't seem worse. However, I can still lock > the bus up and trigger a hung task panic by surprise-unplugging things. > I'll poke around to see if I can get to the bottom of that. > > Resetting the slave state makes sense, so with the above observation > aside: > > Tested-by: Andrew Jeffery <andrew@codeconstruct.com.au> > Reviewed-by: Andrew Jeffery <andrew@codeconstruct.com.au> > > That said I do wonder whether we should update the slave state in the > same place we're updating the hardware state. It would cover off the > gap identified by Jian if it were to ever occur anywhere else. > Something like: > > diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c- > aspeed.c > index 5a416b39b818..28e2a5fc4528 100644 > --- a/drivers/i2c/busses/i2c-aspeed.c > +++ b/drivers/i2c/busses/i2c-aspeed.c > @@ -749,6 +749,8 @@ static void __aspeed_i2c_reg_slave(struct > aspeed_i2c_bus *bus, u16 slave_addr) > func_ctrl_reg_val = readl(bus->base + ASPEED_I2C_FUN_CTRL_REG); > func_ctrl_reg_val |= ASPEED_I2CD_SLAVE_EN; > writel(func_ctrl_reg_val, bus->base + ASPEED_I2C_FUN_CTRL_REG); > + > + bus->slave_state = ASPEED_I2C_SLAVE_INACTIVE; > } > > static int aspeed_i2c_reg_slave(struct i2c_client *client) > @@ -765,7 +767,6 @@ static int aspeed_i2c_reg_slave(struct i2c_client > *client) > __aspeed_i2c_reg_slave(bus, client->addr); > > bus->slave = client; > - bus->slave_state = ASPEED_I2C_SLAVE_INACTIVE; > spin_unlock_irqrestore(&bus->lock, flags); > > return 0; > > We tested both Jian's patch and Andrew's patch on our MCTP-i2c bus (ast2600 based BMC) and see both patches work well. We currently use upstream i2c-aspeed.c driver with the commit [1] backported. Without that commit, we frequently experienced the bus hang (due to bus arbitration) and it is unable to recover. But, by reverting that commit and with Jian or Andrew's patch, we see the bus could be able to recover so we think both changes are good. [1] https://github.com/AspeedTech-BMC/linux/commit/11a94e5918aa0f87c828d63fd254dd60ab2505e5 Anyway, I would prefer Andrew's way because the bus->slave_state must always be reset to ASPEED_I2C_SLAVE_INACTIVE everytime __aspeed_i2c_reg_slave() is called. Thanks - Quan
On Thu, 2023-10-05 at 14:55 +0700, Quan Nguyen wrote: > > On 04/10/2023 13:08, Andrew Jeffery wrote: > > On Fri, 2023-09-29 at 09:39 +0200, Wolfram Sang wrote: > > > On Wed, Sep 27, 2023 at 11:42:43PM +0800, Jian Zhang wrote: > > > > When the `CONFIG_I2C_SLAVE` option is enabled and the device operates > > > > as a slave, a situation arises where the master sends a START signal > > > > without the accompanying STOP signal. This action results in a > > > > persistent I2C bus timeout. The core issue stems from the fact that > > > > the i2c controller remains in a slave read state without a timeout > > > > mechanism. As a consequence, the bus perpetually experiences timeouts. > > > > > > > > In this case, the i2c bus will be reset, but the slave_state reset is > > > > missing. > > > > > > > > Fixes: fee465150b45 ("i2c: aspeed: Reset the i2c controller when timeout occurs") > > > > Signed-off-by: Jian Zhang <zhangjian.3032@bytedance.com> > > > > > > Somebody wants to add tags here? I think it should go to my pull request > > > this week. > > > > > > > I've tested this patch applied on top of fee465150b45 on an AST2600 and > > the the system behaviour doesn't seem worse. However, I can still lock > > the bus up and trigger a hung task panic by surprise-unplugging things. > > I'll poke around to see if I can get to the bottom of that. > > > > Resetting the slave state makes sense, so with the above observation > > aside: > > > > Tested-by: Andrew Jeffery <andrew@codeconstruct.com.au> > > Reviewed-by: Andrew Jeffery <andrew@codeconstruct.com.au> > > > > That said I do wonder whether we should update the slave state in the > > same place we're updating the hardware state. It would cover off the > > gap identified by Jian if it were to ever occur anywhere else. > > Something like: > > > > diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c- > > aspeed.c > > index 5a416b39b818..28e2a5fc4528 100644 > > --- a/drivers/i2c/busses/i2c-aspeed.c > > +++ b/drivers/i2c/busses/i2c-aspeed.c > > @@ -749,6 +749,8 @@ static void __aspeed_i2c_reg_slave(struct > > aspeed_i2c_bus *bus, u16 slave_addr) > > func_ctrl_reg_val = readl(bus->base + ASPEED_I2C_FUN_CTRL_REG); > > func_ctrl_reg_val |= ASPEED_I2CD_SLAVE_EN; > > writel(func_ctrl_reg_val, bus->base + ASPEED_I2C_FUN_CTRL_REG); > > + > > + bus->slave_state = ASPEED_I2C_SLAVE_INACTIVE; > > } > > > > static int aspeed_i2c_reg_slave(struct i2c_client *client) > > @@ -765,7 +767,6 @@ static int aspeed_i2c_reg_slave(struct i2c_client > > *client) > > __aspeed_i2c_reg_slave(bus, client->addr); > > > > bus->slave = client; > > - bus->slave_state = ASPEED_I2C_SLAVE_INACTIVE; > > spin_unlock_irqrestore(&bus->lock, flags); > > > > return 0; > > > > > > We tested both Jian's patch and Andrew's patch on our MCTP-i2c bus > (ast2600 based BMC) and see both patches work well. > > We currently use upstream i2c-aspeed.c driver with the commit [1] > backported. Without that commit, we frequently experienced the bus hang > (due to bus arbitration) and it is unable to recover. > > But, by reverting that commit and with Jian or Andrew's patch, we see > the bus could be able to recover so we think both changes are good. > > [1] > https://github.com/AspeedTech-BMC/linux/commit/11a94e5918aa0f87c828d63fd254dd60ab2505e5 > > Anyway, I would prefer Andrew's way because the bus->slave_state must > always be reset to ASPEED_I2C_SLAVE_INACTIVE everytime > __aspeed_i2c_reg_slave() is called. Jian, what's your preference? Are you happy to do a v3 along the lines of my suggestion above? Otherwise Wolfram can take v2 and we can always do the cleanup in a follow-up patch. Andrew
> From: "Andrew Jeffery"<andrew@codeconstruct.com.au> > Date: Fri, Oct 6, 2023, 08:20 > Subject: [External] Re: [PATCH v2] i2c: aspeed: Fix i2c bus hang in slave read > To: "Quan Nguyen"<quan@os.amperecomputing.com>, "Wolfram Sang"<wsa@kernel.org>, "Jian Zhang"<zhangjian.3032@bytedance.com> > Cc: "Andi Shyti"<andi.shyti@kernel.org>, "moderated list:ARM/ASPEED MACHINE SUPPORT"<linux-aspeed@lists.ozlabs.org>, <andrew@aj.id.au>, "moderated list:ARM/ASPEED I2C DRIVER"<openbmc@lists.ozlabs.org>, <yulei.sh@bytedance.com>, "open list"<linux-kernel@vger.kernel.org>, "Tommy Huang"<tommy_huang@aspeedtech.com>, "open list:ARM/ASPEED I2C DRIVER"<linux-i2c@vger.kernel.org>, <brendan.higgins@linux.dev>, <joel@jms.id.au>, <zhangjian3032@gmail.com>, "moderated list:ARM/ASPEED MACHINE SUPPORT"<linux-arm-kernel@lists.infradead.org>, <xiexinnan@bytedance.com> > On Thu, 2023-10-05 at 14:55 +0700, Quan Nguyen wrote: > > > > On 04/10/2023 13:08, Andrew Jeffery wrote: > > > On Fri, 2023-09-29 at 09:39 +0200, Wolfram Sang wrote: > > > > On Wed, Sep 27, 2023 at 11:42:43PM +0800, Jian Zhang wrote: > > > > > When the `CONFIG_I2C_SLAVE` option is enabled and the device operates > > > > > as a slave, a situation arises where the master sends a START signal > > > > > without the accompanying STOP signal. This action results in a > > > > > persistent I2C bus timeout. The core issue stems from the fact that > > > > > the i2c controller remains in a slave read state without a timeout > > > > > mechanism. As a consequence, the bus perpetually experiences timeouts. > > > > > > > > > > In this case, the i2c bus will be reset, but the slave_state reset is > > > > > missing. > > > > > > > > > > Fixes: fee465150b45 ("i2c: aspeed: Reset the i2c controller when timeout occurs") > > > > > Signed-off-by: Jian Zhang <zhangjian.3032@bytedance.com> > > > > > > > > Somebody wants to add tags here? I think it should go to my pull request > > > > this week. > > > > > > > > > > I've tested this patch applied on top of fee465150b45 on an AST2600 and > > > the the system behaviour doesn't seem worse. However, I can still lock > > > the bus up and trigger a hung task panic by surprise-unplugging things. > > > I'll poke around to see if I can get to the bottom of that. > > > > > > Resetting the slave state makes sense, so with the above observation > > > aside: > > > > > > Tested-by: Andrew Jeffery <andrew@codeconstruct.com.au> > > > Reviewed-by: Andrew Jeffery <andrew@codeconstruct.com.au> > > > > > > That said I do wonder whether we should update the slave state in the > > > same place we're updating the hardware state. It would cover off the > > > gap identified by Jian if it were to ever occur anywhere else. > > > Something like: > > > > > > diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c- > > > aspeed.c > > > index 5a416b39b818..28e2a5fc4528 100644 > > > --- a/drivers/i2c/busses/i2c-aspeed.c > > > +++ b/drivers/i2c/busses/i2c-aspeed.c > > > @@ -749,6 +749,8 @@ static void __aspeed_i2c_reg_slave(struct > > > aspeed_i2c_bus *bus, u16 slave_addr) > > > func_ctrl_reg_val = readl(bus->base + ASPEED_I2C_FUN_CTRL_REG); > > > func_ctrl_reg_val |= ASPEED_I2CD_SLAVE_EN; > > > writel(func_ctrl_reg_val, bus->base + ASPEED_I2C_FUN_CTRL_REG); > > > + > > > + bus->slave_state = ASPEED_I2C_SLAVE_INACTIVE; > > > } > > > > > > static int aspeed_i2c_reg_slave(struct i2c_client *client) > > > @@ -765,7 +767,6 @@ static int aspeed_i2c_reg_slave(struct i2c_client > > > *client) > > > __aspeed_i2c_reg_slave(bus, client->addr); > > > > > > bus->slave = client; > > > - bus->slave_state = ASPEED_I2C_SLAVE_INACTIVE; > > > spin_unlock_irqrestore(&bus->lock, flags); > > > > > > return 0; > > > > > > > > > > We tested both Jian's patch and Andrew's patch on our MCTP-i2c bus > > (ast2600 based BMC) and see both patches work well. > > > > We currently use upstream i2c-aspeed.c driver with the commit [1] > > backported. Without that commit, we frequently experienced the bus hang > > (due to bus arbitration) and it is unable to recover. > > > > But, by reverting that commit and with Jian or Andrew's patch, we see > > the bus could be able to recover so we think both changes are good. > > > > [1] > > https://github.com/AspeedTech-BMC/linux/commit/11a94e5918aa0f87c828d63fd254dd60ab2505e5 > > > > Anyway, I would prefer Andrew's way because the bus->slave_state must > > always be reset to ASPEED_I2C_SLAVE_INACTIVE everytime > > __aspeed_i2c_reg_slave() is called. > > Jian, what's your preference? Are you happy to do a v3 along the lines > of my suggestion above? Thanks, LGTM, I will send the patch v3. Jian. > > Otherwise Wolfram can take v2 and we can always do the cleanup in a > follow-up patch. > > Andrew
diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c-aspeed.c index 5a416b39b818..18f618625472 100644 --- a/drivers/i2c/busses/i2c-aspeed.c +++ b/drivers/i2c/busses/i2c-aspeed.c @@ -933,6 +933,7 @@ static int aspeed_i2c_init(struct aspeed_i2c_bus *bus, /* If slave has already been registered, re-enable it. */ if (bus->slave) __aspeed_i2c_reg_slave(bus, bus->slave->addr); + bus->slave_state = ASPEED_I2C_SLAVE_INACTIVE; #endif /* CONFIG_I2C_SLAVE */ /* Set interrupt generation of I2C controller */
When the `CONFIG_I2C_SLAVE` option is enabled and the device operates as a slave, a situation arises where the master sends a START signal without the accompanying STOP signal. This action results in a persistent I2C bus timeout. The core issue stems from the fact that the i2c controller remains in a slave read state without a timeout mechanism. As a consequence, the bus perpetually experiences timeouts. In this case, the i2c bus will be reset, but the slave_state reset is missing. Fixes: fee465150b45 ("i2c: aspeed: Reset the i2c controller when timeout occurs") Signed-off-by: Jian Zhang <zhangjian.3032@bytedance.com> --- drivers/i2c/busses/i2c-aspeed.c | 1 + 1 file changed, 1 insertion(+) -- 2.30.2