mbox series

[v5,00/19] remoteproc: Add support for detaching a remote processor

Message ID 20210211234627.2669674-1-mathieu.poirier@linaro.org
Headers show
Series remoteproc: Add support for detaching a remote processor | expand

Message

Mathieu Poirier Feb. 11, 2021, 11:46 p.m. UTC
Following the work done here [1], this set provides support for the
remoteproc core to release resources associated with a remote processor
without having to switch it off. That way a platform driver can be removed
or the application processor power cycled while the remote processor is
still operating.

Modifications for this revision are detailed in the changelog of each patch but
the main enhancement is the setup of a clean resource table when a remote
processor is detached from.

I have tested scenarios where the processor is detached and re-attached when
booted from an external entity and the remoteproc core.  I was also able
to confirm that removing the platform driver of a detached remote processor
works.  Re-attaching the remote processor after re-inserting the platorm driver
also works properly.

Applies cleanly on rproc-next (43d3f2c715ce). 

Thanks,
Mathieu

Arnaud POULIQUEN (1):
  remoteproc: stm32: Move memory parsing to rproc_ops

Mathieu Poirier (18):
  dt-bindings: remoteproc: Add bindind to support autonomous processors
  remoteproc: Re-check state in rproc_shutdown()
  remoteproc: Remove useless check in rproc_del()
  remoteproc: Rename function rproc_actuate()
  remoteproc: Add new RPROC_ATTACHED state
  remoteproc: Properly represent the attached state
  remoteproc: Add new get_loaded_rsc_table() to rproc_ops
  remoteproc: stm32: Move resource table setup to rproc_ops
  remoteproc: Add new detach() remoteproc operation
  remoteproc: Introduce function __rproc_detach()
  remoteproc: Introduce function rproc_detach()
  remoteproc: Properly deal with the resource table
  remoteproc: Add return value to function rproc_shutdown()
  remoteproc: Properly deal with a kernel panic when attached
  remoteproc: Properly deal with a stop request when attached
  remoteproc: Properly deal with a start request when attached
  remoteproc: Properly deal with detach request
  remoteproc: Refactor rproc delete and cdev release path

 .../bindings/remoteproc/remoteproc-core.yaml  |  27 ++
 drivers/remoteproc/remoteproc_cdev.c          |  32 +-
 drivers/remoteproc/remoteproc_core.c          | 307 ++++++++++++++++--
 drivers/remoteproc/remoteproc_elf_loader.c    |  24 +-
 drivers/remoteproc/remoteproc_internal.h      |  10 +
 drivers/remoteproc/remoteproc_sysfs.c         |  20 +-
 drivers/remoteproc/stm32_rproc.c              | 168 +++++-----
 include/linux/remoteproc.h                    |  27 +-
 8 files changed, 465 insertions(+), 150 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/remoteproc/remoteproc-core.yaml

Comments

Arnaud POULIQUEN Feb. 15, 2021, 1:10 p.m. UTC | #1
Hi Mathieu,

On 2/12/21 12:46 AM, Mathieu Poirier wrote:
> Add a new get_loaded_rsc_table() operation in order to support
> scenarios where the remoteproc core has booted a remote processor
> and detaches from it.  When re-attaching to the remote processor,
> the core needs to know where the resource table has been placed
> in memory.
> 
> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
> ---
> New for V5:
> - Added function rproc_set_loaded_rsc_table() to keep rproc_attach() clean.
> - Setting ->cached_table, ->table_ptr and ->table_sz in the remoteproc core
>   rather than the platform drivers.
> ---
>  drivers/remoteproc/remoteproc_core.c     | 35 ++++++++++++++++++++++++
>  drivers/remoteproc/remoteproc_internal.h | 10 +++++++
>  include/linux/remoteproc.h               |  6 +++-
>  3 files changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index e6606d10a4c8..741bc20de437 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1537,6 +1537,35 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
>  	return ret;
>  }
>  
> +static int rproc_set_loaded_rsc_table(struct rproc *rproc)
> +{
> +	struct resource_table *table_ptr;
> +	struct device *dev = &rproc->dev;
> +	size_t table_sz;
> +	int ret;
> +
> +	table_ptr = rproc_get_loaded_rsc_table(rproc, &table_sz);
> +	if (IS_ERR_OR_NULL(table_ptr)) {
> +		if (!table_ptr)
> +			ret = -EINVAL;

I did few tests on this showing that this approach does not cover all use cases.

The first one is a firmware without resource table. In this case table_ptr
should be null, or we have to consider the -ENOENT error as a non error usecase.

The second one, more tricky, is a firmware started by the remoteproc framework.
In this case the resource table address is retrieved from the ELF file by the
core part.
So if we detach and reattach rproc_get_loaded_rsc_table cannot return the
address. Look to me that we should have also an alocation of the clean_table in
rproc_start and then to keep the memory allocated until a shutdown.

That said regarding the complexity to re-attach, I wonder if it would not be
better to focus first on a simple detach, and address the reattachment in a
separate series, to move forward in stages.

Regards,
Arnaud

> +		else
> +			ret = PTR_ERR(table_ptr);
> +
> +		dev_err(dev, "can't load resource table: %d\n", ret);
> +		return ret;
> +	}
> +
> +	/*
> +	 * The resource table is already loaded in device memory, no need
> +	 * to work with a cached table.
> +	 */
> +	rproc->cached_table = NULL;
> +	rproc->table_ptr = table_ptr;
> +	rproc->table_sz = table_sz;
> +
> +	return 0;
> +}
> +
>  /*
>   * Attach to remote processor - similar to rproc_fw_boot() but without
>   * the steps that deal with the firmware image.
> @@ -1556,6 +1585,12 @@ static int rproc_attach(struct rproc *rproc)
>  		return ret;
>  	}
>  
> +	ret = rproc_set_loaded_rsc_table(rproc);
> +	if (ret) {
> +		dev_err(dev, "can't load resource table: %d\n", ret);
> +		goto disable_iommu;
> +	}
> +
>  	/* reset max_notifyid */
>  	rproc->max_notifyid = -1;
>  
> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> index c34002888d2c..4f73aac7e60d 100644
> --- a/drivers/remoteproc/remoteproc_internal.h
> +++ b/drivers/remoteproc/remoteproc_internal.h
> @@ -177,6 +177,16 @@ struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,
>  	return NULL;
>  }
>  
> +static inline
> +struct resource_table *rproc_get_loaded_rsc_table(struct rproc *rproc,
> +						  size_t *size)
> +{
> +	if (rproc->ops->get_loaded_rsc_table)
> +		return rproc->ops->get_loaded_rsc_table(rproc, size);
> +
> +	return NULL;
> +}
> +
>  static inline
>  bool rproc_u64_fit_in_size_t(u64 val)
>  {
> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> index 6b0a0ed30a03..51538a7d120d 100644
> --- a/include/linux/remoteproc.h
> +++ b/include/linux/remoteproc.h
> @@ -368,7 +368,9 @@ enum rsc_handling_status {
>   * RSC_HANDLED if resource was handled, RSC_IGNORED if not handled and a
>   * negative value on error
>   * @load_rsc_table:	load resource table from firmware image
> - * @find_loaded_rsc_table: find the loaded resouce table
> + * @find_loaded_rsc_table: find the loaded resource table from firmware image
> + * @get_loaded_rsc_table: get resource table installed in memory
> + *			  by external entity
>   * @load:		load firmware to memory, where the remote processor
>   *			expects to find it
>   * @sanity_check:	sanity check the fw image
> @@ -390,6 +392,8 @@ struct rproc_ops {
>  			  int offset, int avail);
>  	struct resource_table *(*find_loaded_rsc_table)(
>  				struct rproc *rproc, const struct firmware *fw);
> +	struct resource_table *(*get_loaded_rsc_table)(
> +				struct rproc *rproc, size_t *size);
>  	int (*load)(struct rproc *rproc, const struct firmware *fw);
>  	int (*sanity_check)(struct rproc *rproc, const struct firmware *fw);
>  	u64 (*get_boot_addr)(struct rproc *rproc, const struct firmware *fw);
>
Arnaud POULIQUEN Feb. 15, 2021, 1:19 p.m. UTC | #2
On 2/12/21 12:46 AM, Mathieu Poirier wrote:
> If it is possible to detach the remote processor, keep an untouched
> copy of the resource table.  That way we can start from the same
> resource table without having to worry about original values or what
> elements the startup code has changed when re-attaching to the remote
> processor.
> 
> Reported-by: Arnaud POULIQUEN <arnaud.pouliquen@foss.st.com>
> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
> ---
>  drivers/remoteproc/remoteproc_core.c       | 70 ++++++++++++++++++++++
>  drivers/remoteproc/remoteproc_elf_loader.c | 24 +++++++-
>  include/linux/remoteproc.h                 |  3 +
>  3 files changed, 95 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index 660dcc002ff6..9a77cb6d6470 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1527,7 +1527,9 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
>  clean_up_resources:
>  	rproc_resource_cleanup(rproc);
>  	kfree(rproc->cached_table);
> +	kfree(rproc->clean_table);
>  	rproc->cached_table = NULL;
> +	rproc->clean_table = NULL;
>  	rproc->table_ptr = NULL;
>  unprepare_rproc:
>  	/* release HW resources if needed */
> @@ -1555,6 +1557,23 @@ static int rproc_set_loaded_rsc_table(struct rproc *rproc)
>  		return ret;
>  	}
>  
> +	/*
> +	 * If it is possible to detach the remote processor, keep an untouched
> +	 * copy of the resource table.  That way we can start fresh again when
> +	 * the remote processor is re-attached, that is:
> +	 *
> +	 *	DETACHED -> ATTACHED -> DETACHED -> ATTACHED
> +	 *
> +	 * A clean copy of the table is also taken in rproc_elf_load_rsc_table()
> +	 * for cases where the remote processor is booted by the remoteproc
> +	 * core and later detached from.
> +	 */
> +	if (rproc->ops->detach) {
> +		rproc->clean_table = kmemdup(table_ptr, table_sz, GFP_KERNEL);
> +		if (!rproc->clean_table)
> +			return -ENOMEM;
> +	}
> +
>  	/*
>  	 * The resource table is already loaded in device memory, no need
>  	 * to work with a cached table.
> @@ -1566,6 +1585,40 @@ static int rproc_set_loaded_rsc_table(struct rproc *rproc)
>  	return 0;
>  }
>  
> +static int rproc_reset_loaded_rsc_table(struct rproc *rproc)
> +{
> +	/*
> +	 * In order to detach() from a remote processor a clean resource table
> +	 * _must_ have been allocated at boot time, either from rproc_fw_boot()
> +	 * or from rproc_attach().  If one isn't present something went really
> +	 * wrong and we must complain.
> +	 */
> +	if (WARN_ON(!rproc->clean_table))
> +		return -EINVAL;
> +
> +	/*
> +	 * Install the clean resource table where the firmware, i.e
> +	 * rproc_get_loaded_rsc_table(), expects it.
> +	 */
> +	memcpy(rproc->table_ptr, rproc->clean_table, rproc->table_sz);
> +
> +	/*
> +	 * If the remote processors was started by the core then a cached_table
> +	 * is present and we must follow the same cleanup sequence as we would
> +	 * for a shutdown().  As it is in rproc_stop(), use the cached resource
> +	 * table for the rest of the detach process since ->table_ptr will
> +	 * become invalid as soon as carveouts are released in
> +	 * rproc_resource_cleanup().
> +	 *
> +	 * If the remote processor was started by an external entity the
> +	 * cached_table is NULL and the rest of the cleanup code in
> +	 * rproc_free_vring() can deal with that.
> +	 */
> +	rproc->table_ptr = rproc->cached_table;
> +
> +	return 0;
> +}
> +
>  /*
>   * Attach to remote processor - similar to rproc_fw_boot() but without
>   * the steps that deal with the firmware image.
> @@ -1947,7 +2000,10 @@ void rproc_shutdown(struct rproc *rproc)
>  
>  	/* Free the copy of the resource table */
>  	kfree(rproc->cached_table);
> +	/* Free the clean resource table */
> +	kfree(rproc->clean_table);
>  	rproc->cached_table = NULL;
> +	rproc->clean_table = NULL;
>  	rproc->table_ptr = NULL;
>  out:
>  	mutex_unlock(&rproc->lock);
> @@ -2000,6 +2056,16 @@ int rproc_detach(struct rproc *rproc)
>  		goto out;
>  	}
>  
> +	/*
> +	 * Install a clean resource table for re-attach while
> +	 * rproc->table_ptr is still valid.
> +	 */
> +	ret = rproc_reset_loaded_rsc_table(rproc);
> +	if (ret) {
> +		atomic_inc(&rproc->power);
> +		goto out;
> +	}
> +

Here you rewrite the initial values in the loaded resource table but then
rproc_resource_cleanup will clean up the resource table.
That can lead to an overwrite, and perhaps to unexpected memory access, as
DA and PA addresses are reinitialized.
(e.g call of rproc_vdev_release that will overwrite the resource table)

And because the vdev release is asynchronous, probably better to reinitialize
the resource table on attach or in rproc_handle_resources.

Regards,
Arnaud

>  	/* clean up all acquired resources */
>  	rproc_resource_cleanup(rproc);
>  
> @@ -2008,10 +2074,14 @@ int rproc_detach(struct rproc *rproc)
>  
>  	rproc_disable_iommu(rproc);
>  
> +	/* Free the copy of the resource table */
> +	kfree(rproc->cached_table);
>  	/* Follow the same sequence as in rproc_shutdown() */
>  	kfree(rproc->cached_table);
>  	rproc->cached_table = NULL;
> +	rproc->clean_table = NULL;
>  	rproc->table_ptr = NULL;
> +
>  out:
>  	mutex_unlock(&rproc->lock);
>  	return ret;
> diff --git a/drivers/remoteproc/remoteproc_elf_loader.c b/drivers/remoteproc/remoteproc_elf_loader.c
> index df68d87752e4..aa09782c932d 100644
> --- a/drivers/remoteproc/remoteproc_elf_loader.c
> +++ b/drivers/remoteproc/remoteproc_elf_loader.c
> @@ -17,10 +17,11 @@
>  
>  #define pr_fmt(fmt)    "%s: " fmt, __func__
>  
> -#include <linux/module.h>
> +#include <linux/elf.h>
>  #include <linux/firmware.h>
> +#include <linux/module.h>
>  #include <linux/remoteproc.h>
> -#include <linux/elf.h>
> +#include <linux/slab.h>
>  
>  #include "remoteproc_internal.h"
>  #include "remoteproc_elf_helpers.h"
> @@ -338,6 +339,25 @@ int rproc_elf_load_rsc_table(struct rproc *rproc, const struct firmware *fw)
>  	if (!rproc->cached_table)
>  		return -ENOMEM;
>  
> +	/*
> +	 * If it is possible to detach the remote processor, keep an untouched
> +	 * copy of the resource table.  That way we can start fresh again when
> +	 * the remote processor is re-attached, that is:
> +	 *
> +	 *	OFFLINE -> RUNNING -> DETACHED -> ATTACHED
> +	 *
> +	 * A clean copy of the table is also taken in
> +	 * rproc_set_loaded_rsc_table() for cases where the remote processor is
> +	 * booted by an external entity and later detached from.
> +	 */
> +	if (rproc->ops->detach) {
> +		rproc->clean_table = kmemdup(table, tablesz, GFP_KERNEL);
> +		if (!rproc->clean_table) {
> +			kfree(rproc->cached_table);
> +			return -ENOMEM;
> +		}
> +	}
> +
>  	rproc->table_ptr = rproc->cached_table;
>  	rproc->table_sz = tablesz;
>  
> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> index e1c843c19cc6..e5f52a12a650 100644
> --- a/include/linux/remoteproc.h
> +++ b/include/linux/remoteproc.h
> @@ -514,6 +514,8 @@ struct rproc_dump_segment {
>   * @recovery_disabled: flag that state if recovery was disabled
>   * @max_notifyid: largest allocated notify id.
>   * @table_ptr: pointer to the resource table in effect
> + * @clean_table: copy of the resource table without modifications.  Used
> + *		 when a remote processor is attached or detached from the core
>   * @cached_table: copy of the resource table
>   * @table_sz: size of @cached_table
>   * @has_iommu: flag to indicate if remote processor is behind an MMU
> @@ -550,6 +552,7 @@ struct rproc {
>  	bool recovery_disabled;
>  	int max_notifyid;
>  	struct resource_table *table_ptr;
> +	struct resource_table *clean_table;
>  	struct resource_table *cached_table;
>  	size_t table_sz;
>  	bool has_iommu;
>
Mathieu Poirier Feb. 17, 2021, 9:22 p.m. UTC | #3
On Mon, Feb 15, 2021 at 02:10:10PM +0100, Arnaud POULIQUEN wrote:
> Hi Mathieu,
> 
> On 2/12/21 12:46 AM, Mathieu Poirier wrote:
> > Add a new get_loaded_rsc_table() operation in order to support
> > scenarios where the remoteproc core has booted a remote processor
> > and detaches from it.  When re-attaching to the remote processor,
> > the core needs to know where the resource table has been placed
> > in memory.
> > 
> > Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
> > ---
> > New for V5:
> > - Added function rproc_set_loaded_rsc_table() to keep rproc_attach() clean.
> > - Setting ->cached_table, ->table_ptr and ->table_sz in the remoteproc core
> >   rather than the platform drivers.
> > ---
> >  drivers/remoteproc/remoteproc_core.c     | 35 ++++++++++++++++++++++++
> >  drivers/remoteproc/remoteproc_internal.h | 10 +++++++
> >  include/linux/remoteproc.h               |  6 +++-
> >  3 files changed, 50 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > index e6606d10a4c8..741bc20de437 100644
> > --- a/drivers/remoteproc/remoteproc_core.c
> > +++ b/drivers/remoteproc/remoteproc_core.c
> > @@ -1537,6 +1537,35 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
> >  	return ret;
> >  }
> >  
> > +static int rproc_set_loaded_rsc_table(struct rproc *rproc)
> > +{
> > +	struct resource_table *table_ptr;
> > +	struct device *dev = &rproc->dev;
> > +	size_t table_sz;
> > +	int ret;
> > +
> > +	table_ptr = rproc_get_loaded_rsc_table(rproc, &table_sz);
> > +	if (IS_ERR_OR_NULL(table_ptr)) {
> > +		if (!table_ptr)
> > +			ret = -EINVAL;
> 
> I did few tests on this showing that this approach does not cover all use cases.
> 
> The first one is a firmware without resource table. In this case table_ptr
> should be null, or we have to consider the -ENOENT error as a non error usecase.
>

Right, I'll provision for those cases.
 
> The second one, more tricky, is a firmware started by the remoteproc framework.
> In this case the resource table address is retrieved from the ELF file by the
> core part.

Correct.

> So if we detach and reattach rproc_get_loaded_rsc_table cannot return the
> address. Look to me that we should have also an alocation of the clean_table in
> rproc_start and then to keep the memory allocated until a shutdown.

I assumed the address of the resource table found in the ELF image was the same
as the one known by the platform driver.  In hindsight I realise the platform
driver may not know that address.

> 
> That said regarding the complexity to re-attach, I wonder if it would not be
> better to focus first on a simple detach, and address the reattachment in a
> separate series, to move forward in stages.

I agree that OFFLINE -> RUNNING -> DETACHED -> ATTACHED is introducing some
complexity related to the management of the resource table that where not
expected.  We could concentrate on a simple detach and see where that takes us.
It would also mean to get rid of the "autonomous-on-core-shutdown" DT binding. 

Thanks,
Mathieu

> 
> Regards,
> Arnaud
> 
> > +		else
> > +			ret = PTR_ERR(table_ptr);
> > +
> > +		dev_err(dev, "can't load resource table: %d\n", ret);
> > +		return ret;
> > +	}
> > +
> > +	/*
> > +	 * The resource table is already loaded in device memory, no need
> > +	 * to work with a cached table.
> > +	 */
> > +	rproc->cached_table = NULL;
> > +	rproc->table_ptr = table_ptr;
> > +	rproc->table_sz = table_sz;
> > +
> > +	return 0;
> > +}
> > +
> >  /*
> >   * Attach to remote processor - similar to rproc_fw_boot() but without
> >   * the steps that deal with the firmware image.
> > @@ -1556,6 +1585,12 @@ static int rproc_attach(struct rproc *rproc)
> >  		return ret;
> >  	}
> >  
> > +	ret = rproc_set_loaded_rsc_table(rproc);
> > +	if (ret) {
> > +		dev_err(dev, "can't load resource table: %d\n", ret);
> > +		goto disable_iommu;
> > +	}
> > +
> >  	/* reset max_notifyid */
> >  	rproc->max_notifyid = -1;
> >  
> > diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> > index c34002888d2c..4f73aac7e60d 100644
> > --- a/drivers/remoteproc/remoteproc_internal.h
> > +++ b/drivers/remoteproc/remoteproc_internal.h
> > @@ -177,6 +177,16 @@ struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,
> >  	return NULL;
> >  }
> >  
> > +static inline
> > +struct resource_table *rproc_get_loaded_rsc_table(struct rproc *rproc,
> > +						  size_t *size)
> > +{
> > +	if (rproc->ops->get_loaded_rsc_table)
> > +		return rproc->ops->get_loaded_rsc_table(rproc, size);
> > +
> > +	return NULL;
> > +}
> > +
> >  static inline
> >  bool rproc_u64_fit_in_size_t(u64 val)
> >  {
> > diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> > index 6b0a0ed30a03..51538a7d120d 100644
> > --- a/include/linux/remoteproc.h
> > +++ b/include/linux/remoteproc.h
> > @@ -368,7 +368,9 @@ enum rsc_handling_status {
> >   * RSC_HANDLED if resource was handled, RSC_IGNORED if not handled and a
> >   * negative value on error
> >   * @load_rsc_table:	load resource table from firmware image
> > - * @find_loaded_rsc_table: find the loaded resouce table
> > + * @find_loaded_rsc_table: find the loaded resource table from firmware image
> > + * @get_loaded_rsc_table: get resource table installed in memory
> > + *			  by external entity
> >   * @load:		load firmware to memory, where the remote processor
> >   *			expects to find it
> >   * @sanity_check:	sanity check the fw image
> > @@ -390,6 +392,8 @@ struct rproc_ops {
> >  			  int offset, int avail);
> >  	struct resource_table *(*find_loaded_rsc_table)(
> >  				struct rproc *rproc, const struct firmware *fw);
> > +	struct resource_table *(*get_loaded_rsc_table)(
> > +				struct rproc *rproc, size_t *size);
> >  	int (*load)(struct rproc *rproc, const struct firmware *fw);
> >  	int (*sanity_check)(struct rproc *rproc, const struct firmware *fw);
> >  	u64 (*get_boot_addr)(struct rproc *rproc, const struct firmware *fw);
> >