Message ID | 20230126163008.3676950-1-andersson@kernel.org |
---|---|
State | New |
Headers | show |
Series | [GIT,PULL] Qualcomm driver updates for v6.3 | expand |
On Thu, Jan 26, 2023, at 17:30, Bjorn Andersson wrote: > The following changes since commit 6049aae52392539e505bfb8ccbcff3c26f1d2f0b: > > ---------------------------------------------------------------- > Qualcomm driver updates for v6.3 > > This introduces a new driver for the Data Capture and Compare block, > which provides a mechanism for capturing hardware state (access MMIO > registers) either upon request of triggered automatically e.g. upon a > watchdog bite, for post mortem analysis. > > The remote filesystem memory share driver gains support for having its > memory bound to more than a single VMID. > > The SCM driver gains the minimal support needed to support a new > mechanism where secure world can put calls on hold and later request > them to be retried. > > Support for the new SA8775P platform is added to rpmhpd, QDU1000 is > added to the SCM driver and a long list of platforms are added to the > socinfo driver. Support for socinfo data revision 16 is also introduced. > > Lastly a driver to program the ramp controller in MSM8976 is introduced. Hi Bjorn, I don't feel comfortable merging the DCC driver through drivers/soc/ at this point: This is the first time I see the driver and it introduces a complex user space ABI that I have no time to review as part of the merge process. I usually try to avoid adding any custom user space interfaces in drivers/soc, as these tend to be things that end up being similar to other chips and need a generic interface. In particular I don't see an explanation about how the new interface relates to the established drivers/hwtracing/ subsystem and why it shouldn't be part of that (adding the hwtracing and coresight maintainers to Cc in case they have looked at this already). Can you send an updated pull request that leaves out the DCC driver until we have clarified these points? Arnd
On Mon, Jan 30, 2023 at 04:18:45PM +0100, Arnd Bergmann wrote: > On Thu, Jan 26, 2023, at 17:30, Bjorn Andersson wrote: > > The following changes since commit 6049aae52392539e505bfb8ccbcff3c26f1d2f0b: > > > > ---------------------------------------------------------------- > > Qualcomm driver updates for v6.3 > > > > This introduces a new driver for the Data Capture and Compare block, > > which provides a mechanism for capturing hardware state (access MMIO > > registers) either upon request of triggered automatically e.g. upon a > > watchdog bite, for post mortem analysis. > > > > The remote filesystem memory share driver gains support for having its > > memory bound to more than a single VMID. > > > > The SCM driver gains the minimal support needed to support a new > > mechanism where secure world can put calls on hold and later request > > them to be retried. > > > > Support for the new SA8775P platform is added to rpmhpd, QDU1000 is > > added to the SCM driver and a long list of platforms are added to the > > socinfo driver. Support for socinfo data revision 16 is also introduced. > > > > Lastly a driver to program the ramp controller in MSM8976 is introduced. > > Hi Bjorn, > > I don't feel comfortable merging the DCC driver through drivers/soc/ > at this point: This is the first time I see the driver and it introduces > a complex user space ABI that I have no time to review as part of the > merge process. > The DCC driver has made 22 versions over the last 23 months, but now that I look back I do agree that the recipients list has been too limited. Further more, due to the complexity of the ABI I steered this towards debugfs, with the explicit mentioning that we will change the interface if needed - in particular since not a lot of review interest has been shown... > I usually try to avoid adding any custom user space interfaces > in drivers/soc, as these tend to be things that end up being > similar to other chips and need a generic interface. > I have no concern with that, but I'm not able to suggest an existing subsystem where this would fit. > In particular I don't see an explanation about how the new interface > relates to the established drivers/hwtracing/ subsystem and why it > shouldn't be part of that (adding the hwtracing and coresight > maintainers to Cc in case they have looked at this already). > To my knowledge the hwtracing framework is an interface for enabling/disabling traces and then you get a stream of trace data out of it. With DCC you essentially write a small "program" to be run at the time of an exception (or triggered manually). When the "program" is run it acquire data from mmio interfaces and stores data in sram, which can then be retrieved - possibly after the fatal reset of the system. Perhaps I've misunderstood the hwtracing framework, please help me steer Souradeep towards a subsystem you find suitable for this functionality. > Can you send an updated pull request that leaves out the > DCC driver until we have clarified these points? > I will send a new pull request, with the driver addition reverted. I don't think there's anything controversial with the DT binding, so let's keep that and the dts nodes (we can move the yaml if a better home is found...). Regards, Bjorn
On Mon, Jan 30, 2023, at 23:24, Bjorn Andersson wrote: > On Mon, Jan 30, 2023 at 04:18:45PM +0100, Arnd Bergmann wrote: >> On Thu, Jan 26, 2023, at 17:30, Bjorn Andersson wrote: >> >> I don't feel comfortable merging the DCC driver through drivers/soc/ >> at this point: This is the first time I see the driver and it introduces >> a complex user space ABI that I have no time to review as part of the >> merge process. >> > > The DCC driver has made 22 versions over the last 23 months, but now > that I look back I do agree that the recipients list has been too > limited. > > Further more, due to the complexity of the ABI I steered this towards > debugfs, with the explicit mentioning that we will change the interface > if needed - in particular since not a lot of review interest has > been shown... I'm sorry to hear this has already taken so long, I understand it's frustrating to come up with a good userspace interface for any of this. >> I usually try to avoid adding any custom user space interfaces >> in drivers/soc, as these tend to be things that end up being >> similar to other chips and need a generic interface. >> > > I have no concern with that, but I'm not able to suggest an existing > subsystem where this would fit. > >> In particular I don't see an explanation about how the new interface >> relates to the established drivers/hwtracing/ subsystem and why it >> shouldn't be part of that (adding the hwtracing and coresight >> maintainers to Cc in case they have looked at this already). >> > > To my knowledge the hwtracing framework is an interface for > enabling/disabling traces and then you get a stream of trace data out of > it. > > With DCC you essentially write a small "program" to be run at the time > of an exception (or triggered manually). When the "program" is run it > acquire data from mmio interfaces and stores data in sram, which can > then be retrieved - possibly after the fatal reset of the system. > > Perhaps I've misunderstood the hwtracing framework, please help me steer > Souradeep towards a subsystem you find suitable for this functionality. I'm also not too familiar with tracing infrastructure and was hoping that the coresight maintainers (Mathieu, Suzuki, Mike and Leo) would have some suggestions here. My initial guess was that in both cases, you have hardware support that is abstracted by the kernel in order to have a user interface that can be consumed by the 'perf' tool. I probably misinterpreted the part about the crash based trigger here, as my original (brief) reading was that the data snapshot could be triggered by any kind of event in the machine, which would make this useful as a more general way of tracing the state of devices at runtime. Can you describe how the crash trigger works, and if this would be usable with other random hardware events aside from an explicit software event? I've added the perf maintainers to Cc as well now, for reference, the now reverted commit is at https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux.git/commit/?h=drivers-for-6.3&id=4cbe60cf5ad62 and it contains both the implementation and the documentation of the debugfs interface. One bit I don't see is the user space side. Is there a patch for perf as well, or is the idea to use a custom tool for this? How does userspace know which MMIO addresses are even valid here? If the possible use is purely for saving some state across a reboot, as opposed to other events, I wonder if there is a good way to integrate it into the fs/pstore/ code, which already has a way to multiplex various kinds of input (log buffer, ftrace call chain, userspace strings, ...) into various kinds of persistent buffers (sram, blockdev, mtd, efivars, ...) with the purpose of helping analyze the state after a reboot. >> Can you send an updated pull request that leaves out the >> DCC driver until we have clarified these points? >> > > I will send a new pull request, with the driver addition reverted. I > don't think there's anything controversial with the DT binding, so let's > keep that and the dts nodes (we can move the yaml if a better home is > found...) Right, this is fine. I merged the first pull request after I saw the revert in the second drivers branch, though I did not see a pull request from you that replaced the first one with just the revert added as I had expected. Also, I see that patchwork never noticed me merging the PR, so you did not get the automated email. Maybe you can double check the contents of the soc/drivers branch to see if the contents are what you expect them to be. Arnd
On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote: [...] > > To my knowledge the hwtracing framework is an interface for > > enabling/disabling traces and then you get a stream of trace data out of > > it. > > > > With DCC you essentially write a small "program" to be run at the time > > of an exception (or triggered manually). When the "program" is run it > > acquire data from mmio interfaces and stores data in sram, which can > > then be retrieved - possibly after the fatal reset of the system. > > > > Perhaps I've misunderstood the hwtracing framework, please help me steer > > Souradeep towards a subsystem you find suitable for this functionality. > > I'm also not too familiar with tracing infrastructure and was hoping > that the coresight maintainers (Mathieu, Suzuki, Mike and Leo) > would have some suggestions here. My initial guess was that in > both cases, you have hardware support that is abstracted by the > kernel in order to have a user interface that can be consumed > by the 'perf' tool. My understanding is hwtracing provides a common framework for STM so that different tracing IPs (like Intel_th and Arm CoreSight) can register STM module into this framework. The framework code is placed in: linux/drivers/hwtracing/stm. Now kernel doesn't provide a general framework for all hardware tracing IPs, e.g. Arm CoreSight has its own framework to manage tracing components and creating links with sinks. Simply to say, we can place DCC driver in linux/drivers/hwtracing folder (like Hisilicon's ptt driver), but we have no common framework for it to use. Based on reading DCC's driver, seems to me it's more like a bus tracing module rather than a uncore PMU. I found the driver does not support interrupt, I am not sure this is a hardware limitation or just software doesn't implement the interrupt handling, without interrupt, it would be difficult for using DCC for profiling. If we register DCC into perf framework, the good thing is DCC can use perf framework (e.g. perf's configs) as its user space interface, but it's still not clear for me how to capture the DCC trace data (no interrupt and not relevant with process task switching). [...] > If the possible use is purely for saving some state across > a reboot, as opposed to other events, I wonder if there is > a good way to integrate it into the fs/pstore/ code, which > already has a way to multiplex various kinds of input (log > buffer, ftrace call chain, userspace strings, ...) into > various kinds of persistent buffers (sram, blockdev, mtd, > efivars, ...) with the purpose of helping analyze the > state after a reboot. Good point! I understand pstore/ramoops is somehow like a sink which routes the tracing data (software tracing data but not hadware tracing data) to persistent memory. This is why we also can route these software tracing data to STM (hardware sink!). Seems to me, Arnd suggests to connect two sinks between DCC and pstore (to persistent memory). But I cannot give an example code in kernel for doing this way, sorry if I miss something. Essentially, a good user case is to keep a persistent memory for the tracing data, then after rebooting cycle we can retrieve the tracing data via user space interface (like sysfs node). Thanks, Leo
On 2/22/2023 7:13 AM, Leo Yan wrote: > On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote: > > [...] > >>> To my knowledge the hwtracing framework is an interface for >>> enabling/disabling traces and then you get a stream of trace data out of >>> it. >>> >>> With DCC you essentially write a small "program" to be run at the time >>> of an exception (or triggered manually). When the "program" is run it >>> acquire data from mmio interfaces and stores data in sram, which can >>> then be retrieved - possibly after the fatal reset of the system. >>> >>> Perhaps I've misunderstood the hwtracing framework, please help me steer >>> Souradeep towards a subsystem you find suitable for this functionality. >> >> I'm also not too familiar with tracing infrastructure and was hoping >> that the coresight maintainers (Mathieu, Suzuki, Mike and Leo) >> would have some suggestions here. My initial guess was that in >> both cases, you have hardware support that is abstracted by the >> kernel in order to have a user interface that can be consumed >> by the 'perf' tool. > > My understanding is hwtracing provides a common framework for STM so > that different tracing IPs (like Intel_th and Arm CoreSight) can > register STM module into this framework. The framework code is placed > in: linux/drivers/hwtracing/stm. > > Now kernel doesn't provide a general framework for all hardware tracing > IPs, e.g. Arm CoreSight has its own framework to manage tracing > components and creating links with sinks. > > Simply to say, we can place DCC driver in linux/drivers/hwtracing folder > (like Hisilicon's ptt driver), but we have no common framework for it to > use. > > Based on reading DCC's driver, seems to me it's more like a bus tracing > module rather than a uncore PMU. I found the driver does not support > interrupt, I am not sure this is a hardware limitation or just software > doesn't implement the interrupt handling, without interrupt, it would be > difficult for using DCC for profiling. > > If we register DCC into perf framework, the good thing is DCC can use > perf framework (e.g. perf's configs) as its user space interface, but > it's still not clear for me how to capture the DCC trace data (no > interrupt and not relevant with process task switching). > > [...] > >> If the possible use is purely for saving some state across >> a reboot, as opposed to other events, I wonder if there is >> a good way to integrate it into the fs/pstore/ code, which >> already has a way to multiplex various kinds of input (log >> buffer, ftrace call chain, userspace strings, ...) into >> various kinds of persistent buffers (sram, blockdev, mtd, >> efivars, ...) with the purpose of helping analyze the >> state after a reboot. > > Good point! > > I understand pstore/ramoops is somehow like a sink which routes the > tracing data (software tracing data but not hadware tracing data) to > persistent memory. This is why we also can route these software > tracing data to STM (hardware sink!). > > Seems to me, Arnd suggests to connect two sinks between DCC and > pstore (to persistent memory). But I cannot give an example code in > kernel for doing this way, sorry if I miss something. > > Essentially, a good user case is to keep a persistent memory for the > tracing data, then after rebooting cycle we can retrieve the tracing > data via user space interface (like sysfs node). Hi Leo/Arnd, Just wanted to let you know that the justification of not using PStore was already given in the version 1 of this patch series as below https://lore.kernel.org/linux-arm-msm/ab30490c016f906fd9bc5d789198530b@codeaurora.org/#r PStore/Ramoops only persists across warm-reboots which is present for chrome devices but not for android ones. Also the dcc_sram contents can also be collected by going for a software trigger after loading the kernel and the dcc_sram is parsed to get the register values with the opensource parser as below https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/tools/tree/dcc_parser Pstore on the other hand can only be collected on the next reboot. Thanks, Souradeep > > Thanks, > Leo
Hi Souradeep, On Wed, Feb 22, 2023 at 04:46:07PM +0530, Souradeep Chowdhury wrote: > On 2/22/2023 7:13 AM, Leo Yan wrote: > > On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote: [...] > > > If the possible use is purely for saving some state across > > > a reboot, as opposed to other events, I wonder if there is > > > a good way to integrate it into the fs/pstore/ code, which > > > already has a way to multiplex various kinds of input (log > > > buffer, ftrace call chain, userspace strings, ...) into > > > various kinds of persistent buffers (sram, blockdev, mtd, > > > efivars, ...) with the purpose of helping analyze the > > > state after a reboot. > > > > Good point! > > > > I understand pstore/ramoops is somehow like a sink which routes the > > tracing data (software tracing data but not hadware tracing data) to > > persistent memory. This is why we also can route these software > > tracing data to STM (hardware sink!). > > > > Seems to me, Arnd suggests to connect two sinks between DCC and > > pstore (to persistent memory). But I cannot give an example code in > > kernel for doing this way, sorry if I miss something. > > > > Essentially, a good user case is to keep a persistent memory for the > > tracing data, then after rebooting cycle we can retrieve the tracing > > data via user space interface (like sysfs node). > > Hi Leo/Arnd, > > Just wanted to let you know that the justification of not using PStore was > already given in the version 1 of this patch series as below > > https://lore.kernel.org/linux-arm-msm/ab30490c016f906fd9bc5d789198530b@codeaurora.org/#r > > PStore/Ramoops only persists across warm-reboots which is present for chrome > devices but not for android ones. Thanks for the info. Just remind a subtle difference of reboots. Besides warm reboot, kernel can reboot system after panic (see kernel command line option `panic`) and watchdog can reboot the system as well. Even though Android doesn't support warm reboot, system still can reboot on panic or by watchdog (in particular after bus lockup), pstore/ramoops also can support these cases. > Also the dcc_sram contents can > also be collected by going for a software trigger after loading the kernel > and the dcc_sram is parsed to get the register values with the > opensource parser as below > > https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/tools/tree/dcc_parser To be clear, current driver is fine for me (TBH, I didn't spend much time to read it but it's very neat after quickly went through it), I just share some info in case it's helpful for the discussion. Thanks, Leo
On 2/22/2023 5:43 PM, Leo Yan wrote: > Hi Souradeep, > > On Wed, Feb 22, 2023 at 04:46:07PM +0530, Souradeep Chowdhury wrote: >> On 2/22/2023 7:13 AM, Leo Yan wrote: >>> On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote: > > [...] > >>>> If the possible use is purely for saving some state across >>>> a reboot, as opposed to other events, I wonder if there is >>>> a good way to integrate it into the fs/pstore/ code, which >>>> already has a way to multiplex various kinds of input (log >>>> buffer, ftrace call chain, userspace strings, ...) into >>>> various kinds of persistent buffers (sram, blockdev, mtd, >>>> efivars, ...) with the purpose of helping analyze the >>>> state after a reboot. >>> >>> Good point! >>> >>> I understand pstore/ramoops is somehow like a sink which routes the >>> tracing data (software tracing data but not hadware tracing data) to >>> persistent memory. This is why we also can route these software >>> tracing data to STM (hardware sink!). >>> >>> Seems to me, Arnd suggests to connect two sinks between DCC and >>> pstore (to persistent memory). But I cannot give an example code in >>> kernel for doing this way, sorry if I miss something. >>> >>> Essentially, a good user case is to keep a persistent memory for the >>> tracing data, then after rebooting cycle we can retrieve the tracing >>> data via user space interface (like sysfs node). >> >> Hi Leo/Arnd, >> >> Just wanted to let you know that the justification of not using PStore was >> already given in the version 1 of this patch series as below >> >> https://lore.kernel.org/linux-arm-msm/ab30490c016f906fd9bc5d789198530b@codeaurora.org/#r >> >> PStore/Ramoops only persists across warm-reboots which is present for chrome >> devices but not for android ones. > > Thanks for the info. Just remind a subtle difference of reboots. > > Besides warm reboot, kernel can reboot system after panic (see kernel > command line option `panic`) and watchdog can reboot the system as well. > > Even though Android doesn't support warm reboot, system still can reboot > on panic or by watchdog (in particular after bus lockup), pstore/ramoops > also can support these cases. So for the SoCs that doesn't support warm reboots, the DDR memory is non persistent across panics or watchdog bites in which case the PStore/Ramoops cannot be of use. > >> Also the dcc_sram contents can >> also be collected by going for a software trigger after loading the kernel >> and the dcc_sram is parsed to get the register values with the >> opensource parser as below >> >> https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/tools/tree/dcc_parser > > To be clear, current driver is fine for me (TBH, I didn't spend much > time to read it but it's very neat after quickly went through it), I > just share some info in case it's helpful for the discussion. > > Thanks, > Leo
On 2/27/2023 4:43 AM, Souradeep Chowdhury wrote: > > > On 2/22/2023 5:43 PM, Leo Yan wrote: >> Hi Souradeep, >> >> On Wed, Feb 22, 2023 at 04:46:07PM +0530, Souradeep Chowdhury wrote: >>> On 2/22/2023 7:13 AM, Leo Yan wrote: >>>> On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote: >> >> [...] >> >>>>> If the possible use is purely for saving some state across >>>>> a reboot, as opposed to other events, I wonder if there is >>>>> a good way to integrate it into the fs/pstore/ code, which >>>>> already has a way to multiplex various kinds of input (log >>>>> buffer, ftrace call chain, userspace strings, ...) into >>>>> various kinds of persistent buffers (sram, blockdev, mtd, >>>>> efivars, ...) with the purpose of helping analyze the >>>>> state after a reboot. >>>> >>>> Good point! >>>> >>>> I understand pstore/ramoops is somehow like a sink which routes the >>>> tracing data (software tracing data but not hadware tracing data) to >>>> persistent memory. This is why we also can route these software >>>> tracing data to STM (hardware sink!). >>>> >>>> Seems to me, Arnd suggests to connect two sinks between DCC and >>>> pstore (to persistent memory). But I cannot give an example code in >>>> kernel for doing this way, sorry if I miss something. >>>> >>>> Essentially, a good user case is to keep a persistent memory for the >>>> tracing data, then after rebooting cycle we can retrieve the tracing >>>> data via user space interface (like sysfs node). >>> >>> Hi Leo/Arnd, >>> >>> Just wanted to let you know that the justification of not using >>> PStore was >>> already given in the version 1 of this patch series as below >>> >>> https://lore.kernel.org/linux-arm-msm/ab30490c016f906fd9bc5d789198530b@codeaurora.org/#r >>> >>> PStore/Ramoops only persists across warm-reboots which is present for >>> chrome >>> devices but not for android ones. >> >> Thanks for the info. Just remind a subtle difference of reboots. >> >> Besides warm reboot, kernel can reboot system after panic (see kernel >> command line option `panic`) and watchdog can reboot the system as well. >> >> Even though Android doesn't support warm reboot, system still can reboot >> on panic or by watchdog (in particular after bus lockup), pstore/ramoops >> also can support these cases. > > > So for the SoCs that doesn't support warm reboots, the DDR memory is non > persistent across panics or watchdog bites in which case the > PStore/Ramoops cannot be of use. > > >> >>> Also the dcc_sram contents can >>> also be collected by going for a software trigger after loading the >>> kernel >>> and the dcc_sram is parsed to get the register values with the >>> opensource parser as below >>> >>> https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/tools/tree/dcc_parser >> >> To be clear, current driver is fine for me (TBH, I didn't spend much >> time to read it but it's very neat after quickly went through it), I >> just share some info in case it's helpful for the discussion. What is the conclusion here? Can we pick up the DCC now if we rebase to the latest tree? It seems so far response here is that driver is fine as-is and it can be included without any changes. I want this driver discussion to be concluded since we are trying to submit it for more than 23 months (as Bjorn counted :) ). ---Trilok Soni
On 3/13/2023 9:55 AM, Trilok Soni wrote: > On 2/27/2023 4:43 AM, Souradeep Chowdhury wrote: >> >> >> On 2/22/2023 5:43 PM, Leo Yan wrote: >>> Hi Souradeep, >>> >>> On Wed, Feb 22, 2023 at 04:46:07PM +0530, Souradeep Chowdhury wrote: >>>> On 2/22/2023 7:13 AM, Leo Yan wrote: >>>>> On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote: >>> >>> [...] >>> >>>>>> If the possible use is purely for saving some state across >>>>>> a reboot, as opposed to other events, I wonder if there is >>>>>> a good way to integrate it into the fs/pstore/ code, which >>>>>> already has a way to multiplex various kinds of input (log >>>>>> buffer, ftrace call chain, userspace strings, ...) into >>>>>> various kinds of persistent buffers (sram, blockdev, mtd, >>>>>> efivars, ...) with the purpose of helping analyze the >>>>>> state after a reboot. >>>>> >>>>> Good point! >>>>> >>>>> I understand pstore/ramoops is somehow like a sink which routes the >>>>> tracing data (software tracing data but not hadware tracing data) to >>>>> persistent memory. This is why we also can route these software >>>>> tracing data to STM (hardware sink!). >>>>> >>>>> Seems to me, Arnd suggests to connect two sinks between DCC and >>>>> pstore (to persistent memory). But I cannot give an example code in >>>>> kernel for doing this way, sorry if I miss something. >>>>> >>>>> Essentially, a good user case is to keep a persistent memory for the >>>>> tracing data, then after rebooting cycle we can retrieve the tracing >>>>> data via user space interface (like sysfs node). >>>> >>>> Hi Leo/Arnd, >>>> >>>> Just wanted to let you know that the justification of not using >>>> PStore was >>>> already given in the version 1 of this patch series as below >>>> >>>> https://lore.kernel.org/linux-arm-msm/ab30490c016f906fd9bc5d789198530b@codeaurora.org/#r >>>> >>>> PStore/Ramoops only persists across warm-reboots which is present >>>> for chrome >>>> devices but not for android ones. >>> >>> Thanks for the info. Just remind a subtle difference of reboots. >>> >>> Besides warm reboot, kernel can reboot system after panic (see kernel >>> command line option `panic`) and watchdog can reboot the system as well. >>> >>> Even though Android doesn't support warm reboot, system still can reboot >>> on panic or by watchdog (in particular after bus lockup), pstore/ramoops >>> also can support these cases. >> >> >> So for the SoCs that doesn't support warm reboots, the DDR memory is non >> persistent across panics or watchdog bites in which case the >> PStore/Ramoops cannot be of use. >> >> >>> >>>> Also the dcc_sram contents can >>>> also be collected by going for a software trigger after loading the >>>> kernel >>>> and the dcc_sram is parsed to get the register values with the >>>> opensource parser as below >>>> >>>> https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/tools/tree/dcc_parser >>> >>> To be clear, current driver is fine for me (TBH, I didn't spend much >>> time to read it but it's very neat after quickly went through it), I >>> just share some info in case it's helpful for the discussion. > > > What is the conclusion here? Can we pick up the DCC now if we rebase to > the latest tree? It seems so far response here is that driver is fine > as-is and it can be included without any changes. I want this driver > discussion to be concluded since we are trying to submit it for more > than 23 months (as Bjorn counted :) ). Bjorn and Arnd can you please check and provide feedback? I really want this driver to get merged or open issues resolved. ---Trilok Soni
On Wed, Feb 22, 2023 at 08:13:59PM +0800, Leo Yan wrote: > Hi Souradeep, > > On Wed, Feb 22, 2023 at 04:46:07PM +0530, Souradeep Chowdhury wrote: > > On 2/22/2023 7:13 AM, Leo Yan wrote: > > > On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote: > > [...] > > > > > If the possible use is purely for saving some state across > > > > a reboot, as opposed to other events, I wonder if there is > > > > a good way to integrate it into the fs/pstore/ code, which > > > > already has a way to multiplex various kinds of input (log > > > > buffer, ftrace call chain, userspace strings, ...) into > > > > various kinds of persistent buffers (sram, blockdev, mtd, > > > > efivars, ...) with the purpose of helping analyze the > > > > state after a reboot. > > > > > > Good point! > > > > > > I understand pstore/ramoops is somehow like a sink which routes the > > > tracing data (software tracing data but not hadware tracing data) to > > > persistent memory. This is why we also can route these software > > > tracing data to STM (hardware sink!). > > > > > > Seems to me, Arnd suggests to connect two sinks between DCC and > > > pstore (to persistent memory). But I cannot give an example code in > > > kernel for doing this way, sorry if I miss something. > > > > > > Essentially, a good user case is to keep a persistent memory for the > > > tracing data, then after rebooting cycle we can retrieve the tracing > > > data via user space interface (like sysfs node). > > > > Hi Leo/Arnd, > > > > Just wanted to let you know that the justification of not using PStore was > > already given in the version 1 of this patch series as below > > > > https://lore.kernel.org/linux-arm-msm/ab30490c016f906fd9bc5d789198530b@codeaurora.org/#r > > > > PStore/Ramoops only persists across warm-reboots which is present for chrome > > devices but not for android ones. > > Thanks for the info. Just remind a subtle difference of reboots. > > Besides warm reboot, kernel can reboot system after panic (see kernel > command line option `panic`) and watchdog can reboot the system as well. > > Even though Android doesn't support warm reboot, system still can reboot > on panic or by watchdog (in particular after bus lockup), pstore/ramoops > also can support these cases. > The fact that DDR content isn't maintained across a typical reboot is certainly causing issues for using pstore. But afaiu, with pstore we capture various system state into the pstore and then make that available after reboot. DCC provides a mechanism for capturing hardware state (by reading register content) and storing that in DCC-specific SRAM. This can be triggered either by a HW reset, or on request from the user. As such it doesn't seem to be a conceptual fit, we perhaps could present the data as if it was pstore data, but I don't see how the reports created at runtime would fit into that model. > > Also the dcc_sram contents can > > also be collected by going for a software trigger after loading the kernel > > and the dcc_sram is parsed to get the register values with the > > opensource parser as below > > > > https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/tools/tree/dcc_parser > > To be clear, current driver is fine for me (TBH, I didn't spend much > time to read it but it's very neat after quickly went through it), I > just share some info in case it's helpful for the discussion. > My take is still that /dev/dcc looks to provide a reasonable ABI, and the control mechanism was pushed to debugfs for the purpose of enabling users to play with the interface, and improve it as necessary. Regards, Bjorn
On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote: > On Mon, Jan 30, 2023, at 23:24, Bjorn Andersson wrote: > > On Mon, Jan 30, 2023 at 04:18:45PM +0100, Arnd Bergmann wrote: > >> On Thu, Jan 26, 2023, at 17:30, Bjorn Andersson wrote: > >> > >> I don't feel comfortable merging the DCC driver through drivers/soc/ > >> at this point: This is the first time I see the driver and it introduces > >> a complex user space ABI that I have no time to review as part of the > >> merge process. > >> > > > > The DCC driver has made 22 versions over the last 23 months, but now > > that I look back I do agree that the recipients list has been too > > limited. > > > > Further more, due to the complexity of the ABI I steered this towards > > debugfs, with the explicit mentioning that we will change the interface > > if needed - in particular since not a lot of review interest has > > been shown... > > I'm sorry to hear this has already taken so long, I understand it's > frustrating to come up with a good userspace interface for any of > this. > > >> I usually try to avoid adding any custom user space interfaces > >> in drivers/soc, as these tend to be things that end up being > >> similar to other chips and need a generic interface. > >> > > > > I have no concern with that, but I'm not able to suggest an existing > > subsystem where this would fit. > > > >> In particular I don't see an explanation about how the new interface > >> relates to the established drivers/hwtracing/ subsystem and why it > >> shouldn't be part of that (adding the hwtracing and coresight > >> maintainers to Cc in case they have looked at this already). > >> > > > > To my knowledge the hwtracing framework is an interface for > > enabling/disabling traces and then you get a stream of trace data out of > > it. > > > > With DCC you essentially write a small "program" to be run at the time > > of an exception (or triggered manually). When the "program" is run it > > acquire data from mmio interfaces and stores data in sram, which can > > then be retrieved - possibly after the fatal reset of the system. > > > > Perhaps I've misunderstood the hwtracing framework, please help me steer > > Souradeep towards a subsystem you find suitable for this functionality. > > I'm also not too familiar with tracing infrastructure and was hoping > that the coresight maintainers (Mathieu, Suzuki, Mike and Leo) > would have some suggestions here. My initial guess was that in > both cases, you have hardware support that is abstracted by the > kernel in order to have a user interface that can be consumed > by the 'perf' tool. I probably misinterpreted the part about the > crash based trigger here, as my original (brief) reading was that > the data snapshot could be triggered by any kind of event in > the machine, which would make this useful as a more general > way of tracing the state of devices at runtime. Can you describe > how the crash trigger works, and if this would be usable with > other random hardware events aside from an explicit software > event? > > I've added the perf maintainers to Cc as well now, for reference, > the now reverted commit is at > https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux.git/commit/?h=drivers-for-6.3&id=4cbe60cf5ad62 > and it contains both the implementation and the documentation > of the debugfs interface. > > One bit I don't see is the user space side. Is there a patch for > perf as well, or is the idea to use a custom tool for this? How > does userspace know which MMIO addresses are even valid here? > > If the possible use is purely for saving some state across > a reboot, as opposed to other events, I wonder if there is > a good way to integrate it into the fs/pstore/ code, which > already has a way to multiplex various kinds of input (log > buffer, ftrace call chain, userspace strings, ...) into > various kinds of persistent buffers (sram, blockdev, mtd, > efivars, ...) with the purpose of helping analyze the > state after a reboot. > Iiuc pstore provides a place to store system state for analysis after a reboot, but DCC provides essentially register dumps on demand - with the system reset being a special case trigger. So I think it would look neat to expose the DCC data alongside other pstore data (except for the mentioned issues with ramoops not working on most Qualcomm devices), but when the reboot happens DCC captures data in the DCC SRAM, not in the pstore (whatever backing might be used). So post boot, something would need to get the DCC data into the pstore. To me this sounds in conflict with the pstore design. Further more, with the reboot trigger being the special case, we'd need to amend the pstore state in runtime to capture the case where the user requested the DCC to capture the state. One idea that I was looking at was to trigger a devcoredump as a way to get the data out of the kernel, instead of a new device node. But it doesn't seem to fit very well with existing use cases, and I haven't used DCC sufficiently - given that it doesn't exist upstream... We made significant changes to the control interface through the review process, I think we have something that looks reasonable now, but I picked the patches under the premise that it's unstable and in debugfs, and exposing the tool to users could lead to more interest in polishing it. > >> Can you send an updated pull request that leaves out the > >> DCC driver until we have clarified these points? > >> > > > > I will send a new pull request, with the driver addition reverted. I > > don't think there's anything controversial with the DT binding, so let's > > keep that and the dts nodes (we can move the yaml if a better home is > > found...) > > Right, this is fine. I merged the first pull request after I saw the > revert in the second drivers branch, though I did not see a pull request > from you that replaced the first one with just the revert added as > I had expected. Also, I see that patchwork never noticed me merging > the PR, so you did not get the automated email. Maybe you can double > check the contents of the soc/drivers branch to see if the contents > are what you expect them to be. > I've promised the ChromeOS team to try really hard to keep the commits in my branch stable, so I really try to avoid rebasing commits that has been present in linux-next for a while. Regards, Bjorn
On Fri, Mar 24, 2023, at 18:26, Bjorn Andersson wrote: > On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote: >> On Mon, Jan 30, 2023, at 23:24, Bjorn Andersson wrote: > > Iiuc pstore provides a place to store system state for analysis after a > reboot, but DCC provides essentially register dumps on demand - with the > system reset being a special case trigger. > > So I think it would look neat to expose the DCC data alongside other > pstore data (except for the mentioned issues with ramoops not working on > most Qualcomm devices), but when the reboot happens DCC captures data in > the DCC SRAM, not in the pstore (whatever backing might be used). So > post boot, something would need to get the DCC data into the pstore. > > To me this sounds in conflict with the pstore design. Sorry for the late reply. The case I was thinking of is making the DCC SRAM a pstore backend that might be shared with other pstore users, in place of the other targets like PSTORE_RAM, PSTORE_BLK, EFI_VARS_PSTORE and the powerpc64 nvram. This might be a bad idea for other reasons of course, e.g. if it's impossible to have more than one pstore backend active and the SRAM is too small to contain the data that would otherwise be store here, or if it's not persistent enough. > Further more, with the reboot trigger being the special case, we'd need > to amend the pstore state in runtime to capture the case where the user > requested the DCC to capture the state. > > > One idea that I was looking at was to trigger a devcoredump as a way to > get the data out of the kernel, instead of a new device node. But it > doesn't seem to fit very well with existing use cases, and I haven't > used DCC sufficiently - given that it doesn't exist upstream... You are probably right, but I'm curious about DEV_COREDUMP, as I haven't actually seen that before, and I can't seem to find any documentation about what its intention is, though I can see a number of in-kernel users. > We made significant changes to the control interface through the review > process, I think we have something that looks reasonable now, but I > picked the patches under the premise that it's unstable and in debugfs, > and exposing the tool to users could lead to more interest in > polishing it. Ok Arnd