Message ID: 1546266733-9512-1-git-send-email-eranbe@mellanox.com
Series: Devlink health reporting and recovery system
On Mon, 31 Dec 2018 16:31:54 +0200, Eran Ben Elisha wrote: > The health mechanism is targeted for Real Time Alerting, in order to know when > something bad had happened to a PCI device > - Provide alert debug information > - Self healing > - If problem needs vendor support, provide a way to gather all needed debugging > information. > > The main idea is to unify and centralize driver health reports in the > generic devlink instance and allow the user to set different > attributes of the health reporting and recovery procedures. > > The devlink health reporter: > Device driver creates a "health reporter" per each error/health type. > Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error) > or unknown (driver specific). > For each registered health reporter a driver can issue error/health reports > asynchronously. All health reports handling is done by devlink. > Device driver can provide specific callbacks for each "health reporter", e.g. > - Recovery procedures > - Diagnostics and object dump procedures > - OOB initial attributes > Different parts of the driver can register different types of health reporters > with different handlers. > > Once an error is reported, devlink health will do the following actions: > * A log is being send to the kernel trace events buffer > * Health status and statistics are being updated for the reporter instance > * Object dump is being taken and saved at the reporter instance (as long as > there is no other Objdump which is already stored) > * Auto recovery attempt is being done. Depends on: > - Auto-recovery configuration > - Grace period vs. time passed since last recover > > The user interface: > User can access/change each reporter attributes and driver specific callbacks > via devlink, e.g per error type (per health reporter) > - Configure reporter's generic attributes (like: Disable/enable auto recovery) > - Invoke recovery procedure > - Run diagnostics > - Object dump Thanks for continuing this work! 
> The devlink health interface (via netlink):
> DEVLINK_CMD_HEALTH_REPORTER_GET
> Retrieves status and configuration info per DEV and reporter.
> DEVLINK_CMD_HEALTH_REPORTER_SET
> Allows reporter-related configuration setting.
> DEVLINK_CMD_HEALTH_REPORTER_RECOVER
> Triggers a reporter's recovery procedure.
> DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
> Retrieves diagnostics data from a reporter on a device.
> DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_GET

The addition of "objdump" and its marshalling is a bit disappointing. It seemed to me when region snapshots were added that they would serve this exact purpose: taking a region snapshot or "core dump", if you will, after something went south. Now it's confusing to have two mechanisms serving the same purpose.

It's unclear from a quick reading of the code how, or whether, the buffers get timestamped. Can you report more than one?

About the marshalling, I'm not sure it belongs in the kernel. There is precedent for adding interpretation of FW blobs in user space (ethtool). IMHO it's a more scalable approach, if we want to avoid having 100 kLoC drivers. The amount of code you add to print the simple example from the last patch is not inspiring confidence.

And on the bike shedding side :) -> I think you should steer clear of calling this objdump, as it has very little to do with the functionality of the well-known objdump tool. It's not even clear what object the name is referring to.

Long story short, the overlap with region snapshots makes it unclear what purpose either serves, and IMHO we should avoid the marshalling by teaching user space how to interpret snapshots. Preferably we have only one dump mechanism, and user space can be taught the interpretation once.

> Retrieves the last stored objdump. Devlink health
> saves a single objdump. If an objdump is not already stored by the devlink
> for this reporter, devlink generates a new objdump.
> Objdump output is defined by the reporter.
> DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_CLEAR > Clears the last saved objdump file for the specified reporter. > > > netlink > +--------------------------+ > | | > | + | > | | | > +--------------------------+ > |request for ops > |(diagnose, > mlx5_core devlink |recover, > |dump) > +--------+ +--------------------------+ > | | | reporter| | > | | | +---------v----------+ | > | | ops execution | | | | > | <----------------------------------+ | | > | | | | | | > | | | + ^------------------+ | > | | | | request for ops | > | | | | (recover, dump) | > | | | | | > | | | +-+------------------+ | > | | health report | | health handler | | > | +-------------------------------> | | > | | | +--------------------+ | > | | health reporter create | | > | +----------------------------> | > +--------+ +--------------------------+ > > Available reporters: > In this patchset, three reporters of mlx5e driver are included. The FW > reporters implement devlink_health_reporter diagnostic, object dump and > recovery procedures. The TX reporter implements devlink_health_reporter > diagnostic and recovery procedures. > > This RFC is based on the same concepts as previous RFC I sent earlier this > year: "[RFC PATCH iproute2-next] System specification health API". The API was > changed, also devlink code and mlx5e reporters were not available at the > previous RFC.
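For readers following along, the error-handling flow the cover letter describes (log to the trace buffer, update statistics, keep at most one dump, auto-recovery gated by configuration and grace period) can be sketched as a toy model. This is illustration only; the class and method names below are invented for the sketch and are not the devlink kernel API:

```python
import time

class HealthReporter:
    """Toy model of the reported flow: on each health report, log a trace
    line, bump statistics, store a dump only if none is already stored,
    then attempt auto-recovery subject to config and the grace period."""

    def __init__(self, name, recover_cb, dump_cb,
                 auto_recover=True, grace_period_s=1.0):
        self.name = name
        self.recover_cb = recover_cb      # driver-provided recovery callback
        self.dump_cb = dump_cb            # driver-provided dump callback
        self.auto_recover = auto_recover
        self.grace_period_s = grace_period_s
        self.error_count = 0
        self.recover_count = 0
        self.last_recover_ts = None
        self.stored_dump = None           # at most one dump is kept

    def report(self, ctx, now=None):
        now = time.monotonic() if now is None else now
        print(f"[trace] {self.name}: health report: {ctx}")
        self.error_count += 1
        if self.stored_dump is None:      # first dump is kept until cleared
            self.stored_dump = self.dump_cb(ctx)
        if not self.auto_recover:
            return False
        if (self.last_recover_ts is not None
                and now - self.last_recover_ts < self.grace_period_s):
            return False                  # still inside the grace period
        if self.recover_cb(ctx):
            self.recover_count += 1
            self.last_recover_ts = now
            return True
        return False

    def dump_clear(self):
        self.stored_dump = None
```

A second error arriving within the grace period updates statistics but skips recovery, and the dump taken at the first error is preserved for later retrieval, matching the "as long as there is no other Objdump which is already stored" rule above.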
On 1/1/2019 3:47 AM, Jakub Kicinski wrote: > On Mon, 31 Dec 2018 16:31:54 +0200, Eran Ben Elisha wrote: >> The health mechanism is targeted for Real Time Alerting, in order to know when >> something bad had happened to a PCI device >> - Provide alert debug information >> - Self healing >> - If problem needs vendor support, provide a way to gather all needed debugging >> information. >> >> The main idea is to unify and centralize driver health reports in the >> generic devlink instance and allow the user to set different >> attributes of the health reporting and recovery procedures. >> >> The devlink health reporter: >> Device driver creates a "health reporter" per each error/health type. >> Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error) >> or unknown (driver specific). >> For each registered health reporter a driver can issue error/health reports >> asynchronously. All health reports handling is done by devlink. >> Device driver can provide specific callbacks for each "health reporter", e.g. >> - Recovery procedures >> - Diagnostics and object dump procedures >> - OOB initial attributes >> Different parts of the driver can register different types of health reporters >> with different handlers. >> >> Once an error is reported, devlink health will do the following actions: >> * A log is being send to the kernel trace events buffer >> * Health status and statistics are being updated for the reporter instance >> * Object dump is being taken and saved at the reporter instance (as long as >> there is no other Objdump which is already stored) >> * Auto recovery attempt is being done. Depends on: >> - Auto-recovery configuration >> - Grace period vs. 
time passed since last recover >> >> The user interface: >> User can access/change each reporter attributes and driver specific callbacks >> via devlink, e.g per error type (per health reporter) >> - Configure reporter's generic attributes (like: Disable/enable auto recovery) >> - Invoke recovery procedure >> - Run diagnostics >> - Object dump > > Thanks for continuing this work! :) > >> The devlink health interface (via netlink): >> DEVLINK_CMD_HEALTH_REPORTER_GET >> Retrieves status and configuration info per DEV and reporter. >> DEVLINK_CMD_HEALTH_REPORTER_SET >> Allows reporter-related configuration setting. >> DEVLINK_CMD_HEALTH_REPORTER_RECOVER >> Triggers a reporter's recovery procedure. >> DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE >> Retrieves diagnostics data from a reporter on a device. >> DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_GET > > The addition of "objdump" and its marshalling is a bit disappointing. > It seemed to me when region snapshots were added that they would serve > this exact purpose. Taking a region snapshot or "core dump", if you > will, after something went south. Now it's confusing to have two > mechanisms serving the same purpose. The motivation here was that the driver can provide reporters to its sub-modules, such that each reporter will be able to provide all needed info and recover methods to face run time errors. The implementation of the objdump function is in the hands of the reporter developer, and he can dump whatever he thinks it is needed. Keep in mind that a driver can have many reporters (TX, RX, FW, command interface, etc). For most of the reporters, there is important information in the driver which can not be fetched with region snapshot (intended for memory fetching only). 
For SW info, the driver shall build the info and send it already interpreted (unlike all the dump / region mechanisms available). I plan to extend the TX reporter with an objdump method that passes many SQ SW attributes, which can be very handy in a debug session.

>
> It's unclear from quick reading of the code how if the buffers get
> timestamped. Can you report more than one?

The timestamp devlink health reports is the time at which it got the buffers filled by the driver; every dump/diagnose has one timestamp. Per reporter, it is possible to store up to one dump. Only the clear command can remove it and make the reporter ready to fetch a new objdump.

>
> About the marshalling, I'm not sure it belongs in the kernel. There is
> precedent for adding interpretation of FW blobs in user space (ethtool).
> IMHO it's a more scalable approach, if we want to avoid having 100 kLoC
> drivers. Amount of code you add to print the simple example from last
> patch is not inspiring confidence.

The idea was to provide the developer with the ability to create a "tree-like" structure of information, which is needed when you want to describe complex objects. This results in longer code, but I don't think we are even close to the scale you are talking about. We can remove the tree flexibility and move to an array format; that would make the driver's code for storing data shorter, but we would lose the ability to have it in tree form.

And again, regarding ethtool: it is a tool for dumping HW/FW, so this could have been an argument about the region snapshot, not about devlink health...

>
> And on the bike shedding side :) -> I think you should steer clear of
> calling this objdump, as it has very little to do with the
> functionality of well-known objdump tool. Its not even clear what the
> object the name is referring to is.

Let's agree on the concept; we can change the name to dump. Reporter->dump is very clear when you know what the reporter is.
> > Long story short the overlap with region snapshots makes it unclear > what purpose either serves, and IMHO we should avoid the marshalling by > teaching user space how to interpret snapshots. Preferably we only > have one dump mechanism, and user space can be taught the interpretation > once. Forcing SW reporters to use region snapshot mechanism sounds like a bad idea. To summarize, In order to have the health system working properly, it must have a way to objdump/dump itself and provide it to the developer for debug. Removing the objdump part will make it useless for run-time debug. I think this is a powerful tool and we shall not ask the user level scripts to fetch info from many other partial tools. It shall all be focused in one place (recover, diagnose, objdump, statistics). > >> Retrieves the last stored objdump. Devlink health >> saves a single objdump. If an objdump is not already stored by the devlink >> for this reporter, devlink generates a new objdump. >> Objdump output is defined by the reporter. >> DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_CLEAR >> Clears the last saved objdump file for the specified reporter. >> >> >> netlink >> +--------------------------+ >> | | >> | + | >> | | | >> +--------------------------+ >> |request for ops >> |(diagnose, >> mlx5_core devlink |recover, >> |dump) >> +--------+ +--------------------------+ >> | | | reporter| | >> | | | +---------v----------+ | >> | | ops execution | | | | >> | <----------------------------------+ | | >> | | | | | | >> | | | + ^------------------+ | >> | | | | request for ops | >> | | | | (recover, dump) | >> | | | | | >> | | | +-+------------------+ | >> | | health report | | health handler | | >> | +-------------------------------> | | >> | | | +--------------------+ | >> | | health reporter create | | >> | +----------------------------> | >> +--------+ +--------------------------+ >> >> Available reporters: >> In this patchset, three reporters of mlx5e driver are included. 
The FW >> reporters implement devlink_health_reporter diagnostic, object dump and >> recovery procedures. The TX reporter implements devlink_health_reporter >> diagnostic and recovery procedures. >> >> This RFC is based on the same concepts as previous RFC I sent earlier this >> year: "[RFC PATCH iproute2-next] System specification health API". The API was >> changed, also devlink code and mlx5e reporters were not available at the >> previous RFC.
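The at-most-one-dump semantics Eran describes above (GET returns the stored objdump, or generates and stores a fresh one if none exists; only CLEAR frees the slot) can be expressed as a small toy model. The names below are illustrative, not the real interface:

```python
class ObjdumpStore:
    """Toy model of the OBJDUMP_GET / OBJDUMP_CLEAR semantics:
    at most one dump is kept per reporter; GET returns the stored
    dump or generates (and stores) a new one; only CLEAR frees it."""

    def __init__(self, dump_cb):
        self.dump_cb = dump_cb    # reporter-defined dump generator
        self._stored = None

    def objdump_get(self):
        if self._stored is None:
            self._stored = self.dump_cb()   # generate a new dump on demand
        return self._stored

    def objdump_clear(self):
        self._stored = None
```

Repeated GETs return the same stored dump from the time of the error; only after a CLEAR does the next GET produce a fresh, live dump.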
On Tue, 1 Jan 2019 09:58:30 +0000, Eran Ben Elisha wrote: > On 1/1/2019 3:47 AM, Jakub Kicinski wrote: > > The addition of "objdump" and its marshalling is a bit disappointing. > > It seemed to me when region snapshots were added that they would serve > > this exact purpose. Taking a region snapshot or "core dump", if you > > will, after something went south. Now it's confusing to have two > > mechanisms serving the same purpose. > > The motivation here was that the driver can provide reporters to its > sub-modules, such that each reporter will be able to provide all needed > info and recover methods to face run time errors. > > The implementation of the objdump function is in the hands of the > reporter developer, and he can dump whatever he thinks it is needed. > Keep in mind that a driver can have many reporters (TX, RX, FW, command > interface, etc). For most of the reporters, there is important > information in the driver which can not be fetched with region snapshot > (intended for memory fetching only). > For SW info, driver shall build the info and send it interpreted (unlike > all dumps / region available mechanisms) > I have in plans to extend the TX reporter to have objdump method in > which I will pass many SQ SW attributes that can be very handy in a > debug session. My feeling is that instead of duplicating this infrastructure we should try to grow region snapshots beyond the "HW memory dumper". In a debugging session you may want to have dumps as well as read the state live. Region snapshot API was built with this promise in mind. The information in reporter dump is not easily available other than when the dump happens, which is not great in a debugging session. Driver will have to expose it via debugfs/region dump/ethtool dump or some such for live debug. > > It's unclear from quick reading of the code how if the buffers get > > timestamped. Can you report more than one? 
> > The timestamp which devlink health reports on, is the timestamp in which > it got the buffers filled by the driver. Every dump/diagnose has one ts. > Per reporter, it is possible to store up to one dump. only clear command > can remove it and makes the reporter ready to fetch a new objdump. Region snapshots support collecting multiple snapshots IIRC, no? > > About the marshalling, I'm not sure it belongs in the kernel. There is > > precedent for adding interpretation of FW blobs in user space (ethtool). > > IMHO it's a more scalable approach, if we want to avoid having 100 kLoC > > drivers. Amount of code you add to print the simple example from last > > patch is not inspiring confidence. > > The idea was to provide the developer the ability to create "tree-like" > of information, it is needed when you want to describe complex objects. > This caused a longer coding, but I don't think we are even close to the > scale you are talking about. > We can remove the tree flexibility, and move to array format, it will > make the code of storing data by the driver to be shorter, but we will > lose the ability to have it in tree-like format. To be clear I slightly oppose the marshalling in the first place. It may be better to just dump the data as is, and have user space know what the interpretation is. > And again, regarding ethtool, it is a tool for dumping HW/FW, this could > have been an argument for the region snapshot, not for the devlink health... I've seen drivers dumping ring state and other SW info via ethtool. All debugging APIs end up "mixed source" in my experience. > > And on the bike shedding side :) -> I think you should steer clear of > > calling this objdump, as it has very little to do with the > > functionality of well-known objdump tool. Its not even clear what the > > object the name is referring to is. > Let's agree on concept, we can change name to dump. Reporter->dump is > very clear when you know what the reporter is. Thanks! 
> > Long story short the overlap with region snapshots makes it unclear > > what purpose either serves, and IMHO we should avoid the marshalling by > > teaching user space how to interpret snapshots. Preferably we only > > have one dump mechanism, and user space can be taught the interpretation > > once. > Forcing SW reporters to use region snapshot mechanism sounds like a bad > idea. I'm not super excited about reusing region API for SW info. But I like it more than having multitude of debug APIs for drivers to implement with largely overlapping functionality and semantics. > To summarize, In order to have the health system working properly, it > must have a way to objdump/dump itself and provide it to the developer > for debug. Removing the objdump part will make it useless for run-time > debug. > > I think this is a powerful tool and we shall not ask the user level > scripts to fetch info from many other partial tools. It shall all be > focused in one place (recover, diagnose, objdump, statistics). I don't think having reporter API refer to snapshot IDs is "many other partial tools" if that's the suggestion. Seems like you want to focus all the reporter stuff in one place, and I want to focus the debug APIs a little :)
On 1/3/2019 12:46 AM, Jakub Kicinski wrote: > On Tue, 1 Jan 2019 09:58:30 +0000, Eran Ben Elisha wrote: >> On 1/1/2019 3:47 AM, Jakub Kicinski wrote: >>> The addition of "objdump" and its marshalling is a bit disappointing. >>> It seemed to me when region snapshots were added that they would serve >>> this exact purpose. Taking a region snapshot or "core dump", if you >>> will, after something went south. Now it's confusing to have two >>> mechanisms serving the same purpose. >> >> The motivation here was that the driver can provide reporters to its >> sub-modules, such that each reporter will be able to provide all needed >> info and recover methods to face run time errors. >> >> The implementation of the objdump function is in the hands of the >> reporter developer, and he can dump whatever he thinks it is needed. >> Keep in mind that a driver can have many reporters (TX, RX, FW, command >> interface, etc). For most of the reporters, there is important >> information in the driver which can not be fetched with region snapshot >> (intended for memory fetching only). >> For SW info, driver shall build the info and send it interpreted (unlike >> all dumps / region available mechanisms) >> I have in plans to extend the TX reporter to have objdump method in >> which I will pass many SQ SW attributes that can be very handy in a >> debug session. > > My feeling is that instead of duplicating this infrastructure we should > try to grow region snapshots beyond the "HW memory dumper". In a > debugging session you may want to have dumps as well as read the state > live. Region snapshot API was built with this promise in mind. The > information in reporter dump is not easily available other than when > the dump happens, which is not great in a debugging session. Driver > will have to expose it via debugfs/region dump/ethtool dump or some > such for live debug. Arch wise those are two different features which we shouldn't mix. 
The region dump aims at dumping information for monitoring "HW memory" in real time; it is more like a dumb channel providing memory chunks from the HW to the user. Health is a user-facing tool which provides a centralized view of a device's health status, with diagnose, recover and dump options divided across sub-reporters in real time, and which can save critical data automatically when an error is reported.

Regarding the need for another tool for live debug: that is not true. The devlink health objdump command provides the latest stored dump, and can also dump right now for live debug. It is even better, as the driver's reporters already contain the needed HW and SW buffers according to the device and the error, unlike region dump, which requires the administrator to set up regions even though the administrator is not an expert in mapping between the required info, the device model and the reported error.

>
>>> It's unclear from quick reading of the code how if the buffers get
>>> timestamped. Can you report more than one?
>>
>> The timestamp which devlink health reports on, is the timestamp in which
>> it got the buffers filled by the driver. Every dump/diagnose has one ts.
>> Per reporter, it is possible to store up to one dump. only clear command
>> can remove it and makes the reporter ready to fetch a new objdump.
>
> Region snapshots support collecting multiple snapshots IIRC, no?

It does allow multiple snapshots. That can easily be added to devlink health if we wish to; I didn't see a current need for it.

>
>>> About the marshalling, I'm not sure it belongs in the kernel. There is
>>> precedent for adding interpretation of FW blobs in user space (ethtool).
>>> IMHO it's a more scalable approach, if we want to avoid having 100 kLoC
>>> drivers. Amount of code you add to print the simple example from last
>>> patch is not inspiring confidence.
>>
>> The idea was to provide the developer the ability to create "tree-like"
>> of information, it is needed when you want to describe complex objects.
>> This caused a longer coding, but I don't think we are even close to the
>> scale you are talking about.
>> We can remove the tree flexibility, and move to array format, it will
>> make the code of storing data by the driver to be shorter, but we will
>> lose the ability to have it in tree-like format.
>
> To be clear I slightly oppose the marshalling in the first place. It
> may be better to just dump the data as is, and have user space know
> what the interpretation is.

We provide a way to store the data in nested layers. In internal discussions with Jiri, we decided that this is the correct approach. However, if one insists, one can fill the buffers with raw binary and label it as such.

>
>> And again, regarding ethtool, it is a tool for dumping HW/FW, this could
>> have been an argument for the region snapshot, not for the devlink health...
>
> I've seen drivers dumping ring state and other SW info via ethtool.
> All debugging APIs end up "mixed source" in my experience.
>
>>> And on the bike shedding side :) -> I think you should steer clear of
>>> calling this objdump, as it has very little to do with the
>>> functionality of well-known objdump tool. Its not even clear what the
>>> object the name is referring to is.
>> Let's agree on concept, we can change name to dump. Reporter->dump is
>> very clear when you know what the reporter is.
>
> Thanks!
>
>>> Long story short the overlap with region snapshots makes it unclear
>>> what purpose either serves, and IMHO we should avoid the marshalling by
>>> teaching user space how to interpret snapshots. Preferably we only
>>> have one dump mechanism, and user space can be taught the interpretation
>>> once.
>> Forcing SW reporters to use region snapshot mechanism sounds like a bad
>> idea.
>
> I'm not super excited about reusing region API for SW info. But I like
> it more than having multitude of debug APIs for drivers to implement
> with largely overlapping functionality and semantics.
Dumping HW information is just a very small portion of devlink health. If a driver developer thinks he can use the existing region API to fetch some data into his reporter, he can do so in his dump method.

>
>> To summarize, In order to have the health system working properly, it
>> must have a way to objdump/dump itself and provide it to the developer
>> for debug. Removing the objdump part will make it useless for run-time
>> debug.
>>
>> I think this is a powerful tool and we shall not ask the user level
>> scripts to fetch info from many other partial tools. It shall all be
>> focused in one place (recover, diagnose, objdump, statistics).
>
> I don't think having reporter API refer to snapshot IDs is "many other
> partial tools" if that's the suggestion. Seems like you want to focus
> all the reporter stuff in one place, and I want to focus the debug APIs
> a little :)
>

As I see it, this tool is an envelope for all the health functionality and should provide a good, easy-to-use interface. It should contain all health-related dumps (unfortunately, as the subsystem has been running for a long time, it cannot be the sole provider of them, but it can aspire to be the leading one). With devlink health dump you can get:
1. Run-time dumps.
2. Automatic error-related dumps from the time the error happened, stored in memory and waiting to be fetched.
3. Driver-parsed data as well as raw data (up to the developer to decide; the API can support both).
4. OOB-configured data marked by the driver developers as relevant to the failure.
None of the existing tools can provide such features, and the option of adding them to an existing tool doesn't seem possible or correct even in theory. Removing the dump option from devlink for the sake of a duplicate HW memory dump mechanism is a killer for this feature, and as a driver developer I really think that the networking subsystem needs it.
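To make the "tree-like" dump format discussed in this thread concrete, here is a hypothetical sketch of what a driver-side builder for nested dump data could look like. The method names (`obj_nest_start`, `pair_put`, ...) are invented for illustration and are not the actual kernel API:

```python
class FmsgBuilder:
    """Hypothetical driver-side builder for tree-structured dump data,
    in the spirit of the nested-layers approach described above: the
    driver opens/closes nested objects and puts name/value pairs."""

    def __init__(self):
        self._stack = [{}]          # root of the tree

    def obj_nest_start(self, name):
        child = {}
        self._stack[-1][name] = child
        self._stack.append(child)   # subsequent pairs go into the child

    def obj_nest_end(self):
        self._stack.pop()

    def pair_put(self, name, value):
        self._stack[-1][name] = value

    def tree(self):
        return self._stack[0]
```

For example, a TX reporter could nest SQ attributes under an "sq" object while keeping reporter-level fields at the top, which is exactly the hierarchy a flat array format would lose:

```python
b = FmsgBuilder()
b.obj_nest_start("sq")
b.pair_put("sqn", 3)
b.pair_put("cc", 17)
b.obj_nest_end()
b.pair_put("reporter", "tx")
# b.tree() == {"sq": {"sqn": 3, "cc": 17}, "reporter": "tx"}
```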
On Thu, 3 Jan 2019 13:31:59 +0000, Eran Ben Elisha wrote: > Arch wise those are two different features which we shouldn't mix. > The region dump is aiming at dumping of information for monitoring of > "HW memory" at real time, more like a dumb channel to provide memory > chunks from HW to user. The "real time read" part of the region dump was not even implemented. And it was the part that made most sense to me. Region snapshots were described as a tool for gathering crash dumps. See bedc989b0c98 ("net/mlx4_core: Add Crdump FW snapshot support"). The "chunks from HW" is also incorrect as (1) current implementation of regions seem to mostly revolve around FW state and (2) there is nothing in the man page etc. that says HW. I'm not saying region snapshots fit the bill perfectly for you, I'm saying you guys are adding a second facility to do a very similar thing in the span of 6 months - how is it unreasonable of me to ask to consolidate? But I'm not gonna fight you any more on this, if nobody else cares.
Tue, Jan 01, 2019 at 02:47:27AM CET, jakub.kicinski@netronome.com wrote: >On Mon, 31 Dec 2018 16:31:54 +0200, Eran Ben Elisha wrote: >> The health mechanism is targeted for Real Time Alerting, in order to know when >> something bad had happened to a PCI device >> - Provide alert debug information >> - Self healing >> - If problem needs vendor support, provide a way to gather all needed debugging >> information. >> >> The main idea is to unify and centralize driver health reports in the >> generic devlink instance and allow the user to set different >> attributes of the health reporting and recovery procedures. >> >> The devlink health reporter: >> Device driver creates a "health reporter" per each error/health type. >> Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error) >> or unknown (driver specific). >> For each registered health reporter a driver can issue error/health reports >> asynchronously. All health reports handling is done by devlink. >> Device driver can provide specific callbacks for each "health reporter", e.g. >> - Recovery procedures >> - Diagnostics and object dump procedures >> - OOB initial attributes >> Different parts of the driver can register different types of health reporters >> with different handlers. >> >> Once an error is reported, devlink health will do the following actions: >> * A log is being send to the kernel trace events buffer >> * Health status and statistics are being updated for the reporter instance >> * Object dump is being taken and saved at the reporter instance (as long as >> there is no other Objdump which is already stored) >> * Auto recovery attempt is being done. Depends on: >> - Auto-recovery configuration >> - Grace period vs. 
time passed since last recover >> >> The user interface: >> User can access/change each reporter attributes and driver specific callbacks >> via devlink, e.g per error type (per health reporter) >> - Configure reporter's generic attributes (like: Disable/enable auto recovery) >> - Invoke recovery procedure >> - Run diagnostics >> - Object dump > >Thanks for continuing this work! > >> The devlink health interface (via netlink): >> DEVLINK_CMD_HEALTH_REPORTER_GET >> Retrieves status and configuration info per DEV and reporter. >> DEVLINK_CMD_HEALTH_REPORTER_SET >> Allows reporter-related configuration setting. >> DEVLINK_CMD_HEALTH_REPORTER_RECOVER >> Triggers a reporter's recovery procedure. >> DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE >> Retrieves diagnostics data from a reporter on a device. >> DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_GET > >The addition of "objdump" and its marshalling is a bit disappointing. >It seemed to me when region snapshots were added that they would serve >this exact purpose. Taking a region snapshot or "core dump", if you >will, after something went south. Now it's confusing to have two >mechanisms serving the same purpose. > >It's unclear from quick reading of the code how if the buffers get >timestamped. Can you report more than one? > >About the marshalling, I'm not sure it belongs in the kernel. There is >precedent for adding interpretation of FW blobs in user space (ethtool). >IMHO it's a more scalable approach, if we want to avoid having 100 kLoC >drivers. Amount of code you add to print the simple example from last >patch is not inspiring confidence. > >And on the bike shedding side :) -> I think you should steer clear of >calling this objdump, as it has very little to do with the >functionality of well-known objdump tool. Its not even clear what the >object the name is referring to is. 
> >Long story short the overlap with region snapshots makes it unclear >what purpose either serves, and IMHO we should avoid the marshalling by >teaching user space how to interpret snapshots. Preferably we only >have one dump mechanism, and user space can be taught the interpretation >once. Unlike regions, which are binary blobs, the "objdump" we have here is a tree of values (json like). It is not taken from FW/HW. It is generated in driver and passed down to user to be shown. Originally, Eran just pushed that info into a string buffer and the user had to parse it again. I asked to format this in Netlink attributes JSON-style so the user gets all hierarchy without need of parsing and can easily do Netlink->human_stdout or Netlink->JSON. > >> Retrieves the last stored objdump. Devlink health >> saves a single objdump. If an objdump is not already stored by the devlink >> for this reporter, devlink generates a new objdump. >> Objdump output is defined by the reporter. >> DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_CLEAR >> Clears the last saved objdump file for the specified reporter. >> >> >> netlink >> +--------------------------+ >> | | >> | + | >> | | | >> +--------------------------+ >> |request for ops >> |(diagnose, >> mlx5_core devlink |recover, >> |dump) >> +--------+ +--------------------------+ >> | | | reporter| | >> | | | +---------v----------+ | >> | | ops execution | | | | >> | <----------------------------------+ | | >> | | | | | | >> | | | + ^------------------+ | >> | | | | request for ops | >> | | | | (recover, dump) | >> | | | | | >> | | | +-+------------------+ | >> | | health report | | health handler | | >> | +-------------------------------> | | >> | | | +--------------------+ | >> | | health reporter create | | >> | +----------------------------> | >> +--------+ +--------------------------+ >> >> Available reporters: >> In this patchset, three reporters of mlx5e driver are included. 
>> The FW
>> reporters implement devlink_health_reporter diagnostic, object dump and
>> recovery procedures. The TX reporter implements devlink_health_reporter
>> diagnostic and recovery procedures.
>>
>> This RFC is based on the same concepts as the previous RFC I sent earlier this
>> year: "[RFC PATCH iproute2-next] System specification health API". The API has
>> changed; also, the devlink code and the mlx5e reporters were not available in
>> the previous RFC.
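[Editor's aside: Jiri's point above — the "objdump" is a driver-generated tree of values marshalled as nested netlink attributes rather than a binary blob — can be illustrated with a toy TLV encoder/decoder. The attribute type constants and the simplified 4-byte header below are stand-ins for this sketch, not the actual devlink netlink format.]

```python
import json
import struct

# Hypothetical attribute types for this sketch (not real devlink constants).
ATTR_OBJECT = 1     # nested attribute: payload is more TLVs
ATTR_PAIR = 2       # nested attribute: a name attribute followed by a value
ATTR_NAME = 3       # UTF-8 string
ATTR_VALUE_STR = 4  # UTF-8 string value (leaf)

def _tlv(attr_type, payload):
    # 4-byte (length, type) header, netlink-style; padding omitted for brevity.
    return struct.pack("<HH", 4 + len(payload), attr_type) + payload

def encode(obj):
    """Encode a dict-of-(str|dict) tree into nested TLV attributes."""
    out = b""
    for name, value in obj.items():
        inner = _tlv(ATTR_NAME, name.encode())
        if isinstance(value, dict):
            inner += _tlv(ATTR_OBJECT, encode(value))
        else:
            inner += _tlv(ATTR_VALUE_STR, str(value).encode())
        out += _tlv(ATTR_PAIR, inner)
    return out

def decode(buf):
    """Walk the TLVs back into a dict -- no string parsing needed."""
    obj, off = {}, 0
    while off < len(buf):
        length, attr_type = struct.unpack_from("<HH", buf, off)
        assert attr_type == ATTR_PAIR
        payload = buf[off + 4:off + length]
        nlen, _ = struct.unpack_from("<HH", payload, 0)
        name = payload[4:nlen].decode()
        vlen, vtype = struct.unpack_from("<HH", payload, nlen)
        vpayload = payload[nlen + 4:nlen + vlen]
        obj[name] = decode(vpayload) if vtype == ATTR_OBJECT else vpayload.decode()
        off += length
    return obj

tree = {"sq": {"sqn": "4188", "HW state": "1", "cc": "34"}}
assert decode(encode(tree)) == tree        # the hierarchy survives the round trip
print(json.dumps(decode(encode(tree))))    # trivial Netlink->JSON rendering
```

Because the hierarchy is carried in the attribute nesting itself, a generic user-space tool can render any reporter's dump as indented text or JSON without knowing anything about the driver's formatting.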
Thu, Jan 03, 2019 at 02:31:59PM CET, eranbe@mellanox.com wrote:
>
>
> On 1/3/2019 12:46 AM, Jakub Kicinski wrote:
>> On Tue, 1 Jan 2019 09:58:30 +0000, Eran Ben Elisha wrote:
>>> On 1/1/2019 3:47 AM, Jakub Kicinski wrote:

[...]

>
>>
>>>> About the marshalling, I'm not sure it belongs in the kernel. There is
>>>> precedent for adding interpretation of FW blobs in user space (ethtool).
>>>> IMHO it's a more scalable approach, if we want to avoid having 100 kLoC
>>>> drivers. The amount of code you add to print the simple example from the
>>>> last patch is not inspiring confidence.
>>>
>>> The idea was to provide the developer with the ability to create a
>>> "tree-like" structure of information; it is needed when you want to
>>> describe complex objects. This resulted in longer code, but I don't
>>> think we are even close to the scale you are talking about.
>>> We can remove the tree flexibility and move to an array format; it will
>>> make the driver's data-storing code shorter, but we will lose the
>>> ability to have it in a tree-like format.
>>
>> To be clear, I slightly oppose the marshalling in the first place. It
>> may be better to just dump the data as is, and have user space know
>> what the interpretation is.
>
> We provide a way to store the data in nested layers. In internal
> discussions with Jiri, we decided that this is the correct approach.
> However, if one insists, one can fill the buffers with raw binary and
> label it as such.

Again, the data is generated by the driver. The driver knows the
objects; it can assemble a tree of values accordingly. To me it seems
wrong to squash the info into binary and push it to userspace, where it
has to be parsed and unsquashed back.
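[Editor's aside: for contrast, the string-buffer approach Jiri describes as the original proposal forces every consumer to re-parse formatted text. A toy example of the resulting fragility — the buffer contents, pattern, and field names here are made up for illustration:]

```python
import re

# The squashed approach being argued against: the driver formats one string...
driver_buf = "SQ 0x12: HW state: 1, sw state: 3, cc: 0x35f7"

# ...and every userspace consumer must re-parse it with an ad-hoc pattern
# that is tightly coupled to the driver's exact formatting.
m = re.match(
    r"SQ (?P<sqn>0x[0-9a-f]+): HW state: (?P<hw>\d+), "
    r"sw state: (?P<sw>\d+), cc: (?P<cc>0x[0-9a-f]+)",
    driver_buf,
)
parsed = m.groupdict()
print(parsed)  # any change to the driver's formatting silently breaks this
```

With nested attributes, by contrast, the same fields arrive already structured, so renaming or reordering a field in the driver does not break generic consumers.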
Thu, Jan 03, 2019 at 11:28:34PM CET, jakub.kicinski@netronome.com wrote:
> On Thu, 3 Jan 2019 13:31:59 +0000, Eran Ben Elisha wrote:
>> Arch-wise those are two different features which we shouldn't mix.
>> The region dump is aimed at dumping information for monitoring of
>> "HW memory" in real time, more like a dumb channel to provide memory
>> chunks from HW to the user.
>
> The "real time read" part of the region dump was not even implemented.
> And it was the part that made most sense to me.

Agreed. I believe that it was planned to be used for mlx5.

>
> Region snapshots were described as a tool for gathering crash dumps.
> See bedc989b0c98 ("net/mlx4_core: Add Crdump FW snapshot support").
>
> The "chunks from HW" is also incorrect as (1) the current implementation of
> regions seems to mostly revolve around FW state and (2) there is nothing
> in the man page etc. that says HW.
>
> I'm not saying region snapshots fit the bill perfectly for you, I'm
> saying you guys are adding a second facility to do a very similar thing
> in the span of 6 months - how is it unreasonable of me to ask to
> consolidate?

If we needed to push binary, yes. But as I described in another email,
that is not the case.

>
> But I'm not gonna fight you any more on this, if nobody else cares.