
[v7,9/9] docs: ABI: testing: Document the Ampere Altra Family's SMpro sysfs interfaces

Message ID 20220321081355.6802-10-quan@os.amperecomputing.com
State New
Series Add Ampere's Altra SMPro MFD and its child drivers

Commit Message

Quan Nguyen March 21, 2022, 8:13 a.m. UTC
Add documentation for the Ampere(R)'s Altra(R) SMpro sysfs interfaces

Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
---
Changes in v7:
  + First introduced in v7     [Greg]

 .../sysfs-bus-platform-devices-ampere-smpro   | 133 ++++++++++++++++++
 1 file changed, 133 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro

Comments

Greg KH March 21, 2022, 8:23 a.m. UTC | #1
On Mon, Mar 21, 2022 at 03:13:55PM +0700, Quan Nguyen wrote:
> Add documentation for the Ampere(R)'s Altra(R) SMpro sysfs interfaces
> 
> Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
> ---
> Changes in v7:
>   + First introduce in v7     [Greg]
> 
>  .../sysfs-bus-platform-devices-ampere-smpro   | 133 ++++++++++++++++++
>  1 file changed, 133 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
> new file mode 100644
> index 000000000000..9bfd8d6d0f71
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
> @@ -0,0 +1,133 @@
> +What:		/sys/bus/platform/devices/smpro-errmon.*/errors_[core|mem|pcie|other]_[ce|ue]

Please split this out as one entry per file.

> +KernelVersion:	5.14

5.14 is a long time ago.

> +Contact:	quan@os.amperecomputing.com
> +Description:
> +		(RO) Contains the 48-byte Ampere (Vendor-Specific) Error Record, see [1]
> +		printed in hex format as below:
> +
> +		AA BB CCCC DDDDDDDD DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD \
> +		   DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD
> +		Where:
> +		  AA       : Error Type
> +		  BB       : Subtype
> +		  CCCC     : Instance
> +		  DDD...DDD: Similar to the Arm RAS standard error record

No, this is not a valid sysfs file, sorry.  This should just be one
value per file.


> +
> +		See [1] below for the format details.
> +
> +		The detail of each sysfs entries is as below:
> +		+-------------+---------------------------------------------------------+
> +		|   Error     |                   Sysfs entry                           |
> +		+-------------+---------------------------------------------------------+
> +		| Core's CE   | /sys/bus/platform/devices/smpro-errmon.*/errors_core_ce |
> +		| Core's UE   | /sys/bus/platform/devices/smpro-errmon.*/errors_core_ue |
> +		| Memory's CE | /sys/bus/platform/devices/smpro-errmon.*/errors_mem_ce  |
> +		| Memory's UE | /sys/bus/platform/devices/smpro-errmon.*/errors_mem_ue  |
> +		| PCIe's CE   | /sys/bus/platform/devices/smpro-errmon.*/errors_pcie_ce |
> +		| PCIe's UE   | /sys/bus/platform/devices/smpro-errmon.*/errors_pcie_ue |
> +		| Other's CE  | /sys/bus/platform/devices/smpro-errmon.*/errors_other_ce|
> +		| Other's UE  | /sys/bus/platform/devices/smpro-errmon.*/errors_other_ue|
> +		+-------------+---------------------------------------------------------+
> +		UE: Uncorrect-able Error
> +		CE: Correct-able Error
> +
> +		[1] Section 3.3 Ampere (Vendor-Specific) Error Record Formats,
> +		    Altra Family RAS Supplement.
> +
> +
> +What:           /sys/bus/platform/devices/smpro-errmon.*/errors_[smpro|pmpro]
> +KernelVersion:	5.14
> +Contact:	quan@os.amperecomputing.com
> +Description:
> +		(RO) Contains the internal firmware error record printed as hex format
> +		as below:
> +
> +		A BB C DD EEEE FFFFFFFF

Again this isn't a good sysfs entry.  You should never have to parse a
sysfs file except for a single value.

thanks,

greg k-h
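Greg's objection is about the parsing this file forces on its readers. The following is a minimal C sketch of a userspace reader for the v7 errors_*_ce / errors_*_ue format. It assumes each file holds the record on a single line (the backslash in the documentation is taken as a line wrap, not file content) and that the hex digits split as 1 + 1 + 2 + 4 + 5 x 8 bytes, matching the "AA BB CCCC DDDDDDDD" prefix followed by five 16-digit groups. struct smpro_err_record and parse_err_record() are hypothetical names. The payload words are not interpreted, because their meaning is defined by Section 3.3 of the Altra Family RAS Supplement.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace view of one 48-byte Ampere error record (v7 text layout). */
struct smpro_err_record {
	unsigned int type;     /* AA: Error Type */
	unsigned int subtype;  /* BB: Subtype */
	unsigned int instance; /* CCCC: Instance */
	uint32_t d32;          /* first DDDDDDDD group (4 bytes) */
	uint64_t d64[5];       /* remaining five 16-digit groups (8 bytes each) */
};

/* Returns 0 if all nine space-separated hex fields were read, -1 otherwise. */
static int parse_err_record(const char *line, struct smpro_err_record *r)
{
	int n = sscanf(line,
		       "%2x %2x %4x %8" SCNx32
		       " %16" SCNx64 " %16" SCNx64 " %16" SCNx64
		       " %16" SCNx64 " %16" SCNx64,
		       &r->type, &r->subtype, &r->instance, &r->d32,
		       &r->d64[0], &r->d64[1], &r->d64[2], &r->d64[3],
		       &r->d64[4]);

	return n == 9 ? 0 : -1;
}
```

Any change to the field widths would silently break a parser like this.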
Quan Nguyen March 21, 2022, 9:46 a.m. UTC | #2
On 21/03/2022 15:23, Greg Kroah-Hartman wrote:
> On Mon, Mar 21, 2022 at 03:13:55PM +0700, Quan Nguyen wrote:
>> Add documentation for the Ampere(R)'s Altra(R) SMpro sysfs interfaces
>>
>> Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
>> ---
>> Changes in v7:
>>    + First introduce in v7     [Greg]
>>
>>   .../sysfs-bus-platform-devices-ampere-smpro   | 133 ++++++++++++++++++
>>   1 file changed, 133 insertions(+)
>>   create mode 100644 Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
>>
>> diff --git a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
>> new file mode 100644
>> index 000000000000..9bfd8d6d0f71
>> --- /dev/null
>> +++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
>> @@ -0,0 +1,133 @@
>> +What:		/sys/bus/platform/devices/smpro-errmon.*/errors_[core|mem|pcie|other]_[ce|ue]
> 
> Please split this out as one entry per file.
> 

These sysfs files share the same HW error format (the 48-byte Arm vendor-specific
HW error record) but cover separate HW domains: Core, PCIe, Mem, etc.

>> +KernelVersion:	5.14
> 
> 5.14 is a long time ago.
> 
>> +Contact:	quan@os.amperecomputing.com
>> +Description:
>> +		(RO) Contains the 48-byte Ampere (Vendor-Specific) Error Record, see [1]
>> +		printed in hex format as below:
>> +
>> +		AA BB CCCC DDDDDDDD DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD \
>> +		   DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD
>> +		Where:
>> +		  AA       : Error Type
>> +		  BB       : Subtype
>> +		  CCCC     : Instance
>> +		  DDD...DDD: Similar to the Arm RAS standard error record
> 
> No, this is not a valid sysfs file, sorry.  This should just be one
> value per file.
> 

This 48-byte value cannot be separated into smaller values because it
contains all the information needed to describe a single HW error, as per
the Arm RAS supplement document [1]. The formatting is there to make it
more readable than a single 48-byte hex value.

[1] https://developer.arm.com/documentation/ddi0587/latest/

> 
>> +
>> +		See [1] below for the format details.
>> +
>> +		The detail of each sysfs entries is as below:
>> +		+-------------+---------------------------------------------------------+
>> +		|   Error     |                   Sysfs entry                           |
>> +		+-------------+---------------------------------------------------------+
>> +		| Core's CE   | /sys/bus/platform/devices/smpro-errmon.*/errors_core_ce |
>> +		| Core's UE   | /sys/bus/platform/devices/smpro-errmon.*/errors_core_ue |
>> +		| Memory's CE | /sys/bus/platform/devices/smpro-errmon.*/errors_mem_ce  |
>> +		| Memory's UE | /sys/bus/platform/devices/smpro-errmon.*/errors_mem_ue  |
>> +		| PCIe's CE   | /sys/bus/platform/devices/smpro-errmon.*/errors_pcie_ce |
>> +		| PCIe's UE   | /sys/bus/platform/devices/smpro-errmon.*/errors_pcie_ue |
>> +		| Other's CE  | /sys/bus/platform/devices/smpro-errmon.*/errors_other_ce|
>> +		| Other's UE  | /sys/bus/platform/devices/smpro-errmon.*/errors_other_ue|
>> +		+-------------+---------------------------------------------------------+
>> +		UE: Uncorrect-able Error
>> +		CE: Correct-able Error
>> +
>> +		[1] Section 3.3 Ampere (Vendor-Specific) Error Record Formats,
>> +		    Altra Family RAS Supplement.
>> +
>> +
>> +What:           /sys/bus/platform/devices/smpro-errmon.*/errors_[smpro|pmpro]
>> +KernelVersion:	5.14
>> +Contact:	quan@os.amperecomputing.com
>> +Description:
>> +		(RO) Contains the internal firmware error record printed as hex format
>> +		as below:
>> +
>> +		A BB C DD EEEE FFFFFFFF
> 
> Again this isn't a good sysfs entry.  You should never have to parse a
> sysfs file except for a single value.
> 
> thanks,
> 
> greg k-h

This error record cannot be separated further either.

Thanks Greg for the review.
- Quan
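On the userspace side, the errors_smpro / errors_pmpro record documented in the patch (A BB C DD EEEE FFFFFFFF) can be decoded with a single sscanf() call. This minimal C sketch assumes the single-line, space-separated layout of the patch's own example ("1 09 0 08 000a 00000000"). struct smpro_fw_err, parse_fw_err() and fw_err_type_name() are hypothetical names.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical decoded form of the SMpro/PMpro internal firmware error record. */
struct smpro_fw_err {
	unsigned int type;      /* A: firmware error type */
	unsigned int image;     /* BB: firmware image code */
	unsigned int direction; /* C: 0 = Enter, 1 = Exit */
	unsigned int location;  /* DD: firmware module location code */
	unsigned int code;      /* EEEE: firmware error code */
	uint32_t data;          /* FFFFFFFF: extensive data */
};

/* Returns 0 if all six hex fields were read, -1 otherwise. */
static int parse_fw_err(const char *line, struct smpro_fw_err *e)
{
	int n = sscanf(line, "%1x %2x %1x %2x %4x %8" SCNx32,
		       &e->type, &e->image, &e->direction,
		       &e->location, &e->code, &e->data);

	return n == 6 ? 0 : -1;
}

/* Maps the A field to the names listed in the ABI entry. */
static const char *fw_err_type_name(unsigned int type)
{
	switch (type) {
	case 1: return "Warning";
	case 2: return "Error";
	case 4: return "Error with data";
	default: return "Unknown";
	}
}
```

For the example line from the patch, this yields type 1 (Warning), image code 0x09, direction 0 (Enter), location 0x08, error code 0x000a and data 0.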
Greg KH March 21, 2022, 10:03 a.m. UTC | #3
On Mon, Mar 21, 2022 at 04:46:36PM +0700, Quan Nguyen wrote:
> 
> 
> On 21/03/2022 15:23, Greg Kroah-Hartman wrote:
> > On Mon, Mar 21, 2022 at 03:13:55PM +0700, Quan Nguyen wrote:
> > > Add documentation for the Ampere(R)'s Altra(R) SMpro sysfs interfaces
> > > 
> > > Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
> > > ---
> > > Changes in v7:
> > >    + First introduce in v7     [Greg]
> > > 
> > >   .../sysfs-bus-platform-devices-ampere-smpro   | 133 ++++++++++++++++++
> > >   1 file changed, 133 insertions(+)
> > >   create mode 100644 Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
> > > 
> > > diff --git a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
> > > new file mode 100644
> > > index 000000000000..9bfd8d6d0f71
> > > --- /dev/null
> > > +++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
> > > @@ -0,0 +1,133 @@
> > > +What:		/sys/bus/platform/devices/smpro-errmon.*/errors_[core|mem|pcie|other]_[ce|ue]
> > 
> > Please split this out as one entry per file.
> > 
> 
> These sysfs share same format of HW errors (the 48-byte Arm vendor specific
> HW error record) but for separate HW domains: Core, PCIe, Mem... etc
> 
> > > +KernelVersion:	5.14
> > 
> > 5.14 is a long time ago.
> > 
> > > +Contact:	quan@os.amperecomputing.com
> > > +Description:
> > > +		(RO) Contains the 48-byte Ampere (Vendor-Specific) Error Record, see [1]
> > > +		printed in hex format as below:
> > > +
> > > +		AA BB CCCC DDDDDDDD DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD \
> > > +		   DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD
> > > +		Where:
> > > +		  AA       : Error Type
> > > +		  BB       : Subtype
> > > +		  CCCC     : Instance
> > > +		  DDD...DDD: Similar to the Arm RAS standard error record
> > 
> > No, this is not a valid sysfs file, sorry.  This should just be one
> > value per file.
> > 
> 
> This 48-byte value is unable to separate into smaller values because it
> contain all information necessary to indicate a single HW error as per ARM
> RAS supplement document [1]. The format is to make it read-able other than a
> single 48-byte hex value.
> 
> [1] https://developer.arm.com/documentation/ddi0587/latest/

Just export the 48 byte hex value and make userspace split it up if it
wants to do so.  Don't do things in the kernel that can be done in
userspace.

thanks,

greg k-h
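Greg's suggestion moves the splitting into userspace. The following is a minimal C sketch of that step. It assumes the next version exports the record as 96 hex digits and that byte i of the record corresponds to hex digits 2i and 2i+1 in text order. Whitespace is skipped, so the v7 spaced form also parses. No endianness conversion is applied: how multi-byte fields map onto these bytes is defined by the Altra Family RAS Supplement, not by this sketch. hex_to_record() and SMPRO_ERR_RECORD_LEN are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

#define SMPRO_ERR_RECORD_LEN 48 /* bytes per record */

/* Returns the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_nibble(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = tolower(c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Convert the hex text of one record into raw bytes, in text order.
 * Whitespace (including a trailing newline) is ignored.
 * Returns 0 only if exactly 96 hex digits and nothing else were found.
 */
static int hex_to_record(const char *s, uint8_t out[SMPRO_ERR_RECORD_LEN])
{
	size_t nibbles = 0;

	for (; *s; s++) {
		if (isspace((unsigned char)*s))
			continue;

		int v = hex_nibble((unsigned char)*s);

		if (v < 0 || nibbles >= 2 * SMPRO_ERR_RECORD_LEN)
			return -1;
		if (nibbles % 2 == 0)
			out[nibbles / 2] = (uint8_t)(v << 4); /* high nibble first */
		else
			out[nibbles / 2] |= (uint8_t)v;
		nibbles++;
	}
	return nibbles == 2 * SMPRO_ERR_RECORD_LEN ? 0 : -1;
}
```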
Quan Nguyen March 25, 2022, 7:14 a.m. UTC | #4
On 21/03/2022 17:03, Greg Kroah-Hartman wrote:
> On Mon, Mar 21, 2022 at 04:46:36PM +0700, Quan Nguyen wrote:
>>
>>
>> On 21/03/2022 15:23, Greg Kroah-Hartman wrote:
>>> On Mon, Mar 21, 2022 at 03:13:55PM +0700, Quan Nguyen wrote:
>>>> Add documentation for the Ampere(R)'s Altra(R) SMpro sysfs interfaces
>>>>
>>>> Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
>>>> ---
>>>> Changes in v7:
>>>>     + First introduce in v7     [Greg]
>>>>
>>>>    .../sysfs-bus-platform-devices-ampere-smpro   | 133 ++++++++++++++++++
>>>>    1 file changed, 133 insertions(+)
>>>>    create mode 100644 Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
>>>>
>>>> diff --git a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
>>>> new file mode 100644
>>>> index 000000000000..9bfd8d6d0f71
>>>> --- /dev/null
>>>> +++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
>>>> @@ -0,0 +1,133 @@
>>>> +What:		/sys/bus/platform/devices/smpro-errmon.*/errors_[core|mem|pcie|other]_[ce|ue]
>>>
>>> Please split this out as one entry per file.
>>>
>>
>> These sysfs share same format of HW errors (the 48-byte Arm vendor specific
>> HW error record) but for separate HW domains: Core, PCIe, Mem... etc
>>
>>>> +KernelVersion:	5.14
>>>
>>> 5.14 is a long time ago.
>>>
>>>> +Contact:	quan@os.amperecomputing.com
>>>> +Description:
>>>> +		(RO) Contains the 48-byte Ampere (Vendor-Specific) Error Record, see [1]
>>>> +		printed in hex format as below:
>>>> +
>>>> +		AA BB CCCC DDDDDDDD DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD \
>>>> +		   DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD
>>>> +		Where:
>>>> +		  AA       : Error Type
>>>> +		  BB       : Subtype
>>>> +		  CCCC     : Instance
>>>> +		  DDD...DDD: Similar to the Arm RAS standard error record
>>>
>>> No, this is not a valid sysfs file, sorry.  This should just be one
>>> value per file.
>>>
>>
>> This 48-byte value is unable to separate into smaller values because it
>> contain all information necessary to indicate a single HW error as per ARM
>> RAS supplement document [1]. The format is to make it read-able other than a
>> single 48-byte hex value.
>>
>> [1] https://developer.arm.com/documentation/ddi0587/latest/
> 
> Just export the 48 byte hex value and make userspace split it up if it
> wants to do so.  Don't do things in the kernel that can be done in
> userspace.
> 

Thanks, Greg, for the suggestion.
I will do this in my next version.

Thanks,
- Quan
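The boot_progress entry in the patch below uses the same spaced-hex pattern (AA BB CCCCCCCC). The following is a minimal C sketch of a userspace decoder for it, with the stage and status names copied from that entry. It assumes a single space-separated line. parse_boot_progress(), boot_stage_name() and boot_status_name() are hypothetical names. The CCCCCCCC word is returned raw, because its meaning depends on the stage.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Stage names, indexed by AA, as listed in the boot_progress ABI entry. */
static const char *const boot_stage_names[] = {
	"SMpro firmware booting", "PMpro firmware booting",
	"ATF BL1 firmware booting", "DDR initialization",
	"DDR training report status", "ATF BL2 firmware booting",
	"ATF BL31 firmware booting", "ATF BL32 firmware booting",
	"UEFI firmware booting", "OS booting",
};

/* Status names, indexed by BB. */
static const char *const boot_status_names[] = {
	"Not started", "Started", "Completed without error", "Failed",
};

struct smpro_boot_progress {
	unsigned int stage;  /* AA */
	unsigned int status; /* BB */
	uint32_t info;       /* CCCCCCCC, stage-specific */
};

static int parse_boot_progress(const char *line, struct smpro_boot_progress *bp)
{
	int n = sscanf(line, "%2x %2x %8" SCNx32,
		       &bp->stage, &bp->status, &bp->info);

	return n == 3 ? 0 : -1;
}

static const char *boot_stage_name(unsigned int stage)
{
	return stage < ARRAY_LEN(boot_stage_names) ?
	       boot_stage_names[stage] : "Unknown stage";
}

static const char *boot_status_name(unsigned int status)
{
	return status < ARRAY_LEN(boot_status_names) ?
	       boot_status_names[status] : "Unknown status";
}
```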

Patch

diff --git a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
new file mode 100644
index 000000000000..9bfd8d6d0f71
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
@@ -0,0 +1,133 @@ 
+What:		/sys/bus/platform/devices/smpro-errmon.*/errors_[core|mem|pcie|other]_[ce|ue]
+KernelVersion:	5.14
+Contact:	quan@os.amperecomputing.com
+Description:
+		(RO) Contains the 48-byte Ampere (Vendor-Specific) Error Record (see [1]),
+		printed in hex format as follows:
+
+		AA BB CCCC DDDDDDDD DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD \
+		   DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD
+		Where:
+		  AA       : Error Type
+		  BB       : Subtype
+		  CCCC     : Instance
+		  DDD...DDD: Similar to the Arm RAS standard error record
+
+		See [1] below for the format details.
+
+		The sysfs entries are as follows:
+		+-------------+---------------------------------------------------------+
+		|   Error     |                   Sysfs entry                           |
+		+-------------+---------------------------------------------------------+
+		| Core's CE   | /sys/bus/platform/devices/smpro-errmon.*/errors_core_ce |
+		| Core's UE   | /sys/bus/platform/devices/smpro-errmon.*/errors_core_ue |
+		| Memory's CE | /sys/bus/platform/devices/smpro-errmon.*/errors_mem_ce  |
+		| Memory's UE | /sys/bus/platform/devices/smpro-errmon.*/errors_mem_ue  |
+		| PCIe's CE   | /sys/bus/platform/devices/smpro-errmon.*/errors_pcie_ce |
+		| PCIe's UE   | /sys/bus/platform/devices/smpro-errmon.*/errors_pcie_ue |
+		| Other's CE  | /sys/bus/platform/devices/smpro-errmon.*/errors_other_ce|
+		| Other's UE  | /sys/bus/platform/devices/smpro-errmon.*/errors_other_ue|
+		+-------------+---------------------------------------------------------+
+		UE: Uncorrectable Error
+		CE: Correctable Error
+
+		[1] Section 3.3 Ampere (Vendor-Specific) Error Record Formats,
+		    Altra Family RAS Supplement.
+
+
+What:		/sys/bus/platform/devices/smpro-errmon.*/errors_[smpro|pmpro]
+KernelVersion:	5.14
+Contact:	quan@os.amperecomputing.com
+Description:
+		(RO) Contains the internal firmware error record, printed in hex format
+		as follows:
+
+		A BB C DD EEEE FFFFFFFF
+		Where:
+		  A       : Firmware Error Type
+		              1: Warning
+		              2: Error
+		              4: Error with data
+		  BB      : Firmware Image Code (8-bit value)
+		  C       : Direction:
+		              0: Enter
+		              1: Exit
+		  DD      : Location, firmware module location code (8-bit value)
+		  EEEE    : Error Code, firmware Error Code (16-bit value)
+		  FFFFFFFF: Extensive data (32-bit value)
+
+		Example:
+		  root@mtjade:~# cat /sys/bus/platform/devices/smpro-errmon.1.auto/errors_smpro
+		  1 09 0 08 000a 00000000
+
+		The sysfs entries are as follows:
+		+-------------+-------------------------------------------------------+
+		|   Error     |                   Sysfs entry                         |
+		+-------------+-------------------------------------------------------+
+		| SMpro error | /sys/bus/platform/devices/smpro-errmon.*/errors_smpro |
+		| PMpro error | /sys/bus/platform/devices/smpro-errmon.*/errors_pmpro |
+		+-------------+-------------------------------------------------------+
+		For more details, see section 5.10 RAS Internal Error Register
+		Definitions, Altra Family SoC BMC Interface Specification.
+
+
+What:		/sys/bus/platform/devices/smpro-errmon.*/event_[vrd_warn_fault|vrd_hot|dimm_hot]
+KernelVersion:	5.14
+Contact:	quan@os.amperecomputing.com
+Description:
+		(RO) Contains detailed information about VRD/DIMM warning/hot events,
+		printed in hex format as follows:
+
+		AA BBBB
+		Where:
+		  AA  : The event channel
+		          00: VRD Warning Fault
+		          01: VRD Hot
+		          02: DIMM Hot
+		  BBBB: The event detail data
+
+		For more details, see section 5.7 GPI Status Registers,
+		Altra Family SoC BMC Interface Specification.
+
+
+What:		/sys/bus/platform/devices/smpro-misc.*/boot_progress
+KernelVersion:	5.14
+Contact:	quan@os.amperecomputing.com
+Description:
+		(RO) Contains the boot stage information, printed in hex format as follows:
+
+		AA BB CCCCCCCC
+		Where:
+		  AA      : The boot stage
+		              00: SMpro firmware booting
+		              01: PMpro firmware booting
+		              02: ATF BL1 firmware booting
+		              03: DDR initialization
+		              04: DDR training report status
+		              05: ATF BL2 firmware booting
+		              06: ATF BL31 firmware booting
+		              07: ATF BL32 firmware booting
+		              08: UEFI firmware booting
+		              09: OS booting
+		  BB      : Boot status
+		              00: Not started
+		              01: Started
+		              02: Completed without error
+		              03: Failed
+		  CCCCCCCC: Boot status information defined for each boot stage
+
+		For more details, see section 5.11 Boot Stage Register Definitions
+		and section 6 Processor Boot Progress Codes, Altra Family SoC BMC
+		Interface Specification.
+
+
+What:		/sys/bus/platform/devices/smpro-misc.*/soc_power_limit
+KernelVersion:	5.14
+Contact:	quan@os.amperecomputing.com
+Description:
+		(RW) Contains the desired SoC power limit in watts.
+		Writing to this file sets the desired SoC power limit (W);
+		reading from it returns the current SoC power limit (W).
+		The valid range is:
+		  Minimum: 120 W
+		  Maximum: Socket TDP power
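The following is a minimal C sketch of setting soc_power_limit from userspace while respecting the documented range. It assumes the file accepts and returns a plain decimal integer in watts and treats both bounds as inclusive. The caller must supply the socket TDP, because the entry does not say where that value comes from. set_soc_power_limit(), get_soc_power_limit() and SMPRO_SOC_PWR_LIMIT_MIN_W are hypothetical names.

```c
#include <stdio.h>

#define SMPRO_SOC_PWR_LIMIT_MIN_W 120u /* documented minimum, in watts */

/*
 * Write a SoC power limit (watts) to a soc_power_limit file.
 * tdp_w is the socket TDP, obtained by the caller (assumption).
 * Returns 0 on success, -1 on an out-of-range value or I/O error.
 */
static int set_soc_power_limit(const char *path, unsigned int watts,
			       unsigned int tdp_w)
{
	if (watts < SMPRO_SOC_PWR_LIMIT_MIN_W || watts > tdp_w)
		return -1;

	FILE *f = fopen(path, "w");
	if (!f)
		return -1;

	int ok = fprintf(f, "%u\n", watts) > 0;

	/* stdio buffers the write, so an error from the kernel usually surfaces here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read back the current SoC power limit (watts). Returns 0 on success. */
static int get_soc_power_limit(const char *path, unsigned int *watts)
{
	FILE *f = fopen(path, "r");
	if (!f)
		return -1;

	int n = fscanf(f, "%u", watts);

	fclose(f);
	return n == 1 ? 0 : -1;
}
```

Checking the range in userspace gives an early, explicit error. The kernel driver remains the authority on which values it accepts.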