
[SRU,N/O/unstable,0/2] Enable ASPM for nvme controller when working in RAID on mode

Message ID 20240815032259.27719-1-hui.wang@canonical.com

Message

Hui Wang Aug. 15, 2024, 3:22 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/2072679

[Impact]
The NVMe controller operates in "RAID On" mode by default on some Dell
machines. In this mode, PCIe ASPM cannot be enabled, and as a result
the system cannot enter deep idle states. This issue affects not only
Ubuntu users but also our Dell OEM projects.


[Fix]
Pick 2 commits from the linux-pci mailing list.

[Test]
After booting the patched kernel, run 'sudo lspci -nnvv' and check
the "Non-Volatile memory controller" entry:
               LnkCtl: ASPM L1 Enabled;
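A minimal bash sketch for scripting this lspci check across many machines. It assumes the lspci -vv layout in which each device block begins with an unindented header line naming the device class, followed by indented capability lines; the function name check_nvme_aspm is hypothetical and not part of any existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pciutils): print the LnkCtl line of every
# "Non-Volatile memory controller" found in `lspci -vv` output on stdin.
# Assumes device header lines are unindented and capability lines are
# indented with spaces or tabs.
check_nvme_aspm() {
    awk '
        # An unindented line starts a new device block; remember whether it is NVMe.
        /^[^ \t]/ { in_nvme = ($0 ~ /Non-Volatile memory controller/) }
        # Inside an NVMe block, print the LnkCtl line (LnkCtl2/LnkCtl3 do not match).
        in_nvme && /LnkCtl:/ {
            sub(/^[ \t]+/, "")
            print
        }
    '
}
```

Usage: `sudo lspci -nnvv | check_nvme_aspm`. Root is needed because lspci only shows the capability registers to privileged users; on a patched kernel each NVMe controller should print a line containing "ASPM L1 Enabled".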

Then check the package C-state counters to confirm that the system enters deep idle states:
$ sudo cat /sys/kernel/debug/pmc_core/package_cstate_show
Package C2 : 55740989
Package C3 : 4656373
Package C6 : 43325041
Package C7 : 6687655
Package C8 : 44948950
Package C9 : 1693
Package C10 : 92865596
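A minimal bash sketch for checking the package C-state output above. The helper name deepest_pkg_cstate is hypothetical; it only assumes the "Package Cn : count" line format shown here, prints the deepest package C-state whose counter is non-zero, and exits with status 1 if every counter is zero.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "Package Cn : count" lines on stdin and print the
# deepest package C-state (highest n) whose counter is greater than zero.
deepest_pkg_cstate() {
    awk -F' *: *' '
        /^Package C[0-9]+/ {
            state = $1
            sub(/^Package /, "", state)      # "Package C10" -> "C10"
            n = substr(state, 2) + 0         # "C10" -> 10
            if ($2 + 0 > 0 && n > max) { max = n; deepest = state }
        }
        # Non-zero exit status when no state has a non-zero counter.
        END { if (deepest != "") print deepest; else exit 1 }
    '
}
```

Usage: `sudo cat /sys/kernel/debug/pmc_core/package_cstate_show | deepest_pkg_cstate`. For the output above it prints C10.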

[Where problems could occur]
Because the patchset has not yet been accepted upstream, merging it
into the Ubuntu kernel carries some risk. The patches only affect the
vmd driver, so any regression would be limited to NVMe controllers
operating in RAID On mode. The likelihood of a regression is very low:
we have already tested the patches on many Dell and Lenovo machines
and have seen no regressions so far.


Kai-Heng Feng (2):
  UBUNTU: SAUCE: PCI: ASPM: Allow OS to configure ASPM where BIOS is
    incapable of
  UBUNTU: SAUCE: PCI: vmd: Let OS control ASPM for devices under VMD
    domain

 drivers/pci/controller/vmd.c | 2 ++
 drivers/pci/pcie/aspm.c      | 8 ++++++--
 include/linux/pci.h          | 1 +
 3 files changed, 9 insertions(+), 2 deletions(-)

Comments

Stefan Bader Aug. 15, 2024, 12:38 p.m. UTC | #1
On 15.08.24 05:22, Hui Wang wrote:
> [...]

Rejected for the following reasons:
- Patches which are not at least in linux-next are not material for SRU

-Stefan
Hui Wang Aug. 16, 2024, 10:16 a.m. UTC | #2
On 8/15/24 20:38, Stefan Bader wrote:
> On 15.08.24 05:22, Hui Wang wrote:
>> [...]
>
> Rejected for the following reasons:
> - Patches which are not at least in linux-next are not material for SRU
>
> -Stefan

Hi Stefan,

Understood, that is the general rule for SRU.

In this case, it impacts our Dell OEM project. This issue is a
regression in the hwe-6.8 kernel: we don't have this issue with the
oem-6.5 kernel, but oem-6.5 is EOL. Hence we need to merge this
patchset into the -generic kernel ASAP.

Kai-Heng will continue working on this patchset to get it accepted by
the PCI maintainers. Could we get an exception in this case?

Thanks,

Hui.
Stefan Bader Aug. 16, 2024, 11:36 a.m. UTC | #3
On 16.08.24 12:16, Hui Wang wrote:
> 
> On 8/15/24 20:38, Stefan Bader wrote:
>> On 15.08.24 05:22, Hui Wang wrote:
>>> [...]
>>
>> Rejected for the following reasons:
>> - Patches which are not at least in linux-next are not material for SRU
>>
>> -Stefan
> 
> Hi Stefan,
> 
> Understand, that is the general rule for SRU.
> 
> In this case, it impacts our Dell oem project. This issue is a 
> regression of hwe-6.8 kernel,  we don't have this issue with the oem-6.5 
> kernel, but oem-6.5 is EOL. Hence we need to merge this patchset to 
> -generic kernel ASAP.
> 
> Kai-Heng will continue working on this patchset and make sure it will be 
> accepted by PCI maintainers. So could we get an exception in this case?
> 
> Thanks,
> 
> Hui.
> 

Please re-submit with the reasoning for why this is needed now and
the plan going forward. If this does not get accepted upstream as is,
we need to make sure things get updated to the actual solution.
Timo Aaltonen Aug. 16, 2024, 7:58 p.m. UTC | #4
Stefan Bader wrote on 16.8.2024 at 14.36:
> On 16.08.24 12:16, Hui Wang wrote:
>>
>> On 8/15/24 20:38, Stefan Bader wrote:
>>> On 15.08.24 05:22, Hui Wang wrote:
>>>> [...]
>>>
>>> Rejected for the following reasons:
>>> - Patches which are not at least in linux-next are not material for SRU
>>>
>>> -Stefan
>>
>> Hi Stefan,
>>
>> Understand, that is the general rule for SRU.
>>
>> In this case, it impacts our Dell oem project. This issue is a 
>> regression of hwe-6.8 kernel,  we don't have this issue with the 
>> oem-6.5 kernel, but oem-6.5 is EOL. Hence we need to merge this 
>> patchset to -generic kernel ASAP.
>>
>> Kai-Heng will continue working on this patchset and make sure it will 
>> be accepted by PCI maintainers. So could we get an exception in this 
>> case?
>>
>> Thanks,
>>
>> Hui.
>>
> 
> If you re-submit and put the reasoning about why this is needed now and 
> the plan going forward. If this does not get accepted as is we need to 
> make sure things get updated to the actual solution.

FTR, this bug was originally

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2034504

and it got applied to the mantic kernel but not to unstable (even though
the bug suggests that it was), which is the reason for the regression.

It would've been nice if the new bug had a link to the old one, to save a
lot of time chasing where things went wrong.
Hui Wang Aug. 17, 2024, 2:09 a.m. UTC | #5
On 8/16/24 19:36, Stefan Bader wrote:
> On 16.08.24 12:16, Hui Wang wrote:
>>
>> On 8/15/24 20:38, Stefan Bader wrote:
>>> On 15.08.24 05:22, Hui Wang wrote:
>>>> [...]
>>>
>>> Rejected for the following reasons:
>>> - Patches which are not at least in linux-next are not material for SRU
>>>
>>> -Stefan
>>
>> Hi Stefan,
>>
>> Understand, that is the general rule for SRU.
>>
>> In this case, it impacts our Dell oem project. This issue is a 
>> regression of hwe-6.8 kernel,  we don't have this issue with the 
>> oem-6.5 kernel, but oem-6.5 is EOL. Hence we need to merge this 
>> patchset to -generic kernel ASAP.
>>
>> Kai-Heng will continue working on this patchset and make sure it will 
>> be accepted by PCI maintainers. So could we get an exception in this 
>> case?
>>
>> Thanks,
>>
>> Hui.
>>
>
> If you re-submit and put the reasoning about why this is needed now 
> and the plan going forward. If this does not get accepted as is we 
> need to make sure things get updated to the actual solution.
OK, got it. Thanks.
Hui Wang Aug. 17, 2024, 2:10 a.m. UTC | #6
On 8/17/24 03:58, Timo Aaltonen wrote:
> Stefan Bader wrote on 16.8.2024 at 14.36:
>> On 16.08.24 12:16, Hui Wang wrote:
>>>
>>> On 8/15/24 20:38, Stefan Bader wrote:
>>>> On 15.08.24 05:22, Hui Wang wrote:
>>>>> [...]
>>>>
>>>> Rejected for the following reasons:
>>>> - Patches which are not at least in linux-next are not material for 
>>>> SRU
>>>>
>>>> -Stefan
>>>
>>> Hi Stefan,
>>>
>>> Understand, that is the general rule for SRU.
>>>
>>> In this case, it impacts our Dell oem project. This issue is a 
>>> regression of hwe-6.8 kernel,  we don't have this issue with the 
>>> oem-6.5 kernel, but oem-6.5 is EOL. Hence we need to merge this 
>>> patchset to -generic kernel ASAP.
>>>
>>> Kai-Heng will continue working on this patchset and make sure it 
>>> will be accepted by PCI maintainers. So could we get an exception in 
>>> this case?
>>>
>>> Thanks,
>>>
>>> Hui.
>>>
>>
>> If you re-submit and put the reasoning about why this is needed now 
>> and the plan going forward. If this does not get accepted as is we 
>> need to make sure things get updated to the actual solution.
>
> FTR, this bug was originally
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2034504
>
> and it got applied to the mantic kernel but not unstable (even though 
> the bug suggests that it was), which is the reason for the regression..
>
> would've been nice if the new bug had a link to the old one, to save a 
> lot of time chasing where things went wrong

Thanks for sharing this. I will add the link as you pointed out.

Thanks,

Hui.
