mbox series

[v6,0/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller

Message ID 20210201033601.1642-1-thunder.leizhen@huawei.com
Headers show
Series ARM: Add support for Hisilicon Kunpeng L3 cache controller | expand

Message

Leizhen (ThunderTown) Feb. 1, 2021, 3:35 a.m. UTC
v5 --> v6:
1. Use raw_spin_lock_irqsave() instead of spin_lock_irqsave()
2. Move the macros defined in cache-kunpeng-l3.h into cache-kunpeng-l3.c, and delete that header file.
3. In some places, replace readl()/writel() with readl_relaxed()/writel_relaxed() to improve performance without affecting functions.
4. Returns 0 instead of an error code when Kunpeng L3 Cache matching failed.

Thank you for Arnd's review comments and Russell's help.

v4 --> v5:
1. Add SoC macro ARCH_KUNPENG50X, and the Kunpeng L3 cache controller only enabled
   on that platform.
2. Require the compatible string of the Kunpeng L3 cache controller must have a
   relevant name on a specific SoC. For example:
   compatible = "hisilicon,kunpeng509-l3cache", "hisilicon,kunpeng-l3cache";

v3 --> v4:
Rename the compatible string from "hisilicon,l3cache" to "hisilicon,kunpeng-l3cache".
Then adjust the file name, configuration option name, and description accordingly.

v2 --> v3:
Add Hisilicon L3 cache controller driver and its document. That's: patch 2-3.

v1 --> v2:
Discard the middle-tier functions and do silent narrowing cast in the outcache
hook functions. For example:
-static void l2c220_inv_range(unsigned long start, unsigned long end)
+static void l2c220_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
 {
+	unsigned long start = pa_start;
+	unsigned long end = pa_end;


v1:
Do cast phys_addr_t to unsigned long by adding a middle-tier function.
For example:
-static void l2c220_inv_range(unsigned long start, unsigned long end)
+static void __l2c220_inv_range(unsigned long start, unsigned long end)
 {
 	...
 }
+static void l2c220_inv_range(phys_addr_t start, phys_addr_t end)
+{
+  __l2c220_inv_range(start, end);
+}

Zhen Lei (4):
  ARM: LPAE: Use phys_addr_t instead of unsigned long in outercache
    hooks
  ARM: hisi: add support for Kunpeng50x SoC
  dt-bindings: arm: hisilicon: Add binding for Kunpeng L3 cache
    controller
  ARM: Add support for Hisilicon Kunpeng L3 cache controller

 .../arm/hisilicon/kunpeng-l3cache.yaml        |  40 ++++
 arch/arm/include/asm/outercache.h             |   6 +-
 arch/arm/mach-hisi/Kconfig                    |   6 +
 arch/arm/mm/Kconfig                           |  10 +
 arch/arm/mm/Makefile                          |   1 +
 arch/arm/mm/cache-feroceon-l2.c               |  15 +-
 arch/arm/mm/cache-kunpeng-l3.c                | 176 ++++++++++++++++++
 arch/arm/mm/cache-l2x0.c                      |  50 +++--
 arch/arm/mm/cache-tauros2.c                   |  15 +-
 arch/arm/mm/cache-uniphier.c                  |   6 +-
 arch/arm/mm/cache-xsc3l2.c                    |  12 +-
 11 files changed, 308 insertions(+), 29 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/hisilicon/kunpeng-l3cache.yaml
 create mode 100644 arch/arm/mm/cache-kunpeng-l3.c

Comments

Arnd Bergmann Feb. 1, 2021, 8:31 a.m. UTC | #1
On Mon, Feb 1, 2021 at 4:36 AM Zhen Lei <thunder.leizhen@huawei.com> wrote:
>
> Add support for the Hisilicon Kunpeng L3 cache controller as used with
> Kunpeng506 and Kunpeng509 SoCs.
>
> These Hisilicon SoCs support LPAE, so the physical addresses is wider than
> 32-bits, but the actual bit width does not exceed 36 bits. When the cache
> operation is performed based on the address range, the upper 30 bits of
> the physical address are recorded in registers L3_MAINT_START and
> L3_MAINT_END, and ignore the lower 6 bits cacheline offset.
>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>

Reviewed-by: Arnd Bergmann <arnd@arndb.de>

If you add one more thing:

> +static void l3cache_maint_common(u32 range, u32 op_type)
> +{
> +       u32 reg;
> +
> +       reg = readl_relaxed(l3_ctrl_base + L3_MAINT_CTRL);
> +       reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
> +       reg |= range | op_type;
> +       reg |= L3_MAINT_STATUS_START;
> +       writel(reg, l3_ctrl_base + L3_MAINT_CTRL);
> +
> +       /* Wait until the hardware maintenance operation is complete. */
> +       do {
> +               cpu_relax();
> +               reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
> +       } while ((reg & L3_MAINT_STATUS_MASK) != L3_MAINT_STATUS_END);
> +}
> +
> +static void l3cache_maint_range(phys_addr_t start, phys_addr_t end, u32 op_type)
> +{
> +       start = start >> L3_CACHE_LINE_SHITF;
> +       end = ((end - 1) >> L3_CACHE_LINE_SHITF) + 1;
> +
> +       writel_relaxed(start, l3_ctrl_base + L3_MAINT_START);
> +       writel_relaxed(end, l3_ctrl_base + L3_MAINT_END);
> +
> +       l3cache_maint_common(L3_MAINT_RANGE_ADDR, op_type);
> +}

As mentioned, I'd like to see a code comment that explains the use
the of relaxed() vs non-relaxed MMIO accessors, as it will be impossible
for a reader to later understand why you picked a mix of the two,
and it also ensures that you have considered which one is the best
option to use here and that your explanation matches what you do.

Based on Russell's comments, I had expected that you would use
only relaxed accessors, plus explicit barriers if you change it, matching
what l2x0 does (l2x0 has to do it because of __l2c210_cache_sync(),
while you don't have a sync callback and don't need to).

      Arnd
Arnd Bergmann Feb. 1, 2021, 8:35 a.m. UTC | #2
On Mon, Feb 1, 2021 at 4:35 AM Zhen Lei <thunder.leizhen@huawei.com> wrote:
>
> Enable support for the Hisilicon Kunpeng506 and Kunpeng509 SoC.
>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>

Reviewed-by: Arnd Bergmann <arnd@arndb.de>

Russell, do you have a preference for how to get this series merged
after the last comments are resolved?

I think there is no technical problem in having patch two merged through
the soc tree, while merging the other three through your tree, but it
seems more logical to keep all four together in either location.

       Arnd
Leizhen (ThunderTown) Feb. 1, 2021, 11:38 a.m. UTC | #3
On 2021/2/1 16:31, Arnd Bergmann wrote:
> On Mon, Feb 1, 2021 at 4:36 AM Zhen Lei <thunder.leizhen@huawei.com> wrote:
>>
>> Add support for the Hisilicon Kunpeng L3 cache controller as used with
>> Kunpeng506 and Kunpeng509 SoCs.
>>
>> These Hisilicon SoCs support LPAE, so the physical addresses is wider than
>> 32-bits, but the actual bit width does not exceed 36 bits. When the cache
>> operation is performed based on the address range, the upper 30 bits of
>> the physical address are recorded in registers L3_MAINT_START and
>> L3_MAINT_END, and ignore the lower 6 bits cacheline offset.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> 
> Reviewed-by: Arnd Bergmann <arnd@arndb.de>
> 
> If you add one more thing:
> 
>> +static void l3cache_maint_common(u32 range, u32 op_type)
>> +{
>> +       u32 reg;
>> +
>> +       reg = readl_relaxed(l3_ctrl_base + L3_MAINT_CTRL);
>> +       reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
>> +       reg |= range | op_type;
>> +       reg |= L3_MAINT_STATUS_START;
>> +       writel(reg, l3_ctrl_base + L3_MAINT_CTRL);
>> +
>> +       /* Wait until the hardware maintenance operation is complete. */
>> +       do {
>> +               cpu_relax();
>> +               reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
>> +       } while ((reg & L3_MAINT_STATUS_MASK) != L3_MAINT_STATUS_END);
>> +}
>> +
>> +static void l3cache_maint_range(phys_addr_t start, phys_addr_t end, u32 op_type)
>> +{
>> +       start = start >> L3_CACHE_LINE_SHITF;
>> +       end = ((end - 1) >> L3_CACHE_LINE_SHITF) + 1;
>> +
>> +       writel_relaxed(start, l3_ctrl_base + L3_MAINT_START);
>> +       writel_relaxed(end, l3_ctrl_base + L3_MAINT_END);
>> +
>> +       l3cache_maint_common(L3_MAINT_RANGE_ADDR, op_type);
>> +}
> 
> As mentioned, I'd like to see a code comment that explains the use
> the of relaxed() vs non-relaxed MMIO accessors, as it will be impossible
> for a reader to later understand why you picked a mix of the two,
> and it also ensures that you have considered which one is the best
> option to use here and that your explanation matches what you do.

OK, I'll test the performance and add the comment.

> 
> Based on Russell's comments, I had expected that you would use
> only relaxed accessors, plus explicit barriers if you change it, matching
> what l2x0 does (l2x0 has to do it because of __l2c210_cache_sync(),
> while you don't have a sync callback and don't need to).

I might have been a little conservative, I'll change all of them to _relaxed and then test it. Thanks.

> 
>       Arnd
> 
> .
>
Leizhen (ThunderTown) Feb. 1, 2021, 11:49 a.m. UTC | #4
On 2021/2/1 16:35, Arnd Bergmann wrote:
> On Mon, Feb 1, 2021 at 4:35 AM Zhen Lei <thunder.leizhen@huawei.com> wrote:
>>
>> Enable support for the Hisilicon Kunpeng506 and Kunpeng509 SoC.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> 
> Reviewed-by: Arnd Bergmann <arnd@arndb.de>
> 
> Russell, do you have a preference for how to get this series merged
> after the last comments are resolved?
> 
> I think there is no technical problem in having patch two merged through
> the soc tree, while merging the other three through your tree, but it
> seems more logical to keep all four together in either location.

Wait, wait. I've coordinated resources urgently. I can run test cases for new changes tonight.

> 
>        Arnd
> 
> .
>
Arnd Bergmann Feb. 1, 2021, 12:14 p.m. UTC | #5
On Mon, Feb 1, 2021 at 12:49 PM Leizhen (ThunderTown)
<thunder.leizhen@huawei.com> wrote:
> On 2021/2/1 16:35, Arnd Bergmann wrote:
> > On Mon, Feb 1, 2021 at 4:35 AM Zhen Lei <thunder.leizhen@huawei.com> wrote:
> >>
> >> Enable support for the Hisilicon Kunpeng506 and Kunpeng509 SoC.
> >>
> >> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> >
> > Reviewed-by: Arnd Bergmann <arnd@arndb.de>
> >
> > Russell, do you have a preference for how to get this series merged
> > after the last comments are resolved?
> >
> > I think there is no technical problem in having patch two merged through
> > the soc tree, while merging the other three through your tree, but it
> > seems more logical to keep all four together in either location.
>
> Wait, wait. I've coordinated resources urgently. I can run test cases for new changes tonight.

Just to clarify, my question is independent of how quickly it gets merged,
please continue addressing the review comments and sending new versions
in the meantime.

If we decide to take it through the two trees separately, that would just mean
that Wei Xu sends the patch 2/4 to soc@kernel.org for inclusion there
(I'm happy to take that one right away), while the other three will go through
https://www.armlinux.org.uk/developer/patches/.

       Arnd