From patchwork Mon Apr 4 18:31:48 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hans de Goede X-Patchwork-Id: 605967 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from theia.denx.de (theia.denx.de [85.214.87.163]) by ozlabs.org (Postfix) with ESMTP id 3qf0vW3Jrdz9s2G for ; Tue, 5 Apr 2016 04:32:07 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by theia.denx.de (Postfix) with ESMTP id 0F19AA7629; Mon, 4 Apr 2016 20:32:06 +0200 (CEST) Received: from theia.denx.de ([127.0.0.1]) by localhost (theia.denx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id EAuRpPoNcSGB; Mon, 4 Apr 2016 20:32:05 +0200 (CEST) Received: from theia.denx.de (localhost [127.0.0.1]) by theia.denx.de (Postfix) with ESMTP id 88DF0A7519; Mon, 4 Apr 2016 20:32:05 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by theia.denx.de (Postfix) with ESMTP id A6E61A7519 for ; Mon, 4 Apr 2016 20:32:01 +0200 (CEST) Received: from theia.denx.de ([127.0.0.1]) by localhost (theia.denx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id yNJeuaaEr2dW for ; Mon, 4 Apr 2016 20:32:01 +0200 (CEST) X-policyd-weight: NOT_IN_SBL_XBL_SPAMHAUS=-1.5 NOT_IN_SPAMCOP=-1.5 NOT_IN_BL_NJABL=-1.5 (only DNSBL check requested) Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by theia.denx.de (Postfix) with ESMTPS id 2B2CAA748A for ; Mon, 4 Apr 2016 20:31:57 +0200 (CEST) Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D884C64445; Mon, 4 Apr 2016 18:31:55 +0000 (UTC) Received: from shalem.localdomain.com (vpn1-6-205.ams2.redhat.com [10.36.6.205]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u34IVqx1010043; Mon, 4 Apr 2016 14:31:53 -0400 From: Hans de Goede To: Albert ARIBAUD Date: Mon, 4 Apr 2016 20:31:48 +0200 Message-Id: <1459794709-21601-1-git-send-email-hdegoede@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Mon, 04 Apr 2016 18:31:56 +0000 (UTC) Cc: Ian Campbell , Tom Rini , u-boot@lists.denx.de Subject: [U-Boot] [PATCH 1/2] arm: Replace v7_maint_dcache_all(ARMV7_DCACHE_CLEAN_INVAL_ALL) with asm code X-BeenThere: u-boot@lists.denx.de X-Mailman-Version: 2.1.15 Precedence: list List-Id: U-Boot discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: u-boot-bounces@lists.denx.de Sender: "U-Boot" v7_maint_dcache_all() does not work reliable when build with gcc6, see: https://bugzilla.redhat.com/show_bug.cgi?id=1318788 While debugging this I learned that v7_maint_dcache_all() is unreliable when build with gcc5 too when it is marked as noinline. This commit fixes the reliability issues by replacing the C-code with the ready to use asm implementation from the kernel. Given that this code when written as C-code clearly is quite fragile (also see the existing comments about the C-code being the way it is to get optimal assembly) and that we have a proven asm alternative, I believe that this is the best solution. Note that we actually already have a copy of the kernel's v7_flush_dcache_all() in arch/arm/mach-uniphier/arm32/lowlevel_init.S. We should replace that implementation with a call to this one, but I'm leaving this up to people with access to actual unifier hw. With this replacement in mind I've kept the original function as is, only renamed it to __v7_flush_dcache_all and v7_flush_dcache_all is a wrapper saving the registered clobbered by the core __v7_flush_dcache_all code Signed-off-by: Hans de Goede --- arch/arm/cpu/armv7/Makefile | 2 +- arch/arm/cpu/armv7/cache_v7.c | 41 +++---------------- arch/arm/cpu/armv7/cache_v7_asm.S | 85 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 92 insertions(+), 36 deletions(-) create mode 100644 arch/arm/cpu/armv7/cache_v7_asm.S diff --git a/arch/arm/cpu/armv7/Makefile b/arch/arm/cpu/armv7/Makefile index 45f346c..328c4b1 100644 --- a/arch/arm/cpu/armv7/Makefile +++ b/arch/arm/cpu/armv7/Makefile @@ -7,7 +7,7 @@ extra-y := start.o -obj-y += cache_v7.o +obj-y += cache_v7.o cache_v7_asm.o obj-y += cpu.o cp15.o obj-y += syslib.o diff --git a/arch/arm/cpu/armv7/cache_v7.c b/arch/arm/cpu/armv7/cache_v7.c index 94ff488..dd07ba1 100644 --- a/arch/arm/cpu/armv7/cache_v7.c +++ b/arch/arm/cpu/armv7/cache_v7.c @@ -16,6 +16,10 @@ #define ARMV7_DCACHE_CLEAN_INVAL_RANGE 4 #ifndef CONFIG_SYS_DCACHE_OFF + +/* Asm functions from cache_v7_asm.S */ +void v7_flush_dcache_all(void); + static int check_cache_range(unsigned long start, unsigned long stop) { int ok = 1; @@ -88,34 +92,6 @@ static void v7_inval_dcache_level_setway(u32 level, u32 num_sets, DSB; } -static void v7_clean_inval_dcache_level_setway(u32 level, u32 num_sets, - u32 num_ways, u32 way_shift, - u32 log2_line_len) -{ - int way, set; - u32 setway; - - /* - * For optimal assembly code: - * a. count down - * b. have bigger loop inside - */ - for (way = num_ways - 1; way >= 0 ; way--) { - for (set = num_sets - 1; set >= 0; set--) { - setway = (level << 1) | (set << log2_line_len) | - (way << way_shift); - /* - * Clean & Invalidate data/unified - * cache line by set/way - */ - asm volatile (" mcr p15, 0, %0, c7, c14, 2" - : : "r" (setway)); - } - } - /* DSB to make sure the operation is complete */ - DSB; -} - static void v7_maint_dcache_level_setway(u32 level, u32 operation) { u32 ccsidr; @@ -142,13 +118,8 @@ static void v7_maint_dcache_level_setway(u32 level, u32 operation) log2_num_ways = log_2_n_round_up(num_ways); way_shift = (32 - log2_num_ways); - if (operation == ARMV7_DCACHE_INVAL_ALL) { - v7_inval_dcache_level_setway(level, num_sets, num_ways, + v7_inval_dcache_level_setway(level, num_sets, num_ways, way_shift, log2_line_len); - } else if (operation == ARMV7_DCACHE_CLEAN_INVAL_ALL) { - v7_clean_inval_dcache_level_setway(level, num_sets, num_ways, - way_shift, log2_line_len); - } } static void v7_maint_dcache_all(u32 operation) @@ -263,7 +234,7 @@ void invalidate_dcache_all(void) */ void flush_dcache_all(void) { - v7_maint_dcache_all(ARMV7_DCACHE_CLEAN_INVAL_ALL); + v7_flush_dcache_all(); v7_outer_cache_flush_all(); } diff --git a/arch/arm/cpu/armv7/cache_v7_asm.S b/arch/arm/cpu/armv7/cache_v7_asm.S new file mode 100644 index 0000000..1b1daf8 --- /dev/null +++ b/arch/arm/cpu/armv7/cache_v7_asm.S @@ -0,0 +1,85 @@ +/* + * Copyright (C) 2012-2015 Masahiro Yamada + * + * SPDX-License-Identifier: GPL-2.0+ + */ + +#include +#include +#include +#include + +#ifdef CONFIG_SYS_THUMB_BUILD +#define ARM(x...) +#define THUMB(x...) x +#else +#define ARM(x...) x +#define THUMB(x...) +#endif + +/* + * v7_flush_dcache_all() + * + * Flush the whole D-cache. + * + * Corrupted registers: r0-r7, r9-r11 (r6 only in Thumb mode) + * + * Note: copied from arch/arm/mm/cache-v7.S of Linux 4.4 + */ +ENTRY(__v7_flush_dcache_all) + dmb @ ensure ordering with previous memory accesses + mrc p15, 1, r0, c0, c0, 1 @ read clidr + mov r3, r0, lsr #23 @ move LoC into position + ands r3, r3, #7 << 1 @ extract LoC*2 from clidr + beq finished @ if loc is 0, then no need to clean + mov r10, #0 @ start clean at cache level 0 +flush_levels: + add r2, r10, r10, lsr #1 @ work out 3x current cache level + mov r1, r0, lsr r2 @ extract cache type bits from clidr + and r1, r1, #7 @ mask of the bits for current cache only + cmp r1, #2 @ see what cache we have at this level + blt skip @ skip if no cache, or just i-cache + mcr p15, 2, r10, c0, c0, 0 @ select current cache level in cssr + isb @ isb to sych the new cssr&csidr + mrc p15, 1, r1, c0, c0, 0 @ read the new csidr + and r2, r1, #7 @ extract the length of the cache lines + add r2, r2, #4 @ add 4 (line length offset) + movw r4, #0x3ff + ands r4, r4, r1, lsr #3 @ find maximum number on the way size + clz r5, r4 @ find bit position of way size increment + movw r7, #0x7fff + ands r7, r7, r1, lsr #13 @ extract max number of the index size +loop1: + mov r9, r7 @ create working copy of max index +loop2: + ARM( orr r11, r10, r4, lsl r5 ) @ factor way and cache number into r11 + THUMB( lsl r6, r4, r5 ) + THUMB( orr r11, r10, r6 ) @ factor way and cache number into r11 + ARM( orr r11, r11, r9, lsl r2 ) @ factor index number into r11 + THUMB( lsl r6, r9, r2 ) + THUMB( orr r11, r11, r6 ) @ factor index number into r11 + mcr p15, 0, r11, c7, c14, 2 @ clean & invalidate by set/way + subs r9, r9, #1 @ decrement the index + bge loop2 + subs r4, r4, #1 @ decrement the way + bge loop1 +skip: + add r10, r10, #2 @ increment cache number + cmp r3, r10 + bgt flush_levels +finished: + mov r10, #0 @ swith back to cache level 0 + mcr p15, 2, r10, c0, c0, 0 @ select current cache level in cssr + dsb st + isb + bx lr +ENDPROC(__v7_flush_dcache_all) + +ENTRY(v7_flush_dcache_all) + ARM( stmfd sp!, {r4-r5, r7, r9-r11, lr} ) + THUMB( stmfd sp!, {r4-r7, r9-r11, lr} ) + bl __v7_flush_dcache_all + ARM( ldmfd sp!, {r4-r5, r7, r9-r11, lr} ) + THUMB( ldmfd sp!, {r4-r7, r9-r11, lr} ) + bx lr +ENDPROC(v7_flush_dcache_all)