From patchwork Tue Dec 23 07:55:32 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dongsheng Wang X-Patchwork-Id: 423602 X-Patchwork-Delegate: michael@ellerman.id.au Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4D61F1400D2 for ; Tue, 23 Dec 2014 18:56:15 +1100 (AEDT) Received: from ozlabs.org (ozlabs.org [103.22.144.67]) by lists.ozlabs.org (Postfix) with ESMTP id 234C11A0E2D for ; Tue, 23 Dec 2014 18:56:15 +1100 (AEDT) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Received: from na01-by2-obe.outbound.protection.outlook.com (mail-by2on0119.outbound.protection.outlook.com [207.46.100.119]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id E51A91A0BEB for ; Tue, 23 Dec 2014 18:55:39 +1100 (AEDT) Received: from BN1PR03MB188.namprd03.prod.outlook.com (10.255.200.151) by BN1PR03MB188.namprd03.prod.outlook.com (10.255.200.151) with Microsoft SMTP Server (TLS) id 15.1.49.12; Tue, 23 Dec 2014 07:55:32 +0000 Received: from BN1PR03MB188.namprd03.prod.outlook.com ([169.254.16.152]) by BN1PR03MB188.namprd03.prod.outlook.com ([169.254.16.152]) with mapi id 15.01.0049.002; Tue, 23 Dec 2014 07:55:32 +0000 From: "Dongsheng.Wang@freescale.com" To: "Dongsheng.Wang@freescale.com" , "Michael Ellerman" Subject: RE: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up. Thread-Topic: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up. Thread-Index: AQHQHbI8OGAvNuJTN02L8IMWKMnNwJycXFmAgAAUKtCAADuCAIAAA63QgAAfcgA= Date: Tue, 23 Dec 2014 07:55:32 +0000 Message-ID: References: <1419230320-37558-1-git-send-email-dongsheng.wang@freescale.com> <1419296441.30550.3.camel@ellerman.id.au> <1419313550.30550.7.camel@ellerman.id.au> Accept-Language: zh-CN, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [123.151.195.53] authentication-results: spf=none (sender IP is ) smtp.mailfrom=Dongsheng.Wang@freescale.com; x-microsoft-antispam: BCL:0;PCL:0;RULEID:;SRVR:BN1PR03MB188; x-exchange-antispam-report-test: UriScan:; x-exchange-antispam-report-cfa-test: BCL:0;PCL:0;RULEID:;SRVR:BN1PR03MB188; x-forefront-prvs: 04347F8039 x-forefront-antispam-report: SFV:NSPM; SFS:(10019020)(6009001)(51704005)(24454002)(377454003)(377424004)(199003)(189002)(13464003)(54356999)(76176999)(62966003)(77156002)(101416001)(2656002)(68736005)(93886004)(21056001)(50986999)(122556002)(31966008)(40100003)(74316001)(99396003)(92566001)(15975445007)(87936001)(54206007)(2900100001)(102836002)(66066001)(54606007)(86362001)(20776003)(107046002)(97736003)(4396001)(46102003)(120916001)(99286002)(64706001)(76576001)(106356001)(105586002)(106116001)(33656002)(19580395003)(19580405001); DIR:OUT; SFP:1102; SCL:1; SRVR:BN1PR03MB188; H:BN1PR03MB188.namprd03.prod.outlook.com; FPR:; SPF:None; MLV:sfv; PTR:InfoNoRecords; A:1; MX:1; LANG:en; received-spf: None (protection.outlook.com: freescale.com does not designate permitted sender hosts) MIME-Version: 1.0 X-OriginatorOrg: freescale.com X-MS-Exchange-CrossTenant-originalarrivaltime: 23 Dec 2014 07:55:32.3993 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 710a03f5-10f6-4d38-9ff4-a80b81da590d X-MS-Exchange-Transport-CrossTenantHeadersStamped: BN1PR03MB188 Cc: Scott Wood , "linuxppc-dev@lists.ozlabs.org" , "anton@samba.org" X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" > -----Original Message----- > From: Wang Dongsheng-B40534 > Sent: Tuesday, December 23, 2014 2:53 PM > To: 'Michael Ellerman' > Cc: benh@kernel.crashing.org; Wood Scott-B07421; anton@samba.org; linuxppc- > dev@lists.ozlabs.org > Subject: RE: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up. > > > > > -----Original Message----- > > From: Michael Ellerman [mailto:mpe@ellerman.id.au] > > Sent: Tuesday, December 23, 2014 1:46 PM > > To: Wang Dongsheng-B40534 > > Cc: benh@kernel.crashing.org; Wood Scott-B07421; anton@samba.org; > > linuxppc- dev@lists.ozlabs.org > > Subject: Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up. > > > > On Tue, 2014-12-23 at 02:41 +0000, Dongsheng.Wang@freescale.com wrote: > > > > -----Original Message----- > > > > From: Michael Ellerman [mailto:mpe@ellerman.id.au] > > > > Sent: Tuesday, December 23, 2014 9:01 AM > > > > To: Wang Dongsheng-B40534 > > > > Cc: benh@kernel.crashing.org; Wood Scott-B07421; anton@samba.org; > > > > linuxppc- dev@lists.ozlabs.org > > > > Subject: Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up. > > > > > > > > On Mon, 2014-12-22 at 14:38 +0800, Dongsheng Wang wrote: > > > > > From: Wang Dongsheng > > > > > > > > > > Kernel cannot bring up Non-boot cpus always get "Processor xx is stuck". > > > > > this issue bring by http://patchwork.ozlabs.org/patch/418912/ (powerpc: > > > > > Secondary CPUs must set cpu_callin_map after setting active and > > > > > online) We need to take timebase after bootup cpu give the > > > > > timebase > > firstly. > > > > > > > > > > When start_secondary, non-boot cpus set cpu_callin_map for boot > > > > > cpu after that boot cpu will give the timebase for non-boot cpu. > > > > > Otherwise non-boot cpus will fall in dead loop to waiting bootup > > > > > cpu to give imebase. > > > > > > > > Right. > > > > > > > > However, doesn't this introduce the possibility that the secondary > > > > cpu is up and marked online but has an unsynchronised clock? > > > > > Yes, right. But Freescale platform boot-cpu will freeze the TB until > > > secondary cpu take the time base, so the clock is synchronized. > > > > It does the freeze in give_timebase() doesn't it? > > > > So there's still a window there where the secondary is up & online but > > hasn't had it's timebase synchronised, and the primary hasn't frozen the > timebase yet. > > So that makes me nervous. > > > > > For generic PowerPC maybe has this issue. So for safe I think we > > > need to set cpu online after synchronized clock. > > > > > > I will update my patch if you agree this way. > > > + if (smp_ops->take_timebase) > > > + smp_ops->take_timebase(); > > > + secondary_cpu_time_init(); > > > + > > > Move set_cpu_online to here. > > > + set_cpu_online(cpu, true); > > > > But that reverses the effect of the original patch, which was that we > > have to set online *before* we set the callin map. > > > > > > Looking harder at Anton's patch I'm not sure it's right anyway. > > > > The issue he was trying to fix was that the cpu was online but not > > active, which confused the scheduler. > > > > I think Anton missed that we have a loop that waits for online at the > > bottom of > > __cpu_up(): > > > > /* Wait until cpu puts itself in the online map */ > > while (!cpu_online(cpu)) > > cpu_relax(); > > > > > > He must have seen a case where that popped due to the cpu being > > online, but the cpu wasn't yet active. > > > > His patch fixed the problem by ensuring the previous loop that waits > > for cpu_callin_map doesn't finish until active & online are set, > > making the while loop above a nop. > > > > So I think we should probably revert Anton's patch and instead change > > that while loop to: > > > > /* Wait until cpu is online AND active */ > > while (!cpu_online(cpu) || !cpu_active(cpu)) > > cpu_relax(); > > Base on Anton's patch, we should probably change __cup_up. Please comment the changes. Regards, -Dongsheng > Umm.. Sorry about that...I forgot Anton's patch. > > But set_cpu_online also set cpu_active_bits, I think this judgment cannot fix > Anton's issue. > It is actually the effects of the original is the same. > > Regrads, > -Dongsheng --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -528,12 +528,10 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle) } /* - * wait to see if the cpu made a callin (is actually up). - * use this value that I found through experimentation. - * -- Cort + * Wait until cpu puts itself in the online map */ if (system_state < SYSTEM_RUNNING) - for (c = 50000; c && !cpu_callin_map[cpu]; c--) + for (c = 50000; c && !cpu_online(cpu); c--) udelay(100); #ifdef CONFIG_HOTPLUG_CPU else @@ -541,11 +539,10 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle) * CPUs can take much longer to come up in the * hotplug case. Wait five seconds. */ - for (c = 5000; c && !cpu_callin_map[cpu]; c--) + for (c = 5000; c && !cpu_online(cpu); c--) msleep(1); #endif - - if (!cpu_callin_map[cpu]) { + if (!cpu_online(cpu)) { printk(KERN_ERR "Processor %u is stuck.\n", cpu); return -ENOENT; } @@ -555,8 +552,8 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle) if (smp_ops->give_timebase) smp_ops->give_timebase(); - /* Wait until cpu puts itself in the online map */ - while (!cpu_online(cpu)) + /* Wait until cpu synchronized clock */ + while (!cpu_callin_map[cpu]) cpu_relax(); return 0; @@ -703,10 +700,6 @@ void start_secondary(void *unused) if (smp_ops->setup_cpu) smp_ops->setup_cpu(cpu); - if (smp_ops->take_timebase) - smp_ops->take_timebase(); - - secondary_cpu_time_init(); #ifdef CONFIG_PPC64 if (system_state == SYSTEM_RUNNING) @@ -738,6 +731,10 @@ void start_secondary(void *unused) notify_cpu_starting(cpu); set_cpu_online(cpu, true); + if (smp_ops->take_timebase) + smp_ops->take_timebase(); + + secondary_cpu_time_init(); /* * CPU must be marked active and online before we signal back to the * master, because the scheduler needs to see the cpu_online and