From patchwork Wed Sep 21 10:57:11 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicholas Piggin X-Patchwork-Id: 672814 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3sfGnz4955z9sCZ for ; Wed, 21 Sep 2016 20:58:47 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b=Pk+6pwjI; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 3sfGny4kdLzDsh3 for ; Wed, 21 Sep 2016 20:58:46 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b=Pk+6pwjI; dkim-atps=neutral X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Received: from mail-pa0-x241.google.com (mail-pa0-x241.google.com [IPv6:2607:f8b0:400e:c03::241]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 3sfGmK1G7hzDsTQ for ; Wed, 21 Sep 2016 20:57:21 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b=Pk+6pwjI; dkim-atps=neutral Received: by mail-pa0-x241.google.com with SMTP id my20so2159850pab.3 for ; Wed, 21 Sep 2016 03:57:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:in-reply-to:references :organization:mime-version:content-transfer-encoding; bh=UntGolDFytco/JmMVr09yPEtJF83VI76hrbGvL3WYVs=; b=Pk+6pwjILqy9tYDcvQ23zbM1HwO12TG/63Qggkg4UAetiqfPNhkk/vyaBzM90k2a+s XAdgKqrzgpH9mkGrtgM9lrdEi5DAeNJEK/FprUGOdExvG6dozEuMyBVXDmrEkzxBxsf2 ToNpQeVyF30OAarZr8OtluoQegggXlqmDTuTTgNnVRhK4apxaeG49lIeEhQB9ToYaT6O jWhCJwVwgMuT85QWOc4vXb5JZZkIeLaXSYSr54XW6twPz8S2CcvaPv53q9L2ErxOWfXl QHPQbj21Cuzt26aWsHFFwGH7FqDaNRvez+MkIyGxjJKaUFAo62/Fp87SvHZO3cQWgjDN l6qg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to :references:organization:mime-version:content-transfer-encoding; bh=UntGolDFytco/JmMVr09yPEtJF83VI76hrbGvL3WYVs=; b=EwYH3krVOHsSbcXNYfdSY+QuKYTvJy7v9frDXMnb+RcGfTqQNyr7mXSm6wrfSA2jlM Mqbge/b6sN1oOiMwRLkzHzguiY4QbXlpQguAATHRjuK2bKexY4aI4qdlD3mrOWMnzy5N 0lK7Ww1rKc7Ec2n9S9gfbbg9qQhFAsPvPFvEv6noA6mhFLpSnDSmOxGCNujJS/Z7UtCH FtL+Zt7P48a68VEzDbjVA5WcLYH+zuK/CMrEFSM8tQtrmqPMcXX+xBAD8sRGDNGtp+Tg Ai7l0QEyJT+TDOX8Eedx9QTui3839hwkGjfgZf6XIU7Fixzjy0r8NDHT/y3W3JaHv5YE A9Sw== X-Gm-Message-State: AE9vXwNJ4fADCH3zlzltJqcLsg0ctVv7jEaINoaul2VBVNdh3L1XEtdHvcO6TKVRIWXevQ== X-Received: by 10.66.230.228 with SMTP id tb4mr63194062pac.99.1474455439197; Wed, 21 Sep 2016 03:57:19 -0700 (PDT) Received: from roar.ozlabs.ibm.com ([122.99.82.10]) by smtp.gmail.com with ESMTPSA id x9sm30944516pff.19.2016.09.21.03.57.16 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 21 Sep 2016 03:57:18 -0700 (PDT) Date: Wed, 21 Sep 2016 20:57:11 +1000 From: Nicholas Piggin To: Tejun Heo , Christoph Lameter Subject: Re: [PATCH] percpu: improve generic percpu modify-return implementation Message-ID: <20160921205711.4e804777@roar.ozlabs.ibm.com> In-Reply-To: <20160921085137.862-1-npiggin@gmail.com> References: <20160921085137.862-1-npiggin@gmail.com> Organization: IBM X-Mailer: Claws Mail 3.14.0 (GTK+ 2.24.31; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-arch@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" On Wed, 21 Sep 2016 18:51:37 +1000 Nicholas Piggin wrote: > Some architectures require an additional load to find the address of > percpu pointers. In some implemenatations, the C aliasing rules do not > allow the result of that load to be kept over the store that modifies > the percpu variable, which causes additional loads. Sorry I picked up an old patch here. This one should be better. From d0cb9052d6f4c31d24f999b7b0cecb34681eee9b Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Wed, 21 Sep 2016 18:23:43 +1000 Subject: [PATCH] percpu: improve generic percpu modify-return implementations Some architectures require an additional load to find the address of percpu pointers. In some implemenatations, the C aliasing rules do not allow the result of that load to be kept over the store that modifies the percpu variable, which causes additional loads. Work around this by finding the pointer first, then operating on that. It's also possible to mark things as restrict and those kind of games, but that can require larger and arch specific changes. On powerpc, __this_cpu_inc_return compiles to: ld 10,48(13) ldx 9,3,10 addi 9,9,1 stdx 9,3,10 ld 9,48(13) ldx 3,9,3 With this patch it compiles to: ld 10,48(13) ldx 9,3,10 addi 9,9,1 stdx 9,3,10 Signed-off-by: Nicholas Piggin --- include/asm-generic/percpu.h | 53 +++++++++++++++++++++++++------------------- 1 file changed, 30 insertions(+), 23 deletions(-) diff --git a/include/asm-generic/percpu.h b/include/asm-generic/percpu.h index 4d9f233..40e8870 100644 --- a/include/asm-generic/percpu.h +++ b/include/asm-generic/percpu.h @@ -65,6 +65,11 @@ extern void setup_per_cpu_areas(void); #define PER_CPU_DEF_ATTRIBUTES #endif +#define raw_cpu_generic_read(pcp) \ +({ \ + *raw_cpu_ptr(&(pcp)); \ +}) + #define raw_cpu_generic_to_op(pcp, val, op) \ do { \ *raw_cpu_ptr(&(pcp)) op val; \ @@ -72,34 +77,39 @@ do { \ #define raw_cpu_generic_add_return(pcp, val) \ ({ \ - raw_cpu_add(pcp, val); \ - raw_cpu_read(pcp); \ + typeof(&(pcp)) __p = raw_cpu_ptr(&(pcp)); \ + \ + *__p += val; \ + *__p; \ }) #define raw_cpu_generic_xchg(pcp, nval) \ ({ \ + typeof(&(pcp)) __p = raw_cpu_ptr(&(pcp)); \ typeof(pcp) __ret; \ - __ret = raw_cpu_read(pcp); \ - raw_cpu_write(pcp, nval); \ + __ret = *__p; \ + *__p = nval; \ __ret; \ }) #define raw_cpu_generic_cmpxchg(pcp, oval, nval) \ ({ \ + typeof(&(pcp)) __p = raw_cpu_ptr(&(pcp)); \ typeof(pcp) __ret; \ - __ret = raw_cpu_read(pcp); \ + __ret = *__p; \ if (__ret == (oval)) \ - raw_cpu_write(pcp, nval); \ + *__p = nval; \ __ret; \ }) #define raw_cpu_generic_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2) \ ({ \ + typeof(&(pcp1)) __p1 = raw_cpu_ptr(&(pcp1)); \ + typeof(&(pcp2)) __p2 = raw_cpu_ptr(&(pcp2)); \ int __ret = 0; \ - if (raw_cpu_read(pcp1) == (oval1) && \ - raw_cpu_read(pcp2) == (oval2)) { \ - raw_cpu_write(pcp1, nval1); \ - raw_cpu_write(pcp2, nval2); \ + if (*__p1 == (oval1) && *__p2 == (oval2)) { \ + *__p1 = nval1; \ + *__p2 = nval2; \ __ret = 1; \ } \ (__ret); \ @@ -109,7 +119,7 @@ do { \ ({ \ typeof(pcp) __ret; \ preempt_disable(); \ - __ret = *this_cpu_ptr(&(pcp)); \ + __ret = raw_cpu_generic_read(pcp); \ preempt_enable(); \ __ret; \ }) @@ -118,17 +128,17 @@ do { \ do { \ unsigned long __flags; \ raw_local_irq_save(__flags); \ - *raw_cpu_ptr(&(pcp)) op val; \ + raw_cpu_generic_to_op(pcp, val, op); \ raw_local_irq_restore(__flags); \ } while (0) + #define this_cpu_generic_add_return(pcp, val) \ ({ \ typeof(pcp) __ret; \ unsigned long __flags; \ raw_local_irq_save(__flags); \ - raw_cpu_add(pcp, val); \ - __ret = raw_cpu_read(pcp); \ + __ret = raw_cpu_generic_add_return(pcp, val); \ raw_local_irq_restore(__flags); \ __ret; \ }) @@ -138,8 +148,7 @@ do { \ typeof(pcp) __ret; \ unsigned long __flags; \ raw_local_irq_save(__flags); \ - __ret = raw_cpu_read(pcp); \ - raw_cpu_write(pcp, nval); \ + __ret = raw_cpu_generic_xchg(pcp, nval); \ raw_local_irq_restore(__flags); \ __ret; \ }) @@ -149,9 +158,7 @@ do { \ typeof(pcp) __ret; \ unsigned long __flags; \ raw_local_irq_save(__flags); \ - __ret = raw_cpu_read(pcp); \ - if (__ret == (oval)) \ - raw_cpu_write(pcp, nval); \ + __ret = raw_cpu_generic_cmpxchg(pcp, oval, nval); \ raw_local_irq_restore(__flags); \ __ret; \ }) @@ -168,16 +175,16 @@ do { \ }) #ifndef raw_cpu_read_1 -#define raw_cpu_read_1(pcp) (*raw_cpu_ptr(&(pcp))) +#define raw_cpu_read_1(pcp) raw_cpu_generic_read(pcp) #endif #ifndef raw_cpu_read_2 -#define raw_cpu_read_2(pcp) (*raw_cpu_ptr(&(pcp))) +#define raw_cpu_read_2(pcp) raw_cpu_generic_read(pcp) #endif #ifndef raw_cpu_read_4 -#define raw_cpu_read_4(pcp) (*raw_cpu_ptr(&(pcp))) +#define raw_cpu_read_4(pcp) raw_cpu_generic_read(pcp) #endif #ifndef raw_cpu_read_8 -#define raw_cpu_read_8(pcp) (*raw_cpu_ptr(&(pcp))) +#define raw_cpu_read_8(pcp) raw_cpu_generic_read(pcp) #endif #ifndef raw_cpu_write_1