From patchwork Wed Sep 21 08:51:37 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicholas Piggin X-Patchwork-Id: 672699 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3sfDRG18Gjz9sD5 for ; Wed, 21 Sep 2016 19:12:26 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b=v43v2PoO; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 3sfDRF714xzDvdg for ; Wed, 21 Sep 2016 19:12:25 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b=v43v2PoO; dkim-atps=neutral X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Received: from mail-pa0-x243.google.com (mail-pa0-x243.google.com [IPv6:2607:f8b0:400e:c03::243]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 3sfCzS75LlzDsjB for ; Wed, 21 Sep 2016 18:51:48 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b=v43v2PoO; dkim-atps=neutral Received: by mail-pa0-x243.google.com with SMTP id hi10so2020816pac.2 for ; Wed, 21 Sep 2016 01:51:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id; bh=E0+RdE9tJEbTuLgFCaQ+l8K08htfqSNF4cxMhw+j9cE=; b=v43v2PoOw7/k6W9fbzcfSkgnZ1qK7EGXU3zeMnCZukKNSkgvYaebZQ1RxVj8dGynYU EprIY7yxTB8ZPulSFAofutRUL4xqcpg0cQyEslGfcav3ZPOrfUw9qDyldj+mX8gTM8xA iFsfbzvAFQPaB2z9oyRN7tdAvrJQaSF9P/Rjrw/0QQ2PhDy4gNpBONHe6WK2WkaSnTvn Tqv5TBMgn8ZBUKunuNnFjtOZSqwBa8Tdj6KLjenm5Le3X9hDY3Y7UW0OYtwlOyLmG5W3 g4aX9uowPzn3wWi7WMyWiYWH0GMQeBxefKZPT0xabo1ndyUdpEoaM6EXONNfQNds2ick UYAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=E0+RdE9tJEbTuLgFCaQ+l8K08htfqSNF4cxMhw+j9cE=; b=QqWBTVQEectF1KO69iuWWN9GPUm4cX+86g58RItit6TJ4F68nGdVhGzeaCYP6s+vSH 84uZ6zg+KX2bJMcNX1DKEVedq57DdofKJx8bi1SB2V54R7aOO8Obf2lo1aQYD5M21YWX mOFOdWtHBxhEGBG1Ea21FgFo4Xysg+K/8DFM5a0BarEp6Temp7uL63RvU7GZMvrKqw30 07Oayup0kyHA1WwBnBDRatdGmGvhyy7JtHQcLvYweqTJH5XgZgJ+9TqN7JO1JOIDDoRD sTcOtrYNRc6o17kWpPECnpkItok7GcO01maU/j7Imxp+F9iz6cKil3kNbPiO2qd9E3sc Wv5Q== X-Gm-Message-State: AE9vXwOs3w34g4rYaaLD600/795U45zwYAM5+DqX7wMntkeDth1Ph33/YjcL6qTgFc4W/g== X-Received: by 10.66.43.82 with SMTP id u18mr63052735pal.28.1474447907297; Wed, 21 Sep 2016 01:51:47 -0700 (PDT) Received: from roar.ozlabs.ibm.com ([122.99.82.10]) by smtp.gmail.com with ESMTPSA id 25sm37196683pfj.23.2016.09.21.01.51.44 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 21 Sep 2016 01:51:46 -0700 (PDT) From: Nicholas Piggin To: Tejun Heo , Christoph Lameter Subject: [PATCH] percpu: improve generic percpu modify-return implementation Date: Wed, 21 Sep 2016 18:51:37 +1000 Message-Id: <20160921085137.862-1-npiggin@gmail.com> X-Mailer: git-send-email 2.9.3 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-arch@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, Nicholas Piggin Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Some architectures require an additional load to find the address of percpu pointers. In some implemenatations, the C aliasing rules do not allow the result of that load to be kept over the store that modifies the percpu variable, which causes additional loads. Work around this by finding the pointer first, then operating on that. It's also possible to mark things as restrict and those kind of games, but that can require larger and arch specific changes. On powerpc, __this_cpu_inc_return compiles to: ld 10,48(13) ldx 9,3,10 addi 9,9,1 stdx 9,3,10 ld 9,48(13) ldx 3,9,3 With this patch it saves 2 loads: ld 10,48(13) ldx 9,3,10 addi 9,9,1 stdx 9,3,10 Signed-off-by: Nicholas Piggin --- include/asm-generic/percpu.h | 54 +++++++++++++++++++++++++------------------- 1 file changed, 31 insertions(+), 23 deletions(-) diff --git a/include/asm-generic/percpu.h b/include/asm-generic/percpu.h index 4d9f233..3fe18fe 100644 --- a/include/asm-generic/percpu.h +++ b/include/asm-generic/percpu.h @@ -65,6 +65,11 @@ extern void setup_per_cpu_areas(void); #define PER_CPU_DEF_ATTRIBUTES #endif +#define raw_cpu_generic_read(pcp) \ +({ \ + *raw_cpu_ptr(&(pcp)); \ +}) + #define raw_cpu_generic_to_op(pcp, val, op) \ do { \ *raw_cpu_ptr(&(pcp)) op val; \ @@ -72,34 +77,39 @@ do { \ #define raw_cpu_generic_add_return(pcp, val) \ ({ \ - raw_cpu_add(pcp, val); \ - raw_cpu_read(pcp); \ + typeof(&(pcp)) __p = raw_cpu_ptr(&(pcp)); \ + \ + *__p += val; \ + *__p; \ }) #define raw_cpu_generic_xchg(pcp, nval) \ ({ \ + typeof(&(pcp)) __p = raw_cpu_ptr(&(pcp)); \ typeof(pcp) __ret; \ - __ret = raw_cpu_read(pcp); \ - raw_cpu_write(pcp, nval); \ + __ret = *__p; \ + *__p = nval; \ __ret; \ }) #define raw_cpu_generic_cmpxchg(pcp, oval, nval) \ ({ \ + typeof(&(pcp)) __p = raw_cpu_ptr(&(pcp)); \ typeof(pcp) __ret; \ - __ret = raw_cpu_read(pcp); \ + __ret = *__p; \ if (__ret == (oval)) \ - raw_cpu_write(pcp, nval); \ + *__p = nval; \ __ret; \ }) #define raw_cpu_generic_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2) \ ({ \ + typeof(&(pcp1)) __p1 = raw_cpu_ptr(&(pcp1)); \ + typeof(&(pcp2)) __p2 = raw_cpu_ptr(&(pcp2)); \ int __ret = 0; \ - if (raw_cpu_read(pcp1) == (oval1) && \ - raw_cpu_read(pcp2) == (oval2)) { \ - raw_cpu_write(pcp1, nval1); \ - raw_cpu_write(pcp2, nval2); \ + if (*__p1 == (oval1) && *__p2 == (oval2)) { \ + *__p1 = nval1; \ + *__p2 = nval2; \ __ret = 1; \ } \ (__ret); \ @@ -109,7 +119,7 @@ do { \ ({ \ typeof(pcp) __ret; \ preempt_disable(); \ - __ret = *this_cpu_ptr(&(pcp)); \ + __ret = raw_cpu_generic_read(pcp); \ preempt_enable(); \ __ret; \ }) @@ -118,17 +128,18 @@ do { \ do { \ unsigned long __flags; \ raw_local_irq_save(__flags); \ - *raw_cpu_ptr(&(pcp)) op val; \ + raw_cpu_generic_to_op(pcp, val, op); \ raw_local_irq_restore(__flags); \ } while (0) + #define this_cpu_generic_add_return(pcp, val) \ ({ \ + typeof(&(pcp)) __p = raw_cpu_ptr(&(pcp)); \ typeof(pcp) __ret; \ unsigned long __flags; \ raw_local_irq_save(__flags); \ - raw_cpu_add(pcp, val); \ - __ret = raw_cpu_read(pcp); \ + __ret = raw_cpu_generic_add_return(pcp, val); \ raw_local_irq_restore(__flags); \ __ret; \ }) @@ -138,8 +149,7 @@ do { \ typeof(pcp) __ret; \ unsigned long __flags; \ raw_local_irq_save(__flags); \ - __ret = raw_cpu_read(pcp); \ - raw_cpu_write(pcp, nval); \ + __ret = raw_cpu_generic_xchg(pcp); \ raw_local_irq_restore(__flags); \ __ret; \ }) @@ -149,9 +159,7 @@ do { \ typeof(pcp) __ret; \ unsigned long __flags; \ raw_local_irq_save(__flags); \ - __ret = raw_cpu_read(pcp); \ - if (__ret == (oval)) \ - raw_cpu_write(pcp, nval); \ + __ret = raw_cpu_generic_cmpxchg(pcp, oval, nval); \ raw_local_irq_restore(__flags); \ __ret; \ }) @@ -168,16 +176,16 @@ do { \ }) #ifndef raw_cpu_read_1 -#define raw_cpu_read_1(pcp) (*raw_cpu_ptr(&(pcp))) +#define raw_cpu_read_1(pcp) raw_cpu_generic_read(pcp) #endif #ifndef raw_cpu_read_2 -#define raw_cpu_read_2(pcp) (*raw_cpu_ptr(&(pcp))) +#define raw_cpu_read_2(pcp) raw_cpu_generic_read(pcp) #endif #ifndef raw_cpu_read_4 -#define raw_cpu_read_4(pcp) (*raw_cpu_ptr(&(pcp))) +#define raw_cpu_read_4(pcp) raw_cpu_generic_read(pcp) #endif #ifndef raw_cpu_read_8 -#define raw_cpu_read_8(pcp) (*raw_cpu_ptr(&(pcp))) +#define raw_cpu_read_8(pcp) raw_cpu_generic_read(pcp) #endif #ifndef raw_cpu_write_1