From patchwork Fri Mar 26 16:15:04 2021
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1458871
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 03/13] aarch64: Add costs for LD[234]/ST[234] permutes
Date: Fri, 26 Mar 2021 16:15:04 +0000
In-Reply-To: (Richard Sandiford's message of "Fri, 26 Mar 2021 16:12:42 +0000")

At the moment, we cost LD[234] and ST[234] as N vector loads or stores,
which effectively treats the implied permute as free.
This patch adds additional costs for the permutes, which apply on top
of the costs for the loads and stores.

As with the previous patches, this one only becomes active if a CPU
selects use_new_vector_costs.  It should therefore have a very low
impact on other CPUs.

gcc/
	* config/aarch64/aarch64-protos.h (simd_vec_cost::ld2_st2_permute_cost)
	(simd_vec_cost::ld3_st3_permute_cost): New member variables.
	(simd_vec_cost::ld4_st4_permute_cost): Likewise.
	* config/aarch64/aarch64.c (generic_advsimd_vector_cost): Update
	accordingly, using zero for the new costs.
	(generic_sve_vector_cost, a64fx_advsimd_vector_cost): Likewise.
	(a64fx_sve_vector_cost, qdf24xx_advsimd_vector_cost): Likewise.
	(thunderx_advsimd_vector_cost, tsv110_advsimd_vector_cost): Likewise.
	(cortexa57_advsimd_vector_cost, exynosm1_advsimd_vector_cost)
	(xgene1_advsimd_vector_cost, thunderx2t99_advsimd_vector_cost)
	(thunderx3t110_advsimd_vector_cost): Likewise.
	(aarch64_ld234_st234_vectors): New function.
	(aarch64_adjust_stmt_cost): Likewise.
	(aarch64_add_stmt_cost): Call aarch64_adjust_stmt_cost if using
	the new vector costs.
---
 gcc/config/aarch64/aarch64-protos.h |  7 +++
 gcc/config/aarch64/aarch64.c        | 94 +++++++++++++++++++++++++++++
 2 files changed, 101 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index bfcab72b122..3d152754981 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -202,6 +202,13 @@ struct simd_vec_cost
      specially below.  */
   const int fp_stmt_cost;
 
+  /* Per-vector cost of permuting vectors after an LD2, LD3 or LD4,
+     as well as the per-vector cost of permuting vectors before
+     an ST2, ST3 or ST4.  */
+  const int ld2_st2_permute_cost;
+  const int ld3_st3_permute_cost;
+  const int ld4_st4_permute_cost;
+
   /* Cost of a permute operation.  */
   const int permute_cost;
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b62169a267a..8fb723dabd2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -590,6 +590,9 @@ static const advsimd_vec_cost generic_advsimd_vector_cost =
 {
   1, /* int_stmt_cost */
   1, /* fp_stmt_cost */
+  0, /* ld2_st2_permute_cost */
+  0, /* ld3_st3_permute_cost */
+  0, /* ld4_st4_permute_cost */
   2, /* permute_cost */
   2, /* reduc_i8_cost */
   2, /* reduc_i16_cost */
@@ -612,6 +615,9 @@ static const sve_vec_cost generic_sve_vector_cost =
   {
     1, /* int_stmt_cost */
     1, /* fp_stmt_cost */
+    0, /* ld2_st2_permute_cost */
+    0, /* ld3_st3_permute_cost */
+    0, /* ld4_st4_permute_cost */
     2, /* permute_cost */
     2, /* reduc_i8_cost */
     2, /* reduc_i16_cost */
@@ -650,6 +656,9 @@ static const advsimd_vec_cost a64fx_advsimd_vector_cost =
 {
   2, /* int_stmt_cost */
   5, /* fp_stmt_cost */
+  0, /* ld2_st2_permute_cost */
+  0, /* ld3_st3_permute_cost */
+  0, /* ld4_st4_permute_cost */
   3, /* permute_cost */
   13, /* reduc_i8_cost */
   13, /* reduc_i16_cost */
@@ -671,6 +680,9 @@ static const sve_vec_cost a64fx_sve_vector_cost =
   {
     2, /* int_stmt_cost */
     5, /* fp_stmt_cost */
+    0, /* ld2_st2_permute_cost */
+    0, /* ld3_st3_permute_cost */
+    0, /* ld4_st4_permute_cost */
     3, /* permute_cost */
     13, /* reduc_i8_cost */
     13, /* reduc_i16_cost */
@@ -708,6 +720,9 @@ static const advsimd_vec_cost qdf24xx_advsimd_vector_cost =
 {
   1, /* int_stmt_cost */
   3, /* fp_stmt_cost */
+  0, /* ld2_st2_permute_cost */
+  0, /* ld3_st3_permute_cost */
+  0, /* ld4_st4_permute_cost */
   2, /* permute_cost */
   1, /* reduc_i8_cost */
   1, /* reduc_i16_cost */
@@ -742,6 +757,9 @@ static const advsimd_vec_cost thunderx_advsimd_vector_cost =
 {
   4, /* int_stmt_cost */
   1, /* fp_stmt_cost */
+  0, /* ld2_st2_permute_cost */
+  0, /* ld3_st3_permute_cost */
+  0, /* ld4_st4_permute_cost */
   4, /* permute_cost */
   2, /* reduc_i8_cost */
   2, /* reduc_i16_cost */
@@ -775,6 +793,9 @@ static const advsimd_vec_cost tsv110_advsimd_vector_cost =
 {
   2, /* int_stmt_cost */
   2, /* fp_stmt_cost */
+  0, /* ld2_st2_permute_cost */
+  0, /* ld3_st3_permute_cost */
+  0, /* ld4_st4_permute_cost */
   2, /* permute_cost */
   3, /* reduc_i8_cost */
   3, /* reduc_i16_cost */
@@ -807,6 +828,9 @@ static const advsimd_vec_cost cortexa57_advsimd_vector_cost =
 {
   2, /* int_stmt_cost */
   2, /* fp_stmt_cost */
+  0, /* ld2_st2_permute_cost */
+  0, /* ld3_st3_permute_cost */
+  0, /* ld4_st4_permute_cost */
   3, /* permute_cost */
   8, /* reduc_i8_cost */
   8, /* reduc_i16_cost */
@@ -840,6 +864,9 @@ static const advsimd_vec_cost exynosm1_advsimd_vector_cost =
 {
   3, /* int_stmt_cost */
   3, /* fp_stmt_cost */
+  0, /* ld2_st2_permute_cost */
+  0, /* ld3_st3_permute_cost */
+  0, /* ld4_st4_permute_cost */
   3, /* permute_cost */
   3, /* reduc_i8_cost */
   3, /* reduc_i16_cost */
@@ -872,6 +899,9 @@ static const advsimd_vec_cost xgene1_advsimd_vector_cost =
 {
   2, /* int_stmt_cost */
   2, /* fp_stmt_cost */
+  0, /* ld2_st2_permute_cost */
+  0, /* ld3_st3_permute_cost */
+  0, /* ld4_st4_permute_cost */
   2, /* permute_cost */
   4, /* reduc_i8_cost */
   4, /* reduc_i16_cost */
@@ -905,6 +935,9 @@ static const advsimd_vec_cost thunderx2t99_advsimd_vector_cost =
 {
   4, /* int_stmt_cost */
   5, /* fp_stmt_cost */
+  0, /* ld2_st2_permute_cost */
+  0, /* ld3_st3_permute_cost */
+  0, /* ld4_st4_permute_cost */
   10, /* permute_cost */
   6, /* reduc_i8_cost */
   6, /* reduc_i16_cost */
@@ -938,6 +971,9 @@ static const advsimd_vec_cost thunderx3t110_advsimd_vector_cost =
 {
   5, /* int_stmt_cost */
   5, /* fp_stmt_cost */
+  0, /* ld2_st2_permute_cost */
+  0, /* ld3_st3_permute_cost */
+  0, /* ld4_st4_permute_cost */
   10, /* permute_cost */
   5, /* reduc_i8_cost */
   5, /* reduc_i16_cost */
@@ -14086,6 +14122,26 @@ aarch64_reduc_type (vec_info *vinfo, stmt_vec_info stmt_info)
   return -1;
 }
 
+/* Return true if an access of kind KIND for STMT_INFO represents one
+   vector of an LD[234] or ST[234] operation.  Return the total number of
+   vectors (2, 3 or 4) if so, otherwise return a value outside that range.  */
+static int
+aarch64_ld234_st234_vectors (vect_cost_for_stmt kind, stmt_vec_info stmt_info)
+{
+  if ((kind == vector_load
+       || kind == unaligned_load
+       || kind == vector_store
+       || kind == unaligned_store)
+      && STMT_VINFO_DATA_REF (stmt_info))
+    {
+      stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
+      if (stmt_info
+	  && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_LOAD_STORE_LANES)
+	return DR_GROUP_SIZE (stmt_info);
+    }
+  return 0;
+}
+
 /* Return true if creating multiple copies of STMT_INFO for Advanced SIMD
    vectors would produce a series of LDP or STP operations.  KIND is the
    kind of statement that STMT_INFO represents.  */
@@ -14320,6 +14376,38 @@ aarch64_sve_adjust_stmt_cost (class vec_info *vinfo, vect_cost_for_stmt kind,
   return stmt_cost;
 }
 
+/* STMT_COST is the cost calculated for STMT_INFO, which has cost kind KIND
+   and which when vectorized would operate on vector type VECTYPE.  Add the
+   cost of any embedded operations.  */
+static unsigned int
+aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, stmt_vec_info stmt_info,
+			  tree vectype, unsigned int stmt_cost)
+{
+  if (vectype)
+    {
+      const simd_vec_cost *simd_costs = aarch64_simd_vec_costs (vectype);
+
+      /* Detect cases in which a vector load or store represents an
+	 LD[234] or ST[234] instruction.  */
+      switch (aarch64_ld234_st234_vectors (kind, stmt_info))
+	{
+	case 2:
+	  stmt_cost += simd_costs->ld2_st2_permute_cost;
+	  break;
+
+	case 3:
+	  stmt_cost += simd_costs->ld3_st3_permute_cost;
+	  break;
+
+	case 4:
+	  stmt_cost += simd_costs->ld4_st4_permute_cost;
+	  break;
+	}
+    }
+
+  return stmt_cost;
+}
+
 /* Implement targetm.vectorize.add_stmt_cost.  */
 static unsigned
 aarch64_add_stmt_cost (class vec_info *vinfo, void *data, int count,
@@ -14347,6 +14435,12 @@ aarch64_add_stmt_cost (class vec_info *vinfo, void *data, int count,
 	stmt_cost = aarch64_sve_adjust_stmt_cost (vinfo, kind, stmt_info,
 						  vectype, stmt_cost);
 
+      if (stmt_info && aarch64_use_new_vector_costs_p ())
+	/* Account for any extra "embedded" costs that apply additively
+	   to the base cost calculated above.  */
+	stmt_cost = aarch64_adjust_stmt_cost (kind, stmt_info, vectype,
+					      stmt_cost);
+
       /* Statements in an inner loop relative to the loop being vectorized
 	 are weighted more heavily.  The value here is arbitrary and could
 	 potentially be improved with analysis.  */
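
[Not part of the patch: a small self-contained sketch for reviewers who
want a concrete feel for the arithmetic.  It mirrors the switch in
aarch64_adjust_stmt_cost, but the struct, function names and cost values
below are invented for the example, since every tuning updated above
currently leaves the new fields at zero.]

#include <stdio.h>

/* Hypothetical per-vector permute costs; the field names match the new
   simd_vec_cost members, but the values are made up.  */
struct example_permute_costs
{
  int ld2_st2_permute_cost;
  int ld3_st3_permute_cost;
  int ld4_st4_permute_cost;
};

/* GROUP_SIZE stands in for the value that aarch64_ld234_st234_vectors
   would return: 2, 3 or 4 for an LD[234]/ST[234] group, 0 otherwise.  */
static unsigned int
example_adjust_stmt_cost (const struct example_permute_costs *costs,
			  int group_size, unsigned int stmt_cost)
{
  switch (group_size)
    {
    case 2: stmt_cost += costs->ld2_st2_permute_cost; break;
    case 3: stmt_cost += costs->ld3_st3_permute_cost; break;
    case 4: stmt_cost += costs->ld4_st4_permute_cost; break;
    }
  return stmt_cost;
}

int
main (void)
{
  /* Say a tuning charges 2 extra units per vector for an LD3/ST3 permute
     and the plain vector load cost is 4.  */
  struct example_permute_costs costs = { 1, 2, 3 };
  unsigned int load_cost = 4;

  /* Previously an LD3 group was costed as 3 loads: 3 * 4 = 12 units.
     With the per-vector permute cost it becomes 3 * (4 + 2) = 18.  */
  unsigned int per_vector = example_adjust_stmt_cost (&costs, 3, load_cost);
  printf ("per-vector cost %u, LD3 group cost %u\n",
	  per_vector, 3 * per_vector);
  return 0;
}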