From patchwork Tue Jul 30 16:02:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Maydell X-Patchwork-Id: 1966618 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.a=rsa-sha256 header.s=google header.b=ir9dZ3YJ; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WYKmC0NZ8z1ybV for ; Wed, 31 Jul 2024 02:04:11 +1000 (AEST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sYpJi-00039e-Mz; Tue, 30 Jul 2024 12:03:14 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sYpJh-000346-EF for qemu-devel@nongnu.org; Tue, 30 Jul 2024 12:03:13 -0400 Received: from mail-wm1-x336.google.com ([2a00:1450:4864:20::336]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sYpJe-0000Fc-MD for qemu-devel@nongnu.org; Tue, 30 Jul 2024 12:03:13 -0400 Received: by mail-wm1-x336.google.com with SMTP id 5b1f17b1804b1-4266f3e0df8so27917225e9.2 for ; Tue, 30 Jul 2024 09:03:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1722355389; x=1722960189; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=MzA6yndkssVsonTKDMnjVcpdRFu5VtP6syrp6zmpq4E=; b=ir9dZ3YJPUo/CIDm54tqFp1AasR6mmctWCOwGZ4khO5oLWMSW77KRSNOp0vQX10kdX fKJ0AyQqP1trHICdwJ/xa3hobpseOgGYlRDVR2gXxUtQYXcCNfbS3SCreusKmZtpBEr/ R4h7fg3fLLEdh0mB+DQhuUg4QvI4w8b7MlJh4x4JXnCTULieClDeHFk0RWD1cBTtKVYs PSgzmSUCUdCuZnZdIqwc1JvYVNsI8+9HW18+HGMq+82W65AUOLUyNZUJqKJTQ965mU9m 8KZWpyer3DUiA259xhAdu4N+GCan/rrCA7jPZfOMko5Ejsr7fxUuH01BbuMFwTs7UQZL marQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722355389; x=1722960189; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=MzA6yndkssVsonTKDMnjVcpdRFu5VtP6syrp6zmpq4E=; b=gd7QLuPRy99YWVHOMOqWKv1PolF9Fm4/6j4iG/jMP3wyHCLpoqaDt7T+4fyHMz46yg U0kiRef5cGYqpfLBeQT5C9UGqKlnrbJtmULZBafe6js+kymEZHm0nFckUzXGvj32qkId OBWUnfquaQf5dxxLXodHqzXPlWi1dJkxICpfEcwMWNOIRSN61AC4LrUQI7Yu+MZhq85J MC5cFF6jQZ8mHOrsnAMNdjY3GXYauXsSxjNvTIum6zR0kziOxZGFzrwpS13BERAkT30I 1dIUuj3pxRDz1Om3F74b9EgOOS3lyg6BN3T1QO2xNUyJHrPZ+KEbWZlrOekuO6/vHWom t4sQ== X-Forwarded-Encrypted: i=1; AJvYcCVjJxoRy+4BDej80BmLLu8mDypu5ehD8p8H4bVtNee27su4W2exPutpJvlK7ASSNn44rn5u+zFnSzO0xeoJ3HcJWhvvv6M= X-Gm-Message-State: AOJu0YwI5QOnIF9jL6n7sSdjHs8UfoD9tkvfF4PWq1n3jw7e+jlXZ8zT AHDwfsq7c+5++YFiytZon82sdoYDtY9sbPTKYEN++PQYW4okvdsf6GmlHtesrVkdqoLoPccg5o+ n X-Google-Smtp-Source: AGHT+IE+yXJxpolR06rGC1+uZtIYYRn9kaSlm//LYMQyZQ6nvbdGrkjwW0AWIRewFbgjZ0Bz4odOuQ== X-Received: by 2002:adf:fc0c:0:b0:366:e7aa:7fa5 with SMTP id ffacd0b85a97d-36b5d0b094cmr7074359f8f.1.1722355388644; Tue, 30 Jul 2024 09:03:08 -0700 (PDT) Received: from orth.archaic.org.uk (orth.archaic.org.uk. [2001:8b0:1d0::2]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-36b3685810csm15001676f8f.71.2024.07.30.09.03.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Jul 2024 09:03:08 -0700 (PDT) From: Peter Maydell To: qemu-arm@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH 1/8] target/arm: Allow setting the FPCR.EBF bit for FEAT_EBF16 Date: Tue, 30 Jul 2024 17:02:59 +0100 Message-Id: <20240730160306.2959745-2-peter.maydell@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240730160306.2959745-1-peter.maydell@linaro.org> References: <20240730160306.2959745-1-peter.maydell@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2a00:1450:4864:20::336; envelope-from=peter.maydell@linaro.org; helo=mail-wm1-x336.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org FEAT_EBF16 adds one new bit to the FPCR floating point control register. Allow this bit to be read and written when the ID registers indicate the presence of the feature. Note that because this new bit is not in FPSCR_FPCR_MASK the bit is not visible in the AArch32 FPSCR, and FPSCR writes do not affect it. Signed-off-by: Peter Maydell Reviewed-by: Richard Henderson --- target/arm/cpu-features.h | 5 +++++ target/arm/cpu.h | 1 + target/arm/vfp_helper.c | 8 ++++++-- 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h index c59ca104fe1..cfb82c23cad 100644 --- a/target/arm/cpu-features.h +++ b/target/arm/cpu-features.h @@ -556,6 +556,11 @@ static inline bool isar_feature_aa64_bf16(const ARMISARegisters *id) return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, BF16) != 0; } +static inline bool isar_feature_aa64_ebf16(const ARMISARegisters *id) +{ + return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, BF16) > 1; +} + static inline bool isar_feature_aa64_rcpc_8_3(const ARMISARegisters *id) { return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, LRCPC) != 0; diff --git a/target/arm/cpu.h b/target/arm/cpu.h index a12859fc533..34df9d7e39b 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -1707,6 +1707,7 @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val); #define FPCR_OFE (1 << 10) /* Overflow exception trap enable */ #define FPCR_UFE (1 << 11) /* Underflow exception trap enable */ #define FPCR_IXE (1 << 12) /* Inexact exception trap enable */ +#define FPCR_EBF (1 << 13) /* Extended BFloat16 behaviors */ #define FPCR_IDE (1 << 15) /* Input Denormal exception trap enable */ #define FPCR_LEN_MASK (7 << 16) /* LEN, A-profile only */ #define FPCR_FZ16 (1 << 19) /* ARMv8.2+, FP16 flush-to-zero */ diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c index b3698da8ca7..203d37303bd 100644 --- a/target/arm/vfp_helper.c +++ b/target/arm/vfp_helper.c @@ -254,6 +254,10 @@ static void vfp_set_fpcr_masked(CPUARMState *env, uint32_t val, uint32_t mask) val &= ~FPCR_FZ16; } + if (!cpu_isar_feature(aa64_ebf16, cpu)) { + val &= ~FPCR_EBF; + } + vfp_set_fpcr_to_host(env, val, mask); if (mask & (FPCR_LEN_MASK | FPCR_STRIDE_MASK)) { @@ -278,12 +282,12 @@ static void vfp_set_fpcr_masked(CPUARMState *env, uint32_t val, uint32_t mask) * We don't implement trapped exception handling, so the * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!) * - * The FPCR bits we keep in vfp.fpcr are AHP, DN, FZ, RMode + * The FPCR bits we keep in vfp.fpcr are AHP, DN, FZ, RMode, EBF * and FZ16. Len, Stride and LTPSIZE we just handled. Store those bits * there, and zero any of the other FPCR bits and the RES0 and RAZ/WI * bits. */ - val &= FPCR_AHP | FPCR_DN | FPCR_FZ | FPCR_RMODE_MASK | FPCR_FZ16; + val &= FPCR_AHP | FPCR_DN | FPCR_FZ | FPCR_RMODE_MASK | FPCR_FZ16 | FPCR_EBF; env->vfp.fpcr &= ~mask; env->vfp.fpcr |= val; } From patchwork Tue Jul 30 16:03:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Maydell X-Patchwork-Id: 1966615 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.a=rsa-sha256 header.s=google header.b=pMb9oii5; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WYKlL0BW4z1ybV for ; Wed, 31 Jul 2024 02:03:26 +1000 (AEST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sYpJj-0003Ee-Sw; Tue, 30 Jul 2024 12:03:15 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sYpJh-00034b-I5 for qemu-devel@nongnu.org; Tue, 30 Jul 2024 12:03:13 -0400 Received: from mail-wr1-x42a.google.com ([2a00:1450:4864:20::42a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sYpJe-0000Fn-Mf for qemu-devel@nongnu.org; Tue, 30 Jul 2024 12:03:13 -0400 Received: by mail-wr1-x42a.google.com with SMTP id ffacd0b85a97d-3683178b226so2137778f8f.1 for ; Tue, 30 Jul 2024 09:03:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1722355389; x=1722960189; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=J8QodGCyXxyaENLTKMO9Txf9gYtGANATnCYmDKKTwbU=; b=pMb9oii565HwEJFr2CgmyD59rMk/n9JX0/04HGWcG9Vad35iKrkX4tPZldxCiRzpi8 2k+otYTYcaIwTfpCv8rnkmE5t3Z6yVlYLf6Ryju5Ezeg7L2aUGx4KGEdYb9X5XnjoTOL qG9OsSZpj1gJTaTLlcqYLSvyTiwb89YEMMXTGf1UoB2RHTqqUmwiqLeeRBD6H8L2ws0h AnpQpze+djgnT69r41E3AqOR1WrkCGNTsGOi9fgVzXyFQJFOGO7as9iMF4nUY1BCfFdZ txm2iOwpAMpuffnYQePMOjXCkuZYi1ISkJpmbbWomjig2k758fziJ3aAv0zKmWOas9gF a4OQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722355389; x=1722960189; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=J8QodGCyXxyaENLTKMO9Txf9gYtGANATnCYmDKKTwbU=; b=K0sVzZ21kB8TPABWAKEttsFqOe0DZagbW/8yOq1V/7jUEE5kPsYII1d5mZEE/r1YYF ot8v1vOgKpYXyczWp61AjKNZKLc54u+IqAtSSQDW2ZH/fpqYV78GDjzP0tuz2YJY4C9X o8K95d+FT+vRQ154nR3t1Fq30njFdqkX6dOuLsPgXv/dDGMCnq4Lm53Z6Ut9NzsmZ9UB 0AdPPeN2ad/OwJDDOYJAH0eWpbjmCoGKQJHd+4cAf9n8O4eYg2CtJMuzY9cnHrYkwoRv eyQaEvUQJFeHQjjHymDX5BXY9fzmakC2D80IerT2CXiAOUcgInbxLE16IRrCxqit3Y/d Qfig== X-Forwarded-Encrypted: i=1; AJvYcCXeG2Lltiy0+eIt7PputrSC4QClnrwTM7umWZZKpa/apSOLIr/lH7zstZhzUpR7DcToH0yt7+qJbk+wnwGlKt3G6l7QdNQ= X-Gm-Message-State: AOJu0YyxyA1ayP0VNcYsY3knFPd2vyuRTOkiTlV+Mpa6WpeHLLImfehu fZ56S/qtO0dFuegDr/KMoKG+i1OSyiUVYMC+EPKh3YJxHaloK1uECw3kRXZSybUHCK9hBpBlTLW f X-Google-Smtp-Source: AGHT+IGv6KAM2NIpBdX8j2d+sWyxJ7dI7m6Wx8zNbzHKmN0Rg8GfpvxIx+dOdVBRmX6AUPGoaLvV0g== X-Received: by 2002:a5d:698b:0:b0:367:9088:fecd with SMTP id ffacd0b85a97d-36b5cee2e4emr7735549f8f.7.1722355389158; Tue, 30 Jul 2024 09:03:09 -0700 (PDT) Received: from orth.archaic.org.uk (orth.archaic.org.uk. [2001:8b0:1d0::2]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-36b3685810csm15001676f8f.71.2024.07.30.09.03.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Jul 2024 09:03:08 -0700 (PDT) From: Peter Maydell To: qemu-arm@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH 2/8] target/arm: Pass env pointer through to sme_bfmopa helper Date: Tue, 30 Jul 2024 17:03:00 +0100 Message-Id: <20240730160306.2959745-3-peter.maydell@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240730160306.2959745-1-peter.maydell@linaro.org> References: <20240730160306.2959745-1-peter.maydell@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2a00:1450:4864:20::42a; envelope-from=peter.maydell@linaro.org; helo=mail-wr1-x42a.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org To implement the FEAT_EBF16 semantics, we are going to need the CPUARMState env pointer in every helper function which calls bfdotadd(). Pass the env pointer through from generated code to the sme_bfmopa helper. (We'll add the code that uses it when we've adjusted all the helpers to have access to the env pointer.) Signed-off-by: Peter Maydell Reviewed-by: Richard Henderson --- target/arm/tcg/helper-sme.h | 4 ++-- target/arm/tcg/sme_helper.c | 4 ++-- target/arm/tcg/translate-sme.c | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index 659867a1faf..f12d903aa44 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -126,8 +126,8 @@ DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_6(sme_bfmopa, TCG_CALL_NO_RWG, - void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_7(sme_bfmopa, TCG_CALL_NO_RWG, + void, env, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sme_smopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sme_umopa_s, TCG_CALL_NO_RWG, diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index 2af2b957cb6..f172225b2f2 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -1080,8 +1080,8 @@ void HELPER(sme_fmopa_h)(CPUARMState *env, } } -void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, uint32_t desc) +void HELPER(sme_bfmopa)(CPUARMState *env, void *vza, void *vzn, void *vzm, + void *vpn, void *vpm, uint32_t desc) { intptr_t row, col, oprsz = simd_maxsz(desc); uint32_t neg = simd_data(desc) * 0x80008000u; diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index 8e9332f1898..bcb502feb05 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -355,7 +355,7 @@ TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, FPST_FPCR, gen_helper_sme_fmopa_d) /* TODO: FEAT_EBF16 */ -TRANS_FEAT(BFMOPA, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_bfmopa) +TRANS_FEAT(BFMOPA, aa64_sme, do_outprod_env, a, MO_32, gen_helper_sme_bfmopa) TRANS_FEAT(SMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_smopa_s) TRANS_FEAT(UMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_umopa_s) From patchwork Tue Jul 30 16:03:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Maydell X-Patchwork-Id: 1966619 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.a=rsa-sha256 header.s=google header.b=Zfkf2PWx; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WYKmV1Ct9z1ybV for ; Wed, 31 Jul 2024 02:04:26 +1000 (AEST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sYpJj-0003Bh-5D; Tue, 30 Jul 2024 12:03:15 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sYpJh-00036M-Rw for qemu-devel@nongnu.org; Tue, 30 Jul 2024 12:03:13 -0400 Received: from mail-wm1-x32f.google.com ([2a00:1450:4864:20::32f]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sYpJf-0000Fz-5p for qemu-devel@nongnu.org; Tue, 30 Jul 2024 12:03:13 -0400 Received: by mail-wm1-x32f.google.com with SMTP id 5b1f17b1804b1-42816ca782dso27078765e9.2 for ; Tue, 30 Jul 2024 09:03:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1722355390; x=1722960190; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=Br6CkeiTIKRILq+hj11nYdiPNIHIQx79eIryvbnx584=; b=Zfkf2PWxBj5Zc5l8WSoGGBnm+gujF0xs7dADb/SiBN866sMfknKR3VjxSQSeoYqACm 6Apqndw1ZWsO/Y05UO0XmWw+vbpShC6Ru0g0NlHCnvjXW1KR0YKohLrUr/xz/7AtB3V8 hoIJaIPS4uPXW5IRbJ7ENe184BnDj8p7Q5JrJ+QXI0HqkNEAVGkf9E7lH5/cj5sbksKq gDz+lKNoj8a2zHv1eRO4kQosgbKtn2IhQQ75Wlt5AV6Ioyq7jhKN1ayJib5Z+2nM2r7K CWGN795pszwNbVhgNyIJKSuUASNgBl9oFP9fFDIoMeo3XlaYiBMfzNNNM2zJP08yUaup V7uQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722355390; x=1722960190; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Br6CkeiTIKRILq+hj11nYdiPNIHIQx79eIryvbnx584=; b=A9wP/s+Bvien1+IrqRF71nZ9yizHZOBUWQVfOhyGZkE3kV5l7yk4+oAaAESrw970OZ mb92UNH6FlxuSYyPyP/GTC2MRtitHkCd2ELWqvv7usTMHBkQpRtthzfdQBJ1v0yeUdjK ovAlUjO9+4SIYQcIQ62Xm22qM74I+V11WneoQlzSySxMU8OAebf7FmvDffmoqcMc4drf FYcCnONCxXu0XpV4l2KvgS21Pkv42ehrbPJM15pOY3zDSMZbtxr4QKxKuANEr8v1K2/Y EnK9vUx120xCQ3TGrF3+ebMYyffpSDMTG/0NzuEK9t6XrcuE2vN9J30vCdzEAGPJJxvK viIQ== X-Forwarded-Encrypted: i=1; AJvYcCVV9Py6D1YYGHkAI3jq7YLTgkX4nrL1cDQWbXxXtmjy/Zgsi8y082XAoaRXJkiOzxvBdqzX0Xe+SPziM/5EkYlZwZtMFCM= X-Gm-Message-State: AOJu0Yw/YT7M0bbQr+dDBMjEeFwVXPiXZG4WXcMwBon+O2T+EF+f/E6n 2xEpEE5k1/lFDa2QOes+Gy5zRGc0jZhoQxpfh9Q0039AFPH+F1J5ORGt3UHV6eg= X-Google-Smtp-Source: AGHT+IGKQh+gJQ4pTLxFA6qHayJeL+sbYxn/gWstLH+VvcbGEoEFUPrwBcFl4lJLBBOnP4SO7IQNng== X-Received: by 2002:a05:600c:5492:b0:427:b995:5bd0 with SMTP id 5b1f17b1804b1-42811dd08d3mr71927105e9.23.1722355389721; Tue, 30 Jul 2024 09:03:09 -0700 (PDT) Received: from orth.archaic.org.uk (orth.archaic.org.uk. [2001:8b0:1d0::2]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-36b3685810csm15001676f8f.71.2024.07.30.09.03.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Jul 2024 09:03:09 -0700 (PDT) From: Peter Maydell To: qemu-arm@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH 3/8] target/arm: Pass env pointer through to gvec_bfdot helper Date: Tue, 30 Jul 2024 17:03:01 +0100 Message-Id: <20240730160306.2959745-4-peter.maydell@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240730160306.2959745-1-peter.maydell@linaro.org> References: <20240730160306.2959745-1-peter.maydell@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2a00:1450:4864:20::32f; envelope-from=peter.maydell@linaro.org; helo=mail-wm1-x32f.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Pass the env pointer through to the gvec_bfdot helper, so we can use it to add support for FEAT_EBF16. Signed-off-by: Peter Maydell Reviewed-by: Richard Henderson --- target/arm/helper.h | 4 ++-- target/arm/tcg/translate-a64.c | 27 ++++++++++++++++++++++++- target/arm/tcg/translate-neon.c | 35 +++++++++++++++++++++++++++++++-- target/arm/tcg/translate-sve.c | 15 +++++++++++++- target/arm/tcg/vec_helper.c | 3 ++- 5 files changed, 77 insertions(+), 7 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index 970d059dec5..aece9fd4aa7 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -1027,8 +1027,8 @@ DEF_HELPER_FLAGS_5(gvec_ummla_b, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(gvec_usmmla_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG, - void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(gvec_bfdot, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index 148be2826ec..4aef8b9211a 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -735,6 +735,22 @@ static void gen_gvec_op4_ool(DisasContext *s, bool is_q, int rd, int rn, is_q ? 16 : 8, vec_full_reg_size(s), data, fn); } +/* + * Expand a 4-operand operation using an out-of-line helper that takes + * a pointer to the CPU env. + */ +static void gen_gvec_op4_env(DisasContext *s, bool is_q, int rd, int rn, + int rm, int ra, int data, + gen_helper_gvec_4_ptr *fn) +{ + tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + vec_full_reg_offset(s, ra), + tcg_env, + is_q ? 16 : 8, vec_full_reg_size(s), data, fn); +} + /* * Expand a 4-operand + fpstatus pointer + simd data value operation using * an out-of-line helper. @@ -5601,10 +5617,19 @@ static bool do_dot_vector(DisasContext *s, arg_qrrr_e *a, return true; } +static bool do_dot_vector_env(DisasContext *s, arg_qrrr_e *a, + gen_helper_gvec_4_ptr *fn) +{ + if (fp_access_check(s)) { + gen_gvec_op4_env(s, a->q, a->rd, a->rn, a->rm, a->rd, 0, fn); + } + return true; +} + TRANS_FEAT(SDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_sdot_b) TRANS_FEAT(UDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_udot_b) TRANS_FEAT(USDOT_v, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usdot_b) -TRANS_FEAT(BFDOT_v, aa64_bf16, do_dot_vector, a, gen_helper_gvec_bfdot) +TRANS_FEAT(BFDOT_v, aa64_bf16, do_dot_vector_env, a, gen_helper_gvec_bfdot) TRANS_FEAT(BFMMLA, aa64_bf16, do_dot_vector, a, gen_helper_gvec_bfmmla) TRANS_FEAT(SMMLA, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_smmla_b) TRANS_FEAT(UMMLA, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_ummla_b) diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c index 915c9e56db5..454380f01d7 100644 --- a/target/arm/tcg/translate-neon.c +++ b/target/arm/tcg/translate-neon.c @@ -148,6 +148,37 @@ static bool do_neon_ddda(DisasContext *s, int q, int vd, int vn, int vm, return true; } +static bool do_neon_ddda_env(DisasContext *s, int q, int vd, int vn, int vm, + int data, gen_helper_gvec_4_ptr *fn_gvec) +{ + /* UNDEF accesses to D16-D31 if they don't exist. */ + if (((vd | vn | vm) & 0x10) && !dc_isar_feature(aa32_simd_r32, s)) { + return false; + } + + /* + * UNDEF accesses to odd registers for each bit of Q. + * Q will be 0b111 for all Q-reg instructions, otherwise + * when we have mixed Q- and D-reg inputs. + */ + if (((vd & 1) * 4 | (vn & 1) * 2 | (vm & 1)) & q) { + return false; + } + + if (!vfp_access_check(s)) { + return true; + } + + int opr_sz = q ? 16 : 8; + tcg_gen_gvec_4_ptr(vfp_reg_offset(1, vd), + vfp_reg_offset(1, vn), + vfp_reg_offset(1, vm), + vfp_reg_offset(1, vd), + tcg_env, + opr_sz, opr_sz, data, fn_gvec); + return true; +} + static bool do_neon_ddda_fpst(DisasContext *s, int q, int vd, int vn, int vm, int data, ARMFPStatusFlavour fp_flavour, gen_helper_gvec_4_ptr *fn_gvec_ptr) @@ -266,8 +297,8 @@ static bool trans_VDOT_b16(DisasContext *s, arg_VDOT_b16 *a) if (!dc_isar_feature(aa32_bf16, s)) { return false; } - return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0, - gen_helper_gvec_bfdot); + return do_neon_ddda_env(s, a->q * 7, a->vd, a->vn, a->vm, 0, + gen_helper_gvec_bfdot); } static bool trans_VFML(DisasContext *s, arg_VFML *a) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index 798ab2bfb13..4fb0bd077b4 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -238,6 +238,19 @@ static bool gen_gvec_fpst_zzzz(DisasContext *s, gen_helper_gvec_4_ptr *fn, return ret; } +static bool gen_gvec_env_zzzz(DisasContext *s, gen_helper_gvec_4_ptr *fn, + int rd, int rn, int rm, int ra, + int data) +{ + return gen_gvec_ptr_zzzz(s, fn, rd, rn, rm, ra, data, tcg_env); +} + +static bool gen_gvec_env_arg_zzzz(DisasContext *s, gen_helper_gvec_4_ptr *fn, + arg_rrrr_esz *a, int data) +{ + return gen_gvec_env_zzzz(s, fn, a->rd, a->rn, a->rm, a->ra, data); +} + /* Invoke an out-of-line helper on 4 Zregs, 1 Preg, plus fpst. */ static bool gen_gvec_fpst_zzzzp(DisasContext *s, gen_helper_gvec_5_ptr *fn, int rd, int rn, int rm, int ra, int pg, @@ -7099,7 +7112,7 @@ TRANS_FEAT_NONSTREAMING(USMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, TRANS_FEAT_NONSTREAMING(UMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, gen_helper_gvec_ummla_b, a, 0) -TRANS_FEAT(BFDOT_zzzz, aa64_sve_bf16, gen_gvec_ool_arg_zzzz, +TRANS_FEAT(BFDOT_zzzz, aa64_sve_bf16, gen_gvec_env_arg_zzzz, gen_helper_gvec_bfdot, a, 0) TRANS_FEAT(BFDOT_zzxz, aa64_sve_bf16, gen_gvec_ool_arg_zzxz, gen_helper_gvec_bfdot_idx, a) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index 98604d170fd..37aad4be4b0 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -2814,7 +2814,8 @@ float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2) return t1; } -void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, uint32_t desc) +void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, + void *envp, uint32_t desc) { intptr_t i, opr_sz = simd_oprsz(desc); float32 *d = vd, *a = va; From patchwork Tue Jul 30 16:03:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Maydell X-Patchwork-Id: 1966617 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.a=rsa-sha256 header.s=google header.b=ejB9alhi; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WYKm61GBzz1ybV for ; Wed, 31 Jul 2024 02:04:06 +1000 (AEST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sYpJr-0003jT-Gb; Tue, 30 Jul 2024 12:03:23 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sYpJp-0003bk-Ji for qemu-devel@nongnu.org; Tue, 30 Jul 2024 12:03:21 -0400 Received: from mail-wm1-x329.google.com ([2a00:1450:4864:20::329]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sYpJn-0000HW-Oh for qemu-devel@nongnu.org; Tue, 30 Jul 2024 12:03:21 -0400 Received: by mail-wm1-x329.google.com with SMTP id 5b1f17b1804b1-428141be2ddso28658955e9.2 for ; Tue, 30 Jul 2024 09:03:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1722355398; x=1722960198; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=HZaZuKw6MIHzrJiZQtNTqpnwcvq+BC4Z/9tdatLyulU=; b=ejB9alhiGOaIMtWcnK3ZApZyZoIJH7BBbymuu7t7lPxTy/B8mw4OSdoNRa8SX5sXLi r2mOz2di7l1Gqwk69zYwJJ52RT+ckbd+la8h3FDXxK4s3I+/3rNN680pJbUzskxdNbj0 akzncUY8g186td7hGNEQoYkMy28pn+Cyvx86zFcLRA4cXZ1KeyKTng9PzJNzaGaACDgw RH4ZdEUHR8nWtOrzYgDZwVUCFg1eeVF/pE1b2sAlkqYz6Xvp6OXYGJHm7mM+dEXgq6oo +d2omKqJg9o1FI/PhCkG4AKtgMPu0ATuIRaaV3I72/jyQGoWV0HWmgCyY5SI9dxJCgJ5 lzhQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722355398; x=1722960198; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=HZaZuKw6MIHzrJiZQtNTqpnwcvq+BC4Z/9tdatLyulU=; b=cbWobvdA3og4JYUq41wpqU7Y6pVLWl1/Pn0vptO61Xnun4JhJzVdSRYtGAF9efugHe TmN4fg+u00kXpWGvJAvpcNDzKOzzLeopPtdVa6FNcc6zEdwLOqr8DXrreVeJdkVsCoZ7 UADcnSqOwlBxsg9sCnbwzSWMT8+r6fBu3vQ1KZP/3d2H/7d9wXZP5mExibJAO5do8+Gw Cw2Rw4cBF+/VimJ2ZUYkF4KSvoaf6MWzL0B1iBjv+uoW+8sEN6kqneOFG5loZgzvmPFQ OuB7Kje08AuuOCyyW6DFxFPqFE1Yc6KocGQxX3fxkusLMj6IMYZlU5P0GWGiWIX0xHRT mfog== X-Forwarded-Encrypted: i=1; AJvYcCULadMRyktjy4Ov54CS+8BDIPGdziulpTrgb2j+Hn4cthvRUzhMktU7EfYU2HHdn6T30g9ElxZosS4XztA2FY7tjTRu1/k= X-Gm-Message-State: AOJu0YyhkOyas2pHB7PlMXtNx21sS0ee21SfcA2y95sK5CUmvHLP2Ni8 weQ7qgtheX6ZDsDww+8wFARKuBdp6BE7dC2WZr//JlaMOu27k3M9f76TaxMCnDNLBvG0ai+26vm K X-Google-Smtp-Source: AGHT+IH3MNtcyOKIVE8kmGH8g/FwWt73Y62hrB/XF93if0TLcn2gIfKF/ZkughpzD9Py3xUZcw8UvQ== X-Received: by 2002:a5d:698b:0:b0:367:9088:fecd with SMTP id ffacd0b85a97d-36b5cee2e4emr7735591f8f.7.1722355390195; Tue, 30 Jul 2024 09:03:10 -0700 (PDT) Received: from orth.archaic.org.uk (orth.archaic.org.uk. [2001:8b0:1d0::2]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-36b3685810csm15001676f8f.71.2024.07.30.09.03.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Jul 2024 09:03:09 -0700 (PDT) From: Peter Maydell To: qemu-arm@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH 4/8] target/arm: Pass env pointer through to gvec_bfdot_idx helper Date: Tue, 30 Jul 2024 17:03:02 +0100 Message-Id: <20240730160306.2959745-5-peter.maydell@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240730160306.2959745-1-peter.maydell@linaro.org> References: <20240730160306.2959745-1-peter.maydell@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2a00:1450:4864:20::329; envelope-from=peter.maydell@linaro.org; helo=mail-wm1-x329.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Pass the env pointer through to the gvec_bfdot_idx helper, so we can use it to add support for FEAT_EBF16. Signed-off-by: Peter Maydell Reviewed-by: Richard Henderson --- target/arm/helper.h | 4 ++-- target/arm/tcg/translate-a64.c | 11 ++++++++++- target/arm/tcg/translate-neon.c | 4 ++-- target/arm/tcg/translate-sve.c | 8 +++++++- target/arm/tcg/vec_helper.c | 2 +- 5 files changed, 22 insertions(+), 7 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index aece9fd4aa7..386cf8686ea 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -1029,8 +1029,8 @@ DEF_HELPER_FLAGS_5(gvec_usmmla_b, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_6(gvec_bfdot, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG, - void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(gvec_bfdot_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index 4aef8b9211a..a4e9740c921 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -6403,13 +6403,22 @@ static bool do_dot_vector_idx(DisasContext *s, arg_qrrx_e *a, return true; } +static bool do_dot_vector_idx_env(DisasContext *s, arg_qrrx_e *a, + gen_helper_gvec_4_ptr *fn) +{ + if (fp_access_check(s)) { + gen_gvec_op4_env(s, a->q, a->rd, a->rn, a->rm, a->rd, a->idx, fn); + } + return true; +} + TRANS_FEAT(SDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_sdot_idx_b) TRANS_FEAT(UDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_udot_idx_b) TRANS_FEAT(SUDOT_vi, aa64_i8mm, do_dot_vector_idx, a, gen_helper_gvec_sudot_idx_b) TRANS_FEAT(USDOT_vi, aa64_i8mm, do_dot_vector_idx, a, gen_helper_gvec_usdot_idx_b) -TRANS_FEAT(BFDOT_vi, aa64_bf16, do_dot_vector_idx, a, +TRANS_FEAT(BFDOT_vi, aa64_bf16, do_dot_vector_idx_env, a, gen_helper_gvec_bfdot_idx) static bool trans_BFMLAL_vi(DisasContext *s, arg_qrrx_e *a) diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c index 454380f01d7..7de157c539c 100644 --- a/target/arm/tcg/translate-neon.c +++ b/target/arm/tcg/translate-neon.c @@ -391,8 +391,8 @@ static bool trans_VDOT_b16_scal(DisasContext *s, arg_VDOT_b16_scal *a) if (!dc_isar_feature(aa32_bf16, s)) { return false; } - return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index, - gen_helper_gvec_bfdot_idx); + return do_neon_ddda_env(s, a->q * 6, a->vd, a->vn, a->vm, a->index, + gen_helper_gvec_bfdot_idx); } static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index 4fb0bd077b4..8876d1f91a9 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -251,6 +251,12 @@ static bool gen_gvec_env_arg_zzzz(DisasContext *s, gen_helper_gvec_4_ptr *fn, return gen_gvec_env_zzzz(s, fn, a->rd, a->rn, a->rm, a->ra, data); } +static bool gen_gvec_env_arg_zzxz(DisasContext *s, gen_helper_gvec_4_ptr *fn, + arg_rrxr_esz *a) +{ + return gen_gvec_env_zzzz(s, fn, a->rd, a->rn, a->rm, a->ra, a->index); +} + /* Invoke an out-of-line helper on 4 Zregs, 1 Preg, plus fpst. */ static bool gen_gvec_fpst_zzzzp(DisasContext *s, gen_helper_gvec_5_ptr *fn, int rd, int rn, int rm, int ra, int pg, @@ -7114,7 +7120,7 @@ TRANS_FEAT_NONSTREAMING(UMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, TRANS_FEAT(BFDOT_zzzz, aa64_sve_bf16, gen_gvec_env_arg_zzzz, gen_helper_gvec_bfdot, a, 0) -TRANS_FEAT(BFDOT_zzxz, aa64_sve_bf16, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(BFDOT_zzxz, aa64_sve_bf16, gen_gvec_env_arg_zzxz, gen_helper_gvec_bfdot_idx, a) TRANS_FEAT_NONSTREAMING(BFMMLA, aa64_sve_bf16, gen_gvec_ool_arg_zzzz, diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index 37aad4be4b0..1edde9792f0 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -2828,7 +2828,7 @@ void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, } void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm, - void *va, uint32_t desc) + void *va, void *envp, uint32_t desc) { intptr_t i, j, opr_sz = simd_oprsz(desc); intptr_t index = simd_data(desc); From patchwork Tue Jul 30 16:03:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Maydell X-Patchwork-Id: 1966620 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.a=rsa-sha256 header.s=google header.b=XhRvqy8J; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WYKmw5n2Vz1ybV for ; Wed, 31 Jul 2024 02:04:48 +1000 (AEST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sYpJs-0003mM-8Z; Tue, 30 Jul 2024 12:03:24 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sYpJr-0003i2-6D for qemu-devel@nongnu.org; Tue, 30 Jul 2024 12:03:23 -0400 Received: from mail-wm1-x330.google.com ([2a00:1450:4864:20::330]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sYpJo-0000Hd-3y for qemu-devel@nongnu.org; Tue, 30 Jul 2024 12:03:22 -0400 Received: by mail-wm1-x330.google.com with SMTP id 5b1f17b1804b1-428035c0bb2so17930545e9.1 for ; Tue, 30 Jul 2024 09:03:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1722355398; x=1722960198; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=u/1r/oeOEIAX2gVU8SoWnDp6iCu5q8o/SWMbeXFkLdU=; b=XhRvqy8JslX+a0T971HQhfSvFmuFR9lctr81xCuSIMvOyvMduFb29uvBRWhMRxQdL1 AZtRSmum/sBOzOO/3LNNbOZ9ZpgDoNTEc7yLZviih7a5BJ01p+SfrufyxcrNDojsnDcJ T+WfWrzVeTYWuGbPJkgkC97b+bsZYgu5HYey7RkDxRcmc+kTlMp6PIwcb5sK72W11t5r fzSEIM+z1kWczybpV6MT423jjRCuR+cBExj2GaPa9rY0t9EhWDh5Uy8I1ZImzyfcX0Om gwxrX0mpuGrS8nXLzToFRJ7Y8nqSc3ScJtmYszWcMxPiWv8o13LTHW/CBzi0t3MfLraY g0+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722355398; x=1722960198; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=u/1r/oeOEIAX2gVU8SoWnDp6iCu5q8o/SWMbeXFkLdU=; b=AyDmFEdYlDukcGKw0n4Is7+9JmzXcdy3HHRE7A+YZPHHCENWhCJiWwNSK5kDziGVP1 X60uptgGOuSYgWSzCpPpRIVeEBTmB8l0ZCfLhK0sOkRIIvq+IwMfhYNwJdHaD5vYJBAq N+Vfkkt1oVHMsegDvXiXm7768BM8sZsMIQT4Tyd9TXnEaMnrexc8d1Gitzmy86b6bGqj zX4Sp2mBjBwLlyd9TL2qCTW0nRCd8Gzx77gnp99Kwg5OKMfMgBXjXF3kjqrx8vtAL9Zd LmDxedbrtTP0RnWepKasJZkZbDLSO9UeTrYQqJi/1g1LfAn2wcFGJjB5IXDz4VIao8rN zDHA== X-Forwarded-Encrypted: i=1; AJvYcCUnjA+ULUV0Wni8OmcDasS33PzwZGL6sFBEZoeQ2A4pAuwEtC1PQcy5rIRDJN4TSiM3k56wPSjsiO9oUV6g1/ZIghic4eQ= X-Gm-Message-State: AOJu0Yyf6CFeiDfxE9PElE90AjnCzTD0iLibud7Q6ks8AP3ipfzIuWqe rD+om8EePdCF+7mgBhgChSx10+9/L2AP3oCF8A8R/pouAkG2k3kv0RDsCTpdnPM= X-Google-Smtp-Source: AGHT+IGbDB84jhqyhx1wPLgBsFnr2qZy88dXU8Oz4wWTo4/y8BYNNlCQJwdAfmu5RfYWLFfj8AcdNQ== X-Received: by 2002:a05:6000:1a51:b0:362:2af4:43cc with SMTP id ffacd0b85a97d-36b8c8e6637mr1913162f8f.19.1722355398465; Tue, 30 Jul 2024 09:03:18 -0700 (PDT) Received: from orth.archaic.org.uk (orth.archaic.org.uk. [2001:8b0:1d0::2]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-36b3685810csm15001676f8f.71.2024.07.30.09.03.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Jul 2024 09:03:18 -0700 (PDT) From: Peter Maydell To: qemu-arm@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH 5/8] target/arm: Pass env pointer through to gvec_bfmmla helper Date: Tue, 30 Jul 2024 17:03:03 +0100 Message-Id: <20240730160306.2959745-6-peter.maydell@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240730160306.2959745-1-peter.maydell@linaro.org> References: <20240730160306.2959745-1-peter.maydell@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2a00:1450:4864:20::330; envelope-from=peter.maydell@linaro.org; helo=mail-wm1-x330.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Pass the env pointer through to the gvec_bfmmla helper, so we can use it to add support for FEAT_EBF16. Signed-off-by: Peter Maydell Reviewed-by: Richard Henderson --- target/arm/helper.h | 4 ++-- target/arm/tcg/translate-a64.c | 2 +- target/arm/tcg/translate-neon.c | 4 ++-- target/arm/tcg/translate-sve.c | 2 +- target/arm/tcg/vec_helper.c | 3 ++- 5 files changed, 8 insertions(+), 7 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index 386cf8686ea..93b830d2cce 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -1032,8 +1032,8 @@ DEF_HELPER_FLAGS_6(gvec_bfdot, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_6(gvec_bfdot_idx, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG, - void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(gvec_bfmmla, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index a4e9740c921..33d49f524f4 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -5630,7 +5630,7 @@ TRANS_FEAT(SDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_sdot_b) TRANS_FEAT(UDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_udot_b) TRANS_FEAT(USDOT_v, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usdot_b) TRANS_FEAT(BFDOT_v, aa64_bf16, do_dot_vector_env, a, gen_helper_gvec_bfdot) -TRANS_FEAT(BFMMLA, aa64_bf16, do_dot_vector, a, gen_helper_gvec_bfmmla) +TRANS_FEAT(BFMMLA, aa64_bf16, do_dot_vector_env, a, gen_helper_gvec_bfmmla) TRANS_FEAT(SMMLA, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_smmla_b) TRANS_FEAT(UMMLA, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_ummla_b) TRANS_FEAT(USMMLA, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usmmla_b) diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c index 7de157c539c..13cd31aad42 100644 --- a/target/arm/tcg/translate-neon.c +++ b/target/arm/tcg/translate-neon.c @@ -3730,8 +3730,8 @@ static bool trans_VMMLA_b16(DisasContext *s, arg_VMMLA_b16 *a) if (!dc_isar_feature(aa32_bf16, s)) { return false; } - return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0, - gen_helper_gvec_bfmmla); + return do_neon_ddda_env(s, 7, a->vd, a->vn, a->vm, 0, + gen_helper_gvec_bfmmla); } static bool trans_VFMA_b16(DisasContext *s, arg_VFMA_b16 *a) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index 8876d1f91a9..95e938662ed 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -7123,7 +7123,7 @@ TRANS_FEAT(BFDOT_zzzz, aa64_sve_bf16, gen_gvec_env_arg_zzzz, TRANS_FEAT(BFDOT_zzxz, aa64_sve_bf16, gen_gvec_env_arg_zzxz, gen_helper_gvec_bfdot_idx, a) -TRANS_FEAT_NONSTREAMING(BFMMLA, aa64_sve_bf16, gen_gvec_ool_arg_zzzz, +TRANS_FEAT_NONSTREAMING(BFMMLA, aa64_sve_bf16, gen_gvec_env_arg_zzzz, gen_helper_gvec_bfmmla, a, 0) static bool do_BFMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index 1edde9792f0..77efb5f47d8 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -2847,7 +2847,8 @@ void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm, clear_tail(d, opr_sz, simd_maxsz(desc)); } -void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, uint32_t desc) +void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, + void *envp, uint32_t desc) { intptr_t s, opr_sz = simd_oprsz(desc); float32 *d = vd, *a = va; From patchwork Tue Jul 30 16:03:04 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Maydell X-Patchwork-Id: 1966621 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.a=rsa-sha256 header.s=google header.b=P8mj6Asf; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WYKmy1mjGz1ybV for ; Wed, 31 Jul 2024 02:04:50 +1000 (AEST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sYpJu-00042E-Rx; Tue, 30 Jul 2024 12:03:26 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sYpJt-0003ta-5s for qemu-devel@nongnu.org; Tue, 30 Jul 2024 12:03:25 -0400 Received: from mail-wr1-x434.google.com ([2a00:1450:4864:20::434]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sYpJp-0000Hq-1G for qemu-devel@nongnu.org; Tue, 30 Jul 2024 12:03:24 -0400 Received: by mail-wr1-x434.google.com with SMTP id ffacd0b85a97d-369cb9f086aso2500687f8f.0 for ; Tue, 30 Jul 2024 09:03:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1722355399; x=1722960199; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=QvrVCRfMbrZEGuOW0qBR5GVSvdV1DpAOaaHu4y3CbnQ=; b=P8mj6AsfbcfFPnCOdFCEkJSptuWiJODlsjFF4wmL4k1XzeEecJ9+z0lZTSc/n/Ne8a pPlkBD5LGNSoCIY5dXcr8Su6xQMABiKFMMKB3V2EOhcLfn6TjUdCmSAwukGh/nrnqXca 3x3aPFZ/AT9kC/rE59OqMXPluPcYzwBO6xM5SoA4eug2TV+s9lGXLROHAD48yb4L+OCX UfeXWSMW5ryzifFDM29ELrJKQyuvqojmyZvlRnkJnXePB6MDgevNGdBXkqyb/Y7JgLG1 SI+FZJEcVBk08WbjUxUXNNWGJiM0qqDNqma37RZErAiC5CbTYKK0+E1oGYCdvLKsauI2 j8Bw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722355399; x=1722960199; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=QvrVCRfMbrZEGuOW0qBR5GVSvdV1DpAOaaHu4y3CbnQ=; b=BvvIzNqyW8AzKZjyAbrvzfXfb/aDmzgPPWG4NoNqy3UxjZ8+84pXOb1k+mFa21XXc8 CeiKBHqP1nx7U1xsUTV8vBMkDYJ56mWjbtTR8137dMp31gfjdcX1pXztbPFFVGcZTtZE h7CiOpqhCj45FSHRfc/3wx8G2frFMhl+ndJoXS41fwaCSclX9k2aJr+y5sa/q/vzRHov 0rXz9wy+/aITH+PZJNQIBnjE/tQTJzYO7yVKC/Lvby+1M04ioCU0gdDCLBSi/Qm2ttEh 292Y0q0LGYunjz6hPAco9hH7k8MUpo2hOm+obKbTdGUaEKCTyId+iEJBu6sDFkrKDPSX RLgw== X-Forwarded-Encrypted: i=1; AJvYcCVollNT/f45JIhyKo4pOJWszkqbTy4EQXKuUWi6NORvuzi05pphR0Ruu8ESO/3Wd9Acq4ip9lKcS3LXgbDt4NyDq3Hz3+c= X-Gm-Message-State: AOJu0YzHYLPlw181cyeQsVFs/XQb1CyxvNv25y5UX5uS4VbDbIUBoShu eGB8AptEBDjJNvR6COO972Y2w4d1vBIFxey21HXTXkjVdEAKtYt/wevusaojPFs= X-Google-Smtp-Source: AGHT+IFATCG8GZGJDx84yDEt+a6U8QpXMwCDxSwhlh6mdDK32eVghfZVEYIbvR0rW6qNoANzj5olIw== X-Received: by 2002:a05:6000:110a:b0:367:98e6:362b with SMTP id ffacd0b85a97d-36b5d353ad0mr6460218f8f.42.1722355398982; Tue, 30 Jul 2024 09:03:18 -0700 (PDT) Received: from orth.archaic.org.uk (orth.archaic.org.uk. [2001:8b0:1d0::2]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-36b3685810csm15001676f8f.71.2024.07.30.09.03.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Jul 2024 09:03:18 -0700 (PDT) From: Peter Maydell To: qemu-arm@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH 6/8] target/arm: Prepare bfdotadd() callers for FEAT_EBF support Date: Tue, 30 Jul 2024 17:03:04 +0100 Message-Id: <20240730160306.2959745-7-peter.maydell@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240730160306.2959745-1-peter.maydell@linaro.org> References: <20240730160306.2959745-1-peter.maydell@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2a00:1450:4864:20::434; envelope-from=peter.maydell@linaro.org; helo=mail-wr1-x434.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org We use bfdotadd() in four callsites for various helper functions. Currently this all assumes that we have the FPCR.EBF=0 semantics. For FPCR.EBF=1 we will need to: * call a different routine to bfdotadd() because we need to do a fused multiply-add rather than separate multiply and add steps * use a different float_status that honours the FPCR rounding mode and denormal-flushing fields * pass in an extra float_status that has been set up to perform round-to-odd rounding To prepare for this, refactor all the callsites so that instead of for (...) { x = bfdotadd(...); } they are: float_status fpst, fpst_odd; if (is_ebf(env, &fpst, &fpst_odd)) { for (...) { x = bfdotadd_ebf(..., &fpst, &fpst_odd); } } else { for (...) { x = bfdotadd(..., &fpst); } } For the moment the is_ebf() function always returns false, sets up fpst for EBF=0 semantics and never sets up fpst_odd; bfdotadd_ebf() will assert if called. We'll fill in the handling for EBF=1 in the next commit. This change should be a zero-behaviour-change refactor. Signed-off-by: Peter Maydell Reviewed-by: Richard Henderson --- target/arm/tcg/vec_internal.h | 37 ++++++++- target/arm/tcg/sme_helper.c | 74 ++++++++++++------ target/arm/tcg/vec_helper.c | 141 +++++++++++++++++++++++++--------- 3 files changed, 192 insertions(+), 60 deletions(-) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index 3ca1b94ccf9..094f5c169ca 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -223,13 +223,46 @@ int64_t do_sqrdmlah_d(int64_t, int64_t, int64_t, bool, bool); * bfdotadd: * @sum: addend * @e1, @e2: multiplicand vectors + * @fpst: floating-point status to use * * BFloat16 2-way dot product of @e1 & @e2, accumulating with @sum. * The @e1 and @e2 operands correspond to the 32-bit source vector * slots and contain two Bfloat16 values each. * - * Corresponds to the ARM pseudocode function BFDotAdd. + * Corresponds to the ARM pseudocode function BFDotAdd, specialized + * for the FPCR.EBF == 0 case. */ -float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2); +float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2, float_status *fpst); +/** + * bfdotadd_ebf: + * @sum: addend + * @e1, @e2: multiplicand vectors + * @fpst: floating-point status to use + * @fpst_odd: floating-point status to use for round-to-odd operations + * + * BFloat16 2-way dot product of @e1 & @e2, accumulating with @sum. + * The @e1 and @e2 operands correspond to the 32-bit source vector + * slots and contain two Bfloat16 values each. + * + * Corresponds to the ARM pseudocode function BFDotAdd, specialized + * for the FPCR.EBF == 1 case. + */ +float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2, + float_status *fpst, float_status *fpst_odd); + +/** + * is_ebf: + * @env: CPU state + * @statusp: pointer to floating point status to fill in + * @oddstatusp: pointer to floating point status to fill in for round-to-odd + * + * Determine whether a BFDotAdd operation should use FPCR.EBF = 0 + * or FPCR.EBF = 1 semantics. On return, has initialized *statusp + * and *oddstatusp to suitable float_status arguments to use with either + * bfdotadd() or bfdotadd_ebf(). + * Returns true for EBF = 1, false for EBF = 0. (The caller should use this + * to decide whether to call bfdotadd() or bfdotadd_ebf().) + */ +bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp); #endif /* TARGET_ARM_VEC_INTERNAL_H */ diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index f172225b2f2..e3fbfa98fa5 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -1086,32 +1086,62 @@ void HELPER(sme_bfmopa)(CPUARMState *env, void *vza, void *vzn, void *vzm, intptr_t row, col, oprsz = simd_maxsz(desc); uint32_t neg = simd_data(desc) * 0x80008000u; uint16_t *pn = vpn, *pm = vpm; + float_status fpst, fpst_odd; - for (row = 0; row < oprsz; ) { - uint16_t prow = pn[H2(row >> 4)]; - do { - void *vza_row = vza + tile_vslice_offset(row); - uint32_t n = *(uint32_t *)(vzn + H1_4(row)); + if (is_ebf(env, &fpst, &fpst_odd)) { + for (row = 0; row < oprsz; ) { + uint16_t prow = pn[H2(row >> 4)]; + do { + void *vza_row = vza + tile_vslice_offset(row); + uint32_t n = *(uint32_t *)(vzn + H1_4(row)); - n = f16mop_adj_pair(n, prow, neg); + n = f16mop_adj_pair(n, prow, neg); - for (col = 0; col < oprsz; ) { - uint16_t pcol = pm[H2(col >> 4)]; - do { - if (prow & pcol & 0b0101) { - uint32_t *a = vza_row + H1_4(col); - uint32_t m = *(uint32_t *)(vzm + H1_4(col)); + for (col = 0; col < oprsz; ) { + uint16_t pcol = pm[H2(col >> 4)]; + do { + if (prow & pcol & 0b0101) { + uint32_t *a = vza_row + H1_4(col); + uint32_t m = *(uint32_t *)(vzm + H1_4(col)); - m = f16mop_adj_pair(m, pcol, 0); - *a = bfdotadd(*a, n, m); - } - col += 4; - pcol >>= 4; - } while (col & 15); - } - row += 4; - prow >>= 4; - } while (row & 15); + m = f16mop_adj_pair(m, pcol, 0); + *a = bfdotadd_ebf(*a, n, m, &fpst, &fpst_odd); + } + col += 4; + pcol >>= 4; + } while (col & 15); + } + row += 4; + prow >>= 4; + } while (row & 15); + } + } else { + for (row = 0; row < oprsz; ) { + uint16_t prow = pn[H2(row >> 4)]; + do { + void *vza_row = vza + tile_vslice_offset(row); + uint32_t n = *(uint32_t *)(vzn + H1_4(row)); + + n = f16mop_adj_pair(n, prow, neg); + + for (col = 0; col < oprsz; ) { + uint16_t pcol = pm[H2(col >> 4)]; + do { + if (prow & pcol & 0b0101) { + uint32_t *a = vza_row + H1_4(col); + uint32_t m = *(uint32_t *)(vzm + H1_4(col)); + + m = f16mop_adj_pair(m, pcol, 0); + *a = bfdotadd(*a, n, m, &fpst); + } + col += 4; + pcol >>= 4; + } while (col & 15); + } + row += 4; + prow >>= 4; + } while (row & 15); + } } } diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index 77efb5f47d8..baf04a0561b 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -2790,7 +2790,7 @@ DO_MMLA_B(gvec_usmmla_b, do_usmmla_b) * BFloat16 Dot Product */ -float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2) +bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp) { /* FPCR is ignored for BFDOT and BFMMLA. */ float_status bf_status = { @@ -2800,29 +2800,50 @@ float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2) .flush_inputs_to_zero = true, .default_nan_mode = true, }; + + *statusp = bf_status; + return false; +} + +float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2, float_status *fpst) +{ float32 t1, t2; /* * Extract each BFloat16 from the element pair, and shift * them such that they become float32. */ - t1 = float32_mul(e1 << 16, e2 << 16, &bf_status); - t2 = float32_mul(e1 & 0xffff0000u, e2 & 0xffff0000u, &bf_status); - t1 = float32_add(t1, t2, &bf_status); - t1 = float32_add(sum, t1, &bf_status); + t1 = float32_mul(e1 << 16, e2 << 16, fpst); + t2 = float32_mul(e1 & 0xffff0000u, e2 & 0xffff0000u, fpst); + t1 = float32_add(t1, t2, fpst); + t1 = float32_add(sum, t1, fpst); return t1; } +float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2, + float_status *fpst, float_status *fpst_odd) +{ + g_assert_not_reached(); +} + void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, void *envp, uint32_t desc) { + CPUARMState *env = envp; intptr_t i, opr_sz = simd_oprsz(desc); float32 *d = vd, *a = va; uint32_t *n = vn, *m = vm; + float_status fpst, fpst_odd; - for (i = 0; i < opr_sz / 4; ++i) { - d[i] = bfdotadd(a[i], n[i], m[i]); + if (is_ebf(env, &fpst, &fpst_odd)) { + for (i = 0; i < opr_sz / 4; ++i) { + d[i] = bfdotadd_ebf(a[i], n[i], m[i], &fpst, &fpst_odd); + } + } else { + for (i = 0; i < opr_sz / 4; ++i) { + d[i] = bfdotadd(a[i], n[i], m[i], &fpst); + } } clear_tail(d, opr_sz, simd_maxsz(desc)); } @@ -2830,18 +2851,30 @@ void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm, void *va, void *envp, uint32_t desc) { + CPUARMState *env = envp; intptr_t i, j, opr_sz = simd_oprsz(desc); intptr_t index = simd_data(desc); intptr_t elements = opr_sz / 4; intptr_t eltspersegment = MIN(16 / 4, elements); float32 *d = vd, *a = va; uint32_t *n = vn, *m = vm; + float_status fpst, fpst_odd; - for (i = 0; i < elements; i += eltspersegment) { - uint32_t m_idx = m[i + H4(index)]; + if (is_ebf(env, &fpst, &fpst_odd)) { + for (i = 0; i < elements; i += eltspersegment) { + uint32_t m_idx = m[i + H4(index)]; - for (j = i; j < i + eltspersegment; j++) { - d[j] = bfdotadd(a[j], n[j], m_idx); + for (j = i; j < i + eltspersegment; j++) { + d[j] = bfdotadd_ebf(a[j], n[j], m_idx, &fpst, &fpst_odd); + } + } + } else { + for (i = 0; i < elements; i += eltspersegment) { + uint32_t m_idx = m[i + H4(index)]; + + for (j = i; j < i + eltspersegment; j++) { + d[j] = bfdotadd(a[j], n[j], m_idx, &fpst); + } } } clear_tail(d, opr_sz, simd_maxsz(desc)); @@ -2850,40 +2883,76 @@ void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm, void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, void *envp, uint32_t desc) { + CPUARMState *env = envp; intptr_t s, opr_sz = simd_oprsz(desc); float32 *d = vd, *a = va; uint32_t *n = vn, *m = vm; + float_status fpst, fpst_odd; - for (s = 0; s < opr_sz / 4; s += 4) { - float32 sum00, sum01, sum10, sum11; + if (is_ebf(env, &fpst, &fpst_odd)) { + for (s = 0; s < opr_sz / 4; s += 4) { + float32 sum00, sum01, sum10, sum11; - /* - * Process the entire segment at once, writing back the - * results only after we've consumed all of the inputs. - * - * Key to indices by column: - * i j i k j k - */ - sum00 = a[s + H4(0 + 0)]; - sum00 = bfdotadd(sum00, n[s + H4(0 + 0)], m[s + H4(0 + 0)]); - sum00 = bfdotadd(sum00, n[s + H4(0 + 1)], m[s + H4(0 + 1)]); + /* + * Process the entire segment at once, writing back the + * results only after we've consumed all of the inputs. + * + * Key to indices by column: + * i j i k j k + */ + sum00 = a[s + H4(0 + 0)]; + sum00 = bfdotadd_ebf(sum00, n[s + H4(0 + 0)], m[s + H4(0 + 0)], &fpst, &fpst_odd); + sum00 = bfdotadd_ebf(sum00, n[s + H4(0 + 1)], m[s + H4(0 + 1)], &fpst, &fpst_odd); - sum01 = a[s + H4(0 + 1)]; - sum01 = bfdotadd(sum01, n[s + H4(0 + 0)], m[s + H4(2 + 0)]); - sum01 = bfdotadd(sum01, n[s + H4(0 + 1)], m[s + H4(2 + 1)]); + sum01 = a[s + H4(0 + 1)]; + sum01 = bfdotadd_ebf(sum01, n[s + H4(0 + 0)], m[s + H4(2 + 0)], &fpst, &fpst_odd); + sum01 = bfdotadd_ebf(sum01, n[s + H4(0 + 1)], m[s + H4(2 + 1)], &fpst, &fpst_odd); - sum10 = a[s + H4(2 + 0)]; - sum10 = bfdotadd(sum10, n[s + H4(2 + 0)], m[s + H4(0 + 0)]); - sum10 = bfdotadd(sum10, n[s + H4(2 + 1)], m[s + H4(0 + 1)]); + sum10 = a[s + H4(2 + 0)]; + sum10 = bfdotadd_ebf(sum10, n[s + H4(2 + 0)], m[s + H4(0 + 0)], &fpst, &fpst_odd); + sum10 = bfdotadd_ebf(sum10, n[s + H4(2 + 1)], m[s + H4(0 + 1)], &fpst, &fpst_odd); - sum11 = a[s + H4(2 + 1)]; - sum11 = bfdotadd(sum11, n[s + H4(2 + 0)], m[s + H4(2 + 0)]); - sum11 = bfdotadd(sum11, n[s + H4(2 + 1)], m[s + H4(2 + 1)]); + sum11 = a[s + H4(2 + 1)]; + sum11 = bfdotadd_ebf(sum11, n[s + H4(2 + 0)], m[s + H4(2 + 0)], &fpst, &fpst_odd); + sum11 = bfdotadd_ebf(sum11, n[s + H4(2 + 1)], m[s + H4(2 + 1)], &fpst, &fpst_odd); - d[s + H4(0 + 0)] = sum00; - d[s + H4(0 + 1)] = sum01; - d[s + H4(2 + 0)] = sum10; - d[s + H4(2 + 1)] = sum11; + d[s + H4(0 + 0)] = sum00; + d[s + H4(0 + 1)] = sum01; + d[s + H4(2 + 0)] = sum10; + d[s + H4(2 + 1)] = sum11; + } + } else { + for (s = 0; s < opr_sz / 4; s += 4) { + float32 sum00, sum01, sum10, sum11; + + /* + * Process the entire segment at once, writing back the + * results only after we've consumed all of the inputs. + * + * Key to indices by column: + * i j i k j k + */ + sum00 = a[s + H4(0 + 0)]; + sum00 = bfdotadd(sum00, n[s + H4(0 + 0)], m[s + H4(0 + 0)], &fpst); + sum00 = bfdotadd(sum00, n[s + H4(0 + 1)], m[s + H4(0 + 1)], &fpst); + + sum01 = a[s + H4(0 + 1)]; + sum01 = bfdotadd(sum01, n[s + H4(0 + 0)], m[s + H4(2 + 0)], &fpst); + sum01 = bfdotadd(sum01, n[s + H4(0 + 1)], m[s + H4(2 + 1)], &fpst); + + sum10 = a[s + H4(2 + 0)]; + sum10 = bfdotadd(sum10, n[s + H4(2 + 0)], m[s + H4(0 + 0)], &fpst); + sum10 = bfdotadd(sum10, n[s + H4(2 + 1)], m[s + H4(0 + 1)], &fpst); + + sum11 = a[s + H4(2 + 1)]; + sum11 = bfdotadd(sum11, n[s + H4(2 + 0)], m[s + H4(2 + 0)], &fpst); + sum11 = bfdotadd(sum11, n[s + H4(2 + 1)], m[s + H4(2 + 1)], &fpst); + + d[s + H4(0 + 0)] = sum00; + d[s + H4(0 + 1)] = sum01; + d[s + H4(2 + 0)] = sum10; + d[s + H4(2 + 1)] = sum11; + } } clear_tail(d, opr_sz, simd_maxsz(desc)); } From patchwork Tue Jul 30 16:03:05 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Maydell X-Patchwork-Id: 1966622 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.a=rsa-sha256 header.s=google header.b=eITtdWou; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WYKn12SkPz1ybV for ; Wed, 31 Jul 2024 02:04:53 +1000 (AEST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sYpJu-00040M-Bv; Tue, 30 Jul 2024 12:03:26 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sYpJs-0003pv-OS for qemu-devel@nongnu.org; Tue, 30 Jul 2024 12:03:24 -0400 Received: from mail-wm1-x32d.google.com ([2a00:1450:4864:20::32d]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sYpJp-0000I1-1O for qemu-devel@nongnu.org; Tue, 30 Jul 2024 12:03:24 -0400 Received: by mail-wm1-x32d.google.com with SMTP id 5b1f17b1804b1-428163f7635so27352085e9.2 for ; Tue, 30 Jul 2024 09:03:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1722355399; x=1722960199; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=iDXGs+7guPqwtDoRL/y+LlQlY9IQZ4XWQ35haRtARdg=; b=eITtdWourb0qB8xke3jBw6nplm/fK6GREgpsvwHe6njMIgwxpTUgSDsRJ8EoPUcth4 u2vL/Kxm5e57ShgOxBqlVZ9LS8VG1Pf5Pq7fQolmxTAZqQpTj0+UksUWuA2ibHnwB/Kr iXRH/NojnLFaketdiclGYtKFFa9q8y9Yvo6KIVDqV3ygI7WKu2J5N41Yhroq9COpyKHV GM/lHSuhYJNEFhftYQXNXJVhZJTFKda/+gSyLwK30NpeSrTJH59IZYKy3Cbw6xL5iRby vPc+oWPSMp1LmQ2uGqsMoq1rDqrkSXjCKSTXoXGKSqSzy8nalCheyWpRSkrYUjXecZnP Dx5A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722355399; x=1722960199; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=iDXGs+7guPqwtDoRL/y+LlQlY9IQZ4XWQ35haRtARdg=; b=wQfSv49EVE5RCQE7AzJIl/rrbLZU+0hen6GcN5e533RWyOv15Ge8SQtkedT+9JdYMp 0xB+oFnfZo/mMKKowfA8ftgp0Jp6UGWXo9jJ24+liNbhE4L0I+dcYMyxX1hDotmpcrkR dDSYzSCMdBnePvamhSHgCrcfoJJHR7CxrGtZE5Ifsww2sR2Dw8SN8NRxXxYThxTuF0TM npO+1Nrxr/KiHdD2ekK6HoARmZK4JrjDZo7tgakpE07Ka52ONkQDbC/TAX94mV+3zhpR hZGzMM7TzPrQlxJE8fOnOhkKnhqi3pPi+N1xKu52kz5pSYd8MzhmbwnuydOwCcBFmMQI hDbQ== X-Forwarded-Encrypted: i=1; AJvYcCWOIsp40flcWDeNmpQSZZH1kAHN8uz5HxogUecsZysFWA+8FoffEayXVnDIV1Hg2ugbIFIlEwWu3Q4GXNg10gqxUVgDe/M= X-Gm-Message-State: AOJu0YzbuXiqIfYrbRf7Qfk08VY2ZomPSuWEmioL73ZtdYBds/LrxGLV jmuEZNQe0UefnoaF9mgd4kVh/bt71mW9zY1SbNZclpDjm4Tv6jQZob1KzONY6mckdcGFXOTiohb + X-Google-Smtp-Source: AGHT+IGZWc2oIOPACORoSQfToOSj5K/k3XRvcITYBo4TlAvSSsmwL/k+tOD6VBheekPGZDgSOtPapw== X-Received: by 2002:a05:600c:468e:b0:426:66a2:b200 with SMTP id 5b1f17b1804b1-42811a8f351mr84164095e9.0.1722355399506; Tue, 30 Jul 2024 09:03:19 -0700 (PDT) Received: from orth.archaic.org.uk (orth.archaic.org.uk. [2001:8b0:1d0::2]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-36b3685810csm15001676f8f.71.2024.07.30.09.03.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Jul 2024 09:03:19 -0700 (PDT) From: Peter Maydell To: qemu-arm@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH 7/8] target/arm: Implement FPCR.EBF=1 semantics for bfdotadd() Date: Tue, 30 Jul 2024 17:03:05 +0100 Message-Id: <20240730160306.2959745-8-peter.maydell@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240730160306.2959745-1-peter.maydell@linaro.org> References: <20240730160306.2959745-1-peter.maydell@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2a00:1450:4864:20::32d; envelope-from=peter.maydell@linaro.org; helo=mail-wm1-x32d.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Implement the FPCR.EBF=1 semantics for bfdotadd() operations: * is_ebf() sets up fpst and fpst_odd * bfdotadd_ebf() implements the fused paired-multiply-and-add operation that we need The paired-multiply-and-add is similar to f16_dotadd() and we use the same trick here as in that function, but the inputs here are bfloat16 rather than float16. Signed-off-by: Peter Maydell Reviewed-by: Richard Henderson --- target/arm/tcg/vec_helper.c | 57 +++++++++++++++++++++++++++++++++++-- 1 file changed, 54 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index baf04a0561b..64076c1c595 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -2792,7 +2792,20 @@ DO_MMLA_B(gvec_usmmla_b, do_usmmla_b) bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp) { - /* FPCR is ignored for BFDOT and BFMMLA. */ + /* + * For BFDOT, BFMMLA, etc, the behaviour depends on FPCR.EBF. + * For EBF = 0, we ignore the FPCR bits which determine rounding + * mode and denormal-flushing, and we do unfused multiplies and + * additions with intermediate rounding of all products and sums. + * For EBF = 1, we honour FPCR rounding mode and denormal-flushing bits, + * and we perform a fused two-way sum-of-products without intermediate + * rounding of the products. + * In either case, we don't set fp exception flags. + * + * EBF is AArch64 only, so even if it's set in the FPCR it has + * no effect on AArch32 instructions. + */ + bool ebf = is_a64(env) && env->vfp.fpcr & FPCR_EBF; float_status bf_status = { .tininess_before_rounding = float_tininess_before_rounding, .float_rounding_mode = float_round_to_odd_inf, @@ -2801,8 +2814,19 @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp) .default_nan_mode = true, }; + if (ebf) { + float_status *fpst = &env->vfp.fp_status; + set_flush_to_zero(get_flush_to_zero(fpst), &bf_status); + set_flush_inputs_to_zero(get_flush_inputs_to_zero(fpst), &bf_status); + set_float_rounding_mode(get_float_rounding_mode(fpst), &bf_status); + + /* EBF=1 needs to do a step with round-to-odd semantics */ + *oddstatusp = bf_status; + set_float_rounding_mode(float_round_to_odd, oddstatusp); + } + *statusp = bf_status; - return false; + return ebf; } float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2, float_status *fpst) @@ -2824,7 +2848,34 @@ float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2, float_status *fpst) float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2, float_status *fpst, float_status *fpst_odd) { - g_assert_not_reached(); + /* + * Compare f16_dotadd() in sme_helper.c, but here we have + * bfloat16 inputs. In particular that means that we do not + * want the FPCR.FZ16 flush semantics, so we use the normal + * float_status for the input handling here. + */ + float64 e1r = float32_to_float64(e1 << 16, fpst); + float64 e1c = float32_to_float64(e1 & 0xffff0000u, fpst); + float64 e2r = float32_to_float64(e2 << 16, fpst); + float64 e2c = float32_to_float64(e2 & 0xffff0000u, fpst); + float64 t64; + float32 t32; + + /* + * The ARM pseudocode function FPDot performs both multiplies + * and the add with a single rounding operation. Emulate this + * by performing the first multiply in round-to-odd, then doing + * the second multiply as fused multiply-add, and rounding to + * float32 all in one step. + */ + t64 = float64_mul(e1r, e2r, fpst_odd); + t64 = float64r32_muladd(e1c, e2c, t64, 0, fpst); + + /* This conversion is exact, because we've already rounded. */ + t32 = float64_to_float32(t64, fpst); + + /* The final accumulation step is not fused. */ + return float32_add(sum, t32, fpst); } void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, From patchwork Tue Jul 30 16:03:06 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Maydell X-Patchwork-Id: 1966623 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.a=rsa-sha256 header.s=google header.b=tBoJxOSj; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WYKn40ZFtz1ybV for ; Wed, 31 Jul 2024 02:04:56 +1000 (AEST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sYpJu-00040S-C1; Tue, 30 Jul 2024 12:03:26 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sYpJt-0003un-Ar for qemu-devel@nongnu.org; Tue, 30 Jul 2024 12:03:25 -0400 Received: from mail-wm1-x32a.google.com ([2a00:1450:4864:20::32a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sYpJp-0000IB-Ni for qemu-devel@nongnu.org; Tue, 30 Jul 2024 12:03:24 -0400 Received: by mail-wm1-x32a.google.com with SMTP id 5b1f17b1804b1-428141be2ddso28659385e9.2 for ; Tue, 30 Jul 2024 09:03:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1722355400; x=1722960200; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=YvH6WF9jz8JWZCJ6bAYBpoQtH4MHX55fT9yLfRYRxQ0=; b=tBoJxOSjlchtr6RiHaQrtYT5kCfBX190aoHaeTmTRf0jMySmOIBHiqZHXzaV93CYvc O0UpFSuVCaEQ4ZV3fLFDNx2Hmx1a+byyrUSSdqzuCUS8uIoxIDU2utEAAq7934qWDY24 Uyi93+m4m4qhffoAvmjZWw96EFest6GuYFlSYVzfWxRfMHZXQ/D+zkCYSoFx6tSmEVyx VNuhnglhdlp6cBKjE8iwW36EL7K1WPs+/6XowHXr92MaxKtg8YEpCsQbQhnzVvS2P93e 99W4FQIKU8Yxf9ozBoy1ml7eKOIF37jImK0jWhD2eE+mVioKsRZH7xa4l1eYvBbmrxtA gjBw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722355400; x=1722960200; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=YvH6WF9jz8JWZCJ6bAYBpoQtH4MHX55fT9yLfRYRxQ0=; b=tqpfSIc8bjREn+i/hA82PHNcrlfAwr4G1spfz3wxMcCeR3XUS5WoAjdeFqccN4+WkJ yfgPNzwlvcMyXURl6HPQj2btKTphSayTziq9LkA75nwBX4V3witBipH6PokvlRMA50Tx FnlOiF5j31N7h4OnJKO+c1azmzBLBCWYU4G8phe1CG//a+aGdoShs999JV/a53iKG/6d ycrq7fYA6NaWBhxNLgQWToZrxB0fkLpayW00D9KgSjLddrptyVRNa1kAJHfgNSgS4KBZ ogBsZgZSKluUDPKGJX+6VJEt78rkxd3vIKNiQ+1958NfZeIOwUwOmXG2eTNPE22DE8/K Ho2w== X-Forwarded-Encrypted: i=1; AJvYcCVWpCN8wBkAahk7a0hdHSpYIcCIQeX6znOAgL6bbFJyWohYox9ic/nEDPRyQ7lKfx/Yt8s2MiXwXl7ULkgso0e7/0qOQUU= X-Gm-Message-State: AOJu0Yyh+DoSI9lNR14ZCmJq3W0Ew1KFvfORg41FP39voJgeT0McoXkd cUmR852bvxwKkGMMrtJNaHJfIAENXkUEZ6tObY6PXVAJ1LYUzjL7Z+prPe32oyM= X-Google-Smtp-Source: AGHT+IGYhz9kqyaSvfR2zovEJceiSkgICxZwUH2z03g45TT6Hmt+XK3vQS9Sl/hXCjx2oD+RzdYXrA== X-Received: by 2002:a5d:45cb:0:b0:368:4634:c419 with SMTP id ffacd0b85a97d-36b5d09e40emr6824079f8f.58.1722355399934; Tue, 30 Jul 2024 09:03:19 -0700 (PDT) Received: from orth.archaic.org.uk (orth.archaic.org.uk. [2001:8b0:1d0::2]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-36b3685810csm15001676f8f.71.2024.07.30.09.03.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Jul 2024 09:03:19 -0700 (PDT) From: Peter Maydell To: qemu-arm@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH 8/8] target/arm: Enable FEAT_EBF16 in the "max" CPU Date: Tue, 30 Jul 2024 17:03:06 +0100 Message-Id: <20240730160306.2959745-9-peter.maydell@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240730160306.2959745-1-peter.maydell@linaro.org> References: <20240730160306.2959745-1-peter.maydell@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2a00:1450:4864:20::32a; envelope-from=peter.maydell@linaro.org; helo=mail-wm1-x32a.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Now that we've implemented the required behaviour for FEAT_EBF16, we can enable it for the "max" CPU type, list it in our documentation, and delete a TODO comment about it being missing. Signed-off-by: Peter Maydell Reviewed-by: Richard Henderson --- docs/system/arm/emulation.rst | 1 + target/arm/tcg/cpu64.c | 4 ++-- target/arm/tcg/translate-sme.c | 1 - 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst index 3ab6e726679..35f52a54b1c 100644 --- a/docs/system/arm/emulation.rst +++ b/docs/system/arm/emulation.rst @@ -45,6 +45,7 @@ the following architecture extensions: - FEAT_DotProd (Advanced SIMD dot product instructions) - FEAT_DoubleFault (Double Fault Extension) - FEAT_E0PD (Preventing EL0 access to halves of address maps) +- FEAT_EBF16 (AArch64 Extended BFloat16 instructions) - FEAT_ECV (Enhanced Counter Virtualization) - FEAT_EL0 (Support for execution at EL0) - FEAT_EL1 (Support for execution at EL1) diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c index fe232eb3069..79258a7c928 100644 --- a/target/arm/tcg/cpu64.c +++ b/target/arm/tcg/cpu64.c @@ -1160,7 +1160,7 @@ void aarch64_max_tcg_initfn(Object *obj) t = FIELD_DP64(t, ID_AA64ISAR1, FRINTTS, 1); /* FEAT_FRINTTS */ t = FIELD_DP64(t, ID_AA64ISAR1, SB, 1); /* FEAT_SB */ t = FIELD_DP64(t, ID_AA64ISAR1, SPECRES, 1); /* FEAT_SPECRES */ - t = FIELD_DP64(t, ID_AA64ISAR1, BF16, 1); /* FEAT_BF16 */ + t = FIELD_DP64(t, ID_AA64ISAR1, BF16, 2); /* FEAT_BF16, FEAT_EBF16 */ t = FIELD_DP64(t, ID_AA64ISAR1, DGH, 1); /* FEAT_DGH */ t = FIELD_DP64(t, ID_AA64ISAR1, I8MM, 1); /* FEAT_I8MM */ cpu->isar.id_aa64isar1 = t; @@ -1244,7 +1244,7 @@ void aarch64_max_tcg_initfn(Object *obj) t = FIELD_DP64(t, ID_AA64ZFR0, SVEVER, 1); t = FIELD_DP64(t, ID_AA64ZFR0, AES, 2); /* FEAT_SVE_PMULL128 */ t = FIELD_DP64(t, ID_AA64ZFR0, BITPERM, 1); /* FEAT_SVE_BitPerm */ - t = FIELD_DP64(t, ID_AA64ZFR0, BFLOAT16, 1); /* FEAT_BF16 */ + t = FIELD_DP64(t, ID_AA64ZFR0, BFLOAT16, 2); /* FEAT_BF16, FEAT_EBF16 */ t = FIELD_DP64(t, ID_AA64ZFR0, SHA3, 1); /* FEAT_SVE_SHA3 */ t = FIELD_DP64(t, ID_AA64ZFR0, SM4, 1); /* FEAT_SVE_SM4 */ t = FIELD_DP64(t, ID_AA64ZFR0, I8MM, 1); /* FEAT_I8MM */ diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index bcb502feb05..760c200e622 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -354,7 +354,6 @@ TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, FPST_FPCR, gen_helper_sme_fmopa_d) -/* TODO: FEAT_EBF16 */ TRANS_FEAT(BFMOPA, aa64_sme, do_outprod_env, a, MO_32, gen_helper_sme_bfmopa) TRANS_FEAT(SMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_smopa_s)