From patchwork Thu Sep 19 20:49:33 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 1987571
Date: Thu, 19 Sep 2024 22:49:33 +0200
From: Jakub Jelinek
To: Uros Bizjak
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] i386: Fix up _mm_min_ss etc. handling of zeros and NaNs [PR116738]
Content-Disposition: inline
List-Id: Gcc-patches mailing list
Reply-To: Jakub Jelinek

Hi!

The min/max patterns for intrinsics which on x86 return the second input
operand when both operands are zeros or when one or both of them are
NaNs shouldn't use SMIN/SMAX RTL, because, as with MIN_EXPR/MAX_EXPR,
the result of SMIN/SMAX is undefined in those cases.

The following patch adds an expander which uses either a new pattern
with UNSPEC_IEEE_M{AX,IN} or the S{MIN,MAX} representation of the same.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

P.S.
I have a patch to replace UNSPEC_IEEE_M{AX,IN} with IF_THEN_ELSE (except
for the 3dNOW! PFMIN/MAX, those actually are documented to behave
differently), but it doesn't improve anything much, as neither
simplify_const_relational_operation nor simplify_ternary_operation is
able to fold comparisons with two CONST_VECTOR operands or IF_THEN_ELSE
with 3 CONST_VECTOR operands.
So, maybe a better approach would be to fold the builtins with constant
arguments generically (maybe leaving NaNs to runtime).

2024-09-19  Uros Bizjak
	    Jakub Jelinek

	PR target/116738
	* config/i386/subst.md (mask_scalar_operand_arg34,
	mask_scalar_expand_op3, round_saeonly_scalar_mask_arg3): New
	subst attributes.
	* config/i386/sse.md (_vm3): Change from define_insn to
	define_expand, rename the old define_insn to ...
	(*_vm3): ... this.
	(_ieee_vm3): New define_insn.

	* gcc.target/i386/sse-pr116738.c: New test.

	Jakub

--- gcc/config/i386/subst.md.jj	2024-09-18 15:49:42.200791315 +0200
+++ gcc/config/i386/subst.md	2024-09-19 12:32:51.048626421 +0200
@@ -366,6 +366,8 @@ (define_subst_attr "mask_scalar_operand4
 (define_subst_attr "mask_scalarcz_operand4" "mask_scalarcz" "" "%{%5%}%N4")
 (define_subst_attr "mask_scalar4_dest_false_dep_for_glc_cond" "mask_scalar" "1" "operands[4] == CONST0_RTX(mode)")
 (define_subst_attr "mask_scalarc_dest_false_dep_for_glc_cond" "mask_scalarc" "1" "operands[3] == CONST0_RTX(V8HFmode)")
+(define_subst_attr "mask_scalar_operand_arg34" "mask_scalar" "" ", operands[3], operands[4]")
+(define_subst_attr "mask_scalar_expand_op3" "mask_scalar" "3" "5")
 
 (define_subst "mask_scalar"
   [(set (match_operand:SUBST_V 0)
@@ -473,6 +475,7 @@ (define_subst_attr "round_saeonly_scalar
 (define_subst_attr "round_saeonly_scalar_constraint" "round_saeonly_scalar" "vm" "v")
 (define_subst_attr "round_saeonly_scalar_prefix" "round_saeonly_scalar" "vex" "evex")
 (define_subst_attr "round_saeonly_scalar_nimm_predicate" "round_saeonly_scalar" "nonimmediate_operand" "register_operand")
+(define_subst_attr "round_saeonly_scalar_mask_arg3" "round_saeonly_scalar" "" ", operands[]")
 
 (define_subst "round_saeonly_scalar"
   [(set (match_operand:SUBST_V 0)
--- gcc/config/i386/sse.md.jj	2024-09-10 16:26:02.875151133 +0200
+++ gcc/config/i386/sse.md	2024-09-19 12:43:31.693030695 +0200
@@ -3333,7 +3333,27 @@ (define_insn "*ieee_3
 	 (const_string "*")))
    (set_attr "mode" "")])
 
-(define_insn "_vm3"
+(define_expand "_vm3"
+  [(set (match_operand:VFH_128 0 "register_operand")
+	(vec_merge:VFH_128
+	  (smaxmin:VFH_128
+	    (match_operand:VFH_128 1 "register_operand")
+	    (match_operand:VFH_128 2 "nonimmediate_operand"))
+	 (match_dup 1)
+	 (const_int 1)))]
+  "TARGET_SSE"
+{
+  if (!flag_finite_math_only || flag_signed_zeros)
+    {
+      emit_insn (gen__ieee_vm3
+		 (operands[0], operands[1], operands[2]
+
+		  ));
+      DONE;
+    }
+})
+
+(define_insn "*_vm3"
   [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
 	(vec_merge:VFH_128
 	  (smaxmin:VFH_128
@@ -3348,6 +3368,25 @@ (define_insn "_vm3")
+   (set_attr "mode" "")])
+
+(define_insn "_ieee_vm3"
+  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
+	(vec_merge:VFH_128
+	  (unspec:VFH_128
+	    [(match_operand:VFH_128 1 "register_operand" "0,v")
+	     (match_operand:VFH_128 2 "nonimmediate_operand" "xm,")]
+	    IEEE_MAXMIN)
+	 (match_dup 1)
+	 (const_int 1)))]
+  "TARGET_SSE"
+  "@
+   \t{%2, %0|%0, %2}
+   v\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sse")
+   (set_attr "btver2_sse_attr" "maxmin")
    (set_attr "prefix" "")
    (set_attr "mode" "")])
 
--- gcc/testsuite/gcc.target/i386/sse-pr116738.c.jj	2024-09-19 12:52:33.502681950 +0200
+++ gcc/testsuite/gcc.target/i386/sse-pr116738.c	2024-09-19 12:54:20.938219741 +0200
@@ -0,0 +1,28 @@
+/* PR target/116738 */
+/* { dg-do run } */
+/* { dg-options "-O2 -msse" } */
+/* { dg-require-effective-target sse } */
+
+#include "sse-check.h"
+
+static inline float
+clamp (float f)
+{
+  __m128 v = _mm_set_ss (f);
+  __m128 zero = _mm_setzero_ps ();
+  __m128 greatest = _mm_set_ss (__FLT_MAX__);
+  v = _mm_min_ss (v, greatest);
+  v = _mm_max_ss (v, zero);
+  return _mm_cvtss_f32 (v);
+}
+
+static void
+sse_test (void)
+{
+  float f = clamp (-0.0f);
+  if (f != 0.0f || __builtin_signbitf (f))
+    abort ();
+  f = clamp (__builtin_nanf (""));
+  if (__builtin_isnanf (f) || f != __FLT_MAX__)
+    abort ();
+}
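
For readers who want the scalar semantics spelled out, here is a rough
portable C model of what the test relies on: the low element is the first
operand only when the ordered comparison holds, otherwise it is the second
operand, which covers both the NaN and the two-zeros cases. This is only a
sketch, not part of the patch; model_minss, model_maxss, model_clamp and
model_selftest are made-up names, and it assumes compilation without
-ffast-math so that the comparisons keep their IEEE meaning.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical model of the low element of _mm_min_ss (a, b): the
   result is b unless a < b holds, so a NaN in either operand or a
   (+0.0, -0.0) pair selects the second operand.  */
static float
model_minss (float a, float b)
{
  return a < b ? a : b;
}

/* Hypothetical model of the low element of _mm_max_ss (a, b), with
   the same "second operand wins" rule.  */
static float
model_maxss (float a, float b)
{
  return a > b ? a : b;
}

/* Same shape as clamp () in sse-pr116738.c, built on the models.  */
static float
model_clamp (float f)
{
  f = model_minss (f, FLT_MAX);
  return model_maxss (f, 0.0f);
}

/* Mirrors the two checks done by sse_test ().  */
static void
model_selftest (void)
{
  float f = model_clamp (-0.0f);
  assert (f == 0.0f && !signbit (f));
  assert (model_clamp (NAN) == FLT_MAX);
}
```

With SMIN/SMAX semantics a compiler would be free to return either zero
(or either operand for a NaN) here, which is why the two checks in the
test can fail when the intrinsics are expanded that way.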