From patchwork Wed Nov 24 19:37:54 2021
X-Patchwork-Submitter: Sunil Pandey
X-Patchwork-Id: 1559340
From: Sunil Pandey
Reply-To: Sunil K Pandey
To: libc-alpha@sourceware.org
Cc: andrey.kolesov@intel.com
Subject: [PATCH 29/42] x86-64: Add vector hypot/hypotf implementation to libmvec
Date: Wed, 24 Nov 2021 11:37:54 -0800
Message-Id: <20211124193807.2093208-30-skpgkp2@gmail.com>
In-Reply-To: <20211124193807.2093208-1-skpgkp2@gmail.com>
References: <20211124193807.2093208-1-skpgkp2@gmail.com>
X-Mailer: git-send-email 2.31.1

Implement vectorized hypot/hypotf containing SSE, AVX, AVX2 and AVX512
versions for libmvec as per vector ABI.  It also contains accuracy and
ABI tests for vector hypot/hypotf with regenerated ulps.
---
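A note on how callers reach these entry points (illustration only, not part
of the patch): the __DECL_SIMD_hypot/__DECL_SIMD_hypotf declarations added
to bits/math-vector.h let the compiler auto-vectorize ordinary hypot calls
to the new _ZGV*vv_hypot symbols.  A minimal sketch, assuming GCC with
something like -O2 -ffast-math -march=haswell (or -march=skylake-avx512 for
the AVX512 variants):

  #include <math.h>

  /* With vectorization enabled and errno handling relaxed, the scalar
     hypot calls in this loop may be replaced by calls to the libmvec
     variants added in this patch, e.g. _ZGVdN4vv_hypot (AVX2, four
     doubles per call) or _ZGVeN8vv_hypot (AVX512).  */
  void
  distances (const double *x, const double *y, double *d, int n)
  {
    for (int i = 0; i < n; i++)
      d[i] = hypot (x[i], y[i]);
  }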
 bits/libm-simd-decl-stubs.h                   |   11 +
 math/bits/mathcalls.h                         |    2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
 sysdeps/x86/fpu/bits/math-vector.h            |    4 +
 sysdeps/x86_64/fpu/Makeconfig                 |    1 +
 sysdeps/x86_64/fpu/Versions                   |    2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
 .../fpu/multiarch/svml_d_hypot2_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_hypot2_core.c |   28 +
 .../fpu/multiarch/svml_d_hypot2_core_sse4.S   | 2336 +++++++++++++++++
 .../fpu/multiarch/svml_d_hypot4_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_hypot4_core.c |   28 +
 .../fpu/multiarch/svml_d_hypot4_core_avx2.S   | 2162 +++++++++++++++
 .../fpu/multiarch/svml_d_hypot8_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_hypot8_core.c |   28 +
 .../fpu/multiarch/svml_d_hypot8_core_avx512.S | 1775 +++++++++++++
 .../fpu/multiarch/svml_s_hypotf16_core-avx2.S |   20 +
 .../fpu/multiarch/svml_s_hypotf16_core.c      |   28 +
 .../multiarch/svml_s_hypotf16_core_avx512.S   | 1684 ++++++++++++
 .../fpu/multiarch/svml_s_hypotf4_core-sse2.S  |   20 +
 .../fpu/multiarch/svml_s_hypotf4_core.c       |   28 +
 .../fpu/multiarch/svml_s_hypotf4_core_sse4.S  | 2062 +++++++++++++++
 .../fpu/multiarch/svml_s_hypotf8_core-sse.S   |   20 +
 .../fpu/multiarch/svml_s_hypotf8_core.c       |   28 +
 .../fpu/multiarch/svml_s_hypotf8_core_avx2.S  | 1943 ++++++++++++++
 sysdeps/x86_64/fpu/svml_d_hypot2_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_hypot4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_hypot4_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_d_hypot8_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_s_hypotf16_core.S     |   25 +
 sysdeps/x86_64/fpu/svml_s_hypotf4_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_hypotf8_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_hypotf8_core_avx.S  |   25 +
 .../fpu/test-double-libmvec-hypot-avx.c       |    1 +
 .../fpu/test-double-libmvec-hypot-avx2.c      |    1 +
 .../fpu/test-double-libmvec-hypot-avx512f.c   |    1 +
 .../x86_64/fpu/test-double-libmvec-hypot.c    |    3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
 .../fpu/test-float-libmvec-hypotf-avx.c       |    1 +
 .../fpu/test-float-libmvec-hypotf-avx2.c      |    1 +
 .../fpu/test-float-libmvec-hypotf-avx512f.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-hypotf.c    |    3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
 49 files changed, 12533 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core.c
 create mode 100644
sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core_avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core-avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core_avx512.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core-avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core_avx512.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core-sse2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core_sse4.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core-sse.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core_avx2.S create mode 100644 sysdeps/x86_64/fpu/svml_d_hypot2_core.S create mode 100644 sysdeps/x86_64/fpu/svml_d_hypot4_core.S create mode 100644 sysdeps/x86_64/fpu/svml_d_hypot4_core_avx.S create mode 100644 sysdeps/x86_64/fpu/svml_d_hypot8_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_hypotf16_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_hypotf4_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_hypotf8_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_hypotf8_core_avx.S create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx2.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx512f.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-hypot.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx2.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx512f.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-hypotf.c diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h index 683eb5569e..4e08de9936 100644 --- a/bits/libm-simd-decl-stubs.h +++ b/bits/libm-simd-decl-stubs.h @@ -252,4 +252,15 @@ #define __DECL_SIMD_expm1f32x #define __DECL_SIMD_expm1f64x #define __DECL_SIMD_expm1f128x + +#define __DECL_SIMD_hypot +#define __DECL_SIMD_hypotf +#define __DECL_SIMD_hypotl +#define __DECL_SIMD_hypotf16 +#define __DECL_SIMD_hypotf32 +#define __DECL_SIMD_hypotf64 +#define __DECL_SIMD_hypotf128 +#define __DECL_SIMD_hypotf32x +#define __DECL_SIMD_hypotf64x +#define __DECL_SIMD_hypotf128x #endif diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h index 345f1f3704..32f487c109 100644 --- a/math/bits/mathcalls.h +++ b/math/bits/mathcalls.h @@ -144,7 +144,7 @@ __MATHCALL (sqrt,, (_Mdouble_ __x)); #if defined __USE_XOPEN || defined __USE_ISOC99 /* Return `sqrt(X*X + Y*Y)'. 
*/ -__MATHCALL (hypot,, (_Mdouble_ __x, _Mdouble_ __y)); +__MATHCALL_VEC (hypot,, (_Mdouble_ __x, _Mdouble_ __y)); #endif #if defined __USE_XOPEN_EXTENDED || defined __USE_ISOC99 diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist index 119985e65e..4fbc8629a1 100644 --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist @@ -60,6 +60,7 @@ GLIBC_2.35 _ZGVbN2v_exp10 F GLIBC_2.35 _ZGVbN2v_exp2 F GLIBC_2.35 _ZGVbN2v_expm1 F GLIBC_2.35 _ZGVbN2vv_atan2 F +GLIBC_2.35 _ZGVbN2vv_hypot F GLIBC_2.35 _ZGVbN4v_acosf F GLIBC_2.35 _ZGVbN4v_acoshf F GLIBC_2.35 _ZGVbN4v_asinf F @@ -74,6 +75,7 @@ GLIBC_2.35 _ZGVbN4v_exp10f F GLIBC_2.35 _ZGVbN4v_exp2f F GLIBC_2.35 _ZGVbN4v_expm1f F GLIBC_2.35 _ZGVbN4vv_atan2f F +GLIBC_2.35 _ZGVbN4vv_hypotf F GLIBC_2.35 _ZGVcN4v_acos F GLIBC_2.35 _ZGVcN4v_acosh F GLIBC_2.35 _ZGVcN4v_asin F @@ -88,6 +90,7 @@ GLIBC_2.35 _ZGVcN4v_exp10 F GLIBC_2.35 _ZGVcN4v_exp2 F GLIBC_2.35 _ZGVcN4v_expm1 F GLIBC_2.35 _ZGVcN4vv_atan2 F +GLIBC_2.35 _ZGVcN4vv_hypot F GLIBC_2.35 _ZGVcN8v_acosf F GLIBC_2.35 _ZGVcN8v_acoshf F GLIBC_2.35 _ZGVcN8v_asinf F @@ -102,6 +105,7 @@ GLIBC_2.35 _ZGVcN8v_exp10f F GLIBC_2.35 _ZGVcN8v_exp2f F GLIBC_2.35 _ZGVcN8v_expm1f F GLIBC_2.35 _ZGVcN8vv_atan2f F +GLIBC_2.35 _ZGVcN8vv_hypotf F GLIBC_2.35 _ZGVdN4v_acos F GLIBC_2.35 _ZGVdN4v_acosh F GLIBC_2.35 _ZGVdN4v_asin F @@ -116,6 +120,7 @@ GLIBC_2.35 _ZGVdN4v_exp10 F GLIBC_2.35 _ZGVdN4v_exp2 F GLIBC_2.35 _ZGVdN4v_expm1 F GLIBC_2.35 _ZGVdN4vv_atan2 F +GLIBC_2.35 _ZGVdN4vv_hypot F GLIBC_2.35 _ZGVdN8v_acosf F GLIBC_2.35 _ZGVdN8v_acoshf F GLIBC_2.35 _ZGVdN8v_asinf F @@ -130,6 +135,7 @@ GLIBC_2.35 _ZGVdN8v_exp10f F GLIBC_2.35 _ZGVdN8v_exp2f F GLIBC_2.35 _ZGVdN8v_expm1f F GLIBC_2.35 _ZGVdN8vv_atan2f F +GLIBC_2.35 _ZGVdN8vv_hypotf F GLIBC_2.35 _ZGVeN16v_acosf F GLIBC_2.35 _ZGVeN16v_acoshf F GLIBC_2.35 _ZGVeN16v_asinf F @@ -144,6 +150,7 @@ GLIBC_2.35 _ZGVeN16v_exp10f F GLIBC_2.35 _ZGVeN16v_exp2f F GLIBC_2.35 _ZGVeN16v_expm1f F GLIBC_2.35 _ZGVeN16vv_atan2f F +GLIBC_2.35 _ZGVeN16vv_hypotf F GLIBC_2.35 _ZGVeN8v_acos F GLIBC_2.35 _ZGVeN8v_acosh F GLIBC_2.35 _ZGVeN8v_asin F @@ -158,3 +165,4 @@ GLIBC_2.35 _ZGVeN8v_exp10 F GLIBC_2.35 _ZGVeN8v_exp2 F GLIBC_2.35 _ZGVeN8v_expm1 F GLIBC_2.35 _ZGVeN8vv_atan2 F +GLIBC_2.35 _ZGVeN8vv_hypot F diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h index f1e3b7e660..0f43244b3f 100644 --- a/sysdeps/x86/fpu/bits/math-vector.h +++ b/sysdeps/x86/fpu/bits/math-vector.h @@ -114,6 +114,10 @@ # define __DECL_SIMD_expm1 __DECL_SIMD_x86_64 # undef __DECL_SIMD_expm1f # define __DECL_SIMD_expm1f __DECL_SIMD_x86_64 +# undef __DECL_SIMD_hypot +# define __DECL_SIMD_hypot __DECL_SIMD_x86_64 +# undef __DECL_SIMD_hypotf +# define __DECL_SIMD_hypotf __DECL_SIMD_x86_64 # endif #endif diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig index 6c63e0ceed..8c614c5fb4 100644 --- a/sysdeps/x86_64/fpu/Makeconfig +++ b/sysdeps/x86_64/fpu/Makeconfig @@ -38,6 +38,7 @@ libmvec-funcs = \ exp10 \ exp2 \ expm1 \ + hypot \ log \ pow \ sin \ diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions index 74a881b0f6..6beaa3bf8b 100644 --- a/sysdeps/x86_64/fpu/Versions +++ b/sysdeps/x86_64/fpu/Versions @@ -28,6 +28,7 @@ libmvec { _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2; _ZGVbN2v_expm1; _ZGVcN4v_expm1; _ZGVdN4v_expm1; _ZGVeN8v_expm1; _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2; + _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; 
_ZGVdN4vv_hypot; _ZGVeN8vv_hypot; _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf; _ZGVbN4v_acoshf; _ZGVcN8v_acoshf; _ZGVdN8v_acoshf; _ZGVeN16v_acoshf; _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf; @@ -42,5 +43,6 @@ libmvec { _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f; _ZGVbN4v_expm1f; _ZGVcN8v_expm1f; _ZGVdN8v_expm1f; _ZGVeN16v_expm1f; _ZGVbN4vv_atan2f; _ZGVcN8vv_atan2f; _ZGVdN8vv_atan2f; _ZGVeN16vv_atan2f; + _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf; } } diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps index c338319b69..577ffd239a 100644 --- a/sysdeps/x86_64/fpu/libm-test-ulps +++ b/sysdeps/x86_64/fpu/libm-test-ulps @@ -1592,6 +1592,26 @@ double: 1 float128: 1 ldouble: 1 +Function: "hypot_vlen16": +float: 1 + +Function: "hypot_vlen2": +double: 1 + +Function: "hypot_vlen4": +double: 1 +float: 1 + +Function: "hypot_vlen4_avx2": +double: 1 + +Function: "hypot_vlen8": +double: 1 +float: 1 + +Function: "hypot_vlen8_avx2": +float: 1 + Function: "j0": double: 3 float: 9 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core-sse2.S new file mode 100644 index 0000000000..237e38459e --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core-sse2.S @@ -0,0 +1,20 @@ +/* SSE2 version of vectorized hypot. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define _ZGVbN2vv_hypot _ZGVbN2vv_hypot_sse2 +#include "../svml_d_hypot2_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core.c new file mode 100644 index 0000000000..3f0865f05d --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized hypot, vector length is 2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#define SYMBOL_NAME _ZGVbN2vv_hypot +#include "ifunc-mathvec-sse4_1.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVbN2vv_hypot, __GI__ZGVbN2vv_hypot, + __redirect__ZGVbN2vv_hypot) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core_sse4.S new file mode 100644 index 0000000000..2940aa7ae8 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core_sse4.S @@ -0,0 +1,2336 @@ +/* Function hypot vectorized with SSE4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * HIGH LEVEL OVERVIEW + * + * Calculate z = (x*x+y*y) + * Calculate reciplicle sqrt (z) + * Calculate error = z*(rsqrt(z)*rsqrt(z)) - 1 + * Calculate fixing part p with polynom + * Fix answer with sqrt(z) = z * rsqrt(z) + error * p * z + * + * ALGORITHM DETAILS + * + * Multiprecision branch for _HA_ only + * Remove sigm from both arguments + * Find maximum (_x) and minimum (_y) (by abs value) between arguments + * Split _x int _a and _b for multiprecision + * If _x >> _y we will we will not split _y for multiprecision + * all _y will be put into lower part (_d) and higher part (_c = 0) + * Fixing _hilo_mask for the case _x >> _y + * Split _y into _c and _d for multiprecision with fixed mask + * + * compute Hi and Lo parts of _z = _x*_x + _y*_y + * + * _zHi = _a*_a + _c*_c + * _zLo = (_x + _a)*_b + _d*_y + _d*_c + * _z = _zHi + _zLo + * + * No multiprecision branch for _LA_ and _EP_ + * _z = _VARG1 * _VARG1 + _VARG2 * _VARG2 + * + * Check _z exponent to be withing borders [3BC ; 441] else goto Callout + * + * _s ~ 1.0/sqrt(_z) + * _s2 ~ 1.0/(sqrt(_z)*sqrt(_z)) ~ 1.0/_z = (1.0/_z + O) + * _e[rror] = (1.0/_z + O) * _z - 1.0 + * calculate fixing part _p + * _p = (((_POLY_C5*_e + _POLY_C4)*_e +_POLY_C3)*_e +_POLY_C2)*_e + _POLY_C1 + * some parts of polynom are skipped for lower flav + * + * result = _z * (1.0/sqrt(_z) + O) + _p * _e[rror] * _z + * + * + */ + +#include + + .text +ENTRY(_ZGVbN2vv_hypot_sse4) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $384, %rsp + movaps %xmm1, %xmm2 + +/* + * Defines + * Implementation + * Multiprecision branch for _HA_ only + * _z = _VARG1 * _VARG1 + _VARG2 * _VARG2 + */ + movaps %xmm0, %xmm1 + movaps %xmm2, %xmm3 + mulpd %xmm0, %xmm1 + mulpd %xmm2, %xmm3 + addpd %xmm3, %xmm1 + +/* + * _s ~ 1.0/sqrt(_z) + * _s2 ~ 1.0/(sqrt(_z)*sqrt(_z)) ~ 1.0/_z + */ + cvtpd2ps %xmm1, %xmm3 + +/* Check _z exponent to be withing borders [3BC ; 441] else goto Callout */ + movq 576+__svml_dhypot_data_internal(%rip), %xmm6 + movq 640+__svml_dhypot_data_internal(%rip), %xmm4 + pshufd $221, %xmm1, %xmm5 + 
movlhps %xmm3, %xmm3 + pcmpgtd %xmm5, %xmm6 + pcmpgtd %xmm4, %xmm5 + rsqrtps %xmm3, %xmm4 + por %xmm5, %xmm6 + pshufd $80, %xmm6, %xmm7 + cvtps2pd %xmm4, %xmm6 + movmskpd %xmm7, %edx + movaps %xmm6, %xmm3 + mulpd %xmm6, %xmm3 + +/* _e[rror] ~ (1.0/_z + O) * _z - 1.0 */ + mulpd %xmm1, %xmm3 + subpd 128+__svml_dhypot_data_internal(%rip), %xmm3 + +/* + * calculate fixing part _p + * _p = (((_POLY_C5*_e + _POLY_C4)*_e +_POLY_C3)*_e +_POLY_C2)*_e + _POLY_C1 + * some parts of polynom are skipped for lower flav + */ + movups 256+__svml_dhypot_data_internal(%rip), %xmm5 + mulpd %xmm3, %xmm5 + addpd 320+__svml_dhypot_data_internal(%rip), %xmm5 + mulpd %xmm3, %xmm5 + addpd 384+__svml_dhypot_data_internal(%rip), %xmm5 + mulpd %xmm3, %xmm5 + addpd 448+__svml_dhypot_data_internal(%rip), %xmm5 + +/* result = _z * (1.0/sqrt(_z) + O) + _p * _e[rror] * _z */ + mulpd %xmm5, %xmm3 + mulpd %xmm6, %xmm3 + mulpd %xmm1, %xmm6 + mulpd %xmm1, %xmm3 + addpd %xmm6, %xmm3 + +/* The end of implementation */ + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + movaps %xmm3, %xmm0 + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + +.LBL_1_3: + movups %xmm0, 192(%rsp) + movups %xmm2, 256(%rsp) + movups %xmm3, 320(%rsp) + xorl %eax, %eax + movups %xmm8, 112(%rsp) + movups %xmm9, 96(%rsp) + movups %xmm10, 80(%rsp) + movups %xmm11, 64(%rsp) + movups %xmm12, 48(%rsp) + movups %xmm13, 32(%rsp) + movups %xmm14, 16(%rsp) + movups %xmm15, (%rsp) + movq %rsi, 136(%rsp) + movq %rdi, 128(%rsp) + movq %r12, 152(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x18, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 144(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + movl %edx, %r13d + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $2, %r12d + jl .LBL_1_7 + movups 112(%rsp), %xmm8 + cfi_restore(25) + movups 96(%rsp), %xmm9 + cfi_restore(26) + movups 80(%rsp), %xmm10 + cfi_restore(27) + movups 64(%rsp), %xmm11 + cfi_restore(28) + movups 48(%rsp), %xmm12 + cfi_restore(29) + movups 32(%rsp), %xmm13 + cfi_restore(30) + movups 16(%rsp), %xmm14 + cfi_restore(31) + movups (%rsp), %xmm15 + cfi_restore(32) + movq 136(%rsp), %rsi + cfi_restore(4) + movq 128(%rsp), %rdi + cfi_restore(5) + movq 152(%rsp), 
%r12 + cfi_restore(12) + movq 144(%rsp), %r13 + cfi_restore(13) + movups 320(%rsp), %xmm3 + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x18, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 192(%rsp,%r12,8), %rdi + lea 256(%rsp,%r12,8), %rsi + lea 320(%rsp,%r12,8), %rdx + call __svml_dhypot_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVbN2vv_hypot_sse4) + + .align 16,0x90 + +__svml_dhypot_cout_rare_internal: + + cfi_startproc + + movq %rdx, %r8 + movzwl 6(%rdi), %eax + andl $32752, %eax + cmpl $32752, %eax + je .LBL_2_14 + movzwl 6(%rsi), %eax + andl $32752, %eax + cmpl $32752, %eax + je .LBL_2_13 + movsd (%rdi), %xmm2 + movsd 4096+_vmldHypotHATab(%rip), %xmm0 + movb 7(%rdi), %dl + movb 7(%rsi), %al + movsd (%rsi), %xmm1 + ucomisd %xmm0, %xmm2 + jp .LBL_2_4 + je .LBL_2_11 + +.LBL_2_4: + movsd %xmm2, -16(%rsp) + andb $127, %dl + movsd %xmm1, -48(%rsp) + andb $127, %al + movb %dl, -9(%rsp) + movb %al, -41(%rsp) + movsd -16(%rsp), %xmm8 + movsd -48(%rsp), %xmm1 + comisd %xmm8, %xmm1 + jbe .LBL_2_6 + movaps %xmm8, %xmm2 + movaps %xmm1, %xmm8 + movsd %xmm1, -16(%rsp) + movaps %xmm2, %xmm1 + +.LBL_2_6: + movzwl -10(%rsp), %edx + andl $32752, %edx + shrl $4, %edx + negl %edx + movzwl 4102+_vmldHypotHATab(%rip), %edi + andl $-32753, %edi + movsd %xmm0, -56(%rsp) + movsd 4128+_vmldHypotHATab(%rip), %xmm3 + lea 1025(%rdx), %esi + negl %esi + addl $1000, %esi + shrl $31, %esi + imull $-23, %esi, %eax + lea 1025(%rax,%rdx), %esi + lea 1023(%rsi), %ecx + andl $2047, %ecx + shll $4, %ecx + orl %ecx, %edi + movw %di, -50(%rsp) + movsd -56(%rsp), %xmm2 + mulsd %xmm2, %xmm8 + mulsd %xmm2, %xmm1 + mulsd %xmm8, %xmm3 + movsd %xmm3, -72(%rsp) + movsd -72(%rsp), %xmm4 + movsd %xmm8, -16(%rsp) + subsd %xmm8, %xmm4 + movsd %xmm4, -64(%rsp) + movsd -72(%rsp), %xmm6 + movsd -64(%rsp), %xmm5 + movsd %xmm1, -48(%rsp) + subsd %xmm5, %xmm6 + movsd %xmm6, -72(%rsp) + movsd -72(%rsp), %xmm7 + movzwl -10(%rsp), %r10d + subsd %xmm7, %xmm8 + movzwl -42(%rsp), %r9d + andl $32752, %r10d + andl $32752, %r9d + shrl $4, %r10d + shrl $4, %r9d + movsd %xmm8, -64(%rsp) + subl %r9d, %r10d + movsd -72(%rsp), %xmm8 + movsd -64(%rsp), %xmm4 + cmpl $6, %r10d + jle .LBL_2_8 + movaps %xmm1, %xmm2 + jmp .LBL_2_9 + +.LBL_2_8: + movsd -48(%rsp), %xmm1 
+ movsd 4128+_vmldHypotHATab(%rip), %xmm0 + movaps %xmm1, %xmm7 + mulsd %xmm1, %xmm0 + movsd %xmm0, -72(%rsp) + movsd -72(%rsp), %xmm2 + subsd -48(%rsp), %xmm2 + movsd %xmm2, -64(%rsp) + movsd -72(%rsp), %xmm5 + movsd -64(%rsp), %xmm3 + subsd %xmm3, %xmm5 + movsd %xmm5, -72(%rsp) + movsd -72(%rsp), %xmm6 + subsd %xmm6, %xmm7 + movsd %xmm7, -64(%rsp) + movsd -72(%rsp), %xmm0 + movsd -64(%rsp), %xmm2 + +.LBL_2_9: + movsd -16(%rsp), %xmm6 + movaps %xmm8, %xmm3 + mulsd %xmm2, %xmm1 + addsd %xmm8, %xmm6 + mulsd %xmm8, %xmm3 + mulsd %xmm6, %xmm4 + movaps %xmm0, %xmm5 + negl %esi + mulsd %xmm0, %xmm5 + addsd %xmm1, %xmm4 + mulsd %xmm2, %xmm0 + addsd %xmm5, %xmm3 + addsd %xmm0, %xmm4 + movaps %xmm3, %xmm7 + addl $1023, %esi + movq 4112+_vmldHypotHATab(%rip), %r11 + movq %r11, %r9 + lea _vmldHypotHATab(%rip), %rdx + addsd %xmm4, %xmm7 + movsd %xmm7, -56(%rsp) + andl $2047, %esi + movzwl -50(%rsp), %ecx + andl $32752, %ecx + shrl $4, %ecx + addl $-1023, %ecx + movl %ecx, %eax + andl $1, %eax + subl %eax, %ecx + shrl $1, %ecx + movsd %xmm7, -48(%rsp) + movzwl -42(%rsp), %edi + andl $-32753, %edi + shrq $48, %r9 + lea 1023(%rcx), %r10d + addl %ecx, %ecx + addl $16368, %edi + negl %ecx + andl $2047, %r10d + addl $1023, %ecx + andl $2047, %ecx + andl $-32753, %r9d + movw %di, -42(%rsp) + shll $4, %r10d + shll $4, %ecx + orl %r9d, %r10d + shll $4, %esi + orl %r9d, %ecx + movsd -48(%rsp), %xmm2 + orl %esi, %r9d + movl -44(%rsp), %esi + mulsd 4112(%rdx,%rax,8), %xmm2 + andl $1048575, %esi + shrl $12, %esi + shll $8, %eax + addl %eax, %esi + movsd (%rdx,%rsi,8), %xmm8 + movsd 4104+_vmldHypotHATab(%rip), %xmm1 + mulsd %xmm8, %xmm2 + mulsd %xmm8, %xmm1 + movaps %xmm2, %xmm9 + mulsd %xmm1, %xmm9 + movsd 4104+_vmldHypotHATab(%rip), %xmm11 + movsd 4104+_vmldHypotHATab(%rip), %xmm14 + subsd %xmm9, %xmm11 + movaps %xmm11, %xmm10 + mulsd %xmm2, %xmm11 + mulsd %xmm1, %xmm10 + addsd %xmm11, %xmm2 + addsd %xmm10, %xmm1 + movaps %xmm2, %xmm12 + movaps %xmm1, %xmm13 + mulsd %xmm1, %xmm12 + movsd 4104+_vmldHypotHATab(%rip), %xmm0 + subsd %xmm12, %xmm14 + mulsd %xmm14, %xmm13 + mulsd %xmm2, %xmm14 + addsd %xmm13, %xmm1 + addsd %xmm14, %xmm2 + movaps %xmm2, %xmm15 + movaps %xmm2, %xmm5 + mulsd %xmm1, %xmm15 + movsd 4128+_vmldHypotHATab(%rip), %xmm6 + subsd %xmm15, %xmm0 + mulsd %xmm0, %xmm5 + mulsd %xmm1, %xmm0 + addsd %xmm5, %xmm2 + addsd %xmm0, %xmm1 + mulsd %xmm2, %xmm6 + movsd %xmm6, -72(%rsp) + movaps %xmm2, %xmm11 + movsd -72(%rsp), %xmm7 + movq %r11, -32(%rsp) + subsd %xmm2, %xmm7 + movsd %xmm7, -64(%rsp) + movsd -72(%rsp), %xmm9 + movsd -64(%rsp), %xmm8 + movw %cx, -26(%rsp) + subsd %xmm8, %xmm9 + movsd %xmm9, -72(%rsp) + movsd -72(%rsp), %xmm10 + movsd -32(%rsp), %xmm15 + subsd %xmm10, %xmm11 + mulsd %xmm15, %xmm3 + mulsd %xmm15, %xmm4 + movsd %xmm11, -64(%rsp) + movsd -72(%rsp), %xmm13 + movsd 4120+_vmldHypotHATab(%rip), %xmm14 + movaps %xmm13, %xmm12 + mulsd %xmm13, %xmm12 + mulsd %xmm13, %xmm14 + subsd %xmm12, %xmm3 + movsd -64(%rsp), %xmm5 + mulsd %xmm5, %xmm14 + mulsd %xmm5, %xmm5 + subsd %xmm14, %xmm3 + movq %r11, -40(%rsp) + subsd %xmm5, %xmm3 + movw %r10w, -34(%rsp) + addsd %xmm4, %xmm3 + mulsd %xmm1, %xmm3 + movq %r11, -24(%rsp) + addsd %xmm3, %xmm2 + mulsd -40(%rsp), %xmm2 + movw %r9w, -18(%rsp) + mulsd -24(%rsp), %xmm2 + movsd %xmm2, (%r8) + +.LBL_2_10: + xorl %eax, %eax + ret + +.LBL_2_11: + ucomisd %xmm0, %xmm1 + jne .LBL_2_4 + jp .LBL_2_4 + movsd %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_13: + movsd (%rsi), %xmm0 + mulsd %xmm0, %xmm0 + movsd %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_14: + movzwl 6(%rsi), 
%eax + andl $32752, %eax + cmpl $32752, %eax + je .LBL_2_16 + +.LBL_2_15: + movsd (%rdi), %xmm0 + mulsd %xmm0, %xmm0 + movsd %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_16: + movl 4(%rdi), %edx + movl %edx, %eax + andl $1048575, %eax + jne .LBL_2_18 + cmpl $0, (%rdi) + je .LBL_2_23 + +.LBL_2_18: + testl $1048575, 4(%rsi) + jne .LBL_2_20 + cmpl $0, (%rsi) + je .LBL_2_21 + +.LBL_2_20: + movsd (%rdi), %xmm0 + mulsd (%rsi), %xmm0 + movsd %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_21: + testl %eax, %eax + jne .LBL_2_30 + cmpl $0, (%rdi) + je .LBL_2_24 + jmp .LBL_2_29 + +.LBL_2_23: + jne .LBL_2_29 + +.LBL_2_24: + movl 4(%rsi), %eax + testl $1048575, %eax + jne .LBL_2_26 + cmpl $0, (%rsi) + je .LBL_2_15 + +.LBL_2_26: + testl $524288, %eax + jne .LBL_2_15 + movsd 4112+_vmldHypotHATab(%rip), %xmm0 + mulsd (%rsi), %xmm0 + movsd %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_29: + je .LBL_2_13 + +.LBL_2_30: + testl $524288, %edx + jne .LBL_2_13 + movsd 4112+_vmldHypotHATab(%rip), %xmm0 + mulsd (%rdi), %xmm0 + movsd %xmm0, (%r8) + jmp .LBL_2_10 + + cfi_endproc + + .type __svml_dhypot_cout_rare_internal,@function + .size __svml_dhypot_cout_rare_internal,.-__svml_dhypot_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_dhypot_data_internal: + .long 0 + .long 4294950912 + .long 0 + .long 4294950912 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 3218046976 + .long 0 + .long 3218046976 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1070694400 + .long 0 + .long 1070694400 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 
+ .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 3218341888 + .long 0 + .long 3218341888 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1071120384 + .long 0 + .long 1071120384 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 3219128320 + .long 0 + .long 3219128320 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1082126336 + .long 0 + .long 1082126336 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1078951936 + .long 0 + .long 1078951936 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1071644672 + .long 0 + .long 1071644672 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_dhypot_data_internal,@object + .size __svml_dhypot_data_internal,896 + .align 32 + +_vmldHypotHATab: + .long 0 + .long 1072693248 + .long 0 + .long 1072689152 + .long 0 + .long 1072685056 + .long 0 + .long 1072680960 + .long 0 + .long 1072676864 + .long 0 + .long 1072672768 + .long 0 + .long 1072668672 + .long 0 + .long 1072665600 + .long 0 + .long 1072661504 + .long 0 + .long 1072657408 + .long 0 + .long 1072653312 + .long 0 + .long 1072649216 + .long 0 + .long 1072646144 + .long 0 + .long 1072642048 + .long 0 + .long 1072637952 + .long 0 + .long 1072634880 + .long 0 + .long 1072630784 + .long 0 + .long 1072626688 + .long 0 + .long 1072623616 + .long 0 + .long 1072619520 + .long 0 + .long 1072615424 + .long 0 + .long 1072612352 + .long 0 + .long 1072608256 + .long 0 + .long 1072605184 + .long 0 + .long 1072601088 + .long 0 + .long 1072598016 + .long 0 + .long 1072593920 + .long 0 + .long 1072590848 + .long 0 + .long 1072586752 + .long 0 + .long 1072583680 + .long 0 + .long 1072580608 + .long 0 + .long 1072576512 + .long 0 + .long 1072573440 + .long 0 + .long 1072570368 + .long 0 + .long 1072566272 + .long 0 + .long 1072563200 + .long 0 + .long 1072560128 + .long 0 + .long 1072556032 + .long 0 + .long 1072552960 + .long 0 + .long 1072549888 + .long 0 + .long 1072546816 + .long 0 + .long 1072542720 + .long 0 + .long 1072539648 + .long 0 + .long 1072536576 + .long 0 + .long 1072533504 + .long 0 + .long 1072530432 + .long 0 + .long 1072527360 + .long 0 + .long 1072523264 + .long 0 + .long 1072520192 + .long 0 + .long 1072517120 + .long 0 + .long 1072514048 + .long 0 + .long 1072510976 + .long 0 + .long 1072507904 + .long 0 + .long 1072504832 + .long 0 + .long 1072501760 + .long 0 + .long 1072498688 + .long 0 + .long 1072495616 + .long 0 + .long 1072492544 + .long 0 + .long 
1072489472 + .long 0 + .long 1072486400 + .long 0 + .long 1072483328 + .long 0 + .long 1072480256 + .long 0 + .long 1072478208 + .long 0 + .long 1072475136 + .long 0 + .long 1072472064 + .long 0 + .long 1072468992 + .long 0 + .long 1072465920 + .long 0 + .long 1072462848 + .long 0 + .long 1072459776 + .long 0 + .long 1072457728 + .long 0 + .long 1072454656 + .long 0 + .long 1072451584 + .long 0 + .long 1072448512 + .long 0 + .long 1072446464 + .long 0 + .long 1072443392 + .long 0 + .long 1072440320 + .long 0 + .long 1072437248 + .long 0 + .long 1072435200 + .long 0 + .long 1072432128 + .long 0 + .long 1072429056 + .long 0 + .long 1072427008 + .long 0 + .long 1072423936 + .long 0 + .long 1072420864 + .long 0 + .long 1072418816 + .long 0 + .long 1072415744 + .long 0 + .long 1072412672 + .long 0 + .long 1072410624 + .long 0 + .long 1072407552 + .long 0 + .long 1072405504 + .long 0 + .long 1072402432 + .long 0 + .long 1072400384 + .long 0 + .long 1072397312 + .long 0 + .long 1072395264 + .long 0 + .long 1072392192 + .long 0 + .long 1072390144 + .long 0 + .long 1072387072 + .long 0 + .long 1072385024 + .long 0 + .long 1072381952 + .long 0 + .long 1072379904 + .long 0 + .long 1072376832 + .long 0 + .long 1072374784 + .long 0 + .long 1072371712 + .long 0 + .long 1072369664 + .long 0 + .long 1072366592 + .long 0 + .long 1072364544 + .long 0 + .long 1072362496 + .long 0 + .long 1072359424 + .long 0 + .long 1072357376 + .long 0 + .long 1072355328 + .long 0 + .long 1072352256 + .long 0 + .long 1072350208 + .long 0 + .long 1072347136 + .long 0 + .long 1072345088 + .long 0 + .long 1072343040 + .long 0 + .long 1072340992 + .long 0 + .long 1072337920 + .long 0 + .long 1072335872 + .long 0 + .long 1072333824 + .long 0 + .long 1072330752 + .long 0 + .long 1072328704 + .long 0 + .long 1072326656 + .long 0 + .long 1072324608 + .long 0 + .long 1072321536 + .long 0 + .long 1072319488 + .long 0 + .long 1072317440 + .long 0 + .long 1072315392 + .long 0 + .long 1072313344 + .long 0 + .long 1072310272 + .long 0 + .long 1072308224 + .long 0 + .long 1072306176 + .long 0 + .long 1072304128 + .long 0 + .long 1072302080 + .long 0 + .long 1072300032 + .long 0 + .long 1072296960 + .long 0 + .long 1072294912 + .long 0 + .long 1072292864 + .long 0 + .long 1072290816 + .long 0 + .long 1072288768 + .long 0 + .long 1072286720 + .long 0 + .long 1072284672 + .long 0 + .long 1072282624 + .long 0 + .long 1072280576 + .long 0 + .long 1072278528 + .long 0 + .long 1072275456 + .long 0 + .long 1072273408 + .long 0 + .long 1072271360 + .long 0 + .long 1072269312 + .long 0 + .long 1072267264 + .long 0 + .long 1072265216 + .long 0 + .long 1072263168 + .long 0 + .long 1072261120 + .long 0 + .long 1072259072 + .long 0 + .long 1072257024 + .long 0 + .long 1072254976 + .long 0 + .long 1072252928 + .long 0 + .long 1072250880 + .long 0 + .long 1072248832 + .long 0 + .long 1072246784 + .long 0 + .long 1072244736 + .long 0 + .long 1072243712 + .long 0 + .long 1072241664 + .long 0 + .long 1072239616 + .long 0 + .long 1072237568 + .long 0 + .long 1072235520 + .long 0 + .long 1072233472 + .long 0 + .long 1072231424 + .long 0 + .long 1072229376 + .long 0 + .long 1072227328 + .long 0 + .long 1072225280 + .long 0 + .long 1072223232 + .long 0 + .long 1072222208 + .long 0 + .long 1072220160 + .long 0 + .long 1072218112 + .long 0 + .long 1072216064 + .long 0 + .long 1072214016 + .long 0 + .long 1072211968 + .long 0 + .long 1072210944 + .long 0 + .long 1072208896 + .long 0 + .long 1072206848 + .long 0 + .long 1072204800 + .long 0 + .long 1072202752 + 
.long 0 + .long 1072201728 + .long 0 + .long 1072199680 + .long 0 + .long 1072197632 + .long 0 + .long 1072195584 + .long 0 + .long 1072193536 + .long 0 + .long 1072192512 + .long 0 + .long 1072190464 + .long 0 + .long 1072188416 + .long 0 + .long 1072186368 + .long 0 + .long 1072185344 + .long 0 + .long 1072183296 + .long 0 + .long 1072181248 + .long 0 + .long 1072179200 + .long 0 + .long 1072178176 + .long 0 + .long 1072176128 + .long 0 + .long 1072174080 + .long 0 + .long 1072173056 + .long 0 + .long 1072171008 + .long 0 + .long 1072168960 + .long 0 + .long 1072167936 + .long 0 + .long 1072165888 + .long 0 + .long 1072163840 + .long 0 + .long 1072161792 + .long 0 + .long 1072160768 + .long 0 + .long 1072158720 + .long 0 + .long 1072157696 + .long 0 + .long 1072155648 + .long 0 + .long 1072153600 + .long 0 + .long 1072152576 + .long 0 + .long 1072150528 + .long 0 + .long 1072148480 + .long 0 + .long 1072147456 + .long 0 + .long 1072145408 + .long 0 + .long 1072143360 + .long 0 + .long 1072142336 + .long 0 + .long 1072140288 + .long 0 + .long 1072139264 + .long 0 + .long 1072137216 + .long 0 + .long 1072135168 + .long 0 + .long 1072134144 + .long 0 + .long 1072132096 + .long 0 + .long 1072131072 + .long 0 + .long 1072129024 + .long 0 + .long 1072128000 + .long 0 + .long 1072125952 + .long 0 + .long 1072124928 + .long 0 + .long 1072122880 + .long 0 + .long 1072120832 + .long 0 + .long 1072119808 + .long 0 + .long 1072117760 + .long 0 + .long 1072116736 + .long 0 + .long 1072114688 + .long 0 + .long 1072113664 + .long 0 + .long 1072111616 + .long 0 + .long 1072110592 + .long 0 + .long 1072108544 + .long 0 + .long 1072107520 + .long 0 + .long 1072105472 + .long 0 + .long 1072104448 + .long 0 + .long 1072102400 + .long 0 + .long 1072101376 + .long 0 + .long 1072099328 + .long 0 + .long 1072098304 + .long 0 + .long 1072096256 + .long 0 + .long 1072095232 + .long 0 + .long 1072094208 + .long 0 + .long 1072092160 + .long 0 + .long 1072091136 + .long 0 + .long 1072089088 + .long 0 + .long 1072088064 + .long 0 + .long 1072086016 + .long 0 + .long 1072084992 + .long 0 + .long 1072082944 + .long 0 + .long 1072081920 + .long 0 + .long 1072080896 + .long 0 + .long 1072078848 + .long 0 + .long 1072075776 + .long 0 + .long 1072073728 + .long 0 + .long 1072070656 + .long 0 + .long 1072067584 + .long 0 + .long 1072064512 + .long 0 + .long 1072061440 + .long 0 + .long 1072059392 + .long 0 + .long 1072056320 + .long 0 + .long 1072053248 + .long 0 + .long 1072051200 + .long 0 + .long 1072048128 + .long 0 + .long 1072045056 + .long 0 + .long 1072043008 + .long 0 + .long 1072039936 + .long 0 + .long 1072037888 + .long 0 + .long 1072034816 + .long 0 + .long 1072031744 + .long 0 + .long 1072029696 + .long 0 + .long 1072026624 + .long 0 + .long 1072024576 + .long 0 + .long 1072021504 + .long 0 + .long 1072019456 + .long 0 + .long 1072016384 + .long 0 + .long 1072014336 + .long 0 + .long 1072011264 + .long 0 + .long 1072009216 + .long 0 + .long 1072006144 + .long 0 + .long 1072004096 + .long 0 + .long 1072002048 + .long 0 + .long 1071998976 + .long 0 + .long 1071996928 + .long 0 + .long 1071993856 + .long 0 + .long 1071991808 + .long 0 + .long 1071989760 + .long 0 + .long 1071986688 + .long 0 + .long 1071984640 + .long 0 + .long 1071982592 + .long 0 + .long 1071979520 + .long 0 + .long 1071977472 + .long 0 + .long 1071975424 + .long 0 + .long 1071972352 + .long 0 + .long 1071970304 + .long 0 + .long 1071968256 + .long 0 + .long 1071966208 + .long 0 + .long 1071964160 + .long 0 + .long 1071961088 + .long 0 + .long 
1071959040 + .long 0 + .long 1071956992 + .long 0 + .long 1071954944 + .long 0 + .long 1071952896 + .long 0 + .long 1071949824 + .long 0 + .long 1071947776 + .long 0 + .long 1071945728 + .long 0 + .long 1071943680 + .long 0 + .long 1071941632 + .long 0 + .long 1071939584 + .long 0 + .long 1071937536 + .long 0 + .long 1071935488 + .long 0 + .long 1071933440 + .long 0 + .long 1071930368 + .long 0 + .long 1071928320 + .long 0 + .long 1071926272 + .long 0 + .long 1071924224 + .long 0 + .long 1071922176 + .long 0 + .long 1071920128 + .long 0 + .long 1071918080 + .long 0 + .long 1071916032 + .long 0 + .long 1071913984 + .long 0 + .long 1071911936 + .long 0 + .long 1071909888 + .long 0 + .long 1071907840 + .long 0 + .long 1071905792 + .long 0 + .long 1071903744 + .long 0 + .long 1071901696 + .long 0 + .long 1071900672 + .long 0 + .long 1071898624 + .long 0 + .long 1071896576 + .long 0 + .long 1071894528 + .long 0 + .long 1071892480 + .long 0 + .long 1071890432 + .long 0 + .long 1071888384 + .long 0 + .long 1071886336 + .long 0 + .long 1071884288 + .long 0 + .long 1071883264 + .long 0 + .long 1071881216 + .long 0 + .long 1071879168 + .long 0 + .long 1071877120 + .long 0 + .long 1071875072 + .long 0 + .long 1071873024 + .long 0 + .long 1071872000 + .long 0 + .long 1071869952 + .long 0 + .long 1071867904 + .long 0 + .long 1071865856 + .long 0 + .long 1071864832 + .long 0 + .long 1071862784 + .long 0 + .long 1071860736 + .long 0 + .long 1071858688 + .long 0 + .long 1071856640 + .long 0 + .long 1071855616 + .long 0 + .long 1071853568 + .long 0 + .long 1071851520 + .long 0 + .long 1071850496 + .long 0 + .long 1071848448 + .long 0 + .long 1071846400 + .long 0 + .long 1071844352 + .long 0 + .long 1071843328 + .long 0 + .long 1071841280 + .long 0 + .long 1071839232 + .long 0 + .long 1071838208 + .long 0 + .long 1071836160 + .long 0 + .long 1071834112 + .long 0 + .long 1071833088 + .long 0 + .long 1071831040 + .long 0 + .long 1071830016 + .long 0 + .long 1071827968 + .long 0 + .long 1071825920 + .long 0 + .long 1071824896 + .long 0 + .long 1071822848 + .long 0 + .long 1071821824 + .long 0 + .long 1071819776 + .long 0 + .long 1071817728 + .long 0 + .long 1071816704 + .long 0 + .long 1071814656 + .long 0 + .long 1071813632 + .long 0 + .long 1071811584 + .long 0 + .long 1071810560 + .long 0 + .long 1071808512 + .long 0 + .long 1071806464 + .long 0 + .long 1071805440 + .long 0 + .long 1071803392 + .long 0 + .long 1071802368 + .long 0 + .long 1071800320 + .long 0 + .long 1071799296 + .long 0 + .long 1071797248 + .long 0 + .long 1071796224 + .long 0 + .long 1071794176 + .long 0 + .long 1071793152 + .long 0 + .long 1071791104 + .long 0 + .long 1071790080 + .long 0 + .long 1071788032 + .long 0 + .long 1071787008 + .long 0 + .long 1071784960 + .long 0 + .long 1071783936 + .long 0 + .long 1071782912 + .long 0 + .long 1071780864 + .long 0 + .long 1071779840 + .long 0 + .long 1071777792 + .long 0 + .long 1071776768 + .long 0 + .long 1071774720 + .long 0 + .long 1071773696 + .long 0 + .long 1071772672 + .long 0 + .long 1071770624 + .long 0 + .long 1071769600 + .long 0 + .long 1071767552 + .long 0 + .long 1071766528 + .long 0 + .long 1071765504 + .long 0 + .long 1071763456 + .long 0 + .long 1071762432 + .long 0 + .long 1071760384 + .long 0 + .long 1071759360 + .long 0 + .long 1071758336 + .long 0 + .long 1071756288 + .long 0 + .long 1071755264 + .long 0 + .long 1071754240 + .long 0 + .long 1071752192 + .long 0 + .long 1071751168 + .long 0 + .long 1071750144 + .long 0 + .long 1071748096 + .long 0 + .long 1071747072 + 
.long 0 + .long 1071746048 + .long 0 + .long 1071744000 + .long 0 + .long 1071742976 + .long 0 + .long 1071741952 + .long 0 + .long 1071739904 + .long 0 + .long 1071738880 + .long 0 + .long 1071737856 + .long 0 + .long 1071736832 + .long 0 + .long 1071734784 + .long 0 + .long 1071733760 + .long 0 + .long 1071732736 + .long 0 + .long 1071730688 + .long 0 + .long 1071729664 + .long 0 + .long 1071728640 + .long 0 + .long 1071727616 + .long 0 + .long 1071725568 + .long 0 + .long 1071724544 + .long 0 + .long 1071723520 + .long 0 + .long 1071722496 + .long 0 + .long 1071720448 + .long 0 + .long 1071719424 + .long 0 + .long 1071718400 + .long 0 + .long 1071717376 + .long 0 + .long 1071715328 + .long 0 + .long 1071714304 + .long 0 + .long 1071713280 + .long 0 + .long 1071712256 + .long 0 + .long 1071711232 + .long 0 + .long 1071709184 + .long 0 + .long 1071708160 + .long 0 + .long 1071707136 + .long 0 + .long 1071706112 + .long 0 + .long 1071705088 + .long 0 + .long 1071704064 + .long 0 + .long 1071702016 + .long 0 + .long 1071700992 + .long 0 + .long 1071699968 + .long 0 + .long 1071698944 + .long 0 + .long 1071697920 + .long 0 + .long 1071696896 + .long 0 + .long 1071694848 + .long 0 + .long 1071693824 + .long 0 + .long 1071692800 + .long 0 + .long 1071691776 + .long 0 + .long 1071690752 + .long 0 + .long 1071689728 + .long 0 + .long 1071688704 + .long 0 + .long 1071686656 + .long 0 + .long 1071685632 + .long 0 + .long 1071684608 + .long 0 + .long 1071683584 + .long 0 + .long 1071682560 + .long 0 + .long 1071681536 + .long 0 + .long 1071680512 + .long 0 + .long 1071679488 + .long 0 + .long 1071677440 + .long 0 + .long 1071676416 + .long 0 + .long 1071675392 + .long 0 + .long 1071674368 + .long 0 + .long 1071673344 + .long 0 + .long 1071672320 + .long 0 + .long 1071671296 + .long 0 + .long 1071670272 + .long 0 + .long 1071669248 + .long 0 + .long 1071668224 + .long 0 + .long 1071667200 + .long 0 + .long 1071666176 + .long 0 + .long 1071665152 + .long 0 + .long 1071663104 + .long 0 + .long 1071662080 + .long 0 + .long 1071661056 + .long 0 + .long 1071660032 + .long 0 + .long 1071659008 + .long 0 + .long 1071657984 + .long 0 + .long 1071656960 + .long 0 + .long 1071655936 + .long 0 + .long 1071654912 + .long 0 + .long 1071653888 + .long 0 + .long 1071652864 + .long 0 + .long 1071651840 + .long 0 + .long 1071650816 + .long 0 + .long 1071649792 + .long 0 + .long 1071648768 + .long 0 + .long 1071647744 + .long 0 + .long 1071646720 + .long 0 + .long 1071645696 + .long 0 + .long 0 + .long 0 + .long 1071644672 + .long 0 + .long 1072693248 + .long 0 + .long 1073741824 + .long 33554432 + .long 1101004800 + .type _vmldHypotHATab,@object + .size _vmldHypotHATab,4136 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core-sse.S new file mode 100644 index 0000000000..5e7c75c44c --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core-sse.S @@ -0,0 +1,20 @@ +/* SSE version of vectorized hypot. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define _ZGVdN4vv_hypot _ZGVdN4vv_hypot_sse_wrapper +#include "../svml_d_hypot4_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core.c new file mode 100644 index 0000000000..06f34d35e1 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized hypot, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define SYMBOL_NAME _ZGVdN4vv_hypot +#include "ifunc-mathvec-avx2.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVdN4vv_hypot, __GI__ZGVdN4vv_hypot, + __redirect__ZGVdN4vv_hypot) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core_avx2.S new file mode 100644 index 0000000000..c612159b8b --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core_avx2.S @@ -0,0 +1,2162 @@ +/* Function hypot vectorized with AVX2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. 
*/ + +/* + * ALGORITHM DESCRIPTION: + * + * HIGH LEVEL OVERVIEW + * + * Calculate z = (x*x+y*y) + * Calculate reciprocal sqrt (z) + * Calculate error = z*(rsqrt(z)*rsqrt(z)) - 1 + * Calculate fixing part p with a polynomial + * Fix answer with sqrt(z) = z * rsqrt(z) + error * p * z + * + * ALGORITHM DETAILS + * + * Multiprecision branch for _HA_ only + * Remove sign from both arguments + * Find maximum (_x) and minimum (_y) (by abs value) between arguments + * Split _x into _a and _b for multiprecision + * If _x >> _y we will not split _y for multiprecision + * all _y will be put into lower part (_d) and higher part (_c = 0) + * Fixing _hilo_mask for the case _x >> _y + * Split _y into _c and _d for multiprecision with fixed mask + * + * compute Hi and Lo parts of _z = _x*_x + _y*_y + * + * _zHi = _a*_a + _c*_c + * _zLo = (_x + _a)*_b + _d*_y + _d*_c + * _z = _zHi + _zLo + * + * No multiprecision branch for _LA_ and _EP_ + * _z = _VARG1 * _VARG1 + _VARG2 * _VARG2 + * + * Check _z exponent to be within borders [3BC ; 441] else goto Callout + * + * _s ~ 1.0/sqrt(_z) + * _s2 ~ 1.0/(sqrt(_z)*sqrt(_z)) ~ 1.0/_z = (1.0/_z + O) + * _e[rror] = (1.0/_z + O) * _z - 1.0 + * calculate fixing part _p + * _p = (((_POLY_C5*_e + _POLY_C4)*_e +_POLY_C3)*_e +_POLY_C2)*_e + _POLY_C1 + * some parts of the polynomial are skipped for lower flavors + * + * result = _z * (1.0/sqrt(_z) + O) + _p * _e[rror] * _z + * + * + */ + +#include + + .text +ENTRY(_ZGVdN4vv_hypot_avx2) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $384, %rsp + vmovapd %ymm1, %ymm2 + vmovapd %ymm0, %ymm1 + +/* + * Defines + * Implementation + * Multiprecision branch for _HA_ only + * _z = _VARG1 * _VARG1 + _VARG2 * _VARG2 + */ + vmulpd %ymm1, %ymm1, %ymm0 + vmovups 576+__svml_dhypot_data_internal(%rip), %xmm4 + vmovups %ymm8, 32(%rsp) + vmovups %ymm13, 288(%rsp) + vmovups %ymm9, 96(%rsp) + vfmadd231pd %ymm2, %ymm2, %ymm0 + vmovups %ymm10, 160(%rsp) + .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + +/* + * calculate fixing part _p + * _p = (((_POLY_C5*_e + _POLY_C4)*_e +_POLY_C3)*_e +_POLY_C2)*_e + _POLY_C1 + * some parts of the polynomial are skipped for lower flavors + */ + vmovupd 256+__svml_dhypot_data_internal(%rip), %ymm10 + vmovups %ymm11, 224(%rsp) + vmovups %ymm12, 256(%rsp) + vmovups %ymm14, 320(%rsp) + vmovups %ymm15, 352(%rsp) + +/* Check _z exponent to be within borders [3BC ; 441] else goto Callout */ + vextractf128 $1, %ymm0, %xmm3 + vshufps $221, %xmm3, %xmm0, %xmm5 + vpcmpgtd 640+__svml_dhypot_data_internal(%rip), %xmm5, %xmm7 + vpcmpgtd %xmm5, %xmm4, %xmm6 + vpor %xmm7, %xmm6, %xmm4 + +/* + * _s ~ 1.0/sqrt(_z) + * _s2 ~ 1.0/(sqrt(_z)*sqrt(_z)) ~ 1.0/_z + */ + vcvtpd2ps %ymm0, %xmm7 + vpshufd $80, %xmm4, %xmm3 + vpshufd $250, %xmm4, %xmm5 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0,
0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x22 + vrsqrtps %xmm7, %xmm8 + vcvtps2pd %xmm8, %ymm13 + vmulpd %ymm13, %ymm13, %ymm9 + +/* _e[rror] ~ (1.0/_z + O) * _z - 1.0 */ + vfmsub213pd 128+__svml_dhypot_data_internal(%rip), %ymm0, %ymm9 + vfmadd213pd 320+__svml_dhypot_data_internal(%rip), %ymm9, %ymm10 + vfmadd213pd 384+__svml_dhypot_data_internal(%rip), %ymm9, %ymm10 + vfmadd213pd 448+__svml_dhypot_data_internal(%rip), %ymm9, %ymm10 + +/* result = _z * (1.0/sqrt(_z) + O) + _p * _e[rror] * _z */ + vmulpd %ymm10, %ymm9, %ymm11 + vmulpd %ymm11, %ymm13, %ymm12 + vmulpd %ymm12, %ymm0, %ymm14 + vfmadd213pd %ymm14, %ymm13, %ymm0 + vinsertf128 $1, %xmm5, %ymm3, %ymm6 + vmovmskpd %ymm6, %edx + +/* The end of implementation */ + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + vmovups 32(%rsp), %ymm8 + cfi_restore(91) + vmovups 96(%rsp), %ymm9 + cfi_restore(92) + vmovups 160(%rsp), %ymm10 + cfi_restore(93) + vmovups 224(%rsp), %ymm11 + cfi_restore(94) + vmovups 256(%rsp), %ymm12 + cfi_restore(95) + vmovups 288(%rsp), %ymm13 + cfi_restore(96) + vmovups 320(%rsp), %ymm14 + cfi_restore(97) + vmovups 352(%rsp), %ymm15 + cfi_restore(98) + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_3: + vmovupd %ymm1, 64(%rsp) + vmovupd %ymm2, 128(%rsp) + vmovupd %ymm0, 192(%rsp) + je .LBL_1_2 + xorl %eax, %eax + vzeroupper + movq %rsi, 8(%rsp) + movq %rdi, (%rsp) + movq %r12, 24(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 16(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + movl %edx, %r13d + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $4, %r12d + jl .LBL_1_7 + movq 8(%rsp), %rsi + cfi_restore(4) + movq (%rsp), %rdi + cfi_restore(5) + movq 24(%rsp), %r12 + cfi_restore(12) + movq 16(%rsp), %r13 + cfi_restore(13) + vmovupd 192(%rsp), %ymm0 + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 
0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 64(%rsp,%r12,8), %rdi + lea 128(%rsp,%r12,8), %rsi + lea 192(%rsp,%r12,8), %rdx + call __svml_dhypot_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVdN4vv_hypot_avx2) + + .align 16,0x90 + +__svml_dhypot_cout_rare_internal: + + cfi_startproc + + movq %rdx, %r8 + movzwl 6(%rdi), %eax + andl $32752, %eax + cmpl $32752, %eax + je .LBL_2_14 + movzwl 6(%rsi), %eax + andl $32752, %eax + cmpl $32752, %eax + je .LBL_2_13 + movsd (%rdi), %xmm2 + movsd 4096+_vmldHypotHATab(%rip), %xmm0 + movb 7(%rdi), %dl + movb 7(%rsi), %al + movsd (%rsi), %xmm1 + ucomisd %xmm0, %xmm2 + jp .LBL_2_4 + je .LBL_2_11 + +.LBL_2_4: + movsd %xmm2, -16(%rsp) + andb $127, %dl + movsd %xmm1, -48(%rsp) + andb $127, %al + movb %dl, -9(%rsp) + movb %al, -41(%rsp) + movsd -16(%rsp), %xmm8 + movsd -48(%rsp), %xmm1 + comisd %xmm8, %xmm1 + jbe .LBL_2_6 + movaps %xmm8, %xmm2 + movaps %xmm1, %xmm8 + movsd %xmm1, -16(%rsp) + movaps %xmm2, %xmm1 + +.LBL_2_6: + movzwl -10(%rsp), %edx + andl $32752, %edx + shrl $4, %edx + negl %edx + movzwl 4102+_vmldHypotHATab(%rip), %edi + andl $-32753, %edi + movsd %xmm0, -56(%rsp) + movsd 4128+_vmldHypotHATab(%rip), %xmm3 + lea 1025(%rdx), %esi + negl %esi + addl $1000, %esi + shrl $31, %esi + imull $-23, %esi, %eax + lea 1025(%rax,%rdx), %esi + lea 1023(%rsi), %ecx + andl $2047, %ecx + shll $4, %ecx + orl %ecx, %edi + movw %di, -50(%rsp) + movsd -56(%rsp), %xmm2 + mulsd %xmm2, %xmm8 + mulsd %xmm2, %xmm1 + mulsd %xmm8, %xmm3 + movsd %xmm3, -72(%rsp) + movsd -72(%rsp), %xmm4 + movsd %xmm8, -16(%rsp) + subsd %xmm8, %xmm4 + movsd %xmm4, -64(%rsp) + movsd -72(%rsp), %xmm6 + movsd -64(%rsp), %xmm5 + movsd %xmm1, -48(%rsp) + subsd %xmm5, %xmm6 + movsd %xmm6, -72(%rsp) + movsd -72(%rsp), %xmm7 + movzwl -10(%rsp), %r10d + subsd %xmm7, %xmm8 + movzwl -42(%rsp), %r9d + andl $32752, %r10d + andl $32752, %r9d + shrl $4, %r10d + shrl $4, %r9d + movsd %xmm8, -64(%rsp) + subl %r9d, %r10d + movsd -72(%rsp), %xmm8 + movsd -64(%rsp), %xmm4 + cmpl $6, %r10d + jle .LBL_2_8 + movaps %xmm1, %xmm2 + jmp .LBL_2_9 + +.LBL_2_8: + movsd -48(%rsp), %xmm1 + movsd 4128+_vmldHypotHATab(%rip), %xmm0 + movaps %xmm1, %xmm7 + mulsd %xmm1, %xmm0 + movsd %xmm0, -72(%rsp) + movsd -72(%rsp), %xmm2 + subsd -48(%rsp), %xmm2 + movsd %xmm2, -64(%rsp) + movsd -72(%rsp), %xmm5 + movsd -64(%rsp), %xmm3 + subsd %xmm3, %xmm5 + movsd %xmm5, -72(%rsp) + movsd -72(%rsp), %xmm6 + subsd %xmm6, %xmm7 + movsd %xmm7, -64(%rsp) + movsd -72(%rsp), %xmm0 + movsd -64(%rsp), %xmm2 + +.LBL_2_9: + movsd -16(%rsp), %xmm6 + movaps %xmm8, %xmm3 + mulsd %xmm2, %xmm1 + addsd %xmm8, %xmm6 + mulsd %xmm8, %xmm3 + mulsd %xmm6, %xmm4 + movaps %xmm0, %xmm5 + negl %esi + mulsd %xmm0, %xmm5 + addsd %xmm1, %xmm4 + mulsd %xmm2, %xmm0 + addsd %xmm5, %xmm3 + addsd %xmm0, %xmm4 + movaps %xmm3, %xmm7 + addl $1023, %esi + movq 4112+_vmldHypotHATab(%rip), %r11 + movq %r11, %r9 + lea _vmldHypotHATab(%rip), %rdx + addsd %xmm4, %xmm7 + movsd %xmm7, -56(%rsp) + andl $2047, %esi + movzwl -50(%rsp), %ecx + andl $32752, %ecx + shrl $4, %ecx + addl $-1023, %ecx + movl %ecx, %eax + andl $1, %eax + subl %eax, %ecx + shrl $1, %ecx + movsd %xmm7, -48(%rsp) + movzwl -42(%rsp), %edi + andl $-32753, %edi + shrq $48, %r9 + lea 1023(%rcx), %r10d + addl %ecx, %ecx + addl $16368, %edi + negl %ecx + andl 
$2047, %r10d + addl $1023, %ecx + andl $2047, %ecx + andl $-32753, %r9d + movw %di, -42(%rsp) + shll $4, %r10d + shll $4, %ecx + orl %r9d, %r10d + shll $4, %esi + orl %r9d, %ecx + movsd -48(%rsp), %xmm2 + orl %esi, %r9d + movl -44(%rsp), %esi + mulsd 4112(%rdx,%rax,8), %xmm2 + andl $1048575, %esi + shrl $12, %esi + shll $8, %eax + addl %eax, %esi + movsd (%rdx,%rsi,8), %xmm8 + movsd 4104+_vmldHypotHATab(%rip), %xmm1 + mulsd %xmm8, %xmm2 + mulsd %xmm8, %xmm1 + movaps %xmm2, %xmm9 + mulsd %xmm1, %xmm9 + movsd 4104+_vmldHypotHATab(%rip), %xmm11 + movsd 4104+_vmldHypotHATab(%rip), %xmm14 + subsd %xmm9, %xmm11 + movaps %xmm11, %xmm10 + mulsd %xmm2, %xmm11 + mulsd %xmm1, %xmm10 + addsd %xmm11, %xmm2 + addsd %xmm10, %xmm1 + movaps %xmm2, %xmm12 + movaps %xmm1, %xmm13 + mulsd %xmm1, %xmm12 + movsd 4104+_vmldHypotHATab(%rip), %xmm0 + subsd %xmm12, %xmm14 + mulsd %xmm14, %xmm13 + mulsd %xmm2, %xmm14 + addsd %xmm13, %xmm1 + addsd %xmm14, %xmm2 + movaps %xmm2, %xmm15 + movaps %xmm2, %xmm5 + mulsd %xmm1, %xmm15 + movsd 4128+_vmldHypotHATab(%rip), %xmm6 + subsd %xmm15, %xmm0 + mulsd %xmm0, %xmm5 + mulsd %xmm1, %xmm0 + addsd %xmm5, %xmm2 + addsd %xmm0, %xmm1 + mulsd %xmm2, %xmm6 + movsd %xmm6, -72(%rsp) + movaps %xmm2, %xmm11 + movsd -72(%rsp), %xmm7 + movq %r11, -32(%rsp) + subsd %xmm2, %xmm7 + movsd %xmm7, -64(%rsp) + movsd -72(%rsp), %xmm9 + movsd -64(%rsp), %xmm8 + movw %cx, -26(%rsp) + subsd %xmm8, %xmm9 + movsd %xmm9, -72(%rsp) + movsd -72(%rsp), %xmm10 + movsd -32(%rsp), %xmm15 + subsd %xmm10, %xmm11 + mulsd %xmm15, %xmm3 + mulsd %xmm15, %xmm4 + movsd %xmm11, -64(%rsp) + movsd -72(%rsp), %xmm13 + movsd 4120+_vmldHypotHATab(%rip), %xmm14 + movaps %xmm13, %xmm12 + mulsd %xmm13, %xmm12 + mulsd %xmm13, %xmm14 + subsd %xmm12, %xmm3 + movsd -64(%rsp), %xmm5 + mulsd %xmm5, %xmm14 + mulsd %xmm5, %xmm5 + subsd %xmm14, %xmm3 + movq %r11, -40(%rsp) + subsd %xmm5, %xmm3 + movw %r10w, -34(%rsp) + addsd %xmm4, %xmm3 + mulsd %xmm1, %xmm3 + movq %r11, -24(%rsp) + addsd %xmm3, %xmm2 + mulsd -40(%rsp), %xmm2 + movw %r9w, -18(%rsp) + mulsd -24(%rsp), %xmm2 + movsd %xmm2, (%r8) + +.LBL_2_10: + xorl %eax, %eax + ret + +.LBL_2_11: + ucomisd %xmm0, %xmm1 + jne .LBL_2_4 + jp .LBL_2_4 + movsd %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_13: + movsd (%rsi), %xmm0 + mulsd %xmm0, %xmm0 + movsd %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_14: + movzwl 6(%rsi), %eax + andl $32752, %eax + cmpl $32752, %eax + je .LBL_2_16 + +.LBL_2_15: + movsd (%rdi), %xmm0 + mulsd %xmm0, %xmm0 + movsd %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_16: + movl 4(%rdi), %edx + movl %edx, %eax + andl $1048575, %eax + jne .LBL_2_18 + cmpl $0, (%rdi) + je .LBL_2_23 + +.LBL_2_18: + testl $1048575, 4(%rsi) + jne .LBL_2_20 + cmpl $0, (%rsi) + je .LBL_2_21 + +.LBL_2_20: + movsd (%rdi), %xmm0 + mulsd (%rsi), %xmm0 + movsd %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_21: + testl %eax, %eax + jne .LBL_2_30 + cmpl $0, (%rdi) + je .LBL_2_24 + jmp .LBL_2_29 + +.LBL_2_23: + jne .LBL_2_29 + +.LBL_2_24: + movl 4(%rsi), %eax + testl $1048575, %eax + jne .LBL_2_26 + cmpl $0, (%rsi) + je .LBL_2_15 + +.LBL_2_26: + testl $524288, %eax + jne .LBL_2_15 + movsd 4112+_vmldHypotHATab(%rip), %xmm0 + mulsd (%rsi), %xmm0 + movsd %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_29: + je .LBL_2_13 + +.LBL_2_30: + testl $524288, %edx + jne .LBL_2_13 + movsd 4112+_vmldHypotHATab(%rip), %xmm0 + mulsd (%rdi), %xmm0 + movsd %xmm0, (%r8) + jmp .LBL_2_10 + + cfi_endproc + + .type __svml_dhypot_cout_rare_internal,@function + .size __svml_dhypot_cout_rare_internal,.-__svml_dhypot_cout_rare_internal + + .section .rodata, 
"a" + .align 64 + +__svml_dhypot_data_internal: + .long 0 + .long 4294950912 + .long 0 + .long 4294950912 + .long 0 + .long 4294950912 + .long 0 + .long 4294950912 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 3218046976 + .long 0 + .long 3218046976 + .long 0 + .long 3218046976 + .long 0 + .long 3218046976 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1070694400 + .long 0 + .long 1070694400 + .long 0 + .long 1070694400 + .long 0 + .long 1070694400 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 3218341888 + .long 0 + .long 3218341888 + .long 0 + .long 3218341888 + .long 0 + .long 3218341888 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1071120384 + .long 0 + .long 1071120384 + .long 0 + .long 1071120384 + .long 0 + .long 1071120384 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 3219128320 + .long 0 + .long 3219128320 + .long 0 + .long 3219128320 + .long 0 + .long 3219128320 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1082126336 + .long 0 + .long 1082126336 + .long 0 + .long 1082126336 + .long 0 + .long 1082126336 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1078951936 + .long 0 + .long 1078951936 + .long 0 + .long 1078951936 + .long 0 + .long 1078951936 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1071644672 + .long 0 + .long 1071644672 + .long 0 + .long 1071644672 + .long 0 + .long 1071644672 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_dhypot_data_internal,@object + .size __svml_dhypot_data_internal,896 + .align 32 + +_vmldHypotHATab: + .long 0 + .long 1072693248 + .long 0 + .long 1072689152 + .long 0 + .long 1072685056 + .long 0 + .long 1072680960 + .long 0 + .long 1072676864 + .long 0 + .long 1072672768 + .long 0 + .long 1072668672 + .long 0 + .long 1072665600 + .long 0 + .long 1072661504 + .long 0 + .long 1072657408 + .long 0 + .long 1072653312 + .long 0 + .long 1072649216 + .long 0 + .long 1072646144 + .long 0 + .long 1072642048 + .long 0 + .long 1072637952 + .long 0 + .long 1072634880 + .long 0 + .long 1072630784 + .long 0 + .long 1072626688 + .long 0 + .long 1072623616 + .long 0 + .long 1072619520 + .long 0 + .long 1072615424 + .long 0 + .long 1072612352 + .long 0 + .long 1072608256 + .long 0 + .long 1072605184 + .long 0 + .long 
1072601088 + .long 0 + .long 1072598016 + .long 0 + .long 1072593920 + .long 0 + .long 1072590848 + .long 0 + .long 1072586752 + .long 0 + .long 1072583680 + .long 0 + .long 1072580608 + .long 0 + .long 1072576512 + .long 0 + .long 1072573440 + .long 0 + .long 1072570368 + .long 0 + .long 1072566272 + .long 0 + .long 1072563200 + .long 0 + .long 1072560128 + .long 0 + .long 1072556032 + .long 0 + .long 1072552960 + .long 0 + .long 1072549888 + .long 0 + .long 1072546816 + .long 0 + .long 1072542720 + .long 0 + .long 1072539648 + .long 0 + .long 1072536576 + .long 0 + .long 1072533504 + .long 0 + .long 1072530432 + .long 0 + .long 1072527360 + .long 0 + .long 1072523264 + .long 0 + .long 1072520192 + .long 0 + .long 1072517120 + .long 0 + .long 1072514048 + .long 0 + .long 1072510976 + .long 0 + .long 1072507904 + .long 0 + .long 1072504832 + .long 0 + .long 1072501760 + .long 0 + .long 1072498688 + .long 0 + .long 1072495616 + .long 0 + .long 1072492544 + .long 0 + .long 1072489472 + .long 0 + .long 1072486400 + .long 0 + .long 1072483328 + .long 0 + .long 1072480256 + .long 0 + .long 1072478208 + .long 0 + .long 1072475136 + .long 0 + .long 1072472064 + .long 0 + .long 1072468992 + .long 0 + .long 1072465920 + .long 0 + .long 1072462848 + .long 0 + .long 1072459776 + .long 0 + .long 1072457728 + .long 0 + .long 1072454656 + .long 0 + .long 1072451584 + .long 0 + .long 1072448512 + .long 0 + .long 1072446464 + .long 0 + .long 1072443392 + .long 0 + .long 1072440320 + .long 0 + .long 1072437248 + .long 0 + .long 1072435200 + .long 0 + .long 1072432128 + .long 0 + .long 1072429056 + .long 0 + .long 1072427008 + .long 0 + .long 1072423936 + .long 0 + .long 1072420864 + .long 0 + .long 1072418816 + .long 0 + .long 1072415744 + .long 0 + .long 1072412672 + .long 0 + .long 1072410624 + .long 0 + .long 1072407552 + .long 0 + .long 1072405504 + .long 0 + .long 1072402432 + .long 0 + .long 1072400384 + .long 0 + .long 1072397312 + .long 0 + .long 1072395264 + .long 0 + .long 1072392192 + .long 0 + .long 1072390144 + .long 0 + .long 1072387072 + .long 0 + .long 1072385024 + .long 0 + .long 1072381952 + .long 0 + .long 1072379904 + .long 0 + .long 1072376832 + .long 0 + .long 1072374784 + .long 0 + .long 1072371712 + .long 0 + .long 1072369664 + .long 0 + .long 1072366592 + .long 0 + .long 1072364544 + .long 0 + .long 1072362496 + .long 0 + .long 1072359424 + .long 0 + .long 1072357376 + .long 0 + .long 1072355328 + .long 0 + .long 1072352256 + .long 0 + .long 1072350208 + .long 0 + .long 1072347136 + .long 0 + .long 1072345088 + .long 0 + .long 1072343040 + .long 0 + .long 1072340992 + .long 0 + .long 1072337920 + .long 0 + .long 1072335872 + .long 0 + .long 1072333824 + .long 0 + .long 1072330752 + .long 0 + .long 1072328704 + .long 0 + .long 1072326656 + .long 0 + .long 1072324608 + .long 0 + .long 1072321536 + .long 0 + .long 1072319488 + .long 0 + .long 1072317440 + .long 0 + .long 1072315392 + .long 0 + .long 1072313344 + .long 0 + .long 1072310272 + .long 0 + .long 1072308224 + .long 0 + .long 1072306176 + .long 0 + .long 1072304128 + .long 0 + .long 1072302080 + .long 0 + .long 1072300032 + .long 0 + .long 1072296960 + .long 0 + .long 1072294912 + .long 0 + .long 1072292864 + .long 0 + .long 1072290816 + .long 0 + .long 1072288768 + .long 0 + .long 1072286720 + .long 0 + .long 1072284672 + .long 0 + .long 1072282624 + .long 0 + .long 1072280576 + .long 0 + .long 1072278528 + .long 0 + .long 1072275456 + .long 0 + .long 1072273408 + .long 0 + .long 1072271360 + .long 0 + .long 1072269312 + 
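+/* Reading aid: each pair of .long values in _vmldHypotHATab is the low and
+   high 32-bit half (little-endian) of one IEEE-754 double; the first pair
+   (0, 1072693248) encodes 0x3FF0000000000000 = 1.0.  Within this file the
+   table is referenced only by __svml_dhypot_cout_rare_internal above.  */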
.long 0 + .long 1072267264 + .long 0 + .long 1072265216 + .long 0 + .long 1072263168 + .long 0 + .long 1072261120 + .long 0 + .long 1072259072 + .long 0 + .long 1072257024 + .long 0 + .long 1072254976 + .long 0 + .long 1072252928 + .long 0 + .long 1072250880 + .long 0 + .long 1072248832 + .long 0 + .long 1072246784 + .long 0 + .long 1072244736 + .long 0 + .long 1072243712 + .long 0 + .long 1072241664 + .long 0 + .long 1072239616 + .long 0 + .long 1072237568 + .long 0 + .long 1072235520 + .long 0 + .long 1072233472 + .long 0 + .long 1072231424 + .long 0 + .long 1072229376 + .long 0 + .long 1072227328 + .long 0 + .long 1072225280 + .long 0 + .long 1072223232 + .long 0 + .long 1072222208 + .long 0 + .long 1072220160 + .long 0 + .long 1072218112 + .long 0 + .long 1072216064 + .long 0 + .long 1072214016 + .long 0 + .long 1072211968 + .long 0 + .long 1072210944 + .long 0 + .long 1072208896 + .long 0 + .long 1072206848 + .long 0 + .long 1072204800 + .long 0 + .long 1072202752 + .long 0 + .long 1072201728 + .long 0 + .long 1072199680 + .long 0 + .long 1072197632 + .long 0 + .long 1072195584 + .long 0 + .long 1072193536 + .long 0 + .long 1072192512 + .long 0 + .long 1072190464 + .long 0 + .long 1072188416 + .long 0 + .long 1072186368 + .long 0 + .long 1072185344 + .long 0 + .long 1072183296 + .long 0 + .long 1072181248 + .long 0 + .long 1072179200 + .long 0 + .long 1072178176 + .long 0 + .long 1072176128 + .long 0 + .long 1072174080 + .long 0 + .long 1072173056 + .long 0 + .long 1072171008 + .long 0 + .long 1072168960 + .long 0 + .long 1072167936 + .long 0 + .long 1072165888 + .long 0 + .long 1072163840 + .long 0 + .long 1072161792 + .long 0 + .long 1072160768 + .long 0 + .long 1072158720 + .long 0 + .long 1072157696 + .long 0 + .long 1072155648 + .long 0 + .long 1072153600 + .long 0 + .long 1072152576 + .long 0 + .long 1072150528 + .long 0 + .long 1072148480 + .long 0 + .long 1072147456 + .long 0 + .long 1072145408 + .long 0 + .long 1072143360 + .long 0 + .long 1072142336 + .long 0 + .long 1072140288 + .long 0 + .long 1072139264 + .long 0 + .long 1072137216 + .long 0 + .long 1072135168 + .long 0 + .long 1072134144 + .long 0 + .long 1072132096 + .long 0 + .long 1072131072 + .long 0 + .long 1072129024 + .long 0 + .long 1072128000 + .long 0 + .long 1072125952 + .long 0 + .long 1072124928 + .long 0 + .long 1072122880 + .long 0 + .long 1072120832 + .long 0 + .long 1072119808 + .long 0 + .long 1072117760 + .long 0 + .long 1072116736 + .long 0 + .long 1072114688 + .long 0 + .long 1072113664 + .long 0 + .long 1072111616 + .long 0 + .long 1072110592 + .long 0 + .long 1072108544 + .long 0 + .long 1072107520 + .long 0 + .long 1072105472 + .long 0 + .long 1072104448 + .long 0 + .long 1072102400 + .long 0 + .long 1072101376 + .long 0 + .long 1072099328 + .long 0 + .long 1072098304 + .long 0 + .long 1072096256 + .long 0 + .long 1072095232 + .long 0 + .long 1072094208 + .long 0 + .long 1072092160 + .long 0 + .long 1072091136 + .long 0 + .long 1072089088 + .long 0 + .long 1072088064 + .long 0 + .long 1072086016 + .long 0 + .long 1072084992 + .long 0 + .long 1072082944 + .long 0 + .long 1072081920 + .long 0 + .long 1072080896 + .long 0 + .long 1072078848 + .long 0 + .long 1072075776 + .long 0 + .long 1072073728 + .long 0 + .long 1072070656 + .long 0 + .long 1072067584 + .long 0 + .long 1072064512 + .long 0 + .long 1072061440 + .long 0 + .long 1072059392 + .long 0 + .long 1072056320 + .long 0 + .long 1072053248 + .long 0 + .long 1072051200 + .long 0 + .long 1072048128 + .long 0 + .long 1072045056 + .long 0 + .long 
1072043008 + .long 0 + .long 1072039936 + .long 0 + .long 1072037888 + .long 0 + .long 1072034816 + .long 0 + .long 1072031744 + .long 0 + .long 1072029696 + .long 0 + .long 1072026624 + .long 0 + .long 1072024576 + .long 0 + .long 1072021504 + .long 0 + .long 1072019456 + .long 0 + .long 1072016384 + .long 0 + .long 1072014336 + .long 0 + .long 1072011264 + .long 0 + .long 1072009216 + .long 0 + .long 1072006144 + .long 0 + .long 1072004096 + .long 0 + .long 1072002048 + .long 0 + .long 1071998976 + .long 0 + .long 1071996928 + .long 0 + .long 1071993856 + .long 0 + .long 1071991808 + .long 0 + .long 1071989760 + .long 0 + .long 1071986688 + .long 0 + .long 1071984640 + .long 0 + .long 1071982592 + .long 0 + .long 1071979520 + .long 0 + .long 1071977472 + .long 0 + .long 1071975424 + .long 0 + .long 1071972352 + .long 0 + .long 1071970304 + .long 0 + .long 1071968256 + .long 0 + .long 1071966208 + .long 0 + .long 1071964160 + .long 0 + .long 1071961088 + .long 0 + .long 1071959040 + .long 0 + .long 1071956992 + .long 0 + .long 1071954944 + .long 0 + .long 1071952896 + .long 0 + .long 1071949824 + .long 0 + .long 1071947776 + .long 0 + .long 1071945728 + .long 0 + .long 1071943680 + .long 0 + .long 1071941632 + .long 0 + .long 1071939584 + .long 0 + .long 1071937536 + .long 0 + .long 1071935488 + .long 0 + .long 1071933440 + .long 0 + .long 1071930368 + .long 0 + .long 1071928320 + .long 0 + .long 1071926272 + .long 0 + .long 1071924224 + .long 0 + .long 1071922176 + .long 0 + .long 1071920128 + .long 0 + .long 1071918080 + .long 0 + .long 1071916032 + .long 0 + .long 1071913984 + .long 0 + .long 1071911936 + .long 0 + .long 1071909888 + .long 0 + .long 1071907840 + .long 0 + .long 1071905792 + .long 0 + .long 1071903744 + .long 0 + .long 1071901696 + .long 0 + .long 1071900672 + .long 0 + .long 1071898624 + .long 0 + .long 1071896576 + .long 0 + .long 1071894528 + .long 0 + .long 1071892480 + .long 0 + .long 1071890432 + .long 0 + .long 1071888384 + .long 0 + .long 1071886336 + .long 0 + .long 1071884288 + .long 0 + .long 1071883264 + .long 0 + .long 1071881216 + .long 0 + .long 1071879168 + .long 0 + .long 1071877120 + .long 0 + .long 1071875072 + .long 0 + .long 1071873024 + .long 0 + .long 1071872000 + .long 0 + .long 1071869952 + .long 0 + .long 1071867904 + .long 0 + .long 1071865856 + .long 0 + .long 1071864832 + .long 0 + .long 1071862784 + .long 0 + .long 1071860736 + .long 0 + .long 1071858688 + .long 0 + .long 1071856640 + .long 0 + .long 1071855616 + .long 0 + .long 1071853568 + .long 0 + .long 1071851520 + .long 0 + .long 1071850496 + .long 0 + .long 1071848448 + .long 0 + .long 1071846400 + .long 0 + .long 1071844352 + .long 0 + .long 1071843328 + .long 0 + .long 1071841280 + .long 0 + .long 1071839232 + .long 0 + .long 1071838208 + .long 0 + .long 1071836160 + .long 0 + .long 1071834112 + .long 0 + .long 1071833088 + .long 0 + .long 1071831040 + .long 0 + .long 1071830016 + .long 0 + .long 1071827968 + .long 0 + .long 1071825920 + .long 0 + .long 1071824896 + .long 0 + .long 1071822848 + .long 0 + .long 1071821824 + .long 0 + .long 1071819776 + .long 0 + .long 1071817728 + .long 0 + .long 1071816704 + .long 0 + .long 1071814656 + .long 0 + .long 1071813632 + .long 0 + .long 1071811584 + .long 0 + .long 1071810560 + .long 0 + .long 1071808512 + .long 0 + .long 1071806464 + .long 0 + .long 1071805440 + .long 0 + .long 1071803392 + .long 0 + .long 1071802368 + .long 0 + .long 1071800320 + .long 0 + .long 1071799296 + .long 0 + .long 1071797248 + .long 0 + .long 1071796224 + 
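+/* For reference, a scalar C model of the scheme the header comment describes
+   (and that the vector main path above implements): take a coarse reciprocal
+   square root estimate s of z = x*x + y*y, measure its error and correct it
+   with a small polynomial.  This is only a sketch; the helper name and the
+   coefficient array are illustrative, not part of the implementation:
+
+       static double sqrt_model (double z, double s, const double c[5])
+       {
+         // s ~ 1/sqrt(z), e.g. a hardware rsqrt estimate
+         double e = z * s * s - 1.0;   // relative error of the estimate
+         double p = (((c[4] * e + c[3]) * e + c[2]) * e + c[1]) * e + c[0];
+         return z * s + p * e * z;     // ~ sqrt(z); hypot when z = x*x + y*y
+       }
+
+   The broadcast constants consumed by that path (1.0, the polynomial
+   coefficients, and the exponent-range bounds 0x3BC00000/0x44100000 that
+   implement the [3BC ; 441] check) live in __svml_dhypot_data_internal
+   above.  */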
.long 0 + .long 1071794176 + .long 0 + .long 1071793152 + .long 0 + .long 1071791104 + .long 0 + .long 1071790080 + .long 0 + .long 1071788032 + .long 0 + .long 1071787008 + .long 0 + .long 1071784960 + .long 0 + .long 1071783936 + .long 0 + .long 1071782912 + .long 0 + .long 1071780864 + .long 0 + .long 1071779840 + .long 0 + .long 1071777792 + .long 0 + .long 1071776768 + .long 0 + .long 1071774720 + .long 0 + .long 1071773696 + .long 0 + .long 1071772672 + .long 0 + .long 1071770624 + .long 0 + .long 1071769600 + .long 0 + .long 1071767552 + .long 0 + .long 1071766528 + .long 0 + .long 1071765504 + .long 0 + .long 1071763456 + .long 0 + .long 1071762432 + .long 0 + .long 1071760384 + .long 0 + .long 1071759360 + .long 0 + .long 1071758336 + .long 0 + .long 1071756288 + .long 0 + .long 1071755264 + .long 0 + .long 1071754240 + .long 0 + .long 1071752192 + .long 0 + .long 1071751168 + .long 0 + .long 1071750144 + .long 0 + .long 1071748096 + .long 0 + .long 1071747072 + .long 0 + .long 1071746048 + .long 0 + .long 1071744000 + .long 0 + .long 1071742976 + .long 0 + .long 1071741952 + .long 0 + .long 1071739904 + .long 0 + .long 1071738880 + .long 0 + .long 1071737856 + .long 0 + .long 1071736832 + .long 0 + .long 1071734784 + .long 0 + .long 1071733760 + .long 0 + .long 1071732736 + .long 0 + .long 1071730688 + .long 0 + .long 1071729664 + .long 0 + .long 1071728640 + .long 0 + .long 1071727616 + .long 0 + .long 1071725568 + .long 0 + .long 1071724544 + .long 0 + .long 1071723520 + .long 0 + .long 1071722496 + .long 0 + .long 1071720448 + .long 0 + .long 1071719424 + .long 0 + .long 1071718400 + .long 0 + .long 1071717376 + .long 0 + .long 1071715328 + .long 0 + .long 1071714304 + .long 0 + .long 1071713280 + .long 0 + .long 1071712256 + .long 0 + .long 1071711232 + .long 0 + .long 1071709184 + .long 0 + .long 1071708160 + .long 0 + .long 1071707136 + .long 0 + .long 1071706112 + .long 0 + .long 1071705088 + .long 0 + .long 1071704064 + .long 0 + .long 1071702016 + .long 0 + .long 1071700992 + .long 0 + .long 1071699968 + .long 0 + .long 1071698944 + .long 0 + .long 1071697920 + .long 0 + .long 1071696896 + .long 0 + .long 1071694848 + .long 0 + .long 1071693824 + .long 0 + .long 1071692800 + .long 0 + .long 1071691776 + .long 0 + .long 1071690752 + .long 0 + .long 1071689728 + .long 0 + .long 1071688704 + .long 0 + .long 1071686656 + .long 0 + .long 1071685632 + .long 0 + .long 1071684608 + .long 0 + .long 1071683584 + .long 0 + .long 1071682560 + .long 0 + .long 1071681536 + .long 0 + .long 1071680512 + .long 0 + .long 1071679488 + .long 0 + .long 1071677440 + .long 0 + .long 1071676416 + .long 0 + .long 1071675392 + .long 0 + .long 1071674368 + .long 0 + .long 1071673344 + .long 0 + .long 1071672320 + .long 0 + .long 1071671296 + .long 0 + .long 1071670272 + .long 0 + .long 1071669248 + .long 0 + .long 1071668224 + .long 0 + .long 1071667200 + .long 0 + .long 1071666176 + .long 0 + .long 1071665152 + .long 0 + .long 1071663104 + .long 0 + .long 1071662080 + .long 0 + .long 1071661056 + .long 0 + .long 1071660032 + .long 0 + .long 1071659008 + .long 0 + .long 1071657984 + .long 0 + .long 1071656960 + .long 0 + .long 1071655936 + .long 0 + .long 1071654912 + .long 0 + .long 1071653888 + .long 0 + .long 1071652864 + .long 0 + .long 1071651840 + .long 0 + .long 1071650816 + .long 0 + .long 1071649792 + .long 0 + .long 1071648768 + .long 0 + .long 1071647744 + .long 0 + .long 1071646720 + .long 0 + .long 1071645696 + .long 0 + .long 0 + .long 0 + .long 1071644672 + .long 0 + .long 
1072693248 + .long 0 + .long 1073741824 + .long 33554432 + .long 1101004800 + .type _vmldHypotHATab,@object + .size _vmldHypotHATab,4136 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core-avx2.S new file mode 100644 index 0000000000..a53e82cf9a --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core-avx2.S @@ -0,0 +1,20 @@ +/* AVX2 version of vectorized hypot. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define _ZGVeN8vv_hypot _ZGVeN8vv_hypot_avx2_wrapper +#include "../svml_d_hypot8_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core.c new file mode 100644 index 0000000000..6052c752c9 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized hypot, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define SYMBOL_NAME _ZGVeN8vv_hypot +#include "ifunc-mathvec-avx512-skx.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVeN8vv_hypot, __GI__ZGVeN8vv_hypot, + __redirect__ZGVeN8vv_hypot) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core_avx512.S new file mode 100644 index 0000000000..e14b8bd210 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core_avx512.S @@ -0,0 +1,1775 @@ +/* Function hypot vectorized with AVX-512. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * HIGH LEVEL OVERVIEW + * + * Calculate z = (x*x+y*y) + * Calculate reciprocal sqrt (z) + * Calculate error = z*(rsqrt(z)*rsqrt(z)) - 1 + * Calculate fixing part p with a polynomial + * Fix answer with sqrt(z) = z * rsqrt(z) + error * p * z + * + * ALGORITHM DETAILS + * + * Multiprecision branch for _HA_ only + * Remove sign from both arguments + * Find maximum (_x) and minimum (_y) (by abs value) between arguments + * Split _x into _a and _b for multiprecision + * If _x >> _y we will not split _y for multiprecision + * all _y will be put into lower part (_d) and higher part (_c = 0) + * Fixing _hilo_mask for the case _x >> _y + * Split _y into _c and _d for multiprecision with fixed mask + * + * compute Hi and Lo parts of _z = _x*_x + _y*_y + * + * _zHi = _a*_a + _c*_c + * _zLo = (_x + _a)*_b + _d*_y + _d*_c + * _z = _zHi + _zLo + * + * No multiprecision branch for _LA_ and _EP_ + * _z = _VARG1 * _VARG1 + _VARG2 * _VARG2 + * + * Check _z exponent to be within borders [3BC ; 441] else goto Callout + * + * _s ~ 1.0/sqrt(_z) + * _s2 ~ 1.0/(sqrt(_z)*sqrt(_z)) ~ 1.0/_z = (1.0/_z + O) + * _e[rror] = (1.0/_z + O) * _z - 1.0 + * calculate fixing part _p + * _p = (((_POLY_C5*_e + _POLY_C4)*_e +_POLY_C3)*_e +_POLY_C2)*_e + _POLY_C1 + * some parts of the polynomial are skipped for lower flavors + * + * result = _z * (1.0/sqrt(_z) + O) + _p * _e[rror] * _z + * + * + */ + +#include + + .text +ENTRY(_ZGVeN8vv_hypot_skx) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $256, %rsp + vgetexppd {sae}, %zmm0, %zmm2 + vgetexppd {sae}, %zmm1, %zmm3 + vmovups 832+__svml_dhypot_data_internal(%rip), %zmm9 + vmaxpd {sae}, %zmm3, %zmm2, %zmm4 + vmulpd {rn-sae}, %zmm0, %zmm0, %zmm2 + vandpd 64+__svml_dhypot_data_internal(%rip), %zmm4, %zmm5 + vfmadd231pd {rn-sae}, %zmm1, %zmm1, %zmm2 + +/* Select exponent bound so that no scaling is needed */ + vpcmpq $5, 704+__svml_dhypot_data_internal(%rip), %zmm5, %k0 + vrsqrt14pd %zmm2, %zmm6 + kmovw %k0, %edx + vmulpd {rn-sae}, %zmm6, %zmm2, %zmm7 + vmulpd {rn-sae}, %zmm6, %zmm9, %zmm8 + vfnmadd231pd {rn-sae}, %zmm7, %zmm8, %zmm9 + vfmadd231pd {rn-sae}, %zmm9, %zmm8, %zmm8 + vfmadd213pd {rn-sae}, %zmm7, %zmm7, %zmm9 + vfnmadd231pd {rn-sae}, %zmm9, %zmm9, %zmm2 + vfmadd213pd {rn-sae}, %zmm9, %zmm8, %zmm2 + +/* The end of implementation */ + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + vmovaps %zmm2, %zmm0 + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + +.LBL_1_3: + vmovups %zmm0, 64(%rsp) + vmovups %zmm1, 128(%rsp) + vmovups %zmm2, 192(%rsp) + je .LBL_1_2 + xorl %eax, %eax + vzeroupper + kmovw %k4, 24(%rsp) + kmovw %k5, 16(%rsp) + kmovw %k6, 8(%rsp) + kmovw %k7, (%rsp) + movq %rsi, 40(%rsp) + movq %rdi, 32(%rsp) + movq %r12, 56(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x28, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x38, 0xff, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 48(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff,
0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + movl %edx, %r13d + .cfi_escape 0x10, 0xfa, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x18, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $8, %r12d + jl .LBL_1_7 + kmovw 24(%rsp), %k4 + cfi_restore(122) + kmovw 16(%rsp), %k5 + cfi_restore(123) + kmovw 8(%rsp), %k6 + cfi_restore(124) + kmovw (%rsp), %k7 + cfi_restore(125) + vmovups 192(%rsp), %zmm2 + movq 40(%rsp), %rsi + cfi_restore(4) + movq 32(%rsp), %rdi + cfi_restore(5) + movq 56(%rsp), %r12 + cfi_restore(12) + movq 48(%rsp), %r13 + cfi_restore(13) + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x28, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x38, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfa, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x18, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 64(%rsp,%r12,8), %rdi + lea 128(%rsp,%r12,8), %rsi + lea 192(%rsp,%r12,8), %rdx + call __svml_dhypot_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVeN8vv_hypot_skx) + + .align 16,0x90 + +__svml_dhypot_cout_rare_internal: + + cfi_startproc + + movq %rdx, %r8 + movzwl 6(%rdi), %eax + andl $32752, %eax + cmpl $32752, %eax + je .LBL_2_14 + movzwl 6(%rsi), %eax + andl $32752, %eax + cmpl $32752, %eax + je .LBL_2_13 + movsd (%rdi), %xmm2 + movsd 4096+_vmldHypotHATab(%rip), %xmm0 + movb 7(%rdi), %dl + movb 7(%rsi), %al + movsd (%rsi), %xmm1 + ucomisd %xmm0, %xmm2 + jp .LBL_2_4 + je .LBL_2_11 + +.LBL_2_4: + movsd %xmm2, -16(%rsp) + andb $127, %dl + movsd %xmm1, -48(%rsp) + andb $127, %al + movb %dl, -9(%rsp) + movb %al, -41(%rsp) + movsd -16(%rsp), %xmm8 + movsd -48(%rsp), %xmm1 + comisd %xmm8, %xmm1 + jbe .LBL_2_6 + movaps %xmm8, %xmm2 + movaps %xmm1, %xmm8 + movsd %xmm1, -16(%rsp) + movaps %xmm2, %xmm1 + +.LBL_2_6: + movzwl -10(%rsp), %edx + andl $32752, %edx + shrl $4, %edx + negl %edx + movzwl 4102+_vmldHypotHATab(%rip), %edi + andl $-32753, %edi + movsd %xmm0, -56(%rsp) + movsd 4128+_vmldHypotHATab(%rip), %xmm3 + lea 1025(%rdx), %esi + negl %esi + addl $1000, %esi + shrl $31, %esi + imull $-23, %esi, %eax + lea 1025(%rax,%rdx), %esi + lea 1023(%rsi), %ecx + andl $2047, %ecx + shll $4, %ecx + orl %ecx, %edi + movw %di, -50(%rsp) + movsd -56(%rsp), %xmm2 + mulsd %xmm2, %xmm8 + mulsd %xmm2, %xmm1 + mulsd %xmm8, %xmm3 + movsd %xmm3, -72(%rsp) + movsd -72(%rsp), %xmm4 + movsd %xmm8, -16(%rsp) + subsd %xmm8, %xmm4 + movsd %xmm4, -64(%rsp) + movsd -72(%rsp), %xmm6 + 
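+/* The store/reload/subtract sequence around this point appears to implement
+   the multiprecision split from the header comment ("Split _x into _a and
+   _b"): the double loaded from 4128+_vmldHypotHATab is 2^27 + 1, the usual
+   Veltkamp splitter.  In scalar C terms (illustrative only):
+
+       double C  = 0x1p27 + 1.0;
+       double t  = C * x;
+       double hi = t - (t - x);   // upper half of x's significand
+       double lo = x - hi;        // remainder; x == hi + lo exactly
+
+   so that x*x can then be accumulated from products of the halves with
+   little or no rounding error.  */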
movsd -64(%rsp), %xmm5 + movsd %xmm1, -48(%rsp) + subsd %xmm5, %xmm6 + movsd %xmm6, -72(%rsp) + movsd -72(%rsp), %xmm7 + movzwl -10(%rsp), %r10d + subsd %xmm7, %xmm8 + movzwl -42(%rsp), %r9d + andl $32752, %r10d + andl $32752, %r9d + shrl $4, %r10d + shrl $4, %r9d + movsd %xmm8, -64(%rsp) + subl %r9d, %r10d + movsd -72(%rsp), %xmm8 + movsd -64(%rsp), %xmm4 + cmpl $6, %r10d + jle .LBL_2_8 + movaps %xmm1, %xmm2 + jmp .LBL_2_9 + +.LBL_2_8: + movsd -48(%rsp), %xmm1 + movsd 4128+_vmldHypotHATab(%rip), %xmm0 + movaps %xmm1, %xmm7 + mulsd %xmm1, %xmm0 + movsd %xmm0, -72(%rsp) + movsd -72(%rsp), %xmm2 + subsd -48(%rsp), %xmm2 + movsd %xmm2, -64(%rsp) + movsd -72(%rsp), %xmm5 + movsd -64(%rsp), %xmm3 + subsd %xmm3, %xmm5 + movsd %xmm5, -72(%rsp) + movsd -72(%rsp), %xmm6 + subsd %xmm6, %xmm7 + movsd %xmm7, -64(%rsp) + movsd -72(%rsp), %xmm0 + movsd -64(%rsp), %xmm2 + +.LBL_2_9: + movsd -16(%rsp), %xmm6 + movaps %xmm8, %xmm3 + mulsd %xmm2, %xmm1 + addsd %xmm8, %xmm6 + mulsd %xmm8, %xmm3 + mulsd %xmm6, %xmm4 + movaps %xmm0, %xmm5 + negl %esi + mulsd %xmm0, %xmm5 + addsd %xmm1, %xmm4 + mulsd %xmm2, %xmm0 + addsd %xmm5, %xmm3 + addsd %xmm0, %xmm4 + movaps %xmm3, %xmm7 + addl $1023, %esi + movq 4112+_vmldHypotHATab(%rip), %r11 + movq %r11, %r9 + lea _vmldHypotHATab(%rip), %rdx + addsd %xmm4, %xmm7 + movsd %xmm7, -56(%rsp) + andl $2047, %esi + movzwl -50(%rsp), %ecx + andl $32752, %ecx + shrl $4, %ecx + addl $-1023, %ecx + movl %ecx, %eax + andl $1, %eax + subl %eax, %ecx + shrl $1, %ecx + movsd %xmm7, -48(%rsp) + movzwl -42(%rsp), %edi + andl $-32753, %edi + shrq $48, %r9 + lea 1023(%rcx), %r10d + addl %ecx, %ecx + addl $16368, %edi + negl %ecx + andl $2047, %r10d + addl $1023, %ecx + andl $2047, %ecx + andl $-32753, %r9d + movw %di, -42(%rsp) + shll $4, %r10d + shll $4, %ecx + orl %r9d, %r10d + shll $4, %esi + orl %r9d, %ecx + movsd -48(%rsp), %xmm2 + orl %esi, %r9d + movl -44(%rsp), %esi + mulsd 4112(%rdx,%rax,8), %xmm2 + andl $1048575, %esi + shrl $12, %esi + shll $8, %eax + addl %eax, %esi + movsd (%rdx,%rsi,8), %xmm8 + movsd 4104+_vmldHypotHATab(%rip), %xmm1 + mulsd %xmm8, %xmm2 + mulsd %xmm8, %xmm1 + movaps %xmm2, %xmm9 + mulsd %xmm1, %xmm9 + movsd 4104+_vmldHypotHATab(%rip), %xmm11 + movsd 4104+_vmldHypotHATab(%rip), %xmm14 + subsd %xmm9, %xmm11 + movaps %xmm11, %xmm10 + mulsd %xmm2, %xmm11 + mulsd %xmm1, %xmm10 + addsd %xmm11, %xmm2 + addsd %xmm10, %xmm1 + movaps %xmm2, %xmm12 + movaps %xmm1, %xmm13 + mulsd %xmm1, %xmm12 + movsd 4104+_vmldHypotHATab(%rip), %xmm0 + subsd %xmm12, %xmm14 + mulsd %xmm14, %xmm13 + mulsd %xmm2, %xmm14 + addsd %xmm13, %xmm1 + addsd %xmm14, %xmm2 + movaps %xmm2, %xmm15 + movaps %xmm2, %xmm5 + mulsd %xmm1, %xmm15 + movsd 4128+_vmldHypotHATab(%rip), %xmm6 + subsd %xmm15, %xmm0 + mulsd %xmm0, %xmm5 + mulsd %xmm1, %xmm0 + addsd %xmm5, %xmm2 + addsd %xmm0, %xmm1 + mulsd %xmm2, %xmm6 + movsd %xmm6, -72(%rsp) + movaps %xmm2, %xmm11 + movsd -72(%rsp), %xmm7 + movq %r11, -32(%rsp) + subsd %xmm2, %xmm7 + movsd %xmm7, -64(%rsp) + movsd -72(%rsp), %xmm9 + movsd -64(%rsp), %xmm8 + movw %cx, -26(%rsp) + subsd %xmm8, %xmm9 + movsd %xmm9, -72(%rsp) + movsd -72(%rsp), %xmm10 + movsd -32(%rsp), %xmm15 + subsd %xmm10, %xmm11 + mulsd %xmm15, %xmm3 + mulsd %xmm15, %xmm4 + movsd %xmm11, -64(%rsp) + movsd -72(%rsp), %xmm13 + movsd 4120+_vmldHypotHATab(%rip), %xmm14 + movaps %xmm13, %xmm12 + mulsd %xmm13, %xmm12 + mulsd %xmm13, %xmm14 + subsd %xmm12, %xmm3 + movsd -64(%rsp), %xmm5 + mulsd %xmm5, %xmm14 + mulsd %xmm5, %xmm5 + subsd %xmm14, %xmm3 + movq %r11, -40(%rsp) + subsd %xmm5, %xmm3 + movw 
%r10w, -34(%rsp) + addsd %xmm4, %xmm3 + mulsd %xmm1, %xmm3 + movq %r11, -24(%rsp) + addsd %xmm3, %xmm2 + mulsd -40(%rsp), %xmm2 + movw %r9w, -18(%rsp) + mulsd -24(%rsp), %xmm2 + movsd %xmm2, (%r8) + +.LBL_2_10: + xorl %eax, %eax + ret + +.LBL_2_11: + ucomisd %xmm0, %xmm1 + jne .LBL_2_4 + jp .LBL_2_4 + movsd %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_13: + movsd (%rsi), %xmm0 + mulsd %xmm0, %xmm0 + movsd %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_14: + movzwl 6(%rsi), %eax + andl $32752, %eax + cmpl $32752, %eax + je .LBL_2_16 + +.LBL_2_15: + movsd (%rdi), %xmm0 + mulsd %xmm0, %xmm0 + movsd %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_16: + movl 4(%rdi), %edx + movl %edx, %eax + andl $1048575, %eax + jne .LBL_2_18 + cmpl $0, (%rdi) + je .LBL_2_23 + +.LBL_2_18: + testl $1048575, 4(%rsi) + jne .LBL_2_20 + cmpl $0, (%rsi) + je .LBL_2_21 + +.LBL_2_20: + movsd (%rdi), %xmm0 + mulsd (%rsi), %xmm0 + movsd %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_21: + testl %eax, %eax + jne .LBL_2_30 + cmpl $0, (%rdi) + je .LBL_2_24 + jmp .LBL_2_29 + +.LBL_2_23: + jne .LBL_2_29 + +.LBL_2_24: + movl 4(%rsi), %eax + testl $1048575, %eax + jne .LBL_2_26 + cmpl $0, (%rsi) + je .LBL_2_15 + +.LBL_2_26: + testl $524288, %eax + jne .LBL_2_15 + movsd 4112+_vmldHypotHATab(%rip), %xmm0 + mulsd (%rsi), %xmm0 + movsd %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_29: + je .LBL_2_13 + +.LBL_2_30: + testl $524288, %edx + jne .LBL_2_13 + movsd 4112+_vmldHypotHATab(%rip), %xmm0 + mulsd (%rdi), %xmm0 + movsd %xmm0, (%r8) + jmp .LBL_2_10 + + cfi_endproc + + .type __svml_dhypot_cout_rare_internal,@function + .size __svml_dhypot_cout_rare_internal,.-__svml_dhypot_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_dhypot_data_internal: + .long 0 + .long 4294950912 + .long 0 + .long 4294950912 + .long 0 + .long 4294950912 + .long 0 + .long 4294950912 + .long 0 + .long 4294950912 + .long 0 + .long 4294950912 + .long 0 + .long 4294950912 + .long 0 + .long 4294950912 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 4294967295 + .long 2147483647 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 3218046976 + .long 0 + .long 3218046976 + .long 0 + .long 3218046976 + .long 0 + .long 3218046976 + .long 0 + .long 3218046976 + .long 0 + .long 3218046976 + .long 0 + .long 3218046976 + .long 0 + .long 3218046976 + .long 0 + .long 1070694400 + .long 0 + .long 1070694400 + .long 0 + .long 1070694400 + .long 0 + .long 1070694400 + .long 0 + .long 1070694400 + .long 0 + .long 1070694400 + .long 0 + .long 1070694400 + .long 0 + .long 1070694400 + .long 0 + .long 3218341888 + .long 0 + .long 3218341888 + .long 0 + .long 3218341888 + .long 0 + .long 3218341888 + .long 0 + .long 3218341888 + .long 0 + .long 3218341888 + .long 0 + .long 3218341888 + .long 0 + .long 3218341888 + .long 0 + .long 1071120384 + .long 0 + .long 1071120384 + .long 0 + .long 1071120384 + .long 0 + .long 1071120384 + .long 0 + .long 1071120384 + .long 0 + .long 1071120384 + .long 0 + .long 1071120384 + .long 0 + .long 1071120384 + .long 0 + .long 3219128320 + .long 0 + .long 3219128320 + .long 0 + .long 3219128320 + .long 0 + .long 3219128320 + .long 0 + .long 3219128320 + 
.long 0 + .long 3219128320 + .long 0 + .long 3219128320 + .long 0 + .long 3219128320 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .long 6291456 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1002438656 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 1141899264 + .long 0 + .long 1082126336 + .long 0 + .long 1082126336 + .long 0 + .long 1082126336 + .long 0 + .long 1082126336 + .long 0 + .long 1082126336 + .long 0 + .long 1082126336 + .long 0 + .long 1082126336 + .long 0 + .long 1082126336 + .long 0 + .long 1078951936 + .long 0 + .long 1078951936 + .long 0 + .long 1078951936 + .long 0 + .long 1078951936 + .long 0 + .long 1078951936 + .long 0 + .long 1078951936 + .long 0 + .long 1078951936 + .long 0 + .long 1078951936 + .long 0 + .long 1071644672 + .long 0 + .long 1071644672 + .long 0 + .long 1071644672 + .long 0 + .long 1071644672 + .long 0 + .long 1071644672 + .long 0 + .long 1071644672 + .long 0 + .long 1071644672 + .long 0 + .long 1071644672 + .type __svml_dhypot_data_internal,@object + .size __svml_dhypot_data_internal,896 + .align 32 + +_vmldHypotHATab: + .long 0 + .long 1072693248 + .long 0 + .long 1072689152 + .long 0 + .long 1072685056 + .long 0 + .long 1072680960 + .long 0 + .long 1072676864 + .long 0 + .long 1072672768 + .long 0 + .long 1072668672 + .long 0 + .long 1072665600 + .long 0 + .long 1072661504 + .long 0 + .long 1072657408 + .long 0 + .long 1072653312 + .long 0 + .long 1072649216 + .long 0 + .long 1072646144 + .long 0 + .long 1072642048 + .long 0 + .long 1072637952 + .long 0 + .long 1072634880 + .long 0 + .long 1072630784 + .long 0 + .long 1072626688 + .long 0 + .long 1072623616 + .long 0 + .long 1072619520 + .long 0 + .long 1072615424 + .long 0 + .long 1072612352 + .long 0 + .long 1072608256 + .long 0 + .long 1072605184 + .long 0 + .long 1072601088 + .long 0 + .long 1072598016 + .long 0 + .long 1072593920 + .long 0 + .long 1072590848 + .long 0 + .long 1072586752 + .long 0 + .long 1072583680 + .long 0 + .long 1072580608 + .long 0 + .long 1072576512 + .long 0 + .long 1072573440 + .long 0 + .long 1072570368 + .long 0 + .long 1072566272 + .long 0 + .long 1072563200 + .long 0 + .long 1072560128 + .long 0 + .long 1072556032 + .long 0 + .long 1072552960 + .long 0 + .long 1072549888 + .long 0 + .long 1072546816 + .long 0 + .long 1072542720 + .long 0 + .long 1072539648 + .long 0 + .long 1072536576 + .long 0 + .long 1072533504 + .long 0 + .long 1072530432 + .long 0 + .long 1072527360 + .long 0 + .long 1072523264 + .long 0 + .long 1072520192 + .long 0 + .long 1072517120 + .long 0 + .long 1072514048 + .long 0 + .long 1072510976 + .long 0 + .long 1072507904 + .long 0 + .long 1072504832 + .long 0 + .long 1072501760 + .long 0 + .long 1072498688 + .long 0 + .long 1072495616 + .long 0 + .long 1072492544 + .long 0 + .long 1072489472 + .long 0 + .long 1072486400 + .long 0 + .long 1072483328 + .long 0 + .long 
1072480256 + .long 0 + .long 1072478208 + .long 0 + .long 1072475136 + .long 0 + .long 1072472064 + .long 0 + .long 1072468992 + .long 0 + .long 1072465920 + .long 0 + .long 1072462848 + .long 0 + .long 1072459776 + .long 0 + .long 1072457728 + .long 0 + .long 1072454656 + .long 0 + .long 1072451584 + .long 0 + .long 1072448512 + .long 0 + .long 1072446464 + .long 0 + .long 1072443392 + .long 0 + .long 1072440320 + .long 0 + .long 1072437248 + .long 0 + .long 1072435200 + .long 0 + .long 1072432128 + .long 0 + .long 1072429056 + .long 0 + .long 1072427008 + .long 0 + .long 1072423936 + .long 0 + .long 1072420864 + .long 0 + .long 1072418816 + .long 0 + .long 1072415744 + .long 0 + .long 1072412672 + .long 0 + .long 1072410624 + .long 0 + .long 1072407552 + .long 0 + .long 1072405504 + .long 0 + .long 1072402432 + .long 0 + .long 1072400384 + .long 0 + .long 1072397312 + .long 0 + .long 1072395264 + .long 0 + .long 1072392192 + .long 0 + .long 1072390144 + .long 0 + .long 1072387072 + .long 0 + .long 1072385024 + .long 0 + .long 1072381952 + .long 0 + .long 1072379904 + .long 0 + .long 1072376832 + .long 0 + .long 1072374784 + .long 0 + .long 1072371712 + .long 0 + .long 1072369664 + .long 0 + .long 1072366592 + .long 0 + .long 1072364544 + .long 0 + .long 1072362496 + .long 0 + .long 1072359424 + .long 0 + .long 1072357376 + .long 0 + .long 1072355328 + .long 0 + .long 1072352256 + .long 0 + .long 1072350208 + .long 0 + .long 1072347136 + .long 0 + .long 1072345088 + .long 0 + .long 1072343040 + .long 0 + .long 1072340992 + .long 0 + .long 1072337920 + .long 0 + .long 1072335872 + .long 0 + .long 1072333824 + .long 0 + .long 1072330752 + .long 0 + .long 1072328704 + .long 0 + .long 1072326656 + .long 0 + .long 1072324608 + .long 0 + .long 1072321536 + .long 0 + .long 1072319488 + .long 0 + .long 1072317440 + .long 0 + .long 1072315392 + .long 0 + .long 1072313344 + .long 0 + .long 1072310272 + .long 0 + .long 1072308224 + .long 0 + .long 1072306176 + .long 0 + .long 1072304128 + .long 0 + .long 1072302080 + .long 0 + .long 1072300032 + .long 0 + .long 1072296960 + .long 0 + .long 1072294912 + .long 0 + .long 1072292864 + .long 0 + .long 1072290816 + .long 0 + .long 1072288768 + .long 0 + .long 1072286720 + .long 0 + .long 1072284672 + .long 0 + .long 1072282624 + .long 0 + .long 1072280576 + .long 0 + .long 1072278528 + .long 0 + .long 1072275456 + .long 0 + .long 1072273408 + .long 0 + .long 1072271360 + .long 0 + .long 1072269312 + .long 0 + .long 1072267264 + .long 0 + .long 1072265216 + .long 0 + .long 1072263168 + .long 0 + .long 1072261120 + .long 0 + .long 1072259072 + .long 0 + .long 1072257024 + .long 0 + .long 1072254976 + .long 0 + .long 1072252928 + .long 0 + .long 1072250880 + .long 0 + .long 1072248832 + .long 0 + .long 1072246784 + .long 0 + .long 1072244736 + .long 0 + .long 1072243712 + .long 0 + .long 1072241664 + .long 0 + .long 1072239616 + .long 0 + .long 1072237568 + .long 0 + .long 1072235520 + .long 0 + .long 1072233472 + .long 0 + .long 1072231424 + .long 0 + .long 1072229376 + .long 0 + .long 1072227328 + .long 0 + .long 1072225280 + .long 0 + .long 1072223232 + .long 0 + .long 1072222208 + .long 0 + .long 1072220160 + .long 0 + .long 1072218112 + .long 0 + .long 1072216064 + .long 0 + .long 1072214016 + .long 0 + .long 1072211968 + .long 0 + .long 1072210944 + .long 0 + .long 1072208896 + .long 0 + .long 1072206848 + .long 0 + .long 1072204800 + .long 0 + .long 1072202752 + .long 0 + .long 1072201728 + .long 0 + .long 1072199680 + .long 0 + .long 1072197632 + 
.long 0 + .long 1072195584 + .long 0 + .long 1072193536 + .long 0 + .long 1072192512 + .long 0 + .long 1072190464 + .long 0 + .long 1072188416 + .long 0 + .long 1072186368 + .long 0 + .long 1072185344 + .long 0 + .long 1072183296 + .long 0 + .long 1072181248 + .long 0 + .long 1072179200 + .long 0 + .long 1072178176 + .long 0 + .long 1072176128 + .long 0 + .long 1072174080 + .long 0 + .long 1072173056 + .long 0 + .long 1072171008 + .long 0 + .long 1072168960 + .long 0 + .long 1072167936 + .long 0 + .long 1072165888 + .long 0 + .long 1072163840 + .long 0 + .long 1072161792 + .long 0 + .long 1072160768 + .long 0 + .long 1072158720 + .long 0 + .long 1072157696 + .long 0 + .long 1072155648 + .long 0 + .long 1072153600 + .long 0 + .long 1072152576 + .long 0 + .long 1072150528 + .long 0 + .long 1072148480 + .long 0 + .long 1072147456 + .long 0 + .long 1072145408 + .long 0 + .long 1072143360 + .long 0 + .long 1072142336 + .long 0 + .long 1072140288 + .long 0 + .long 1072139264 + .long 0 + .long 1072137216 + .long 0 + .long 1072135168 + .long 0 + .long 1072134144 + .long 0 + .long 1072132096 + .long 0 + .long 1072131072 + .long 0 + .long 1072129024 + .long 0 + .long 1072128000 + .long 0 + .long 1072125952 + .long 0 + .long 1072124928 + .long 0 + .long 1072122880 + .long 0 + .long 1072120832 + .long 0 + .long 1072119808 + .long 0 + .long 1072117760 + .long 0 + .long 1072116736 + .long 0 + .long 1072114688 + .long 0 + .long 1072113664 + .long 0 + .long 1072111616 + .long 0 + .long 1072110592 + .long 0 + .long 1072108544 + .long 0 + .long 1072107520 + .long 0 + .long 1072105472 + .long 0 + .long 1072104448 + .long 0 + .long 1072102400 + .long 0 + .long 1072101376 + .long 0 + .long 1072099328 + .long 0 + .long 1072098304 + .long 0 + .long 1072096256 + .long 0 + .long 1072095232 + .long 0 + .long 1072094208 + .long 0 + .long 1072092160 + .long 0 + .long 1072091136 + .long 0 + .long 1072089088 + .long 0 + .long 1072088064 + .long 0 + .long 1072086016 + .long 0 + .long 1072084992 + .long 0 + .long 1072082944 + .long 0 + .long 1072081920 + .long 0 + .long 1072080896 + .long 0 + .long 1072078848 + .long 0 + .long 1072075776 + .long 0 + .long 1072073728 + .long 0 + .long 1072070656 + .long 0 + .long 1072067584 + .long 0 + .long 1072064512 + .long 0 + .long 1072061440 + .long 0 + .long 1072059392 + .long 0 + .long 1072056320 + .long 0 + .long 1072053248 + .long 0 + .long 1072051200 + .long 0 + .long 1072048128 + .long 0 + .long 1072045056 + .long 0 + .long 1072043008 + .long 0 + .long 1072039936 + .long 0 + .long 1072037888 + .long 0 + .long 1072034816 + .long 0 + .long 1072031744 + .long 0 + .long 1072029696 + .long 0 + .long 1072026624 + .long 0 + .long 1072024576 + .long 0 + .long 1072021504 + .long 0 + .long 1072019456 + .long 0 + .long 1072016384 + .long 0 + .long 1072014336 + .long 0 + .long 1072011264 + .long 0 + .long 1072009216 + .long 0 + .long 1072006144 + .long 0 + .long 1072004096 + .long 0 + .long 1072002048 + .long 0 + .long 1071998976 + .long 0 + .long 1071996928 + .long 0 + .long 1071993856 + .long 0 + .long 1071991808 + .long 0 + .long 1071989760 + .long 0 + .long 1071986688 + .long 0 + .long 1071984640 + .long 0 + .long 1071982592 + .long 0 + .long 1071979520 + .long 0 + .long 1071977472 + .long 0 + .long 1071975424 + .long 0 + .long 1071972352 + .long 0 + .long 1071970304 + .long 0 + .long 1071968256 + .long 0 + .long 1071966208 + .long 0 + .long 1071964160 + .long 0 + .long 1071961088 + .long 0 + .long 1071959040 + .long 0 + .long 1071956992 + .long 0 + .long 1071954944 + .long 0 + .long 
1071952896 + .long 0 + .long 1071949824 + .long 0 + .long 1071947776 + .long 0 + .long 1071945728 + .long 0 + .long 1071943680 + .long 0 + .long 1071941632 + .long 0 + .long 1071939584 + .long 0 + .long 1071937536 + .long 0 + .long 1071935488 + .long 0 + .long 1071933440 + .long 0 + .long 1071930368 + .long 0 + .long 1071928320 + .long 0 + .long 1071926272 + .long 0 + .long 1071924224 + .long 0 + .long 1071922176 + .long 0 + .long 1071920128 + .long 0 + .long 1071918080 + .long 0 + .long 1071916032 + .long 0 + .long 1071913984 + .long 0 + .long 1071911936 + .long 0 + .long 1071909888 + .long 0 + .long 1071907840 + .long 0 + .long 1071905792 + .long 0 + .long 1071903744 + .long 0 + .long 1071901696 + .long 0 + .long 1071900672 + .long 0 + .long 1071898624 + .long 0 + .long 1071896576 + .long 0 + .long 1071894528 + .long 0 + .long 1071892480 + .long 0 + .long 1071890432 + .long 0 + .long 1071888384 + .long 0 + .long 1071886336 + .long 0 + .long 1071884288 + .long 0 + .long 1071883264 + .long 0 + .long 1071881216 + .long 0 + .long 1071879168 + .long 0 + .long 1071877120 + .long 0 + .long 1071875072 + .long 0 + .long 1071873024 + .long 0 + .long 1071872000 + .long 0 + .long 1071869952 + .long 0 + .long 1071867904 + .long 0 + .long 1071865856 + .long 0 + .long 1071864832 + .long 0 + .long 1071862784 + .long 0 + .long 1071860736 + .long 0 + .long 1071858688 + .long 0 + .long 1071856640 + .long 0 + .long 1071855616 + .long 0 + .long 1071853568 + .long 0 + .long 1071851520 + .long 0 + .long 1071850496 + .long 0 + .long 1071848448 + .long 0 + .long 1071846400 + .long 0 + .long 1071844352 + .long 0 + .long 1071843328 + .long 0 + .long 1071841280 + .long 0 + .long 1071839232 + .long 0 + .long 1071838208 + .long 0 + .long 1071836160 + .long 0 + .long 1071834112 + .long 0 + .long 1071833088 + .long 0 + .long 1071831040 + .long 0 + .long 1071830016 + .long 0 + .long 1071827968 + .long 0 + .long 1071825920 + .long 0 + .long 1071824896 + .long 0 + .long 1071822848 + .long 0 + .long 1071821824 + .long 0 + .long 1071819776 + .long 0 + .long 1071817728 + .long 0 + .long 1071816704 + .long 0 + .long 1071814656 + .long 0 + .long 1071813632 + .long 0 + .long 1071811584 + .long 0 + .long 1071810560 + .long 0 + .long 1071808512 + .long 0 + .long 1071806464 + .long 0 + .long 1071805440 + .long 0 + .long 1071803392 + .long 0 + .long 1071802368 + .long 0 + .long 1071800320 + .long 0 + .long 1071799296 + .long 0 + .long 1071797248 + .long 0 + .long 1071796224 + .long 0 + .long 1071794176 + .long 0 + .long 1071793152 + .long 0 + .long 1071791104 + .long 0 + .long 1071790080 + .long 0 + .long 1071788032 + .long 0 + .long 1071787008 + .long 0 + .long 1071784960 + .long 0 + .long 1071783936 + .long 0 + .long 1071782912 + .long 0 + .long 1071780864 + .long 0 + .long 1071779840 + .long 0 + .long 1071777792 + .long 0 + .long 1071776768 + .long 0 + .long 1071774720 + .long 0 + .long 1071773696 + .long 0 + .long 1071772672 + .long 0 + .long 1071770624 + .long 0 + .long 1071769600 + .long 0 + .long 1071767552 + .long 0 + .long 1071766528 + .long 0 + .long 1071765504 + .long 0 + .long 1071763456 + .long 0 + .long 1071762432 + .long 0 + .long 1071760384 + .long 0 + .long 1071759360 + .long 0 + .long 1071758336 + .long 0 + .long 1071756288 + .long 0 + .long 1071755264 + .long 0 + .long 1071754240 + .long 0 + .long 1071752192 + .long 0 + .long 1071751168 + .long 0 + .long 1071750144 + .long 0 + .long 1071748096 + .long 0 + .long 1071747072 + .long 0 + .long 1071746048 + .long 0 + .long 1071744000 + .long 0 + .long 1071742976 + 
.long 0 + .long 1071741952 + .long 0 + .long 1071739904 + .long 0 + .long 1071738880 + .long 0 + .long 1071737856 + .long 0 + .long 1071736832 + .long 0 + .long 1071734784 + .long 0 + .long 1071733760 + .long 0 + .long 1071732736 + .long 0 + .long 1071730688 + .long 0 + .long 1071729664 + .long 0 + .long 1071728640 + .long 0 + .long 1071727616 + .long 0 + .long 1071725568 + .long 0 + .long 1071724544 + .long 0 + .long 1071723520 + .long 0 + .long 1071722496 + .long 0 + .long 1071720448 + .long 0 + .long 1071719424 + .long 0 + .long 1071718400 + .long 0 + .long 1071717376 + .long 0 + .long 1071715328 + .long 0 + .long 1071714304 + .long 0 + .long 1071713280 + .long 0 + .long 1071712256 + .long 0 + .long 1071711232 + .long 0 + .long 1071709184 + .long 0 + .long 1071708160 + .long 0 + .long 1071707136 + .long 0 + .long 1071706112 + .long 0 + .long 1071705088 + .long 0 + .long 1071704064 + .long 0 + .long 1071702016 + .long 0 + .long 1071700992 + .long 0 + .long 1071699968 + .long 0 + .long 1071698944 + .long 0 + .long 1071697920 + .long 0 + .long 1071696896 + .long 0 + .long 1071694848 + .long 0 + .long 1071693824 + .long 0 + .long 1071692800 + .long 0 + .long 1071691776 + .long 0 + .long 1071690752 + .long 0 + .long 1071689728 + .long 0 + .long 1071688704 + .long 0 + .long 1071686656 + .long 0 + .long 1071685632 + .long 0 + .long 1071684608 + .long 0 + .long 1071683584 + .long 0 + .long 1071682560 + .long 0 + .long 1071681536 + .long 0 + .long 1071680512 + .long 0 + .long 1071679488 + .long 0 + .long 1071677440 + .long 0 + .long 1071676416 + .long 0 + .long 1071675392 + .long 0 + .long 1071674368 + .long 0 + .long 1071673344 + .long 0 + .long 1071672320 + .long 0 + .long 1071671296 + .long 0 + .long 1071670272 + .long 0 + .long 1071669248 + .long 0 + .long 1071668224 + .long 0 + .long 1071667200 + .long 0 + .long 1071666176 + .long 0 + .long 1071665152 + .long 0 + .long 1071663104 + .long 0 + .long 1071662080 + .long 0 + .long 1071661056 + .long 0 + .long 1071660032 + .long 0 + .long 1071659008 + .long 0 + .long 1071657984 + .long 0 + .long 1071656960 + .long 0 + .long 1071655936 + .long 0 + .long 1071654912 + .long 0 + .long 1071653888 + .long 0 + .long 1071652864 + .long 0 + .long 1071651840 + .long 0 + .long 1071650816 + .long 0 + .long 1071649792 + .long 0 + .long 1071648768 + .long 0 + .long 1071647744 + .long 0 + .long 1071646720 + .long 0 + .long 1071645696 + .long 0 + .long 0 + .long 0 + .long 1071644672 + .long 0 + .long 1072693248 + .long 0 + .long 1073741824 + .long 33554432 + .long 1101004800 + .type _vmldHypotHATab,@object + .size _vmldHypotHATab,4136 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core-avx2.S new file mode 100644 index 0000000000..a6ba40df4d --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core-avx2.S @@ -0,0 +1,20 @@ +/* AVX2 version of vectorized hypotf. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. 
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#define _ZGVeN16vv_hypotf _ZGVeN16vv_hypotf_avx2_wrapper +#include "../svml_s_hypotf16_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core.c new file mode 100644 index 0000000000..0c9eb6a364 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized hypotf, vector length is 16. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#define SYMBOL_NAME _ZGVeN16vv_hypotf +#include "ifunc-mathvec-avx512-skx.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVeN16vv_hypotf, __GI__ZGVeN16vv_hypotf, + __redirect__ZGVeN16vv_hypotf) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core_avx512.S new file mode 100644 index 0000000000..c603fc7219 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core_avx512.S @@ -0,0 +1,1684 @@ +/* Function hypotf vectorized with AVX-512. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. 
*/ + +/* + * ALGORITHM DESCRIPTION: + * + * HIGH LEVEL OVERVIEW + * + * Calculate z = (x*x+y*y) + * Calculate reciprocal sqrt (z) + * Make two NR iterations + * + * ALGORITHM DETAILS + * + * Multiprecision branch for _HA_ only + * Remove sign from both arguments + * Find maximum (_x) and minimum (_y) (by abs value) between arguments + * Split _x into _a and _b for multiprecision + * If _x >> _y we will not split _y for multiprecision + * all _y will be put into lower part (_d) and higher part (_c = 0) + * Fixing _hilo_mask for the case _x >> _y + * Split _y into _c and _d for multiprecision with fixed mask + * + * compute Hi and Lo parts of _z = _x*_x + _y*_y + * + * _zHi = _a*_a + _c*_c + * _zLo = (_x + _a)*_b + _d*_y + _d*_c + * _z = _zHi + _zLo + * + * No multiprecision branch for _LA_ and _EP_ + * _z = _VARG1 * _VARG1 + _VARG2 * _VARG2 + * + * Check _z exponent to be within borders [1E3 ; 60A] else goto Callout + * + * Compute reciprocal sqrt s0 ~ 1.0/sqrt(_z), + * that multiplied by _z, is final result for _EP_ version. + * + * First iteration (or zero iteration): + * s = z * s0 + * h = .5 * s0 + * d = s * h - .5 + * + * Second iteration: + * h = d * h + h + * s = s * d + s + * d = s * s - z (in multiprecision for _HA_) + * + * result = s - h * d + * + * EP version of the function can be implemented as y[i]=sqrt(a[i]^2+b[i]^2) + * with all intermediate operations done in target precision for i=1,..,n. + * It can return result y[i]=0 in case a[i]^2 and b[i]^2 underflow in target + * precision (for some i). It can return result y[i]=NAN in case + * a[i]^2+b[i]^2 overflow in target precision, for some i. It can return + * result y[i]=NAN in case a[i] or b[i] is infinite, for some i. + * + * + */ + +#include <sysdep.h> + + .text +ENTRY(_ZGVeN16vv_hypotf_skx) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $256, %rsp + vgetexpps {sae}, %zmm0, %zmm2 + vgetexpps {sae}, %zmm1, %zmm3 + vmovups 192+__svml_shypot_data_internal(%rip), %zmm6 + vmaxps {sae}, %zmm3, %zmm2, %zmm4 + vmulps {rn-sae}, %zmm0, %zmm0, %zmm2 + vandps 128+__svml_shypot_data_internal(%rip), %zmm4, %zmm5 + vfmadd231ps {rn-sae}, %zmm1, %zmm1, %zmm2 + vpcmpd $5, 512+__svml_shypot_data_internal(%rip), %zmm5, %k0 + vrsqrt14ps %zmm2, %zmm7 + kmovw %k0, %edx + vmulps {rn-sae}, %zmm7, %zmm2, %zmm9 + vmulps {rn-sae}, %zmm7, %zmm6, %zmm8 + vfnmadd231ps {rn-sae}, %zmm9, %zmm9, %zmm2 + vfmadd213ps {rn-sae}, %zmm9, %zmm8, %zmm2 + +/* + * VSCALEF( S, _VRES1, _VRES1, sExp ); + * The end of implementation + */ + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + vmovaps %zmm2, %zmm0 + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + +.LBL_1_3: + vmovups %zmm0, 64(%rsp) + vmovups %zmm1, 128(%rsp) + vmovups %zmm2, 192(%rsp) + je .LBL_1_2 + xorl %eax, %eax + vzeroupper + kmovw %k4, 24(%rsp) + kmovw %k5, 16(%rsp) + kmovw %k6, 8(%rsp) + kmovw %k7, (%rsp) + movq %rsi, 40(%rsp) + movq %rdi, 32(%rsp) + movq %r12, 56(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x28, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x38, 0xff, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 48(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 
0xff, 0x22 + movl %edx, %r13d + .cfi_escape 0x10, 0xfa, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x18, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $16, %r12d + jl .LBL_1_7 + kmovw 24(%rsp), %k4 + cfi_restore(122) + kmovw 16(%rsp), %k5 + cfi_restore(123) + kmovw 8(%rsp), %k6 + cfi_restore(124) + kmovw (%rsp), %k7 + cfi_restore(125) + vmovups 192(%rsp), %zmm2 + movq 40(%rsp), %rsi + cfi_restore(4) + movq 32(%rsp), %rdi + cfi_restore(5) + movq 56(%rsp), %r12 + cfi_restore(12) + movq 48(%rsp), %r13 + cfi_restore(13) + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x28, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x38, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfa, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x18, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xfd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 64(%rsp,%r12,4), %rdi + lea 128(%rsp,%r12,4), %rsi + lea 192(%rsp,%r12,4), %rdx + call __svml_shypot_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVeN16vv_hypotf_skx) + + .align 16,0x90 + +__svml_shypot_cout_rare_internal: + + cfi_startproc + + movq %rdx, %r8 + movzwl 2(%rdi), %eax + andl $32640, %eax + cmpl $32640, %eax + je .LBL_2_14 + movzwl 2(%rsi), %eax + andl $32640, %eax + cmpl $32640, %eax + je .LBL_2_13 + pxor %xmm2, %xmm2 + pxor %xmm1, %xmm1 + cvtss2sd (%rdi), %xmm2 + cvtss2sd (%rsi), %xmm1 + movsd 4096+_vmlsHypotHATab(%rip), %xmm0 + movzwl 4102+_vmlsHypotHATab(%rip), %edi + ucomisd %xmm0, %xmm2 + jp .LBL_2_4 + je .LBL_2_11 + +.LBL_2_4: + movsd %xmm2, -16(%rsp) + movsd %xmm1, -48(%rsp) + andb $127, -9(%rsp) + andb $127, -41(%rsp) + movsd -16(%rsp), %xmm8 + movsd -48(%rsp), %xmm1 + comisd %xmm8, %xmm1 + jbe .LBL_2_6 + movaps %xmm8, %xmm2 + movaps %xmm1, %xmm8 + movsd %xmm1, -16(%rsp) + movaps %xmm2, %xmm1 + +.LBL_2_6: + movzwl -10(%rsp), %edx + andl $-32753, %edi + andl $32752, %edx + shrl $4, %edx + negl %edx + movsd %xmm0, -56(%rsp) + movsd 4128+_vmlsHypotHATab(%rip), %xmm3 + lea 1025(%rdx), %esi + negl %esi + addl $1000, %esi + shrl $31, %esi + imull $-23, %esi, %eax + lea 1025(%rax,%rdx), %esi + lea 1023(%rsi), %ecx + andl $2047, %ecx + shll $4, %ecx + orl %ecx, %edi + movw %di, -50(%rsp) + movsd -56(%rsp), %xmm2 + mulsd %xmm2, %xmm8 + mulsd %xmm2, %xmm1 + mulsd %xmm8, %xmm3 + movsd %xmm3, -72(%rsp) + movsd -72(%rsp), %xmm4 + movsd %xmm8, -16(%rsp) + subsd %xmm8, %xmm4 + movsd %xmm4, -64(%rsp) + movsd -72(%rsp), %xmm6 + movsd -64(%rsp), %xmm5 + movsd %xmm1, -48(%rsp) + 
subsd %xmm5, %xmm6 + movsd %xmm6, -72(%rsp) + movsd -72(%rsp), %xmm7 + movzwl -10(%rsp), %r9d + subsd %xmm7, %xmm8 + movzwl -42(%rsp), %edi + andl $32752, %r9d + andl $32752, %edi + shrl $4, %r9d + shrl $4, %edi + movsd %xmm8, -64(%rsp) + subl %edi, %r9d + movsd -72(%rsp), %xmm7 + movsd -64(%rsp), %xmm8 + cmpl $6, %r9d + jle .LBL_2_8 + movaps %xmm1, %xmm2 + jmp .LBL_2_9 + +.LBL_2_8: + movsd -48(%rsp), %xmm1 + movsd 4128+_vmlsHypotHATab(%rip), %xmm0 + movaps %xmm1, %xmm6 + mulsd %xmm1, %xmm0 + movsd %xmm0, -72(%rsp) + movsd -72(%rsp), %xmm2 + subsd -48(%rsp), %xmm2 + movsd %xmm2, -64(%rsp) + movsd -72(%rsp), %xmm4 + movsd -64(%rsp), %xmm3 + subsd %xmm3, %xmm4 + movsd %xmm4, -72(%rsp) + movsd -72(%rsp), %xmm5 + subsd %xmm5, %xmm6 + movsd %xmm6, -64(%rsp) + movsd -72(%rsp), %xmm0 + movsd -64(%rsp), %xmm2 + +.LBL_2_9: + movsd -16(%rsp), %xmm5 + movaps %xmm0, %xmm4 + mulsd %xmm0, %xmm4 + addsd %xmm1, %xmm0 + addsd %xmm7, %xmm5 + mulsd %xmm2, %xmm0 + mulsd %xmm5, %xmm8 + movaps %xmm7, %xmm3 + negl %esi + mulsd %xmm7, %xmm3 + addsd %xmm8, %xmm0 + movq 4112+_vmlsHypotHATab(%rip), %r11 + movq %r11, %r9 + lea _vmlsHypotHATab(%rip), %rdx + addsd %xmm4, %xmm3 + addl $1023, %esi + addsd %xmm0, %xmm3 + movsd %xmm3, -56(%rsp) + andl $2047, %esi + movzwl -50(%rsp), %ecx + andl $32752, %ecx + shrl $4, %ecx + addl $-1023, %ecx + movl %ecx, %eax + andl $1, %eax + subl %eax, %ecx + shrl $1, %ecx + movsd %xmm3, -48(%rsp) + movzwl -42(%rsp), %edi + andl $-32753, %edi + shrq $48, %r9 + lea 1023(%rcx), %r10d + addl %ecx, %ecx + addl $16368, %edi + negl %ecx + andl $2047, %r10d + addl $1023, %ecx + andl $2047, %ecx + andl $-32753, %r9d + movw %di, -42(%rsp) + shll $4, %r10d + shll $4, %ecx + orl %r9d, %r10d + shll $4, %esi + orl %r9d, %ecx + movsd -48(%rsp), %xmm2 + orl %esi, %r9d + movl -44(%rsp), %esi + mulsd 4112(%rdx,%rax,8), %xmm2 + andl $1048575, %esi + shrl $12, %esi + shll $8, %eax + addl %eax, %esi + movsd (%rdx,%rsi,8), %xmm6 + movsd 4104+_vmlsHypotHATab(%rip), %xmm1 + mulsd %xmm6, %xmm2 + mulsd %xmm6, %xmm1 + movaps %xmm2, %xmm7 + mulsd %xmm1, %xmm7 + movsd 4104+_vmlsHypotHATab(%rip), %xmm9 + movsd 4104+_vmlsHypotHATab(%rip), %xmm12 + subsd %xmm7, %xmm9 + movaps %xmm9, %xmm8 + mulsd %xmm2, %xmm9 + mulsd %xmm1, %xmm8 + addsd %xmm9, %xmm2 + addsd %xmm8, %xmm1 + movaps %xmm2, %xmm10 + movaps %xmm1, %xmm11 + mulsd %xmm1, %xmm10 + movsd 4104+_vmlsHypotHATab(%rip), %xmm0 + subsd %xmm10, %xmm12 + mulsd %xmm12, %xmm11 + mulsd %xmm2, %xmm12 + addsd %xmm11, %xmm1 + addsd %xmm12, %xmm2 + movaps %xmm2, %xmm13 + movaps %xmm2, %xmm14 + mulsd %xmm1, %xmm13 + movsd 4128+_vmlsHypotHATab(%rip), %xmm15 + subsd %xmm13, %xmm0 + mulsd %xmm0, %xmm14 + mulsd %xmm1, %xmm0 + addsd %xmm14, %xmm2 + addsd %xmm0, %xmm1 + mulsd %xmm2, %xmm15 + movsd %xmm15, -72(%rsp) + movaps %xmm2, %xmm8 + movsd -72(%rsp), %xmm4 + movsd 4120+_vmlsHypotHATab(%rip), %xmm10 + subsd %xmm2, %xmm4 + movsd %xmm4, -64(%rsp) + movsd -72(%rsp), %xmm6 + movsd -64(%rsp), %xmm5 + movq %r11, -32(%rsp) + subsd %xmm5, %xmm6 + movsd %xmm6, -72(%rsp) + movsd -72(%rsp), %xmm7 + movw %cx, -26(%rsp) + subsd %xmm7, %xmm8 + movsd %xmm8, -64(%rsp) + movsd -72(%rsp), %xmm11 + movsd -64(%rsp), %xmm12 + movaps %xmm11, %xmm13 + mulsd %xmm12, %xmm10 + mulsd %xmm12, %xmm12 + xorps .FLT_49(%rip), %xmm13 + xorps .FLT_49(%rip), %xmm12 + subsd %xmm10, %xmm13 + mulsd %xmm11, %xmm13 + movsd -32(%rsp), %xmm9 + addsd %xmm12, %xmm13 + mulsd %xmm9, %xmm3 + movq %r11, -40(%rsp) + addsd %xmm13, %xmm3 + mulsd %xmm1, %xmm3 + movw %r10w, -34(%rsp) + addsd %xmm3, %xmm2 + mulsd -40(%rsp), %xmm2 
+ movq %r11, -24(%rsp) + movw %r9w, -18(%rsp) + mulsd -24(%rsp), %xmm2 + cvtsd2ss %xmm2, %xmm2 + movss %xmm2, (%r8) + +.LBL_2_10: + xorl %eax, %eax + ret + +.LBL_2_11: + ucomisd %xmm0, %xmm1 + jne .LBL_2_4 + jp .LBL_2_4 + cvtsd2ss %xmm0, %xmm0 + movss %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_13: + movss (%rsi), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_14: + movzwl 2(%rsi), %eax + andl $32640, %eax + cmpl $32640, %eax + je .LBL_2_16 + +.LBL_2_15: + movss (%rdi), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_16: + movl (%rdi), %eax + testl $8388607, %eax + je .LBL_2_22 + testl $8388607, (%rsi) + je .LBL_2_19 + movss (%rdi), %xmm0 + mulss (%rsi), %xmm0 + movss %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_19: + testl $4194304, %eax + jne .LBL_2_13 + movsd 4112+_vmlsHypotHATab(%rip), %xmm0 + cvtsd2ss %xmm0, %xmm0 + mulss (%rdi), %xmm0 + movss %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_22: + movl (%rsi), %eax + testl $8388607, %eax + je .LBL_2_15 + testl $4194304, %eax + jne .LBL_2_15 + movsd 4112+_vmlsHypotHATab(%rip), %xmm0 + cvtsd2ss %xmm0, %xmm0 + mulss (%rsi), %xmm0 + movss %xmm0, (%r8) + jmp .LBL_2_10 + + cfi_endproc + + .type __svml_shypot_cout_rare_internal,@function + .size __svml_shypot_cout_rare_internal,.-__svml_shypot_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_shypot_data_internal: + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .long 
1621098496 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .type __svml_shypot_data_internal,@object + .size __svml_shypot_data_internal,576 + .align 32 + +_vmlsHypotHATab: + .long 0 + .long 1072693248 + .long 0 + .long 1072689152 + .long 0 + .long 1072685056 + .long 0 + .long 1072680960 + .long 0 + .long 1072676864 + .long 0 + .long 1072672768 + .long 0 + .long 1072668672 + .long 0 + .long 1072665600 + .long 0 + .long 1072661504 + .long 0 + .long 1072657408 + .long 0 + .long 1072653312 + .long 0 + .long 1072649216 + .long 0 + .long 1072646144 + .long 0 + .long 1072642048 + .long 0 + .long 1072637952 + .long 0 + .long 1072634880 + .long 0 + .long 1072630784 + .long 0 + .long 1072626688 + .long 0 + .long 1072623616 + .long 0 + .long 1072619520 + .long 0 + .long 1072615424 + .long 0 + .long 1072612352 + .long 0 + .long 1072608256 + .long 0 + .long 1072605184 + .long 0 + .long 1072601088 + .long 0 + .long 1072598016 + .long 0 + .long 1072593920 + .long 0 + .long 1072590848 + .long 0 + .long 1072586752 + .long 0 + .long 1072583680 + .long 0 + .long 1072580608 + .long 0 + .long 1072576512 + .long 0 + .long 1072573440 + .long 0 + .long 1072570368 + .long 0 + .long 1072566272 + .long 0 + .long 1072563200 + .long 0 + .long 1072560128 + .long 0 + .long 1072556032 + .long 0 + .long 1072552960 + .long 0 + .long 1072549888 + .long 0 + .long 1072546816 + .long 0 + .long 1072542720 + .long 0 + .long 1072539648 + .long 0 + .long 1072536576 + .long 0 + .long 1072533504 + .long 0 + .long 1072530432 + .long 0 + .long 1072527360 + .long 0 + .long 1072523264 + .long 0 + .long 1072520192 + .long 0 + .long 1072517120 + .long 0 + .long 1072514048 + .long 0 + .long 1072510976 + .long 0 + .long 1072507904 + .long 0 + .long 1072504832 + .long 0 + .long 1072501760 + .long 0 + .long 1072498688 + .long 0 + .long 1072495616 + .long 0 + .long 1072492544 + .long 0 + .long 1072489472 + .long 0 + .long 1072486400 + .long 0 + .long 1072483328 + .long 0 + .long 1072480256 + .long 0 + .long 1072478208 + .long 0 + .long 1072475136 + .long 0 + .long 1072472064 + .long 0 + .long 1072468992 + .long 0 + .long 1072465920 + .long 0 + .long 1072462848 + .long 0 + .long 1072459776 + .long 0 + .long 1072457728 + .long 0 + .long 1072454656 + .long 0 + .long 1072451584 + .long 0 + .long 1072448512 + .long 0 + .long 1072446464 + .long 0 + .long 1072443392 + .long 0 + .long 1072440320 + .long 0 + .long 1072437248 + .long 0 + .long 1072435200 + .long 0 + .long 1072432128 + .long 0 + .long 1072429056 + .long 0 + .long 1072427008 + .long 0 + .long 1072423936 + .long 0 + .long 1072420864 + .long 0 + .long 1072418816 + .long 0 + .long 1072415744 + .long 0 + .long 1072412672 + .long 0 + .long 1072410624 + .long 0 + .long 1072407552 + .long 0 + .long 1072405504 + .long 0 + .long 1072402432 + .long 0 + .long 1072400384 + .long 0 + .long 1072397312 + .long 0 + .long 1072395264 + .long 0 + .long 1072392192 + .long 0 + .long 1072390144 + .long 0 + .long 1072387072 + .long 0 + .long 1072385024 + .long 0 + .long 1072381952 + .long 0 + .long 1072379904 + .long 0 + .long 1072376832 + .long 0 + 
.long 1072374784 + .long 0 + .long 1072371712 + .long 0 + .long 1072369664 + .long 0 + .long 1072366592 + .long 0 + .long 1072364544 + .long 0 + .long 1072362496 + .long 0 + .long 1072359424 + .long 0 + .long 1072357376 + .long 0 + .long 1072355328 + .long 0 + .long 1072352256 + .long 0 + .long 1072350208 + .long 0 + .long 1072347136 + .long 0 + .long 1072345088 + .long 0 + .long 1072343040 + .long 0 + .long 1072340992 + .long 0 + .long 1072337920 + .long 0 + .long 1072335872 + .long 0 + .long 1072333824 + .long 0 + .long 1072330752 + .long 0 + .long 1072328704 + .long 0 + .long 1072326656 + .long 0 + .long 1072324608 + .long 0 + .long 1072321536 + .long 0 + .long 1072319488 + .long 0 + .long 1072317440 + .long 0 + .long 1072315392 + .long 0 + .long 1072313344 + .long 0 + .long 1072310272 + .long 0 + .long 1072308224 + .long 0 + .long 1072306176 + .long 0 + .long 1072304128 + .long 0 + .long 1072302080 + .long 0 + .long 1072300032 + .long 0 + .long 1072296960 + .long 0 + .long 1072294912 + .long 0 + .long 1072292864 + .long 0 + .long 1072290816 + .long 0 + .long 1072288768 + .long 0 + .long 1072286720 + .long 0 + .long 1072284672 + .long 0 + .long 1072282624 + .long 0 + .long 1072280576 + .long 0 + .long 1072278528 + .long 0 + .long 1072275456 + .long 0 + .long 1072273408 + .long 0 + .long 1072271360 + .long 0 + .long 1072269312 + .long 0 + .long 1072267264 + .long 0 + .long 1072265216 + .long 0 + .long 1072263168 + .long 0 + .long 1072261120 + .long 0 + .long 1072259072 + .long 0 + .long 1072257024 + .long 0 + .long 1072254976 + .long 0 + .long 1072252928 + .long 0 + .long 1072250880 + .long 0 + .long 1072248832 + .long 0 + .long 1072246784 + .long 0 + .long 1072244736 + .long 0 + .long 1072243712 + .long 0 + .long 1072241664 + .long 0 + .long 1072239616 + .long 0 + .long 1072237568 + .long 0 + .long 1072235520 + .long 0 + .long 1072233472 + .long 0 + .long 1072231424 + .long 0 + .long 1072229376 + .long 0 + .long 1072227328 + .long 0 + .long 1072225280 + .long 0 + .long 1072223232 + .long 0 + .long 1072222208 + .long 0 + .long 1072220160 + .long 0 + .long 1072218112 + .long 0 + .long 1072216064 + .long 0 + .long 1072214016 + .long 0 + .long 1072211968 + .long 0 + .long 1072210944 + .long 0 + .long 1072208896 + .long 0 + .long 1072206848 + .long 0 + .long 1072204800 + .long 0 + .long 1072202752 + .long 0 + .long 1072201728 + .long 0 + .long 1072199680 + .long 0 + .long 1072197632 + .long 0 + .long 1072195584 + .long 0 + .long 1072193536 + .long 0 + .long 1072192512 + .long 0 + .long 1072190464 + .long 0 + .long 1072188416 + .long 0 + .long 1072186368 + .long 0 + .long 1072185344 + .long 0 + .long 1072183296 + .long 0 + .long 1072181248 + .long 0 + .long 1072179200 + .long 0 + .long 1072178176 + .long 0 + .long 1072176128 + .long 0 + .long 1072174080 + .long 0 + .long 1072173056 + .long 0 + .long 1072171008 + .long 0 + .long 1072168960 + .long 0 + .long 1072167936 + .long 0 + .long 1072165888 + .long 0 + .long 1072163840 + .long 0 + .long 1072161792 + .long 0 + .long 1072160768 + .long 0 + .long 1072158720 + .long 0 + .long 1072157696 + .long 0 + .long 1072155648 + .long 0 + .long 1072153600 + .long 0 + .long 1072152576 + .long 0 + .long 1072150528 + .long 0 + .long 1072148480 + .long 0 + .long 1072147456 + .long 0 + .long 1072145408 + .long 0 + .long 1072143360 + .long 0 + .long 1072142336 + .long 0 + .long 1072140288 + .long 0 + .long 1072139264 + .long 0 + .long 1072137216 + .long 0 + .long 1072135168 + .long 0 + .long 1072134144 + .long 0 + .long 1072132096 + .long 0 + .long 1072131072 
+ .long 0 + .long 1072129024 + .long 0 + .long 1072128000 + .long 0 + .long 1072125952 + .long 0 + .long 1072124928 + .long 0 + .long 1072122880 + .long 0 + .long 1072120832 + .long 0 + .long 1072119808 + .long 0 + .long 1072117760 + .long 0 + .long 1072116736 + .long 0 + .long 1072114688 + .long 0 + .long 1072113664 + .long 0 + .long 1072111616 + .long 0 + .long 1072110592 + .long 0 + .long 1072108544 + .long 0 + .long 1072107520 + .long 0 + .long 1072105472 + .long 0 + .long 1072104448 + .long 0 + .long 1072102400 + .long 0 + .long 1072101376 + .long 0 + .long 1072099328 + .long 0 + .long 1072098304 + .long 0 + .long 1072096256 + .long 0 + .long 1072095232 + .long 0 + .long 1072094208 + .long 0 + .long 1072092160 + .long 0 + .long 1072091136 + .long 0 + .long 1072089088 + .long 0 + .long 1072088064 + .long 0 + .long 1072086016 + .long 0 + .long 1072084992 + .long 0 + .long 1072082944 + .long 0 + .long 1072081920 + .long 0 + .long 1072080896 + .long 0 + .long 1072078848 + .long 0 + .long 1072075776 + .long 0 + .long 1072073728 + .long 0 + .long 1072070656 + .long 0 + .long 1072067584 + .long 0 + .long 1072064512 + .long 0 + .long 1072061440 + .long 0 + .long 1072059392 + .long 0 + .long 1072056320 + .long 0 + .long 1072053248 + .long 0 + .long 1072051200 + .long 0 + .long 1072048128 + .long 0 + .long 1072045056 + .long 0 + .long 1072043008 + .long 0 + .long 1072039936 + .long 0 + .long 1072037888 + .long 0 + .long 1072034816 + .long 0 + .long 1072031744 + .long 0 + .long 1072029696 + .long 0 + .long 1072026624 + .long 0 + .long 1072024576 + .long 0 + .long 1072021504 + .long 0 + .long 1072019456 + .long 0 + .long 1072016384 + .long 0 + .long 1072014336 + .long 0 + .long 1072011264 + .long 0 + .long 1072009216 + .long 0 + .long 1072006144 + .long 0 + .long 1072004096 + .long 0 + .long 1072002048 + .long 0 + .long 1071998976 + .long 0 + .long 1071996928 + .long 0 + .long 1071993856 + .long 0 + .long 1071991808 + .long 0 + .long 1071989760 + .long 0 + .long 1071986688 + .long 0 + .long 1071984640 + .long 0 + .long 1071982592 + .long 0 + .long 1071979520 + .long 0 + .long 1071977472 + .long 0 + .long 1071975424 + .long 0 + .long 1071972352 + .long 0 + .long 1071970304 + .long 0 + .long 1071968256 + .long 0 + .long 1071966208 + .long 0 + .long 1071964160 + .long 0 + .long 1071961088 + .long 0 + .long 1071959040 + .long 0 + .long 1071956992 + .long 0 + .long 1071954944 + .long 0 + .long 1071952896 + .long 0 + .long 1071949824 + .long 0 + .long 1071947776 + .long 0 + .long 1071945728 + .long 0 + .long 1071943680 + .long 0 + .long 1071941632 + .long 0 + .long 1071939584 + .long 0 + .long 1071937536 + .long 0 + .long 1071935488 + .long 0 + .long 1071933440 + .long 0 + .long 1071930368 + .long 0 + .long 1071928320 + .long 0 + .long 1071926272 + .long 0 + .long 1071924224 + .long 0 + .long 1071922176 + .long 0 + .long 1071920128 + .long 0 + .long 1071918080 + .long 0 + .long 1071916032 + .long 0 + .long 1071913984 + .long 0 + .long 1071911936 + .long 0 + .long 1071909888 + .long 0 + .long 1071907840 + .long 0 + .long 1071905792 + .long 0 + .long 1071903744 + .long 0 + .long 1071901696 + .long 0 + .long 1071900672 + .long 0 + .long 1071898624 + .long 0 + .long 1071896576 + .long 0 + .long 1071894528 + .long 0 + .long 1071892480 + .long 0 + .long 1071890432 + .long 0 + .long 1071888384 + .long 0 + .long 1071886336 + .long 0 + .long 1071884288 + .long 0 + .long 1071883264 + .long 0 + .long 1071881216 + .long 0 + .long 1071879168 + .long 0 + .long 1071877120 + .long 0 + .long 1071875072 + .long 0 + 
.long 1071873024 + .long 0 + .long 1071872000 + .long 0 + .long 1071869952 + .long 0 + .long 1071867904 + .long 0 + .long 1071865856 + .long 0 + .long 1071864832 + .long 0 + .long 1071862784 + .long 0 + .long 1071860736 + .long 0 + .long 1071858688 + .long 0 + .long 1071856640 + .long 0 + .long 1071855616 + .long 0 + .long 1071853568 + .long 0 + .long 1071851520 + .long 0 + .long 1071850496 + .long 0 + .long 1071848448 + .long 0 + .long 1071846400 + .long 0 + .long 1071844352 + .long 0 + .long 1071843328 + .long 0 + .long 1071841280 + .long 0 + .long 1071839232 + .long 0 + .long 1071838208 + .long 0 + .long 1071836160 + .long 0 + .long 1071834112 + .long 0 + .long 1071833088 + .long 0 + .long 1071831040 + .long 0 + .long 1071830016 + .long 0 + .long 1071827968 + .long 0 + .long 1071825920 + .long 0 + .long 1071824896 + .long 0 + .long 1071822848 + .long 0 + .long 1071821824 + .long 0 + .long 1071819776 + .long 0 + .long 1071817728 + .long 0 + .long 1071816704 + .long 0 + .long 1071814656 + .long 0 + .long 1071813632 + .long 0 + .long 1071811584 + .long 0 + .long 1071810560 + .long 0 + .long 1071808512 + .long 0 + .long 1071806464 + .long 0 + .long 1071805440 + .long 0 + .long 1071803392 + .long 0 + .long 1071802368 + .long 0 + .long 1071800320 + .long 0 + .long 1071799296 + .long 0 + .long 1071797248 + .long 0 + .long 1071796224 + .long 0 + .long 1071794176 + .long 0 + .long 1071793152 + .long 0 + .long 1071791104 + .long 0 + .long 1071790080 + .long 0 + .long 1071788032 + .long 0 + .long 1071787008 + .long 0 + .long 1071784960 + .long 0 + .long 1071783936 + .long 0 + .long 1071782912 + .long 0 + .long 1071780864 + .long 0 + .long 1071779840 + .long 0 + .long 1071777792 + .long 0 + .long 1071776768 + .long 0 + .long 1071774720 + .long 0 + .long 1071773696 + .long 0 + .long 1071772672 + .long 0 + .long 1071770624 + .long 0 + .long 1071769600 + .long 0 + .long 1071767552 + .long 0 + .long 1071766528 + .long 0 + .long 1071765504 + .long 0 + .long 1071763456 + .long 0 + .long 1071762432 + .long 0 + .long 1071760384 + .long 0 + .long 1071759360 + .long 0 + .long 1071758336 + .long 0 + .long 1071756288 + .long 0 + .long 1071755264 + .long 0 + .long 1071754240 + .long 0 + .long 1071752192 + .long 0 + .long 1071751168 + .long 0 + .long 1071750144 + .long 0 + .long 1071748096 + .long 0 + .long 1071747072 + .long 0 + .long 1071746048 + .long 0 + .long 1071744000 + .long 0 + .long 1071742976 + .long 0 + .long 1071741952 + .long 0 + .long 1071739904 + .long 0 + .long 1071738880 + .long 0 + .long 1071737856 + .long 0 + .long 1071736832 + .long 0 + .long 1071734784 + .long 0 + .long 1071733760 + .long 0 + .long 1071732736 + .long 0 + .long 1071730688 + .long 0 + .long 1071729664 + .long 0 + .long 1071728640 + .long 0 + .long 1071727616 + .long 0 + .long 1071725568 + .long 0 + .long 1071724544 + .long 0 + .long 1071723520 + .long 0 + .long 1071722496 + .long 0 + .long 1071720448 + .long 0 + .long 1071719424 + .long 0 + .long 1071718400 + .long 0 + .long 1071717376 + .long 0 + .long 1071715328 + .long 0 + .long 1071714304 + .long 0 + .long 1071713280 + .long 0 + .long 1071712256 + .long 0 + .long 1071711232 + .long 0 + .long 1071709184 + .long 0 + .long 1071708160 + .long 0 + .long 1071707136 + .long 0 + .long 1071706112 + .long 0 + .long 1071705088 + .long 0 + .long 1071704064 + .long 0 + .long 1071702016 + .long 0 + .long 1071700992 + .long 0 + .long 1071699968 + .long 0 + .long 1071698944 + .long 0 + .long 1071697920 + .long 0 + .long 1071696896 + .long 0 + .long 1071694848 + .long 0 + .long 1071693824 
+ .long 0 + .long 1071692800 + .long 0 + .long 1071691776 + .long 0 + .long 1071690752 + .long 0 + .long 1071689728 + .long 0 + .long 1071688704 + .long 0 + .long 1071686656 + .long 0 + .long 1071685632 + .long 0 + .long 1071684608 + .long 0 + .long 1071683584 + .long 0 + .long 1071682560 + .long 0 + .long 1071681536 + .long 0 + .long 1071680512 + .long 0 + .long 1071679488 + .long 0 + .long 1071677440 + .long 0 + .long 1071676416 + .long 0 + .long 1071675392 + .long 0 + .long 1071674368 + .long 0 + .long 1071673344 + .long 0 + .long 1071672320 + .long 0 + .long 1071671296 + .long 0 + .long 1071670272 + .long 0 + .long 1071669248 + .long 0 + .long 1071668224 + .long 0 + .long 1071667200 + .long 0 + .long 1071666176 + .long 0 + .long 1071665152 + .long 0 + .long 1071663104 + .long 0 + .long 1071662080 + .long 0 + .long 1071661056 + .long 0 + .long 1071660032 + .long 0 + .long 1071659008 + .long 0 + .long 1071657984 + .long 0 + .long 1071656960 + .long 0 + .long 1071655936 + .long 0 + .long 1071654912 + .long 0 + .long 1071653888 + .long 0 + .long 1071652864 + .long 0 + .long 1071651840 + .long 0 + .long 1071650816 + .long 0 + .long 1071649792 + .long 0 + .long 1071648768 + .long 0 + .long 1071647744 + .long 0 + .long 1071646720 + .long 0 + .long 1071645696 + .long 0 + .long 0 + .long 0 + .long 1071644672 + .long 0 + .long 1072693248 + .long 0 + .long 1073741824 + .long 33554432 + .long 1101004800 + .type _vmlsHypotHATab,@object + .size _vmlsHypotHATab,4136 + .space 472, 0x00 + .align 16 + +.FLT_49: + .long 0x00000000,0x80000000,0x00000000,0x00000000 + .type .FLT_49,@object + .size .FLT_49,16 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core-sse2.S new file mode 100644 index 0000000000..5e9dd22d94 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core-sse2.S @@ -0,0 +1,20 @@ +/* SSE2 version of vectorized hypotf. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define _ZGVbN4vv_hypotf _ZGVbN4vv_hypotf_sse2 +#include "../svml_s_hypotf4_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core.c new file mode 100644 index 0000000000..91c9f5ca3f --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized hypotf, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#define SYMBOL_NAME _ZGVbN4vv_hypotf +#include "ifunc-mathvec-sse4_1.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVbN4vv_hypotf, __GI__ZGVbN4vv_hypotf, + __redirect__ZGVbN4vv_hypotf) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core_sse4.S new file mode 100644 index 0000000000..4ab49ecc0f --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core_sse4.S @@ -0,0 +1,2062 @@ +/* Function hypotf vectorized with SSE4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * HIGH LEVEL OVERVIEW + * + * Calculate z = (x*x+y*y) + * Calculate reciprocal sqrt (z) + * Make two NR iterations + * + * ALGORITHM DETAILS + * + * Multiprecision branch for _HA_ only + * Remove sign from both arguments + * Find maximum (_x) and minimum (_y) (by abs value) between arguments + * Split _x into _a and _b for multiprecision + * If _x >> _y we will not split _y for multiprecision + * all _y will be put into lower part (_d) and higher part (_c = 0) + * Fixing _hilo_mask for the case _x >> _y + * Split _y into _c and _d for multiprecision with fixed mask + * + * compute Hi and Lo parts of _z = _x*_x + _y*_y + * + * _zHi = _a*_a + _c*_c + * _zLo = (_x + _a)*_b + _d*_y + _d*_c + * _z = _zHi + _zLo + * + * No multiprecision branch for _LA_ and _EP_ + * _z = _VARG1 * _VARG1 + _VARG2 * _VARG2 + * + * Check _z exponent to be within borders [1E3 ; 60A] else goto Callout + * + * Compute reciprocal sqrt s0 ~ 1.0/sqrt(_z), + * that multiplied by _z, is final result for _EP_ version. + * + * First iteration (or zero iteration): + * s = z * s0 + * h = .5 * s0 + * d = s * h - .5 + * + * Second iteration: + * h = d * h + h + * s = s * d + s + * d = s * s - z (in multiprecision for _HA_) + * + * result = s - h * d + * + * EP version of the function can be implemented as y[i]=sqrt(a[i]^2+b[i]^2) + * with all intermediate operations done in target precision for i=1,..,n. + * It can return result y[i]=0 in case a[i]^2 and b[i]^2 underflow in target + * precision (for some i). It can return result y[i]=NAN in case + * a[i]^2+b[i]^2 overflow in target precision, for some i. It can return + * result y[i]=NAN in case a[i] or b[i] is infinite, for some i. 
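+ *
+ * (Editorial illustration, not part of the original submission: a minimal
+ * scalar C sketch of the _LA_ flow described above, assuming <math.h> and
+ * using 1.0f/sqrtf as a stand-in for the hardware rsqrt seed.  All names
+ * are placeholders; the sign of the correction term d follows the actual
+ * instruction sequence below, where d is computed as .5 - s*h.)
+ *
+ *   static float hypotf_la_sketch (float x, float y)
+ *   {
+ *     float z = x * x + y * y;      // may over-/underflow -> Callout path
+ *     float s0 = 1.0f / sqrtf (z);  // seed, ~ rsqrtps / vrsqrt14ps
+ *     float s = z * s0;             // first NR iteration: s ~ sqrt(z)
+ *     float h = 0.5f * s0;
+ *     float d = 0.5f - s * h;       // correction term
+ *     s = s + s * d;                // second NR iteration
+ *     h = h + h * d;
+ *     d = s * s - z;                // residual in target precision
+ *     return s - h * d;             // ~ sqrt(x*x + y*y)
+ *   }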
+ * + * + */ + +#include + + .text +ENTRY(_ZGVbN4vv_hypotf_sse4) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $384, %rsp + movaps %xmm1, %xmm5 + movaps %xmm0, %xmm4 + movaps %xmm5, %xmm6 + +/* + * Implementation + * Multiprecision branch for _HA_ only + * No multiprecision branch for _LA_ + * _z = _VARG1 * _VARG1 + _VARG2 * _VARG2 + */ + movaps %xmm4, %xmm3 + mulps %xmm4, %xmm3 + mulps %xmm5, %xmm6 + +/* Check _z exponent to be withing borders [1E3 ; 60A] else goto Callout */ + movdqu 384+__svml_shypot_data_internal(%rip), %xmm0 + addps %xmm6, %xmm3 + +/* _s0 ~ 1.0/sqrt(_z) */ + rsqrtps %xmm3, %xmm1 + movaps %xmm3, %xmm7 + pcmpgtd %xmm3, %xmm0 + pcmpgtd 448+__svml_shypot_data_internal(%rip), %xmm7 + por %xmm7, %xmm0 + movmskps %xmm0, %edx + +/* First iteration */ + movaps %xmm1, %xmm0 + +/* + * Variables + * Defines + * Constants loading + */ + movups 192+__svml_shypot_data_internal(%rip), %xmm2 + mulps %xmm3, %xmm0 + mulps %xmm2, %xmm1 + movaps %xmm0, %xmm6 + mulps %xmm1, %xmm6 + subps %xmm6, %xmm2 + +/* Second iteration */ + movaps %xmm2, %xmm6 + mulps %xmm0, %xmm2 + mulps %xmm1, %xmm6 + addps %xmm2, %xmm0 + addps %xmm6, %xmm1 + +/* Finish second iteration in native precision for _LA_ */ + movaps %xmm0, %xmm6 + mulps %xmm0, %xmm6 + subps %xmm3, %xmm6 + mulps %xmm6, %xmm1 + subps %xmm1, %xmm0 + +/* The end of implementation */ + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + +.LBL_1_3: + movups %xmm4, 192(%rsp) + movups %xmm5, 256(%rsp) + movups %xmm0, 320(%rsp) + xorl %eax, %eax + movups %xmm8, 112(%rsp) + movups %xmm9, 96(%rsp) + movups %xmm10, 80(%rsp) + movups %xmm11, 64(%rsp) + movups %xmm12, 48(%rsp) + movups %xmm13, 32(%rsp) + movups %xmm14, 16(%rsp) + movups %xmm15, (%rsp) + movq %rsi, 136(%rsp) + movq %rdi, 128(%rsp) + movq %r12, 152(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x18, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 144(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + movl %edx, %r13d + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $4, %r12d + jl .LBL_1_7 + movups 112(%rsp), %xmm8 + 
cfi_restore(25) + movups 96(%rsp), %xmm9 + cfi_restore(26) + movups 80(%rsp), %xmm10 + cfi_restore(27) + movups 64(%rsp), %xmm11 + cfi_restore(28) + movups 48(%rsp), %xmm12 + cfi_restore(29) + movups 32(%rsp), %xmm13 + cfi_restore(30) + movups 16(%rsp), %xmm14 + cfi_restore(31) + movups (%rsp), %xmm15 + cfi_restore(32) + movq 136(%rsp), %rsi + cfi_restore(4) + movq 128(%rsp), %rdi + cfi_restore(5) + movq 152(%rsp), %r12 + cfi_restore(12) + movq 144(%rsp), %r13 + cfi_restore(13) + movups 320(%rsp), %xmm0 + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x18, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 192(%rsp,%r12,4), %rdi + lea 256(%rsp,%r12,4), %rsi + lea 320(%rsp,%r12,4), %rdx + call __svml_shypot_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVbN4vv_hypotf_sse4) + + .align 16,0x90 + +__svml_shypot_cout_rare_internal: + + cfi_startproc + + movq %rdx, %r8 + movzwl 2(%rdi), %eax + andl $32640, %eax + cmpl $32640, %eax + je .LBL_2_14 + movzwl 2(%rsi), %eax + andl $32640, %eax + cmpl $32640, %eax + je .LBL_2_13 + pxor %xmm2, %xmm2 + pxor %xmm1, %xmm1 + cvtss2sd (%rdi), %xmm2 + cvtss2sd (%rsi), %xmm1 + movsd 4096+_vmlsHypotHATab(%rip), %xmm0 + movzwl 4102+_vmlsHypotHATab(%rip), %edi + ucomisd %xmm0, %xmm2 + jp .LBL_2_4 + je .LBL_2_11 + +.LBL_2_4: + movsd %xmm2, -16(%rsp) + movsd %xmm1, -48(%rsp) + andb $127, -9(%rsp) + andb $127, -41(%rsp) + movsd -16(%rsp), %xmm8 + movsd -48(%rsp), %xmm1 + comisd %xmm8, %xmm1 + jbe .LBL_2_6 + movaps %xmm8, %xmm2 + movaps %xmm1, %xmm8 + movsd %xmm1, -16(%rsp) + movaps %xmm2, %xmm1 + +.LBL_2_6: + movzwl -10(%rsp), %edx + andl $-32753, %edi + andl $32752, %edx + shrl $4, %edx + negl %edx + movsd %xmm0, -56(%rsp) + movsd 4128+_vmlsHypotHATab(%rip), %xmm3 + lea 1025(%rdx), %esi + negl %esi + addl $1000, %esi + shrl $31, %esi + imull $-23, %esi, %eax + lea 1025(%rax,%rdx), %esi + lea 1023(%rsi), %ecx + andl $2047, %ecx + shll $4, %ecx + orl %ecx, %edi + movw %di, -50(%rsp) + movsd -56(%rsp), %xmm2 + mulsd %xmm2, %xmm8 + mulsd %xmm2, %xmm1 + mulsd %xmm8, %xmm3 + movsd %xmm3, -72(%rsp) + movsd -72(%rsp), %xmm4 + movsd %xmm8, -16(%rsp) + subsd %xmm8, %xmm4 + movsd %xmm4, -64(%rsp) + movsd -72(%rsp), %xmm6 + movsd -64(%rsp), %xmm5 + movsd %xmm1, -48(%rsp) + subsd %xmm5, %xmm6 + 
movsd %xmm6, -72(%rsp) + movsd -72(%rsp), %xmm7 + movzwl -10(%rsp), %r9d + subsd %xmm7, %xmm8 + movzwl -42(%rsp), %edi + andl $32752, %r9d + andl $32752, %edi + shrl $4, %r9d + shrl $4, %edi + movsd %xmm8, -64(%rsp) + subl %edi, %r9d + movsd -72(%rsp), %xmm7 + movsd -64(%rsp), %xmm8 + cmpl $6, %r9d + jle .LBL_2_8 + movaps %xmm1, %xmm2 + jmp .LBL_2_9 + +.LBL_2_8: + movsd -48(%rsp), %xmm1 + movsd 4128+_vmlsHypotHATab(%rip), %xmm0 + movaps %xmm1, %xmm6 + mulsd %xmm1, %xmm0 + movsd %xmm0, -72(%rsp) + movsd -72(%rsp), %xmm2 + subsd -48(%rsp), %xmm2 + movsd %xmm2, -64(%rsp) + movsd -72(%rsp), %xmm4 + movsd -64(%rsp), %xmm3 + subsd %xmm3, %xmm4 + movsd %xmm4, -72(%rsp) + movsd -72(%rsp), %xmm5 + subsd %xmm5, %xmm6 + movsd %xmm6, -64(%rsp) + movsd -72(%rsp), %xmm0 + movsd -64(%rsp), %xmm2 + +.LBL_2_9: + movsd -16(%rsp), %xmm5 + movaps %xmm0, %xmm4 + mulsd %xmm0, %xmm4 + addsd %xmm1, %xmm0 + addsd %xmm7, %xmm5 + mulsd %xmm2, %xmm0 + mulsd %xmm5, %xmm8 + movaps %xmm7, %xmm3 + negl %esi + mulsd %xmm7, %xmm3 + addsd %xmm8, %xmm0 + movq 4112+_vmlsHypotHATab(%rip), %r11 + movq %r11, %r9 + lea _vmlsHypotHATab(%rip), %rdx + addsd %xmm4, %xmm3 + addl $1023, %esi + addsd %xmm0, %xmm3 + movsd %xmm3, -56(%rsp) + andl $2047, %esi + movzwl -50(%rsp), %ecx + andl $32752, %ecx + shrl $4, %ecx + addl $-1023, %ecx + movl %ecx, %eax + andl $1, %eax + subl %eax, %ecx + shrl $1, %ecx + movsd %xmm3, -48(%rsp) + movzwl -42(%rsp), %edi + andl $-32753, %edi + shrq $48, %r9 + lea 1023(%rcx), %r10d + addl %ecx, %ecx + addl $16368, %edi + negl %ecx + andl $2047, %r10d + addl $1023, %ecx + andl $2047, %ecx + andl $-32753, %r9d + movw %di, -42(%rsp) + shll $4, %r10d + shll $4, %ecx + orl %r9d, %r10d + shll $4, %esi + orl %r9d, %ecx + movsd -48(%rsp), %xmm2 + orl %esi, %r9d + movl -44(%rsp), %esi + mulsd 4112(%rdx,%rax,8), %xmm2 + andl $1048575, %esi + shrl $12, %esi + shll $8, %eax + addl %eax, %esi + movsd (%rdx,%rsi,8), %xmm6 + movsd 4104+_vmlsHypotHATab(%rip), %xmm1 + mulsd %xmm6, %xmm2 + mulsd %xmm6, %xmm1 + movaps %xmm2, %xmm7 + mulsd %xmm1, %xmm7 + movsd 4104+_vmlsHypotHATab(%rip), %xmm9 + movsd 4104+_vmlsHypotHATab(%rip), %xmm12 + subsd %xmm7, %xmm9 + movaps %xmm9, %xmm8 + mulsd %xmm2, %xmm9 + mulsd %xmm1, %xmm8 + addsd %xmm9, %xmm2 + addsd %xmm8, %xmm1 + movaps %xmm2, %xmm10 + movaps %xmm1, %xmm11 + mulsd %xmm1, %xmm10 + movsd 4104+_vmlsHypotHATab(%rip), %xmm0 + subsd %xmm10, %xmm12 + mulsd %xmm12, %xmm11 + mulsd %xmm2, %xmm12 + addsd %xmm11, %xmm1 + addsd %xmm12, %xmm2 + movaps %xmm2, %xmm13 + movaps %xmm2, %xmm14 + mulsd %xmm1, %xmm13 + movsd 4128+_vmlsHypotHATab(%rip), %xmm15 + subsd %xmm13, %xmm0 + mulsd %xmm0, %xmm14 + mulsd %xmm1, %xmm0 + addsd %xmm14, %xmm2 + addsd %xmm0, %xmm1 + mulsd %xmm2, %xmm15 + movsd %xmm15, -72(%rsp) + movaps %xmm2, %xmm8 + movsd -72(%rsp), %xmm4 + movsd 4120+_vmlsHypotHATab(%rip), %xmm10 + subsd %xmm2, %xmm4 + movsd %xmm4, -64(%rsp) + movsd -72(%rsp), %xmm6 + movsd -64(%rsp), %xmm5 + movq %r11, -32(%rsp) + subsd %xmm5, %xmm6 + movsd %xmm6, -72(%rsp) + movsd -72(%rsp), %xmm7 + movw %cx, -26(%rsp) + subsd %xmm7, %xmm8 + movsd %xmm8, -64(%rsp) + movsd -72(%rsp), %xmm11 + movsd -64(%rsp), %xmm12 + movaps %xmm11, %xmm13 + mulsd %xmm12, %xmm10 + mulsd %xmm12, %xmm12 + xorps .FLT_52(%rip), %xmm13 + xorps .FLT_52(%rip), %xmm12 + subsd %xmm10, %xmm13 + mulsd %xmm11, %xmm13 + movsd -32(%rsp), %xmm9 + addsd %xmm12, %xmm13 + mulsd %xmm9, %xmm3 + movq %r11, -40(%rsp) + addsd %xmm13, %xmm3 + mulsd %xmm1, %xmm3 + movw %r10w, -34(%rsp) + addsd %xmm3, %xmm2 + mulsd -40(%rsp), %xmm2 + movq %r11, 
-24(%rsp) + movw %r9w, -18(%rsp) + mulsd -24(%rsp), %xmm2 + cvtsd2ss %xmm2, %xmm2 + movss %xmm2, (%r8) + +.LBL_2_10: + xorl %eax, %eax + ret + +.LBL_2_11: + ucomisd %xmm0, %xmm1 + jne .LBL_2_4 + jp .LBL_2_4 + cvtsd2ss %xmm0, %xmm0 + movss %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_13: + movss (%rsi), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_14: + movzwl 2(%rsi), %eax + andl $32640, %eax + cmpl $32640, %eax + je .LBL_2_16 + +.LBL_2_15: + movss (%rdi), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_16: + movl (%rdi), %eax + testl $8388607, %eax + je .LBL_2_22 + testl $8388607, (%rsi) + je .LBL_2_19 + movss (%rdi), %xmm0 + mulss (%rsi), %xmm0 + movss %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_19: + testl $4194304, %eax + jne .LBL_2_13 + movsd 4112+_vmlsHypotHATab(%rip), %xmm0 + cvtsd2ss %xmm0, %xmm0 + mulss (%rdi), %xmm0 + movss %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_22: + movl (%rsi), %eax + testl $8388607, %eax + je .LBL_2_15 + testl $4194304, %eax + jne .LBL_2_15 + movsd 4112+_vmlsHypotHATab(%rip), %xmm0 + cvtsd2ss %xmm0, %xmm0 + mulss (%rsi), %xmm0 + movss %xmm0, (%r8) + jmp .LBL_2_10 + + cfi_endproc + + .type __svml_shypot_cout_rare_internal,@function + .size __svml_shypot_cout_rare_internal,.-__svml_shypot_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_shypot_data_internal: + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 
+ .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_shypot_data_internal,@object + .size __svml_shypot_data_internal,576 + .align 32 + +_vmlsHypotHATab: + .long 0 + .long 1072693248 + .long 0 + .long 1072689152 + .long 0 + .long 1072685056 + .long 0 + .long 1072680960 + .long 0 + .long 1072676864 + .long 0 + .long 1072672768 + .long 0 + .long 1072668672 + .long 0 + .long 1072665600 + .long 0 + .long 1072661504 + .long 0 + .long 1072657408 + .long 0 + .long 1072653312 + .long 0 + .long 1072649216 + .long 0 + .long 1072646144 + .long 0 + .long 1072642048 + .long 0 + .long 1072637952 + .long 0 + .long 1072634880 + .long 0 + .long 1072630784 + .long 0 + .long 1072626688 + .long 0 + .long 1072623616 + .long 0 + .long 1072619520 + .long 0 + .long 1072615424 + .long 0 + .long 
1072612352 + .long 0 + .long 1072608256 + .long 0 + .long 1072605184 + .long 0 + .long 1072601088 + .long 0 + .long 1072598016 + .long 0 + .long 1072593920 + .long 0 + .long 1072590848 + .long 0 + .long 1072586752 + .long 0 + .long 1072583680 + .long 0 + .long 1072580608 + .long 0 + .long 1072576512 + .long 0 + .long 1072573440 + .long 0 + .long 1072570368 + .long 0 + .long 1072566272 + .long 0 + .long 1072563200 + .long 0 + .long 1072560128 + .long 0 + .long 1072556032 + .long 0 + .long 1072552960 + .long 0 + .long 1072549888 + .long 0 + .long 1072546816 + .long 0 + .long 1072542720 + .long 0 + .long 1072539648 + .long 0 + .long 1072536576 + .long 0 + .long 1072533504 + .long 0 + .long 1072530432 + .long 0 + .long 1072527360 + .long 0 + .long 1072523264 + .long 0 + .long 1072520192 + .long 0 + .long 1072517120 + .long 0 + .long 1072514048 + .long 0 + .long 1072510976 + .long 0 + .long 1072507904 + .long 0 + .long 1072504832 + .long 0 + .long 1072501760 + .long 0 + .long 1072498688 + .long 0 + .long 1072495616 + .long 0 + .long 1072492544 + .long 0 + .long 1072489472 + .long 0 + .long 1072486400 + .long 0 + .long 1072483328 + .long 0 + .long 1072480256 + .long 0 + .long 1072478208 + .long 0 + .long 1072475136 + .long 0 + .long 1072472064 + .long 0 + .long 1072468992 + .long 0 + .long 1072465920 + .long 0 + .long 1072462848 + .long 0 + .long 1072459776 + .long 0 + .long 1072457728 + .long 0 + .long 1072454656 + .long 0 + .long 1072451584 + .long 0 + .long 1072448512 + .long 0 + .long 1072446464 + .long 0 + .long 1072443392 + .long 0 + .long 1072440320 + .long 0 + .long 1072437248 + .long 0 + .long 1072435200 + .long 0 + .long 1072432128 + .long 0 + .long 1072429056 + .long 0 + .long 1072427008 + .long 0 + .long 1072423936 + .long 0 + .long 1072420864 + .long 0 + .long 1072418816 + .long 0 + .long 1072415744 + .long 0 + .long 1072412672 + .long 0 + .long 1072410624 + .long 0 + .long 1072407552 + .long 0 + .long 1072405504 + .long 0 + .long 1072402432 + .long 0 + .long 1072400384 + .long 0 + .long 1072397312 + .long 0 + .long 1072395264 + .long 0 + .long 1072392192 + .long 0 + .long 1072390144 + .long 0 + .long 1072387072 + .long 0 + .long 1072385024 + .long 0 + .long 1072381952 + .long 0 + .long 1072379904 + .long 0 + .long 1072376832 + .long 0 + .long 1072374784 + .long 0 + .long 1072371712 + .long 0 + .long 1072369664 + .long 0 + .long 1072366592 + .long 0 + .long 1072364544 + .long 0 + .long 1072362496 + .long 0 + .long 1072359424 + .long 0 + .long 1072357376 + .long 0 + .long 1072355328 + .long 0 + .long 1072352256 + .long 0 + .long 1072350208 + .long 0 + .long 1072347136 + .long 0 + .long 1072345088 + .long 0 + .long 1072343040 + .long 0 + .long 1072340992 + .long 0 + .long 1072337920 + .long 0 + .long 1072335872 + .long 0 + .long 1072333824 + .long 0 + .long 1072330752 + .long 0 + .long 1072328704 + .long 0 + .long 1072326656 + .long 0 + .long 1072324608 + .long 0 + .long 1072321536 + .long 0 + .long 1072319488 + .long 0 + .long 1072317440 + .long 0 + .long 1072315392 + .long 0 + .long 1072313344 + .long 0 + .long 1072310272 + .long 0 + .long 1072308224 + .long 0 + .long 1072306176 + .long 0 + .long 1072304128 + .long 0 + .long 1072302080 + .long 0 + .long 1072300032 + .long 0 + .long 1072296960 + .long 0 + .long 1072294912 + .long 0 + .long 1072292864 + .long 0 + .long 1072290816 + .long 0 + .long 1072288768 + .long 0 + .long 1072286720 + .long 0 + .long 1072284672 + .long 0 + .long 1072282624 + .long 0 + .long 1072280576 + .long 0 + .long 1072278528 + .long 0 + .long 1072275456 + 
.long 0 + .long 1072273408 + .long 0 + .long 1072271360 + .long 0 + .long 1072269312 + .long 0 + .long 1072267264 + .long 0 + .long 1072265216 + .long 0 + .long 1072263168 + .long 0 + .long 1072261120 + .long 0 + .long 1072259072 + .long 0 + .long 1072257024 + .long 0 + .long 1072254976 + .long 0 + .long 1072252928 + .long 0 + .long 1072250880 + .long 0 + .long 1072248832 + .long 0 + .long 1072246784 + .long 0 + .long 1072244736 + .long 0 + .long 1072243712 + .long 0 + .long 1072241664 + .long 0 + .long 1072239616 + .long 0 + .long 1072237568 + .long 0 + .long 1072235520 + .long 0 + .long 1072233472 + .long 0 + .long 1072231424 + .long 0 + .long 1072229376 + .long 0 + .long 1072227328 + .long 0 + .long 1072225280 + .long 0 + .long 1072223232 + .long 0 + .long 1072222208 + .long 0 + .long 1072220160 + .long 0 + .long 1072218112 + .long 0 + .long 1072216064 + .long 0 + .long 1072214016 + .long 0 + .long 1072211968 + .long 0 + .long 1072210944 + .long 0 + .long 1072208896 + .long 0 + .long 1072206848 + .long 0 + .long 1072204800 + .long 0 + .long 1072202752 + .long 0 + .long 1072201728 + .long 0 + .long 1072199680 + .long 0 + .long 1072197632 + .long 0 + .long 1072195584 + .long 0 + .long 1072193536 + .long 0 + .long 1072192512 + .long 0 + .long 1072190464 + .long 0 + .long 1072188416 + .long 0 + .long 1072186368 + .long 0 + .long 1072185344 + .long 0 + .long 1072183296 + .long 0 + .long 1072181248 + .long 0 + .long 1072179200 + .long 0 + .long 1072178176 + .long 0 + .long 1072176128 + .long 0 + .long 1072174080 + .long 0 + .long 1072173056 + .long 0 + .long 1072171008 + .long 0 + .long 1072168960 + .long 0 + .long 1072167936 + .long 0 + .long 1072165888 + .long 0 + .long 1072163840 + .long 0 + .long 1072161792 + .long 0 + .long 1072160768 + .long 0 + .long 1072158720 + .long 0 + .long 1072157696 + .long 0 + .long 1072155648 + .long 0 + .long 1072153600 + .long 0 + .long 1072152576 + .long 0 + .long 1072150528 + .long 0 + .long 1072148480 + .long 0 + .long 1072147456 + .long 0 + .long 1072145408 + .long 0 + .long 1072143360 + .long 0 + .long 1072142336 + .long 0 + .long 1072140288 + .long 0 + .long 1072139264 + .long 0 + .long 1072137216 + .long 0 + .long 1072135168 + .long 0 + .long 1072134144 + .long 0 + .long 1072132096 + .long 0 + .long 1072131072 + .long 0 + .long 1072129024 + .long 0 + .long 1072128000 + .long 0 + .long 1072125952 + .long 0 + .long 1072124928 + .long 0 + .long 1072122880 + .long 0 + .long 1072120832 + .long 0 + .long 1072119808 + .long 0 + .long 1072117760 + .long 0 + .long 1072116736 + .long 0 + .long 1072114688 + .long 0 + .long 1072113664 + .long 0 + .long 1072111616 + .long 0 + .long 1072110592 + .long 0 + .long 1072108544 + .long 0 + .long 1072107520 + .long 0 + .long 1072105472 + .long 0 + .long 1072104448 + .long 0 + .long 1072102400 + .long 0 + .long 1072101376 + .long 0 + .long 1072099328 + .long 0 + .long 1072098304 + .long 0 + .long 1072096256 + .long 0 + .long 1072095232 + .long 0 + .long 1072094208 + .long 0 + .long 1072092160 + .long 0 + .long 1072091136 + .long 0 + .long 1072089088 + .long 0 + .long 1072088064 + .long 0 + .long 1072086016 + .long 0 + .long 1072084992 + .long 0 + .long 1072082944 + .long 0 + .long 1072081920 + .long 0 + .long 1072080896 + .long 0 + .long 1072078848 + .long 0 + .long 1072075776 + .long 0 + .long 1072073728 + .long 0 + .long 1072070656 + .long 0 + .long 1072067584 + .long 0 + .long 1072064512 + .long 0 + .long 1072061440 + .long 0 + .long 1072059392 + .long 0 + .long 1072056320 + .long 0 + .long 1072053248 + .long 0 + .long 
1072051200 + .long 0 + .long 1072048128 + .long 0 + .long 1072045056 + .long 0 + .long 1072043008 + .long 0 + .long 1072039936 + .long 0 + .long 1072037888 + .long 0 + .long 1072034816 + .long 0 + .long 1072031744 + .long 0 + .long 1072029696 + .long 0 + .long 1072026624 + .long 0 + .long 1072024576 + .long 0 + .long 1072021504 + .long 0 + .long 1072019456 + .long 0 + .long 1072016384 + .long 0 + .long 1072014336 + .long 0 + .long 1072011264 + .long 0 + .long 1072009216 + .long 0 + .long 1072006144 + .long 0 + .long 1072004096 + .long 0 + .long 1072002048 + .long 0 + .long 1071998976 + .long 0 + .long 1071996928 + .long 0 + .long 1071993856 + .long 0 + .long 1071991808 + .long 0 + .long 1071989760 + .long 0 + .long 1071986688 + .long 0 + .long 1071984640 + .long 0 + .long 1071982592 + .long 0 + .long 1071979520 + .long 0 + .long 1071977472 + .long 0 + .long 1071975424 + .long 0 + .long 1071972352 + .long 0 + .long 1071970304 + .long 0 + .long 1071968256 + .long 0 + .long 1071966208 + .long 0 + .long 1071964160 + .long 0 + .long 1071961088 + .long 0 + .long 1071959040 + .long 0 + .long 1071956992 + .long 0 + .long 1071954944 + .long 0 + .long 1071952896 + .long 0 + .long 1071949824 + .long 0 + .long 1071947776 + .long 0 + .long 1071945728 + .long 0 + .long 1071943680 + .long 0 + .long 1071941632 + .long 0 + .long 1071939584 + .long 0 + .long 1071937536 + .long 0 + .long 1071935488 + .long 0 + .long 1071933440 + .long 0 + .long 1071930368 + .long 0 + .long 1071928320 + .long 0 + .long 1071926272 + .long 0 + .long 1071924224 + .long 0 + .long 1071922176 + .long 0 + .long 1071920128 + .long 0 + .long 1071918080 + .long 0 + .long 1071916032 + .long 0 + .long 1071913984 + .long 0 + .long 1071911936 + .long 0 + .long 1071909888 + .long 0 + .long 1071907840 + .long 0 + .long 1071905792 + .long 0 + .long 1071903744 + .long 0 + .long 1071901696 + .long 0 + .long 1071900672 + .long 0 + .long 1071898624 + .long 0 + .long 1071896576 + .long 0 + .long 1071894528 + .long 0 + .long 1071892480 + .long 0 + .long 1071890432 + .long 0 + .long 1071888384 + .long 0 + .long 1071886336 + .long 0 + .long 1071884288 + .long 0 + .long 1071883264 + .long 0 + .long 1071881216 + .long 0 + .long 1071879168 + .long 0 + .long 1071877120 + .long 0 + .long 1071875072 + .long 0 + .long 1071873024 + .long 0 + .long 1071872000 + .long 0 + .long 1071869952 + .long 0 + .long 1071867904 + .long 0 + .long 1071865856 + .long 0 + .long 1071864832 + .long 0 + .long 1071862784 + .long 0 + .long 1071860736 + .long 0 + .long 1071858688 + .long 0 + .long 1071856640 + .long 0 + .long 1071855616 + .long 0 + .long 1071853568 + .long 0 + .long 1071851520 + .long 0 + .long 1071850496 + .long 0 + .long 1071848448 + .long 0 + .long 1071846400 + .long 0 + .long 1071844352 + .long 0 + .long 1071843328 + .long 0 + .long 1071841280 + .long 0 + .long 1071839232 + .long 0 + .long 1071838208 + .long 0 + .long 1071836160 + .long 0 + .long 1071834112 + .long 0 + .long 1071833088 + .long 0 + .long 1071831040 + .long 0 + .long 1071830016 + .long 0 + .long 1071827968 + .long 0 + .long 1071825920 + .long 0 + .long 1071824896 + .long 0 + .long 1071822848 + .long 0 + .long 1071821824 + .long 0 + .long 1071819776 + .long 0 + .long 1071817728 + .long 0 + .long 1071816704 + .long 0 + .long 1071814656 + .long 0 + .long 1071813632 + .long 0 + .long 1071811584 + .long 0 + .long 1071810560 + .long 0 + .long 1071808512 + .long 0 + .long 1071806464 + .long 0 + .long 1071805440 + .long 0 + .long 1071803392 + .long 0 + .long 1071802368 + .long 0 + .long 1071800320 + 
.long 0 + .long 1071799296 + .long 0 + .long 1071797248 + .long 0 + .long 1071796224 + .long 0 + .long 1071794176 + .long 0 + .long 1071793152 + .long 0 + .long 1071791104 + .long 0 + .long 1071790080 + .long 0 + .long 1071788032 + .long 0 + .long 1071787008 + .long 0 + .long 1071784960 + .long 0 + .long 1071783936 + .long 0 + .long 1071782912 + .long 0 + .long 1071780864 + .long 0 + .long 1071779840 + .long 0 + .long 1071777792 + .long 0 + .long 1071776768 + .long 0 + .long 1071774720 + .long 0 + .long 1071773696 + .long 0 + .long 1071772672 + .long 0 + .long 1071770624 + .long 0 + .long 1071769600 + .long 0 + .long 1071767552 + .long 0 + .long 1071766528 + .long 0 + .long 1071765504 + .long 0 + .long 1071763456 + .long 0 + .long 1071762432 + .long 0 + .long 1071760384 + .long 0 + .long 1071759360 + .long 0 + .long 1071758336 + .long 0 + .long 1071756288 + .long 0 + .long 1071755264 + .long 0 + .long 1071754240 + .long 0 + .long 1071752192 + .long 0 + .long 1071751168 + .long 0 + .long 1071750144 + .long 0 + .long 1071748096 + .long 0 + .long 1071747072 + .long 0 + .long 1071746048 + .long 0 + .long 1071744000 + .long 0 + .long 1071742976 + .long 0 + .long 1071741952 + .long 0 + .long 1071739904 + .long 0 + .long 1071738880 + .long 0 + .long 1071737856 + .long 0 + .long 1071736832 + .long 0 + .long 1071734784 + .long 0 + .long 1071733760 + .long 0 + .long 1071732736 + .long 0 + .long 1071730688 + .long 0 + .long 1071729664 + .long 0 + .long 1071728640 + .long 0 + .long 1071727616 + .long 0 + .long 1071725568 + .long 0 + .long 1071724544 + .long 0 + .long 1071723520 + .long 0 + .long 1071722496 + .long 0 + .long 1071720448 + .long 0 + .long 1071719424 + .long 0 + .long 1071718400 + .long 0 + .long 1071717376 + .long 0 + .long 1071715328 + .long 0 + .long 1071714304 + .long 0 + .long 1071713280 + .long 0 + .long 1071712256 + .long 0 + .long 1071711232 + .long 0 + .long 1071709184 + .long 0 + .long 1071708160 + .long 0 + .long 1071707136 + .long 0 + .long 1071706112 + .long 0 + .long 1071705088 + .long 0 + .long 1071704064 + .long 0 + .long 1071702016 + .long 0 + .long 1071700992 + .long 0 + .long 1071699968 + .long 0 + .long 1071698944 + .long 0 + .long 1071697920 + .long 0 + .long 1071696896 + .long 0 + .long 1071694848 + .long 0 + .long 1071693824 + .long 0 + .long 1071692800 + .long 0 + .long 1071691776 + .long 0 + .long 1071690752 + .long 0 + .long 1071689728 + .long 0 + .long 1071688704 + .long 0 + .long 1071686656 + .long 0 + .long 1071685632 + .long 0 + .long 1071684608 + .long 0 + .long 1071683584 + .long 0 + .long 1071682560 + .long 0 + .long 1071681536 + .long 0 + .long 1071680512 + .long 0 + .long 1071679488 + .long 0 + .long 1071677440 + .long 0 + .long 1071676416 + .long 0 + .long 1071675392 + .long 0 + .long 1071674368 + .long 0 + .long 1071673344 + .long 0 + .long 1071672320 + .long 0 + .long 1071671296 + .long 0 + .long 1071670272 + .long 0 + .long 1071669248 + .long 0 + .long 1071668224 + .long 0 + .long 1071667200 + .long 0 + .long 1071666176 + .long 0 + .long 1071665152 + .long 0 + .long 1071663104 + .long 0 + .long 1071662080 + .long 0 + .long 1071661056 + .long 0 + .long 1071660032 + .long 0 + .long 1071659008 + .long 0 + .long 1071657984 + .long 0 + .long 1071656960 + .long 0 + .long 1071655936 + .long 0 + .long 1071654912 + .long 0 + .long 1071653888 + .long 0 + .long 1071652864 + .long 0 + .long 1071651840 + .long 0 + .long 1071650816 + .long 0 + .long 1071649792 + .long 0 + .long 1071648768 + .long 0 + .long 1071647744 + .long 0 + .long 1071646720 + .long 0 + .long 
1071645696 + .long 0 + .long 0 + .long 1071644672 + .long 0 + .long 1072693248 + .long 0 + .long 1073741824 + .long 33554432 + .long 1101004800 + .type _vmlsHypotHATab,@object + .size _vmlsHypotHATab,4136 + .space 472, 0x00 + .align 16 + +.FLT_52: + .long 0x00000000,0x80000000,0x00000000,0x00000000 + .type .FLT_52,@object + .size .FLT_52,16 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core-sse.S new file mode 100644 index 0000000000..d37556e331 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core-sse.S @@ -0,0 +1,20 @@ +/* SSE version of vectorized hypotf. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +#define _ZGVdN8vv_hypotf _ZGVdN8vv_hypotf_sse_wrapper +#include "../svml_s_hypotf8_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core.c new file mode 100644 index 0000000000..6cc497e73d --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized hypotf, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +#define SYMBOL_NAME _ZGVdN8vv_hypotf +#include "ifunc-mathvec-avx2.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVdN8vv_hypotf, __GI__ZGVdN8vv_hypotf, + __redirect__ZGVdN8vv_hypotf) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core_avx2.S new file mode 100644 index 0000000000..29ae4a81e5 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core_avx2.S @@ -0,0 +1,1943 @@ +/* Function hypotf vectorized with AVX2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version.
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * HIGH LEVEL OVERVIEW + * + * Calculate z = (x*x+y*y) + * Calculate reciprocal sqrt (z) + * Make two NR iterations + * + * ALGORITHM DETAILS + * + * Multiprecision branch for _HA_ only + * Remove sign from both arguments + * Find maximum (_x) and minimum (_y) (by abs value) between arguments + * Split _x into _a and _b for multiprecision + * If _x >> _y we will not split _y for multiprecision + * all _y will be put into lower part (_d) and higher part (_c = 0) + * Fix _hilo_mask for the case _x >> _y + * Split _y into _c and _d for multiprecision with fixed mask + * + * Compute Hi and Lo parts of _z = _x*_x + _y*_y + * + * _zHi = _a*_a + _c*_c + * _zLo = (_x + _a)*_b + _d*_y + _d*_c + * _z = _zHi + _zLo + * + * No multiprecision branch for _LA_ and _EP_ + * _z = _VARG1 * _VARG1 + _VARG2 * _VARG2 + * + * Check _z exponent to be within borders [1E3 ; 60A] else goto Callout + * + * Compute reciprocal sqrt s0 ~ 1.0/sqrt(_z), + * which, multiplied by _z, is the final result for the _EP_ version. + * + * First iteration (or zero iteration): + * s = z * s0 + * h = .5 * s0 + * d = s * h - .5 + * + * Second iteration: + * h = d * h + h + * s = s * d + s + * d = s * s - z (in multiprecision for _HA_) + * + * result = s - h * d + * + * The EP version of the function can be implemented as y[i]=sqrt(a[i]^2+b[i]^2) + * with all intermediate operations done in target precision for i=1,..,n. + * It can return result y[i]=0 in case a[i]^2 and b[i]^2 underflow in target + * precision (for some i). It can return result y[i]=NAN in case + * a[i]^2+b[i]^2 overflow in target precision, for some i. It can return + * result y[i]=NAN in case a[i] or b[i] is infinite, for some i.
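+ * + * Illustrative scalar sketch of the non-multiprecision (_LA_) path above, + * written in C for clarity. It is not part of the implementation: the name + * hypotf_sketch is hypothetical, sqrtf (from <math.h>) stands in for the + * vrsqrtps estimate, and the exponent-range check / callout handling is + * omitted. + * + *   float hypotf_sketch (float x, float y) + *   { + *     float z = x * x + y * y;        // _z + *     float s0 = 1.0f / sqrtf (z);    // reciprocal sqrt estimate + *     float s = z * s0;               // first iteration + *     float h = 0.5f * s0; + *     float d = s * h - 0.5f; + *     h = d * h + h;                  // second iteration + *     s = s * d + s; + *     d = s * s - z; + *     return s - h * d;               // refined sqrt (x*x + y*y) + *   }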
+ * + * + */ + +#include + + .text +ENTRY(_ZGVdN8vv_hypotf_avx2) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $384, %rsp + vmovups %ymm8, 32(%rsp) + +/* Check _z exponent to be withing borders [1E3 ; 60A] else goto Callout */ + vmovups 384+__svml_shypot_data_internal(%rip), %ymm2 + +/* + * Variables + * Defines + * Constants loading + */ + vmovups 192+__svml_shypot_data_internal(%rip), %ymm7 + vmovups %ymm15, 352(%rsp) + vmovups %ymm14, 320(%rsp) + vmovups %ymm13, 288(%rsp) + vmovups %ymm12, 256(%rsp) + vmovups %ymm11, 224(%rsp) + vmovups %ymm10, 160(%rsp) + vmovups %ymm9, 96(%rsp) + .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x22 + vmovaps %ymm1, %ymm8 + +/* + * Implementation + * Multiprecision branch for _HA_ only + * No multiprecision branch for _LA_ + * _z = _VARG1 * _VARG1 + _VARG2 * _VARG2 + */ + vmulps %ymm0, %ymm0, %ymm1 + vfmadd231ps %ymm8, %ymm8, %ymm1 + +/* _s0 ~ 1.0/sqrt(_z) */ + vrsqrtps %ymm1, %ymm6 + vpcmpgtd %ymm1, %ymm2, %ymm3 + vpcmpgtd 448+__svml_shypot_data_internal(%rip), %ymm1, %ymm4 + vpor %ymm4, %ymm3, %ymm5 + +/* First iteration */ + vmulps %ymm1, %ymm6, %ymm2 + vmulps %ymm7, %ymm6, %ymm3 + vfnmadd231ps %ymm2, %ymm3, %ymm7 + vfmadd213ps %ymm2, %ymm7, %ymm2 + +/* Second iteration */ + vfmadd132ps %ymm7, %ymm3, %ymm3 + +/* Finish second iteration in native precision for _LA_ */ + vfmsub231ps %ymm2, %ymm2, %ymm1 + vmovmskps %ymm5, %edx + vfnmadd213ps %ymm2, %ymm1, %ymm3 + +/* The end of implementation */ + testl %edx, %edx + jne .LBL_1_3 + +.LBL_1_2: + vmovups 32(%rsp), %ymm8 + cfi_restore(91) + vmovups 96(%rsp), %ymm9 + cfi_restore(92) + vmovups 160(%rsp), %ymm10 + cfi_restore(93) + vmovups 224(%rsp), %ymm11 + cfi_restore(94) + vmovups 256(%rsp), %ymm12 + cfi_restore(95) + vmovups 288(%rsp), %ymm13 + cfi_restore(96) + vmovups 320(%rsp), %ymm14 + cfi_restore(97) + vmovups 352(%rsp), %ymm15 + cfi_restore(98) + vmovaps %ymm3, %ymm0 + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 
0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x22 + +.LBL_1_3: + vmovups %ymm0, 64(%rsp) + vmovups %ymm8, 128(%rsp) + vmovups %ymm3, 192(%rsp) + je .LBL_1_2 + xorl %eax, %eax + vzeroupper + movq %rsi, 8(%rsp) + movq %rdi, (%rsp) + movq %r12, 24(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 16(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + movl %edx, %r13d + +.LBL_1_7: + btl %r12d, %r13d + jc .LBL_1_10 + +.LBL_1_8: + incl %r12d + cmpl $8, %r12d + jl .LBL_1_7 + movq 8(%rsp), %rsi + cfi_restore(4) + movq (%rsp), %rdi + cfi_restore(5) + movq 24(%rsp), %r12 + cfi_restore(12) + movq 16(%rsp), %r13 + cfi_restore(13) + vmovups 192(%rsp), %ymm3 + jmp .LBL_1_2 + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + +.LBL_1_10: + lea 64(%rsp,%r12,4), %rdi + lea 128(%rsp,%r12,4), %rsi + lea 192(%rsp,%r12,4), %rdx + call __svml_shypot_cout_rare_internal + jmp .LBL_1_8 + +END(_ZGVdN8vv_hypotf_avx2) + + .align 16,0x90 + +__svml_shypot_cout_rare_internal: + + cfi_startproc + + movq %rdx, %r8 + movzwl 2(%rdi), %eax + andl $32640, %eax + cmpl $32640, %eax + je .LBL_2_14 + movzwl 2(%rsi), %eax + andl $32640, %eax + cmpl $32640, %eax + je .LBL_2_13 + pxor %xmm2, %xmm2 + pxor %xmm1, %xmm1 + cvtss2sd (%rdi), %xmm2 + cvtss2sd (%rsi), %xmm1 + movsd 4096+_vmlsHypotHATab(%rip), %xmm0 + movzwl 4102+_vmlsHypotHATab(%rip), %edi + ucomisd %xmm0, %xmm2 + jp .LBL_2_4 + je .LBL_2_11 + +.LBL_2_4: + movsd %xmm2, -16(%rsp) + movsd %xmm1, -48(%rsp) + andb $127, -9(%rsp) + andb $127, -41(%rsp) + movsd -16(%rsp), %xmm8 + movsd -48(%rsp), %xmm1 + comisd %xmm8, %xmm1 + jbe .LBL_2_6 + movaps %xmm8, %xmm2 + movaps %xmm1, %xmm8 + movsd %xmm1, -16(%rsp) + movaps %xmm2, %xmm1 + +.LBL_2_6: + movzwl -10(%rsp), %edx + andl $-32753, %edi + andl $32752, %edx + shrl $4, %edx + negl %edx + movsd %xmm0, -56(%rsp) + movsd 4128+_vmlsHypotHATab(%rip), %xmm3 + lea 1025(%rdx), %esi + negl %esi + addl $1000, %esi + shrl $31, %esi + imull $-23, %esi, %eax + lea 1025(%rax,%rdx), %esi + lea 1023(%rsi), %ecx + andl $2047, %ecx + shll $4, %ecx + orl %ecx, %edi + movw %di, -50(%rsp) + movsd -56(%rsp), %xmm2 + mulsd %xmm2, %xmm8 + mulsd %xmm2, %xmm1 + mulsd %xmm8, %xmm3 + movsd %xmm3, -72(%rsp) + movsd -72(%rsp), %xmm4 + movsd %xmm8, -16(%rsp) + subsd %xmm8, %xmm4 + movsd %xmm4, -64(%rsp) + movsd -72(%rsp), %xmm6 + movsd -64(%rsp), %xmm5 + movsd %xmm1, -48(%rsp) + subsd %xmm5, %xmm6 + movsd %xmm6, 
-72(%rsp) + movsd -72(%rsp), %xmm7 + movzwl -10(%rsp), %r9d + subsd %xmm7, %xmm8 + movzwl -42(%rsp), %edi + andl $32752, %r9d + andl $32752, %edi + shrl $4, %r9d + shrl $4, %edi + movsd %xmm8, -64(%rsp) + subl %edi, %r9d + movsd -72(%rsp), %xmm7 + movsd -64(%rsp), %xmm8 + cmpl $6, %r9d + jle .LBL_2_8 + movaps %xmm1, %xmm2 + jmp .LBL_2_9 + +.LBL_2_8: + movsd -48(%rsp), %xmm1 + movsd 4128+_vmlsHypotHATab(%rip), %xmm0 + movaps %xmm1, %xmm6 + mulsd %xmm1, %xmm0 + movsd %xmm0, -72(%rsp) + movsd -72(%rsp), %xmm2 + subsd -48(%rsp), %xmm2 + movsd %xmm2, -64(%rsp) + movsd -72(%rsp), %xmm4 + movsd -64(%rsp), %xmm3 + subsd %xmm3, %xmm4 + movsd %xmm4, -72(%rsp) + movsd -72(%rsp), %xmm5 + subsd %xmm5, %xmm6 + movsd %xmm6, -64(%rsp) + movsd -72(%rsp), %xmm0 + movsd -64(%rsp), %xmm2 + +.LBL_2_9: + movsd -16(%rsp), %xmm5 + movaps %xmm0, %xmm4 + mulsd %xmm0, %xmm4 + addsd %xmm1, %xmm0 + addsd %xmm7, %xmm5 + mulsd %xmm2, %xmm0 + mulsd %xmm5, %xmm8 + movaps %xmm7, %xmm3 + negl %esi + mulsd %xmm7, %xmm3 + addsd %xmm8, %xmm0 + movq 4112+_vmlsHypotHATab(%rip), %r11 + movq %r11, %r9 + lea _vmlsHypotHATab(%rip), %rdx + addsd %xmm4, %xmm3 + addl $1023, %esi + addsd %xmm0, %xmm3 + movsd %xmm3, -56(%rsp) + andl $2047, %esi + movzwl -50(%rsp), %ecx + andl $32752, %ecx + shrl $4, %ecx + addl $-1023, %ecx + movl %ecx, %eax + andl $1, %eax + subl %eax, %ecx + shrl $1, %ecx + movsd %xmm3, -48(%rsp) + movzwl -42(%rsp), %edi + andl $-32753, %edi + shrq $48, %r9 + lea 1023(%rcx), %r10d + addl %ecx, %ecx + addl $16368, %edi + negl %ecx + andl $2047, %r10d + addl $1023, %ecx + andl $2047, %ecx + andl $-32753, %r9d + movw %di, -42(%rsp) + shll $4, %r10d + shll $4, %ecx + orl %r9d, %r10d + shll $4, %esi + orl %r9d, %ecx + movsd -48(%rsp), %xmm2 + orl %esi, %r9d + movl -44(%rsp), %esi + mulsd 4112(%rdx,%rax,8), %xmm2 + andl $1048575, %esi + shrl $12, %esi + shll $8, %eax + addl %eax, %esi + movsd (%rdx,%rsi,8), %xmm6 + movsd 4104+_vmlsHypotHATab(%rip), %xmm1 + mulsd %xmm6, %xmm2 + mulsd %xmm6, %xmm1 + movaps %xmm2, %xmm7 + mulsd %xmm1, %xmm7 + movsd 4104+_vmlsHypotHATab(%rip), %xmm9 + movsd 4104+_vmlsHypotHATab(%rip), %xmm12 + subsd %xmm7, %xmm9 + movaps %xmm9, %xmm8 + mulsd %xmm2, %xmm9 + mulsd %xmm1, %xmm8 + addsd %xmm9, %xmm2 + addsd %xmm8, %xmm1 + movaps %xmm2, %xmm10 + movaps %xmm1, %xmm11 + mulsd %xmm1, %xmm10 + movsd 4104+_vmlsHypotHATab(%rip), %xmm0 + subsd %xmm10, %xmm12 + mulsd %xmm12, %xmm11 + mulsd %xmm2, %xmm12 + addsd %xmm11, %xmm1 + addsd %xmm12, %xmm2 + movaps %xmm2, %xmm13 + movaps %xmm2, %xmm14 + mulsd %xmm1, %xmm13 + movsd 4128+_vmlsHypotHATab(%rip), %xmm15 + subsd %xmm13, %xmm0 + mulsd %xmm0, %xmm14 + mulsd %xmm1, %xmm0 + addsd %xmm14, %xmm2 + addsd %xmm0, %xmm1 + mulsd %xmm2, %xmm15 + movsd %xmm15, -72(%rsp) + movaps %xmm2, %xmm8 + movsd -72(%rsp), %xmm4 + movsd 4120+_vmlsHypotHATab(%rip), %xmm10 + subsd %xmm2, %xmm4 + movsd %xmm4, -64(%rsp) + movsd -72(%rsp), %xmm6 + movsd -64(%rsp), %xmm5 + movq %r11, -32(%rsp) + subsd %xmm5, %xmm6 + movsd %xmm6, -72(%rsp) + movsd -72(%rsp), %xmm7 + movw %cx, -26(%rsp) + subsd %xmm7, %xmm8 + movsd %xmm8, -64(%rsp) + movsd -72(%rsp), %xmm11 + movsd -64(%rsp), %xmm12 + movaps %xmm11, %xmm13 + mulsd %xmm12, %xmm10 + mulsd %xmm12, %xmm12 + xorps .FLT_52(%rip), %xmm13 + xorps .FLT_52(%rip), %xmm12 + subsd %xmm10, %xmm13 + mulsd %xmm11, %xmm13 + movsd -32(%rsp), %xmm9 + addsd %xmm12, %xmm13 + mulsd %xmm9, %xmm3 + movq %r11, -40(%rsp) + addsd %xmm13, %xmm3 + mulsd %xmm1, %xmm3 + movw %r10w, -34(%rsp) + addsd %xmm3, %xmm2 + mulsd -40(%rsp), %xmm2 + movq %r11, -24(%rsp) + movw 
%r9w, -18(%rsp) + mulsd -24(%rsp), %xmm2 + cvtsd2ss %xmm2, %xmm2 + movss %xmm2, (%r8) + +.LBL_2_10: + xorl %eax, %eax + ret + +.LBL_2_11: + ucomisd %xmm0, %xmm1 + jne .LBL_2_4 + jp .LBL_2_4 + cvtsd2ss %xmm0, %xmm0 + movss %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_13: + movss (%rsi), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_14: + movzwl 2(%rsi), %eax + andl $32640, %eax + cmpl $32640, %eax + je .LBL_2_16 + +.LBL_2_15: + movss (%rdi), %xmm0 + mulss %xmm0, %xmm0 + movss %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_16: + movl (%rdi), %eax + testl $8388607, %eax + je .LBL_2_22 + testl $8388607, (%rsi) + je .LBL_2_19 + movss (%rdi), %xmm0 + mulss (%rsi), %xmm0 + movss %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_19: + testl $4194304, %eax + jne .LBL_2_13 + movsd 4112+_vmlsHypotHATab(%rip), %xmm0 + cvtsd2ss %xmm0, %xmm0 + mulss (%rdi), %xmm0 + movss %xmm0, (%r8) + jmp .LBL_2_10 + +.LBL_2_22: + movl (%rsi), %eax + testl $8388607, %eax + je .LBL_2_15 + testl $4194304, %eax + jne .LBL_2_15 + movsd 4112+_vmlsHypotHATab(%rip), %xmm0 + cvtsd2ss %xmm0, %xmm0 + mulss (%rsi), %xmm0 + movss %xmm0, (%r8) + jmp .LBL_2_10 + + cfi_endproc + + .type __svml_shypot_cout_rare_internal,@function + .size __svml_shypot_cout_rare_internal,.-__svml_shypot_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_shypot_data_internal: + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .long 4294443008 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .long 4294959104 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .long 1056964608 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .long 2139095040 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .long 33554432 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .long 506462208 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .long 1621098496 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .long 1115422720 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_shypot_data_internal,@object + .size __svml_shypot_data_internal,576 + .align 32 + +_vmlsHypotHATab: + .long 0 + .long 1072693248 + .long 0 + .long 1072689152 + .long 0 + .long 1072685056 + .long 0 + .long 1072680960 + .long 0 + .long 1072676864 + .long 0 + .long 1072672768 + .long 0 + .long 1072668672 + .long 0 + .long 1072665600 + .long 0 + .long 1072661504 + .long 0 + .long 1072657408 + .long 0 + .long 1072653312 + .long 0 + .long 1072649216 + .long 0 + .long 1072646144 + .long 0 + .long 1072642048 + .long 0 + .long 1072637952 + .long 0 + .long 1072634880 + .long 0 + .long 1072630784 + .long 0 + .long 1072626688 + .long 0 + .long 1072623616 + .long 0 + .long 1072619520 + .long 0 + .long 1072615424 + .long 0 + .long 1072612352 + .long 0 + .long 1072608256 + .long 0 + .long 1072605184 + .long 0 + .long 1072601088 + .long 0 + .long 1072598016 + .long 0 + .long 1072593920 + .long 0 + .long 1072590848 + .long 0 + .long 1072586752 + .long 0 + .long 1072583680 + .long 0 + .long 1072580608 + .long 0 + .long 1072576512 + .long 0 + .long 1072573440 + .long 0 + .long 1072570368 + .long 0 + .long 1072566272 + .long 0 + .long 1072563200 + .long 0 + .long 1072560128 + .long 0 + .long 1072556032 + .long 0 + .long 1072552960 + .long 0 + .long 1072549888 + .long 0 + .long 1072546816 + .long 0 + .long 1072542720 + .long 0 + .long 1072539648 + .long 0 + .long 1072536576 + .long 0 + .long 1072533504 + .long 0 + .long 1072530432 + .long 0 + .long 1072527360 + .long 0 + .long 1072523264 + .long 0 + .long 
1072520192 + .long 0 + .long 1072517120 + .long 0 + .long 1072514048 + .long 0 + .long 1072510976 + .long 0 + .long 1072507904 + .long 0 + .long 1072504832 + .long 0 + .long 1072501760 + .long 0 + .long 1072498688 + .long 0 + .long 1072495616 + .long 0 + .long 1072492544 + .long 0 + .long 1072489472 + .long 0 + .long 1072486400 + .long 0 + .long 1072483328 + .long 0 + .long 1072480256 + .long 0 + .long 1072478208 + .long 0 + .long 1072475136 + .long 0 + .long 1072472064 + .long 0 + .long 1072468992 + .long 0 + .long 1072465920 + .long 0 + .long 1072462848 + .long 0 + .long 1072459776 + .long 0 + .long 1072457728 + .long 0 + .long 1072454656 + .long 0 + .long 1072451584 + .long 0 + .long 1072448512 + .long 0 + .long 1072446464 + .long 0 + .long 1072443392 + .long 0 + .long 1072440320 + .long 0 + .long 1072437248 + .long 0 + .long 1072435200 + .long 0 + .long 1072432128 + .long 0 + .long 1072429056 + .long 0 + .long 1072427008 + .long 0 + .long 1072423936 + .long 0 + .long 1072420864 + .long 0 + .long 1072418816 + .long 0 + .long 1072415744 + .long 0 + .long 1072412672 + .long 0 + .long 1072410624 + .long 0 + .long 1072407552 + .long 0 + .long 1072405504 + .long 0 + .long 1072402432 + .long 0 + .long 1072400384 + .long 0 + .long 1072397312 + .long 0 + .long 1072395264 + .long 0 + .long 1072392192 + .long 0 + .long 1072390144 + .long 0 + .long 1072387072 + .long 0 + .long 1072385024 + .long 0 + .long 1072381952 + .long 0 + .long 1072379904 + .long 0 + .long 1072376832 + .long 0 + .long 1072374784 + .long 0 + .long 1072371712 + .long 0 + .long 1072369664 + .long 0 + .long 1072366592 + .long 0 + .long 1072364544 + .long 0 + .long 1072362496 + .long 0 + .long 1072359424 + .long 0 + .long 1072357376 + .long 0 + .long 1072355328 + .long 0 + .long 1072352256 + .long 0 + .long 1072350208 + .long 0 + .long 1072347136 + .long 0 + .long 1072345088 + .long 0 + .long 1072343040 + .long 0 + .long 1072340992 + .long 0 + .long 1072337920 + .long 0 + .long 1072335872 + .long 0 + .long 1072333824 + .long 0 + .long 1072330752 + .long 0 + .long 1072328704 + .long 0 + .long 1072326656 + .long 0 + .long 1072324608 + .long 0 + .long 1072321536 + .long 0 + .long 1072319488 + .long 0 + .long 1072317440 + .long 0 + .long 1072315392 + .long 0 + .long 1072313344 + .long 0 + .long 1072310272 + .long 0 + .long 1072308224 + .long 0 + .long 1072306176 + .long 0 + .long 1072304128 + .long 0 + .long 1072302080 + .long 0 + .long 1072300032 + .long 0 + .long 1072296960 + .long 0 + .long 1072294912 + .long 0 + .long 1072292864 + .long 0 + .long 1072290816 + .long 0 + .long 1072288768 + .long 0 + .long 1072286720 + .long 0 + .long 1072284672 + .long 0 + .long 1072282624 + .long 0 + .long 1072280576 + .long 0 + .long 1072278528 + .long 0 + .long 1072275456 + .long 0 + .long 1072273408 + .long 0 + .long 1072271360 + .long 0 + .long 1072269312 + .long 0 + .long 1072267264 + .long 0 + .long 1072265216 + .long 0 + .long 1072263168 + .long 0 + .long 1072261120 + .long 0 + .long 1072259072 + .long 0 + .long 1072257024 + .long 0 + .long 1072254976 + .long 0 + .long 1072252928 + .long 0 + .long 1072250880 + .long 0 + .long 1072248832 + .long 0 + .long 1072246784 + .long 0 + .long 1072244736 + .long 0 + .long 1072243712 + .long 0 + .long 1072241664 + .long 0 + .long 1072239616 + .long 0 + .long 1072237568 + .long 0 + .long 1072235520 + .long 0 + .long 1072233472 + .long 0 + .long 1072231424 + .long 0 + .long 1072229376 + .long 0 + .long 1072227328 + .long 0 + .long 1072225280 + .long 0 + .long 1072223232 + .long 0 + .long 1072222208 + 
.long 0 + .long 1072220160 + .long 0 + .long 1072218112 + .long 0 + .long 1072216064 + .long 0 + .long 1072214016 + .long 0 + .long 1072211968 + .long 0 + .long 1072210944 + .long 0 + .long 1072208896 + .long 0 + .long 1072206848 + .long 0 + .long 1072204800 + .long 0 + .long 1072202752 + .long 0 + .long 1072201728 + .long 0 + .long 1072199680 + .long 0 + .long 1072197632 + .long 0 + .long 1072195584 + .long 0 + .long 1072193536 + .long 0 + .long 1072192512 + .long 0 + .long 1072190464 + .long 0 + .long 1072188416 + .long 0 + .long 1072186368 + .long 0 + .long 1072185344 + .long 0 + .long 1072183296 + .long 0 + .long 1072181248 + .long 0 + .long 1072179200 + .long 0 + .long 1072178176 + .long 0 + .long 1072176128 + .long 0 + .long 1072174080 + .long 0 + .long 1072173056 + .long 0 + .long 1072171008 + .long 0 + .long 1072168960 + .long 0 + .long 1072167936 + .long 0 + .long 1072165888 + .long 0 + .long 1072163840 + .long 0 + .long 1072161792 + .long 0 + .long 1072160768 + .long 0 + .long 1072158720 + .long 0 + .long 1072157696 + .long 0 + .long 1072155648 + .long 0 + .long 1072153600 + .long 0 + .long 1072152576 + .long 0 + .long 1072150528 + .long 0 + .long 1072148480 + .long 0 + .long 1072147456 + .long 0 + .long 1072145408 + .long 0 + .long 1072143360 + .long 0 + .long 1072142336 + .long 0 + .long 1072140288 + .long 0 + .long 1072139264 + .long 0 + .long 1072137216 + .long 0 + .long 1072135168 + .long 0 + .long 1072134144 + .long 0 + .long 1072132096 + .long 0 + .long 1072131072 + .long 0 + .long 1072129024 + .long 0 + .long 1072128000 + .long 0 + .long 1072125952 + .long 0 + .long 1072124928 + .long 0 + .long 1072122880 + .long 0 + .long 1072120832 + .long 0 + .long 1072119808 + .long 0 + .long 1072117760 + .long 0 + .long 1072116736 + .long 0 + .long 1072114688 + .long 0 + .long 1072113664 + .long 0 + .long 1072111616 + .long 0 + .long 1072110592 + .long 0 + .long 1072108544 + .long 0 + .long 1072107520 + .long 0 + .long 1072105472 + .long 0 + .long 1072104448 + .long 0 + .long 1072102400 + .long 0 + .long 1072101376 + .long 0 + .long 1072099328 + .long 0 + .long 1072098304 + .long 0 + .long 1072096256 + .long 0 + .long 1072095232 + .long 0 + .long 1072094208 + .long 0 + .long 1072092160 + .long 0 + .long 1072091136 + .long 0 + .long 1072089088 + .long 0 + .long 1072088064 + .long 0 + .long 1072086016 + .long 0 + .long 1072084992 + .long 0 + .long 1072082944 + .long 0 + .long 1072081920 + .long 0 + .long 1072080896 + .long 0 + .long 1072078848 + .long 0 + .long 1072075776 + .long 0 + .long 1072073728 + .long 0 + .long 1072070656 + .long 0 + .long 1072067584 + .long 0 + .long 1072064512 + .long 0 + .long 1072061440 + .long 0 + .long 1072059392 + .long 0 + .long 1072056320 + .long 0 + .long 1072053248 + .long 0 + .long 1072051200 + .long 0 + .long 1072048128 + .long 0 + .long 1072045056 + .long 0 + .long 1072043008 + .long 0 + .long 1072039936 + .long 0 + .long 1072037888 + .long 0 + .long 1072034816 + .long 0 + .long 1072031744 + .long 0 + .long 1072029696 + .long 0 + .long 1072026624 + .long 0 + .long 1072024576 + .long 0 + .long 1072021504 + .long 0 + .long 1072019456 + .long 0 + .long 1072016384 + .long 0 + .long 1072014336 + .long 0 + .long 1072011264 + .long 0 + .long 1072009216 + .long 0 + .long 1072006144 + .long 0 + .long 1072004096 + .long 0 + .long 1072002048 + .long 0 + .long 1071998976 + .long 0 + .long 1071996928 + .long 0 + .long 1071993856 + .long 0 + .long 1071991808 + .long 0 + .long 1071989760 + .long 0 + .long 1071986688 + .long 0 + .long 1071984640 + .long 0 + .long 
1071982592 + .long 0 + .long 1071979520 + .long 0 + .long 1071977472 + .long 0 + .long 1071975424 + .long 0 + .long 1071972352 + .long 0 + .long 1071970304 + .long 0 + .long 1071968256 + .long 0 + .long 1071966208 + .long 0 + .long 1071964160 + .long 0 + .long 1071961088 + .long 0 + .long 1071959040 + .long 0 + .long 1071956992 + .long 0 + .long 1071954944 + .long 0 + .long 1071952896 + .long 0 + .long 1071949824 + .long 0 + .long 1071947776 + .long 0 + .long 1071945728 + .long 0 + .long 1071943680 + .long 0 + .long 1071941632 + .long 0 + .long 1071939584 + .long 0 + .long 1071937536 + .long 0 + .long 1071935488 + .long 0 + .long 1071933440 + .long 0 + .long 1071930368 + .long 0 + .long 1071928320 + .long 0 + .long 1071926272 + .long 0 + .long 1071924224 + .long 0 + .long 1071922176 + .long 0 + .long 1071920128 + .long 0 + .long 1071918080 + .long 0 + .long 1071916032 + .long 0 + .long 1071913984 + .long 0 + .long 1071911936 + .long 0 + .long 1071909888 + .long 0 + .long 1071907840 + .long 0 + .long 1071905792 + .long 0 + .long 1071903744 + .long 0 + .long 1071901696 + .long 0 + .long 1071900672 + .long 0 + .long 1071898624 + .long 0 + .long 1071896576 + .long 0 + .long 1071894528 + .long 0 + .long 1071892480 + .long 0 + .long 1071890432 + .long 0 + .long 1071888384 + .long 0 + .long 1071886336 + .long 0 + .long 1071884288 + .long 0 + .long 1071883264 + .long 0 + .long 1071881216 + .long 0 + .long 1071879168 + .long 0 + .long 1071877120 + .long 0 + .long 1071875072 + .long 0 + .long 1071873024 + .long 0 + .long 1071872000 + .long 0 + .long 1071869952 + .long 0 + .long 1071867904 + .long 0 + .long 1071865856 + .long 0 + .long 1071864832 + .long 0 + .long 1071862784 + .long 0 + .long 1071860736 + .long 0 + .long 1071858688 + .long 0 + .long 1071856640 + .long 0 + .long 1071855616 + .long 0 + .long 1071853568 + .long 0 + .long 1071851520 + .long 0 + .long 1071850496 + .long 0 + .long 1071848448 + .long 0 + .long 1071846400 + .long 0 + .long 1071844352 + .long 0 + .long 1071843328 + .long 0 + .long 1071841280 + .long 0 + .long 1071839232 + .long 0 + .long 1071838208 + .long 0 + .long 1071836160 + .long 0 + .long 1071834112 + .long 0 + .long 1071833088 + .long 0 + .long 1071831040 + .long 0 + .long 1071830016 + .long 0 + .long 1071827968 + .long 0 + .long 1071825920 + .long 0 + .long 1071824896 + .long 0 + .long 1071822848 + .long 0 + .long 1071821824 + .long 0 + .long 1071819776 + .long 0 + .long 1071817728 + .long 0 + .long 1071816704 + .long 0 + .long 1071814656 + .long 0 + .long 1071813632 + .long 0 + .long 1071811584 + .long 0 + .long 1071810560 + .long 0 + .long 1071808512 + .long 0 + .long 1071806464 + .long 0 + .long 1071805440 + .long 0 + .long 1071803392 + .long 0 + .long 1071802368 + .long 0 + .long 1071800320 + .long 0 + .long 1071799296 + .long 0 + .long 1071797248 + .long 0 + .long 1071796224 + .long 0 + .long 1071794176 + .long 0 + .long 1071793152 + .long 0 + .long 1071791104 + .long 0 + .long 1071790080 + .long 0 + .long 1071788032 + .long 0 + .long 1071787008 + .long 0 + .long 1071784960 + .long 0 + .long 1071783936 + .long 0 + .long 1071782912 + .long 0 + .long 1071780864 + .long 0 + .long 1071779840 + .long 0 + .long 1071777792 + .long 0 + .long 1071776768 + .long 0 + .long 1071774720 + .long 0 + .long 1071773696 + .long 0 + .long 1071772672 + .long 0 + .long 1071770624 + .long 0 + .long 1071769600 + .long 0 + .long 1071767552 + .long 0 + .long 1071766528 + .long 0 + .long 1071765504 + .long 0 + .long 1071763456 + .long 0 + .long 1071762432 + .long 0 + .long 1071760384 + 
.long 0 + .long 1071759360 + .long 0 + .long 1071758336 + .long 0 + .long 1071756288 + .long 0 + .long 1071755264 + .long 0 + .long 1071754240 + .long 0 + .long 1071752192 + .long 0 + .long 1071751168 + .long 0 + .long 1071750144 + .long 0 + .long 1071748096 + .long 0 + .long 1071747072 + .long 0 + .long 1071746048 + .long 0 + .long 1071744000 + .long 0 + .long 1071742976 + .long 0 + .long 1071741952 + .long 0 + .long 1071739904 + .long 0 + .long 1071738880 + .long 0 + .long 1071737856 + .long 0 + .long 1071736832 + .long 0 + .long 1071734784 + .long 0 + .long 1071733760 + .long 0 + .long 1071732736 + .long 0 + .long 1071730688 + .long 0 + .long 1071729664 + .long 0 + .long 1071728640 + .long 0 + .long 1071727616 + .long 0 + .long 1071725568 + .long 0 + .long 1071724544 + .long 0 + .long 1071723520 + .long 0 + .long 1071722496 + .long 0 + .long 1071720448 + .long 0 + .long 1071719424 + .long 0 + .long 1071718400 + .long 0 + .long 1071717376 + .long 0 + .long 1071715328 + .long 0 + .long 1071714304 + .long 0 + .long 1071713280 + .long 0 + .long 1071712256 + .long 0 + .long 1071711232 + .long 0 + .long 1071709184 + .long 0 + .long 1071708160 + .long 0 + .long 1071707136 + .long 0 + .long 1071706112 + .long 0 + .long 1071705088 + .long 0 + .long 1071704064 + .long 0 + .long 1071702016 + .long 0 + .long 1071700992 + .long 0 + .long 1071699968 + .long 0 + .long 1071698944 + .long 0 + .long 1071697920 + .long 0 + .long 1071696896 + .long 0 + .long 1071694848 + .long 0 + .long 1071693824 + .long 0 + .long 1071692800 + .long 0 + .long 1071691776 + .long 0 + .long 1071690752 + .long 0 + .long 1071689728 + .long 0 + .long 1071688704 + .long 0 + .long 1071686656 + .long 0 + .long 1071685632 + .long 0 + .long 1071684608 + .long 0 + .long 1071683584 + .long 0 + .long 1071682560 + .long 0 + .long 1071681536 + .long 0 + .long 1071680512 + .long 0 + .long 1071679488 + .long 0 + .long 1071677440 + .long 0 + .long 1071676416 + .long 0 + .long 1071675392 + .long 0 + .long 1071674368 + .long 0 + .long 1071673344 + .long 0 + .long 1071672320 + .long 0 + .long 1071671296 + .long 0 + .long 1071670272 + .long 0 + .long 1071669248 + .long 0 + .long 1071668224 + .long 0 + .long 1071667200 + .long 0 + .long 1071666176 + .long 0 + .long 1071665152 + .long 0 + .long 1071663104 + .long 0 + .long 1071662080 + .long 0 + .long 1071661056 + .long 0 + .long 1071660032 + .long 0 + .long 1071659008 + .long 0 + .long 1071657984 + .long 0 + .long 1071656960 + .long 0 + .long 1071655936 + .long 0 + .long 1071654912 + .long 0 + .long 1071653888 + .long 0 + .long 1071652864 + .long 0 + .long 1071651840 + .long 0 + .long 1071650816 + .long 0 + .long 1071649792 + .long 0 + .long 1071648768 + .long 0 + .long 1071647744 + .long 0 + .long 1071646720 + .long 0 + .long 1071645696 + .long 0 + .long 0 + .long 0 + .long 1071644672 + .long 0 + .long 1072693248 + .long 0 + .long 1073741824 + .long 33554432 + .long 1101004800 + .type _vmlsHypotHATab,@object + .size _vmlsHypotHATab,4136 + .space 472, 0x00 + .align 16 + +.FLT_52: + .long 0x00000000,0x80000000,0x00000000,0x00000000 + .type .FLT_52,@object + .size .FLT_52,16 diff --git a/sysdeps/x86_64/fpu/svml_d_hypot2_core.S b/sysdeps/x86_64/fpu/svml_d_hypot2_core.S new file mode 100644 index 0000000000..ea98f36324 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_hypot2_core.S @@ -0,0 +1,29 @@ +/* Function hypot vectorized with SSE2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2vv_hypot)
+WRAPPER_IMPL_SSE2_ff hypot
+END (_ZGVbN2vv_hypot)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2vv_hypot)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_hypot4_core.S b/sysdeps/x86_64/fpu/svml_d_hypot4_core.S
new file mode 100644
index 0000000000..cedbbff2b6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_hypot4_core.S
@@ -0,0 +1,29 @@
+/* Function hypot vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4vv_hypot)
+WRAPPER_IMPL_AVX_ff _ZGVbN2vv_hypot
+END (_ZGVdN4vv_hypot)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4vv_hypot)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_hypot4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_hypot4_core_avx.S
new file mode 100644
index 0000000000..e0fef5203d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_hypot4_core_avx.S
@@ -0,0 +1,25 @@
+/* Function hypot vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4vv_hypot)
+WRAPPER_IMPL_AVX_ff _ZGVbN2vv_hypot
+END (_ZGVcN4vv_hypot)
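(Not part of the patch.)  The wrapper files above expose the scalar fallback under the x86_64 vector ABI names _ZGVbN2vv_hypot, _ZGVdN4vv_hypot and _ZGVcN4vv_hypot; the tuned SSE4/AVX2/AVX-512 kernels earlier in the series are selected at run time by the corresponding files under sysdeps/x86_64/fpu/multiarch.  A minimal caller-side sketch, assuming GCC, <immintrin.h> and linking with -lmvec -lm; the tolerance and output are illustrative only:

  #include <math.h>
  #include <stdio.h>
  #include <immintrin.h>

  /* One of the vector-ABI entry points added by this patch.  The wider
     variants (_ZGVdN4vv_hypot, _ZGVeN8vv_hypot) follow the same pattern.  */
  __m128d _ZGVbN2vv_hypot (__m128d x, __m128d y);

  int
  main (void)
  {
    __m128d x = _mm_set_pd (3.0, 5.0);   /* lanes {5.0, 3.0} */
    __m128d y = _mm_set_pd (4.0, 12.0);  /* lanes {12.0, 4.0} */
    double r[2];
    _mm_storeu_pd (r, _ZGVbN2vv_hypot (x, y));
    printf ("%g %g\n", r[0], r[1]);      /* expect ~13 and ~5 */
    return !(fabs (r[0] - 13.0) < 1e-12 && fabs (r[1] - 5.0) < 1e-12);
  }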
diff --git a/sysdeps/x86_64/fpu/svml_d_hypot8_core.S b/sysdeps/x86_64/fpu/svml_d_hypot8_core.S
new file mode 100644
index 0000000000..7588e4407b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_hypot8_core.S
@@ -0,0 +1,25 @@
+/* Function hypot vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8vv_hypot)
+WRAPPER_IMPL_AVX512_ff _ZGVdN4vv_hypot
+END (_ZGVeN8vv_hypot)
diff --git a/sysdeps/x86_64/fpu/svml_s_hypotf16_core.S b/sysdeps/x86_64/fpu/svml_s_hypotf16_core.S
new file mode 100644
index 0000000000..06d421a926
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_hypotf16_core.S
@@ -0,0 +1,25 @@
+/* Function hypotf vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16vv_hypotf)
+WRAPPER_IMPL_AVX512_ff _ZGVdN8vv_hypotf
+END (_ZGVeN16vv_hypotf)
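The two AVX-512 wrappers above contain no new math code: WRAPPER_IMPL_AVX512_ff evaluates the named AVX2 kernel (_ZGVdN4vv_hypot / _ZGVdN8vv_hypotf) on each 256-bit half of the 512-bit inputs and reassembles the result.  Roughly, in C with AVX-512F intrinsics (an illustrative sketch only; the real implementation is the assembly macro in svml_d_wrapper_impl.h):

  #include <immintrin.h>

  __m256d _ZGVdN4vv_hypot (__m256d x, __m256d y);  /* AVX2 kernel from this patch */

  /* Approximately what _ZGVeN8vv_hypot does through WRAPPER_IMPL_AVX512_ff.  */
  __m512d
  zmm_hypot_sketch (__m512d x, __m512d y)
  {
    __m256d lo = _ZGVdN4vv_hypot (_mm512_castpd512_pd256 (x),
                                  _mm512_castpd512_pd256 (y));
    __m256d hi = _ZGVdN4vv_hypot (_mm512_extractf64x4_pd (x, 1),
                                  _mm512_extractf64x4_pd (y, 1));
    return _mm512_insertf64x4 (_mm512_castpd256_pd512 (lo), hi, 1);
  }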
diff --git a/sysdeps/x86_64/fpu/svml_s_hypotf4_core.S b/sysdeps/x86_64/fpu/svml_s_hypotf4_core.S
new file mode 100644
index 0000000000..7e8553cae4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_hypotf4_core.S
@@ -0,0 +1,29 @@
+/* Function hypotf vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4vv_hypotf)
+WRAPPER_IMPL_SSE2_ff hypotf
+END (_ZGVbN4vv_hypotf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4vv_hypotf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_hypotf8_core.S b/sysdeps/x86_64/fpu/svml_s_hypotf8_core.S
new file mode 100644
index 0000000000..a9bf27370b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_hypotf8_core.S
@@ -0,0 +1,29 @@
+/* Function hypotf vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8vv_hypotf)
+WRAPPER_IMPL_AVX_ff _ZGVbN4vv_hypotf
+END (_ZGVdN8vv_hypotf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8vv_hypotf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_hypotf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_hypotf8_core_avx.S
new file mode 100644
index 0000000000..8b8008a7e9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_hypotf8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function hypotf vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY(_ZGVcN8vv_hypotf)
+WRAPPER_IMPL_AVX_ff _ZGVbN4vv_hypotf
+END(_ZGVcN8vv_hypotf)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx.c
new file mode 100644
index 0000000000..c6a26a63e4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-hypot.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx2.c
new file mode 100644
index 0000000000..c6a26a63e4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-hypot.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx512f.c
new file mode 100644
index 0000000000..c6a26a63e4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-hypot.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-hypot.c b/sysdeps/x86_64/fpu/test-double-libmvec-hypot.c
new file mode 100644
index 0000000000..c0f600a443
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-hypot.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC hypot
+#include "test-vector-abi-arg2.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index 366d05c08a..8e1aeb6cff 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -41,6 +41,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (erfc), _ZGVbN2v_erfc)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVbN2v_exp10)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVbN2v_exp2)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVbN2v_expm1)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVbN2vv_hypot)
 
 #define VEC_INT_TYPE __m128i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index 044de05d87..7f144711bf 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (erfc), _ZGVdN4v_erfc)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVdN4v_exp10)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVdN4v_exp2)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVdN4v_expm1)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVdN4vv_hypot)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index f54d3a6874..48824d699a 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -41,6 +41,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (erfc), _ZGVcN4v_erfc)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVcN4v_exp10)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVcN4v_exp2)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVcN4v_expm1)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVcN4vv_hypot)
 
 #define VEC_INT_TYPE __m128i
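The VECTOR_WRAPPER_ff lines above register the two-argument vector entry points with the generic libm accuracy tests.  Conceptually, each generated wrapper broadcasts the scalar test inputs across the vector lanes, calls the vector function, and hands one lane back to the scalar test driver.  A simplified stand-in is sketched below; it is not the actual macro expansion from the test-*-vlen*.h headers, which additionally verifies that all lanes agree:

  #include <immintrin.h>

  __m128d _ZGVbN2vv_hypot (__m128d x, __m128d y);

  /* Simplified model of the wrapper generated for the 2-lane double case.  */
  static double
  hypot_vlen2 (double x, double y)
  {
    double lanes[2];
    _mm_storeu_pd (lanes, _ZGVbN2vv_hypot (_mm_set1_pd (x), _mm_set1_pd (y)));
    return lanes[0];
  }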
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index e277410a34..eda821a402 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -41,6 +41,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (erfc), _ZGVeN8v_erfc)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVeN8v_exp10)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVeN8v_exp2)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVeN8v_expm1)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVeN8vv_hypot)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx.c
new file mode 100644
index 0000000000..97d11ad1d3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-hypotf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx2.c
new file mode 100644
index 0000000000..97d11ad1d3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-hypotf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx512f.c
new file mode 100644
index 0000000000..97d11ad1d3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-hypotf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-hypotf.c b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf.c
new file mode 100644
index 0000000000..38776fa724
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC hypotf
+#include "test-vector-abi-arg2.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index b1313fca6b..89132d61e9 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -41,6 +41,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (erfcf), _ZGVeN16v_erfcf)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVeN16v_exp10f)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVeN16v_exp2f)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVeN16v_expm1f)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVeN16vv_hypotf)
 
 #define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index 7120096ee2..5100f35035 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -41,6 +41,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (erfcf), _ZGVbN4v_erfcf)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVbN4v_exp10f)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVbN4v_exp2f)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVbN4v_expm1f)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVbN4vv_hypotf)
 
 #define VEC_INT_TYPE __m128i
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index d910aff10a..cd9be5eed4 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (erfcf), _ZGVdN8v_erfcf)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVdN8v_exp10f)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVdN8v_exp2f)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVdN8v_expm1f)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVdN8vv_hypotf)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index daaae1da3e..44e4fd773c 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -41,6 +41,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (erfcf), _ZGVcN8v_erfcf)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVcN8v_exp10f)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVcN8v_exp2f)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVcN8v_expm1f)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVcN8vv_hypotf)
 
 #define VEC_INT_TYPE __m128i
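Beyond the ABI and accuracy tests, the usual way these kernels get exercised is through compiler auto-vectorization: the bits/math-vector.h declarations added earlier in this patch advertise the _ZGV* hypot variants to the compiler.  A possible smoke test, not part of the patch (the flags are only a suggestion; with -fopenmp-simd or -fopenmp and an AVX2-capable -march, GCC can emit calls to _ZGVdN4vv_hypot for this loop):

  #include <math.h>

  /* e.g.  gcc -O2 -march=x86-64-v3 -ffast-math -fopenmp-simd -c dist.c  */
  void
  vector_dist (const double *x, const double *y, double *d, int n)
  {
  #pragma omp simd
    for (int i = 0; i < n; i++)
      d[i] = hypot (x[i], y[i]);
  }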