From patchwork Thu Dec 9 05:04:41 2021
X-Patchwork-Submitter: Sunil Pandey
X-Patchwork-Id: 1565629
From: Sunil Pandey
Reply-To: Sunil K Pandey
To: libc-alpha@sourceware.org
Cc: andrey.kolesov@intel.com
Subject: [PATCH v2 15/42] x86-64: Add vector cbrt/cbrtf implementation to libmvec
Date: Wed, 8 Dec 2021 21:04:41 -0800
Message-Id: <20211209050508.2614536-16-skpgkp2@gmail.com>
In-Reply-To: <20211209050508.2614536-1-skpgkp2@gmail.com>
References: <20211209050508.2614536-1-skpgkp2@gmail.com>

Implement vectorized cbrt/cbrtf for libmvec, providing SSE, AVX, AVX2
and AVX512 versions as per the vector ABI.  The patch also adds
accuracy and ABI tests for vector cbrt/cbrtf, with regenerated ulps.
---
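Note for reviewers (not part of the commit message): the new entry points
follow the x86_64 vector function ABI naming, so in _ZGVbN2v_cbrt,
_ZGVcN4v_cbrt, _ZGVdN4v_cbrt and _ZGVeN8v_cbrt the b/c/d/e letters select
the SSE, AVX, AVX2 and AVX512 variants, N marks an unmasked variant, the
digit is the vector length, and v a vector argument.  Once math.h carries
the __DECL_SIMD_cbrt declaration added here, a compiler that honors the
resulting "omp declare simd"/__simd__ attribute can vectorize plain cbrt
calls into these symbols.  A minimal, hypothetical caller is sketched
below; the exact command line and -march selection are illustrative and
depend on the toolchain, not something this patch mandates.

#include <math.h>

/* Hypothetical demo: built against glibc headers from this series with
   something like
       gcc -O2 -ffast-math -march=haswell demo.c -lmvec -lm
   the second loop should vectorize into calls to the AVX2 variant
   _ZGVdN4v_cbrt (other flags select other widths).  */

double in[1024], out[1024];

int
main (void)
{
  for (int i = 0; i < 1024; i++)
    in[i] = 1.0 + i * 0.25;

  for (int i = 0; i < 1024; i++)
    out[i] = cbrt (in[i]);

  return out[3] > 0.0 ? 0 : 1;
}

Inspecting the generated code for _ZGV*_cbrt calls is a quick smoke test
on top of the ABI and accuracy tests added by the patch itself.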
 bits/libm-simd-decl-stubs.h | 11 +
 math/bits/mathcalls.h | 2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist | 8 +
 sysdeps/x86/fpu/bits/math-vector.h | 4 +
 .../x86/fpu/finclude/math-vector-fortran.h | 4 +
 sysdeps/x86_64/fpu/Makeconfig | 1 +
 sysdeps/x86_64/fpu/Versions | 2 +
 sysdeps/x86_64/fpu/libm-test-ulps | 20 +
 .../fpu/multiarch/svml_d_cbrt2_core-sse2.S | 20 +
 .../x86_64/fpu/multiarch/svml_d_cbrt2_core.c | 27 +
 .../fpu/multiarch/svml_d_cbrt2_core_sse4.S | 2025 +++++++++++++++++
 .../fpu/multiarch/svml_d_cbrt4_core-sse.S | 20 +
 .../x86_64/fpu/multiarch/svml_d_cbrt4_core.c | 27 +
 .../fpu/multiarch/svml_d_cbrt4_core_avx2.S | 1799 +++++++++++++++
 .../fpu/multiarch/svml_d_cbrt8_core-avx2.S | 20 +
 .../x86_64/fpu/multiarch/svml_d_cbrt8_core.c | 27 +
 .../fpu/multiarch/svml_d_cbrt8_core_avx512.S | 895 ++++++++
 .../fpu/multiarch/svml_s_cbrtf16_core-avx2.S | 20 +
 .../fpu/multiarch/svml_s_cbrtf16_core.c | 28 +
 .../multiarch/svml_s_cbrtf16_core_avx512.S | 1003 ++++++++
 .../fpu/multiarch/svml_s_cbrtf4_core-sse2.S | 20 +
 .../x86_64/fpu/multiarch/svml_s_cbrtf4_core.c | 28 +
 .../fpu/multiarch/svml_s_cbrtf4_core_sse4.S | 1863 +++++++++++++++
 .../fpu/multiarch/svml_s_cbrtf8_core-sse.S | 20 +
 .../x86_64/fpu/multiarch/svml_s_cbrtf8_core.c | 28 +
 .../fpu/multiarch/svml_s_cbrtf8_core_avx2.S | 1686 ++++++++++++++
 sysdeps/x86_64/fpu/svml_d_cbrt2_core.S | 29 +
 sysdeps/x86_64/fpu/svml_d_cbrt4_core.S | 29 +
 sysdeps/x86_64/fpu/svml_d_cbrt4_core_avx.S | 25 +
 sysdeps/x86_64/fpu/svml_d_cbrt8_core.S | 25 +
 sysdeps/x86_64/fpu/svml_s_cbrtf16_core.S | 25 +
 sysdeps/x86_64/fpu/svml_s_cbrtf4_core.S | 29 +
 sysdeps/x86_64/fpu/svml_s_cbrtf8_core.S | 29 +
 sysdeps/x86_64/fpu/svml_s_cbrtf8_core_avx.S | 25 +
 .../x86_64/fpu/test-double-libmvec-cbrt-avx.c | 1 +
 .../fpu/test-double-libmvec-cbrt-avx2.c | 1 +
 .../fpu/test-double-libmvec-cbrt-avx512f.c | 1 +
 sysdeps/x86_64/fpu/test-double-libmvec-cbrt.c | 3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c | 1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c | 1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c | 1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c | 1 +
 .../x86_64/fpu/test-float-libmvec-cbrtf-avx.c | 1 +
 .../fpu/test-float-libmvec-cbrtf-avx2.c | 1 +
 .../fpu/test-float-libmvec-cbrtf-avx512f.c | 1 +
 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf.c | 3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c | 1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c | 1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c | 1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c | 1 +
 50 files changed, 9843 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core-sse2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core_sse4.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core-sse.S create mode 100644
sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core_avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core-avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core_avx512.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core-avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core_avx512.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core-sse2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core_sse4.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core-sse.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core_avx2.S create mode 100644 sysdeps/x86_64/fpu/svml_d_cbrt2_core.S create mode 100644 sysdeps/x86_64/fpu/svml_d_cbrt4_core.S create mode 100644 sysdeps/x86_64/fpu/svml_d_cbrt4_core_avx.S create mode 100644 sysdeps/x86_64/fpu/svml_d_cbrt8_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_cbrtf16_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_cbrtf4_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_cbrtf8_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_cbrtf8_core_avx.S create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx2.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx512f.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cbrt.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx2.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx512f.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf.c diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h index 591f0850ca..b282298fe3 100644 --- a/bits/libm-simd-decl-stubs.h +++ b/bits/libm-simd-decl-stubs.h @@ -175,4 +175,15 @@ #define __DECL_SIMD_atanhf32x #define __DECL_SIMD_atanhf64x #define __DECL_SIMD_atanhf128x + +#define __DECL_SIMD_cbrt +#define __DECL_SIMD_cbrtf +#define __DECL_SIMD_cbrtl +#define __DECL_SIMD_cbrtf16 +#define __DECL_SIMD_cbrtf32 +#define __DECL_SIMD_cbrtf64 +#define __DECL_SIMD_cbrtf128 +#define __DECL_SIMD_cbrtf32x +#define __DECL_SIMD_cbrtf64x +#define __DECL_SIMD_cbrtf128x #endif diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h index be18431fd4..180e81e678 100644 --- a/math/bits/mathcalls.h +++ b/math/bits/mathcalls.h @@ -149,7 +149,7 @@ __MATHCALL (hypot,, (_Mdouble_ __x, _Mdouble_ __y)); #if defined __USE_XOPEN_EXTENDED || defined __USE_ISOC99 /* Return the cube root of X. 
*/ -__MATHCALL (cbrt,, (_Mdouble_ __x)); +__MATHCALL_VEC (cbrt,, (_Mdouble_ __x)); #endif diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist index 54489301ac..1cf8e91ffb 100644 --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist @@ -52,6 +52,7 @@ GLIBC_2.35 _ZGVbN2v_asin F GLIBC_2.35 _ZGVbN2v_asinh F GLIBC_2.35 _ZGVbN2v_atan F GLIBC_2.35 _ZGVbN2v_atanh F +GLIBC_2.35 _ZGVbN2v_cbrt F GLIBC_2.35 _ZGVbN2vv_atan2 F GLIBC_2.35 _ZGVbN4v_acosf F GLIBC_2.35 _ZGVbN4v_acoshf F @@ -59,6 +60,7 @@ GLIBC_2.35 _ZGVbN4v_asinf F GLIBC_2.35 _ZGVbN4v_asinhf F GLIBC_2.35 _ZGVbN4v_atanf F GLIBC_2.35 _ZGVbN4v_atanhf F +GLIBC_2.35 _ZGVbN4v_cbrtf F GLIBC_2.35 _ZGVbN4vv_atan2f F GLIBC_2.35 _ZGVcN4v_acos F GLIBC_2.35 _ZGVcN4v_acosh F @@ -66,6 +68,7 @@ GLIBC_2.35 _ZGVcN4v_asin F GLIBC_2.35 _ZGVcN4v_asinh F GLIBC_2.35 _ZGVcN4v_atan F GLIBC_2.35 _ZGVcN4v_atanh F +GLIBC_2.35 _ZGVcN4v_cbrt F GLIBC_2.35 _ZGVcN4vv_atan2 F GLIBC_2.35 _ZGVcN8v_acosf F GLIBC_2.35 _ZGVcN8v_acoshf F @@ -73,6 +76,7 @@ GLIBC_2.35 _ZGVcN8v_asinf F GLIBC_2.35 _ZGVcN8v_asinhf F GLIBC_2.35 _ZGVcN8v_atanf F GLIBC_2.35 _ZGVcN8v_atanhf F +GLIBC_2.35 _ZGVcN8v_cbrtf F GLIBC_2.35 _ZGVcN8vv_atan2f F GLIBC_2.35 _ZGVdN4v_acos F GLIBC_2.35 _ZGVdN4v_acosh F @@ -80,6 +84,7 @@ GLIBC_2.35 _ZGVdN4v_asin F GLIBC_2.35 _ZGVdN4v_asinh F GLIBC_2.35 _ZGVdN4v_atan F GLIBC_2.35 _ZGVdN4v_atanh F +GLIBC_2.35 _ZGVdN4v_cbrt F GLIBC_2.35 _ZGVdN4vv_atan2 F GLIBC_2.35 _ZGVdN8v_acosf F GLIBC_2.35 _ZGVdN8v_acoshf F @@ -87,6 +92,7 @@ GLIBC_2.35 _ZGVdN8v_asinf F GLIBC_2.35 _ZGVdN8v_asinhf F GLIBC_2.35 _ZGVdN8v_atanf F GLIBC_2.35 _ZGVdN8v_atanhf F +GLIBC_2.35 _ZGVdN8v_cbrtf F GLIBC_2.35 _ZGVdN8vv_atan2f F GLIBC_2.35 _ZGVeN16v_acosf F GLIBC_2.35 _ZGVeN16v_acoshf F @@ -94,6 +100,7 @@ GLIBC_2.35 _ZGVeN16v_asinf F GLIBC_2.35 _ZGVeN16v_asinhf F GLIBC_2.35 _ZGVeN16v_atanf F GLIBC_2.35 _ZGVeN16v_atanhf F +GLIBC_2.35 _ZGVeN16v_cbrtf F GLIBC_2.35 _ZGVeN16vv_atan2f F GLIBC_2.35 _ZGVeN8v_acos F GLIBC_2.35 _ZGVeN8v_acosh F @@ -101,4 +108,5 @@ GLIBC_2.35 _ZGVeN8v_asin F GLIBC_2.35 _ZGVeN8v_asinh F GLIBC_2.35 _ZGVeN8v_atan F GLIBC_2.35 _ZGVeN8v_atanh F +GLIBC_2.35 _ZGVeN8v_cbrt F GLIBC_2.35 _ZGVeN8vv_atan2 F diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h index 753b6ff9d1..5b0a2d9efe 100644 --- a/sysdeps/x86/fpu/bits/math-vector.h +++ b/sysdeps/x86/fpu/bits/math-vector.h @@ -86,6 +86,10 @@ # define __DECL_SIMD_atanh __DECL_SIMD_x86_64 # undef __DECL_SIMD_atanhf # define __DECL_SIMD_atanhf __DECL_SIMD_x86_64 +# undef __DECL_SIMD_cbrt +# define __DECL_SIMD_cbrt __DECL_SIMD_x86_64 +# undef __DECL_SIMD_cbrtf +# define __DECL_SIMD_cbrtf __DECL_SIMD_x86_64 # endif #endif diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h index 6a6b3d4a0d..c172dcce91 100644 --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h @@ -42,6 +42,8 @@ !GCC$ builtin (atan2f) attributes simd (notinbranch) if('x86_64') !GCC$ builtin (atanh) attributes simd (notinbranch) if('x86_64') !GCC$ builtin (atanhf) attributes simd (notinbranch) if('x86_64') +!GCC$ builtin (cbrt) attributes simd (notinbranch) if('x86_64') +!GCC$ builtin (cbrtf) attributes simd (notinbranch) if('x86_64') !GCC$ builtin (cos) attributes simd (notinbranch) if('x32') !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32') @@ -69,3 +71,5 @@ !GCC$ builtin (atan2f) attributes simd (notinbranch) if('x32') 
!GCC$ builtin (atanh) attributes simd (notinbranch) if('x32') !GCC$ builtin (atanhf) attributes simd (notinbranch) if('x32') +!GCC$ builtin (cbrt) attributes simd (notinbranch) if('x32') +!GCC$ builtin (cbrtf) attributes simd (notinbranch) if('x32') diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig index 7ae4c4af6d..b416e2a916 100644 --- a/sysdeps/x86_64/fpu/Makeconfig +++ b/sysdeps/x86_64/fpu/Makeconfig @@ -29,6 +29,7 @@ libmvec-funcs = \ atan \ atan2 \ atanh \ + cbrt \ cos \ exp \ log \ diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions index f80889e3b5..45baab6b6e 100644 --- a/sysdeps/x86_64/fpu/Versions +++ b/sysdeps/x86_64/fpu/Versions @@ -20,6 +20,7 @@ libmvec { _ZGVbN2v_asinh; _ZGVcN4v_asinh; _ZGVdN4v_asinh; _ZGVeN8v_asinh; _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan; _ZGVbN2v_atanh; _ZGVcN4v_atanh; _ZGVdN4v_atanh; _ZGVeN8v_atanh; + _ZGVbN2v_cbrt; _ZGVcN4v_cbrt; _ZGVdN4v_cbrt; _ZGVeN8v_cbrt; _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2; _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf; _ZGVbN4v_acoshf; _ZGVcN8v_acoshf; _ZGVdN8v_acoshf; _ZGVeN16v_acoshf; @@ -27,6 +28,7 @@ libmvec { _ZGVbN4v_asinhf; _ZGVcN8v_asinhf; _ZGVdN8v_asinhf; _ZGVeN16v_asinhf; _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf; _ZGVbN4v_atanhf; _ZGVcN8v_atanhf; _ZGVdN8v_atanhf; _ZGVeN16v_atanhf; + _ZGVbN4v_cbrtf; _ZGVcN8v_cbrtf; _ZGVdN8v_cbrtf; _ZGVeN16v_cbrtf; _ZGVbN4vv_atan2f; _ZGVcN8vv_atan2f; _ZGVdN8vv_atan2f; _ZGVeN16vv_atan2f; } } diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps index 30ac652738..8b681ed441 100644 --- a/sysdeps/x86_64/fpu/libm-test-ulps +++ b/sysdeps/x86_64/fpu/libm-test-ulps @@ -660,6 +660,26 @@ float: 1 float128: 1 ldouble: 1 +Function: "cbrt_vlen16": +float: 1 + +Function: "cbrt_vlen2": +double: 1 + +Function: "cbrt_vlen4": +double: 1 +float: 2 + +Function: "cbrt_vlen4_avx2": +double: 1 + +Function: "cbrt_vlen8": +double: 1 +float: 2 + +Function: "cbrt_vlen8_avx2": +float: 2 + Function: Real part of "ccos": double: 1 float: 1 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core-sse2.S new file mode 100644 index 0000000000..60f4c46a11 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core-sse2.S @@ -0,0 +1,20 @@ +/* SSE2 version of vectorized cbrt, vector length is 2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#define _ZGVbN2v_cbrt _ZGVbN2v_cbrt_sse2 +#include "../svml_d_cbrt2_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core.c new file mode 100644 index 0000000000..07390b7150 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core.c @@ -0,0 +1,27 @@ +/* Multiple versions of vectorized cbrt, vector length is 2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define SYMBOL_NAME _ZGVbN2v_cbrt +#include "ifunc-mathvec-sse4_1.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVbN2v_cbrt, __GI__ZGVbN2v_cbrt, __redirect__ZGVbN2v_cbrt) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core_sse4.S new file mode 100644 index 0000000000..64b940d156 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core_sse4.S @@ -0,0 +1,2025 @@ +/* Function cbrt vectorized with SSE4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * x=2^{3*k+j} * 1.b1 b2 ... b5 b6 ... b52 + * Let r=(x*2^{-3k-j} - 1.b1 b2 ... b5 1)* rcp[b1 b2 ..b5], + * where rcp[b1 b2 .. b5]=1/(1.b1 b2 b3 b4 b5 1) in double precision + * cbrt(2^j * 1. b1 b2 .. 
b5 1) is approximated as T[j][b1..b5]+D[j][b1..b5] + * (T stores the high 53 bits, D stores the low order bits) + * Result=2^k*T+(2^k*T*r)*P+2^k*D + * where P=p1+p2*r+..+p8*r^7 + * + */ + +#include + + .text + .section .text.sse4,"ax",@progbits +ENTRY(_ZGVbN2v_cbrt_sse4) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $320, %rsp + movaps %xmm0, %xmm4 + +/* Get iX - high part of argument */ + pshufd $221, %xmm4, %xmm6 + +/* Load 1/(1+iRcpIndex/32+1/64) reciprocal table value */ + lea __svml_dcbrt_data_internal(%rip), %rax + +/* If the exponent field is zero - go to callout to process denormals */ + movq 2048+__svml_dcbrt_data_internal(%rip), %xmm0 + movq 2240+__svml_dcbrt_data_internal(%rip), %xmm7 + pand %xmm6, %xmm0 + movq 2304+__svml_dcbrt_data_internal(%rip), %xmm3 + psubd %xmm7, %xmm0 + +/* Calculate CbrtIndex */ + movaps %xmm4, %xmm7 + pcmpgtd %xmm3, %xmm0 + psrlq $52, %xmm7 + movmskps %xmm0, %edx + pand 1856+__svml_dcbrt_data_internal(%rip), %xmm7 + movdqu 1920+__svml_dcbrt_data_internal(%rip), %xmm0 + pmuludq %xmm7, %xmm0 + +/* Calculate Rcp table index */ + movq 1984+__svml_dcbrt_data_internal(%rip), %xmm2 + pand %xmm6, %xmm2 + +/* Compute 2^k */ + psrld $20, %xmm6 + psrld $12, %xmm2 + pshufd $1, %xmm2, %xmm1 + movd %xmm1, %r8d + pshufd $136, %xmm0, %xmm1 + psrld $14, %xmm1 + pshufd $136, %xmm7, %xmm7 + movdqa %xmm1, %xmm0 + psubd %xmm1, %xmm7 + paddd %xmm1, %xmm0 + psubd %xmm0, %xmm7 + +/* + * Declarations + * Load constants + */ + movq 2112+__svml_dcbrt_data_internal(%rip), %xmm5 + pslld $8, %xmm7 + pand %xmm5, %xmm6 + movq 2176+__svml_dcbrt_data_internal(%rip), %xmm5 + movd %xmm2, %ecx + paddd %xmm7, %xmm2 + por %xmm5, %xmm6 + paddd %xmm1, %xmm6 + +/* Load cbrt(2^j*(1+iRcpIndex/32+1/64)) Hi & Lo values */ + movd %xmm2, %r9d + pslld $20, %xmm6 + pshufd $1, %xmm2, %xmm2 + +/* + * VAND( L, l2k, = l2k, lExpHiMask ); + * Argument reduction Z + */ + movups 1728+__svml_dcbrt_data_internal(%rip), %xmm1 + movd %xmm2, %r10d + andps %xmm4, %xmm1 + pxor %xmm2, %xmm2 + punpckldq %xmm6, %xmm2 + movups 1600+__svml_dcbrt_data_internal(%rip), %xmm6 + andps %xmm4, %xmm6 + orps 1664+__svml_dcbrt_data_internal(%rip), %xmm1 + orps 1536+__svml_dcbrt_data_internal(%rip), %xmm6 + movslq %ecx, %rcx + subpd %xmm6, %xmm1 + movslq %r8d, %r8 + movsd (%rax,%rcx), %xmm3 + +/* Polynomial */ + movups 1088+__svml_dcbrt_data_internal(%rip), %xmm5 + movslq %r9d, %r9 + movhpd (%rax,%r8), %xmm3 + mulpd %xmm1, %xmm3 + mulpd %xmm3, %xmm5 + addpd 1152+__svml_dcbrt_data_internal(%rip), %xmm5 + mulpd %xmm3, %xmm5 + addpd 1216+__svml_dcbrt_data_internal(%rip), %xmm5 + mulpd %xmm3, %xmm5 + addpd 1280+__svml_dcbrt_data_internal(%rip), %xmm5 + mulpd %xmm3, %xmm5 + addpd 1344+__svml_dcbrt_data_internal(%rip), %xmm5 + mulpd %xmm3, %xmm5 + movslq %r10d, %r10 + addpd 1408+__svml_dcbrt_data_internal(%rip), %xmm5 + movsd 256(%rax,%r9), %xmm0 + movhpd 256(%rax,%r10), %xmm0 + +/* THi*2^k, TLo*2^k */ + mulpd %xmm2, %xmm0 + mulpd %xmm3, %xmm5 + +/* THi*2^k*Z */ + mulpd %xmm0, %xmm3 + addpd 1472+__svml_dcbrt_data_internal(%rip), %xmm5 + +/* Final reconstruction */ + mulpd %xmm3, %xmm5 + addpd %xmm5, %xmm0 + andl $3, %edx + jne L(2) + +L(1): + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + +L(2): + movups %xmm4, 192(%rsp) + movups %xmm0, 256(%rsp) + je L(1) + xorl %eax, %eax + movups %xmm8, 112(%rsp) + movups %xmm9, 96(%rsp) + movups %xmm10, 80(%rsp) + movups %xmm11, 64(%rsp) + movups %xmm12, 
48(%rsp) + movups %xmm13, 32(%rsp) + movups %xmm14, 16(%rsp) + movups %xmm15, (%rsp) + movq %rsi, 136(%rsp) + movq %rdi, 128(%rsp) + movq %r12, 152(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 144(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + movl %edx, %r13d + +L(3): + btl %r12d, %r13d + jc L(5) + +L(4): + incl %r12d + cmpl $2, %r12d + jl L(3) + movups 112(%rsp), %xmm8 + cfi_restore(25) + movups 96(%rsp), %xmm9 + cfi_restore(26) + movups 80(%rsp), %xmm10 + cfi_restore(27) + movups 64(%rsp), %xmm11 + cfi_restore(28) + movups 48(%rsp), %xmm12 + cfi_restore(29) + movups 32(%rsp), %xmm13 + cfi_restore(30) + movups 16(%rsp), %xmm14 + cfi_restore(31) + movups (%rsp), %xmm15 + cfi_restore(32) + movq 136(%rsp), %rsi + cfi_restore(4) + movq 128(%rsp), %rdi + cfi_restore(5) + movq 152(%rsp), %r12 + cfi_restore(12) + movq 144(%rsp), %r13 + cfi_restore(13) + movups 256(%rsp), %xmm0 + jmp L(1) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 
0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + +L(5): + lea 192(%rsp,%r12,8), %rdi + lea 256(%rsp,%r12,8), %rsi + call __svml_dcbrt_cout_rare_internal + jmp L(4) + +END(_ZGVbN2v_cbrt_sse4) + + .align 16,0x90 + +__svml_dcbrt_cout_rare_internal: + + cfi_startproc + + movq %rsi, %r8 + movzwl 6(%rdi), %r9d + andl $32752, %r9d + shrl $4, %r9d + movb 7(%rdi), %sil + movsd (%rdi), %xmm1 + cmpl $2047, %r9d + je L(11) + ucomisd 432+__dcbrt_la__vmldCbrtTab(%rip), %xmm1 + jp L(6) + je L(10) + +L(6): + movb %sil, %al + lea 440+__dcbrt_la__vmldCbrtTab(%rip), %rdx + andb $-128, %al + andb $127, %sil + shrb $7, %al + xorl %edi, %edi + movsd %xmm1, -56(%rsp) + movzbl %al, %ecx + movb %sil, -49(%rsp) + movsd (%rdx,%rcx,8), %xmm5 + testl %r9d, %r9d + jne L(7) + movsd -56(%rsp), %xmm0 + movl $100, %edi + mulsd 360+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + movsd %xmm0, -56(%rsp) + jmp L(8) + +L(7): + movsd -56(%rsp), %xmm0 + +L(8): + movzwl -50(%rsp), %esi + movl $1431655766, %eax + andl $32752, %esi + lea __dcbrt_la__vmldCbrtTab(%rip), %r11 + shrl $4, %esi + movsd %xmm0, -40(%rsp) + movsd 368+__dcbrt_la__vmldCbrtTab(%rip), %xmm14 + imull %esi + movl $1431655766, %eax + lea (%rdx,%rdx,2), %ecx + negl %ecx + addl %esi, %ecx + subl %ecx, %esi + addl %ecx, %ecx + addl $-1023, %esi + imull %esi + sarl $31, %esi + subl %esi, %edx + addl $1023, %edx + subl %edi, %edx + movzwl -34(%rsp), %edi + andl $2047, %edx + andl $-32753, %edi + addl $16368, %edi + movw %di, -34(%rsp) + movsd -40(%rsp), %xmm11 + movaps %xmm11, %xmm6 + mulsd 376+__dcbrt_la__vmldCbrtTab(%rip), %xmm6 + movsd %xmm6, -32(%rsp) + movsd -32(%rsp), %xmm7 + movl -36(%rsp), %r10d + andl $1048575, %r10d + subsd -40(%rsp), %xmm7 + movsd %xmm7, -24(%rsp) + movsd -32(%rsp), %xmm9 + movsd -24(%rsp), %xmm8 + shrl $15, %r10d + subsd %xmm8, %xmm9 + movsd %xmm9, -32(%rsp) + movsd -32(%rsp), %xmm10 + movsd (%r11,%r10,8), %xmm4 + subsd %xmm10, %xmm11 + movaps %xmm4, %xmm12 + movaps %xmm4, %xmm13 + mulsd %xmm4, %xmm12 + movsd %xmm11, -24(%rsp) + movsd -32(%rsp), %xmm2 + mulsd %xmm12, %xmm2 + mulsd %xmm2, %xmm13 + movsd 440+__dcbrt_la__vmldCbrtTab(%rip), %xmm6 + movsd -24(%rsp), %xmm3 + subsd %xmm13, %xmm6 + mulsd %xmm12, %xmm3 + mulsd %xmm6, %xmm14 + mulsd %xmm3, %xmm4 + movsd %xmm14, -32(%rsp) + movsd -32(%rsp), %xmm15 + xorps .FLT_87(%rip), %xmm4 + subsd %xmm6, %xmm15 + movsd %xmm15, -24(%rsp) + movsd -32(%rsp), %xmm1 + movsd -24(%rsp), %xmm0 + movsd 256+__dcbrt_la__vmldCbrtTab(%rip), %xmm9 + subsd %xmm0, %xmm1 + movsd %xmm1, -32(%rsp) + movsd -32(%rsp), %xmm13 + movsd 352+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + subsd %xmm13, %xmm6 + movsd %xmm6, -24(%rsp) + movsd -32(%rsp), %xmm1 + movsd -24(%rsp), %xmm7 + movaps %xmm1, %xmm8 + movsd 256+__dcbrt_la__vmldCbrtTab(%rip), %xmm11 + addsd %xmm7, %xmm4 + movsd 256+__dcbrt_la__vmldCbrtTab(%rip), %xmm7 + addsd %xmm4, %xmm8 + mulsd %xmm8, %xmm0 + movslq %ecx, %rcx + addsd 344+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + movq 440+__dcbrt_la__vmldCbrtTab(%rip), %r9 + movq %r9, -48(%rsp) + shrq $48, %r9 + addsd 336+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + andl $-32753, %r9d + shll $4, %edx + addsd 328+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + orl %edx, %r9d + movw %r9w, -42(%rsp) + addsd 320+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd 312+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd 304+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd 296+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd 
288+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd 280+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd 272+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd %xmm0, %xmm9 + movsd %xmm9, -32(%rsp) + movsd -32(%rsp), %xmm10 + movsd 368+__dcbrt_la__vmldCbrtTab(%rip), %xmm9 + subsd %xmm10, %xmm11 + movsd %xmm11, -24(%rsp) + movsd -32(%rsp), %xmm14 + movsd -24(%rsp), %xmm12 + addsd %xmm12, %xmm14 + movsd %xmm14, -16(%rsp) + movaps %xmm2, %xmm14 + movsd -24(%rsp), %xmm6 + addsd %xmm0, %xmm6 + movsd %xmm6, -24(%rsp) + movsd -16(%rsp), %xmm15 + subsd %xmm15, %xmm7 + movsd %xmm7, -16(%rsp) + movsd -24(%rsp), %xmm8 + movsd -16(%rsp), %xmm0 + addsd %xmm0, %xmm8 + movsd %xmm8, -16(%rsp) + movaps %xmm1, %xmm8 + movsd -32(%rsp), %xmm13 + mulsd %xmm13, %xmm9 + movsd -16(%rsp), %xmm0 + movsd %xmm9, -32(%rsp) + movsd -32(%rsp), %xmm10 + subsd %xmm13, %xmm10 + addsd 264+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + movsd %xmm10, -24(%rsp) + movsd -32(%rsp), %xmm11 + movsd -24(%rsp), %xmm6 + subsd %xmm6, %xmm11 + movsd %xmm11, -32(%rsp) + movsd -32(%rsp), %xmm12 + subsd %xmm12, %xmm13 + movsd %xmm13, -24(%rsp) + movsd -32(%rsp), %xmm7 + movsd -24(%rsp), %xmm6 + mulsd %xmm7, %xmm8 + addsd %xmm0, %xmm6 + mulsd %xmm4, %xmm7 + mulsd %xmm6, %xmm4 + mulsd %xmm6, %xmm1 + addsd %xmm4, %xmm7 + movsd 368+__dcbrt_la__vmldCbrtTab(%rip), %xmm4 + addsd %xmm1, %xmm7 + mulsd %xmm8, %xmm4 + movsd %xmm7, -32(%rsp) + movsd -32(%rsp), %xmm10 + movsd %xmm4, -32(%rsp) + movsd -32(%rsp), %xmm0 + subsd %xmm8, %xmm0 + movsd %xmm0, -24(%rsp) + movsd -32(%rsp), %xmm1 + movsd -24(%rsp), %xmm4 + subsd %xmm4, %xmm1 + movsd %xmm1, -32(%rsp) + movsd -32(%rsp), %xmm6 + subsd %xmm6, %xmm8 + movsd %xmm8, -24(%rsp) + movsd -32(%rsp), %xmm9 + movsd -24(%rsp), %xmm7 + movaps %xmm9, %xmm1 + mulsd %xmm3, %xmm9 + addsd %xmm7, %xmm10 + mulsd %xmm2, %xmm1 + movaps %xmm10, %xmm11 + movaps %xmm1, %xmm12 + mulsd %xmm3, %xmm10 + addsd %xmm2, %xmm12 + mulsd %xmm2, %xmm11 + addsd %xmm9, %xmm10 + addsd %xmm10, %xmm11 + movsd %xmm11, -32(%rsp) + movsd -32(%rsp), %xmm0 + movsd %xmm12, -32(%rsp) + movsd -32(%rsp), %xmm13 + subsd %xmm13, %xmm14 + movsd %xmm14, -24(%rsp) + movsd -32(%rsp), %xmm9 + movsd -24(%rsp), %xmm15 + addsd %xmm15, %xmm9 + movsd %xmm9, -16(%rsp) + movsd -24(%rsp), %xmm10 + addsd %xmm10, %xmm1 + movsd %xmm1, -24(%rsp) + movsd -16(%rsp), %xmm4 + subsd %xmm4, %xmm2 + movsd 368+__dcbrt_la__vmldCbrtTab(%rip), %xmm4 + movsd %xmm2, -16(%rsp) + movsd -24(%rsp), %xmm1 + movsd -16(%rsp), %xmm2 + addsd %xmm2, %xmm1 + movsd %xmm1, -16(%rsp) + movsd -32(%rsp), %xmm9 + mulsd %xmm9, %xmm4 + movsd -16(%rsp), %xmm11 + movsd %xmm4, -32(%rsp) + movsd -32(%rsp), %xmm6 + subsd %xmm9, %xmm6 + movsd %xmm6, -24(%rsp) + movsd -32(%rsp), %xmm7 + movsd -24(%rsp), %xmm2 + subsd %xmm2, %xmm7 + movsd %xmm7, -32(%rsp) + movsd -32(%rsp), %xmm8 + subsd %xmm8, %xmm9 + movsd %xmm9, -24(%rsp) + movsd -32(%rsp), %xmm12 + movsd -24(%rsp), %xmm10 + addsd %xmm0, %xmm10 + addsd %xmm3, %xmm10 + movsd 392(%r11,%rcx,8), %xmm3 + movaps %xmm3, %xmm0 + addsd %xmm10, %xmm11 + mulsd %xmm12, %xmm3 + mulsd %xmm11, %xmm0 + movsd 384(%r11,%rcx,8), %xmm10 + addsd %xmm3, %xmm0 + mulsd %xmm10, %xmm11 + mulsd %xmm10, %xmm12 + addsd %xmm11, %xmm0 + movsd %xmm0, -32(%rsp) + movsd -32(%rsp), %xmm3 + addsd %xmm3, %xmm12 + mulsd -48(%rsp), %xmm12 + mulsd %xmm12, %xmm5 + movsd %xmm5, (%r8) + +L(9): + xorl %eax, %eax + ret + +L(10): + movsd 440+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm0, %xmm1 + movsd %xmm1, (%r8) + jmp L(9) + +L(11): + addsd %xmm1, 
%xmm1 + movsd %xmm1, (%r8) + jmp L(9) + + cfi_endproc + + .type __svml_dcbrt_cout_rare_internal,@function + .size __svml_dcbrt_cout_rare_internal,.-__svml_dcbrt_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_dcbrt_data_internal: + .long 528611360 + .long 3220144632 + .long 2884679527 + .long 3220082993 + .long 1991868891 + .long 3220024928 + .long 2298714891 + .long 3219970134 + .long 58835168 + .long 3219918343 + .long 3035110223 + .long 3219869313 + .long 1617585086 + .long 3219822831 + .long 2500867033 + .long 3219778702 + .long 4241943008 + .long 3219736752 + .long 258732970 + .long 3219696825 + .long 404232216 + .long 3219658776 + .long 2172167368 + .long 3219622476 + .long 1544257904 + .long 3219587808 + .long 377579543 + .long 3219554664 + .long 1616385542 + .long 3219522945 + .long 813783277 + .long 3219492562 + .long 3940743189 + .long 3219463431 + .long 2689777499 + .long 3219435478 + .long 1700977147 + .long 3219408632 + .long 3169102082 + .long 3219382828 + .long 327235604 + .long 3219358008 + .long 1244336319 + .long 3219334115 + .long 1300311200 + .long 3219311099 + .long 3095471925 + .long 3219288912 + .long 2166487928 + .long 3219267511 + .long 2913108253 + .long 3219246854 + .long 293672978 + .long 3219226904 + .long 288737297 + .long 3219207624 + .long 1810275472 + .long 3219188981 + .long 174592167 + .long 3219170945 + .long 3539053052 + .long 3219153485 + .long 2164392968 + .long 3219136576 + .long 572345495 + .long 1072698681 + .long 1998204467 + .long 1072709382 + .long 3861501553 + .long 1072719872 + .long 2268192434 + .long 1072730162 + .long 2981979308 + .long 1072740260 + .long 270859143 + .long 1072750176 + .long 2958651392 + .long 1072759916 + .long 313113243 + .long 1072769490 + .long 919449400 + .long 1072778903 + .long 2809328903 + .long 1072788162 + .long 2222981587 + .long 1072797274 + .long 2352530781 + .long 1072806244 + .long 594152517 + .long 1072815078 + .long 1555767199 + .long 1072823780 + .long 4282421314 + .long 1072832355 + .long 2355578597 + .long 1072840809 + .long 1162590619 + .long 1072849145 + .long 797864051 + .long 1072857367 + .long 431273680 + .long 1072865479 + .long 2669831148 + .long 1072873484 + .long 733477752 + .long 1072881387 + .long 4280220604 + .long 1072889189 + .long 801961634 + .long 1072896896 + .long 2915370760 + .long 1072904508 + .long 1159613482 + .long 1072912030 + .long 2689944798 + .long 1072919463 + .long 1248687822 + .long 1072926811 + .long 2967951030 + .long 1072934075 + .long 630170432 + .long 1072941259 + .long 3760898254 + .long 1072948363 + .long 0 + .long 1072955392 + .long 2370273294 + .long 1072962345 + .long 1261754802 + .long 1072972640 + .long 546334065 + .long 1072986123 + .long 1054893830 + .long 1072999340 + .long 1571187597 + .long 1073012304 + .long 1107975175 + .long 1073025027 + .long 3606909377 + .long 1073037519 + .long 1113616747 + .long 1073049792 + .long 4154744632 + .long 1073061853 + .long 3358931423 + .long 1073073713 + .long 4060702372 + .long 1073085379 + .long 747576176 + .long 1073096860 + .long 3023138255 + .long 1073108161 + .long 1419988548 + .long 1073119291 + .long 1914185305 + .long 1073130255 + .long 294389948 + .long 1073141060 + .long 3761802570 + .long 1073151710 + .long 978281566 + .long 1073162213 + .long 823148820 + .long 1073172572 + .long 2420954441 + .long 1073182792 + .long 3815449908 + .long 1073192878 + .long 2046058587 + .long 1073202835 + .long 1807524753 + .long 1073212666 + .long 2628681401 + .long 1073222375 + .long 3225667357 + .long 
1073231966 + .long 1555307421 + .long 1073241443 + .long 3454043099 + .long 1073250808 + .long 1208137896 + .long 1073260066 + .long 3659916772 + .long 1073269218 + .long 1886261264 + .long 1073278269 + .long 3593647839 + .long 1073287220 + .long 3086012205 + .long 1073296075 + .long 2769796922 + .long 1073304836 + .long 888716057 + .long 1073317807 + .long 2201465623 + .long 1073334794 + .long 164369365 + .long 1073351447 + .long 3462666733 + .long 1073367780 + .long 2773905457 + .long 1073383810 + .long 1342879088 + .long 1073399550 + .long 2543933975 + .long 1073415012 + .long 1684477781 + .long 1073430209 + .long 3532178543 + .long 1073445151 + .long 1147747300 + .long 1073459850 + .long 1928031793 + .long 1073474314 + .long 2079717015 + .long 1073488553 + .long 4016765315 + .long 1073502575 + .long 3670431139 + .long 1073516389 + .long 3549227225 + .long 1073530002 + .long 11637607 + .long 1073543422 + .long 588220169 + .long 1073556654 + .long 2635407503 + .long 1073569705 + .long 2042029317 + .long 1073582582 + .long 1925128962 + .long 1073595290 + .long 4136375664 + .long 1073607834 + .long 759964600 + .long 1073620221 + .long 4257606771 + .long 1073632453 + .long 297278907 + .long 1073644538 + .long 3655053093 + .long 1073656477 + .long 2442253172 + .long 1073668277 + .long 1111876799 + .long 1073679941 + .long 3330973139 + .long 1073691472 + .long 3438879452 + .long 1073702875 + .long 3671565478 + .long 1073714153 + .long 1317849547 + .long 1073725310 + .long 1642364115 + .long 1073736348 + .long 1553778919 + .long 3213899486 + .long 1553778919 + .long 3213899486 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3582521621 + .long 1066628362 + .long 3582521621 + .long 1066628362 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1646371399 + .long 3214412045 + .long 1646371399 + .long 3214412045 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 889629714 + .long 1067378449 + .long 889629714 + .long 1067378449 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 
0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3534952507 + .long 3215266280 + .long 3534952507 + .long 3215266280 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1007461464 + .long 1068473053 + .long 1007461464 + .long 1068473053 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 477218588 + .long 3216798151 + .long 477218588 + .long 3216798151 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1431655765 + .long 1070945621 + .long 1431655765 + .long 1070945621 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 3220193280 + .long 0 + .long 3220193280 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1032192 + .long 0 + .long 1032192 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 3220176896 + .long 0 + .long 3220176896 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4294967295 + .long 1048575 + .long 4294967295 + .long 1048575 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 4293918720 + .long 0 + .long 4293918720 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2047 + .long 0 + .long 2047 + .long 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 5462 + .long 0 + .long 5462 + .long 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1015808 + .long 1015808 + .long 1015808 + .long 1015808 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2048 + .long 2048 + .long 2048 + .long 2048 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 682 + .long 682 + .long 682 + .long 682 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2148532224 + .long 2148532224 + .long 2148532224 + .long 2148532224 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4292870143 + .long 4292870143 + .long 4292870143 + .long 4292870143 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_dcbrt_data_internal,@object + .size __svml_dcbrt_data_internal,2368 + .align 32 + +__dcbrt_la__vmldCbrtTab: + .long 0 + .long 1072693248 + .long 0 + .long 1072668672 + .long 0 + .long 1072644096 + .long 0 + .long 1072627712 + .long 0 + .long 1072611328 + .long 0 + .long 1072586752 + .long 0 + .long 1072570368 + .long 0 + .long 1072553984 + .long 0 + .long 1072537600 + .long 0 + .long 1072521216 + .long 0 + .long 1072504832 + .long 0 + .long 1072488448 + .long 0 + .long 1072480256 + .long 0 + .long 1072463872 + .long 0 + .long 1072447488 + .long 0 + .long 1072439296 + .long 0 + .long 1072422912 + .long 0 + .long 1072414720 + .long 0 + .long 1072398336 + .long 0 + .long 1072390144 + .long 0 + .long 1072373760 + .long 0 + .long 1072365568 + .long 0 + .long 1072357376 + .long 0 + .long 1072340992 + .long 0 + .long 1072332800 + .long 0 + .long 1072324608 + .long 0 + .long 1072308224 + .long 0 + .long 1072300032 + .long 0 + .long 1072291840 + .long 0 + .long 1072283648 + .long 0 + .long 1072275456 + .long 0 + .long 1072267264 + .long 1431655765 + .long 1071994197 + .long 1431655765 + .long 1015371093 + .long 1908874354 + .long 1071761180 + .long 1007461464 + .long 1071618781 + .long 565592401 + 
.long 1071446176 + .long 241555088 + .long 1071319599 + .long 943963244 + .long 1071221150 + .long 2330668378 + .long 1071141453 + .long 2770428108 + .long 1071075039 + .long 3622256836 + .long 1071018464 + .long 1497196870 + .long 1070969433 + .long 280472551 + .long 1070926345 + .long 1585032765 + .long 1070888044 + .long 0 + .long 1387266048 + .long 33554432 + .long 1101004800 + .long 512 + .long 1117782016 + .long 0 + .long 1072693248 + .long 0 + .long 0 + .long 4160749568 + .long 1072965794 + .long 2921479643 + .long 1043912488 + .long 2684354560 + .long 1073309182 + .long 4060791142 + .long 1045755320 + .long 0 + .long 0 + .long 0 + .long 1072693248 + .long 0 + .long 3220176896 + .type __dcbrt_la__vmldCbrtTab,@object + .size __dcbrt_la__vmldCbrtTab,456 + .space 8, 0x00 + .align 16 + +.FLT_87: + .long 0x00000000,0x80000000,0x00000000,0x00000000 + .type .FLT_87,@object + .size .FLT_87,16 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core-sse.S new file mode 100644 index 0000000000..3b54f31fbc --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core-sse.S @@ -0,0 +1,20 @@ +/* SSE version of vectorized cbrt, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define _ZGVdN4v_cbrt _ZGVdN4v_cbrt_sse_wrapper +#include "../svml_d_cbrt4_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core.c new file mode 100644 index 0000000000..0b135877aa --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core.c @@ -0,0 +1,27 @@ +/* Multiple versions of vectorized cbrt, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#define SYMBOL_NAME _ZGVdN4v_cbrt +#include "ifunc-mathvec-avx2.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVdN4v_cbrt, __GI__ZGVdN4v_cbrt, __redirect__ZGVdN4v_cbrt) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core_avx2.S new file mode 100644 index 0000000000..694170fe51 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core_avx2.S @@ -0,0 +1,1799 @@ +/* Function cbrt vectorized with AVX2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * x=2^{3*k+j} * 1.b1 b2 ... b5 b6 ... b52 + * Let r=(x*2^{-3k-j} - 1.b1 b2 ... b5 1)* rcp[b1 b2 ..b5], + * where rcp[b1 b2 .. b5]=1/(1.b1 b2 b3 b4 b5 1) in double precision + * cbrt(2^j * 1. b1 b2 .. b5 1) is approximated as T[j][b1..b5]+D[j][b1..b5] + * (T stores the high 53 bits, D stores the low order bits) + * Result=2^k*T+(2^k*T*r)*P+2^k*D + * where P=p1+p2*r+..+p8*r^7 + * + */ + +#include + + .text + .section .text.avx2,"ax",@progbits +ENTRY(_ZGVdN4v_cbrt_avx2) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $384, %rsp + +/* Load 1/(1+iRcpIndex/32+1/64) reciprocal table value */ + lea __svml_dcbrt_data_internal(%rip), %rax + vmovapd %ymm0, %ymm5 + vmovups %ymm10, 160(%rsp) + vmovups %ymm11, 192(%rsp) + vmovups %ymm13, 256(%rsp) + vmovups %ymm14, 288(%rsp) + vmovups %ymm12, 224(%rsp) + vmovups %ymm8, 32(%rsp) + vmovups %ymm15, 320(%rsp) + vmovups %ymm9, 96(%rsp) + +/* + * Declarations + * Load constants + * Get iX - high part of argument + */ + vextractf128 $1, %ymm5, %xmm6 + .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + +/* Calculate CbrtIndex */ + vpsrlq $52, %ymm5, %ymm10 + vshufps $221, %xmm6, %xmm5, %xmm4 + +/* Calculate 
Rcp table index */ + vandps 1984+__svml_dcbrt_data_internal(%rip), %xmm4, %xmm1 + vpsrld $12, %xmm1, %xmm3 + vmovd %xmm3, %ecx + +/* If the exponent field is zero - go to callout to process denormals */ + vandps 2048+__svml_dcbrt_data_internal(%rip), %xmm4, %xmm7 + +/* Compute 2^k */ + vpsrld $20, %xmm4, %xmm4 + vpsubd 2240+__svml_dcbrt_data_internal(%rip), %xmm7, %xmm0 + vandps 1856+__svml_dcbrt_data_internal(%rip), %ymm10, %ymm11 + vpextrd $2, %xmm3, %r9d + vpmuludq 1920+__svml_dcbrt_data_internal(%rip), %ymm11, %ymm13 + movslq %ecx, %rcx + vpextrd $1, %xmm3, %r8d + movslq %r9d, %r9 + vpextrd $3, %xmm3, %r10d + movslq %r8d, %r8 + movslq %r10d, %r10 + vmovsd (%rax,%rcx), %xmm6 + vmovsd (%rax,%r9), %xmm8 + vmovhpd (%rax,%r8), %xmm6, %xmm7 + vpcmpgtd 2304+__svml_dcbrt_data_internal(%rip), %xmm0, %xmm2 + vmovhpd (%rax,%r10), %xmm8, %xmm9 + vmovmskps %xmm2, %edx + vandpd 1600+__svml_dcbrt_data_internal(%rip), %ymm5, %ymm10 + vextractf128 $1, %ymm13, %xmm14 + vshufps $136, %xmm14, %xmm13, %xmm15 + vpsrld $14, %xmm15, %xmm1 + +/* Polynomial */ + vmovupd 1088+__svml_dcbrt_data_internal(%rip), %ymm15 + vextractf128 $1, %ymm11, %xmm12 + vshufps $136, %xmm12, %xmm11, %xmm0 + vpsubd %xmm1, %xmm0, %xmm6 + vorpd 1536+__svml_dcbrt_data_internal(%rip), %ymm10, %ymm12 + vinsertf128 $1, %xmm9, %ymm7, %ymm2 + vpaddd %xmm1, %xmm1, %xmm7 + vpsubd %xmm7, %xmm6, %xmm8 + vpslld $8, %xmm8, %xmm9 + vpaddd %xmm9, %xmm3, %xmm6 + +/* + * VAND( L, l2k, = l2k, lExpHiMask ); + * Argument reduction Z + */ + vandpd 1728+__svml_dcbrt_data_internal(%rip), %ymm5, %ymm9 + vorpd 1664+__svml_dcbrt_data_internal(%rip), %ymm9, %ymm11 + vsubpd %ymm12, %ymm11, %ymm13 + +/* Load cbrt(2^j*(1+iRcpIndex/32+1/64)) Hi & Lo values */ + vmovd %xmm6, %r11d + vmulpd %ymm13, %ymm2, %ymm2 + vfmadd213pd 1152+__svml_dcbrt_data_internal(%rip), %ymm2, %ymm15 + movslq %r11d, %r11 + vpextrd $1, %xmm6, %ecx + vfmadd213pd 1216+__svml_dcbrt_data_internal(%rip), %ymm2, %ymm15 + vmovsd 256(%rax,%r11), %xmm3 + vpextrd $2, %xmm6, %r8d + movslq %ecx, %rcx + movslq %r8d, %r8 + vpextrd $3, %xmm6, %r9d + movslq %r9d, %r9 + vmovhpd 256(%rax,%rcx), %xmm3, %xmm0 + vfmadd213pd 1280+__svml_dcbrt_data_internal(%rip), %ymm2, %ymm15 + vmovsd 256(%rax,%r8), %xmm3 + vmovhpd 256(%rax,%r9), %xmm3, %xmm7 + vpand 2112+__svml_dcbrt_data_internal(%rip), %xmm4, %xmm3 + vpor 2176+__svml_dcbrt_data_internal(%rip), %xmm3, %xmm4 + vfmadd213pd 1344+__svml_dcbrt_data_internal(%rip), %ymm2, %ymm15 + vpaddd %xmm1, %xmm4, %xmm1 + vpslld $20, %xmm1, %xmm6 + vfmadd213pd 1408+__svml_dcbrt_data_internal(%rip), %ymm2, %ymm15 + vfmadd213pd 1472+__svml_dcbrt_data_internal(%rip), %ymm2, %ymm15 + vinsertf128 $1, %xmm7, %ymm0, %ymm0 + vmovups __VUNPACK_ODD_ind1.217.0.2(%rip), %ymm7 + vpermps %ymm6, %ymm7, %ymm8 + vandps __VUNPACK_ODD_mask.217.0.2(%rip), %ymm8, %ymm14 + +/* THi*2^k, TLo*2^k */ + vmulpd %ymm14, %ymm0, %ymm1 + +/* THi*2^k*Z */ + vmulpd %ymm1, %ymm2, %ymm0 + +/* Final reconstruction */ + vmulpd %ymm0, %ymm15, %ymm0 + vaddpd %ymm0, %ymm1, %ymm0 + testl %edx, %edx + jne L(2) + +L(1): + vmovups 32(%rsp), %ymm8 + cfi_restore(91) + vmovups 96(%rsp), %ymm9 + cfi_restore(92) + vmovups 160(%rsp), %ymm10 + cfi_restore(93) + vmovups 192(%rsp), %ymm11 + cfi_restore(94) + vmovups 224(%rsp), %ymm12 + cfi_restore(95) + vmovups 256(%rsp), %ymm13 + cfi_restore(96) + vmovups 288(%rsp), %ymm14 + cfi_restore(97) + vmovups 320(%rsp), %ymm15 + cfi_restore(98) + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + .cfi_escape 0x10, 0xdb, 
0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + +L(2): + vmovupd %ymm5, 64(%rsp) + vmovupd %ymm0, 128(%rsp) + je L(1) + xorl %eax, %eax + vzeroupper + movq %rsi, 8(%rsp) + movq %rdi, (%rsp) + movq %r12, 24(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r12d + movq %r13, 16(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + movl %edx, %r13d + +L(3): + btl %r12d, %r13d + jc L(5) + +L(4): + incl %r12d + cmpl $4, %r12d + jl L(3) + movq 8(%rsp), %rsi + cfi_restore(4) + movq (%rsp), %rdi + cfi_restore(5) + movq 24(%rsp), %r12 + cfi_restore(12) + movq 16(%rsp), %r13 + cfi_restore(13) + vmovupd 128(%rsp), %ymm0 + jmp L(1) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + +L(5): + lea 64(%rsp,%r12,8), %rdi + lea 128(%rsp,%r12,8), %rsi + call __svml_dcbrt_cout_rare_internal + jmp L(4) + +END(_ZGVdN4v_cbrt_avx2) + .section .rodata, "a" + .align 64 + +__VUNPACK_ODD_ind1.217.0.2: + .long 0 + .long 0 + .long 0 + .long 1 + .long 0 + .long 2 + .long 0 + .long 3 + .space 32, 0x00 + .align 64 + +__VUNPACK_ODD_mask.217.0.2: + .long 0 + .long -1 + .long 0 + .long -1 + .long 0 + .long -1 + .long 0 + .long -1 + + .text + + .align 16,0x90 + +__svml_dcbrt_cout_rare_internal: + + cfi_startproc + + movq %rsi, %r8 + movzwl 6(%rdi), %r9d + andl $32752, %r9d + shrl $4, %r9d + movb 7(%rdi), %sil + movsd (%rdi), %xmm1 + cmpl $2047, %r9d + je L(11) + ucomisd 432+__dcbrt_la__vmldCbrtTab(%rip), %xmm1 + jp L(6) + je L(10) + +L(6): + movb %sil, %al + lea 440+__dcbrt_la__vmldCbrtTab(%rip), %rdx + andb $-128, %al + andb $127, %sil + shrb $7, %al + xorl %edi, %edi + movsd %xmm1, -56(%rsp) + movzbl %al, %ecx + movb %sil, -49(%rsp) + movsd (%rdx,%rcx,8), %xmm5 + testl %r9d, %r9d + jne L(7) + movsd -56(%rsp), %xmm0 + movl $100, %edi + mulsd 360+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + movsd %xmm0, -56(%rsp) + jmp L(8) + +L(7): + movsd -56(%rsp), %xmm0 + +L(8): + 
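The special-case handling above (the mask produced by vmovmskps, then the btl/jc walk over lanes in the L(2)..L(5) block) follows the usual libmvec pattern: spill the input and result vectors, recompute only the flagged lanes with the scalar rare-path routine, then reload. A hedged C sketch of that loop is below; fixup_special_lanes and scalar_cbrt_rare are made-up names, the latter standing in for __svml_dcbrt_cout_rare_internal.

#include <math.h>

/* Stand-in for the scalar rare-path routine; same (input pointer,
   output pointer) shape, trivial body purely for illustration.  */
static void
scalar_cbrt_rare (const double *in, double *out)
{
  *out = cbrt (*in);
}

/* Sketch of the lane-by-lane callout: only lanes whose bit is set in
   the special-case mask are recomputed.  */
static void
fixup_special_lanes (const double in[4], double out[4], unsigned int mask)
{
  for (int lane = 0; lane < 4; lane++)
    if (mask & (1u << lane))
      scalar_cbrt_rare (&in[lane], &out[lane]);
}

int
main (void)
{
  double in[4] = { 27.0, -8.0, 1.0, 64.0 };
  double out[4] = { 0 };
  fixup_special_lanes (in, out, 0x5);	/* lanes 0 and 2 flagged */
  return 0;
}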
movzwl -50(%rsp), %esi + movl $1431655766, %eax + andl $32752, %esi + lea __dcbrt_la__vmldCbrtTab(%rip), %r11 + shrl $4, %esi + movsd %xmm0, -40(%rsp) + movsd 368+__dcbrt_la__vmldCbrtTab(%rip), %xmm14 + imull %esi + movl $1431655766, %eax + lea (%rdx,%rdx,2), %ecx + negl %ecx + addl %esi, %ecx + subl %ecx, %esi + addl %ecx, %ecx + addl $-1023, %esi + imull %esi + sarl $31, %esi + subl %esi, %edx + addl $1023, %edx + subl %edi, %edx + movzwl -34(%rsp), %edi + andl $2047, %edx + andl $-32753, %edi + addl $16368, %edi + movw %di, -34(%rsp) + movsd -40(%rsp), %xmm11 + movaps %xmm11, %xmm6 + mulsd 376+__dcbrt_la__vmldCbrtTab(%rip), %xmm6 + movsd %xmm6, -32(%rsp) + movsd -32(%rsp), %xmm7 + movl -36(%rsp), %r10d + andl $1048575, %r10d + subsd -40(%rsp), %xmm7 + movsd %xmm7, -24(%rsp) + movsd -32(%rsp), %xmm9 + movsd -24(%rsp), %xmm8 + shrl $15, %r10d + subsd %xmm8, %xmm9 + movsd %xmm9, -32(%rsp) + movsd -32(%rsp), %xmm10 + movsd (%r11,%r10,8), %xmm4 + subsd %xmm10, %xmm11 + movaps %xmm4, %xmm12 + movaps %xmm4, %xmm13 + mulsd %xmm4, %xmm12 + movsd %xmm11, -24(%rsp) + movsd -32(%rsp), %xmm2 + mulsd %xmm12, %xmm2 + mulsd %xmm2, %xmm13 + movsd 440+__dcbrt_la__vmldCbrtTab(%rip), %xmm6 + movsd -24(%rsp), %xmm3 + subsd %xmm13, %xmm6 + mulsd %xmm12, %xmm3 + mulsd %xmm6, %xmm14 + mulsd %xmm3, %xmm4 + movsd %xmm14, -32(%rsp) + movsd -32(%rsp), %xmm15 + xorps .FLT_87(%rip), %xmm4 + subsd %xmm6, %xmm15 + movsd %xmm15, -24(%rsp) + movsd -32(%rsp), %xmm1 + movsd -24(%rsp), %xmm0 + movsd 256+__dcbrt_la__vmldCbrtTab(%rip), %xmm9 + subsd %xmm0, %xmm1 + movsd %xmm1, -32(%rsp) + movsd -32(%rsp), %xmm13 + movsd 352+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + subsd %xmm13, %xmm6 + movsd %xmm6, -24(%rsp) + movsd -32(%rsp), %xmm1 + movsd -24(%rsp), %xmm7 + movaps %xmm1, %xmm8 + movsd 256+__dcbrt_la__vmldCbrtTab(%rip), %xmm11 + addsd %xmm7, %xmm4 + movsd 256+__dcbrt_la__vmldCbrtTab(%rip), %xmm7 + addsd %xmm4, %xmm8 + mulsd %xmm8, %xmm0 + movslq %ecx, %rcx + addsd 344+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + movq 440+__dcbrt_la__vmldCbrtTab(%rip), %r9 + movq %r9, -48(%rsp) + shrq $48, %r9 + addsd 336+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + andl $-32753, %r9d + shll $4, %edx + addsd 328+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + orl %edx, %r9d + movw %r9w, -42(%rsp) + addsd 320+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd 312+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd 304+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd 296+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd 288+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd 280+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd 272+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd %xmm0, %xmm9 + movsd %xmm9, -32(%rsp) + movsd -32(%rsp), %xmm10 + movsd 368+__dcbrt_la__vmldCbrtTab(%rip), %xmm9 + subsd %xmm10, %xmm11 + movsd %xmm11, -24(%rsp) + movsd -32(%rsp), %xmm14 + movsd -24(%rsp), %xmm12 + addsd %xmm12, %xmm14 + movsd %xmm14, -16(%rsp) + movaps %xmm2, %xmm14 + movsd -24(%rsp), %xmm6 + addsd %xmm0, %xmm6 + movsd %xmm6, -24(%rsp) + movsd -16(%rsp), %xmm15 + subsd %xmm15, %xmm7 + movsd %xmm7, -16(%rsp) + movsd -24(%rsp), %xmm8 + movsd -16(%rsp), %xmm0 + addsd %xmm0, %xmm8 + movsd %xmm8, -16(%rsp) + movaps %xmm1, %xmm8 + movsd -32(%rsp), %xmm13 + mulsd %xmm13, %xmm9 + movsd -16(%rsp), %xmm0 + movsd %xmm9, -32(%rsp) + movsd -32(%rsp), %xmm10 + subsd %xmm13, %xmm10 + addsd 
264+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + movsd %xmm10, -24(%rsp) + movsd -32(%rsp), %xmm11 + movsd -24(%rsp), %xmm6 + subsd %xmm6, %xmm11 + movsd %xmm11, -32(%rsp) + movsd -32(%rsp), %xmm12 + subsd %xmm12, %xmm13 + movsd %xmm13, -24(%rsp) + movsd -32(%rsp), %xmm7 + movsd -24(%rsp), %xmm6 + mulsd %xmm7, %xmm8 + addsd %xmm0, %xmm6 + mulsd %xmm4, %xmm7 + mulsd %xmm6, %xmm4 + mulsd %xmm6, %xmm1 + addsd %xmm4, %xmm7 + movsd 368+__dcbrt_la__vmldCbrtTab(%rip), %xmm4 + addsd %xmm1, %xmm7 + mulsd %xmm8, %xmm4 + movsd %xmm7, -32(%rsp) + movsd -32(%rsp), %xmm10 + movsd %xmm4, -32(%rsp) + movsd -32(%rsp), %xmm0 + subsd %xmm8, %xmm0 + movsd %xmm0, -24(%rsp) + movsd -32(%rsp), %xmm1 + movsd -24(%rsp), %xmm4 + subsd %xmm4, %xmm1 + movsd %xmm1, -32(%rsp) + movsd -32(%rsp), %xmm6 + subsd %xmm6, %xmm8 + movsd %xmm8, -24(%rsp) + movsd -32(%rsp), %xmm9 + movsd -24(%rsp), %xmm7 + movaps %xmm9, %xmm1 + mulsd %xmm3, %xmm9 + addsd %xmm7, %xmm10 + mulsd %xmm2, %xmm1 + movaps %xmm10, %xmm11 + movaps %xmm1, %xmm12 + mulsd %xmm3, %xmm10 + addsd %xmm2, %xmm12 + mulsd %xmm2, %xmm11 + addsd %xmm9, %xmm10 + addsd %xmm10, %xmm11 + movsd %xmm11, -32(%rsp) + movsd -32(%rsp), %xmm0 + movsd %xmm12, -32(%rsp) + movsd -32(%rsp), %xmm13 + subsd %xmm13, %xmm14 + movsd %xmm14, -24(%rsp) + movsd -32(%rsp), %xmm9 + movsd -24(%rsp), %xmm15 + addsd %xmm15, %xmm9 + movsd %xmm9, -16(%rsp) + movsd -24(%rsp), %xmm10 + addsd %xmm10, %xmm1 + movsd %xmm1, -24(%rsp) + movsd -16(%rsp), %xmm4 + subsd %xmm4, %xmm2 + movsd 368+__dcbrt_la__vmldCbrtTab(%rip), %xmm4 + movsd %xmm2, -16(%rsp) + movsd -24(%rsp), %xmm1 + movsd -16(%rsp), %xmm2 + addsd %xmm2, %xmm1 + movsd %xmm1, -16(%rsp) + movsd -32(%rsp), %xmm9 + mulsd %xmm9, %xmm4 + movsd -16(%rsp), %xmm11 + movsd %xmm4, -32(%rsp) + movsd -32(%rsp), %xmm6 + subsd %xmm9, %xmm6 + movsd %xmm6, -24(%rsp) + movsd -32(%rsp), %xmm7 + movsd -24(%rsp), %xmm2 + subsd %xmm2, %xmm7 + movsd %xmm7, -32(%rsp) + movsd -32(%rsp), %xmm8 + subsd %xmm8, %xmm9 + movsd %xmm9, -24(%rsp) + movsd -32(%rsp), %xmm12 + movsd -24(%rsp), %xmm10 + addsd %xmm0, %xmm10 + addsd %xmm3, %xmm10 + movsd 392(%r11,%rcx,8), %xmm3 + movaps %xmm3, %xmm0 + addsd %xmm10, %xmm11 + mulsd %xmm12, %xmm3 + mulsd %xmm11, %xmm0 + movsd 384(%r11,%rcx,8), %xmm10 + addsd %xmm3, %xmm0 + mulsd %xmm10, %xmm11 + mulsd %xmm10, %xmm12 + addsd %xmm11, %xmm0 + movsd %xmm0, -32(%rsp) + movsd -32(%rsp), %xmm3 + addsd %xmm3, %xmm12 + mulsd -48(%rsp), %xmm12 + mulsd %xmm12, %xmm5 + movsd %xmm5, (%r8) + +L(9): + xorl %eax, %eax + ret + +L(10): + movsd 440+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm0, %xmm1 + movsd %xmm1, (%r8) + jmp L(9) + +L(11): + addsd %xmm1, %xmm1 + movsd %xmm1, (%r8) + jmp L(9) + + cfi_endproc + + .type __svml_dcbrt_cout_rare_internal,@function + .size __svml_dcbrt_cout_rare_internal,.-__svml_dcbrt_cout_rare_internal + + .section .rodata, "a" + .space 32, 0x00 + .align 64 + +__svml_dcbrt_data_internal: + .long 528611360 + .long 3220144632 + .long 2884679527 + .long 3220082993 + .long 1991868891 + .long 3220024928 + .long 2298714891 + .long 3219970134 + .long 58835168 + .long 3219918343 + .long 3035110223 + .long 3219869313 + .long 1617585086 + .long 3219822831 + .long 2500867033 + .long 3219778702 + .long 4241943008 + .long 3219736752 + .long 258732970 + .long 3219696825 + .long 404232216 + .long 3219658776 + .long 2172167368 + .long 3219622476 + .long 1544257904 + .long 3219587808 + .long 377579543 + .long 3219554664 + .long 1616385542 + .long 3219522945 + .long 813783277 + .long 3219492562 + .long 3940743189 + .long 3219463431 + .long 
2689777499 + .long 3219435478 + .long 1700977147 + .long 3219408632 + .long 3169102082 + .long 3219382828 + .long 327235604 + .long 3219358008 + .long 1244336319 + .long 3219334115 + .long 1300311200 + .long 3219311099 + .long 3095471925 + .long 3219288912 + .long 2166487928 + .long 3219267511 + .long 2913108253 + .long 3219246854 + .long 293672978 + .long 3219226904 + .long 288737297 + .long 3219207624 + .long 1810275472 + .long 3219188981 + .long 174592167 + .long 3219170945 + .long 3539053052 + .long 3219153485 + .long 2164392968 + .long 3219136576 + .long 572345495 + .long 1072698681 + .long 1998204467 + .long 1072709382 + .long 3861501553 + .long 1072719872 + .long 2268192434 + .long 1072730162 + .long 2981979308 + .long 1072740260 + .long 270859143 + .long 1072750176 + .long 2958651392 + .long 1072759916 + .long 313113243 + .long 1072769490 + .long 919449400 + .long 1072778903 + .long 2809328903 + .long 1072788162 + .long 2222981587 + .long 1072797274 + .long 2352530781 + .long 1072806244 + .long 594152517 + .long 1072815078 + .long 1555767199 + .long 1072823780 + .long 4282421314 + .long 1072832355 + .long 2355578597 + .long 1072840809 + .long 1162590619 + .long 1072849145 + .long 797864051 + .long 1072857367 + .long 431273680 + .long 1072865479 + .long 2669831148 + .long 1072873484 + .long 733477752 + .long 1072881387 + .long 4280220604 + .long 1072889189 + .long 801961634 + .long 1072896896 + .long 2915370760 + .long 1072904508 + .long 1159613482 + .long 1072912030 + .long 2689944798 + .long 1072919463 + .long 1248687822 + .long 1072926811 + .long 2967951030 + .long 1072934075 + .long 630170432 + .long 1072941259 + .long 3760898254 + .long 1072948363 + .long 0 + .long 1072955392 + .long 2370273294 + .long 1072962345 + .long 1261754802 + .long 1072972640 + .long 546334065 + .long 1072986123 + .long 1054893830 + .long 1072999340 + .long 1571187597 + .long 1073012304 + .long 1107975175 + .long 1073025027 + .long 3606909377 + .long 1073037519 + .long 1113616747 + .long 1073049792 + .long 4154744632 + .long 1073061853 + .long 3358931423 + .long 1073073713 + .long 4060702372 + .long 1073085379 + .long 747576176 + .long 1073096860 + .long 3023138255 + .long 1073108161 + .long 1419988548 + .long 1073119291 + .long 1914185305 + .long 1073130255 + .long 294389948 + .long 1073141060 + .long 3761802570 + .long 1073151710 + .long 978281566 + .long 1073162213 + .long 823148820 + .long 1073172572 + .long 2420954441 + .long 1073182792 + .long 3815449908 + .long 1073192878 + .long 2046058587 + .long 1073202835 + .long 1807524753 + .long 1073212666 + .long 2628681401 + .long 1073222375 + .long 3225667357 + .long 1073231966 + .long 1555307421 + .long 1073241443 + .long 3454043099 + .long 1073250808 + .long 1208137896 + .long 1073260066 + .long 3659916772 + .long 1073269218 + .long 1886261264 + .long 1073278269 + .long 3593647839 + .long 1073287220 + .long 3086012205 + .long 1073296075 + .long 2769796922 + .long 1073304836 + .long 888716057 + .long 1073317807 + .long 2201465623 + .long 1073334794 + .long 164369365 + .long 1073351447 + .long 3462666733 + .long 1073367780 + .long 2773905457 + .long 1073383810 + .long 1342879088 + .long 1073399550 + .long 2543933975 + .long 1073415012 + .long 1684477781 + .long 1073430209 + .long 3532178543 + .long 1073445151 + .long 1147747300 + .long 1073459850 + .long 1928031793 + .long 1073474314 + .long 2079717015 + .long 1073488553 + .long 4016765315 + .long 1073502575 + .long 3670431139 + .long 1073516389 + .long 3549227225 + .long 1073530002 + .long 11637607 + 
.long 1073543422 + .long 588220169 + .long 1073556654 + .long 2635407503 + .long 1073569705 + .long 2042029317 + .long 1073582582 + .long 1925128962 + .long 1073595290 + .long 4136375664 + .long 1073607834 + .long 759964600 + .long 1073620221 + .long 4257606771 + .long 1073632453 + .long 297278907 + .long 1073644538 + .long 3655053093 + .long 1073656477 + .long 2442253172 + .long 1073668277 + .long 1111876799 + .long 1073679941 + .long 3330973139 + .long 1073691472 + .long 3438879452 + .long 1073702875 + .long 3671565478 + .long 1073714153 + .long 1317849547 + .long 1073725310 + .long 1642364115 + .long 1073736348 + .long 1553778919 + .long 3213899486 + .long 1553778919 + .long 3213899486 + .long 1553778919 + .long 3213899486 + .long 1553778919 + .long 3213899486 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3582521621 + .long 1066628362 + .long 3582521621 + .long 1066628362 + .long 3582521621 + .long 1066628362 + .long 3582521621 + .long 1066628362 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1646371399 + .long 3214412045 + .long 1646371399 + .long 3214412045 + .long 1646371399 + .long 3214412045 + .long 1646371399 + .long 3214412045 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 889629714 + .long 1067378449 + .long 889629714 + .long 1067378449 + .long 889629714 + .long 1067378449 + .long 889629714 + .long 1067378449 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3534952507 + .long 3215266280 + .long 3534952507 + .long 3215266280 + .long 3534952507 + .long 3215266280 + .long 3534952507 + .long 3215266280 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1007461464 + .long 1068473053 + .long 1007461464 + .long 1068473053 + .long 1007461464 + .long 1068473053 + .long 1007461464 + .long 1068473053 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 477218588 + .long 3216798151 + .long 477218588 + .long 3216798151 + .long 477218588 + .long 
3216798151 + .long 477218588 + .long 3216798151 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1431655765 + .long 1070945621 + .long 1431655765 + .long 1070945621 + .long 1431655765 + .long 1070945621 + .long 1431655765 + .long 1070945621 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 3220193280 + .long 0 + .long 3220193280 + .long 0 + .long 3220193280 + .long 0 + .long 3220193280 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 1032192 + .long 0 + .long 1032192 + .long 0 + .long 1032192 + .long 0 + .long 1032192 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 3220176896 + .long 0 + .long 3220176896 + .long 0 + .long 3220176896 + .long 0 + .long 3220176896 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4294967295 + .long 1048575 + .long 4294967295 + .long 1048575 + .long 4294967295 + .long 1048575 + .long 4294967295 + .long 1048575 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 4293918720 + .long 0 + .long 4293918720 + .long 0 + .long 4293918720 + .long 0 + .long 4293918720 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2047 + .long 0 + .long 2047 + .long 0 + .long 2047 + .long 0 + .long 2047 + .long 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 5462 + .long 0 + .long 5462 + .long 0 + .long 5462 + .long 0 + .long 5462 + .long 0 + .byte 0 + .byte 
0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1015808 + .long 1015808 + .long 1015808 + .long 1015808 + .long 1015808 + .long 1015808 + .long 1015808 + .long 1015808 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2048 + .long 2048 + .long 2048 + .long 2048 + .long 2048 + .long 2048 + .long 2048 + .long 2048 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 682 + .long 682 + .long 682 + .long 682 + .long 682 + .long 682 + .long 682 + .long 682 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2148532224 + .long 2148532224 + .long 2148532224 + .long 2148532224 + .long 2148532224 + .long 2148532224 + .long 2148532224 + .long 2148532224 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4292870143 + .long 4292870143 + .long 4292870143 + .long 4292870143 + .long 4292870143 + .long 4292870143 + .long 4292870143 + .long 4292870143 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_dcbrt_data_internal,@object + .size __svml_dcbrt_data_internal,2368 + .align 32 + +__dcbrt_la__vmldCbrtTab: + .long 0 + .long 1072693248 + .long 0 + .long 1072668672 + .long 0 + .long 1072644096 + .long 0 + .long 1072627712 + .long 0 + .long 1072611328 + .long 0 + .long 1072586752 + .long 0 + .long 1072570368 + .long 0 + .long 1072553984 + .long 0 + .long 1072537600 + .long 0 + .long 1072521216 + .long 0 + .long 1072504832 + .long 0 + .long 1072488448 + .long 0 + .long 1072480256 + .long 0 + .long 1072463872 + .long 0 + .long 
1072447488 + .long 0 + .long 1072439296 + .long 0 + .long 1072422912 + .long 0 + .long 1072414720 + .long 0 + .long 1072398336 + .long 0 + .long 1072390144 + .long 0 + .long 1072373760 + .long 0 + .long 1072365568 + .long 0 + .long 1072357376 + .long 0 + .long 1072340992 + .long 0 + .long 1072332800 + .long 0 + .long 1072324608 + .long 0 + .long 1072308224 + .long 0 + .long 1072300032 + .long 0 + .long 1072291840 + .long 0 + .long 1072283648 + .long 0 + .long 1072275456 + .long 0 + .long 1072267264 + .long 1431655765 + .long 1071994197 + .long 1431655765 + .long 1015371093 + .long 1908874354 + .long 1071761180 + .long 1007461464 + .long 1071618781 + .long 565592401 + .long 1071446176 + .long 241555088 + .long 1071319599 + .long 943963244 + .long 1071221150 + .long 2330668378 + .long 1071141453 + .long 2770428108 + .long 1071075039 + .long 3622256836 + .long 1071018464 + .long 1497196870 + .long 1070969433 + .long 280472551 + .long 1070926345 + .long 1585032765 + .long 1070888044 + .long 0 + .long 1387266048 + .long 33554432 + .long 1101004800 + .long 512 + .long 1117782016 + .long 0 + .long 1072693248 + .long 0 + .long 0 + .long 4160749568 + .long 1072965794 + .long 2921479643 + .long 1043912488 + .long 2684354560 + .long 1073309182 + .long 4060791142 + .long 1045755320 + .long 0 + .long 0 + .long 0 + .long 1072693248 + .long 0 + .long 3220176896 + .type __dcbrt_la__vmldCbrtTab,@object + .size __dcbrt_la__vmldCbrtTab,456 + .space 8, 0x00 + .align 16 + +.FLT_87: + .long 0x00000000,0x80000000,0x00000000,0x00000000 + .type .FLT_87,@object + .size .FLT_87,16 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core-avx2.S new file mode 100644 index 0000000000..3831e582ce --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core-avx2.S @@ -0,0 +1,20 @@ +/* AVX2 version of vectorized cbrt, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define _ZGVeN8v_cbrt _ZGVeN8v_cbrt_avx2_wrapper +#include "../svml_d_cbrt8_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core.c new file mode 100644 index 0000000000..28c147216f --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core.c @@ -0,0 +1,27 @@ +/* Multiple versions of vectorized cbrt, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define SYMBOL_NAME _ZGVeN8v_cbrt +#include "ifunc-mathvec-avx512-skx.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVeN8v_cbrt, __GI__ZGVeN8v_cbrt, __redirect__ZGVeN8v_cbrt) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core_avx512.S new file mode 100644 index 0000000000..7e9b75174b --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core_avx512.S @@ -0,0 +1,895 @@ +/* Function cbrt vectorized with AVX-512. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * x=2^{3*k+j} * 1.b1 b2 ... b5 b6 ... b52 + * Let r=(x*2^{-3k-j} - 1.b1 b2 ... b5 1)* rcp[b1 b2 ..b5], + * where rcp[b1 b2 .. b5]=1/(1.b1 b2 b3 b4 b5 1) in double precision + * cbrt(2^j * 1. b1 b2 .. 
b5 1) is approximated as T[j][b1..b5]+D[j][b1..b5] + * (T stores the high 53 bits, D stores the low order bits) + * Result=2^k*T+(2^k*T*r)*P+2^k*D + * where P=p1+p2*r+..+p8*r^7 + * + */ + +#include + + .text + .section .text.evex512,"ax",@progbits +ENTRY(_ZGVeN8v_cbrt_skx) + vgetmantpd $0, {sae}, %zmm0, %zmm14 + +/* GetExp(x) */ + vgetexppd {sae}, %zmm0, %zmm7 + vmovups 384+__svml_dcbrt_data_internal_avx512(%rip), %zmm8 + +/* exponent/3 */ + vmovups 512+__svml_dcbrt_data_internal_avx512(%rip), %zmm9 + vmovups 576+__svml_dcbrt_data_internal_avx512(%rip), %zmm10 + +/* Reduced argument: R = DblRcp*Mantissa - 1 */ + vmovups 704+__svml_dcbrt_data_internal_avx512(%rip), %zmm2 + +/* exponent%3 (to be used as index) */ + vmovups 640+__svml_dcbrt_data_internal_avx512(%rip), %zmm11 + +/* DblRcp ~ 1/Mantissa */ + vrcp14pd %zmm14, %zmm13 + vaddpd {rn-sae}, %zmm8, %zmm7, %zmm12 + vandpd 448+__svml_dcbrt_data_internal_avx512(%rip), %zmm0, %zmm6 + +/* round DblRcp to 3 fractional bits (RN mode, no Precision exception) */ + vrndscalepd $72, {sae}, %zmm13, %zmm15 + vfmsub231pd {rn-sae}, %zmm12, %zmm9, %zmm10 + +/* polynomial */ + vmovups 768+__svml_dcbrt_data_internal_avx512(%rip), %zmm0 + vmovups 896+__svml_dcbrt_data_internal_avx512(%rip), %zmm7 + vmovups 960+__svml_dcbrt_data_internal_avx512(%rip), %zmm9 + vfmsub231pd {rn-sae}, %zmm15, %zmm14, %zmm2 + vrndscalepd $9, {sae}, %zmm10, %zmm5 + +/* Table lookup */ + vmovups 128+__svml_dcbrt_data_internal_avx512(%rip), %zmm10 + vmovups 1024+__svml_dcbrt_data_internal_avx512(%rip), %zmm8 + vmovups 1216+__svml_dcbrt_data_internal_avx512(%rip), %zmm13 + vfmadd231pd {rn-sae}, %zmm2, %zmm7, %zmm9 + vfnmadd231pd {rn-sae}, %zmm5, %zmm11, %zmm12 + vmovups 1088+__svml_dcbrt_data_internal_avx512(%rip), %zmm11 + vmovups 1344+__svml_dcbrt_data_internal_avx512(%rip), %zmm14 + +/* Prepare table index */ + vpsrlq $49, %zmm15, %zmm1 + +/* Table lookup: 2^(exponent%3) */ + vpermpd __svml_dcbrt_data_internal_avx512(%rip), %zmm12, %zmm4 + vpermpd 64+__svml_dcbrt_data_internal_avx512(%rip), %zmm12, %zmm3 + vpermt2pd 192+__svml_dcbrt_data_internal_avx512(%rip), %zmm1, %zmm10 + vmovups 832+__svml_dcbrt_data_internal_avx512(%rip), %zmm1 + vfmadd231pd {rn-sae}, %zmm2, %zmm8, %zmm11 + vmovups 1280+__svml_dcbrt_data_internal_avx512(%rip), %zmm12 + vscalefpd {rn-sae}, %zmm5, %zmm10, %zmm15 + vfmadd231pd {rn-sae}, %zmm2, %zmm0, %zmm1 + vmovups 1152+__svml_dcbrt_data_internal_avx512(%rip), %zmm5 + vfmadd231pd {rn-sae}, %zmm2, %zmm12, %zmm14 + vmulpd {rn-sae}, %zmm2, %zmm2, %zmm0 + vfmadd231pd {rn-sae}, %zmm2, %zmm5, %zmm13 + +/* Sh*R */ + vmulpd {rn-sae}, %zmm2, %zmm4, %zmm2 + vfmadd213pd {rn-sae}, %zmm9, %zmm0, %zmm1 + vfmadd213pd {rn-sae}, %zmm11, %zmm0, %zmm1 + vfmadd213pd {rn-sae}, %zmm13, %zmm0, %zmm1 + vfmadd213pd {rn-sae}, %zmm14, %zmm0, %zmm1 + +/* Sl + (Sh*R)*Poly */ + vfmadd213pd {rn-sae}, %zmm3, %zmm1, %zmm2 + +/* + * branch-free + * scaled_Th*(Sh+Sl+Sh*R*Poly) + */ + vaddpd {rn-sae}, %zmm4, %zmm2, %zmm3 + vmulpd {rn-sae}, %zmm15, %zmm3, %zmm4 + vorpd %zmm6, %zmm4, %zmm0 + ret + +END(_ZGVeN8v_cbrt_skx) + + .align 16,0x90 + +__svml_dcbrt_cout_rare_internal: + + cfi_startproc + + movq %rsi, %r8 + movzwl 6(%rdi), %r9d + andl $32752, %r9d + shrl $4, %r9d + movb 7(%rdi), %sil + movsd (%rdi), %xmm1 + cmpl $2047, %r9d + je L(6) + ucomisd 432+__dcbrt_la__vmldCbrtTab(%rip), %xmm1 + jp L(1) + je L(5) + +L(1): + movb %sil, %al + lea 440+__dcbrt_la__vmldCbrtTab(%rip), %rdx + andb $-128, %al + andb $127, %sil + shrb $7, %al + xorl %edi, %edi + movsd %xmm1, -56(%rsp) + movzbl %al, %ecx + 
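Unlike the AVX2 path, the main path above takes no callout branch (its own comment calls the reconstruction branch-free): vgetmantpd/vgetexppd normalize the input and vrcp14pd plus vrndscalepd produce a short reciprocal. A rough scalar model of the commented steps (exponent split, R = DblRcp*Mantissa - 1, table lookups, reconstruction) is sketched below under clearly labelled assumptions: cbrt_avx512_model is a hypothetical name, the two table lookups (2^(exponent%3) and the DblRcp-indexed Th/Tl) are folded into a single cbrt call, and the real degree-8 polynomial is replaced by the first two series terms rather than the coefficients in __svml_dcbrt_data_internal_avx512.

#include <math.h>

/* Hypothetical scalar model of the branch-free flow; finite nonzero
   inputs assumed.  */
static double
cbrt_avx512_model (double x)
{
  double e = logb (fabs (x));		/* GetExp(x) */
  double m = ldexp (fabs (x), (int) -e); /* GetMant(x), in [1, 2) */

  double k = floor (e / 3.0);		/* exponent/3 */
  double j = e - 3.0 * k;		/* exponent%3, in {0, 1, 2} */

  /* DblRcp ~ 1/Mantissa, rounded to 3 fractional bits.  */
  double rcp = nearbyint ((1.0 / m) * 8.0) / 8.0;
  double r = rcp * m - 1.0;		/* R = DblRcp*Mantissa - 1 */

  /* Sh/Sl and the Th lookup, folded into one call here.  */
  double s = cbrt (ldexp (1.0 / rcp, (int) j));

  /* Two series terms of ((1+R)^(1/3) - 1)/R in place of the tuned
     polynomial.  */
  double poly = 1.0 / 3.0 - r / 9.0;

  /* scaled_Th*(Sh+Sl+Sh*R*Poly), i.e. 2^k * s * (1 + R*Poly).  */
  return copysign (ldexp (s + s * r * poly, (int) k), x);
}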
movb %sil, -49(%rsp) + movsd (%rdx,%rcx,8), %xmm5 + testl %r9d, %r9d + jne L(2) + movsd -56(%rsp), %xmm0 + movl $100, %edi + mulsd 360+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + movsd %xmm0, -56(%rsp) + jmp L(3) + +L(2): + movsd -56(%rsp), %xmm0 + +L(3): + movzwl -50(%rsp), %esi + movl $1431655766, %eax + andl $32752, %esi + lea __dcbrt_la__vmldCbrtTab(%rip), %r11 + shrl $4, %esi + movsd %xmm0, -40(%rsp) + movsd 368+__dcbrt_la__vmldCbrtTab(%rip), %xmm14 + imull %esi + movl $1431655766, %eax + lea (%rdx,%rdx,2), %ecx + negl %ecx + addl %esi, %ecx + subl %ecx, %esi + addl %ecx, %ecx + addl $-1023, %esi + imull %esi + sarl $31, %esi + subl %esi, %edx + addl $1023, %edx + subl %edi, %edx + movzwl -34(%rsp), %edi + andl $2047, %edx + andl $-32753, %edi + addl $16368, %edi + movw %di, -34(%rsp) + movsd -40(%rsp), %xmm11 + movaps %xmm11, %xmm6 + mulsd 376+__dcbrt_la__vmldCbrtTab(%rip), %xmm6 + movsd %xmm6, -32(%rsp) + movsd -32(%rsp), %xmm7 + movl -36(%rsp), %r10d + andl $1048575, %r10d + subsd -40(%rsp), %xmm7 + movsd %xmm7, -24(%rsp) + movsd -32(%rsp), %xmm9 + movsd -24(%rsp), %xmm8 + shrl $15, %r10d + subsd %xmm8, %xmm9 + movsd %xmm9, -32(%rsp) + movsd -32(%rsp), %xmm10 + movsd (%r11,%r10,8), %xmm4 + subsd %xmm10, %xmm11 + movaps %xmm4, %xmm12 + movaps %xmm4, %xmm13 + mulsd %xmm4, %xmm12 + movsd %xmm11, -24(%rsp) + movsd -32(%rsp), %xmm2 + mulsd %xmm12, %xmm2 + mulsd %xmm2, %xmm13 + movsd 440+__dcbrt_la__vmldCbrtTab(%rip), %xmm6 + movsd -24(%rsp), %xmm3 + subsd %xmm13, %xmm6 + mulsd %xmm12, %xmm3 + mulsd %xmm6, %xmm14 + mulsd %xmm3, %xmm4 + movsd %xmm14, -32(%rsp) + movsd -32(%rsp), %xmm15 + xorps .FLT_81(%rip), %xmm4 + subsd %xmm6, %xmm15 + movsd %xmm15, -24(%rsp) + movsd -32(%rsp), %xmm1 + movsd -24(%rsp), %xmm0 + movsd 256+__dcbrt_la__vmldCbrtTab(%rip), %xmm9 + subsd %xmm0, %xmm1 + movsd %xmm1, -32(%rsp) + movsd -32(%rsp), %xmm13 + movsd 352+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + subsd %xmm13, %xmm6 + movsd %xmm6, -24(%rsp) + movsd -32(%rsp), %xmm1 + movsd -24(%rsp), %xmm7 + movaps %xmm1, %xmm8 + movsd 256+__dcbrt_la__vmldCbrtTab(%rip), %xmm11 + addsd %xmm7, %xmm4 + movsd 256+__dcbrt_la__vmldCbrtTab(%rip), %xmm7 + addsd %xmm4, %xmm8 + mulsd %xmm8, %xmm0 + movslq %ecx, %rcx + addsd 344+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + movq 440+__dcbrt_la__vmldCbrtTab(%rip), %r9 + movq %r9, -48(%rsp) + shrq $48, %r9 + addsd 336+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + andl $-32753, %r9d + shll $4, %edx + addsd 328+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + orl %edx, %r9d + movw %r9w, -42(%rsp) + addsd 320+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd 312+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd 304+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd 296+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd 288+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd 280+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd 272+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm8, %xmm0 + addsd %xmm0, %xmm9 + movsd %xmm9, -32(%rsp) + movsd -32(%rsp), %xmm10 + movsd 368+__dcbrt_la__vmldCbrtTab(%rip), %xmm9 + subsd %xmm10, %xmm11 + movsd %xmm11, -24(%rsp) + movsd -32(%rsp), %xmm14 + movsd -24(%rsp), %xmm12 + addsd %xmm12, %xmm14 + movsd %xmm14, -16(%rsp) + movaps %xmm2, %xmm14 + movsd -24(%rsp), %xmm6 + addsd %xmm0, %xmm6 + movsd %xmm6, -24(%rsp) + movsd -16(%rsp), %xmm15 + subsd %xmm15, %xmm7 + movsd %xmm7, -16(%rsp) + movsd -24(%rsp), %xmm8 + movsd -16(%rsp), 
%xmm0 + addsd %xmm0, %xmm8 + movsd %xmm8, -16(%rsp) + movaps %xmm1, %xmm8 + movsd -32(%rsp), %xmm13 + mulsd %xmm13, %xmm9 + movsd -16(%rsp), %xmm0 + movsd %xmm9, -32(%rsp) + movsd -32(%rsp), %xmm10 + subsd %xmm13, %xmm10 + addsd 264+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + movsd %xmm10, -24(%rsp) + movsd -32(%rsp), %xmm11 + movsd -24(%rsp), %xmm6 + subsd %xmm6, %xmm11 + movsd %xmm11, -32(%rsp) + movsd -32(%rsp), %xmm12 + subsd %xmm12, %xmm13 + movsd %xmm13, -24(%rsp) + movsd -32(%rsp), %xmm7 + movsd -24(%rsp), %xmm6 + mulsd %xmm7, %xmm8 + addsd %xmm0, %xmm6 + mulsd %xmm4, %xmm7 + mulsd %xmm6, %xmm4 + mulsd %xmm6, %xmm1 + addsd %xmm4, %xmm7 + movsd 368+__dcbrt_la__vmldCbrtTab(%rip), %xmm4 + addsd %xmm1, %xmm7 + mulsd %xmm8, %xmm4 + movsd %xmm7, -32(%rsp) + movsd -32(%rsp), %xmm10 + movsd %xmm4, -32(%rsp) + movsd -32(%rsp), %xmm0 + subsd %xmm8, %xmm0 + movsd %xmm0, -24(%rsp) + movsd -32(%rsp), %xmm1 + movsd -24(%rsp), %xmm4 + subsd %xmm4, %xmm1 + movsd %xmm1, -32(%rsp) + movsd -32(%rsp), %xmm6 + subsd %xmm6, %xmm8 + movsd %xmm8, -24(%rsp) + movsd -32(%rsp), %xmm9 + movsd -24(%rsp), %xmm7 + movaps %xmm9, %xmm1 + mulsd %xmm3, %xmm9 + addsd %xmm7, %xmm10 + mulsd %xmm2, %xmm1 + movaps %xmm10, %xmm11 + movaps %xmm1, %xmm12 + mulsd %xmm3, %xmm10 + addsd %xmm2, %xmm12 + mulsd %xmm2, %xmm11 + addsd %xmm9, %xmm10 + addsd %xmm10, %xmm11 + movsd %xmm11, -32(%rsp) + movsd -32(%rsp), %xmm0 + movsd %xmm12, -32(%rsp) + movsd -32(%rsp), %xmm13 + subsd %xmm13, %xmm14 + movsd %xmm14, -24(%rsp) + movsd -32(%rsp), %xmm9 + movsd -24(%rsp), %xmm15 + addsd %xmm15, %xmm9 + movsd %xmm9, -16(%rsp) + movsd -24(%rsp), %xmm10 + addsd %xmm10, %xmm1 + movsd %xmm1, -24(%rsp) + movsd -16(%rsp), %xmm4 + subsd %xmm4, %xmm2 + movsd 368+__dcbrt_la__vmldCbrtTab(%rip), %xmm4 + movsd %xmm2, -16(%rsp) + movsd -24(%rsp), %xmm1 + movsd -16(%rsp), %xmm2 + addsd %xmm2, %xmm1 + movsd %xmm1, -16(%rsp) + movsd -32(%rsp), %xmm9 + mulsd %xmm9, %xmm4 + movsd -16(%rsp), %xmm11 + movsd %xmm4, -32(%rsp) + movsd -32(%rsp), %xmm6 + subsd %xmm9, %xmm6 + movsd %xmm6, -24(%rsp) + movsd -32(%rsp), %xmm7 + movsd -24(%rsp), %xmm2 + subsd %xmm2, %xmm7 + movsd %xmm7, -32(%rsp) + movsd -32(%rsp), %xmm8 + subsd %xmm8, %xmm9 + movsd %xmm9, -24(%rsp) + movsd -32(%rsp), %xmm12 + movsd -24(%rsp), %xmm10 + addsd %xmm0, %xmm10 + addsd %xmm3, %xmm10 + movsd 392(%r11,%rcx,8), %xmm3 + movaps %xmm3, %xmm0 + addsd %xmm10, %xmm11 + mulsd %xmm12, %xmm3 + mulsd %xmm11, %xmm0 + movsd 384(%r11,%rcx,8), %xmm10 + addsd %xmm3, %xmm0 + mulsd %xmm10, %xmm11 + mulsd %xmm10, %xmm12 + addsd %xmm11, %xmm0 + movsd %xmm0, -32(%rsp) + movsd -32(%rsp), %xmm3 + addsd %xmm3, %xmm12 + mulsd -48(%rsp), %xmm12 + mulsd %xmm12, %xmm5 + movsd %xmm5, (%r8) + +L(4): + xorl %eax, %eax + ret + +L(5): + movsd 440+__dcbrt_la__vmldCbrtTab(%rip), %xmm0 + mulsd %xmm0, %xmm1 + movsd %xmm1, (%r8) + jmp L(4) + +L(6): + addsd %xmm1, %xmm1 + movsd %xmm1, (%r8) + jmp L(4) + + cfi_endproc + + .type __svml_dcbrt_cout_rare_internal,@function + .size __svml_dcbrt_cout_rare_internal,.-__svml_dcbrt_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_dcbrt_data_internal_avx512: + .long 0 + .long 1072693248 + .long 4186796683 + .long 1072965794 + .long 2772266557 + .long 1073309182 + .long 0 + .long 0 + .long 0 + .long 3220176896 + .long 4186796683 + .long 3220449442 + .long 2772266557 + .long 3220792830 + .long 0 + .long 0 + .long 0 + .long 0 + .long 1418634270 + .long 3162364962 + .long 2576690953 + .long 3164558313 + .long 0 + .long 0 + .long 0 + .long 0 + .long 1418634270 + .long 1014881314 + .long 
2576690953 + .long 1017074665 + .long 0 + .long 0 + .long 4186796683 + .long 1072965794 + .long 1554061055 + .long 1072914931 + .long 3992368458 + .long 1072871093 + .long 3714535808 + .long 1072832742 + .long 954824104 + .long 1072798779 + .long 3256858690 + .long 1072768393 + .long 3858344660 + .long 1072740974 + .long 1027250248 + .long 1072716050 + .long 0 + .long 1072693248 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 1418634270 + .long 3162364962 + .long 629721892 + .long 1016287007 + .long 1776620500 + .long 3163956186 + .long 648592220 + .long 1016269578 + .long 1295766103 + .long 3161896715 + .long 1348094586 + .long 3164476360 + .long 2407028709 + .long 1015925873 + .long 497428409 + .long 1014435402 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 1127743488 + .long 0 + .long 1127743488 + .long 0 + .long 1127743488 + .long 0 + .long 1127743488 + .long 0 + .long 1127743488 + .long 0 + .long 1127743488 + .long 0 + .long 1127743488 + .long 0 + .long 1127743488 + .long 0 + .long 2147483648 + .long 0 + .long 2147483648 + .long 0 + .long 2147483648 + .long 0 + .long 2147483648 + .long 0 + .long 2147483648 + .long 0 + .long 2147483648 + .long 0 + .long 2147483648 + .long 0 + .long 2147483648 + .long 1431655766 + .long 1070945621 + .long 1431655766 + .long 1070945621 + .long 1431655766 + .long 1070945621 + .long 1431655766 + .long 1070945621 + .long 1431655766 + .long 1070945621 + .long 1431655766 + .long 1070945621 + .long 1431655766 + .long 1070945621 + .long 1431655766 + .long 1070945621 + .long 0 + .long 1126170624 + .long 0 + .long 1126170624 + .long 0 + .long 1126170624 + .long 0 + .long 1126170624 + .long 0 + .long 1126170624 + .long 0 + .long 1126170624 + .long 0 + .long 1126170624 + .long 0 + .long 1126170624 + .long 0 + .long 1074266112 + .long 0 + .long 1074266112 + .long 0 + .long 1074266112 + .long 0 + .long 1074266112 + .long 0 + .long 1074266112 + .long 0 + .long 1074266112 + .long 0 + .long 1074266112 + .long 0 + .long 1074266112 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 0 + .long 1072693248 + .long 1792985698 + .long 3213372987 + .long 1792985698 + .long 3213372987 + .long 1792985698 + .long 3213372987 + .long 1792985698 + .long 3213372987 + .long 1792985698 + .long 3213372987 + .long 1792985698 + .long 3213372987 + .long 1792985698 + .long 3213372987 + .long 1792985698 + .long 3213372987 + .long 3135539317 + .long 1066129956 + .long 3135539317 + .long 1066129956 + .long 3135539317 + .long 1066129956 + .long 3135539317 + .long 1066129956 + .long 3135539317 + .long 1066129956 + .long 3135539317 + .long 1066129956 + .long 3135539317 + .long 1066129956 + .long 3135539317 + .long 1066129956 + .long 2087834975 + .long 3213899448 + .long 2087834975 + .long 3213899448 + .long 2087834975 + .long 3213899448 + .long 2087834975 + .long 3213899448 + .long 2087834975 + .long 3213899448 + .long 2087834975 + .long 3213899448 + .long 2087834975 + .long 3213899448 + .long 2087834975 + .long 3213899448 + .long 2476259604 + .long 1066628333 + .long 2476259604 + .long 1066628333 + .long 2476259604 + .long 1066628333 + .long 2476259604 + .long 1066628333 + .long 2476259604 + .long 1066628333 + 
.long 2476259604 + .long 1066628333 + .long 2476259604 + .long 1066628333 + .long 2476259604 + .long 1066628333 + .long 2012366478 + .long 3214412045 + .long 2012366478 + .long 3214412045 + .long 2012366478 + .long 3214412045 + .long 2012366478 + .long 3214412045 + .long 2012366478 + .long 3214412045 + .long 2012366478 + .long 3214412045 + .long 2012366478 + .long 3214412045 + .long 2012366478 + .long 3214412045 + .long 1104999785 + .long 1067378449 + .long 1104999785 + .long 1067378449 + .long 1104999785 + .long 1067378449 + .long 1104999785 + .long 1067378449 + .long 1104999785 + .long 1067378449 + .long 1104999785 + .long 1067378449 + .long 1104999785 + .long 1067378449 + .long 1104999785 + .long 1067378449 + .long 3534763582 + .long 3215266280 + .long 3534763582 + .long 3215266280 + .long 3534763582 + .long 3215266280 + .long 3534763582 + .long 3215266280 + .long 3534763582 + .long 3215266280 + .long 3534763582 + .long 3215266280 + .long 3534763582 + .long 3215266280 + .long 3534763582 + .long 3215266280 + .long 1007386161 + .long 1068473053 + .long 1007386161 + .long 1068473053 + .long 1007386161 + .long 1068473053 + .long 1007386161 + .long 1068473053 + .long 1007386161 + .long 1068473053 + .long 1007386161 + .long 1068473053 + .long 1007386161 + .long 1068473053 + .long 1007386161 + .long 1068473053 + .long 477218625 + .long 3216798151 + .long 477218625 + .long 3216798151 + .long 477218625 + .long 3216798151 + .long 477218625 + .long 3216798151 + .long 477218625 + .long 3216798151 + .long 477218625 + .long 3216798151 + .long 477218625 + .long 3216798151 + .long 477218625 + .long 3216798151 + .long 1431655767 + .long 1070945621 + .long 1431655767 + .long 1070945621 + .long 1431655767 + .long 1070945621 + .long 1431655767 + .long 1070945621 + .long 1431655767 + .long 1070945621 + .long 1431655767 + .long 1070945621 + .long 1431655767 + .long 1070945621 + .long 1431655767 + .long 1070945621 + .type __svml_dcbrt_data_internal_avx512,@object + .size __svml_dcbrt_data_internal_avx512,1408 + .align 32 + +__dcbrt_la__vmldCbrtTab: + .long 0 + .long 1072693248 + .long 0 + .long 1072668672 + .long 0 + .long 1072644096 + .long 0 + .long 1072627712 + .long 0 + .long 1072611328 + .long 0 + .long 1072586752 + .long 0 + .long 1072570368 + .long 0 + .long 1072553984 + .long 0 + .long 1072537600 + .long 0 + .long 1072521216 + .long 0 + .long 1072504832 + .long 0 + .long 1072488448 + .long 0 + .long 1072480256 + .long 0 + .long 1072463872 + .long 0 + .long 1072447488 + .long 0 + .long 1072439296 + .long 0 + .long 1072422912 + .long 0 + .long 1072414720 + .long 0 + .long 1072398336 + .long 0 + .long 1072390144 + .long 0 + .long 1072373760 + .long 0 + .long 1072365568 + .long 0 + .long 1072357376 + .long 0 + .long 1072340992 + .long 0 + .long 1072332800 + .long 0 + .long 1072324608 + .long 0 + .long 1072308224 + .long 0 + .long 1072300032 + .long 0 + .long 1072291840 + .long 0 + .long 1072283648 + .long 0 + .long 1072275456 + .long 0 + .long 1072267264 + .long 1431655765 + .long 1071994197 + .long 1431655765 + .long 1015371093 + .long 1908874354 + .long 1071761180 + .long 1007461464 + .long 1071618781 + .long 565592401 + .long 1071446176 + .long 241555088 + .long 1071319599 + .long 943963244 + .long 1071221150 + .long 2330668378 + .long 1071141453 + .long 2770428108 + .long 1071075039 + .long 3622256836 + .long 1071018464 + .long 1497196870 + .long 1070969433 + .long 280472551 + .long 1070926345 + .long 1585032765 + .long 1070888044 + .long 0 + .long 1387266048 + .long 33554432 + .long 1101004800 + 
.long 512 + .long 1117782016 + .long 0 + .long 1072693248 + .long 0 + .long 0 + .long 4160749568 + .long 1072965794 + .long 2921479643 + .long 1043912488 + .long 2684354560 + .long 1073309182 + .long 4060791142 + .long 1045755320 + .long 0 + .long 0 + .long 0 + .long 1072693248 + .long 0 + .long 3220176896 + .type __dcbrt_la__vmldCbrtTab,@object + .size __dcbrt_la__vmldCbrtTab,456 + .space 8, 0x00 + .align 16 + +.FLT_81: + .long 0x00000000,0x80000000,0x00000000,0x00000000 + .type .FLT_81,@object + .size .FLT_81,16 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core-avx2.S new file mode 100644 index 0000000000..faa847fba6 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core-avx2.S @@ -0,0 +1,20 @@ +/* AVX2 version of vectorized cbrtf. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define _ZGVeN16v_cbrtf _ZGVeN16v_cbrtf_avx2_wrapper +#include "../svml_s_cbrtf16_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core.c new file mode 100644 index 0000000000..785a68cc0d --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized cbrtf, vector length is 16. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define SYMBOL_NAME _ZGVeN16v_cbrtf +#include "ifunc-mathvec-avx512-skx.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVeN16v_cbrtf, __GI__ZGVeN16v_cbrtf, + __redirect__ZGVeN16v_cbrtf) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core_avx512.S new file mode 100644 index 0000000000..fabe59ebe1 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core_avx512.S @@ -0,0 +1,1003 @@ +/* Function cbrtf vectorized with AVX-512. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * x=2^{3*k+j} * 1.b1 b2 ... b5 b6 ... b52 + * Let r=(x*2^{-3k-j} - 1.b1 b2 ... b5 1)* rcp[b1 b2 ..b5], + * where rcp[b1 b2 .. b5]=1/(1.b1 b2 b3 b4 b5 1) in single precision + * cbrtf(2^j * 1. b1 b2 .. b5 1) is approximated as T[j][b1..b5]+D[j][b1..b5] + * (T stores the high 24 bits, D stores the low order bits) + * Result=2^k*T+(2^k*T*r)*P+2^k*D + * where P=p1+p2*r+.. + * + */ + +#include + + .text + .section .text.exex512,"ax",@progbits +ENTRY(_ZGVeN16v_cbrtf_skx) + vgetmantps $0, {sae}, %zmm0, %zmm8 + +/* GetExp(x) */ + vgetexpps {sae}, %zmm0, %zmm1 + vmovups 384+__svml_scbrt_data_internal_avx512(%rip), %zmm2 + +/* exponent/3 */ + vmovups 512+__svml_scbrt_data_internal_avx512(%rip), %zmm3 + vmovups 576+__svml_scbrt_data_internal_avx512(%rip), %zmm4 + vmovups 704+__svml_scbrt_data_internal_avx512(%rip), %zmm15 + +/* exponent%3 (to be used as index) */ + vmovups 640+__svml_scbrt_data_internal_avx512(%rip), %zmm5 + +/* polynomial */ + vmovups 768+__svml_scbrt_data_internal_avx512(%rip), %zmm11 + vmovups 896+__svml_scbrt_data_internal_avx512(%rip), %zmm14 + +/* Table lookup */ + vmovups 128+__svml_scbrt_data_internal_avx512(%rip), %zmm12 + +/* DblRcp ~ 1/Mantissa */ + vrcp14ps %zmm8, %zmm7 + vaddps {rn-sae}, %zmm2, %zmm1, %zmm6 + vandps 448+__svml_scbrt_data_internal_avx512(%rip), %zmm0, %zmm0 + +/* round DblRcp to 3 fractional bits (RN mode, no Precision exception) */ + vrndscaleps $88, {sae}, %zmm7, %zmm9 + vfmsub231ps {rn-sae}, %zmm6, %zmm3, %zmm4 + vmovups 832+__svml_scbrt_data_internal_avx512(%rip), %zmm7 + +/* Reduced argument: R = DblRcp*Mantissa - 1 */ + vfmsub231ps {rn-sae}, %zmm9, %zmm8, %zmm15 + vrndscaleps $9, {sae}, %zmm4, %zmm13 + +/* Prepare table index */ + vpsrld $19, %zmm9, %zmm10 + vfmadd231ps {rn-sae}, %zmm15, %zmm11, %zmm7 + vfnmadd231ps {rn-sae}, %zmm13, %zmm5, %zmm6 + vpermt2ps 192+__svml_scbrt_data_internal_avx512(%rip), %zmm10, %zmm12 + vfmadd213ps {rn-sae}, %zmm14, %zmm15, %zmm7 + vscalefps {rn-sae}, %zmm13, %zmm12, %zmm2 + +/* Table lookup: 2^(exponent%3) */ + vpermps __svml_scbrt_data_internal_avx512(%rip), %zmm6, %zmm1 + vpermps 64+__svml_scbrt_data_internal_avx512(%rip), %zmm6, %zmm6 + +/* Sh*R */ + vmulps {rn-sae}, %zmm15, %zmm1, %zmm14 + +/* Sl + (Sh*R)*Poly */ + vfmadd213ps {rn-sae}, %zmm6, %zmm7, %zmm14 + +/* + * branch-free + * scaled_Th*(Sh+Sl+Sh*R*Poly) + */ + vaddps {rn-sae}, %zmm1, %zmm14, %zmm15 + vmulps {rn-sae}, %zmm2, %zmm15, %zmm3 + vorps %zmm0, %zmm3, %zmm0 + ret + +END(_ZGVeN16v_cbrtf_skx) + + .align 16,0x90 + +__svml_scbrt_cout_rare_internal: + + cfi_startproc + + movq %rsi, %r9 + movl $1065353216, -24(%rsp) + movss (%rdi), %xmm0 + movss -24(%rsp), %xmm1 + mulss %xmm0, %xmm1 + movss %xmm1, -4(%rsp) + movzwl -2(%rsp), %eax + andl $32640, %eax + shrl $7, %eax + cmpl $255, %eax + je L(6) + pxor %xmm0, %xmm0 + ucomiss %xmm0, %xmm1 + jp L(1) + je 
L(5) + +L(1): + testl %eax, %eax + jne L(2) + movl $2122317824, -24(%rsp) + movl $713031680, -20(%rsp) + jmp L(3) + +L(2): + movl $1065353216, %eax + movl %eax, -24(%rsp) + movl %eax, -20(%rsp) + +L(3): + movss -24(%rsp), %xmm0 + lea __scbrt_la_vscbrt_ha_cout_data(%rip), %rsi + mulss %xmm0, %xmm1 + movd %xmm1, %ecx + movss %xmm1, -4(%rsp) + movl %ecx, %r10d + movl %ecx, %edi + andl $8388607, %r10d + movl %ecx, %r11d + shrl $23, %edi + andl $8257536, %r11d + orl $-1082130432, %r10d + orl $-1081999360, %r11d + movl %r10d, -16(%rsp) + movl %ecx, %edx + movzbl %dil, %r8d + andl $2147483647, %ecx + movl %r11d, -12(%rsp) + andl $-256, %edi + movss -16(%rsp), %xmm1 + addl $2139095040, %ecx + shrl $16, %edx + subss -12(%rsp), %xmm1 + andl $124, %edx + lea (%r8,%r8,4), %r10d + mulss (%rsi,%rdx), %xmm1 + lea (%r10,%r10), %r11d + movss .FLT_35(%rip), %xmm4 + lea (%r11,%r11), %eax + addl %eax, %eax + lea (%r10,%r11,8), %r10d + addl %eax, %eax + decl %r8d + mulss %xmm1, %xmm4 + shll $7, %r8d + lea (%r10,%rax,8), %r11d + lea (%r11,%rax,8), %r10d + shrl $12, %r10d + addss .FLT_34(%rip), %xmm4 + mulss %xmm1, %xmm4 + lea 85(%r10), %eax + orl %edi, %eax + xorl %edi, %edi + cmpl $-16777217, %ecx + addss .FLT_33(%rip), %xmm4 + setg %dil + shll $7, %r10d + negl %edi + subl %r10d, %r8d + addl %r10d, %r10d + subl %r10d, %r8d + notl %edi + addl %r8d, %edx + andl %edx, %edi + shll $23, %eax + addl %edi, %edi + movl %eax, -8(%rsp) + movss 128(%rdi,%rsi), %xmm5 + movss -8(%rsp), %xmm2 + mulss %xmm1, %xmm4 + mulss %xmm2, %xmm5 + addss .FLT_32(%rip), %xmm4 + mulss %xmm5, %xmm1 + movss 132(%rsi,%rdi), %xmm3 + mulss %xmm1, %xmm4 + mulss %xmm2, %xmm3 + addss %xmm3, %xmm4 + addss %xmm4, %xmm5 + mulss -20(%rsp), %xmm5 + movss %xmm5, (%r9) + +L(4): + xorl %eax, %eax + ret + +L(5): + movss %xmm1, (%r9) + jmp L(4) + +L(6): + addss %xmm0, %xmm0 + movss %xmm0, (%r9) + jmp L(4) + + cfi_endproc + + .type __svml_scbrt_cout_rare_internal,@function + .size __svml_scbrt_cout_rare_internal,.-__svml_scbrt_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_scbrt_data_internal_avx512: + .long 1065353216 + .long 1067533592 + .long 1070280693 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 2999865775 + .long 849849800 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 1067533592 + .long 1067322155 + .long 1067126683 + .long 1066945178 + .long 1066775983 + .long 1066617708 + .long 1066469175 + .long 1066329382 + .long 1066197466 + .long 1066072682 + .long 1065954382 + .long 1065841998 + .long 1065735031 + .long 1065633040 + .long 1065535634 + .long 1065442463 + .long 1065353216 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 2999865775 + .long 849353281 + .long 2992093760 + .long 858369405 + .long 861891413 + .long 3001900484 + .long 2988845984 + .long 3009185201 + .long 3001209163 + .long 847824101 + .long 839380496 + .long 845124191 + .long 851391835 + .long 856440803 + .long 2989578734 + .long 852890174 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 0 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + 
.long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 1262485504 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 2147483648 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1249902592 + .long 1249902592 + .long 1249902592 + .long 1249902592 + .long 1249902592 + .long 1249902592 + .long 1249902592 + .long 1249902592 + .long 1249902592 + .long 1249902592 + .long 1249902592 + .long 1249902592 + .long 1249902592 + .long 1249902592 + .long 1249902592 + .long 1249902592 + .long 1077936128 + .long 1077936128 + .long 1077936128 + .long 1077936128 + .long 1077936128 + .long 1077936128 + .long 1077936128 + .long 1077936128 + .long 1077936128 + .long 1077936128 + .long 1077936128 + .long 1077936128 + .long 1077936128 + .long 1077936128 + .long 1077936128 + .long 1077936128 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1065353216 + .long 1031603580 + .long 1031603580 + .long 1031603580 + .long 1031603580 + .long 1031603580 + .long 1031603580 + .long 1031603580 + .long 1031603580 + .long 1031603580 + .long 1031603580 + .long 1031603580 + .long 1031603580 + .long 1031603580 + .long 1031603580 + .long 1031603580 + .long 1031603580 + .long 3185812323 + .long 3185812323 + .long 3185812323 + .long 3185812323 + .long 3185812323 + .long 3185812323 + .long 3185812323 + .long 3185812323 + .long 3185812323 + .long 3185812323 + .long 3185812323 + .long 3185812323 + .long 3185812323 + .long 3185812323 + .long 3185812323 + .long 3185812323 + .long 1051372202 + .long 1051372202 + .long 1051372202 + .long 1051372202 + .long 1051372202 + .long 1051372202 + .long 1051372202 + .long 1051372202 + .long 1051372202 + .long 1051372202 + .long 1051372202 + .long 1051372202 + .long 1051372202 + .long 1051372202 + .long 1051372202 + .long 1051372202 + .type __svml_scbrt_data_internal_avx512,@object + .size __svml_scbrt_data_internal_avx512,960 + .align 64 + +__scbrt_la_vscbrt_ha_cout_data: + .long 3212578753 + .long 3212085645 + .long 3211621124 + .long 3211182772 + .long 3210768440 + .long 3210376206 + .long 3210004347 + .long 3209651317 + .long 3209315720 + .long 3208996296 + .long 3208691905 + .long 3208401508 + .long 3208124163 + .long 3207859009 + .long 3207605259 + .long 3207362194 + .long 3207129151 + .long 3206905525 + .long 3206690755 + .long 3206484326 + .long 3206285761 + .long 3206094618 + .long 3205910490 + .long 3205732998 + .long 3205561788 + .long 3205396533 + .long 3205236929 + .long 3205082689 + .long 3204933547 + .long 3204789256 + .long 3204649583 + .long 3204514308 + .long 1065396681 + .long 839340838 + .long 1065482291 + .long 867750258 + .long 1065566215 + .long 851786446 + .long 1065648532 + .long 853949398 + .long 1065729317 + .long 864938789 + .long 1065808640 + .long 864102364 + .long 1065886565 
+ .long 864209792 + .long 1065963152 + .long 865422805 + .long 1066038457 + .long 867593594 + .long 1066112533 + .long 854482593 + .long 1066185428 + .long 848298042 + .long 1066257188 + .long 860064854 + .long 1066327857 + .long 844792593 + .long 1066397474 + .long 870701309 + .long 1066466079 + .long 872023170 + .long 1066533708 + .long 860255342 + .long 1066600394 + .long 849966899 + .long 1066666169 + .long 863561479 + .long 1066731064 + .long 869115319 + .long 1066795108 + .long 871961375 + .long 1066858329 + .long 859537336 + .long 1066920751 + .long 871954398 + .long 1066982401 + .long 863817578 + .long 1067043301 + .long 861687921 + .long 1067103474 + .long 849594757 + .long 1067162941 + .long 816486846 + .long 1067221722 + .long 858183533 + .long 1067279837 + .long 864500406 + .long 1067337305 + .long 850523240 + .long 1067394143 + .long 808125243 + .long 1067450368 + .long 0 + .long 1067505996 + .long 861173761 + .long 1067588354 + .long 859000219 + .long 1067696217 + .long 823158129 + .long 1067801953 + .long 871826232 + .long 1067905666 + .long 871183196 + .long 1068007450 + .long 839030530 + .long 1068107390 + .long 867690638 + .long 1068205570 + .long 840440923 + .long 1068302063 + .long 868033274 + .long 1068396942 + .long 855856030 + .long 1068490271 + .long 865094453 + .long 1068582113 + .long 860418487 + .long 1068672525 + .long 866225006 + .long 1068761562 + .long 866458226 + .long 1068849275 + .long 865124659 + .long 1068935712 + .long 864837702 + .long 1069020919 + .long 811742505 + .long 1069104937 + .long 869432099 + .long 1069187809 + .long 864584201 + .long 1069269572 + .long 864183978 + .long 1069350263 + .long 844810573 + .long 1069429915 + .long 869245699 + .long 1069508563 + .long 859556409 + .long 1069586236 + .long 870675446 + .long 1069662966 + .long 814190139 + .long 1069738778 + .long 870686941 + .long 1069813702 + .long 861800510 + .long 1069887762 + .long 855649163 + .long 1069960982 + .long 869347119 + .long 1070033387 + .long 864252033 + .long 1070104998 + .long 867276215 + .long 1070175837 + .long 868189817 + .long 1070245925 + .long 849541095 + .long 1070349689 + .long 866633177 + .long 1070485588 + .long 843967686 + .long 1070618808 + .long 857522493 + .long 1070749478 + .long 862339487 + .long 1070877717 + .long 850054662 + .long 1071003634 + .long 864048556 + .long 1071127332 + .long 868027089 + .long 1071248907 + .long 848093931 + .long 1071368446 + .long 865355299 + .long 1071486034 + .long 848111485 + .long 1071601747 + .long 865557362 + .long 1071715659 + .long 870297525 + .long 1071827839 + .long 863416216 + .long 1071938350 + .long 869675693 + .long 1072047254 + .long 865888071 + .long 1072154608 + .long 825332584 + .long 1072260465 + .long 843309506 + .long 1072364876 + .long 870885636 + .long 1072467891 + .long 869119784 + .long 1072569555 + .long 865466648 + .long 1072669911 + .long 867459244 + .long 1072769001 + .long 861192764 + .long 1072866863 + .long 871247716 + .long 1072963536 + .long 864927982 + .long 1073059054 + .long 869195129 + .long 1073153452 + .long 864849564 + .long 1073246762 + .long 840005936 + .long 1073339014 + .long 852579258 + .long 1073430238 + .long 860852782 + .long 1073520462 + .long 869711141 + .long 1073609714 + .long 862506141 + .long 1073698019 + .long 837959274 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + 
.long 3173551943 + .long 3173551943 + .long 3173551943 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + 
.long 2155872256 + .long 2155872256 + .long 2155872256 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .type __scbrt_la_vscbrt_ha_cout_data,@object + .size __scbrt_la_vscbrt_ha_cout_data,1920 + .align 4 + +.FLT_28: + .long 0x007fffff + .type .FLT_28,@object + .size .FLT_28,4 + .align 4 + +.FLT_29: + .long 0x007e0000 + .type .FLT_29,@object + .size .FLT_29,4 + .align 4 + +.FLT_30: + .long 0xbf800000 + .type .FLT_30,@object + .size .FLT_30,4 + .align 4 + +.FLT_31: + .long 0xbf820000 + .type .FLT_31,@object + .size .FLT_31,4 + .align 4 + +.FLT_32: + .long 0x3eaaaaab + .type .FLT_32,@object + .size .FLT_32,4 + .align 4 + +.FLT_33: + .long 0xbde38e39 + .type .FLT_33,@object + .size .FLT_33,4 + .align 4 + +.FLT_34: + .long 0x3d7cd6ea + .type .FLT_34,@object + .size .FLT_34,4 + .align 4 + +.FLT_35: + .long 0xbd288f47 + .type .FLT_35,@object + .size .FLT_35,4 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core-sse2.S new file mode 100644 index 0000000000..76fc254e7a --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core-sse2.S @@ -0,0 +1,20 @@ +/* SSE2 version of vectorized cbrtf, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define _ZGVbN4v_cbrtf _ZGVbN4v_cbrtf_sse2 +#include "../svml_s_cbrtf4_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core.c new file mode 100644 index 0000000000..564a549b39 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized cbrtf, vector length is 4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#define SYMBOL_NAME _ZGVbN4v_cbrtf +#include "ifunc-mathvec-sse4_1.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVbN4v_cbrtf, __GI__ZGVbN4v_cbrtf, + __redirect__ZGVbN4v_cbrtf) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core_sse4.S new file mode 100644 index 0000000000..88b50269e3 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core_sse4.S @@ -0,0 +1,1863 @@ +/* Function cbrtf vectorized with SSE4. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * x=2^{3*k+j} * 1.b1 b2 ... b5 b6 ... b52 + * Let r=(x*2^{-3k-j} - 1.b1 b2 ... b5 1)* rcp[b1 b2 ..b5], + * where rcp[b1 b2 .. b5]=1/(1.b1 b2 b3 b4 b5 1) in single precision + * cbrtf(2^j * 1. b1 b2 .. b5 1) is approximated as T[j][b1..b5]+D[j][b1..b5] + * (T stores the high 24 bits, D stores the low order bits) + * Result=2^k*T+(2^k*T*r)*P+2^k*D + * where P=p1+p2*r+.. 
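The reduction described above is easier to follow in scalar form. The following is a minimal C sketch of the 2^{3*k+j} split, assuming only standard <math.h>; the name cbrtf_sketch is invented for illustration, and a few Newton steps stand in for the T/D table lookup and short polynomial that the vector kernels use, so it models the structure of the algorithm rather than its exact accuracy or special-case behaviour.

#include <math.h>

float
cbrtf_sketch (float x)
{
  if (x == 0.0f || !isfinite (x))
    return x + x;                      /* +-0, +-Inf and NaN returned as-is.  */

  int e;
  float m = frexpf (fabsf (x), &e);    /* |x| = m * 2^e with 0.5 <= m < 1.  */
  int j = ((e % 3) + 3) % 3;           /* e = 3*k + j with j in {0, 1, 2}.  */
  int k = (e - j) / 3;

  float a = ldexpf (m, j);             /* reduced argument, in [0.5, 4).  */
  float y = 0.75f + 0.25f * a;         /* crude seed, within about 10%.  */
  for (int i = 0; i < 4; i++)          /* Newton: y <- (2*y + a/y^2) / 3.  */
    y = (2.0f * y + a / (y * y)) / 3.0f;

  return copysignf (ldexpf (y, k), x); /* scale by 2^k, restore the sign.  */
}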
+ * + */ + +#include + + .text + .section .text.sse4,"ax",@progbits +ENTRY(_ZGVbN4v_cbrtf_sse4) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $320, %rsp + movaps %xmm0, %xmm5 + +/* + * Load constants + * Reciprocal index calculation + */ + movaps %xmm5, %xmm2 + +/* Load reciprocal value */ + lea __svml_scbrt_data_internal(%rip), %rdx + movdqu 896+__svml_scbrt_data_internal(%rip), %xmm1 + psrld $16, %xmm2 + pand %xmm2, %xmm1 + +/* Get signed biased exponent */ + psrld $7, %xmm2 + pshufd $1, %xmm1, %xmm3 + movd %xmm1, %eax + pshufd $2, %xmm1, %xmm7 + pshufd $3, %xmm1, %xmm0 + movd %xmm3, %ecx + movd %xmm7, %r8d + movd %xmm0, %r9d + movups 704+__svml_scbrt_data_internal(%rip), %xmm7 + andps %xmm5, %xmm7 + movslq %eax, %rax + movslq %ecx, %rcx + movslq %r8d, %r8 + movslq %r9d, %r9 + movd (%rdx,%rax), %xmm4 + movd (%rdx,%rcx), %xmm6 + punpckldq %xmm6, %xmm4 + movd (%rdx,%r8), %xmm6 + movd (%rdx,%r9), %xmm3 + punpckldq %xmm3, %xmm6 + punpcklqdq %xmm6, %xmm4 + +/* Argument reduction */ + movups 640+__svml_scbrt_data_internal(%rip), %xmm6 + andps %xmm5, %xmm6 + orps 768+__svml_scbrt_data_internal(%rip), %xmm6 + orps 832+__svml_scbrt_data_internal(%rip), %xmm7 + movdqu 1280+__svml_scbrt_data_internal(%rip), %xmm3 + +/* r=y-y` */ + subps %xmm7, %xmm6 + movups %xmm5, (%rsp) + pand %xmm5, %xmm3 + +/* Get absolute biased exponent */ + movdqu 960+__svml_scbrt_data_internal(%rip), %xmm0 + +/* + * Calculate exponent/3 + * i555Exp=(2^{12}-1)/3*exponent + */ + movdqu 1216+__svml_scbrt_data_internal(%rip), %xmm5 + pand %xmm2, %xmm0 + movdqa %xmm5, %xmm7 + psrlq $32, %xmm5 + +/* r=(y-y`)*rcp_table(y`) */ + mulps %xmm6, %xmm4 + movdqa %xmm0, %xmm6 + pmuludq %xmm0, %xmm7 + psrlq $32, %xmm0 + pmuludq %xmm5, %xmm0 + pand .FLT_36(%rip), %xmm7 + psllq $32, %xmm0 + por %xmm0, %xmm7 + psubd 1152+__svml_scbrt_data_internal(%rip), %xmm6 + +/* Get K (exponent=3*k+j) */ + psrld $12, %xmm7 + +/* Get J */ + psubd %xmm7, %xmm6 + psubd %xmm7, %xmm6 + psubd %xmm7, %xmm6 + psubd 1344+__svml_scbrt_data_internal(%rip), %xmm3 + +/* Get 128*J */ + pslld $7, %xmm6 + pcmpgtd 1408+__svml_scbrt_data_internal(%rip), %xmm3 + +/* + * iCbrtIndex=4*l+128*j + * Zero index if callout expected + */ + paddd %xmm6, %xmm1 + movmskps %xmm3, %eax + pandn %xmm1, %xmm3 + +/* Load Cbrt table Hi & Lo values */ + pshufd $1, %xmm3, %xmm1 + +/* + * Add 2/3*(bias-1)+1 to (k+1/3*(bias-1)) + * Attach sign to exponent + */ + movdqu 1088+__svml_scbrt_data_internal(%rip), %xmm0 + movd %xmm3, %r10d + paddd %xmm7, %xmm0 + movd %xmm1, %r11d + pshufd $2, %xmm3, %xmm1 + pshufd $3, %xmm3, %xmm3 + movd %xmm1, %ecx + movd %xmm3, %r8d + +/* Biased exponent-1 */ + pand 1024+__svml_scbrt_data_internal(%rip), %xmm2 + por %xmm2, %xmm0 + movslq %r10d, %r10 + pslld $23, %xmm0 + movslq %r11d, %r11 + movslq %ecx, %rcx + movslq %r8d, %r8 + movd 128(%rdx,%r10), %xmm5 + movd 128(%rdx,%r11), %xmm2 + punpckldq %xmm2, %xmm5 + movd 128(%rdx,%rcx), %xmm6 + movd 128(%rdx,%r8), %xmm2 + punpckldq %xmm2, %xmm6 + punpcklqdq %xmm6, %xmm5 + +/* sCbrtHi *= 2^k */ + mulps %xmm5, %xmm0 + +/* Polynomial: p1+r*(p2*r+r*(p3+r*p4)) */ + movups 512+__svml_scbrt_data_internal(%rip), %xmm5 + mulps %xmm4, %xmm5 + +/* T`*r */ + mulps %xmm0, %xmm4 + addps 576+__svml_scbrt_data_internal(%rip), %xmm5 + +/* (T`*r)*P */ + mulps %xmm4, %xmm5 + movups (%rsp), %xmm1 + +/* + * T`*r*P+D` + * result = T`+(T`*r*P+D`) + */ + addps %xmm5, %xmm0 + testl %eax, %eax + jne L(2) + +L(1): + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + 
cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + +L(2): + movups %xmm1, 192(%rsp) + movups %xmm0, 256(%rsp) + xorl %edx, %edx + movups %xmm8, 112(%rsp) + movups %xmm9, 96(%rsp) + movups %xmm10, 80(%rsp) + movups %xmm11, 64(%rsp) + movups %xmm12, 48(%rsp) + movups %xmm13, 32(%rsp) + movups %xmm14, 16(%rsp) + movups %xmm15, (%rsp) + movq %rsi, 136(%rsp) + movq %rdi, 128(%rsp) + movq %r12, 152(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + movl %edx, %r12d + movq %r13, 144(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + movl %eax, %r13d + +L(3): + btl %r12d, %r13d + jc L(5) + +L(4): + incl %r12d + cmpl $4, %r12d + jl L(3) + movups 112(%rsp), %xmm8 + cfi_restore(25) + movups 96(%rsp), %xmm9 + cfi_restore(26) + movups 80(%rsp), %xmm10 + cfi_restore(27) + movups 64(%rsp), %xmm11 + cfi_restore(28) + movups 48(%rsp), %xmm12 + cfi_restore(29) + movups 32(%rsp), %xmm13 + cfi_restore(30) + movups 16(%rsp), %xmm14 + cfi_restore(31) + movups (%rsp), %xmm15 + cfi_restore(32) + movq 136(%rsp), %rsi + cfi_restore(4) + movq 128(%rsp), %rdi + cfi_restore(5) + movq 152(%rsp), %r12 + cfi_restore(12) + movq 144(%rsp), %r13 + cfi_restore(13) + movups 256(%rsp), %xmm0 + jmp L(1) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x58, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x19, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x30, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1a, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1b, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1e, 
0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x1f, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x20, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22 + +L(5): + lea 192(%rsp,%r12,4), %rdi + lea 256(%rsp,%r12,4), %rsi + call __svml_scbrt_cout_rare_internal + jmp L(4) + +END(_ZGVbN4v_cbrtf_sse4) + + .align 16,0x90 + +__svml_scbrt_cout_rare_internal: + + cfi_startproc + + movq %rsi, %r9 + movl $1065353216, -24(%rsp) + movss (%rdi), %xmm0 + movss -24(%rsp), %xmm1 + mulss %xmm0, %xmm1 + movss %xmm1, -4(%rsp) + movzwl -2(%rsp), %eax + andl $32640, %eax + shrl $7, %eax + cmpl $255, %eax + je L(11) + pxor %xmm0, %xmm0 + ucomiss %xmm0, %xmm1 + jp L(6) + je L(10) + +L(6): + testl %eax, %eax + jne L(7) + movl $2122317824, -24(%rsp) + movl $713031680, -20(%rsp) + jmp L(8) + +L(7): + movl $1065353216, %eax + movl %eax, -24(%rsp) + movl %eax, -20(%rsp) + +L(8): + movss -24(%rsp), %xmm0 + lea __scbrt_la_vscbrt_ha_cout_data(%rip), %rsi + mulss %xmm0, %xmm1 + movd %xmm1, %ecx + movss %xmm1, -4(%rsp) + movl %ecx, %r10d + movl %ecx, %edi + andl $8388607, %r10d + movl %ecx, %r11d + shrl $23, %edi + andl $8257536, %r11d + orl $-1082130432, %r10d + orl $-1081999360, %r11d + movl %r10d, -16(%rsp) + movl %ecx, %edx + movzbl %dil, %r8d + andl $2147483647, %ecx + movl %r11d, -12(%rsp) + andl $-256, %edi + movss -16(%rsp), %xmm1 + addl $2139095040, %ecx + shrl $16, %edx + subss -12(%rsp), %xmm1 + andl $124, %edx + lea (%r8,%r8,4), %r10d + mulss (%rsi,%rdx), %xmm1 + lea (%r10,%r10), %r11d + movss .FLT_44(%rip), %xmm4 + lea (%r11,%r11), %eax + addl %eax, %eax + lea (%r10,%r11,8), %r10d + addl %eax, %eax + decl %r8d + mulss %xmm1, %xmm4 + shll $7, %r8d + lea (%r10,%rax,8), %r11d + lea (%r11,%rax,8), %r10d + shrl $12, %r10d + addss .FLT_43(%rip), %xmm4 + mulss %xmm1, %xmm4 + lea 85(%r10), %eax + orl %edi, %eax + xorl %edi, %edi + cmpl $-16777217, %ecx + addss .FLT_42(%rip), %xmm4 + setg %dil + shll $7, %r10d + negl %edi + subl %r10d, %r8d + addl %r10d, %r10d + subl %r10d, %r8d + notl %edi + addl %r8d, %edx + andl %edx, %edi + shll $23, %eax + addl %edi, %edi + movl %eax, -8(%rsp) + movss 128(%rdi,%rsi), %xmm5 + movss -8(%rsp), %xmm2 + mulss %xmm1, %xmm4 + mulss %xmm2, %xmm5 + addss .FLT_41(%rip), %xmm4 + mulss %xmm5, %xmm1 + movss 132(%rsi,%rdi), %xmm3 + mulss %xmm1, %xmm4 + mulss %xmm2, %xmm3 + addss %xmm3, %xmm4 + addss %xmm4, %xmm5 + mulss -20(%rsp), %xmm5 + movss %xmm5, (%r9) + +L(9): + xorl %eax, %eax + ret + +L(10): + movss %xmm1, (%r9) + jmp L(9) + +L(11): + addss %xmm0, %xmm0 + movss %xmm0, (%r9) + jmp L(9) + + cfi_endproc + + .type __svml_scbrt_cout_rare_internal,@function + .size __svml_scbrt_cout_rare_internal,.-__svml_scbrt_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_scbrt_data_internal: + .long 3212578753 + .long 3212085645 + .long 3211621124 + .long 3211182772 + .long 3210768440 + .long 3210376206 + .long 3210004347 + .long 3209651317 + .long 3209315720 + .long 3208996296 + .long 3208691905 + .long 3208401508 + .long 3208124163 + .long 3207859009 + .long 3207605259 + .long 3207362194 + .long 3207129151 + .long 3206905525 + .long 3206690755 + .long 3206484326 + .long 3206285761 + .long 3206094618 + .long 3205910490 + .long 3205732998 + .long 3205561788 + .long 3205396533 + .long 3205236929 + .long 3205082689 + .long 3204933547 + .long 3204789256 + .long 3204649583 + .long 3204514308 + .long 1065396681 + .long 
1065482291 + .long 1065566215 + .long 1065648532 + .long 1065729317 + .long 1065808640 + .long 1065886565 + .long 1065963152 + .long 1066038457 + .long 1066112533 + .long 1066185428 + .long 1066257188 + .long 1066327857 + .long 1066397474 + .long 1066466079 + .long 1066533708 + .long 1066600394 + .long 1066666169 + .long 1066731064 + .long 1066795108 + .long 1066858329 + .long 1066920751 + .long 1066982401 + .long 1067043301 + .long 1067103474 + .long 1067162941 + .long 1067221722 + .long 1067279837 + .long 1067337305 + .long 1067394143 + .long 1067450368 + .long 1067505996 + .long 1067588354 + .long 1067696217 + .long 1067801953 + .long 1067905666 + .long 1068007450 + .long 1068107390 + .long 1068205570 + .long 1068302063 + .long 1068396942 + .long 1068490271 + .long 1068582113 + .long 1068672525 + .long 1068761562 + .long 1068849275 + .long 1068935712 + .long 1069020919 + .long 1069104937 + .long 1069187809 + .long 1069269572 + .long 1069350263 + .long 1069429915 + .long 1069508563 + .long 1069586236 + .long 1069662966 + .long 1069738778 + .long 1069813702 + .long 1069887762 + .long 1069960982 + .long 1070033387 + .long 1070104998 + .long 1070175837 + .long 1070245925 + .long 1070349689 + .long 1070485588 + .long 1070618808 + .long 1070749478 + .long 1070877717 + .long 1071003634 + .long 1071127332 + .long 1071248907 + .long 1071368446 + .long 1071486034 + .long 1071601747 + .long 1071715659 + .long 1071827839 + .long 1071938350 + .long 1072047254 + .long 1072154608 + .long 1072260465 + .long 1072364876 + .long 1072467891 + .long 1072569555 + .long 1072669911 + .long 1072769001 + .long 1072866863 + .long 1072963536 + .long 1073059054 + .long 1073153452 + .long 1073246762 + .long 1073339014 + .long 1073430238 + .long 1073520462 + .long 1073609714 + .long 1073698019 + .long 3185813858 + .long 3185813858 + .long 3185813858 + .long 3185813858 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1051372689 + .long 1051372689 + .long 1051372689 + .long 1051372689 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .byte 0 + .byte 0 + .byte 0 + .byte 
0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 124 + .long 124 + .long 124 + .long 124 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 255 + .long 255 + .long 255 + .long 255 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 256 + .long 256 + .long 256 + .long 256 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 85 + .long 85 + .long 85 + .long 85 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1 + .long 1 + .long 1 + .long 1 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1365 + .long 1365 + .long 1365 + .long 1365 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_scbrt_data_internal,@object + .size __svml_scbrt_data_internal,1472 + .align 64 + +__scbrt_la_vscbrt_ha_cout_data: + .long 3212578753 + .long 3212085645 + .long 3211621124 + .long 3211182772 + .long 3210768440 + .long 3210376206 + .long 3210004347 + .long 3209651317 + .long 3209315720 + .long 3208996296 + .long 3208691905 + .long 3208401508 + .long 3208124163 + .long 3207859009 + .long 3207605259 + .long 3207362194 + .long 3207129151 + .long 3206905525 + .long 3206690755 + .long 3206484326 + .long 3206285761 + .long 3206094618 + .long 3205910490 + .long 3205732998 + .long 3205561788 + .long 3205396533 + .long 3205236929 + .long 3205082689 + .long 3204933547 + .long 3204789256 + .long 3204649583 + .long 
3204514308 + .long 1065396681 + .long 839340838 + .long 1065482291 + .long 867750258 + .long 1065566215 + .long 851786446 + .long 1065648532 + .long 853949398 + .long 1065729317 + .long 864938789 + .long 1065808640 + .long 864102364 + .long 1065886565 + .long 864209792 + .long 1065963152 + .long 865422805 + .long 1066038457 + .long 867593594 + .long 1066112533 + .long 854482593 + .long 1066185428 + .long 848298042 + .long 1066257188 + .long 860064854 + .long 1066327857 + .long 844792593 + .long 1066397474 + .long 870701309 + .long 1066466079 + .long 872023170 + .long 1066533708 + .long 860255342 + .long 1066600394 + .long 849966899 + .long 1066666169 + .long 863561479 + .long 1066731064 + .long 869115319 + .long 1066795108 + .long 871961375 + .long 1066858329 + .long 859537336 + .long 1066920751 + .long 871954398 + .long 1066982401 + .long 863817578 + .long 1067043301 + .long 861687921 + .long 1067103474 + .long 849594757 + .long 1067162941 + .long 816486846 + .long 1067221722 + .long 858183533 + .long 1067279837 + .long 864500406 + .long 1067337305 + .long 850523240 + .long 1067394143 + .long 808125243 + .long 1067450368 + .long 0 + .long 1067505996 + .long 861173761 + .long 1067588354 + .long 859000219 + .long 1067696217 + .long 823158129 + .long 1067801953 + .long 871826232 + .long 1067905666 + .long 871183196 + .long 1068007450 + .long 839030530 + .long 1068107390 + .long 867690638 + .long 1068205570 + .long 840440923 + .long 1068302063 + .long 868033274 + .long 1068396942 + .long 855856030 + .long 1068490271 + .long 865094453 + .long 1068582113 + .long 860418487 + .long 1068672525 + .long 866225006 + .long 1068761562 + .long 866458226 + .long 1068849275 + .long 865124659 + .long 1068935712 + .long 864837702 + .long 1069020919 + .long 811742505 + .long 1069104937 + .long 869432099 + .long 1069187809 + .long 864584201 + .long 1069269572 + .long 864183978 + .long 1069350263 + .long 844810573 + .long 1069429915 + .long 869245699 + .long 1069508563 + .long 859556409 + .long 1069586236 + .long 870675446 + .long 1069662966 + .long 814190139 + .long 1069738778 + .long 870686941 + .long 1069813702 + .long 861800510 + .long 1069887762 + .long 855649163 + .long 1069960982 + .long 869347119 + .long 1070033387 + .long 864252033 + .long 1070104998 + .long 867276215 + .long 1070175837 + .long 868189817 + .long 1070245925 + .long 849541095 + .long 1070349689 + .long 866633177 + .long 1070485588 + .long 843967686 + .long 1070618808 + .long 857522493 + .long 1070749478 + .long 862339487 + .long 1070877717 + .long 850054662 + .long 1071003634 + .long 864048556 + .long 1071127332 + .long 868027089 + .long 1071248907 + .long 848093931 + .long 1071368446 + .long 865355299 + .long 1071486034 + .long 848111485 + .long 1071601747 + .long 865557362 + .long 1071715659 + .long 870297525 + .long 1071827839 + .long 863416216 + .long 1071938350 + .long 869675693 + .long 1072047254 + .long 865888071 + .long 1072154608 + .long 825332584 + .long 1072260465 + .long 843309506 + .long 1072364876 + .long 870885636 + .long 1072467891 + .long 869119784 + .long 1072569555 + .long 865466648 + .long 1072669911 + .long 867459244 + .long 1072769001 + .long 861192764 + .long 1072866863 + .long 871247716 + .long 1072963536 + .long 864927982 + .long 1073059054 + .long 869195129 + .long 1073153452 + .long 864849564 + .long 1073246762 + .long 840005936 + .long 1073339014 + .long 852579258 + .long 1073430238 + .long 860852782 + .long 1073520462 + .long 869711141 + .long 1073609714 + .long 862506141 + .long 1073698019 + .long 837959274 
+ .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + 
.long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .type __scbrt_la_vscbrt_ha_cout_data,@object + .size __scbrt_la_vscbrt_ha_cout_data,1920 + .align 16 + +.FLT_36: + .long 0xffffffff,0x00000000,0xffffffff,0x00000000 + .type .FLT_36,@object + .size .FLT_36,16 + .align 4 + +.FLT_37: + .long 0x007fffff + .type .FLT_37,@object + .size .FLT_37,4 + .align 4 + +.FLT_38: + .long 0x007e0000 + .type .FLT_38,@object + .size .FLT_38,4 + .align 4 + +.FLT_39: + .long 0xbf800000 + .type .FLT_39,@object + .size .FLT_39,4 + .align 4 + +.FLT_40: + .long 0xbf820000 + .type .FLT_40,@object + .size .FLT_40,4 + .align 4 + +.FLT_41: + .long 0x3eaaaaab + .type .FLT_41,@object + .size .FLT_41,4 + .align 4 + +.FLT_42: + .long 0xbde38e39 + .type .FLT_42,@object + .size .FLT_42,4 + .align 4 + +.FLT_43: + .long 0x3d7cd6ea + .type .FLT_43,@object + .size .FLT_43,4 + .align 4 + +.FLT_44: + .long 0xbd288f47 + .type .FLT_44,@object + .size .FLT_44,4 diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core-sse.S new file mode 100644 index 0000000000..8eaa457fa6 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core-sse.S @@ -0,0 +1,20 @@ +/* SSE version of vectorized cbrtf, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define _ZGVdN8v_cbrtf _ZGVdN8v_cbrtf_sse_wrapper +#include "../svml_s_cbrtf8_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core.c new file mode 100644 index 0000000000..089d28461f --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized cbrtf, vector length is 8. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define SYMBOL_NAME _ZGVdN8v_cbrtf +#include "ifunc-mathvec-avx2.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVdN8v_cbrtf, __GI__ZGVdN8v_cbrtf, + __redirect__ZGVdN8v_cbrtf) + __attribute__ ((visibility ("hidden"))); +#endif diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core_avx2.S new file mode 100644 index 0000000000..943d708ef3 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core_avx2.S @@ -0,0 +1,1686 @@ +/* Function cbrtf vectorized with AVX2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + https://www.gnu.org/licenses/. */ + +/* + * ALGORITHM DESCRIPTION: + * + * x=2^{3*k+j} * 1.b1 b2 ... b5 b6 ... b52 + * Let r=(x*2^{-3k-j} - 1.b1 b2 ... b5 1)* rcp[b1 b2 ..b5], + * where rcp[b1 b2 .. b5]=1/(1.b1 b2 b3 b4 b5 1) in single precision + * cbrtf(2^j * 1. b1 b2 .. b5 1) is approximated as T[j][b1..b5]+D[j][b1..b5] + * (T stores the high 24 bits, D stores the low order bits) + * Result=2^k*T+(2^k*T*r)*P+2^k*D + * where P=p1+p2*r+.. 
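The AVX2 kernel below implements the same scheme eight lanes at a time. As a rough usage sketch, the new entry point can also be called directly through a hand-written prototype; the declaration below is an assumption based on the x86_64 vector function ABI (eight float lanes passed and returned in %ymm0), while glibc's own tests reach the symbol through the libmvec wrapper macros instead. Something like "gcc -O2 -mavx2 test.c -lm" should build it against an installed glibc that carries this patch, though the exact flags and library arguments may differ.

#include <immintrin.h>
#include <stdio.h>

/* Assumed prototype for the AVX2 variant added by this patch.  */
extern __m256 _ZGVdN8v_cbrtf (__m256);

int
main (void)
{
  float in[8] = { 8.0f, 27.0f, 64.0f, 125.0f, -8.0f, 0.5f, 1e-3f, 1e30f };
  float out[8];

  /* One call handles all eight lanes.  */
  _mm256_storeu_ps (out, _ZGVdN8v_cbrtf (_mm256_loadu_ps (in)));

  for (int i = 0; i < 8; i++)
    printf ("cbrtf (%g) ~= %g\n", in[i], out[i]);
  return 0;
}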
+ * + */ + +#include + + .text + .section .text.avx2,"ax",@progbits +ENTRY(_ZGVdN8v_cbrtf_avx2) + pushq %rbp + cfi_def_cfa_offset(16) + movq %rsp, %rbp + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + andq $-64, %rsp + subq $384, %rsp + +/* Load reciprocal value */ + lea __svml_scbrt_data_internal(%rip), %rdx + vmovaps %ymm0, %ymm5 + +/* + * Load constants + * Reciprocal index calculation + */ + vpsrld $16, %ymm5, %ymm3 + vmovups %ymm10, 160(%rsp) + vmovups %ymm12, 224(%rsp) + vmovups %ymm11, 192(%rsp) + vmovups %ymm14, 288(%rsp) + vmovups %ymm9, 96(%rsp) + vmovups %ymm8, 32(%rsp) + vmovups %ymm13, 256(%rsp) + vmovups %ymm15, 320(%rsp) + vpand 896+__svml_scbrt_data_internal(%rip), %ymm3, %ymm4 + vmovd %xmm4, %eax + .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + vextractf128 $1, %ymm4, %xmm10 + vpextrd $1, %xmm4, %ecx + movslq %eax, %rax + movslq %ecx, %rcx + vmovd %xmm10, %r10d + vmovd (%rdx,%rax), %xmm6 + vmovd (%rdx,%rcx), %xmm7 + vpextrd $2, %xmm10, %eax + vpextrd $3, %xmm10, %ecx + movslq %eax, %rax + movslq %ecx, %rcx + vpextrd $2, %xmm4, %r8d + vpextrd $3, %xmm4, %r9d + vpextrd $1, %xmm10, %r11d + +/* Get signed biased exponent */ + vpsrld $7, %ymm3, %ymm10 + vmovd (%rdx,%rax), %xmm11 + vmovd (%rdx,%rcx), %xmm12 + vpunpckldq %xmm12, %xmm11, %xmm14 + movslq %r8d, %r8 + movslq %r9d, %r9 + movslq %r10d, %r10 + movslq %r11d, %r11 + +/* Get absolute biased exponent */ + vpand 960+__svml_scbrt_data_internal(%rip), %ymm10, %ymm11 + vmovd (%rdx,%r8), %xmm1 + vmovd (%rdx,%r9), %xmm0 + vmovd (%rdx,%r10), %xmm8 + vmovd (%rdx,%r11), %xmm9 + vpunpckldq %xmm7, %xmm6, %xmm2 + vpunpckldq %xmm0, %xmm1, %xmm6 + vandps 1280+__svml_scbrt_data_internal(%rip), %ymm5, %ymm3 + vpunpckldq %xmm9, %xmm8, %xmm13 + vpsubd 1344+__svml_scbrt_data_internal(%rip), %ymm3, %ymm1 + vpunpcklqdq %xmm6, %xmm2, %xmm7 + vpunpcklqdq %xmm14, %xmm13, %xmm15 + +/* Biased exponent-1 */ + vpand 1024+__svml_scbrt_data_internal(%rip), %ymm10, %ymm14 + +/* Argument reduction */ + vandps 640+__svml_scbrt_data_internal(%rip), %ymm5, %ymm0 + vandps 704+__svml_scbrt_data_internal(%rip), %ymm5, %ymm3 + vorps 768+__svml_scbrt_data_internal(%rip), %ymm0, %ymm6 + +/* + * Calculate exponent/3 + * i555Exp=(2^{12}-1)/3*exponent + */ + vpmulld 1216+__svml_scbrt_data_internal(%rip), %ymm11, %ymm12 + vpcmpgtd 1408+__svml_scbrt_data_internal(%rip), %ymm1, %ymm2 + vmovmskps %ymm2, %eax + vinsertf128 $1, %xmm15, %ymm7, %ymm8 + vorps 832+__svml_scbrt_data_internal(%rip), %ymm3, %ymm7 + +/* r=y-y` */ + vsubps %ymm7, %ymm6, %ymm9 + +/* Get K (exponent=3*k+j) */ + vpsrld $12, %ymm12, %ymm6 + vpsubd 1152+__svml_scbrt_data_internal(%rip), %ymm11, %ymm3 + 
+/* r=(y-y`)*rcp_table(y`) */ + vmulps %ymm9, %ymm8, %ymm1 + +/* Add 2/3*(bias-1)+1 to (k+1/3*(bias-1)) */ + vpaddd 1088+__svml_scbrt_data_internal(%rip), %ymm6, %ymm13 + +/* Attach sign to exponent */ + vpor %ymm14, %ymm13, %ymm15 + +/* Get J */ + vpsubd %ymm6, %ymm3, %ymm13 + vpslld $23, %ymm15, %ymm0 + vpsubd %ymm6, %ymm13, %ymm14 + vpsubd %ymm6, %ymm14, %ymm7 + +/* Get 128*J */ + vpslld $7, %ymm7, %ymm8 + +/* iCbrtIndex=4*l+128*j */ + vpaddd %ymm8, %ymm4, %ymm4 + +/* Zero index if callout expected */ + vpandn %ymm4, %ymm2, %ymm4 + +/* Load Cbrt table Hi & Lo values */ + vmovd %xmm4, %r8d + vextractf128 $1, %ymm4, %xmm12 + movslq %r8d, %r8 + vpextrd $1, %xmm4, %r9d + vpextrd $3, %xmm4, %ecx + movslq %r9d, %r9 + movslq %ecx, %rcx + vmovd 128(%rdx,%r8), %xmm2 + vmovd %xmm12, %r8d + vmovd 128(%rdx,%r9), %xmm3 + vmovd 128(%rdx,%rcx), %xmm6 + vpextrd $2, %xmm4, %r10d + vpextrd $1, %xmm12, %r9d + vpextrd $2, %xmm12, %r11d + vpextrd $3, %xmm12, %ecx + movslq %r10d, %r10 + movslq %r8d, %r8 + movslq %r9d, %r9 + movslq %r11d, %r11 + movslq %ecx, %rcx + vpunpckldq %xmm3, %xmm2, %xmm7 + vmovd 128(%rdx,%r10), %xmm2 + vmovd 128(%rdx,%r8), %xmm10 + vmovd 128(%rdx,%r9), %xmm11 + vmovd 128(%rdx,%r11), %xmm13 + vmovd 128(%rdx,%rcx), %xmm14 + vpunpckldq %xmm6, %xmm2, %xmm8 + vpunpckldq %xmm11, %xmm10, %xmm15 + vpunpckldq %xmm14, %xmm13, %xmm4 + vpunpcklqdq %xmm8, %xmm7, %xmm9 + vpunpcklqdq %xmm4, %xmm15, %xmm2 + vinsertf128 $1, %xmm2, %ymm9, %ymm3 + +/* sCbrtHi *= 2^k */ + vmulps %ymm3, %ymm0, %ymm2 + +/* Polynomial: p1+r*(p2*r+r*(p3+r*p4)) */ + vmovups 512+__svml_scbrt_data_internal(%rip), %ymm0 + vfmadd213ps 576+__svml_scbrt_data_internal(%rip), %ymm1, %ymm0 + +/* T`*r */ + vmulps %ymm2, %ymm1, %ymm1 + +/* (T`*r)*P */ + vmulps %ymm1, %ymm0, %ymm0 + +/* + * T`*r*P+D` + * result = T`+(T`*r*P+D`) + */ + vaddps %ymm0, %ymm2, %ymm0 + testl %eax, %eax + jne L(2) + +L(1): + vmovups 32(%rsp), %ymm8 + cfi_restore(91) + vmovups 96(%rsp), %ymm9 + cfi_restore(92) + vmovups 160(%rsp), %ymm10 + cfi_restore(93) + vmovups 192(%rsp), %ymm11 + cfi_restore(94) + vmovups 224(%rsp), %ymm12 + cfi_restore(95) + vmovups 256(%rsp), %ymm13 + cfi_restore(96) + vmovups 288(%rsp), %ymm14 + cfi_restore(97) + vmovups 320(%rsp), %ymm15 + cfi_restore(98) + movq %rbp, %rsp + popq %rbp + cfi_def_cfa(7, 8) + cfi_restore(6) + ret + cfi_def_cfa(6, 16) + cfi_offset(6, -16) + .cfi_escape 0x10, 0xdb, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdc, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xe0, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdd, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x20, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xde, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xdf, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x60, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe0, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe1, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0xe2, 0x00, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22 + +L(2): + vmovups %ymm5, 64(%rsp) + vmovups %ymm0, 128(%rsp) + je L(1) + xorl %edx, %edx + vzeroupper + movq %rsi, 8(%rsp) + movq %rdi, (%rsp) + movq %r12, 24(%rsp) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 
0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + movl %edx, %r12d + movq %r13, 16(%rsp) + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + movl %eax, %r13d + +L(3): + btl %r12d, %r13d + jc L(5) + +L(4): + incl %r12d + cmpl $8, %r12d + jl L(3) + movq 8(%rsp), %rsi + cfi_restore(4) + movq (%rsp), %rdi + cfi_restore(5) + movq 24(%rsp), %r12 + cfi_restore(12) + movq 16(%rsp), %r13 + cfi_restore(13) + vmovups 128(%rsp), %ymm0 + jmp L(1) + .cfi_escape 0x10, 0x04, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x05, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x98, 0xfe, 0xff, 0xff, 0x22 + .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xfe, 0xff, 0xff, 0x22 + +L(5): + lea 64(%rsp,%r12,4), %rdi + lea 128(%rsp,%r12,4), %rsi + call __svml_scbrt_cout_rare_internal + jmp L(4) + +END(_ZGVdN8v_cbrtf_avx2) + + .align 16,0x90 + +__svml_scbrt_cout_rare_internal: + + cfi_startproc + + movq %rsi, %r9 + movl $1065353216, -24(%rsp) + movss (%rdi), %xmm0 + movss -24(%rsp), %xmm1 + mulss %xmm0, %xmm1 + movss %xmm1, -4(%rsp) + movzwl -2(%rsp), %eax + andl $32640, %eax + shrl $7, %eax + cmpl $255, %eax + je L(11) + pxor %xmm0, %xmm0 + ucomiss %xmm0, %xmm1 + jp L(6) + je L(10) + +L(6): + testl %eax, %eax + jne L(7) + movl $2122317824, -24(%rsp) + movl $713031680, -20(%rsp) + jmp L(8) + +L(7): + movl $1065353216, %eax + movl %eax, -24(%rsp) + movl %eax, -20(%rsp) + +L(8): + movss -24(%rsp), %xmm0 + lea __scbrt_la_vscbrt_ha_cout_data(%rip), %rsi + mulss %xmm0, %xmm1 + movd %xmm1, %ecx + movss %xmm1, -4(%rsp) + movl %ecx, %r10d + movl %ecx, %edi + andl $8388607, %r10d + movl %ecx, %r11d + shrl $23, %edi + andl $8257536, %r11d + orl $-1082130432, %r10d + orl $-1081999360, %r11d + movl %r10d, -16(%rsp) + movl %ecx, %edx + movzbl %dil, %r8d + andl $2147483647, %ecx + movl %r11d, -12(%rsp) + andl $-256, %edi + movss -16(%rsp), %xmm1 + addl $2139095040, %ecx + shrl $16, %edx + subss -12(%rsp), %xmm1 + andl $124, %edx + lea (%r8,%r8,4), %r10d + mulss (%rsi,%rdx), %xmm1 + lea (%r10,%r10), %r11d + movss .FLT_43(%rip), %xmm4 + lea (%r11,%r11), %eax + addl %eax, %eax + lea (%r10,%r11,8), %r10d + addl %eax, %eax + decl %r8d + mulss %xmm1, %xmm4 + shll $7, %r8d + lea (%r10,%rax,8), %r11d + lea (%r11,%rax,8), %r10d + shrl $12, %r10d + addss .FLT_42(%rip), %xmm4 + mulss %xmm1, %xmm4 + lea 85(%r10), %eax + orl %edi, %eax + xorl %edi, %edi + cmpl $-16777217, %ecx + addss .FLT_41(%rip), %xmm4 + setg %dil + shll $7, %r10d + negl %edi + subl %r10d, %r8d + addl %r10d, %r10d + subl %r10d, %r8d + notl %edi + addl %r8d, %edx + andl %edx, %edi + shll $23, %eax + addl %edi, %edi + movl %eax, -8(%rsp) + movss 128(%rdi,%rsi), %xmm5 + movss -8(%rsp), %xmm2 + mulss %xmm1, %xmm4 + mulss %xmm2, %xmm5 + addss .FLT_40(%rip), %xmm4 + mulss %xmm5, %xmm1 + movss 132(%rsi,%rdi), %xmm3 + mulss %xmm1, %xmm4 + mulss %xmm2, %xmm3 + addss %xmm3, %xmm4 + addss %xmm4, %xmm5 + mulss -20(%rsp), %xmm5 + movss %xmm5, (%r9) + +L(9): + xorl %eax, %eax + ret + +L(10): + movss %xmm1, (%r9) + jmp L(9) + +L(11): + addss %xmm0, %xmm0 + 
movss %xmm0, (%r9) + jmp L(9) + + cfi_endproc + + .type __svml_scbrt_cout_rare_internal,@function + .size __svml_scbrt_cout_rare_internal,.-__svml_scbrt_cout_rare_internal + + .section .rodata, "a" + .align 64 + +__svml_scbrt_data_internal: + .long 3212578753 + .long 3212085645 + .long 3211621124 + .long 3211182772 + .long 3210768440 + .long 3210376206 + .long 3210004347 + .long 3209651317 + .long 3209315720 + .long 3208996296 + .long 3208691905 + .long 3208401508 + .long 3208124163 + .long 3207859009 + .long 3207605259 + .long 3207362194 + .long 3207129151 + .long 3206905525 + .long 3206690755 + .long 3206484326 + .long 3206285761 + .long 3206094618 + .long 3205910490 + .long 3205732998 + .long 3205561788 + .long 3205396533 + .long 3205236929 + .long 3205082689 + .long 3204933547 + .long 3204789256 + .long 3204649583 + .long 3204514308 + .long 1065396681 + .long 1065482291 + .long 1065566215 + .long 1065648532 + .long 1065729317 + .long 1065808640 + .long 1065886565 + .long 1065963152 + .long 1066038457 + .long 1066112533 + .long 1066185428 + .long 1066257188 + .long 1066327857 + .long 1066397474 + .long 1066466079 + .long 1066533708 + .long 1066600394 + .long 1066666169 + .long 1066731064 + .long 1066795108 + .long 1066858329 + .long 1066920751 + .long 1066982401 + .long 1067043301 + .long 1067103474 + .long 1067162941 + .long 1067221722 + .long 1067279837 + .long 1067337305 + .long 1067394143 + .long 1067450368 + .long 1067505996 + .long 1067588354 + .long 1067696217 + .long 1067801953 + .long 1067905666 + .long 1068007450 + .long 1068107390 + .long 1068205570 + .long 1068302063 + .long 1068396942 + .long 1068490271 + .long 1068582113 + .long 1068672525 + .long 1068761562 + .long 1068849275 + .long 1068935712 + .long 1069020919 + .long 1069104937 + .long 1069187809 + .long 1069269572 + .long 1069350263 + .long 1069429915 + .long 1069508563 + .long 1069586236 + .long 1069662966 + .long 1069738778 + .long 1069813702 + .long 1069887762 + .long 1069960982 + .long 1070033387 + .long 1070104998 + .long 1070175837 + .long 1070245925 + .long 1070349689 + .long 1070485588 + .long 1070618808 + .long 1070749478 + .long 1070877717 + .long 1071003634 + .long 1071127332 + .long 1071248907 + .long 1071368446 + .long 1071486034 + .long 1071601747 + .long 1071715659 + .long 1071827839 + .long 1071938350 + .long 1072047254 + .long 1072154608 + .long 1072260465 + .long 1072364876 + .long 1072467891 + .long 1072569555 + .long 1072669911 + .long 1072769001 + .long 1072866863 + .long 1072963536 + .long 1073059054 + .long 1073153452 + .long 1073246762 + .long 1073339014 + .long 1073430238 + .long 1073520462 + .long 1073609714 + .long 1073698019 + .long 3185813858 + .long 3185813858 + .long 3185813858 + .long 3185813858 + .long 3185813858 + .long 3185813858 + .long 3185813858 + .long 3185813858 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1051372689 + .long 1051372689 + .long 1051372689 + .long 1051372689 + .long 1051372689 + .long 1051372689 + .long 1051372689 + .long 1051372689 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 
0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + 
.byte 0 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 1365 + .long 1365 + .long 1365 + .long 1365 + .long 1365 + .long 1365 + .long 1365 + .long 1365 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .type __svml_scbrt_data_internal,@object + .size __svml_scbrt_data_internal,1472 + .align 64 + +__scbrt_la_vscbrt_ha_cout_data: + .long 3212578753 + .long 3212085645 + .long 3211621124 + .long 3211182772 + .long 3210768440 + .long 3210376206 + .long 3210004347 + .long 3209651317 + .long 3209315720 + .long 3208996296 + .long 3208691905 + .long 3208401508 + .long 3208124163 + .long 3207859009 + .long 3207605259 + .long 3207362194 + .long 3207129151 + .long 3206905525 + .long 3206690755 + .long 3206484326 + .long 3206285761 + .long 3206094618 + .long 3205910490 + .long 3205732998 + .long 3205561788 + .long 3205396533 + .long 3205236929 + .long 3205082689 + .long 3204933547 + .long 3204789256 + .long 3204649583 + .long 3204514308 + .long 1065396681 + .long 839340838 + .long 1065482291 + .long 867750258 + .long 1065566215 + .long 851786446 + .long 1065648532 + .long 853949398 + .long 1065729317 + .long 864938789 + .long 1065808640 + .long 864102364 + .long 1065886565 + .long 864209792 + .long 1065963152 + .long 865422805 + .long 1066038457 + .long 867593594 + .long 1066112533 + .long 854482593 + .long 1066185428 + .long 848298042 + .long 1066257188 + .long 860064854 + .long 1066327857 + .long 844792593 + .long 1066397474 + .long 870701309 + .long 1066466079 + .long 872023170 + .long 
1066533708 + .long 860255342 + .long 1066600394 + .long 849966899 + .long 1066666169 + .long 863561479 + .long 1066731064 + .long 869115319 + .long 1066795108 + .long 871961375 + .long 1066858329 + .long 859537336 + .long 1066920751 + .long 871954398 + .long 1066982401 + .long 863817578 + .long 1067043301 + .long 861687921 + .long 1067103474 + .long 849594757 + .long 1067162941 + .long 816486846 + .long 1067221722 + .long 858183533 + .long 1067279837 + .long 864500406 + .long 1067337305 + .long 850523240 + .long 1067394143 + .long 808125243 + .long 1067450368 + .long 0 + .long 1067505996 + .long 861173761 + .long 1067588354 + .long 859000219 + .long 1067696217 + .long 823158129 + .long 1067801953 + .long 871826232 + .long 1067905666 + .long 871183196 + .long 1068007450 + .long 839030530 + .long 1068107390 + .long 867690638 + .long 1068205570 + .long 840440923 + .long 1068302063 + .long 868033274 + .long 1068396942 + .long 855856030 + .long 1068490271 + .long 865094453 + .long 1068582113 + .long 860418487 + .long 1068672525 + .long 866225006 + .long 1068761562 + .long 866458226 + .long 1068849275 + .long 865124659 + .long 1068935712 + .long 864837702 + .long 1069020919 + .long 811742505 + .long 1069104937 + .long 869432099 + .long 1069187809 + .long 864584201 + .long 1069269572 + .long 864183978 + .long 1069350263 + .long 844810573 + .long 1069429915 + .long 869245699 + .long 1069508563 + .long 859556409 + .long 1069586236 + .long 870675446 + .long 1069662966 + .long 814190139 + .long 1069738778 + .long 870686941 + .long 1069813702 + .long 861800510 + .long 1069887762 + .long 855649163 + .long 1069960982 + .long 869347119 + .long 1070033387 + .long 864252033 + .long 1070104998 + .long 867276215 + .long 1070175837 + .long 868189817 + .long 1070245925 + .long 849541095 + .long 1070349689 + .long 866633177 + .long 1070485588 + .long 843967686 + .long 1070618808 + .long 857522493 + .long 1070749478 + .long 862339487 + .long 1070877717 + .long 850054662 + .long 1071003634 + .long 864048556 + .long 1071127332 + .long 868027089 + .long 1071248907 + .long 848093931 + .long 1071368446 + .long 865355299 + .long 1071486034 + .long 848111485 + .long 1071601747 + .long 865557362 + .long 1071715659 + .long 870297525 + .long 1071827839 + .long 863416216 + .long 1071938350 + .long 869675693 + .long 1072047254 + .long 865888071 + .long 1072154608 + .long 825332584 + .long 1072260465 + .long 843309506 + .long 1072364876 + .long 870885636 + .long 1072467891 + .long 869119784 + .long 1072569555 + .long 865466648 + .long 1072669911 + .long 867459244 + .long 1072769001 + .long 861192764 + .long 1072866863 + .long 871247716 + .long 1072963536 + .long 864927982 + .long 1073059054 + .long 869195129 + .long 1073153452 + .long 864849564 + .long 1073246762 + .long 840005936 + .long 1073339014 + .long 852579258 + .long 1073430238 + .long 860852782 + .long 1073520462 + .long 869711141 + .long 1073609714 + .long 862506141 + .long 1073698019 + .long 837959274 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 3173551943 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + .long 1031591658 + 
.long 1031591658 + .long 1031591658 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 3185806905 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 1051372203 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8388607 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 8257536 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212836864 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 3212967936 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 124 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 255 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 256 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 85 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 1 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2147483647 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 2155872256 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + .long 4278190079 + 
.long 4278190079 + .long 4278190079 + .type __scbrt_la_vscbrt_ha_cout_data,@object + .size __scbrt_la_vscbrt_ha_cout_data,1920 + .align 4 + +.FLT_36: + .long 0x007fffff + .type .FLT_36,@object + .size .FLT_36,4 + .align 4 + +.FLT_37: + .long 0x007e0000 + .type .FLT_37,@object + .size .FLT_37,4 + .align 4 + +.FLT_38: + .long 0xbf800000 + .type .FLT_38,@object + .size .FLT_38,4 + .align 4 + +.FLT_39: + .long 0xbf820000 + .type .FLT_39,@object + .size .FLT_39,4 + .align 4 + +.FLT_40: + .long 0x3eaaaaab + .type .FLT_40,@object + .size .FLT_40,4 + .align 4 + +.FLT_41: + .long 0xbde38e39 + .type .FLT_41,@object + .size .FLT_41,4 + .align 4 + +.FLT_42: + .long 0x3d7cd6ea + .type .FLT_42,@object + .size .FLT_42,4 + .align 4 + +.FLT_43: + .long 0xbd288f47 + .type .FLT_43,@object + .size .FLT_43,4 diff --git a/sysdeps/x86_64/fpu/svml_d_cbrt2_core.S b/sysdeps/x86_64/fpu/svml_d_cbrt2_core.S new file mode 100644 index 0000000000..4bf546564b --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_cbrt2_core.S @@ -0,0 +1,29 @@ +/* Function cbrt vectorized with SSE2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVbN2v_cbrt) +WRAPPER_IMPL_SSE2 cbrt +END (_ZGVbN2v_cbrt) + +#ifndef USE_MULTIARCH + libmvec_hidden_def (_ZGVbN2v_cbrt) +#endif diff --git a/sysdeps/x86_64/fpu/svml_d_cbrt4_core.S b/sysdeps/x86_64/fpu/svml_d_cbrt4_core.S new file mode 100644 index 0000000000..e6d1003e27 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_cbrt4_core.S @@ -0,0 +1,29 @@ +/* Function cbrt vectorized with AVX2, wrapper version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVdN4v_cbrt) +WRAPPER_IMPL_AVX _ZGVbN2v_cbrt +END (_ZGVdN4v_cbrt) + +#ifndef USE_MULTIARCH + libmvec_hidden_def (_ZGVdN4v_cbrt) +#endif diff --git a/sysdeps/x86_64/fpu/svml_d_cbrt4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_cbrt4_core_avx.S new file mode 100644 index 0000000000..70632869ac --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_cbrt4_core_avx.S @@ -0,0 +1,25 @@ +/* Function cbrt vectorized in AVX ISA as wrapper to SSE4 ISA version. 
+ Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVcN4v_cbrt) +WRAPPER_IMPL_AVX _ZGVbN2v_cbrt +END (_ZGVcN4v_cbrt) diff --git a/sysdeps/x86_64/fpu/svml_d_cbrt8_core.S b/sysdeps/x86_64/fpu/svml_d_cbrt8_core.S new file mode 100644 index 0000000000..37571673a7 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_cbrt8_core.S @@ -0,0 +1,25 @@ +/* Function cbrt vectorized with AVX-512, wrapper to AVX2. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVeN8v_cbrt) +WRAPPER_IMPL_AVX512 _ZGVdN4v_cbrt +END (_ZGVeN8v_cbrt) diff --git a/sysdeps/x86_64/fpu/svml_s_cbrtf16_core.S b/sysdeps/x86_64/fpu/svml_s_cbrtf16_core.S new file mode 100644 index 0000000000..1be6294026 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_s_cbrtf16_core.S @@ -0,0 +1,25 @@ +/* Function cbrtf vectorized with AVX-512. Wrapper to AVX2 version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include "svml_s_wrapper_impl.h" + + .text +ENTRY (_ZGVeN16v_cbrtf) +WRAPPER_IMPL_AVX512 _ZGVdN8v_cbrtf +END (_ZGVeN16v_cbrtf) diff --git a/sysdeps/x86_64/fpu/svml_s_cbrtf4_core.S b/sysdeps/x86_64/fpu/svml_s_cbrtf4_core.S new file mode 100644 index 0000000000..2469a100f4 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_s_cbrtf4_core.S @@ -0,0 +1,29 @@ +/* Function cbrtf vectorized with SSE2, wrapper version. + Copyright (C) 2021 Free Software Foundation, Inc. 
+ This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include "svml_s_wrapper_impl.h" + + .text +ENTRY (_ZGVbN4v_cbrtf) +WRAPPER_IMPL_SSE2 cbrtf +END (_ZGVbN4v_cbrtf) + +#ifndef USE_MULTIARCH + libmvec_hidden_def (_ZGVbN4v_cbrtf) +#endif diff --git a/sysdeps/x86_64/fpu/svml_s_cbrtf8_core.S b/sysdeps/x86_64/fpu/svml_s_cbrtf8_core.S new file mode 100644 index 0000000000..efedc22323 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_s_cbrtf8_core.S @@ -0,0 +1,29 @@ +/* Function cbrtf vectorized with AVX2, wrapper version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include "svml_s_wrapper_impl.h" + + .text +ENTRY (_ZGVdN8v_cbrtf) +WRAPPER_IMPL_AVX _ZGVbN4v_cbrtf +END (_ZGVdN8v_cbrtf) + +#ifndef USE_MULTIARCH + libmvec_hidden_def (_ZGVdN8v_cbrtf) +#endif diff --git a/sysdeps/x86_64/fpu/svml_s_cbrtf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_cbrtf8_core_avx.S new file mode 100644 index 0000000000..b5acc62426 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_s_cbrtf8_core_avx.S @@ -0,0 +1,25 @@ +/* Function cbrtf vectorized in AVX ISA as wrapper to SSE4 ISA version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
   */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN8v_cbrtf)
+WRAPPER_IMPL_AVX _ZGVbN4v_cbrtf
+END (_ZGVcN8v_cbrtf)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx.c
new file mode 100644
index 0000000000..c8bc643c99
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-cbrt.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx2.c
new file mode 100644
index 0000000000..c8bc643c99
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-cbrt.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx512f.c
new file mode 100644
index 0000000000..c8bc643c99
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-cbrt.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cbrt.c b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt.c
new file mode 100644
index 0000000000..fb3684b18c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC cbrt
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index 85b3129618..76dc92b983 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -34,6 +34,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asinh), _ZGVbN2v_asinh)
 VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVbN2v_atan)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVbN2vv_atan2)
 VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVbN2v_atanh)
+VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVbN2v_cbrt)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index e3e88fe268..e16abf5bb0 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -37,6 +37,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asinh), _ZGVdN4v_asinh)
 VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVdN4v_atan)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVdN4vv_atan2)
 VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVdN4v_atanh)
+VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVdN4v_cbrt)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index 6f81f13d37..84091a860f 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -34,6 +34,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asinh), _ZGVcN4v_asinh)
 VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVcN4v_atan)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVcN4vv_atan2)
 VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVcN4v_atanh)
+VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVcN4v_cbrt)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index 197ff12338..873d7aa9c8 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -34,6 +34,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asinh), _ZGVeN8v_asinh)
 VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVeN8v_atan)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVeN8vv_atan2)
 VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVeN8v_atanh)
+VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVeN8v_cbrt)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx.c
new file mode 100644
index 0000000000..59b8d77f71
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-cbrtf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx2.c
new file mode 100644
index 0000000000..59b8d77f71
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-cbrtf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx512f.c
new file mode 100644
index 0000000000..59b8d77f71
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-cbrtf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf.c b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf.c
new file mode 100644
index 0000000000..3a06ba79e0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC cbrtf
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index d21d943404..b4bccd8e84 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -34,6 +34,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asinhf), _ZGVeN16v_asinhf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVeN16v_atanf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVeN16vv_atan2f)
 VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVeN16v_atanhf)
+VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVeN16v_cbrtf)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index 3d24faf8dc..1aa2c920ed 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -34,6 +34,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asinhf), _ZGVbN4v_asinhf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVbN4v_atanf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVbN4vv_atan2f)
 VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVbN4v_atanhf)
+VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVbN4v_cbrtf)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index f176c1f4b0..2042aec59e 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -37,6 +37,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asinhf), _ZGVdN8v_asinhf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVdN8v_atanf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVdN8vv_atan2f)
 VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVdN8v_atanhf)
+VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVdN8v_cbrtf)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index 281eb58ad4..bb25393c57 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -34,6 +34,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asinhf), _ZGVcN8v_asinhf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVcN8v_atanf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVcN8vv_atan2f)
 VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVcN8v_atanhf)
+VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVcN8v_cbrtf)
 
 #define VEC_INT_TYPE __m128i
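
For reviewers who want the overall shape of the algorithm without reading the SVML assembly: the sketch below is an illustrative scalar C program, not part of the patch and not the code the vector kernels use.  It only mirrors the structure described by the comments in the _ZGVdN8v_cbrtf_avx2 body above (split the input into mantissa and exponent, write the exponent as 3*k + j, approximate the cube root of the mantissa, then scale by 2^k and cbrt(2^j)).  Where the vector code uses a reciprocal table, a cbrt lookup table indexed by j and the leading mantissa bits, and a short polynomial in the reduced argument r, the sketch substitutes a few Newton iterations.  The names cbrtf_sketch and cbrt2_pow are made up for illustration.

/* Illustrative scalar sketch only -- NOT the code added by this patch.  */
#include <math.h>
#include <stdio.h>

static float
cbrtf_sketch (float x)
{
  /* +-0, +-Inf and NaN are returned via x + x, which keeps the sign for
     zero/Inf and quiets signaling NaNs, much like the scalar rare path.  */
  if (x == 0.0f || !isfinite (x))
    return x + x;

  float ax = fabsf (x);
  int e;
  float m = frexpf (ax, &e);	/* ax = m * 2^e with m in [0.5, 1).  */

  /* Write the exponent as e = 3*k + j with j in {0, 1, 2}.  */
  int k = e / 3;
  int j = e - 3 * k;
  if (j < 0)
    {
      j += 3;
      k -= 1;
    }

  /* cbrt(2^j); the vector code folds this factor into its table.  */
  static const float cbrt2_pow[3] = { 1.0f, 1.2599210499f, 1.5874010520f };

  /* Crude initial guess for cbrt(m), refined by Newton steps
     y <- y - (y - m/y^2) / 3.  The vector code instead looks up a table
     value and applies a short polynomial correction in r.  */
  float y = 0.7f + 0.3f * m;
  for (int i = 0; i < 3; i++)
    y -= (y - m / (y * y)) / 3.0f;

  float r = ldexpf (y * cbrt2_pow[j], k);	/* Scale by 2^k * cbrt(2^j).  */
  return copysignf (r, x);			/* cbrt is odd.  */
}

int
main (void)
{
  for (float x = -8.0f; x <= 8.0f; x += 2.5f)
    printf ("cbrtf_sketch(%g) = %g  (libm cbrtf: %g)\n",
	    x, cbrtf_sketch (x), cbrtf (x));
  return 0;
}

The sketch is only meant to make the k/j exponent split and the final ldexpf scaling easy to follow; accuracy, special-case behaviour and performance of the patch are governed by the table-driven kernels and the __svml_scbrt_cout_rare_internal fallback above, and are exercised by the accuracy and ABI tests added in the wrapper files.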