Message ID | CAMXFM3tOpEoMb-oLoeSJNzOWYnJgte3bm_OR=oZ172n845WeZw@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Fri, 17 Apr 2015, Andrew Senkevich wrote: > My question is do we need to test this new SSE2 version (which is just > simple wrapper to scalar one)? Well, we need to test each entry point. One of the existing deficiencies of IFUNC testing (which would also apply to the CPU tests here) is that we don't test the different variants of libm functions for different processors - I don't think you need to address that in your patch. (So make sure the SSE2 entry point is tested, but which function that testing calls may depend on the processor.)
2015-04-24 20:37 GMT+03:00 Joseph Myers <joseph@codesourcery.com>: > On Fri, 17 Apr 2015, Andrew Senkevich wrote: > >> My question is do we need to test this new SSE2 version (which is just >> simple wrapper to scalar one)? > > Well, we need to test each entry point. One of the existing deficiencies > of IFUNC testing (which would also apply to the CPU tests here) is that we > don't test the different variants of libm functions for different > processors - I don't think you need to address that in your patch. (So > make sure the SSE2 entry point is tested, but which function that testing > calls may depend on the processor.) Yes, if not build separate test for SSE2 entry point then SSE2 version will be tested only on according processor if called through IFUNC. Do we need to move to multiarch folder implementations with IFUNC? -- WBR, Andrew
On Fri, 24 Apr 2015, Andrew Senkevich wrote: > 2015-04-24 20:37 GMT+03:00 Joseph Myers <joseph@codesourcery.com>: > > On Fri, 17 Apr 2015, Andrew Senkevich wrote: > > > >> My question is do we need to test this new SSE2 version (which is just > >> simple wrapper to scalar one)? > > > > Well, we need to test each entry point. One of the existing deficiencies > > of IFUNC testing (which would also apply to the CPU tests here) is that we > > don't test the different variants of libm functions for different > > processors - I don't think you need to address that in your patch. (So > > make sure the SSE2 entry point is tested, but which function that testing > > calls may depend on the processor.) > > Yes, if not build separate test for SSE2 entry point then SSE2 version > will be tested only on according processor if called through IFUNC. > > Do we need to move to multiarch folder implementations with IFUNC? In general use of IFUNCs is the preferred way of handling different function implementations for different CPU variants, in the absence of a clear reason to do it differently in a particular case.
diff --git a/sysdeps/x86_64/fpu/svml_d_cos2_core.S b/sysdeps/x86_64/fpu/svml_d_cos2_core.S index 11d9c94..ccd0969 100644 --- a/sysdeps/x86_64/fpu/svml_d_cos2_core.S +++ b/sysdeps/x86_64/fpu/svml_d_cos2_core.S @@ -1,4 +1,4 @@ -/* Function cos vectorized with SSE4. +/* Function cos vectorized with SSE2 and SSE4. Copyright (C) 2014 Free Software Foundation, Inc. This file is part of the GNU C Library. @@ -17,22 +17,51 @@ <http://www.gnu.org/licenses/>. */ #include <sysdep.h> +#include <init-arch.h> #include "svml_d_cos_data.h" .text ENTRY(_ZGVbN2v_cos) + .type _ZGVbN2v_cos, @gnu_indirect_function + cmpl $0, KIND_OFFSET+__cpu_features(%rip) + jne 1f + call __init_cpu_features +1: leaq _ZGVbN2v_cos_sse4(%rip), %rax + testl $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip) + jz 2f + ret +2: leaq _ZGVbN2v_cos_sse2(%rip), %rax + ret +END(_ZGVbN2v_cos) + +ENTRY(_ZGVbN2v_cos_sse2) +/* SSE2 version as wrapper to scalar. */ + subq $40, %rsp + movaps %xmm0, (%rsp) + call cos@PLT + movsd %xmm0, 16(%rsp) + movsd 8(%rsp), %xmm0 + call cos@PLT + movsd 16(%rsp), %xmm1 + movsd %xmm0, 24(%rsp) + unpcklpd %xmm0, %xmm1 + movaps %xmm1, %xmm0 + addq $40, %rsp + ret +END(_ZGVbN2v_cos_sse2) +ENTRY(_ZGVbN2v_cos_sse4) /* ALGORITHM DESCRIPTION: * * ( low accuracy ( < 4ulp ) or enhanced performance ( half of correct @@ -206,4 +235,4 @@ ENTRY(_ZGVbN2v_cos) movsd %xmm0, 256(%rsp,%r15) jmp .LBL_1_7 -END(_ZGVbN2v_cos) +END(_ZGVbN2v_cos_sse4) My question is do we need to test this new SSE2 version (which is just simple wrapper to scalar one)? If yes, is it ok to have according test linked against libmvec.a (in libmvec.so this SSE2 implementation is hidden and no need to make it