diff mbox series

[v2,libatomic] : Handle AVX+CX16 ZHAOXIN like intel for 16b atomic [PR104688]

Message ID 20240718072305.3609-1-MayShao-oc@zhaoxin.com
State New
Headers show
Series [v2,libatomic] : Handle AVX+CX16 ZHAOXIN like intel for 16b atomic [PR104688] | expand

Commit Message

MayShao-oc July 18, 2024, 7:23 a.m. UTC
From: mayshao <mayshao-oc@zhaoxin.com>

Hi Jakub:

    Thanks for your review,We should just amend this to handle Zhaoxin.

    Bootstrapped /regtested X86_64.

    Ok for trunk?
BR
Mayshao

libatomic/ChangeLog:

	PR target/104688
	* config/x86/init.c (__libat_feat1_init): Don't clear
	bit_AVX on ZHAOXIN CPUs.
---
 libatomic/config/x86/init.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Comments

Jakub Jelinek July 18, 2024, 7:29 a.m. UTC | #1
On Thu, Jul 18, 2024 at 03:23:05PM +0800, MayShao-oc wrote:
> From: mayshao <mayshao-oc@zhaoxin.com>
> 
> Hi Jakub:
> 
>     Thanks for your review,We should just amend this to handle Zhaoxin.
> 
>     Bootstrapped /regtested X86_64.
> 
>     Ok for trunk?
> BR
> Mayshao
> 
> libatomic/ChangeLog:
> 
> 	PR target/104688
> 	* config/x86/init.c (__libat_feat1_init): Don't clear
> 	bit_AVX on ZHAOXIN CPUs.
> ---
>  libatomic/config/x86/init.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/libatomic/config/x86/init.c b/libatomic/config/x86/init.c
> index a75be3f175c..0d6864909bb 100644
> --- a/libatomic/config/x86/init.c
> +++ b/libatomic/config/x86/init.c
> @@ -39,12 +39,15 @@ __libat_feat1_init (void)
>        == (bit_AVX | bit_CMPXCHG16B))
>      {
>        /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
> -	 is atomic, and AMD is going to do something similar soon.
> -	 We don't have a guarantee from vendors of other CPUs with AVX,
> -	 like Zhaoxin and VIA.  */
> -      unsigned int ecx2 = 0;
> +	 is atomic, and AMD is going to do something similar soon. Zhaoxin also

Two spaces before Zhaoxin (and also should go on another line).

> +	 guarantees this. We don't have a guarantee from vendors of other CPUs 

Two spaces before We (and again, the line will be too long).

> +	 with AVX,like VIA.  */

Space before like

> +      unsigned int ecx2 = 0, family = 0;
> +      family = (eax >> 8) & 0x0f;
>        __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
> -      if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
> +      if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx

If the whole condition can't fit on one line, then each subcondition should
be on a separate line, so linebreak + indentation should be added also
before && ecx2 != signature_AMD_ecx

> +          && !(ecx2 == signature_CENTAUR_ecx && family > 0x6)
> +          && ecx2 != signature_SHANGHAI_ecx)
>  	FEAT1_REGISTER &= ~bit_AVX;
>      }
>  #endif
> -- 
> 2.27.0

Otherwise LGTM, but please give Uros a day or two to chime in.

	Jakub
Uros Bizjak July 18, 2024, 7:34 a.m. UTC | #2
On Thu, Jul 18, 2024 at 9:29 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Jul 18, 2024 at 03:23:05PM +0800, MayShao-oc wrote:
> > From: mayshao <mayshao-oc@zhaoxin.com>
> >
> > Hi Jakub:
> >
> >     Thanks for your review,We should just amend this to handle Zhaoxin.
> >
> >     Bootstrapped /regtested X86_64.
> >
> >     Ok for trunk?
> > BR
> > Mayshao
> >
> > libatomic/ChangeLog:
> >
> >       PR target/104688
> >       * config/x86/init.c (__libat_feat1_init): Don't clear
> >       bit_AVX on ZHAOXIN CPUs.
> > ---
> >  libatomic/config/x86/init.c | 13 ++++++++-----
> >  1 file changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/libatomic/config/x86/init.c b/libatomic/config/x86/init.c
> > index a75be3f175c..0d6864909bb 100644
> > --- a/libatomic/config/x86/init.c
> > +++ b/libatomic/config/x86/init.c
> > @@ -39,12 +39,15 @@ __libat_feat1_init (void)
> >        == (bit_AVX | bit_CMPXCHG16B))
> >      {
> >        /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
> > -      is atomic, and AMD is going to do something similar soon.
> > -      We don't have a guarantee from vendors of other CPUs with AVX,
> > -      like Zhaoxin and VIA.  */
> > -      unsigned int ecx2 = 0;
> > +      is atomic, and AMD is going to do something similar soon. Zhaoxin also
>
> Two spaces before Zhaoxin (and also should go on another line).
>
> > +      guarantees this. We don't have a guarantee from vendors of other CPUs
>
> Two spaces before We (and again, the line will be too long).
>
> > +      with AVX,like VIA.  */
>
> Space before like
>
> > +      unsigned int ecx2 = 0, family = 0;

No need to initialize these two variables. Please also add one line of
vertical space after variable declarations.

OK with the above change and with Jakub's proposed formatting changes.

Thanks,
Uros.

> > +      family = (eax >> 8) & 0x0f;
> >        __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
> > -      if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
> > +      if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx
>
> If the whole condition can't fit on one line, then each subcondition should
> be on a separate line, so linebreak + indentation should be added also
> before && ecx2 != signature_AMD_ecx
>
> > +          && !(ecx2 == signature_CENTAUR_ecx && family > 0x6)
> > +          && ecx2 != signature_SHANGHAI_ecx)
> >       FEAT1_REGISTER &= ~bit_AVX;
> >      }
> >  #endif
> > --
> > 2.27.0
>
> Otherwise LGTM, but please give Uros a day or two to chime in.
>
>         Jakub
>
Jakub Jelinek July 18, 2024, 7:49 a.m. UTC | #3
On Thu, Jul 18, 2024 at 09:34:14AM +0200, Uros Bizjak wrote:
> > > +      unsigned int ecx2 = 0, family = 0;
> 
> No need to initialize these two variables.

The function ignores __get_cpuid result, so at least the
FEAT1_REGISTER = 0; is needed before the first __get_cpuid.
Do you mean the ecx2 = 0 initialization is useless because
__get_cpuid (0, ...) on x86_64 will always succeed (especially
when __get_cpuid (1, ...) had to succeed otherwise FEAT1_REGISTER
would be zero)?
I guess that is true, but won't that cause -Wmaybe-uninitialized warnings?

I agree initializing family to 0 is not needed, but I don't understand
why it isn't just
      unsigned family = (eax >> 8) & 0x0f;
Though, guess even that might fail with -Wmaybe-uninitialized too, as
eax isn't unconditionally initialized.

	Jakub
Uros Bizjak July 18, 2024, 8:12 a.m. UTC | #4
On Thu, Jul 18, 2024 at 9:50 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Jul 18, 2024 at 09:34:14AM +0200, Uros Bizjak wrote:
> > > > +      unsigned int ecx2 = 0, family = 0;
> >
> > No need to initialize these two variables.
>
> The function ignores __get_cpuid result, so at least the
> FEAT1_REGISTER = 0; is needed before the first __get_cpuid.
> Do you mean the ecx2 = 0 initialization is useless because
> __get_cpuid (0, ...) on x86_64 will always succeed (especially
> when __get_cpuid (1, ...) had to succeed otherwise FEAT1_REGISTER
> would be zero)?
> I guess that is true, but won't that cause -Wmaybe-uninitialized warnings?

Yes, if the __get_cpuid (1, ...) works OK, then we are sure that
__get_cpuid (0, ...) will also work.

> I agree initializing family to 0 is not needed, but I don't understand
> why it isn't just
>       unsigned family = (eax >> 8) & 0x0f;
> Though, guess even that might fail with -Wmaybe-uninitialized too, as
> eax isn't unconditionally initialized.

Perhaps we should check the result of __get_cpuid (1, ...) and use eax
only if the function returns 1? IMO, this would solve the
uninitialized issue, and we could use __cpuid in the second case (we
would know that leaf 0 is supported, because leaf 1 support was
checked with __get_cpuid (1, ...)).

Uros.
Jakub Jelinek July 18, 2024, 8:21 a.m. UTC | #5
On Thu, Jul 18, 2024 at 10:12:46AM +0200, Uros Bizjak wrote:
> On Thu, Jul 18, 2024 at 9:50 AM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Thu, Jul 18, 2024 at 09:34:14AM +0200, Uros Bizjak wrote:
> > > > > +      unsigned int ecx2 = 0, family = 0;
> > >
> > > No need to initialize these two variables.
> >
> > The function ignores __get_cpuid result, so at least the
> > FEAT1_REGISTER = 0; is needed before the first __get_cpuid.
> > Do you mean the ecx2 = 0 initialization is useless because
> > __get_cpuid (0, ...) on x86_64 will always succeed (especially
> > when __get_cpuid (1, ...) had to succeed otherwise FEAT1_REGISTER
> > would be zero)?
> > I guess that is true, but won't that cause -Wmaybe-uninitialized warnings?
> 
> Yes, if the __get_cpuid (1, ...) works OK, then we are sure that
> __get_cpuid (0, ...) will also work.
> 
> > I agree initializing family to 0 is not needed, but I don't understand
> > why it isn't just
> >       unsigned family = (eax >> 8) & 0x0f;
> > Though, guess even that might fail with -Wmaybe-uninitialized too, as
> > eax isn't unconditionally initialized.
> 
> Perhaps we should check the result of __get_cpuid (1, ...) and use eax
> only if the function returns 1? IMO, this would solve the
> uninitialized issue, and we could use __cpuid in the second case (we
> would know that leaf 0 is supported, because leaf 1 support was
> checked with __get_cpuid (1, ...)).

We know the code is ok if FEAT1_REGISTER = 0; is done before __get_cpuid (1,
...).
Everything else is implied from it, all we need to ensure is that
-Wmaybe-uninitialized is happy about it.
Whatever doesn't report the warning and ideally doesn't increase the size of
the function.
I think the reason it is written the way it is before the AVX hacks in it
is that we need to handle even the case when __get_cpuid (1, ...) returns 0,
and we want in that case FEAT1_REGISTER = 0.
So it could be
  FEAT1_REGISTER = 0;
#ifdef __x86_64__
  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx)
      && (FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
	  == (bit_AVX | bit_CMPXCHG16B))
    {
...
    }
#else
  __get_cpuid (1, &eax, &ebx, &ecx, &edx);
#endif
etc.

	Jakub
Uros Bizjak July 18, 2024, 8:31 a.m. UTC | #6
On Thu, Jul 18, 2024 at 10:21 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Jul 18, 2024 at 10:12:46AM +0200, Uros Bizjak wrote:
> > On Thu, Jul 18, 2024 at 9:50 AM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Thu, Jul 18, 2024 at 09:34:14AM +0200, Uros Bizjak wrote:
> > > > > > +      unsigned int ecx2 = 0, family = 0;
> > > >
> > > > No need to initialize these two variables.
> > >
> > > The function ignores __get_cpuid result, so at least the
> > > FEAT1_REGISTER = 0; is needed before the first __get_cpuid.
> > > Do you mean the ecx2 = 0 initialization is useless because
> > > __get_cpuid (0, ...) on x86_64 will always succeed (especially
> > > when __get_cpuid (1, ...) had to succeed otherwise FEAT1_REGISTER
> > > would be zero)?
> > > I guess that is true, but won't that cause -Wmaybe-uninitialized warnings?
> >
> > Yes, if the __get_cpuid (1, ...) works OK, then we are sure that
> > __get_cpuid (0, ...) will also work.
> >
> > > I agree initializing family to 0 is not needed, but I don't understand
> > > why it isn't just
> > >       unsigned family = (eax >> 8) & 0x0f;
> > > Though, guess even that might fail with -Wmaybe-uninitialized too, as
> > > eax isn't unconditionally initialized.
> >
> > Perhaps we should check the result of __get_cpuid (1, ...) and use eax
> > only if the function returns 1? IMO, this would solve the
> > uninitialized issue, and we could use __cpuid in the second case (we
> > would know that leaf 0 is supported, because leaf 1 support was
> > checked with __get_cpuid (1, ...)).
>
> We know the code is ok if FEAT1_REGISTER = 0; is done before __get_cpuid (1,
> ...).
> Everything else is implied from it, all we need to ensure is that
> -Wmaybe-uninitialized is happy about it.
> Whatever doesn't report the warning and ideally doesn't increase the size of
> the function.
> I think the reason it is written the way it is before the AVX hacks in it
> is that we need to handle even the case when __get_cpuid (1, ...) returns 0,
> and we want in that case FEAT1_REGISTER = 0.
> So it could be

Yes, I think this is better, see below.

>   FEAT1_REGISTER = 0;
> #ifdef __x86_64__
>   if (__get_cpuid (1, &eax, &ebx, &ecx, &edx)
>       && (FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
>           == (bit_AVX | bit_CMPXCHG16B))
>     {

Here we can simply use

unsigned int family = (eax >> 8) & 0x0f;
unsigned int ecx2;

__cpuid (0, eax, ebx, ecx2, edx);

if (ecx2 ...)

> ...
>     }
> #else
>   __get_cpuid (1, &eax, &ebx, &ecx, &edx);
> #endif
> etc.
>
>         Jakub

Uros.
Uros Bizjak July 18, 2024, 11:57 a.m. UTC | #7
On Thu, Jul 18, 2024 at 10:31 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Thu, Jul 18, 2024 at 10:21 AM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Thu, Jul 18, 2024 at 10:12:46AM +0200, Uros Bizjak wrote:
> > > On Thu, Jul 18, 2024 at 9:50 AM Jakub Jelinek <jakub@redhat.com> wrote:
> > > >
> > > > On Thu, Jul 18, 2024 at 09:34:14AM +0200, Uros Bizjak wrote:
> > > > > > > +      unsigned int ecx2 = 0, family = 0;
> > > > >
> > > > > No need to initialize these two variables.
> > > >
> > > > The function ignores __get_cpuid result, so at least the
> > > > FEAT1_REGISTER = 0; is needed before the first __get_cpuid.
> > > > Do you mean the ecx2 = 0 initialization is useless because
> > > > __get_cpuid (0, ...) on x86_64 will always succeed (especially
> > > > when __get_cpuid (1, ...) had to succeed otherwise FEAT1_REGISTER
> > > > would be zero)?
> > > > I guess that is true, but won't that cause -Wmaybe-uninitialized warnings?
> > >
> > > Yes, if the __get_cpuid (1, ...) works OK, then we are sure that
> > > __get_cpuid (0, ...) will also work.
> > >
> > > > I agree initializing family to 0 is not needed, but I don't understand
> > > > why it isn't just
> > > >       unsigned family = (eax >> 8) & 0x0f;
> > > > Though, guess even that might fail with -Wmaybe-uninitialized too, as
> > > > eax isn't unconditionally initialized.
> > >
> > > Perhaps we should check the result of __get_cpuid (1, ...) and use eax
> > > only if the function returns 1? IMO, this would solve the
> > > uninitialized issue, and we could use __cpuid in the second case (we
> > > would know that leaf 0 is supported, because leaf 1 support was
> > > checked with __get_cpuid (1, ...)).
> >
> > We know the code is ok if FEAT1_REGISTER = 0; is done before __get_cpuid (1,
> > ...).
> > Everything else is implied from it, all we need to ensure is that
> > -Wmaybe-uninitialized is happy about it.
> > Whatever doesn't report the warning and ideally doesn't increase the size of
> > the function.
> > I think the reason it is written the way it is before the AVX hacks in it
> > is that we need to handle even the case when __get_cpuid (1, ...) returns 0,
> > and we want in that case FEAT1_REGISTER = 0.
> > So it could be
>
> Yes, I think this is better, see below.
>
> >   FEAT1_REGISTER = 0;
> > #ifdef __x86_64__
> >   if (__get_cpuid (1, &eax, &ebx, &ecx, &edx)
> >       && (FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
> >           == (bit_AVX | bit_CMPXCHG16B))
> >     {
>
> Here we can simply use

Attached patch illustrates the proposed improvement with nested cpuid
calls. Bootstrapped and teased with libatomic testsuite.

Jakub, WDYT?

Uros.
diff --git a/libatomic/config/x86/init.c b/libatomic/config/x86/init.c
index a75be3f175c..94d45683567 100644
--- a/libatomic/config/x86/init.c
+++ b/libatomic/config/x86/init.c
@@ -32,22 +32,25 @@ unsigned int
 __libat_feat1_init (void)
 {
   unsigned int eax, ebx, ecx, edx;
-  FEAT1_REGISTER = 0;
-  __get_cpuid (1, &eax, &ebx, &ecx, &edx);
-#ifdef __x86_64__
-  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
-      == (bit_AVX | bit_CMPXCHG16B))
+  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
     {
-      /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
-	 is atomic, and AMD is going to do something similar soon.
-	 We don't have a guarantee from vendors of other CPUs with AVX,
-	 like Zhaoxin and VIA.  */
-      unsigned int ecx2 = 0;
-      __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
-      if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
-	FEAT1_REGISTER &= ~bit_AVX;
-    }
+#ifdef __x86_64__
+      if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
+	  == (bit_AVX | bit_CMPXCHG16B))
+	{
+	  /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned
+	     address is atomic, and AMD is going to do something similar soon.
+	     We don't have a guarantee from vendors of other CPUs with AVX,
+	     like Zhaoxin and VIA.  */
+	  unsigned int ecx2;
+	  __cpuid (0, eax, ebx, ecx2, edx);
+	  if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
+	    FEAT1_REGISTER &= ~bit_AVX;
+	}
 #endif
+    }
+  else
+    FEAT1_REGISTER = 0;
   /* See the load in load_feat1.  */
   __atomic_store_n (&__libat_feat1, FEAT1_REGISTER, __ATOMIC_RELAXED);
   return FEAT1_REGISTER;
Jakub Jelinek July 18, 2024, 12:07 p.m. UTC | #8
On Thu, Jul 18, 2024 at 01:57:11PM +0200, Uros Bizjak wrote:
> Attached patch illustrates the proposed improvement with nested cpuid
> calls. Bootstrapped and teased with libatomic testsuite.
> 
> Jakub, WDYT?

I'd probably keep the FEAT1_REGISTER = 0; before the if (__get_cpuid (1, ...)
to avoid the else, I think that could result in smaller code, but otherwise
LGTM, especially the use of just __cpuid there.  And note your patch doesn't
incorporate the Zhaoxin changes.

> diff --git a/libatomic/config/x86/init.c b/libatomic/config/x86/init.c
> index a75be3f175c..94d45683567 100644
> --- a/libatomic/config/x86/init.c
> +++ b/libatomic/config/x86/init.c
> @@ -32,22 +32,25 @@ unsigned int
>  __libat_feat1_init (void)
>  {
>    unsigned int eax, ebx, ecx, edx;
> -  FEAT1_REGISTER = 0;
> -  __get_cpuid (1, &eax, &ebx, &ecx, &edx);
> -#ifdef __x86_64__
> -  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
> -      == (bit_AVX | bit_CMPXCHG16B))
> +  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
>      {
> -      /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
> -	 is atomic, and AMD is going to do something similar soon.
> -	 We don't have a guarantee from vendors of other CPUs with AVX,
> -	 like Zhaoxin and VIA.  */
> -      unsigned int ecx2 = 0;
> -      __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
> -      if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
> -	FEAT1_REGISTER &= ~bit_AVX;
> -    }
> +#ifdef __x86_64__
> +      if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
> +	  == (bit_AVX | bit_CMPXCHG16B))
> +	{
> +	  /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned
> +	     address is atomic, and AMD is going to do something similar soon.
> +	     We don't have a guarantee from vendors of other CPUs with AVX,
> +	     like Zhaoxin and VIA.  */
> +	  unsigned int ecx2;
> +	  __cpuid (0, eax, ebx, ecx2, edx);
> +	  if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
> +	    FEAT1_REGISTER &= ~bit_AVX;
> +	}
>  #endif
> +    }
> +  else
> +    FEAT1_REGISTER = 0;
>    /* See the load in load_feat1.  */
>    __atomic_store_n (&__libat_feat1, FEAT1_REGISTER, __ATOMIC_RELAXED);
>    return FEAT1_REGISTER;


	Jakub
Uros Bizjak July 18, 2024, 12:20 p.m. UTC | #9
On Thu, Jul 18, 2024 at 2:07 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Jul 18, 2024 at 01:57:11PM +0200, Uros Bizjak wrote:
> > Attached patch illustrates the proposed improvement with nested cpuid
> > calls. Bootstrapped and teased with libatomic testsuite.
> >
> > Jakub, WDYT?
>
> I'd probably keep the FEAT1_REGISTER = 0; before the if (__get_cpuid (1, ...)
> to avoid the else, I think that could result in smaller code, but otherwise

OK, I'll keep the initialization this way.

> LGTM, especially the use of just __cpuid there.  And note your patch doesn't
> incorporate the Zhaoxin changes.

This will be a separate patch.

Thanks,
Uros.

>
> > diff --git a/libatomic/config/x86/init.c b/libatomic/config/x86/init.c
> > index a75be3f175c..94d45683567 100644
> > --- a/libatomic/config/x86/init.c
> > +++ b/libatomic/config/x86/init.c
> > @@ -32,22 +32,25 @@ unsigned int
> >  __libat_feat1_init (void)
> >  {
> >    unsigned int eax, ebx, ecx, edx;
> > -  FEAT1_REGISTER = 0;
> > -  __get_cpuid (1, &eax, &ebx, &ecx, &edx);
> > -#ifdef __x86_64__
> > -  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
> > -      == (bit_AVX | bit_CMPXCHG16B))
> > +  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
> >      {
> > -      /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
> > -      is atomic, and AMD is going to do something similar soon.
> > -      We don't have a guarantee from vendors of other CPUs with AVX,
> > -      like Zhaoxin and VIA.  */
> > -      unsigned int ecx2 = 0;
> > -      __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
> > -      if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
> > -     FEAT1_REGISTER &= ~bit_AVX;
> > -    }
> > +#ifdef __x86_64__
> > +      if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
> > +       == (bit_AVX | bit_CMPXCHG16B))
> > +     {
> > +       /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned
> > +          address is atomic, and AMD is going to do something similar soon.
> > +          We don't have a guarantee from vendors of other CPUs with AVX,
> > +          like Zhaoxin and VIA.  */
> > +       unsigned int ecx2;
> > +       __cpuid (0, eax, ebx, ecx2, edx);
> > +       if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
> > +         FEAT1_REGISTER &= ~bit_AVX;
> > +     }
> >  #endif
> > +    }
> > +  else
> > +    FEAT1_REGISTER = 0;
> >    /* See the load in load_feat1.  */
> >    __atomic_store_n (&__libat_feat1, FEAT1_REGISTER, __ATOMIC_RELAXED);
> >    return FEAT1_REGISTER;
>
>
>         Jakub
>
diff mbox series

Patch

diff --git a/libatomic/config/x86/init.c b/libatomic/config/x86/init.c
index a75be3f175c..0d6864909bb 100644
--- a/libatomic/config/x86/init.c
+++ b/libatomic/config/x86/init.c
@@ -39,12 +39,15 @@  __libat_feat1_init (void)
       == (bit_AVX | bit_CMPXCHG16B))
     {
       /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
-	 is atomic, and AMD is going to do something similar soon.
-	 We don't have a guarantee from vendors of other CPUs with AVX,
-	 like Zhaoxin and VIA.  */
-      unsigned int ecx2 = 0;
+	 is atomic, and AMD is going to do something similar soon. Zhaoxin also
+	 guarantees this. We don't have a guarantee from vendors of other CPUs 
+	 with AVX,like VIA.  */
+      unsigned int ecx2 = 0, family = 0;
+      family = (eax >> 8) & 0x0f;
       __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
-      if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
+      if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx
+          && !(ecx2 == signature_CENTAUR_ecx && family > 0x6)
+          && ecx2 != signature_SHANGHAI_ecx)
 	FEAT1_REGISTER &= ~bit_AVX;
     }
 #endif