diff mbox series

meson: ensure we enable CMPXCHG128 on x86_64

Message ID 20241004220123.978938-1-pierrick.bouvier@linaro.org
State New
Headers show
Series meson: ensure we enable CMPXCHG128 on x86_64 | expand

Commit Message

Pierrick Bouvier Oct. 4, 2024, 10:01 p.m. UTC
Alex discovered that CMPXCHG128 was not enabled when building for
x86_64, resulting in slow execution for wide atomic instructions,
creating a huge contention when combined with a high number of cpus
(found while booting android aarch64 guest on x86_64 host).

The problem is that even though we enable -mcx16 option for x86_64, this
is not used when testing for CMPXCHG128. Thus, we silently turn it off.

x86_64 is the only architecture adding machine flags for now, so the
problem is limited to this host architecture.

Meson compiler tests are supposed to be independent of environment flags
(https://mesonbuild.com/Reference-manual_returned_compiler.html#returned-by).
However, CFLAGS are used anyway, thus masking the problem when using
something like CFLAGS='-march=native'. This is a meson bug and was reported:
https://github.com/mesonbuild/meson/issues/13757

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 meson.build | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Michael Tokarev Oct. 5, 2024, 4:16 p.m. UTC | #1
05.10.2024 01:01, Pierrick Bouvier wrote:
> Alex discovered that CMPXCHG128 was not enabled when building for
> x86_64, resulting in slow execution for wide atomic instructions,
> creating a huge contention when combined with a high number of cpus
> (found while booting android aarch64 guest on x86_64 host).
> 
> The problem is that even though we enable -mcx16 option for x86_64, this
> is not used when testing for CMPXCHG128. Thus, we silently turn it off.
> 
> x86_64 is the only architecture adding machine flags for now, so the
> problem is limited to this host architecture.
> 
> Meson compiler tests are supposed to be independent of environment flags
> (https://mesonbuild.com/Reference-manual_returned_compiler.html#returned-by).
> However, CFLAGS are used anyway, thus masking the problem when using
> something like CFLAGS='-march=native'. This is a meson bug and was reported:
> https://github.com/mesonbuild/meson/issues/13757
> 
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
>   meson.build | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/meson.build b/meson.build
> index b18c2a54ab5..af2ce595dcc 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -2867,6 +2867,13 @@ if has_int128_type
>       config_host_data.set('CONFIG_ATOMIC128_OPT', has_atomic128_opt)
>   
>       if not has_atomic128_opt
> +
> +      host_flags = []
> +      if host_arch == 'x86_64'
> +        # for x86_64, x86_version must be >= 1, and we always enable cmpxchg16
> +        # in this case.
> +        host_flags += ['-mcx16']
> +      endif
>         config_host_data.set('CONFIG_CMPXCHG128', cc.links('''
>           int main(void)
>           {
> @@ -2874,7 +2881,8 @@ if has_int128_type
>             __sync_val_compare_and_swap_16(&x, y, x);
>             return 0;
>           }
> -      '''))
> +      ''',
> +      args: host_flags))
>       endif
>     endif
>   endif


This does not look right.  We test with an extra compiler flag, we should
ensure it is actually enabled for the actual build.  Here, we only test
if cmpxchg128 works with this flag appended, but do not enable if for
the build.  It just happens we have it enabled elsewhere..

Maybe we should add this flags somewhere to $qemu_cflags in this place
if the test were successful.  With the proposed variant it is confusing
and works just because it happens to match in a few places, not because
it is supposed to work :)

Besides, in the current situation where CONFIG_CMPXCHG128 is not defined
due to this bug, the final link fails due to generated calls to -latomic, -
which might mean we have something else wrong.

FWIW.

/mjt
Richard Henderson Oct. 5, 2024, 4:20 p.m. UTC | #2
On 10/5/24 09:16, Michael Tokarev wrote:
> Besides, in the current situation where CONFIG_CMPXCHG128 is not defined
> due to this bug, the final link fails due to generated calls to -latomic, -
> which might mean we have something else wrong.

No, it doesn't fail to link.  That's why it took so long to notice.


r~
Michael Tokarev Oct. 5, 2024, 4:22 p.m. UTC | #3
05.10.2024 19:20, Richard Henderson wrote:
> On 10/5/24 09:16, Michael Tokarev wrote:
>> Besides, in the current situation where CONFIG_CMPXCHG128 is not defined
>> due to this bug, the final link fails due to generated calls to -latomic, -
>> which might mean we have something else wrong.
> 
> No, it doesn't fail to link.  That's why it took so long to notice.

It does here quite often during bisection between 9.0 and 9.1.
Lemme see..

/mjt
Michael Tokarev Oct. 5, 2024, 4:30 p.m. UTC | #4
05.10.2024 19:22, Michael Tokarev wrote:
> 05.10.2024 19:20, Richard Henderson wrote:
>> No, it doesn't fail to link.  That's why it took so long to notice.
> 
> It does here quite often during bisection between 9.0 and 9.1.
> Lemme see..

So, it is just 2 steps:

$ git bisect start v9.0.0 v9.1.0   # at 2529ea2d561ea9fe359fb19ebdcfeb8b6cddd219
$ configure --target-list=x86_64-softmmu
$ ninja qemu-system-x86_64   # this one will work
$ git bisect bad    # at 544595e73007c824b7435b52519cc578586783a6
$ ninja qemu-system-x86_64   # this breaks:

[2/2] Linking target qemu-system-x86_64
FAILED: qemu-system-x86_64
cc -m64 @qemu-system-x86_64.rsp
/usr/bin/ld: libqemu-x86_64-softmmu.fa.p/accel_tcg_cputlb.c.o: in function `atomic16_read_rw':
/build/qemu/master/host/include/x86_64/host/atomic128-ldst.h:51: undefined reference to `__sync_val_compare_and_swap_16'
/usr/bin/ld: /build/qemu/master/host/include/x86_64/host/atomic128-ldst.h:51: undefined reference to `__sync_val_compare_and_swap_16'
/usr/bin/ld: /build/qemu/master/host/include/x86_64/host/atomic128-ldst.h:51: undefined reference to `__sync_val_compare_and_swap_16'
/usr/bin/ld: /build/qemu/master/host/include/x86_64/host/atomic128-ldst.h:51: undefined reference to `__sync_val_compare_and_swap_16'
/usr/bin/ld: /build/qemu/master/host/include/x86_64/host/atomic128-ldst.h:51: undefined reference to `__sync_val_compare_and_swap_16'
/usr/bin/ld: libqemu-x86_64-softmmu.fa.p/accel_tcg_cputlb.c.o: in function `atomic16_set':
/build/qemu/master/host/include/x86_64/host/atomic128-ldst.h:67: undefined reference to `__sync_bool_compare_and_swap_16'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

So it requires second step.
configure is re-run at the second build. So we're holding
some data between rebuilds which we shouldn't.

/mjt
Pierrick Bouvier Oct. 5, 2024, 5:34 p.m. UTC | #5
On 10/5/24 09:16, Michael Tokarev wrote:
> 05.10.2024 01:01, Pierrick Bouvier wrote:
>> Alex discovered that CMPXCHG128 was not enabled when building for
>> x86_64, resulting in slow execution for wide atomic instructions,
>> creating a huge contention when combined with a high number of cpus
>> (found while booting android aarch64 guest on x86_64 host).
>>
>> The problem is that even though we enable -mcx16 option for x86_64, this
>> is not used when testing for CMPXCHG128. Thus, we silently turn it off.
>>
>> x86_64 is the only architecture adding machine flags for now, so the
>> problem is limited to this host architecture.
>>
>> Meson compiler tests are supposed to be independent of environment flags
>> (https://mesonbuild.com/Reference-manual_returned_compiler.html#returned-by).
>> However, CFLAGS are used anyway, thus masking the problem when using
>> something like CFLAGS='-march=native'. This is a meson bug and was reported:
>> https://github.com/mesonbuild/meson/issues/13757
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>>    meson.build | 10 +++++++++-
>>    1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/meson.build b/meson.build
>> index b18c2a54ab5..af2ce595dcc 100644
>> --- a/meson.build
>> +++ b/meson.build
>> @@ -2867,6 +2867,13 @@ if has_int128_type
>>        config_host_data.set('CONFIG_ATOMIC128_OPT', has_atomic128_opt)
>>    
>>        if not has_atomic128_opt
>> +
>> +      host_flags = []
>> +      if host_arch == 'x86_64'
>> +        # for x86_64, x86_version must be >= 1, and we always enable cmpxchg16
>> +        # in this case.
>> +        host_flags += ['-mcx16']
>> +      endif
>>          config_host_data.set('CONFIG_CMPXCHG128', cc.links('''
>>            int main(void)
>>            {
>> @@ -2874,7 +2881,8 @@ if has_int128_type
>>              __sync_val_compare_and_swap_16(&x, y, x);
>>              return 0;
>>            }
>> -      '''))
>> +      ''',
>> +      args: host_flags))
>>        endif
>>      endif
>>    endif
> 
> 
> This does not look right.  We test with an extra compiler flag, we should
> ensure it is actually enabled for the actual build.  Here, we only test
> if cmpxchg128 works with this flag appended, but do not enable if for
> the build.  It just happens we have it enabled elsewhere..
> 
> Maybe we should add this flags somewhere to $qemu_cflags in this place
> if the test were successful.  With the proposed variant it is confusing
> and works just because it happens to match in a few places, not because
> it is supposed to work :)

If you look upper in meson.build (x86_version), you'll see this flag 
(-mcx16) is *always* enabled for x86_64, which is what the comment 
states in this patch.
So it's already added to qemu_common_flags anyway. The problem is just 
that when we call cc.links, environment we defined is not reused.
After discussing on IRC, we identified it was a really special case only 
for x64 and this instruction, so I didn't find useful to make a more 
complex solution than just using the same flag.

Maybe it would be more clear to define a qemu_machine_flags, and reuse 
it here (or in other places where it might be needed), but I feel like 
it's too complicated, and we can always add this later.

I'm open to implement something different for flags management, but I 
feel like we are thinking too much ahead.

> 
> Besides, in the current situation where CONFIG_CMPXCHG128 is not defined
> due to this bug, the final link fails due to generated calls to -latomic, -
> which might mean we have something else wrong.
> 

By default, __sync_val_compare_and_swap_16 is not available, except if 
we use explicitely -mcx16, or if we use an -march that implies this.

Our default level for x86_version does not enable it.

$ cat ./build/meson-logs/meson-log.txt

Code: 

 
 

         int main(void) 
 

         { 

           __uint128_t x = 0, y = 0; 

           __sync_val_compare_and_swap_16(&x, y, x); 
 

           return 0; 
 

         } 
 

 

----------- 
 

Command line: `cc -m64 
/home/user/.work/qemu/build/meson-private/tmpkoy6pfnd/testfile.c -o 
/home/user/.work/qemu/build/meson-private/tmpkoy6pfnd/output.exe -O2 -g 
-fno-omit-f
rame-pointer -D_FILE_OFFSET_BITS=64 -O0 -std=gnu11` -> 1 
 

stderr: 
 

/usr/bin/ld: /tmp/cccPzS5X.o: in function `main': 

/home/user/.work/qemu/build/meson-private/tmpkoy6pfnd/testfile.c:5: 
undefined reference to `__sync_val_compare_and_swap_16' 

collect2: error: ld returned 1 exit status
-----------

> FWIW.
> 
> /mjt
Pierrick Bouvier Oct. 5, 2024, 5:44 p.m. UTC | #6
On 10/5/24 10:34, Pierrick Bouvier wrote:
> On 10/5/24 09:16, Michael Tokarev wrote:
>> 05.10.2024 01:01, Pierrick Bouvier wrote:
>>> Alex discovered that CMPXCHG128 was not enabled when building for
>>> x86_64, resulting in slow execution for wide atomic instructions,
>>> creating a huge contention when combined with a high number of cpus
>>> (found while booting android aarch64 guest on x86_64 host).
>>>
>>> The problem is that even though we enable -mcx16 option for x86_64, this
>>> is not used when testing for CMPXCHG128. Thus, we silently turn it off.
>>>
>>> x86_64 is the only architecture adding machine flags for now, so the
>>> problem is limited to this host architecture.
>>>
>>> Meson compiler tests are supposed to be independent of environment flags
>>> (https://mesonbuild.com/Reference-manual_returned_compiler.html#returned-by).
>>> However, CFLAGS are used anyway, thus masking the problem when using
>>> something like CFLAGS='-march=native'. This is a meson bug and was reported:
>>> https://github.com/mesonbuild/meson/issues/13757
>>>
>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>> ---
>>>     meson.build | 10 +++++++++-
>>>     1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/meson.build b/meson.build
>>> index b18c2a54ab5..af2ce595dcc 100644
>>> --- a/meson.build
>>> +++ b/meson.build
>>> @@ -2867,6 +2867,13 @@ if has_int128_type
>>>         config_host_data.set('CONFIG_ATOMIC128_OPT', has_atomic128_opt)
>>>     
>>>         if not has_atomic128_opt
>>> +
>>> +      host_flags = []
>>> +      if host_arch == 'x86_64'
>>> +        # for x86_64, x86_version must be >= 1, and we always enable cmpxchg16
>>> +        # in this case.
>>> +        host_flags += ['-mcx16']
>>> +      endif
>>>           config_host_data.set('CONFIG_CMPXCHG128', cc.links('''
>>>             int main(void)
>>>             {
>>> @@ -2874,7 +2881,8 @@ if has_int128_type
>>>               __sync_val_compare_and_swap_16(&x, y, x);
>>>               return 0;
>>>             }
>>> -      '''))
>>> +      ''',
>>> +      args: host_flags))
>>>         endif
>>>       endif
>>>     endif
>>
>>
>> This does not look right.  We test with an extra compiler flag, we should
>> ensure it is actually enabled for the actual build.  Here, we only test
>> if cmpxchg128 works with this flag appended, but do not enable if for
>> the build.  It just happens we have it enabled elsewhere..
>>
>> Maybe we should add this flags somewhere to $qemu_cflags in this place
>> if the test were successful.  With the proposed variant it is confusing
>> and works just because it happens to match in a few places, not because
>> it is supposed to work :)
> 
> If you look upper in meson.build (x86_version), you'll see this flag
> (-mcx16) is *always* enabled for x86_64, which is what the comment
> states in this patch.
> So it's already added to qemu_common_flags anyway. The problem is just
> that when we call cc.links, environment we defined is not reused.
> After discussing on IRC, we identified it was a really special case only
> for x64 and this instruction, so I didn't find useful to make a more
> complex solution than just using the same flag.
> 
> Maybe it would be more clear to define a qemu_machine_flags, and reuse
> it here (or in other places where it might be needed), but I feel like
> it's too complicated, and we can always add this later.
> 
> I'm open to implement something different for flags management, but I
> feel like we are thinking too much ahead.
> 
>>
>> Besides, in the current situation where CONFIG_CMPXCHG128 is not defined
>> due to this bug, the final link fails due to generated calls to -latomic, -
>> which might mean we have something else wrong.
>>
> 
> By default, __sync_val_compare_and_swap_16 is not available, except if
> we use explicitely -mcx16, or if we use an -march that implies this.
> 
> Our default level for x86_version does not enable it.
> 
> $ cat ./build/meson-logs/meson-log.txt
> 
> Code:
> 
>   
>   
> 
>           int main(void)
>   
> 
>           {
> 
>             __uint128_t x = 0, y = 0;
> 
>             __sync_val_compare_and_swap_16(&x, y, x);
>   
> 
>             return 0;
>   
> 
>           }
>   
> 
>   
> 
> -----------
>   
> 
> Command line: `cc -m64
> /home/user/.work/qemu/build/meson-private/tmpkoy6pfnd/testfile.c -o
> /home/user/.work/qemu/build/meson-private/tmpkoy6pfnd/output.exe -O2 -g
> -fno-omit-f
> rame-pointer -D_FILE_OFFSET_BITS=64 -O0 -std=gnu11` -> 1
>   
> 
> stderr:
>   
> 
> /usr/bin/ld: /tmp/cccPzS5X.o: in function `main':
> 
> /home/user/.work/qemu/build/meson-private/tmpkoy6pfnd/testfile.c:5:
> undefined reference to `__sync_val_compare_and_swap_16'
> 
> collect2: error: ld returned 1 exit status
> -----------
> 
>> FWIW.
>>
>> /mjt

Initially, the flag -mcx16 was forced in definition of CC in meson cross 
file (see c2bf2ccb266dc9ae4a6da75b845f54535417e109).
When it was removed and added in meson.build instead, it silently 
deactivated cmpxchg128 feature detection, and since everything still 
compiles/links correctly, we didn't notice until a recent investigation 
on a performance issue.

Root cause if that we try to identify a feature that depends on a 
specific compilation flag, and it's a scenario for which 
cc.compiles/cc.links is not designed out of the box. So it's normal that 
we have to add this flag manually somewhere.

I feel that having this explicitely in the meson.build is better than 
implicitly define it in value of CC.

I'm open to any idea you would like, we just need to have this flag 
somewhere, that's what fix the problem.

Regards,
Pierrick
Daniel P. Berrangé Oct. 7, 2024, 8:42 a.m. UTC | #7
On Fri, Oct 04, 2024 at 03:01:23PM -0700, Pierrick Bouvier wrote:
> Alex discovered that CMPXCHG128 was not enabled when building for
> x86_64, resulting in slow execution for wide atomic instructions,
> creating a huge contention when combined with a high number of cpus
> (found while booting android aarch64 guest on x86_64 host).
> 
> The problem is that even though we enable -mcx16 option for x86_64, this
> is not used when testing for CMPXCHG128. Thus, we silently turn it off.
> 
> x86_64 is the only architecture adding machine flags for now, so the
> problem is limited to this host architecture.
> 
> Meson compiler tests are supposed to be independent of environment flags
> (https://mesonbuild.com/Reference-manual_returned_compiler.html#returned-by).
> However, CFLAGS are used anyway, thus masking the problem when using
> something like CFLAGS='-march=native'. This is a meson bug and was reported:
> https://github.com/mesonbuild/meson/issues/13757
> 
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
>  meson.build | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/meson.build b/meson.build
> index b18c2a54ab5..af2ce595dcc 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -2867,6 +2867,13 @@ if has_int128_type
>      config_host_data.set('CONFIG_ATOMIC128_OPT', has_atomic128_opt)
>  
>      if not has_atomic128_opt
> +
> +      host_flags = []
> +      if host_arch == 'x86_64'
> +        # for x86_64, x86_version must be >= 1, and we always enable cmpxchg16
> +        # in this case.
> +        host_flags += ['-mcx16']
> +      endif

IMHO, we shouldn't be assuming that the earlier code passed '-mcx16',
as it requires that we keep 2 far away parts of meson.build in sync.

>        config_host_data.set('CONFIG_CMPXCHG128', cc.links('''
>          int main(void)
>          {
> @@ -2874,7 +2881,8 @@ if has_int128_type
>            __sync_val_compare_and_swap_16(&x, y, x);
>            return 0;
>          }
> -      '''))
> +      ''',
> +      args: host_flags))

Just pass 'qemu_common_flags' here.

>      endif
>    endif
>  endif

With regards,
Daniel
diff mbox series

Patch

diff --git a/meson.build b/meson.build
index b18c2a54ab5..af2ce595dcc 100644
--- a/meson.build
+++ b/meson.build
@@ -2867,6 +2867,13 @@  if has_int128_type
     config_host_data.set('CONFIG_ATOMIC128_OPT', has_atomic128_opt)
 
     if not has_atomic128_opt
+
+      host_flags = []
+      if host_arch == 'x86_64'
+        # for x86_64, x86_version must be >= 1, and we always enable cmpxchg16
+        # in this case.
+        host_flags += ['-mcx16']
+      endif
       config_host_data.set('CONFIG_CMPXCHG128', cc.links('''
         int main(void)
         {
@@ -2874,7 +2881,8 @@  if has_int128_type
           __sync_val_compare_and_swap_16(&x, y, x);
           return 0;
         }
-      '''))
+      ''',
+      args: host_flags))
     endif
   endif
 endif