Message ID | 1532463722.15609.40.camel@cavium.com |
---|---|
State | New |
Headers | show |
Series | [Aarch64] PR 86538 - Define __ARM_FEATURE_LSE if LSE is available | expand |
On Tue, Jul 24, 2018 at 03:22:02PM -0500, Steve Ellcey wrote: > This is a patch for PR 86538, to define an __ARM_FEATURE_LSE macro > when LSE is available. Richard Earnshaw closed PR 86538 as WONTFIX > because the ACLE (Arm C Language Extension) does not require this > macro and because he is concerned that it might encourage people to > use inline assembly instead of the __sync and atomic intrinsics. > (See actual comments in the defect report.) > > While I agree that we want people to use the intrinsics I still think > there are use cases where people may want to know if LSE is available > or not and there is currrently no (simple) way to determine if this feature > is available since it can be turned or and off independently of the > architecture used. Also, as a general principle, I think any feature > that can be toggled on or off by the compiler should provide a way for > users to determine what its state is. Well, we blow that design principle all over the place (find me a macro which tells you whether AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW is on for example :-) ) A better design principle would be that if we think language programmers may want to compile in different C code depending on a compiler option, we should consider adding a feature macro. > So what do other ARM maintainers and users think? Is this a useful > feature to have in GCC? I'm with Richard on this one. Whether LSE is available or not at compile time, the best user strategy is to use the C11/C++11 atomic extensions. That's where the memory model is well defined, well reasoned about, and well implemented. Purely in ACLE we're not keen on providing macros that don't provide choice to a C level programmer (i.e. change the prescence of intrinsics). You could well imagine an inline asm programmer wanting to choose between an LSE path and an Armv8.0-A path; but I can't imagine what they would want to do on that path that couldn't be expressed better in the C language. You might say they want to validate presence of the instruction; but that will need to be a dynamic check outside of ACLE anyway. All of which is to say, I don't think that this is a neccessary macro. Each time I've seen it requested by a user, we've told them the same thing; what do you want to express here that isn't better expressed by C atomic primitives. I'd say this patch isn't desirable for trunk. I'd be interested in use cases that need a static decision on presence of LSE that are not better expressed using higher level language features. Thanks, James
On Tue, 2018-07-24 at 22:04 +0100, James Greenhalgh wrote: > > > I'd say this patch isn't desirable for trunk. I'd be interested in use cases > that need a static decision on presence of LSE that are not better expressed > using higher level language features. > > Thanks, > James How about when building the higher level features? Right now, in sysdeps/aarch64/atomic-machine.h, we hardcode ATOMIC_EXCHANGE_USES_CAS to 0. If we had __ARM_FEATURE_LSE we could use that to determine if we wanted to set ATOMIC_EXCHANGE_USES_CAS to 0 or 1 which would affect the call generated in nptl/pthread_spin_lock.c. That would be useful if we built a lipthread specifically for a platform that had LSE. Steve Ellcey sellcey@cavium.com
On 24/07/18 22:55, Steve Ellcey wrote: > On Tue, 2018-07-24 at 22:04 +0100, James Greenhalgh wrote: >> >> >> I'd say this patch isn't desirable for trunk. I'd be interested in use cases >> that need a static decision on presence of LSE that are not better expressed >> using higher level language features. >> >> Thanks, >> James > > How about when building the higher level features? Right now, > in sysdeps/aarch64/atomic-machine.h, we > hardcode ATOMIC_EXCHANGE_USES_CAS to 0. If we had __ARM_FEATURE_LSE we > could use that to determine if we wanted to set > ATOMIC_EXCHANGE_USES_CAS to 0 or 1 which would affect the call > generated in nptl/pthread_spin_lock.c. That would be useful if we > built a lipthread specifically for a platform that had LSE. > > Steve Ellcey > sellcey@cavium.com > If there is a case for such a define, it needs to be made with the ACLE specification maintainers. I don't think GCC should be ploughing a separate furrow here. So make your case to the ACLE maintainers. If that adopts a pre-define, then implementing it in GCC would go through on the nod. R.
On Tue, Jul 24, 2018 at 10:55 PM, Steve Ellcey <sellcey@cavium.com> wrote: > On Tue, 2018-07-24 at 22:04 +0100, James Greenhalgh wrote: >> >> >> I'd say this patch isn't desirable for trunk. I'd be interested in use cases >> that need a static decision on presence of LSE that are not better expressed >> using higher level language features. >> >> Thanks, >> James > > How about when building the higher level features? > Right now, > in sysdeps/aarch64/atomic-machine.h, we > hardcode ATOMIC_EXCHANGE_USES_CAS to 0. If we had __ARM_FEATURE_LSE we > could use that to determine if we wanted to set > ATOMIC_EXCHANGE_USES_CAS to 0 or 1 which would affect the call > generated in nptl/pthread_spin_lock.c. That would be useful if we > built a lipthread specifically for a platform that had LSE. No, you don't need to define ATOMIC_EXCHANGE_USES_CAS=1 to get LSE instructions in libpthread. You get that already with -march=armv8-a+lse on the command line. ATOMIC_EXCHANGE_USES_CAS =1 in glibc implies that a CAS is faster than a SWP and that on AArch64 is a per micro-architectural decision *not* an architectural decision for the port unless someone can categorically say that the majority of implementations that glibc cares about *have* better CAS performance than SWP performance. Both the SWP and CAS instructions are part of v8.1-A / LSE thus all you need to build libpthread with lse is merely the command line option -march=armv8-a+lse. So, no you don't need this macro to build libpthread for v8.1 or LSE . You need that macro to statically choose a cas implementation over a swp implementation. See comment in include/atomic.h : /* ATOMIC_EXCHANGE_USES_CAS is non-zero if atomic_exchange operations are implemented based on a CAS loop; otherwise, this is zero and we assume that the atomic_exchange operations could provide better performance than a CAS loop. */ regards Ramana > > Steve Ellcey > sellcey@cavium.com >
diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c index 40c738c..e057ba9 100644 --- a/gcc/config/aarch64/aarch64-c.c +++ b/gcc/config/aarch64/aarch64-c.c @@ -154,6 +154,9 @@ aarch64_update_cpp_builtins (cpp_reader *pfile) aarch64_def_or_undef (TARGET_SM4, "__ARM_FEATURE_SM4", pfile); aarch64_def_or_undef (TARGET_F16FML, "__ARM_FEATURE_FP16_FML", pfile); + /* This is not required by ACLE, but it is useful. */ + aarch64_def_or_undef (TARGET_LSE, "__ARM_FEATURE_LSE", pfile); + /* Not for ACLE, but required to keep "float.h" correct if we switch target between implementations that do or do not support ARMv8.2-A 16-bit floating-point extensions. */