Message ID | BANLkTinZQ=k9EKJL7LgBsDLWt6ozRoh_Kw@mail.gmail.com |
---|---|
State | New |
On Wed, May 25, 2011 at 10:19 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> --
> H.J.
> ---
> Index: doc/extend.texi
> ===================================================================
> --- doc/extend.texi (revision 174216)
> +++ doc/extend.texi (working copy)
> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>
> @table @code
> @item void __builtin_ia32_pause (void)
> -Generates the @code{pause} machine instruction with full memory barrier.
> +Generates the @code{pause} machine instruction with a compiler memory
> +barrier.

What does the pause machine instruction do? How is it different from a normal nop? Also, "pause" to me means it waits for input or an interrupt.

Thanks,
Andrew Pinski
On 05/25/2011 06:26 PM, Andrew Pinski wrote:
> On Wed, May 25, 2011 at 10:19 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> --
>> H.J.
>> ---
>> Index: doc/extend.texi
>> ===================================================================
>> --- doc/extend.texi (revision 174216)
>> +++ doc/extend.texi (working copy)
>> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>>
>> @table @code
>> @item void __builtin_ia32_pause (void)
>> -Generates the @code{pause} machine instruction with full memory barrier.
>> +Generates the @code{pause} machine instruction with a compiler memory
>> +barrier.
>
> What does the pause machine instruction do?

That's documented by Intel in the architecture manual. Surely we don't have to explain it all.

Andrew.

PAUSE - Spin Loop Hint

Improves the performance of spin-wait loops. When executing a “spin-wait loop,” a Pentium 4 or Intel Xeon processor suffers a severe performance penalty when exiting the loop because it detects a possible memory order violation. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops.

An additional function of the PAUSE instruction is to reduce the power consumed by a Pentium 4 processor while executing a spin loop. The Pentium 4 processor can execute a spin-wait loop extremely quickly, causing the processor to consume a lot of power while it waits for the resource it is spinning on to become available. Inserting a pause instruction in a spin-wait loop greatly reduces the processor’s power consumption.

This instruction was introduced in the Pentium 4 processors, but is backward compatible with all IA-32 processors. In earlier IA-32 processors, the PAUSE instruction operates like a NOP instruction. The Pentium 4 and Intel Xeon processors implement the PAUSE instruction as a pre-defined delay. The delay is finite and can be zero for some processors. This instruction does not change the architectural state of the processor (that is, it performs essentially a delaying no-op operation).

This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
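For illustration, the spin-wait pattern that the manual excerpt describes might look roughly like this in C. This is only a minimal sketch: the `spinlock_t` type and the `spin_hint`, `spin_lock` and `spin_unlock` names are hypothetical and appear nowhere in the patch, and the ordering is provided by C11 atomics rather than by whatever barrier semantics the builtin may or may not have.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type, for illustration only (not part of GCC). */
typedef struct { atomic_flag held; } spinlock_t;

/* Emit the spin-loop hint on x86; do nothing on other targets. */
static inline void spin_hint(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __builtin_ia32_pause();   /* generates the PAUSE instruction */
#endif
}

/* Spin until the flag is acquired, hinting the CPU on every failed try. */
static inline void spin_lock(spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        spin_hint();
}

/* Release the lock so that another spinner can take it. */
static inline void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A lock would be declared as `spinlock_t l = { ATOMIC_FLAG_INIT };`. Because the acquire/release ordering comes from the atomic operations, this sketch does not depend on the builtin acting as any kind of barrier.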
On Wed, May 25, 2011 at 7:19 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, May 25, 2011 at 9:43 AM, Andrew Haley <aph@redhat.com> wrote:
>> On 05/25/2011 04:32 PM, H.J. Lu wrote:
>>> On Wed, May 25, 2011 at 8:27 AM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Wed, May 25, 2011 at 5:20 PM, Michael Matz <matz@suse.de> wrote:
>>>>> Hi,
>>>>>
>>>>> On Wed, 25 May 2011, Richard Guenther wrote:
>>>>>
>>>>>>>> asm volatile ("" : : : "memory") in fact will work as a full memory
>>>>>>>> barrier
>>>>>>>
>>>>>>> How? You surely need MFENCE or somesuch, unless all you care about is
>>>>>>> a compiler barrier. That's what I think needs to be clarified.
>>>>>>
>>>>>> Well, yes, I'm talking about the compiler memory barrier.
>>>>>
>>>>> Something that we conventionally call "optimization barrier" :) memory
>>>>> barrier has a fixed meaning which we shouldn't use in this case, it's
>>>>> confusing.
>>>>
>>>> Sure ;)
>>>>
>>>> And to keep the info in a suitable thread what I'd like to improve here
>>>> is to make us disambiguate memory loads/stores against asms that
>>>> have no memory outputs/inputs.
>>>>
>>>
>>> Please let me know how I should improve the document,
>>
>> "Compiler memory barrier" seems to be well-understood. I suggest
>>
>> +Generates the @code{pause} machine instruction with a compiler memory barrier.
>>
>> It's clear enough.
>>
>> Andrew.
>>
>
> I checked in this.
>
> Thanks.
>
>
> --
> H.J.
> ---
> Index: doc/extend.texi
> ===================================================================
> --- doc/extend.texi (revision 174216)
> +++ doc/extend.texi (working copy)
> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>
> @table @code
> @item void __builtin_ia32_pause (void)
> -Generates the @code{pause} machine instruction with full memory barrier.
> +Generates the @code{pause} machine instruction with a compiler memory
> +barrier.
> @end table

This isn't true. It is _not_ a compiler memory barrier.

Richard.
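The distinction drawn in the quoted exchange, between a compiler-only barrier and a hardware fence such as MFENCE, can be sketched as follows. The macro names `compiler_barrier` and `hardware_barrier`, the variables `ready` and `payload`, and the function `publish` are all made up for illustration; `__sync_synchronize()` is GCC's legacy full-fence builtin, used here merely as one way of requesting a hardware barrier.

```c
/* Compiler-only barrier: emits no instruction. The "memory" clobber tells
   GCC not to keep memory values cached in registers across this point and
   not to move memory accesses over it. It does nothing to the CPU. */
#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

/* Hardware barrier: asks GCC for a full memory fence, which also orders
   the accesses as seen by other processors. */
#define hardware_barrier() __sync_synchronize()

/* Hypothetical shared variables, for illustration only. */
int ready;
int payload;

/* Store the payload, then raise the flag, in that program order. */
void publish(int value)
{
    payload = value;
    compiler_barrier();   /* compiler may not sink the payload store below */
    ready = 1;
}
```

With only `compiler_barrier()`, the two stores are emitted in program order but the hardware is not otherwise constrained. Whether that suffices depends on the target's memory model, which is why the thread insists on keeping the two terms apart.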
On 05/26/2011 10:34 AM, Richard Guenther wrote:
>> Index: doc/extend.texi
>> ===================================================================
>> --- doc/extend.texi (revision 174216)
>> +++ doc/extend.texi (working copy)
>> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>>
>> @table @code
>> @item void __builtin_ia32_pause (void)
>> -Generates the @code{pause} machine instruction with full memory barrier.
>> +Generates the @code{pause} machine instruction with a compiler memory
>> +barrier.
>> @end table
>
> This isn't true. It is _not_ a compiler memory barrier.

Please elucidate. Please suggest alternative wording.

Andrew.
On Thu, May 26, 2011 at 3:30 PM, Andrew Haley <aph@redhat.com> wrote:
> On 05/26/2011 10:34 AM, Richard Guenther wrote:
>
>>> Index: doc/extend.texi
>>> ===================================================================
>>> --- doc/extend.texi (revision 174216)
>>> +++ doc/extend.texi (working copy)
>>> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>>>
>>> @table @code
>>> @item void __builtin_ia32_pause (void)
>>> -Generates the @code{pause} machine instruction with full memory barrier.
>>> +Generates the @code{pause} machine instruction with a compiler memory
>>> +barrier.
>>> @end table
>>
>> This isn't true. It is _not_ a compiler memory barrier.
>
> Please elucidate. Please suggest alternative wording.

+Generates the @code{pause} machine instruction.

Richard.

> Andrew.
>
On 05/26/2011 02:51 PM, Richard Guenther wrote:
> On Thu, May 26, 2011 at 3:30 PM, Andrew Haley <aph@redhat.com> wrote:
>> On 05/26/2011 10:34 AM, Richard Guenther wrote:
>>
>>>> Index: doc/extend.texi
>>>> ===================================================================
>>>> --- doc/extend.texi (revision 174216)
>>>> +++ doc/extend.texi (working copy)
>>>> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>>>>
>>>> @table @code
>>>> @item void __builtin_ia32_pause (void)
>>>> -Generates the @code{pause} machine instruction with full memory barrier.
>>>> +Generates the @code{pause} machine instruction with a compiler memory
>>>> +barrier.
>>>> @end table
>>>
>>> This isn't true. It is _not_ a compiler memory barrier.
>>
>> Please elucidate. Please suggest alternative wording.
>
> +Generates the @code{pause} machine instruction.

But that's missing the fact that it generates a compiler memory barrier, which is important. And if you think it's not a compiler memory barrier, please explain

a. Why it's not a compiler memory barrier,
b. What you'd call it.

Andrew.
On Thu, May 26, 2011 at 3:53 PM, Andrew Haley <aph@redhat.com> wrote:
> On 05/26/2011 02:51 PM, Richard Guenther wrote:
>> On Thu, May 26, 2011 at 3:30 PM, Andrew Haley <aph@redhat.com> wrote:
>>> On 05/26/2011 10:34 AM, Richard Guenther wrote:
>>>
>>>>> Index: doc/extend.texi
>>>>> ===================================================================
>>>>> --- doc/extend.texi (revision 174216)
>>>>> +++ doc/extend.texi (working copy)
>>>>> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>>>>>
>>>>> @table @code
>>>>> @item void __builtin_ia32_pause (void)
>>>>> -Generates the @code{pause} machine instruction with full memory barrier.
>>>>> +Generates the @code{pause} machine instruction with a compiler memory
>>>>> +barrier.
>>>>> @end table
>>>>
>>>> This isn't true. It is _not_ a compiler memory barrier.
>>>
>>> Please elucidate. Please suggest alternative wording.
>>
>> +Generates the @code{pause} machine instruction.
>
> But that's missing the fact that it generates a compiler memory barrier,
> which is important. And if you think it's not a compiler memory barrier,
> please explain
>
> a. Why it's not a compiler memory barrier,

It is not a compiler memory barrier because it is a builtin function call which is never assumed to be a barrier for local automatic storage that does not have its address taken.

> b. What you'd call it.

Not a compiler memory barrier ;)

To make it a compiler memory barrier you have to "expand" the builtin already in the frontend and present the middle-end with __asm__ ("...." : : : "memory"). That will serve as a compiler memory barrier also covering local non-address taken storage (global and practically most of address-taken local storage is covered by a builtin function call already).

Richard.

>
> Andrew.
>
On Thu, May 26, 2011 at 04:29:50PM +0200, Richard Guenther wrote:
> To make it a compiler memory barrier you have to "expand" the
> builtin already in the frontend and present the middle-end with
> __asm__ ("...." : : : "memory"). That will serve as a compiler
> memory barrier also covering local non-address taken storage
> (global and practically most of address-taken local storage
> is covered by a builtin function call already).

But then, what is the point of the builtin when

  __asm__ __volatile__ ("rep; nop" : : : "memory");

does all of that already and has been supported for years...

Jakub
On Thu, May 26, 2011 at 4:34 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, May 26, 2011 at 04:29:50PM +0200, Richard Guenther wrote:
>> To make it a compiler memory barrier you have to "expand" the
>> builtin already in the frontend and present the middle-end with
>> __asm__ ("...." : : : "memory"). That will serve as a compiler
>> memory barrier also covering local non-address taken storage
>> (global and practically most of address-taken local storage
>> is covered by a builtin function call already).
>
> But then, what is the point of the builtin when
> __asm__ __volatile__ ("rep; nop" : : : "memory");
> does all of that already and has been supported for years...

Good question ;)

Richard.
On 05/26/2011 03:29 PM, Richard Guenther wrote:
> On Thu, May 26, 2011 at 3:53 PM, Andrew Haley <aph@redhat.com> wrote:
>> On 05/26/2011 02:51 PM, Richard Guenther wrote:
>>> On Thu, May 26, 2011 at 3:30 PM, Andrew Haley <aph@redhat.com> wrote:
>>>> On 05/26/2011 10:34 AM, Richard Guenther wrote:
>>>>
>>>>>> Index: doc/extend.texi
>>>>>> ===================================================================
>>>>>> --- doc/extend.texi (revision 174216)
>>>>>> +++ doc/extend.texi (working copy)
>>>>>> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>>>>>>
>>>>>> @table @code
>>>>>> @item void __builtin_ia32_pause (void)
>>>>>> -Generates the @code{pause} machine instruction with full memory barrier.
>>>>>> +Generates the @code{pause} machine instruction with a compiler memory
>>>>>> +barrier.
>>>>>> @end table
>>>>>
>>>>> This isn't true. It is _not_ a compiler memory barrier.
>>>>
>>>> Please elucidate. Please suggest alternative wording.
>>>
>>> +Generates the @code{pause} machine instruction.
>>
>> But that's missing the fact that it generates a compiler memory barrier,
>> which is important. And if you think it's not a compiler memory barrier,
>> please explain
>>
>> a. Why it's not a compiler memory barrier,
>
> It is not a compiler memory barrier because it is a builtin function call
> which is never assumed to be a barrier for local automatic storage
> that does not have its address taken.

OK. How would you tell the difference between the kind of barrier that it is and a real compiler memory barrier? If an auto does not have its address taken, it isn't visible anyway.

>> b. What you'd call it.
>
> Not a compiler memory barrier ;)

I don't want to know what not to call it, though.

> To make it a compiler memory barrier you have to "expand" the
> builtin already in the frontend and present the middle-end with
> __asm__ ("...." : : : "memory"). That will serve as a compiler
> memory barrier also covering local non-address taken storage
> (global and practically most of address-taken local storage
> is covered by a builtin function call already).

Well, the fact that it's also a memory clobber has to be documented somehow. If the present documentation is to be changed, it should not be changed by deleting a vital piece of information.

Andrew.
Hi,

On Thu, 26 May 2011, Andrew Haley wrote:
> >>> +Generates the @code{pause} machine instruction.
> >>
> >> But that's missing the fact that it generates a compiler memory
> >> barrier, which is important. And if you think it's not a compiler
> >> memory barrier, please explain
> >>
> >> a. Why it's not a compiler memory barrier,
> >
> > It is not a compiler memory barrier because it is a builtin function call
> > which is never assumed to be a barrier for local automatic storage
> > that does not have its address taken.
>
> OK. How would you tell the difference between the kind of barrier
> that it is and a real compiler memory barrier?

First we have to determine whether this builtin really does what its users intend to use it for. I believe they _do_ want to use it also with regard to auto variables (this also includes address-taken ones whose address doesn't escape). A normal builtin call is not a barrier for operations on such entities, hence it might very well be that HJ's implementation doesn't actually do what he wanted.

I don't have a good word for what function calls are, in their barrierness, as part of pre/post conditions. "Global memory movement barrier" perhaps, with an appropriate definition of global memory (which funnily includes address-taken escaped local storage, ugh).

> > To make it a compiler memory barrier you have to "expand" the
> > builtin already in the frontend and present the middle-end with
> > __asm__ ("...." : : : "memory"). That will serve as a compiler
> > memory barrier also covering local non-address taken storage
> > (global and practically most of address-taken local storage
> > is covered by a builtin function call already).
>
> Well, the fact that it's also a memory clobber has to be documented
> somehow. If the present documentation is to be changed, it should
> not be changed by deleting a vital piece of information.

It's not only about the documentation. As implemented right now it's neither an optimization barrier nor a memory clobber.

Ciao,
Michael.
Richard Guenther <richard.guenther@gmail.com> writes:
>
> To make it a compiler memory barrier you have to "expand" the
> builtin already in the frontend and present the middle-end with
> __asm__ ("...." : : : "memory"). That will serve as a compiler

Those are the intended semantics (at least those I asked for :-). For all practical purposes the same as asm volatile("pause" ::: "memory")

HJ? Can it be expanded earlier?

As for why to have a builtin: one reason would be portability. Various other architectures have a similar instruction (e.g. PPC). They could be added later to this as a next step.

Then it also seems cleaner to me to cover the instruction set with builtins like the others.

-Andi
On Thu, May 26, 2011 at 09:10:32AM -0700, Andi Kleen wrote:
> Richard Guenther <richard.guenther@gmail.com> writes:
> As for why having a builtin: one reason would be portability.

You mean portability to other compilers (I think a reasonable number of them support gcc-ish inline asm), or to other architectures? __builtin_ia32_pause () doesn't look like a builtin you would want to use on PPC.

> Then it also seems cleaner to me to cover the instruction
> set with builtins like the others.

No idea why in this case. Builtins have the advantage that they can be better scheduled, but in this case you don't want to move it around.

Jakub
On Thu, May 26, 2011 at 06:46:39PM +0200, Jakub Jelinek wrote:
> On Thu, May 26, 2011 at 09:10:32AM -0700, Andi Kleen wrote:
> > Richard Guenther <richard.guenther@gmail.com> writes:
> > As for why having a builtin: one reason would be portability.
>
> You mean portability to other compilers (I think reasonable amount
> of them support gcc-ish inline asm), or to other architectures?

Both.

> __builtin_ia32_pause () doesn't look like a builtin you would
> want to use on PPC.

That's true, it should probably have a different name.

__builtin_pause()?

The Linux kernel calls it cpu_relax() on all architectures.
The following architectures implement it: ia64, powerpc, x86
On others it just acts like a barrier.

I suppose most CPUs that implement SMT will have some equivalent.

-Andi
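A portable helper in the spirit of the kernel's cpu_relax(), as described above, might be sketched like this. It is not the kernel's actual implementation. The names `cpu_relax_sketch` and `wait_until_set` are invented for illustration, only the x86 case is filled in (using the "rep; nop" form quoted earlier in the thread), and every other architecture falls back to a plain compiler barrier.

```c
/* Hypothetical cpu_relax()-style helper. On x86 it issues PAUSE via its
   "rep; nop" encoding; elsewhere it is just a compiler barrier. In both
   cases the "memory" clobber makes GCC reload memory after the call. */
static inline void cpu_relax_sketch(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("rep; nop" : : : "memory");
#else
    __asm__ __volatile__("" : : : "memory");
#endif
}

/* Poll a plain int flag until it becomes nonzero or max_spins is reached.
   Returns the number of relax calls made. Kernel-style code: strictly
   portable C11 would use an atomic type for the flag instead. */
static inline unsigned wait_until_set(const int *flag, unsigned max_spins)
{
    unsigned spins = 0;
    while (!*flag && spins < max_spins) {
        cpu_relax_sketch();   /* forces *flag to be re-read next iteration */
        spins++;
    }
    return spins;
}
```

The memory clobber is what keeps the compiler from hoisting the load of `*flag` out of the loop, which is the property the thread is arguing about for the builtin.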
On May 26, 2011, at 1:37 PM, Andi Kleen wrote:
> On Thu, May 26, 2011 at 06:46:39PM +0200, Jakub Jelinek wrote:
>> On Thu, May 26, 2011 at 09:10:32AM -0700, Andi Kleen wrote:
>>> Richard Guenther <richard.guenther@gmail.com> writes:
>>> As for why having a builtin: one reason would be portability.
>>
>> You mean portability to other compilers (I think reasonable amount
>> of them support gcc-ish inline asm), or to other architectures?
>
> Both.
>
>> __builtin_ia32_pause () doesn't look like a builtin you would
>> want to use on PPC.
>
> That's true, it should probably have a different name.
>
> __builtin_pause()?
>
> The Linux kernel calls it cpu_relax() on all architectures.
> The following architectures implement it: ia64, powerpc, x86
> On others it just acts like a barrier.

Relax? Weird. "Pause" is just as weird. It might be an ia32 instruction, so as an ia32 builtin it is a reasonable name. But if you want a generic builtin, you need a name that actually has some plausible connection with what it does, and neither "pause" nor "relax" does that.

paul
> Relax? Weird. "Pause" is just as weird. It might be an ia32 instruction, so as an ia32 builtin it is a reasonable name. But if you want a generic builtin, you need a name that actually has some plausible connection with what it does, and neither "pause" nor "relax" do that.
It's a short pause for the CPU. Both names fit quite well.
-Andi
On Thu, 26 May 2011 13:48:13 -0400 Paul Koning <paul_koning@dell.com> wrote:
> Relax? Weird. "Pause" is just as weird. It might be an ia32 instruction,
> so as an ia32 builtin it is a reasonable name But if you want a generic
> builtin, you need a name that actually has some plausible connection with
> what it does, and neither "pause" nor "relax" do that.

I still think that a builtin which does a "compiler flush", that is, which spills all registers to memory, would be useful, e.g. a builtin_compiler_flush().

And I even think there is another reason to use it. If you are debugging a program compiled with -O2 -g, and you know where there could be a bug or a fault, temporarily adding a call to that builtin_compiler_flush () would probably help the gdb debugger a lot.

Regards.
On 05/26/2011 08:37 PM, Basile Starynkevitch wrote:
> On Thu, 26 May 2011 13:48:13 -0400
> Paul Koning <paul_koning@dell.com> wrote:
>
>> Relax? Weird. "Pause" is just as weird. It might be an ia32 instruction,
>> so as an ia32 builtin it is a reasonable name But if you want a generic
>> builtin, you need a name that actually has some plausible connection with
>> what it does, and neither "pause" nor "relax" do that.
>
> I still think that having a builtin which do a "compiler flush" that is
> which spill all registers to memory is useful, eg a
> builtin_compiler_flush()

I don't see how it can do that without causing reload failures. You'd have to be very careful somehow to identify user variables.

Andrew.
Index: doc/extend.texi
===================================================================
--- doc/extend.texi (revision 174216)
+++ doc/extend.texi (working copy)
@@ -8699,7 +8699,8 @@ The following built-in function is alway
 
 @table @code
 @item void __builtin_ia32_pause (void)
-Generates the @code{pause} machine instruction with full memory barrier.
+Generates the @code{pause} machine instruction with a compiler memory
+barrier.
 @end table
 
 The following floating point built-in functions are made available in the
Index: ChangeLog
===================================================================
--- ChangeLog (revision 174216)
+++ ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2011-05-25 H.J. Lu <hongjiu.lu@intel.com>
+
+ * doc/extend.texi (X86 Built-in Functions): Update pause
+ intrinsic.
+
 2011-05-25 Bernd Schmidt <bernds@codesourcery.com>
 
 PR bootstrap/49160