Message ID | 20110124213133.GA21518@hungry-tiger.westford.ibm.com |
---|---|
State | New |
Headers | show |
On 1/24/2011 1:31 PM, Michael Meissner wrote: > So after some debate within IBM, we've come to the conclusion that I should not > have changed the semantics of __builtin_vec_ld and __builtin_vec_st, and that > we should go back to using the Altivec form for these instructions Can you explain why that's desirable? I think that the first thing to do is to convince ourselves that's technically desirable; if we can't do that, then there's no need to think about whether to do it now or later. My gut instinct is that having released 4.5, we should just live with the semantics we now have; we've broken compatibility with some Altivec code when compiled for Power 7, but breaking compatibility again seems like it will just confuse things worse. Thank you,
On Mon, Jan 24, 2011 at 01:34:50PM -0800, Mark Mitchell wrote: > On 1/24/2011 1:31 PM, Michael Meissner wrote: > > > So after some debate within IBM, we've come to the conclusion that I should not > > have changed the semantics of __builtin_vec_ld and __builtin_vec_st, and that > > we should go back to using the Altivec form for these instructions > > Can you explain why that's desirable? I think that the first thing to > do is to convince ourselves that's technically desirable; if we can't do > that, then there's no need to think about whether to do it now or later. > > My gut instinct is that having released 4.5, we should just live with > the semantics we now have; we've broken compatibility with some Altivec > code when compiled for Power 7, but breaking compatibility again seems > like it will just confuse things worse. Sure, if you have a program that is dealing with unaligned data, and you know the machine ANDs out the bottom bits, you would write the code to handle the initial bits using something like lex.c has. At this point, I really can see both sides (do we cater to the 4.4 users or the 4.5 users). I suspect that if we wanted to do it automatically, we would need to see if LVSR or similar instruction was used. Or just provide an #ifdef, and make the default the 4.5 behavior. Here is the code in lex.c that knows about this alignment quirk: static const uchar * search_line_fast (const uchar *s, const uchar *end ATTRIBUTE_UNUSED) { typedef __attribute__((altivec(vector))) unsigned char vc; /* ... */ const vc ones = { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, }; const vc zero = { 0 }; vc data, mask, t; /* Altivec loads automatically mask addresses with -16. This lets us issue the first load as early as possible. */ data = __builtin_vec_ld(0, (const vc *)s); /* Discard bytes before the beginning of the buffer. Do this by beginning with all ones and shifting in zeros according to the mis-alignment. The LVSR instruction pulls the exact shift we want from the address. */ mask = __builtin_vec_lvsr(0, s); mask = __builtin_vec_perm(zero, ones, mask); data &= mask; /* While altivec loads mask addresses, we still need to align S so that the offset we compute at the end is correct. */ s = (const uchar *)((uintptr_t)s & -16); /* Main loop processing 16 bytes at a time. */ goto start; do { vc m_nl, m_cr, m_bs, m_qm; s += 16; data = __builtin_vec_ld(0, (const vc *)s); /* ... */ } while (!__builtin_vec_vcmpeq_p(/*__CR6_LT_REV*/3, t, zero));
On 1/24/2011 1:52 PM, Michael Meissner wrote: >> My gut instinct is that having released 4.5, we should just live with >> the semantics we now have; we've broken compatibility with some Altivec >> code when compiled for Power 7, but breaking compatibility again seems >> like it will just confuse things worse. > Sure, if you have a program that is dealing with unaligned data, and you know > the machine ANDs out the bottom bits, you would write the code to handle the > initial bits using something like lex.c has. At this point, I really can see > both sides (do we cater to the 4.4 users or the 4.5 users). OK, right, I see that. We keep learning how important backwards compatibility is to people. We really need to take that incredibly seriously going forward; we (and by "we" I mean "I") have historically under-estimated how important that is. In any case, here we are. I think that having come this far, we might as well just leave things as they are. I don't think that we really make things better by flip-flopping on semantics from release to release; by now, some people have probably figured out that we changed, and have some #ifdef somewhere to deal with it, and when we change back, their #ifdef will break, and we'll lose again. My two cents,
On Mon, 24 Jan 2011, Mark Mitchell wrote: > On 1/24/2011 1:31 PM, Michael Meissner wrote: > > > So after some debate within IBM, we've come to the conclusion that I should not > > have changed the semantics of __builtin_vec_ld and __builtin_vec_st, and that > > we should go back to using the Altivec form for these instructions > > Can you explain why that's desirable? I think that the first thing to > do is to convince ourselves that's technically desirable; if we can't do > that, then there's no need to think about whether to do it now or later. > > My gut instinct is that having released 4.5, we should just live with > the semantics we now have; we've broken compatibility with some Altivec > code when compiled for Power 7, but breaking compatibility again seems > like it will just confuse things worse. I think we should revert to the pre-4.5 behavior and also fix 4.5. Especially so if there are other compilers that follow the pre-4.5 behavior - are there? Richard. -- Richard Guenther <rguenther@suse.de> Novell / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex
On Tue, Jan 25, 2011 at 11:15:25AM +0100, Richard Guenther wrote: > On Mon, 24 Jan 2011, Mark Mitchell wrote: > > > So after some debate within IBM, we've come to the conclusion that I should not > > > have changed the semantics of __builtin_vec_ld and __builtin_vec_st, and that > > > we should go back to using the Altivec form for these instructions > > > > Can you explain why that's desirable? I think that the first thing to > > do is to convince ourselves that's technically desirable; if we can't do > > that, then there's no need to think about whether to do it now or later. > > > > My gut instinct is that having released 4.5, we should just live with > > the semantics we now have; we've broken compatibility with some Altivec > > code when compiled for Power 7, but breaking compatibility again seems > > like it will just confuse things worse. > > I think we should revert to the pre-4.5 behavior and also fix 4.5. Yeah, I agree. Especially when it is compatible not just with pre-4.5, but also with 4.5 with -mno-vsx. Changing behavior of an intrinsic depending on what CPU variant it is targetting is bad, for the VSX load/store behavior there should be a different intrinsic instead. Jakub
On 1/25/2011 2:15 AM, Richard Guenther wrote: > I think we should revert to the pre-4.5 behavior and also fix 4.5. > Especially so if there are other compilers that follow the pre-4.5 > behavior - are there? You're suggesting changing 4.5 so that 4.5.N is incompatible with previous 4.5 releases? That seems really unfortunate to me; we're now introducing compatibility problems in what's supposed to be a bug-fix release. To justify that, we really have to decide that this was a blatant bug. But, it wasn't; it was a reasonable choice given the hardware. In retrospect, not the best choice, but a reasonable choice. I'm certainly willing to bow to a consensus opinion here. If the Power maintainers have consensus that this is the right way to go, then I would respect that. Thank you,
On Mon, 24 Jan 2011, Mark Mitchell wrote: > On 1/24/2011 1:31 PM, Michael Meissner wrote: > > > So after some debate within IBM, we've come to the conclusion that I should not > > have changed the semantics of __builtin_vec_ld and __builtin_vec_st, and that > > we should go back to using the Altivec form for these instructions > > Can you explain why that's desirable? I think that the first thing to > do is to convince ourselves that's technically desirable; if we can't do > that, then there's no need to think about whether to do it now or later. > > My gut instinct is that having released 4.5, we should just live with > the semantics we now have; we've broken compatibility with some Altivec > code when compiled for Power 7, but breaking compatibility again seems > like it will just confuse things worse. First, the semantics of vec_ld and vec_st (the altivec.h intrinsics, as opposed to the built-in functions) are documented in the AltiVec PIM. Both my copies (I have one PDF with a Motorola logo and one with a Freescale logo) explicitly say for vec_ld: Each operation performs a 16-byte load at a 16-byte aligned address. The a is taken to be an integer value, while b is a pointer. BoundAlign(a+b,16) is the largest value less than or equal to a + b that is a multiple of 16. with similar wording for vec_st. Thus, I would say the semantics of these intrinsics for unaligned operations are well-defined, without reference to particular instructions, and breaking them for a particular -mcpu should be treated much the same as if we (for example) broke integer division for some -mcpu where it had previously followed C99 semantics: it is a wrong-code regression and should be fixed as such. As for the built-in functions: we do of course document that they should not be used directly. I'm not sure there's a particular reason this code uses them, and indeed PR 45381 has a patch to use altivec.h instead, though it's reported as not working to fix the underlying problem there (AltiVec build not working with Apple 4.0 compiler). I don't think there's any good reason for them to deviate from the AltiVec PIM semantics even if they weren't expressly documented to keep to those semantics, and making vec_ld and __builtin-vec_ld different would certainly be nasty for the altivec.h implementation. Unlike __sync_fetch_and_nand and __sync_nand_and_fetch, where there was a long history of consistently implementing semantics different from those intended, and where we added a warning with -Wsync-nand (enabled by default), I do not think a warning is needed here, where the broken semantics were only for a limited period with particular options.
On Tue, 25 Jan 2011, Mark Mitchell wrote: > On 1/25/2011 2:15 AM, Richard Guenther wrote: > > > I think we should revert to the pre-4.5 behavior and also fix 4.5. > > Especially so if there are other compilers that follow the pre-4.5 > > behavior - are there? > > You're suggesting changing 4.5 so that 4.5.N is incompatible with > previous 4.5 releases? That seems really unfortunate to me; we're now > introducing compatibility problems in what's supposed to be a bug-fix > release. To justify that, we really have to decide that this was a > blatant bug. But, it wasn't; it was a reasonable choice given the > hardware. In retrospect, not the best choice, but a reasonable choice. I see it as a blatant bug as it appearantly breaks code that was perfectly working with previous GCC releases. > I'm certainly willing to bow to a consensus opinion here. If the Power > maintainers have consensus that this is the right way to go, then I > would respect that. Of course - it's up to the Power maintainers to decide what to do. Richard.
On Wed, Jan 26, 2011 at 5:21 AM, Richard Guenther <rguenther@suse.de> wrote: > On Tue, 25 Jan 2011, Mark Mitchell wrote: > >> On 1/25/2011 2:15 AM, Richard Guenther wrote: >> >> > I think we should revert to the pre-4.5 behavior and also fix 4.5. >> > Especially so if there are other compilers that follow the pre-4.5 >> > behavior - are there? >> >> You're suggesting changing 4.5 so that 4.5.N is incompatible with >> previous 4.5 releases? That seems really unfortunate to me; we're now >> introducing compatibility problems in what's supposed to be a bug-fix >> release. To justify that, we really have to decide that this was a >> blatant bug. But, it wasn't; it was a reasonable choice given the >> hardware. In retrospect, not the best choice, but a reasonable choice. > > I see it as a blatant bug as it appearantly breaks code that was > perfectly working with previous GCC releases. > >> I'm certainly willing to bow to a consensus opinion here. If the Power >> maintainers have consensus that this is the right way to go, then I >> would respect that. > > Of course - it's up to the Power maintainers to decide what to do. I agree that the POWER port must revert back to the Altivec/VMX meaning of vec_ld/vec_st and backport the fix to the GCC 4.5 branch. - David
On 1/26/2011 7:27 AM, David Edelsohn wrote: >> Of course - it's up to the Power maintainers to decide what to do. > > I agree that the POWER port must revert back to the Altivec/VMX > meaning of vec_ld/vec_st and backport the fix to the GCC 4.5 branch. OK, I think we've reached pretty overwhelming consensus. For avoidance of doubt, I'm fine with that conclusion. Thank you,
Index: gcc/doc/extend.texi =================================================================== --- gcc/doc/extend.texi (revision 169112) +++ gcc/doc/extend.texi (working copy) @@ -12354,6 +12354,12 @@ vector bool long vec_cmplt (vector doubl vector float vec_div (vector float, vector float); vector double vec_div (vector double, vector double); vector double vec_floor (vector double); +vector double vec_ld (int, const vector double *); +vector double vec_ld (int, const double *); +vector double vec_ldl (int, const vector double *); +vector double vec_ldl (int, const double *); +vector unsigned char vec_lvsl (int, const volatile double *); +vector unsigned char vec_lvsr (int, const volatile double *); vector double vec_madd (vector double, vector double, vector double); vector double vec_max (vector double, vector double); vector double vec_min (vector double, vector double); @@ -12382,6 +12388,8 @@ vector double vec_sel (vector double, ve vector double vec_sub (vector double, vector double); vector float vec_sqrt (vector float); vector double vec_sqrt (vector double); +void vec_st (vector double, int, vector double *); +void vec_st (vector double, int, double *); vector double vec_trunc (vector double); vector double vec_xor (vector double, vector double); vector double vec_xor (vector double, vector bool long); @@ -12412,6 +12420,10 @@ int vec_any_nlt (vector double, vector d int vec_any_numeric (vector double); @end smallexample +Note that the @samp{vec_ld} and @samp{vec_st} builtins will always +generate the Altivec @samp{LVX} and @samp{STVX} instructions even +if the VSX instruction set is available. + GCC provides a few other builtins on Powerpc to access certain instructions: @smallexample float __builtin_recipdivf (float, float); Index: libcpp/lex.c =================================================================== --- libcpp/lex.c (revision 169112) +++ libcpp/lex.c (working copy) @@ -547,6 +547,11 @@ search_line_fast (const uchar *s, const const vc zero = { 0 }; vc data, mask, t; + const uchar *unaligned_s = s; + + /* While altivec loads mask addresses, we still need to align S so + that the offset we compute at the end is correct. */ + s = (const uchar *)((uintptr_t)s & -16); /* Altivec loads automatically mask addresses with -16. This lets us issue the first load as early as possible. */ @@ -555,15 +560,20 @@ search_line_fast (const uchar *s, const /* Discard bytes before the beginning of the buffer. Do this by beginning with all ones and shifting in zeros according to the mis-alignment. The LVSR instruction pulls the exact shift we - want from the address. */ - mask = __builtin_vec_lvsr(0, s); + want from the address. + + Originally, we used s in the lvsr and did the alignment afterwords, which + works on a system that supported just the Altivec instruction set using + the LVX instruction. With the introduction of the VSX instruction, for + GCC 4.5, the load became LXVW4X. LVX ignores the bottom 3 bits, and + LXVW4X does not. While GCC 4.6 will revert vec_ld/vec_st to go back to + only produce Altivec instructions, the possibiliy exists that the stage1 + compiler was built with a compiler that generated LXVW4X. This code will + work on either system. */ + mask = __builtin_vec_lvsr(0, unaligned_s); mask = __builtin_vec_perm(zero, ones, mask); data &= mask; - /* While altivec loads mask addresses, we still need to align S so - that the offset we compute at the end is correct. */ - s = (const uchar *)((uintptr_t)s & -16); - /* Main loop processing 16 bytes at a time. */ goto start; do