Message ID | 537f6ea8-12a2-45aa-b3e5-a164b3982bc0@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | rs6000, update vec_ld, vec_lde, vec_st and vec_ste, documentation | expand |
Hi Carl, on 2024/6/27 01:05, Carl Love wrote: > GCC maintainers: > > The following patch updates the user documentation for the vec_ld, vec_lde, vec_st and vec_ste built-ins to make it clearer that there are data alignment requirements for these built-ins. If the data alignment requirements are not followed, the data loaded or stored by these built-ins will be wrong. > > Please let me know if this patch is acceptable for mainline. Thanks. > > Carl > > ---------------------------------------------------- > rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation > > Use of the vec_ld and vec_st built-ins require that the data be 16-byte > aligned to work properly. Add some additional text to the existing > documentation to make this clearer to the user. > > Similarly, the vec_lde and vec_ste built-ins also have data alignment > requirements based on the size of the vector element. Update the > documentation to make this clear to the user. > > gcc/ChangeLog: > * doc/extend.texi: Add clarification for the use of the vec_ld > vec_st, vec_lde and vec_ste built-ins. > --- > gcc/doc/extend.texi | 15 +++++++++++---- > 1 file changed, 11 insertions(+), 4 deletions(-) > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi > index ee3644a5264..55faded17b9 100644 > --- a/gcc/doc/extend.texi > +++ b/gcc/doc/extend.texi > @@ -22644,10 +22644,17 @@ vector unsigned char vec_xxsldi (vector unsigned char, > @end smallexample > > Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always > -generate the AltiVec @samp{LVX} and @samp{STVX} instructions even > -if the VSX instruction set is available. The @samp{vec_vsx_ld} and > -@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X}, > -@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions. > +generate the AltiVec @samp{LVX}, and @samp{STVX} instructions. The This change removed "even if the VSX instruction set is available.", I think it's not intentional? vec_ld and vec_st are well defined in PVIPR, this paragraph is not to document them IMHO. Since we document vec_vsx_ld and vec_vsx_st here, it aims to note the difference between these two pairs. But I'm not opposed to add more words to emphasis the special masking off, I prefer to use the same words to PVIPR "ignoring the four low-order bits of the calculated address". And IMHO we should not say "it requires the data to be 16-byte aligned to work properly" in case the users are aware of this behavior well and have some no 16-byte aligned data and expect it to behave like that, it's arguable to define "it" as not work properly. > +instructions mask off the lower 4 bits of the effective address thus requiring > +the data to be 16-byte aligned to work properly. The @samp{vec_lde} and > +@samp{vec_ste} built-in functions operate on vectors of bytes, short integer, > +integer, and float. The corresponding AltiVec instructions @samp{LVEBX}, > +@samp{LVEHX}, @samp{LVEWX}, @samp{STVEBX}, @samp{STVEHX}, @samp{STVEWX} mask > +off the lower bits of the effective address based on the size of the data. > +Thus the data must be aligned to the size of the vector element to work > +properly. The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions > +always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and > +@samp{STXVW4X} instructions. As above, there was a reason to mention vec_ld and vec_st here, but not one for vec_lde and vec_ste IMHO, so let's not mention vec_lde and vec_ste here and users should read the description in PVIPR instead (it's more recommended). BR, Kewen > > @node PowerPC AltiVec Built-in Functions Available on ISA 2.07 > @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07
On 7/3/24 2:36 AM, Kewen.Lin wrote: > Hi Carl, > > on 2024/6/27 01:05, Carl Love wrote: >> GCC maintainers: >> >> The following patch updates the user documentation for the vec_ld, vec_lde, vec_st and vec_ste built-ins to make it clearer that there are data alignment requirements for these built-ins. If the data alignment requirements are not followed, the data loaded or stored by these built-ins will be wrong. >> >> Please let me know if this patch is acceptable for mainline. Thanks. >> >> Carl >> >> ---------------------------------------------------- >> rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation >> >> Use of the vec_ld and vec_st built-ins require that the data be 16-byte >> aligned to work properly. Add some additional text to the existing >> documentation to make this clearer to the user. >> >> Similarly, the vec_lde and vec_ste built-ins also have data alignment >> requirements based on the size of the vector element. Update the >> documentation to make this clear to the user. >> >> gcc/ChangeLog: >> * doc/extend.texi: Add clarification for the use of the vec_ld >> vec_st, vec_lde and vec_ste built-ins. >> --- >> gcc/doc/extend.texi | 15 +++++++++++---- >> 1 file changed, 11 insertions(+), 4 deletions(-) >> >> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi >> index ee3644a5264..55faded17b9 100644 >> --- a/gcc/doc/extend.texi >> +++ b/gcc/doc/extend.texi >> @@ -22644,10 +22644,17 @@ vector unsigned char vec_xxsldi (vector unsigned char, >> @end smallexample >> >> Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always >> -generate the AltiVec @samp{LVX} and @samp{STVX} instructions even >> -if the VSX instruction set is available. The @samp{vec_vsx_ld} and >> -@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X}, >> -@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions. >> +generate the AltiVec @samp{LVX}, and @samp{STVX} instructions. The > This change removed "even if the VSX instruction set is available.", I think it's > not intentional? vec_ld and vec_st are well defined in PVIPR, this paragraph is > not to document them IMHO. Since we document vec_vsx_ld and vec_vsx_st here, it > aims to note the difference between these two pairs. But I'm not opposed to add > more words to emphasis the special masking off, I prefer to use the same words to > PVIPR "ignoring the four low-order bits of the calculated address". And IMHO we > should not say "it requires the data to be 16-byte aligned to work properly" in > case the users are aware of this behavior well and have some no 16-byte aligned > data and expect it to behave like that, it's arguable to define "it" as not work > properly. Yea, probably should have left "even if the VSX instruction set is available." I was looking to make it clear that if the data is not 16-bye aligned you may not get the expected data loaded/stored. So how about the following instead: Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always generate the AltiVec @samp{LVX}, and @samp{STVX} instructions even if the VSX instruction set is available. The instructions mask off the lower 4-bits of the calculated address. The use of these instructions on data that is not 16-byte aligned may result in unexpected bytes being loaded or stored. >> +instructions mask off the lower 4 bits of the effective address thus requiring >> +the data to be 16-byte aligned to work properly. The @samp{vec_lde} and >> +@samp{vec_ste} built-in functions operate on vectors of bytes, short integer, >> +integer, and float. The corresponding AltiVec instructions @samp{LVEBX}, >> +@samp{LVEHX}, @samp{LVEWX}, @samp{STVEBX}, @samp{STVEHX}, @samp{STVEWX} mask >> +off the lower bits of the effective address based on the size of the data. >> +Thus the data must be aligned to the size of the vector element to work >> +properly. The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions >> +always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and >> +@samp{STXVW4X} instructions. > As above, there was a reason to mention vec_ld and vec_st here, but not one for > vec_lde and vec_ste IMHO, so let's not mention vec_lde and vec_ste here and users > should read the description in PVIPR instead (it's more recommended). The goal of mentioning the vec_lde and vec_ste built-ins was to give the user a pointer to built-ins that will work as expected on unaligned data. It will probably save them a lot of time an frustration if they are given a hint of what built-ins they should look at. So, how about the following: See the PVIPR description of the vec_lde and vec_ste for loading and storing data that is not 16-byte aligned. Carl
Hi Carl, on 2024/7/4 01:23, Carl Love wrote: > > On 7/3/24 2:36 AM, Kewen.Lin wrote: >> Hi Carl, >> >> on 2024/6/27 01:05, Carl Love wrote: >>> GCC maintainers: >>> >>> The following patch updates the user documentation for the vec_ld, vec_lde, vec_st and vec_ste built-ins to make it clearer that there are data alignment requirements for these built-ins. If the data alignment requirements are not followed, the data loaded or stored by these built-ins will be wrong. >>> >>> Please let me know if this patch is acceptable for mainline. Thanks. >>> >>> Carl >>> >>> ---------------------------------------------------- >>> rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation >>> >>> Use of the vec_ld and vec_st built-ins require that the data be 16-byte >>> aligned to work properly. Add some additional text to the existing >>> documentation to make this clearer to the user. >>> >>> Similarly, the vec_lde and vec_ste built-ins also have data alignment >>> requirements based on the size of the vector element. Update the >>> documentation to make this clear to the user. >>> >>> gcc/ChangeLog: >>> * doc/extend.texi: Add clarification for the use of the vec_ld >>> vec_st, vec_lde and vec_ste built-ins. >>> --- >>> gcc/doc/extend.texi | 15 +++++++++++---- >>> 1 file changed, 11 insertions(+), 4 deletions(-) >>> >>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi >>> index ee3644a5264..55faded17b9 100644 >>> --- a/gcc/doc/extend.texi >>> +++ b/gcc/doc/extend.texi >>> @@ -22644,10 +22644,17 @@ vector unsigned char vec_xxsldi (vector unsigned char, >>> @end smallexample >>> Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always >>> -generate the AltiVec @samp{LVX} and @samp{STVX} instructions even >>> -if the VSX instruction set is available. The @samp{vec_vsx_ld} and >>> -@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X}, >>> -@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions. >>> +generate the AltiVec @samp{LVX}, and @samp{STVX} instructions. The >> This change removed "even if the VSX instruction set is available.", I think it's >> not intentional? vec_ld and vec_st are well defined in PVIPR, this paragraph is >> not to document them IMHO. Since we document vec_vsx_ld and vec_vsx_st here, it >> aims to note the difference between these two pairs. But I'm not opposed to add >> more words to emphasis the special masking off, I prefer to use the same words to >> PVIPR "ignoring the four low-order bits of the calculated address". And IMHO we >> should not say "it requires the data to be 16-byte aligned to work properly" in >> case the users are aware of this behavior well and have some no 16-byte aligned >> data and expect it to behave like that, it's arguable to define "it" as not work >> properly. > > Yea, probably should have left "even if the VSX instruction set is available." > > I was looking to make it clear that if the data is not 16-bye aligned you may not get the expected data loaded/stored. > > So how about the following instead: > > Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always > generate the AltiVec @samp{LVX}, and @samp{STVX} instructions even > if the VSX > instruction set is available. The instructions mask off the lower > 4-bits of > the calculated address. The use of these instructions on data that > is not > 16-byte aligned may result in unexpected bytes being loaded or stored. Sorry for nitpicking, to avoid the implicit conclusion between "not 16-byte aligned" and "unexpected bytes" (considering even if the given address isn't 16-byte aligned, if users would like to leverage this masking off trick, they can do that as the behavior is definite, the results are still expected for them, and PVIPR doesn't object this), so maybe "... of the calculated address, so be careful of the alignment of the calculated address when meeting unexpected load or store data."? > >>> +instructions mask off the lower 4 bits of the effective address thus requiring >>> +the data to be 16-byte aligned to work properly. The @samp{vec_lde} and >>> +@samp{vec_ste} built-in functions operate on vectors of bytes, short integer, >>> +integer, and float. The corresponding AltiVec instructions @samp{LVEBX}, >>> +@samp{LVEHX}, @samp{LVEWX}, @samp{STVEBX}, @samp{STVEHX}, @samp{STVEWX} mask >>> +off the lower bits of the effective address based on the size of the data. >>> +Thus the data must be aligned to the size of the vector element to work >>> +properly. The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions >>> +always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and >>> +@samp{STXVW4X} instructions. >> As above, there was a reason to mention vec_ld and vec_st here, but not one for >> vec_lde and vec_ste IMHO, so let's not mention vec_lde and vec_ste here and users >> should read the description in PVIPR instead (it's more recommended). > > The goal of mentioning the vec_lde and vec_ste built-ins was to give the user a pointer to built-ins that will work as expected on unaligned data. It will probably save them a lot of time an frustration if they are given a hint of what built-ins they should look at. So, how about the following: For that, we probably need to document everything. ;-) > > See the PVIPR description of the vec_lde and vec_ste for loading and > storing > data that is not 16-byte aligned. I think "not 16-byte aligned" isn't correct, it should be "aligned to element boundary or element size aligned". But IMHO users should read PVIPR themselves regardless of aligned or not, PVIPR already notes that "Be careful to note that the address (b+c) is aligned to an element boundary. Do not attempt to load/store unaligned data with this intrinsic.", so I still prefer not to mention them here, but I'm open to hear other's opinions. BR, Kewen
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index ee3644a5264..55faded17b9 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -22644,10 +22644,17 @@ vector unsigned char vec_xxsldi (vector unsigned char, @end smallexample Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always -generate the AltiVec @samp{LVX} and @samp{STVX} instructions even -if the VSX instruction set is available. The @samp{vec_vsx_ld} and -@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X}, -@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions. +generate the AltiVec @samp{LVX}, and @samp{STVX} instructions. The +instructions mask off the lower 4 bits of the effective address thus requiring +the data to be 16-byte aligned to work properly. The @samp{vec_lde} and +@samp{vec_ste} built-in functions operate on vectors of bytes, short integer, +integer, and float. The corresponding AltiVec instructions @samp{LVEBX}, +@samp{LVEHX}, @samp{LVEWX}, @samp{STVEBX}, @samp{STVEHX}, @samp{STVEWX} mask +off the lower bits of the effective address based on the size of the data. +Thus the data must be aligned to the size of the vector element to work +properly. The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions +always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and +@samp{STXVW4X} instructions. @node PowerPC AltiVec Built-in Functions Available on ISA 2.07 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07