diff mbox series

rs6000, update vec_ld, vec_lde, vec_st and vec_ste, documentation

Message ID 537f6ea8-12a2-45aa-b3e5-a164b3982bc0@linux.ibm.com
State New
Headers show
Series rs6000, update vec_ld, vec_lde, vec_st and vec_ste, documentation | expand

Commit Message

Carl Love June 26, 2024, 5:05 p.m. UTC
GCC maintainers:

The following patch updates the user documentation for the vec_ld, vec_lde, vec_st and vec_ste built-ins to make it clearer that there are data alignment requirements for these built-ins.  If the data alignment requirements are not followed, the data loaded or stored by these built-ins will be wrong.

Please let me know if this patch is acceptable for mainline.  Thanks.

                          Carl 

----------------------------------------------------
rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation

Use of the vec_ld and vec_st built-ins require that the data be 16-byte
aligned to work properly.  Add some additional text to the existing
documentation to make this clearer to the user.

Similarly, the vec_lde and vec_ste built-ins also have data alignment
requirements based on the size of the vector element.  Update the
documentation to make this clear to the user.

gcc/ChangeLog:
	* doc/extend.texi: Add clarification for the use of the vec_ld
	vec_st, vec_lde and vec_ste built-ins.
---
 gcc/doc/extend.texi | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

Comments

Kewen.Lin July 3, 2024, 9:36 a.m. UTC | #1
Hi Carl,

on 2024/6/27 01:05, Carl Love wrote:
> GCC maintainers:
> 
> The following patch updates the user documentation for the vec_ld, vec_lde, vec_st and vec_ste built-ins to make it clearer that there are data alignment requirements for these built-ins.  If the data alignment requirements are not followed, the data loaded or stored by these built-ins will be wrong.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
>                           Carl 
> 
> ----------------------------------------------------
> rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation
> 
> Use of the vec_ld and vec_st built-ins require that the data be 16-byte
> aligned to work properly.  Add some additional text to the existing
> documentation to make this clearer to the user.
> 
> Similarly, the vec_lde and vec_ste built-ins also have data alignment
> requirements based on the size of the vector element.  Update the
> documentation to make this clear to the user.
> 
> gcc/ChangeLog:
> 	* doc/extend.texi: Add clarification for the use of the vec_ld
> 	vec_st, vec_lde and vec_ste built-ins.
> ---
>  gcc/doc/extend.texi | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index ee3644a5264..55faded17b9 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -22644,10 +22644,17 @@ vector unsigned char vec_xxsldi (vector unsigned char,
>  @end smallexample
>  
>  Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
> -generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
> -if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
> -@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
> -@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
> +generate the AltiVec @samp{LVX}, and @samp{STVX} instructions.  The

This change removed "even if the VSX instruction set is available.", I think it's
not intentional?  vec_ld and vec_st are well defined in PVIPR, this paragraph is
not to document them IMHO.  Since we document vec_vsx_ld and vec_vsx_st here, it
aims to note the difference between these two pairs.  But I'm not opposed to add
more words to emphasis the special masking off, I prefer to use the same words to
PVIPR "ignoring the four low-order bits of the calculated address".  And IMHO we
should not say "it requires the data to be 16-byte aligned to work properly" in
case the users are aware of this behavior well and have some no 16-byte aligned
data and expect it to behave like that, it's arguable to define "it" as not work
properly. 

> +instructions mask off the lower 4 bits of the effective address thus requiring
> +the data to be 16-byte aligned to work properly.  The @samp{vec_lde} and
> +@samp{vec_ste} built-in functions operate on vectors of bytes, short integer,
> +integer, and float.  The corresponding AltiVec instructions @samp{LVEBX},
> +@samp{LVEHX}, @samp{LVEWX}, @samp{STVEBX}, @samp{STVEHX}, @samp{STVEWX} mask
> +off the lower bits of the effective address based on the size of the data.
> +Thus the data must be aligned to the size of the vector element to work
> +properly.  The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions
> +always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and
> +@samp{STXVW4X} instructions.

As above, there was a reason to mention vec_ld and vec_st here, but not one for
vec_lde and vec_ste IMHO, so let's not mention vec_lde and vec_ste here and users
should read the description in PVIPR instead (it's more recommended).

BR,
Kewen

>  
>  @node PowerPC AltiVec Built-in Functions Available on ISA 2.07
>  @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07
Carl Love July 3, 2024, 5:23 p.m. UTC | #2
On 7/3/24 2:36 AM, Kewen.Lin wrote:
> Hi Carl,
>
> on 2024/6/27 01:05, Carl Love wrote:
>> GCC maintainers:
>>
>> The following patch updates the user documentation for the vec_ld, vec_lde, vec_st and vec_ste built-ins to make it clearer that there are data alignment requirements for these built-ins.  If the data alignment requirements are not followed, the data loaded or stored by these built-ins will be wrong.
>>
>> Please let me know if this patch is acceptable for mainline.  Thanks.
>>
>>                            Carl
>>
>> ----------------------------------------------------
>> rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation
>>
>> Use of the vec_ld and vec_st built-ins require that the data be 16-byte
>> aligned to work properly.  Add some additional text to the existing
>> documentation to make this clearer to the user.
>>
>> Similarly, the vec_lde and vec_ste built-ins also have data alignment
>> requirements based on the size of the vector element.  Update the
>> documentation to make this clear to the user.
>>
>> gcc/ChangeLog:
>> 	* doc/extend.texi: Add clarification for the use of the vec_ld
>> 	vec_st, vec_lde and vec_ste built-ins.
>> ---
>>   gcc/doc/extend.texi | 15 +++++++++++----
>>   1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index ee3644a5264..55faded17b9 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -22644,10 +22644,17 @@ vector unsigned char vec_xxsldi (vector unsigned char,
>>   @end smallexample
>>   
>>   Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
>> -generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
>> -if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
>> -@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
>> -@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
>> +generate the AltiVec @samp{LVX}, and @samp{STVX} instructions.  The
> This change removed "even if the VSX instruction set is available.", I think it's
> not intentional?  vec_ld and vec_st are well defined in PVIPR, this paragraph is
> not to document them IMHO.  Since we document vec_vsx_ld and vec_vsx_st here, it
> aims to note the difference between these two pairs.  But I'm not opposed to add
> more words to emphasis the special masking off, I prefer to use the same words to
> PVIPR "ignoring the four low-order bits of the calculated address".  And IMHO we
> should not say "it requires the data to be 16-byte aligned to work properly" in
> case the users are aware of this behavior well and have some no 16-byte aligned
> data and expect it to behave like that, it's arguable to define "it" as not work
> properly.

Yea, probably should have left "even if the VSX instruction set is 
available."

I was looking to make it clear that if the data is not 16-bye aligned 
you may not get the expected data loaded/stored.

So how about the following instead:

    Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
    generate the AltiVec @samp{LVX}, and @samp{STVX} instructions even
    if the VSX
    instruction set is available. The instructions mask off the lower
    4-bits of
    the calculated address. The use of these instructions on data that
    is not
    16-byte aligned may result in unexpected bytes being loaded or stored.

>> +instructions mask off the lower 4 bits of the effective address thus requiring
>> +the data to be 16-byte aligned to work properly.  The @samp{vec_lde} and
>> +@samp{vec_ste} built-in functions operate on vectors of bytes, short integer,
>> +integer, and float.  The corresponding AltiVec instructions @samp{LVEBX},
>> +@samp{LVEHX}, @samp{LVEWX}, @samp{STVEBX}, @samp{STVEHX}, @samp{STVEWX} mask
>> +off the lower bits of the effective address based on the size of the data.
>> +Thus the data must be aligned to the size of the vector element to work
>> +properly.  The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions
>> +always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and
>> +@samp{STXVW4X} instructions.
> As above, there was a reason to mention vec_ld and vec_st here, but not one for
> vec_lde and vec_ste IMHO, so let's not mention vec_lde and vec_ste here and users
> should read the description in PVIPR instead (it's more recommended).

The goal of mentioning the vec_lde and vec_ste built-ins was to give the 
user a pointer to built-ins that will work as expected on unaligned 
data.  It will probably save them a lot of time an frustration if they 
are given a hint of what built-ins they should look at.  So, how about 
the following:

    See the PVIPR description of the vec_lde and vec_ste for loading and
    storing
    data that is not 16-byte aligned.

                    Carl
Kewen.Lin July 4, 2024, 7:07 a.m. UTC | #3
Hi Carl,

on 2024/7/4 01:23, Carl Love wrote:
> 
> On 7/3/24 2:36 AM, Kewen.Lin wrote:
>> Hi Carl,
>>
>> on 2024/6/27 01:05, Carl Love wrote:
>>> GCC maintainers:
>>>
>>> The following patch updates the user documentation for the vec_ld, vec_lde, vec_st and vec_ste built-ins to make it clearer that there are data alignment requirements for these built-ins.  If the data alignment requirements are not followed, the data loaded or stored by these built-ins will be wrong.
>>>
>>> Please let me know if this patch is acceptable for mainline.  Thanks.
>>>
>>>                            Carl
>>>
>>> ----------------------------------------------------
>>> rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation
>>>
>>> Use of the vec_ld and vec_st built-ins require that the data be 16-byte
>>> aligned to work properly.  Add some additional text to the existing
>>> documentation to make this clearer to the user.
>>>
>>> Similarly, the vec_lde and vec_ste built-ins also have data alignment
>>> requirements based on the size of the vector element.  Update the
>>> documentation to make this clear to the user.
>>>
>>> gcc/ChangeLog:
>>>     * doc/extend.texi: Add clarification for the use of the vec_ld
>>>     vec_st, vec_lde and vec_ste built-ins.
>>> ---
>>>   gcc/doc/extend.texi | 15 +++++++++++----
>>>   1 file changed, 11 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>>> index ee3644a5264..55faded17b9 100644
>>> --- a/gcc/doc/extend.texi
>>> +++ b/gcc/doc/extend.texi
>>> @@ -22644,10 +22644,17 @@ vector unsigned char vec_xxsldi (vector unsigned char,
>>>   @end smallexample
>>>     Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
>>> -generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
>>> -if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
>>> -@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
>>> -@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
>>> +generate the AltiVec @samp{LVX}, and @samp{STVX} instructions.  The
>> This change removed "even if the VSX instruction set is available.", I think it's
>> not intentional?  vec_ld and vec_st are well defined in PVIPR, this paragraph is
>> not to document them IMHO.  Since we document vec_vsx_ld and vec_vsx_st here, it
>> aims to note the difference between these two pairs.  But I'm not opposed to add
>> more words to emphasis the special masking off, I prefer to use the same words to
>> PVIPR "ignoring the four low-order bits of the calculated address".  And IMHO we
>> should not say "it requires the data to be 16-byte aligned to work properly" in
>> case the users are aware of this behavior well and have some no 16-byte aligned
>> data and expect it to behave like that, it's arguable to define "it" as not work
>> properly.
> 
> Yea, probably should have left "even if the VSX instruction set is available."
> 
> I was looking to make it clear that if the data is not 16-bye aligned you may not get the expected data loaded/stored.
> 
> So how about the following instead:
> 
>    Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
>    generate the AltiVec @samp{LVX}, and @samp{STVX} instructions even
>    if the VSX
>    instruction set is available. The instructions mask off the lower
>    4-bits of
>    the calculated address. The use of these instructions on data that
>    is not
>    16-byte aligned may result in unexpected bytes being loaded or stored.

Sorry for nitpicking, to avoid the implicit conclusion between "not 16-byte
aligned" and "unexpected bytes" (considering even if the given address isn't
16-byte aligned, if users would like to leverage this masking off trick, they
can do that as the behavior is definite, the results are still expected for
them, and PVIPR doesn't object this), so maybe "... of the calculated address,
so be careful of the alignment of the calculated address when meeting unexpected
load or store data."?

> 
>>> +instructions mask off the lower 4 bits of the effective address thus requiring
>>> +the data to be 16-byte aligned to work properly.  The @samp{vec_lde} and
>>> +@samp{vec_ste} built-in functions operate on vectors of bytes, short integer,
>>> +integer, and float.  The corresponding AltiVec instructions @samp{LVEBX},
>>> +@samp{LVEHX}, @samp{LVEWX}, @samp{STVEBX}, @samp{STVEHX}, @samp{STVEWX} mask
>>> +off the lower bits of the effective address based on the size of the data.
>>> +Thus the data must be aligned to the size of the vector element to work
>>> +properly.  The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions
>>> +always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and
>>> +@samp{STXVW4X} instructions.
>> As above, there was a reason to mention vec_ld and vec_st here, but not one for
>> vec_lde and vec_ste IMHO, so let's not mention vec_lde and vec_ste here and users
>> should read the description in PVIPR instead (it's more recommended).
> 
> The goal of mentioning the vec_lde and vec_ste built-ins was to give the user a pointer to built-ins that will work as expected on unaligned data.  It will probably save them a lot of time an frustration if they are given a hint of what built-ins they should look at.  So, how about the following:

For that, we probably need to document everything. ;-)

> 
>    See the PVIPR description of the vec_lde and vec_ste for loading and
>    storing
>    data that is not 16-byte aligned.

I think "not 16-byte aligned" isn't correct, it should be "aligned to element
boundary or element size aligned".  But IMHO users should read PVIPR themselves
regardless of aligned or not, PVIPR already notes that "Be careful to note that
the address (b+c) is aligned to an element boundary. Do not attempt to load/store
unaligned data with this intrinsic.", so I still prefer not to mention them here,
but I'm open to hear other's opinions.

BR,
Kewen
diff mbox series

Patch

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ee3644a5264..55faded17b9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22644,10 +22644,17 @@  vector unsigned char vec_xxsldi (vector unsigned char,
 @end smallexample
 
 Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
-generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
-if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
-@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
-@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
+generate the AltiVec @samp{LVX}, and @samp{STVX} instructions.  The
+instructions mask off the lower 4 bits of the effective address thus requiring
+the data to be 16-byte aligned to work properly.  The @samp{vec_lde} and
+@samp{vec_ste} built-in functions operate on vectors of bytes, short integer,
+integer, and float.  The corresponding AltiVec instructions @samp{LVEBX},
+@samp{LVEHX}, @samp{LVEWX}, @samp{STVEBX}, @samp{STVEHX}, @samp{STVEWX} mask
+off the lower bits of the effective address based on the size of the data.
+Thus the data must be aligned to the size of the vector element to work
+properly.  The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions
+always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and
+@samp{STXVW4X} instructions.
 
 @node PowerPC AltiVec Built-in Functions Available on ISA 2.07
 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07