diff mbox series

[wwwdocs] gcc-12/changes.html (GCN): >1 workers per gang

Message ID 531ee7a8-0446-8f7b-7504-928426a257c5@codesourcery.com
State New
Headers show
Series [wwwdocs] gcc-12/changes.html (GCN): >1 workers per gang | expand

Commit Message

Tobias Burnus Aug. 9, 2021, 1:55 p.m. UTC
Now that the GCN/OpenACC patches for this have been committed today,
I think it makes sense to add it to the documentation.
(I was told that some follow-up items are still pending, but as
the feature does work ...)

Cf. also Andrew's talk of last year,
https://linuxplumbersconf.org/event/7/contributions/749/attachments/560/988/AMD_GCN_Update_-_LPC_2020.pdf
which I utilized when writing the attached wwwdocs patch.

Comments and/or suggestions?

Tobias

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

Comments

Thomas Schwinge Aug. 9, 2021, 2:27 p.m. UTC | #1
Hi!

On 2021-08-09T15:55:07+0200, Tobias Burnus <tobias@codesourcery.com> wrote:
> Now that the GCN/OpenACC patches for this have been committed today,
> I think it makes sense to add it to the documentation.

Thanks for thinking of this.

> (I was told that some follow-up items are still pending, but as
> the feature does work ...)

ACK.

> Cf. also Andrew's talk of last year,
> https://linuxplumbersconf.org/event/7/contributions/749/attachments/560/988/AMD_GCN_Update_-_LPC_2020.pdf
> which I utilized when writing the attached wwwdocs patch.
>
> Comments and/or suggestions?

> gcc-12/changes.html (GCN): >1 workers per gang

ACK for that aspect specifically, but:

> --- a/htdocs/gcc-12/changes.html
> +++ b/htdocs/gcc-12/changes.html

> +  <li>When used as OpenACC device: the limitation of 1 worker per gang, 2 gangs
> +      per CU has been lifted; now up to 16 workers per gang and 40 gangs per CU
> +      are supported. (Except that the hardware limit of 40 workers total may
> +      not be exceeded.)</li>

I haven't changed anything related to a "limitation of [...] 2 gangs per
CU has been lifted".  Maybe that has already been done earlier, maybe
that still has to be done?  I don't know -- Julian?


Grüße
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Gerald Pfeifer Aug. 9, 2021, 6:53 p.m. UTC | #2
On Mon, 9 Aug 2021, Tobias Burnus wrote:
> Comments and/or suggestions?

Looks good from my perspective, with the feedback that Thomas
provided.

(Is "CU" a sufficiently established term, or might it make sense
to spell it out?)

Thanks,
Gerald
Tobias Burnus Aug. 10, 2021, 10:10 a.m. UTC | #3
Hi all,

On 09.08.21 20:53, Gerald Pfeifer wrote:

> (Is "CU" a sufficiently established term, or might it make sense
> to spell it out?)
I don't know – but we could use "per compute unit (CU)".

On 09.08.21 16:27, Thomas Schwinge wrote:
> On 2021-08-09T15:55:07+0200, Tobias Burnus<tobias@codesourcery.com>  wrote:
>> +++ b/htdocs/gcc-12/changes.html
>> +  <li>When used as OpenACC device: the limitation of 1 worker per gang, 2 gangs
>> +      per CU has been lifted; now up to 16 workers per gang and 40 gangs per CU
>> +      are supported. (Except that the hardware limit of 40 workers total may
>> +      not be exceeded.)</li>
> I haven't changed anything related to a "limitation of [...] 2 gangs per
> CU has been lifted".  Maybe that has already been done earlier, maybe
> that still has to be done?  I don't know -- Julian?

Looking at the current code, it has:
   if (dims[0] == 0) dims[0] = get_cu_count (kernel->agent); /* Gangs.  */

Thus at least when nothing else has been specified, it uses #CUs of gangs,
running on #CUs CUs, i.e. 1 gang per CU.

[OG11 – but not mainline] What's needed is something like:
       dims[0] = get_cu_count (kernel->agent) * (32 / dims[1]);
which I see in OG11 – oddly, I also see there code like:
           def->gdims[0] = get_cu_count (agent); // * (40 / gcn_threads);

In other words:  For gangs > #CUs or >1 gang per CU, the following patch
is needed:
   [OG11] https://gcc.gnu.org/g:4dcd1e1f4e6b451aac44f919b8eb3ac49292b308
   [email] https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550102.html
      "not suitable for mainline until the multiple-worker support is merged there"

@Andrew + @Julian:  Do you intent to commit it relatively soon?
Regarding the wwwdocs patch, I can hold off until that commit or reword
it to only cover the workers part.

Thanks Thomas & Gerald for the comments!

Tobias

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Stubbs, Andrew Aug. 16, 2021, 8:34 a.m. UTC | #4
> In other words:  For gangs > #CUs or >1 gang per CU, the following patch
> is needed:
>    [OG11] https://gcc.gnu.org/g:4dcd1e1f4e6b451aac44f919b8eb3ac49292b308
>    [email] https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550102.html
>       "not suitable for mainline until the multiple-worker support is merged
> there"
> 
> @Andrew + @Julian:  Do you intent to commit it relatively soon?
> Regarding the wwwdocs patch, I can hold off until that commit or reword
> it to only cover the workers part.

Were these not part of the patch set Thomas was working on?

Andrew
Thomas Schwinge Aug. 16, 2021, 9:28 a.m. UTC | #5
Hi!

On 2021-08-16T10:34:34+0200, "Stubbs, Andrew" <Andrew_Stubbs@mentor.com> wrote:
>> In other words:  For gangs > #CUs or >1 gang per CU, the following patch
>> is needed:
>>    [OG11] https://gcc.gnu.org/g:4dcd1e1f4e6b451aac44f919b8eb3ac49292b308
>>    [email] https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550102.html
>>       "not suitable for mainline until the multiple-worker support is merged
>> there"
>>
>> @Andrew + @Julian:  Do you intent to commit it relatively soon?
>> Regarding the wwwdocs patch, I can hold off until that commit or reword
>> it to only cover the workers part.
>
> Were these not part of the patch set Thomas was working on?

That one is "amdgcn: Tune default OpenMP/OpenACC GPU utilization", which
indeed I'm planning to handle in the next few weeks.  So I suggest we
simply hold back Tobias' 'gcc-12/changes.html' patch until that time.


Grüße
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Tobias Burnus Feb. 2, 2022, 3:39 p.m. UTC | #6
On 09.08.21 15:55, Tobias Burnus wrote:
> Now that the GCN/OpenACC patches for this have been committed today,
> I think it makes sense to add it to the documentation.
> (I was told that some follow-up items are still pending, but as
> the feature does work ...)

I think the follow-up patches have now been committed.
How about the attached patch?

Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Andrew Stubbs Feb. 2, 2022, 4:23 p.m. UTC | #7
On 02/02/2022 15:39, Tobias Burnus wrote:
> On 09.08.21 15:55, Tobias Burnus wrote:
>> Now that the GCN/OpenACC patches for this have been committed today,
>> I think it makes sense to add it to the documentation.
>> (I was told that some follow-up items are still pending, but as
>> the feature does work ...)
> 
> I think the follow-up patches have now been committed.
> How about the attached patch?

We should probably add a qualification "(up to a limit of 40 wavefronts 
in total, per CU)" or else were suggesting that there can be 40 
workgroups of 16 wavefronts, which the hardware will not do.

Andrew
Tobias Burnus Feb. 2, 2022, 5:54 p.m. UTC | #8
Now committed, taking Andrew's comments into account:

https://gcc.gnu.org/gcc-12/changes.html#amdgcn

Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
diff mbox series

Patch

gcc-12/changes.html (GCN): >1 workers per gang

diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 9c2799cf..05e24737 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -119,10 +119,14 @@ 
 <h3 id="amdgcn">AMD Radeon (GCN)</h3>
 <ul>
   <li>Debug experience with ROCGDB has been improved.</li>
   <li>Support for the type <code>__int128_t</code>/<code>integer(kind=16)</code>
       was added.</li>
+  <li>When used as OpenACC device: the limitation of 1 worker per gang, 2 gangs
+      per CU has been lifted; now up to 16 workers per gang and 40 gangs per CU
+      are supported. (Except that the hardware limit of 40 workers total may
+      not be exceeded.)</li>
 </ul>
 
 <!-- <h3 id="arc">ARC</h3> -->
 
 <!-- <h3 id="arm">ARM</h3> -->