diff mbox series

[avr] Disable CRC lookup tables

Message ID 7fc6ebbf-4a3d-4e37-a8bc-975d5f9c5e23@gjlay.de
State New
Headers show
Series [avr] Disable CRC lookup tables | expand

Commit Message

Georg-Johann Lay Dec. 6, 2024, 10:58 a.m. UTC
This patch disables CRC lookup tables which consume quite some RAM.

Ok for trunk?

Johann

--

AVR: Disable generation of CRC lookup tables.

With -foptimize-crc, large lookup tables may be generated which
are places in .rodata (RAM).  This patch disables such tables.

gcc/
         * common/config/avr/avr-common.cc
         (avr_option_optimization_table): Default to -fno-optimize-crc.

      { OPT_LEVELS_1_PLUS, OPT_mfuse_add_, NULL, 1 },

Comments

Sam James Dec. 6, 2024, 12:23 p.m. UTC | #1
Georg-Johann Lay <avr@gjlay.de> writes:

> This patch disables CRC lookup tables which consume quite some RAM.

Given that -foptimize-crc is new, it may be useful to CC the pass
authors in case they have input.

>
> Ok for trunk?
>
> Johann
Georg-Johann Lay Dec. 6, 2024, 1:14 p.m. UTC | #2
Am 06.12.24 um 13:23 schrieb Sam James:
> Georg-Johann Lay <avr@gjlay.de> writes:
> 
>> This patch disables CRC lookup tables which consume quite some RAM.
> 
> Given that -foptimize-crc is new, it may be useful to CC the pass
> authors in case they have input.

CCing Mariam Arutunian

>> Ok for trunk?
>>
>> Johann

The problem is not in the new CRC pass, but because AVR is a very
limited hardware.  Because AVR has no linear address space, .rodata
has to be placed in RAM, and most devices have just a few KiB of RAM,
or even less.

An extension PR49857 to put such lookup tables in flash / program memory
has been rejected by the global maintainers as "too specific", ignoring
most of the constraints imposed by the requirement of using named
address-spaces.

The point is that you cannot just put data in flash, one must also use
the correct instructions to access them, which is achieved by means
of avr specific named address-spaces like __flash.

This would require a new target hook like proposed in PR49857, which
could put lookup-tables into a non-generic address-space provided:

*) All respective data is put in the preferred address-space, and

*) All accesses have to use the same address-space as of 1),
independent of what the rest of the code may look like.

To date, only 3 lookup tables generated by GCC meet these criteria:

1) Lookup tables from tree-switch conversion.

2) Lookup tables from the current CRC work

3) vtables.  Though g++ does not, and probably never will, support
named address spaces.

Johann
Jeff Law Dec. 6, 2024, 1:32 p.m. UTC | #3
On 12/6/24 5:23 AM, Sam James wrote:
> Georg-Johann Lay <avr@gjlay.de> writes:
> 
>> This patch disables CRC lookup tables which consume quite some RAM.
> 
> Given that -foptimize-crc is new, it may be useful to CC the pass
> authors in case they have input.
I think this is trivially OK for the AVR.  The bigger question is should 
we do something more general for -Os.

CRC generation through table lookups is going to take more data space. 
You need a 256 byte table for each unique CRC (sizes & polynomial), and 
the code to compute the index into the table can be (from a code size 
standpoint) relatively expensive as well, particularly on the 
micro-controllers if the crc is to be computed in a mode wider than a 
word on the target.

So I would actually even support a more general "don't optimize CRCs by 
default for -Os".


Jeff
Richard Biener Dec. 6, 2024, 1:53 p.m. UTC | #4
On Fri, Dec 6, 2024 at 2:17 PM Georg-Johann Lay <avr@gjlay.de> wrote:
>
> Am 06.12.24 um 13:23 schrieb Sam James:
> > Georg-Johann Lay <avr@gjlay.de> writes:
> >
> >> This patch disables CRC lookup tables which consume quite some RAM.
> >
> > Given that -foptimize-crc is new, it may be useful to CC the pass
> > authors in case they have input.
>
> CCing Mariam Arutunian
>
> >> Ok for trunk?
> >>
> >> Johann
>
> The problem is not in the new CRC pass, but because AVR is a very
> limited hardware.  Because AVR has no linear address space, .rodata
> has to be placed in RAM, and most devices have just a few KiB of RAM,
> or even less.
>
> An extension PR49857 to put such lookup tables in flash / program memory
> has been rejected by the global maintainers as "too specific", ignoring
> most of the constraints imposed by the requirement of using named
> address-spaces.
>
> The point is that you cannot just put data in flash, one must also use
> the correct instructions to access them, which is achieved by means
> of avr specific named address-spaces like __flash.
>
> This would require a new target hook like proposed in PR49857, which
> could put lookup-tables into a non-generic address-space provided:
>
> *) All respective data is put in the preferred address-space, and
>
> *) All accesses have to use the same address-space as of 1),
> independent of what the rest of the code may look like.
>
> To date, only 3 lookup tables generated by GCC meet these criteria:
>
> 1) Lookup tables from tree-switch conversion.
>
> 2) Lookup tables from the current CRC work
>
> 3) vtables.  Though g++ does not, and probably never will, support
> named address spaces.

4) const global data

I think generally this might be a sound optimization - but as you said
it requires generating correct accesses in the first place which also
means once anything takes the address of the data all pointer uses
have to know beforehand.  Thus application is likely quite limited for
4) at least.

Maybe it's possible to perform half of the task by the linker via
relaxation?   Though I can easily guess it's somewhat difficult for AVR.

Richard.

> Johann
>
>
>
>
Sam James Dec. 6, 2024, 1:56 p.m. UTC | #5
Georg-Johann Lay <avr@gjlay.de> writes:

> Am 06.12.24 um 13:23 schrieb Sam James:
>> Georg-Johann Lay <avr@gjlay.de> writes:
>> 
>>> This patch disables CRC lookup tables which consume quite some RAM.
>> Given that -foptimize-crc is new, it may be useful to CC the pass
>> authors in case they have input.
>
> CCing Mariam Arutunian
>
>>> Ok for trunk?
>>>
>>> Johann
>
> The problem is not in the new CRC pass, but because AVR is a very
> limited hardware.  Because AVR has no linear address space, .rodata
> has to be placed in RAM, and most devices have just a few KiB of RAM,
> or even less.

(To be clear, I do accept that -- it's more that it's: a) useful as a
heads-up to people; b) might affect their future testing; c) might mean
we end up talking about whether the patch has issues on other targets).

>
> An extension PR49857 to put such lookup tables in flash / program memory
> has been rejected by the global maintainers as "too specific", ignoring
> most of the constraints imposed by the requirement of using named
> address-spaces.
>
> The point is that you cannot just put data in flash, one must also use
> the correct instructions to access them, which is achieved by means
> of avr specific named address-spaces like __flash.
>
> This would require a new target hook like proposed in PR49857, which
> could put lookup-tables into a non-generic address-space provided:
>
> *) All respective data is put in the preferred address-space, and
>
> *) All accesses have to use the same address-space as of 1),
> independent of what the rest of the code may look like.
>
> To date, only 3 lookup tables generated by GCC meet these criteria:
>
> 1) Lookup tables from tree-switch conversion.
>
> 2) Lookup tables from the current CRC work
>
> 3) vtables.  Though g++ does not, and probably never will, support
> named address spaces.

Thanks for the extra context!

>
> Johann

sam
Georg-Johann Lay Dec. 6, 2024, 2:29 p.m. UTC | #6
Am 06.12.24 um 14:53 schrieb Richard Biener:
> On Fri, Dec 6, 2024 at 2:17 PM Georg-Johann Lay <avr@gjlay.de> wrote:
>>
>> Am 06.12.24 um 13:23 schrieb Sam James:
>>> Georg-Johann Lay <avr@gjlay.de> writes:
>>>
>>>> This patch disables CRC lookup tables which consume quite some RAM.
>>>
>>> Given that -foptimize-crc is new, it may be useful to CC the pass
>>> authors in case they have input.
>>
>> CCing Mariam Arutunian
>>
>>>> Ok for trunk?
>>>>
>>>> Johann
>>
>> The problem is not in the new CRC pass, but because AVR is a very
>> limited hardware.  Because AVR has no linear address space, .rodata
>> has to be placed in RAM, and most devices have just a few KiB of RAM,
>> or even less.
>>
>> An extension PR49857 to put such lookup tables in flash / program memory
>> has been rejected by the global maintainers as "too specific", ignoring
>> most of the constraints imposed by the requirement of using named
>> address-spaces.
>>
>> The point is that you cannot just put data in flash, one must also use
>> the correct instructions to access them, which is achieved by means
>> of avr specific named address-spaces like __flash.
>>
>> This would require a new target hook like proposed in PR49857, which
>> could put lookup-tables into a non-generic address-space provided:
>>
>> *) All respective data is put in the preferred address-space, and
>>
>> *) All accesses have to use the same address-space as of 1),
>> independent of what the rest of the code may look like.
>>
>> To date, only 3 lookup tables generated by GCC meet these criteria:
>>
>> 1) Lookup tables from tree-switch conversion.
>>
>> 2) Lookup tables from the current CRC work
>>
>> 3) vtables.  Though g++ does not, and probably never will, support
>> named address spaces.
> 
> 4) const global data

No. That won't work. Suppose:

foo.c:

const int ii = 1;
int inc (const int*);

int func (void)
{
     return inc (&ii);
}

bar.c:

int inc (const int *p)
{
     return 1 + *p;
}

This works when ii is in generic address-space.  But when you
put ii in a different AS (like e.g. __flash), then inc() will
read from the wrong AS.  You'd need a version of inc() that is

int inc_flash (const __flash int *p)
{
     return 1 + *p;
}

TL;DR You don't have control over *all* accesses, but in foo.c
the pointer to ii may escape.

Notice that in, say, tree-switch-conversion this is different
because that pass knows *all* accesses to the table, and it
could attach an AS from a target hook to all accesses.

Once that pass has finished, there's no more way to retroactively
optimize this, because addresses may escape to other modules
or to inline asm, or there may be copies or the tree var.
You's have to find all tree vars ssa_names.

> I think generally this might be a sound optimization - but as you said
> it requires generating correct accesses in the first place which also
> means once anything takes the address of the data all pointer uses
> have to know beforehand.  Thus application is likely quite limited for
> 4) at least.
> 
> Maybe it's possible to perform half of the task by the linker via
> relaxation?   Though I can easily guess it's somewhat difficult for AVR.
> 
> Richard.

No.  I don't see what the linker has to do with it.  Different ASes
even have different instructions to access them and support different
addressing modes and need different address registers.

Addressing modes are generated by the compiler and
TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P.  The linker cannot fix that.

Johann
Oleg Endo Dec. 6, 2024, 2:50 p.m. UTC | #7
On Fri, 2024-12-06 at 06:32 -0700, Jeff Law wrote:
> 
> On 12/6/24 5:23 AM, Sam James wrote:
> > Georg-Johann Lay <avr@gjlay.de> writes:
> > 
> > > This patch disables CRC lookup tables which consume quite some RAM.
> > 
> > Given that -foptimize-crc is new, it may be useful to CC the pass
> > authors in case they have input.
> I think this is trivially OK for the AVR.  The bigger question is should 
> we do something more general for -Os.
> 
> CRC generation through table lookups is going to take more data space. 
> You need a 256 byte table for each unique CRC (sizes & polynomial), and 
> the code to compute the index into the table can be (from a code size 
> standpoint) relatively expensive as well, particularly on the 
> micro-controllers if the crc is to be computed in a mode wider than a 
> word on the target.
> 
> So I would actually even support a more general "don't optimize CRCs by 
> default for -Os".
> 

I've been putting CRC tables for many years into .text / .rodata on various
MCU projects.  Never considered putting them into .data, since flash is
usually a lot larger than RAM.  What's the reasoning behind putting the
tables in .data?

Best regards,
Oleg Endo
Georg-Johann Lay Dec. 6, 2024, 3:51 p.m. UTC | #8
Am 06.12.24 um 15:50 schrieb Oleg Endo:
> On Fri, 2024-12-06 at 06:32 -0700, Jeff Law wrote:
>>
>> On 12/6/24 5:23 AM, Sam James wrote:
>>> Georg-Johann Lay <avr@gjlay.de> writes:
>>>
>>>> This patch disables CRC lookup tables which consume quite some RAM.
>>>
>>> Given that -foptimize-crc is new, it may be useful to CC the pass
>>> authors in case they have input.
>> I think this is trivially OK for the AVR.  The bigger question is should
>> we do something more general for -Os.
>>
>> CRC generation through table lookups is going to take more data space.
>> You need a 256 byte table for each unique CRC (sizes & polynomial), and
>> the code to compute the index into the table can be (from a code size
>> standpoint) relatively expensive as well, particularly on the
>> micro-controllers if the crc is to be computed in a mode wider than a
>> word on the target.
>>
>> So I would actually even support a more general "don't optimize CRCs by
>> default for -Os".
> 
> I've been putting CRC tables for many years into .text / .rodata on various
> MCU projects.  Never considered putting them into .data, since flash is
> usually a lot larger than RAM.  What's the reasoning behind putting the
> tables in .data?
> 
> Best regards,
> Oleg Endo

The CRC tables ARE put into .rodata, not into .data.

The correct question is: Why is avr putting .rodata into RAM?

Suppose the following C code:

char read_c (const char *p)
{
     return p[1];
}

Where p may point to .rodata, .data, .bss etc.
Now suppose .rodata is located in flash and .data and .bss
are located in RAM.  Then you'd need the following code to
access c[1]:

if (ADDR_SPACE (p) == GENERIC)
    Use LD, LDD with address R, R++, --R or R+const
    where R is one of the address registers X, Y or Z.
    Also may use direct addressing.
else if (ADDR_SPACE (p) == __flash)
    Use LPM with Z or Z++ addressing mode.  No other
    addressing mode or address reg or instruction is
    allowed.

This would imply that there is some means to tell apart
different address spaces by looking at p.  This is *not* the
case.  In particular, flash address 0x4 looks exactly the same
like RAM address 0x4.  Both are 16-bit address 0x0004, and there
is no way to tell them apart.  It would have been possible to
reserve one bit of the address for the address space and let
the linker set that bit depending on the address space where
a symbol is placed.  This means you'd have to tell apart the
addresses at run-time.

This is not the approach taken by the avr tools.
(This was the design decision back then, and it was a good
decision, even in retrospect IMO.)

Instead, they put .rodata in RAM, and when the user wants a
table in flash, she has to put it in section .progmem.data
by hand and use inline assembly to access them.

https://avrdudes.github.io/avr-libc/avr-libc-user-manual/group__avr__pgmspace.html

As an alternative, a named address space can be used when
available, but all such objects have to be put in that
address space by hand, and all objects have to be accessed
through pointers qualified as __flash.

There are more 16-bit address spaces like __flash1, __flash2
__flash3, __flash4, __flash5 for larger devices.  And there
is a 24-bit address space that can host references to any
address space, including generic.  But using them comes
with quite some overhead because the decision which AS to use
has to be taken at run-time.  This also means the addressing
mode is limited to the ones supported by /all/ address spaces,
since you don't know at compile time where the address points to.

Johann.
Oleg Endo Dec. 7, 2024, 1:03 a.m. UTC | #9
On Fri, 2024-12-06 at 16:51 +0100, Georg-Johann Lay wrote:
> 
> The CRC tables ARE put into .rodata, not into .data.
> 
> The correct question is: Why is avr putting .rodata into RAM?
> 
> Suppose the following C code:
> 
> char read_c (const char *p)
> {
>      return p[1];
> }
> 
> Where p may point to .rodata, .data, .bss etc.
> Now suppose .rodata is located in flash and .data and .bss
> are located in RAM.

Thanks for the detailed explanation!

I feel you.  I've been coding on mcs51 MCUs for a couple of years now (on
SDCC, meh)


> This would imply that there is some means to tell apart
> different address spaces by looking at p.  This is *not* the
> case.  In particular, flash address 0x4 looks exactly the same
> like RAM address 0x4.  Both are 16-bit address 0x0004, and there
> is no way to tell them apart.  
> 

I'm a little surprised that there's no way to get pointer/symbol meta-
information from within the backend code to identify data being accessed
that will end up in __code / __flash / .rodata / .text.

On SDCC/mcs51 things are simple enough so that we can at least try to lookup
a given symbol from the backend code and see if it's got a known address
space assigned to it (via (forward) declaration).  This can be used to
optimize accesses to e.g. SFRs even during later stages of compilation.

Best regards,
Oleg Endo
Georg-Johann Lay Dec. 7, 2024, 2:37 p.m. UTC | #10
Am 07.12.24 um 02:03 schrieb Oleg Endo:
> 
> On Fri, 2024-12-06 at 16:51 +0100, Georg-Johann Lay wrote:
>>
>> The CRC tables ARE put into .rodata, not into .data.
>>
>> The correct question is: Why is avr putting .rodata into RAM?
>>
>> Suppose the following C code:
>>
>> char read_c (const char *p)
>> {
>>       return p[1];
>> }
>>
>> Where p may point to .rodata, .data, .bss etc.
>> Now suppose .rodata is located in flash and .data and .bss
>> are located in RAM.
> 
> Thanks for the detailed explanation!
> 
> I feel you.  I've been coding on mcs51 MCUs for a couple of years now (on
> SDCC, meh)
> 
>> This would imply that there is some means to tell apart
>> different address spaces by looking at p.  This is *not* the
>> case.  In particular, flash address 0x4 looks exactly the same
>> like RAM address 0x4.  Both are 16-bit address 0x0004, and there
>> is no way to tell them apart.
> 
> I'm a little surprised that there's no way to get pointer/symbol meta-
> information from within the backend code to identify data being accessed
> that will end up in __code / __flash / .rodata / .text.

There is a way: Named address spaces as of ISO/IEC DTR 18037.

Depending on the address space, different instructions, different
address register(s), different addressing modes have to be used.

You cannot just put an object into a different address space
without adjusting *all* accesses: Their instructions, their
address registers, their addressing modes, you name it.
In some cases it's even required so set some SFR to the MSByte
of a 24-byte address since GPRs can only hold 16-bit addresses.

So the question is. Who is specifying which address space to use?
Two ways:

1) The user knows that some object may live in a specific AS.
   All accesses to the object must use a pointer qualified for
   that AS (or otherwise, the code won't work).

2) The compiler assigns an AS (with the help of a target hook).
    All decls / rtxes thet accesses the object must have the
    correct TYPE_ADDR_SPACE / MEM_ADDR_SPACE qualification,
    or else wrong code is generated.

Notice that a "const" qualifier is not enough to conclude that
data is located in .rodata.  For example, in

char read_c (const char *pc)
{
     return pc[1];
}

it is perfectly fine to pass a pointer to .data / volatile memory.
Hence the code must work for addresses in .data as well as in .rodata.

> On SDCC/mcs51 things are simple enough so that we can at least try to lookup
> a given symbol from the backend code and see if it's got a known address
> space assigned to it (via (forward) declaration).  This can be used to
> optimize accesses to e.g. SFRs even during later stages of compilation.

AVR does that, too.  SFRs are located in a specific address range
(of the generic address space).  There are also attributes to
specify that specific instructions /may/ be used to access an SFR,
though that's rather an optimization.  SFRs on AVR are such that
they don't require new ASes.

Johann

> Best regards,
> Oleg Endo
diff mbox series

Patch

diff --git a/gcc/common/config/avr/avr-common.cc 
b/gcc/common/config/avr/avr-common.cc
index 9059e7d2b48..7a05e19a8be 100644
--- a/gcc/common/config/avr/avr-common.cc
+++ b/gcc/common/config/avr/avr-common.cc
@@ -32,6 +32,8 @@  static const struct default_options 
avr_option_optimization_table[] =
      // The only effect of -fcaller-saves might be that it triggers
      // a frame without need when it tries to be smart around calls.
      { OPT_LEVELS_ALL, OPT_fcaller_saves, NULL, 0 },
+    // Avoid large lookup tables in RAM from -foptimize-crc.
+    { OPT_LEVELS_ALL, OPT_foptimize_crc, NULL, 0 },
      { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_mgas_isr_prologues, NULL, 1 },
      { OPT_LEVELS_1_PLUS, OPT_mmain_is_OS_task, NULL, 1 },