Message ID | 7fc6ebbf-4a3d-4e37-a8bc-975d5f9c5e23@gjlay.de |
---|---|
State | New |
Headers | show |
Series | [avr] Disable CRC lookup tables | expand |
Georg-Johann Lay <avr@gjlay.de> writes: > This patch disables CRC lookup tables which consume quite some RAM. Given that -foptimize-crc is new, it may be useful to CC the pass authors in case they have input. > > Ok for trunk? > > Johann
Am 06.12.24 um 13:23 schrieb Sam James: > Georg-Johann Lay <avr@gjlay.de> writes: > >> This patch disables CRC lookup tables which consume quite some RAM. > > Given that -foptimize-crc is new, it may be useful to CC the pass > authors in case they have input. CCing Mariam Arutunian >> Ok for trunk? >> >> Johann The problem is not in the new CRC pass, but because AVR is a very limited hardware. Because AVR has no linear address space, .rodata has to be placed in RAM, and most devices have just a few KiB of RAM, or even less. An extension PR49857 to put such lookup tables in flash / program memory has been rejected by the global maintainers as "too specific", ignoring most of the constraints imposed by the requirement of using named address-spaces. The point is that you cannot just put data in flash, one must also use the correct instructions to access them, which is achieved by means of avr specific named address-spaces like __flash. This would require a new target hook like proposed in PR49857, which could put lookup-tables into a non-generic address-space provided: *) All respective data is put in the preferred address-space, and *) All accesses have to use the same address-space as of 1), independent of what the rest of the code may look like. To date, only 3 lookup tables generated by GCC meet these criteria: 1) Lookup tables from tree-switch conversion. 2) Lookup tables from the current CRC work 3) vtables. Though g++ does not, and probably never will, support named address spaces. Johann
On 12/6/24 5:23 AM, Sam James wrote: > Georg-Johann Lay <avr@gjlay.de> writes: > >> This patch disables CRC lookup tables which consume quite some RAM. > > Given that -foptimize-crc is new, it may be useful to CC the pass > authors in case they have input. I think this is trivially OK for the AVR. The bigger question is should we do something more general for -Os. CRC generation through table lookups is going to take more data space. You need a 256 byte table for each unique CRC (sizes & polynomial), and the code to compute the index into the table can be (from a code size standpoint) relatively expensive as well, particularly on the micro-controllers if the crc is to be computed in a mode wider than a word on the target. So I would actually even support a more general "don't optimize CRCs by default for -Os". Jeff
On Fri, Dec 6, 2024 at 2:17 PM Georg-Johann Lay <avr@gjlay.de> wrote: > > Am 06.12.24 um 13:23 schrieb Sam James: > > Georg-Johann Lay <avr@gjlay.de> writes: > > > >> This patch disables CRC lookup tables which consume quite some RAM. > > > > Given that -foptimize-crc is new, it may be useful to CC the pass > > authors in case they have input. > > CCing Mariam Arutunian > > >> Ok for trunk? > >> > >> Johann > > The problem is not in the new CRC pass, but because AVR is a very > limited hardware. Because AVR has no linear address space, .rodata > has to be placed in RAM, and most devices have just a few KiB of RAM, > or even less. > > An extension PR49857 to put such lookup tables in flash / program memory > has been rejected by the global maintainers as "too specific", ignoring > most of the constraints imposed by the requirement of using named > address-spaces. > > The point is that you cannot just put data in flash, one must also use > the correct instructions to access them, which is achieved by means > of avr specific named address-spaces like __flash. > > This would require a new target hook like proposed in PR49857, which > could put lookup-tables into a non-generic address-space provided: > > *) All respective data is put in the preferred address-space, and > > *) All accesses have to use the same address-space as of 1), > independent of what the rest of the code may look like. > > To date, only 3 lookup tables generated by GCC meet these criteria: > > 1) Lookup tables from tree-switch conversion. > > 2) Lookup tables from the current CRC work > > 3) vtables. Though g++ does not, and probably never will, support > named address spaces. 4) const global data I think generally this might be a sound optimization - but as you said it requires generating correct accesses in the first place which also means once anything takes the address of the data all pointer uses have to know beforehand. Thus application is likely quite limited for 4) at least. Maybe it's possible to perform half of the task by the linker via relaxation? Though I can easily guess it's somewhat difficult for AVR. Richard. > Johann > > > >
Georg-Johann Lay <avr@gjlay.de> writes: > Am 06.12.24 um 13:23 schrieb Sam James: >> Georg-Johann Lay <avr@gjlay.de> writes: >> >>> This patch disables CRC lookup tables which consume quite some RAM. >> Given that -foptimize-crc is new, it may be useful to CC the pass >> authors in case they have input. > > CCing Mariam Arutunian > >>> Ok for trunk? >>> >>> Johann > > The problem is not in the new CRC pass, but because AVR is a very > limited hardware. Because AVR has no linear address space, .rodata > has to be placed in RAM, and most devices have just a few KiB of RAM, > or even less. (To be clear, I do accept that -- it's more that it's: a) useful as a heads-up to people; b) might affect their future testing; c) might mean we end up talking about whether the patch has issues on other targets). > > An extension PR49857 to put such lookup tables in flash / program memory > has been rejected by the global maintainers as "too specific", ignoring > most of the constraints imposed by the requirement of using named > address-spaces. > > The point is that you cannot just put data in flash, one must also use > the correct instructions to access them, which is achieved by means > of avr specific named address-spaces like __flash. > > This would require a new target hook like proposed in PR49857, which > could put lookup-tables into a non-generic address-space provided: > > *) All respective data is put in the preferred address-space, and > > *) All accesses have to use the same address-space as of 1), > independent of what the rest of the code may look like. > > To date, only 3 lookup tables generated by GCC meet these criteria: > > 1) Lookup tables from tree-switch conversion. > > 2) Lookup tables from the current CRC work > > 3) vtables. Though g++ does not, and probably never will, support > named address spaces. Thanks for the extra context! > > Johann sam
Am 06.12.24 um 14:53 schrieb Richard Biener: > On Fri, Dec 6, 2024 at 2:17 PM Georg-Johann Lay <avr@gjlay.de> wrote: >> >> Am 06.12.24 um 13:23 schrieb Sam James: >>> Georg-Johann Lay <avr@gjlay.de> writes: >>> >>>> This patch disables CRC lookup tables which consume quite some RAM. >>> >>> Given that -foptimize-crc is new, it may be useful to CC the pass >>> authors in case they have input. >> >> CCing Mariam Arutunian >> >>>> Ok for trunk? >>>> >>>> Johann >> >> The problem is not in the new CRC pass, but because AVR is a very >> limited hardware. Because AVR has no linear address space, .rodata >> has to be placed in RAM, and most devices have just a few KiB of RAM, >> or even less. >> >> An extension PR49857 to put such lookup tables in flash / program memory >> has been rejected by the global maintainers as "too specific", ignoring >> most of the constraints imposed by the requirement of using named >> address-spaces. >> >> The point is that you cannot just put data in flash, one must also use >> the correct instructions to access them, which is achieved by means >> of avr specific named address-spaces like __flash. >> >> This would require a new target hook like proposed in PR49857, which >> could put lookup-tables into a non-generic address-space provided: >> >> *) All respective data is put in the preferred address-space, and >> >> *) All accesses have to use the same address-space as of 1), >> independent of what the rest of the code may look like. >> >> To date, only 3 lookup tables generated by GCC meet these criteria: >> >> 1) Lookup tables from tree-switch conversion. >> >> 2) Lookup tables from the current CRC work >> >> 3) vtables. Though g++ does not, and probably never will, support >> named address spaces. > > 4) const global data No. That won't work. Suppose: foo.c: const int ii = 1; int inc (const int*); int func (void) { return inc (&ii); } bar.c: int inc (const int *p) { return 1 + *p; } This works when ii is in generic address-space. But when you put ii in a different AS (like e.g. __flash), then inc() will read from the wrong AS. You'd need a version of inc() that is int inc_flash (const __flash int *p) { return 1 + *p; } TL;DR You don't have control over *all* accesses, but in foo.c the pointer to ii may escape. Notice that in, say, tree-switch-conversion this is different because that pass knows *all* accesses to the table, and it could attach an AS from a target hook to all accesses. Once that pass has finished, there's no more way to retroactively optimize this, because addresses may escape to other modules or to inline asm, or there may be copies or the tree var. You's have to find all tree vars ssa_names. > I think generally this might be a sound optimization - but as you said > it requires generating correct accesses in the first place which also > means once anything takes the address of the data all pointer uses > have to know beforehand. Thus application is likely quite limited for > 4) at least. > > Maybe it's possible to perform half of the task by the linker via > relaxation? Though I can easily guess it's somewhat difficult for AVR. > > Richard. No. I don't see what the linker has to do with it. Different ASes even have different instructions to access them and support different addressing modes and need different address registers. Addressing modes are generated by the compiler and TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P. The linker cannot fix that. Johann
On Fri, 2024-12-06 at 06:32 -0700, Jeff Law wrote: > > On 12/6/24 5:23 AM, Sam James wrote: > > Georg-Johann Lay <avr@gjlay.de> writes: > > > > > This patch disables CRC lookup tables which consume quite some RAM. > > > > Given that -foptimize-crc is new, it may be useful to CC the pass > > authors in case they have input. > I think this is trivially OK for the AVR. The bigger question is should > we do something more general for -Os. > > CRC generation through table lookups is going to take more data space. > You need a 256 byte table for each unique CRC (sizes & polynomial), and > the code to compute the index into the table can be (from a code size > standpoint) relatively expensive as well, particularly on the > micro-controllers if the crc is to be computed in a mode wider than a > word on the target. > > So I would actually even support a more general "don't optimize CRCs by > default for -Os". > I've been putting CRC tables for many years into .text / .rodata on various MCU projects. Never considered putting them into .data, since flash is usually a lot larger than RAM. What's the reasoning behind putting the tables in .data? Best regards, Oleg Endo
Am 06.12.24 um 15:50 schrieb Oleg Endo: > On Fri, 2024-12-06 at 06:32 -0700, Jeff Law wrote: >> >> On 12/6/24 5:23 AM, Sam James wrote: >>> Georg-Johann Lay <avr@gjlay.de> writes: >>> >>>> This patch disables CRC lookup tables which consume quite some RAM. >>> >>> Given that -foptimize-crc is new, it may be useful to CC the pass >>> authors in case they have input. >> I think this is trivially OK for the AVR. The bigger question is should >> we do something more general for -Os. >> >> CRC generation through table lookups is going to take more data space. >> You need a 256 byte table for each unique CRC (sizes & polynomial), and >> the code to compute the index into the table can be (from a code size >> standpoint) relatively expensive as well, particularly on the >> micro-controllers if the crc is to be computed in a mode wider than a >> word on the target. >> >> So I would actually even support a more general "don't optimize CRCs by >> default for -Os". > > I've been putting CRC tables for many years into .text / .rodata on various > MCU projects. Never considered putting them into .data, since flash is > usually a lot larger than RAM. What's the reasoning behind putting the > tables in .data? > > Best regards, > Oleg Endo The CRC tables ARE put into .rodata, not into .data. The correct question is: Why is avr putting .rodata into RAM? Suppose the following C code: char read_c (const char *p) { return p[1]; } Where p may point to .rodata, .data, .bss etc. Now suppose .rodata is located in flash and .data and .bss are located in RAM. Then you'd need the following code to access c[1]: if (ADDR_SPACE (p) == GENERIC) Use LD, LDD with address R, R++, --R or R+const where R is one of the address registers X, Y or Z. Also may use direct addressing. else if (ADDR_SPACE (p) == __flash) Use LPM with Z or Z++ addressing mode. No other addressing mode or address reg or instruction is allowed. This would imply that there is some means to tell apart different address spaces by looking at p. This is *not* the case. In particular, flash address 0x4 looks exactly the same like RAM address 0x4. Both are 16-bit address 0x0004, and there is no way to tell them apart. It would have been possible to reserve one bit of the address for the address space and let the linker set that bit depending on the address space where a symbol is placed. This means you'd have to tell apart the addresses at run-time. This is not the approach taken by the avr tools. (This was the design decision back then, and it was a good decision, even in retrospect IMO.) Instead, they put .rodata in RAM, and when the user wants a table in flash, she has to put it in section .progmem.data by hand and use inline assembly to access them. https://avrdudes.github.io/avr-libc/avr-libc-user-manual/group__avr__pgmspace.html As an alternative, a named address space can be used when available, but all such objects have to be put in that address space by hand, and all objects have to be accessed through pointers qualified as __flash. There are more 16-bit address spaces like __flash1, __flash2 __flash3, __flash4, __flash5 for larger devices. And there is a 24-bit address space that can host references to any address space, including generic. But using them comes with quite some overhead because the decision which AS to use has to be taken at run-time. This also means the addressing mode is limited to the ones supported by /all/ address spaces, since you don't know at compile time where the address points to. Johann.
On Fri, 2024-12-06 at 16:51 +0100, Georg-Johann Lay wrote: > > The CRC tables ARE put into .rodata, not into .data. > > The correct question is: Why is avr putting .rodata into RAM? > > Suppose the following C code: > > char read_c (const char *p) > { > return p[1]; > } > > Where p may point to .rodata, .data, .bss etc. > Now suppose .rodata is located in flash and .data and .bss > are located in RAM. Thanks for the detailed explanation! I feel you. I've been coding on mcs51 MCUs for a couple of years now (on SDCC, meh) > This would imply that there is some means to tell apart > different address spaces by looking at p. This is *not* the > case. In particular, flash address 0x4 looks exactly the same > like RAM address 0x4. Both are 16-bit address 0x0004, and there > is no way to tell them apart. > I'm a little surprised that there's no way to get pointer/symbol meta- information from within the backend code to identify data being accessed that will end up in __code / __flash / .rodata / .text. On SDCC/mcs51 things are simple enough so that we can at least try to lookup a given symbol from the backend code and see if it's got a known address space assigned to it (via (forward) declaration). This can be used to optimize accesses to e.g. SFRs even during later stages of compilation. Best regards, Oleg Endo
Am 07.12.24 um 02:03 schrieb Oleg Endo: > > On Fri, 2024-12-06 at 16:51 +0100, Georg-Johann Lay wrote: >> >> The CRC tables ARE put into .rodata, not into .data. >> >> The correct question is: Why is avr putting .rodata into RAM? >> >> Suppose the following C code: >> >> char read_c (const char *p) >> { >> return p[1]; >> } >> >> Where p may point to .rodata, .data, .bss etc. >> Now suppose .rodata is located in flash and .data and .bss >> are located in RAM. > > Thanks for the detailed explanation! > > I feel you. I've been coding on mcs51 MCUs for a couple of years now (on > SDCC, meh) > >> This would imply that there is some means to tell apart >> different address spaces by looking at p. This is *not* the >> case. In particular, flash address 0x4 looks exactly the same >> like RAM address 0x4. Both are 16-bit address 0x0004, and there >> is no way to tell them apart. > > I'm a little surprised that there's no way to get pointer/symbol meta- > information from within the backend code to identify data being accessed > that will end up in __code / __flash / .rodata / .text. There is a way: Named address spaces as of ISO/IEC DTR 18037. Depending on the address space, different instructions, different address register(s), different addressing modes have to be used. You cannot just put an object into a different address space without adjusting *all* accesses: Their instructions, their address registers, their addressing modes, you name it. In some cases it's even required so set some SFR to the MSByte of a 24-byte address since GPRs can only hold 16-bit addresses. So the question is. Who is specifying which address space to use? Two ways: 1) The user knows that some object may live in a specific AS. All accesses to the object must use a pointer qualified for that AS (or otherwise, the code won't work). 2) The compiler assigns an AS (with the help of a target hook). All decls / rtxes thet accesses the object must have the correct TYPE_ADDR_SPACE / MEM_ADDR_SPACE qualification, or else wrong code is generated. Notice that a "const" qualifier is not enough to conclude that data is located in .rodata. For example, in char read_c (const char *pc) { return pc[1]; } it is perfectly fine to pass a pointer to .data / volatile memory. Hence the code must work for addresses in .data as well as in .rodata. > On SDCC/mcs51 things are simple enough so that we can at least try to lookup > a given symbol from the backend code and see if it's got a known address > space assigned to it (via (forward) declaration). This can be used to > optimize accesses to e.g. SFRs even during later stages of compilation. AVR does that, too. SFRs are located in a specific address range (of the generic address space). There are also attributes to specify that specific instructions /may/ be used to access an SFR, though that's rather an optimization. SFRs on AVR are such that they don't require new ASes. Johann > Best regards, > Oleg Endo
diff --git a/gcc/common/config/avr/avr-common.cc b/gcc/common/config/avr/avr-common.cc index 9059e7d2b48..7a05e19a8be 100644 --- a/gcc/common/config/avr/avr-common.cc +++ b/gcc/common/config/avr/avr-common.cc @@ -32,6 +32,8 @@ static const struct default_options avr_option_optimization_table[] = // The only effect of -fcaller-saves might be that it triggers // a frame without need when it tries to be smart around calls. { OPT_LEVELS_ALL, OPT_fcaller_saves, NULL, 0 }, + // Avoid large lookup tables in RAM from -foptimize-crc. + { OPT_LEVELS_ALL, OPT_foptimize_crc, NULL, 0 }, { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_mgas_isr_prologues, NULL, 1 }, { OPT_LEVELS_1_PLUS, OPT_mmain_is_OS_task, NULL, 1 },