Message ID | 20230328180139.74395-1-xry111@xry111.site |
---|---|
State | New |
Headers | show |
Series | LoongArch: Improve GAR store for va_list | expand |
Ping. Or maybe I've lost some replies here because my mail server crashed several days ago :). On Wed, 2023-03-29 at 02:01 +0800, Xi Ruoyao wrote: > LoongArch backend used to save all GARs for a function with variable > arguments. But sometimes a function only accepts variable arguments > for > a purpose like C++ function overloading. For example, POSIX defines > open() as: > > int open(const char *path, int oflag, ...); > > But only two forms are actually used: > > int open(const char *pathname, int flags); > int open(const char *pathname, int flags, mode_t mode); > > So it's obviously a waste to save all 8 GARs in open(). We can use > the > cfun->va_list_gpr_size field set by the stdarg pass to only save the > GARs necessary to be saved. > > If the va_list escapes (for example, in fprintf() we pass it to > vfprintf()), stdarg would set cfun->va_list_gpr_size to 255 so we > don't need a special case. > > With this patch, only one GAR ($a2/$r6) is saved in open(). Ideally > even this stack store should be omitted too, but doing so is not > trivial > and AFAIK there are no compilers (for any target) performing the > "ideal" > optimization here, see https://godbolt.org/z/n1YqWq9c9. > > Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk > (GCC 14 or now)? > > gcc/ChangeLog: > > * config/loongarch/loongarch.cc > (loongarch_setup_incoming_varargs): Don't save more GARs than > cfun->va_list_gpr_size / UNITS_PER_WORD. > > gcc/testsuite/ChangeLog: > > * gcc.target/loongarch/va_arg.c: New test. > --- > gcc/config/loongarch/loongarch.cc | 4 +++- > gcc/testsuite/gcc.target/loongarch/va_arg.c | 24 > +++++++++++++++++++++ > 2 files changed, 27 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/loongarch/va_arg.c > > diff --git a/gcc/config/loongarch/loongarch.cc > b/gcc/config/loongarch/loongarch.cc > index 6927bdc7fe5..0ecb91ca997 100644 > --- a/gcc/config/loongarch/loongarch.cc > +++ b/gcc/config/loongarch/loongarch.cc > @@ -764,7 +764,9 @@ loongarch_setup_incoming_varargs > (cumulative_args_t cum, > loongarch_function_arg_advance (pack_cumulative_args > (&local_cum), arg); > > /* Found out how many registers we need to save. */ > - gp_saved = MAX_ARGS_IN_REGISTERS - local_cum.num_gprs; > + gp_saved = cfun->va_list_gpr_size / UNITS_PER_WORD; > + if (gp_saved > (int) (MAX_ARGS_IN_REGISTERS - local_cum.num_gprs)) > + gp_saved = MAX_ARGS_IN_REGISTERS - local_cum.num_gprs; > > if (!no_rtl && gp_saved > 0) > { > diff --git a/gcc/testsuite/gcc.target/loongarch/va_arg.c > b/gcc/testsuite/gcc.target/loongarch/va_arg.c > new file mode 100644 > index 00000000000..980c96d0e3d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/loongarch/va_arg.c > @@ -0,0 +1,24 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +/* Technically we shouldn't save any register for this function: it > should be > + compiled as if it accepts 3 named arguments. But AFAIK no > compilers can > + achieve this "perfect" optimization now, so just ensure we are > using the > + knowledge provided by stdarg pass and we won't save GARs > impossible to be > + accessed with __builtin_va_arg () when the va_list does not > escape. */ > + > +/* { dg-final { scan-assembler-not "st.*r7" } } */ > + > +int > +test (int a0, ...) > +{ > + void *arg; > + int a1, a2; > + > + __builtin_va_start (arg, a0); > + a1 = __builtin_va_arg (arg, int); > + a2 = __builtin_va_arg (arg, int); > + __builtin_va_end (arg); > + > + return a0 + a1 + a2; > +}
Sorry, it's my question. I still have some questions that I haven't understood, so I haven't replied to the email yet.:-( 在 2023/4/10 下午5:04, Xi Ruoyao 写道: > Ping. Or maybe I've lost some replies here because my mail server > crashed several days ago :). > > On Wed, 2023-03-29 at 02:01 +0800, Xi Ruoyao wrote: >> LoongArch backend used to save all GARs for a function with variable >> arguments. But sometimes a function only accepts variable arguments >> for >> a purpose like C++ function overloading. For example, POSIX defines >> open() as: >> >> int open(const char *path, int oflag, ...); >> >> But only two forms are actually used: >> >> int open(const char *pathname, int flags); >> int open(const char *pathname, int flags, mode_t mode); >> >> So it's obviously a waste to save all 8 GARs in open(). We can use >> the >> cfun->va_list_gpr_size field set by the stdarg pass to only save the >> GARs necessary to be saved. >> >> If the va_list escapes (for example, in fprintf() we pass it to >> vfprintf()), stdarg would set cfun->va_list_gpr_size to 255 so we >> don't need a special case. >> >> With this patch, only one GAR ($a2/$r6) is saved in open(). Ideally >> even this stack store should be omitted too, but doing so is not >> trivial >> and AFAIK there are no compilers (for any target) performing the >> "ideal" >> optimization here, see https://godbolt.org/z/n1YqWq9c9. >> >> Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk >> (GCC 14 or now)? >> >> gcc/ChangeLog: >> >> * config/loongarch/loongarch.cc >> (loongarch_setup_incoming_varargs): Don't save more GARs than >> cfun->va_list_gpr_size / UNITS_PER_WORD. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/loongarch/va_arg.c: New test. >> --- >> gcc/config/loongarch/loongarch.cc | 4 +++- >> gcc/testsuite/gcc.target/loongarch/va_arg.c | 24 >> +++++++++++++++++++++ >> 2 files changed, 27 insertions(+), 1 deletion(-) >> create mode 100644 gcc/testsuite/gcc.target/loongarch/va_arg.c >> >> diff --git a/gcc/config/loongarch/loongarch.cc >> b/gcc/config/loongarch/loongarch.cc >> index 6927bdc7fe5..0ecb91ca997 100644 >> --- a/gcc/config/loongarch/loongarch.cc >> +++ b/gcc/config/loongarch/loongarch.cc >> @@ -764,7 +764,9 @@ loongarch_setup_incoming_varargs >> (cumulative_args_t cum, >> loongarch_function_arg_advance (pack_cumulative_args >> (&local_cum), arg); >> >> /* Found out how many registers we need to save. */ >> - gp_saved = MAX_ARGS_IN_REGISTERS - local_cum.num_gprs; >> + gp_saved = cfun->va_list_gpr_size / UNITS_PER_WORD; >> + if (gp_saved > (int) (MAX_ARGS_IN_REGISTERS - local_cum.num_gprs)) >> + gp_saved = MAX_ARGS_IN_REGISTERS - local_cum.num_gprs; >> >> if (!no_rtl && gp_saved > 0) >> { >> diff --git a/gcc/testsuite/gcc.target/loongarch/va_arg.c >> b/gcc/testsuite/gcc.target/loongarch/va_arg.c >> new file mode 100644 >> index 00000000000..980c96d0e3d >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/loongarch/va_arg.c >> @@ -0,0 +1,24 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +/* Technically we shouldn't save any register for this function: it >> should be >> + compiled as if it accepts 3 named arguments. But AFAIK no >> compilers can >> + achieve this "perfect" optimization now, so just ensure we are >> using the >> + knowledge provided by stdarg pass and we won't save GARs >> impossible to be >> + accessed with __builtin_va_arg () when the va_list does not >> escape. */ >> + >> +/* { dg-final { scan-assembler-not "st.*r7" } } */ >> + >> +int >> +test (int a0, ...) >> +{ >> + void *arg; >> + int a1, a2; >> + >> + __builtin_va_start (arg, a0); >> + a1 = __builtin_va_arg (arg, int); >> + a2 = __builtin_va_arg (arg, int); >> + __builtin_va_end (arg); >> + >> + return a0 + a1 + a2; >> +}
On Mon, 2023-04-10 at 17:45 +0800, Lulu Cheng wrote: > Sorry, it's my question. I still have some questions that I haven't > understood, so I haven't replied to the email yet.:-( I've verified the value of cfun->va_list_gpr_size with -fdump-tree- stdarg and various testcases (including extracting aggregates and floating-point values in the va list) and the result seems correct. And gcc/testsuite/gcc.c-torture/execute/va-arg-*.c should provide a good enough test coverage. Is there still something seemly problematic? > > 在 2023/4/10 下午5:04, Xi Ruoyao 写道: > > Ping. Or maybe I've lost some replies here because my mail server > > crashed several days ago :). > > > > On Wed, 2023-03-29 at 02:01 +0800, Xi Ruoyao wrote: > > > LoongArch backend used to save all GARs for a function with > > > variable > > > arguments. But sometimes a function only accepts variable > > > arguments > > > for > > > a purpose like C++ function overloading. For example, POSIX > > > defines > > > open() as: > > > > > > int open(const char *path, int oflag, ...); > > > > > > But only two forms are actually used: > > > > > > int open(const char *pathname, int flags); > > > int open(const char *pathname, int flags, mode_t mode); > > > > > > So it's obviously a waste to save all 8 GARs in open(). We can > > > use > > > the > > > cfun->va_list_gpr_size field set by the stdarg pass to only save > > > the > > > GARs necessary to be saved. > > > > > > If the va_list escapes (for example, in fprintf() we pass it to > > > vfprintf()), stdarg would set cfun->va_list_gpr_size to 255 so we > > > don't need a special case. > > > > > > With this patch, only one GAR ($a2/$r6) is saved in open(). > > > Ideally > > > even this stack store should be omitted too, but doing so is not > > > trivial > > > and AFAIK there are no compilers (for any target) performing the > > > "ideal" > > > optimization here, see https://godbolt.org/z/n1YqWq9c9. > > > > > > Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk > > > (GCC 14 or now)? > > > > > > gcc/ChangeLog: > > > > > > * config/loongarch/loongarch.cc > > > (loongarch_setup_incoming_varargs): Don't save more GARs > > > than > > > cfun->va_list_gpr_size / UNITS_PER_WORD. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.target/loongarch/va_arg.c: New test. > > > --- > > > gcc/config/loongarch/loongarch.cc | 4 +++- > > > gcc/testsuite/gcc.target/loongarch/va_arg.c | 24 > > > +++++++++++++++++++++ > > > 2 files changed, 27 insertions(+), 1 deletion(-) > > > create mode 100644 gcc/testsuite/gcc.target/loongarch/va_arg.c > > > > > > diff --git a/gcc/config/loongarch/loongarch.cc > > > b/gcc/config/loongarch/loongarch.cc > > > index 6927bdc7fe5..0ecb91ca997 100644 > > > --- a/gcc/config/loongarch/loongarch.cc > > > +++ b/gcc/config/loongarch/loongarch.cc > > > @@ -764,7 +764,9 @@ loongarch_setup_incoming_varargs > > > (cumulative_args_t cum, > > > loongarch_function_arg_advance (pack_cumulative_args > > > (&local_cum), arg); > > > > > > /* Found out how many registers we need to save. */ > > > - gp_saved = MAX_ARGS_IN_REGISTERS - local_cum.num_gprs; > > > + gp_saved = cfun->va_list_gpr_size / UNITS_PER_WORD; > > > + if (gp_saved > (int) (MAX_ARGS_IN_REGISTERS - > > > local_cum.num_gprs)) > > > + gp_saved = MAX_ARGS_IN_REGISTERS - local_cum.num_gprs; > > > > > > if (!no_rtl && gp_saved > 0) > > > { > > > diff --git a/gcc/testsuite/gcc.target/loongarch/va_arg.c > > > b/gcc/testsuite/gcc.target/loongarch/va_arg.c > > > new file mode 100644 > > > index 00000000000..980c96d0e3d > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/loongarch/va_arg.c > > > @@ -0,0 +1,24 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O2" } */ > > > + > > > +/* Technically we shouldn't save any register for this function: > > > it > > > should be > > > + compiled as if it accepts 3 named arguments. But AFAIK no > > > compilers can > > > + achieve this "perfect" optimization now, so just ensure we are > > > using the > > > + knowledge provided by stdarg pass and we won't save GARs > > > impossible to be > > > + accessed with __builtin_va_arg () when the va_list does not > > > escape. */ > > > + > > > +/* { dg-final { scan-assembler-not "st.*r7" } } */ > > > + > > > +int > > > +test (int a0, ...) > > > +{ > > > + void *arg; > > > + int a1, a2; > > > + > > > + __builtin_va_start (arg, a0); > > > + a1 = __builtin_va_arg (arg, int); > > > + a2 = __builtin_va_arg (arg, int); > > > + __builtin_va_end (arg); > > > + > > > + return a0 + a1 + a2; > > > +} >
在 2023/4/18 下午5:27, Xi Ruoyao 写道: > On Mon, 2023-04-10 at 17:45 +0800, Lulu Cheng wrote: >> Sorry, it's my question. I still have some questions that I haven't >> understood, so I haven't replied to the email yet.:-( > I've verified the value of cfun->va_list_gpr_size with -fdump-tree- > stdarg and various testcases (including extracting aggregates and > floating-point values in the va list) and the result seems correct. And > gcc/testsuite/gcc.c-torture/execute/va-arg-*.c should provide a good > enough test coverage. > > Is there still something seemly problematic? I think there is no problem with the code modification, but I found that the $r12 register is stored whether or not this patch is added. I don't understand why.:-( > >> 在 2023/4/10 下午5:04, Xi Ruoyao 写道: >>> Ping. Or maybe I've lost some replies here because my mail server >>> crashed several days ago :). >>> >>> On Wed, 2023-03-29 at 02:01 +0800, Xi Ruoyao wrote: >>>> LoongArch backend used to save all GARs for a function with >>>> variable >>>> arguments. But sometimes a function only accepts variable >>>> arguments >>>> for >>>> a purpose like C++ function overloading. For example, POSIX >>>> defines >>>> open() as: >>>> >>>> int open(const char *path, int oflag, ...); >>>> >>>> But only two forms are actually used: >>>> >>>> int open(const char *pathname, int flags); >>>> int open(const char *pathname, int flags, mode_t mode); >>>> >>>> So it's obviously a waste to save all 8 GARs in open(). We can >>>> use >>>> the >>>> cfun->va_list_gpr_size field set by the stdarg pass to only save >>>> the >>>> GARs necessary to be saved. >>>> >>>> If the va_list escapes (for example, in fprintf() we pass it to >>>> vfprintf()), stdarg would set cfun->va_list_gpr_size to 255 so we >>>> don't need a special case. >>>> >>>> With this patch, only one GAR ($a2/$r6) is saved in open(). >>>> Ideally >>>> even this stack store should be omitted too, but doing so is not >>>> trivial >>>> and AFAIK there are no compilers (for any target) performing the >>>> "ideal" >>>> optimization here, see https://godbolt.org/z/n1YqWq9c9. >>>> >>>> Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk >>>> (GCC 14 or now)? >>>> >>>> gcc/ChangeLog: >>>> >>>> * config/loongarch/loongarch.cc >>>> (loongarch_setup_incoming_varargs): Don't save more GARs >>>> than >>>> cfun->va_list_gpr_size / UNITS_PER_WORD. >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>> * gcc.target/loongarch/va_arg.c: New test. >>>> --- >>>> gcc/config/loongarch/loongarch.cc | 4 +++- >>>> gcc/testsuite/gcc.target/loongarch/va_arg.c | 24 >>>> +++++++++++++++++++++ >>>> 2 files changed, 27 insertions(+), 1 deletion(-) >>>> create mode 100644 gcc/testsuite/gcc.target/loongarch/va_arg.c >>>> >>>> diff --git a/gcc/config/loongarch/loongarch.cc >>>> b/gcc/config/loongarch/loongarch.cc >>>> index 6927bdc7fe5..0ecb91ca997 100644 >>>> --- a/gcc/config/loongarch/loongarch.cc >>>> +++ b/gcc/config/loongarch/loongarch.cc >>>> @@ -764,7 +764,9 @@ loongarch_setup_incoming_varargs >>>> (cumulative_args_t cum, >>>> loongarch_function_arg_advance (pack_cumulative_args >>>> (&local_cum), arg); >>>> >>>> /* Found out how many registers we need to save. */ >>>> - gp_saved = MAX_ARGS_IN_REGISTERS - local_cum.num_gprs; >>>> + gp_saved = cfun->va_list_gpr_size / UNITS_PER_WORD; >>>> + if (gp_saved > (int) (MAX_ARGS_IN_REGISTERS - >>>> local_cum.num_gprs)) >>>> + gp_saved = MAX_ARGS_IN_REGISTERS - local_cum.num_gprs; >>>> >>>> if (!no_rtl && gp_saved > 0) >>>> { >>>> diff --git a/gcc/testsuite/gcc.target/loongarch/va_arg.c >>>> b/gcc/testsuite/gcc.target/loongarch/va_arg.c >>>> new file mode 100644 >>>> index 00000000000..980c96d0e3d >>>> --- /dev/null >>>> +++ b/gcc/testsuite/gcc.target/loongarch/va_arg.c >>>> @@ -0,0 +1,24 @@ >>>> +/* { dg-do compile } */ >>>> +/* { dg-options "-O2" } */ >>>> + >>>> +/* Technically we shouldn't save any register for this function: >>>> it >>>> should be >>>> + compiled as if it accepts 3 named arguments. But AFAIK no >>>> compilers can >>>> + achieve this "perfect" optimization now, so just ensure we are >>>> using the >>>> + knowledge provided by stdarg pass and we won't save GARs >>>> impossible to be >>>> + accessed with __builtin_va_arg () when the va_list does not >>>> escape. */ >>>> + >>>> +/* { dg-final { scan-assembler-not "st.*r7" } } */ >>>> + >>>> +int >>>> +test (int a0, ...) >>>> +{ >>>> + void *arg; >>>> + int a1, a2; >>>> + >>>> + __builtin_va_start (arg, a0); >>>> + a1 = __builtin_va_arg (arg, int); >>>> + a2 = __builtin_va_arg (arg, int); >>>> + __builtin_va_end (arg); >>>> + >>>> + return a0 + a1 + a2; >>>> +}
On Tue, 2023-04-18 at 19:21 +0800, Lulu Cheng wrote: > > 在 2023/4/18 下午5:27, Xi Ruoyao 写道: > > On Mon, 2023-04-10 at 17:45 +0800, Lulu Cheng wrote: > > > Sorry, it's my question. I still have some questions that I haven't > > > understood, so I haven't replied to the email yet.:-( > > I've verified the value of cfun->va_list_gpr_size with -fdump-tree- > > stdarg and various testcases (including extracting aggregates and > > floating-point values in the va list) and the result seems correct. And > > gcc/testsuite/gcc.c-torture/execute/va-arg-*.c should provide a good > > enough test coverage. > > > > Is there still something seemly problematic? > > > I think there is no problem with the code modification, but I found that > the $r12 register is stored whether or not this patch is added. I don't > understand why.:-( It has been stored before the change: test: .LFB0 = . .cfi_startproc addi.d $r3,$r3,-80 .cfi_def_cfa_offset 80 addi.d $r12,$r3,24 st.d $r5,$r3,24 st.d $r6,$r3,32 st.d $r7,$r3,40 st.d $r8,$r3,48 st.d $r9,$r3,56 st.d $r10,$r3,64 st.d $r11,$r3,72 st.d $r12,$r3,8 # <===== add.w $r4,$r5,$r4 addi.d $r3,$r3,80 .cfi_def_cfa_offset 0 jr $r1 .cfi_endproc AFAIK it's related to how the variable arguments are implemented in general. The problem is when we expands __builtin_va_list or __builtin_va_arg, the registers containing the variable arguments and the pointer to the variable argument store area (r12 in this case) may be already clobbered, so the compiler have to store them expanding the prologue of the function (when the prologue is expanded we don't know if the following code will clobber the registers). This also causes a difficulty to avoid saving the GARs for *used* variable arguments as well. On x86_64 we have the same issue: test: .LFB0: .cfi_startproc leaq 8(%rsp), %rax movq %rsi, -40(%rsp) movq %rax, -64(%rsp) # <===== leaq -48(%rsp), %rax movq %rax, -56(%rsp) movl -40(%rsp), %eax movl $8, -72(%rsp) addl %edi, %eax ret .cfi_endproc I'll try to remove all of these in the GCC 14 development cycle (as they are causing sub-optimal code in various Glibc functions), but it's not easy...
在 2023/4/18 下午7:48, Xi Ruoyao 写道: > On Tue, 2023-04-18 at 19:21 +0800, Lulu Cheng wrote: >> 在 2023/4/18 下午5:27, Xi Ruoyao 写道: >>> On Mon, 2023-04-10 at 17:45 +0800, Lulu Cheng wrote: >>>> Sorry, it's my question. I still have some questions that I haven't >>>> understood, so I haven't replied to the email yet.:-( >>> I've verified the value of cfun->va_list_gpr_size with -fdump-tree- >>> stdarg and various testcases (including extracting aggregates and >>> floating-point values in the va list) and the result seems correct. And >>> gcc/testsuite/gcc.c-torture/execute/va-arg-*.c should provide a good >>> enough test coverage. >>> >>> Is there still something seemly problematic? >> >> I think there is no problem with the code modification, but I found that >> the $r12 register is stored whether or not this patch is added. I don't >> understand why.:-( > It has been stored before the change: > > test: > .LFB0 = . > .cfi_startproc > addi.d $r3,$r3,-80 > .cfi_def_cfa_offset 80 > addi.d $r12,$r3,24 > st.d $r5,$r3,24 > st.d $r6,$r3,32 > st.d $r7,$r3,40 > st.d $r8,$r3,48 > st.d $r9,$r3,56 > st.d $r10,$r3,64 > st.d $r11,$r3,72 > st.d $r12,$r3,8 # <===== > add.w $r4,$r5,$r4 > addi.d $r3,$r3,80 > .cfi_def_cfa_offset 0 > jr $r1 > .cfi_endproc > > AFAIK it's related to how the variable arguments are implemented in > general. The problem is when we expands __builtin_va_list or > __builtin_va_arg, the registers containing the variable arguments and > the pointer to the variable argument store area (r12 in this case) may > be already clobbered, so the compiler have to store them expanding the > prologue of the function (when the prologue is expanded we don't know if > the following code will clobber the registers). > > This also causes a difficulty to avoid saving the GARs for *used* > variable arguments as well. > > On x86_64 we have the same issue: > > test: > .LFB0: > .cfi_startproc > leaq 8(%rsp), %rax > movq %rsi, -40(%rsp) > movq %rax, -64(%rsp) # <===== > leaq -48(%rsp), %rax > movq %rax, -56(%rsp) > movl -40(%rsp), %eax > movl $8, -72(%rsp) > addl %edi, %eax > ret > .cfi_endproc > > I'll try to remove all of these in the GCC 14 development cycle (as they > are causing sub-optimal code in various Glibc functions), but it's not > easy... > > Ok, I have no more questions.
On Tue, 2023-04-18 at 20:03 +0800, Lulu Cheng wrote: > > 在 2023/4/18 下午7:48, Xi Ruoyao 写道: > > On Tue, 2023-04-18 at 19:21 +0800, Lulu Cheng wrote: > > > 在 2023/4/18 下午5:27, Xi Ruoyao 写道: > > > > On Mon, 2023-04-10 at 17:45 +0800, Lulu Cheng wrote: > > > > > Sorry, it's my question. I still have some questions that I > > > > > haven't > > > > > understood, so I haven't replied to the email yet.:-( > > > > I've verified the value of cfun->va_list_gpr_size with -fdump- > > > > tree- > > > > stdarg and various testcases (including extracting aggregates > > > > and > > > > floating-point values in the va list) and the result seems > > > > correct. And > > > > gcc/testsuite/gcc.c-torture/execute/va-arg-*.c should provide a > > > > good > > > > enough test coverage. > > > > > > > > Is there still something seemly problematic? > > > > > > I think there is no problem with the code modification, but I > > > found that > > > the $r12 register is stored whether or not this patch is added. I > > > don't > > > understand why.:-( > > It has been stored before the change: > > > > test: > > .LFB0 = . > > .cfi_startproc > > addi.d $r3,$r3,-80 > > .cfi_def_cfa_offset 80 > > addi.d $r12,$r3,24 > > st.d $r5,$r3,24 > > st.d $r6,$r3,32 > > st.d $r7,$r3,40 > > st.d $r8,$r3,48 > > st.d $r9,$r3,56 > > st.d $r10,$r3,64 > > st.d $r11,$r3,72 > > st.d $r12,$r3,8 # <===== > > add.w $r4,$r5,$r4 > > addi.d $r3,$r3,80 > > .cfi_def_cfa_offset 0 > > jr $r1 > > .cfi_endproc > > > > AFAIK it's related to how the variable arguments are implemented in > > general. The problem is when we expands __builtin_va_list or > > __builtin_va_arg, the registers containing the variable arguments > > and > > the pointer to the variable argument store area (r12 in this case) > > may > > be already clobbered, so the compiler have to store them expanding > > the > > prologue of the function (when the prologue is expanded we don't > > know if > > the following code will clobber the registers). > > > > This also causes a difficulty to avoid saving the GARs for *used* > > variable arguments as well. > > > > On x86_64 we have the same issue: > > > > test: > > .LFB0: > > .cfi_startproc > > leaq 8(%rsp), %rax > > movq %rsi, -40(%rsp) > > movq %rax, -64(%rsp) # <===== > > leaq -48(%rsp), %rax > > movq %rax, -56(%rsp) > > movl -40(%rsp), %eax > > movl $8, -72(%rsp) > > addl %edi, %eax > > ret > > .cfi_endproc > > > > I'll try to remove all of these in the GCC 14 development cycle (as > > they > > are causing sub-optimal code in various Glibc functions), but it's > > not > > easy... > > > > > Ok, I have no more questions. > Pushed r14-69.
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 6927bdc7fe5..0ecb91ca997 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -764,7 +764,9 @@ loongarch_setup_incoming_varargs (cumulative_args_t cum, loongarch_function_arg_advance (pack_cumulative_args (&local_cum), arg); /* Found out how many registers we need to save. */ - gp_saved = MAX_ARGS_IN_REGISTERS - local_cum.num_gprs; + gp_saved = cfun->va_list_gpr_size / UNITS_PER_WORD; + if (gp_saved > (int) (MAX_ARGS_IN_REGISTERS - local_cum.num_gprs)) + gp_saved = MAX_ARGS_IN_REGISTERS - local_cum.num_gprs; if (!no_rtl && gp_saved > 0) { diff --git a/gcc/testsuite/gcc.target/loongarch/va_arg.c b/gcc/testsuite/gcc.target/loongarch/va_arg.c new file mode 100644 index 00000000000..980c96d0e3d --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/va_arg.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +/* Technically we shouldn't save any register for this function: it should be + compiled as if it accepts 3 named arguments. But AFAIK no compilers can + achieve this "perfect" optimization now, so just ensure we are using the + knowledge provided by stdarg pass and we won't save GARs impossible to be + accessed with __builtin_va_arg () when the va_list does not escape. */ + +/* { dg-final { scan-assembler-not "st.*r7" } } */ + +int +test (int a0, ...) +{ + void *arg; + int a1, a2; + + __builtin_va_start (arg, a0); + a1 = __builtin_va_arg (arg, int); + a2 = __builtin_va_arg (arg, int); + __builtin_va_end (arg); + + return a0 + a1 + a2; +}