
Aarch64: Change stack checking method on Linux

Message ID 21874866.4csPzL39Zc@fomalhaut
State New
Series Aarch64: Change stack checking method on Linux

Commit Message

Eric Botcazou Oct. 2, 2024, 6:24 a.m. UTC
Hi,

this changes the stack checking method (that of -fstack-check) used on Linux
from the traditional model (probe then move SP) to the model implemented for
-fstack-clash-protection (probe while moving SP).  The rationale is that the
latter is in widespread use on Linux and thus thought to be more robust.
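
As a rough illustration (this snippet is not part of the patch; it assumes a
hypothetical 4 KiB probe interval, an 8 KiB frame and x9 as a scratch
register), the two models differ roughly as follows:

  // Traditional model (probe, then move SP): probe the pages below SP
  // first, then allocate the whole frame in one adjustment.
  sub   x9, sp, #4096
  str   xzr, [x9]
  sub   x9, x9, #4096
  str   xzr, [x9]
  sub   sp, sp, #8192

  // Moving-SP model (probe while moving SP): SP is never moved more than
  // one probe interval past the last probed address, so the area below SP
  // stays accessible at every point.
  sub   sp, sp, #4096
  str   xzr, [sp]
  sub   sp, sp, #4096
  str   xzr, [sp]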

This entails doing a couple of things: first, defining STACK_CHECK_MOVING_SP to 1:

 -- Macro: STACK_CHECK_MOVING_SP
     An integer which is nonzero if GCC should move the stack pointer
     page by page when doing probes.  This can be necessary on systems
     where the stack pointer contains the bottom address of the memory
     area accessible to the executing thread at any point in time.  In
     this situation an alternate signal stack is required in order to be
     able to recover from a stack overflow.  The default value of this
     macro is zero.

and second, replacing tests of flag_stack_clash_protection with calls to a new
wrapper routine, do_stack_clash_protection, in the back-end [the implementation
is the same as the one present in the i386 back-end].

-fstack-check is mainly used for Ada, and AdaCore has been using this method in
its Aarch64/Linux compilers for some time.

Bootstrapped/regtested on Aarch64/Linux, OK for the mainline?


2024-10-01  Eric Botcazou  <ebotcazou@adacore.com>

	* config/aarch64/aarch64-linux.h (STACK_CHECK_MOVING_SP): Define to 1.
	* config/aarch64/aarch64-protos.h (do_stack_clash_protection): Declare.
	* config/aarch64/aarch64.h (STACK_DYNAMIC_OFFSET): Replace
	flag_stack_clash_protection with call to do_stack_clash_protection.
	* config/aarch64/aarch64.cc (aarch64_output_probe_stack_range): Likewise.
	(aarch64_output_probe_sve_stack_clash): Likewise.
	(aarch64_layout_frame): Likewise.
	(aarch64_get_separate_components): Likewise.
	(aarch64_allocate_and_probe_stack_space): Likewise.
	(aarch64_expand_prologue): Likewise.  And do not check the stack prior
	to establishing the frame if STACK_CHECK_MOVING_SP is 1.
	(aarch64_expand_epilogue): Likewise.
	(do_stack_clash_protection): New predicate.

Comments

Richard Sandiford Oct. 3, 2024, 5:23 p.m. UTC | #1
Eric Botcazou <botcazou@adacore.com> writes:
> Hi,
>
> this changes the stack checking method (that of -fstack-check) used on Linux
> from the traditional model (probe then move SP) to the model implemented for
> -fstack-clash-protection (probe while moving SP).  The rationale is that the
> latter is in widespread use on Linux and thus thought to be more robust.
>
> This entails doing a couple of things: defining STACK_CHECK_MOVING_SP to 1:
>
>  -- Macro: STACK_CHECK_MOVING_SP
>      An integer which is nonzero if GCC should move the stack pointer
>      page by page when doing probes.  This can be necessary on systems
>      where the stack pointer contains the bottom address of the memory
>      area accessible to the executing thread at any point in time.  In
>      this situation an alternate signal stack is required in order to be
>      able to recover from a stack overflow.  The default value of this
>      macro is zero

This part I followed :)

> and replacing tests on flag_stack_clash_protection by calls to a new wrapper 
> routine do_stack_clash_protection in the back-end [the implementation is the 
> same as the one present in the i386 back-end].  

But I'm not sure I understand this part.  It seems like it's using the
stack-clash mechanism and stack-clash thresholds for allocating static
parts of the frame while still using the stack-check mechanism for
dynamic allocations.  Is that right?  If so, don't the two have different
assumptions about which part needs to be probed?  I wasn't sure why we
continued to use PROBE_INTERVAL for dynamic allocations but switched
to stack_clash_probe_interval for static ones (if I followed the code
correctly).

Thanks,
Richard

>
> -fstack-check is mainly used for Ada and AdaCore has been using this method in 
> its Aarch64/Linux compilers for some time.
>
> Bootstrapped/regtested on Aarch64/Linux, OK for the mainline?
>
>
> 2024-10-01  Eric Botcazou  <ebotcazou@adacore.com>
>
> 	* config/aarch64/aarch64-linux.h (STACK_CHECK_MOVING_SP): Define to 1.
> 	* config/aarch64/aarch64-protos.h (do_stack_clash_protection): Declare.
> 	* config/aarch64/aarch64.h (STACK_DYNAMIC_OFFSET): Replace
> 	flag_stack_clash_protection with call to do_stack_clash_protection.
> 	* config/aarch64/aarch64.cc (aarch64_output_probe_stack_range): Likewise.
> 	(aarch64_output_probe_sve_stack_clash): Likewise.
> 	(aarch64_layout_frame): Likewise.
> 	(aarch64_get_separate_components): Likewise.
> 	(aarch64_allocate_and_probe_stack_space): Likewise.
> 	(aarch64_expand_prologue): Likewise.  And do not check the stack prior
> 	to establishing the frame if STACK_CHECK_MOVING_SP is 1.
> 	(aarch64_expand_epilogue): Likewise.
> 	(do_stack_clash_protection): New predicate.

Patch

diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
index 8e51c8202cc..de023b97b20 100644
--- a/gcc/config/aarch64/aarch64-linux.h
+++ b/gcc/config/aarch64/aarch64-linux.h
@@ -72,6 +72,9 @@ 
 #undef TARGET_BINDS_LOCAL_P
 #define TARGET_BINDS_LOCAL_P default_binds_local_p_2
 
+/* The stack pointer should be moved while checking the stack.  */
+#define STACK_CHECK_MOVING_SP 1
+
 /* Define this to be nonzero if static stack checking is supported.  */
 #define STACK_CHECK_STATIC_BUILTIN 1
 
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index d03c1fe798b..ec33dc25c25 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -977,6 +977,8 @@  bool aarch64_float_const_representable_p (rtx);
 
 extern int aarch64_epilogue_uses (int);
 
+extern bool do_stack_clash_protection (void);
+
 #if defined (RTX_CODE)
 void aarch64_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode,
 				   rtx label_ref);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index e7bb3278a27..0f8a649f2db 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -7617,17 +7617,12 @@  aarch64_output_probe_stack_range (rtx reg1, rtx reg2)
   /* Loop.  */
   ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, loop_lab);
 
-  HOST_WIDE_INT stack_clash_probe_interval
-    = 1 << param_stack_clash_protection_guard_size;
-
   /* TEST_ADDR = TEST_ADDR + PROBE_INTERVAL.  */
   xops[0] = reg1;
-  HOST_WIDE_INT interval;
-  if (flag_stack_clash_protection)
-    interval = stack_clash_probe_interval;
-  else
-    interval = PROBE_INTERVAL;
-
+  const HOST_WIDE_INT interval
+    = do_stack_clash_protection ()
+      ? 1 << param_stack_clash_protection_probe_interval
+      : PROBE_INTERVAL;
   gcc_assert (aarch64_uimm12_shift (interval));
   xops[1] = GEN_INT (interval);
 
@@ -7636,7 +7631,7 @@  aarch64_output_probe_stack_range (rtx reg1, rtx reg2)
   /* If doing stack clash protection then we probe up by the ABI specified
      amount.  We do this because we're dropping full pages at a time in the
      loop.  But if we're doing non-stack clash probing, probe at SP 0.  */
-  if (flag_stack_clash_protection)
+  if (do_stack_clash_protection ())
     xops[1] = GEN_INT (STACK_CLASH_CALLER_GUARD);
   else
     xops[1] = CONST0_RTX (GET_MODE (xops[1]));
@@ -7671,7 +7666,7 @@  aarch64_output_probe_sve_stack_clash (rtx base, rtx adjustment,
   /* This function is not allowed to use any instruction generation function
      like gen_ and friends.  If you do you'll likely ICE during CFG validation,
      so instead emit the code you want using output_asm_insn.  */
-  gcc_assert (flag_stack_clash_protection);
+  gcc_assert (do_stack_clash_protection ());
   gcc_assert (CONST_INT_P (min_probe_threshold) && CONST_INT_P (guard_size));
   gcc_assert (INTVAL (guard_size) > INTVAL (min_probe_threshold));
 
@@ -7986,7 +7981,7 @@  aarch64_layout_frame (void)
       allocate_gpr_slot (R29_REGNUM);
       allocate_gpr_slot (R30_REGNUM);
     }
-  else if ((flag_stack_clash_protection || !frame.is_scs_enabled)
+  else if ((do_stack_clash_protection () || !frame.is_scs_enabled)
 	   && known_eq (frame.reg_offset[R30_REGNUM], SLOT_REQUIRED))
     /* Put the LR save slot first, since it makes a good choice of probe
        for stack clash purposes.  The idea is that the link register usually
@@ -8914,7 +8909,7 @@  aarch64_get_separate_components (void)
 
   bitmap_clear_bit (components, LR_REGNUM);
   bitmap_clear_bit (components, SP_REGNUM);
-  if (flag_stack_clash_protection)
+  if (do_stack_clash_protection ())
     {
       if (frame.sve_save_and_probe != INVALID_REGNUM)
 	bitmap_clear_bit (components, frame.sve_save_and_probe);
@@ -9152,6 +9147,23 @@  aarch64_set_handled_components (sbitmap components)
       cfun->machine->reg_is_wrapped_separately[regno] = true;
 }
 
+/* Return true if stack clash protection is requested.  */
+
+bool
+do_stack_clash_protection (void)
+{
+  /* The canonical case.  */
+  if (flag_stack_clash_protection)
+    return true;
+
+  /* If static stack checking is enabled and the stack pointer should be
+     moved while doing it, simply use the stack clash implementation.  */
+  if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK && STACK_CHECK_MOVING_SP)
+    return true;
+
+  return false;
+}
+
 /* On AArch64 we have an ABI defined safe buffer.  This constant is used to
    determining the probe offset for alloca.  */
 
@@ -9215,7 +9227,7 @@  aarch64_allocate_and_probe_stack_space (rtx temp1, rtx temp2,
   /* We should always have a positive probe threshold.  */
   gcc_assert (min_probe_threshold > 0);
 
-  if (flag_stack_clash_protection && !final_adjustment_p)
+  if (do_stack_clash_protection () && !final_adjustment_p)
     {
       poly_int64 initial_adjust = frame.initial_adjust;
       poly_int64 sve_callee_adjust = frame.sve_callee_adjust;
@@ -9236,7 +9248,7 @@  aarch64_allocate_and_probe_stack_space (rtx temp1, rtx temp2,
   /* If SIZE is not large enough to require probing, just adjust the stack and
      exit.  */
   if (known_lt (poly_size, min_probe_threshold)
-      || !flag_stack_clash_protection)
+      || !do_stack_clash_protection ())
     {
       aarch64_sub_sp (temp1, temp2, poly_size, force_isa_mode,
 		      frame_related_p);
@@ -9614,7 +9626,7 @@  aarch64_expand_prologue (void)
   if (aarch64_cfun_enables_pstate_sm ())
     force_isa_mode = AARCH64_ISA_MODE_SM_ON;
 
-  if (flag_stack_clash_protection
+  if (do_stack_clash_protection ()
       && known_eq (callee_adjust, 0)
       && known_lt (frame.reg_offset[VG_REGNUM], 0))
     {
@@ -9650,7 +9662,7 @@  aarch64_expand_prologue (void)
   if (flag_stack_usage_info)
     current_function_static_stack_size = constant_lower_bound (frame_size);
 
-  if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK)
+  if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK && !STACK_CHECK_MOVING_SP)
     {
       if (crtl->is_leaf && !cfun->calls_alloca)
 	{
@@ -9743,7 +9755,7 @@  aarch64_expand_prologue (void)
     }
   if (maybe_ne (sve_callee_adjust, 0))
     {
-      gcc_assert (!flag_stack_clash_protection
+      gcc_assert (!do_stack_clash_protection ()
 		  || known_eq (initial_adjust, 0)
 		  /* The VG save isn't shrink-wrapped and so serves as
 		     a probe of the initial allocation.  */
@@ -9894,7 +9906,7 @@  aarch64_expand_epilogue (rtx_call_insn *sibcall)
      value.  */
   bool can_inherit_p = (initial_adjust.is_constant ()
 			&& final_adjust.is_constant ()
-			&& (!flag_stack_clash_protection
+			&& (!do_stack_clash_protection ()
 			    || (known_lt (initial_adjust,
 					  guard_size - guard_used_by_caller)
 				&& known_eq (sve_callee_adjust, 0))));
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index ec8fde783b3..acc26aed808 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -1513,7 +1515,7 @@  extern poly_uint16 aarch64_sve_vg;
    as the extra arg space allows us to skip a check in alloca.  */
 #undef STACK_DYNAMIC_OFFSET
 #define STACK_DYNAMIC_OFFSET(FUNDECL)			   \
-   ((flag_stack_clash_protection			   \
+   ((do_stack_clash_protection ()			   \
      && cfun->calls_alloca				   \
      && known_lt (crtl->outgoing_args_size,		   \
 		  STACK_CLASH_MIN_BYTES_OUTGOING_ARGS))    \